EP1364015A2 - Molecules for diagnostics and therapeutics - Google Patents

Molecules for diagnostics and therapeutics

Info

Publication number
EP1364015A2
Authority
EP
European Patent Office
Prior art keywords
polypeptide
polynucleotide
antibody
cell
proteins
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP01966454A
Other languages
German (de)
French (fr)
Inventor
Jackson Stuart
Stephen E Lincoln
Christina M Altus
Gerard E Dufour
Michael S Chalup
Jennifer L Hillman
Anissa Lee Jones
Jimmy Y Yu
Rachel J. Wright
Darryl Gietzen
Toomy F. Liu
Pierre E. Yap
Christoher R Dahl
Monika G. Momiyama
Diana L. Bradley
Sameer D. Rohatgi
Bernard Harris
Ann M. Roseberry Lincoln
Edward H Jr. Gerstin
Careyna H Peralta
Marie H DAVID
Scott R PANZER
Vincent Flores
Abel Daffo
Rakesh Marwaha
Alice J. CHEN
Simon C CHANG
Alan P. Au
Rebekah R INMAN
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Incyte Corp
Original Assignee
Incyte Genomics Inc
Application filed by Incyte Genomics Inc

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • A HUMAN NECESSITIES
    • A01 AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01K ANIMAL HUSBANDRY; CARE OF BIRDS, FISHES, INSECTS; FISHING; REARING OR BREEDING ANIMALS, NOT OTHERWISE PROVIDED FOR; NEW BREEDS OF ANIMALS
    • A01K2217/00 Genetically modified animals
    • A01K2217/05 Animals comprising random inserted nucleic acids (transgenic)
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K38/00 Medicinal preparations containing peptides
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K48/00 Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers

Definitions

  • The present invention relates to human molecules and to the use of these sequences in the diagnosis, study, prevention, and treatment of diseases associated with, as well as effects of exogenous compounds on, the expression of human molecules.
  • The human genome comprises thousands of genes, many encoding gene products that function in the maintenance and growth of the various cells and tissues in the body. Aberrant expression of, or mutations in, these genes and their products are the cause of, or are associated with, a variety of human diseases such as cancer and other cell proliferative disorders, autoimmune/inflammatory disorders, infections, developmental disorders, endocrine disorders, metabolic disorders, neurological disorders, gastrointestinal disorders, transport disorders, and connective tissue disorders.
  • The identification of these genes and their products is the basis of an ever-expanding effort to find markers for early detection of diseases, and targets for their prevention and treatment. Therefore, these genes and their products are useful as diagnostics and therapeutics.
  • Genes may encode, for example, enzyme molecules, molecules associated with growth and development, biochemical pathway molecules, extracellular information transmission molecules, receptor molecules, intracellular signaling molecules, membrane transport molecules, protein modification and maintenance molecules, nucleic acid synthesis and modification molecules, adhesion molecules, antigen recognition molecules, secreted and extracellular matrix molecules, cytoskeletal molecules, ribosomal molecules, electron transfer associated molecules, transcription factor molecules, chromatin molecules, cell membrane molecules, and organelle associated molecules.
  • Cancer represents a type of cell proliferative disorder that affects nearly every tissue in the body.
  • A wide variety of molecules, either aberrantly expressed or mutated, can be the cause of, or involved with, various cancers because tissue growth involves complex and ordered patterns of cell proliferation, cell differentiation, and apoptosis.
  • Cell proliferation must be regulated to maintain both the number of cells and their spatial organization. This regulation depends upon the appropriate expression of proteins which control cell cycle progression in response to extracellular signals such as growth factors and other mitogens, and intracellular cues such as DNA damage or nutrient starvation.
  • Molecules which directly or indirectly modulate cell cycle progression fall into several categories, including growth factors and their receptors, second messenger and signal transduction proteins, oncogene products, tumor-suppressor proteins, and mitosis-promoting factors. Aberrant expression or mutations in any of these gene products can result in cell proliferative disorders such as cancer.
  • Oncogenes are genes generally derived from normal genes that, through abnormal expression or mutation, can effect the transformation of a normal cell to a malignant one (oncogenesis).
  • Oncoproteins, encoded by oncogenes, can affect cell proliferation in a variety of ways and include growth factors, growth factor receptors, intracellular signal transducers, nuclear transcription factors, and cell-cycle control proteins.
  • DNA-based arrays can provide a simple way to explore the expression of a single polymorphic gene or a large number of genes. When the expression of a single gene is explored, DNA-based arrays are employed to detect the expression of specific gene variants. For example, a p53 tumor suppressor gene array is used to determine whether individuals are carrying mutations that predispose them to cancer. A cytochrome p450 gene array is useful to determine whether individuals have one of a number of specific mutations that could result in increased drug metabolism, drug resistance or drug toxicity.
  • DNA-based array technology is especially relevant for the rapid screening of expression of a large number of genes.
  • A genetic predisposition, disease or therapeutic treatment may affect, directly or indirectly, the expression of a large number of genes.
  • The interactions may be expected, such as when the genes are part of the same signaling pathway.
  • The interactions may be totally unexpected. Therefore, DNA-based arrays can be used to investigate how genetic predisposition, disease, or therapeutic treatment affects the expression of a large number of genes.
  • The cellular processes of biogenesis and biodegradation involve a number of key enzyme classes including oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases. These enzyme classes each comprise numerous substrate-specific enzymes having precise and well-regulated functions. These enzymes function by facilitating metabolic processes such as glycolysis, the tricarboxylic acid cycle, and fatty acid metabolism; synthesis or degradation of amino acids, steroids, phospholipids, alcohols, etc.; regulation of cell signalling, proliferation, inflammation, apoptosis, etc.; and through catalyzing critical steps in DNA replication and repair, and the process of translation.
  • Oxidoreductases
  • Many pathways of biogenesis and biodegradation require oxidoreductase (dehydrogenase or reductase) activity, coupled to the reduction or oxidation of a donor or acceptor cofactor.
  • Potential cofactors include cytochromes, oxygen, disulfide, iron-sulfur proteins, flavin adenine dinucleotide (FAD), and the nicotinamide adenine dinucleotides NAD and NADP (Newsholme, E.A. and A.R. Leech (1983) Biochemistry for the Medical Sciences. John Wiley and Sons, Chichester, U.K., pp. 779-793).
  • Reductase activity catalyzes the transfer of electrons between substrate(s) and cofactor(s) with concurrent oxidation of the cofactor.
  • The reverse dehydrogenase reaction catalyzes the reduction of a cofactor and consequent oxidation of the substrate.
  • Oxidoreductase enzymes are a broad superfamily of proteins that catalyze numerous reactions in all cells of organisms ranging from bacteria to plants to humans. These reactions include metabolism of sugar, certain detoxification reactions in the liver, and the synthesis or degradation of fatty acids, amino acids, glucocorticoids, estrogens, androgens, and prostaglandins.
  • SCADs: short-chain alcohol dehydrogenases
  • Retinol dehydrogenase is a SCAD-family member (Simon, A. et al. (1995) J. Biol. Chem. 270:1107-1112) that converts retinol to retinal, the precursor of retinoic acid.
  • Retinoic acid, a regulator of differentiation and apoptosis, has been shown to down-regulate genes involved in cell proliferation and inflammation (Chai, X. et al. (1995) J. Biol. Chem. 270:3900-3904).
  • Retinol dehydrogenase has been linked to hereditary eye diseases such as autosomal recessive childhood-onset severe retinal dystrophy (Simon, A. et al. (1996) Genomics 36:424-430).
  • Propagation of nerve impulses, modulation of cell proliferation and differentiation, induction of the immune response, and tissue homeostasis involve neurotransmitter metabolism (Weiss, B. (1991) Neurotoxicology 12:379-386; Collins, S.M. et al. (1992) Ann. N.Y. Acad. Sci. 664:415-424; Brown, J.K. and H. Imam (1991) J. Inherit. Metab. Dis. 14:436-458). Many pathways of neurotransmitter metabolism require oxidoreductase activity, coupled to reduction or oxidation of a cofactor, such as NAD+/NADH (Newsholme, E.A. and A.R. Leech (1983) Biochemistry for the Medical Sciences.
  • Neurotransmitter degradation pathways that utilize NAD+/NADH-dependent oxidoreductase activity include those of L-DOPA (precursor of dopamine, a neuronal excitatory compound), glycine (an inhibitory neurotransmitter in the brain and spinal cord), histamine (liberated from mast cells during the inflammatory response), and taurine (an inhibitory neurotransmitter of the brain stem, spinal cord and retina) (Newsholme, supra, pp. 790, 792).
  • L-DOPA: precursor of dopamine, a neuronal excitatory compound
  • Glycine: an inhibitory neurotransmitter in the brain and spinal cord
  • Histamine: liberated from mast cells during the inflammatory response
  • Taurine: an inhibitory neurotransmitter of the brain stem, spinal cord and retina
  • Epigenetic or genetic defects in neurotransmitter metabolic pathways can result in a spectrum of disease states in different tissues including Parkinson disease and inherited myoclonus (McCance, K.L. and S.E. Huether (1994
  • Tetrahydrofolate is a derivatized glutamate molecule that acts as a carrier, providing activated one-carbon units to a wide variety of biosynthetic reactions, including synthesis of purines, pyrimidines, and the amino acid methionine.
  • Tetrahydrofolate is generated by the activity of a holoenzyme complex called tetrahydrofolate synthase, which includes three enzyme activities: tetrahydrofolate dehydrogenase, tetrahydrofolate cyclohydrolase, and tetrahydrofolate synthetase.
  • Tetrahydrofolate dehydrogenase plays an important role in generating building blocks for nucleic and amino acids, crucial to proliferating cells.
  • 3-Hydroxyacyl-CoA dehydrogenase (3HACD) is involved in fatty acid metabolism. It catalyzes the oxidation of 3-hydroxyacyl-CoA to 3-oxoacyl-CoA, with concomitant reduction of NAD+ to NADH, in the mitochondria and peroxisomes of eukaryotic cells.
  • 3HACD and enoyl-CoA hydratase form an enzyme complex called bifunctional enzyme, defects in which are associated with peroxisomal bifunctional enzyme deficiency. This interruption in fatty acid metabolism produces accumulation of very-long chain fatty acids, disrupting development of the brain, bone, and adrenal glands. Infants born with this deficiency typically die within 6 months (Watkins, P.
  • Aβ: amyloid-β
  • APP: amyloid precursor protein
  • 3HACD has been shown to bind the Aβ peptide, and is overexpressed in neurons affected in Alzheimer's disease.
  • An antibody against 3HACD can block the toxic effects of Aβ in a cell culture model of Alzheimer's disease (Yan, S. et al. (1997) Nature 389:689-695; OMIM, #602057).
  • Steroids, such as estrogen, testosterone, corticosterone, and others, are generated from a common precursor, cholesterol, and are interconverted into one another.
  • A wide variety of enzymes act upon cholesterol, including a number of dehydrogenases.
  • Steroid dehydrogenases, such as the hydroxysteroid dehydrogenases, are involved in hypertension, fertility, and cancer (Duax, W.L. and D. Ghosh (1997) Steroids 62:95-100).
  • One such dehydrogenase is 3-oxo-5-α-steroid dehydrogenase (OASD), a microsomal membrane protein highly expressed in prostate and other androgen-responsive tissues.
  • OASD: 3-oxo-5-α-steroid dehydrogenase
  • OASD catalyzes the conversion of testosterone into dihydrotestosterone, which is the most potent androgen.
  • Dihydrotestosterone is essential for the formation of the male phenotype during embryogenesis, as well as for proper androgen-mediated growth of tissues such as the prostate and male genitalia.
  • A defect in OASD that prevents the conversion of testosterone into dihydrotestosterone leads to a rare form of male pseudohermaphroditism, characterized by defective formation of the external genitalia (Andersson, S. et al. (1991) Nature 354:159-161; Labrie, F. et al. (1992) Endocrinology 131:1571-1573; OMIM #264600).
  • OASD plays a central role in sexual differentiation and androgen physiology.
  • 17β-hydroxysteroid dehydrogenase (17βHSD6) plays an important role in the regulation of the male reproductive hormone, dihydrotestosterone (DHTT).
  • 17βHSD6 acts to reduce levels of DHTT by oxidizing a precursor of DHTT, 3α-diol, to androsterone, which is readily glucuronidated and removed from tissues.
  • 17βHSD6 is active with both androgen and estrogen substrates when expressed in embryonic kidney 293 cells.
  • At least five other isozymes of 17βHSD have been identified that catalyze oxidation and/or reduction reactions in various tissues with preferences for different steroid substrates (Biswas, M.G. and D.W. Russell (1997) J. Biol. Chem. 272:15959-15966).
  • 17βHSD1 preferentially reduces estradiol and is abundant in the ovary and placenta.
  • 17βHSD2 catalyzes oxidation of androgens and is present in the endometrium and placenta.
  • 17βHSD3 is exclusively a reductive enzyme in the testis (Geissler, W.M. et al. (1994) Nat. Genet. 7:34-39).
  • An excess of androgens such as DHTT can contribute to certain disease states such as benign prostatic hyperplasia and prostate cancer.
  • Oxidoreductases are components of the fatty acid metabolism pathways in mitochondria and peroxisomes.
  • The main beta-oxidation pathway degrades both saturated and unsaturated fatty acids, while the auxiliary pathway performs additional steps required for the degradation of unsaturated fatty acids.
  • The auxiliary beta-oxidation enzyme 2,4-dienoyl-CoA reductase catalyzes the removal of even-numbered double bonds from unsaturated fatty acids prior to their entry into the main beta-oxidation pathway.
  • The enzyme may also remove odd-numbered double bonds from unsaturated fatty acids (Koivuranta, K.T. et al. (1994) Biochem. J. 304:787-792; Smeland, T.E. et al.
  • 2,4-dienoyl-CoA reductase is located in both mitochondria and peroxisomes. Inherited deficiencies in mitochondrial and peroxisomal beta-oxidation enzymes are associated with severe diseases, some of which manifest themselves soon after birth and lead to death within a few years. Defects in beta-oxidation are associated with Reye's syndrome, Zellweger syndrome, neonatal adrenoleukodystrophy, infantile Refsum's disease, acyl-CoA oxidase deficiency, and bifunctional protein deficiency (Suzuki, Y. et al. (1994) Am. J. Hum. Genet. 54:36-43; Hoefler, supra; Cotran, R.S. et al. (1994) Robbins Pathologic Basis of Disease, W.B. Saunders Co.,
  • Peroxisomal beta-oxidation is impaired in cancerous tissue. Although neoplastic human breast epithelial cells have the same number of peroxisomes as do normal cells, fatty acyl-CoA oxidase activity is lower than in control tissue (el Bouhtoury, F. et al. (1992) J. Pathol. 166:27-35). Human colon carcinomas have fewer peroxisomes than normal colon tissue and have lower fatty-acyl-CoA oxidase and bifunctional enzyme (including enoyl-CoA hydratase) activities than normal tissue (Cable, S. et al. (1992) Virchows Arch. B Cell Pathol. Incl. Mol. Pathol.
  • Another important oxidoreductase is isocitrate dehydrogenase, which catalyzes the conversion of isocitrate to α-ketoglutarate, a substrate of the citric acid cycle.
  • Isocitrate dehydrogenase can be either NAD or NADP dependent, and is found in the cytosol, mitochondria, and peroxisomes. Activity of isocitrate dehydrogenase is regulated developmentally, and by hormones, neurotransmitters, and growth factors.
  • HPR: hydroxypyruvate reductase
  • HPR, a peroxisomal 2-hydroxyacid dehydrogenase in the glycolate pathway, catalyzes the conversion of hydroxypyruvate to glycerate with the oxidation of both NADH and NADPH.
  • The reverse dehydrogenase reaction reduces NAD+ and NADP+.
  • HPR recycles nucleotides and bases back into pathways leading to the synthesis of ATP and GTP. ATP and GTP are used to produce DNA and RNA and to control various aspects of signal transduction and energy metabolism.
  • Inhibitors of purine nucleotide biosynthesis have long been employed as antiproliferative agents to treat cancer and viral diseases. HPR also regulates biochemical synthesis of serine and cellular serine levels available for protein synthesis.
  • The mitochondrial electron transport (or respiratory) chain is a series of oxidoreductase-type enzyme complexes in the mitochondrial membrane that is responsible for the transport of electrons from NADH through a series of redox centers within these complexes to oxygen, and the coupling of this oxidation to the synthesis of ATP (oxidative phosphorylation). ATP then provides the primary source of energy for driving a cell's many energy-requiring reactions.
  • The key complexes in the respiratory chain are NADH:ubiquinone oxidoreductase (complex I), succinate:ubiquinone oxidoreductase (complex II), cytochrome c1-b oxidoreductase (complex III), cytochrome c oxidase (complex IV), and ATP synthase (complex V) (Alberts, B. et al. (1994) Molecular Biology of the Cell, Garland Publishing, Inc., New York NY, pp. 677-678). All of these complexes are located on the inner matrix side of the mitochondrial membrane except complex II, which is on the cytosolic side.
  • Complex II transports electrons generated in the citric acid cycle to the respiratory chain.
  • The electrons generated by oxidation of succinate to fumarate in the citric acid cycle are transferred through electron carriers in complex II to membrane-bound ubiquinone (Q).
  • Q: membrane-bound ubiquinone
  • Transcriptional regulation of these nuclear-encoded genes appears to be the predominant means for controlling the biogenesis of respiratory enzymes. Defects and altered expression of enzymes in the respiratory chain are associated with a variety of disease conditions. Other dehydrogenase activities using NAD as a cofactor are also important in mitochondrial function.
  • 3-Hydroxyisobutyrate dehydrogenase, important in valine catabolism, catalyzes the NAD-dependent oxidation of 3-hydroxyisobutyrate to methylmalonate semialdehyde within mitochondria. Elevated levels of 3-hydroxyisobutyrate have been reported in a number of disease states, including ketoacidosis, methylmalonic acidemia, and other disorders associated with deficiencies in methylmalonate semialdehyde dehydrogenase (Rougraff, P.M. et al. (1989) J. Biol. Chem. 264:5899-5903).
  • IVD: isovaleryl-CoA dehydrogenase
  • IVD is involved in leucine metabolism and catalyzes the oxidation of isovaleryl-CoA to 3-methylcrotonyl-CoA.
  • Human IVD is a tetrameric flavoprotein that is encoded in the nucleus and synthesized in the cytosol as a 45 kDa precursor with a mitochondrial import signal sequence.
  • A genetic deficiency caused by a mutation in the gene encoding IVD results in the condition known as isovaleric acidemia. This mutation results in inefficient mitochondrial import and processing of the IVD precursor (Vockley, J. et al.
  • Transferases
  • Transferases are enzymes that catalyze the transfer of molecular groups. The reaction may involve an oxidation, reduction, or cleavage of covalent bonds, and is often specific to a substrate or to particular sites on a type of substrate. Transferases participate in reactions essential to such functions as synthesis and degradation of cell components, regulation of cell functions including cell signaling, cell proliferation, inflammation, apoptosis, secretion and excretion. Transferases are involved in key steps in disease processes involving these functions. Transferases are frequently classified according to the type of group transferred.
  • Methyl transferases transfer one-carbon methyl groups.
  • Amino transferases transfer nitrogenous amino groups.
  • Similarly denominated enzymes transfer aldehyde or ketone, acyl, glycosyl, alkyl or aryl, isoprenyl, saccharyl, phosphorous-containing, sulfur-containing, or selenium-containing groups, as well as small enzymatic groups such as Coenzyme A.
  • Acyl transferases include peroxisomal carnitine octanoyl transferase, which is involved in the fatty acid beta-oxidation pathway, and mitochondrial carnitine palmitoyl transferases, involved in fatty acid metabolism and transport.
  • Choline O-acetyl transferase catalyzes the biosynthesis of the neurotransmitter acetylcholine.
  • Amino transferases play key roles in protein synthesis and degradation, and they contribute to other processes as well. For example, the amino transferase 5-aminolevulinic acid synthase catalyzes the addition of succinyl-CoA to glycine, the first step in heme biosynthesis.
  • GTK: glutamine-phenylpyruvate amino transferase
  • GTK: glutamine transaminase K
  • Other amino acid substrates for GTK include L-methionine, L-histidine, and L-tyrosine.
  • GTK also catalyzes the conversion of kynurenine to kynurenic acid, a tryptophan metabolite that is an antagonist of the N-methyl-D-aspartate (NMDA) receptor in the brain and may exert a neuromodulatory function. Alteration of the kynurenine metabolic pathway may be associated with several neurological disorders. GTK also plays a role in the metabolism of halogenated xenobiotics conjugated to glutathione, leading to nephrotoxicity in rats and neurotoxicity in humans. GTK is expressed in kidney, liver, and brain. Both human and rat GTKs contain a putative pyridoxal phosphate binding site (ExPASy ENZYME: EC 2.6.1.64; Perry, S.J.
  • A second amino transferase associated with this pathway is kynurenine/α-aminoadipate amino transferase (AadAT).
  • AadAT catalyzes the reversible conversion of α-aminoadipate and α-ketoglutarate to α-ketoadipate and L-glutamate during lysine metabolism.
  • AadAT also catalyzes the transamination of kynurenine to kynurenic acid.
  • A cytosolic AadAT is expressed in rat kidney, liver, and brain (Nakatani, Y. et al. (1970) Biochim. Biophys. Acta 198:219-228; Buchli, R. et al. (1995) J. Biol. Chem. 270:29330-29335).
  • Glycosyl transferases include the mammalian UDP-glucuronosyl transferases, a family of membrane-bound microsomal enzymes catalyzing the transfer of glucuronic acid to lipophilic substrates in reactions that play important roles in detoxification and excretion of drugs, carcinogens, and other foreign substances.
  • Another mammalian glycosyl transferase, UDP-galactose-ceramide galactosyl transferase, catalyzes the transfer of galactose to ceramide in the synthesis of galactocerebrosides in myelin membranes of the nervous system.
  • The UDP-glycosyl transferases share a conserved signature domain of about 50 amino acid residues (PROSITE: PDOC00359, http://expasy.hcuge.ch/sprot/prosite.html).
  • Methyl transferases are involved in a variety of pharmacologically important processes. Nicotinamide N-methyl transferase catalyzes the N-methylation of nicotinamides and other pyridines, an important step in the cellular handling of drugs and other foreign compounds. Phenylethanolamine N-methyl transferase catalyzes the conversion of noradrenalin to adrenalin. 6-O-methylguanine-DNA methyl transferase reverses DNA methylation, an important step in carcinogenesis.
  • Uroporphyrin-III C-methyl transferase, which catalyzes the transfer of two methyl groups from S-adenosyl-L-methionine to uroporphyrinogen III, is the first specific enzyme in the biosynthesis of cobalamin (vitamin B12), a dietary vitamin whose uptake is deficient in pernicious anemia.
  • Protein-arginine methyl transferases catalyze the posttranslational methylation of arginine residues in proteins, resulting in the mono- and dimethylation of arginine on the guanidino group.
  • Substrates include histones, myelin basic protein, and heterogeneous nuclear ribonucleoproteins involved in mRNA processing, splicing, and transport.
  • Protein-arginine methyl transferase interacts with proteins upregulated by mitogens, with proteins involved in chronic lymphocytic leukemia, and with interferon, suggesting an important role for methylation in cytokine receptor signaling (Lin, W.-J. et al. (1996) J. Biol. Chem. 271:15034-15044; Abramovich, C. et al. (1997) EMBO J. 16:260-266; and Scott, H.S. et al. (1998) Genomics 48:330-340).
  • Phosphotransferases catalyze the transfer of high-energy phosphate groups and are important in energy-requiring and -releasing reactions.
  • The metabolic enzyme creatine kinase catalyzes the reversible phosphate transfer between creatine/creatine phosphate and ATP/ADP.
  • Glycocyamine kinase catalyzes phosphate transfer from ATP to guanidoacetate, and arginine kinase catalyzes phosphate transfer from ATP to arginine.
  • A cysteine-containing active site is conserved in this family (PROSITE: PDOC00103).
  • Prenyl transferases are heterodimers, consisting of an alpha and a beta subunit, that catalyze the transfer of an isoprenyl group.
  • An example of a prenyl transferase is the mammalian protein farnesyl transferase.
  • The alpha subunit of farnesyl transferase consists of 5 repeats of 34 amino acids each, with each repeat containing an invariant tryptophan (PROSITE: PDOC00703).
  • Saccharyl transferases are glycating enzymes involved in a variety of metabolic processes. Oligosaccharyl transferase-48, for example, is a receptor for advanced glycation endproducts.
  • Coenzyme A (CoA) transferase catalyzes the transfer of CoA between two carboxylic acids.
  • Succinyl CoA:3-oxoacid CoA transferase, for example, transfers CoA from succinyl-CoA to a recipient such as acetoacetate.
  • Acetoacetate is essential to the metabolism of ketone bodies, which accumulate in tissues affected by metabolic disorders such as diabetes (PROSITE: PDOC00980).
  • Hydrolases
  • Hydrolysis is the breaking of a covalent bond in a substrate by introduction of a molecule of water.
  • The reaction involves a nucleophilic attack by the water molecule's oxygen atom on a target bond in the substrate.
  • The water molecule is split across the target bond, breaking the bond and generating two product molecules.
  • Hydrolases participate in reactions essential to such functions as synthesis and degradation of cell components, and for regulation of cell functions including cell signaling, cell proliferation, inflammation, apoptosis, secretion and excretion. Hydrolases are involved in key steps in disease processes involving these functions.
  • Hydrolytic enzymes may be grouped by substrate specificity into classes including phosphatases, peptidases, lysophospholipases, phosphodiesterases, glycosidases, and glyoxalases.
  • Phosphatases hydrolytically remove phosphate groups from proteins, an energy-providing step that regulates many cellular processes, including intracellular signaling pathways that in turn control cell growth and differentiation, cell-cell contact, the cell cycle, and oncogenesis.
  • Lysophospholipases regulate intracellular lipids by catalyzing the hydrolysis of ester bonds to remove an acyl group, a key step in lipid degradation.
  • Small LPL isoforms, approximately 15-30 kD, function as hydrolases; larger isoforms function both as hydrolases and transacylases.
  • Peptidases, also called proteases, cleave peptide bonds that form the backbone of peptide or protein chains. Proteolytic processing is essential to cell growth, differentiation, remodeling, and homeostasis as well as inflammation and immune response. Since typical protein half-lives range from hours to a few days, peptidases are continually cleaving precursor proteins to their active form, removing signal sequences from targeted proteins, and degrading aged or defective proteins.
  • Peptidases function in bacterial, parasitic, and viral invasion and replication within a host.
  • Peptidases include trypsin and chymotrypsin (components of the complement cascade and the blood-clotting cascade), lysosomal cathepsins, calpains, pepsin, renin, and chymosin (Beynon, R.J. and J.S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University Press, New York NY, pp. 1-5).
  • The phosphodiesterases catalyze the hydrolysis of one of the two ester bonds in a phosphodiester compound. Phosphodiesterases are therefore crucial to a variety of cellular processes. Phosphodiesterases include DNA and RNA endo- and exo-nucleases, which are essential to cell growth and replication as well as protein synthesis. Another phosphodiesterase is acid sphingomyelinase, which hydrolyzes the membrane phospholipid sphingomyelin to ceramide and phosphorylcholine. Phosphorylcholine is used in the synthesis of phosphatidylcholine, which is involved in numerous intracellular signaling pathways.
  • Ceramide is an essential precursor for the generation of gangliosides, membrane lipids found in high concentration in neural tissue.
  • Defective acid sphingomyelinase phosphodiesterase leads to a build-up of sphingomyelin molecules in lysosomes, resulting in Niemann-Pick disease.
  • Glycosidases catalyze the cleavage of hemiacetal bonds of glycosides, which are compounds that contain one or more sugars.
  • Mammalian lactase-phlorizin hydrolase, for example, is an intestinal enzyme that splits lactose.
  • Mammalian beta-galactosidase removes the terminal galactose from gangliosides, glycoproteins, and glycosaminoglycans, and deficiency of this enzyme is associated with a gangliosidosis known as Morquio disease type B.
  • Vertebrate lysosomal alpha-glucosidase which hydrolyzes glycogen, maltose, and isomaltose
  • vertebrate intestinal sucrase-isomaltase which hydrolyzes sucrose, maltose, and isomaltose
  • The glyoxalase system is involved in gluconeogenesis, the production of glucose from storage compounds in the body. It consists of glyoxalase I, which catalyzes the formation of S-D-lactoylglutathione from methylglyoxal, a side product of triose-phosphate energy metabolism, and glyoxalase II, which hydrolyzes S-D-lactoylglutathione to D-lactic acid and reduced glutathione. Glyoxalases are involved in hyperglycemia, non-insulin-dependent diabetes mellitus, the detoxification of bacterial toxins, and in the control of cell proliferation and microtubule assembly.
Lyases
  • Lyases are a class of enzymes that catalyze the cleavage of C-C, C-O, C-N, C-S, C-(halide),
  • Lyases are critical components of cellular biochemistry with roles in metabolic energy production including fatty acid metabolism, as well as other diverse enzymatic processes. Further classification of lyases reflects the type of bond cleaved as well as the nature of the cleaved group.
  • The group of C-C lyases includes carboxyl-lyases (decarboxylases), aldehyde-lyases (aldolases), oxo-acid-lyases, and others.
  • The C-O lyase group includes hydro-lyases, lyases acting on polysaccharides, and other lyases.
  • The C-N lyase group includes ammonia-lyases, amidine-lyases, amine-lyases (deaminases), and other lyases.
  • Proper regulation of lyases is critical to normal physiology. For example, mutation-induced deficiencies in uroporphyrinogen decarboxylase can lead to photosensitive cutaneous lesions in the genetically-linked disorder familial porphyria cutanea tarda (Mendez, M. et al. (1998) Am. J. Hum. Genet. 63:1363-1375).
  • Adenosine deaminase (ADA) deficiency stems from genetic mutations in the ADA gene, resulting in the disorder severe combined immunodeficiency disease (SCID) (Hershfield, M.S. (1998) Semin. Hematol. 35:291-298).
Isomerases
  • Isomerases are a class of enzymes that catalyze geometric or structural changes within a molecule to form a single product. This class includes racemases and epimerases, cis-trans-isomerases, intramolecular oxidoreductases, intramolecular transferases (mutases), and intramolecular lyases. Isomerases are critical components of cellular biochemistry with roles in metabolic energy production including glycolysis, as well as other diverse enzymatic processes (Stryer, L. (1995) Biochemistry, W.H. Freeman and Co., New York NY, pp. 483-507).
  • Racemases are a subset of isomerases that catalyze inversion of a molecule's configuration around the asymmetric carbon atom in a substrate having a single center of asymmetry, thereby interconverting two racemers.
  • Epimerases are another subset of isomerases that catalyze inversion of configuration around an asymmetric carbon atom in a substrate with more than one center of asymmetry, thereby interconverting two epimers.
  • Racemases and epimerases can act on amino acids and derivatives, hydroxy acids and derivatives, as well as carbohydrates and derivatives.
  • The interconversion of UDP-galactose and UDP-glucose is catalyzed by UDP-galactose-4'-epimerase.
  • Oxidoreductases can be isomerases as well. Oxidoreductases catalyze the reversible transfer of electrons from a substrate that becomes oxidized to a substrate that becomes reduced. This class of enzymes includes dehydrogenases, hydroxylases, oxidases, oxygenases, peroxidases, and reductases.
  • Oxidoreductase levels are physiologically important.
  • Genetically-linked deficiencies in lipoamide dehydrogenase can result in lactic acidosis (Robinson, B.H. et al. (1977) Pediat. Res. 11:1198-1202).
  • Transferases transfer a chemical group from one compound (the donor) to another compound (the acceptor).
  • The types of groups transferred by these enzymes include acyl groups, amino groups, phosphate groups
  • Topoisomerases are enzymes that affect the topological state of DNA. For example, defects in topoisomerases or their regulation can affect normal physiology.
  • Ligases catalyze the formation of a bond between two substrate molecules. The process involves the hydrolysis of a pyrophosphate bond in ATP or a similar energy donor. Ligases are classified based on the nature of the type of bond they form, which can include carbon-oxygen, carbon-sulfur, carbon-nitrogen, carbon-carbon and phosphoric ester bonds. Ligases forming carbon-oxygen bonds include the aminoacyl-transfer RNA (tRNA) synthetases which are important RNA-associated enzymes with roles in translation. Protein biosynthesis depends on each amino acid forming a linkage with the appropriate tRNA. The aminoacyl-tRNA synthetases are responsible for the activation and correct attachment of an amino acid with its cognate tRNA.
  • tRNA aminoacyl-transfer RNA
  • The 20 aminoacyl-tRNA synthetase enzymes can be divided into two structural classes, and each class is characterized by a distinctive topology of the catalytic domain.
  • Class I enzymes contain a catalytic domain based on the nucleotide-binding Rossmann fold.
  • Class II enzymes contain a central catalytic domain, which consists of a seven-stranded antiparallel β-sheet motif, as well as N- and C-terminal regulatory domains.
  • Class II enzymes are separated into two groups based on the heterodimeric or homodimeric structure of the enzyme; the latter group is further subdivided by the structure of the N- and C-terminal regulatory domains (Hartlein, M. and S. Cusack (1995) J. Mol. Evol.
  • Ligases forming carbon-sulfur bonds mediate a large number of cellular biosynthetic intermediary metabolism processes that involve intermolecular transfer of carbon atom-containing substrates (carbon substrates). Examples of such reactions include the tricarboxylic acid cycle, synthesis of fatty acids and long-chain phospholipids, synthesis of alcohols and aldehydes, synthesis of intermediary metabolites, and reactions involved in the amino acid degradation pathways. Some of these reactions require input of energy, usually in the form of conversion of ATP to either ADP or AMP and pyrophosphate.
  • A carbon substrate is derived from a small molecule containing at least two carbon atoms.
  • The carbon substrate is often covalently bound to a larger molecule which acts as a carbon substrate carrier molecule within the cell.
  • The carrier molecule is coenzyme A.
  • Coenzyme A is structurally related to derivatives of the nucleotide ADP and consists of 4'-phosphopantetheine linked via a phosphodiester bond to the alpha phosphate group of adenosine 3',5'-bisphosphate.
  • The terminal thiol group of 4'-phosphopantetheine acts as the site for carbon substrate bond formation.
  • The predominant carbon substrates which utilize CoA as a carrier molecule during biosynthesis and intermediary metabolism in the cell are acetyl, succinyl, and propionyl moieties, collectively referred to as acyl groups.
  • Other carbon substrates include enoyl lipid, which acts as a fatty acid oxidation intermediate, and carnitine, which acts as an acetyl-CoA flux regulator/mitochondrial acyl group transfer protein.
  • Acyl-CoA and acetyl-CoA are synthesized in the cell by acyl-CoA synthetase and acetyl-CoA synthetase, respectively.
  • Acyl-CoA synthetase activity is of three types: i) acetyl-CoA synthetase, which activates acetate and several other low molecular weight carboxylic acids and is found in muscle mitochondria and the cytosol of other tissues; ii) medium-chain acyl-CoA synthetase, which activates fatty acids containing between four and eleven carbon atoms (predominantly from dietary sources), and is present only in liver mitochondria; and iii) acyl CoA synthetase, which is specific for long chain fatty acids with between six and twenty carbon atoms, and is found in microsomes and the mitochondria.
  • Acyl-CoA synthetase activity has been identified from many sources including bacteria, yeast, plants, mouse, and man.
  • The activity of acyl-CoA synthetase may be modulated by phosphorylation of the enzyme by cAMP-dependent protein kinase.
  • Ligases forming carbon-nitrogen bonds include amide synthases such as glutamine synthetase (glutamate-ammonia ligase) that catalyzes the amination of glutamic acid to glutamine by ammonia using the energy of ATP hydrolysis.
  • Glutamine is the primary source for the amino group in various amide transfer reactions involved in de novo pyrimidine nucleotide synthesis and in purine and pyrimidine ribonucleotide interconversions.
  • Overexpression of glutamine synthetase has been observed in primary liver cancer (Christa, L. et al. (1994) Gastroent. 106:1312-1320).
  • Acid-amino-acid ligases are represented by the ubiquitin proteases which are associated with the ubiquitin conjugation system (UCS), a major pathway for the degradation of cellular proteins in eukaryotic cells and some bacteria.
  • UCS ubiquitin conjugation system
  • The UCS mediates the elimination of abnormal proteins and regulates the half-lives of important regulatory proteins that control cellular processes such as gene transcription and cell cycle progression.
  • Proteins targeted for degradation are conjugated to ubiquitin (Ub), a small heat stable protein.
  • Ub is first activated by a ubiquitin-activating enzyme (E1), and then transferred to one of several Ub-conjugating enzymes (E2).
  • E2 then links the Ub molecule through its C-terminal glycine to an internal lysine (acceptor lysine) of a target protein.
  • The ubiquitinated protein is then recognized and degraded by the proteasome, a large, multisubunit proteolytic enzyme complex, and ubiquitin is released for reutilization by ubiquitin protease.
  • The UCS is implicated in the degradation of mitotic cyclic kinases, oncoproteins, tumor suppressor genes such as p53, viral proteins, cell surface receptors associated with signal transduction, transcriptional regulators, and mutated or damaged proteins.
  • Cyclo-ligases and other carbon-nitrogen ligases comprise various enzymes and enzyme complexes that participate in the de novo pathways to purine and pyrimidine biosynthesis.
  • This enzyme is also similar to another carbon-nitrogen ligase, argininosuccinate synthetase, that catalyzes a similar reaction in the urea cycle (Powell, S.M. et al. (1992) FEBS Lett. 303:4-10).
  • De novo synthesis of the pyrimidine nucleotides uridylate and cytidylate also arises from a common precursor, in this instance the nucleotide orotidylate derived from orotate and phosphoribosyl pyrophosphate (PPRP).
  • PPRP phosphoribosyl pyrophosphate
  • ATCase aspartate transcarbamylase
  • DHOase dihydroorotase
  • Ligases forming carbon-carbon bonds include the carboxylases acetyl-CoA carboxylase and pyruvate carboxylase.
  • Acetyl-CoA carboxylase catalyzes the carboxylation of acetyl-CoA from CO2 and H2O using the energy of ATP hydrolysis.
  • Acetyl-CoA carboxylase catalyzes the rate-limiting step in the biogenesis of long-chain fatty acids.
  • Two isoforms of acetyl-CoA carboxylase, types I and II, are expressed in humans in a tissue-specific manner (Ha, J. et al. (1994) Eur. J. Biochem. 219:297-306).
  • Pyruvate carboxylase is a nuclear-encoded mitochondrial enzyme that catalyzes the conversion of pyruvate to oxaloacetate, a key intermediate in the citric acid cycle.
  • Ligases forming phosphoric ester bonds include the DNA ligases involved in both DNA replication and repair. DNA ligases seal phosphodiester bonds between two adjacent nucleotides in a DNA chain using the energy from ATP hydrolysis to first activate the free 5'-phosphate of one nucleotide and then react it with the 3'-OH group of the adjacent nucleotide.
  • This resealing reaction is used both in DNA replication, to join small DNA fragments called Okazaki fragments that are transiently formed in the process of replicating new DNA, and in DNA repair.
  • DNA repair is the process by which accidental base changes, such as those produced by oxidative damage, hydrolytic attack, or uncontrolled methylation of DNA, are corrected before replication or transcription of the DNA can occur.
  • Bloom's syndrome is an inherited human disease in which individuals are partially deficient in DNA ligation and consequently have an increased incidence of cancer (Alberts, B. et al. (1994) The Molecular Biology of the Cell. Garland Publishing Inc., New York NY, p. 247).
  • Cell division is the fundamental process by which all living things grow and reproduce. In unicellular organisms such as yeast and bacteria, each cell division doubles the number of organisms, while in multicellular species many rounds of cell division are required to replace cells lost by wear or by programmed cell death, and for cell differentiation to produce a new tissue or organ. Details of the cell division cycle may vary, but the basic process consists of three principal events. The first event, interphase, involves preparations for cell division: replication of the DNA and production of essential proteins. In the second event, mitosis, the nuclear material is divided and separates to opposite sides of the cell. The final event, cytokinesis, is division and fission of the cell cytoplasm.
  • The sequence and timing of cell cycle transitions are under the control of the cell cycle regulation system, which controls the process by positive or negative regulatory circuits at various check points. Regulated progression of the cell cycle depends on the integration of growth control pathways with the basic cell cycle machinery.
  • Cell cycle regulators have been identified by selecting for human and yeast cDNAs that block or activate cell cycle arrest signals in the yeast mating pheromone pathway when they are overexpressed.
  • Known regulators include human CPR (cell cycle progression restoration) genes, such as CPR8 and CPR2, and yeast CDC (cell division control) genes, including CDC91, that block the arrest signals.
  • The CPR genes express a variety of proteins including cyclins, tumor suppressor binding proteins, chaperones, transcription factors, translation factors, and RNA-binding proteins (Edwards, M.C. et al. (1997) Genetics 147:1063-1076).
  • cyclin-dependent kinases Cdks
  • The Cdks are composed of a kinase subunit, Cdk, and an activating subunit, cyclin, in a complex that is subject to many levels of regulation.
  • There appears to be a single Cdk in Saccharomyces cerevisiae and Schizosaccharomyces pombe, whereas mammals have a variety of specialized Cdks. Cyclins act by binding to and activating cyclin-dependent protein kinases, which then phosphorylate and activate selected proteins involved in the mitotic process.
  • The Cdk-cyclin complex is both positively and negatively regulated by phosphorylation, and by targeted degradation involving molecules such as CDC4 and CDC53.
  • Cdks are further regulated by binding to inhibitors and other proteins such as Suc1 that modify their specificity or accessibility to regulators (Patra, D. and W.G. Dunphy (1996) Genes Dev. 10:1503-1515; and Mathias, N. et al. (1996) Mol. Cell Biol. 16:6634-6643).
Reproduction
  • The male and female reproductive systems are complex and involve many aspects of growth and development.
  • The anatomy and physiology of the male and female reproductive systems are reviewed in Guyton, A.C. (1991) Textbook of Medical Physiology, W.B. Saunders Co., Philadelphia PA, pp. 899-928.
  • The male reproductive system includes the process of spermatogenesis, in which the sperm are formed, and male reproductive functions are regulated by various hormones and their effects on accessory sexual organs, cellular metabolism, growth, and other bodily functions.
  • Spermatogenesis begins at puberty as a result of stimulation by gonadotropic hormones released from the anterior pituitary. Immature sperm (spermatogonia) undergo several mitotic cell divisions before undergoing meiosis and full maturation. The testes secrete several male sex hormones, the most abundant being testosterone, which is essential for growth and division of the immature sperm, and for the masculine characteristics of the male body. Three other male sex hormones, gonadotropin-releasing hormone (GnRH), luteinizing hormone (LH), and follicle-stimulating hormone (FSH) control sexual function.
  • GnRH gonadotropin-releasing hormone
  • LH luteinizing hormone
  • FSH follicle-stimulating hormone
  • The uterus, ovaries, fallopian tubes, vagina, and breasts comprise the female reproductive system.
  • The ovaries and uterus are the source of ova and the location of fetal development, respectively.
  • The fallopian tubes and vagina are accessory organs attached to the top and bottom of the uterus, respectively.
  • Both the uterus and ovaries have additional roles in the development and loss of reproductive capability during a female's lifetime.
  • The primary role of the breasts is lactation. Multiple endocrine signals from the ovaries, uterus, pituitary, hypothalamus, adrenal glands, and other tissues coordinate reproduction and lactation. These signals vary during the monthly menstruation cycle and during the female's lifetime. Similarly, the sensitivity of reproductive organs to these endocrine signals varies during the female's lifetime.
  • A combination of positive and negative feedback to the ovaries, pituitary and hypothalamus glands controls physiologic changes during the monthly ovulation and endometrial cycles.
  • The anterior pituitary secretes two major gonadotropin hormones, follicle-stimulating hormone (FSH) and luteinizing hormone (LH), regulated by negative feedback of steroids, most notably by ovarian estradiol. If fertilization does not occur, estrogen and progesterone levels decrease. This sudden reduction of the ovarian hormones leads to menstruation, the desquamation of the endometrium.
  • Hormones further govern all the steps of pregnancy, parturition, lactation, and menopause.
  • hCG human chorionic gonadotropin
  • estrogens progesterone
  • hCS human chorionic somatomammotropin
  • hCG is a glycoprotein similar to luteinizing hormone.
  • hCS is similar to growth hormone and is crucial for fetal nutrition.
  • The female breast also matures during pregnancy. Large amounts of estrogen secreted by the placenta trigger growth and branching of the breast milk ductal system, while lactation is initiated by the secretion of prolactin by the pituitary gland.
  • Parturition involves several hormonal changes that increase uterine contractility toward the end of pregnancy, as follows.
  • The levels of estrogens increase more than those of progesterone.
  • Oxytocin is secreted by the neurohypophysis. Concomitantly, uterine sensitivity to oxytocin increases.
  • The fetus itself secretes oxytocin, cortisol (from adrenal glands), and prostaglandins. Menopause occurs when most of the ovarian follicles have degenerated.
  • The ovary then produces less estradiol, reducing the negative feedback on the pituitary and hypothalamus glands.
  • Mean levels of circulating FSH and LH increase, even as ovulatory cycles continue. Therefore, the ovary is less responsive to gonadotropins, and there is an increase in the time between menstrual cycles. Consequently, menstrual bleeding ceases and reproductive capability ends.
  • Tissue growth involves complex and ordered patterns of cell proliferation, cell differentiation, and apoptosis.
  • Cell proliferation must be regulated to maintain both the number of cells and their spatial organization. This regulation depends upon the appropriate expression of proteins which control cell cycle progression in response to extracellular signals, such as growth factors and other mitogens, and intracellular cues, such as DNA damage or nutrient starvation. Molecules which directly or indirectly modulate cell cycle progression fall into several categories, including growth factors and their receptors, second messenger and signal transduction proteins, oncogene products, tumor-suppressor proteins, and mitosis-promoting factors. Growth factors were originally described as serum factors required to promote cell proliferation. Most growth factors are large, secreted polypeptides that act on cells in their local environment.
  • Growth factors bind to and activate specific cell surface receptors and initiate intracellular signal transduction cascades.
  • Many growth factor receptors are classified as receptor tyrosine kinases which undergo autophosphorylation upon ligand binding.
  • Autophosphorylation enables the receptor to interact with signal transduction proteins characterized by the presence of SH2 or SH3 domains (Src homology regions 2 or 3). These proteins then modulate the activity state of small G-proteins, such as Ras, Rab, and Rho, along with GTPase activating proteins (GAPs), guanine nucleotide releasing proteins (GNRPs), and other guanine nucleotide exchange factors.
  • GAPs GTPase activating proteins
  • GNRPs guanine nucleotide releasing proteins
  • Small G proteins act as molecular switches that activate other downstream events, such as mitogen-activated protein kinase (MAP kinase) cascades.
  • MAP kinases ultimately activate transcription
  • GPCR G-protein coupled receptor
  • TGF-β transforming growth factor beta
  • Some growth factors act on some cells to stimulate cell proliferation and on other cells to inhibit it. Growth factors may also stimulate a cell at one concentration and inhibit the same cell at another concentration. Most growth factors also have a multitude of other actions besides the regulation of cell growth and division: they can control the proliferation, survival, differentiation, migration, or function of cells depending on the circumstance.
  • The tumor necrosis factor/nerve growth factor (TNF/NGF) family can activate or inhibit cell death, as well as regulate proliferation and differentiation.
  • The cell response depends on the type of cell, its stage of differentiation and transformation status, which surface receptors are stimulated, and the types of stimuli acting on the cell (Smith, A. et al. (1994) Cell 76:959-962; and Nocentini, G. et al. (1997) Proc. Natl. Acad. Sci. USA 94:6216-6221).
  • ECM extracellular matrix
  • ECM molecules such as laminin or fibronectin
  • Tenascin-C and -R, expressed in developing and lesioned neural tissue, provide stimulatory/anti-adhesive or inhibitory properties, respectively, for axonal growth (Faissner, A. (1997) Cell Tissue Res. 290:331-341).
  • Cancers are associated with the activation of oncogenes which are derived from normal cellular genes. These oncogenes encode oncoproteins which convert normal cells into malignant cells. Some oncoproteins are mutant isoforms of the normal protein, and other oncoproteins are abnormally expressed with respect to location or amount of expression.
  • oncoprotein causes cancer by altering transcriptional control of cell proliferation.
  • Five classes of oncoproteins are known to affect cell cycle controls. These classes include growth factors, growth factor receptors, intracellular signal transducers, nuclear transcription factors, and cell-cycle control proteins.
  • Viral oncogenes are integrated into the human genome after infection of human cells by certain viruses. Examples of viral oncogenes include v-src, v-abl, and v-fps.
  • Oncogenes have been identified and characterized. These include sis, erbA, erbB, her-2, mutated Gs, src, abl, ras, crk, jun, fos, myc, and mutated tumor-suppressor genes such as RB, p53, mdm2, Cip1, p16, and cyclin D. Transformation of normal genes to oncogenes may also occur by chromosomal translocation.
  • The Philadelphia chromosome, characteristic of chronic myeloid leukemia and a subset of acute lymphoblastic leukemias, results from a reciprocal translocation between chromosomes 9 and 22 that moves a truncated portion of the proto-oncogene c-abl to the breakpoint cluster region (bcr) on chromosome 22.
  • Tumor-suppressor genes are involved in regulating cell proliferation. Mutations which cause reduced or loss of function in tumor-suppressor genes result in uncontrolled cell proliferation.
  • The retinoblastoma gene product (RB), in a non-phosphorylated state, binds several early-response genes and suppresses their transcription, thus blocking cell division. Phosphorylation of RB causes it to dissociate from the genes, releasing the suppression, and allowing cell division to proceed.
Apoptosis
  • Apoptosis is the genetically controlled process by which unneeded or defective cells undergo programmed cell death. Selective elimination of cells is as important for morphogenesis and tissue remodeling as is cell proliferation and differentiation. Lack of apoptosis may result in hyperplasia and other disorders associated with increased cell proliferation. Apoptosis is also a critical component of the immune response. Immune cells such as cytotoxic T-cells and natural killer cells prevent the spread of disease by inducing apoptosis in tumor cells and virus-infected cells. In addition, immune cells that fail to distinguish self molecules from foreign molecules must be eliminated by apoptosis to avoid an autoimmune response. Apoptotic cells undergo distinct morphological changes.
  • Hallmarks of apoptosis include cell shrinkage, nuclear and cytoplasmic condensation, and alterations in plasma membrane topology.
  • Biochemically, apoptotic cells are characterized by increased intracellular calcium concentration, fragmentation of chromosomal DNA, and expression of novel cell surface components.
  • Apoptosis generally proceeds in response to a signal which is transduced intracellularly and results in altered patterns of gene expression and protein activity.
  • Signaling molecules such as hormones and cytokines are known both to stimulate and to inhibit apoptosis through interactions with cell surface receptors. Transcription factors also play an important role in the onset of apoptosis.
  • A number of downstream effector molecules, particularly proteases such as the cysteine proteases called caspases, have been implicated in the degradation of cellular components and the proteolytic activation of other apoptotic effectors.
  • Biochemical pathways are responsible for regulating metabolism, growth and development, protein secretion and trafficking, environmental responses, and ecological interactions including immune response and response to parasites.
  • DNA Deoxyribonucleic acid
  • The bulk of human DNA is nuclear, in the form of linear chromosomes, while mitochondrial DNA is circular.
  • DNA replication begins at specific sites called origins of replication. Bidirectional synthesis occurs from the origin via two growing forks that move in opposite directions. Replication is semi-conservative, with each daughter duplex containing one old strand and its newly synthesized complementary partner. Proteins involved in DNA replication include DNA polymerases, DNA primase, telomerase, DNA helicase, topoisomerases, DNA ligases, replication factors, and DNA-binding proteins.
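The semi-conservative outcome described above can be illustrated with a minimal Python sketch. It treats each strand as a plain string and ignores origins, forks, and the replication proteins entirely; the function names `complement_strand` and `replicate` are hypothetical and chosen only for this illustration.

```python
# Illustrative sketch only: models semi-conservative replication on plain strings.
# Function names are hypothetical; no replication enzymes or forks are modeled.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand):
    """Return the base-paired partner of a strand, written in its own 5'-to-3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def replicate(duplex):
    """Separate a duplex (top, bottom) and pair each old strand with a newly built partner."""
    top, bottom = duplex
    daughter_1 = (top, complement_strand(top))        # old top strand + new partner
    daughter_2 = (complement_strand(bottom), bottom)  # new partner + old bottom strand
    return daughter_1, daughter_2


parent = ("ATGCGT", complement_strand("ATGCGT"))
d1, d2 = replicate(parent)
print(d1, d2)
```

Each daughter duplex keeps exactly one strand from the parent, which is what the term semi-conservative refers to.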
DNA Recombination and Repair
  • Cells are constantly faced with replication errors and environmental assault (such as ultraviolet irradiation) that can produce DNA damage.
  • Damage to DNA consists of any change that modifies the structure of the molecule. Changes to DNA can be divided into two general classes, single base changes and structural distortions. Any damage to DNA can produce a mutation, and the mutation may produce a disorder, such as cancer. Changes in DNA are recognized by repair systems within the cell. These repair systems act to correct the damage and thus prevent any deleterious effects of a mutational event. Repair systems can be divided into three general types, direct repair, excision repair, and retrieval systems.
  • Proteins involved in DNA repair include DNA polymerase, excision repair proteins, excision and cross link repair proteins, recombination and repair proteins, RAD51 proteins, and BLM and WRN proteins that are homologs of RecQ helicase.
  • Cells become exceedingly sensitive to environmental mutagens, such as ultraviolet irradiation.
  • Patients with disorders associated with a loss in DNA repair systems often exhibit a high sensitivity to environmental mutagens. Examples of such disorders include xeroderma pigmentosum (XP), Bloom's syndrome (BS), and Werner's syndrome (WS) (Yamagata, K. et al. (1998) Proc. Natl. Acad. Sci. USA 95:8733-8738), ataxia telangiectasia, Cockayne's syndrome, and Fanconi's anemia.
  • XP xeroderma pigmentosum
  • BS Bloom's syndrome
  • WS Werner's syndrome
  • Recombination is the process whereby new DNA sequences are generated by the movements of large pieces of DNA.
  • In homologous recombination, which occurs during meiosis and DNA repair, parent DNA duplexes align at regions of sequence similarity, and new DNA molecules form by the breakage and joining of homologous segments.
  • Proteins involved include RAD51 recombinase.
  • In site-specific recombination, two specific but not necessarily homologous DNA sequences are exchanged.
  • This process generates a diverse collection of antibody and T cell receptor genes. Proteins involved in site-specific recombination in the immune system include recombination activating genes 1 and 2 (RAG1 and RAG2).
RNA Metabolism
  • Ribonucleic acid is a linear single-stranded polymer of four nucleotides, ATP, CTP,
  • RNA is transcribed as a copy of DNA, the genetic material of the organism. In retroviruses, RNA rather than DNA serves as the genetic material. RNA copies of the genetic material encode proteins or serve various structural, catalytic, or regulatory roles in organisms. RNA is classified according to its cellular localization and function. Messenger RNAs (mRNAs) encode polypeptides. Ribosomal RNAs (rRNAs) are assembled, along with ribosomal proteins, into ribosomes, which are cytoplasmic particles that translate mRNA into polypeptides.
  • mRNAs Messenger RNAs
  • rRNAs Ribosomal RNAs
  • Transfer RNAs are cytosolic adaptor molecules that function in mRNA translation by recognizing both an mRNA codon and the amino acid that matches that codon.
  • Heterogeneous nuclear RNAs include mRNA precursors and other nuclear RNAs of various sizes.
  • Small nuclear RNAs (snRNAs) are a part of the nuclear spliceosome complex that removes intervening, non-coding sequences (introns) and rejoins exons in pre-mRNAs.
  • Transcription synthesizes an RNA copy of DNA.
  • Proteins involved include multi-subunit RNA polymerases and transcription factors IIA, IIB, IID, IIE, IIF, IIH, and IIJ.
  • Many transcription factors incorporate DNA-binding structural motifs which comprise either α-helices or β-sheets that bind to the major groove of DNA.
  • Four well-characterized structural motifs are helix-turn-helix, zinc finger, leucine zipper, and helix-loop-helix.
  • RNAs are necessary for processing of transcribed RNAs in the nucleus.
  • Pre-mRNA processing steps include capping at the 5' end with methylguanosine, polyadenylating the 3' end, and splicing to remove introns.
  • The spliceosomal complex comprises five small nuclear ribonucleoprotein particles (snRNPs) designated U1, U2, U4, U5, and U6.
  • Each snRNP contains a single species of snRNA and about ten proteins.
  • The RNA components of some snRNPs recognize and base-pair with intron consensus sequences.
  • The protein components mediate spliceosome assembly and the splicing reaction.
  • Autoantibodies to snRNP proteins are found in the blood of patients with systemic lupus erythematosus (Stryer, L. (1995) Biochemistry W.H. Freeman and Company, New York NY, p. 863).
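By way of illustration only, the three pre-mRNA processing steps named above (5' capping, 3' polyadenylation, and splicing to remove introns) can be sketched as a toy string transformation. The function name, the intron coordinates, the `m7G-` cap marker, and the tail length are all hypothetical choices made for this sketch; real spliceosomes locate introns through snRNA base-pairing with consensus sequences rather than through supplied coordinates.

```python
def process_pre_mrna(pre_mrna, introns, poly_a_length=20):
    """Toy model of pre-mRNA processing.

    pre_mrna: RNA sequence as a string of A, C, G, U.
    introns: list of (start, end) pairs, 0-based and half-open
             (hypothetical coordinates supplied by the caller).
    Returns the spliced sequence with a textual cap marker and a poly(A) tail.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        if start < position or end > len(pre_mrna) or start >= end:
            raise ValueError("introns must be non-overlapping and lie within the transcript")
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # keep the final exon
    spliced = "".join(exons)
    # Mark the 5' cap and append the 3' poly(A) tail.
    return "m7G-" + spliced + "A" * poly_a_length


# Example: a single intron (GUAAGUCAG) between two short exons.
mature = process_pre_mrna("AUGGUAAGUCAGGCC", [(3, 12)], poly_a_length=5)
```

In this example the exons `AUG` and `GCC` are joined, so `mature` is `m7G-AUGGCCAAAAA`.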
  • Some examples of heterogeneous nuclear ribonucleoproteins (hnRNPs) include the yeast proteins Hrp1p, involved in cleavage and polyadenylation at the 3' end of the RNA; Cbp80p, involved in capping the 5' end of the RNA; and Npl3p, a homolog of mammalian hnRNP A1, involved in export of mRNA from the nucleus (Shen, E.C.
  • HnRNPs have been shown to be important targets of the autoimmune response in rheumatic diseases (Biamonti, supra). Many snRNP proteins, hnRNP proteins, and alternative splicing factors are characterized by an RNA recognition motif (RRM). (Reviewed in Birney, E. et al. (1993) Nucleic Acids Res.
  • The RRM is about 80 amino acids in length and forms four β-strands and two α-helices arranged in an α/β sandwich.
  • The RRM contains a core RNP-1 octapeptide motif along with surrounding conserved sequences.
RNA Stability and Degradation
  • RNA helicases alter and regulate RNA conformation and secondary structure by using energy derived from ATP hydrolysis to destabilize and unwind RNA duplexes.
  • The best-characterized and most ubiquitous family of RNA helicases is the DEAD-box family, so named for the conserved B-type ATP-binding motif which is diagnostic of proteins in this family.
  • Over 40 DEAD-box helicases have been identified in organisms as diverse as bacteria, insects, yeast, amphibians, mammals, and plants. DEAD-box helicases function in diverse processes such as translation initiation, splicing, ribosome assembly, and RNA editing, transport, and stability. Some DEAD-box helicases play tissue- and stage-specific roles in spermatogenesis and embryogenesis. (Reviewed in Linder, P. et al. (1989) Nature 337:121-122.)
  • DEAD-box 1 protein may play a role in the progression of neuroblastoma (Nb) and retinoblastoma (Rb) tumors.
  • DEAD-box helicases have been implicated either directly or indirectly in ultraviolet light-induced tumors, B cell lymphoma, and myeloid malignancies. (Reviewed in Godbout, R. et al. (1998) J. Biol. Chem. 273:21161-21168.)
  • Ribonucleases (RNases) catalyze the hydrolysis of phosphodiester bonds in RNA chains, thus cleaving the RNA.
  • RNase P is a ribonucleoprotein enzyme which cleaves the 5' end of pre-tRNAs as part of their maturation process.
  • RNase H digests the RNA strand of an RNA/DNA hybrid. Such hybrids occur in cells invaded by retroviruses, and RNase H is an important enzyme in the retroviral replication cycle.
  • RNase H domains are often found associated with reverse transcriptases.
  • RNase activity in serum and cell extracts is elevated in a variety of cancers and infectious diseases (Schein, C.H. (1997) Nat. Biotechnol. 15:529-536). Regulation of RNase activity is being investigated as a means to control tumor angiogenesis, allergic reactions, viral infection and replication, and fungal infections.
Protein Translation
  • The eukaryotic ribosome is composed of a 60S (large) subunit and a 40S (small) subunit, which together form the 80S ribosome.
  • The ribosome also contains more than fifty proteins.
  • The ribosomal proteins have a prefix which denotes the subunit to which they belong, either L (large) or S (small).
  • Three important sites are identified on the ribosome.
  • The aminoacyl-tRNA site (A site) is where charged tRNAs (with the exception of the initiator-tRNA) bind on arrival at the ribosome.
  • The peptidyl-tRNA site (P site) is where new peptide bonds are formed, as well as where the initiator tRNA binds.
  • The exit site (E site) is where deacylated tRNAs bind prior to their release from the ribosome. (Translation is reviewed in Stryer, L. (1995) Biochemistry, W.H. Freeman and Company, New York NY, pp. 875-908; and Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY, pp. 119-138.)
tRNA Charging
  • Protein biosynthesis depends on each amino acid forming a linkage with the appropriate tRNA.
  • The aminoacyl-tRNA synthetases are responsible for the activation and correct attachment of an amino acid to its cognate tRNA.
  • The 20 aminoacyl-tRNA synthetase enzymes can be divided into two structural classes, Class I and Class II. Autoantibodies against aminoacyl-tRNA synthetases are generated by patients with dermatomyositis and polymyositis, and correlate strongly with complicating interstitial lung disease (ILD). These antibodies appear to be generated in response to viral infection, and coxsackie virus has been used to induce experimental viral myositis in animals.
Translation Initiation
  • Initiation of translation can be divided into three stages.
  • The first stage brings an initiator transfer RNA (Met-tRNAf) together with the 40S ribosomal subunit to form the 43S preinitiation complex.
  • The second stage binds the 43S preinitiation complex to the mRNA, followed by migration of the complex to the correct AUG initiation codon.
  • The third stage brings the 60S ribosomal subunit to the 40S subunit to generate an 80S ribosome at the initiation codon.
  • Regulation of translation primarily involves the first and second stage in the initiation process (Pain, V.M. (1996) Eur. J. Biochem. 236:747-771).
  • eIF2 is a guanine nucleotide binding protein.
  • eIF2B is a guanine nucleotide exchange protein.
  • eIF1A and eIF3 bind and stabilize the 40S subunit by interacting with 18S ribosomal RNA and specific ribosomal structural proteins.
  • eIF3 is also involved in association of the 40S ribosomal subunit with mRNA.
  • The Met-tRNAf, eIF1A, eIF3, and 40S ribosomal subunit together make up the 43S preinitiation complex (Pain, supra). Additional factors are required for binding of the 43S preinitiation complex to an mRNA molecule, and the process is regulated at several levels.
  • eIF4F is a complex consisting of three proteins: eIF4E, eIF4A, and eIF4G.
  • eIF4E recognizes and binds to the mRNA 5'-terminal m7GTP cap.
  • eIF4A is a bidirectional RNA-dependent helicase.
  • eIF4G is a scaffolding polypeptide.
  • eIF4G has three binding domains.
  • The N-terminal third of eIF4G interacts with eIF4E, the central third interacts with eIF4A, and the C-terminal third interacts with eIF3 bound to the 43S preinitiation complex.
  • eIF4G acts as a bridge between the 40S ribosomal subunit and the mRNA (Hentze, M.W. (1997) Science 275:500-501).
  • The ability of eIF4F to initiate binding of the 43S preinitiation complex is regulated by structural features of the mRNA.
  • The mRNA molecule has an untranslated region (UTR) between the 5' cap and the AUG start codon. In some mRNAs this region forms secondary structures that impede binding of the 43S preinitiation complex.
  • The helicase activity of eIF4A is thought to function in removing this secondary structure to facilitate binding of the 43S preinitiation complex (Pain, supra).
  • Elongation is the process whereby additional amino acids are joined to the initiator methionine to form the complete polypeptide chain.
  • The elongation factors EF1α, EF1βγ, and EF2 are involved in elongating the polypeptide chain following initiation.
  • EF1α is a GTP-binding protein. In its GTP-bound form, EF1α brings an aminoacyl-tRNA to the ribosome's A site. The amino acid attached to the newly arrived aminoacyl-tRNA forms a peptide bond with the initiator methionine.
  • The GTP on EF1α is hydrolyzed to GDP, and EF1α-GDP dissociates from the ribosome.
  • EF1βγ binds EF1α-GDP and induces the dissociation of GDP from EF1α, allowing EF1α to bind GTP and a new cycle to begin.
  • EF-G, another GTP-binding protein, catalyzes the translocation of tRNAs from the A site to the P site and finally to the E site of the ribosome. This allows the processivity of translation.
  • The release factor eRF carries out termination of translation. eRF recognizes stop codons in the mRNA, leading to the release of the polypeptide chain from the ribosome.
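As a rough illustration of initiation at an AUG codon, codon-by-codon elongation, and termination at a stop codon, the following sketch reads an mRNA string in triplets. It is a simplification under stated assumptions: the function name is hypothetical, the codon table is deliberately partial (only a few entries of the standard genetic code), and scanning to the first AUG stands in for the factor-dependent selection of the correct initiation codon described above.

```python
# Partial codon table (a few entries of the standard genetic code).
CODON_TABLE = {
    "AUG": "Met",
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "AAA": "Lys",
}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon.

    Returns a list of three-letter amino acid names. Codons missing from
    the partial table raise KeyError, which is a limitation of this sketch.
    """
    start = mrna.find("AUG")          # simplified stand-in for start-codon selection
    if start == -1:
        return []                     # no initiation codon, no polypeptide
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:      # termination: the chain is released
            break
        peptide.append(CODON_TABLE[codon])
    return peptide


peptide = translate("GGAUGUUUGGCAAAUAAGC")
```

Here the reading frame starts at the AUG at position 2 and stops at UAA, so `peptide` is `["Met", "Phe", "Gly", "Lys"]`.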
  • Proteins may be modified after translation by the addition of phosphate, sugar, prenyl, fatty acid, and other chemical groups. These modifications are often required for proper protein activity. Enzymes involved in post-translational modification include kinases, phosphatases, glycosyltransferases, and prenyltransferases. The conformation of proteins may also be modified after translation by the introduction and rearrangement of disulfide bonds (rearrangement catalyzed by protein disulfide isomerase), the isomerization of proline sidechains by prolyl isomerase, and by interactions with molecular chaperone proteins.
  • Proteins may also be cleaved by proteases. Such cleavage may result in activation, inactivation, or complete degradation of the protein.
  • Proteases include serine proteases, cysteine proteases, aspartic proteases, and metalloproteases.
  • Signal peptidase in the endoplasmic reticulum (ER) lumen cleaves the signal peptide from membrane or secretory proteins that are imported into the ER.
  • Ubiquitin proteases are associated with the ubiquitin conjugation system (UCS), a major pathway for the degradation of cellular proteins in eukaryotic cells and some bacteria.
  • The UCS mediates the elimination of abnormal proteins and regulates the half-lives of important regulatory proteins that control cellular processes such as gene transcription and cell cycle progression.
  • Proteins targeted for degradation are conjugated to ubiquitin, a small heat stable protein.
  • Proteins involved in the UCS include ubiquitin-activating enzyme, ubiquitin-conjugating enzymes, ubiquitin-ligases, and ubiquitin C-terminal hydrolases.
  • The ubiquitinated protein is then recognized and degraded by the proteasome, a large, multisubunit proteolytic enzyme complex, and ubiquitin is released for reutilization by ubiquitin protease.
  • Lipids are water-insoluble, oily or greasy substances that are soluble in nonpolar solvents such as chloroform or ether.
  • Neutral fats are triacylglycerols.
  • Polar lipids, such as phospholipids, sphingolipids, glycolipids, and cholesterol, are key structural components of cell membranes.
  • Lipid metabolism is involved in human diseases and disorders. In the arterial disease atherosclerosis, fatty lesions form on the inside of the arterial wall. These lesions promote the loss of arterial flexibility and the formation of blood clots (Guyton, A.C. Textbook of Medical Physiology (1991) W.B. Saunders Company, Philadelphia PA, pp. 760-763).
  • In Tay-Sachs disease, the GM2 ganglioside (a sphingolipid) accumulates in lysosomes of the central nervous system due to a lack of the enzyme N-acetylhexosaminidase.
  • Patients suffer nervous system degeneration leading to early death (Fauci, A.S. et al. (1998) Harrison's Principles of Internal Medicine McGraw-Hill, New York NY, p. 2171).
  • The Niemann-Pick diseases are caused by defects in lipid metabolism.
  • Niemann-Pick diseases types A and B are caused by accumulation of sphingomyelin (a sphingolipid) and other lipids in the central nervous system due to a defect in the enzyme sphingomyelinase, leading to neurodegeneration and lung disease.
  • Niemann-Pick disease type C results from a defect in cholesterol transport, leading to the accumulation of sphingomyelin and cholesterol in lysosomes and a secondary reduction in sphingomyelinase activity.
  • Neurological symptoms, such as grand mal seizures, ataxia, and loss of previously learned speech, manifest 1-2 years after birth.
  • Fatty acids are long-chain organic acids with a single carboxyl group and a long non-polar hydrocarbon tail.
  • Long-chain fatty acids are essential components of glycolipids, phospholipids, and cholesterol, which are building blocks for biological membranes, and of triglycerides, which are biological fuel molecules.
  • Long-chain fatty acids are also substrates for eicosanoid production, and are important in the functional modification of certain complex carbohydrates and proteins. 16-carbon and 18-carbon fatty acids are the most common.
  • Fatty acid synthesis occurs in the cytoplasm. In the first step, acetyl-Coenzyme A (CoA) carboxylase (ACC) synthesizes malonyl-CoA from acetyl-CoA and bicarbonate.
  • Fatty acid synthase (FAS) catalyzes the synthesis of palmitate from acetyl-CoA and malonyl-CoA.
  • FAS contains acetyl transferase, malonyl transferase, β-ketoacyl synthase, acyl carrier protein, β-ketoacyl reductase, dehydratase, enoyl reductase, and thioesterase activities.
  • The final product of the FAS reaction is the 16-carbon fatty acid palmitate.
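The chain-building arithmetic behind palmitate synthesis can be sketched as follows. The stoichiometry used here is the standard textbook one and is not stated in this description: one acetyl-CoA primes the chain with two carbons, and each subsequent cycle adds two carbons from one malonyl-CoA, consuming two NADPH and releasing one CO2. The function name and the returned keys are hypothetical.

```python
def fas_requirements(chain_length=16):
    """Substrate counts for building a saturated, even-chain fatty acid.

    Assumes the textbook scheme: one acetyl-CoA primer (2 carbons), then one
    malonyl-CoA, two NADPH, and one released CO2 per two-carbon extension.
    """
    if chain_length < 4 or chain_length % 2 != 0:
        raise ValueError("this sketch handles only even chain lengths of 4 or more")
    cycles = (chain_length - 2) // 2      # extensions after the acetyl primer
    return {
        "acetyl_coa": 1,
        "malonyl_coa": cycles,
        "nadph": 2 * cycles,
        "co2_released": cycles,
    }


palmitate = fas_requirements(16)
```

For the 16-carbon product this gives seven extension cycles, that is, one acetyl-CoA, seven malonyl-CoA, and fourteen NADPH.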
  • Triacylglycerols, also known as triglycerides and neutral fats, are major energy stores in animals. Triacylglycerols are esters of glycerol with three fatty acid chains. Glycerol-3-phosphate is produced from dihydroxyacetone phosphate by the enzyme glycerol phosphate dehydrogenase or from glycerol by glycerol kinase. Fatty acyl-CoAs are produced from fatty acids by fatty acyl-CoA synthetases. Glycerol-3-phosphate is acylated with two fatty acyl-CoAs by the enzyme glycerol phosphate acyltransferase to give phosphatidate.
  • Phosphatidate phosphatase converts phosphatidate to diacylglycerol, which is subsequently acylated to a triacylglycerol by the enzyme diglyceride acyltransferase.
  • Phosphatidate phosphatase and diglyceride acyltransferase form a triacylglycerol synthetase complex bound to the ER membrane.
  • A major class of phospholipids is the phosphoglycerides, which are composed of a glycerol backbone, two fatty acid chains, and a phosphorylated alcohol.
  • Phosphoglycerides are components of cell membranes.
  • Principal phosphoglycerides are phosphatidyl choline, phosphatidyl ethanolamine, phosphatidyl serine, phosphatidyl inositol, and diphosphatidyl glycerol.
  • Many enzymes involved in phosphoglyceride synthesis are associated with membranes (Meyers, R.A. (1995) Molecular Biology and Biotechnology, VCH Publishers Inc., New York NY, pp. 494-501).
  • Phosphatidate is converted to CDP-diacylglycerol by the enzyme phosphatidate cytidylyltransferase (ExPASy ENZYME EC 2.7.7.41).
  • The enzyme phosphatidyl serine decarboxylase catalyzes the conversion of phosphatidyl serine to phosphatidyl ethanolamine, using a pyruvate cofactor (Voelker, D.R. (1997) Biochim. Biophys. Acta 1348:236-244).
  • Phosphatidyl choline is formed using diet-derived choline by the reaction of CDP-choline with 1,2-diacylglycerol, catalyzed by diacylglycerol cholinephosphotransferase (ExPASy ENZYME EC 2.7.8.2).
  • Cholesterol, composed of four fused hydrocarbon rings with an alcohol at one end, moderates the fluidity of membranes in which it is incorporated.
  • Cholesterol is used in the synthesis of steroid hormones such as cortisol, progesterone, estrogen, and testosterone.
  • Bile salts derived from cholesterol facilitate the digestion of lipids.
  • Cholesterol in the skin forms a barrier that prevents excess water evaporation from the body.
  • Farnesyl and geranylgeranyl groups, which are derived from cholesterol biosynthesis intermediates, are post-translationally added to signal transduction proteins such as ras and protein-targeting proteins such as rab. These modifications are important for the activities of these proteins (Guyton, supra; Stryer, supra, pp. 279-280, 691-702, 934).
  • Mammals obtain cholesterol derived from both de novo biosynthesis and the diet.
  • The liver is the major site of cholesterol biosynthesis in mammals.
  • Two acetyl-CoA molecules initially condense to form acetoacetyl-CoA, catalyzed by a thiolase.
  • Acetoacetyl-CoA condenses with a third acetyl-CoA to form hydroxymethylglutaryl-CoA (HMG-CoA), catalyzed by HMG-CoA synthase.
  • Conversion of HMG-CoA to cholesterol is accomplished via a series of enzymatic steps known as the mevalonate pathway.
  • The rate-limiting step is the conversion of HMG-CoA to mevalonate by HMG-CoA reductase.
  • Mevalonate pathway enzymes include mevalonate kinase, phosphomevalonate kinase, diphosphomevalonate decarboxylase, isopentenyldiphosphate isomerase, dimethylallyl transferase, geranyl transferase, farnesyl-diphosphate farnesyltransferase, squalene monooxygenase, lanosterol synthase, lathosterol oxidase, and 7-dehydrocholesterol reductase.
  • Cholesterol is used in the synthesis of steroid hormones such as cortisol, progesterone, aldosterone, estrogen, and testosterone.
  • Cholesterol is converted to pregnenolone by cholesterol monooxygenases.
  • The other steroid hormones are synthesized from pregnenolone by a series of enzyme-catalyzed reactions including oxidations, isomerizations, hydroxylations, reductions, and demethylations. Examples of these enzymes include steroid Δ-isomerase, 3β-hydroxy-Δ5-steroid dehydrogenase, steroid 21-monooxygenase, steroid 19-hydroxylase, and 3 ⁇ -hydroxysteroid dehydrogenase. Cholesterol is also the precursor to vitamin D.
  • Isoprenoid groups are found in vitamin K, ubiquinone, retinal, dolichol phosphate (a carrier of oligosaccharides needed for N-linked glycosylation), and farnesyl and geranylgeranyl groups that modify proteins. Enzymes involved include farnesyl transferase, polyprenyl transferases, dolichyl phosphatase, and dolichyl kinase.
Sphingolipid Metabolism
  • Sphingolipids are an important class of membrane lipids that contain sphingosine, a long chain amino alcohol. They are composed of one long-chain fatty acid, one polar head alcohol, and sphingosine or a sphingosine derivative.
  • The three classes of sphingolipids are sphingomyelins, cerebrosides, and gangliosides. Sphingomyelins, which contain phosphocholine or phosphoethanolamine as their head group, are abundant in the myelin sheath surrounding nerve cells.
  • Galactocerebrosides, which contain a glucose or galactose head group, are characteristic of the brain. Other cerebrosides are found in nonneural tissues. Gangliosides, whose head groups contain multiple sugar units, are abundant in the brain, but are also found in nonneural tissues.
  • Sphingolipids are built on a sphingosine backbone.
  • Sphingosine is acylated to ceramide by the enzyme sphingosine acyltransferase.
  • Ceramide and phosphatidyl choline are converted to sphingomyelin by the enzyme ceramide choline phosphotransferase.
  • Cerebrosides are synthesized by the linkage of glucose or galactose to ceramide by a transferase. Sequential addition of sugar residues to ceramide by transferase enzymes yields gangliosides.
Eicosanoid Metabolism
  • Eicosanoids, including prostaglandins, prostacyclin, thromboxanes, and leukotrienes, are 20-carbon molecules derived from fatty acids. Eicosanoids are signaling molecules which have roles in pain, fever, and inflammation. The precursor of all eicosanoids is arachidonate, which is generated from phospholipids by phospholipase A2 and from diacylglycerols by diacylglycerol lipase.
  • Leukotrienes are produced from arachidonate by the action of lipoxygenases.
  • Prostaglandin synthase, reductases, and isomerases are responsible for the synthesis of the prostaglandins.
  • Prostaglandins have roles in inflammation, blood flow, ion transport, synaptic transmission, and sleep.
  • Prostacyclin and the thromboxanes are derived from a precursor prostaglandin by the action of prostacyclin synthase and thromboxane synthases, respectively.
  • Acetyl-CoA molecules derived from fatty acid oxidation in the liver can condense to form acetoacetyl-CoA, which subsequently forms acetoacetate, D-3-hydroxybutyrate, and acetone.
  • These three products are known as ketone bodies.
  • Enzymes involved in ketone body metabolism include HMG-CoA synthetase, HMG-CoA cleavage enzyme, D-3-hydroxybutyrate dehydrogenase, acetoacetate decarboxylase, and 3-ketoacyl-CoA transferase.
  • Ketone bodies are a normal fuel supply of the heart and renal cortex.
  • Acetoacetate produced by the liver is transported to cells where the acetoacetate is converted back to acetyl-CoA and enters the citric acid cycle. In times of starvation, ketone bodies produced from stored triacylglycerols become an important fuel source, especially for the brain. Abnormally high levels of ketone bodies are observed in diabetics. Diabetic coma can result if ketone body levels become too great.
Lipid Mobilization
  • Diazepam binding inhibitor (DBI), also known as endozepine and acyl CoA-binding protein, is an endogenous γ-aminobutyric acid (GABA) receptor ligand which is thought to down-regulate the effects of GABA.
  • DBI binds medium- and long-chain acyl-CoA esters with very high affinity and may function as an intracellular carrier of acyl-CoA esters (OMIM *125950 Diazepam Binding Inhibitor; DBI; PROSITE PDOC00686 Acyl-CoA-binding protein signature).
  • Fat stored in liver and adipose triglycerides may be released by hydrolysis and transported in the blood. Free fatty acids are transported in the blood by albumin. Triacylglycerols and cholesterol esters in the blood are transported in lipoprotein particles.
  • The particles consist of a core of hydrophobic lipids surrounded by a shell of polar lipids and apolipoproteins.
  • The protein components serve in the solubilization of hydrophobic lipids and also contain cell-targeting signals.
  • Lipoproteins include chylomicrons, chylomicron remnants, very-low-density lipoproteins (VLDL), intermediate-density lipoproteins (IDL), low-density lipoproteins (LDL), and high-density lipoproteins (HDL).
  • Triacylglycerols in chylomicrons and VLDL are hydrolyzed by lipoprotein lipases that line blood vessels in muscle and other tissues that use fatty acids.
  • Cell surface LDL receptors bind LDL particles which are then internalized by endocytosis. Absence of the LDL receptor, the cause of the disease familial hypercholesterolemia, leads to increased plasma cholesterol levels and ultimately to atherosclerosis.
  • Plasma cholesteryl ester transfer protein mediates the transfer of cholesteryl esters from HDL to apolipoprotein B-containing lipoproteins. Cholesteryl ester transfer protein is important in the reverse cholesterol transport system and may play a role in atherosclerosis (Yamashita, S. et al. (1997) Curr. Opin.
  • Macrophage scavenger receptors, which bind and internalize modified lipoproteins, play a role in lipid transport and may contribute to atherosclerosis (Greaves, D.R. et al. (1998) Curr. Opin. Lipidol. 9:425-432).
  • SREBP sterol regulatory element binding protein
  • OSBP oxysterol-binding protein
  • Mitochondrial and peroxisomal beta-oxidation enzymes degrade saturated and unsaturated fatty acids by sequential removal of two-carbon units from CoA-activated fatty acids.
  • The main beta-oxidation pathway degrades both saturated and unsaturated fatty acids while the auxiliary pathway performs additional steps required for the degradation of unsaturated fatty acids.
  • Mitochondria oxidize short-, medium-, and long-chain fatty acids to produce energy for cells.
  • Mitochondrial beta-oxidation is a major energy source for cardiac and skeletal muscle. In liver, it provides ketone bodies to the peripheral circulation when glucose levels are low, as in starvation, endurance exercise, and diabetes (Eaton, S. et al. (1996) Biochem. J. 320:345-357).
  • Peroxisomes oxidize medium-, long-, and very-long-chain fatty acids, dicarboxylic fatty acids, branched fatty acids, prostaglandins, xenobiotics, and bile acid intermediates.
  • The chief roles of peroxisomal beta-oxidation are to shorten toxic lipophilic carboxylic acids to facilitate their excretion and to shorten very-long-chain fatty acids prior to mitochondrial beta-oxidation (Mannaerts, G.P. and P.P. van Veldhoven (1993) Biochimie 75:147-158).
  • Enzymes involved in beta-oxidation include acyl CoA synthetase, carnitine acyltransferase, acyl CoA dehydrogenases, enoyl CoA hydratases, L-3-hydroxyacyl CoA dehydrogenase, β-ketothiolase, 2,4-dienoyl CoA reductase, and isomerase.
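The sequential removal of two-carbon units described above lends itself to a simple count. The sketch below covers only the main pathway for a saturated, even-chain fatty acyl-CoA and relies on the standard textbook accounting, which this description does not spell out: each cycle releases one acetyl-CoA and yields one FADH2 and one NADH, and the final cycle splits a four-carbon unit into two acetyl-CoA. The function name and result keys are hypothetical.

```python
def beta_oxidation_yield(chain_length):
    """Products of complete beta-oxidation of a saturated, even-chain acyl-CoA.

    Unsaturated and odd-chain fatty acids need the auxiliary steps mentioned
    in the text and are outside the scope of this sketch.
    """
    if chain_length < 4 or chain_length % 2 != 0:
        raise ValueError("this sketch handles only even chain lengths of 4 or more")
    cycles = chain_length // 2 - 1        # the last cycle yields two acetyl-CoA
    return {
        "cycles": cycles,
        "acetyl_coa": chain_length // 2,
        "fadh2": cycles,
        "nadh": cycles,
    }


palmitoyl = beta_oxidation_yield(16)
```

For a 16-carbon chain this gives seven cycles and eight acetyl-CoA, mirroring the seven extension cycles needed to build palmitate.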
  • A particular substrate for lysophospholipases (LPLs), lysophosphatidylcholine, causes lysis of cell membranes when it is formed or imported into a cell.
  • LPLs are regulated by lipid factors including acylcarnitine, arachidonic acid, and phosphatidic acid.
  • The secretory phospholipase A2 (PLA2) superfamily comprises a number of heterogeneous enzymes whose common feature is to hydrolyze the sn-2 fatty acid acyl ester bond of phosphoglycerides. Hydrolysis of the glycerophospholipids releases free fatty acids and lysophospholipids.
  • PLA2 activity generates precursors for the biosynthesis of biologically active lipids, hydroxy fatty acids, and platelet-activating factor.
  • PLA2 hydrolysis of the sn-2 ester bond in phospholipids generates free fatty acids, such as arachidonic acid, and lysophospholipids.
  • Carbohydrates, including sugars or saccharides, starch, and cellulose, are aldehyde or ketone compounds with multiple hydroxyl groups. The importance of carbohydrate metabolism is demonstrated by the sensitive regulatory system in place for maintenance of blood glucose levels. Two pancreatic hormones, insulin and glucagon, promote increased glucose uptake and storage by cells, and increased glucose release from cells, respectively. Carbohydrates have three important roles in mammalian cells. First, carbohydrates are used as energy stores, fuels, and metabolic intermediates. Carbohydrates are broken down to form energy in glycolysis and are stored as glycogen for later use. Second, the sugars deoxyribose and ribose form part of the structural support of DNA and RNA, respectively.
  • Third, carbohydrate modifications are added to secreted and membrane proteins and lipids as they traverse the secretory pathway.
  • Cell surface carbohydrate-containing macromolecules, including glycoproteins, glycolipids, and transmembrane proteoglycans, mediate adhesion with other cells and with components of the extracellular matrix.
  • The extracellular matrix comprises diverse glycoproteins, glycosaminoglycans (GAGs), and carbohydrate-binding proteins which are secreted from the cell and assembled into an organized meshwork in close association with the cell surface.
  • The interaction of the cell with the surrounding matrix profoundly influences cell shape, strength, flexibility, motility, and adhesion.
  • Carbohydrate metabolism is altered in several disorders including diabetes mellitus, hyperglycemia, hypoglycemia, galactosemia, galactokinase deficiency, and UDP-galactose-4-epimerase deficiency (Fauci, A.S. et al. (1998) Harrison's Principles of Internal Medicine, McGraw-Hill, New York NY, pp. 2208-2209).
  • Altered carbohydrate metabolism is associated with cancer.
  • Reduced GAG and proteoglycan expression is associated with human lung carcinomas (Nackaerts, K. et al. (1997) Int. J. Cancer 74:335-345).
Glycolysis
  • Enzymes of the glycolytic pathway convert the sugar glucose to pyruvate while simultaneously producing ATP.
  • The pathway also provides building blocks for the synthesis of cellular components such as long-chain fatty acids. After glycolysis, pyruvate is converted to acetyl-Coenzyme A, which, in aerobic organisms, enters the citric acid cycle.
  • Glycolytic enzymes include hexokinase, phosphoglucose isomerase, phosphofructokinase, aldolase, triose phosphate isomerase, glyceraldehyde 3-phosphate dehydrogenase, phosphoglycerate kinase, phosphoglyceromutase, enolase, and pyruvate kinase.
  • Phosphofructokinase, hexokinase, and pyruvate kinase are important in regulating the rate of glycolysis.
  • Gluconeogenesis is the synthesis of glucose from noncarbohydrate precursors such as lactate and amino acids.
  • The pathway, which functions mainly in times of starvation and intense exercise, occurs mostly in the liver and kidney.
  • Responsible enzymes include pyruvate carboxylase, phosphoenolpyruvate carboxykinase, fructose 1,6-bisphosphatase, and glucose-6-phosphatase.
Pentose Phosphate Pathway
  • Pentose phosphate pathway enzymes are responsible for generating the reducing agent NADPH, while at the same time oxidizing glucose-6-phosphate to ribose-5-phosphate. Ribose-5-phosphate and its derivatives become part of important biological molecules such as ATP, Coenzyme A, NAD+, FAD, RNA, and DNA.
  • The pentose phosphate pathway has both oxidative and non-oxidative branches. The oxidative branch steps, which are catalyzed by the enzymes glucose-6-phosphate dehydrogenase, lactonase, and 6-phosphogluconate dehydrogenase, convert glucose-6-phosphate and NADP+ to ribulose-5-phosphate and NADPH.
  • The non-oxidative branch steps, which are catalyzed by the enzymes phosphopentose isomerase, phosphopentose epimerase, transketolase, and transaldolase, allow the interconversion of three-, four-, five-, six-, and seven-carbon sugars.
Glucuronate Metabolism
  • Glucuronate is a monosaccharide which, in the form of D-glucuronic acid, is found in the GAGs chondroitin and dermatan. D-glucuronic acid is also important in the detoxification and excretion of foreign organic compounds such as phenol. Enzymes involved in glucuronate metabolism include UDP-glucose dehydrogenase and glucuronate reductase.
Disaccharide Metabolism
  • Disaccharides must be hydrolyzed to monosaccharides to be digested. Lactose, a disaccharide found in milk, is hydrolyzed to galactose and glucose by the enzyme lactase. Maltose is derived from plant starch and is hydrolyzed to glucose by the enzyme maltase. Sucrose is derived from plants and is hydrolyzed to glucose and fructose by the enzyme sucrase. Trehalose, a disaccharide found mainly in insects and mushrooms, is hydrolyzed to glucose by the enzyme trehalase (OMIM *275360 Trehalase; Ruf, J. et al. (1990) J. Biol. Chem. 265:15034-15039).
  • Lactase, maltase, sucrase, and trehalase are bound to mucosal cells lining the small intestine, where they participate in the digestion of dietary disaccharides.
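The enzyme and product pairings given above for the dietary disaccharides can be collected into a small lookup table, sketched here for illustration. The table and function names are hypothetical. Where the text says only that maltose and trehalose are hydrolyzed to glucose, the sketch records two glucose units, which is the standard composition of those disaccharides rather than a statement from this description.

```python
# Disaccharide -> (hydrolyzing enzyme, monosaccharide products)
DISACCHARIDASES = {
    "lactose": ("lactase", ("galactose", "glucose")),
    "maltose": ("maltase", ("glucose", "glucose")),
    "sucrose": ("sucrase", ("glucose", "fructose")),
    "trehalose": ("trehalase", ("glucose", "glucose")),
}


def hydrolyze(disaccharide):
    """Return (enzyme, products) for a dietary disaccharide named in the text.

    Raises KeyError for sugars that are not in the table.
    """
    return DISACCHARIDASES[disaccharide.lower()]


enzyme, products = hydrolyze("Lactose")
```

Looking up lactose returns lactase together with galactose and glucose, matching the hydrolysis described for milk sugar.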
  • Lactose synthetase, composed of the catalytic subunit galactosyltransferase and the modifier subunit α-lactalbumin, converts UDP-galactose and glucose to lactose in the mammary glands.
  • Glycogen, Starch, and Chitin Metabolism
  • Glycogen is the storage form of carbohydrates in mammals. Mobilization of glycogen maintains glucose levels between meals and during muscular activity. Glycogen is stored mainly in the liver and in skeletal muscle in the form of cytoplasmic granules. These granules contain enzymes that catalyze the synthesis and degradation of glycogen, as well as enzymes that regulate these processes. Enzymes that catalyze the degradation of glycogen include glycogen phosphorylase, a transferase, α-1,6-glucosidase, and phosphoglucomutase.
  • Enzymes that catalyze the synthesis of glycogen include UDP-glucose pyrophosphorylase, glycogen synthetase, a branching enzyme, and nucleoside diphosphokinase.
  • The enzymes of glycogen synthesis and degradation are tightly regulated by the hormones insulin, glucagon, and epinephrine.
  • Starch, a plant-derived polysaccharide, is hydrolyzed to maltose, maltotriose, and α-dextrin by α-amylase, an enzyme secreted by the salivary glands and pancreas.
  • Chitin is a polysaccharide found in insects and Crustacea.
  • A chitotriosidase is secreted by macrophages and may play a role in the degradation of chitin-containing pathogens (Boot, R.G. et al. (1995) J. Biol. Chem. 270:26252-26256).
  • Peptidoglycans and Glycosaminoglycans
  • GAGs are anionic linear unbranched polysaccharides composed of repetitive disaccharide units. These repetitive units contain a derivative of an amino sugar, either glucosamine or galactosamine. GAGs exist free or as part of proteoglycans, large molecules composed of a core protein attached to one or more GAGs. GAGs are found on the cell surface, inside cells, and in the extracellular matrix. Changes in GAG levels are associated with several autoimmune diseases including autoimmune thyroid disease, autoimmune diabetes mellitus, and systemic lupus erythematosus (Hansen, C. et al. (1996) Clin. Exp. Rheum. 14 (Suppl.
  • GAGs include chondroitin sulfate, keratan sulfate, heparin, heparan sulfate, dermatan sulfate, and hyaluronan.
  • The GAG hyaluronan (HA) is found in the extracellular matrix of many cells, especially in soft connective tissues, and is abundant in synovial fluid (Pitsillides, A.A. et al. (1993) Int. J. Exp. Pathol. 74:27-34). HA seems to play important roles in cell regulation, development, and differentiation (Laurent, T.C. and J.R. Fraser (1992) FASEB J. 6:2397-2404).
  • Hyaluronidase is an enzyme that degrades HA to oligosaccharides. Hyaluronidases may function in cell adhesion, infection, angiogenesis, signal transduction, reproduction, cancer, and inflammation.
  • Proteoglycans, also known as peptidoglycans, are found in the extracellular matrix of connective tissues such as cartilage and are essential for distributing the load in weight-bearing joints.
  • Cell-surface-attached proteoglycans anchor cells to the extracellular matrix. Both extracellular and cell-surface proteoglycans bind growth factors, facilitating their binding to cell-surface receptors and subsequent triggering of signal transduction pathways.
  • NH4+ is assimilated into amino acids by the actions of two enzymes, glutamate dehydrogenase and glutamine synthetase.
  • The carbon skeletons of amino acids come from the intermediates of glycolysis, the pentose phosphate pathway, or the citric acid cycle.
  • Humans can synthesize only thirteen amino acids (the nonessential amino acids). The remaining nine must come from the diet (the essential amino acids).
  • Enzymes involved in nonessential amino acid biosynthesis include glutamate kinase dehydrogenase, pyrroline carboxylate reductase, asparagine synthetase, phenylalanine oxygenase, methionine adenosyltransferase, adenosylhomocysteinase, cystathionine β-synthase, cystathionine γ-lyase, phosphoglycerate dehydrogenase, phosphoserine transaminase, phosphoserine phosphatase, serine hydroxymethyltransferase, and glycine synthase.
  • Metabolism of amino acids takes place almost entirely in the liver, where the amino group is removed by aminotransferases (transaminases), for example, alanine aminotransferase.
  • The amino group is transferred to α-ketoglutarate to form glutamate.
  • Glutamate dehydrogenase converts glutamate to NH4+ and α-ketoglutarate.
  • NH4+ is converted to urea by the urea cycle, which is catalyzed by the enzymes arginase, ornithine transcarbamoylase, argininosuccinate synthetase, and argininosuccinase.
  • Carbamoyl phosphate synthetase is also involved in urea formation.
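  • For orientation, the overall stoichiometry of urea formation can be sketched as below. This net equation is the standard textbook form and is an addition for illustration. The co-substrate aspartate, the product fumarate, and the ATP accounting are not mentioned in the text above.

```latex
\[
\mathrm{CO_2} + \mathrm{NH_4^{+}} + 3\,\mathrm{ATP} + \text{aspartate} + 2\,\mathrm{H_2O}
\;\longrightarrow\;
\text{urea} + 2\,\mathrm{ADP} + 2\,\mathrm{P_i} + \mathrm{AMP} + \mathrm{PP_i} + \text{fumarate}
\]
```

  • In this accounting, carbamoyl phosphate synthetase consumes two of the three ATP, and argininosuccinate synthetase consumes the third, cleaving it to AMP and pyrophosphate.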
  • Enzymes involved in the metabolism of the carbon skeleton of amino acids include serine dehydratase, asparaginase, glutaminase, propionyl CoA carboxylase, methylmalonyl CoA mutase, branched-chain α-keto dehydrogenase complex, isovaleryl CoA dehydrogenase, β-methylcrotonyl CoA carboxylase, phenylalanine hydroxylase, p-hydroxyphenylpyruvate hydroxylase, and homogentisate oxidase.
  • Polyamines, which include spermidine, putrescine, and spermine, bind tightly to nucleic acids and are abundant in rapidly proliferating cells.
  • Enzymes involved in polyamine synthesis include ornithine decarboxylase.
  • Metabolic pathways feature anaerobic and aerobic degradation, coupled with the energy-requiring reactions such as phosphorylation of adenosine diphosphate (ADP) to the triphosphate (ATP) or analogous phosphorylations of guanosine (GDP/GTP), uridine (UDP/UTP), or cytidine (CDP/CTP). Subsequent dephosphorylation of the triphosphate drives reactions needed for cell maintenance, growth, and proliferation.
  • Digestive enzymes convert carbohydrates and sugars to glucose; fructose and galactose are converted in the liver to glucose. Enzymes involved in these conversions include galactose-1-phosphate uridyl transferase and UDP-galactose-4 epimerase. In the cytoplasm, glycolysis converts glucose to pyruvate in a series of reactions coupled to ATP synthesis.
  • Pyruvate is transported into the mitochondria and converted to acetyl-CoA for oxidation via the citric acid cycle, involving pyruvate dehydrogenase components, dihydrolipoyl transacetylase, and dihydrolipoyl dehydrogenase.
  • Enzymes involved in the citric acid cycle include citrate synthetase, aconitases, isocitrate dehydrogenase, alpha-ketoglutarate dehydrogenase complex including transsuccinylases, succinyl CoA synthetase, succinate dehydrogenase, fumarases, and malate dehydrogenase.
  • Acetyl CoA is oxidized to CO2 with concomitant formation of NADH, FADH2, and GTP.
  • The transport of electrons from NADH and FADH2 to oxygen by dehydrogenases is coupled to the synthesis of ATP from ADP and Pi by the F0F1 ATPase complex in the mitochondrial inner membrane.
  • Enzyme complexes responsible for electron transport and ATP synthesis include the F0F1 ATPase complex, ubiquinone(CoQ)-cytochrome c reductase, ubiquinone reductase, cytochrome b, cytochrome c1, FeS protein, and cytochrome c oxidase.
  • Triglycerides are hydrolyzed to fatty acids and glycerol by lipases. Glycerol is then phosphorylated to glycerol-3-phosphate by glycerol kinase and glycerol phosphate dehydrogenase, and degraded by glycolysis. Fatty acids are transported into the mitochondria as fatty acyl-carnitine esters and undergo oxidative degradation.
  • Cofactors are small molecular weight inorganic or organic compounds that are required for the action of an enzyme. Many cofactors contain vitamins as a component. Cofactors include thiamine pyrophosphate, flavin adenine dinucleotide, flavin mononucleotide, nicotinamide adenine dinucleotide, pyridoxal phosphate, coenzyme A, tetrahydrofolate, lipoamide, and heme. The vitamins biotin and cobalamin are associated with enzymes as well. Heme, a prosthetic group found in myoglobin and hemoglobin, consists of a protoporphyrin group bound to iron.
  • Porphyrin groups contain four substituted pyrroles covalently joined in a ring, often with a bound metal atom.
  • Enzymes involved in porphyrin synthesis include δ-aminolevulinate synthase, δ-aminolevulinate dehydrase, porphobilinogen deaminase, and cosynthase. Deficiencies in heme formation cause porphyrias. Heme is broken down as a part of erythrocyte turnover.
  • Enzymes involved in heme degradation include heme oxygenase and biliverdin reductase. Iron is a required cofactor for many enzymes.
  • Iron is found in iron-sulfur clusters in proteins including aconitase, succinate dehydrogenase, and NADH-Q reductase. Iron is transported in the blood by the protein transferrin. Binding of transferrin to the transferrin receptor on cell surfaces allows uptake by receptor-mediated endocytosis. Cytosolic iron is bound to ferritin protein.
  • A molybdenum-containing cofactor (molybdopterin) is found in enzymes including sulfite oxidase, xanthine dehydrogenase, and aldehyde oxidase. Molybdopterin biosynthesis is performed by two molybdenum cofactor synthesizing enzymes. Deficiencies in these enzymes cause mental retardation and lens dislocation. Other diseases caused by defects in cofactor metabolism include pernicious anemia and methylmalonic aciduria.
  • Secretion and Trafficking
  • Eukaryotic cells are bound by a lipid bilayer membrane and subdivided into functionally distinct, membrane-bound compartments.
  • The membranes maintain the essential differences between the cytosol, the extracellular environment, and the lumenal space of each intracellular organelle.
  • Because lipid membranes are highly impermeable to most polar molecules, transport of essential nutrients, metabolic waste products, cell signaling molecules, macromolecules and proteins across lipid membranes and between organelles must be mediated by a variety of transport-associated molecules.
  • Protein Trafficking
  • In eukaryotes, some proteins are synthesized on ER-bound ribosomes, co-translationally imported into the ER, delivered from the ER to the Golgi complex for post-translational processing and sorting, and transported from the Golgi to specific intracellular and extracellular destinations. All cells possess a constitutive transport process which maintains homeostasis between the cell and its environment. In many differentiated cell types, the basic machinery is modified to carry out specific transport functions. For example, in endocrine glands, hormones and other secreted proteins are packaged into secretory granules for regulated exocytosis to the cell exterior.
  • Synthesis of most integral membrane proteins, secreted proteins, and proteins destined for the lumen of a particular organelle occurs on ER-bound ribosomes. These proteins are co-translationally imported into the ER. The proteins leave the ER via membrane-bound vesicles which bud off the ER at specific sites and fuse with each other (homotypic fusion) to form the ER-Golgi Intermediate Compartment (ERGIC). The ERGIC matures progressively through the cis, medial, and trans cisternal stacks of the Golgi, modifying the enzyme composition by retrograde transport of specific Golgi enzymes. In this way, proteins moving through the Golgi undergo post-translational modification, such as glycosylation.
  • The final Golgi compartment is the Trans-Golgi Network (TGN), where both membrane and lumenal proteins are sorted for their final destination.
  • A secretory vesicle contains proteins destined for the plasma membrane, such as receptors, adhesion molecules, and ion channels, and secretory proteins, such as hormones, neurotransmitters, and digestive enzymes. Secretory vesicles eventually fuse with the plasma membrane (Glick, B.S. and V. Malhotra (1998) Cell 95:883-889).
  • The secretory process can be constitutive or regulated. Most cells have a constitutive pathway for secretion, whereby vesicles derived from maturation of the TGN require no specific signal to fuse with the plasma membrane. In many cells, such as endocrine cells, digestive cells, and neurons, vesicle pools derived from the TGN collect in the cytoplasm and do not fuse with the plasma membrane until directed to do so by a specific signal.
  • Endocytosis
  • Endocytosis, wherein cells internalize material from the extracellular environment, is essential for transmission of neuronal, metabolic, and proliferative signals; uptake of many essential nutrients; and defense against invading organisms. Most cells exhibit two forms of endocytosis. The first, phagocytosis, is an actin-driven process exemplified in macrophages and neutrophils. Material to be endocytosed contacts numerous cell surface receptors which stimulate the plasma membrane to extend and surround the particle, enclosing it in a membrane-bound phagosome. In the mammalian immune system, IgG-coated particles bind Fc receptors on the surface of phagocytic leukocytes.
  • Activation of the Fc receptors initiates a signal cascade involving src-family cytosolic kinases and the monomeric GTP-binding (G) protein Rho.
  • The resulting actin reorganization leads to phagocytosis of the particle. This process is an important component of the humoral immune response, allowing the processing and presentation of bacterial-derived peptides to antigen-specific T-lymphocytes.
  • The second form of endocytosis, pinocytosis, is a more generalized uptake of material from the external milieu.
  • Pinocytosis is activated by ligand binding to cell surface receptors. Activation of individual receptors stimulates an internal response that includes coalescence of the receptor-ligand complexes and formation of clathrin-coated pits. Invagination of the plasma membrane at clathrin-coated pits produces an endocytic vesicle within the cell cytoplasm. These vesicles undergo homotypic fusion to form an early endosomal (EE) compartment.
  • The tubulovesicular EE serves as a sorting site for incoming material.
  • ATP-driven proton pumps in the EE membrane lower the pH of the EE lumen (pH 6.3-6.8).
  • The acidic environment causes many ligands to dissociate from their receptors.
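  • To give a sense of scale for the EE lumen pH range quoted above, the following minimal Python sketch converts pH to proton concentration using the definition [H+] = 10^(-pH). The reference value of pH 7.0 is an assumption chosen for illustration (a neutral reference, not a figure from the text), and the function names are hypothetical.

```python
# Sketch: relate the early endosome (EE) lumen pH range to proton concentration.
# EE_LUMEN_PH comes from the text; REFERENCE_PH is an illustrative assumption.

EE_LUMEN_PH = (6.3, 6.8)   # range quoted in the text
REFERENCE_PH = 7.0         # assumption: neutral reference, not from the text


def proton_concentration(ph):
    """Return [H+] in mol/L for a given pH, using [H+] = 10**(-pH)."""
    return 10.0 ** (-ph)


def fold_increase(ph, reference_ph=REFERENCE_PH):
    """Return how many times higher [H+] is at `ph` than at `reference_ph`."""
    return proton_concentration(ph) / proton_concentration(reference_ph)


if __name__ == "__main__":
    for ph in EE_LUMEN_PH:
        print(f"pH {ph}: [H+] = {proton_concentration(ph):.2e} M, "
              f"{fold_increase(ph):.1f}-fold above pH {REFERENCE_PH}")
```

  • Under this assumed reference, the quoted range corresponds to a proton concentration roughly 1.6- to 5-fold higher than at pH 7.0, which illustrates why a shift of well under one pH unit can be enough to alter ligand-receptor binding.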
  • The receptors, along with membrane and other integral membrane proteins, are recycled back to the plasma membrane by budding off the tubular extensions of the EE in recycling vesicles (RV).
  • This selective removal of recycled components produces a carrier vesicle containing ligand and other material from the external environment.
  • The carrier vesicle fuses with TGN-derived vesicles which contain hydrolytic enzymes.
  • The acidic environment of the resulting late endosome (LE) activates the hydrolytic enzymes which degrade the ligands and other material. As digestion takes place, the LE fuses with the lysosome where digestion is completed (Mellman, I. (1996) Annu. Rev. Cell Dev. Biol. 12:575-625).
  • Recycling vesicles may return directly to the plasma membrane.
  • Receptors internalized and returned directly to the plasma membrane have a turnover rate of 2-3 minutes.
  • Receptors following this route have a turnover rate of 5-10 minutes.
  • Still other RVs are retained within the cell until an appropriate signal is received (Mellman, supra; and James, D.E. et al. (1994) Trends Cell Biol. 4:120-126).
  • Vesicle Formation
  • Several steps in the transit of material along the secretory and endocytic pathways require the formation of transport vesicles.
  • Vesicles form at the transitional endoplasmic reticulum (tER), the rim of Golgi cisternae, the face of the Trans-Golgi Network (TGN), the plasma membrane (PM), and tubular extensions of the endosomes.
  • The process begins with the budding of a vesicle out of the donor membrane.
  • The membrane-bound vesicle contains proteins to be transported and is surrounded by a protective coat made up of protein subunits recruited from the cytosol.
  • The initial budding and coating processes are controlled by a cytosolic ras-like GTP-binding protein, ADP-ribosylating factor (Arf), and adapter proteins (AP).
  • Clathrin coats form on the TGN and PM surfaces, whereas coatomer or COP coats form on the ER and Golgi.
  • COP coats can further be distinguished as COPI, involved in retrograde traffic through the Golgi and from the Golgi to the ER, and COPII, involved in anterograde traffic from the ER to the Golgi (Mellman, supra).
  • The COP coat consists of two major components, a G-protein (Arf or Sar) and coat protomer (coatomer). Coatomer is an equimolar complex of seven proteins, termed alpha-, beta-, beta'-, gamma-, delta-, epsilon- and zeta-COP. (Harter, C.
  • VAMP vesicle-associated membrane protein
  • a cytosolic prenylated GTP-binding protein Rab (a member of the Ras superfamily)
  • GTP-bound Rab proteins are directed into nascent transport vesicles where they interact with VAMP.
  • GAPs GTPase activating proteins
  • A cytosolic protein, guanine-nucleotide dissociation inhibitor (GDI), helps return GDP-bound Rab proteins to their membrane of origin.
  • Rab isoforms have been identified and appear to associate with specific compartments within the cell.
  • Rab proteins appear to play a role in mediating the function of a viral gene, Rev, which is essential for replication of HIV-1, the virus responsible for AIDS (Flavell, R.A. et al. (1996) Proc. Natl. Acad. Sci. USA 93:4421-4424).
  • N-ethylmaleimide sensitive factor (NSF) and soluble NSF-attachment protein (α-SNAP and β-SNAP) are two such proteins that are conserved from yeast to man and function in most intracellular membrane fusion reactions.
  • Sec1 represents a family of yeast proteins that function at many different stages in the secretory pathway including membrane fusion. Recently, mammalian homologs of Sec1, called Munc-18 proteins, have been identified (Katagiri, H. et al. (1995) J. Biol. Chem. 270:4963-4966; Hata et al. supra).
  • The SNARE complex involves three SNARE molecules, one in the vesicular membrane and two in the target membrane.
  • Synaptotagmin is an integral membrane protein in the synaptic vesicle which associates with the t-SNARE syntaxin in the docking complex. Synaptotagmin binds calcium in a complex with negatively charged phospholipids, which allows the cytosolic SNAP protein to displace synaptotagmin from syntaxin and fusion to occur. Thus, synaptotagmin is a negative regulator of fusion in the neuron (Littleton, J.T. et al. (1993) Cell 74:1125-1134). The most abundant membrane protein of synaptic vesicles appears to be the glycoprotein synaptophysin, a 38 kDa protein with four transmembrane domains.
  • Different isoforms of SNAREs and Rabs show distinct cellular and subcellular distributions.
  • VAMP-1/synaptobrevin, membrane-anchored synaptosome-associated protein of 25 kDa (SNAP-25), syntaxin-1, Rab3A, Rab15, and Rab23 are predominantly expressed in the brain and nervous system.
  • Syntaxin, VAMP, and Rab proteins are associated with distinct subcellular compartments and their vesicular carriers.
  • NPCs nuclear pore complexes
  • All nuclear proteins are imported from the cytoplasm, their site of synthesis.
  • tRNA and mRNA are exported from the nucleus, their site of synthesis, to the cytoplasm, their site of function.
  • Processing of small nuclear RNAs involves export into the cytoplasm, assembly with proteins and modifications such as hypermethylation to produce small nuclear ribonucleoproteins (snRNPs), and subsequent import of the snRNPs back into the nucleus.
  • Assembly of ribosomes requires the initial import of ribosomal proteins from the cytoplasm, their incorporation with RNA into ribosomal subunits, and export back to the cytoplasm (Görlich, D. and I.W. Mattaj (1996) Science 271:1513-1518).
  • NLS nuclear localization signals
  • NTF2 homodimeric protein nuclear transport factor 2
  • Abnormal hormonal secretion is linked to disorders such as diabetes insipidus (vasopressin), hyper- and hypoglycemia (insulin, glucagon), Graves' disease and goiter (thyroid hormone), and Cushing's and Addison's diseases (adrenocorticotropic hormone, ACTH).
  • Cancer cells secrete excessive amounts of hormones or other biologically active peptides.
  • Disorders related to excessive secretion of biologically active peptides by tumor cells include fasting hypoglycemia due to increased insulin secretion from insulinoma-islet cell tumors; hypertension due to increased epinephrine and norepinephrine secreted from pheochromocytomas of the adrenal medulla and sympathetic paraganglia; and carcinoid syndrome, which is characterized by abdominal cramps, diarrhea, and valvular heart disease caused by excessive amounts of vasoactive substances such as serotonin, bradykinin, histamine, prostaglandins, and polypeptide hormones, secreted from intestinal tumors.
  • Biologically active peptides that are ectopically synthesized in and secreted from tumor cells include ACTH and vasopressin (lung and pancreatic cancers); parathyroid hormone (lung and bladder cancers); calcitonin (lung and breast cancers); and thyroid-stimulating hormone (medullary thyroid carcinoma).
  • Such peptides may be useful as diagnostic markers for tumorigenesis (Schwartz, M.Z. (1997) Semin. Pediatr. Surg. 3:141-146; and Said, S.I. and G.R. Faloona (1975) N. Engl. J. Med. 293:155-160).
  • Defective nuclear transport may play a role in cancer.
  • The BRCA1 protein contains three potential NLSs which interact with importin alpha, and is transported into the nucleus by the importin/NPC pathway.
  • In breast cancer cells, the BRCA1 protein is aberrantly localized in the cytoplasm.
  • The mislocation of the BRCA1 protein in breast cancer cells may be due to a defect in the NPC nuclear import pathway (Chen, C.F. et al. (1996) J. Biol. Chem. 271:32863-32868).
  • Organisms respond to the environment by a number of pathways.
  • Heat shock proteins, including hsp70, hsp60, hsp90, and hsp40, assist organisms in coping with heat damage to cellular proteins.
  • Aquaporins are channels that transport water and, in some cases, nonionic small solutes such as urea and glycerol. Water movement is important for a number of physiological processes including renal fluid filtration, aqueous humor generation in the eye, cerebrospinal fluid production in the brain, and appropriate hydration of the lung. Aquaporins are members of the major intrinsic protein (MIP) family of membrane transporters (King, L.S. and P. Agre (1996) Annu. Rev. Physiol. 58:619-648; Ishibashi, K. et al. (1997) J. Biol. Chem. 272:20782-20786).
  • The metallothioneins (MTs) are cysteine-rich proteins that bind heavy metals such as cadmium, zinc, mercury, lead, and copper and are thought to play a role in metal detoxification or the metabolism and homeostasis of metals.
  • Arsenite-resistance proteins have been identified in hamsters that are resistant to toxic levels of arsenite (Rossman, T.G. et al. (1997) Mutat. Res. 386:307-314).
  • Proteins involved in light perception include rhodopsin, transducin, and cGMP phosphodiesterase. Proteins involved in odor perception include multiple olfactory receptors. Other proteins are important in human circadian rhythms and responses to wounds.
  • Immunity and Host Defense
  • The cellular components of the humoral immune system include six different types of leukocytes: monocytes, lymphocytes, polymorphonuclear granulocytes (consisting of neutrophils, eosinophils, and basophils) and plasma cells. Additionally, fragments of megakaryocytes, a seventh type of white blood cell in the bone marrow, occur in large numbers in the blood as platelets.
  • Leukocytes are formed from two stem cell lineages in bone marrow.
  • The myeloid stem cell line produces granulocytes and monocytes, and the lymphoid stem cell line produces lymphocytes.
  • Lymphoid cells travel to the thymus, spleen and lymph nodes, where they mature and differentiate into lymphocytes.
  • Leukocytes are responsible for defending the body against invading pathogens. Neutrophils and monocytes attack invading bacteria, viruses, and other pathogens and destroy them by phagocytosis. Monocytes enter tissues and differentiate into macrophages which are extremely phagocytic.
  • Lymphocytes and plasma cells are a part of the immune system which recognizes specific foreign molecules and organisms and inactivates them, as well as signals other cells to attack the invaders. Granulocytes and monocytes are formed and stored in the bone marrow until needed.
  • Megakaryocytes are produced in bone marrow, where they fragment into platelets and are released into the bloodstream.
  • The main function of platelets is to activate the blood clotting mechanism.
  • Lymphocytes and plasma cells are produced in various lymphogenous organs, including the lymph nodes, spleen, thymus, and tonsils.
  • Tissue inflammation in response to pathogen invasion results in production of chemo-attractants for leukocytes, such as endotoxins or other bacterial products, prostaglandins, and products of leukocytes or platelets.
  • Basophils participate in the release of the chemicals involved in the inflammatory process.
  • The main function of basophils is secretion of these chemicals to such a degree that they have been referred to as "unicellular endocrine glands."
  • A distinct aspect of basophilic secretion is that the contents of granules go directly into the extracellular environment, not into vacuoles as occurs with neutrophils, eosinophils and monocytes.
  • Basophils have receptors for the Fc fragment of immunoglobulin E (IgE) that are not present on other leukocytes. Crosslinking of membrane IgE with anti-IgE or other ligands triggers degranulation.
  • Eosinophils are bi- or multi-nucleated white blood cells which contain eosinophilic granules.
  • Ig receptors particularly IgG and IgE.
  • Eosinophils are stored in the bone marrow until recruited for use at a site of inflammation or invasion. They have specific functions in parasitic infections and allergic reactions, and are thought to detoxify some of the substances released by mast cells and basophils which cause inflammation. Additionally, they phagocytize antigen-antibody complexes and further help prevent spread of the inflammation. Macrophages are monocytes that have left the blood stream to settle in tissue. Once monocytes have migrated into tissues, they do not re-enter the bloodstream.
  • The mononuclear phagocyte system is comprised of precursor cells in the bone marrow, monocytes in circulation, and macrophages in tissues.
  • The system is capable of very fast and extensive phagocytosis.
  • A macrophage may phagocytize over 100 bacteria, digest them and extrude residues, and then survive for many more months.
  • Macrophages are also capable of ingesting large particles, including red blood cells and malarial parasites. They increase several-fold in size and transform into macrophages that are characteristic of the tissue they have entered, surviving in tissues for several months.
  • Mononuclear phagocytes are essential in defending the body against invasion by foreign pathogens, particularly intracellular microorganisms such as M. tuberculosis, listeria, leishmania and toxoplasma. Macrophages can also control the growth of tumorous cells, via both phagocytosis and secretion of hydrolytic enzymes. Another important function of macrophages is that of processing antigens and presenting them in a biochemically modified form to lymphocytes.
  • The immune system responds to invading microorganisms in two major ways: antibody production and cell mediated responses.
  • Antibodies are immunoglobulin proteins produced by
  • T cells T-lymphocytes
  • The infected cell is either killed or signals are secreted which activate macrophages and other cells to destroy the infected cell (Paul, supra).
  • T-lymphocytes originate in the bone marrow or liver in fetuses. Precursor cells migrate via the blood to the thymus, where they are processed to mature into T-lymphocytes. This processing is crucial because of positive and negative selection of T cells that will react with foreign antigen and not with self molecules.
  • After processing, T cells continuously circulate in the blood and secondary lymphoid tissues, such as lymph nodes, spleen, certain epithelium-associated tissues in the gastrointestinal tract, respiratory tract and skin.
  • When T-lymphocytes are presented with the complementary antigen, they are stimulated to proliferate and release large numbers of activated T cells into the lymph system and the blood system. These activated T cells can survive and circulate for several days.
  • T memory cells are created, which remain in the lymphoid tissue for months or years. Upon subsequent exposure to that specific antigen, these memory cells will respond more rapidly and with a stronger response than induced by the original antigen. This creates an "immunological memory" that can provide immunity for years.
  • There are two major types of T cells: cytotoxic T cells destroy infected host cells, and helper T cells activate other white blood cells via chemical signals.
  • One type of helper T cell, TH1, activates macrophages to destroy ingested microorganisms, while another, TH2, stimulates the production of antibodies by B cells.
  • Cytotoxic T cells directly attack the infected target cell.
  • In virus-infected cells, peptides derived from viral proteins are generated by the proteasome. These peptides are transported into the ER by the transporter associated with antigen processing (TAP) (Pamer, E. and P. Cresswell (1998) Annu. Rev. Immunol. 16:323-358).
  • The peptides bind MHC I chains, and the peptide/MHC I complex is transported to the cell surface.
  • Receptors on the surface of T cells bind to antigen presented on cell surface MHC molecules.
  • Once activated by binding to antigen, T cells secrete γ-interferon, a signal molecule that induces the expression of genes necessary for presenting viral (or other) antigens to cytotoxic T cells. Cytotoxic T cells kill the infected cell by stimulating programmed cell death.
  • Helper T cells constitute up to 75% of the total T cell population. They regulate the immune functions by producing a variety of lymphokines that act on other cells in the immune system and on bone marrow. Among these lymphokines are interleukins 2, 3, 4, 5, and 6; granulocyte-monocyte colony stimulating factor; and γ-interferon.
  • Helper T cells are required for most B cells to respond to antigen.
  • When an activated helper cell contacts a B cell, its centrosome and Golgi apparatus become oriented toward the B cell, aiding the directing of signal molecules, such as a transmembrane-bound protein called CD40 ligand, onto the B cell surface to interact with the CD40 transmembrane protein.
  • Secreted signals also help B cells to proliferate and mature and, in some cases, to switch the class of antibody being produced.
  • B-lymphocytes produce antibodies which react with specific antigenic proteins presented by pathogens. Once activated, B cells become filled with extensive rough endoplasmic reticulum and are known as plasma cells. As with T cells, interaction of B cells with antigen stimulates proliferation of only those B cells which produce antibody specific to that antigen.
  • Antibodies, or immunoglobulins (Ig), are the founding members of the Ig superfamily and the central components of the humoral immune response. Antibodies are either expressed on the surface of B cells or secreted by B cells into the circulation. Antibodies bind and neutralize blood-borne foreign antigens.
  • The prototypical antibody is a tetramer consisting of two identical heavy polypeptide chains (H-chains) and two identical light polypeptide chains (L-chains) interlinked by disulfide bonds. This arrangement confers the characteristic Y-shape to antibody molecules. Antibodies are classified based on their H-chain composition.
  • The five antibody classes, IgA, IgD, IgE, IgG and IgM, are defined by the α, δ, ε, γ, and μ H-chain types.
  • There are two types of L-chains, κ and λ, either of which may associate as a pair with any H-chain pair.
  • IgG, the most common class of antibody found in the circulation, is tetrameric, while the other classes of antibodies are generally variants or multimers of this basic structure.
  • H-chains and L-chains each contain an N-terminal variable region and a C-terminal constant region. Both H-chains and L-chains contain repeated Ig domains. For example, a typical H-chain contains four Ig domains, three of which occur within the constant region and one of which occurs within the variable region and contributes to the formation of the antigen recognition site. Likewise, a typical L-chain contains two Ig domains, one of which occurs within the constant region and one of which occurs within the variable region. In addition, H chains such as μ have been shown to associate with other polypeptides during differentiation of the B cell.
  • Antibodies can be described in terms of their two main functional domains. Antigen recognition is mediated by the Fab (antigen binding fragment) region of the antibody, while effector functions are mediated by the Fc (crystallizable fragment) region. Binding of antibody to an antigen, such as a bacterium, triggers the destruction of the antigen by phagocytic white blood cells such as macrophages and neutrophils. These cells express surface receptors that specifically bind to the antibody Fc region and allow the phagocytic cells to engulf, ingest, and degrade the antibody-bound antigen.
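The chain and domain counts given above can be tallied in a short Python sketch. The class, field, and function names are illustrative assumptions, and the numbers are the "typical" values stated in the text, not measurements for any particular antibody.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chain:
    """One antibody polypeptide chain (hypothetical helper type)."""
    name: str
    variable_ig_domains: int   # Ig domains in the N-terminal variable region
    constant_ig_domains: int   # Ig domains in the C-terminal constant region

    @property
    def ig_domains(self) -> int:
        return self.variable_ig_domains + self.constant_ig_domains


# "Typical" chains as described in the text: H = 1 variable + 3 constant,
# L = 1 variable + 1 constant.
TYPICAL_H = Chain("H", variable_ig_domains=1, constant_ig_domains=3)
TYPICAL_L = Chain("L", variable_ig_domains=1, constant_ig_domains=1)


def tetramer_ig_domains(h: Chain = TYPICAL_H, l: Chain = TYPICAL_L) -> int:
    """Ig domains in a prototypical tetramer of two H-chains and two L-chains."""
    return 2 * h.ig_domains + 2 * l.ig_domains


print(tetramer_ig_domains())
```

Under these assumptions, the prototypical two-heavy, two-light tetramer carries twelve Ig domains, four of which (one per chain) lie in variable regions.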
  • The Fc receptors expressed by phagocytic cells are single-pass transmembrane glycoproteins of about 300 to 400 amino acids (Sears, D.W. et al. (1990) J. Immunol. 144:371-378).
  • The extracellular portion of the Fc receptor typically contains two or three Ig domains.
  • AIDS: Acquired Immunodeficiency Syndrome
  • In AIDS, helper T cells are depleted, leaving the patient susceptible to infection by microorganisms and parasites.
  • Another widespread medical condition attributable to the immune system is that of allergic reactions to certain antigens. Allergic reactions include hay fever, asthma, anaphylaxis, and urticaria (hives).
  • Leukemias involve an excess production of white blood cells, to the point where a major portion of the body's metabolic resources are directed solely at proliferation of white blood cells, leaving other tissues to starve.
  • Leukopenia or agranulocytosis occurs when the bone marrow stops producing white blood cells. This leaves the body unprotected against foreign microorganisms, including those which normally inhabit skin, mucous membranes, and gastrointestinal tract. If all white blood cell production stops completely, infection will occur within two days and death may follow only 1 to 4 days later.
  • Impaired phagocytosis occurs in several diseases, including monocytic leukemia, systemic lupus, and granulomatous disease. In such a situation, macrophages can phagocytize normally, but the enveloped organism is not killed. A defect in the plasma membrane enzyme which converts oxygen to lethally reactive forms results in abscess formation in liver, lungs, spleen, lymph nodes, and beneath the skin. Eosinophilia is an excess of eosinophils commonly observed in patients with allergies (hay fever, asthma), allergic reactions to drugs, rheumatoid arthritis, and cancers (Hodgkin's disease, lung, and liver cancer) (Isselbacher, K.J. et al. (1994) Harrison's Principles of Internal Medicine, McGraw-Hill, Inc., New York NY).
  • The complement system serves as an effector system and is involved in infectious agent recognition. It can function as an independent immune network or in conjunction with other humoral immune responses.
  • The complement system is composed of numerous plasma and membrane proteins that act in a cascade of reaction sequences whereby one component activates the next. The result is a rapid and amplified response to infection through either an inflammatory response or increased phagocytosis.
  • The complement system has more than 30 protein components which can be divided into functional groupings including modified serine proteases, membrane-binding proteins and regulators of complement activation. Activation occurs through two different pathways: the classical and the alternative. Both pathways serve to destroy infectious agents through distinct triggering mechanisms that eventually merge with the involvement of the component C3.
  • The classical pathway requires antibody binding to infectious agent antigens.
  • The antibodies serve to define the target and initiate the complement system cascade, culminating in the destruction of the infectious agent.
  • In this respect, complement can be seen as an effector arm of the humoral immune system.
  • The alternative pathway of the complement system does not require the presence of preexisting antibodies for targeting infectious agent destruction. Rather, this pathway, through low levels of an activated component, remains constantly primed and provides surveillance in the non-immune host to enable targeting and destruction of infectious agents. In this case foreign material triggers the cascade, thereby facilitating phagocytosis or lysis (Paul, supra, pp. 918-919).
  • Inflammatory responses are divided into four categories on the basis of pathology and include allergic inflammation, cytotoxic antibody mediated inflammation, immune complex mediated inflammation and monocyte mediated inflammation. Inflammation manifests as a combination of each of these forms with one predominating.
  • Allergic acute inflammation is observed in individuals wherein specific antigens stimulate IgE antibody production.
  • Mast cells and basophils are subsequently activated by the attachment of antigen-IgE complexes, resulting in the release of cytoplasmic granule contents such as histamine.
  • The products of activated mast cells can increase vascular permeability and constrict the smooth muscle of breathing passages, resulting in anaphylaxis or asthma.
  • Acute inflammation is also mediated by cytotoxic antibodies and can result in the destruction of tissue through the binding of complement-fixing antibodies to cells.
  • The responsible antibodies are of the IgG or IgM types. Resultant clinical disorders include autoimmune hemolytic anemia and thrombocytopenia as associated with systemic lupus erythematosus.
  • Immune complex mediated acute inflammation involves the IgG or IgM antibody types which combine with antigen to activate the complement cascade.
  • When immune complexes bind to neutrophils and macrophages, they activate the respiratory burst to form protein- and vessel-damaging agents such as hydrogen peroxide, hydroxyl radical, hypochlorous acid, and chloramines.
  • Clinical manifestations include rheumatoid arthritis and systemic lupus erythematosus.
  • Macrophages are activated and process antigen for presentation to T cells that subsequently produce lymphokines and monokines. This type of inflammatory response is likely important for defense against intracellular parasites and certain viruses.
  • Clinical associations include granulomatous disease, tuberculosis, leprosy, and sarcoidosis (Paul, W.E., supra, pp. 1017-1018).
  • Intercellular communication is essential for the growth and survival of multicellular organisms, and in particular, for the function of the endocrine, nervous, and immune systems.
  • Intercellular communication is critical for developmental processes such as tissue construction and organogenesis, in which cell proliferation, cell differentiation, and morphogenesis must be spatially and temporally regulated in a precise and coordinated manner.
  • Cells communicate with one another through the secretion and uptake of diverse types of signaling molecules such as hormones, growth factors, neuropeptides, and cytokines.
  • Hormones are signaling molecules that coordinately regulate basic physiological processes from embryogenesis throughout adulthood. These processes include metabolism, respiration, reproduction, excretion, fetal tissue differentiation and organogenesis, growth and development, homeostasis, and the stress response. Hormonal secretions and the nervous system are tightly integrated and interdependent. Hormones are secreted by endocrine glands, primarily the hypothalamus and pituitary, the thyroid and parathyroid, the pancreas, the adrenal glands, and the ovaries and testes.
  • Hormones are often secreted in diurnal, pulsatile, and cyclic patterns. Hormone secretion is regulated by perturbations in blood biochemistry, by other upstream-acting hormones, by neural impulses, and by negative feedback loops. Blood hormone concentrations are constantly monitored and adjusted to maintain optimal, steady-state levels. Once secreted, hormones act only on those target cells that express specific receptors.
  • Hyposecretion often occurs when a hormone's gland of origin is damaged or otherwise impaired. Hypersecretion often results from the proliferation of tumors derived from hormone-secreting cells. Inappropriate hormone levels may also be caused by defects in regulatory feedback loops or in the processing of hormone precursors. Endocrine malfunction may also occur when the target cell fails to respond to the hormone.
  • Hormones can be classified biochemically as polypeptides, steroids, eicosanoids, or amines.
  • Polypeptides, which include diverse hormones such as insulin and growth hormone, vary in size and function and are often synthesized as inactive precursors that are processed intracellularly into mature, active forms.
  • Amines, which include epinephrine and dopamine, are amino acid derivatives that function in neuroendocrine signaling.
  • Steroids, which include the cholesterol-derived hormones estrogen and testosterone, function in sexual development and reproduction.
  • Eicosanoids, which include prostaglandins and prostacyclins, are fatty acid derivatives that function in a variety of processes.
  • Most polypeptides and some amines are soluble in the circulation, where they are highly susceptible to proteolytic degradation within seconds after their secretion. Steroids and lipids are insoluble and must be transported in the circulation by carrier proteins. The following discussion will focus primarily on polypeptide hormones.
  • Hypothalamic hormones include thyrotropin-releasing hormone, gonadotropin-releasing hormone, somatostatin, growth-hormone releasing factor, corticotropin-releasing hormone, substance P, dopamine, and prolactin-releasing hormone. These hormones directly regulate the secretion of hormones from the anterior lobe of the pituitary.
  • Hormones secreted by the anterior pituitary include adrenocorticotropic hormone (ACTH), melanocyte-stimulating hormone, somatotropic hormones such as growth hormone and prolactin, glycoprotein hormones such as thyroid-stimulating hormone, luteinizing hormone (LH), and follicle-stimulating hormone (FSH), β-lipotropin, and β-endorphins.
  • Disorders of the hypothalamus and pituitary often result from lesions such as primary brain tumors, adenomas, infarction associated with pregnancy, hypophysectomy, aneurysms, vascular malformations, thrombosis, infections, immunological disorders, and complications due to head trauma. Such disorders have profound effects on the function of other endocrine glands.
  • Disorders associated with hypopituitarism include hypogonadism, Sheehan syndrome, diabetes insipidus, Kallmann's disease, Hand-Schuller-Christian disease, Letterer-Siwe disease, sarcoidosis, empty sella syndrome, and dwarfism.
  • Disorders associated with hyperpituitarism include acromegaly, gigantism, and syndrome of inappropriate ADH secretion (SIADH), often caused by benign adenomas.
  • Hormones secreted by the thyroid and parathyroid primarily control metabolic rates and the regulation of serum calcium levels, respectively.
  • Thyroid hormones include calcitonin, somatostatin, and thyroid hormone.
  • The parathyroid secretes parathyroid hormone.
  • Disorders associated with hypothyroidism include goiter, myxedema, acute thyroiditis associated with bacterial infection, subacute thyroiditis associated with viral infection, autoimmune thyroiditis (Hashimoto's disease), and cretinism.
  • Disorders associated with hyperthyroidism include thyrotoxicosis and its various forms, Graves' disease, pretibial myxedema, toxic multinodular goiter, thyroid carcinoma, and Plummer's disease.
  • Disorders associated with hyperparathyroidism include Conn disease (chronic hypercalcemia) leading to bone resorption and parathyroid hyperplasia.
  • Hormones secreted by the pancreas regulate blood glucose levels by modulating the rates of carbohydrate, fat, and protein metabolism.
  • Pancreatic hormones include insulin, glucagon, amylin, γ-aminobutyric acid, gastrin, somatostatin, and pancreatic polypeptide.
  • The principal disorder associated with pancreatic dysfunction is diabetes mellitus caused by insufficient insulin activity. Diabetes mellitus is generally classified as either Type I (insulin-dependent, juvenile diabetes) or Type II (non-insulin-dependent, adult diabetes). The treatment of both forms by insulin replacement therapy is well known.
  • Diabetes mellitus often leads to acute complications such as hypoglycemia (insulin shock), coma, diabetic ketoacidosis, lactic acidosis, and chronic complications leading to disorders of the eye, kidney, skin, bone, joint, cardiovascular system, nervous system, and to decreased resistance to infection.
  • Growth factors are secreted proteins that mediate intercellular communication. Unlike hormones, which travel great distances via the circulatory system, most growth factors are primarily local mediators that act on neighboring cells. Most growth factors contain a hydrophobic N-terminal signal peptide sequence which directs the growth factor into the secretory pathway. Most growth factors also undergo post-translational modifications within the secretory pathway. These modifications can include proteolysis, glycosylation, phosphorylation, and intramolecular disulfide bond formation. Once secreted, growth factors bind to specific receptors on the surfaces of neighboring target cells, and the bound receptors trigger intracellular signal transduction pathways. These signal transduction pathways elicit specific cellular responses in the target cells. These responses can include the modulation of gene expression and the stimulation or inhibition of cell division, cell differentiation, and cell motility.
  • Growth factors fall into at least two broad and overlapping classes.
  • The broadest class includes the large polypeptide growth factors, which are wide-ranging in their effects. These factors include epidermal growth factor (EGF), fibroblast growth factor (FGF), transforming growth factor-β (TGF-β), insulin-like growth factor (IGF), nerve growth factor (NGF), and platelet-derived growth factor (PDGF), each defining a family of numerous related factors.
  • The large polypeptide growth factors act as mitogens on diverse cell types to stimulate wound healing, bone synthesis and remodeling, extracellular matrix synthesis, and proliferation of epithelial, epidermal, and connective tissues.
  • Members of the TGF-β, EGF, and FGF families also function as inductive signals in the differentiation of embryonic tissue.
  • NGF functions specifically as a neurotrophic factor, promoting neuronal growth and differentiation.
  • Another class of growth factors includes the hematopoietic growth factors, which are narrow in their target specificity. These factors stimulate the proliferation and differentiation of blood cells such as B-lymphocytes, T-lymphocytes, erythrocytes, platelets, eosinophils, basophils, neutrophils, macrophages, and their stem cell precursors. These factors include the colony-stimulating factors (G-CSF, M-CSF, GM-CSF, and CSF1-3), erythropoietin, and the cytokines. The cytokines are specialized hematopoietic factors secreted by cells of the immune system and are discussed in detail below.
  • Growth factors play critical roles in neoplastic transformation of cells in vitro and in tumor progression in vivo. Overexpression of the large polypeptide growth factors promotes the proliferation and transformation of cells in culture. Inappropriate expression of these growth factors by tumor cells in vivo may contribute to tumor vascularization and metastasis. Inappropriate activity of hematopoietic growth factors can result in anemias, leukemias, and lymphomas. Moreover, growth factors are both structurally and functionally related to oncoproteins, the potentially cancer-causing products of proto-oncogenes.
  • FGF and PDGF family members are themselves homologous to oncoproteins, whereas receptors for some members of the EGF, NGF, and FGF families are encoded by proto-oncogenes. Growth factors also affect the transcriptional regulation of both proto-oncogenes and oncosuppressor genes (Pimentel, E. (1994) Handbook of Growth Factors, CRC Press, Ann Arbor MI; McKay, I. and I. Leigh, eds. (1993) Growth Factors: A Practical Approach, Oxford University Press, New York NY; Habenicht, A., ed. (1990) Growth Factors, Differentiation Factors, and Cytokines, Springer-Verlag, New York NY).
  • Neuropeptides and vasomediators (NP/VMs) comprise a family of small peptide factors, typically of 20 amino acids or fewer. These factors generally function in neuronal excitation and inhibition of vasoconstriction/vasodilation, muscle contraction, and hormonal secretions from the brain and other endocrine tissues.
  • NP/VMs include neuropeptides and neuropeptide hormones such as bombesin, neuropeptide Y, neurotensin, neuromedin N, melanocortins, opioids, galanin, somatostatin, tachykinins, urotensin II and related peptides involved in smooth muscle stimulation, vasopressin, vasoactive intestinal peptide, and circulatory system-borne signaling molecules such as angiotensin, complement, calcitonin, endothelins, formyl-methionyl peptides, glucagon, cholecystokinin, gastrin, and many of the peptide hormones discussed above.
  • NP/VMs can transduce signals directly, modulate the activity or release of other neurotransmitters and hormones, and act as catalytic enzymes in signaling cascades.
  • The effects of NP/VMs range from extremely brief to long-lasting. (Reviewed in Martin, C.R. et al. (1985) Endocrine Physiology, Oxford University Press, New York NY, pp. 57-62.)
  • Cytokines comprise a family of signaling molecules that modulate the immune system and the inflammatory response. Cytokines are usually secreted by leukocytes, or white blood cells, in response to injury or infection. Cytokines function as growth and differentiation factors that act primarily on cells of the immune system such as B- and T-lymphocytes, monocytes, macrophages, and granulocytes. Like other signaling molecules, cytokines bind to specific plasma membrane receptors and trigger intracellular signal transduction pathways which alter gene expression patterns. There is considerable potential for the use of cytokines in the treatment of inflammation and immune system disorders.
  • Cytokine structure and function have been extensively characterized in vitro. Most cytokines are small polypeptides of about 30 kilodaltons or less. Over 50 cytokines have been identified from human and rodent sources. Examples of cytokine subfamilies include the interferons (IFN-α, -β, and -γ), the interleukins (IL1-IL13), the tumor necrosis factors (TNF-α and -β), and the chemokines. Many cytokines have been produced using recombinant DNA techniques, and the activities of individual cytokines have been determined in vitro. These activities include regulation of leukocyte proliferation, differentiation, and motility.
  • The in vitro activity of a cytokine may not reflect the full scope of that cytokine's activity in vivo.
  • Cytokines are not expressed individually in vivo but are instead expressed in combination with a multitude of other cytokines when the organism is challenged with a stimulus. Together, these cytokines collectively modulate the immune response in a manner appropriate for that particular stimulus. Therefore, the physiological activity of a cytokine is determined by the stimulus itself and by complex interactive networks among co-expressed cytokines which may demonstrate both synergistic and antagonistic relationships.
  • Chemokines comprise a cytokine subfamily with over 30 members. (Reviewed in Wells, T.N.C. and M.C. Peitsch (1997) J. Leukoc. Biol. 61:545-550.) Chemokines were initially identified as chemotactic proteins that recruit monocytes and macrophages to sites of inflammation. Recent evidence indicates that chemokines may also play key roles in hematopoiesis and HIV-1 infection. Chemokines are small proteins which range from about 6 to 15 kilodaltons in molecular weight. Chemokines are further classified as C, CC, CXC, or CX3C based on the number and position of critical cysteine residues.
  • The CC chemokines, for example, each contain a conserved motif consisting of two consecutive cysteines followed by two additional cysteines which occur downstream at 24- and 16-residue intervals, respectively (ExPASy PROSITE database, documents PS00472 and PDOC00434).
  • The presence and spacing of these four cysteine residues are highly conserved, whereas the intervening residues diverge significantly.
  • A conserved tyrosine located about 15 residues downstream of the cysteine doublet seems to be important for chemotactic activity.
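The cysteine spacing described above lends itself to a simple pattern search. The following Python sketch is a simplified illustration only and is not the PROSITE PS00472 pattern itself. The function names, the `slack` tolerance, and the reading of "interval" as the number of intervening residues are all assumptions introduced here.

```python
import re


def build_cc_motif(gap1=24, gap2=16, slack=2):
    """Compile a regex for: Cys-Cys, ~gap1 residues, Cys, ~gap2 residues, Cys.

    gap1 and gap2 follow the 24- and 16-residue intervals mentioned in the
    text; slack is a hypothetical tolerance around each interval.
    """
    lo1, hi1 = gap1 - slack, gap1 + slack
    lo2, hi2 = gap2 - slack, gap2 + slack
    # [A-Z] stands for any residue in a one-letter amino acid sequence.
    return re.compile(rf"CC[A-Z]{{{lo1},{hi1}}}C[A-Z]{{{lo2},{hi2}}}C")


def has_cc_motif(sequence, pattern=None):
    """Return True if the one-letter protein sequence contains the motif."""
    pattern = pattern or build_cc_motif()
    return pattern.search(sequence.upper()) is not None


# Synthetic sequences (not real chemokines) used only to exercise the pattern.
with_motif = "AA" + "CC" + "G" * 24 + "C" + "G" * 16 + "C" + "AA"
without_motif = "CC" + "G" * 5 + "C" + "G" * 16 + "C"
print(has_cc_motif(with_motif), has_cc_motif(without_motif))
```

A spacing-only match of this kind is permissive. A real classifier would use the full PROSITE pattern or a profile, and could also check for the conserved tyrosine downstream of the cysteine doublet.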
  • Most of the human genes encoding CC chemokines are clustered on chromosome 17, although there are a few examples of CC chemokine genes that map elsewhere.
  • Chemokines include lymphotactin (C chemokine); macrophage chemotactic and activating factor (MCAF/MCP-1; CC chemokine); platelet factor 4 and IL-8 (CXC chemokines); and fractalkine and neurotractin (CX3C chemokines).
  • The term receptor describes proteins that specifically recognize other molecules.
  • The category is broad and includes proteins with a variety of functions.
  • The bulk of receptors are cell surface proteins which bind extracellular ligands and produce cellular responses in the areas of growth, differentiation, endocytosis, and immune response.
  • Other receptors facilitate the selective transport of proteins out of the endoplasmic reticulum and localize enzymes to particular locations in the cell.
  • The term may also be applied to proteins which act as receptors for ligands with known or unknown chemical composition and which interact with other cellular components. For example, the steroid hormone receptors bind to and regulate transcription of DNA.
  • Regulatory proteins such as growth factors coordinately control these cellular processes and act as mediators in cell-cell signaling pathways.
  • Growth factors are secreted proteins that bind to specific cell-surface receptors on target cells.
  • The bound receptors trigger intracellular signal transduction pathways which activate various downstream effectors that regulate gene expression, cell division, cell differentiation, cell motility, and other cellular processes.
  • Cell surface receptors are typically integral plasma membrane proteins. These receptors recognize hormones such as catecholamines; peptide hormones; growth and differentiation factors; small peptide factors such as thyrotropin-releasing hormone, galanin, somatostatin, and tachykinins; and circulatory system-borne signaling molecules.
  • Other ligands recognized include low density lipoproteins (LDL), transferrin, glucose- or mannose-terminal glycoproteins, galactose-terminal glycoproteins, immunoglobulins, phosphovitellogenins, fibrin, proteinase-inhibitor complexes, plasminogen activators, and thrombospondin.
  • Growth factor receptors, including receptors for epidermal growth factor, platelet-derived growth factor, and fibroblast growth factor, as well as the growth modulator α-thrombin, contain intrinsic protein kinase activities. When growth factor binds to the receptor, it triggers the autophosphorylation of a serine, threonine, or tyrosine residue on the receptor. These phosphorylated sites are recognition sites for the binding of other cytoplasmic signaling proteins. These proteins participate in signaling pathways that eventually link the initial receptor activation at the cell surface to the activation of a specific intracellular target molecule. In the case of tyrosine residue autophosphorylation, these signaling proteins contain a common domain referred to as a Src homology (SH) domain.
  • SH2 domains and SH3 domains are found in phospholipase C-γ, PI-3-K p85 regulatory subunit, Ras-GTPase activating protein, and pp60c-src (Lowenstein, E.J. et al. (1992) Cell 70:431-442).
  • The cytokine family of receptors share a different common binding domain and include transmembrane receptors for growth hormone (GH), interleukins, erythropoietin, and prolactin.
  • Other receptors and second messenger-binding proteins have intrinsic serine/threonine protein kinase activity.
  • PK-C: calcium- and diacylglycerol-activated/phospholipid-dependent protein kinase
  • PK-R: RNA-dependent protein kinase
  • Certain serine/threonine protein kinases, including nematode Twitchin, have fibronectin-like, immunoglobulin C2-like domains.
  • G-protein coupled receptors (GPCRs) are integral membrane proteins characterized by the presence of seven hydrophobic transmembrane domains which span the plasma membrane and form a bundle of antiparallel alpha (α) helices. These proteins range in size from under 400 to over 1000 amino acids (Strosberg, A.D. (1991) Eur. J. Biochem. 196:1-10; Coughlin, S.R. (1994) Curr. Opin. Cell Biol. 6:191-197).
  • The amino-terminus of the GPCR is extracellular, of variable length and often glycosylated; the carboxy-terminus is cytoplasmic and generally phosphorylated.
  • Extracellular loops of the GPCR alternate with intracellular loops and link the transmembrane domains.
  • The most conserved domains of GPCRs are the transmembrane domains and the first two cytoplasmic loops.
  • The transmembrane domains account for structural and functional features of the receptor. In most cases, the bundle of α helices forms a binding pocket.
  • The extracellular N-terminal segment or one or more of the three extracellular loops may also participate in ligand binding. Ligand binding activates the receptor by inducing a conformational change in intracellular portions of the receptor.
  • The activated receptor interacts with an intracellular heterotrimeric guanine nucleotide binding (G) protein complex which mediates further intracellular signaling activities, generally the production of second messengers such as cyclic AMP (cAMP), phospholipase C, inositol triphosphate, or interactions with ion channel proteins (Baldwin, J.M. (1994) Curr. Opin. Cell Biol. 6:180-190).
  • GPCRs include those for acetylcholine, adenosine, epinephrine and norepinephrine, bombesin, bradykinin, chemokines, dopamine, endothelin, γ-aminobutyric acid (GABA), follicle-stimulating hormone (FSH), glutamate, gonadotropin-releasing hormone (GnRH), hepatocyte growth factor, histamine, leukotrienes, melanocortins, neuropeptide Y, opioid peptides, opsins, prostanoids, serotonin, somatostatin, tachykinins, thrombin, thyrotropin-releasing hormone (TRH), vasoactive intestinal polypeptide family, vasopressin and oxytocin, and orphan receptors.
  • GPCR mutations, which may cause loss of function or constitutive activation, have been associated with numerous human diseases (Coughlin, supra). For instance, retinitis pigmentosa may arise from mutations in the rhodopsin gene. Rhodopsin is the retinal photoreceptor which is located within the discs of the eye rod cell. Parma, J. et al. (1993, Nature 365:649-651) report that somatic activating mutations in the thyrotropin receptor cause hyperfunctioning thyroid adenomas and suggest that certain GPCRs susceptible to constitutive activation may behave as protooncogenes.
  Nuclear Receptors
  • Nuclear receptors bind small molecules such as hormones or second messengers, leading to increased receptor-binding affinity to specific chromosomal DNA elements. In addition, the affinity for other nuclear proteins may also be altered. Such binding and protein-protein interactions may regulate and modulate gene expression. Examples of such receptors include the steroid hormone receptor family, the retinoic acid receptor family, and the thyroid hormone receptor family.
  Ligand-Gated Receptor Ion Channels
  • Ligand-gated receptor ion channels fall into two categories.
  • The first category, extracellular ligand-gated receptor ion channels (ELGs), rapidly transduces neurotransmitter-binding events into electrical signals, as in fast synaptic neurotransmission. ELG function is regulated by post-translational modification.
  • The second category, intracellular ligand-gated receptor ion channels (ILGs), is activated by many intracellular second messengers and does not require post-translational modification(s) to effect a channel-opening response.
  • ELGs depolarize excitable cells to the threshold of action potential generation. In non-excitable cells, ELGs permit a limited calcium ion influx during the presence of agonist.
  • ELGs include channels directly gated by neurotransmitters such as acetylcholine, L-glutamate, glycine, ATP, serotonin, GABA, and histamine.
  • ELG genes encode proteins having strong structural and functional similarities. ILGs are encoded by distinct and unrelated gene families and include receptors for cAMP, cGMP, calcium ions, ATP, and metabolites of arachidonic acid.
  • Macrophage scavenger receptors with broad ligand specificity may participate in the binding of low density lipoproteins (LDL) and foreign antigens.
  • Scavenger receptors types I and II are trimeric membrane proteins with each subunit containing a small N-terminal intracellular domain, a transmembrane domain, a large extracellular domain, and a C-terminal cysteine-rich domain.
  • The extracellular domain contains a short spacer domain, an α-helical coiled-coil domain, and a triple helical collagenous domain.
  • These scavenger receptors have been shown to bind a spectrum of ligands, including chemically modified lipoproteins and albumin, polyribonucleotides, polysaccharides, phospholipids, and asbestos (Matsumoto, A. et al. (1990) Proc. Natl. Acad. Sci. USA 87:9133-9137; Elomaa, O. et al. (1995) Cell 80:603-609).
  • The scavenger receptors are thought to play a key role in atherogenesis by mediating uptake of modified LDL in arterial walls, and in host defense by binding bacterial endotoxins, bacteria, and protozoa.
  • T cells play a dual role in the immune system as effectors and regulators, coupling antigen recognition with the transmission of signals that induce cell death in infected cells and stimulate proliferation of other immune cells.
  • TCR T cell receptor
  • MHC major histocompatibility molecule
  • Both TCR subunits have an extracellular domain containing both variable and constant regions, a transmembrane domain that traverses the membrane once, and a short intracellular domain (Saito, H. et al. (1984) Nature 309:757-762).
  • The genes for the TCR subunits are constructed through somatic rearrangement of different gene segments. Interaction of antigen in the proper MHC context with the TCR initiates signaling cascades that induce the proliferation, maturation, and function of cellular components of the immune system (Weiss, A. (1991) Annu. Rev. Genet. 25:487-510).
  • TCR genes and alterations in TCR expression have been noted in lymphomas, leukemias, autoimmune disorders, and immunodeficiency disorders (Aisenberg, A.C. et al. (1985) N. Engl. J. Med. 313:529-533; Weiss, supra).
  • Intracellular signaling is the general process by which cells respond to extracellular signals (hormones, neurotransmitters, growth and differentiation factors, etc.) through a cascade of biochemical reactions that begins with the binding of a signaling molecule to a cell membrane receptor and ends with the activation of an intracellular target molecule.
  • Intermediate steps in the process involve the activation of various cytoplasmic proteins by phosphorylation via protein kinases, their deactivation by protein phosphatases, and the eventual translocation of some of these activated proteins to the cell nucleus, where the transcription of specific genes is triggered.
  • The intracellular signaling process regulates all types of cell functions including cell proliferation, cell differentiation, and gene transcription, and involves a diversity of molecules including protein kinases and phosphatases, and second messenger molecules, such as cyclic nucleotides, calcium-calmodulin, inositol, and various mitogens, that regulate protein phosphorylation.
  • Protein kinases and phosphatases play a key role in the intracellular signaling process by controlling the phosphorylation and activation of various signaling proteins.
  • The high-energy phosphate for this reaction is generally transferred from the adenosine triphosphate molecule (ATP) to a particular protein by a protein kinase and removed from that protein by a protein phosphatase.
  • ATP adenosine triphosphate molecule
  • Protein kinases are roughly divided into two groups: those that phosphorylate tyrosine residues (protein tyrosine kinases, PTK) and those that phosphorylate serine or threonine residues (serine/threonine kinases, STK).
  • A few protein kinases have dual specificity for serine/threonine and tyrosine residues. Almost all kinases contain a conserved 250-300 amino acid catalytic domain containing specific residues and sequence motifs characteristic of the kinase family (Hardie, G. and S. Hanks (1995) The Protein Kinase Facts Books, Vol 1:7-20, Academic Press, San Diego CA).
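The rough grouping of protein kinases by target residue described above can be sketched as a small lookup. This is an illustrative sketch only: the function name `classify_kinase` and the string labels are hypothetical and do not come from the source, and kinases that target other residues (such as histidine) are deliberately left out of scope.

```python
def classify_kinase(target_residues):
    """Group a kinase by the residues it phosphorylates.

    Returns "PTK" (tyrosine only), "STK" (serine and/or threonine only),
    or "dual-specificity" (both). Labels are illustrative.
    """
    targets = {residue.lower() for residue in target_residues}
    allowed = {"serine", "threonine", "tyrosine"}
    if not targets or not targets <= allowed:
        raise ValueError(f"unsupported target residues: {sorted(targets)}")

    hits_tyrosine = "tyrosine" in targets
    hits_ser_thr = bool(targets & {"serine", "threonine"})

    if hits_tyrosine and hits_ser_thr:
        return "dual-specificity"
    return "PTK" if hits_tyrosine else "STK"
```

For example, `classify_kinase(["serine", "threonine"])` falls in the STK group, while a kinase acting on both threonine and tyrosine is reported as dual-specificity.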
  • STKs include the second messenger dependent protein kinases such as the cyclic-AMP dependent protein kinases (PKA), involved in mediating hormone-induced cellular responses; calcium-calmodulin (CaM) dependent protein kinases, involved in regulation of smooth muscle contraction, glycogen breakdown, and neurotransmission; and the mitogen-activated protein kinases (MAP) which mediate signal transduction from the cell surface to the nucleus via phosphorylation cascades.
  • PKA cyclic-AMP dependent protein kinases
  • CaM calcium-calmodulin dependent protein kinases
  • MAP mitogen-activated protein kinases
  • Altered PKA expression is implicated in a variety of disorders and diseases including cancer, thyroid disorders, diabetes, atherosclerosis, and cardiovascular disease (Isselbacher, K. et al. (1994) Harrison's Principles of Internal Medicine, McGraw-Hill, New York NY, pp. 416-431, 1887).
  • PTKs are divided into transmembrane, receptor PTKs and non-transmembrane, non-receptor PTKs.
  • Transmembrane PTKs are receptors for most growth factors.
  • Non-receptor PTKs lack transmembrane regions and, instead, form complexes with the intracellular regions of cell surface receptors.
  • Receptors that function through non-receptor PTKs include those for cytokines and hormones (growth hormone and prolactin) and antigen-specific receptors on T and B lymphocytes. Many of these PTKs were first identified as the products of mutant oncogenes in cancer cells in which their activation was no longer subject to normal cellular controls.
  • HPK histidine protein kinase family
  • A histidine residue in the N-terminal half of the molecule (region I) is an autophosphorylation site.
  • Three additional motifs located in the C-terminal half of the molecule include an invariant asparagine residue in region II and two glycine-rich loops characteristic of nucleotide binding domains in regions III and IV.
  • Recently a branched chain alpha-ketoacid dehydrogenase kinase has been found with characteristics of HPK in rat (Davie, supra).
  • The two principal categories of protein phosphatases are the protein (serine/threonine) phosphatases (PPs) and the protein tyrosine phosphatases (PTPs).
  • PPs dephosphorylate phosphoserine/threonine residues and are important regulators of many cAMP-mediated hormone responses (Cohen, P. (1989) Annu. Rev. Biochem. 58:453-508).
  • PTPs reverse the effects of protein tyrosine kinases and play a significant role in cell cycle and cell signaling processes (Charbonneau, supra).
  • PTPs may prevent or reverse cell transformation and the growth of various cancers by controlling the levels of tyrosine phosphorylation in cells. This hypothesis is supported by studies showing that overexpression of PTPs can suppress transformation in cells, and that specific inhibition of PTPs can enhance cell transformation (Charbonneau, supra).
  • Phospholipid and Inositol-Phosphate Signaling
  • Inositol phospholipids are involved in an intracellular signaling pathway that begins with binding of a signaling molecule to a G-protein linked receptor in the plasma membrane.
  • IP3 diffuses through the cytosol to induce calcium release from the endoplasmic reticulum (ER), while diacylglycerol remains in the membrane and helps activate protein kinase C, an STK that phosphorylates selected proteins in the target cell.
  • The calcium response initiated by IP3 is terminated by the dephosphorylation of IP3 by specific inositol phosphatases.
  • Cellular responses that are mediated by this pathway are glycogen breakdown in the liver in response to vasopressin, smooth muscle contraction in response to acetylcholine, and thrombin-induced platelet aggregation.
  • Cyclic Nucleotide Signaling
  • Cyclic nucleotides function as intracellular second messengers to transduce a variety of extracellular signals including hormones, light, and neurotransmitters.
  • Visual excitation and the phototransmission of light signals in the eye are controlled by cyclic-GMP regulated, Ca2+-specific channels. Because cellular levels of cyclic nucleotides mediate these various responses, the synthesis and breakdown of cyclic nucleotides must be regulated.
  • Adenylyl cyclase, which synthesizes cAMP from ATP, is activated to increase cAMP levels in muscle by binding of adrenaline to β-adrenergic receptors, while activation of guanylate cyclase and increased cGMP levels in photoreceptors leads to reopening of the Ca2+-specific channels and recovery of the dark state in the eye.
  • Hydrolysis of cyclic nucleotides by cAMP- and cGMP-specific phosphodiesterases (PDEs) produces the opposite of these and other effects mediated by increased cyclic nucleotide levels.
  • PDEs appear to be particularly important in the regulation of cyclic nucleotides, considering the diversity found in this family of proteins. At least seven families of mammalian PDEs (PDE1-7) have been identified based on substrate specificity and affinity, sensitivity to cofactors, and sensitivity to inhibitory drugs (Beavo, J.A. (1995) Physiological Reviews 75:725-748). PDE inhibitors have been found to be particularly useful in treating various clinical disorders. Rolipram, a specific inhibitor of PDE4, has been used in the treatment of depression, and similar inhibitors are undergoing evaluation as anti-inflammatory agents. Theophylline is a nonspecific PDE inhibitor used in the treatment of bronchial asthma and other respiratory diseases (Banner, K.H. and C.P. Page (1995) Eur. Respir. J. 8:996-1000).
  • G-Protein Signaling
  • G-proteins are critical mediators of signal transduction between a particular class of extracellular receptors, the G-protein coupled receptors (GPCR), and intracellular second messengers such as cAMP and Ca2+.
  • G-proteins are linked to the cytosolic side of a GPCR such that activation of the GPCR by ligand binding stimulates binding of the G-protein to GTP, inducing an "active" state in the G-protein. In the active state, the G-protein acts as a signal to trigger other events in the cell such as the increase of cAMP levels or the release of Ca2+ into the cytosol from the ER, which, in turn, regulate phosphorylation and activation of other intracellular proteins.
  • The three polypeptide subunits of heterotrimeric G-proteins are the α, β, and γ subunits.
  • The α subunit binds and hydrolyzes GTP.
  • The β and γ subunits form a tight complex that anchors the protein to the inner side of the plasma membrane.
  • The β subunits, also known as G-β proteins or β transducins, contain seven tandem repeats of the WD-repeat sequence motif, a motif found in many proteins with regulatory functions. Mutations and variant expression of β transducin proteins are linked with various disorders (Neer, E.J. et al. (1994) Nature 371:297-300; Margottin, F. et al. (1998) Mol. Cell 1:565-574).
  • Low molecular weight (LMW) GTP-binding proteins are GTPases which regulate cell growth, cell cycle control, protein secretion, and intracellular vesicle interaction. They consist of single polypeptides which, like the α subunit of the heterotrimeric G-proteins, are able to bind and hydrolyze GTP, thus cycling between an inactive and an active state. At least sixty members of the LMW G-protein superfamily have been identified and are currently grouped into the six subfamilies of ras, rho, arf, sar1, ran, and rab. Activated ras genes were initially found in human cancers, and subsequent studies confirmed that ras function is critical in determining whether cells continue to grow or become differentiated. Other members of the LMW G-protein superfamily have roles in signal transduction that vary with the function of the activated genes and the locations of the G-proteins.
  • Guanine nucleotide exchange factors regulate the activities of LMW G-proteins by determining whether GTP or GDP is bound.
  • GTPase-activating protein GAP
  • GTP-ras GTPase-activating protein
  • GDP-ras guanine nucleotide releasing protein
  • RGS regulators of G-protein signaling
  • cytokine interleukin
  • Ca2+ is another second messenger molecule that is even more widely used as an intracellular mediator than cAMP.
  • Ca2+ can enter the cytosol in response to extracellular signals by two pathways:
  • One pathway acts primarily in nerve signal transduction where Ca2+ enters a nerve terminal through a voltage-gated Ca2+ channel.
  • The second is a more ubiquitous pathway in which Ca2+ is released from the ER into the cytosol in response to binding of an extracellular signaling molecule to a receptor.
  • Ca2+ directly activates regulatory enzymes, such as protein kinase C, which trigger signal transduction pathways.
  • Ca2+ also binds to specific Ca2+-binding proteins (CBPs) such as calmodulin (CaM) which then activate multiple target proteins in the cell including enzymes, membrane transport pumps, and ion channels.
  • CBPs Ca2+-binding proteins
  • CaM calmodulin
  • CaM interactions are involved in a multitude of cellular processes including, but not limited to, gene regulation, DNA synthesis, cell cycle progression, mitosis, cytokinesis, cytoskeletal organization, muscle contraction, signal transduction, ion homeostasis, exocytosis, and metabolic regulation (Celio, M.R. et al. (1996) Guidebook to Calcium-binding Proteins, Oxford University Press, Oxford, UK, pp. 15-20).
  • Some CBPs can serve as a storage depot for Ca2+ in an inactive state.
  • Calsequestrin is one such CBP that is expressed in isoforms specific to cardiac muscle and skeletal muscle.
  • Cell division is the fundamental process by which all living things grow and reproduce. In most organisms, the cell cycle consists of three principal steps: interphase, mitosis, and cytokinesis. Interphase involves preparations for cell division, replication of the DNA, and production of essential proteins. In mitosis, the nuclear material is divided and separates to opposite sides of the cell. Cytokinesis is the final division and fission of the cell cytoplasm to produce the daughter cells.
  • Cyclins act by binding to and activating a group of cyclin-dependent protein kinases (Cdks) which then phosphorylate and activate selected proteins involved in the mitotic process.
  • Cdks cyclin-dependent protein kinases
  • PDZ Domain-Containing Proteins
  • Certain proteins in intracellular signaling pathways serve to link or cluster other proteins involved in the signaling cascade.
  • A conserved protein domain called the PDZ domain has been identified in various membrane-associated signaling proteins. This domain has been implicated in receptor and ion channel clustering and in the targeting of multiprotein signaling complexes to specialized functional regions of the cytosolic face of the plasma membrane.
  • PDZ domains are found in the eukaryotic MAGUK (membrane-associated guanylate kinase) protein family, members of which bind to the intracellular domains of receptors and channels.
  • MAGUK membrane-associated guanylate kinase
  • PDZ domains are also found in diverse membrane-localized proteins such as protein tyrosine phosphatases, serine/threonine kinases, G-protein cofactors, and synapse-associated proteins such as syntrophins and neuronal nitric oxide synthase (nNOS).
  • nNOS neuronal nitric oxide synthase
  • Membrane Transport Molecules
  • The plasma membrane acts as a barrier to most molecules. Transport between the cytoplasm and the extracellular environment, and between the cytoplasm and lumenal spaces of cellular organelles, requires specific transport proteins.
  • Each transport protein carries a particular class of molecule, such as ions, sugars, or amino acids, and often is specific to a certain molecular species of the class.
  • A variety of human inherited diseases are caused by a mutation in a transport protein. For example, cystinuria is an inherited disease that results from the inability to transport cystine, the disulfide-linked dimer of cysteine, from the urine into the blood. Accumulation of cystine in the urine leads to the formation of cystine stones in the kidneys.
  • Transport proteins are multi-pass transmembrane proteins, which either actively transport molecules across the membrane or passively allow them to cross. Active transport involves directional pumping of a solute across the membrane, usually against an electrochemical gradient.
  • Active transport is tightly coupled to a source of metabolic energy, such as ATP hydrolysis or an electrochemically favorable ion gradient.
  • Passive transport involves the movement of a solute down its electrochemical gradient.
  • Transport proteins can be further classified as either carrier proteins or channel proteins.
  • Carrier proteins, which can function in active or passive transport, bind to a specific solute to be transported and undergo a conformational change which transfers the bound solute across the membrane.
  • Channel proteins, which only function in passive transport, form hydrophilic pores across the membrane. When the pores open, specific solutes, such as inorganic ions, pass through the membrane and down the electrochemical gradient of the solute.
  • Carrier proteins which transport a single solute from one side of the membrane to the other are called uniporters.
  • Coupled transporters link the transfer of one solute with simultaneous or sequential transfer of a second solute, either in the same direction (symport) or in the opposite direction (antiport).
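The uniport, symport, and antiport distinction drawn above depends only on how many solutes a carrier moves and whether they move in the same direction. A minimal sketch of that rule follows; the names `Coupling` and `classify_carrier` and the "in"/"out" encoding are assumptions made for illustration, not terms from the source.

```python
from enum import Enum


class Coupling(Enum):
    UNIPORT = "uniport"    # a single solute
    SYMPORT = "symport"    # coupled solutes, same direction
    ANTIPORT = "antiport"  # coupled solutes, opposite directions


def classify_carrier(directions):
    """Classify a carrier from the direction of each transported solute.

    `directions` holds one entry per solute, each either "in" or "out"
    relative to the cell (a hypothetical encoding).
    """
    directions = list(directions)
    if not directions or any(d not in ("in", "out") for d in directions):
        raise ValueError("each direction must be 'in' or 'out'")
    if len(directions) == 1:
        return Coupling.UNIPORT
    # Coupled transport: compare the directions of the solutes.
    if len(set(directions)) == 1:
        return Coupling.SYMPORT
    return Coupling.ANTIPORT
```

Under this encoding, a sodium-driven uptake system that brings both sodium and its solute into the cell would be passed as `["in", "in"]` and classified as a symporter.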
  • Intestinal and kidney epithelia contain a variety of symporter systems driven by the sodium gradient that exists across the plasma membrane. Sodium moves into the cell down its electrochemical gradient and brings the solute into the cell with it. The sodium gradient that provides the driving force for solute uptake is maintained by the ubiquitous Na+/K+ ATPase.
  • Sodium-coupled transporters include the mammalian glucose transporter (SGLT1), iodide transporter (NIS), and multivitamin transporter (SMVT). All three transporters have twelve putative transmembrane segments, extracellular glycosylation sites, and cytoplasmically-oriented N- and C-termini. NIS plays a crucial role in the evaluation, diagnosis, and treatment of various thyroid pathologies because it is the molecular basis for radioiodide thyroid-imaging techniques and for specific targeting of radioisotopes to the thyroid gland (Levy, O. et al. (1997) Proc. Natl. Acad. Sci. USA 94:5568-5573).
  • SMVT is expressed in the intestinal mucosa, kidney, and placenta, and is implicated in the transport of the water-soluble vitamins, e.g., biotin and pantothenate (Prasad, P.D. et al. (1998) J. Biol. Chem. 273:7501-7506).
  • Transporters play a major role in the regulation of pH, excretion of drugs, and the cellular K+/Na+ balance.
  • Monocarboxylate anion transporters are proton-coupled symporters with a broad substrate specificity that includes L-lactate, pyruvate, and the ketone bodies acetate, acetoacetate, and beta-hydroxybutyrate. At least seven isoforms have been identified to date. The isoforms are predicted to have twelve transmembrane (TM) helical domains with a large intracellular loop between TM6 and TM7, and play a critical role in maintaining intracellular pH by removing the protons that are produced stoichiometrically with lactate during glycolysis. The best characterized H(+)-monocarboxylate transporter is that of the erythrocyte membrane, which transports L-lactate and a wide range of other aliphatic monocarboxylates.
  • Other cells possess H(+)-linked monocarboxylate transporters with differing substrate and inhibitor selectivities.
  • Cardiac muscle and tumor cells have transporters that differ in their Km values for certain substrates, including stereoselectivity for L- over D-lactate, and in their sensitivity to inhibitors.
  • Na(+)-monocarboxylate cotransporters on the luminal surface of intestinal and kidney epithelia allow the uptake of lactate, pyruvate, and ketone bodies in these tissues.
  • Organic anion transporters are selective for hydrophobic, charged molecules with electron-attracting side groups.
  • Organic cation transporters, such as the ammonium transporter, mediate the secretion of a variety of drugs and endogenous metabolites, and contribute to the maintenance of intercellular pH.
  • ABC transporters can transport substances that differ markedly in chemical structure and size, ranging from small molecules such as ions, sugars, amino acids, peptides, and phospholipids, to lipopeptides, large proteins, and complex hydrophobic drugs.
  • ABC proteins consist of four modules: two nucleotide-binding domains (NBD), which hydrolyze ATP to supply the energy required for transport, and two membrane-spanning domains (MSD), each containing six putative transmembrane segments. These four modules may be encoded by a single gene, as is the case for the cystic fibrosis transmembrane regulator (CFTR), or by separate genes.
  • NBD nucleotide-binding domains
  • MSD membrane-spanning domains
  • When the modules are encoded by separate genes, each gene product contains a single NBD and MSD. These "half-molecules" form homo- and heterodimers, such as Tap1 and Tap2, the endoplasmic reticulum-based major histocompatibility (MHC) peptide transport system.
  • MHC major histocompatibility
  • CFTR cystic fibrosis transmembrane conductance regulator
  • ALDP adrenoleukodystrophy protein
  • PMP70 peroxisomal membrane protein-70
  • SUR sulfonylurea receptor (associated with hyperinsulinemic hypoglycemia)
  • MDR multidrug resistance
  • Fatty acid transport protein (FATP), an integral membrane protein with four transmembrane segments, is expressed in tissues exhibiting high levels of plasma membrane fatty acid flux, such as muscle, heart, and adipose. Expression of FATP is upregulated in 3T3-L1 cells during adipose conversion, and expression in COS7 fibroblasts elevates uptake of long-chain fatty acids (Hui, T.Y. et al. (1998) J. Biol. Chem. 273:27420-27429).
  • Ion Channels
  • The electrical potential of a cell is generated and maintained by controlling the movement of ions across the plasma membrane.
  • The movement of ions requires ion channels, which form an ion-selective pore within the membrane.
  • There are two basic types of ion channels: ion transporters and gated ion channels.
  • Ion transporters utilize the energy obtained from ATP hydrolysis to actively transport an ion against the ion's concentration gradient.
  • Gated ion channels allow passive flow of an ion down the ion's electrochemical gradient under restricted conditions.
  • These types of ion channels generate, maintain, and utilize an electrochemical gradient that is used in 1) electrical impulse conduction down the axon of a nerve cell, 2) transport of molecules into cells against concentration gradients, 3) initiation of muscle contraction, and 4) endocrine cell secretion.
  • Ion transporters generate and maintain the resting electrical potential of a cell. Utilizing the energy derived from ATP hydrolysis, they transport ions against the ion's concentration gradient. These transmembrane ATPases are divided into three families.
  • The phosphorylated (P) class ion transporters, including Na+-K+ ATPase, Ca2+-ATPase, and H+-ATPase, are activated by a phosphorylation event.
  • P-class ion transporters are responsible for maintaining resting potential distributions such that cytosolic concentrations of Na+ and Ca2+ are low and cytosolic concentration of K+ is high.
  • The vacuolar (V) class of ion transporters includes H+ pumps on intracellular organelles, such as lysosomes and Golgi. V-class ion transporters are responsible for generating the low pH within the lumen of these organelles that is required for function.
  • The coupling factor (F) class consists of H+ pumps in the mitochondria. F-class ion transporters utilize a proton gradient to generate ATP from ADP and inorganic phosphate (Pi).
  • The resting potential of the cell is utilized in many processes involving carrier proteins and gated ion channels.
  • Carrier proteins utilize the resting potential to transport molecules into and out of the cell.
  • Amino acid and glucose transport into many cells is linked to sodium ion co-transport (symport) so that the movement of Na+ down an electrochemical gradient drives transport of the other molecule up a concentration gradient.
  • Cardiac muscle links transfer of Ca2+ out of the cell with transport of Na+ into the cell (antiport).
  • Ion channels share common structural and mechanistic themes.
  • The channel consists of four or five subunits or protein monomers that are arranged like a barrel in the plasma membrane. Each subunit typically consists of six potential transmembrane segments (S1, S2, S3, S4, S5, and S6).
  • The center of the barrel forms a pore lined by α-helices or β-strands.
  • The side chains of the amino acid residues comprising the α-helices or β-strands establish the charge (cation or anion) selectivity of the channel.
  • The degree of selectivity, or what specific ions are allowed to pass through the channel, depends on the diameter of the narrowest part of the pore.
  • Gated ion channels control ion flow by regulating the opening and closing of pores. These channels are categorized according to the manner of regulating the gating function. Mechanically-gated channels open pores in response to mechanical stress, voltage-gated channels open pores in response to changes in membrane potential, and ligand-gated channels open pores in the presence of a specific ion, nucleotide, or neurotransmitter.
  • Voltage-gated Na+ and K+ channels are necessary for the function of electrically excitable cells, such as nerve and muscle cells. Action potentials, which lead to neurotransmitter release and muscle contraction, arise from large, transient changes in the permeability of the membrane to Na+ and K+ ions. Depolarization of the membrane beyond the threshold level opens voltage-gated Na+ channels. Sodium ions flow into the cell, further depolarizing the membrane and opening more voltage-gated Na+ channels, which propagates the depolarization down the length of the cell. Depolarization also opens voltage-gated potassium channels. Consequently, potassium ions flow outward, which leads to repolarization of the membrane. Voltage-gated channels utilize charged residues in the fourth transmembrane segment (S4) to sense voltage change.
  • S4 fourth transmembrane segment
  • The open state lasts only about 1 millisecond, at which time the channel spontaneously converts into an inactive state that cannot be opened irrespective of the membrane potential. Inactivation is mediated by the channel's N-terminus, which acts as a plug that closes the pore. The transition from an inactive to a closed state requires a return to resting potential.
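The closed, open, and inactive states described above behave like a small state machine. The sketch below is a deliberately simplified illustration, not a biophysical model: the function name `next_state`, the boolean `above_threshold` (standing in for "depolarized beyond threshold", with `False` standing in for "returned to resting potential"), and the fixed 1 ms lifetime are all assumptions introduced here.

```python
# Approximate open-state lifetime quoted in the text (about 1 millisecond).
OPEN_LIFETIME_MS = 1.0


def next_state(state, above_threshold, open_ms=0.0):
    """Return the next gating state of a voltage-gated channel.

    state           -- "closed", "open", or "inactive"
    above_threshold -- True while the membrane is depolarized past threshold
    open_ms         -- time already spent in the open state, in milliseconds
    """
    if state == "closed":
        # Depolarization beyond threshold opens the channel.
        return "open" if above_threshold else "closed"
    if state == "open":
        # The channel inactivates spontaneously once its lifetime elapses.
        return "inactive" if open_ms >= OPEN_LIFETIME_MS else "open"
    if state == "inactive":
        # An inactive channel cannot reopen; it only resets to closed
        # after the membrane returns to resting potential.
        return "inactive" if above_threshold else "closed"
    raise ValueError(f"unknown state: {state!r}")
```

The key property the sketch captures is that an inactive channel ignores continued depolarization and must pass through the closed state before it can open again.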
  • Voltage-gated Na+ channels are heterotrimeric complexes composed of a 260 kDa pore-forming α subunit that associates with two smaller auxiliary subunits, β1 and β2.
  • The β2 subunit is an integral membrane glycoprotein that contains an extracellular Ig domain, and its association with α and β1 subunits correlates with increased functional expression of the channel, a change in its gating properties, and an increase in whole cell capacitance due to an increase in membrane surface area.
  • Voltage-gated Ca2+ channels are involved in presynaptic neurotransmitter release, and heart and skeletal muscle contraction.
  • The voltage-gated Ca2+ channels from skeletal muscle (L-type) and brain (N-type) have been purified, and though their functions differ dramatically, they have similar subunit compositions.
  • The channels are composed of three subunits.
  • The α1 subunit forms the membrane pore and voltage sensor, while the α2δ and β subunits modulate the voltage-dependence, gating properties, and the current amplitude of the channel.
  • These subunits are encoded by at least six α1, one α2δ, and four β genes.
  • A fourth subunit, γ, has been identified in skeletal muscle (Walker, D. et al. (1998) J. Biol. Chem. 273:2361-2367; and Jay, S.D. et al. (1990) Science 248:490-492).
  • Chloride channels are necessary in endocrine secretion and in regulation of cytosolic and organelle pH.
  • Cl- enters the cell across a basolateral membrane through an Na+, K+/Cl- cotransporter, accumulating in the cell above its electrochemical equilibrium concentration.
  • The cystic fibrosis transmembrane conductance regulator (CFTR) is a chloride channel encoded by the gene for cystic fibrosis, a common fatal genetic disorder in humans.
  • Loss of CFTR function decreases transepithelial water secretion and, as a result, the layers of mucus that coat the respiratory tree, pancreatic ducts, and intestine are dehydrated and difficult to clear. The resulting blockage of these sites leads to pancreatic insufficiency, "meconium ileus", and devastating "chronic obstructive pulmonary disease" (Al-Awqati, Q. et al. (1992) J. Exp. Biol. 172:245-266). Many intracellular organelles contain H+-ATPase pumps that generate transmembrane pH and electrochemical differences by moving protons from the cytosol to the organelle lumen.
  • Cl- is the sole counterion of H+ translocation in a number of organelles, including chromaffin granules, Golgi vesicles, lysosomes, and endosomes.
  • Functions that require a low vacuolar pH include uptake of small molecules such as biogenic amines in chromaffin granules, processing of vacuolar constituents such as pro-hormones by proteolytic enzymes, and protein degradation in lysosomes (Al-Awqati, supra).
  • Ligand-gated channels open their pores when an extracellular or intracellular mediator binds to the channel.
  • Neurotransmitter-gated channels are channels that open when a neurotransmitter binds to their extracellular domain. These channels exist in the postsynaptic membrane of nerve or muscle cells.
  • Chloride channels open in response to inhibitory neurotransmitters, such as γ-aminobutyric acid (GABA) and glycine, leading to hyperpolarization of the membrane, which inhibits the generation of an action potential.
  • GABA γ-aminobutyric acid
  • Ligand-gated channels can be regulated by intracellular second messengers. Calcium-activated K+ channels are gated by internal calcium ions. In nerve cells, an influx of calcium during depolarization opens K+ channels to modulate the magnitude of the action potential (Ishi, T.M. et al. (1997) Proc. Natl. Acad. Sci. USA 94:11651-11656). Cyclic nucleotide-gated (CNG) channels are gated by cytosolic cyclic nucleotides. The best examples of these are the cAMP-gated Na+ channels involved in olfaction and the cGMP-gated cation channels involved in vision. Both systems involve ligand-mediated activation of a G-protein coupled receptor which then alters the level of cyclic nucleotide within the cell.
  • Ion channels are expressed in a number of tissues where they are implicated in a variety of processes. CNG channels, while abundantly expressed in photoreceptor and olfactory sensory cells, are also found in kidney, lung, pineal, retinal ganglion cells, testis, aorta, and brain. Calcium-activated K+ channels may be responsible for the vasodilatory effects of bradykinin in the kidney and for shunting excess K+ from brain capillary endothelial cells into the blood. They are also implicated in repolarizing granulocytes after agonist-stimulated depolarization (Ishi, supra). Ion channels have been the target for many drug therapies.
  • Neurotransmitter-gated channels have been targeted in therapies for treatment of insomnia, anxiety, depression, and schizophrenia.
  • Voltage-gated channels have been targeted in therapies for arrhythmia, ischemic stroke, head trauma, and neurodegenerative disease (Taylor, C.P. and L.S. Narasimhan (1997) Adv. Pharmacol. 39:47-98).
  • Disease Correlation
  • The etiology of numerous human diseases and disorders can be attributed to defects in the transport of molecules across membranes. Defects in the trafficking of membrane-bound transporters and ion channels are associated with several disorders, e.g., cystic fibrosis, glucose-galactose malabsorption syndrome, hypercholesterolemia, von Gierke disease, and certain forms of diabetes mellitus.
  • Single-gene defect diseases resulting in an inability to transport small molecules across membranes include, e.g., cystinuria, iminoglycinuria, Hartnup disease, and Fanconi disease (van't Hoff, W.G. (1996) Exp. Nephrol. 4:253-262; Talente, G.M. et al. (1994) Ann. Intern. Med. 120:218-226; and Chillon, M. et al. (1995) New Engl. J. Med. 332:1475-1480).
  • The cellular processes regulating modification and maintenance of protein molecules coordinate their conformation, stabilization, and degradation. Each of these processes is mediated by key enzymes or proteins such as proteases, protease inhibitors, transferases, isomerases, and molecular chaperones.
  • Proteases cleave proteins and peptides at the peptide bond that forms the backbone of the peptide and protein chain.
  • Proteolytic processing is essential to cell growth, differentiation, remodeling, and homeostasis as well as inflammation and immune response. Typical protein half-lives range from hours to a few days, so that within all living cells, precursor proteins are being cleaved to their active form, signal sequences proteolytically removed from targeted proteins, and aged or defective proteins degraded by proteolysis.
  • Proteases function in bacterial, parasitic, and viral invasion and replication within a host.
  • Four principal categories of mammalian proteases have been identified based on active site structure, mechanism of action, and overall three-dimensional structure (Beynon, R.J. and J.S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University Press, New York NY, pp. 1-5).
  • The serine proteases (SPs) have a serine residue, usually within a conserved sequence, in an active site composed of the serine, an aspartate, and a histidine residue.
  • SPs include the digestive enzymes trypsin and chymotrypsin, components of the complement cascade and the blood-clotting cascade, and enzymes that control extracellular protein degradation.
  • The main SP sub-families are trypases, which cleave after arginine or lysine; aspartases, which cleave after aspartate; chymases, which cleave after phenylalanine or leucine; metases, which cleave after methionine; and serases, which cleave after serine.
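The residue specificities listed above can be illustrated with a short Python sketch. It is a toy model under stated assumptions, not a cleavage predictor: the table `CLEAVE_AFTER` and the function `cleave` are hypothetical names introduced here, residues are written in one-letter code, and the influence of neighbouring residues and protein structure on real proteases is ignored.

```python
# Toy model: each SP sub-family named in the text is mapped to the residues
# (one-letter codes) after which it is described as cleaving.
# CLEAVE_AFTER and cleave() are illustrative names, not part of any library.
CLEAVE_AFTER = {
    "trypases": set("RK"),    # arginine or lysine
    "aspartases": set("D"),   # aspartate
    "chymases": set("FL"),    # phenylalanine or leucine
    "metases": set("M"),      # methionine
    "serases": set("S"),      # serine
}


def cleave(peptide: str, subfamily: str) -> list[str]:
    """Split a peptide on the C-terminal side of every matching residue."""
    sites = CLEAVE_AFTER[subfamily]
    fragments, start = [], 0
    for i, residue in enumerate(peptide):
        # Cut after a matching residue, unless it is already the last residue.
        if residue in sites and i < len(peptide) - 1:
            fragments.append(peptide[start:i + 1])
            start = i + 1
    fragments.append(peptide[start:])
    return fragments


if __name__ == "__main__":
    print(cleave("AKGRFDM", "trypases"))
    print(cleave("AKGRFDM", "chymases"))
```

Keeping the specificity in a lookup table means that the same function covers all five sub-families; only the set of target residues changes.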
  • Enterokinase, the initiator of intestinal digestion, is a serine protease found in the intestinal brush border, where it cleaves the acidic propeptide from trypsinogen to yield active trypsin (Kitamoto, Y. et al. (1994) Proc. Natl. Acad. Sci. USA 91:7588-7592).
  • Prolylcarboxypeptidase, a lysosomal serine peptidase that cleaves peptides such as angiotensin II and III and [des-Arg9] bradykinin, shares sequence homology with members of both the serine carboxypeptidase and prolylendopeptidase families (Tan, F. et al. (1993) J. Biol. Chem. 268:16631-16638).
  • Cysteine proteases (CPs) have a cysteine as the major catalytic residue at an active site where catalysis proceeds via an intermediate thiol ester and is facilitated by adjacent histidine and aspartic acid residues.
  • CPs are involved in diverse cellular processes ranging from the processing of precursor proteins to intracellular degradation. Mammalian CPs include lysosomal cathepsins and cytosolic calcium activated proteases, calpains.
  • CPs are produced by monocytes, macrophages and other cells of the immune system which migrate to sites of inflammation and secrete molecules involved in tissue repair. Overabundance of these repair molecules plays a role in certain disorders. In autoimmune diseases such as rheumatoid arthritis, secretion of the cysteine peptidase cathepsin C degrades collagen, laminin, elastin and other structural proteins found in the extracellular matrix of bones.
  • Aspartic proteases are members of the cathepsin family of lysosomal proteases and include pepsin A, gastricsin, chymosin, renin, and cathepsins D and E. Aspartic proteases have a pair of aspartic acid residues in the active site, and are most active in the pH 2-3 range, in which one of the aspartate residues is ionized, the other un-ionized. Aspartic proteases include bacterial penicillopepsin, mammalian pepsin, renin, chymosin, and certain fungal proteases.
  • Abnormal regulation and expression of cathepsins is evident in various inflammatory disease states. In cells isolated from inflamed synovia, the mRNA for stromelysin, cytokines, TIMP-1, cathepsin, gelatinase, and other molecules is preferentially expressed.
  • Expression of cathepsins L and D is elevated in synovial tissues from patients with rheumatoid arthritis and osteoarthritis.
  • Cathepsin L expression may also contribute to the influx of mononuclear cells which exacerbates the destruction of the rheumatoid synovium. (Keyszer, G.M. (1995) Arthritis Rheum.
  • Metalloproteases have active sites that include two glutamic acid residues and one histidine residue that serve as binding sites for zinc.
  • Carboxypeptidases A and B are the principal mammalian metalloproteases. Both are exoproteases of similar structure and active sites.
  • Carboxypeptidase A, like chymotrypsin, prefers C-terminal aromatic and aliphatic side chains of hydrophobic nature, whereas carboxypeptidase B is directed toward basic arginine and lysine residues.
  • Glycoprotease (GCP), or O-sialoglycoprotein endopeptidase, is a metallopeptidase which specifically cleaves O-sialoglycoproteins such as glycophorin A.
  • Ubiquitin proteases are associated with the ubiquitin conjugation system (UCS), a major pathway for the degradation of cellular proteins in eukaryotic cells and some bacteria.
  • In the UCS, proteins targeted for degradation are conjugated to ubiquitin, a small heat-stable protein.
  • The ubiquitinated protein is then recognized and degraded by the proteasome, a large, multisubunit proteolytic enzyme complex, and ubiquitin is released for reutilization by ubiquitin protease.
  • The UCS is implicated in the degradation of mitotic cyclic kinases, oncoproteins, tumor suppressor gene products such as p53, viral proteins, cell surface receptors associated with signal transduction, transcriptional regulators, and mutated or damaged proteins (Ciechanover, A. (1994) Cell 79:13-21).
  • A murine proto-oncogene, Unp, encodes a nuclear ubiquitin protease whose overexpression leads to oncogenic transformation of NIH3T3 cells, and the human homolog of this gene is consistently elevated in small cell tumors and adenocarcinomas of the lung (Gray, D.A. (1995) Oncogene 10:2179-2183).
Signal Peptidases
  • The mechanism for the translocation process into the endoplasmic reticulum (ER) involves the recognition of an N-terminal signal peptide on the elongating protein.
  • The signal peptide directs the protein and attached ribosome to a receptor on the ER membrane.
  • The polypeptide chain passes through a pore in the ER membrane into the lumen while the N-terminal signal peptide remains attached at the membrane surface. The process is completed when signal peptidase located inside the
  • Protease inhibitors and other regulators of protease activity control the activity and effects of proteases.
  • Protease inhibitors have been shown to control pathogenesis in animal models of proteolytic disorders (Murphy, G. (1991) Agents Actions Suppl. 35:69-76).
  • Serpins are inhibitors of mammalian plasma serine proteases. Many serpins serve to regulate the blood clotting cascade and/or the complement cascade in mammals.
  • Sp32 is a positive regulator of the mammalian acrosomal protease, acrosin, that binds the proenzyme, proacrosin, and thereby aids in packaging the enzyme into the acrosomal matrix (Baba, T. et al. (1994) J. Biol. Chem. 269:10133-10140).
  • The Kunitz family of serine protease inhibitors is characterized by one or more "Kunitz domains" containing a series of cysteine residues that are regularly spaced over approximately 50 amino acid residues and form three intrachain disulfide bonds.
  • TFPI-1 and TFPI-2 tissue factor pathway inhibitor
  • bikunin inter-α-trypsin inhibitor
  • kallikrein and plasmin serine proteases
  • Aprotinin has clinical utility in reduction of perioperative blood loss.
  • Protein folding in the ER is aided by two principal types of protein isomerases, protein disulfide isomerase (PDI) and peptidyl-prolyl isomerase (PPI).
  • PDI catalyzes the oxidation of free sulfhydryl groups in cysteine residues to form intramolecular disulfide bonds in proteins.
  • PPI, an enzyme that catalyzes the isomerization of certain proline imidic bonds in oligopeptides and proteins, is considered to govern one of the rate-limiting steps in the folding of many proteins to their final functional conformation.
  • The cyclophilins represent a major class of PPI that was originally identified as the major receptor for the immunosuppressive drug cyclosporin A (Handschumacher,
  • An additional glycosylation mechanism operates in the ER specifically to target lysosomal enzymes to lysosomes and prevent their secretion.
  • Lysosomal enzymes in the ER receive an N-linked oligosaccharide, like plasma membrane and secreted proteins, but are then phosphorylated on one or two mannose residues.
  • The phosphorylation of mannose residues occurs in two steps, the first step being the addition of an N-acetylglucosamine phosphate residue by N-acetylglucosamine phosphotransferase, and the second the removal of the N-acetylglucosamine group by phosphodiesterase.
Chaperones
  • Molecular chaperones are proteins that aid in the proper folding of immature proteins and refolding of improperly folded ones, the assembly of protein subunits, and in the transport of unfolded proteins across membranes. Chaperones are also called heat-shock proteins (hsp) because of their tendency to be expressed in dramatically increased amounts following brief exposure of cells to elevated temperatures. This latter property most likely reflects their need in the refolding of proteins that have become denatured by the high temperatures.
  • Chaperones may be divided into several classes according to their location, function, and molecular weight, and include hsp60, TCP1, hsp70, hsp40 (also called DnaJ), and hsp90.
  • Hsp90 binds to steroid hormone receptors, represses transcription in the absence of the ligand, and provides proper folding of the ligand-binding domain of the receptor in the presence of the hormone (Burston, S.G. and A.R. Clarke (1995) Essays Biochem. 29:125-136).
  • Hsp60 and hsp70 chaperones aid in the transport and folding of newly synthesized proteins.
  • Hsp70 acts early in protein folding, binding a newly synthesized protein before it leaves the ribosome and transporting the protein to the mitochondria or ER before releasing the folded protein.
  • Hsp60, along with hsp10, binds misfolded proteins and gives them the opportunity to refold correctly.
  • All chaperones share an affinity for hydrophobic patches on incompletely folded proteins and the ability to hydrolyze ATP. The energy of ATP hydrolysis is used to release the hsp-bound protein in its properly folded state (Alberts, supra, pp. 214, 571-572).
  • DNA and RNA replication are critical processes for cell replication and function.
  • DNA and RNA replication are mediated by the enzymes DNA and RNA polymerase, respectively, by a "templating" process in which the nucleotide sequence of a DNA or RNA strand is copied by complementary base-pairing into a complementary nucleic acid sequence of either DNA or RNA.
  • DNA polymerase catalyzes the stepwise addition of a deoxyribonucleotide to the 3'-OH end of a polynucleotide strand (the primer strand) that is paired to a second (template) strand.
  • The new DNA strand therefore grows in the 5' to 3' direction (Alberts, B. et al.
  • The substrates for the polymerization reaction are the corresponding deoxynucleotide triphosphates, which must base-pair with the correct nucleotide on the template strand in order to be recognized by the polymerase.
  • Each of the two strands of a DNA double helix may serve as a template for the formation of a new complementary strand.
  • Each of the two daughter cells of the dividing cell therefore inherits a new DNA double helix containing one old and one new strand.
  • DNA is therefore said to be replicated "semiconservatively" by DNA polymerase.
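Templating and semiconservative replication as described above can be sketched in a few lines of Python. This is a simplified illustration only: `PAIR`, `synthesize`, and `replicate` are hypothetical names introduced here, both strands are written 5' to 3', and primers, proofreading, and leading/lagging-strand mechanics are left out.

```python
# Watson-Crick pairing used for templating a new DNA strand.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize(template: str) -> str:
    """Return the strand complementary to `template`, both written 5'->3'."""
    # The new strand is antiparallel to its template, so the template is
    # read from its 3' end while the product grows 5'->3'.
    return "".join(PAIR[base] for base in reversed(template))


def replicate(duplex: tuple[str, str]) -> list[tuple[str, str]]:
    """Return two daughter duplexes, each holding one old and one new strand."""
    old_a, old_b = duplex
    return [(old_a, synthesize(old_a)), (synthesize(old_b), old_b)]


if __name__ == "__main__":
    parent = ("ATGC", "GCAT")  # two complementary strands, each 5'->3'
    for daughter in replicate(parent):
        print(daughter)
```

Each tuple returned by `replicate` keeps one of the parental strings unchanged, which is the "one old and one new strand" property in the text.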
  • DNA polymerase is also involved in the repair of damaged DNA as discussed below under “Ligases.”
  • RNA polymerase uses a DNA template strand to "transcribe" DNA into RNA using ribonucleotide triphosphates as substrates. Like DNA polymerization, RNA polymerization proceeds in a 5' to 3' direction by addition of a ribonucleoside monophosphate to the 3'-OH end of a growing RNA chain. DNA transcription generates messenger RNAs (mRNA) that carry information for protein synthesis, as well as the transfer, ribosomal, and other RNAs that have structural or catalytic functions. In eukaryotes, three discrete RNA polymerases synthesize the three different types of RNA (Alberts, supra, pp. 367-368).
  • RNA polymerase I makes the large ribosomal RNAs.
  • RNA polymerase II makes the mRNAs that will be translated into proteins.
  • RNA polymerase III makes a variety of small, stable RNAs, including 5S ribosomal RNA and the transfer RNAs (tRNA).
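Transcription as described above, with the RNA chain growing 5' to 3' by addition at its 3'-OH end, can be sketched as follows. The names `RNA_PAIR` and `transcribe` are hypothetical, the template is assumed to be supplied in the 3' to 5' orientation in which it is read, and promoter recognition and termination are not modelled.

```python
# Base-pairing of a DNA template base with the incoming ribonucleotide.
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Return the RNA (5'->3') made from a DNA template read 3'->5'."""
    rna = ""
    for base in template_3to5:
        # Each step appends one ribonucleotide to the 3' end of the chain.
        rna += RNA_PAIR[base]
    return rna


if __name__ == "__main__":
    print(transcribe("TACGGT"))
```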
  • RNA synthesis is initiated by binding of the RNA polymerase to a promoter region on the DNA and synthesis begins at a start site within the promoter. Synthesis is completed at a broad, general stop or termination region in the DNA where both the polymerase and the completed RNA chain are released.
  • DNA repair is the process by which accidental base changes, such as those produced by oxidative damage, hydrolytic attack, or uncontrolled methylation of DNA are corrected before replication or transcription of the DNA can occur. Because of the efficiency of the DNA repair process, fewer than one in one thousand accidental base changes causes a mutation (Alberts, supra, pp. 245-249).
  • The three steps common to most types of DNA repair are (1) excision of the damaged or altered base or nucleotide by DNA nucleases, leaving a gap; (2) insertion of the correct nucleotide in this gap by DNA polymerase using the complementary strand as the template; and (3) sealing the break left between the inserted nucleotide(s) and the existing DNA strand by DNA ligase.
  • DNA ligase uses the energy from ATP hydrolysis to activate the 5' end of the broken phosphodiester bond before forming the new bond with the 3'-OH of the DNA strand.
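The three repair steps can be walked through in a minimal Python sketch. It assumes, purely for illustration, that the damaged strand and its complementary template are given as index-aligned strings (position i of one pairs with position i of the other) and that the damaged position is already known; `PAIR` and `repair` are hypothetical names.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def repair(damaged: str, template: str, position: int) -> str:
    """Repair one base of `damaged` using the index-aligned `template`."""
    # (1) Excision: a nuclease removes the altered base, leaving a gap.
    left, right = damaged[:position], damaged[position + 1:]
    # (2) Insertion: the polymerase fills the gap by pairing with the template.
    patch = PAIR[template[position]]
    # (3) Ligation: the ligase seals the inserted nucleotide into the strand.
    return left + patch + right


if __name__ == "__main__":
    # The intact strand would be ATGC; position 2 carries a wrong base.
    print(repair("ATTC", "TACG", 2))
```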
  • In Bloom's syndrome, an inherited human disease, individuals are partially deficient in DNA ligation and consequently have an increased incidence of cancer (Alberts, supra, p. 247).
  • Nucleases comprise both enzymes that hydrolyze DNA (DNase) and RNA (RNase). They serve different purposes in nucleic acid metabolism. Nucleases hydrolyze the phosphodiester bonds between adjacent nucleotides either at internal positions (endonucleases) or at the terminal 3' or 5' nucleotide positions (exonucleases).
  • A DNA exonuclease activity in DNA polymerase serves to remove improperly paired nucleotides attached to the 3'-OH end of the growing DNA strand by the polymerase and thereby serves a "proofreading" function. As mentioned above, DNA endonuclease activity is involved in the excision step of the DNA repair process.
  • RNases also serve a variety of functions.
  • RNase P is a ribonucleoprotein enzyme which cleaves the 5' end of pre-tRNAs as part of their maturation process.
  • RNase H digests the RNA strand of an RNA/DNA hybrid. Such hybrids occur in cells invaded by retroviruses, and RNase H is an important enzyme in the retroviral replication cycle.
  • Pancreatic RNase, secreted by the pancreas into the intestine, hydrolyzes RNA present in ingested foods.
  • RNase activity in serum and cell extracts is elevated in a variety of cancers and infectious diseases (Schein, C.H. (1997) Nat. Biotechnol. 15:529-536). Regulation of RNase activity is being investigated as a means to control tumor angiogenesis, allergic reactions, viral infection and replication, and fungal infections.
  • Methylation of specific nucleotides occurs in both DNA and RNA, and serves different functions in the two macromolecules. Methylation of cytosine residues to form 5-methyl cytosine in DNA occurs specifically at CG sequences which are base-paired with one another in the DNA double-helix. This pattern of methylation is passed from generation to generation during DNA replication by an enzyme called "maintenance methylase" that acts preferentially on those CG sequences that are base-paired with a CG sequence that is already methylated. Such methylation appears to distinguish active from inactive genes by preventing the binding of regulatory proteins that "turn on" the gene, while permitting the binding of proteins that inactivate the gene (Alberts, supra, pp. 448-451).
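How a maintenance methylase could carry a CG methylation pattern through rounds of replication can be illustrated with a small sketch. The model is an assumption made for illustration: a CG site is identified by the index of its C on one reference strand, a newly synthesized strand starts with no methyl groups, and the methylase marks a site on the new strand only when the paired site on the old strand is already marked. `cg_sites` and `inherit` are hypothetical names.

```python
def cg_sites(strand: str) -> list[int]:
    """Indices of the C in every 5'-CG-3' dinucleotide of `strand`."""
    return [i for i in range(len(strand) - 1) if strand[i:i + 2] == "CG"]


def inherit(strand: str, parent_marks: set[int], generations: int) -> set[int]:
    """Return the methylated CG sites after the given number of replications."""
    sites = cg_sites(strand)
    marks = {site for site in parent_marks if site in sites}
    for _ in range(generations):
        # The new strand is unmethylated when made; the methylase then marks
        # only those CG sites whose partner on the old strand is methylated.
        new_strand_marks = {site for site in sites if site in marks}
        # In the next round the new strand acts as the old (template) strand.
        marks = new_strand_marks
    return marks


if __name__ == "__main__":
    print(cg_sites("ACGTTCGACG"))
    print(sorted(inherit("ACGTTCGACG", {1, 8}, generations=3)))
```

Unmethylated CG sites stay unmethylated in this model, so the pattern is copied rather than spread.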
  • tRNA methylase produces one of several nucleotide modifications in tRNA that affect the conformation and base-pairing of the molecule and facilitate the recognition of the appropriate mRNA codons by specific tRNAs.
  • The primary methylation pattern is the dimethylation of guanine residues to form N,N-dimethyl guanine.
  • Helicases are enzymes that destabilize and unwind double helix structures in both DNA and RNA. Since DNA replication occurs more or less simultaneously on both strands, the two strands must first separate to generate a replication "fork" for DNA polymerase to act on. Two types of replication proteins contribute to this process, DNA helicases and single-stranded binding proteins. DNA helicases hydrolyze ATP and use the energy of hydrolysis to separate the DNA strands. Single-stranded binding proteins (SSBs) then bind to the exposed DNA strands without covering the bases, thereby temporarily stabilizing them for templating by the DNA polymerase (Alberts, supra, pp. 255-256).
  • RNA helicases also alter and regulate RNA conformation and secondary structure. Like the DNA helicases, RNA helicases utilize energy derived from ATP hydrolysis to destabilize and unwind RNA duplexes.
  • The most well-characterized and ubiquitous family of RNA helicases is the DEAD-box family, so named for the conserved B-type ATP-binding motif which is diagnostic of proteins in this family.
  • Over 40 DEAD-box helicases have been identified in organisms as diverse as bacteria, insects, yeast, amphibians, mammals, and plants. DEAD-box helicases function in diverse processes such as translation initiation, splicing, ribosome assembly, and RNA editing, transport, and stability.
  • DEAD-box helicases play tissue- and stage-specific roles in spermatogenesis and embryogenesis. Overexpression of the DEAD-box 1 protein (DDX1) may play a role in the progression of neuroblastoma (Nb) and retinoblastoma (Rb) tumors (Godbout, R. et al. (1998) J. Biol. Chem. 273:21161-21168). These observations suggest that DDX1 may promote or enhance tumor progression by altering the normal secondary structure and expression levels of RNA in cancer cells. Other DEAD-box helicases have been implicated either directly or indirectly in tumorigenesis (discussed in Godbout, supra).
  • Murine p68 is mutated in ultraviolet light-induced tumors.
  • Human DDX6 is located at a chromosomal breakpoint associated with B-cell lymphoma.
  • A chimeric protein comprised of DDX10 and NUP98, a nucleoporin protein, may be involved in the pathogenesis of certain myeloid malignancies.
Topoisomerases
  • DNA topoisomerase effectively acts as a reversible nuclease that hydrolyzes a phosphodiester bond in a DNA strand, permitting the two strands to rotate freely about one another to remove the strain of the helix, and then rejoins the original phosphodiester bond between the two strands.
  • DNA topoisomerase I causes a single-strand break in a DNA helix to allow the rotation of the two strands of the helix about the remaining phosphodiester bond in the opposite strand.
  • DNA topoisomerase II causes a transient break in both strands of a DNA helix where two double helices cross over one another. This type of topoisomerase can efficiently separate two interlocked DNA circles (Alberts, supra, pp. 260-262).
  • Type II topoisomerases are largely confined to proliferating cells in eukaryotes, such as cancer cells. For this reason they are targets for anticancer drugs.
  • Topoisomerase II has been implicated in multi-drug resistance (MDR) as it appears to aid in the repair of DNA damage inflicted by DNA binding agents such as doxorubicin and vincristine.
Recombinases
  • Genetic recombination is the process of rearranging DNA sequences within an organism's genome to provide genetic variation for the organism in response to changes in the environment.
  • DNA recombination allows variation in the particular combination of genes present in an individual's genome, as well as the timing and level of expression of these genes (see Alberts, supra, pp. 263-273).
  • Two broad classes of genetic recombination are commonly recognized, general recombination and site-specific recombination.
  • General recombination involves genetic exchange between any homologous pair of DNA sequences usually located on two copies of the same chromosome.
  • General recombination is aided by enzymes, called recombinases, that "nick" one strand of a DNA duplex more or less randomly and permit exchange with the complementary strand of another duplex.
  • The process does not normally change the arrangement of genes on a chromosome.
  • In site-specific recombination, the recombinase recognizes specific nucleotide sequences present in one or both of the recombining molecules. Base-pairing is not involved in this form of recombination, which therefore does not require DNA homology between the recombining molecules. Unlike general recombination, this form of recombination can alter the relative positions of nucleotide sequences in chromosomes.
  • RNA sequences are necessary for processing of transcribed RNAs in the nucleus.
  • Pre-mRNA processing steps include capping at the 5' end with methylguanosine, polyadenylating the 3' end, and splicing to remove introns.
  • The primary RNA transcript from DNA is a faithful copy of the gene containing both exon and intron sequences, and the latter sequences must be cut out of the RNA transcript to produce an mRNA that codes for a protein.
  • This "splicing" of the mRNA sequence takes place in the nucleus with the aid of a large, multicomponent ribonucleoprotein complex known as a spliceosome.
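The outcome of splicing, removal of intron sequences and joining of the flanking exons, can be shown with a short sketch. It models only the result, not the spliceosome: intron boundaries are assumed to be known in advance and are passed as 0-based half-open (start, end) intervals, and `splice` is a hypothetical helper name.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove the given intron intervals and join the remaining exons."""
    exons, position = [], 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before the intron
        position = end                          # skip over the intron itself
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


if __name__ == "__main__":
    exon1, intron, exon2 = "AUGGCC", "GUAAGUUUCAG", "UUAUAA"
    transcript = exon1 + intron + exon2
    boundaries = [(len(exon1), len(exon1) + len(intron))]
    print(splice(transcript, boundaries))
```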
  • The spliceosomal complex is composed of five small nuclear ribonucleoprotein particles (snRNPs) designated U1, U2, U4, U5, and U6, and a number of additional proteins.
  • RNA components of some snRNPs recognize and base pair with intron consensus sequences.
  • The protein components mediate spliceosome assembly and the splicing reaction.
  • Autoantibodies to snRNP proteins are found in the blood of patients with systemic lupus erythematosus (Stryer, L. (1995) Biochemistry, W.H. Freeman and Company, New York NY, p. 863).
Adhesion Molecules
  • The surface of a cell is rich in transmembrane proteoglycans, glycoproteins, glycolipids, and receptors. These macromolecules mediate adhesion with other cells and with components of the extracellular matrix (ECM). The interaction of the cell with its surroundings profoundly influences cell shape, strength, flexibility, motility, and adhesion. These dynamic properties are intimately associated with signal transduction pathways controlling cell proliferation and differentiation, tissue construction, and embryonic development.
Cadherins
  • Cadherins comprise a family of calcium-dependent glycoproteins that function in mediating cell-cell adhesion in virtually all solid tissues of multicellular organisms. These proteins share multiple repeats of a cadherin-specific motif, and the repeats form the folding units of the cadherin extracellular domain. Cadherin molecules cooperate to form focal contacts, or adhesion plaques, between adjacent epithelial cells.
  • The cadherin family includes the classical cadherins and protocadherins.
  • Classical cadherins include the E-cadherin, N-cadherin, and P-cadherin subfamilies. E-cadherin is present on many types of epithelial cells and is especially important for embryonic development.
  • N-cadherin is present on nerve, muscle, and lens cells and is also critical for embryonic development.
  • P-cadherin is present on cells of the placenta and epidermis. Recent studies report that protocadherins are involved in a variety of cell-cell interactions (Suzuki, S.T. (1996) J. Cell Sci. 109:2609-2611).
  • The intracellular anchorage of cadherins is regulated by their dynamic association with catenins, a family of cytoplasmic signal transduction proteins associated with the actin cytoskeleton.
  • The anchorage of cadherins to the actin cytoskeleton appears to be regulated by protein tyrosine phosphorylation, and the cadherins are the target of phosphorylation-induced junctional disassembly (Aberle, H. et al. (1996) J. Cell. Biochem. 61:514-523).
Integrins
  • Integrins are ubiquitous transmembrane adhesion molecules that link the ECM to the internal cytoskeleton. Integrins are composed of two noncovalently associated transmembrane glycoprotein subunits called α and β. Integrins function as receptors that play a role in signal transduction. For example, binding of integrin to its extracellular ligand may stimulate changes in intracellular calcium levels or protein kinase activity (Sjaastad, M.D. and W.J. Nelson (1997) BioEssays 19:47-55). At least ten cell surface receptors of the integrin family recognize the ECM component fibronectin, which is involved in many different biological processes including cell migration and embryogenesis (Johansson, S. et al. (1997) Front. Biosci. 2:D126-D146).
Lectins
  • Lectins comprise a ubiquitous family of extracellular glycoproteins which bind cell surface carbohydrates specifically and reversibly, resulting in the agglutination of cells (reviewed in Drickamer, K. and M.E. Taylor (1993) Annu. Rev. Cell Biol. 9:237-264). This function is particularly important for activation of the immune response. Lectins mediate the agglutination and mitogenic stimulation of lymphocytes at sites of inflammation (Lasky, L.A. (1991) J. Cell. Biochem. 45:139-146; Paietta, E. et al. (1989) J. Immunol. 143:2850-2857).
  • Lectins are further classified into subfamilies based on carbohydrate-binding specificity and other criteria.
  • The galectin subfamily includes lectins that bind β-galactoside carbohydrate moieties in a thiol-dependent manner (reviewed in Hadari, Y.R. et al. (1998) J. Biol. Chem. 270:3447-3453).
  • Galectins are widely expressed and developmentally regulated. Because all galectins lack an N-terminal signal peptide, it is suggested that galectins are externalized through an atypical secretory mechanism. Two classes of galectins have been defined based on molecular weight and oligomerization properties.
  • Small galectins form homodimers and are about 14 to 16 kilodaltons in mass, while large galectins are monomeric and about 29 to 37 kilodaltons.
  • Galectins contain a characteristic carbohydrate recognition domain (CRD).
  • The CRD is about 140 amino acids and contains several stretches of about 1-10 amino acids which are highly conserved among all galectins.
  • A particular 6-amino acid motif within the CRD contains conserved tryptophan and arginine residues which are critical for carbohydrate binding.
  • The CRD of some galectins also contains cysteine residues which may be important for disulfide bond formation. Secondary structure predictions indicate that the CRD forms several β-sheets.
  • Galectins play a number of roles in diseases and conditions associated with cell-cell and cell-matrix interactions. For example, certain galectins associate with sites of inflammation and bind to cell surface immunoglobulin E molecules. In addition, galectins may play an important role in cancer metastasis. Galectin overexpression is correlated with the metastatic potential of cancers in humans and mice. Moreover, anti-galectin antibodies inhibit processes associated with cell transformation, such as cell aggregation and anchorage-independent growth (see, for example, Su, Z.-Z. et al. (1996) Proc. Natl. Acad. Sci. USA 93:7252-7257).
Selectins
  • Selectins comprise a specialized lectin subfamily involved primarily in inflammation and leukocyte adhesion (reviewed in Lasky, supra). Selectins mediate the recruitment of leukocytes from the circulation to sites of acute inflammation and are expressed on the surface of vascular endothelial cells in response to cytokine signaling. Selectins bind to specific ligands on the leukocyte cell membrane and enable the leukocyte to adhere to and migrate along the endothelial surface. Binding of selectin to its ligand leads to polarized rearrangement of the actin cytoskeleton and stimulates signal transduction within the leukocyte (Brenner, B. et al. (1997) Biochem. Biophys. Res. Commun.
  • Selectins include lymphocyte adhesion molecule-1 (Lam-1 or L-selectin), endothelial leukocyte adhesion molecule-1 (ELAM-1 or E-selectin), and granule membrane protein-140 (GMP-140 or P-selectin) (Johnston, G.I. et al. (1989) Cell 56:1033-1044).
  • Major histocompatibility complex (MHC) proteins are cell surface markers that bind to and present foreign antigens to T cells. MHC molecules are classified as either class I or class II. Class I MHC molecules (MHC I) are expressed on the surface of almost all cells and are involved in the presentation of antigen to cytotoxic T cells. For example, a cell infected with virus will degrade intracellular viral proteins and express the protein fragments bound to MHC I molecules on the cell surface. The MHC I/antigen complex is recognized by cytotoxic T cells, which destroy the infected cell and the virus within. Class II MHC molecules are expressed primarily on specialized antigen-presenting cells of the immune system, such as B cells and macrophages.
  • MHC molecules also play an important role in organ rejection following transplantation. Rejection occurs when the recipient's T-cells respond to foreign MHC molecules on the transplanted organ in the same way as to self MHC molecules bound to foreign antigen.
  • Antibodies are either expressed on the surface of B-cells or secreted by B-cells into the circulation. Antibodies bind and neutralize foreign antigens in the blood and other extracellular fluids.
  • The prototypical antibody is a tetramer consisting of two identical heavy polypeptide chains (H-chains) and two identical light polypeptide chains (L-chains) interlinked by disulfide bonds. This arrangement confers the characteristic Y-shape to antibody molecules.
  • Antibodies are classified based on their H-chain composition.
  • The five antibody classes, IgA, IgD, IgE, IgG, and IgM, are defined by the α, δ, ε, γ, and μ H-chain types.
  • There are two types of L-chains, κ and λ, either of which may associate as a pair with any H-chain pair.
  • IgG, the most common class of antibody found in the circulation, is tetrameric, while the other classes of antibodies are generally variants or multimers of this basic structure.
  • H-chains and L-chains each contain an N-terminal variable region and a C-terminal constant region.
  • The constant region consists of about 110 amino acids in L-chains and about 330 or 440 amino acids in H-chains.
  • The amino acid sequence of the constant region is nearly identical among H- or L-chains of a particular class.
  • The variable region consists of about 110 amino acids in both H- and L-chains. However, the amino acid sequence of the variable region differs among H- or L-chains of a particular class.
  • Within each H- or L-chain variable region are three hypervariable regions of extensive sequence diversity, each consisting of about 5 to 10 amino acids. In the antibody molecule, the H- and L-chain hypervariable regions come together to form the antigen recognition site. (Reviewed in Alberts, supra, pp. 1206-1213 and 1216-1217.)
  • Both H-chains and L-chains contain repeated Ig domains.
  • A typical H-chain contains four Ig domains, three of which occur within the constant region and one of which occurs within the variable region and contributes to the formation of the antigen recognition site.
  • A typical L-chain contains two Ig domains, one of which occurs within the constant region and one of which occurs within the variable region.
  • The immune system is capable of recognizing and responding to any foreign molecule that enters the body. Therefore, the immune system must be armed with a full repertoire of antibodies against all potential antigens. Such antibody diversity is generated by somatic rearrangement of gene segments encoding variable and constant regions.
  • T-cell receptors are both structurally and functionally related to antibodies. (Reviewed in Alberts, supra, pp. 1228-1229.) T-cell receptors are cell surface proteins that bind foreign antigens and mediate diverse aspects of the immune response.
  • A typical T-cell receptor is a heterodimer composed of two disulfide-linked polypeptide chains called α and β. Each chain is about 280 amino acids in length and contains one variable region and one constant region. Each variable or constant region folds into an Ig domain. The variable regions from the α and β chains come together in the heterodimer to form the antigen recognition site.
  • T-cell receptor diversity is generated by somatic rearrangement of gene segments encoding the α and β chains.
  • T-cell receptors recognize small peptide antigens that are expressed on the surface of antigen-presenting cells and pathogen-infected cells. These peptide antigens are presented on the cell surface in association with major histocompatibility proteins which provide the proper context for antigen recognition.
  • Protein secretion is essential for cellular function. Protein secretion is mediated by a signal peptide located at the amino terminus of the protein to be secreted.
  • The signal peptide comprises about ten to twenty hydrophobic amino acids which target the nascent protein from the ribosome to the endoplasmic reticulum (ER). Proteins targeted to the ER may either proceed through the secretory pathway or remain in any of the secretory organelles such as the ER, Golgi apparatus, or lysosomes. Proteins that transit through the secretory pathway are either secreted into the extracellular space or retained in the plasma membrane.
  • Secreted proteins are often synthesized as inactive precursors that are activated by post-translational processing events during transit through the secretory pathway. Such events include glycosylation, proteolysis, and removal of the signal peptide by a signal peptidase. Other events that may occur during protein transport include chaperone-dependent unfolding and folding of the nascent protein and interaction of the protein with a receptor or pore complex. Examples of secreted proteins with amino terminal signal peptides include receptors, extracellular matrix molecules, cytokines, hormones, growth and differentiation factors, neuropeptides, vasomediators, ion channels, transporters/pumps, and proteases. (Reviewed in Alberts, B. et al. (1994) Molecular Biology of The Cell. Garland Publishing, New York NY, pp. 557-560, 582-592.)
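The description of the signal peptide as a stretch of about ten to twenty hydrophobic residues at the amino terminus can be illustrated with a crude sequence scan. The following Python sketch is illustrative only: the hydrophobic residue set, the 30-residue search window, the 10-residue threshold, and the function names are all assumptions made for this example, not part of the disclosure, and real signal-peptide prediction uses considerably more elaborate models.

```python
# Crude illustration: look for an uninterrupted hydrophobic stretch near the N-terminus.
# HYDROPHOBIC, the window size, and the threshold are assumptions for this sketch.
HYDROPHOBIC = set("AVLIMFWC")


def longest_hydrophobic_run(seq, window=30):
    """Return (start, length) of the longest hydrophobic run in the first `window` residues.

    `start` is a 0-based index into `seq`.
    """
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, aa in enumerate(seq[:window].upper()):
        if aa in HYDROPHOBIC:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len


def looks_like_signal_peptide(seq, min_run=10):
    """Heuristic flag: True if an N-terminal hydrophobic run of at least `min_run` exists."""
    return longest_hydrophobic_run(seq)[1] >= min_run
```

A scan of this kind only flags candidates; it does not model the signal peptidase cleavage site or distinguish signal peptides from transmembrane segments.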
  • The extracellular matrix (ECM) is a complex network of glycoproteins, polysaccharides, proteoglycans, and other macromolecules that are secreted from the cell into the extracellular space.
  • The ECM remains in close association with the cell surface and provides a supportive meshwork that profoundly influences cell shape, motility, strength, flexibility, and adhesion. In fact, adhesion of a cell to its surrounding matrix is required for cell survival except in the case of metastatic tumor cells, which have overcome the need for cell-ECM anchorage. This phenomenon suggests that the ECM plays a critical role in the molecular mechanisms of growth control and metastasis. (Reviewed in Ruoslahti, E. (1996) Sci. Am. 275:72-77.) Furthermore, the ECM determines the structure and physical properties of connective tissue and is particularly important for morphogenesis and other processes associated with embryonic development and pattern formation.
  • The collagens comprise a family of ECM proteins that provide structure to bone, teeth, skin, ligaments, tendons, cartilage, blood vessels, and basement membranes. Multiple collagen proteins have been identified. Three collagen molecules fold together in a triple helix stabilized by interchain disulfide bonds. Bundles of these triple helices then associate to form fibrils. Collagen primary structure consists of hundreds of (Gly-X-Y) repeats where about a third of the X and Y residues are Pro. Glycines are crucial to helix formation as the bulkier amino acid sidechains cannot fold into the triple helical conformation. Because of these strict sequence requirements, mutations in collagen genes have severe consequences.
  • Osteogenesis imperfecta patients have brittle bones that fracture easily; in severe cases patients die in utero or at birth.
  • Ehlers-Danlos syndrome patients have hyperelastic skin, hypermobile joints, and susceptibility to aortic and intestinal rupture.
  • Chondrodysplasia patients have short stature and ocular disorders.
  • Alport syndrome patients have hematuria, sensorineural deafness, and eye lens deformation. (Isselbacher, K.J. et al. (1994) Harrison's Principles of Internal Medicine. McGraw-Hill, Inc., New York NY, pp. 2105-2117; and Creighton, T.E. (1984) Proteins, Structures and Molecular Principles, W.H. Freeman and Company, New York NY, pp. 191-197.)
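The (Gly-X-Y) repeat requirement described above for collagen can be checked on a sequence in a few lines. The Python sketch below is a minimal illustration under stated assumptions: the sequence is assumed to start in frame at a Gly, and the function names and the short example peptide are hypothetical, chosen only to show how a Gly substitution breaks the repeat.

```python
def gly_xy_stats(seq):
    """Summarize a sequence read as consecutive Gly-X-Y triplets, starting at residue 1.

    Returns the number of complete triplets, how many begin with Gly, and the
    fraction of X and Y positions occupied by Pro.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial triplet
    triplets = [seq[i:i + 3] for i in range(0, usable, 3)]
    gly_first = sum(1 for t in triplets if t[0] == "G")
    xy = [aa for t in triplets for aa in t[1:]]
    pro_fraction = xy.count("P") / len(xy) if xy else 0.0
    return {"triplets": len(triplets), "gly_first": gly_first, "pro_fraction_xy": pro_fraction}


def repeat_breaks(seq):
    """Return 0-based positions where the first slot of a triplet is not Gly."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [i for i in range(0, usable, 3) if seq[i] != "G"]
```

For the hypothetical peptide `GPAGPPGEK`, `repeat_breaks` returns an empty list, whereas replacing the second Gly with Ala (`GPAAPPGEK`) reports position 3, the kind of single-residue change that the text associates with severe consequences.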
  • Elastin and related proteins confer elasticity to tissues such as skin, blood vessels, and lungs.
  • Elastin is a highly hydrophobic protein of about 750 amino acids that is rich in proline and glycine residues.
  • Elastin molecules are highly cross-linked, forming an extensive extracellular network of fibers and sheets.
  • Elastin fibers are surrounded by a sheath of microfibrils which are composed of a number of glycoproteins, including fibrillin. Mutations in the gene encoding fibrillin are responsible for Marfan's syndrome, a genetic disorder characterized by defects in connective tissue. In severe cases, the aortas of afflicted individuals are prone to rupture. (Reviewed in Alberts, supra, pp. 984-986.)
  • Fibronectin is a large ECM glycoprotein found in all vertebrates. Fibronectin exists as a dimer of two subunits, each containing about 2,500 amino acids. Each subunit folds into a rod-like structure containing multiple domains. The domains each contain multiple repeated modules, the most common of which is the type III fibronectin repeat.
  • The type III fibronectin repeat is about 90 amino acids in length and is also found in other ECM proteins and in some plasma membrane and cytoplasmic proteins.
  • Some type III fibronectin repeats contain a characteristic tripeptide consisting of Arginine-Glycine-Aspartic acid (RGD).
  • The RGD sequence is recognized by the integrin family of cell surface receptors and is also found in other ECM proteins. Disruption of both copies of the gene encoding fibronectin causes early embryonic lethality in mice. The mutant embryos display extensive morphological defects, including defects in the formation of the notochord, somites, heart, blood vessels, neural tube, and extraembryonic structures. (Reviewed in Alberts, supra, pp. 986-987.)
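Because the RGD tripeptide is a fixed three-residue motif, its occurrences in a polypeptide sequence can be located by simple string search. The Python sketch below is illustrative only; the function name and the example sequence are hypothetical, and the presence of RGD in a sequence does not by itself establish integrin binding.

```python
def find_rgd(seq):
    """Return the 1-based start positions of every Arg-Gly-Asp (RGD) tripeptide in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find("RGD")
    while start != -1:
        positions.append(start + 1)  # convert the 0-based index to a 1-based residue number
        start = seq.find("RGD", start + 1)
    return positions
```

For example, `find_rgd("AARGDSPKRGDT")` reports starts at residues 3 and 9, and a sequence without the motif yields an empty list.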
  • Laminin is a major glycoprotein component of the basal lamina which underlies and supports epithelial cell sheets.
  • Laminin is one of the first ECM proteins synthesized in the developing embryo.
  • Laminin is an 850 kilodalton protein composed of three polypeptide chains joined in the shape of a cross by disulfide bonds.
  • Laminin is especially important for angiogenesis and, in particular, for guiding the formation of capillaries. (Reviewed in Alberts, supra, pp. 990-991.)
  • Proteoglycans are composed of unbranched polysaccharide chains (glycosaminoglycans) attached to protein cores. Common proteoglycans include aggrecan, betaglycan, decorin, perlecan, serglycin, and syndecan-1. Some of these molecules not only provide mechanical support, but also bind to extracellular signaling molecules, such as fibroblast growth factor and transforming growth factor β, suggesting a role for proteoglycans in cell-cell communication and cell growth. (Reviewed in Alberts, supra, pp.
  • The glycoproteins tenascin-C and tenascin-R are expressed in developing and lesioned neural tissue and provide stimulatory and anti-adhesive (inhibitory) properties, respectively, for axonal growth. (Faissner, A. (1997) Cell Tissue Res. 290:331-341.)
  • The cytoskeleton is a cytoplasmic network of protein fibers that mediate cell shape, structure, and movement.
  • The cytoskeleton supports the cell membrane and forms tracks along which organelles and other elements move in the cytosol.
  • The cytoskeleton is a dynamic structure that allows cells to adopt various shapes and to carry out directed movements.
  • Major cytoskeletal fibers include the microtubules, the microfilaments, and the intermediate filaments.
  • Motor proteins, including myosin, dynein, and kinesin, drive movement of or along the fibers.
  • The motor protein dynamin drives the formation of membrane vesicles. Accessory or associated proteins modify the structure or activity of the fibers while cytoskeletal membrane anchors connect the fibers to the cell membrane.
Tubulins
  • Microtubules, cytoskeletal fibers with a diameter of about 24 nm, have multiple roles in the cell. Bundles of microtubules form cilia and flagella, which are whip-like extensions of the cell membrane that are necessary for sweeping materials across an epithelium and for swimming of sperm, respectively. Marginal bands of microtubules in red blood cells and platelets are important for these cells' pliability. Organelles, membrane vesicles, and proteins are transported in the cell along tracks of microtubules. For example, microtubules run through nerve cell axons, allowing bidirectional transport of materials and membrane vesicles between the cell body and the nerve terminal. Failure to supply the nerve terminal with these vesicles blocks the transmission of neural signals. Microtubules are also critical to chromosomal movement during cell division. Both stable and short-lived populations of microtubules exist in the cell.
  • Microtubules are polymers of GTP-binding tubulin protein subunits. Each subunit is a heterodimer of α- and β-tubulin, multiple isoforms of which exist.
  • The hydrolysis of GTP is linked to the addition of tubulin subunits at the end of a microtubule.
  • The subunits interact head to tail to form protofilaments; the protofilaments interact side to side to form a microtubule.
  • A microtubule is polarized, one end ringed with α-tubulin and the other with β-tubulin, and the two ends differ in their rates of assembly.
  • Each microtubule is composed of 13 protofilaments, although 11 or 15 protofilament-microtubules are sometimes found.
  • Cilia and flagella contain doublet microtubules.
  • Microtubules grow from specialized structures known as centrosomes or microtubule-organizing centers (MTOCs). MTOCs may contain one or two centrioles, which are pinwheel arrays of triplet microtubules.
  • The basal body, the organizing center located at the base of a cilium or flagellum, contains one centriole.
  • Gamma tubulin present in the MTOC is important for nucleating the polymerization of α- and β-tubulin heterodimers but does not polymerize into microtubules.
  • Microtubule-associated proteins (MAPs) have roles in the assembly and stabilization of microtubules.
  • One major family of MAPs, assembly MAPs, can be identified in neurons as well as non-neuronal cells. Assembly MAPs are responsible for cross-linking microtubules in the cytosol. These MAPs are organized into two domains: a basic microtubule-binding domain and an acidic projection domain. The projection domain is the binding site for membranes, intermediate filaments, or other microtubules. Based on sequence analysis, assembly MAPs can be further grouped into two types: Type I and Type II.
  • Type I MAPs, which include MAP1A and MAP1B, are large, filamentous molecules that co-purify with microtubules and are abundantly expressed in brain and testes.
  • Type I MAPs contain several repeats of a positively-charged amino acid sequence motif that binds and neutralizes negatively charged tubulin, leading to stabilization of microtubules.
  • MAP1A and MAP1B are each derived from a single precursor polypeptide that is subsequently proteolytically processed to generate one heavy chain and one light chain.
  • Another light chain, LC3, is a 16.4 kDa molecule that binds MAP1A, MAP1B, and microtubules. It is suggested that LC3 is synthesized from a source other than the MAP1A or MAP1B transcripts, and that the expression of LC3 may be important in regulating the microtubule binding activity of MAP1A and MAP1B during cell proliferation (Mann, S.S. et al. (1994) J. Biol. Chem. 269:11492-11497).
  • Type II MAPs, which include MAP2a, MAP2b, MAP2c, MAP4, and Tau, are characterized by three to four copies of an 18-residue sequence in the microtubule-binding domain.
  • MAP2a, MAP2b, and MAP2c are found only in dendrites.
  • MAP4 is found in non-neuronal cells.
  • Tau is found in axons and dendrites of nerve cells.
  • Alternative splicing of the Tau mRNA leads to the existence of multiple forms of Tau protein.
  • Tau phosphorylation is altered in neurodegenerative disorders such as Alzheimer's disease, Pick's disease, progressive supranuclear palsy, corticobasal degeneration, and familial frontotemporal dementia and Parkinsonism linked to chromosome 17.
  • The altered Tau phosphorylation leads to a collapse of the microtubule network and the formation of intraneuronal Tau aggregates (Spillantini, M.G. and M. Goedert (1998) Trends Neurosci. 21:428-433).
  • The protein pericentrin is found in the MTOC and has a role in microtubule assembly.
Actins
  • Microfilaments are vital to cell locomotion, cell shape, cell adhesion, cell division, and muscle contraction. Assembly and disassembly of the microfilaments allow cells to change their morphology. Microfilaments are the polymerized form of actin, the most abundant intracellular protein in the eukaryotic cell. Human cells contain six isoforms of actin. The three α-actins are found in different kinds of muscle, nonmuscle β-actin and nonmuscle γ-actin are found in nonmuscle cells, and another γ-actin is found in intestinal smooth muscle cells.
  • G-actin, the monomeric form of actin, polymerizes into polarized, helical F-actin filaments, accompanied by the hydrolysis of ATP to ADP.
  • Actin filaments associate to form bundles and networks, providing a framework to support the plasma membrane and determine cell shape. These bundles and networks are connected to the cell membrane.
  • Thin filaments containing actin slide past thick filaments containing the motor protein myosin during contraction.
  • A family of actin-related proteins exists that are not part of the actin cytoskeleton, but rather associate with microtubules and dynein.
  • Actin-associated proteins have roles in cross-linking, severing, and stabilization of actin filaments and in sequestering actin monomers. Several of the actin-associated proteins have multiple functions. Bundles and networks of actin filaments are held together by actin cross-linking proteins. These proteins have two actin-binding sites, one for each filament. Short cross-linking proteins promote bundle formation while longer, more flexible cross-linking proteins promote network formation. Calmodulin-like calcium-binding domains in actin cross-linking proteins allow calcium regulation of cross-linking. Group I cross-linking proteins have unique actin-binding domains and include the 30 kD protein, EF-1a, fascin, and scruin.
  • Group II cross-linking proteins have a 7,000-MW actin-binding domain and include villin and dematin.
  • Group III cross-linking proteins have pairs of a 26,000-MW actin-binding domain and include fimbrin, spectrin, dystrophin, ABP 120, and filamin.
  • Severing proteins regulate the length of actin filaments by breaking them into short pieces or by blocking their ends.
  • Severing proteins include gCAP39, severin (fragmin), gelsolin, and villin.
  • Capping proteins can cap the ends of actin filaments, but cannot break filaments.
  • Capping proteins include CapZ and tropomodulin. The proteins thymosin and profilin sequester actin monomers in the cytosol, allowing a pool of unpolymerized actin to exist. The actin-associated proteins tropomyosin, troponin, and caldesmon regulate muscle contraction in response to calcium.
  • Intermediate filaments (IFs) are cytoskeletal fibers with a diameter of about 10 nm, intermediate between that of microfilaments and microtubules. IFs serve structural roles in the cell, reinforcing cells and organizing cells into tissues. IFs are particularly abundant in epidermal cells and in neurons. IFs are extremely stable, and, in contrast to microfilaments and microtubules, do not function in cell motility.
  • Type I and Type II proteins are the acidic and basic keratins, respectively.
  • Heterodimers of the acidic and basic keratins are the building blocks of keratin IFs. Keratins are abundant in soft epithelia such as skin and cornea, hard epithelia such as nails and hair, and in epithelia that line internal body cavities.
  • Type III IF proteins include desmin, glial fibrillary acidic protein, vimentin, and peripherin.
  • Desmin filaments in muscle cells link myofibrils into bundles and stabilize sarcomeres in contracting muscle.
  • Glial fibrillary acidic protein filaments are found in the glial cells that surround neurons and astrocytes.
  • Vimentin filaments are found in blood vessel endothelial cells, some epithelial cells, and mesenchymal cells such as fibroblasts, and are commonly associated with microtubules. Vimentin filaments may have roles in keeping the nucleus and other organelles in place in the cell.
  • Type IV IFs include the neurofilaments and nestin.
  • Neurofilaments, composed of three polypeptides, NF-L, NF-M, and NF-H, are frequently associated with microtubules in axons. Neurofilaments are responsible for the radial growth and diameter of an axon, and ultimately for the speed of nerve impulse transmission. Changes in phosphorylation and metabolism of neurofilaments are observed in neurodegenerative diseases including amyotrophic lateral sclerosis, Parkinson's disease, and Alzheimer's disease (Julien, J.P. and W.E. Mushynski (1998) Prog. Nucleic Acid Res. Mol. Biol. 61:1-23). Type V IFs, the lamins, are found in the nucleus where they support the nuclear membrane.
  • IFs have a central α-helical rod region interrupted by short nonhelical linker segments.
  • The rod region is bracketed, in most cases, by non-helical head and tail domains.
  • The rod regions of intermediate filament proteins associate to form a coiled-coil dimer.
  • A highly ordered assembly process leads from the dimers to the IFs.
  • Neither ATP nor GTP is needed for IF assembly, unlike that of microfilaments and microtubules.
  • IF-associated proteins (IFAPs) mediate the interactions of IFs with one another and with other cell structures.
  • IFAPs cross-link IFs into a bundle, into a network, or to the plasma membrane, and may cross-link IFs to the microfilament and microtubule cytoskeleton.
  • IFAPs include BPAG1, plakoglobin, desmoplakin I, desmoplakin II, plectin, ankyrin, filaggrin, and lamin B receptor.
Cytoskeletal-Membrane Anchors
  • Cytoskeletal fibers are attached to the plasma membrane by specific proteins. These attachments are important for maintaining cell shape and for muscle contraction.
  • The spectrin-actin cytoskeleton is attached to the cell membrane by three proteins, band 4.1, ankyrin, and adducin. Defects in this attachment result in abnormally shaped cells which are more rapidly degraded by the spleen, leading to anemia.
  • The spectrin-actin cytoskeleton is also linked to the membrane by ankyrin; a second actin network is anchored to the membrane by filamin.
  • The protein dystrophin links actin filaments to the plasma membrane; mutations in the dystrophin gene lead to Duchenne muscular dystrophy.
  • IFs are also attached to membranes by cytoskeletal-membrane anchors.
  • The nuclear lamina is attached to the inner surface of the nuclear membrane by the lamin B receptor.
  • Vimentin IFs are attached to the plasma membrane by ankyrin and plectin.
  • Desmosome and hemidesmosome membrane junctions hold together epithelial cells of organs and skin. These membrane junctions allow shear forces to be distributed across the entire epithelial cell layer, thus providing strength and rigidity to the epithelium.
  • IFs in epithelial cells are attached to the desmosome by plakoglobin and desmoplakins.
  • Desmin IFs surround the sarcomere in muscle and are linked to the plasma membrane by paranemin, synemin, and ankyrin.
  • Myosins are actin-activated ATPases, found in eukaryotic cells, that couple hydrolysis of ATP with motion. Myosin provides the motor function for muscle contraction and intracellular movements such as phagocytosis and rearrangement of cell contents during mitotic cell division (cytokinesis).
  • The contractile unit of skeletal muscle, termed the sarcomere, consists of highly ordered arrays of thin actin-containing filaments and thick myosin-containing filaments. Crossbridges form between the thick and thin filaments, and the ATP-dependent movement of myosin heads within the thick filaments pulls the thin filaments, shortening the sarcomere and thus the muscle fiber.
  • Myosins are composed of one or two heavy chains and associated light chains.
  • Myosin heavy chains contain an amino-terminal motor or head domain, a neck that is the site of light-chain binding, and a carboxy-terminal tail domain.
  • The tail domains may associate to form an α-helical coiled coil.
  • Conventional myosins, such as those found in muscle tissue, are composed of two myosin heavy-chain subunits, each associated with two light-chain subunits that bind at the neck region and play a regulatory role.
  • Unconventional myosins, believed to function in intracellular motion, may contain either one or two heavy chains and associated light chains. There is evidence for about 25 myosin heavy chain genes in vertebrates, more than half of them unconventional.
  • Dyneins are (-) end-directed motor proteins which act on microtubules. Two classes of dyneins, cytosolic and axonemal, have been identified. Cytosolic dyneins are responsible for translocation of materials along cytoplasmic microtubules, for example, transport from the nerve terminal to the cell body and transport of endocytic vesicles to lysosomes. Cytoplasmic dyneins are also reported to play a role in mitosis. Axonemal dyneins are responsible for the beating of flagella and cilia. Dynein on one microtubule doublet walks along the adjacent microtubule doublet.
  • Dyneins have a native mass between 1000 and 2000 kDa and contain either two or three force-producing heads driven by the hydrolysis of ATP. The heads are linked via stalks to a basal domain which is composed of a highly variable number of accessory intermediate and light chains.
  • Kinesins are (+) end-directed motor proteins which act on microtubules.
  • The prototypical kinesin molecule is involved in the transport of membrane-bound vesicles and organelles. This function is particularly important for axonal transport in neurons.
  • Kinesin is also important in all cell types for the transport of vesicles from the Golgi complex to the endoplasmic reticulum. This role is critical for maintaining the identity and functionality of these secretory organelles.
  • Kinesins define a ubiquitous, conserved family of over 50 proteins that can be classified into at least 8 subfamilies based on primary amino acid sequence, domain structure, velocity of movement, and cellular function. (Reviewed in Moore, J.D. and S.A. Endow (1996) Bioessays 18:207-219; and Hoyt, A.M. (1994) Curr. Opin. Cell Biol. 6:63-68.)
  • The prototypical kinesin molecule is a heterotetramer composed of two heavy polypeptide chains (KHCs) and two light polypeptide chains (KLCs).
  • KHC subunits are typically referred to as "kinesin." KHC is about 1000 amino acids in length, and KLC is about 550 amino acids in length.
  • Two KHCs dimerize to form a rod-shaped molecule with three distinct regions of secondary structure.
  • One region is a globular motor domain that functions in ATP hydrolysis and microtubule binding.
  • Kinesin motor domains are highly conserved and share over 70% identity.
  • Another region is an α-helical coiled-coil region which mediates dimerization.
  • The third region is a fan-shaped tail that associates with molecular cargo. The tail is formed by the interaction of the KHC C-termini with the two KLCs.
  • Some kinesin-related proteins (KRPs) are required for assembly of the mitotic spindle.
  • Phosphorylation of KRP is required for this activity.
  • Failure to assemble the mitotic spindle results in abortive mitosis and chromosomal aneuploidy, the latter condition being characteristic of cancer cells.
  • Centromere protein E localizes to the kinetochore of human mitotic chromosomes and may play a role in their segregation to opposite spindle poles.
  • Dynamin is a large GTPase motor protein that functions as a "molecular pinchase," generating a mechanochemical force used to sever membranes. This activity is important in forming clathrin-coated vesicles from coated pits in endocytosis and in the biogenesis of synaptic vesicles in neurons. Binding of dynamin to a membrane leads to dynamin's self-assembly into spirals that may act to constrict a flat membrane surface into a tubule. GTP hydrolysis induces a change in conformation of the dynamin polymer that pinches the membrane tubule, leading to severing of the membrane tubule and formation of a membrane vesicle.
  • This is followed by dynamin disassembly. Following disassembly, the dynamin may either dissociate from the membrane or remain associated with the vesicle and be transported to another region of the cell.
  • Three homologous dynamin genes have been discovered, in addition to several dynamin-related proteins. Conserved dynamin regions are the N-terminal GTP-binding domain, a central pleckstrin homology domain that binds membranes, a central coiled-coil region that may activate dynamin's GTPase activity, and a C-terminal proline-rich domain that contains several motifs that bind SH3 domains on other proteins.
  • Some dynamin-related proteins do not contain the pleckstrin homology domain or the proline-rich domain. (See McNiven, M.A. (1998) Cell 94:151-154; Scaife, R.M. and R.L. Margolis (1997) Cell. Signal. 9:395-401.)
  • The cytoskeleton is reviewed in Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY.
  • Ribosomal RNAs (rRNAs) are assembled, along with ribosomal proteins, into ribosomes, which are cytoplasmic particles that translate messenger RNA into polypeptides.
  • The eukaryotic ribosome is composed of a 60S (large) subunit and a 40S (small) subunit, which together form the 80S ribosome.
  • The ribosome also contains more than fifty proteins.
  • The ribosomal proteins have a prefix which denotes the subunit to which they belong, either L (large) or S (small).
  • Ribosomal protein activities include binding rRNA and organizing the conformation of the junctions between rRNA helices (Woodson, S.A. and N.B. Leontis (1998) Curr. Opin. Struct. Biol. 8:294-300; Ramakrishnan, V. and S.W. White (1998) Trends Biochem. Sci. 23:208-212.)
  • Three important sites are identified on the ribosome.
  • The aminoacyl-tRNA site (A site) is where charged tRNAs (with the exception of the initiator-tRNA) bind on arrival at the ribosome.
  • The peptidyl-tRNA site (P site) is where new peptide bonds are formed, as well as where the initiator tRNA binds.
  • The exit site (E site) is where deacylated tRNAs bind prior to their release from the ribosome. (The ribosome is reviewed in Stryer, L. (1995) Biochemistry W.H. Freeman and Company, New York NY, pp. 888-908; and Lodish, H. et al. (1995) Molecular Cell Biology Scientific American Books, New York NY. pp. 119-138.)
  • The nuclear DNA of eukaryotes is organized into chromatin. Two types of chromatin are observed: euchromatin, some of which may be transcribed, and heterochromatin so densely packed that much of it is inaccessible to transcription. Chromatin packing thus serves to regulate protein expression in eukaryotes. Bacteria lack chromatin and the chromatin-packing level of gene regulation.
  • The fundamental unit of chromatin is the nucleosome of 200 DNA base pairs associated with two copies each of histones H2A, H2B, H3, and H4. Adjacent nucleosomes are linked by another class of histones, H1. Low molecular weight non-histone proteins called the high mobility group (HMG) proteins are also associated with chromatin.
  • Chromodomain proteins function in compaction of chromatin into its transcriptionally silent heterochromatin form.
  • Patterns of chromatin structure can be stably inherited, producing heritable patterns of gene expression. In mammals, one of the two X chromosomes in each female cell is inactivated by condensation to heterochromatin during zygote development.
  • The inactive state of this chromosome is inherited, so that adult females are mosaics of clusters of paternal-X and maternal-X clonal cell groups.
  • The condensed X chromosome is reactivated in meiosis.
  • Chromatin is associated with disorders of protein expression such as thalassemia, a genetic anemia resulting from the removal of the locus control region (LCR) required for decondensation of the globin gene locus.
  • Electron carriers such as cytochromes accept electrons from NADH or FADH2 and donate them to other electron carriers.
  • Adrenodoxin, for example, is an FeS protein that forms a complex with NADPH:adrenodoxin reductase and cytochrome P450.
  • Cytochromes contain a heme prosthetic group, a porphyrin ring containing a tightly bound iron atom. Electron transfer reactions play a crucial role in cellular energy production.
  • Glucose is initially converted to pyruvate in the cytoplasm.
  • Fatty acids and pyruvate are transported to the mitochondria for complete oxidation to CO2 coupled by enzymes to the transport of electrons from NADH and FADH2 to oxygen and to the synthesis of ATP (oxidative phosphorylation) from ADP and Pi.
  • Pyruvate is transported into the mitochondria and converted to acetyl-CoA for oxidation via the citric acid cycle, involving pyruvate dehydrogenase components, dihydrolipoyl transacetylase, and dihydrolipoyl dehydrogenase.
  • Enzymes involved in the citric acid cycle include: citrate synthetase, aconitases, isocitrate dehydrogenase, alpha-ketoglutarate dehydrogenase complex including transsuccinylases, succinyl CoA synthetase, succinate dehydrogenase, fumarases, and malate dehydrogenase.
  • Acetyl CoA is oxidized to CO2 with concomitant formation of NADH, FADH2, and GTP.
  • In oxidative phosphorylation, the transfer of electrons from NADH and FADH2 to oxygen by dehydrogenases is coupled to the synthesis of ATP from ADP and Pi by the F0F1 ATPase complex in the mitochondrial inner membrane.
  • Enzyme complexes responsible for electron transport and ATP synthesis include the F0F1 ATPase complex, ubiquinone (CoQ)-cytochrome c reductase, ubiquinone reductase, cytochrome b, cytochrome c1, FeS protein, and cytochrome c oxidase.
  • ATP synthesis requires membrane transport enzymes including the phosphate transporter and the ATP-ADP antiport protein.
  • The ATP-binding cassette (ABC) superfamily has also been suggested as belonging to the mitochondrial transport group (Hogue, D.L. et al. (1999) J. Mol. Biol. 285:379-389). Brown fat uncoupling protein dissipates oxidative energy as heat, and may be involved in the fever response to infection and trauma (Cannon, B. et al. (1998) Ann. NY Acad. Sci. 856:171-187).
  • Mitochondria are oval-shaped organelles comprising an outer membrane, a tightly folded inner membrane, an intermembrane space between the outer and inner membranes, and a matrix inside the inner membrane.
  • The outer membrane contains many porin molecules that allow ions and charged molecules to enter the intermembrane space, while the inner membrane contains a variety of transport proteins that transfer only selected molecules. Mitochondria are the primary sites of energy production in cells.
  • Mitochondria contain a small amount of DNA.
  • Human mitochondrial DNA encodes 13 proteins, 22 tRNAs, and 2 rRNAs.
  • Mitochondrial-DNA encoded proteins include NADH-Q reductase, a cytochrome reductase subunit, cytochrome oxidase subunits, and ATP synthase subunits.
  • Cytochrome b5 is a central electron donor for various reductive reactions occurring on the cytoplasmic surface of liver endoplasmic reticulum. Cytochrome b5 has been found in Golgi, plasma, endoplasmic reticulum (ER), and microbody membranes.
  • Import of these preproteins from the cytoplasm requires a multisubunit protein complex in the outer membrane known as the translocase of outer mitochondrial membrane (TOM; previously designated MOM; Pfanner, N. et al. (1996) Trends Biochem. Sci. 21:51-52) and at least three inner membrane proteins which comprise the translocase of inner mitochondrial membrane (TIM; previously designated MIM; Pfanner, supra).
  • An inside-negative membrane potential across the inner mitochondrial membrane is also required for preprotein import.
  • Preproteins are recognized by surface receptor components of the TOM complex and are translocated through a proteinaceous pore formed by other TOM components. Proteins targeted to the matrix are then recognized by the import machinery of the TIM complex.
  • The import systems of the outer and inner membranes can function independently (Segui-Real, B. et al. (1993) EMBO J. 12:2211-2218).
  • The leader peptide is cleaved by a signal peptidase to generate the mature protein.
  • Most leader peptides are removed in a one step process by a protease termed mitochondrial processing peptidase (MPP) (Paces, V. et al. (1993) Proc. Natl. Acad. Sci. USA 90:5355-5358).
  • In other cases, a two-step process occurs in which MPP generates an intermediate precursor form which is cleaved by a second enzyme, mitochondrial intermediate peptidase, to generate the mature protein.
  • Mitochondrial dysfunction leads to impaired calcium buffering, generation of free radicals that may participate in deleterious intracellular and extracellular processes, changes in mitochondrial permeability and oxidative damage which is observed in several neurodegenerative diseases.
  • Mitochondria are implicated in disorders of cell proliferation, since they play an important role in a cell's decision to proliferate or self-destruct through apoptosis.
  • The oncoprotein Bcl-2 promotes cell proliferation by stabilizing mitochondrial membranes so that apoptosis signals are not released (Susin, S.A. (1998) Biochim. Biophys. Acta 1366:151-165).
  • Multicellular organisms are comprised of diverse cell types that differ dramatically both in structure and function.
  • The identity of a cell is determined by its characteristic pattern of gene expression, and different cell types express overlapping but distinctive sets of genes throughout development. Spatial and temporal regulation of gene expression is critical for the control of cell proliferation, cell differentiation, apoptosis, and other processes that contribute to organismal development.
  • Gene expression is regulated in response to extracellular signals that mediate cell-cell communication and coordinate the activities of different cell types. Appropriate gene regulation also ensures that cells function efficiently by expressing only those genes whose functions are required at a given time.
  • Transcriptional regulatory proteins are essential for the control of gene expression. Some of these proteins function as transcription factors that initiate, activate, repress, or terminate gene transcription. Transcription factors generally bind to the promoter, enhancer, and upstream regulatory regions of a gene in a sequence-specific manner, although some factors bind regulatory elements within or downstream of a gene's coding region. Transcription factors may bind to a specific region of DNA singly or as a complex with other accessory factors. (Reviewed in Lewin, B. (1990) Genes IV, Oxford University Press, New York NY, and Cell Press, Cambridge MA, pp. 554-
  • The double helix structure and repeated sequences of DNA create topological and chemical features which can be recognized by transcription factors. These features are hydrogen bond donor and acceptor groups, hydrophobic patches, major and minor grooves, and regular, repeated stretches of sequence which induce distinct bends in the helix.
  • Transcription factors recognize specific DNA sequence motifs of about 20 nucleotides in length. Multiple, adjacent transcription factor-binding motifs may be required for gene regulation.
  • Many transcription factors incorporate DNA-binding structural motifs which comprise either α helices or β sheets that bind to the major groove of DNA.
  • Four well-characterized structural motifs are helix-turn-helix, zinc finger, leucine zipper, and helix-loop-helix. Proteins containing these motifs may act alone as monomers, or they may form homo- or heterodimers that interact with DNA.
  • The helix-turn-helix motif consists of two α helices connected at a fixed angle by a short chain of amino acids. One of the helices binds to the major groove.
  • Helix-turn-helix motifs are exemplified by the homeobox motif which is present in homeodomain proteins.
  • The Antennapedia and Ultrabithorax proteins of Drosophila melanogaster are prototypical homeodomain proteins (Pabo, C.O. and R.T. Sauer (1992) Annu. Rev.
  • The zinc finger motif, which binds zinc ions, generally contains tandem repeats of about 30 amino acids consisting of periodically spaced cysteine and histidine residues. Examples of this sequence pattern, designated C2H2 and C3HC4 ("RING" finger), have been described (Lewin, supra). Zinc finger proteins each contain an α helix and an antiparallel β sheet whose proximity and conformation are maintained by the zinc ion. Contact with DNA is made by the arginine preceding the α helix and by the second, third, and sixth residues of the α helix. Variants of the zinc finger motif include poorly defined cysteine-rich motifs which bind zinc or other metal ions. These motifs may not contain histidine residues and are generally nonrepetitive.
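The periodic spacing of cysteine and histidine residues described for C2H2 zinc fingers lends itself to a simple pattern scan. The following is a minimal sketch in Python, assuming a simplified spacing rule (two cysteines separated by 2 to 4 residues, then 12 residues, then two histidines separated by 3 to 5 residues). The spacing rule, the function name `find_c2h2_like`, and the example sequence are illustrative assumptions and are not taken from the specification.

```python
import re

# Simplified, illustrative C2H2-like spacing: C-x(2,4)-C-x(12)-H-x(3,5)-H.
# This rule is an assumption for demonstration, not a validated motif definition.
# The lookahead lets overlapping candidates be reported as well.
C2H2_LIKE = re.compile(r"(?=(C.{2,4}C.{12}H.{3,5}H))")


def find_c2h2_like(seq):
    """Return (start_index, matched_text) for every candidate finger in seq."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in C2H2_LIKE.finditer(seq)]


# Hypothetical sequence carrying two tandem candidate fingers.
finger = "C" + "AA" + "C" + "S" * 12 + "H" + "KKK" + "H"
example = "MG" + finger + "TGEKP" + finger + "GG"

hits = find_c2h2_like(example)
for start, text in hits:
    print(start, text)
```

A hit only indicates that the cysteine and histidine spacing is compatible with a finger; it says nothing about whether zinc is actually bound or whether the segment contacts DNA.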
  • The leucine zipper motif comprises a stretch of amino acids rich in leucine which can form an amphipathic α helix. This structure provides the basis for dimerization of two leucine zipper proteins. The region adjacent to the leucine zipper is usually basic, and upon protein dimerization, is optimally positioned for binding to the major groove. Proteins containing such motifs are generally referred to as bZIP transcription factors.
  • The helix-loop-helix motif (HLH) consists of a short α helix connected by a loop to a longer α helix. The loop is flexible and allows the two helices to fold back against each other and to bind to DNA.
  • The transcription factor Myc contains a prototypical HLH motif. Most transcription factors contain characteristic DNA binding motifs, and variations on the above motifs and new motifs have been and are currently being characterized (Faisst, S. and S. Meyer (1992) Nucleic Acids Res. 20:3-26).
  • Neoplastic disorders in humans can be attributed to inappropriate gene expression.
  • Malignant cell growth may result from either excessive expression of tumor promoting genes or insufficient expression of tumor suppressor genes (Cleary, M.L. (1992) Cancer Surv. 15:89-104).
  • Chromosomal translocations may also produce chimeric loci which fuse the coding sequence of one gene with the regulatory regions of a second unrelated gene. Such an arrangement likely results in inappropriate gene transcription, potentially contributing to malignancy.
  • The immune system responds to infection or trauma by activating a cascade of events that coordinate the progressive selection, amplification, and mobilization of cellular defense mechanisms. A complex and balanced program of gene activation and repression is involved in this process.
  • Eukaryotic cells are surrounded by plasma membranes which enclose the cell and maintain an environment inside the cell that is distinct from its surroundings.
  • Eukaryotic organisms are distinct from prokaryotes in possessing many intracellular organelle and vesicle structures. Many of the metabolic reactions which distinguish eukaryotic biochemistry from prokaryotic biochemistry take place within these structures.
  • The plasma membrane and the membranes surrounding organelles and vesicles are composed of phosphoglycerides, fatty acids, cholesterol, phospholipids, glycolipids, proteoglycans, and proteins. These components confer identity and functionality to the membranes with which they associate.
  • Integral Membrane Proteins
  • TM proteins: transmembrane proteins.
  • TM domains are typically comprised of 15 to 25 hydrophobic amino acids which are predicted to adopt an α-helical conformation.
  • TM proteins are classified as bitopic (Types I and II) and polytopic (Types III and IV) (Singer, S.J. (1990) Annu. Rev. Cell Biol. 6:247-296).
  • Bitopic proteins span the membrane once while polytopic proteins contain multiple membrane-spanning segments.
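The description of TM domains as runs of 15 to 25 hydrophobic residues, and of bitopic versus polytopic proteins, can be illustrated with a sliding-window hydropathy scan. The sketch below is a minimal example under stated assumptions: it uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a mean-hydropathy cutoff of 1.6, which are conventional choices assumed here and not values given in the specification. The function names and test sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Return merged (start, end) ranges, end exclusive, whose windows have
    a mean hydropathy at or above the threshold.

    Residues outside the standard 20 are scored as 0.0 (an assumption).
    """
    seq = seq.upper()
    segments = []
    for i in range(len(seq) - window + 1):
        mean = sum(KD.get(aa, 0.0) for aa in seq[i:i + window]) / window
        if mean >= threshold:
            if segments and i <= segments[-1][1]:
                # Window overlaps or touches the previous segment: extend it.
                segments[-1] = (segments[-1][0], i + window)
            else:
                segments.append((i, i + window))
    return segments


def classify(segments):
    """Label a protein by its number of candidate membrane-spanning segments."""
    if not segments:
        return "no candidate TM segment"
    return "bitopic-like" if len(segments) == 1 else "polytopic-like"


# Hypothetical test sequences: charged loops flanking poly-leucine stretches.
loop = "DEKR" * 5
soluble = loop + loop
single_pass = loop + "L" * 20 + loop
double_pass = loop + "L" * 20 + loop + "L" * 20 + loop

for name, s in [("soluble", soluble), ("single", single_pass), ("double", double_pass)]:
    print(name, classify(candidate_tm_segments(s)))
```

A scan of this kind only flags hydrophobic stretches; it does not determine topology (for example, Type I versus Type II orientation), which requires additional information such as signal sequences or flanking charge distribution.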
  • TM proteins function as cell-surface receptors, receptor-interacting proteins, transporters of ions or metabolites, ion channels, cell anchoring proteins, and cell type-specific surface antigens.
  • MPs: membrane proteins.
  • PDZ domains; KDEL, RGD, NGR, and GSL sequence motifs.
  • vWFA: von Willebrand factor A.
  • EGF-like domains.
  • RGD, NGR, and GSL motif-containing peptides have been used as drug delivery agents in targeted cancer treatment of tumor vasculature (Arap, W. et al. (1998) Science 279:377-380).
  • MPs may also contain amino acid sequence motifs, such as the carbohydrate recognition domain (CRD), that mediate interactions with extracellular or intracellular molecules.
  • GPCR: G-protein coupled receptor.
  • GPCRs include receptors for biogenic amines, lipid mediators of inflammation, peptide hormones, and sensory signal mediators.
  • The structure of these highly-conserved receptors consists of seven hydrophobic transmembrane regions, an extracellular N-terminus, and a cytoplasmic C-terminus. Three extracellular loops alternate with three intracellular loops to link the seven transmembrane regions. Cysteine disulfide bridges connect the second and third extracellular loops.
  • The most conserved regions of GPCRs are the transmembrane regions and the first two cytoplasmic loops.
  • A conserved acidic-Arg-aromatic residue triplet present in the second cytoplasmic loop may interact with G proteins.
  • A GPCR consensus pattern is characteristic of most proteins belonging to this superfamily (ExPASy PROSITE document PS00237; and Watson, S. and S. Arkinstall (1994) The G-protein Linked Receptor Facts Book, Academic Press, San Diego CA, pp. 2-6). Mutations and changes in transcriptional activation of GPCR-encoding genes have been associated with neurological disorders such as schizophrenia, Parkinson's disease, Alzheimer's disease, drug addiction, and feeding disorders.
  • Scavenger Receptors
  • Macrophage scavenger receptors with broad ligand specificity may participate in the binding of low density lipoproteins (LDL) and foreign antigens.
  • Scavenger receptors types I and II are trimeric membrane proteins with each subunit containing a small N-terminal intracellular domain, a transmembrane domain, a large extracellular domain, and a C-terminal cysteine-rich domain.
  • The extracellular domain contains a short spacer region, an α-helical coiled-coil region, and a triple helical collagen-like region.
  • Scavenger receptors are thought to play a key role in atherogenesis by mediating uptake of modified LDL in arterial walls, and in host defense by binding bacterial endotoxins, bacteria, and protozoa.
  • TM4SF: transmembrane 4 superfamily.
  • The TM4SF is comprised of membrane proteins which traverse the cell membrane four times.
  • Members of the TM4SF include platelet and endothelial cell membrane proteins, melanoma-associated antigens, leukocyte surface glycoproteins, colonal carcinoma antigens, tumor-associated antigens, and surface proteins of the schistosome parasites (Jankowski, S.A. (1994) Oncogene 9:1205-1211).
  • Members of the TM4SF share about 25-30% amino acid sequence identity with one another.
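A sequence identity figure such as the 25-30% quoted above is derived from pairwise alignments. The following minimal sketch shows one way such a percentage might be computed from two already-aligned sequences. The convention used here (identical residues divided by the number of columns in which neither sequence has a gap) is an assumption, since several denominators are in common use, and the function name and example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap.

    Both inputs must already be aligned and therefore equal in length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns under this convention
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical aligned fragments: 8 of 10 ungapped columns are identical.
print(percent_identity("ACDEFGHIKL", "ACDQFGHIKV"))
```

Because the result depends on how gaps are treated and on the alignment itself, identity values are only comparable when the same convention is used throughout.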
  • TM4SF members have been implicated in signal transduction, control of cell adhesion, regulation of cell growth and proliferation, including development and oncogenesis, and cell motility, including tumor cell metastasis.
  • Expression of TM4SF proteins is associated with a variety of tumors and the level of expression may be altered when cells are growing or activated.
  • Tumor antigens are cell surface molecules that are differentially expressed in tumor cells relative to normal cells. Tumor antigens distinguish tumor cells immunologically from normal cells and provide diagnostic and therapeutic targets for human cancers (Takagi, S. et al. (1995) Int. J. Cancer 61:706-715; Liu, E. et al. (1992) Oncogene 7:1027-1032).
  • Leukocyte Antigens
  • Cell surface antigens include those identified on leukocytic cells of the immune system. These antigens have been identified using systematic, monoclonal antibody (mAb)-based "shotgun" techniques. These techniques have resulted in the production of hundreds of mAbs directed against unknown cell surface leukocytic antigens. These antigens have been grouped into "clusters of differentiation" based on common immunocytochemical localization patterns in various differentiated and undifferentiated leukocytic cell types. Antigens in a given cluster are presumed to identify a single cell surface protein and are assigned a "cluster of differentiation" or "CD" designation.
  • Some of the genes encoding proteins identified by CD antigens have been cloned and verified by standard molecular biology techniques. CD antigens have been characterized as both transmembrane proteins and cell surface proteins anchored to the plasma membrane via covalent attachment to fatty acid-containing glycolipids such as glycosylphosphatidylinositol (GPI). (Reviewed in Barclay, A.N. et al. (1995) The Leucocyte Antigen Facts Book, Academic Press, San Diego CA, pp. 17-20.)
  • Ion Channels
  • Ion channels are found in the plasma membranes of virtually every cell in the body.
  • Chloride channels mediate a variety of cellular functions including regulation of membrane potentials and absorption and secretion of ions across epithelial membranes.
  • Chloride channels also regulate the pH of organelles such as the Golgi apparatus and endosomes (see, e.g., Greger, R. (1988) Annu. Rev. Physiol. 50:111-122).
  • Electrophysiological and pharmacological properties of chloride channels, including ion conductance, current-voltage relationships, and sensitivity to modulators, suggest that different chloride channels exist in muscles, neurons, fibroblasts, epithelial cells, and lymphocytes.
  • Ion channels have sites for phosphorylation by one or more protein kinases including protein kinase A, protein kinase C, tyrosine kinase, and casein kinase II, all of which regulate ion channel activity in cells. Inappropriate phosphorylation of proteins in cells has been linked to changes in cell cycle progression and cell differentiation. Changes in the cell cycle have been linked to induction of apoptosis or cancer. Changes in cell differentiation have been linked to diseases and disorders of the reproductive system, immune system, skeletal muscle, and other organ systems.
  • Proton Pumps
  • Proton ATPases comprise a large class of membrane proteins that use the energy of ATP hydrolysis to generate an electrochemical proton gradient across a membrane. The resultant gradient may be used to transport other ions across the membrane (Na+, K+, or Cl-) or to maintain organelle pH.
  • Proton ATPases are further subdivided into the mitochondrial F-ATPases, the plasma membrane ATPases, and the vacuolar ATPases. The vacuolar ATPases establish and maintain an acidic pH within various organelles involved in the processes of endocytosis and exocytosis (Mellman, I. et al. (1986) Annu. Rev. Biochem. 55:663-700).
  • Proton-coupled, 12 membrane-spanning domain transporters such as PEPT 1 and PEPT 2 are responsible for gastrointestinal absorption and for renal reabsorption of peptides using an electrochemical H+ gradient as the driving force.
  • Another type of peptide transporter, the TAP transporter, is a heterodimer consisting of TAP 1 and TAP 2 and is associated with antigen processing. Peptide antigens are transported across the membrane of the endoplasmic reticulum by TAP so they can be expressed on the cell surface in association with MHC molecules.
  • Each TAP protein consists of multiple hydrophobic membrane spanning segments and a highly conserved ATP-binding cassette (Boll, M. et al. (1996) Proc. Natl. Acad. Sci.
  • Pathogenic microorganisms, such as herpes simplex virus, may encode inhibitors of TAP-mediated peptide transport in order to evade immune surveillance (Marusina, K. and J.J. Monaco (1996) Curr. Opin. Hematol. 3:19-26).
  • ABC Transporters
  • The ATP-binding cassette (ABC) transporters, also called the "traffic ATPases", comprise a superfamily of membrane proteins that mediate transport and channel functions in prokaryotes and eukaryotes (Higgins, C.F. (1992) Annu. Rev. Cell Biol. 8:67-113).
  • ABC proteins share a similar overall structure and significant sequence homology. All ABC proteins contain a conserved domain of approximately two hundred amino acid residues which includes one or more nucleotide binding domains.
  • ABC transporter genes are associated with various disorders, such as hyperbilirubinemia II/Dubin-Johnson syndrome, recessive Stargardt's disease, X-linked adrenoleukodystrophy, multidrug resistance, celiac disease, and cystic fibrosis.
  • Some membrane proteins are not membrane-spanning but are attached to the plasma membrane via membrane anchors or interactions with integral membrane proteins.
  • Membrane anchors are covalently joined to a protein post-translationally and include such moieties as prenyl, myristyl, and glycosylphosphatidyl inositol groups.
  • Membrane localization of peripheral and anchored proteins is important for their function in processes such as receptor-mediated signal transduction. For example, prenylation of Ras is required for its localization to the plasma membrane and for its normal and oncogenic functions in signal transduction.
  • Intercellular communication is essential for the development and survival of multicellular organisms.
  • Cells communicate with one another through the secretion and uptake of protein signaling molecules.
  • The uptake of proteins into the cell is achieved by the endocytic pathway, in which the interaction of extracellular signaling molecules with plasma membrane receptors results in the formation of plasma membrane-derived vesicles that enclose and transport the molecules into the cytosol. These transport vesicles fuse with and mature into endosomal and lysosomal (digestive) compartments.
  • The secretion of proteins from the cell is achieved by exocytosis, in which molecules inside of the cell proceed through the secretory pathway.
  • Vesicles form at the transitional endoplasmic reticulum (tER), the rim of Golgi cisternae, the face of the Trans-Golgi Network (TGN), the plasma membrane (PM), and tubular extensions of the endosomes. Vesicle formation occurs when a region of membrane buds off from the donor organelle.
  • The membrane-bound vesicle contains proteins to be transported and is surrounded by a proteinaceous coat, the components of which are recruited from the cytosol.
  • Two different classes of coat protein have been identified. Clathrin coats form on vesicles derived from the TGN and PM, whereas coatomer (COP) coats form on vesicles derived from the ER and Golgi.
  • COP coats can be further classified as COPI, involved in retrograde traffic through the Golgi and from the Golgi to the ER, and COPII, involved in anterograde traffic from the ER to the Golgi (Mellman, supra).
  • Adapter proteins bring vesicle cargo and coat proteins together at the surface of the budding membrane.
  • Adapter protein-1 and -2 select cargo from the TGN and plasma membrane, respectively, based on molecular information encoded on the cytoplasmic tail of integral membrane cargo proteins.
  • Adapter proteins also recruit clathrin to the bud site.
  • Clathrin is a protein complex consisting of three large and three small polypeptide chains arranged in a three-legged structure called a triskelion. Multiple triskelions and other coat proteins appear to self-assemble on the membrane to form a coated pit. This assembly process may serve to deform the membrane into a budding vesicle.
  • GTP-bound ADP-ribosylation factor (Arf) is also incorporated into the coated assembly.
  • Another small G-protein, dynamin, forms a ring complex around the neck of the forming vesicle and may provide the mechanochemical force to seal the bud, thereby releasing the vesicle.
  • The coated vesicle complex is then transported through the cytosol. During the transport process, Arf-bound GTP is hydrolyzed to GDP, and the coat dissociates from the transport vesicle (West, M.A. et al. (1997) J. Cell Biol. 138:1239-1254).
  • The coat protein is assembled from cytosolic precursor molecules at specific budding regions on the organelle.
  • The COP coat consists of two major components, a G-protein (Arf or Sar) and coat protomer (coatomer).
  • Coatomer is an equimolar complex of seven proteins, termed alpha-, beta-, beta'-, gamma-, delta-, epsilon- and zeta-COP.
  • The coatomer complex binds to dilysine motifs contained on the cytoplasmic tails of integral membrane proteins.
  • The p24 family of type I membrane proteins represents the major membrane proteins of COPI vesicles (Harter, C. and F.T. Wieland (1998) Proc. Natl. Acad. Sci. USA 95:11649-11654).
  • Organelle Associated Molecules
  • Eukaryotic cells are organized into various cellular organelles, which has the effect of separating specific molecules and their functions from one another and from the cytosol. Within the cell, various membrane structures surround and define these organelles while allowing them to interact with one another and the cell environment through both active and passive transport processes. Important cell organelles include the nucleus, the Golgi apparatus, the endoplasmic reticulum, mitochondria, peroxisomes, lysosomes, endosomes, and secretory vesicles.
  • Nucleus
  • The cell nucleus contains all of the genetic information of the cell in the form of DNA, and the components and machinery necessary for replication of DNA and for transcription of DNA into RNA. (See Alberts, B. et al.
  • DNA is organized into compact structures in the nucleus by interactions with various DNA-binding proteins such as histones and non-histone chromosomal proteins.
  • DNA-specific nucleases (DNases) partially degrade these compacted structures prior to DNA replication or transcription.
  • DNA replication takes place with the aid of DNA helicases which unwind the double-stranded DNA helix, and DNA polymerases that duplicate the separated DNA strands.
  • Transcriptional regulatory proteins are essential for the control of gene expression. Some of these proteins function as transcription factors that initiate, activate, repress, or terminate gene transcription.
  • Transcription factors generally bind to the promoter, enhancer, and upstream regulatory regions of a gene in a sequence-specific manner, although some factors bind regulatory elements within or downstream of a gene's coding region. Transcription factors may bind to a specific region of DNA singly or as a complex with other accessory factors. (Reviewed in Lewin, B. (1990) Genes IV, Oxford University Press, New York NY, and Cell Press, Cambridge MA, pp. 554-570.) Many transcription factors incorporate DNA-binding structural motifs which comprise either α helices or β sheets that bind to the major groove of DNA. Four well-characterized structural motifs are helix-turn-helix, zinc finger, leucine zipper, and helix-loop-helix. Proteins containing these motifs may act alone as monomers, or they may form homo- or heterodimers that interact with DNA.
  • Neoplastic disorders in humans can be attributed to inappropriate gene expression.
  • Malignant cell growth may result from either excessive expression of tumor promoting genes or insufficient expression of tumor suppressor genes (Cleary, M.L. (1992) Cancer Surv. 15:89-104).
  • Chromosomal translocations may also produce chimeric loci which fuse the coding sequence of one gene with the regulatory regions of a second unrelated gene. Such an arrangement likely results in inappropriate gene transcription, potentially contributing to malignancy.
  • The immune system responds to infection or trauma by activating a cascade of events that coordinate the progressive selection, amplification, and mobilization of cellular defense mechanisms.
  • A complex and balanced program of gene activation and repression is involved in this process.
  • Hyperactivity of the immune system as a result of improper or insufficient regulation of gene expression may result in considerable tissue or organ damage. This damage is well documented in immunological responses associated with arthritis, allergens, heart attack, stroke, and infections (Isselbacher, K.J. et al. (1996) Harrison's Principles of Internal Medicine. 13/e, McGraw Hill, Inc. and Teton Data Systems Software).
  • RNA polymerase I makes large ribosomal RNAs.
  • RNA polymerase III makes a variety of small, stable RNAs including 5S ribosomal RNA and the transfer RNAs (tRNA).
  • RNA polymerase II transcribes genes that will be translated into proteins.
  • The primary transcript of RNA polymerase II is called heterogeneous nuclear RNA (hnRNA), and must be further processed by splicing to remove non-coding sequences called introns.
  • RNA splicing is mediated by small nuclear ribonucleoprotein complexes, or snRNPs, producing mature messenger RNA (mRNA) which is then transported out of the nucleus for translation into proteins.
  • Nucleolus
  • The nucleolus is a highly organized subcompartment in the nucleus that contains high concentrations of RNA and proteins and functions mainly in ribosomal RNA synthesis and assembly (Alberts, et al. supra, pp. 379-382). Ribosomal RNA (rRNA) is a structural RNA that is complexed with proteins to form ribonucleoprotein structures called ribosomes. Ribosomes provide the platform on which protein synthesis takes place.
  • Ribosomes are assembled in the nucleolus initially from a large, 45S rRNA combined with a variety of proteins imported from the cytoplasm, as well as smaller, 5S rRNAs. Later processing of the immature ribosome results in formation of smaller ribosomal subunits which are transported from the nucleolus to the cytoplasm where they are assembled into functional ribosomes.
  • Endoplasmic Reticulum
  • In eukaryotes, proteins are synthesized within the endoplasmic reticulum (ER), delivered from the ER to the Golgi apparatus for post-translational processing and sorting, and transported from the Golgi to specific intracellular and extracellular destinations.
  • The rough ER is so named because of the rough appearance in electron micrographs imparted by the attached ribosomes on which protein synthesis proceeds.
  • Synthesis of proteins destined for the ER actually begins in the cytosol with the synthesis of a specific signal peptide which directs the growing polypeptide and its attached ribosome to the ER membrane where the signal peptide is removed and protein synthesis is completed.
  • Soluble proteins destined for the ER lumen, for secretion, or for transport to the lumen of other organelles pass completely into the ER lumen.
  • Transmembrane proteins destined for the ER or for other cell membranes are translocated across the ER membrane but remain anchored in the lipid bilayer of the membrane by one or more membrane-spanning α-helical regions.
  • Translocated polypeptide chains destined for other organelles or for secretion also fold and assemble in the ER lumen with the aid of certain "resident" ER proteins.
  • Protein folding in the ER is aided by two principal types of protein isomerases, protein disulfide isomerase (PDI) and peptidyl-prolyl isomerase (PPI).
  • PPI, an enzyme that catalyzes the isomerization of certain proline imide bonds in oligopeptides and proteins, is considered to govern one of the rate limiting steps in the folding of many proteins to their final functional conformation.
  • The cyclophilins represent a major class of PPI that was originally identified as the major receptor for the immunosuppressive drug cyclosporin A (Handschumacher, R.E. et al. (1984) Science 226:544-547).
  • Molecular "chaperones" such as BiP (binding protein) in the ER recognize incorrectly folded proteins as well as proteins not yet folded into their final form and bind to them, both to prevent improper aggregation between them, and to promote proper folding.
  • The Golgi apparatus is a complex structure that lies adjacent to the ER in eukaryotic cells and serves primarily as a sorting and dispatching station for products of the ER (Alberts, et al. supra, pp. 600-610). Additional posttranslational processing, principally additional glycosylation, also occurs in the Golgi. Indeed, the Golgi is a major site of carbohydrate synthesis, including most of the glycosaminoglycans of the extracellular matrix. N-linked oligosaccharides, added to proteins in the ER, are also further modified in the Golgi by the addition of more sugar residues to form complex N-linked oligosaccharides.
  • O-linked glycosylation of proteins also occurs in the Golgi by the addition of N-acetylgalactosamine to the hydroxyl group of a serine or threonine residue followed by the sequential addition of other sugar residues to the first. This process is catalyzed by a series of glycosyltransferases each specific for a particular donor sugar nucleotide and acceptor molecule (Lodish, H. et al. (1995) Molecular Cell Biology. W.H. Freeman and Co., New York NY, pp. 700-708). In many cases, both N- and O-linked oligosaccharides appear to be required for the secretion of proteins or the movement of plasma membrane glycoproteins to the cell surface.
  • The terminal compartment of the Golgi is the Trans-Golgi Network (TGN), where both membrane and lumenal proteins are sorted for their final destination.
  • Other transport vesicles bud off containing proteins destined for the plasma membrane, such as receptors, adhesion molecules, and ion channels, and secretory proteins, such as hormones, neurotransmitters, and digestive enzymes.
  • The vacuole system is a collection of membrane-bound compartments in eukaryotic cells that functions in the processes of endocytosis and exocytosis. These compartments include phagosomes, lysosomes, endosomes, and secretory vesicles.
  • Endocytosis is the process in cells of internalizing nutrients, solutes or small particles (pinocytosis) or large particles such as internalized receptors, viruses, bacteria, or bacterial toxins (phagocytosis).
  • Exocytosis is the process of transporting molecules to the cell surface. It facilitates placement or localization of membrane-bound receptors or other membrane proteins and secretion of hormones, neurotransmitters, digestive enzymes, wastes, etc.
  • A common property of all of these vacuoles is an acidic environment of approximately pH 4.5-5.0. This acidity is maintained by the presence of a proton ATPase that uses the energy of ATP hydrolysis to generate an electrochemical proton gradient across a membrane (Mellman, I. et al. (1986) Annu. Rev. Biochem. 55:663-700).
  • Eukaryotic vacuolar proton ATPase (vp-ATPase) is a multimeric enzyme composed of 3-10 different subunits.
  • One of these subunits is a highly hydrophobic polypeptide of approximately 16 kDa that is similar to the proteolipid component of vp-ATPases from eubacteria, fungi, and plant vacuoles (Mandel, M. et al. (1988) Proc. Natl. Acad. Sci. USA 85:5521-5524).
  • The 16 kDa proteolipid component is the major subunit of the membrane portion of vp-ATPase and functions in the transport of protons across the membrane.
  • Lysosomes are membranous vesicles containing various hydrolytic enzymes used for the controlled intracellular digestion of macromolecules. Lysosomes contain some 40 types of enzymes including proteases, nucleases, glycosidases, lipases, phospholipases, phosphatases, and sulfatases, all of which are acid hydrolases that function at a pH of about 5. Lysosomes are surrounded by a unique membrane containing transport proteins that allow the final products of macromolecule degradation, such as sugars, amino acids, and nucleotides, to be transported to the cytosol where they may be either excreted or reutilized by the cell. A vp-ATPase, such as that described above, maintains the acidic environment necessary for hydrolytic activity (Alberts, supra, pp. 610-611).
  • Endosomes are another type of acidic vacuole that is used to transport substances from the cell surface to the interior of the cell in the process of endocytosis. Like lysosomes, endosomes have an acidic environment provided by a vp-ATPase (Alberts et al. supra, pp. 610-618). Two types of endosomes are apparent based on tracer uptake studies that distinguish their time of formation in the cell and their cellular location. Early endosomes are found near the plasma membrane and appear to function primarily in the recycling of internalized receptors back to the cell surface.
  • Late endosomes appear later in the endocytic process close to the Golgi apparatus and the nucleus, and appear to be associated with delivery of endocytosed material to lysosomes or to the TGN where they may be recycled.
  • Specific proteins are associated with particular transport vesicles and their target compartments that may provide selectivity in targeting vesicles to their proper compartments.
  • Rab, a cytosolic prenylated GTP-binding protein, is one such protein.
  • Rabs 4, 5, and 11 are associated with the early endosome, whereas Rabs 7 and 9 associate with the late endosome.
  • Mitochondria are oval-shaped organelles comprising an outer membrane, a tightly folded inner membrane, an intermembrane space between the outer and inner membranes, and a matrix inside the inner membrane.
  • The outer membrane contains many porin molecules that allow ions and charged molecules to enter the intermembrane space, while the inner membrane contains a variety of transport proteins that transfer only selected molecules. Mitochondria are the primary sites of energy production in cells.
  • Glucose is initially converted to pyruvate in the cytoplasm.
  • Fatty acids and pyruvate are transported to the mitochondria for complete oxidation to CO2 coupled by enzymes to the transport of electrons from NADH and FADH2 to oxygen and to the synthesis of ATP (oxidative phosphorylation) from ADP and Pi.
  • Pyruvate is transported into the mitochondria and converted to acetyl-CoA for oxidation via the citric acid cycle, involving pyruvate dehydrogenase components, dihydrolipoyl transacetylase, and dihydrolipoyl dehydrogenase.
  • Enzymes involved in the citric acid cycle include: citrate synthetase, aconitases, isocitrate dehydrogenase, alpha-ketoglutarate dehydrogenase complex including transsuccinylases, succinyl CoA synthetase, succinate dehydrogenase, fumarases, and malate dehydrogenase.
  • Acetyl CoA is oxidized to CO2 with concomitant formation of NADH, FADH2, and GTP.
  • In oxidative phosphorylation, the transfer of electrons from NADH and FADH2 to oxygen by dehydrogenases is coupled to the synthesis of ATP from ADP and Pi by the F0F1 ATPase complex in the mitochondrial inner membrane.
  • Enzyme complexes responsible for electron transport and ATP synthesis include the F0F1 ATPase complex, ubiquinone (CoQ)-cytochrome c reductase, ubiquinone reductase, cytochrome b, cytochrome c1, FeS protein, and cytochrome c oxidase.
  • Peroxisomes, like mitochondria, are a major site of oxygen utilization. They contain one or more enzymes, such as catalase and urate oxidase, that use molecular oxygen to remove hydrogen atoms from specific organic substrates in an oxidative reaction that produces hydrogen peroxide (Alberts, supra, pp. 574-577). Catalase oxidizes a variety of substrates including phenols, formic acid, formaldehyde, and alcohol and is important in peroxisomes of liver and kidney cells for detoxifying various toxic molecules that enter the bloodstream.
  • β-oxidation results in shortening of the alkyl chain of fatty acids by blocks of two carbon atoms that are converted to acetyl CoA and exported to the cytosol for reuse in biosynthetic reactions.
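  • The two-carbon bookkeeping described above can be illustrated with a minimal sketch. The function name and the rule that shortening stops at a two-carbon remainder are illustrative assumptions, not part of the disclosure.

```python
def beta_oxidation(chain_length: int, cycles: int) -> tuple[int, int]:
    """Return (remaining_chain_length, acetyl_coa_released) after `cycles` rounds.

    Assumption for illustration: every round removes exactly one
    two-carbon block from the acyl chain and releases it as one acetyl CoA.
    """
    if chain_length < 4 or cycles < 0:
        raise ValueError("need a chain of at least 4 carbons and cycles >= 0")
    # The chain cannot be shortened below a two-carbon remainder.
    max_cycles = (chain_length - 2) // 2
    done = min(cycles, max_cycles)
    return chain_length - 2 * done, done


# Three rounds on a 16-carbon chain leave 10 carbons and release 3 acetyl CoA.
remaining, released = beta_oxidation(16, 3)
```

  • In this sketch, a two-carbon remainder would itself correspond to one further acetyl unit; it is reported as remaining chain length rather than as released acetyl CoA.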
  • Peroxisomes import their proteins from the cytosol using a specific signal sequence located near the C-terminus of the protein.
  • The importance of this import process is evident in the inherited human disease Zellweger syndrome, in which a defect in importing proteins into peroxisomes leads to a peroxisomal deficiency resulting in severe abnormalities in the brain, liver, and kidneys, and death soon after birth.
  • One form of this disease has been shown to be due to a mutation in the gene encoding a peroxisomal integral membrane protein called peroxisome assembly factor-1.
  • The present invention relates to nucleic acid sequences comprising human diagnostic and therapeutic polynucleotides (dithp) as presented in the Sequence Listing.
  • The dithp uniquely identify genes encoding human structural, functional, and regulatory molecules.
  • The invention provides an isolated polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d).
  • The polynucleotide comprises a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275.
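  • As a minimal sketch of what items c) through e) mean at the sequence level, the helpers below derive a complementary strand and an RNA equivalent from a DNA string. The function names are hypothetical, the input is assumed to contain only unambiguous A, C, G, and T, and the complement is written 5' to 3' (that is, as the reverse complement), which is a common convention rather than something the text specifies.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def rna_equivalent(dna: str) -> str:
    """Return the same sequence with thymine replaced by uracil."""
    return dna.replace("T", "U").replace("t", "u")


example = "ATGC"  # toy sequence, not taken from the Sequence Listing
complement_strand = reverse_complement(example)
rna_form = rna_equivalent(example)
```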
  • The polynucleotide comprises at least 30 contiguous nucleotides of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; b) a polynucleotide comprising a naturally occurring polynucleotide comprising a polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a)
  • The polynucleotide comprises at least 60 contiguous nucleotides of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; b) a polynucleotide comprising a naturally occurring polynucleotide comprising a polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d).
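  • The "at least 30 (or 60) contiguous nucleotides" criterion can be sketched as a shared-substring test. The function name and the exact-match requirement are assumptions made for illustration; the text does not prescribe a particular procedure.

```python
def shares_contiguous_run(candidate: str, template: str, min_length: int = 30) -> bool:
    """True if `candidate` contains a run of `min_length` nucleotides
    that also occurs, exactly and contiguously, in `template`."""
    candidate, template = candidate.upper(), template.upper()
    if min(len(candidate), len(template)) < min_length:
        return False
    # Index every window of the template, then scan the candidate's windows.
    template_windows = {
        template[i:i + min_length]
        for i in range(len(template) - min_length + 1)
    }
    return any(
        candidate[i:i + min_length] in template_windows
        for i in range(len(candidate) - min_length + 1)
    )


# Toy sequences with a short window so the behavior is easy to follow.
found = shares_contiguous_run("GGGTACGTTCC", "ACGTACGTTAGC", min_length=5)
```

  • Passing min_length=60 gives the corresponding test for the 60-nucleotide variant.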
  • The invention further provides a composition for the detection of expression of human diagnostic and therapeutic polynucleotides comprising at least one isolated polynucleotide comprising a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d); and a detectable label.
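  • The recurring "at least 90% identical" threshold can be illustrated with a minimal percent-identity sketch. It assumes the two sequences have already been aligned to equal length with '-' marking gaps, and it divides matches by the number of aligned columns; other denominators are in use, and the text here does not say which definition applies, so both choices are assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns in which both sequences carry a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Nine of ten aligned positions match, so this toy pair meets a 90% threshold.
meets_threshold = percent_identity("ACGTACGTAC", "ACGTACGTAA") >= 90.0
```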
  • The invention also provides a method for detecting a target polynucleotide in a sample, said target polynucleotide having a polynucleotide sequence of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence of a polynucleotide selected from the group consisting of SEQ ID NO:1-275; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d).
  • The method comprises a) amplifying said target polynucleotide or fragment thereof using polymerase chain reaction amplification, and b) detecting the presence or absence of said amplified target polynucleotide or fragment thereof, and, optionally, if present, the amount thereof.
  • The invention also provides a method for detecting a target polynucleotide in a sample, said target polynucleotide having a polynucleotide sequence of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d).
  • The method comprises a) hybridizing the sample with a probe comprising at least 20 contiguous nucleotides comprising a sequence complementary to said target polynucleotide in the sample, and which probe specifically hybridizes to said target polynucleotide, under conditions whereby a hybridization complex is formed between said probe and said target polynucleotide, and b) detecting the presence or absence of said hybridization complex, and, optionally, if present, the amount thereof.
  • The invention provides a composition comprising a target polynucleotide of the method, wherein said probe comprises at least 30 contiguous nucleotides.
  • The invention provides a composition comprising a target polynucleotide of the method, wherein said probe comprises at least 60 contiguous nucleotides.
  • The invention further provides a recombinant polynucleotide comprising a promoter sequence operably linked to an isolated polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d).
  • The invention provides a cell transformed with the recombinant polynucleotide.
  • The invention provides a transgenic organism comprising the re
  • The invention also provides a method for producing a human diagnostic and therapeutic polypeptide, the method comprising a) culturing a cell under conditions suitable for expression of the human diagnostic and therapeutic polypeptide, wherein said cell is transformed with a recombinant polynucleotide, said recombinant polynucleotide comprising an isolated polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; iii) a polynucleotide complementary to the polynucleotide of i); iv) a polynucleotide complementary to the polynucleotide
  • The invention also provides an isolated human diagnostic and therapeutic polypeptide (DITHP) encoded by at least one polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275.
  • The invention further provides a method of screening for a test compound that specifically binds to the polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553.
  • The method comprises a) combining the polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553 with at least one test compound under suitable conditions, and b) detecting binding of the polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553 to the test compound, thereby identifying a compound that specifically binds to the polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553.
  • The invention further provides a microarray wherein at least one element of the microarray is an isolated polynucleotide comprising at least 30 contiguous nucleotides of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d).
  • The invention also provides a method for generating a transcript image of a sample which contains polynucleotides.
  • The method comprises a) labeling the polynucleotides of the sample, b) contacting the elements of the microarray with the labeled polynucleotides of the sample under conditions suitable for the formation of a hybridization complex, and c) quantifying the expression of the polynucleotides in the sample.
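  • Step c) can be pictured with a minimal sketch that turns per-element hybridization signals into relative expression values. The function name, the flat background subtraction, and the normalization to total signal are all illustrative assumptions; the text does not specify how expression is quantified.

```python
def transcript_image(signals: dict[str, float], background: float = 0.0) -> dict[str, float]:
    """Map each microarray element to its share of the total corrected signal."""
    # Subtract a flat background and clamp negative values to zero.
    corrected = {name: max(value - background, 0.0) for name, value in signals.items()}
    total = sum(corrected.values())
    if total == 0:
        raise ValueError("no signal above background")
    return {name: value / total for name, value in corrected.items()}


# Hypothetical element names and intensities, for illustration only.
image = transcript_image({"element_1": 30.0, "element_2": 10.0})
```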
  • The invention provides a method for screening a compound for effectiveness in altering expression of a target polynucleotide, wherein said target polynucleotide comprises a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d).
  • The method comprises a) exposing a sample comprising the target polynucleotide to a compound, b) detecting altered expression of the target polynucleotide, and c) comparing the expression of the target polynucleotide in the presence of varying amounts of the compound and in the absence of the compound.
  • The invention further provides a method for assessing toxicity of a test compound, said method comprising a) treating a biological sample containing nucleic acids with the test compound; b) hybridizing the nucleic acids of the treated biological sample with a probe comprising at least 20 contiguous nucleotides of a polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; iii) a polynucleotide complementary to the polynucleotide of i); iv) a polynucleotide complementary to the polynucleotide of ii); and v) an RNA equivalent of
  • Hybridization occurs under conditions whereby a specific hybridization complex is formed between said probe and a target polynucleotide in the biological sample, said target polynucleotide comprising a polynucleotide sequence of a polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275; iii) a polynucleotide complementary to the polynucleotide of i); iv) a polynucleotide complementary to the polynucleotide of ii); and v) an RNA equivalent of i) through iv), and alternatively, the target poly
  • The invention further provides an isolated polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553.
  • The invention provides an isolated polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:276-553.
  • The invention further provides an isolated polynucleotide encoding a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553.
  • The polynucleotide encodes a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:276-553.
  • The polynucleotide comprises a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275.
  • The invention provides an isolated antibody which specifically binds to a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553.
  • The invention further provides a composition comprising a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, and a pharmaceutically acceptable excipient.
  • The composition comprises a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553.
  • The invention additionally provides a method of treating a disease or condition associated with decreased expression of functional DITHP, comprising administering to a patient in need of such treatment the composition.
  • The invention also provides a method for screening a compound for effectiveness as an agonist of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553.
  • The method comprises a) exposing a sample comprising the polypeptide to a compound, and b) detecting agonist activity in the sample.
  • The invention provides a composition comprising an agonist compound identified by the method and a pharmaceutically acceptable excipient.
  • The invention provides a method of treating a disease or condition associated with decreased expression of functional DITHP, comprising administering to a patient in need of such treatment the composition.
  • The invention provides a method for screening a compound for effectiveness as an antagonist of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553.
  • The method comprises a) exposing a sample comprising the polypeptide to a compound, and b) detecting antagonist activity in the sample.
  • The invention provides a composition comprising an antagonist compound identified by the method and a pharmaceutically acceptable excipient.
  • The invention provides a method of treating a disease or condition associated with overexpression of functional DITHP, comprising administering to a patient in need of such treatment the composition.
  • The invention further provides a method of screening for a compound that modulates the activity of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553.
  • The method comprises a) combining the polypeptide with at least one test compound under conditions permissive for the activity of the polypeptide, b) assessing the activity of the polypeptide in the presence of the test compound, and c) comparing the activity of the polypeptide in the presence of the test compound with the activity of the polypeptide in the absence of the test compound, wherein a change in the activity of the polypeptide in the presence of the test compound is indicative of a compound that modulates the activity of the polypeptide.
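  • The comparison in step c) can be sketched as a relative change between replicate activity measurements taken with and without the test compound. The function name, the use of replicate means, and the 20% cutoff are hypothetical choices for illustration; the text only requires that a change in activity be observed.

```python
from statistics import mean


def classify_modulation(with_compound: list[float],
                        without_compound: list[float],
                        threshold: float = 0.2) -> tuple[str, float]:
    """Return a label and the relative change in mean activity.

    `threshold` is a hypothetical cutoff on the fractional change.
    """
    baseline = mean(without_compound)
    if baseline == 0:
        raise ValueError("baseline activity must be non-zero")
    change = (mean(with_compound) - baseline) / baseline
    if change >= threshold:
        return "increases activity", change
    if change <= -threshold:
        return "decreases activity", change
    return "no detectable modulation", change


# Hypothetical replicate readings in arbitrary activity units.
label, change = classify_modulation([5.0, 5.0], [10.0, 10.0])
```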
  • Table 1 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with the sequence identification numbers (SEQ ID NO:s) and open reading frame identification numbers (ORF IDs) corresponding to polypeptides encoded by the template ID.
  • Table 2 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with their GenBank hits (GI Numbers), probability scores, and functional annotations corresponding to the GenBank hits.
  • Table 3 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with polynucleotide segments of each template sequence as defined by the indicated “start” and “stop” nucleotide positions. The reading frames of the polynucleotide segments and the Pfam hits, Pfam descriptions, and E-values corresponding to the polypeptide domains encoded by the polynucleotide segments are indicated.
  • Table 4 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with polynucleotide segments of each template sequence as defined by the indicated “start” and “stop” nucleotide positions.
  • The reading frames of the polynucleotide segments are shown, and the polypeptides encoded by the polynucleotide segments constitute either signal peptide (SP) or transmembrane (TM) domains, as indicated.
  • The membrane topology of the encoded polypeptide sequence is indicated, the N-terminus (N) listed as being oriented to either the cytosolic (N in) or non-cytosolic (N out) side of the cell membrane or organelle.
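  • To make the relationship between a template segment and its encoded polypeptide concrete, the sketch below extracts a segment by “start” and “stop” positions and translates it with the standard genetic code. It assumes 1-based, inclusive positions on the forward strand with the reading frame fixed by the start position; the tables may use a different convention, and the function name is hypothetical.

```python
# Standard genetic code, built in TCAG order ('*' marks a stop codon).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate_segment(template: str, start: int, stop: int) -> str:
    """Translate template[start..stop] (1-based, inclusive) codon by codon."""
    segment = template[start - 1:stop].upper()
    usable = len(segment) - len(segment) % 3  # drop a trailing partial codon
    # Codons containing ambiguous bases are reported as 'X'.
    return "".join(
        CODON_TABLE.get(segment[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Toy template, not taken from the Sequence Listing.
peptide = translate_segment("CCATGGCCTAAGG", 3, 11)
```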
  • Table 5 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with component sequence identification numbers (component IDs) corresponding to each template.
  • The component sequences, which were used to assemble the template sequences, are defined by the indicated “start” and “stop” nucleotide positions along each template.
  • Table 6 shows the tissue distribution profiles for the templates of the invention.
  • Table 7 shows the sequence identification numbers (SEQ ID NO:s) corresponding to the polypeptides of the present invention, along with the reading frames used to obtain the polypeptide segments, the lengths of the polypeptide segments, the “start” and “stop” nucleotide positions of the polynucleotide sequences used to define the encoded polypeptide segments, the GenBank hits (GI Numbers), probability scores, and functional annotations corresponding to the GenBank hits.
  • Table 8 summarizes the bioinformatics tools which are useful for analysis of the polynucleotides of the present invention.
  • The first column of Table 8 lists analytical tools, programs, and algorithms, the second column provides brief descriptions thereof, the third column presents appropriate references, all of which are incorporated by reference herein in their entirety, and the fourth column presents, where applicable, the scores, probability values, and other parameters used to evaluate the strength of a match between two sequences (the higher the score, the greater the homology between two sequences).
  • “dithp” refers to a nucleic acid sequence.
  • “DITHP” refers to an amino acid sequence encoded by dithp.
  • A “full-length” dithp refers to a nucleic acid sequence containing the entire coding region of a gene endogenously expressed in human tissue.
  • “Adjuvants” are materials such as Freund's adjuvant, mineral gels (aluminum hydroxide), and surface active substances (lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanin, and dinitrophenol) which may be administered to increase a host's immunological response.
  • “Allele” refers to an alternative form of a nucleic acid sequence. Alleles result from a “mutation,” a change or an alternative reading of the genetic code. Any given gene may have none, one, or many allelic forms. Mutations which give rise to alleles include deletions, additions, or substitutions of nucleotides. Each of these changes may occur alone, or in combination with the others, one or more times in a given nucleic acid sequence.
  • The present invention encompasses allelic dithp.
  • “Amino acid sequence” refers to a peptide, a polypeptide, or a protein of either natural or synthetic origin.
  • The amino acid sequence is not limited to the complete, endogenous amino acid sequence and may be a fragment, epitope, variant, or derivative of a protein expressed by a nucleic acid sequence.
  • “Amplification” refers to the production of additional copies of a sequence and is carried out using polymerase chain reaction (PCR) technologies well known in the art.
  • “Antibody” refers to intact molecules as well as to fragments thereof, such as Fab, F(ab')2, and Fv fragments, which are capable of binding the epitopic determinant.
  • Antibodies that bind DITHP polypeptides can be prepared using intact polypeptides or using fragments containing small peptides of interest as the immunizing antigen.
  • the polypeptide or peptide used to immunize an animal can be derived from the translation of RNA, or synthesized chemically, and can be conjugated to a carrier protein if desired.
  • suitable carrier proteins include, e.g., bovine serum albumin, thyroglobulin, and keyhole limpet hemocyanin (KLH).
  • KLH keyhole limpet hemocyanin
  • Antisense sequence refers to a sequence capable of specifically hybridizing to a target sequence.
  • the antisense sequence may include DNA, RNA, or any nucleic acid mimic or analog such as peptide nucleic acid (PNA); oligonucleotides having modified backbone linkages such as phosphorothioates, methylphosphonates, or benzylphosphonates; oligonucleotides having modified sugar groups such as 2'-methoxyethyl sugars or 2'-methoxyethoxy sugars; or oligonucleotides having modified bases such as 5-methyl cytosine, 2'-deoxyuracil, or 7-deaza-2'-deoxyguanosine.
  • PNA peptide nucleic acid
  • the antisense sequence can be DNA, RNA, or any nucleic acid mimic or analog.
  • Antisense technology refers to any technology which relies on the specific hybridization of an antisense sequence to a target sequence.
  • a “bin” is a portion of computer memory space used by a computer program for storage of data, and bounded in such a manner that data stored in a bin may be retrieved by the program.
  • “Biologically active” refers to an amino acid sequence having a structural, regulatory, or biochemical function of a naturally occurring amino acid sequence.
  • “Clone joining” is a process for combining gene bins based upon the bins' containing sequence information from the same clone.
  • the sequences may assemble into a primary gene transcript as well as one or more splice variants.
  • “Complementary” describes the relationship between two single-stranded nucleic acid sequences that anneal by base-pairing (5'-A-G-T-3' pairs with its complement 3'-T-C-A-5').
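The base-pairing relationship in the definition of “complementary” can be illustrated with a minimal Python sketch. The pairing table is the standard Watson-Crick one for DNA; the function names are hypothetical and chosen only for this illustration.

```python
# Standard Watson-Crick pairing for DNA bases.
_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(seq: str) -> str:
    """Base-by-base complement; read 3'->5' when the input is read 5'->3'."""
    return "".join(_PAIR[base] for base in seq.upper())

def reverse_complement(seq: str) -> str:
    """Complementary strand rewritten in the conventional 5'->3' direction."""
    return complement(seq)[::-1]

def is_complementary(strand_a: str, strand_b: str) -> bool:
    """True if strand_b (5'->3') anneals along the full length of strand_a (5'->3')."""
    return reverse_complement(strand_a) == strand_b.upper()
```

For the example in the definition, `complement("AGT")` gives `"TCA"`, which is the 3'-T-C-A-5' strand; written 5' to 3' the same strand is `"ACT"`.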
  • a “component sequence” is a nucleic acid sequence selected by a computer program such as PHRED and used to assemble a consensus or template sequence from one or more component sequences.
  • a “consensus sequence” or “template sequence” is a nucleic acid sequence which has been assembled from overlapping sequences, using a computer program for fragment assembly such as the GELVIEW fragment assembly system (Genetics Computer Group (GCG), Madison WI) or using a relational database management system (RDMS).
  • GCG Genetics Computer Group
  • RDMS relational database management system
  • Conservative amino acid substitutions are those substitutions that, when made, least interfere with the properties of the original protein, i.e., the structure and especially the function of the protein are conserved and not significantly changed by such substitutions.
  • the table below shows amino acids which may be substituted for an original amino acid in a protein and which are regarded as conservative substitutions.
  • Conservative substitutions generally maintain (a) the structure of the polypeptide backbone in the area of the substitution, for example, as a beta sheet or alpha helical conformation, (b) the charge or hydrophobicity of the molecule at the target site, or (c) the bulk of the side chain.
  • “Deletion” refers to a change in either a nucleic or amino acid sequence in which at least one nucleotide or amino acid residue, respectively, is absent.
  • Derivative refers to the chemical modification of a nucleic acid sequence, such as by replacement of hydrogen by an alkyl, acyl, amino, hydroxyl, or other group.
  • “Differential expression” refers to increased or upregulated; or decreased, downregulated, or absent gene or protein expression, determined by comparing at least two different samples. Such comparisons may be carried out between, for example, a treated and an untreated sample, or a diseased and a normal sample.
  • element and “array element” refer to a polynucleotide, polypeptide, or other chemical compound having a unique and defined position on a microarray.
  • E-value refers to the statistical probability that a match between two sequences occurred by chance.
  • Exon shuffling refers to the recombination of different coding regions (exons). Since an exon may represent a structural or functional domain of the encoded protein, new proteins may be assembled through the novel reassortment of stable substructures, thus allowing acceleration of the evolution of new protein functions.
  • a "fragment” is a unique portion of dithp or DITHP which is identical in sequence to but shorter in length than the parent sequence.
  • a fragment may comprise up to the entire length of the defined sequence, minus one nucleotide/amino acid residue.
  • a fragment may comprise from 10 to 1000 contiguous amino acid residues or nucleotides.
  • a fragment used as a probe, primer, antigen, therapeutic molecule, or for other purposes, may be at least 5, 10, 15, 16, 20, 25, 30, 40, 50, 60, 75, 100, 150, 250 or at least 500 contiguous amino acid residues or nucleotides in length. Fragments may be preferentially selected from certain regions of a molecule.
  • a polypeptide fragment may comprise a certain length of contiguous amino acids selected from the first
  • a fragment of dithp comprises a region of unique polynucleotide sequence that specifically identifies dithp, for example, as distinct from any other sequence in the same genome.
  • a fragment of dithp is useful, for example, in hybridization and amplification technologies and in analogous methods that distinguish dithp from related polynucleotide sequences.
  • a fragment of DITHP is encoded by a fragment of dithp.
  • a fragment of DITHP comprises a region of unique amino acid sequence that specifically identifies DITHP.
  • a fragment of DITHP is useful as an immunogenic peptide for the development of antibodies that specifically recognize DITHP.
  • the precise length of a fragment of DITHP and the region of DITHP to which the fragment corresponds are routinely determinable by one of ordinary skill in the art based on the intended purpose for the fragment.
  • a "full length" nucleotide sequence is one containing at least a start site for translation to a protein sequence, followed by an open reading frame and a stop site, and encoding a "full length" polypeptide.
  • “Hit” refers to a sequence whose annotation will be used to describe a given template. Criteria for selecting the top hit are as follows: if the template has one or more exact nucleic acid matches, the top hit is the exact match with highest percent identity. If the template has no exact matches but has significant protein hits, the top hit is the protein hit with the lowest E-value. If the template has no significant protein hits, but does have significant non-exact nucleotide hits, the top hit is the nucleotide hit with the lowest E-value.
  • “Homology” refers to sequence similarity either between a reference nucleic acid sequence and at least a fragment of a dithp or between a reference amino acid sequence and a fragment of a DITHP.
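The three-tier rule for choosing the top hit, as stated in the definition of “hit”, can be sketched in Python as follows. The `Hit` record, its field names, and the `e_cutoff` used to decide what counts as a “significant” hit are assumptions made for this illustration; the text does not give a numeric significance threshold.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Hit:
    kind: str                # "nucleotide" or "protein" (hypothetical labels)
    exact: bool              # True for an exact nucleic acid match
    percent_identity: float
    e_value: float

def top_hit(hits: List[Hit], e_cutoff: float = 1e-8) -> Optional[Hit]:
    """Pick the annotating hit for a template; e_cutoff is an assumed threshold."""
    # Tier 1: exact nucleic acid matches, ranked by highest percent identity.
    exact = [h for h in hits if h.kind == "nucleotide" and h.exact]
    if exact:
        return max(exact, key=lambda h: h.percent_identity)
    # Tier 2: significant protein hits, ranked by lowest E-value.
    proteins = [h for h in hits if h.kind == "protein" and h.e_value <= e_cutoff]
    if proteins:
        return min(proteins, key=lambda h: h.e_value)
    # Tier 3: significant non-exact nucleotide hits, ranked by lowest E-value.
    nucleotides = [h for h in hits
                   if h.kind == "nucleotide" and not h.exact and h.e_value <= e_cutoff]
    if nucleotides:
        return min(nucleotides, key=lambda h: h.e_value)
    return None
```

The function returns `None` when no tier applies, which leaves the template unannotated rather than forcing a weak match onto it.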
  • Hybridization refers to the process by which a strand of nucleotides anneals with a complementary strand through base pairing. Specific hybridization is an indication that two nucleic acid sequences share a high degree of identity. Specific hybridization complexes form under defined annealing conditions, and remain hybridized after the "washing" step.
  • the defined hybridization conditions include the annealing conditions and the washing step(s), the latter of which is particularly important in determining the stringency of the hybridization process, with more stringent conditions allowing less non-specific binding, i.e., binding between pairs of nucleic acid probes that are not perfectly matched.
  • Permissive conditions for annealing of nucleic acid sequences are routinely determinable and may be consistent among hybridization experiments, whereas wash conditions may be varied among experiments to achieve the desired stringency.
  • Tm thermal melting point
  • High stringency conditions for hybridization between polynucleotides of the present invention include wash conditions of 68°C in the presence of about 0.2 x SSC and about 0.1% SDS, for 1 hour. Alternatively, temperatures of about 65°C, 60°C, or 55°C may be used. SSC concentration may be varied from about 0.2 to 2 x SSC, with SDS being present at about 0.1%.
  • blocking reagents are used to block non-specific hybridization. Such blocking reagents include, for instance, denatured salmon sperm DNA at about 100-200 µg/ml. Useful variations on these conditions will be readily apparent to those skilled in the art.
  • Hybridization, particularly under high stringency conditions, may be suggestive of evolutionary similarity between the nucleotides. Such similarity is strongly indicative of a similar role for the nucleotides and their resultant proteins.
  • Modified hybridization conditions may also be used under particular circumstances, such as RNA:DNA hybridizations. Appropriate hybridization conditions are routinely determinable by one of ordinary skill in the art.
  • Immunologically active or “immunogenic” describes the potential for a natural, recombinant, or synthetic peptide, epitope, polypeptide, or protein to induce antibody production in appropriate animals, cells, or cell lines.
  • “Insertion” or “addition” refers to a change in either a nucleic or amino acid sequence in which at least one nucleotide or residue, respectively, is added to the sequence.
  • Labeling refers to the covalent or noncovalent joining of a polynucleotide, polypeptide, or antibody with a reporter molecule capable of producing a detectable or measurable signal.
  • “Microarray” is any arrangement of nucleic acids, amino acids, antibodies, etc., on a substrate.
  • the substrate may be a solid support such as beads, glass, paper, nitrocellulose, nylon, or an appropriate membrane.
  • Linkers are short stretches of nucleotide sequence which may be added to a vector or a dithp to create restriction endonuclease sites to facilitate cloning.
  • Polylinkers are engineered to incorporate multiple restriction enzyme sites and to provide for the use of enzymes which leave 5' or 3' overhangs (e.g., BamHI, EcoRI, and HindIII) and those which provide blunt ends (e.g., EcoRV, SnaBI, and StuI).
  • Naturally occurring refers to an endogenous polynucleotide or polypeptide that may be isolated from viruses or prokaryotic or eukaryotic cells.
  • Nucleic acid sequence refers to the specific order of nucleotides joined by phosphodiester bonds in a linear, polymeric arrangement. Depending on the number of nucleotides, the nucleic acid sequence can be considered an oligomer, oligonucleotide, or polynucleotide.
  • the nucleic acid can be DNA, RNA, or any nucleic acid analog, such as PNA, may be of genomic or synthetic origin, may be either double-stranded or single-stranded, and can represent either the sense or antisense (complementary) strand.
  • Oligomers refers to a nucleic acid sequence of at least about 6 nucleotides and as many as about 60 nucleotides, preferably about 15 to 40 nucleotides, and most preferably between about 20 and 30 nucleotides, that may be used in hybridization or amplification technologies. Oligomers may be used as, e.g., primers for PCR, and are usually chemically synthesized.
  • "Operably linked” refers to the situation in which a first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence. For instance, a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence. Generally, operably linked DNA sequences may be in close proximity or contiguous and, where necessary to join two protein coding regions, in the same reading frame.
  • PNA peptide nucleic acid
  • PNA refers to a DNA mimic in which nucleotide bases are attached to a pseudopeptide backbone to increase stability. PNAs, also designated antigene agents, can prevent gene expression by targeting complementary messenger RNA.
  • percent identity and % identity refer to the percentage of residue matches between at least two polynucleotide sequences aligned using a standardized algorithm. Such an algorithm may insert, in a standardized and reproducible way, gaps in the sequences being compared in order to optimize alignment between two sequences, and therefore achieve a more meaningful comparison of the two sequences.
  • NCBI National Center for Biotechnology Information
  • BLAST Basic Local Alignment Search Tool
  • the BLAST software suite includes various sequence analysis programs including "blastn,” that is used to determine alignment between a known polynucleotide sequence and other sequences on a variety of databases.
  • BLAST 2 Sequences is commonly used with gap and other parameters set to default settings. For example, to compare two nucleotide sequences, one may use blastn with the "BLAST 2 Sequences” tool Version 2.0.9 (May-07-1999) set at default parameters. Such default parameters may be, for example: Matrix: BLOSUM62
  • Percent identity may be measured over the length of an entire defined sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined sequence, for instance, a fragment of at least 20, at least 30, at least 40, at least 50, at least 70, at least 100, or at least 200 contiguous nucleotides.
  • Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in figures or Sequence Listings, may be used to describe a length over which percentage identity may be measured.
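As a rough illustration of how percent identity is counted once two sequences have been aligned, the Python sketch below takes two already-aligned strings of equal length, with `-` marking inserted gaps. It does not perform the alignment itself and is not the BLAST 2 Sequences implementation. Dividing by the number of alignment columns (gap columns included) is an assumption of this sketch; other conventions divide by the length of one of the sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns in which both sequences carry the same residue.

    Inputs are pre-aligned strings of equal length; '-' marks a gap.
    The denominator is the alignment length, an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = len(aligned_a)
    if columns == 0:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / columns
```

To measure identity over a shorter length, as the text allows, the same function can be applied to a slice of the two aligned strings covering only the fragment of interest.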
  • Nucleic acid sequences that do not show a high degree of identity may nevertheless encode similar amino acid sequences due to the degeneracy of the genetic code. It is understood that changes in nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid sequences that all encode substantially the same protein.
  • the phrases "percent identity” and "% identity”, as applied to polypeptide sequences refer to the percentage of residue matches between at least two polypeptide sequences aligned using a standardized algorithm. Methods of polypeptide sequence alignment are well-known. Some alignment methods take into account conservative amino acid substitutions. Such conservative substitutions, explained in more detail above, generally preserve the hydrophobicity and acidity of the substituted residue, thus preserving the structure (and therefore function) of the folded polypeptide.
  • NCBI BLAST software suite may be used.
  • BLAST 2 Sequences Version 2.0.9 (May-07-1999) with blastp set at default parameters.
  • Such default parameters may be, for example:
  • Percent identity may be measured over the length of an entire defined polypeptide sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined polypeptide sequence, for instance, a fragment of at least 15, at least 20, at least 30, at least 40, at least 50, at least 70 or at least 150 contiguous residues.
  • Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in figures or Sequence Listings, may be used to describe a length over which percentage identity may be measured.
  • Post-translational modification of a DITHP may involve lipidation, glycosylation, phosphorylation, acetylation, racemization, proteolytic cleavage, and other modifications known in the art. These processes may occur synthetically or biochemically. Biochemical modifications will vary by cell type depending on the enzymatic milieu and the DITHP.
  • Probe refers to dithp or fragments thereof, which are used to detect identical, allelic or related nucleic acid sequences. Probes are isolated oligonucleotides or polynucleotides attached to a detectable label or reporter molecule. Typical labels include radioactive isotopes, ligands, chemiluminescent agents, and enzymes.
  • “Primers” are short nucleic acids, usually DNA oligonucleotides, which may be annealed to a target polynucleotide by complementary base-pairing. The primer may then be extended along the target DNA strand by a DNA polymerase enzyme.
  • Probes and primers as used in the present invention typically comprise at least 15 contiguous nucleotides of a known sequence. In order to enhance specificity, longer probes and primers may also be employed, such as probes and primers that comprise at least 20, 30, 40, 50, 60, 70, 80, 90, 100, or at least 150 consecutive nucleotides of the disclosed nucleic acid sequences. Probes and primers may be considerably longer than these examples, and it is understood that any length supported by the specification, including the figures and Sequence Listing, may be used.
  • PCR primer pairs can be derived from a known sequence, for example, by using computer programs intended for that purpose such as Primer (Version 0.5, 1991, Whitehead Institute for Biomedical Research, Cambridge MA).
  • Oligonucleotides for use as primers are selected using software known in the art for such purpose. For example, OLIGO 4.06 software is useful for the selection of PCR primer pairs of up to 100 nucleotides each, and for the analysis of oligonucleotides and larger polynucleotides of up to 5,000 nucleotides from an input polynucleotide sequence of up to 32 kilobases. Similar primer selection programs have incorporated additional features for expanded capabilities. For example, the PrimOU primer selection program (available to the public from the Genome Center at University of Texas South West Medical Center, Dallas TX) is capable of choosing specific primers from megabase sequences and is thus useful for designing primers on a genome-wide scope.
  • the Primer3 primer selection program (available to the public from the Whitehead Institute/MIT Center for Genome Research, Cambridge MA) allows the user to input a "mispriming library," in which sequences to avoid as primer binding sites are user-specified. Primer3 is useful, in particular, for the selection of oligonucleotides for microarrays. (The source code for the latter two primer selection programs may also be obtained from their respective sources and modified to meet the user's specific needs.)
  • the PrimeGen program (available to the public from the UK Human Genome Mapping Project Resource Centre, Cambridge UK) designs primers based on multiple sequence alignments, thereby allowing selection of primers that hybridize to either the most conserved or least conserved regions of aligned nucleic acid sequences.
  • this program is useful for identification of both unique and conserved oligonucleotides and polynucleotide fragments.
  • the oligonucleotides and polynucleotide fragments identified by any of the above selection methods are useful in hybridization technologies, for example, as PCR or sequencing primers, microarray elements, or specific probes to identify fully or partially complementary polynucleotides in a sample of nucleic acids. Methods of oligonucleotide selection are not limited to those described above.
  • “Purified” refers to molecules, either polynucleotides or polypeptides that are isolated or separated from their natural environment and are at least 60% free, preferably at least 75% free, and most preferably at least 90% free from other compounds with which they are naturally associated.
  • a "recombinant nucleic acid” is a sequence that is not naturally occurring or has a sequence that is made by an artificial combination of two or more otherwise separated segments of sequence. This artificial combination is often accomplished by chemical synthesis or, more commonly, by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques such as those described in Sambrook, supra.
  • the term recombinant includes nucleic acids that have been altered solely by addition, substitution, or deletion of a portion of the nucleic acid.
  • a recombinant nucleic acid may include a nucleic acid sequence operably linked to a promoter sequence. Such a recombinant nucleic acid may be part of a vector that is used, for example, to transform a cell.
  • such recombinant nucleic acids may be part of a viral vector, e.g., based on a vaccinia virus, that could be used to vaccinate a mammal wherein the recombinant nucleic acid is expressed, inducing a protective immunological response in the mammal.
  • regulatory element refers to a nucleic acid sequence from nontranslated regions of a gene, and includes enhancers, promoters, introns, and 3' untranslated regions, which interact with host proteins to carry out or regulate transcription or translation.
  • Reporter molecules are chemical or biochemical moieties used for labeling a nucleic acid, an amino acid, or an antibody. They include radionuclides; enzymes; fluorescent, chemiluminescent, or chromogenic agents; substrates; cofactors; inhibitors; magnetic particles; and other moieties known in the art.
  • RNA equivalent, in reference to a DNA sequence, is composed of the same linear sequence of nucleotides as the reference DNA sequence with the exception that all occurrences of the nitrogenous base thymine are replaced with uracil, and the sugar backbone is composed of ribose instead of deoxyribose.
  • “Sample” is used in its broadest sense.
  • Samples may contain nucleic or amino acids, antibodies, or other materials, and may be derived from any source (e.g., bodily fluids including, but not limited to, saliva, blood, and urine; chromosome(s), organelles, or membranes isolated from a cell; genomic DNA, RNA, or cDNA in solution or bound to a substrate; and cleared cells or tissues or blots or imprints from such cells or tissues).
  • Specific binding or “specifically binding” refers to the interaction between a protein or peptide and its agonist, antibody, antagonist, or other binding partner. The interaction is dependent upon the presence of a particular structure of the protein, e.g., the antigenic
  • an antibody is specific for epitope "A”
  • the presence of a polypeptide containing epitope A, or the presence of free unlabeled A, in a reaction containing free labeled A and the antibody will reduce the amount of labeled A that binds to the antibody.
  • substitution refers to the replacement of at least one nucleotide or amino acid by a different nucleotide or amino acid.
  • Substrate refers to any suitable rigid or semi-rigid support including, e.g., membranes, filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, tubing, plates, polymers, microparticles or capillaries.
  • the substrate can have a variety of surface forms, such as wells, trenches, pins, channels and pores, to which polynucleotides or polypeptides are bound.
  • a “transcript image” refers to the collective pattern of gene expression by a particular tissue or cell type under given conditions at a given time.
  • Transformation refers to a process by which exogenous DNA enters a recipient cell.
  • Transformation may occur under natural or artificial conditions using various methods well known in the art. Transformation may rely on any known method for the insertion of foreign nucleic acid sequences into a prokaryotic or eukaryotic host cell. The method is selected based on the host cell being transformed.
  • Transformants include stably transformed cells in which the inserted DNA is capable of replication either as an autonomously replicating plasmid or as part of the host chromosome, as well as cells which transiently express inserted DNA or RNA.
  • a "transgenic organism,” as used herein, is any organism, including but not limited to animals and plants, in which one or more of the cells of the organism contains heterologous nucleic acid introduced by way of human intervention, such as by transgenic techniques well known in the art.
  • the nucleic acid is introduced into the cell, directly or indirectly by introduction into a precursor of the cell, by way of deliberate genetic manipulation, such as by microinjection or by infection with a recombinant virus.
  • the term genetic manipulation does not include classical cross-breeding, or in vitro fertilization, but rather is directed to the introduction of a recombinant DNA molecule.
  • the transgenic organisms contemplated in accordance with the present invention include bacteria, cyanobacteria, fungi, and plants and animals.
  • the isolated DNA of the present invention can be introduced into the host by methods known in the art, for example infection, transfection, transformation or transconjugation. Techniques for transferring the DNA of the present invention into such organisms are widely known and provided in references such as Sambrook et al. (1989), supra.
  • a "variant" of a particular nucleic acid sequence is defined as a nucleic acid sequence having at least 25% sequence identity to the particular nucleic acid sequence over a certain length of one of the nucleic acid sequences using blastn with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) set at default parameters.
  • Such a pair of nucleic acids may show, for example, at least 30%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater sequence identity over a certain defined length.
  • the variant may result in "conservative" amino acid changes which do not affect structural and/or chemical properties.
  • a variant may be described as, for example, an "allelic” (as defined above), “splice,” “species,” or “polymorphic” variant.
  • a splice variant may have significant identity to a reference molecule, but will generally have a greater or lesser number of polynucleotides due to alternate splicing of exons during mRNA processing.
  • the corresponding polypeptide may possess additional functional domains or lack domains that are present in the reference molecule.
  • Species variants are polynucleotide sequences that vary from one species to another. The resulting polypeptides generally will have significant amino acid identity relative to each other.
  • a polymorphic variant is a variation in the polynucleotide sequence of a particular gene between individuals of a given species.
  • Polymorphic variants also may encompass "single nucleotide polymorphisms" (SNPs) in which the polynucleotide sequence varies by one base.
  • SNPs single nucleotide polymorphisms
  • the presence of SNPs may be indicative of, for example, a certain population, a disease state, or a propensity for a disease state.
  • variants of the polynucleotides of the present invention may be generated through recombinant methods.
  • One possible method is a DNA shuffling technique such as MOLECULARBREEDING (Maxygen Inc., Santa Clara CA; described in U.S. Patent Number
  • DNA shuffling is a process by which a library of gene variants is produced using PCR-mediated recombination of gene fragments. The library is then subjected to selection or screening procedures that identify those gene variants with the desired properties.
  • a "variant" of a particular polypeptide sequence is defined as a polypeptide sequence having at least 40% sequence identity to the particular polypeptide sequence over a certain length of one of the polypeptide sequences using blastp with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) set at default parameters.
  • Such a pair of polypeptides may show, for example, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater identity over a certain defined length of one of the polypeptides.
  • cDNA sequences derived from human tissues and cell lines were aligned based on nucleotide sequence identity and assembled into "consensus" or "template” sequences which are designated by the template identification numbers (template IDs) in column 2 of Table 2.
  • the sequence identification numbers (SEQ ID NO:s) corresponding to the template IDs are shown in column 1.
  • the template sequences have similarity to GenBank sequences, or "hits," as designated by the GI Numbers in column 3.
  • the statistical probability of each GenBank hit is indicated by a probability score in column 4, and the functional annotation corresponding to each GenBank hit is listed in column 5.
  • the invention incorporates the nucleic acid sequences of these templates as disclosed in the Sequence Listing and the use of these sequences in the diagnosis and treatment of disease states characterized by defects in human molecules.
  • the invention further utilizes these sequences in hybridization and amplification technologies, and in particular, in technologies which assess gene expression patterns correlated with specific cells or tissues and their responses in vivo or in vitro to pharmaceutical agents, toxins, and other treatments. In this manner, the sequences of the present invention are used to develop a transcript image for a particular cell or tissue.
  • RNA derived from normal and diseased human tissues and cell lines was used for cDNA library construction. The human tissues and cell lines used for cDNA library construction were selected from a broad range of sources to provide a diverse population of cDNAs representative of gene transcription throughout the human body. Descriptions of the human tissues and cell lines used for cDNA library construction are provided in the LIFESEQ database (Incyte Genomics, Inc. (Incyte), Palo Alto CA). Human tissues were broadly selected from, for example, cardiovascular, dermatologic, endocrine, gastrointestinal, hematopoietic/immune system, musculoskeletal, neural, reproductive, and urologic sources.
  • Cell lines used for cDNA library construction were derived from, for example, leukemic cells, teratocarcinomas, neuroepitheliomas, cervical carcinoma, lung fibroblasts, and endothelial cells.
  • Such cell lines include, for example, THP-1, Jurkat, HUVEC, hNT2, WI38, HeLa, and other cell lines commonly used and available from public depositories (American Type Culture Collection, Manassas VA).
  • Prior to mRNA isolation, cell lines were untreated, treated with a pharmaceutical agent such as 5'-aza-2'-deoxycytidine, treated with an activating agent such as lipopolysaccharide in the case of leukocytic cell lines, or, in the case of endothelial cell lines, subjected to shear stress.
  • Chain termination reaction products may be electrophoresed on urea-polyacrylamide gels and detected either by autoradiography (for radioisotope-labeled nucleotides) or by fluorescence (for fluorophore-labeled nucleotides).
  • Machines used to prepare cDNAs for sequencing can include the MICROLAB 2200 liquid transfer system (Hamilton Company (Hamilton), Reno NV), Peltier thermal cycler (PTC200; MJ Research, Inc. (MJ Research), Watertown MA), and ABI CATALYST 800 thermal cycler (Applied Biosystems). Sequencing can be carried out using, for example, the ABI 373 or 377 (Applied Biosystems) or MEGABACE 1000 (Molecular Dynamics, Inc. (Molecular Dynamics), Sunnyvale CA) DNA sequencing systems, or other automated and manual sequencing systems well known in the art.
  • nucleotide sequences of the Sequence Listing have been prepared by current, state-of-the-art, automated methods and, as such, may contain occasional sequencing errors or unidentified nucleotides. Such unidentified nucleotides are designated by an N. These infrequent unidentified bases do not represent a hindrance to practicing the invention for those skilled in the art.
  • Several methods employing standard recombinant techniques may be used to correct errors and complete the missing sequence information. (See, e.g., those described in Ausubel, F.M. et al. (1997) Short
  • Human polynucleotide sequences may be assembled using programs or algorithms well known in the art. Sequences to be assembled are related, wholly or in part, and may be derived from a single or many different transcripts. Assembly of the sequences can be performed using such programs as PHRAP (Phil's Revised Assembly Program) and the GELVIEW fragment assembly system (GCG), or other methods known in the art. Alternatively, cDNA sequences are used as "component" sequences that are assembled into template or "consensus" sequences as follows. Sequence chromatograms are processed and verified, and quality scores are obtained using PHRED. Raw sequences are edited using an editing pathway known as Block 1 (see, e.g., the LIFESEQ Assembled User Guide, Incyte Genomics, Palo Alto, CA). A series of BLAST comparisons is performed, and low-information segments and repetitive elements (e.g., dinucleotide repeats, Alu repeats, etc.) are replaced by "n's", or masked, to prevent spurious matches. Mitochondrial and ribosomal RNA sequences are also removed.
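The masking of repetitive elements can be illustrated with a minimal Python sketch. It handles only simple dinucleotide repeats with a regular expression; the function name `mask_dinucleotide_repeats` and the `min_units` threshold are assumptions made for illustration and are not taken from the Block 1 pathway, which relies on BLAST comparisons.

```python
import re

def mask_dinucleotide_repeats(seq: str, min_units: int = 6) -> str:
    """Replace runs of a repeated dinucleotide (e.g. CACACA...) with 'n'.

    min_units is a hypothetical threshold: the minimum number of
    consecutive copies of the dinucleotide needed before masking.
    Note that homopolymer runs (e.g. poly(A)) also match, because
    "AA" counts as a dinucleotide here.
    """
    # ([ACGT]{2}) captures one dinucleotide; \1{k,} requires k more copies
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_units - 1))
    # Each masked base becomes one 'n', so sequence length is preserved
    return pattern.sub(lambda m: "n" * len(m.group(0)), seq.upper())
```

Because every masked base is replaced one-for-one, coordinates in the masked sequence still correspond to coordinates in the raw read.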
  • the processed sequences are then loaded into a relational database management system (RDMS) which assigns edited sequences to existing templates, if available.
  • a process is initiated which modifies existing templates or creates new templates from works in progress (i.e., nonfinal assembled sequences) containing queued sequences or the sequences themselves.
  • the templates can be merged into bins. If multiple templates exist in one bin, the bin can be split and the templates reannotated.
  • bins are "clone joined" based upon clone information. Clone joining occurs when the 5' sequence of one clone is present in one bin and the 3' sequence from the same clone is present in a different bin, indicating that the two bins should be merged into a single bin. Only bins which share at least two different clones are merged.
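The clone-joining rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not the RDMS implementation: the input layout (a dictionary mapping each bin to a set of `(clone_id, end)` pairs), the function name `clone_join`, and the assumption that each clone end sits in exactly one bin are all hypothetical.

```python
from collections import defaultdict

def clone_join(bins, min_shared_clones=2):
    """bins: dict bin_id -> set of (clone_id, end), end being "5'" or "3'".

    Returns a list of sets of bin_ids that should be merged.
    """
    # For each clone, record which bin holds its 5' read and its 3' read
    ends = defaultdict(dict)
    for bin_id, reads in bins.items():
        for clone_id, end in reads:
            ends[clone_id][end] = bin_id
    # Collect the distinct clones that link each pair of different bins
    links = defaultdict(set)
    for clone_id, where in ends.items():
        if "5'" in where and "3'" in where and where["5'"] != where["3'"]:
            links[frozenset((where["5'"], where["3'"]))].add(clone_id)
    # Union-find over bins, joining only pairs with enough shared clones
    parent = {b: b for b in bins}

    def find(b):
        while parent[b] != b:
            parent[b] = parent[parent[b]]  # path compression
            b = parent[b]
        return b

    for pair, clones in links.items():
        if len(clones) >= min_shared_clones:
            a, b = tuple(pair)
            parent[find(a)] = find(b)
    groups = defaultdict(set)
    for b in bins:
        groups[find(b)].add(b)
    return [g for g in groups.values() if len(g) > 1]
```

The default of two shared clones mirrors the requirement that only bins sharing at least two different clones are merged; a single linking clone is ignored.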
  • a resultant template sequence may contain either a partial or a full length open reading frame, or all or part of a genetic regulatory element. This variation is due in part to the fact that the full length cDNAs of many genes are several hundred, and sometimes several thousand, bases in length. With current technology, cDNAs comprising the coding regions of large genes cannot be cloned because of vector limitations, incomplete reverse transcription of the mRNA, or incomplete "second strand" synthesis. Template sequences may be extended to include additional contiguous sequences derived from the parent RNA transcript using a variety of methods known to those of skill in the art. Extension may thus be used to achieve the full length coding sequence of a gene.
  • cDNA sequences are analyzed using a variety of programs and algorithms which are well known in the art. (See, e.g., Ausubel, 1997, supra, Chapter 1.1; Meyers, R.A. (Ed.) (1995) Molecular Biology and Biotechnology, Wiley VCH, New York NY, pp. 856-853; and Table 8.) These analyses comprise reading frame determinations, e.g., based on triplet codon periodicity for particular organisms (Fickett, J.W. (1982) Nucleic Acids Res. 10:5303-5318); analyses of potential start and stop codons; and homology searches.
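As a simple illustration of the start and stop codon analysis, the following Python sketch scans the three forward reading frames for the longest ATG-to-stop open reading frame. It is a minimal example only: it does not implement the codon periodicity test of Fickett, it ignores the reverse strand, and the function name `longest_orf` is an assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq: str):
    """Return (frame, start, end) of the longest forward-strand ORF.

    Coordinates are 0-based; end is exclusive and includes the stop
    codon. Returns None when no ATG...stop ORF is found.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[2] - best[1]):
                    best = (frame, start, i + 3)
                start = None  # look for the next ORF in this frame
    return best
```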
  • BLAST is especially useful in determining exact matches and comparing two sequence fragments of arbitrary but equal lengths, whose alignment is locally maximal and for which the alignment score meets or exceeds a threshold or cutoff score set by the user (Karlin, S. et al. (1988) Proc. Natl. Acad. Sci. USA 85:841-845).
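The idea of a locally maximal, equal-length segment pair whose score must reach a cutoff can be sketched as below. This is not the BLAST heuristic (there is no word seeding or statistical evaluation); it is an exhaustive scan of every diagonal, and the scoring values, the default cutoff, and the name `best_ungapped_segment` are illustrative assumptions.

```python
def best_ungapped_segment(a: str, b: str, match=5, mismatch=-4, cutoff=20):
    """Return (score, start_a, start_b, length) of the best ungapped local
    alignment, or None if no segment pair reaches the cutoff."""
    best = (0, 0, 0, 0)
    # Each diagonal offset d pairs a[i] with b[i + d]
    for d in range(-(len(a) - 1), len(b)):
        i = max(0, -d)
        run_score, run_start = 0, i
        while i < len(a) and i + d < len(b):
            run_score += match if a[i] == b[i + d] else mismatch
            if run_score <= 0:
                # Restart: a non-positive prefix cannot improve a local segment
                run_score, run_start = 0, i + 1
            elif run_score > best[0]:
                best = (run_score, run_start, run_start + d, i - run_start + 1)
            i += 1
    return best if best[0] >= cutoff else None
```

Raising `cutoff` makes the search stricter, in the same way that a user-set threshold score suppresses weak matches.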
  • an appropriate search tool, e.g., BLAST or HMM
  • GenBank, SwissProt, BLOCKS, PFAM and other databases may be searched for sequences containing regions of homology to a query dithp or DITHP of the present invention.
  • Protein hierarchies can be assigned to the putative encoded polypeptide based on, e.g., motif, BLAST, or biological analysis. Methods for assigning these hierarchies are described, for example, in "Database System Employing Protein Function Hierarchies for Viewing Biomolecular Sequence Data," U.S.S.N. 08/812,290, filed March 6, 1997, incorporated herein by reference.
  • … encoded by SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, and SEQ ID NO:16, respectively, are, for example, human enzyme molecules.
  • SEQ ID NO:292, SEQ ID NO:293, SEQ ID NO:294, SEQ ID NO:295, and SEQ ID NO:296, encoded by SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, and SEQ ID NO:21, respectively, are, for example, extracellular information transmission molecules.
  • SEQ ID NO:297, SEQ ID NO:298, SEQ ID NO:299, SEQ ID NO:300, SEQ ID NO:301, and SEQ ID NO:302, encoded by SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, and SEQ ID NO:27, respectively, are, for example, receptor molecules.
  • SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, and SEQ ID NO:43, respectively, are, for example, intracellular signaling molecules.
  • SEQ ID NO:403, SEQ ID NO:404, and SEQ ID NO:405, encoded by SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124, SEQ ID NO:125, SEQ ID NO:126, SEQ ID NO:127, SEQ ID NO:128, and SEQ ID NO:129, respectively, are, for example, membrane transport molecules.
  • SEQ ID NO:436, encoded by SEQ ID NO:160, is, for example, an adhesion molecule.
  • SEQ ID NO:437, SEQ ID NO:438, and SEQ ID NO:439, encoded by SEQ ID NO:161, SEQ ID NO:162, and SEQ ID NO:163, respectively, are, for example, antigen recognition molecules.
  • SEQ ID NO:164, SEQ ID NO:165, SEQ ID NO:166, and SEQ ID NO:167, respectively, are, for example, electron transfer associated molecules.
  • SEQ ID NO:444, SEQ ID NO:445, SEQ ID NO:446, SEQ ID NO:447, SEQ ID NO:448, and SEQ ID NO:449, encoded by SEQ ID NO:168, SEQ ID NO:169, SEQ ID NO:170, SEQ ID NO:171, SEQ ID NO:172, and SEQ ID NO:173, respectively, are, for example, secreted/extracellular matrix molecules.
  • SEQ ID NO:462, SEQ ID NO:463, SEQ ID NO:464, SEQ ID NO:465, SEQ ID NO:466, SEQ ID NO:467, SEQ ID NO:468, SEQ ID NO:469, SEQ ID NO:470, and SEQ ID NO:471, encoded by SEQ ID NO:186, SEQ ID NO:187, SEQ ID NO:188, SEQ ID NO:189, SEQ ID NO:190, SEQ ID NO:191, SEQ ID NO:192, SEQ ID NO:193, SEQ ID NO:194, and SEQ ID NO:195, respectively, are, for example, cell membrane molecules.
  • SEQ ID NO:243, and SEQ ID NO:244, respectively, are, for example, chromatin molecules.
  • SEQ ID NO:273, SEQ ID NO:274, and SEQ ID NO:275, respectively, are, for example, molecules associated with growth and development.
  • the dithp of the present invention may be used for a variety of diagnostic and therapeutic purposes.
  • a dithp may be used to diagnose a particular condition, disease, or disorder associated with human molecules.
  • Such conditions, diseases, and disorders include, but are not limited to, a cell proliferative disorder, such as actinic keratosis, arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease (MCTD), myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis, primary thrombocythemia, and cancers including adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and, in particular, a cancer of the adrenal gland, bladder, bone, bone marrow, brain, breast
  • the dithp can be used to detect the presence of, or to quantify the amount of, a dithp-related polynucleotide in a sample. This information is then compared to information obtained from appropriate reference samples, and a diagnosis is established.
  • a polynucleotide complementary to a given dithp can inhibit or inactivate a therapeutically relevant gene related to the dithp.
  • the expression of dithp may be routinely assessed by hybridization-based methods to determine, for example, the tissue-specificity, disease-specificity, or developmental stage-specificity of dithp expression.
  • the level of expression of dithp may be compared among different cell types or tissues, among diseased and normal cell types or tissues, among cell types or tissues at different developmental stages, or among cell types or tissues undergoing various treatments.
  • This type of analysis is useful, for example, to assess the relative levels of dithp expression in fully or partially differentiated cells or tissues, to determine if changes in dithp expression levels are correlated with the development or progression of specific disease states, and to assess the response of a cell or tissue to a specific therapy, for example, in pharmacological or toxicological studies.
  • Methods for the analysis of dithp expression are based on hybridization and amplification technologies and include membrane-based procedures such as northern blot analysis, high-throughput procedures that utilize, for example, microarrays, and PCR-based procedures.
  • the dithp, their fragments, or complementary sequences may be used to identify the presence of and/or to determine the degree of similarity between two (or more) nucleic acid sequences.
  • the dithp may be hybridized to naturally occurring or recombinant nucleic acid sequences under appropriately selected temperatures and salt concentrations.
  • Hybridization with a probe based on the nucleic acid sequence of at least one of the dithp allows for the detection of nucleic acid sequences, including genomic sequences, which are identical or related to the dithp of the Sequence Listing.
  • Probes may be selected from non-conserved or unique regions of at least one of the polynucleotides of SEQ ID NO:1-275 and tested for their ability to identify or amplify the target nucleic acid sequence using standard protocols.
  • Polynucleotide sequences that are capable of hybridizing, in particular, to those shown in SEQ ID NO:1-275 and fragments thereof, can be identified using various conditions of stringency. (See, e.g., Wahl, G.M. and S.L. Berger (1987) Methods Enzymol. 152:399-407; Kimmel, A.R. (1987) Methods Enzymol. 152:507-511.) Hybridization conditions are discussed in "Definitions."
  • a probe for use in Southern or northern hybridization may be derived from a fragment of a dithp sequence, or its complement, that is up to several hundred nucleotides in length and is either single-stranded or double-stranded. Such probes may be hybridized in solution to biological materials such as plasmids, bacterial, yeast, or human artificial chromosomes, cleared or sectioned tissues, or to artificial substrates containing dithp. Microarrays are particularly suitable for identifying the presence of and detecting the level of expression for multiple genes of interest by examining gene expression correlated with, e.g., various stages of development, treatment with a drug or compound, or disease progression.
  • An array analogous to a dot or slot blot may be used to arrange and link polynucleotides to the surface of a substrate using one or more of the following: mechanical (vacuum), chemical, thermal, or UV bonding procedures.
  • Such an array may contain any number of dithp and may be produced by hand or by using available devices, materials, and machines.
  • Microarrays may be prepared, used, and analyzed using methods known in the art. (See, e.g.,
  • Probes may be labeled by either PCR or enzymatic techniques using a variety of commercially available reporter molecules.
  • commercial kits are available for radioactive and chemiluminescent labeling (Amersham Pharmacia Biotech) and for alkaline phosphatase labeling (Life Technologies).
  • dithp may be cloned into commercially available vectors for the production of RNA probes.
  • Such probes may be transcribed in the presence of at least one labeled nucleotide (e.g., 32P-ATP, Amersham Pharmacia Biotech).
  • polynucleotides of SEQ ID NO:1-275 or suitable fragments thereof can be used to isolate full length cDNA sequences utilizing hybridization and/or amplification procedures well known in the art, e.g., cDNA library screening, PCR amplification, etc.
  • the molecular cloning of such full length cDNA sequences may employ the method of cDNA library screening with probes using the hybridization, stringency, washing, and probing strategies described above and in Ausubel, supra, Chapters 3, 5, and 6.
  • These procedures may also be employed with genomic libraries to isolate genomic sequences of dithp in order to analyze, e.g., regulatory elements.
  • Gene identification and mapping are important in the investigation and treatment of almost all conditions, diseases, and disorders. Cancer, cardiovascular disease, Alzheimer's disease, arthritis, diabetes, and mental illnesses are of particular interest. Each of these conditions is more complex than the single gene defects of sickle cell anemia or cystic fibrosis, with select groups of genes being predictive of predisposition for a particular condition, disease, or disorder.
  • cardiovascular disease may result from malfunctioning receptor molecules that fail to clear cholesterol from the bloodstream
  • diabetes may result when a particular individual's immune system is activated by an infection and attacks the insulin-producing cells of the pancreas.
  • Alzheimer's disease has been linked to a gene on chromosome 21; other studies predict a different gene and location. Mapping of disease genes is a complex and reiterative process and generally proceeds from genetic linkage analysis to physical mapping.
  • a genetic linkage map traces parts of chromosomes that are inherited in the same pattern as the condition.
  • Statistics link the inheritance of particular conditions to particular regions of chromosomes, as defined by RFLP or other markers.
  • RFLP: restriction fragment length polymorphism
  • markers and their locations are known from previous studies. More often, however, the markers are simply stretches of DNA that differ among individuals. Examples of genetic linkage maps can be found in various scientific journals or at the Online Mendelian Inheritance in Man (OMIM) World Wide Web site.
  • dithp sequences may be used to generate hybridization probes useful in chromosomal mapping of naturally occurring genomic sequences. Either coding or noncoding sequences of dithp may be used, and in some instances, noncoding sequences may be preferable over coding sequences. For example, conservation of a dithp coding sequence among members of a multi-gene family may potentially cause undesired cross hybridization during chromosomal mapping.
  • sequences may be mapped to a particular chromosome, to a specific region of a chromosome, or to artificial chromosome constructions, e.g., human artificial chromosomes (HACs), yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs), bacterial P1 constructions, or single chromosome cDNA libraries.
  • Fluorescent in situ hybridization may be correlated with other physical chromosome mapping techniques and genetic map data. (See, e.g., Meyers, supra, pp. 965-968.) Correlation between the location of dithp on a physical chromosomal map and a specific disorder, or a predisposition to a specific disorder, may help define the region of DNA associated with that disorder.
  • the dithp sequences may also be used to detect polymorphisms that are genetically linked to the inheritance of a particular condition, disease, or disorder.
  • In situ hybridization of chromosomal preparations and genetic mapping techniques may be used for extending existing genetic maps. Often the placement of a gene on the chromosome of another mammalian species, such as mouse, may reveal associated markers even if the number or arm of the corresponding human chromosome is not known. These new marker sequences can be mapped to human chromosomes and may provide valuable information to investigators searching for disease genes using positional cloning or other gene discovery techniques.
  • any sequences mapping to that area may represent associated or regulatory genes for further investigation.
  • the nucleotide sequences of the subject invention may also be used to detect differences in chromosomal architecture due to translocation, inversion, etc., among normal, carrier, or affected individuals.
  • Once a disease-associated gene is mapped to a chromosomal region, the gene must be cloned in order to identify mutations or other alterations (e.g., translocations or inversions) that may be correlated with disease.
  • This process requires a physical map of the chromosomal region containing the disease-gene of interest along with associated markers. A physical map is necessary for determining the nucleotide sequence of and order of marker genes on a particular chromosomal region. Physical mapping techniques are well known in the art and require the generation of overlapping sets of cloned DNA fragments from a particular organelle, chromosome, or genome. These clones are analyzed to reconstruct and catalog their order. Once the position of a marker is determined, the DNA from that region is obtained by consulting the catalog and selecting clones from that region. The gene of interest is located through positional cloning techniques using hybridization or similar methods.
  • the dithp of the present invention may be used to design probes useful in diagnostic assays. Such assays, well known to those skilled in the art, may be used to detect or confirm conditions, disorders, or diseases associated with abnormal levels of dithp expression. Labeled probes developed from dithp sequences are added to a sample under hybridizing conditions of desired stringency. In some instances, dithp, or fragments or oligonucleotides derived from dithp, may be used as primers in amplification steps prior to hybridization. The amount of hybridization complex formed is quantified and compared with standards for that cell or tissue. If dithp expression varies significantly from the standard, the assay indicates the presence of the condition, disorder, or disease.
  • Qualitative or quantitative diagnostic methods may include northern, dot blot, or other membrane or dip-stick based technologies or multiple-sample format technologies such as PCR, enzyme-linked immunosorbent assay (ELISA)-like, pin, or chip-based assays.
  • the probes described above may also be used to monitor the progress of conditions, disorders, or diseases associated with abnormal levels of dithp expression, or to evaluate the efficacy of a particular therapeutic treatment.
  • the candidate probe may be identified from the dithp that are specific to a given human tissue and have not been observed in GenBank or other genome databases. Such a probe may be used in animal studies, preclinical tests, clinical trials, or in monitoring the treatment of an individual patient.
  • standard expression is established by methods well known in the art for use as a basis of comparison, samples from patients affected by the disorder or disease are combined with the probe to evaluate any deviation from the standard profile, and a therapeutic agent is administered and effects are monitored to generate a treatment profile. Efficacy is evaluated by determining whether the expression progresses toward or returns to the standard normal pattern. Treatment profiles may be generated over a period of several days or several months. Statistical methods well known to those skilled in the art may be used to determine the significance of such therapeutic agents.
  • the polynucleotides are also useful for identifying individuals from minute biological samples, for example, by matching the RFLP pattern of a sample's DNA to that of an individual's DNA.
  • the polynucleotides of the present invention can also be used to determine the actual base-by-base DNA sequence of selected portions of an individual's genome. These sequences can be used to prepare PCR primers for amplifying and isolating such selected DNA, which can then be sequenced. Using this technique, an individual can be identified through a unique set of DNA sequences. Once a unique ID database is established for an individual, positive identification of that individual can be made from extremely small tissue samples.
  • oligonucleotide primers derived from the dithp of the invention may be used to detect single nucleotide polymorphisms (SNPs).
  • SNPs are substitutions, insertions and deletions that are a frequent cause of inherited or acquired genetic disease in humans.
  • Methods of SNP detection include, but are not limited to, single-stranded conformation polymorphism (SSCP) and fluorescent SSCP (fSSCP) methods.
  • oligonucleotide primers derived from dithp are used to amplify DNA using the polymerase chain reaction (PCR).
  • the DNA may be derived, for example, from diseased or normal tissue, biopsy samples, bodily fluids, and the like.
  • SNPs in the DNA cause differences in the secondary and tertiary structures of PCR products in single-stranded form, and these differences are detectable using gel electrophoresis in non-denaturing gels.
  • the oligonucleotide primers are fluorescently labeled, which allows detection of the amplimers in high-throughput equipment such as DNA sequencing machines.
  • sequence database analysis methods termed in silico SNP (isSNP) are capable of identifying polymorphisms by comparing the sequences of individual overlapping DNA fragments which assemble into a common consensus sequence. These computer-based methods filter out sequence variations due to laboratory preparation of DNA and sequencing errors using statistical models and automated analyses of DNA sequence chromatograms.
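A minimal Python sketch of the in silico comparison follows. It assumes the overlapping fragments have already been aligned to a common coordinate system and that each base carries a PHRED-style quality score; the quality cutoff, the minimum read support for the minor allele, and the function name `call_candidate_snps` are illustrative assumptions rather than the statistical models referred to above.

```python
from collections import Counter

def call_candidate_snps(reads, min_quality=20, min_minor_reads=2):
    """reads: list of (offset, bases, qualities) in consensus coordinates.

    Returns {position: Counter of alleles} for columns in which a second
    allele is supported by enough high-quality base calls.
    """
    columns = {}
    for offset, bases, quals in reads:
        for k, (base, q) in enumerate(zip(bases, quals)):
            # Low-quality calls are treated as likely sequencing errors
            if q >= min_quality and base in "ACGT":
                columns.setdefault(offset + k, Counter())[base] += 1
    snps = {}
    for pos, counts in columns.items():
        ranked = counts.most_common(2)
        # Require a second allele seen in at least min_minor_reads reads
        if len(ranked) == 2 and ranked[1][1] >= min_minor_reads:
            snps[pos] = counts
    return snps
```

Requiring more than one supporting read for the minor allele is one simple way to separate a recurring variant from an isolated base-calling error.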
  • SNPs may be detected and characterized by mass spectrometry using, for example, the high throughput MASSARRAY system (Sequenom, Inc., San Diego CA).
  • DNA-based identification techniques are critical in forensic technology.
  • DNA sequences taken from very small biological samples such as tissues, e.g., hair or skin, or body fluids, e.g., blood, saliva, semen, etc.
  • PCR (see, e.g., PCR Technology, Freeman and Co., New York, NY).
  • polynucleotides of the present invention can be used as polymorphic markers.
  • reagents capable of identifying the source of a particular tissue.
  • Appropriate reagents can comprise, for example, DNA probes or primers prepared from the sequences of the present invention that are specific for particular tissues. Panels of such reagents can identify tissue by species and/or by organ type. In a similar fashion, these reagents can be used to screen tissue cultures for contamination.
  • polynucleotides of the present invention can also be used as molecular weight markers on nucleic acid gels or Southern blots, as diagnostic probes for the presence of a specific mRNA in a particular cell type, in the creation of subtracted cDNA libraries which aid in the discovery of novel polynucleotides, in selection and synthesis of oligomers for attachment to an array or other support, and as an antigen to elicit an immune response.
  • the dithp of the invention or their mammalian homologs may be "knocked out" in an animal model system using homologous recombination in embryonic stem (ES) cells.
  • Such techniques are well known in the art and are useful for the generation of animal models of human disease.
  • mouse ES cells, such as the mouse 129/SvJ cell line, are derived from the early mouse embryo and grown in culture.
  • the ES cells are transformed with a vector containing the gene of interest disrupted by a marker gene, e.g., the neomycin phosphotransferase gene (neo; Capecchi, M.R. (1989) Science 244:1288-1292).
  • the vector integrates into the corresponding region of the host genome by homologous recombination.
  • homologous recombination takes place using the Cre-loxP system to knock out a gene of interest in a tissue- or developmental stage-specific manner (Marth, J.D. (1996) J. Clin. Invest. 97:1999-2002; Wagner, K.U. et al. (1997) Nucleic Acids Res. 25:4323-4330).
  • Transformed ES cells are identified and microinjected into mouse cell blastocysts such as those from the C57BL/6 mouse strain.
  • the blastocysts are surgically transferred to pseudopregnant dams, and the resulting chimeric progeny are genotyped and bred to produce heterozygous or homozygous strains.
  • Transgenic animals thus generated may be tested with potential therapeutic or toxic agents.
  • the dithp of the invention may also be manipulated in vitro in ES cells derived from human blastocysts.
  • Human ES cells have the potential to differentiate into at least eight separate cell lineages including endoderm, mesoderm, and ectodermal cell types. These cell lineages differentiate into, for example, neural cells, hematopoietic lineages, and cardiomyocytes (Thomson, J.A. et al. (1998) Science 282:1145-1147).
  • the dithp of the invention can also be used to create "knockin" humanized animals (pigs) or transgenic animals (mice or rats) to model human disease.
  • With knockin technology, a region of dithp is injected into animal ES cells, and the injected sequence integrates into the animal cell genome.
  • Transformed cells are injected into blastulae, and the blastulae are implanted as described above.
  • Transgenic progeny or inbred lines are studied and treated with potential pharmaceutical agents to obtain information on treatment of a human disease.
  • a mammal inbred to overexpress dithp resulting, e.g., in the secretion of DITHP in its milk, may also serve as a convenient source of that protein (Janne, J. et al. (1998) Biotechnol. Annu. Rev. 4:55-74).
  • DITHP encoded by polynucleotides of the present invention may be used to screen for molecules that bind to or are bound by the encoded polypeptides.
  • the binding of the polypeptide and the molecule may activate (agonist), increase, inhibit (antagonist), or decrease activity of the polypeptide or the bound molecule.
  • Examples of such molecules include antibodies, oligonucleotides, proteins (e.g., receptors), or small molecules.
  • the molecule is closely related to the natural ligand of the polypeptide, e.g., a ligand or fragment thereof, a natural substrate, or a structural or functional mimetic.
  • the molecule can be closely related to the natural receptor to which the polypeptide binds, or to at least a fragment of the receptor, e.g., the active site.
  • the molecule can be rationally designed using known techniques.
  • the screening for these molecules involves producing appropriate cells which express the polypeptide, either as a secreted protein or on the cell membrane.
  • Preferred cells include cells from mammals, yeast, Drosophila, or E. coli. Cells expressing the polypeptide or cell membrane fractions which contain the expressed polypeptide are then contacted with a test compound and binding, stimulation, or inhibition of activity of either the polypeptide or the molecule is analyzed.
  • An assay may simply test binding of a candidate compound to the polypeptide, wherein binding is detected by a fluorophore, radioisotope, enzyme conjugate, or other detectable label. Alternatively, the assay may assess binding in the presence of a labeled competitor.
  • the assay can be carried out using cell-free preparations, polypeptide/molecule affixed to a solid support, chemical libraries, or natural product mixtures.
  • the assay may also simply comprise the steps of mixing a candidate compound with a solution containing a polypeptide, measuring polypeptide/molecule activity or binding, and comparing the polypeptide/molecule activity or binding to a standard.
  • an ELISA assay using, e.g., a monoclonal or polyclonal antibody can measure polypeptide level in a sample.
  • the antibody can measure polypeptide level by either binding, directly or indirectly, to the polypeptide or by competing with the polypeptide for a substrate.
  • All of the above assays can be used in a diagnostic or prognostic context.
  • the molecules discovered using these assays can be used to treat disease or to bring about a particular result in a patient (e.g., blood vessel growth) by activating or inhibiting the polypeptide/molecule.
  • the assays can discover agents which may inhibit or enhance the production of the polypeptide from suitably manipulated cells or tissues.
  • a transcript image represents the global pattern of gene expression by a particular tissue or cell type. Global gene expression patterns are analyzed by quantifying the number of expressed genes and their relative abundance under given conditions and at a given time. (See Seilhamer et al., "Comparative Gene Transcript Analysis," U.S. Patent Number 5,840,484, expressly incorporated by reference herein.)
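The two quantities named above, the number of expressed genes and their relative abundance, can be computed from per-gene transcript counts as in this minimal Python sketch. The input format (a dictionary of counts) and the function name `transcript_image` are assumptions for illustration and are not drawn from the cited patent.

```python
def transcript_image(counts):
    """counts: dict gene -> number of transcripts observed for that gene.

    Returns (number of expressed genes, dict gene -> relative abundance).
    """
    # A gene counts as expressed if at least one transcript was observed
    expressed = {g: n for g, n in counts.items() if n > 0}
    total = sum(expressed.values())
    # Relative abundance is each gene's share of all observed transcripts
    abundance = {g: n / total for g, n in expressed.items()}
    return len(expressed), abundance
```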
  • a transcript image may be generated by hybridizing the polynucleotides of the present invention or their complements to the totality of transcripts or reverse transcripts of a particular tissue or cell type.
  • the hybridization takes place in high-throughput format, wherein the polynucleotides of the present invention or their complements comprise a subset of a plurality of elements on a microarray.
  • the resultant transcript image would provide a profile of gene activity pertaining to human molecules for diagnostics and therapeutics.
  • Transcript images which profile dithp expression may be generated using transcripts isolated from tissues, cell lines, biopsies, or other biological samples.
  • the transcript image may thus reflect dithp expression in vivo, as in the case of a tissue or biopsy sample, or in vitro, as in the case of a cell line.
  • Transcript images which profile dithp expression may also be used in conjunction with in vitro model systems and preclinical evaluation of pharmaceuticals, as well as toxicological testing of industrial and naturally-occurring environmental compounds. All compounds induce characteristic gene expression patterns, frequently termed molecular fingerprints or toxicant signatures, which are indicative of mechanisms of action and toxicity (Nuwaysir, E. F. et al. (1999) Mol. Carcinog. 24:153-159; Steiner, S. and Anderson, N.L. (2000) Toxicol. Lett. 112-113:467-71, expressly incorporated by reference herein). If a test compound has a signature similar to that of a compound with known toxicity, it is likely to share those toxic properties.
  • the toxicity of a test compound is assessed by treating a biological sample containing nucleic acids with the test compound.
  • Nucleic acids that are expressed in the treated biological sample are hybridized with one or more probes specific to the polynucleotides of the present invention, so that transcript levels corresponding to the polynucleotides of the present invention may be quantified.
  • the transcript levels in the treated biological sample are compared with levels from an untreated biological sample. Differences in the transcript levels between the two samples are indicative of a toxic response caused by the test compound in the treated sample.
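The treated-versus-untreated comparison described above might be sketched as a log2 fold-change calculation. The two-fold threshold, the pseudocount, and the transcript identifiers are assumptions made for illustration; the text does not specify how a difference is scored.

```python
import math

def flag_responsive_transcripts(treated, untreated,
                                min_abs_log2_fc=1.0, pseudocount=1.0):
    """Return {transcript: log2 fold change} for transcripts whose level changed.

    treated / untreated map transcript IDs to quantified levels.
    The threshold and pseudocount are illustrative assumptions.
    """
    flagged = {}
    for transcript, treated_level in treated.items():
        untreated_level = untreated.get(transcript)
        if untreated_level is None:
            continue  # transcript not measured in the untreated sample
        log2_fc = math.log2((treated_level + pseudocount) /
                            (untreated_level + pseudocount))
        if abs(log2_fc) >= min_abs_log2_fc:
            flagged[transcript] = log2_fc
    return flagged

# Hypothetical quantified transcript levels.
treated = {"dithp_1": 799.0, "dithp_2": 99.0, "dithp_3": 24.0}
untreated = {"dithp_1": 199.0, "dithp_2": 99.0, "dithp_3": 99.0}
changes = flag_responsive_transcripts(treated, untreated)
```

Here `dithp_1` is induced four-fold and `dithp_3` is repressed four-fold, while `dithp_2` is unchanged and is not flagged.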
  • proteome refers to the global pattern of protein expression in a particular tissue or cell type.
  • proteome expression patterns, or profiles are analyzed by quantifying the number of expressed proteins and their relative abundance under given conditions and at a given time.
  • a profile of a cell's proteome may thus be generated by separating and analyzing the polypeptides of a particular tissue or cell type.
  • the separation is achieved using two-dimensional gel electrophoresis, in which proteins from a sample are separated by isoelectric focusing in the first dimension, and then according to molecular weight by sodium dodecyl sulfate slab gel electrophoresis in the second dimension (Steiner and Anderson, supra).
  • the proteins are visualized in the gel as discrete and uniquely positioned spots, typically by staining the gel with an agent such as Coomassie Blue or silver or fluorescent stains.
  • the optical density of each protein spot is generally proportional to the level of the protein in the sample.
  • the optical densities of equivalently positioned protein spots from different samples, for example, from biological samples either treated or untreated with a test compound or therapeutic agent, are compared to identify any changes in protein spot density related to the treatment.
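As a rough sketch of the spot comparison, the optical densities of each gel can be normalized to the gel total and then compared position by position. Keying spots by an (isoelectric point, molecular weight) pair, normalizing to the gel total, and the two-fold cutoff are all assumptions for illustration.

```python
def normalize(spot_densities):
    """Scale spot optical densities so that the spots of one gel sum to 1."""
    total = sum(spot_densities.values())
    return {spot: od / total for spot, od in spot_densities.items()}

def changed_spots(treated_gel, untreated_gel, min_ratio=2.0):
    """Return {spot: treated/untreated ratio} for equivalently positioned spots
    whose normalized density changed by at least min_ratio in either direction."""
    treated, untreated = normalize(treated_gel), normalize(untreated_gel)
    changed = {}
    for spot in treated.keys() & untreated.keys():
        if untreated[spot] == 0:
            continue  # ratio undefined; such spots need separate handling
        ratio = treated[spot] / untreated[spot]
        if ratio >= min_ratio or ratio <= 1.0 / min_ratio:
            changed[spot] = ratio
    return changed

# Hypothetical spots keyed by (pI, molecular weight in kDa).
untreated_gel = {(5.2, 42): 0.5, (6.8, 66): 0.25, (7.4, 18): 0.25}
treated_gel = {(5.2, 42): 0.25, (6.8, 66): 0.25, (7.4, 18): 0.5}
spot_changes = changed_spots(treated_gel, untreated_gel)
```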
  • the proteins in the spots are partially sequenced using, for example, standard methods employing chemical or enzymatic cleavage followed by mass spectrometry.
  • the identity of the protein in a spot may be determined by comparing its partial sequence, preferably of at least 5 contiguous amino acid residues, to the polypeptide sequences of the present invention. In some cases, further sequence data may be obtained for definitive protein identification.
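The matching of a partial sequence to known polypeptide sequences can be sketched as an exact substring search that enforces the minimum of five contiguous residues. Exact matching, the function name, and the example sequences are assumptions; a practical search would likely tolerate sequencing ambiguity.

```python
def identify_protein(partial_sequence, known_polypeptides, min_length=5):
    """Return names of known polypeptides containing the partial sequence.

    partial_sequence must span at least min_length contiguous residues.
    known_polypeptides maps a name to its amino acid sequence (hypothetical data).
    """
    if len(partial_sequence) < min_length:
        raise ValueError(
            f"partial sequence must have at least {min_length} residues")
    return [name for name, sequence in known_polypeptides.items()
            if partial_sequence in sequence]

# Hypothetical polypeptide sequences, not taken from the Sequence Listing.
known = {
    "polypeptide_1": "MKTAYIAKQRQISFVKSHFSRQ",
    "polypeptide_2": "MGSSHHHHHHSSGLVPRGSH",
}
matches = identify_protein("AKQRQ", known)
```

A five-residue match can occur by chance in a large sequence set, which is consistent with the statement that further sequence data may be needed for definitive identification.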
  • a proteomic profile may also be generated using antibodies specific for DITHP to quantify the levels of DITHP expression.
  • the antibodies are used as elements on a microarray, and protein expression levels are quantified by exposing the microarray to the sample and detecting the levels of protein bound to each array element (Lueking, A. et al. (1999) Anal. Biochem. 270:103-11; Mendoza, L.G. et al. (1999) Biotechniques 27:778-88). Detection may be performed by a variety of methods known in the art, for example, by reacting the proteins in the sample with a thiol- or amino-reactive fluorescent compound and detecting the amount of fluorescence bound at each array element.
  • Toxicant signatures at the proteome level are also useful for toxicological screening, and should be analyzed in parallel with toxicant signatures at the transcript level.
  • There is a poor correlation between transcript and protein abundances for some proteins in some tissues (Anderson, N.L. and Seilhamer, J. (1997) Electrophoresis 18:533-537), so proteome toxicant signatures may be useful in the analysis of compounds which do not significantly affect the transcript image, but which alter the proteomic profile.
  • the analysis of transcripts in body fluids is difficult, due to rapid degradation of mRNA, so proteomic profiling may be more reliable and informative in such cases.
  • the toxicity of a test compound is assessed by treating a biological sample containing proteins with the test compound. Proteins that are expressed in the treated biological sample are separated so that the amount of each protein can be quantified. The amount of each protein is compared to the amount of the corresponding protein in an untreated biological sample. A difference in the amount of protein between the two samples is indicative of a toxic response to the test compound in the treated sample. Individual proteins are identified by sequencing the amino acid residues of the individual proteins and comparing these partial sequences to the DITHP encoded by polynucleotides of the present invention. In another embodiment, the toxicity of a test compound is assessed by treating a biological sample containing proteins with the test compound.
  • Proteins from the biological sample are incubated with antibodies specific to the DITHP encoded by polynucleotides of the present invention. The amount of protein recognized by the antibodies is quantified. The amount of protein in the treated biological sample is compared with the amount in an untreated biological sample. A difference in the amount of protein between the two samples is indicative of a toxic response to the test compound in the treated sample.
  • Transcript images may be used to profile dithp expression in distinct tissue types. This process can be used to determine human molecule activity in a particular tissue type relative to this activity in a different tissue type. Transcript images may be used to generate a profile of dithp expression characteristic of diseased tissue. Transcript images of tissues before and after treatment may be used for diagnostic purposes, to monitor the progression of disease, and to monitor the efficacy of drug treatments for diseases which affect the activity of human molecules. Transcript images of cell lines can be used to assess human molecule activity and/or to identify cell lines that lack or misregulate this activity. Such cell lines may then be treated with pharmaceutical agents, and a transcript image following treatment may indicate the efficacy of these agents in restoring desired levels of this activity. A similar approach may be used to assess the toxicity of pharmaceutical agents as reflected by undesirable changes in human molecule activity. Candidate pharmaceutical agents may be evaluated by comparing their associated transcript images with those of pharmaceutical agents of known effectiveness.
  • the polynucleotides of the present invention are useful in antisense technology.
  • Antisense technology or therapy relies on the modulation of expression of a target protein through the specific binding of an antisense sequence to a target sequence encoding the target protein or directing its expression.
  • Agrawal, S., ed. (1996) Antisense Therapeutics, Humana Press Inc., Totowa NJ; Alama, A. et al. (1997) Pharmacol. Res. 36(3):171-178; Crooke, S.T. (1997) Adv. Pharmacol. 40:1-49; Sharma, H.W. and R. Narayanan (1995) Bioessays 17(12):1055-1063; and Lavrosky, Y.
  • An antisense sequence is a polynucleotide sequence capable of specifically hybridizing to at least a portion of the target sequence. Antisense sequences bind to cellular mRNA and/or genomic DNA, affecting translation and/or transcription. Antisense sequences can be DNA, RNA, or nucleic acid mimics and analogs.
  • the binding which results in modulation of expression occurs through hybridization or binding of complementary base pairs.
  • Antisense sequences can also bind to DNA duplexes through specific interactions in the major groove of the double helix.
  • the polynucleotides of the present invention and fragments thereof can be used as antisense sequences to modify the expression of the polypeptide encoded by dithp.
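Because an antisense sequence hybridizes to its target through complementary base pairing, a DNA antisense sequence against a chosen region of a target can be sketched as the reverse complement of that region. The function name, the region coordinates, and the example target are hypothetical; the sketch ignores chemistry, secondary structure, and specificity considerations.

```python
# Watson-Crick complements for DNA (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def antisense_dna(target, start, length):
    """Return the DNA antisense (reverse complement) of target[start:start+length].

    target is the sense-strand DNA sequence written 5' to 3';
    the result is also written 5' to 3'.
    """
    region = target[start:start + length]
    return region.translate(_COMPLEMENT)[::-1]

# Hypothetical target sequence, not taken from the Sequence Listing.
oligo = antisense_dna("ATGGCCAAGT", 0, 6)
```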
  • the antisense sequences can be produced ex vivo, such as by using any of the ABI nucleic acid synthesizer series (Applied Biosystems) or other automated systems known in the art.
  • Antisense sequences can also be produced biologically, such as by transforming an appropriate host cell with an expression vector containing the sequence of interest. (See, e.g., Agrawal, supra.) In therapeutic use, any gene delivery system suitable for introduction of the antisense sequences into appropriate target cells can be used.
  • Antisense sequences can be delivered intracellularly in the form of an expression plasmid which, upon transcription, produces a sequence complementary to at least a portion of the cellular sequence encoding the target protein.
  • Antisense sequences can also be introduced intracellularly through the use of viral vectors, such as retrovirus and adeno-associated virus vectors.
  • the nucleotide sequences encoding DITHP or fragments thereof may be inserted into an appropriate expression vector, i.e., a vector which contains the necessary elements for transcriptional and translational control of the inserted coding sequence in a suitable host.
  • Methods which are well known to those skilled in the art may be used to construct expression vectors containing sequences encoding DITHP and appropriate transcriptional and translational control elements. These methods include in vitro recombinant DNA techniques, synthetic techniques, and in vivo genetic recombination. (See, e.g., Sambrook, supra. Chapters 4, 8, 16, and 17; and Ausubel, supra. Chapters 9, 10, 13, and 16.)
  • a variety of expression vector/host systems may be utilized to contain and express sequences encoding DITHP. These include, but are not limited to, microorganisms such as bacteria transformed with recombinant bacteriophage, plasmid, or cosmid DNA expression vectors; yeast transformed with yeast expression vectors; insect cell systems infected with viral expression vectors (e.g., baculovirus); plant cell systems transformed with viral expression vectors (e.g., cauliflower mosaic virus, CaMV, or tobacco mosaic virus, TMV) or with bacterial expression vectors (e.g., Ti or pBR322 plasmids); or animal (mammalian) cell systems.
  • Expression vectors derived from retroviruses, adenoviruses, or herpes or vaccinia viruses, or from various bacterial plasmids may be used for delivery of nucleotide sequences to the targeted organ, tissue, or cell population.
  • the invention is not limited by the host cell employed.
  • sequences encoding DITHP can be transformed into cell lines using expression vectors which may contain viral origins of replication and/or endogenous expression elements and a selectable marker gene on the same or on a separate vector. Any number of selection systems may be used to recover transformed cell lines. (See, e.g., Wigler, M. et al. (1977) Cell 11:223-232; Lowy, I. et al. (1980) Cell 22:817-823; Wigler, M. et al. (1980) Proc. Natl. Acad. Sci.
  • the dithp of the invention may be used for somatic or germline gene therapy.
  • Gene therapy may be performed to (i) correct a genetic deficiency (e.g., in the cases of severe combined immunodeficiency (SCID)-X1 disease characterized by X-linked inheritance (Cavazzana-Calvo, M. et al. (2000) Science 288:669-672), severe combined immunodeficiency syndrome associated with an inherited adenosine deaminase (ADA) deficiency (Blaese, R.M. et al. (1995) Science 270:475-480; Bordignon, C. et al. (1995) Science 270:470-475), cystic fibrosis (Zabner, J.
  • hepatitis B or C virus HBV, HCV
  • fungal parasites such as Candida albicans and Paracoccidioides brasiliensis
  • protozoan parasites such as Plasmodium falciparum and Trypanosoma cruzi.
  • diseases or disorders caused by deficiencies in dithp are treated by constructing mammalian expression vectors comprising dithp and introducing these vectors by mechanical means into dithp-deficient cells.
  • Mechanical transfer technologies for use with cells in vivo or ex vivo include (i) direct DNA microinjection into individual cells, (ii) ballistic gold particle delivery, (iii) liposome-mediated transfection, (iv) receptor-mediated gene transfer, and (v) the use of DNA transposons (Morgan, R.A. and Anderson, W.F. (1993) Annu. Rev. Biochem. 62:191-217; Ivics, Z. (1997) Cell 91:501-510; Boulay, J-L. and Recipon, H. (1998) Curr. Opin. Biotechnol. 9:445-450).
  • Expression vectors that may be effective for the expression of dithp include, but are not limited to, the PCDNA 3.1, EPITAG, PRCCMV2, PREP, PVAX vectors (Invitrogen, Carlsbad CA), PCMV-SCRIPT, PCMV-TAG, PEGSH/PERV (Stratagene, La Jolla CA), and PTET-OFF,
  • the dithp of the invention may be expressed using (i) a constitutively active promoter, (e.g., from cytomegalovirus (CMV), Rous sarcoma virus (RSV), SV40 virus, thymidine kinase (TK), or β-actin genes), (ii) an inducible promoter (e.g., the tetracycline-regulated promoter (Gossen, M. and Bujard, H. (1992) Proc. Natl. Acad. Sci. U.S.A. 89:5547-5551; Gossen, M.
  • TRANSFECTION KIT available from Invitrogen
  • transformation is performed using the calcium phosphate method (Graham, F.L. and Eb, A.J. (1973) Virology 52:456-467), or by electroporation (Neumann, E. et al. (1982) EMBO J. 1:841-845).
  • the introduction of DNA to primary cells requires modification of these standardized mammalian transfection protocols.
  • diseases or disorders caused by genetic defects with respect to dithp expression are treated by constructing a retrovirus vector consisting of (i) dithp under the control of an independent promoter or the retrovirus long terminal repeat (LTR) promoter, (ii) appropriate RNA packaging signals, and (iii) a Rev-responsive element (RRE) along with additional retrovirus cis-acting RNA sequences and coding sequences required for efficient vector propagation.
  • Retrovirus vectors (e.g., PFB and PFBNEO) are commercially available (Stratagene) and are based on published data (Riviere, I. et al. (1995) Proc. Natl. Acad. Sci. U.S.A.
  • the vector is propagated in an appropriate vector producing cell line (VPCL) that expresses an envelope gene with a tropism for receptors on the target cells or a promiscuous envelope protein such as VSVg (Armentano, D. et al. (1987) J. Virol. 61:1647-1650; Bender, M.A. et al. (1987) J. Virol. 61:1639-1646; Adam, M.A. and Miller, A.D. (1988) J. Virol. 62:3802-3806; Dull, T. et al. (1998) J. Virol. 72:8463-8471; Zufferey, R. et al. (1998) J. Virol.
  • U.S. Patent Number 5,910,434 to Rigg discloses a method for obtaining retrovirus packaging cell lines and is hereby incorporated by reference. Propagation of retrovirus vectors, transduction of a population of cells (e.g., CD4+ T-cells), and the return of transduced cells to a patient are procedures well known to persons skilled in the art of gene therapy and have been well documented (Ranga, U. et al. (1997) J. Virol. 71:7020-7029; Bauer, G. et al. (1997) Blood 89:2259-2267; Bonyhadi, M.
  • an adenovirus-based gene therapy delivery system is used to deliver dithp to cells which have one or more genetic abnormalities with respect to the expression of dithp.
  • the construction and packaging of adenovirus-based vectors are well known to those with ordinary skill in the art.
  • Replication defective adenovirus vectors have proven to be versatile for importing genes encoding immunoregulatory proteins into intact islets in the pancreas (Csete, M.E. et al. (1995) Transplantation 27:263-268).
  • Potentially useful adenoviral vectors are described in U.S. Patent Number 5,707,618 to Armentano ("Adenovirus vectors for gene therapy"), hereby incorporated by reference.
  • a herpes-based gene therapy delivery system is used to deliver dithp to target cells which have one or more genetic abnormalities with respect to the expression of dithp.
  • the use of herpes simplex virus (HSV)-based vectors may be especially valuable for introducing dithp to cells of the central nervous system, for which HSV has a tropism.
  • the construction and packaging of herpes-based vectors are well known to those with ordinary skill in the art.
  • a HSV-1 virus vector has also been disclosed in detail in U.S. Patent Number 5,804,413 to DeLuca ("Herpes simplex virus strains for gene transfer"), which is hereby incorporated by reference.
  • U.S. Patent Number 5,804,413 teaches the use of recombinant HSV d92 which consists of a genome containing at least one exogenous gene to be transferred to a cell under the control of the appropriate promoter for purposes including human gene therapy.
  • For HSV vectors, see also Goins, W.F. et al. (1999) J. Virol. 73:519-532 and Xu, H. et al. (1994) Dev. Biol. 163:152-161, hereby incorporated by reference.
  • the manipulation of cloned herpesvirus sequences, the generation of recombinant virus following the transfection of multiple plasmids containing different segments of the large herpesvirus genomes, the growth and propagation of herpesvirus, and the infection of cells with herpesvirus are techniques well known to those of ordinary skill in the art.
  • an alphavirus (positive, single-stranded RNA virus) vector is used to deliver dithp to target cells.
  • the biology of the prototypic alphavirus, Semliki Forest Virus (SFV), has been studied extensively and gene transfer vectors have been based on the SFV genome (Garoff, H. and Li, K-J. (1998) Curr. Opin. Biotech. 9:464-469).
  • This subgenomic RNA replicates to higher levels than the full-length genomic RNA, resulting in the overproduction of capsid proteins relative to the viral proteins with enzymatic activity (e.g., protease and polymerase). Similarly, inserting dithp into the alphavirus genome in place of the capsid-coding region results in the production of a large number of dithp RNAs and the synthesis of high levels of DITHP in vector transduced cells.
  • Although alphavirus infection is typically associated with cell lysis within a few days, the ability to establish a persistent infection in hamster normal kidney cells (BHK-21) with a variant of Sindbis virus (SIN) indicates that the lytic replication of alphaviruses can be altered to suit the needs of the gene therapy application (Dryga, S.A. et al. (1997) Virology 228:74-83).
  • the wide host range of alphaviruses will allow the introduction of dithp into a variety of cell types.
  • the specific transduction of a subset of cells in a population may require the sorting of cells prior to transduction.
  • the methods of manipulating infectious cDNA clones of alphaviruses, performing alphavirus cDNA and RNA transfections, and performing alphavirus infections, are well known to those with ordinary skill in the art.
  • Anti-DITHP antibodies may be used to analyze protein expression levels. Such antibodies include, but are not limited to, polyclonal, monoclonal, chimeric, single chain, and Fab fragments. For descriptions of and protocols of antibody technologies, see, e.g., Pound, J.D. (1998) Immunochemical Protocols, Humana Press, Totowa, NJ.
  • amino acid sequence encoded by the dithp of the Sequence Listing may be analyzed by appropriate software (e.g., LASERGENE NAVIGATOR software, DNASTAR) to determine regions of high immunogenicity.
  • the optimal sequences for immunization are selected from the C-terminus, the N-terminus, and those intervening, hydrophilic regions of the polypeptide which are likely to be exposed to the external environment when the polypeptide is in its natural conformation. Analysis used to select appropriate epitopes is also described by Ausubel (1997, supra. Chapter 11.7). Peptides used for antibody induction do not need to have biological activity; however, they must be antigenic.
  • Peptides used to induce specific antibodies may have an amino acid sequence consisting of at least five amino acids, preferably at least 10 amino acids, and most preferably at least 15 amino acids.
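The selection of hydrophilic candidate peptides can be sketched as a sliding-window scan. This sketch substitutes the Kyte-Doolittle hydropathy scale as a simple stand-in for the commercial software named in the text; it does not reproduce that software's immunogenicity analysis. The window length of 15 follows the most preferred peptide length above, and the function name and example sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values; more negative means more hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def most_hydrophilic_window(sequence, window=15):
    """Return (start, peptide) for the window with the lowest summed hydropathy.

    Assumes sequence contains only the 20 standard one-letter residue codes.
    """
    if len(sequence) < window:
        raise ValueError("sequence is shorter than the requested window")

    def hydropathy(start):
        return sum(KYTE_DOOLITTLE[r] for r in sequence[start:start + window])

    best_start = min(range(len(sequence) - window + 1), key=hydropathy)
    return best_start, sequence[best_start:best_start + window]

# Hypothetical sequence: a charged stretch flanked by hydrophobic residues.
start, peptide = most_hydrophilic_window("L" * 10 + "KKKKKDDDDDEEEEE" + "L" * 10)
```

Hydrophilicity alone does not establish antigenicity, so windows found this way would only be candidates for further evaluation.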
  • a peptide which mimics an antigenic fragment of the natural polypeptide may be fused with another protein such as keyhole limpet hemocyanin (KLH; Sigma, St. Louis MO) for antibody production.
  • a peptide encompassing an antigenic region may be expressed from a dithp, synthesized as described above, or purified from human cells.
  • mice, goats, and rabbits may be immunized by injection with a peptide.
  • various adjuvants may be used to increase immunological response.
  • peptides about 15 residues in length may be synthesized using an ABI 431A peptide synthesizer (Applied Biosystems) using Fmoc chemistry and coupled to KLH (Sigma) by reaction with m-maleimidobenzoyl-N-hydroxysuccinimide ester (Ausubel, 1995, supra).
  • Rabbits are immunized with the peptide-KLH complex in complete Freund's adjuvant.
  • the resulting antisera are tested for antipeptide activity by binding the peptide to plastic, blocking with 1% bovine serum albumin (BSA), reacting with rabbit antisera, washing, and reacting with radioiodinated goat anti-rabbit IgG.
  • Antisera with antipeptide activity are tested for anti-DITHP activity using protocols well known in the art, including ELISA, radioimmunoassay (RIA), and immunoblotting.
  • isolated and purified peptide may be used to immunize mice (about 100 μg of peptide) or rabbits (about 1 mg of peptide). Subsequently, the peptide is radioiodinated and used to screen the immunized animals' B-lymphocytes for production of antipeptide antibodies. Positive cells are then used to produce hybridomas using standard techniques. About 20 mg of peptide is sufficient for labeling and screening several thousand clones.
  • Hybridomas of interest are detected by screening with radioiodinated peptide to identify those fusions producing peptide-specific monoclonal antibody.
  • wells of a multi-well plate (FAST, Becton-Dickinson, Palo Alto, CA) are coated with affinity-purified, specific rabbit-anti-mouse (or suitable anti-species IgG) antibodies at 10 mg/ml.
  • the coated wells are blocked with 1% BSA and washed and exposed to supernatants from hybridomas. After incubation, the wells are exposed to radiolabeled peptide at 1 mg/ml.
  • Clones producing antibodies bind a quantity of labeled peptide that is detectable above background.
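One way to make "detectable above background" concrete is to compare each well's signal against a threshold derived from control wells. The rule used here (mean of the background wells plus three sample standard deviations) is a common convention adopted as an assumption; the text does not state a criterion, and the well labels and counts are hypothetical.

```python
import statistics

def positive_clones(well_counts, background_counts, n_sd=3.0):
    """Return wells whose radiolabel counts exceed mean(background) + n_sd * SD.

    well_counts maps a well label to its measured counts;
    background_counts lists counts from control wells (at least two values).
    The n_sd cutoff is an illustrative assumption.
    """
    threshold = (statistics.mean(background_counts)
                 + n_sd * statistics.stdev(background_counts))
    return sorted(well for well, counts in well_counts.items()
                  if counts > threshold)

# Hypothetical counts per minute.
background = [100, 110, 90, 105, 95]
wells = {"A1": 120, "A2": 450, "B3": 130}
hits = positive_clones(wells, background)
```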
  • Antibody fragments containing specific binding sites for an epitope may also be generated.
  • such fragments include, but are not limited to, the F(ab')2 fragments produced by pepsin digestion of the antibody molecule, and the Fab fragments generated by reducing the disulfide bridges of the F(ab')2 fragments.
  • construction of Fab expression libraries in filamentous bacteriophage allows rapid and easy identification of monoclonal fragments with desired specificity (Pound, supra, Chaps. 45-47).
  • Antibodies generated against polypeptide encoded by dithp can be used to purify and characterize full-length DITHP protein and its activity, binding partners, etc.
  • Anti-DITHP antibodies may be used in assays to quantify the amount of DITHP found in a particular human cell. Such assays include methods utilizing the antibody and a label to detect expression level under normal or disease conditions.
  • the peptides and antibodies of the invention may be used with or without modification or labeled by joining them, either covalently or noncovalently, with a reporter molecule.
  • Protocols for detecting and measuring protein expression using either polyclonal or monoclonal antibodies are well known in the art. Examples include ELISA, RIA, and fluorescence-activated cell sorting (FACS). Such immunoassays typically involve the formation of complexes between the DITHP and its specific antibody and the measurement of such complexes. These and other assays are described in Pound (supra).
  • 60/229,750, U.S. Ser. No. 60/230,597, U.S. Ser. No. 60/230,505, U.S. Ser. No. 60/231,163, U.S. Ser. No. 60/229,747, U.S. Ser. No. 60/229,748, U.S. Ser. No. 60/230,583, U.S. Ser. No. 60/230,519, U.S. Ser. No. 60/230,595, U.S. Ser. No. 60/230,865, and U.S. Ser. No. 60/230,951, are hereby expressly incorporated by reference.
  • RNA was provided with RNA and constructed the corresponding cDNA libraries. Otherwise, cDNA was synthesized and cDNA libraries were constructed with the UNIZAP vector system (Stratagene Cloning Systems, Inc. (Stratagene), La Jolla CA) or SUPERSCRIPT plasmid system (Life Technologies), using the recommended procedures or similar methods known in the art. (See, e.g., Ausubel, 1997, supra. Chapters 5.1 through 6.6.) Reverse transcription was initiated using oligo d(T) or random primers. Synthetic oligonucleotide adapters were ligated to double stranded cDNA, and the cDNA was digested with the appropriate restriction enzyme or enzymes. For most libraries, the cDNA was size-selected (300-1000 bp) using SEPHACRYL S1000, SEPHAROSE CL2B, or SEPHAROSE CL4B column chromatography (Amersham Pharmacia
  • cDNAs were ligated into compatible restriction enzyme sites of the polylinker of a suitable plasmid, e.g., PBLUESCRIPT plasmid (Stratagene), PSPORT1 plasmid (Life Technologies), PCDNA2.1 plasmid (Invitrogen, Carlsbad CA), PBK-CMV plasmid (Stratagene), or pINCY (Incyte Genomics, Palo Alto CA), or derivatives thereof.
  • Recombinant plasmids were transformed into competent E. coli cells including XL1-Blue, XL1-BlueMRF, or SOLR from Stratagene or DH5α, DH10B, or ElectroMAX DH10B from Life Technologies.
  • Plasmids were recovered from host cells by in vivo excision using the UNIZAP vector system (Stratagene) or by cell lysis. Plasmids were purified using at least one of the following: the Magic or WIZARD Minipreps DNA purification system (Promega); the AGTC Miniprep purification kit (Edge BioSystems, Gaithersburg MD); and the QIAWELL 8, QIAWELL 8 Plus, and QIAWELL 8 Ultra plasmid purification systems or the R.E.A.L. PREP 96 plasmid purification kit (QIAGEN). Following precipitation, plasmids were resuspended in 0.1 ml of distilled water and stored, with or without lyophilization, at 4 °C.
  • plasmid DNA was amplified from host cell lysates using direct link PCR in a high-throughput format.
  • Host cell lysis and thermal cycling steps were carried out in a single reaction mixture. Samples were processed and stored in 384-well plates, and the concentration of amplified plasmid DNA was quantified fluorometrically using PICOGREEN dye (Molecular Probes, Inc. (Molecular Probes), Eugene OR) and a FLUOROSKAN II fluorescence scanner (Labsystems Oy, Helsinki, Finland).
  • cDNA sequencing reactions were processed using standard methods or high-throughput instrumentation such as the ABI CATALYST 800 thermal cycler (Applied Biosystems) or the PTC-200 thermal cycler (MJ Research) in conjunction with the HYDRA microdispenser (Robbins Scientific Corp., Sunnyvale CA) or the MICROLAB 2200 liquid transfer system (Hamilton).
  • cDNA sequencing reactions were prepared using reagents provided by Amersham Pharmacia Biotech or supplied in ABI sequencing kits such as the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (Applied Biosystems).
  • Electrophoretic separation of cDNA sequencing reactions and detection of labeled polynucleotides were carried out using the MEGABACE 1000 DNA sequencing system (Molecular Dynamics); the ABI PRISM 373 or 377 sequencing system (Applied Biosystems) in conjunction with standard ABI protocols and base calling software; or other sequence analysis systems known in the art. Reading frames within the cDNA sequences were identified using standard methods (reviewed in Ausubel, 1997, supra, Chapter 7.7). Some of the cDNA sequences were selected for extension using the techniques disclosed in Example VIII.
  • IV. Assembly and Analysis of Sequences
  • Component sequences from chromatograms were subject to PHRED analysis and assigned a quality score.
  • the sequences having at least a required quality score were subject to various preprocessing editing pathways to eliminate, e.g., low quality 3' ends, vector and linker sequences, polyA tails, Alu repeats, mitochondrial and ribosomal sequences, bacterial contamination sequences, and sequences smaller than 50 base pairs.
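Two of the preprocessing steps listed above, removal of polyA tails and rejection of sequences smaller than 50 base pairs, can be sketched as follows. The minimum run of 10 adenines used to recognize a tail and the function name are assumptions; the other steps (vector, repeat, and contaminant screening) would need reference sequences and are not shown.

```python
import re

# A 3' run of at least 10 A residues is treated as a polyA tail (assumed cutoff).
_POLYA_TAIL = re.compile(r"A{10,}$")

def preprocess(read, min_length=50):
    """Trim a 3' polyA tail and return the read, or None if it is too short.

    min_length reflects the 50 base pair minimum stated in the text.
    """
    trimmed = _POLYA_TAIL.sub("", read.upper())
    return trimmed if len(trimmed) >= min_length else None

# Hypothetical reads.
kept = preprocess("ACGT" * 15 + "A" * 20)      # 60 bases remain after trimming
dropped = preprocess("ACGT" * 5 + "A" * 40)    # only 20 bases remain
```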
  • low-information sequences and repetitive elements e.g., dinucleotide repeats, Alu repeats, etc.
  • sequences were then subject to assembly procedures in which the sequences were assigned to gene bins (bins). Each sequence could only belong to one bin. Sequences in each gene bin were assembled to produce consensus sequences (templates). Subsequent new sequences were added to existing bins using BLASTn (v.1.4 WashU) and CROSSMATCH. Candidate pairs were identified as all BLAST hits having a quality score greater than or equal to 150. Alignments of at least 82% local identity were accepted into the bin. The component sequences from each bin were assembled using a version of PHRAP. Bins with several overlapping component sequences were assembled using DEEP PHRAP.
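The acceptance rule for adding a new sequence to an existing bin can be sketched from the two stated thresholds: a BLAST quality score of at least 150 and at least 82% local identity. The `Hit` record and the tie-breaking choice of the highest-scoring accepted bin are assumptions made so that each sequence lands in exactly one bin; the text does not say how competing bins were resolved.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """One candidate alignment of a new sequence against a bin (hypothetical record)."""
    bin_id: str
    blast_score: float
    local_identity: float  # percent

def assign_bin(hits, min_score=150, min_identity=82.0):
    """Return the bin for a new sequence, or None if no hit passes both thresholds.

    Choosing the highest-scoring accepted hit is an illustrative assumption.
    """
    accepted = [h for h in hits
                if h.blast_score >= min_score and h.local_identity >= min_identity]
    if not accepted:
        return None  # the sequence would seed a new bin
    return max(accepted, key=lambda h: h.blast_score).bin_id

hits = [
    Hit("bin_17", blast_score=420, local_identity=96.5),
    Hit("bin_03", blast_score=610, local_identity=78.0),  # fails identity
    Hit("bin_42", blast_score=140, local_identity=99.0),  # fails score
]
chosen = assign_bin(hits)
```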
  • The orientation of each assembled template was determined based on the number and orientation of its component sequences. Template sequences as disclosed in the sequence listing correspond to sense strand sequences (the "forward" reading frames), to the best determination. The complementary (antisense) strands are inherently disclosed herein.
  • the component sequences which were used to assemble each template consensus sequence are listed in Table 5, along with their positions along the template nucleotide sequences.
  • Bins were compared against each other and those having local similarity of at least 82% were combined and reassembled. Reassembled bins having templates of insufficient overlap (less than 95% local identity) were re-split. Assembled templates were also subject to analysis by STITCHER/EXON MAPPER algorithms which analyze the probabilities of the presence of splice variants, alternatively spliced exons, splice junctions, differential expression of alternative spliced genes across tissue types or disease states, etc. These resulting bins were subject to several rounds of the above assembly procedures.
  • bins were clone joined based upon clone information. If the 5' sequence of one clone was present in one bin and the 3' sequence from the same clone was present in a different bin, it was likely that the two bins actually belonged together in a single bin. The resulting combined bins underwent assembly procedures to regenerate the consensus sequences.
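Clone joining can be sketched as merging bins with a union-find structure: whenever the 5' and 3' reads of one clone sit in different bins, those bins are merged. The data layout (a read-to-bin map plus a clone-to-read-pair map) and the rule of keeping the lexicographically smaller bin name are assumptions for illustration.

```python
def clone_join(read_to_bin, clone_reads):
    """Merge bins linked by the 5' and 3' reads of the same clone.

    read_to_bin maps a read ID to its current bin ID.
    clone_reads maps a clone ID to a (five_prime_read, three_prime_read) pair;
    both reads are assumed to be present in read_to_bin.
    Returns a new read-to-bin mapping after joining.
    """
    parent = {b: b for b in set(read_to_bin.values())}

    def find(b):
        # Follow parent links to the representative bin, compressing the path.
        while parent[b] != b:
            parent[b] = parent[parent[b]]
            b = parent[b]
        return b

    for five_prime, three_prime in clone_reads.values():
        a, b = find(read_to_bin[five_prime]), find(read_to_bin[three_prime])
        if a != b:
            parent[max(a, b)] = min(a, b)  # keep the smaller bin name

    return {read: find(b) for read, b in read_to_bin.items()}

# Hypothetical reads: clone_1 spans bin_1 and bin_2, clone_2 lies within bin_3.
reads = {"r1": "bin_1", "r2": "bin_2", "r3": "bin_3", "r4": "bin_3"}
clones = {"clone_1": ("r1", "r2"), "clone_2": ("r3", "r4")}
joined = clone_join(reads, clones)
```

The joined bins would then be reassembled to regenerate consensus sequences, a step this sketch does not cover.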
  • the template sequences were translated in all three forward reading frames, and each translation was searched against hidden Markov models for signal peptides using the HMMER software package. Construction of hidden Markov models and their usage in sequence analysis has been described. (See, for example, Eddy, S.R. (1996) Curr. Opin. Str. Biol. 6:361-365.) Only those signal peptide hits with a cutoff score of 11 bits or greater are reported. A cutoff score of 11 bits or greater corresponds to at least about 91-94% true-positives in signal peptide prediction.
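The translation of a template in all three forward reading frames can be sketched with the standard genetic code. The function names and the example sequence are hypothetical, codons containing ambiguous bases are rendered as "X" by assumption, and the subsequent hidden Markov model search is not shown.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
_BASES = "TCAG"
_AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
                "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}

def translate(dna, frame):
    """Translate dna starting at offset frame (0, 1, or 2); unknown codons give 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def three_forward_frames(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]

# Hypothetical template fragment.
frames = three_forward_frames("ATGGCCTAA")
```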
  • Template sequences were also translated in all three forward reading frames, and each translation was searched against TMAP, a program that uses weight matrices to delineate transmembrane segments on protein sequences and determine orientation, with respect to the cell cytosol (Persson, B. and P. Argos (1994) J. Mol. Biol. 237:182-192; Persson, B. and P. Argos (1996) Protein Sci. 5:363-371). Regions of templates which, when translated, contain similarity to signal peptide or transmembrane consensus sequences are reported in Table 4.
  • The results of HMMER analysis as reported in Tables 3 and 4 may support the results of BLAST analysis as reported in Table 2 or may suggest alternative or additional properties of template-encoded polypeptides not previously uncovered by BLAST or other analyses.
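The three-frame translation and the 11-bit reporting cutoff described above can be sketched as follows. This is only an illustration in Python, not the pipeline used in the disclosure: the function names, the `(template_id, bit_score)` layout of a hit, and the use of the standard genetic code are assumptions, and the HMMER and TMAP searches themselves are not reproduced.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (assumed here).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_forward_frames(dna):
    """Translate a sense-strand sequence in forward frames 1, 2, and 3."""
    dna = dna.upper()
    frames = {}
    for offset in range(3):
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        # Codons containing ambiguous bases are rendered as 'X'.
        frames[offset + 1] = "".join(CODON_TABLE.get(c, "X") for c in codons)
    return frames


def filter_signal_peptide_hits(hits, cutoff_bits=11.0):
    """Keep hypothetical (template_id, bit_score) hits at or above the cutoff."""
    return [hit for hit in hits if hit[1] >= cutoff_bits]
```

Each translated frame would then be passed to the signal peptide or transmembrane search, and only hits at or above the cutoff would be reported.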
  • Template sequences are further analyzed using the bioinformatics tools listed in Table 8, or using sequence analysis software known in the art such as MACDNASIS PRO software (Hitachi Software Engineering, South San Francisco CA) and LASERGENE software (DNASTAR). Template sequences may be further queried against public databases such as the GenBank rodent, mammalian, vertebrate, prokaryote, and eukaryote databases.
  • polypeptide sequences were translated to derive the corresponding longest open reading frame as presented by the polypeptide sequences as reported in Table 7.
  • a polypeptide of the invention may begin at any of the methionine residues within the full length translated polypeptide.
  • Polypeptide sequences were subsequently analyzed by querying against the GenBank protein database (GENPEPT, (GenBank version 124)).
  • Full length polynucleotide sequences are also analyzed using MACDNASIS PRO software (Hitachi Software Engineering, South San Francisco CA) and LASERGENE software (DNASTAR).
  • Polynucleotide and polypeptide sequence alignments are generated using default parameters specified by the CLUSTAL algorithm as incorporated into the MEGALIGN multisequence alignment program (DNASTAR), which also calculates the percent identity between aligned sequences.
  • Table 7 shows sequences with homology to the polypeptides of the invention as identified by BLAST analysis against the GenBank protein (GENPEPT) database.
  • Column 1 shows the polypeptide sequence identification number (SEQ ID NO:) for the polypeptide segments of the invention.
  • Column 2 shows the reading frame used in the translation of the polynucleotide sequences encoding the polypeptide segments.
  • Column 3 shows the length of the translated polypeptide segments.
  • Columns 4 and 5 show the start and stop nucleotide positions of the polynucleotide sequences encoding the polypeptide segments.
  • Column 6 shows the GenBank identification number (GI Number) of the nearest GenBank homolog.
  • Column 7 shows the probability score for the match between each polypeptide and its GenBank homolog.
  • Column 8 shows the annotation of the
  • Northern analysis is a laboratory technique used to detect the presence of a transcript of a gene and involves the hybridization of a labeled nucleotide sequence to a membrane on which RNAs from a particular cell type or tissue have been bound. (See, e.g., Sambrook, supra, ch. 7; Ausubel,
  • the product score takes into account both the degree of similarity between two sequences and the length of the sequence match.
  • the product score is a normalized value between 0 and 100, and is calculated as follows: the BLAST score is multiplied by the percent nucleotide identity and the product is divided by (5 times the length of the shorter of the two sequences).
  • the BLAST score is calculated by assigning a score of +5 for every base that matches in a high-scoring segment pair (HSP), and -4 for every mismatch. Two sequences may share more than one HSP (separated by gaps). If there is more than one HSP, then the pair with the highest BLAST score is used to calculate the product score.
  • the product score represents a balance between fractional overlap and quality in a BLAST alignment.
  • a product score of 100 is produced only for 100% identity over the entire length of the shorter of the two sequences being compared.
  • a product score of 70 is produced either by 100% identity and 70% overlap at one end, or by 88% identity and 100% overlap at the other.
  • a product score of 50 is produced either by 100% identity and 50% overlap at one end, or 79% identity and 100% overlap.
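The product score arithmetic above can be checked with a short sketch. The function and parameter names are illustrative assumptions; the scoring constants (+5 per match, -4 per mismatch, division by 5 times the shorter length) are those stated in the text, and the percent identity is taken over the high-scoring segment pair.

```python
def blast_score(matches, mismatches):
    """BLAST score of an HSP: +5 for every matching base, -4 for every mismatch."""
    return 5 * matches - 4 * mismatches


def product_score(score, percent_identity, shorter_length):
    """BLAST score times percent identity, divided by 5 x the shorter sequence length."""
    return score * percent_identity / (5 * shorter_length)


# Worked cases for a shorter sequence of 100 bases:
full_match = product_score(blast_score(100, 0), 100, 100)    # 100% identity, full overlap
partial_overlap = product_score(blast_score(70, 0), 100, 100)  # 100% identity, 70% overlap
lower_identity = product_score(blast_score(88, 12), 88, 100)   # 88% identity, full overlap
```

With these inputs the first two cases give exactly 100 and 70, and the third gives about 69, which agrees with the stated value of 70 to within rounding.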
  • a tissue distribution profile is determined for each template by compiling the cDNA library tissue classifications of its component cDNA sequences.
  • Each component sequence is derived from a cDNA library constructed from a human tissue.
  • Each human tissue is classified into one of the following categories: cardiovascular system; connective tissue; digestive system; embryonic structures; endocrine system; exocrine glands; genitalia, female; genitalia, male; germ cells; hemic and immune system; liver; musculoskeletal system; nervous system; pancreas; respiratory system; sense organs; skin; stomatognathic system; unclassified/mixed; or urinary tract.
  • Template sequences, component sequences, and cDNA library/tissue information are found in the LIFESEQ GOLD database (Incyte Genomics, Palo Alto CA).
  • Table 6 shows the tissue distribution profile for the templates of the invention. For each template, the three most frequently observed tissue categories are shown in column 3, along with the percentage of component sequences belonging to each category. Only tissue categories with percentage values of ≥10% are shown. A tissue distribution of "widely distributed" in column 3 indicates percentage values of <10% in all tissue categories.
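A tissue distribution profile of this kind can be sketched as below. This is an assumed reconstruction from the description, not the code behind Table 6: the function name and input format (one tissue category label per component sequence) are hypothetical.

```python
from collections import Counter


def tissue_distribution(component_categories, top_n=3, min_percent=10.0):
    """Return up to top_n (category, percent) pairs with percent >= min_percent.

    An empty list corresponds to the "widely distributed" label in the text.
    """
    counts = Counter(component_categories)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("a template needs at least one component sequence")
    profile = []
    for category, n in counts.most_common(top_n):
        percent = 100.0 * n / total
        if percent >= min_percent:
            profile.append((category, percent))
    return profile
```

For a template with six nervous system, three liver, and one skin component sequences, the sketch reports 60%, 30%, and 10% for those categories in that order.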
  • Transcript images are generated as described in Seilhamer et al., "Comparative Gene Transcript Analysis," U.S. Patent Number 5,840,484, incorporated herein by reference.
  • Oligonucleotide primers designed using a dithp of the Sequence Listing are used to extend the nucleic acid sequence.
  • One primer is synthesized to initiate 5' extension of the template, and the other primer, to initiate 3' extension of the template.
  • the initial primers may be designed using OLIGO 4.06 software (National Biosciences, Inc. (National Biosciences), Plymouth MN), or another appropriate program, to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to anneal to the target sequence at temperatures of about 68 °C to about 72 °C. Any stretch of nucleotides which would result in hairpin structures or primer-primer dimerizations is avoided.
  • Selected human cDNA libraries are used to extend the sequence. If more than one extension is necessary or desired, additional or nested sets of primers are designed.
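Two of the primer criteria above, length and GC content, are simple enough to screen programmatically. The sketch below is a rough pre-filter under stated assumptions: the thresholds treat "about 22 to 30" and "about 50%" as hard limits, the function names are hypothetical, and the annealing temperature, hairpin, and primer-dimer checks performed by primer design software such as OLIGO are not reproduced.

```python
def gc_percent(primer):
    """Percentage of G and C bases in a primer sequence."""
    primer = primer.upper()
    return 100.0 * sum(base in "GC" for base in primer) / len(primer)


def meets_basic_criteria(primer, min_len=22, max_len=30, min_gc=50.0):
    """Check only length and GC content; Tm and secondary structure are not assessed."""
    return min_len <= len(primer) <= max_len and gc_percent(primer) >= min_gc
```

A candidate that passes this filter would still need to be evaluated for annealing temperature and self-complementarity.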
  • PCR is performed in 96-well plates using the PTC-200 thermal cycler (MJ Research).
  • the reaction mix contains DNA template, 200 nmol of each primer, reaction buffer containing Mg2+, (NH4)2SO4, and β-mercaptoethanol, Taq DNA polymerase (Amersham Pharmacia Biotech), ELONGASE enzyme (Life Technologies), and Pfu DNA polymerase (Stratagene), with the following parameters for primer pair PCI A and PCI B: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C
  • the parameters for primer pair T7 and SK+ are as follows: Step 1: 94 °C, 3 min; Step 2: 94°C,
  • the concentration of DNA in each well is determined by dispensing 100 µl PICOGREEN quantitation reagent (0.25% (v/v); Molecular Probes) dissolved in 1X Tris-EDTA (TE) and 0.5 µl of undiluted PCR product into each well of an opaque fluorimeter plate (Corning Incorporated
  • the digested nucleotides are separated on low concentration (0.6 to 0.8%) agarose gels, fragments are excised, and agar digested with AGAR ACE (Promega). Extended clones are religated using T4 ligase (New England Biolabs, Inc., Beverly MA) into pUC 18 vector
  • the cells are lysed, and DNA is amplified by PCR using Taq DNA polymerase (Amersham Pharmacia Biotech) and Pfu DNA polymerase (Stratagene) with the following parameters: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 72°C, 2 min; Step 5: steps 2, 3, and 4 repeated 29 times; Step 6: 72°C, 5 min; Step 7: storage at 4°C. DNA is quantified by PICOGREEN reagent (Molecular Probes) as described above. Samples with low DNA recoveries are reamplified using the same conditions as described above.
  • Samples are diluted with 20% dimethylsulfoxide (1:2, v/v), and sequenced using DYENAMIC energy transfer sequencing primers and the DYENAMIC DIRECT kit (Amersham Pharmacia Biotech) or the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (Applied Biosystems). In like manner, the dithp is used to obtain regulatory sequences (promoters, introns, and enhancers) using the procedure above, oligonucleotides designed for such extension, and an appropriate genomic library.
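The cycling parameters for this amplification can be written down as data and summed to estimate the programmed hold time. The sketch below makes two assumptions that the text does not settle: that "repeated 29 times" means 29 passes in addition to the first (30 cycles in all), and that ramp times between temperatures are ignored. The constant and function names are illustrative.

```python
# Hold times in seconds, taken from the listed steps.
INITIAL_DENATURATION = 3 * 60             # Step 1: 94 C, 3 min
CYCLE_STEPS = [
    ("denature", 94, 15),                 # Step 2
    ("anneal", 60, 60),                   # Step 3
    ("extend", 72, 120),                  # Step 4
]
REPEATS = 29                              # Step 5
FINAL_EXTENSION = 5 * 60                  # Step 6: 72 C, 5 min


def total_hold_seconds(repeats=REPEATS, first_pass_counts=True):
    """Sum programmed hold times; the 4 C storage step and ramping are excluded."""
    per_cycle = sum(seconds for _, _, seconds in CYCLE_STEPS)
    # Assumption: the first pass through steps 2-4 precedes the repeats.
    cycles = repeats + 1 if first_pass_counts else repeats
    return INITIAL_DENATURATION + cycles * per_cycle + FINAL_EXTENSION
```

Under the 30-cycle reading the programmed holds come to 6330 seconds, or about 105 minutes.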
  • Hybridization probes derived from the dithp of the Sequence Listing are employed for screening cDNAs, mRNAs, or genomic DNA.
  • the labeling of probe nucleotides between 100 and 1000 nucleotides in length is specifically described, but essentially the same procedure may be used with larger cDNA fragments.
  • Probe sequences are labeled at room temperature for 30 minutes using a T4 polynucleotide kinase, γ-32P-ATP, and 0.5X One-Phor-All Plus (Amersham Pharmacia Biotech) buffer and purified using a ProbeQuant G-50 Microcolumn (Amersham Pharmacia Biotech).
  • the probe mixture is diluted to 10^7 dpm/µg/ml hybridization buffer and used in a typical membrane-based hybridization analysis.
  • the DNA is digested with a restriction endonuclease such as Eco RV and is electrophoresed through a 0.7% agarose gel.
  • the DNA fragments are transferred from the agarose to nylon membrane (NYTRAN Plus, Schleicher & Schuell, Inc., Keene NH) using procedures specified by the manufacturer of the membrane.
  • Prehybridization is carried out for three or more hours at 68 °C, and hybridization is carried out overnight at 68 °C.
  • blots are sequentially washed at room temperature under increasingly stringent conditions, up to 0.1x saline sodium citrate (SSC) and 0.5% sodium dodecyl sulfate.
  • SSC saline sodium citrate
  • the cDNA sequences which were used to assemble SEQ ID NO:1-275 are compared with sequences from the Incyte LIFESEQ database and public domain databases using BLAST and other implementations of the Smith-Waterman algorithm. Sequences from these databases that match SEQ ID NO:1-275 are assembled into clusters of contiguous and overlapping sequences using assembly algorithms such as PHRAP (Table 8). Radiation hybrid and genetic mapping data available from public resources such as the Stanford Human Genome Center (SHGC), Whitehead Institute for Genome Research (WIGR), and Genethon are used to determine if any of the clustered sequences have been previously mapped.
  • SHGC Stanford Human Genome Center
  • WIGR Whitehead Institute for Genome Research
  • SEQ ID NO:1-275 are described as ranges, or intervals, of human chromosomes.
  • the map position of an interval, in centiMorgans, is measured relative to the terminus of the chromosome's p-arm.
  • centiMorgan (cM) is a unit of measurement based on recombination frequencies between chromosomal markers.
  • On average, 1 cM is roughly equivalent to 1 megabase (Mb) of DNA in humans, although this can vary widely due to hot and cold spots of recombination.
  • Mb megabase
  • the cM distances are based on genetic markers mapped by Genethon which provide boundaries for radiation hybrid markers whose sequences were included in each of the clusters.
  • Total RNA is isolated from tissue samples using the guanidinium thiocyanate method and polyA+ RNA is purified using the oligo (dT) cellulose method.
  • Each polyA+ RNA sample is reverse transcribed using MMLV reverse-transcriptase, 0.05 pg/µl oligo-dT primer (21mer), 1X first strand buffer, 0.03 units/µl RNase inhibitor, 500 µM dATP, 500 µM dGTP, 500 µM dTTP, 40 µM dCTP, 40 µM dCTP-Cy3 (BDS) or dCTP-Cy5 (Amersham Pharmacia Biotech).
  • the reverse transcription reaction is performed in a 25 ml volume containing 200 ng polyA+ RNA with GEMBRIGHT kits (Incyte).
  • Specific control polyA+ RNAs are synthesized by in vitro transcription from non-coding yeast genomic DNA (W. Lei, unpublished).
  • the control mRNAs at 0.002 ng, 0.02 ng, 0.2 ng, and 2 ng are diluted into reverse transcription reaction at ratios of 1:100,000, 1:10,000, 1:1000, 1:100 (w/w) to sample mRNA respectively.
  • the control mRNAs are diluted into reverse transcription reaction at ratios of 1:3, 3:1, 1:10, 10:1, 1:25, 25:1 (w/w) to sample mRNA differential expression patterns.
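The quantitative control ratios follow directly from the control masses and the 200 ng of polyA+ RNA per reaction. The sketch below reproduces that arithmetic; the names are illustrative, and exact fractions are used only to avoid floating-point rounding in the ratios.

```python
from fractions import Fraction

SAMPLE_NG = Fraction(200)                  # polyA+ RNA per reaction, from the text
CONTROL_NG = ["0.002", "0.02", "0.2", "2"]  # control mRNA masses, from the text


def weight_ratio(control_ng, sample_ng=SAMPLE_NG):
    """Express control:sample mass as a '1:N' (w/w) ratio string."""
    fold = sample_ng / Fraction(control_ng)
    return f"1:{int(fold):,}"


ratios = [weight_ratio(c) for c in CONTROL_NG]
```

The resulting list matches the four ratios stated for the 0.002 ng through 2 ng controls.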
  • each reaction sample (one with Cy3 and another with Cy5 labeling) is treated with 2.5 ml of 0.5M sodium hydroxide and incubated for 20 minutes at 85° C to stop the reaction and degrade the RNA.
  • Probes are purified using two successive CHROMA SPIN 30 gel filtration spin columns (CLONTECH Laboratories, Inc.
  • reaction samples are ethanol precipitated using 1 ml of glycogen (1 mg/ml), 60 ml sodium acetate, and 300 ml of 100% ethanol.
  • the probe is then dried to completion using a SpeedVAC (Savant Instruments Inc., Holbrook NY) and resuspended in 14 µl 5X SSC/0.2% SDS.
  • Sequences of the present invention are used to generate array elements.
  • Each array element is amplified from bacterial cells containing vectors with cloned cDNA inserts.
  • PCR amplification uses primers complementary to the vector sequences flanking the cDNA insert.
  • Array elements are amplified in thirty cycles of PCR from an initial quantity of 1-2 ng to a final quantity greater than 5 µg. Amplified array elements are then purified using SEPHACRYL-400 (Amersham Pharmacia Biotech).
  • Purified array elements are immobilized on polymer-coated glass slides.
  • Glass microscope slides (Corning) are cleaned by ultrasound in 0.1% SDS and acetone, with extensive distilled water washes between and after treatments.
  • Glass slides are etched in 4% hydrofluoric acid (VWR Scientific Products Corporation (VWR), West Chester, PA), washed extensively in distilled water, and coated with 0.05% aminopropyl silane (Sigma) in 95% ethanol. Coated slides are cured in a 110°C oven.
  • Array elements are applied to the coated glass substrate using a procedure described in US Patent No. 5,807,522, incorporated herein by reference.
  • 1 µl of the array element DNA, at an average concentration of 100 ng/µl, is loaded into the open capillary printing element by a high-speed robotic apparatus.
  • the apparatus then deposits about 5 nl of array element sample per slide.
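The loading and deposition figures above imply a DNA mass per spot and an upper bound on the number of depositions per load. The sketch below works through that arithmetic; the names are illustrative, and dead volume and evaporation are ignored, which is an assumption.

```python
LOAD_VOLUME_NL = 1000.0           # 1 microliter loaded into the printing element
CONCENTRATION_NG_PER_UL = 100.0   # average array element DNA concentration
DEPOSIT_NL = 5.0                  # approximate volume deposited per slide


def dna_per_spot_ng(deposit_nl=DEPOSIT_NL, conc_ng_per_ul=CONCENTRATION_NG_PER_UL):
    """Mass of DNA in one deposited spot, in nanograms."""
    return deposit_nl / 1000.0 * conc_ng_per_ul


def max_deposits_per_load(load_nl=LOAD_VOLUME_NL, deposit_nl=DEPOSIT_NL):
    """Upper bound on depositions from one load, ignoring losses."""
    return int(load_nl // deposit_nl)
```

With the stated values each spot carries roughly 0.5 ng of DNA, and one load could in principle serve up to 200 slides.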
  • Microarrays are UV-crosslinked using a STRATALINKER UV-crosslinker (Stratagene). Microarrays are washed at room temperature once in 0.2% SDS and three times in distilled water. Non-specific binding sites are blocked by incubation of microarrays in 0.2% casein in phosphate buffered saline (PBS) (Tropix, Inc., Bedford, MA) for 30 minutes at 60° C followed by washes in 0.2% SDS and distilled water as before.
  • PBS phosphate buffered saline
  • Hybridization reactions contain 9 µl of probe mixture consisting of 0.2 µg each of Cy3 and Cy5 labeled cDNA synthesis products in 5X SSC, 0.2% SDS hybridization buffer.
  • the probe mixture is heated to 65° C for 5 minutes and is aliquoted onto the microarray surface and covered with a 1.8 cm² coverslip.
  • the arrays are transferred to a waterproof chamber having a cavity just slightly larger than a microscope slide.
  • the chamber is kept at 100% humidity internally by the addition of 140 µl of 5x SSC in a corner of the chamber.
  • the chamber containing the arrays is incubated for about 6.5 hours at 60° C.
  • the arrays are washed for 10 min at 45° C in a first wash buffer (1X SSC, 0.1% SDS), three times for 10 minutes each at 45° C in a second wash buffer (0.1X SSC), and dried.
  • Reporter-labeled hybridization complexes are detected with a microscope equipped with an Innova 70 mixed gas 10 W laser (Coherent, Inc., Santa Clara CA) capable of generating spectral lines at 488 nm for excitation of Cy3 and at 632 nm for excitation of Cy5.
  • the excitation laser light is focused on the array using a 20X microscope objective (Nikon, Inc., Melville NY).
  • the slide containing the array is placed on a computer-controlled X-Y stage on the microscope and raster-scanned past the objective.
  • the 1.8 cm x 1.8 cm array used in the present example is scanned with a resolution of 20 micrometers.
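The scan dimensions and resolution fix the size of the resulting image. A minimal sketch of that calculation follows, assuming the 20 micrometer resolution is the pixel pitch in both axes; the names are illustrative.

```python
ARRAY_SIDE_UM = 18_000   # 1.8 cm expressed in micrometers
RESOLUTION_UM = 20       # scan resolution, assumed equal to the pixel pitch


def scan_dimensions(side_um=ARRAY_SIDE_UM, resolution_um=RESOLUTION_UM):
    """Return (pixels per side, total pixels) for a square raster scan."""
    per_side = side_um // resolution_um
    return per_side, per_side * per_side
```

Under that assumption the scan is 900 pixels on a side, or 810,000 pixels per channel.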
  • a mixed gas multiline laser excites the two fluorophores sequentially. Emitted light is split, based on wavelength, into two photomultiplier tube detectors (PMT R1477,
  • a specific location on the array contains a complementary DNA sequence, allowing the intensity of the signal at that location to be correlated with a weight ratio of hybridizing species of 1 : 100,000.
  • the calibration is done by labeling samples of the calibrating cDNA with the two fluorophores and adding identical amounts of each to the hybridization mixture.
  • the output of the photomultiplier tube is digitized using a 12-bit RTI-835H analog-to-digital (A/D) conversion board (Analog Devices, Inc., Norwood, MA) installed in an IBM-compatible PC computer.
  • A/D analog-to-digital
  • the digitized data are displayed as an image where the signal intensity is mapped using a linear 20-color transformation to a pseudocolor scale ranging from blue (low signal) to red (high signal).
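A linear 20-color transformation of 12-bit data can be sketched as a simple binning step. The function below is an assumed illustration rather than the display code of the instrument: it returns a color index from 0 (the blue, low-signal end) to 19 (the red, high-signal end) and leaves the actual palette unspecified.

```python
ADC_BITS = 12    # resolution of the A/D conversion board
N_COLORS = 20    # number of pseudocolor steps


def color_index(adc_value):
    """Map a 12-bit reading linearly onto color steps 0 (blue) to 19 (red)."""
    levels = 1 << ADC_BITS   # 4096 distinct readings
    if not 0 <= adc_value < levels:
        raise ValueError("reading outside the 12-bit range")
    return adc_value * N_COLORS // levels
```

Integer arithmetic keeps the bin boundaries exact, so the lowest reading falls in bin 0 and the highest in bin 19.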
  • the data are also analyzed quantitatively. Where two different fluorophores are excited and measured simultaneously, the data are first corrected for optical crosstalk (due to overlapping emission spectra) between the fluorophores using each fluorophore's emission spectrum.
  • a grid is superimposed over the fluorescence signal image such that the signal from each spot is centered in each element of the grid.
  • the fluorescence signal within each element is then integrated to obtain a numerical value corresponding to the average intensity of the signal.
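The gridding and integration step can be illustrated with a small sketch that averages pixel intensities within square grid elements. It assumes a regular grid aligned to the image origin and a fixed element size in pixels, which is a simplification; the function name is hypothetical, and it does not represent the GEMTOOLS implementation.

```python
def grid_averages(image, cell):
    """Average intensity within each cell x cell element of a 2D pixel grid.

    `image` is a list of equal-length rows; partial cells at the edges are skipped.
    """
    rows, cols = len(image), len(image[0])
    result = []
    for r0 in range(0, rows - cell + 1, cell):
        row_values = []
        for c0 in range(0, cols - cell + 1, cell):
            total = sum(
                image[r][c]
                for r in range(r0, r0 + cell)
                for c in range(c0, c0 + cell)
            )
            row_values.append(total / (cell * cell))
        result.append(row_values)
    return result
```

Applying the same grid to the Cy3 and Cy5 channel images would give a pair of values per element from which an expression ratio can be formed.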
  • the software used for signal analysis is the GEMTOOLS gene expression analysis program (Incyte).
  • oligonucleotide sequences complementary to the dithp are used to detect, decrease, or inhibit expression of the naturally occurring nucleotide.
  • the use of oligonucleotides comprising from about 15 to 30 base pairs is typical in the art. However, smaller or larger sequence fragments can also be used.
  • Appropriate oligonucleotides are designed from the dithp using OLIGO 4.06 software (National Biosciences) or other appropriate programs and are synthesized using methods standard in the art or ordered from a commercial supplier.
  • OLIGO 4.06 software National Biosciences
  • a complementary oligonucleotide is designed from the most unique 5' sequence and used to prevent transcription factor binding to the promoter sequence.
  • a complementary oligonucleotide is designed to prevent ribosomal binding and processing of the transcript.
  • DITHP is accomplished using bacterial or virus-based expression systems.
  • cDNA is subcloned into an appropriate vector containing an antibiotic resistance gene and an inducible promoter that directs high levels of cDNA transcription.
  • promoters include, but are not limited to, the trp-lac (tac) hybrid promoter and the T5 or T7 bacteriophage promoter in conjunction with the lac operator regulatory element.
  • Recombinant vectors are transformed into suitable bacterial hosts, e.g., BL21(DE3).
  • Antibiotic resistant bacteria express DITHP upon induction with isopropyl beta-D-thiogalactopyranoside (IPTG).
  • DITHP in eukaryotic cells
  • baculovirus recombinant Autographa californica nuclear polyhedrosis virus (AcMNPV), commonly known as baculovirus.
  • AcMNPV Autographa californica nuclear polyhedrosis virus
  • the nonessential polyhedrin gene of baculovirus is replaced with cDNA encoding DITHP by either homologous recombination or bacterial-mediated transposition involving transfer plasmid intermediates. Viral infectivity is maintained and the strong polyhedrin promoter drives high levels of cDNA transcription.
  • Recombinant baculovirus is used to infect Spodoptera frugiperda (Sf9) insect cells in most cases, or human hepatocytes, in some cases. Infection of the latter requires additional genetic modifications to baculovirus. (See e.g., Engelhard, supra; and Sandig, supra.)
  • DITHP is synthesized as a fusion protein with, e.g., glutathione S-transferase (GST) or a peptide epitope tag, such as FLAG or 6-His, permitting rapid, single-step, affinity-based purification of recombinant fusion protein from crude cell lysates.
  • GST glutathione S-transferase
  • a peptide epitope tag such as FLAG or 6-His
  • FLAG an 8-amino acid peptide
  • 6-His a stretch of six consecutive histidine residues, enables purification on metal-chelate resins (QIAGEN). Methods for protein expression and purification are discussed in Ausubel (1995, supra, Chapters 10 and 16). Purified DITHP obtained by these methods can be used directly in the following activity assay.
  • DITHP activity is demonstrated through a variety of specific assays, some of which are outlined below.
  • Oxidoreductase activity of DITHP is measured by the increase in extinction coefficient of NAD(P)H coenzyme at 340 nm for the measurement of oxidation activity, or the decrease in extinction coefficient of NAD(P)H coenzyme at 340 nm for the measurement of reduction activity (Dalziel, K. (1963) J. Biol. Chem. 238:2850-2858).
  • One of three substrates may be used: Asn-βGal, biocytidine, or ubiquinone-10.
  • the respective subunits of the enzyme reaction, for example, cytochrome c1-b oxidoreductase and cytochrome c, are reconstituted.
  • the reaction mixture contains a) 1-2 mg/ml DITHP; and b) 15 mM substrate, 2.4 mM NAD(P)+ in 0.1 M phosphate buffer, pH 7.1 (oxidation reaction), or 2.0 mM NAD(P)H, in 0.1 M Na2HPO4 buffer, pH 7.4 (reduction reaction); in a total volume of 0.1 ml.
  • Changes in absorbance at 340 nm (A340) are measured at 23.5 °C using a recording spectrophotometer (Shimadzu Scientific Instruments, Inc., Pleasanton CA).
  • Oxidoreductase activity of DITHP is proportional to the amount of NAD(P)H present in the assay.
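A change in A340 is commonly converted to an amount of NAD(P)H with the Beer-Lambert law. The sketch below shows that conversion under assumptions not stated in the text: a molar extinction coefficient of 6220 M^-1 cm^-1 for NAD(P)H at 340 nm (the widely used literature value) and a 1 cm optical path. The function names are illustrative.

```python
NADH_EXTINCTION_M_CM = 6220.0   # assumed literature value at 340 nm, M^-1 cm^-1


def nadph_change_micromolar(delta_a340, path_cm=1.0):
    """Concentration change of NAD(P)H, in micromolar, from an absorbance change."""
    return delta_a340 / (NADH_EXTINCTION_M_CM * path_cm) * 1e6


def nadph_change_nmol(delta_a340, volume_ml=0.1, path_cm=1.0):
    """Amount of NAD(P)H formed or consumed in the assay volume, in nanomoles."""
    # micromol/L multiplied by mL gives nmol.
    return nadph_change_micromolar(delta_a340, path_cm) * volume_ml
```

For example, an absorbance change of 0.622 corresponds to about 100 micromolar NAD(P)H, or about 10 nmol in the 0.1 ml reaction volume.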
  • Transferase activity of DITHP is measured through assays such as a methyl transferase assay in which the transfer of radiolabeled methyl groups between a donor substrate and an acceptor substrate is measured (Bokar, J.A. et al. (1994) J. Biol. Chem. 269:17697-17704).
  • Reaction mixtures (50 µl final volume) contain 15 mM HEPES, pH 7.9, 1.5 mM MgCl2, 10 mM dithiothreitol, 3% polyvinylalcohol, 1.5 µCi [methyl-3H]AdoMet (0.375 µM AdoMet) (DuPont-NEN), 0.6 µg DITHP, and acceptor substrate (0.4 µg [35S]RNA or 6-mercaptopurine (6-MP) to 1 mM final concentration). Reaction mixtures are incubated at 30 °C for 30 minutes, then 65 °C for 5 minutes. The products are separated by chromatography or electrophoresis and the level of methyl transferase activity is determined by quantification of methyl-3H recovery.
  • DITHP hydrolase activity is measured by the hydrolysis of appropriate synthetic peptide substrates conjugated with various chromogenic molecules in which the degree of hydrolysis is quantified by spectrophotometric (or fluorometric) absorption of the released chromophore.
  • Peptide substrates are designed according to the category of protease activity as endopeptidase (serine, cysteine, aspartic proteases), aminopeptidase (leucine aminopeptidase), or carboxypeptidase (Carboxypeptidase A and B, procollagen C-proteinase).
  • DITHP isomerase activity such as peptidyl prolyl cis/trans isomerase activity can be assayed by an enzyme assay described by Rahfeld, J.U., et al. (1994) (FEBS Lett. 352:180-184).
  • the assay is performed at 10 °C in 35 mM HEPES buffer, pH 7.8, containing chymotrypsin (0.5 mg/ml) and DITHP at a variety of concentrations. Under these assay conditions, the substrate, Suc-Ala-Xaa-Pro-Phe-4-NA, is in equilibrium with respect to the prolyl bond, with 80-95% in trans and 5-20% in cis conformation.
  • An assay for DITHP activity associated with growth and development measures cell proliferation as the amount of newly initiated DNA synthesis in Swiss mouse 3T3 cells.
  • a plasmid containing polynucleotides encoding DITHP is transfected into quiescent 3T3 cultured cells using methods well known in the art. The transiently transfected cells are then incubated in the presence of [3H]thymidine, a radioactive DNA precursor. Where applicable, varying amounts of DITHP ligand are added to the transfected cells. Incorporation of [3H]thymidine into acid-precipitable DNA is measured over an appropriate time interval, and the amount incorporated is directly proportional to the amount of newly synthesized DNA.
  • Growth factor activity of DITHP is measured by the stimulation of DNA synthesis in Swiss mouse 3T3 cells (McKay, I. and I. Leigh, eds. (1993) Growth Factors: A Practical Approach, Oxford University Press, New York NY). Initiation of DNA synthesis indicates the cells' entry into the mitotic cycle and their commitment to undergo later division. 3T3 cells are competent to respond to most growth factors, not only those that are mitogenic, but also those that are involved in embryonic induction. This competence is possible because the in vivo specificity demonstrated by some growth factors is not necessarily inherent but is determined by the responding tissue.
  • DITHP for this assay can be obtained by recombinant means or from biochemical preparations. Incorporation of [3H]thymidine into acid-precipitable DNA is measured over an appropriate time interval, and the amount incorporated is directly proportional to the amount of newly synthesized DNA. A linear dose-response curve over at least a hundred-fold DITHP concentration range is indicative of growth factor activity.
  • One unit of activity per milliliter is defined as the concentration of DITHP producing a 50% response level, where 100% represents maximal incorporation of [3H]thymidine into acid-precipitable DNA.
  • an assay for cytokine activity of DITHP measures the proliferation of leukocytes. In this assay, the amount of tritiated thymidine incorporated into newly synthesized DNA is used to estimate proliferative activity. Varying amounts of DITHP are added to cultured leukocytes, such as granulocytes, monocytes, or lymphocytes, in the presence of [3H]thymidine, a radioactive DNA precursor. DITHP for this assay can be obtained by recombinant means or from biochemical preparations.
  • Incorporation of [3H]thymidine into acid-precipitable DNA is measured over an appropriate time interval, and the amount incorporated is directly proportional to the amount of newly synthesized DNA.
  • a linear dose-response curve over at least a hundred-fold DITHP concentration range is indicative of DITHP activity.
  • One unit of activity per milliliter is conventionally defined as the concentration of DITHP producing a 50% response level, where 100% represents maximal incorporation of [3H]thymidine into acid-precipitable DNA.
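The concentration giving a 50% response can be estimated from a dose-response series by interpolation. The sketch below is one possible approach, not a method specified in the text: it takes the largest observed response as the 100% level and interpolates linearly on a log10 concentration axis between the two doses that bracket the half-maximal response. The function name and input format are assumptions.

```python
import math


def half_max_concentration(doses, responses):
    """Estimate the dose giving 50% of the maximal observed response.

    `doses` must be positive and ascending; `responses` are the matching
    incorporation values. Returns None if the 50% level is never bracketed.
    """
    target = 0.5 * max(responses)
    points = list(zip(doses, responses))
    for (d0, r0), (d1, r1) in zip(points, points[1:]):
        if r1 > r0 and r0 <= target <= r1:
            fraction = (target - r0) / (r1 - r0)
            log_dose = math.log10(d0) + fraction * (math.log10(d1) - math.log10(d0))
            return 10 ** log_dose
    return None
```

A fitted sigmoidal model would normally give a better estimate than two-point interpolation when enough dose levels are available.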
  • An alternative assay for DITHP cytokine activity utilizes a Boyden micro chamber
  • migratory cells such as macrophages or monocytes are placed in cell culture media in the upper compartment of the chamber.
  • Varying dilutions of DITHP are placed in the lower compartment.
  • the two compartments are separated by a 5 or 8 micron pore polycarbonate filter (Nucleopore, Pleasanton CA). After incubation at 37 °C for 80 to 120 minutes, the filters are fixed in methanol and stained with appropriate labeling agents. Cells which migrate to the other side of the filter are counted using standard microscopy.
  • the chemotactic index is calculated by dividing the number of migratory cells counted when DITHP is present in the lower compartment by the number of migratory cells counted when only media is present in the lower compartment.
  • the chemotactic index is proportional to the activity of DITHP.
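The chemotactic index defined above is a simple ratio of cell counts. A minimal sketch follows; the function and parameter names are illustrative, and the guard against a zero control count is an added assumption.

```python
def chemotactic_index(migrated_with_dithp, migrated_with_media_only):
    """Ratio of migrated cells with DITHP in the lower compartment to the media-only control."""
    if migrated_with_media_only <= 0:
        raise ValueError("control count must be positive to form a ratio")
    return migrated_with_dithp / migrated_with_media_only
```

An index near 1 indicates no effect on migration relative to media alone, and larger values indicate greater chemotactic activity.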
  • cell lines or tissues transformed with a vector containing dithp can be assayed for DITHP activity by immunoblotting.
  • Cells are denatured in SDS in the presence of β-mercaptoethanol, nucleic acids removed by ethanol precipitation, and proteins purified by acetone precipitation.
  • Pellets are resuspended in 20 mM tris buffer at pH 7.5 and incubated with Protein G-Sepharose pre-coated with an antibody specific for DITHP. After washing, the Sepharose beads are boiled in electrophoresis sample buffer, and the eluted proteins subjected to SDS-PAGE.
  • the SDS-PAGE is transferred to a nitrocellulose membrane for immunoblotting, and the DITHP activity is assessed by visualizing and quantifying bands on the blot using the antibody specific for DITHP as the primary antibody and 125I-labeled IgG specific for the primary antibody as the secondary antibody.
  • DITHP kinase activity is measured by phosphorylation of a protein substrate using γ-labeled [32P]-ATP and quantitation of the incorporated radioactivity using a radioisotope counter.
  • DITHP is incubated with the protein substrate, [32P]-ATP, and an appropriate kinase buffer.
  • the [32P] incorporated into the product is separated from free [32P]-ATP by electrophoresis and the incorporated [32P] is counted.
  • the amount of [32P] recovered is proportional to the kinase activity of DITHP in the assay.
  • a determination of the specific amino acid residue phosphorylated is made by phosphoamino acid analysis of the hydrolyzed protein.

Abstract

The present invention provides purified human polynucleotides for diagnostics and therapeutics (dithp). Also encompassed are the polypeptides (DITHP) encoded by dithp. The invention also provides for the use of dithp, or complements, oligonucleotides, or fragments thereof in diagnostic assays. The invention further provides for vectors and host cells containing dithp for the expression of DITHP. The invention additionally provides for the use of isolated and purified DITHP to induce antibodies and to screen libraries of compounds and the use of anti-DITHP antibodies in diagnostic assays. Also provided are microarrays containing dithp and methods of use.

Description

MOLECULES FOR DIAGNOSTICS AND THERAPEUTICS
TECHNICAL FIELD
The present invention relates to human molecules and to the use of these sequences in the diagnosis, study, prevention, and treatment of diseases associated with, as well as effects of exogenous compounds on, the expression of human molecules.
BACKGROUND OF THE INVENTION
The human genome is comprised of thousands of genes, many encoding gene products that function in the maintenance and growth of the various cells and tissues in the body. Aberrant expression or mutations in these genes and their products is the cause of, or is associated with, a variety of human diseases such as cancer and other cell proliferative disorders, autoimmune/inflammatory disorders, infections, developmental disorders, endocrine disorders, metabolic disorders, neurological disorders, gastrointestinal disorders, transport disorders, and connective tissue disorders. The identification of these genes and their products is the basis of an ever-expanding effort to find markers for early detection of diseases, and targets for their prevention and treatment. Therefore, these genes and their products are useful as diagnostics and therapeutics. These genes may encode, for example, enzyme molecules, molecules associated with growth and development, biochemical pathway molecules, extracellular information transmission molecules, receptor molecules, intracellular signaling molecules, membrane transport molecules, protein modification and maintenance molecules, nucleic acid synthesis and modification molecules, adhesion molecules, antigen recognition molecules, secreted and extracellular matrix molecules, cytoskeletal molecules, ribosomal molecules, electron transfer associated molecules, transcription factor molecules, chromatin molecules, cell membrane molecules, and organelle associated molecules.
For example, cancer represents a type of cell proliferative disorder that affects nearly every tissue in the body. A wide variety of molecules, either aberrantly expressed or mutated, can be the cause of, or involved with, various cancers because tissue growth involves complex and ordered patterns of cell proliferation, cell differentiation, and apoptosis. Cell proliferation must be regulated to maintain both the number of cells and their spatial organization. This regulation depends upon the appropriate expression of proteins which control cell cycle progression in response to extracellular signals such as growth factors and other mitogens, and intracellular cues such as DNA damage or nutrient starvation. Molecules which directly or indirectly modulate cell cycle progression fall into several categories, including growth factors and their receptors, second messenger and signal transduction proteins, oncogene products, tumor-suppressor proteins, and mitosis-promoting factors. Aberrant expression or mutations in any of these gene products can result in cell proliferative disorders such as cancer. Oncogenes are genes generally derived from normal genes that, through abnormal expression or mutation, can effect the transformation of a normal cell to a malignant one (oncogenesis). Oncoproteins, encoded by oncogenes, can affect cell proliferation in a variety of ways and include growth factors, growth factor receptors, intracellular signal transducers, nuclear transcription factors, and cell-cycle control proteins. In contrast, tumor-suppressor genes are involved in inhibiting cell proliferation. Mutations which cause reduced function or loss of function in tumor-suppressor genes result in aberrant cell proliferation and cancer. Although many different genes and their products have been found to be associated with cell proliferative disorders such as cancer, many more may exist that are yet to be discovered.
DNA-based arrays can provide a simple way to explore the expression of a single polymorphic gene or a large number of genes. When the expression of a single gene is explored, DNA-based arrays are employed to detect the expression of specific gene variants. For example, a p53 tumor suppressor gene array is used to determine whether individuals are carrying mutations that predispose them to cancer. A cytochrome p450 gene array is useful to determine whether individuals have one of a number of specific mutations that could result in increased drug metabolism, drug resistance or drug toxicity.
DNA-based array technology is especially relevant for the rapid screening of expression of a large number of genes. There is a growing awareness that gene expression is affected in a global fashion. A genetic predisposition, disease or therapeutic treatment may affect, directly or indirectly, the expression of a large number of genes. In some cases the interactions may be expected, such as when the genes are part of the same signaling pathway. In other cases, such as when the genes participate in separate signaling pathways, the interactions may be totally unexpected. Therefore, DNA-based arrays can be used to investigate how genetic predisposition, disease, or therapeutic treatment affects the expression of a large number of genes.
Enzyme Molecules
The cellular processes of biogenesis and biodegradation involve a number of key enzyme classes including oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases. These enzyme classes are each comprised of numerous substrate-specific enzymes having precise and well regulated functions. These enzymes function by facilitating metabolic processes such as glycolysis, the tricarboxylic acid cycle, and fatty acid metabolism; synthesis or degradation of amino acids, steroids, phospholipids, alcohols, etc.; regulation of cell signaling, proliferation, inflammation, apoptosis, etc.; and through catalyzing critical steps in DNA replication and repair, and the process of translation.
Oxidoreductases
Many pathways of biogenesis and biodegradation require oxidoreductase (dehydrogenase or reductase) activity, coupled to the reduction or oxidation of a donor or acceptor cofactor. Potential cofactors include cytochromes, oxygen, disulfide, iron-sulfur proteins, flavin adenine dinucleotide (FAD), and the nicotinamide adenine dinucleotides NAD and NADP (Newsholme, E.A. and A.R. Leech (1983) Biochemistry for the Medical Sciences. John Wiley and Sons, Chichester, U.K., pp. 779-793). Reductase activity catalyzes the transfer of electrons between substrate(s) and cofactor(s) with concurrent oxidation of the cofactor. The reverse dehydrogenase reaction catalyzes the reduction of a cofactor and consequent oxidation of the substrate. Oxidoreductase enzymes are a broad superfamily of proteins that catalyze numerous reactions in all cells of organisms ranging from bacteria to plants to humans. These reactions include metabolism of sugar, certain detoxification reactions in the liver, and the synthesis or degradation of fatty acids, amino acids, glucocorticoids, estrogens, androgens, and prostaglandins.
Different family members are named according to the direction in which their reactions are typically catalyzed; thus they may be referred to as oxidoreductases, oxidases, reductases, or dehydrogenases. In addition, family members often have distinct cellular localizations, including the cytosol, the plasma membrane, mitochondrial inner or outer membrane, and peroxisomes.
Short-chain alcohol dehydrogenases (SCADs) are a family of dehydrogenases that share only 15% to 30% sequence identity, with similarity predominantly in the coenzyme binding domain and the substrate binding domain. In addition to the well-known role in detoxification of ethanol, SCADs are also involved in synthesis and degradation of fatty acids, steroids, and some prostaglandins, and are therefore implicated in a variety of disorders such as lipid storage disease, myopathy, SCAD deficiency, and certain genetic disorders. For example, retinol dehydrogenase is a SCAD-family member (Simon, A. et al. (1995) J. Biol. Chem. 270:1107-1112) that converts retinol to retinal, the precursor of retinoic acid. Retinoic acid, a regulator of differentiation and apoptosis, has been shown to down-regulate genes involved in cell proliferation and inflammation (Chai, X. et al. (1995) J. Biol. Chem. 270:3900-3904). In addition, retinol dehydrogenase has been linked to hereditary eye diseases such as autosomal recessive childhood-onset severe retinal dystrophy (Simon, A. et al. (1996) Genomics 36:424-430).
Propagation of nerve impulses, modulation of cell proliferation and differentiation, induction of the immune response, and tissue homeostasis involve neurotransmitter metabolism (Weiss, B. (1991) Neurotoxicology 12:379-386; Collins, S.M. et al. (1992) Ann. N.Y. Acad. Sci. 664:415-424; Brown, J.K. and H. Imam (1991) J. Inherit. Metab. Dis. 14:436-458). Many pathways of neurotransmitter metabolism require oxidoreductase activity, coupled to reduction or oxidation of a cofactor, such as NAD+/NADH (Newsholme, E.A. and A.R. Leech (1983) Biochemistry for the Medical Sciences. John Wiley and Sons, Chichester, U.K., pp. 779-793). Degradation of catecholamines (epinephrine or norepinephrine) requires alcohol dehydrogenase (in the brain) or aldehyde dehydrogenase (in peripheral tissue). NAD+-dependent aldehyde dehydrogenase oxidizes 5-hydroxyindole-3-acetate (the product of 5-hydroxytryptamine (serotonin) metabolism) in the brain, blood platelets, liver and pulmonary endothelium (Newsholme, supra, p. 786). Other neurotransmitter degradation pathways that utilize NAD+/NADH-dependent oxidoreductase activity include those of L-DOPA (precursor of dopamine, a neuronal excitatory compound), glycine (an inhibitory neurotransmitter in the brain and spinal cord), histamine (liberated from mast cells during the inflammatory response), and taurine (an inhibitory neurotransmitter of the brain stem, spinal cord and retina) (Newsholme, supra, pp. 790, 792). Epigenetic or genetic defects in neurotransmitter metabolic pathways can result in a spectrum of disease states in different tissues including Parkinson disease and inherited myoclonus (McCance, K.L. and S.E. Huether (1994) Pathophysiology. Mosby-Year Book, Inc., St. Louis MO, pp. 402-404; Gundlach, A.L. (1990) FASEB J. 4:2761-2766).
Tetrahydrofolate is a derivatized glutamate molecule that acts as a carrier, providing activated one-carbon units to a wide variety of biosynthetic reactions, including synthesis of purines, pyrimidines, and the amino acid methionine. Tetrahydrofolate is generated by the activity of a holoenzyme complex called tetrahydrofolate synthase, which includes three enzyme activities: tetrahydrofolate dehydrogenase, tetrahydrofolate cyclohydrolase, and tetrahydrofolate synthetase. Thus, tetrahydrofolate dehydrogenase plays an important role in generating building blocks for nucleic and amino acids, crucial to proliferating cells.
3-Hydroxyacyl-CoA dehydrogenase (3HACD) is involved in fatty acid metabolism. It catalyzes the oxidation of 3-hydroxyacyl-CoA to 3-oxoacyl-CoA, with concomitant reduction of NAD+ to NADH, in the mitochondria and peroxisomes of eukaryotic cells. In peroxisomes, 3HACD and enoyl-CoA hydratase form an enzyme complex called bifunctional enzyme, defects in which are associated with peroxisomal bifunctional enzyme deficiency. This interruption in fatty acid metabolism produces accumulation of very-long-chain fatty acids, disrupting development of the brain, bone, and adrenal glands. Infants born with this deficiency typically die within 6 months (Watkins, P. et al. (1989) J. Clin. Invest. 83:771-777; Online Mendelian Inheritance in Man (OMIM), #261515). The neurodegeneration that is characteristic of Alzheimer's disease involves development of extracellular plaques in certain brain regions. A major protein component of these plaques is the peptide amyloid-β (Aβ), which is one of several cleavage products of amyloid precursor protein (APP). 3HACD has been shown to bind the Aβ peptide, and is overexpressed in neurons affected in Alzheimer's disease. In addition, an antibody against 3HACD can block the toxic effects of Aβ in a cell culture model of Alzheimer's disease (Yan, S. et al. (1997) Nature 389:689-695; OMIM, #602057).
Steroids, such as estrogen, testosterone, corticosterone, and others, are generated from a common precursor, cholesterol, and are interconverted into one another. A wide variety of enzymes act upon cholesterol, including a number of dehydrogenases. Steroid dehydrogenases, such as the hydroxysteroid dehydrogenases, are involved in hypertension, fertility, and cancer (Duax, W.L. and D. Ghosh (1997) Steroids 62:95-100). One such dehydrogenase is 3-oxo-5-α-steroid dehydrogenase (OASD), a microsomal membrane protein highly expressed in prostate and other androgen-responsive tissues. OASD catalyzes the conversion of testosterone into dihydrotestosterone, which is the most potent androgen. Dihydrotestosterone is essential for the formation of the male phenotype during embryogenesis, as well as for proper androgen-mediated growth of tissues such as the prostate and male genitalia. A defect in OASD that prevents the conversion of testosterone into dihydrotestosterone leads to a rare form of male pseudohermaphroditism, characterized by defective formation of the external genitalia (Andersson, S. et al. (1991) Nature 354:159-161; Labrie, F. et al. (1992) Endocrinology 131:1571-1573; OMIM #264600). Thus, OASD plays a central role in sexual differentiation and androgen physiology.
17β-hydroxysteroid dehydrogenase (17βHSD6) plays an important role in the regulation of the male reproductive hormone, dihydrotestosterone (DHTT). 17βHSD6 acts to reduce levels of DHTT by oxidizing a precursor of DHTT, 3α-diol, to androsterone, which is readily glucuronidated and removed from tissues. 17βHSD6 is active with both androgen and estrogen substrates when expressed in embryonic kidney 293 cells. At least five other isozymes of 17βHSD have been identified that catalyze oxidation and/or reduction reactions in various tissues with preferences for different steroid substrates (Biswas, M.G. and D.W. Russell (1997) J. Biol. Chem. 272:15959-15966).
For example, 17βHSD1 preferentially reduces estradiol and is abundant in the ovary and placenta. 17βHSD2 catalyzes oxidation of androgens and is present in the endometrium and placenta. 17βHSD3 is exclusively a reductive enzyme in the testis (Geissler, W.M. et al. (1994) Nat. Genet. 7:34-39). An excess of androgens such as DHTT can contribute to certain disease states such as benign prostatic hyperplasia and prostate cancer.
Oxidoreductases are components of the fatty acid metabolism pathways in mitochondria and peroxisomes. The main beta-oxidation pathway degrades both saturated and unsaturated fatty acids, while the auxiliary pathway performs additional steps required for the degradation of unsaturated fatty acids. The auxiliary beta-oxidation enzyme 2,4-dienoyl-CoA reductase catalyzes the removal of even-numbered double bonds from unsaturated fatty acids prior to their entry into the main beta-oxidation pathway. The enzyme may also remove odd-numbered double bonds from unsaturated fatty acids (Koivuranta, K.T. et al. (1994) Biochem. J. 304:787-792; Smeland, T.E. et al. (1992) Proc. Natl. Acad. Sci. USA 89:6673-6677). 2,4-dienoyl-CoA reductase is located in both mitochondria and peroxisomes. Inherited deficiencies in mitochondrial and peroxisomal beta-oxidation enzymes are associated with severe diseases, some of which manifest themselves soon after birth and lead to death within a few years. Defects in beta-oxidation are associated with Reye's syndrome, Zellweger syndrome, neonatal adrenoleukodystrophy, infantile Refsum's disease, acyl-CoA oxidase deficiency, and bifunctional protein deficiency (Suzuki, Y. et al. (1994) Am. J. Hum. Genet. 54:36-43; Hoefler, supra; Cotran, R.S. et al. (1994) Robbins Pathologic Basis of Disease, W.B. Saunders Co., Philadelphia PA, p. 866). Peroxisomal beta-oxidation is impaired in cancerous tissue. Although neoplastic human breast epithelial cells have the same number of peroxisomes as do normal cells, fatty acyl-CoA oxidase activity is lower than in control tissue (el Bouhtoury, F. et al. (1992) J. Pathol. 166:27-35). Human colon carcinomas have fewer peroxisomes than normal colon tissue and have lower fatty-acyl-CoA oxidase and bifunctional enzyme (including enoyl-CoA hydratase) activities than normal tissue (Cable, S. et al. (1992) Virchows Arch. B Cell Pathol. Incl. Mol. Pathol. 62:221-226).
Another important oxidoreductase is isocitrate dehydrogenase, which catalyzes the conversion of isocitrate to α-ketoglutarate, a substrate of the citric acid cycle. Isocitrate dehydrogenase can be either NAD or NADP dependent, and is found in the cytosol, mitochondria, and peroxisomes. Activity of isocitrate dehydrogenase is regulated developmentally, and by hormones, neurotransmitters, and growth factors.
Hydroxypyruvate reductase (HPR), a peroxisomal 2-hydroxyacid dehydrogenase in the glycolate pathway, catalyzes the conversion of hydroxypyruvate to glycerate with the oxidation of both NADH and NADPH. The reverse dehydrogenase reaction reduces NAD+ and NADP+. HPR recycles nucleotides and bases back into pathways leading to the synthesis of ATP and GTP. ATP and GTP are used to produce DNA and RNA and to control various aspects of signal transduction and energy metabolism. Inhibitors of purine nucleotide biosynthesis have long been employed as antiproliferative agents to treat cancer and viral diseases. HPR also regulates biochemical synthesis of serine and cellular serine levels available for protein synthesis.
The mitochondrial electron transport (or respiratory) chain is a series of oxidoreductase-type enzyme complexes in the mitochondrial membrane that is responsible for the transport of electrons from NADH through a series of redox centers within these complexes to oxygen, and the coupling of this oxidation to the synthesis of ATP (oxidative phosphorylation). ATP then provides the primary source of energy for driving a cell's many energy-requiring reactions. The key complexes in the respiratory chain are NADH:ubiquinone oxidoreductase (complex I), succinate:ubiquinone oxidoreductase (complex II), cytochrome c1-b oxidoreductase (complex III), cytochrome c oxidase (complex IV), and ATP synthase (complex V) (Alberts, B. et al. (1994) Molecular Biology of the Cell, Garland Publishing, Inc., New York NY, pp. 677-678). All of these complexes are located on the inner matrix side of the mitochondrial membrane except complex II, which is on the cytosolic side. Complex II transports electrons generated in the citric acid cycle to the respiratory chain. The electrons generated by oxidation of succinate to fumarate in the citric acid cycle are transferred through electron carriers in complex II to membrane bound ubiquinone (Q). Transcriptional regulation of these nuclear-encoded genes appears to be the predominant means for controlling the biogenesis of respiratory enzymes. Defects and altered expression of enzymes in the respiratory chain are associated with a variety of disease conditions.
Other dehydrogenase activities using NAD as a cofactor are also important in mitochondrial function. 3-hydroxyisobutyrate dehydrogenase (3HBD), important in valine catabolism, catalyzes the NAD-dependent oxidation of 3-hydroxyisobutyrate to methylmalonate semialdehyde within mitochondria. Elevated levels of 3-hydroxyisobutyrate have been reported in a number of disease states, including ketoacidosis, methylmalonic acidemia, and other disorders associated with deficiencies in methylmalonate semialdehyde dehydrogenase (Rougraff, P.M. et al. (1989) J. Biol. Chem. 264:5899-5903).
Another mitochondrial dehydrogenase important in amino acid metabolism is the enzyme isovaleryl-CoA dehydrogenase (IVD). IVD is involved in leucine metabolism and catalyzes the oxidation of isovaleryl-CoA to 3-methylcrotonyl-CoA. Human IVD is a tetrameric flavoprotein that is encoded in the nucleus and synthesized in the cytosol as a 45 kDa precursor with a mitochondrial import signal sequence. A genetic deficiency, caused by a mutation in the gene encoding IVD, results in the condition known as isovaleric acidemia. This mutation results in inefficient mitochondrial import and processing of the IVD precursor (Vockley, J. et al. (1992) J. Biol. Chem. 267:2494-2501).
Transferases
Transferases are enzymes that catalyze the transfer of molecular groups. The reaction may involve an oxidation, reduction, or cleavage of covalent bonds, and is often specific to a substrate or to particular sites on a type of substrate. Transferases participate in reactions essential to such functions as synthesis and degradation of cell components, and regulation of cell functions including cell signaling, cell proliferation, inflammation, apoptosis, secretion and excretion. Transferases are involved in key steps in disease processes involving these functions. Transferases are frequently classified according to the type of group transferred. For example, methyl transferases transfer one-carbon methyl groups, amino transferases transfer nitrogenous amino groups, and similarly denominated enzymes transfer aldehyde or ketone, acyl, glycosyl, alkyl or aryl, isoprenyl, saccharyl, phosphorous-containing, sulfur-containing, or selenium-containing groups, as well as small enzymatic groups such as Coenzyme A.
Acyl transferases include peroxisomal carnitine octanoyl transferase, which is involved in the fatty acid beta-oxidation pathway, and mitochondrial carnitine palmitoyl transferases, involved in fatty acid metabolism and transport. Choline O-acetyl transferase catalyzes the biosynthesis of the neurotransmitter acetylcholine.
Amino transferases play key roles in protein synthesis and degradation, and they contribute to other processes as well. For example, the amino transferase 5-aminolevulinic acid synthase catalyzes the addition of succinyl-CoA to glycine, the first step in heme biosynthesis. Other amino transferases participate in pathways important for neurological function and metabolism. For example, glutamine-phenylpyruvate amino transferase, also known as glutamine transaminase K (GTK), catalyzes several reactions with a pyridoxal phosphate cofactor. GTK catalyzes the reversible conversion of L-glutamine and phenylpyruvate to 2-oxoglutaramate and L-phenylalanine. Other amino acid substrates for GTK include L-methionine, L-histidine, and L-tyrosine. GTK also catalyzes the conversion of kynurenine to kynurenic acid, a tryptophan metabolite that is an antagonist of the N-methyl-D-aspartate (NMDA) receptor in the brain and may exert a neuromodulatory function. Alteration of the kynurenine metabolic pathway may be associated with several neurological disorders. GTK also plays a role in the metabolism of halogenated xenobiotics conjugated to glutathione, leading to nephrotoxicity in rats and neurotoxicity in humans. GTK is expressed in kidney, liver, and brain. Both human and rat GTKs contain a putative pyridoxal phosphate binding site (ExPASy ENZYME: EC 2.6.1.64; Perry, S.J. et al. (1993) Mol. Pharmacol. 43:660-665; Perry, S. et al. (1995) FEBS Lett. 360:277-280; and Alberati-Giani, D. et al. (1995) J. Neurochem. 64:1448-1455). A second amino transferase associated with this pathway is kynurenine/α-aminoadipate amino transferase (AadAT). AadAT catalyzes the reversible conversion of α-aminoadipate and α-ketoglutarate to α-ketoadipate and L-glutamate during lysine metabolism. AadAT also catalyzes the transamination of kynurenine to kynurenic acid. A cytosolic AadAT is expressed in rat kidney, liver, and brain (Nakatani, Y. et al. (1970) Biochim. Biophys. Acta 198:219-228; Buchli, R. et al. (1995) J. Biol. Chem. 270:29330-29335).
Glycosyl transferases include the mammalian UDP-glucuronosyl transferases, a family of membrane-bound microsomal enzymes catalyzing the transfer of glucuronic acid to lipophilic substrates in reactions that play important roles in detoxification and excretion of drugs, carcinogens, and other foreign substances. Another mammalian glycosyl transferase, UDP-galactose-ceramide galactosyl transferase, catalyzes the transfer of galactose to ceramide in the synthesis of galactocerebrosides in myelin membranes of the nervous system. The UDP-glycosyl transferases share a conserved signature domain of about 50 amino acid residues (PROSITE: PDOC00359, http://expasy.hcuge.ch/sprot/prosite.html).
Methyl transferases are involved in a variety of pharmacologically important processes. Nicotinamide N-methyl transferase catalyzes the N-methylation of nicotinamides and other pyridines, an important step in the cellular handling of drugs and other foreign compounds. Phenylethanolamine N-methyl transferase catalyzes the conversion of noradrenalin to adrenalin. 6-O-methylguanine-DNA methyl transferase reverses DNA methylation, an important step in carcinogenesis. Uroporphyrin-III C-methyl transferase, which catalyzes the transfer of two methyl groups from S-adenosyl-L-methionine to uroporphyrinogen III, is the first specific enzyme in the biosynthesis of cobalamin, a dietary vitamin whose uptake is deficient in pernicious anemia. Protein-arginine methyl transferases catalyze the posttranslational methylation of arginine residues in proteins, resulting in the mono- and dimethylation of arginine on the guanidino group. Substrates include histones, myelin basic protein, and heterogeneous nuclear ribonucleoproteins involved in mRNA processing, splicing, and transport. Protein-arginine methyl transferase interacts with proteins upregulated by mitogens, with proteins involved in chronic lymphocytic leukemia, and with interferon, suggesting an important role for methylation in cytokine receptor signaling (Lin, W.-J. et al. (1996) J. Biol. Chem. 271:15034-15044; Abramovich, C. et al. (1997) EMBO J. 16:260-266; and Scott, H.S. et al. (1998) Genomics 48:330-340).
Phosphotransferases catalyze the transfer of high-energy phosphate groups and are important in energy-requiring and -releasing reactions. The metabolic enzyme creatine kinase catalyzes the reversible phosphate transfer between creatine/creatine phosphate and ATP/ADP. Glycocyamine kinase catalyzes phosphate transfer from ATP to guanidoacetate, and arginine kinase catalyzes phosphate transfer from ATP to arginine. A cysteine-containing active site is conserved in this family (PROSITE: PDOC00103). Prenyl transferases are heterodimers, consisting of an alpha and a beta subunit, that catalyze the transfer of an isoprenyl group. An example of a prenyl transferase is the mammalian protein farnesyl transferase. The alpha subunit of farnesyl transferase consists of 5 repeats of 34 amino acids each, with each repeat containing an invariant tryptophan (PROSITE: PDOC00703).
Saccharyl transferases are glycating enzymes involved in a variety of metabolic processes. Oligosaccharyl transferase-48, for example, is a receptor for advanced glycation endproducts. Accumulation of these endproducts is observed in vascular complications of diabetes, macrovascular disease, renal insufficiency, and Alzheimer's disease (Thornalley, P.J. (1998) Cell Mol. Biol. (Noisy-le-Grand) 44:1013-1023).
Coenzyme A (CoA) transferase catalyzes the transfer of CoA between two carboxylic acids. Succinyl CoA:3-oxoacid CoA transferase, for example, transfers CoA from succinyl-CoA to a recipient such as acetoacetate. Acetoacetate is essential to the metabolism of ketone bodies, which accumulate in tissues affected by metabolic disorders such as diabetes (PROSITE: PDOC00980).
Hydrolases
Hydrolysis is the breaking of a covalent bond in a substrate by introduction of a molecule of water. The reaction involves a nucleophilic attack by the water molecule's oxygen atom on a target bond in the substrate. The water molecule is split across the target bond, breaking the bond and generating two product molecules. Hydrolases participate in reactions essential to such functions as synthesis and degradation of cell components, and regulation of cell functions including cell signaling, cell proliferation, inflammation, apoptosis, secretion and excretion. Hydrolases are involved in key steps in disease processes involving these functions. Hydrolytic enzymes, or hydrolases, may be grouped by substrate specificity into classes including phosphatases, peptidases, lysophospholipases, phosphodiesterases, glycosidases, and glyoxalases.
Phosphatases hydrolytically remove phosphate groups from proteins, an energy-providing step that regulates many cellular processes, including intracellular signaling pathways that in turn control cell growth and differentiation, cell-cell contact, the cell cycle, and oncogenesis.
Lysophospholipases (LPLs) regulate intracellular lipids by catalyzing the hydrolysis of ester bonds to remove an acyl group, a key step in lipid degradation. Small LPL isoforms, approximately 15-30 kD, function as hydrolases; larger isoforms function both as hydrolases and transacylases. A particular substrate for LPLs, lysophosphatidylcholine, causes lysis of cell membranes. LPL activity is regulated by signaling molecules important in numerous pathways, including the inflammatory response.
Peptidases, also called proteases, cleave peptide bonds that form the backbone of peptide or protein chains. Proteolytic processing is essential to cell growth, differentiation, remodeling, and homeostasis as well as inflammation and immune response. Since typical protein half-lives range from hours to a few days, peptidases are continually cleaving precursor proteins to their active form, removing signal sequences from targeted proteins, and degrading aged or defective proteins.
Peptidases function in bacterial, parasitic, and viral invasion and replication within a host. Examples of peptidases include trypsin and chymotrypsin (components of the complement cascade and the blood-clotting cascade), lysosomal cathepsins, calpains, pepsin, renin, and chymosin (Beynon, R.J. and J.S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University Press, New York NY, pp. 1-5).
The phosphodiesterases catalyze the hydrolysis of one of the two ester bonds in a phosphodiester compound. Phosphodiesterases are therefore crucial to a variety of cellular processes. Phosphodiesterases include DNA and RNA endo- and exo-nucleases, which are essential to cell growth and replication as well as protein synthesis. Another phosphodiesterase is acid sphingomyelinase, which hydrolyzes the membrane phospholipid sphingomyelin to ceramide and phosphorylcholine. Phosphorylcholine is used in the synthesis of phosphatidylcholine, which is involved in numerous intracellular signaling pathways. Ceramide is an essential precursor for the generation of gangliosides, membrane lipids found in high concentration in neural tissue. Defective acid sphingomyelinase phosphodiesterase leads to a build-up of sphingomyelin molecules in lysosomes, resulting in Niemann-Pick disease.
Glycosidases catalyze the cleavage of hemiacetal bonds of glycosides, which are compounds that contain one or more sugars. Mammalian lactase-phlorizin hydrolase, for example, is an intestinal enzyme that splits lactose. Mammalian beta-galactosidase removes the terminal galactose from gangliosides, glycoproteins, and glycosaminoglycans, and deficiency of this enzyme is associated with a gangliosidosis known as Morquio disease type B. Vertebrate lysosomal alpha-glucosidase, which hydrolyzes glycogen, maltose, and isomaltose, and vertebrate intestinal sucrase-isomaltase, which hydrolyzes sucrose, maltose, and isomaltose, are widely distributed members of this family with highly conserved sequences at their active sites.
The glyoxalase system is involved in gluconeogenesis, the production of glucose from storage compounds in the body. It consists of glyoxalase I, which catalyzes the formation of S-D-lactoylglutathione from methylglyoxal, a side product of triose-phosphate energy metabolism, and glyoxalase II, which hydrolyzes S-D-lactoylglutathione to D-lactic acid and reduced glutathione. Glyoxalases are involved in hyperglycemia, non-insulin-dependent diabetes mellitus, the detoxification of bacterial toxins, and in the control of cell proliferation and microtubule assembly.
Lyases
Lyases are a class of enzymes that catalyze the cleavage of C-C, C-O, C-N, C-S, C-(halide), P-O or other bonds without hydrolysis or oxidation to form two molecules, at least one of which contains a double bond (Stryer, L. (1995) Biochemistry, W.H. Freeman and Co., New York NY, p. 620). Lyases are critical components of cellular biochemistry with roles in metabolic energy production including fatty acid metabolism, as well as other diverse enzymatic processes. Further classification of lyases reflects the type of bond cleaved as well as the nature of the cleaved group. The group of C-C lyases includes carboxyl-lyases (decarboxylases), aldehyde-lyases (aldolases), oxo-acid-lyases and others. The C-O lyase group includes hydro-lyases, lyases acting on polysaccharides and other lyases. The C-N lyase group includes ammonia-lyases, amidine-lyases, amine-lyases (deaminases) and other lyases. Proper regulation of lyases is critical to normal physiology. For example, mutation-induced deficiencies in uroporphyrinogen decarboxylase can lead to photosensitive cutaneous lesions in the genetically-linked disorder familial porphyria cutanea tarda (Mendez, M. et al. (1998) Am. J. Hum. Genet. 63:1363-1375). It has also been shown that adenosine deaminase (ADA) deficiency stems from genetic mutations in the ADA gene, resulting in the disorder severe combined immunodeficiency disease (SCID) (Hershfield, M.S. (1998) Semin. Hematol. 35:291-298).
Isomerases
Isomerases are a class of enzymes that catalyze geometric or structural changes within a molecule to form a single product. This class includes racemases and epimerases, cis-trans-isomerases, intramolecular oxidoreductases, intramolecular transferases (mutases) and intramolecular lyases. Isomerases are critical components of cellular biochemistry with roles in metabolic energy production, including glycolysis, as well as other diverse enzymatic processes (Stryer, L. (1995) Biochemistry, W.H. Freeman and Co., New York NY, pp.483-507).
Racemases are a subset of isomerases that catalyze inversion of a molecule's configuration around the asymmetric carbon atom in a substrate having a single center of asymmetry, thereby interconverting two racemers. Epimerases are another subset of isomerases that catalyze inversion of configuration around an asymmetric carbon atom in a substrate with more than one center of asymmetry, thereby interconverting two epimers. Racemases and epimerases can act on amino acids and derivatives, hydroxy acids and derivatives, as well as carbohydrates and derivatives. The interconversion of UDP-galactose and UDP-glucose is catalyzed by UDP-galactose-4'-epimerase. Proper regulation and function of this epimerase is essential to the synthesis of glycoproteins and glycolipids. Elevated blood galactose levels have been correlated with UDP-galactose-4'-epimerase deficiency in screening programs of infants (Gitzelmann, R. (1972) Helv. Paediat. Acta 27:125-130).
Oxidoreductases can be isomerases as well. Oxidoreductases catalyze the reversible transfer of electrons from a substrate that becomes oxidized to a substrate that becomes reduced. This class of enzymes includes dehydrogenases, hydroxylases, oxidases, oxygenases, peroxidases, and reductases. Proper maintenance of oxidoreductase levels is physiologically important. For example, genetically-linked deficiencies in lipoamide dehydrogenase can result in lactic acidosis (Robinson, B.H. et al. (1977) Pediat. Res. 11:1198-1202).
Another subgroup of isomerases are the transferases (or mutases). Transferases transfer a chemical group from one compound (the donor) to another compound (the acceptor). The types of groups transferred by these enzymes include acyl groups, amino groups, phosphate groups (phosphotransferases or phosphomutases), and others. The transferase carnitine palmitoyltransferase is an important component of fatty acid metabolism. Genetically-linked deficiencies in this transferase can lead to myopathy (Scriver, C.R. et al. (1995) The Metabolic and Molecular Basis of Inherited Disease, McGraw-Hill, New York NY, pp.1501-1533).
Yet another subgroup of isomerases are the topoisomerases. Topoisomerases are enzymes that affect the topological state of DNA. Defects in topoisomerases or their regulation can affect normal physiology. For example, reduced levels of topoisomerase II have been correlated with some of the DNA processing defects associated with the disorder ataxia-telangiectasia (Singh, S.P. et al. (1988) Nucleic Acids Res. 16:3919-3929).
Ligases
Ligases catalyze the formation of a bond between two substrate molecules. The process involves the hydrolysis of a pyrophosphate bond in ATP or a similar energy donor. Ligases are classified by the type of bond they form, which can include carbon-oxygen, carbon-sulfur, carbon-nitrogen, carbon-carbon and phosphoric ester bonds.
Ligases forming carbon-oxygen bonds include the aminoacyl-transfer RNA (tRNA) synthetases, which are important RNA-associated enzymes with roles in translation. Protein biosynthesis depends on each amino acid forming a linkage with the appropriate tRNA. The aminoacyl-tRNA synthetases are responsible for the activation and correct attachment of an amino acid to its cognate tRNA. The 20 aminoacyl-tRNA synthetase enzymes can be divided into two structural classes, and each class is characterized by a distinctive topology of the catalytic domain. Class I enzymes contain a catalytic domain based on the nucleotide-binding Rossmann fold. Class II enzymes contain a central catalytic domain, which consists of a seven-stranded antiparallel β-sheet motif, as well as N- and C-terminal regulatory domains. Class II enzymes are separated into two groups based on the heterodimeric or homodimeric structure of the enzyme; the latter group is further subdivided by the structure of the N- and C-terminal regulatory domains (Hartlein, M. and S. Cusack (1995) J. Mol. Evol. 40:519-530). Autoantibodies against aminoacyl-tRNA synthetases are generated by patients with dermatomyositis and polymyositis, and correlate strongly with complicating interstitial lung disease (ILD). These antibodies appear to be generated in response to viral infection, and coxsackie virus has been used to induce experimental viral myositis in animals.
Ligases forming carbon-sulfur bonds (acid-thiol ligases) mediate a large number of cellular biosynthetic and intermediary metabolism processes that involve intermolecular transfer of carbon atom-containing substrates (carbon substrates). Examples of such reactions include the tricarboxylic acid cycle, synthesis of fatty acids and long-chain phospholipids, synthesis of alcohols and aldehydes, synthesis of intermediary metabolites, and reactions involved in the amino acid degradation pathways. Some of these reactions require input of energy, usually in the form of conversion of ATP to either ADP or AMP and pyrophosphate.
In many cases, a carbon substrate is derived from a small molecule containing at least two carbon atoms. The carbon substrate is often covalently bound to a larger molecule which acts as a carbon substrate carrier molecule within the cell. In the biosynthetic mechanisms described above, the carrier molecule is coenzyme A. Coenzyme A (CoA) is structurally related to derivatives of the nucleotide ADP and consists of 4'-phosphopantetheine linked via a phosphodiester bond to the alpha phosphate group of adenosine 3',5'-bisphosphate. The terminal thiol group of 4'-phosphopantetheine acts as the site for carbon substrate bond formation. The predominant carbon substrates which utilize CoA as a carrier molecule during biosynthesis and intermediary metabolism in the cell are acetyl, succinyl, and propionyl moieties, collectively referred to as acyl groups. Other carbon substrates include enoyl lipid, which acts as a fatty acid oxidation intermediate, and carnitine, which acts as an acetyl-CoA flux regulator/mitochondrial acyl group transfer protein. Acyl-CoA and acetyl-CoA are synthesized in the cell by acyl-CoA synthetase and acetyl-CoA synthetase, respectively.
Activation of fatty acids is mediated by at least three forms of acyl-CoA synthetase activity: i) acetyl-CoA synthetase, which activates acetate and several other low molecular weight carboxylic acids and is found in muscle mitochondria and the cytosol of other tissues; ii) medium-chain acyl-CoA synthetase, which activates fatty acids containing between four and eleven carbon atoms (predominantly from dietary sources), and is present only in liver mitochondria; and iii) acyl-CoA synthetase, which is specific for long chain fatty acids with between six and twenty carbon atoms, and is found in microsomes and the mitochondria. Proteins associated with acyl-CoA synthetase activity have been identified from many sources including bacteria, yeast, plants, mouse, and man. The activity of acyl-CoA synthetase may be modulated by phosphorylation of the enzyme by cAMP-dependent protein kinase.
Ligases forming carbon-nitrogen bonds include amide synthases such as glutamine synthetase (glutamate-ammonia ligase), which catalyzes the amination of glutamic acid to glutamine by ammonia using the energy of ATP hydrolysis. Glutamine is the primary source for the amino group in various amide transfer reactions involved in de novo pyrimidine nucleotide synthesis and in purine and pyrimidine ribonucleotide interconversions. Overexpression of glutamine synthetase has been observed in primary liver cancer (Christa, L. et al. (1994) Gastroent. 106:1312-1320).
Acid-amino-acid ligases (peptide synthases) are represented by the ubiquitin proteases which are associated with the ubiquitin conjugation system (UCS), a major pathway for the degradation of cellular proteins in eukaryotic cells and some bacteria. The UCS mediates the elimination of abnormal proteins and regulates the half-lives of important regulatory proteins that control cellular processes such as gene transcription and cell cycle progression. In the UCS pathway, proteins targeted for degradation are conjugated to ubiquitin (Ub), a small heat stable protein. Ub is first activated by a ubiquitin-activating enzyme (E1), and then transferred to one of several Ub-conjugating enzymes (E2). E2 then links the Ub molecule through its C-terminal glycine to an internal lysine (acceptor lysine) of a target protein. The ubiquitinated protein is then recognized and degraded by the proteasome, a large, multisubunit proteolytic enzyme complex, and ubiquitin is released for reutilization by ubiquitin protease. The UCS is implicated in the degradation of mitotic cyclin kinases, oncoproteins, tumor suppressor genes such as p53, viral proteins, cell surface receptors associated with signal transduction, transcriptional regulators, and mutated or damaged proteins (Ciechanover, A. (1994) Cell 79:13-21). A murine proto-oncogene, Unp, encodes a nuclear ubiquitin protease whose overexpression leads to oncogenic transformation of NIH3T3 cells, and the human homolog of this gene is consistently elevated in small cell tumors and adenocarcinomas of the lung (Gray, D.A. (1995) Oncogene 10:2179-2183).
Cyclo-ligases and other carbon-nitrogen ligases comprise various enzymes and enzyme complexes that participate in the de novo pathways to purine and pyrimidine biosynthesis. Because these pathways are critical to the synthesis of nucleotides for replication of both RNA and DNA, many of these enzymes have been the targets of clinical agents for the treatment of cell proliferative disorders such as cancer and infectious diseases.
Purine biosynthesis occurs de novo from the amino acids glycine and glutamine, and other small molecules. Three of the key reactions in this process are catalyzed by a trifunctional enzyme composed of glycinamide-ribonucleotide synthetase (GARS), aminoimidazole ribonucleotide synthetase (AIRS), and glycinamide ribonucleotide transformylase (GART). Together these three enzymes combine ribosylamine phosphate with glycine to yield phosphoribosyl aminoimidazole, a precursor to both adenylate and guanylate nucleotides. This trifunctional protein has been implicated in the pathology of Down's syndrome (Aimi, J. et al. (1990) Nucleic Acid Res. 18:6665-6672). Adenylosuccinate synthetase catalyzes a later step in purine biosynthesis that converts inosinic acid to adenylosuccinate, a key step on the path to ATP synthesis. This enzyme is also similar to another carbon-nitrogen ligase, argininosuccinate synthetase, that catalyzes a similar reaction in the urea cycle (Powell, S.M. et al. (1992) FEBS Lett. 303:4-10).
Like the de novo biosynthesis of purines, de novo synthesis of the pyrimidine nucleotides uridylate and cytidylate also arises from a common precursor, in this instance the nucleotide orotidylate derived from orotate and phosphoribosyl pyrophosphate (PRPP). Again a trifunctional enzyme comprising three carbon-nitrogen ligases plays a key role in the process. In this case the enzymes aspartate transcarbamylase (ATCase), carbamyl phosphate synthetase II, and dihydroorotase (DHOase) are encoded by a single gene called CAD. Together these three enzymes combine the initial reactants in pyrimidine biosynthesis, glutamine, CO2 and ATP, to form dihydroorotate, the precursor to orotate and orotidylate (Iwahana, H. et al. (1996) Biochem. Biophys. Res. Commun. 219:249-255). Further steps then lead to the synthesis of uridine nucleotides from orotidylate. Cytidine nucleotides are derived from uridine-5'-triphosphate (UTP) by the amidation of UTP using glutamine as the amino donor and the enzyme CTP synthetase. Regulatory mutations in the human CTP synthetase are believed to confer multi-drug resistance to agents widely used in cancer therapy (Yamauchi, M. et al. (1990) EMBO J. 9:2095-2099).
Ligases forming carbon-carbon bonds include the carboxylases acetyl-CoA carboxylase and pyruvate carboxylase. Acetyl-CoA carboxylase catalyzes the carboxylation of acetyl-CoA from CO2 and H2O using the energy of ATP hydrolysis. Acetyl-CoA carboxylase is the rate-limiting step in the biogenesis of long-chain fatty acids. Two isoforms of acetyl-CoA carboxylase, types I and II, are expressed in humans in a tissue-specific manner (Ha, J. et al. (1994) Eur. J. Biochem. 219:297-306). Pyruvate carboxylase is a nuclear-encoded mitochondrial enzyme that catalyzes the conversion of pyruvate to oxaloacetate, a key intermediate in the citric acid cycle.
Ligases forming phosphoric ester bonds include the DNA ligases involved in both DNA replication and repair. DNA ligases seal phosphodiester bonds between two adjacent nucleotides in a DNA chain using the energy from ATP hydrolysis to first activate the free 5'-phosphate of one nucleotide and then react it with the 3'-OH group of the adjacent nucleotide. This resealing reaction is used both in DNA replication, to join small DNA fragments called Okazaki fragments that are transiently formed in the process of replicating new DNA, and in DNA repair. DNA repair is the process by which accidental base changes, such as those produced by oxidative damage, hydrolytic attack, or uncontrolled methylation of DNA, are corrected before replication or transcription of the DNA can occur. Bloom's syndrome is an inherited human disease in which individuals are partially deficient in DNA ligation and consequently have an increased incidence of cancer (Alberts, B. et al. (1994) The Molecular Biology of the Cell, Garland Publishing Inc., New York NY, p. 247).
Molecules Associated with Growth and Development
Human growth and development requires the spatial and temporal regulation of cell differentiation, cell proliferation, and apoptosis. These processes coordinately control reproduction, aging, embryogenesis, morphogenesis, organogenesis, and tissue repair and maintenance. At the cellular level, growth and development is governed by the cell's decision to enter into or exit from the cell division cycle and by the cell's commitment to a terminally differentiated state. These decisions are made by the cell in response to extracellular signals and other environmental cues it receives. The following discussion focuses on the molecular mechanisms of cell division, reproduction, cell differentiation and proliferation, apoptosis, and aging.
Cell Division
Cell division is the fundamental process by which all living things grow and reproduce. In unicellular organisms such as yeast and bacteria, each cell division doubles the number of organisms, while in multicellular species many rounds of cell division are required to replace cells lost by wear or by programmed cell death, and for cell differentiation to produce a new tissue or organ. Details of the cell division cycle may vary, but the basic process consists of three principal events. The first event, interphase, involves preparations for cell division, replication of the DNA, and production of essential proteins. In the second event, mitosis, the nuclear material is divided and separates to opposite sides of the cell. The final event, cytokinesis, is division and fission of the cell cytoplasm. The sequence and timing of cell cycle transitions is under the control of the cell cycle regulation system, which controls the process by positive or negative regulatory circuits at various check points.
Regulated progression of the cell cycle depends on the integration of growth control pathways with the basic cell cycle machinery. Cell cycle regulators have been identified by selecting for human and yeast cDNAs that block or activate cell cycle arrest signals in the yeast mating pheromone pathway when they are overexpressed. Known regulators include human CPR (cell cycle progression restoration) genes, such as CPR8 and CPR2, and yeast CDC (cell division control) genes, including CDC91, that block the arrest signals. The CPR genes express a variety of proteins including cyclins, tumor suppressor binding proteins, chaperones, transcription factors, translation factors, and RNA-binding proteins (Edwards, M.C. et al. (1997) Genetics 147:1063-1076).
Several cell cycle transitions, including the entry and exit of a cell from mitosis, are dependent upon the activation and inhibition of cyclin-dependent kinases (Cdks). The Cdks are composed of a kinase subunit, Cdk, and an activating subunit, cyclin, in a complex that is subject to many levels of regulation. There appears to be a single Cdk in Saccharomyces cerevisiae and Schizosaccharomyces pombe, whereas mammals have a variety of specialized Cdks. Cyclins act by binding to and activating cyclin-dependent protein kinases which then phosphorylate and activate selected proteins involved in the mitotic process. The Cdk-cyclin complex is both positively and negatively regulated by phosphorylation, and by targeted degradation involving molecules such as CDC4 and CDC53. In addition, Cdks are further regulated by binding to inhibitors and other proteins such as Suc1 that modify their specificity or accessibility to regulators (Patra, D. and W.G. Dunphy (1996) Genes Dev. 10:1503-1515; and Mathias, N. et al. (1996) Mol. Cell Biol. 16:6634-6643).
Reproduction
The male and female reproductive systems are complex and involve many aspects of growth and development. The anatomy and physiology of the male and female reproductive systems are reviewed in (Guyton, A.C. (1991) Textbook of Medical Physiology, W.B. Saunders Co., Philadelphia PA, pp. 899-928).
The male reproductive system includes the process of spermatogenesis, in which the sperm are formed, and male reproductive functions are regulated by various hormones and their effects on accessory sexual organs, cellular metabolism, growth, and other bodily functions.
Spermatogenesis begins at puberty as a result of stimulation by gonadotropic hormones released from the anterior pituitary. Immature sperm (spermatogonia) undergo several mitotic cell divisions before undergoing meiosis and full maturation. The testes secrete several male sex hormones, the most abundant being testosterone, which is essential for growth and division of the immature sperm, and for the masculine characteristics of the male body. Three other male sex hormones, gonadotropin-releasing hormone (GnRH), luteinizing hormone (LH), and follicle-stimulating hormone (FSH), control sexual function.
The uterus, ovaries, fallopian tubes, vagina, and breasts comprise the female reproductive system. The ovaries and uterus are the source of ova and the location of fetal development, respectively. The fallopian tubes and vagina are accessory organs attached to the top and bottom of the uterus, respectively. Both the uterus and ovaries have additional roles in the development and loss of reproductive capability during a female's lifetime. The primary role of the breasts is lactation. Multiple endocrine signals from the ovaries, uterus, pituitary, hypothalamus, adrenal glands, and other tissues coordinate reproduction and lactation. These signals vary during the monthly menstruation cycle and during the female's lifetime. Similarly, the sensitivity of reproductive organs to these endocrine signals varies during the female's lifetime.
A combination of positive and negative feedback to the ovaries, pituitary and hypothalamus glands controls physiologic changes during the monthly ovulation and endometrial cycles. The anterior pituitary secretes two major gonadotropin hormones, follicle-stimulating hormone (FSH) and luteinizing hormone (LH), regulated by negative feedback of steroids, most notably by ovarian estradiol. If fertilization does not occur, estrogen and progesterone levels decrease. This sudden reduction of the ovarian hormones leads to menstruation, the desquamation of the endometrium.
Hormones further govern all the steps of pregnancy, parturition, lactation, and menopause. During pregnancy large quantities of human chorionic gonadotropin (hCG), estrogens, progesterone, and human chorionic somatomammotropin (hCS) are formed by the placenta. hCG, a glycoprotein similar to luteinizing hormone, stimulates the corpus luteum to continue producing more progesterone and estrogens, rather than to involute as occurs if the ovum is not fertilized. hCS is similar to growth hormone and is crucial for fetal nutrition. The female breast also matures during pregnancy. Large amounts of estrogen secreted by the placenta trigger growth and branching of the breast milk ductal system while lactation is initiated by the secretion of prolactin by the pituitary gland.
Parturition involves several hormonal changes that increase uterine contractility toward the end of pregnancy, as follows. The levels of estrogens increase more than those of progesterone. Oxytocin is secreted by the neurohypophysis. Concomitantly, uterine sensitivity to oxytocin increases. The fetus itself secretes oxytocin, cortisol (from adrenal glands), and prostaglandins.
Menopause occurs when most of the ovarian follicles have degenerated. The ovary then produces less estradiol, reducing the negative feedback on the pituitary and hypothalamus glands. Mean levels of circulating FSH and LH increase, even as ovulatory cycles continue. Therefore, the ovary is less responsive to gonadotropins, and there is an increase in the time between menstrual cycles. Consequently, menstrual bleeding ceases and reproductive capability ends.
Cell Differentiation and Proliferation
Tissue growth involves complex and ordered patterns of cell proliferation, cell differentiation, and apoptosis. Cell proliferation must be regulated to maintain both the number of cells and their spatial organization. This regulation depends upon the appropriate expression of proteins which control cell cycle progression in response to extracellular signals, such as growth factors and other mitogens, and intracellular cues, such as DNA damage or nutrient starvation. Molecules which directly or indirectly modulate cell cycle progression fall into several categories, including growth factors and their receptors, second messenger and signal transduction proteins, oncogene products, tumor-suppressor proteins, and mitosis-promoting factors.
Growth factors were originally described as serum factors required to promote cell proliferation. Most growth factors are large, secreted polypeptides that act on cells in their local environment. Growth factors bind to and activate specific cell surface receptors and initiate intracellular signal transduction cascades. Many growth factor receptors are classified as receptor tyrosine kinases which undergo autophosphorylation upon ligand binding. Autophosphorylation enables the receptor to interact with signal transduction proteins characterized by the presence of SH2 or SH3 domains (Src homology regions 2 or 3). These proteins then modulate the activity state of small G-proteins, such as Ras, Rab, and Rho, along with GTPase activating proteins (GAPs), guanine nucleotide releasing proteins (GNRPs), and other guanine nucleotide exchange factors. Small G proteins act as molecular switches that activate other downstream events, such as mitogen-activated protein kinase (MAP kinase) cascades. MAP kinases ultimately activate transcription of mitosis-promoting genes.
In addition to growth factors, small signaling peptides and hormones also influence cell proliferation. These molecules bind primarily to another class of receptor, the trimeric G-protein coupled receptor (GPCR), found predominantly on the surface of immune, neuronal and neuroendocrine cells. Upon ligand binding, the GPCR activates a trimeric G protein which in turn triggers increased levels of intracellular second messengers such as phospholipase C, Ca2+, and cyclic AMP. Most GPCR-mediated signaling pathways indirectly promote cell proliferation by causing the secretion or breakdown of other signaling molecules that have direct mitogenic effects. These signaling cascades often involve activation of kinases and phosphatases.
Some growth factors, such as some members of the transforming growth factor beta (TGF-β) family, act on some cells to stimulate cell proliferation and on other cells to inhibit it. Growth factors may also stimulate a cell at one concentration and inhibit the same cell at another concentration. Most growth factors also have a multitude of other actions besides the regulation of cell growth and division: they can control the proliferation, survival, differentiation, migration, or function of cells depending on the circumstance. For example, the tumor necrosis factor/nerve growth factor (TNF/NGF) family can activate or inhibit cell death, as well as regulate proliferation and differentiation. The cell response depends on the type of cell, its stage of differentiation and transformation status, which surface receptors are stimulated, and the types of stimuli acting on the cell (Smith, A. et al. (1994) Cell 76:959-962; and Nocentini, G. et al. (1997) Proc. Natl. Acad. Sci. USA 94:6216-6221).
Neighboring cells in a tissue compete for growth factors, and when provided with "unlimited" quantities in a perfused system will grow to even higher cell densities before reaching density-dependent inhibition of cell division. Cells often demonstrate an anchorage dependence of cell division as well. This anchorage dependence may be associated with the formation of focal contacts linking the cytoskeleton with the extracellular matrix (ECM). The expression of ECM components can be stimulated by growth factors. For example, TGF-β stimulates fibroblasts to produce a variety of ECM proteins, including fibronectin, collagen, and tenascin (Pearson, C.A. et al. (1988) EMBO J. 7:2677-2981). In fact, for some cell types specific ECM molecules, such as laminin or fibronectin, may act as growth factors. Tenascin-C and -R, expressed in developing and lesioned neural tissue, provide stimulatory/anti-adhesive or inhibitory properties, respectively, for axonal growth (Faissner, A. (1997) Cell Tissue Res. 290:331-341).
Cancers are associated with the activation of oncogenes which are derived from normal cellular genes. These oncogenes encode oncoproteins which convert normal cells into malignant cells. Some oncoproteins are mutant isoforms of the normal protein, and other oncoproteins are abnormally expressed with respect to location or amount of expression. The latter category of oncoprotein causes cancer by altering transcriptional control of cell proliferation. Five classes of oncoproteins are known to affect cell cycle controls. These classes include growth factors, growth factor receptors, intracellular signal transducers, nuclear transcription factors, and cell-cycle control proteins. Viral oncogenes are integrated into the human genome after infection of human cells by certain viruses. Examples of viral oncogenes include v-src, v-abl, and v-fps.
Many oncogenes have been identified and characterized. These include sis, erbA, erbB, her-2, mutated Gs, src, abl, ras, crk, jun, fos, myc, and mutated tumor-suppressor genes such as RB, p53, mdm2, Cip1, p16, and cyclin D. Transformation of normal genes to oncogenes may also occur by chromosomal translocation. The Philadelphia chromosome, characteristic of chronic myeloid leukemia and a subset of acute lymphoblastic leukemias, results from a reciprocal translocation between chromosomes 9 and 22 that moves a truncated portion of the proto-oncogene c-abl to the breakpoint cluster region (bcr) on chromosome 22.
Tumor-suppressor genes are involved in regulating cell proliferation. Mutations which cause reduced or loss of function in tumor-suppressor genes result in uncontrolled cell proliferation. For example, the retinoblastoma gene product (RB), in a non-phosphorylated state, binds several early-response genes and suppresses their transcription, thus blocking cell division. Phosphorylation of RB causes it to dissociate from the genes, releasing the suppression, and allowing cell division to proceed.
Apoptosis
Apoptosis is the genetically controlled process by which unneeded or defective cells undergo programmed cell death. Selective elimination of cells is as important for morphogenesis and tissue remodeling as is cell proliferation and differentiation. Lack of apoptosis may result in hyperplasia and other disorders associated with increased cell proliferation. Apoptosis is also a critical component of the immune response. Immune cells such as cytotoxic T-cells and natural killer cells prevent the spread of disease by inducing apoptosis in tumor cells and virus-infected cells. In addition, immune cells that fail to distinguish self molecules from foreign molecules must be eliminated by apoptosis to avoid an autoimmune response.
Apoptotic cells undergo distinct morphological changes. Hallmarks of apoptosis include cell shrinkage, nuclear and cytoplasmic condensation, and alterations in plasma membrane topology. Biochemically, apoptotic cells are characterized by increased intracellular calcium concentration, fragmentation of chromosomal DNA, and expression of novel cell surface components.
The molecular mechanisms of apoptosis are highly conserved, and many of the key protein regulators and effectors of apoptosis have been identified. Apoptosis generally proceeds in response to a signal which is transduced intracellularly and results in altered patterns of gene expression and protein activity. Signaling molecules such as hormones and cytokines are known both to stimulate and to inhibit apoptosis through interactions with cell surface receptors. Transcription factors also play an important role in the onset of apoptosis. A number of downstream effector molecules, particularly proteases such as the cysteine proteases called caspases, have been implicated in the degradation of cellular components and the proteolytic activation of other apoptotic effectors.
Aging and Senescence
Studies of the aging process or senescence have shown a number of characteristic cellular and molecular changes (Fauci et al. (1998) Harrison's Principles of Internal Medicine, McGraw-Hill, New York NY, p.37). These characteristics include increases in chromosome structural abnormalities, DNA cross-linking, incidence of single-stranded breaks in DNA, losses in DNA methylation, and degradation of telomere regions. In addition to these DNA changes, post-translational alterations of proteins increase, including deamidation, oxidation, cross-linking, and nonenzymatic glycation. Still further molecular changes occur in the mitochondria of aging cells through deterioration of structure. These changes eventually contribute to decreased function in every organ of the body.
Biochemical Pathway Molecules
Biochemical pathways are responsible for regulating metabolism, growth and development, protein secretion and trafficking, environmental responses, and ecological interactions including immune response and response to parasites.
DNA Replication
Deoxyribonucleic acid (DNA), the genetic material, is found in both the nucleus and mitochondria of human cells. The bulk of human DNA is nuclear, in the form of linear chromosomes, while mitochondrial DNA is circular. DNA replication begins at specific sites called origins of replication. Bidirectional synthesis occurs from the origin via two growing forks that move in opposite directions. Replication is semi-conservative, with each daughter duplex containing one old strand and its newly synthesized complementary partner. Proteins involved in DNA replication include DNA polymerases, DNA primase, telomerase, DNA helicase, topoisomerases, DNA ligases, replication factors, and DNA-binding proteins. DNA Recombination and Repair Cells are constantly faced with replication errors and environmental assault (such as ultraviolet irradiation) that can produce DNA damage. Damage to DNA consists of any change that modifies the structure of the molecule. Changes to DNA can be divided into two general classes, single base changes and structural distortions. Any damage to DNA can produce a mutation, and the mutation may produce a disorder, such as cancer. Changes in DNA are recognized by repair systems within the cell. These repair systems act to correct the damage and thus prevent any deleterious affects of a mutational event. Repair systems can be divided into three general types, direct repair, excision repair, and retrieval systems. Proteins involved in DNA repair include DNA polymerase, excision repair proteins, excision and cross link repair proteins, recombination and repair proteins, RAD51 proteins, and BLN and WRN proteins that are homologs of RecQ helicase. When the repair systems are eliminated, cells become exceedingly sensitive to environmental mutagens, such as ultraviolet irradiation. Patients with disorders associated with a loss in DNA repair systems often exhibit a high sensitivity to environmental mutagens. 
Examples of such disorders include xeroderma pigmentosum (XP), Bloom's syndrome (BS), and Werner's syndrome (WS) (Yamagata, K. et al. (1998) Proc. Natl. Acad. Sci. USA 95:8733-8738), ataxia telangiectasia, Cockayne's syndrome, and Fanconi's anemia.
Recombination is the process whereby new DNA sequences are generated by the movements of large pieces of DNA. In homologous recombination, which occurs during meiosis and DNA repair, parent DNA duplexes align at regions of sequence similarity, and new DNA molecules form by the breakage and joining of homologous segments. Proteins involved include RAD51 recombinase. In site-specific recombination, two specific but not necessarily homologous DNA sequences are exchanged. In the immune system this process generates a diverse collection of antibody and T cell receptor genes. Proteins involved in site-specific recombination in the immune system include recombination activating genes 1 and 2 (RAG1 and RAG2). A defect in immune system site-specific recombination causes severe combined immunodeficiency disease in mice.
RNA Metabolism
Ribonucleic acid (RNA) is a linear single-stranded polymer of four nucleotides, ATP, CTP, UTP, and GTP. In most organisms, RNA is transcribed as a copy of DNA, the genetic material of the organism. In retroviruses, RNA rather than DNA serves as the genetic material. RNA copies of the genetic material encode proteins or serve various structural, catalytic, or regulatory roles in organisms. RNA is classified according to its cellular localization and function. Messenger RNAs (mRNAs) encode polypeptides. Ribosomal RNAs (rRNAs) are assembled, along with ribosomal proteins, into ribosomes, which are cytoplasmic particles that translate mRNA into polypeptides. Transfer RNAs (tRNAs) are cytosolic adaptor molecules that function in mRNA translation by recognizing both an mRNA codon and the amino acid that matches that codon. Heterogeneous nuclear RNAs (hnRNAs) include mRNA precursors and other nuclear RNAs of various sizes. Small nuclear RNAs (snRNAs) are a part of the nuclear spliceosome complex that removes intervening, non-coding sequences (introns) and rejoins exons in pre-mRNAs.
RNA Transcription
The transcription process synthesizes an RNA copy of DNA. Proteins involved include multi-subunit RNA polymerases and transcription factors IIA, IIB, IID, IIE, IIF, IIH, and IIJ. Many transcription factors incorporate DNA-binding structural motifs which comprise either α-helices or β-sheets that bind to the major groove of DNA. Four well-characterized structural motifs are helix-turn-helix, zinc finger, leucine zipper, and helix-loop-helix.
RNA Processing
Various proteins are necessary for processing of transcribed RNAs in the nucleus. Pre-mRNA processing steps include capping at the 5' end with methylguanosine, polyadenylating the 3' end, and splicing to remove introns. The spliceosomal complex is composed of five small nuclear ribonucleoprotein particles (snRNPs) designated U1, U2, U4, U5, and U6. Each snRNP contains a single species of snRNA and about ten proteins. The RNA components of some snRNPs recognize and base-pair with intron consensus sequences. The protein components mediate spliceosome assembly and the splicing reaction. Autoantibodies to snRNP proteins are found in the blood of patients with systemic lupus erythematosus (Stryer, L. (1995) Biochemistry, W.H. Freeman and Company, New York NY, p. 863).
Heterogeneous nuclear ribonucleoproteins (hnRNPs) have been identified that have roles in splicing, exporting of the mature RNAs to the cytoplasm, and mRNA translation (Biamonti, G. et al. (1998) Clin. Exp. Rheumatol. 16:317-326). Some examples of hnRNPs include the yeast proteins Hrp1p, involved in cleavage and polyadenylation at the 3' end of the RNA; Cbp80p, involved in capping the 5' end of the RNA; and Npl3p, a homolog of mammalian hnRNP A1, involved in export of mRNA from the nucleus (Shen, E.C. et al. (1998) Genes Dev. 12:679-691). HnRNPs have been shown to be important targets of the autoimmune response in rheumatic diseases (Biamonti, supra). Many snRNP proteins, hnRNP proteins, and alternative splicing factors are characterized by an RNA recognition motif (RRM). (Reviewed in Birney, E. et al. (1993) Nucleic Acids Res. 21:5803-5816.) The RRM is about 80 amino acids in length and forms four β-strands and two α-helices arranged in an α/β sandwich. The RRM contains a core RNP-1 octapeptide motif along with surrounding conserved sequences.
RNA Stability and Degradation
RNA helicases alter and regulate RNA conformation and secondary structure by using energy derived from ATP hydrolysis to destabilize and unwind RNA duplexes. The best-characterized and most ubiquitous family of RNA helicases is the DEAD-box family, so named for the conserved B-type ATP-binding motif which is diagnostic of proteins in this family. Over 40 DEAD-box helicases have been identified in organisms as diverse as bacteria, insects, yeast, amphibians, mammals, and plants. DEAD-box helicases function in diverse processes such as translation initiation, splicing, ribosome assembly, and RNA editing, transport, and stability. Some DEAD-box helicases play tissue- and stage-specific roles in spermatogenesis and embryogenesis. (Reviewed in Linder, P. et al. (1989) Nature 337:121-122.)
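As a rough illustration of how the motif that gives the DEAD-box family its name might be located in a sequence, the following Python sketch scans a protein sequence for the tetrapeptide Asp-Glu-Ala-Asp (D-E-A-D in one-letter code). The function name and the example sequence are hypothetical and were invented for this illustration; finding the tetrapeptide alone does not establish that a protein belongs to the family.

```python
def find_dead_motifs(protein_seq):
    """Return the 0-based start positions of every 'DEAD' tetrapeptide.

    Overlapping occurrences are reported, because the search resumes
    one residue after each hit rather than after the whole motif.
    """
    seq = protein_seq.upper()
    positions = []
    start = seq.find("DEAD")
    while start != -1:
        positions.append(start)
        start = seq.find("DEAD", start + 1)
    return positions


# Hypothetical sequence fragment, invented for illustration only.
example = "MKTLVLDEADRMLDMGFAAADEADKLL"
print(find_dead_motifs(example))
```

A practical screen would normally combine such a scan with the other conserved helicase motifs and with alignment-based evidence.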
Overexpression of the DEAD-box 1 protein (DDX1) may play a role in the progression of neuroblastoma (Nb) and retinoblastoma (Rb) tumors. Other DEAD-box helicases have been implicated either directly or indirectly in ultraviolet light-induced tumors, B cell lymphoma, and myeloid malignancies. (Reviewed in Godbout, R. et al. (1998) J. Biol. Chem. 273:21161-21168.)
Ribonucleases (RNases) catalyze the hydrolysis of phosphodiester bonds in RNA chains, thus cleaving the RNA. For example, RNase P is a ribonucleoprotein enzyme which cleaves the 5' end of pre-tRNAs as part of their maturation process. RNase H digests the RNA strand of an RNA/DNA hybrid. Such hybrids occur in cells invaded by retroviruses, and RNase H is an important enzyme in the retroviral replication cycle. RNase H domains are often found associated with reverse transcriptases. RNase activity in serum and cell extracts is elevated in a variety of cancers and infectious diseases (Schein, C.H. (1997) Nat. Biotechnol. 15:529-536). Regulation of RNase activity is being investigated as a means to control tumor angiogenesis, allergic reactions, viral infection and replication, and fungal infections.
Protein Translation
The eukaryotic ribosome is composed of a 60S (large) subunit and a 40S (small) subunit, which together form the 80S ribosome. In addition to the 18S, 28S, 5S, and 5.8S rRNAs, the ribosome also contains more than fifty proteins. The ribosomal proteins have a prefix which denotes the subunit to which they belong, either L (large) or S (small). Three important sites are identified on the ribosome. The aminoacyl-tRNA site (A site) is where charged tRNAs (with the exception of the initiator tRNA) bind on arrival at the ribosome. The peptidyl-tRNA site (P site) is where new peptide bonds are formed, as well as where the initiator tRNA binds. The exit site (E site) is where deacylated tRNAs bind prior to their release from the ribosome. (Translation is reviewed in Stryer, L. (1995) Biochemistry, W.H. Freeman and Company, New York NY, pp. 875-908; and Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY, pp. 119-138.)
tRNA Charging
Protein biosynthesis depends on each amino acid forming a linkage with the appropriate tRNA. The aminoacyl-tRNA synthetases are responsible for the activation and correct attachment of an amino acid to its cognate tRNA. The 20 aminoacyl-tRNA synthetase enzymes can be divided into two structural classes, Class I and Class II. Autoantibodies against aminoacyl-tRNA synthetases are generated by patients with dermatomyositis and polymyositis, and correlate strongly with complicating interstitial lung disease (ILD). These antibodies appear to be generated in response to viral infection, and coxsackie virus has been used to induce experimental viral myositis in animals.
Translation Initiation
Initiation of translation can be divided into three stages. The first stage brings an initiator transfer RNA (Met-tRNAf) together with the 40S ribosomal subunit to form the 43S preinitiation complex. The second stage binds the 43S preinitiation complex to the mRNA, followed by migration of the complex to the correct AUG initiation codon. The third stage brings the 60S ribosomal subunit to the 40S subunit to generate an 80S ribosome at the initiation codon. Regulation of translation primarily involves the first and second stages in the initiation process (Pain, V.M. (1996) Eur. J. Biochem. 236:747-771).
Several initiation factors, many of which contain multiple subunits, are involved in bringing an initiator tRNA and 40S ribosomal subunit together. eIF2, a guanine nucleotide binding protein, recruits the initiator tRNA to the 40S ribosomal subunit. Only when eIF2 is bound to GTP does it associate with the initiator tRNA. eIF2B, a guanine nucleotide exchange protein, is responsible for converting eIF2 from the GDP-bound inactive form to the GTP-bound active form. Two other factors, eIF1A and eIF3, bind and stabilize the 40S subunit by interacting with 18S ribosomal RNA and specific ribosomal structural proteins. eIF3 is also involved in association of the 40S ribosomal subunit with mRNA. The Met-tRNAf, eIF1A, eIF3, and 40S ribosomal subunit together make up the 43S preinitiation complex (Pain, supra).
Additional factors are required for binding of the 43S preinitiation complex to an mRNA molecule, and the process is regulated at several levels. eIF4F is a complex consisting of three proteins: eIF4E, eIF4A, and eIF4G. eIF4E recognizes and binds to the mRNA 5'-terminal m7GTP cap, eIF4A is a bidirectional RNA-dependent helicase, and eIF4G is a scaffolding polypeptide. eIF4G has three binding domains. The N-terminal third of eIF4G interacts with eIF4E, the central third interacts with eIF4A, and the C-terminal third interacts with eIF3 bound to the 43S preinitiation complex. Thus, eIF4G acts as a bridge between the 40S ribosomal subunit and the mRNA (Hentze, M.W. (1997) Science 275:500-501).
The ability of eIF4F to initiate binding of the 43S preinitiation complex is regulated by structural features of the mRNA. The mRNA molecule has an untranslated region (UTR) between the 5' cap and the AUG start codon. In some mRNAs this region forms secondary structures that impede binding of the 43S preinitiation complex. The helicase activity of eIF4A is thought to function in removing this secondary structure to facilitate binding of the 43S preinitiation complex (Pain, supra).
Translation Elongation
Elongation is the process whereby additional amino acids are joined to the initiator methionine to form the complete polypeptide chain. The elongation factors EF1α, EF1βγ, and EF2 are involved in elongating the polypeptide chain following initiation. EF1α is a GTP-binding protein. In its GTP-bound form, EF1α brings an aminoacyl-tRNA to the ribosome's A site. The amino acid attached to the newly arrived aminoacyl-tRNA forms a peptide bond with the initiator methionine. The GTP on EF1α is hydrolyzed to GDP, and EF1α-GDP dissociates from the ribosome. EF1βγ binds EF1α-GDP and induces the dissociation of GDP from EF1α, allowing EF1α to bind GTP and a new cycle to begin.
As subsequent aminoacyl-tRNAs are brought to the ribosome, EF-G, another GTP-binding protein, catalyzes the translocation of tRNAs from the A site to the P site and finally to the E site of the ribosome. This allows the processivity of translation.
Translation Termination
The release factor eRF carries out termination of translation. eRF recognizes stop codons in the mRNA, leading to the release of the polypeptide chain from the ribosome.
Post-Translational Pathways
Proteins may be modified after translation by the addition of phosphate, sugar, prenyl, fatty acid, and other chemical groups. These modifications are often required for proper protein activity. Enzymes involved in post-translational modification include kinases, phosphatases, glycosyltransferases, and prenyltransferases. The conformation of proteins may also be modified after translation by the introduction and rearrangement of disulfide bonds (rearrangement catalyzed by protein disulfide isomerase), the isomerization of proline sidechains by prolyl isomerase, and by interactions with molecular chaperone proteins.
Proteins may also be cleaved by proteases. Such cleavage may result in activation, inactivation, or complete degradation of the protein. Proteases include serine proteases, cysteine proteases, aspartic proteases, and metalloproteases. Signal peptidase in the endoplasmic reticulum (ER) lumen cleaves the signal peptide from membrane or secretory proteins that are imported into the ER. Ubiquitin proteases are associated with the ubiquitin conjugation system (UCS), a major pathway for the degradation of cellular proteins in eukaryotic cells and some bacteria. The UCS mediates the elimination of abnormal proteins and regulates the half-lives of important regulatory proteins that control cellular processes such as gene transcription and cell cycle progression. In the UCS pathway, proteins targeted for degradation are conjugated to ubiquitin, a small heat-stable protein. Proteins involved in the UCS include ubiquitin-activating enzyme, ubiquitin-conjugating enzymes, ubiquitin ligases, and ubiquitin C-terminal hydrolases. The ubiquitinated protein is then recognized and degraded by the proteasome, a large, multisubunit proteolytic enzyme complex, and ubiquitin is released for reutilization by ubiquitin protease.
Lipid Metabolism
Lipids are water-insoluble, oily or greasy substances that are soluble in nonpolar solvents such as chloroform or ether. Neutral fats (triacylglycerols) serve as major fuels and energy stores. Polar lipids, such as phospholipids, sphingolipids, glycolipids, and cholesterol, are key structural components of cell membranes. Lipid metabolism is involved in human diseases and disorders. In the arterial disease atherosclerosis, fatty lesions form on the inside of the arterial wall. These lesions promote the loss of arterial flexibility and the formation of blood clots (Guyton, A.C. Textbook of Medical Physiology (1991) W.B. Saunders Company, Philadelphia PA, pp. 760-763). In Tay-Sachs disease, the GM2 ganglioside (a sphingolipid) accumulates in lysosomes of the central nervous system due to a lack of the enzyme N-acetylhexosaminidase. Patients suffer nervous system degeneration leading to early death (Fauci, A.S. et al. (1998) Harrison's Principles of Internal Medicine, McGraw-Hill, New York NY, p. 2171). The Niemann-Pick diseases are caused by defects in lipid metabolism. Niemann-Pick diseases types A and B are caused by accumulation of sphingomyelin (a sphingolipid) and other lipids in the central nervous system due to a defect in the enzyme sphingomyelinase, leading to neurodegeneration and lung disease. Niemann-Pick disease type C results from a defect in cholesterol transport, leading to the accumulation of sphingomyelin and cholesterol in lysosomes and a secondary reduction in sphingomyelinase activity. Neurological symptoms, such as grand mal seizures, ataxia, and loss of previously learned speech, manifest 1-2 years after birth. A mutation in the NPC protein, which contains a putative cholesterol-sensing domain, was found in a mouse model of Niemann-Pick disease type C (Fauci, supra, p. 2175; Loftus, S.K. et al. (1997) Science 277:232-235). (Lipid metabolism is reviewed in Stryer, L. (1995) Biochemistry, W.H. Freeman and Company, New York NY; Lehninger, A. (1982) Principles of Biochemistry, Worth Publishers, Inc., New York NY; and ExPASy "Biochemical Pathways" index of Boehringer Mannheim World Wide Web site.)
Fatty Acid Synthesis
Fatty acids are long-chain organic acids with a single carboxyl group and a long non-polar hydrocarbon tail. Long-chain fatty acids are essential components of glycolipids, phospholipids, and cholesterol, which are building blocks for biological membranes, and of triglycerides, which are biological fuel molecules. Long-chain fatty acids are also substrates for eicosanoid production, and are important in the functional modification of certain complex carbohydrates and proteins. 16-carbon and 18-carbon fatty acids are the most common. Fatty acid synthesis occurs in the cytoplasm. In the first step, acetyl-Coenzyme A (CoA) carboxylase (ACC) synthesizes malonyl-CoA from acetyl-CoA and bicarbonate. The enzymes which catalyze the remaining reactions are covalently linked into a single polypeptide chain, referred to as the multifunctional enzyme fatty acid synthase (FAS). FAS catalyzes the synthesis of palmitate from acetyl-CoA and malonyl-CoA. FAS contains acetyl transferase, malonyl transferase, β-ketoacyl synthase, acyl carrier protein, β-ketoacyl reductase, dehydratase, enoyl reductase, and thioesterase activities. The final product of the FAS reaction is the 16-carbon fatty acid palmitate. Further elongation, as well as unsaturation, of palmitate by accessory enzymes of the ER produces the variety of long-chain fatty acids required by the individual cell. These enzymes include an NADH-cytochrome b5 reductase, cytochrome b5, and a desaturase.
Phospholipid and Triacylglycerol Synthesis
Triacylglycerols, also known as triglycerides and neutral fats, are major energy stores in animals. Triacylglycerols are esters of glycerol with three fatty acid chains. Glycerol-3-phosphate is produced from dihydroxyacetone phosphate by the enzyme glycerol phosphate dehydrogenase or from glycerol by glycerol kinase. Fatty acyl-CoAs are produced from fatty acids by fatty acyl-CoA synthetases. Glycerol-3-phosphate is acylated with two fatty acyl-CoAs by the enzyme glycerol phosphate acyltransferase to give phosphatidate. Phosphatidate phosphatase converts phosphatidate to diacylglycerol, which is subsequently acylated to a triacylglycerol by the enzyme diglyceride acyltransferase. Phosphatidate phosphatase and diglyceride acyltransferase form a triacylglycerol synthetase complex bound to the ER membrane.
A major class of phospholipids is the phosphoglycerides, which are composed of a glycerol backbone, two fatty acid chains, and a phosphorylated alcohol. Phosphoglycerides are components of cell membranes. Principal phosphoglycerides are phosphatidyl choline, phosphatidyl ethanolamine, phosphatidyl serine, phosphatidyl inositol, and diphosphatidyl glycerol. Many enzymes involved in phosphoglyceride synthesis are associated with membranes (Meyers, R.A. (1995) Molecular Biology and Biotechnology, VCH Publishers Inc., New York NY, pp. 494-501). Phosphatidate is converted to CDP-diacylglycerol by the enzyme phosphatidate cytidylyltransferase (ExPASy ENZYME EC 2.7.7.41). Transfer of the diacylglycerol group from CDP-diacylglycerol to serine to yield phosphatidyl serine, or to inositol to yield phosphatidyl inositol, is catalyzed by the enzymes CDP-diacylglycerol-serine O-phosphatidyltransferase and CDP-diacylglycerol-inositol 3-phosphatidyltransferase, respectively (ExPASy ENZYME EC 2.7.8.8; ExPASy ENZYME EC 2.7.8.11). The enzyme phosphatidyl serine decarboxylase catalyzes the conversion of phosphatidyl serine to phosphatidyl ethanolamine, using a pyruvate cofactor (Voelker, D.R. (1997) Biochim. Biophys. Acta 1348:236-244). Phosphatidyl choline is formed using diet-derived choline by the reaction of CDP-choline with 1,2-diacylglycerol, catalyzed by diacylglycerol cholinephosphotransferase (ExPASy ENZYME EC 2.7.8.2).
Sterol, Steroid, and Isoprenoid Metabolism
Cholesterol, composed of four fused hydrocarbon rings with an alcohol at one end, moderates the fluidity of membranes in which it is incorporated. In addition, cholesterol is used in the synthesis of steroid hormones such as cortisol, progesterone, estrogen, and testosterone. Bile salts derived from cholesterol facilitate the digestion of lipids. Cholesterol in the skin forms a barrier that prevents excess water evaporation from the body. Farnesyl and geranylgeranyl groups, which are derived from cholesterol biosynthesis intermediates, are post-translationally added to signal transduction proteins such as ras and protein-targeting proteins such as rab. These modifications are important for the activities of these proteins (Guyton, supra; Stryer, supra, pp. 279-280, 691-702, 934).
Mammals obtain cholesterol from both de novo biosynthesis and the diet. The liver is the major site of cholesterol biosynthesis in mammals. Two acetyl-CoA molecules initially condense to form acetoacetyl-CoA, catalyzed by a thiolase. Acetoacetyl-CoA condenses with a third acetyl-CoA to form hydroxymethylglutaryl-CoA (HMG-CoA), catalyzed by HMG-CoA synthase. Conversion of HMG-CoA to cholesterol is accomplished via a series of enzymatic steps known as the mevalonate pathway. The rate-limiting step is the conversion of HMG-CoA to mevalonate by HMG-CoA reductase. The drug lovastatin, a potent inhibitor of HMG-CoA reductase, is given to patients to reduce their serum cholesterol levels. Other mevalonate pathway enzymes include mevalonate kinase, phosphomevalonate kinase, diphosphomevalonate decarboxylase, isopentenyldiphosphate isomerase, dimethylallyl transferase, geranyl transferase, farnesyl-diphosphate farnesyltransferase, squalene monooxygenase, lanosterol synthase, lathosterol oxidase, and 7-dehydrocholesterol reductase.
Cholesterol is used in the synthesis of steroid hormones such as cortisol, progesterone, aldosterone, estrogen, and testosterone. First, cholesterol is converted to pregnenolone by cholesterol monooxygenases. The other steroid hormones are synthesized from pregnenolone by a series of enzyme-catalyzed reactions including oxidations, isomerizations, hydroxylations, reductions, and demethylations. Examples of these enzymes include steroid Δ-isomerase, 3β-hydroxy-Δ5-steroid dehydrogenase, steroid 21-monooxygenase, steroid 19-hydroxylase, and 3β-hydroxysteroid dehydrogenase. Cholesterol is also the precursor to vitamin D.
Numerous compounds contain 5-carbon isoprene units derived from the mevalonate pathway intermediate isopentenyl pyrophosphate. Isoprenoid groups are found in vitamin K, ubiquinone, retinal, dolichol phosphate (a carrier of oligosaccharides needed for N-linked glycosylation), and farnesyl and geranylgeranyl groups that modify proteins. Enzymes involved include farnesyl transferase, polyprenyl transferases, dolichyl phosphatase, and dolichyl kinase.
Sphingolipid Metabolism
Sphingolipids are an important class of membrane lipids that contain sphingosine, a long-chain amino alcohol. They are composed of one long-chain fatty acid, one polar head alcohol, and sphingosine or a sphingosine derivative. The three classes of sphingolipids are sphingomyelins, cerebrosides, and gangliosides. Sphingomyelins, which contain phosphocholine or phosphoethanolamine as their head group, are abundant in the myelin sheath surrounding nerve cells. Galactocerebrosides, which contain a glucose or galactose head group, are characteristic of the brain. Other cerebrosides are found in nonneural tissues. Gangliosides, whose head groups contain multiple sugar units, are abundant in the brain, but are also found in nonneural tissues.
Sphingolipids are built on a sphingosine backbone. Sphingosine is acylated to ceramide by the enzyme sphingosine acetyltransferase. Ceramide and phosphatidyl choline are converted to sphingomyelin by the enzyme ceramide choline phosphotransferase. Cerebrosides are synthesized by the linkage of glucose or galactose to ceramide by a transferase. Sequential addition of sugar residues to ceramide by transferase enzymes yields gangliosides.
Eicosanoid Metabolism
Eicosanoids, including prostaglandins, prostacyclin, thromboxanes, and leukotrienes, are 20-carbon molecules derived from fatty acids. Eicosanoids are signaling molecules which have roles in pain, fever, and inflammation. The precursor of all eicosanoids is arachidonate, which is generated from phospholipids by phospholipase A2 and from diacylglycerols by diacylglycerol lipase.
Leukotrienes are produced from arachidonate by the action of lipoxygenases. Prostaglandin synthase, reductases, and isomerases are responsible for the synthesis of the prostaglandins. Prostaglandins have roles in inflammation, blood flow, ion transport, synaptic transmission, and sleep. Prostacyclin and the thromboxanes are derived from a precursor prostaglandin by the action of prostacyclin synthase and thromboxane synthases, respectively.
Ketone Body Metabolism
Pairs of acetyl-CoA molecules derived from fatty acid oxidation in the liver can condense to form acetoacetyl-CoA, which subsequently forms acetoacetate, D-3-hydroxybutyrate, and acetone. These three products are known as ketone bodies. Enzymes involved in ketone body metabolism include HMG-CoA synthetase, HMG-CoA cleavage enzyme, D-3-hydroxybutyrate dehydrogenase, acetoacetate decarboxylase, and 3-ketoacyl-CoA transferase. Ketone bodies are a normal fuel supply of the heart and renal cortex. Acetoacetate produced by the liver is transported to cells where the acetoacetate is converted back to acetyl-CoA and enters the citric acid cycle. In times of starvation, ketone bodies produced from stored triacylglycerols become an important fuel source, especially for the brain. Abnormally high levels of ketone bodies are observed in diabetics. Diabetic coma can result if ketone body levels become too great.
Lipid Mobilization
Within cells, fatty acids are transported by cytoplasmic fatty acid binding proteins (Online Mendelian Inheritance in Man (OMIM) *134650 Fatty Acid-Binding Protein 1, Liver; FABP1). Diazepam binding inhibitor (DBI), also known as endozepine and acyl CoA-binding protein, is an endogenous γ-aminobutyric acid (GABA) receptor ligand which is thought to down-regulate the effects of GABA. DBI binds medium- and long-chain acyl-CoA esters with very high affinity and may function as an intracellular carrier of acyl-CoA esters (OMIM *125950 Diazepam Binding Inhibitor; DBI; PROSITE PDOC00686 Acyl-CoA-binding protein signature).
Fat stored in liver and adipose triglycerides may be released by hydrolysis and transported in the blood. Free fatty acids are transported in the blood by albumin. Triacylglycerols and cholesterol esters in the blood are transported in lipoprotein particles. The particles consist of a core of hydrophobic lipids surrounded by a shell of polar lipids and apolipoproteins. The protein components serve in the solubilization of hydrophobic lipids and also contain cell-targeting signals. Lipoproteins include chylomicrons, chylomicron remnants, very-low-density lipoproteins (VLDL), intermediate-density lipoproteins (IDL), low-density lipoproteins (LDL), and high-density lipoproteins (HDL). There is a strong inverse correlation between the levels of plasma HDL and risk of premature coronary heart disease.
Triacylglycerols in chylomicrons and VLDL are hydrolyzed by lipoprotein lipases that line blood vessels in muscle and other tissues that use fatty acids. Cell surface LDL receptors bind LDL particles which are then internalized by endocytosis. Absence of the LDL receptor, the cause of the disease familial hypercholesterolemia, leads to increased plasma cholesterol levels and ultimately to atherosclerosis. Plasma cholesteryl ester transfer protein mediates the transfer of cholesteryl esters from HDL to apolipoprotein B-containing lipoproteins. Cholesteryl ester transfer protein is important in the reverse cholesterol transport system and may play a role in atherosclerosis (Yamashita, S. et al. (1997) Curr. Opin. Lipidol. 8:101-110). Macrophage scavenger receptors, which bind and internalize modified lipoproteins, play a role in lipid transport and may contribute to atherosclerosis (Greaves, D.R. et al. (1998) Curr. Opin. Lipidol. 9:425-432).
Proteins involved in cholesterol uptake and biosynthesis are tightly regulated in response to cellular cholesterol levels. The sterol regulatory element binding protein (SREBP) is a sterol-responsive transcription factor. Under normal cholesterol conditions, SREBP resides in the ER membrane. When cholesterol levels are low, a regulated cleavage of SREBP occurs which releases the extracellular domain of the protein. This cleaved domain is then transported to the nucleus where it activates the transcription of the LDL receptor gene, and genes encoding enzymes of cholesterol synthesis, by binding the sterol regulatory element (SRE) upstream of the genes (Yang, J. et al. (1995) J. Biol. Chem. 270:12152-12161). Regulation of cholesterol uptake and biosynthesis also occurs via the oxysterol-binding protein (OSBP). OSBP is a high-affinity intracellular receptor for a variety of oxysterols that down-regulate cholesterol synthesis and stimulate cholesterol esterification (Lagace, T.A. et al. (1997) Biochem. J. 326:205-213).
Beta-oxidation
Mitochondrial and peroxisomal beta-oxidation enzymes degrade saturated and unsaturated fatty acids by sequential removal of two-carbon units from CoA-activated fatty acids. The main beta-oxidation pathway degrades both saturated and unsaturated fatty acids while the auxiliary pathway performs additional steps required for the degradation of unsaturated fatty acids.
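The sequential removal of two-carbon units can be made concrete with a small calculation. The Python sketch below assumes the simplest case only, a saturated fatty acid with an even number of carbons that is degraded completely to acetyl-CoA; the function name is hypothetical, and the auxiliary steps needed for unsaturated fatty acids are not modeled.

```python
def beta_oxidation_products(num_carbons):
    """Return (cycles, acetyl_coa_units) for complete beta-oxidation.

    Assumes a saturated, even-chain, CoA-activated fatty acid. Each cycle
    removes one two-carbon unit; the final cycle splits a four-carbon
    chain into two acetyl-CoA units, so cycles = n/2 - 1.
    """
    if num_carbons < 4 or num_carbons % 2 != 0:
        raise ValueError("expected an even carbon count of at least 4")
    acetyl_coa_units = num_carbons // 2
    cycles = acetyl_coa_units - 1
    return cycles, acetyl_coa_units


# Palmitate has 16 carbons.
print(beta_oxidation_products(16))
```

For the 16-carbon fatty acid palmitate this gives seven cycles and eight acetyl-CoA units.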
The pathways of mitochondrial and peroxisomal beta-oxidation use similar enzymes, but have different substrate specificities and functions. Mitochondria oxidize short-, medium-, and long-chain fatty acids to produce energy for cells. Mitochondrial beta-oxidation is a major energy source for cardiac and skeletal muscle. In liver, it provides ketone bodies to the peripheral circulation when glucose levels are low, as in starvation, endurance exercise, and diabetes (Eaton, S. et al. (1996) Biochem. J. 320:345-357). Peroxisomes oxidize medium-, long-, and very-long-chain fatty acids, dicarboxylic fatty acids, branched fatty acids, prostaglandins, xenobiotics, and bile acid intermediates. The chief roles of peroxisomal beta-oxidation are to shorten toxic lipophilic carboxylic acids to facilitate their excretion and to shorten very-long-chain fatty acids prior to mitochondrial beta-oxidation (Mannaerts, G.P. and P.P. van Veldhoven (1993) Biochimie 75:147-158).
Enzymes involved in beta-oxidation include acyl CoA synthetase, carnitine acyltransferase, acyl CoA dehydrogenases, enoyl CoA hydratases, L-3-hydroxyacyl CoA dehydrogenase, β-ketothiolase, 2,4-dienoyl CoA reductase, and isomerase.
Lipid Cleavage and Degradation
Triglycerides are hydrolyzed to fatty acids and glycerol by lipases. Lysophospholipases (LPLs) are widely distributed enzymes that metabolize intracellular lipids, and occur in numerous isoforms. Small isoforms, approximately 15-30 kD, function as hydrolases; large isoforms, those exceeding 60 kD, function both as hydrolases and transacylases. A particular substrate for LPLs, lysophosphatidylcholine, causes lysis of cell membranes when it is formed or imported into a cell. LPLs are regulated by lipid factors including acylcarnitine, arachidonic acid, and phosphatidic acid. These lipid factors are signaling molecules important in numerous pathways, including the inflammatory response. (Anderson, R. et al. (1994) Toxicol. Appl. Pharmacol. 125:176-183; Selle, H. et al. (1993) Eur. J. Biochem. 212:411-416.)
The secretory phospholipase A2 (PLA2) superfamily comprises a number of heterogeneous enzymes whose common feature is to hydrolyze the sn-2 fatty acid acyl ester bond of phosphoglycerides. Hydrolysis of the glycerophospholipids releases free fatty acids and lysophospholipids. PLA2 activity generates precursors for the biosynthesis of biologically active lipids, hydroxy fatty acids, and platelet-activating factor. PLA2 hydrolysis of the sn-2 ester bond in phospholipids generates free fatty acids, such as arachidonic acid, and lysophospholipids.
Carbon and Carbohydrate Metabolism
Carbohydrates, including sugars or saccharides, starch, and cellulose, are aldehyde or ketone compounds with multiple hydroxyl groups. The importance of carbohydrate metabolism is demonstrated by the sensitive regulatory system in place for maintenance of blood glucose levels. Two pancreatic hormones, insulin and glucagon, promote increased glucose uptake and storage by cells, and increased glucose release from cells, respectively. Carbohydrates have three important roles in mammalian cells. First, carbohydrates are used as energy stores, fuels, and metabolic intermediates. Carbohydrates are broken down to form energy in glycolysis and are stored as glycogen for later use. Second, the sugars deoxyribose and ribose form part of the structural support of DNA and RNA, respectively. Third, carbohydrate modifications are added to secreted and membrane proteins and lipids as they traverse the secretory pathway. Cell surface carbohydrate-containing macromolecules, including glycoproteins, glycolipids, and transmembrane proteoglycans, mediate adhesion with other cells and with components of the extracellular matrix. The extracellular matrix is composed of diverse glycoproteins, glycosaminoglycans (GAGs), and carbohydrate-binding proteins which are secreted from the cell and assembled into an organized meshwork in close association with the cell surface. The interaction of the cell with the surrounding matrix profoundly influences cell shape, strength, flexibility, motility, and adhesion. These dynamic properties are intimately associated with signal transduction pathways controlling cell proliferation and differentiation, tissue construction, and embryonic development.
Carbohydrate metabolism is altered in several disorders including diabetes mellitus, hyperglycemia, hypoglycemia, galactosemia, galactokinase deficiency, and UDP-galactose-4-epimerase deficiency (Fauci, A.S. et al. (1998) Harrison's Principles of Internal Medicine, McGraw-Hill, New York NY, pp. 2208-2209). Altered carbohydrate metabolism is associated with cancer. Reduced GAG and proteoglycan expression is associated with human lung carcinomas (Nackaerts, K. et al. (1997) Int. J. Cancer 74:335-345). The carbohydrate determinants sialyl Lewis A and sialyl Lewis X are frequently expressed on human cancer cells (Kannagi, R. (1997) Glycoconj. J. 14:577-584). Alterations of the N-linked carbohydrate core structure of cell surface glycoproteins are linked to colon and pancreatic cancers (Schwarz, R.E. et al. (1996) Cancer Lett. 107:285-291). Reduced expression of the Sda blood group carbohydrate structure in cell surface glycolipids and glycoproteins is observed in gastrointestinal cancer (Dohi, T. et al. (1996) Int. J. Cancer 67:626-663). (Carbon and carbohydrate metabolism is reviewed in Stryer, L. (1995) Biochemistry, W.H. Freeman and Company, New York NY; Lehninger, A.L. (1982) Principles of Biochemistry, Worth Publishers Inc., New York NY; and Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY.)
Glycolysis
Enzymes of the glycolytic pathway convert the sugar glucose to pyruvate while simultaneously producing ATP. The pathway also provides building blocks for the synthesis of cellular components such as long-chain fatty acids. After glycolysis, pyruvate is converted to acetyl-Coenzyme A, which, in aerobic organisms, enters the citric acid cycle. Glycolytic enzymes include hexokinase, phosphoglucose isomerase, phosphofructokinase, aldolase, triose phosphate isomerase, glyceraldehyde 3-phosphate dehydrogenase, phosphoglycerate kinase, phosphoglyceromutase, enolase, and pyruvate kinase.
Of these, phosphofructokinase, hexokinase, and pyruvate kinase are important in regulating the rate of glycolysis.
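As a hedged illustration of the enzyme list above, the following Python sketch records the ten glycolytic enzymes in the order in which the text names them and flags the three identified as rate-regulating. The constant and function names are hypothetical and serve only to make the relationship explicit; they are not part of the disclosure.

```python
# Illustrative sketch only; all names here are hypothetical.
# Enzymes appear in the order in which the text lists them.
GLYCOLYTIC_ENZYMES = [
    "hexokinase",
    "phosphoglucose isomerase",
    "phosphofructokinase",
    "aldolase",
    "triose phosphate isomerase",
    "glyceraldehyde 3-phosphate dehydrogenase",
    "phosphoglycerate kinase",
    "phosphoglyceromutase",
    "enolase",
    "pyruvate kinase",
]

# Enzymes the text names as important in regulating the rate of glycolysis.
RATE_REGULATING = {"hexokinase", "phosphofructokinase", "pyruvate kinase"}


def regulated_steps(enzymes=GLYCOLYTIC_ENZYMES, regulators=RATE_REGULATING):
    """Return (position, enzyme) pairs for the rate-regulating enzymes."""
    return [
        (position, name)
        for position, name in enumerate(enzymes, start=1)
        if name in regulators
    ]


if __name__ == "__main__":
    for position, name in regulated_steps():
        print(f"position {position}: {name}")
```

In this listing the rate-regulating enzymes fall at the first, third, and last positions.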
Gluconeogenesis
Gluconeogenesis is the synthesis of glucose from noncarbohydrate precursors such as lactate and amino acids. The pathway, which functions mainly in times of starvation and intense exercise, occurs mostly in the liver and kidney. Responsible enzymes include pyruvate carboxylase, phosphoenolpyruvate carboxykinase, fructose 1,6-bisphosphatase, and glucose-6-phosphatase.
Pentose Phosphate Pathway
Pentose phosphate pathway enzymes are responsible for generating the reducing agent NADPH, while at the same time oxidizing glucose-6-phosphate to ribose-5-phosphate. Ribose-5-phosphate and its derivatives become part of important biological molecules such as ATP, Coenzyme A, NAD+, FAD, RNA, and DNA. The pentose phosphate pathway has both oxidative and non-oxidative branches. The oxidative branch steps, which are catalyzed by the enzymes glucose-6-phosphate dehydrogenase, lactonase, and 6-phosphogluconate dehydrogenase, convert glucose-6-phosphate and NADP+ to ribulose-5-phosphate and NADPH. The non-oxidative branch steps, which are catalyzed by the enzymes phosphopentose isomerase, phosphopentose epimerase, transketolase, and transaldolase, allow the interconversion of three-, four-, five-, six-, and seven-carbon sugars.
Glucuronate Metabolism
Glucuronate is a monosaccharide which, in the form of D-glucuronic acid, is found in the GAGs chondroitin and dermatan. D-glucuronic acid is also important in the detoxification and excretion of foreign organic compounds such as phenol. Enzymes involved in glucuronate metabolism include UDP-glucose dehydrogenase and glucuronate reductase.
Disaccharide Metabolism
Disaccharides must be hydrolyzed to monosaccharides to be digested. Lactose, a disaccharide found in milk, is hydrolyzed to galactose and glucose by the enzyme lactase. Maltose is derived from plant starch and is hydrolyzed to glucose by the enzyme maltase. Sucrose is derived from plants and is hydrolyzed to glucose and fructose by the enzyme sucrase. Trehalose, a disaccharide found mainly in insects and mushrooms, is hydrolyzed to glucose by the enzyme trehalase (OMIM *275360 Trehalase; Ruf, J. et al. (1990) J. Biol. Chem. 265:15034-15039). Lactase, maltase, sucrase, and trehalase are bound to mucosal cells lining the small intestine, where they participate in the digestion of dietary disaccharides. The enzyme lactose synthetase, composed of the catalytic subunit galactosyltransferase and the modifier subunit α-lactalbumin, converts UDP-galactose and glucose to lactose in the mammary glands.
Glycogen, Starch, and Chitin Metabolism
Glycogen is the storage form of carbohydrates in mammals. Mobilization of glycogen maintains glucose levels between meals and during muscular activity. Glycogen is stored mainly in the liver and in skeletal muscle in the form of cytoplasmic granules. These granules contain enzymes that catalyze the synthesis and degradation of glycogen, as well as enzymes that regulate these processes. Enzymes that catalyze the degradation of glycogen include glycogen phosphorylase, a transferase, α-1,6-glucosidase, and phosphoglucomutase. Enzymes that catalyze the synthesis of glycogen include UDP-glucose pyrophosphorylase, glycogen synthetase, a branching enzyme, and nucleoside diphosphokinase. The enzymes of glycogen synthesis and degradation are tightly regulated by the hormones insulin, glucagon, and epinephrine. Starch, a plant-derived polysaccharide, is hydrolyzed to maltose, maltotriose, and α-dextrin by α-amylase, an enzyme secreted by the salivary glands and pancreas. Chitin is a polysaccharide found in insects and Crustacea. A chitotriosidase is secreted by macrophages and may play a role in the degradation of chitin-containing pathogens (Boot, R.G. et al. (1995) J. Biol. Chem. 270:26252-26256).
Peptidoglycans and Glycosaminoglycans
Glycosaminoglycans (GAGs) are anionic linear unbranched polysaccharides composed of repetitive disaccharide units. These repetitive units contain a derivative of an amino sugar, either glucosamine or galactosamine. GAGs exist free or as part of proteoglycans, large molecules composed of a core protein attached to one or more GAGs. GAGs are found on the cell surface, inside cells, and in the extracellular matrix. Changes in GAG levels are associated with several autoimmune diseases including autoimmune thyroid disease, autoimmune diabetes mellitus, and systemic lupus erythematosus (Hansen, C. et al. (1996) Clin. Exp. Rheum. 14 (Suppl. 15):S59-S67). GAGs include chondroitin sulfate, keratan sulfate, heparin, heparan sulfate, dermatan sulfate, and hyaluronan.
The GAG hyaluronan (HA) is found in the extracellular matrix of many cells, especially in soft connective tissues, and is abundant in synovial fluid (Pitsillides, A.A. et al. (1993) Int. J. Exp. Pathol. 74:27-34). HA seems to play important roles in cell regulation, development, and differentiation (Laurent, T.C. and J.R. Fraser (1992) FASEB J. 6:2397-2404). Hyaluronidase is an enzyme that degrades HA to oligosaccharides. Hyaluronidases may function in cell adhesion, infection, angiogenesis, signal transduction, reproduction, cancer, and inflammation.
Proteoglycans, also known as peptidoglycans, are found in the extracellular matrix of connective tissues such as cartilage and are essential for distributing the load in weight-bearing joints. Cell-surface-attached proteoglycans anchor cells to the extracellular matrix. Both extracellular and cell-surface proteoglycans bind growth factors, facilitating their binding to cell-surface receptors and subsequent triggering of signal transduction pathways.
Amino Acid and Nitrogen Metabolism
NH4+ is assimilated into amino acids by the actions of two enzymes, glutamate dehydrogenase and glutamine synthetase. The carbon skeletons of amino acids come from the intermediates of glycolysis, the pentose phosphate pathway, or the citric acid cycle. Of the twenty amino acids used in proteins, humans can synthesize only eleven (nonessential amino acids). The remaining nine must come from the diet (essential amino acids). Enzymes involved in nonessential amino acid biosynthesis include glutamate kinase dehydrogenase, pyrroline carboxylate reductase, asparagine synthetase, phenylalanine oxygenase, methionine adenosyltransferase, adenosylhomocysteinase, cystathionine β-synthase, cystathionine γ-lyase, phosphoglycerate dehydrogenase, phosphoserine transaminase, phosphoserine phosphatase, serine hydroxymethyltransferase, and glycine synthase.
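The essential/nonessential split must partition the twenty amino acids used in proteins, so a set of nine essential amino acids leaves eleven nonessential ones. The Python sketch below checks that arithmetic. The membership lists are the standard textbook classification for adult humans and are an assumption introduced here for illustration; the text above does not enumerate them, and the function name is hypothetical.

```python
# Illustrative sketch; the membership lists are the standard textbook
# classification for adult humans (an assumption, not stated in the text).
ESSENTIAL = {
    "histidine", "isoleucine", "leucine", "lysine", "methionine",
    "phenylalanine", "threonine", "tryptophan", "valine",
}
NONESSENTIAL = {
    "alanine", "arginine", "asparagine", "aspartate", "cysteine",
    "glutamate", "glutamine", "glycine", "proline", "serine", "tyrosine",
}


def is_essential(amino_acid: str) -> bool:
    """Return True if the named amino acid must come from the diet."""
    name = amino_acid.strip().lower()
    if name not in ESSENTIAL and name not in NONESSENTIAL:
        raise ValueError(f"not one of the twenty protein amino acids: {amino_acid!r}")
    return name in ESSENTIAL


if __name__ == "__main__":
    # The two classes are disjoint and together cover all twenty amino acids.
    total = len(ESSENTIAL | NONESSENTIAL)
    print(len(ESSENTIAL), "essential +", len(NONESSENTIAL), "nonessential =", total)
```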
Metabolism of amino acids takes place almost entirely in the liver, where the amino group is removed by aminotransferases (transaminases), for example, alanine aminotransferase. The amino group is transferred to α-ketoglutarate to form glutamate. Glutamate dehydrogenase converts glutamate to NH4+ and α-ketoglutarate. NH4+ is converted to urea by the urea cycle, which is catalyzed by the enzymes arginase, ornithine transcarbamoylase, argininosuccinate synthetase, and argininosuccinase. Carbamoyl phosphate synthetase is also involved in urea formation. Enzymes involved in the metabolism of the carbon skeleton of amino acids include serine dehydratase, asparaginase, glutaminase, propionyl CoA carboxylase, methylmalonyl CoA mutase, branched-chain α-keto acid dehydrogenase complex, isovaleryl CoA dehydrogenase, β-methylcrotonyl CoA carboxylase, phenylalanine hydroxylase, p-hydroxyphenylpyruvate hydroxylase, and homogentisate oxidase. Polyamines, which include spermidine, putrescine, and spermine, bind tightly to nucleic acids and are abundant in rapidly proliferating cells. Enzymes involved in polyamine synthesis include ornithine decarboxylase.
Diseases involving amino acid and nitrogen metabolism include hyperammonemia, carbamoyl phosphate synthetase deficiency, urea cycle enzyme deficiencies, methylmalonic aciduria, maple syrup urine disease, alcaptonuria, and phenylketonuria.
Energy Metabolism
Cells derive energy from metabolism of ingested compounds that may be roughly categorized as carbohydrates, fats, or proteins. Energy is also stored in polymers such as triglycerides (fats) and glycogen (carbohydrates). Metabolism proceeds along separate reaction pathways connected by key intermediates such as acetyl coenzyme A (acetyl-CoA). Metabolic pathways feature anaerobic and aerobic degradation, coupled with energy-requiring reactions such as phosphorylation of adenosine diphosphate (ADP) to the triphosphate (ATP) or analogous phosphorylations of guanosine (GDP/GTP), uridine (UDP/UTP), or cytidine (CDP/CTP). Subsequent dephosphorylation of the triphosphate drives reactions needed for cell maintenance, growth, and proliferation.
Digestive enzymes convert carbohydrates and sugars to glucose; fructose and galactose are converted in the liver to glucose. Enzymes involved in these conversions include galactose-1-phosphate uridyl transferase and UDP-galactose-4-epimerase. In the cytoplasm, glycolysis converts glucose to pyruvate in a series of reactions coupled to ATP synthesis.
Pyruvate is transported into the mitochondria and converted to acetyl-CoA for oxidation via the citric acid cycle, involving pyruvate dehydrogenase components, dihydrolipoyl transacetylase, and dihydrolipoyl dehydrogenase. Enzymes involved in the citric acid cycle include: citrate synthetase, aconitases, isocitrate dehydrogenase, alpha-ketoglutarate dehydrogenase complex including transsuccinylases, succinyl CoA synthetase, succinate dehydrogenase, fumarases, and malate dehydrogenase. Acetyl CoA is oxidized to CO2 with concomitant formation of NADH, FADH2, and GTP. In oxidative phosphorylation, the transport of electrons from NADH and FADH2 to oxygen by dehydrogenases is coupled to the synthesis of ATP from ADP and Pi by the F0F1 ATPase complex in the mitochondrial inner membrane. Enzyme complexes responsible for electron transport and ATP synthesis include the F0F1 ATPase complex, ubiquinone(CoQ)-cytochrome c reductase, ubiquinone reductase, cytochrome b, cytochrome c1, FeS protein, and cytochrome c oxidase.
Triglycerides are hydrolyzed to fatty acids and glycerol by lipases. Glycerol is then phosphorylated to glycerol-3-phosphate by glycerol kinase and glycerol phosphate dehydrogenase, and degraded by glycolysis. Fatty acids are transported into the mitochondria as fatty acyl-carnitine esters and undergo oxidative degradation.
In addition to metabolic disorders such as diabetes and obesity, disorders of energy metabolism are associated with cancers (Dorward, A. et al. (1997) J. Bioenerg. Biomembr. 29:385-392), autism (Lombard, J. (1998) Med. Hypotheses 50:497-500), neurodegenerative disorders (Alexi, T. et al. (1998) Neuroreport 9:R57-64), and neuromuscular disorders (DiMauro, S. et al. (1998) Biochim. Biophys. Acta 1366:199-210). The myocardium is heavily dependent on oxidative metabolism, so metabolic dysfunction often leads to heart disease (DiMauro, S. and M. Hirano (1998) Curr. Opin. Cardiol. 13:190-197).
For a review of energy metabolism enzymes and intermediates, see Stryer, L. et al. (1995) Biochemistry, W.H. Freeman and Co., San Francisco CA, pp. 443-652. For a review of energy metabolism regulation, see Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY, pp. 744-770.
Cofactor Metabolism
Cofactors, including coenzymes and prosthetic groups, are small molecular weight inorganic or organic compounds that are required for the action of an enzyme. Many cofactors contain vitamins as a component. Cofactors include thiamine pyrophosphate, flavin adenine dinucleotide, flavin mononucleotide, nicotinamide adenine dinucleotide, pyridoxal phosphate, coenzyme A, tetrahydrofolate, lipoamide, and heme. The vitamins biotin and cobalamin are associated with enzymes as well. Heme, a prosthetic group found in myoglobin and hemoglobin, consists of a protoporphyrin group bound to iron. Porphyrin groups contain four substituted pyrroles covalently joined in a ring, often with a bound metal atom. Enzymes involved in porphyrin synthesis include δ-aminolevulinate synthase, δ-aminolevulinate dehydrase, porphobilinogen deaminase, and cosynthase. Deficiencies in heme formation cause porphyrias. Heme is broken down as a part of erythrocyte turnover. Enzymes involved in heme degradation include heme oxygenase and biliverdin reductase. Iron is a required cofactor for many enzymes. Besides the heme-containing enzymes, iron is found in iron-sulfur clusters in proteins including aconitase, succinate dehydrogenase, and NADH-Q reductase. Iron is transported in the blood by the protein transferrin. Binding of transferrin to the transferrin receptor on cell surfaces allows uptake by receptor-mediated endocytosis. Cytosolic iron is bound to ferritin protein.
A molybdenum-containing cofactor (molybdopterin) is found in enzymes including sulfite oxidase, xanthine dehydrogenase, and aldehyde oxidase. Molybdopterin biosynthesis is performed by two molybdenum cofactor synthesizing enzymes. Deficiencies in these enzymes cause mental retardation and lens dislocation. Other diseases caused by defects in cofactor metabolism include pernicious anemia and methylmalonic aciduria.
Secretion and Trafficking
Eukaryotic cells are bounded by a lipid bilayer membrane and subdivided into functionally distinct, membrane-bound compartments. The membranes maintain the essential differences between the cytosol, the extracellular environment, and the lumenal space of each intracellular organelle. As lipid membranes are highly impermeable to most polar molecules, transport of essential nutrients, metabolic waste products, cell signaling molecules, macromolecules, and proteins across lipid membranes and between organelles must be mediated by a variety of transport-associated molecules.
Protein Trafficking
In eukaryotes, some proteins are synthesized on ER-bound ribosomes, co-translationally imported into the ER, delivered from the ER to the Golgi complex for post-translational processing and sorting, and transported from the Golgi to specific intracellular and extracellular destinations. All cells possess a constitutive transport process which maintains homeostasis between the cell and its environment. In many differentiated cell types, the basic machinery is modified to carry out specific transport functions. For example, in endocrine glands, hormones and other secreted proteins are packaged into secretory granules for regulated exocytosis to the cell exterior. In macrophages, foreign extracellular material is engulfed (phagocytosis) and delivered to lysosomes for degradation. In fat and muscle cells, glucose transporters are stored in vesicles which fuse with the plasma membrane only in response to insulin stimulation.
The Secretory Pathway
Synthesis of most integral membrane proteins, secreted proteins, and proteins destined for the lumen of a particular organelle occurs on ER-bound ribosomes. These proteins are co-translationally imported into the ER. The proteins leave the ER via membrane-bound vesicles which bud off the ER at specific sites and fuse with each other (homotypic fusion) to form the ER-Golgi Intermediate Compartment (ERGIC). The ERGIC matures progressively through the cis, medial, and trans cisternal stacks of the Golgi, modifying the enzyme composition by retrograde transport of specific Golgi enzymes. In this way, proteins moving through the Golgi undergo post-translational modification, such as glycosylation. The final Golgi compartment is the Trans-Golgi Network (TGN), where both membrane and lumenal proteins are sorted for their final destination. Transport vesicles destined for intracellular compartments, such as the lysosome, bud off the TGN. What remains is a secretory vesicle which contains proteins destined for the plasma membrane, such as receptors, adhesion molecules, and ion channels, and secretory proteins, such as hormones, neurotransmitters, and digestive enzymes. Secretory vesicles eventually fuse with the plasma membrane (Glick, B.S. and V. Malhotra (1998) Cell 95:883-889).
The secretory process can be constitutive or regulated. Most cells have a constitutive pathway for secretion, whereby vesicles derived from maturation of the TGN require no specific signal to fuse with the plasma membrane. In many cells, such as endocrine cells, digestive cells, and neurons, vesicle pools derived from the TGN collect in the cytoplasm and do not fuse with the plasma membrane until they are directed to by a specific signal.
Endocytosis
Endocytosis, wherein cells internalize material from the extracellular environment, is essential for transmission of neuronal, metabolic, and proliferative signals; uptake of many essential nutrients; and defense against invading organisms. Most cells exhibit two forms of endocytosis. The first, phagocytosis, is an actin-driven process exemplified in macrophages and neutrophils. Material to be endocytosed contacts numerous cell surface receptors which stimulate the plasma membrane to extend and surround the particle, enclosing it in a membrane-bound phagosome. In the mammalian immune system, IgG-coated particles bind Fc receptors on the surface of phagocytic leukocytes. Activation of the Fc receptors initiates a signal cascade involving src-family cytosolic kinases and the monomeric GTP-binding (G) protein Rho. The resulting actin reorganization leads to phagocytosis of the particle. This process is an important component of the humoral immune response, allowing the processing and presentation of bacterial-derived peptides to antigen-specific T-lymphocytes.
The second form of endocytosis, pinocytosis, is a more generalized uptake of material from the external milieu. Like phagocytosis, pinocytosis is activated by ligand binding to cell surface receptors. Activation of individual receptors stimulates an internal response that includes coalescence of the receptor-ligand complexes and formation of clathrin-coated pits. Invagination of the plasma membrane at clathrin-coated pits produces an endocytic vesicle within the cell cytoplasm. These vesicles undergo homotypic fusion to form an early endosomal (EE) compartment. The tubulovesicular EE serves as a sorting site for incoming material. ATP-driven proton pumps in the EE membrane lower the pH of the EE lumen (pH 6.3-6.8). The acidic environment causes many ligands to dissociate from their receptors. The receptors, along with membrane and other integral membrane proteins, are recycled back to the plasma membrane by budding off the tubular extensions of the EE in recycling vesicles (RV). This selective removal of recycled components produces a carrier vesicle containing ligand and other material from the external environment. The carrier vesicle fuses with TGN-derived vesicles which contain hydrolytic enzymes. The acidic environment of the resulting late endosome (LE) activates the hydrolytic enzymes which degrade the ligands and other material. As digestion takes place, the LE fuses with the lysosome where digestion is completed (Mellman, I. (1996) Annu. Rev. Cell Dev. Biol. 12:575-625).
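The sorting route described above can be sketched as two ordered lists that share their first three compartments and diverge at the early endosome. The following Python sketch is a simplified illustration only: the compartment names and the pH range are taken from the text, while the constant and function names are hypothetical.

```python
# Illustrative sketch; constant and function names are hypothetical.
# Compartments traversed by internalized ligand, in the order described.
LIGAND_ROUTE = [
    "clathrin-coated pit",
    "endocytic vesicle",
    "early endosome",
    "carrier vesicle",
    "late endosome",
    "lysosome",
]

# Receptors leave the route at the early endosome and return to the
# plasma membrane in recycling vesicles.
RECEPTOR_ROUTE = [
    "clathrin-coated pit",
    "endocytic vesicle",
    "early endosome",
    "recycling vesicle",
    "plasma membrane",
]

# pH range of the early endosome lumen as given in the text.
EE_LUMEN_PH = (6.3, 6.8)


def in_early_endosome_ph_range(ph: float) -> bool:
    """Return True if the pH falls within the stated EE lumen range."""
    low, high = EE_LUMEN_PH
    return low <= ph <= high


def shared_compartments():
    """Compartments visited by both ligand and receptor before sorting."""
    return [name for name in LIGAND_ROUTE if name in RECEPTOR_ROUTE]


if __name__ == "__main__":
    print("shared:", shared_compartments())
    print("pH 6.5 in EE range:", in_early_endosome_ph_range(6.5))
```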
Recycling vesicles may return directly to the plasma membrane. Receptors internalized and returned directly to the plasma membrane have a turnover rate of 2-3 minutes. Some RVs undergo microtubule-directed relocation to a perinuclear site, from which they then return to the plasma membrane. Receptors following this route have a turnover rate of 5-10 minutes. Still other RVs are retained within the cell until an appropriate signal is received (Mellman, supra; and James, D.E. et al. (1994) Trends Cell Biol. 4:120-126).
Vesicle Formation
Several steps in the transit of material along the secretory and endocytic pathways require the formation of transport vesicles. Specifically, vesicles form at the transitional endoplasmic reticulum (tER), the rim of Golgi cisternae, the face of the Trans-Golgi Network (TGN), the plasma membrane (PM), and tubular extensions of the endosomes. The process begins with the budding of a vesicle out of the donor membrane. The membrane-bound vesicle contains proteins to be transported and is surrounded by a protective coat made up of protein subunits recruited from the cytosol. The initial budding and coating processes are controlled by a cytosolic ras-like GTP-binding protein, ADP-ribosylation factor (Arf), and adapter proteins (AP). Different isoforms of both Arf and AP are involved at different sites of budding. Another small G-protein, dynamin, forms a ring complex around the neck of the forming vesicle and may provide the mechanochemical force to accomplish the final step of the budding process. The coated vesicle complex is then transported through the cytosol. During the transport process, Arf-bound GTP is hydrolyzed to GDP and the coat dissociates from the transport vesicle (West, M.A. et al. (1997) J. Cell Biol. 138:1239-1254). Two different classes of coat protein have also been identified. Clathrin coats form on the TGN and PM surfaces, whereas coatomer or COP coats form on the ER and Golgi. COP coats can further be distinguished as COPI, involved in retrograde traffic through the Golgi and from the Golgi to the ER, and COPII, involved in anterograde traffic from the ER to the Golgi (Mellman, supra). The COP coat consists of two major components, a G-protein (Arf or Sar) and coat protomer (coatomer). Coatomer is an equimolar complex of seven proteins, termed alpha-, beta-, beta'-, gamma-, delta-, epsilon- and zeta-COP. (Harter, C. and F.T. Wieland (1998) Proc. Natl. Acad. Sci. USA 95:11649-11654.)
Membrane Fusion
Transport vesicles undergo homotypic or heterotypic fusion in the secretory and endocytotic pathways. Molecules required for appropriate targeting and fusion of vesicles with their target membrane include proteins incorporated in the vesicle membrane, the target membrane, and proteins recruited from the cytosol. During budding of the vesicle from the donor compartment, an integral membrane protein, VAMP (vesicle-associated membrane protein), is incorporated into the vesicle. Soon after the vesicle uncoats, a cytosolic prenylated GTP-binding protein, Rab (a member of the Ras superfamily), is inserted into the vesicle membrane. GTP-bound Rab proteins are directed into nascent transport vesicles where they interact with VAMP. Following vesicle transport, GTPase activating proteins (GAPs) in the target membrane convert Rab proteins to the GDP-bound form. A cytosolic protein, guanine-nucleotide dissociation inhibitor (GDI), helps return GDP-bound Rab proteins to their membrane of origin. Several Rab isoforms have been identified and appear to associate with specific compartments within the cell. Rab proteins appear to play a role in mediating the function of a viral gene, Rev, which is essential for replication of HIV-1, the virus responsible for AIDS (Flavell, R.A. et al. (1996) Proc. Natl. Acad. Sci. USA 93:4421-4424).
Docking of the transport vesicle with the target membrane involves the formation of a complex between the vesicle SNAP receptor (v-SNARE), target membrane (t-) SNAREs, and certain other membrane and cytosolic proteins. Many of these other proteins have been identified although their exact functions in the docking complex remain uncertain (Tellam, J.T. et al. (1995) J. Biol. Chem. 270:5857-5863; and Hata, Y. and T.C. Sudhof (1995) J. Biol. Chem. 270:13022-13028). N-ethylmaleimide sensitive factor (NSF) and soluble NSF-attachment protein (α-SNAP and β-SNAP) are two such proteins that are conserved from yeast to man and function in most intracellular membrane fusion reactions. Sec1 represents a family of yeast proteins that function at many different stages in the secretory pathway including membrane fusion. Recently, mammalian homologs of Sec1, called Munc-18 proteins, have been identified (Katagiri, H. et al. (1995) J. Biol. Chem. 270:4963-4966; Hata et al. supra).
The SNARE complex involves three SNARE molecules, one in the vesicular membrane and two in the target membrane. Synaptotagmin is an integral membrane protein in the synaptic vesicle which associates with the t-SNARE syntaxin in the docking complex. Synaptotagmin binds calcium in a complex with negatively charged phospholipids, which allows the cytosolic SNAP protein to displace synaptotagmin from syntaxin and fusion to occur. Thus, synaptotagmin is a negative regulator of fusion in the neuron (Littleton, J.T. et al. (1993) Cell 74:1125-1134). The most abundant membrane protein of synaptic vesicles appears to be the glycoprotein synaptophysin, a 38 kDa protein with four transmembrane domains.
Specificity between a vesicle and its target is derived from the v-SNARE, t-SNAREs, and associated proteins involved. Different isoforms of SNAREs and Rabs show distinct cellular and subcellular distributions. VAMP-1/synaptobrevin, membrane-anchored synaptosome-associated protein of 25 kDa (SNAP-25), syntaxin-1, Rab3A, Rab15, and Rab23 are predominantly expressed in the brain and nervous system. Different syntaxin, VAMP, and Rab proteins are associated with distinct subcellular compartments and their vesicular carriers.
Nuclear Transport
Transport of proteins and RNA between the nucleus and the cytoplasm occurs through nuclear pore complexes (NPCs). NPC-mediated transport occurs in both directions through the nuclear envelope. All nuclear proteins are imported from the cytoplasm, their site of synthesis. tRNA and mRNA are exported from the nucleus, their site of synthesis, to the cytoplasm, their site of function. Processing of small nuclear RNAs involves export into the cytoplasm, assembly with proteins and modifications such as hypermethylation to produce small nuclear ribonucleoproteins (snRNPs), and subsequent import of the snRNPs back into the nucleus. The assembly of ribosomes requires the initial import of ribosomal proteins from the cytoplasm, their incorporation with RNA into ribosomal subunits, and export back to the cytoplasm. (Görlich, D. and I.W. Mattaj (1996) Science 271:1513-1518.)
The transport of proteins and mRNAs across the NPC is selective, dependent on nuclear localization signals, and generally requires association with nuclear transport factors. Nuclear localization signals (NLS) consist of short stretches of amino acids enriched in basic residues. NLS are found on proteins that are targeted to the nucleus, such as the glucocorticoid receptor. The NLS is recognized by the NLS receptor, importin, which then interacts with the monomeric GTP-binding protein Ran. This NLS protein/receptor/Ran complex navigates the nuclear pore with the help of the homodimeric protein nuclear transport factor 2 (NTF2). NTF2 binds to the GDP-bound form of Ran and to multiple proteins of the nuclear pore complex containing FXFG repeat motifs, such as p62. (Paschal, B. et al. (1997) J. Biol. Chem. 272:21534-21539; and Wong, D.H. et al. (1997) Mol. Cell Biol. 17:3755-3767.) Some proteins are dissociated before nuclear mRNAs are transported across the NPC while others are dissociated shortly after nuclear mRNA transport across the NPC and are reimported into the nucleus.
Disease Correlation
The etiology of numerous human diseases and disorders can be attributed to defects in the transport or secretion of proteins. For example, abnormal hormonal secretion is linked to disorders such as diabetes insipidus (vasopressin), hyper- and hypoglycemia (insulin, glucagon), Graves' disease and goiter (thyroid hormone), and Cushing's and Addison's diseases (adrenocorticotropic hormone, ACTH). Moreover, cancer cells secrete excessive amounts of hormones or other biologically active peptides.
Disorders related to excessive secretion of biologically active peptides by tumor cells include fasting hypoglycemia due to increased insulin secretion from insulinoma-islet cell tumors; hypertension due to increased epinephrine and norepinephrine secreted from pheochromocytomas of the adrenal medulla and sympathetic paraganglia; and carcinoid syndrome, which is characterized by abdominal cramps, diarrhea, and valvular heart disease caused by excessive amounts of vasoactive substances such as serotonin, bradykinin, histamine, prostaglandins, and polypeptide hormones, secreted from intestinal tumors. Biologically active peptides that are ectopically synthesized in and secreted from tumor cells include ACTH and vasopressin (lung and pancreatic cancers); parathyroid hormone (lung and bladder cancers); calcitonin (lung and breast cancers); and thyroid-stimulating hormone (medullary thyroid carcinoma). Such peptides may be useful as diagnostic markers for tumorigenesis (Schwartz, M.Z. (1997) Semin. Pediatr. Surg. 3:141-146; and Said, S.I. and G.R. Faloona (1975) N. Engl. J. Med. 293:155-160).
Defective nuclear transport may play a role in cancer. The BRCA1 protein contains three potential NLSs which interact with importin alpha, and is transported into the nucleus by the importin/NPC pathway. In breast cancer cells the BRCA1 protein is aberrantly localized in the cytoplasm. The mislocation of the BRCA1 protein in breast cancer cells may be due to a defect in the NPC nuclear import pathway (Chen, C.F. et al. (1996) J. Biol. Chem. 271:32863-32868).
It has been suggested that in some breast cancers, the tumor-suppressing activity of p53 is inactivated by the sequestration of the protein in the cytoplasm, away from its site of action in the cell nucleus. Cytoplasmic wild-type p53 was also found in human cervical carcinoma cell lines (Moll, U.M. et al. (1992) Proc. Natl. Acad. Sci. USA 89:7262-7266; and Liang, X.H. et al. (1993) Oncogene 8:2645-2652).
Environmental Responses
Organisms respond to the environment by a number of pathways. Heat shock proteins, including hsp70, hsp60, hsp90, and hsp40, assist organisms in coping with heat damage to cellular proteins.
Aquaporins (AQP) are channels that transport water and, in some cases, nonionic small solutes such as urea and glycerol. Water movement is important for a number of physiological processes including renal fluid filtration, aqueous humor generation in the eye, cerebrospinal fluid production in the brain, and appropriate hydration of the lung. Aquaporins are members of the major intrinsic protein (MIP) family of membrane transporters (King, L.S. and P. Agre (1996) Annu. Rev. Physiol. 58:619-648; Ishibashi, K. et al. (1997) J. Biol. Chem. 272:20782-20786). The study of aquaporins may have relevance to understanding edema formation and fluid balance in both normal physiology and disease states (King, supra). Mutations in AQP2 cause autosomal recessive nephrogenic diabetes insipidus (OMIM *107777 Aquaporin 2; AQP2). Reduced AQP4 expression in skeletal muscle may be associated with Duchenne muscular dystrophy (Frigeri, A. et al. (1998) J. Clin. Invest. 102:695-703). Mutations in AQP0 cause autosomal dominant cataracts in the mouse (OMIM *154050 Major Intrinsic Protein of Lens Fiber; MIP).
The metallothioneins (MTs) are a group of small (61 amino acids), cysteine-rich proteins that bind heavy metals such as cadmium, zinc, mercury, lead, and copper and are thought to play a role in metal detoxification or the metabolism and homeostasis of metals. Arsenite-resistance proteins have been identified in hamsters that are resistant to toxic levels of arsenite (Rossman, T.G. et al. (1997) Mutat. Res. 386:307-314).
Humans respond to light and odors by specific protein pathways. Proteins involved in light perception include rhodopsin, transducin, and cGMP phosphodiesterase. Proteins involved in odor perception include multiple olfactory receptors. Other proteins are important in human circadian rhythms and responses to wounds.
Immunity and Host Defense
All vertebrates have developed sophisticated and complex immune systems that provide protection from viral, bacterial, fungal and parasitic infections. Included in these systems are the processes of humoral immunity, the complement cascade and the inflammatory response (Paul, W.E. (1993) Fundamental Immunology, Raven Press, Ltd., New York NY, pp.1-20).
The cellular components of the humoral immune system include six different types of leukocytes: monocytes, lymphocytes, polymorphonuclear granulocytes (consisting of neutrophils, eosinophils, and basophils) and plasma cells. Additionally, fragments of megakaryocytes, a seventh type of white blood cell in the bone marrow, occur in large numbers in the blood as platelets.
Leukocytes are formed from two stem cell lineages in bone marrow. The myeloid stem cell line produces granulocytes and monocytes, and the lymphoid stem cell line produces lymphocytes. Lymphoid cells travel to the thymus, spleen and lymph nodes, where they mature and differentiate into lymphocytes. Leukocytes are responsible for defending the body against invading pathogens. Neutrophils and monocytes attack invading bacteria, viruses, and other pathogens and destroy them by phagocytosis. Monocytes enter tissues and differentiate into macrophages, which are extremely phagocytic. Lymphocytes and plasma cells are a part of the immune system which recognizes specific foreign molecules and organisms and inactivates them, as well as signals other cells to attack the invaders. Granulocytes and monocytes are formed and stored in the bone marrow until needed.
Megakaryocytes are produced in bone marrow, where they fragment into platelets and are released into the bloodstream. The main function of platelets is to activate the blood clotting mechanism. Lymphocytes and plasma cells are produced in various lymphogenous organs, including the lymph nodes, spleen, thymus, and tonsils.
Both neutrophils and macrophages exhibit chemotaxis towards sites of inflammation. Tissue inflammation in response to pathogen invasion results in production of chemo-attractants for leukocytes, such as endotoxins or other bacterial products, prostaglandins, and products of leukocytes or platelets.
Basophils participate in the release of the chemicals involved in the inflammatory process. The main function of basophils is secretion of these chemicals to such a degree that they have been referred to as "unicellular endocrine glands." A distinct aspect of basophilic secretion is that the contents of granules go directly into the extracellular environment, not into vacuoles as occurs with neutrophils, eosinophils and monocytes. Basophils have receptors for the Fc fragment of immunoglobulin E (IgE) that are not present on other leukocytes. Crosslinking of membrane IgE with anti-IgE or other ligands triggers degranulation.
Eosinophils are bi- or multi-nucleated white blood cells which contain eosinophilic granules. Their plasma membrane is characterized by Ig receptors, particularly IgG and IgE. Generally, eosinophils are stored in the bone marrow until recruited for use at a site of inflammation or invasion. They have specific functions in parasitic infections and allergic reactions, and are thought to detoxify some of the substances released by mast cells and basophils which cause inflammation. Additionally, they phagocytize antigen-antibody complexes and further help prevent spread of the inflammation.
Macrophages are monocytes that have left the blood stream to settle in tissue. Once monocytes have migrated into tissues, they do not re-enter the bloodstream. The mononuclear phagocyte system is comprised of precursor cells in the bone marrow, monocytes in circulation, and macrophages in tissues. The system is capable of very fast and extensive phagocytosis. A macrophage may phagocytize over 100 bacteria, digest them and extrude residues, and then survive for many more months. Macrophages are also capable of ingesting large particles, including red blood cells and malarial parasites. They increase several-fold in size and transform into macrophages that are characteristic of the tissue they have entered, surviving in tissues for several months.
Mononuclear phagocytes are essential in defending the body against invasion by foreign pathogens, particularly intracellular microorganisms such as M. tuberculosis, listeria, leishmania and toxoplasma. Macrophages can also control the growth of tumorous cells, via both phagocytosis and secretion of hydrolytic enzymes. Another important function of macrophages is that of processing antigens and presenting them in a biochemically modified form to lymphocytes.
The immune system responds to invading microorganisms in two major ways: antibody production and cell-mediated responses. Antibodies are immunoglobulin proteins produced by B-lymphocytes which bind to specific antigens and cause inactivation or promote destruction of the antigen by other cells. Cell-mediated immune responses involve T-lymphocytes (T cells) that react with foreign antigen on the surface of infected host cells. Depending on the type of T cell, the infected cell is either killed or signals are secreted which activate macrophages and other cells to destroy the infected cell (Paul, supra).
T-lymphocytes originate in the bone marrow or liver in fetuses. Precursor cells migrate via the blood to the thymus, where they are processed to mature into T-lymphocytes. This processing is crucial because of positive and negative selection of T cells that will react with foreign antigen and not with self molecules. After processing, T cells continuously circulate in the blood and secondary lymphoid tissues, such as lymph nodes, spleen, certain epithelium-associated tissues in the gastrointestinal tract, respiratory tract and skin. When T-lymphocytes are presented with the complementary antigen, they are stimulated to proliferate and release large numbers of activated T cells into the lymph system and the blood system. These activated T cells can survive and circulate for several days. At the same time, T memory cells are created, which remain in the lymphoid tissue for months or years. Upon subsequent exposure to that specific antigen, these memory cells will respond more rapidly and with a stronger response than induced by the original antigen. This creates an "immunological memory" that can provide immunity for years.
There are two major types of T cells: cytotoxic T cells destroy infected host cells, and helper T cells activate other white blood cells via chemical signals. One class of helper cell, TH1, activates macrophages to destroy ingested microorganisms, while another, TH2, stimulates the production of antibodies by B cells.
Cytotoxic T cells directly attack the infected target cell. In virus-infected cells, peptides derived from viral proteins are generated by the proteasome. These peptides are transported into the ER by the transporter associated with antigen processing (TAP) (Pamer, E. and P. Cresswell (1998) Annu. Rev. Immunol. 16:323-358). Once inside the ER, the peptides bind MHC I chains, and the peptide/MHC I complex is transported to the cell surface. Receptors on the surface of T cells bind to antigen presented on cell surface MHC molecules. Once activated by binding to antigen, T cells secrete γ-interferon, a signal molecule that induces the expression of genes necessary for presenting viral (or other) antigens to cytotoxic T cells. Cytotoxic T cells kill the infected cell by stimulating programmed cell death.
Helper T cells constitute up to 75% of the total T cell population. They regulate the immune functions by producing a variety of lymphokines that act on other cells in the immune system and on bone marrow. Among these lymphokines are interleukins 2, 3, 4, 5, and 6; granulocyte-monocyte colony stimulating factor; and γ-interferon.
Helper T cells are required for most B cells to respond to antigen. When an activated helper cell contacts a B cell, its centrosome and Golgi apparatus become oriented toward the B cell, which helps direct signal molecules, such as the membrane-bound protein CD40 ligand, onto the B cell surface to interact with the CD40 transmembrane protein. Secreted signals also help B cells to proliferate and mature and, in some cases, to switch the class of antibody being produced.
B-lymphocytes (B cells) produce antibodies which react with specific antigenic proteins presented by pathogens. Once activated, B cells become filled with extensive rough endoplasmic reticulum and are known as plasma cells. As with T cells, interaction of B cells with antigen stimulates proliferation of only those B cells which produce antibody specific to that antigen. There are five classes of antibodies, known as immunoglobulins, which together comprise about 20% of total plasma protein. Each class mediates a characteristic biological response after antigen binding. Upon activation by specific antigen, B cells switch from making membrane-bound antibody to secretion of that antibody.
Antibodies, or immunoglobulins (Ig), are the founding members of the Ig superfamily and the central components of the humoral immune response. Antibodies are either expressed on the surface of B cells or secreted by B cells into the circulation. Antibodies bind and neutralize blood-borne foreign antigens. The prototypical antibody is a tetramer consisting of two identical heavy polypeptide chains (H-chains) and two identical light polypeptide chains (L-chains) interlinked by disulfide bonds. This arrangement confers the characteristic Y-shape to antibody molecules. Antibodies are classified based on their H-chain composition. The five antibody classes, IgA, IgD, IgE, IgG and IgM, are defined by the α, δ, ε, γ, and μ H-chain types. There are two types of L-chains, κ and λ, either of which may associate as a pair with any H-chain pair. IgG, the most common class of antibody found in the circulation, is tetrameric, while the other classes of antibodies are generally variants or multimers of this basic structure.
H-chains and L-chains each contain an N-terminal variable region and a C-terminal constant region. Both H-chains and L-chains contain repeated Ig domains. For example, a typical H-chain contains four Ig domains, three of which occur within the constant region and one of which occurs within the variable region and contributes to the formation of the antigen recognition site. Likewise, a typical L-chain contains two Ig domains, one of which occurs within the constant region and one of which occurs within the variable region. In addition, H-chains such as μ have been shown to associate with other polypeptides during differentiation of the B cell.
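The chain composition described above can be written down as a small lookup structure. The following is a minimal illustrative sketch in Python, not part of the disclosed subject matter; the names `HEAVY_CHAIN_TO_CLASS`, `LIGHT_CHAIN_TYPES`, `IG_DOMAINS`, `tetramer_combinations`, and `ig_domains_per_tetramer` are hypothetical, and the domain counts assume the "typical" H-chain and L-chain stated in the text rather than any particular isotype.

```python
from itertools import product

# H-chain type -> antibody class, as listed in the text.
HEAVY_CHAIN_TO_CLASS = {
    "alpha": "IgA",
    "delta": "IgD",
    "epsilon": "IgE",
    "gamma": "IgG",
    "mu": "IgM",
}

# The two L-chain types; either may pair with any H-chain pair.
LIGHT_CHAIN_TYPES = ("kappa", "lambda")

# Ig domains for the "typical" chains described in the text:
# (constant-region domains, variable-region domains). Assumed typical values.
IG_DOMAINS = {"H": (3, 1), "L": (1, 1)}


def tetramer_combinations():
    """List every (class, H-chain type, L-chain type) pairing for the basic tetramer."""
    return [
        (HEAVY_CHAIN_TO_CLASS[h], h, l)
        for h, l in product(HEAVY_CHAIN_TO_CLASS, LIGHT_CHAIN_TYPES)
    ]


def ig_domains_per_tetramer():
    """Count Ig domains in a tetramer of two typical H-chains and two typical L-chains."""
    return 2 * sum(IG_DOMAINS["H"]) + 2 * sum(IG_DOMAINS["L"])


if __name__ == "__main__":
    for ab_class, h_type, l_type in tetramer_combinations():
        print(ab_class, h_type, l_type)
    print("Ig domains per typical tetramer:", ig_domains_per_tetramer())
```

Under these assumptions the five H-chain types and two L-chain types give ten possible pairings, and a tetramer built from typical chains carries twelve Ig domains; multimeric classes would need to be modeled separately.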
Antibodies can be described in terms of their two main functional domains. Antigen recognition is mediated by the Fab (antigen binding fragment) region of the antibody, while effector functions are mediated by the Fc (crystallizable fragment) region. Binding of antibody to an antigen, such as a bacterium, triggers the destruction of the antigen by phagocytic white blood cells such as macrophages and neutrophils. These cells express surface receptors that specifically bind to the antibody Fc region and allow the phagocytic cells to engulf, ingest, and degrade the antibody-bound antigen. The Fc receptors expressed by phagocytic cells are single-pass transmembrane glycoproteins of about 300 to 400 amino acids (Sears, D.W. et al. (1990) J. Immunol. 144:371-378). The extracellular portion of the Fc receptor typically contains two or three Ig domains.
Diseases which cause over- or under-abundance of any one type of leukocyte usually result in the entire immune defense system becoming involved. A well-known immune system disease is AIDS (Acquired Immunodeficiency Syndrome), in which the number of helper T cells is depleted, leaving the patient susceptible to infection by microorganisms and parasites. Another widespread medical condition attributable to the immune system is that of allergic reactions to certain antigens. Allergic reactions include hay fever, asthma, anaphylaxis, and urticaria (hives). Leukemias involve an excess production of white blood cells, to the point where a major portion of the body's metabolic resources are directed solely at proliferation of white blood cells, leaving other tissues to starve. Leukopenia or agranulocytosis occurs when the bone marrow stops producing white blood cells. This leaves the body unprotected against foreign microorganisms, including those which normally inhabit skin, mucous membranes, and gastrointestinal tract. If all white blood cell production stops completely, infection will occur within two days and death may follow only 1 to 4 days later.
Impaired phagocytosis occurs in several diseases, including monocytic leukemia, systemic lupus, and granulomatous disease. In such a situation, macrophages can phagocytize normally, but the enveloped organism is not killed. A defect in the plasma membrane enzyme which converts oxygen to lethally reactive forms results in abscess formation in liver, lungs, spleen, lymph nodes, and beneath the skin. Eosinophilia is an excess of eosinophils commonly observed in patients with allergies (hay fever, asthma), allergic reactions to drugs, rheumatoid arthritis, and cancers (Hodgkin's disease, lung, and liver cancer) (Isselbacher, K.J. et al. (1994) Harrison's Principles of Internal Medicine, McGraw-Hill, Inc., New York NY).
Host defense is further augmented by the complement system. The complement system serves as an effector system and is involved in infectious agent recognition. It can function as an independent immune network or in conjunction with other humoral immune responses. The complement system is comprised of numerous plasma and membrane proteins that act in a cascade of reaction sequences whereby one component activates the next. The result is a rapid and amplified response to infection through either an inflammatory response or increased phagocytosis.
The complement system has more than 30 protein components which can be divided into functional groupings including modified serine proteases, membrane-binding proteins and regulators of complement activation. Activation occurs through two different pathways, the classical and the alternative. Both pathways serve to destroy infectious agents through distinct triggering mechanisms that eventually merge with the involvement of the component C3.
The classical pathway requires antibody binding to infectious agent antigens. The antibodies serve to define the target and initiate the complement system cascade, culminating in the destruction of the infectious agent. In this pathway, since the antibody guides initiation of the process, the complement can be seen as an effector arm of the humoral immune system. The alternative pathway of the complement system does not require the presence of preexisting antibodies for targeting infectious agent destruction. Rather, this pathway, through low levels of an activated component, remains constantly primed and provides surveillance in the non-immune host to enable targeting and destruction of infectious agents. In this case foreign material triggers the cascade, thereby facilitating phagocytosis or lysis (Paul, supra, pp.918-919).
Another important component of host defense is the process of inflammation. Inflammatory responses are divided into four categories on the basis of pathology and include allergic inflammation, cytotoxic antibody mediated inflammation, immune complex mediated inflammation and monocyte mediated inflammation. Inflammation manifests as a combination of each of these forms with one predominating.
Allergic acute inflammation is observed in individuals wherein specific antigens stimulate IgE antibody production. Mast cells and basophils are subsequently activated by the attachment of antigen-IgE complexes, resulting in the release of cytoplasmic granule contents such as histamine. The products of activated mast cells can increase vascular permeability and constrict the smooth muscle of breathing passages, resulting in anaphylaxis or asthma. Acute inflammation is also mediated by cytotoxic antibodies and can result in the destruction of tissue through the binding of complement-fixing antibodies to cells. The responsible antibodies are of the IgG or IgM types. Resultant clinical disorders include autoimmune hemolytic anemia and thrombocytopenia as associated with systemic lupus erythematosus. Immune complex mediated acute inflammation involves the IgG or IgM antibody types which combine with antigen to activate the complement cascade. When such immune complexes bind to neutrophils and macrophages they activate the respiratory burst to form protein- and vessel-damaging agents such as hydrogen peroxide, hydroxyl radical, hypochlorous acid, and chloramines. Clinical manifestations include rheumatoid arthritis and systemic lupus erythematosus. In chronic inflammation or delayed-type hypersensitivity, macrophages are activated and process antigen for presentation to T cells that subsequently produce lymphokines and monokines. This type of inflammatory response is likely important for defense against intracellular parasites and certain viruses. Clinical associations include granulomatous disease, tuberculosis, leprosy, and sarcoidosis (Paul, W.E., supra, pp.1017-1018).
Extracellular Information Transmission Molecules
Intercellular communication is essential for the growth and survival of multicellular organisms, and in particular, for the function of the endocrine, nervous, and immune systems. In addition, intercellular communication is critical for developmental processes such as tissue construction and organogenesis, in which cell proliferation, cell differentiation, and morphogenesis must be spatially and temporally regulated in a precise and coordinated manner. Cells communicate with one another through the secretion and uptake of diverse types of signaling molecules such as hormones, growth factors, neuropeptides, and cytokines.
Hormones
Hormones are signaling molecules that coordinately regulate basic physiological processes from embryogenesis throughout adulthood. These processes include metabolism, respiration, reproduction, excretion, fetal tissue differentiation and organogenesis, growth and development, homeostasis, and the stress response. Hormonal secretions and the nervous system are tightly integrated and interdependent. Hormones are secreted by endocrine glands, primarily the hypothalamus and pituitary, the thyroid and parathyroid, the pancreas, the adrenal glands, and the ovaries and testes.
The secretion of hormones into the circulation is tightly controlled. Hormones are often secreted in diurnal, pulsatile, and cyclic patterns. Hormone secretion is regulated by perturbations in blood biochemistry, by other upstream-acting hormones, by neural impulses, and by negative feedback loops. Blood hormone concentrations are constantly monitored and adjusted to maintain optimal, steady-state levels. Once secreted, hormones act only on those target cells that express specific receptors.
Most disorders of the endocrine system are caused by either hyposecretion or hypersecretion of hormones. Hyposecretion often occurs when a hormone's gland of origin is damaged or otherwise impaired. Hypersecretion often results from the proliferation of tumors derived from hormone-secreting cells. Inappropriate hormone levels may also be caused by defects in regulatory feedback loops or in the processing of hormone precursors. Endocrine malfunction may also occur when the target cell fails to respond to the hormone.
Hormones can be classified biochemically as polypeptides, steroids, eicosanoids, or amines. Polypeptides, which include diverse hormones such as insulin and growth hormone, vary in size and function and are often synthesized as inactive precursors that are processed intracellularly into mature, active forms. Amines, which include epinephrine and dopamine, are amino acid derivatives that function in neuroendocrine signaling. Steroids, which include the cholesterol-derived hormones estrogen and testosterone, function in sexual development and reproduction. Eicosanoids, which include prostaglandins and prostacyclins, are fatty acid derivatives that function in a variety of processes. Most polypeptides and some amines are soluble in the circulation where they are highly susceptible to proteolytic degradation within seconds after their secretion. Steroids and lipids are insoluble and must be transported in the circulation by carrier proteins. The following discussion will focus primarily on polypeptide hormones.
Hormones secreted by the hypothalamus and pituitary gland play a critical role in endocrine function by coordinately regulating hormonal secretions from other endocrine glands in response to neural signals. Hypothalamic hormones include thyrotropin-releasing hormone, gonadotropin-releasing hormone, somatostatin, growth-hormone releasing factor, corticotropin-releasing hormone, substance P, dopamine, and prolactin-releasing hormone. These hormones directly regulate the secretion of hormones from the anterior lobe of the pituitary. Hormones secreted by the anterior pituitary include adrenocorticotropic hormone (ACTH), melanocyte-stimulating hormone, somatotropic hormones such as growth hormone and prolactin, glycoprotein hormones such as thyroid-stimulating hormone, luteinizing hormone (LH), and follicle-stimulating hormone (FSH), β-lipotropin, and β-endorphins. These hormones regulate hormonal secretions from the thyroid, pancreas, and adrenal glands, and act directly on the reproductive organs to stimulate ovulation and spermatogenesis. The posterior pituitary synthesizes and secretes antidiuretic hormone (ADH, vasopressin) and oxytocin.
Disorders of the hypothalamus and pituitary often result from lesions such as primary brain tumors, adenomas, infarction associated with pregnancy, hypophysectomy, aneurysms, vascular malformations, thrombosis, infections, immunological disorders, and complications due to head trauma. Such disorders have profound effects on the function of other endocrine glands. Disorders associated with hypopituitarism include hypogonadism, Sheehan syndrome, diabetes insipidus, Kallmann's disease, Hand-Schuller-Christian disease, Letterer-Siwe disease, sarcoidosis, empty sella syndrome, and dwarfism. Disorders associated with hyperpituitarism include acromegaly, gigantism, and syndrome of inappropriate ADH secretion (SIADH), often caused by benign adenomas.
Hormones secreted by the thyroid and parathyroid primarily control metabolic rates and the regulation of serum calcium levels, respectively. Thyroid hormones include calcitonin, somatostatin, and thyroid hormone. The parathyroid secretes parathyroid hormone. Disorders associated with hypothyroidism include goiter, myxedema, acute thyroiditis associated with bacterial infection, subacute thyroiditis associated with viral infection, autoimmune thyroiditis (Hashimoto's disease), and cretinism. Disorders associated with hyperthyroidism include thyrotoxicosis and its various forms, Graves' disease, pretibial myxedema, toxic multinodular goiter, thyroid carcinoma, and Plummer's disease. Disorders associated with hyperparathyroidism include Conn disease (chronic hypercalcemia) leading to bone resorption and parathyroid hyperplasia.
Hormones secreted by the pancreas regulate blood glucose levels by modulating the rates of carbohydrate, fat, and protein metabolism. Pancreatic hormones include insulin, glucagon, amylin, γ-aminobutyric acid, gastrin, somatostatin, and pancreatic polypeptide. The principal disorder associated with pancreatic dysfunction is diabetes mellitus caused by insufficient insulin activity. Diabetes mellitus is generally classified as either Type I (insulin-dependent, juvenile diabetes) or Type II (non-insulin-dependent, adult diabetes). The treatment of both forms by insulin replacement therapy is well known. Diabetes mellitus often leads to acute complications such as hypoglycemia (insulin shock), coma, diabetic ketoacidosis, lactic acidosis, and chronic complications leading to disorders of the eye, kidney, skin, bone, joint, cardiovascular system, nervous system, and to decreased resistance to infection.
The anatomy, physiology, and diseases related to hormonal function are reviewed in McCance, K.L. and S.E. Huether (1994) Pathophysiology: The Biological Basis for Disease in Adults and Children, Mosby-Year Book, Inc., St. Louis MO; and Greenspan, F.S. and J.D. Baxter (1994) Basic and Clinical Endocrinology, Appleton and Lange, East Norwalk CT.
Growth Factors
Growth factors are secreted proteins that mediate intercellular communication. Unlike hormones, which travel great distances via the circulatory system, most growth factors are primarily local mediators that act on neighboring cells. Most growth factors contain a hydrophobic N-terminal signal peptide sequence which directs the growth factor into the secretory pathway. Most growth factors also undergo post-translational modifications within the secretory pathway. These modifications can include proteolysis, glycosylation, phosphorylation, and intramolecular disulfide bond formation. Once secreted, growth factors bind to specific receptors on the surfaces of neighboring target cells, and the bound receptors trigger intracellular signal transduction pathways. These signal transduction pathways elicit specific cellular responses in the target cells. These responses can include the modulation of gene expression and the stimulation or inhibition of cell division, cell differentiation, and cell motility.
Growth factors fall into at least two broad and overlapping classes. The broadest class includes the large polypeptide growth factors, which are wide-ranging in their effects. These factors include epidermal growth factor (EGF), fibroblast growth factor (FGF), transforming growth factor-β (TGF-β), insulin-like growth factor (IGF), nerve growth factor (NGF), and platelet-derived growth factor (PDGF), each defining a family of numerous related factors. The large polypeptide growth factors, with the exception of NGF, act as mitogens on diverse cell types to stimulate wound healing, bone synthesis and remodeling, extracellular matrix synthesis, and proliferation of epithelial, epidermal, and connective tissues. Members of the TGF-β, EGF, and FGF families also function as inductive signals in the differentiation of embryonic tissue. NGF functions specifically as a neurotrophic factor, promoting neuronal growth and differentiation.
Another class of growth factors includes the hematopoietic growth factors, which are narrow in their target specificity. These factors stimulate the proliferation and differentiation of blood cells such as B-lymphocytes, T-lymphocytes, erythrocytes, platelets, eosinophils, basophils, neutrophils, macrophages, and their stem cell precursors. These factors include the colony-stimulating factors (G-CSF, M-CSF, GM-CSF, and CSF1-3), erythropoietin, and the cytokines. The cytokines are specialized hematopoietic factors secreted by cells of the immune system and are discussed in detail below.
Growth factors play critical roles in neoplastic transformation of cells in vitro and in tumor progression in vivo. Overexpression of the large polypeptide growth factors promotes the proliferation and transformation of cells in culture. Inappropriate expression of these growth factors by tumor cells in vivo may contribute to tumor vascularization and metastasis. Inappropriate activity of hematopoietic growth factors can result in anemias, leukemias, and lymphomas. Moreover, growth factors are both structurally and functionally related to oncoproteins, the potentially cancer-causing products of proto-oncogenes. Certain FGF and PDGF family members are themselves homologous to oncoproteins, whereas receptors for some members of the EGF, NGF, and FGF families are encoded by proto-oncogenes. Growth factors also affect the transcriptional regulation of both proto-oncogenes and oncosuppressor genes (Pimentel, E. (1994) Handbook of Growth Factors, CRC Press, Ann Arbor MI; McKay, I. and I. Leigh, eds. (1993) Growth Factors: A Practical Approach, Oxford University Press, New York NY; Habenicht, A., ed. (1990) Growth Factors, Differentiation Factors, and Cytokines, Springer-Verlag, New York NY).
In addition, some of the large polypeptide growth factors play crucial roles in the induction of the primordial germ layers in the developing embryo. This induction ultimately results in the formation of the embryonic mesoderm, ectoderm, and endoderm which in turn provide the framework for the entire adult body plan. Disruption of this inductive process would be catastrophic to embryonic development.
Small Peptide Factors - Neuropeptides and Vasomediators
Neuropeptides and vasomediators (NP/VM) comprise a family of small peptide factors, typically of 20 amino acids or less. These factors generally function in neuronal excitation and inhibition of vasoconstriction/vasodilation, muscle contraction, and hormonal secretions from the brain and other endocrine tissues. Included in this family are neuropeptides and neuropeptide hormones such as bombesin, neuropeptide Y, neurotensin, neuromedin N, melanocortins, opioids, galanin, somatostatin, tachykinins, urotensin II and related peptides involved in smooth muscle stimulation, vasopressin, vasoactive intestinal peptide, and circulatory system-borne signaling molecules such as angiotensin, complement, calcitonin, endothelins, formyl-methionyl peptides, glucagon, cholecystokinin, gastrin, and many of the peptide hormones discussed above. NP/VMs can transduce signals directly, modulate the activity or release of other neurotransmitters and hormones, and act as catalytic enzymes in signaling cascades. The effects of NP/VMs range from extremely brief to long-lasting. (Reviewed in Martin, C.R. et al. (1985) Endocrine Physiology, Oxford University Press, New York NY, pp. 57-62.)
Cytokines
Cytokines comprise a family of signaling molecules that modulate the immune system and the inflammatory response. Cytokines are usually secreted by leukocytes, or white blood cells, in response to injury or infection. Cytokines function as growth and differentiation factors that act primarily on cells of the immune system such as B- and T-lymphocytes, monocytes, macrophages, and granulocytes. Like other signaling molecules, cytokines bind to specific plasma membrane receptors and trigger intracellular signal transduction pathways which alter gene expression patterns. There is considerable potential for the use of cytokines in the treatment of inflammation and immune system disorders.
Cytokine structure and function have been extensively characterized in vitro. Most cytokines are small polypeptides of about 30 kilodaltons or less. Over 50 cytokines have been identified from human and rodent sources. Examples of cytokine subfamilies include the interferons (IFN-α, -β, and -γ), the interleukins (IL1-IL13), the tumor necrosis factors (TNF-α and -β), and the chemokines. Many cytokines have been produced using recombinant DNA techniques, and the activities of individual cytokines have been determined in vitro. These activities include regulation of leukocyte proliferation, differentiation, and motility.
The activity of an individual cytokine in vitro may not reflect the full scope of that cytokine's activity in vivo. Cytokines are not expressed individually in vivo but are instead expressed in combination with a multitude of other cytokines when the organism is challenged with a stimulus. Together, these cytokines collectively modulate the immune response in a manner appropriate for that particular stimulus. Therefore, the physiological activity of a cytokine is determined by the stimulus itself and by complex interactive networks among co-expressed cytokines which may demonstrate both synergistic and antagonistic relationships.
Chemokines comprise a cytokine subfamily with over 30 members. (Reviewed in Wells, T.N.C. and M.C. Peitsch (1997) J. Leukoc. Biol. 61:545-550.) Chemokines were initially identified as chemotactic proteins that recruit monocytes and macrophages to sites of inflammation. Recent evidence indicates that chemokines may also play key roles in hematopoiesis and HIV-1 infection. Chemokines are small proteins which range from about 6-15 kilodaltons in molecular weight. Chemokines are further classified as C, CC, CXC, or CX3C based on the number and position of critical cysteine residues. The CC chemokines, for example, each contain a conserved motif consisting of two consecutive cysteines followed by two additional cysteines which occur downstream at 24- and 16-residue intervals, respectively (ExPASy PROSITE database, documents PS00472 and PDOC00434). The presence and spacing of these four cysteine residues are highly conserved, whereas the intervening residues diverge significantly. However, a conserved tyrosine located about 15 residues downstream of the cysteine doublet seems to be important for chemotactic activity. Most of the human genes encoding CC chemokines are clustered on chromosome 17, although there are a few examples of CC chemokine genes that map elsewhere. Other chemokines include lymphotactin (C chemokine); macrophage chemotactic and activating factor (MCAF/MCP-1; CC chemokine); platelet factor 4 and IL-8 (CXC chemokines); and fractalkine and neurotactin (CX3C chemokines). (Reviewed in Luster, A.D. (1998) N. Engl. J. Med. 338:436-445.)
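The cysteine spacing described above for CC chemokines can be illustrated with a simple sequence scan. The following is a minimal sketch only. It is not the PROSITE PS00472 pattern itself, and it assumes that the "24- and 16-residue intervals" refer to the number of residues between successive cysteines. The function names and the tolerance parameter are hypothetical and are introduced purely for illustration.

```python
import re

def cc_motif_regex(gap1=24, gap2=16, tol=2):
    """Build a regex for a cysteine doublet followed by two cysteines.

    gap1 and gap2 are the assumed numbers of residues between the
    doublet and the third cysteine, and between the third and fourth
    cysteines. tol is a hypothetical tolerance on each gap.
    """
    lo1, hi1 = max(gap1 - tol, 0), gap1 + tol
    lo2, hi2 = max(gap2 - tol, 0), gap2 + tol
    return re.compile(rf"CC[A-Z]{{{lo1},{hi1}}}C[A-Z]{{{lo2},{hi2}}}C")

def find_cc_motif(sequence, **kwargs):
    """Return the 0-based start of the first motif match, or None."""
    match = cc_motif_regex(**kwargs).search(sequence.upper())
    return match.start() if match else None

# Synthetic sequences (not real chemokines) used only to exercise the scan.
synthetic_hit = "MK" + "CC" + "A" * 24 + "C" + "A" * 16 + "C" + "KK"
synthetic_miss = "MK" + "CC" + "A" * 10 + "C" + "A" * 16 + "C" + "KK"
```

A scan of this kind only flags candidate sequences. The curated PROSITE entry also constrains residues between the cysteines and should be consulted for actual classification.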
Receptor Molecules
The term receptor describes proteins that specifically recognize other molecules. The category is broad and includes proteins with a variety of functions. The bulk of receptors are cell surface proteins which bind extracellular ligands and produce cellular responses in the areas of growth, differentiation, endocytosis, and immune response. Other receptors facilitate the selective transport of proteins out of the endoplasmic reticulum and localize enzymes to particular locations in the cell. The term may also be applied to proteins which act as receptors for ligands with known or unknown chemical composition and which interact with other cellular components. For example, the steroid hormone receptors bind to and regulate transcription of DNA.
Regulation of cell proliferation, differentiation, and migration is important for the formation and function of tissues. Regulatory proteins such as growth factors coordinately control these cellular processes and act as mediators in cell-cell signaling pathways. Growth factors are secreted proteins that bind to specific cell-surface receptors on target cells. The bound receptors trigger intracellular signal transduction pathways which activate various downstream effectors that regulate gene expression, cell division, cell differentiation, cell motility, and other cellular processes.
Cell surface receptors are typically integral plasma membrane proteins. These receptors recognize hormones such as catecholamines; peptide hormones; growth and differentiation factors; small peptide factors such as thyrotropin-releasing hormone, galanin, somatostatin, and tachykinins; and circulatory system-borne signaling molecules. Cell surface receptors on immune system cells recognize antigens, antibodies, and major histocompatibility complex (MHC)-bound peptides. Other cell surface receptors bind ligands to be internalized by the cell. This receptor-mediated endocytosis functions in the uptake of low density lipoproteins (LDL), transferrin, glucose- or mannose-terminal glycoproteins, galactose-terminal glycoproteins, immunoglobulins, phosphovitellogenins, fibrin, proteinase-inhibitor complexes, plasminogen activators, and thrombospondin (Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY, p. 723; Mikhailenko, I. et al. (1997) J. Biol. Chem. 272:6784-6791).
Receptor Protein Kinases
Many growth factor receptors, including receptors for epidermal growth factor, platelet-derived growth factor, and fibroblast growth factor, as well as the growth modulator α-thrombin, contain intrinsic protein kinase activities. When growth factor binds to the receptor, it triggers the autophosphorylation of a serine, threonine, or tyrosine residue on the receptor. These phosphorylated sites are recognition sites for the binding of other cytoplasmic signaling proteins. These proteins participate in signaling pathways that eventually link the initial receptor activation at the cell surface to the activation of a specific intracellular target molecule. In the case of tyrosine residue autophosphorylation, these signaling proteins contain a common domain referred to as a Src homology (SH) domain. SH2 domains and SH3 domains are found in phospholipase C-γ, PI-3-K p85 regulatory subunit, Ras-GTPase activating protein, and pp60c-src (Lowenstein, E.J. et al. (1992) Cell 70:431-442). The cytokine family of receptors shares a different common binding domain and includes transmembrane receptors for growth hormone (GH), interleukins, erythropoietin, and prolactin.
Other receptors and second messenger-binding proteins have intrinsic serine/threonine protein kinase activity. These include activin/TGF-β/BMP-superfamily receptors, calcium- and diacylglycerol-activated/phospholipid-dependent protein kinase (PK-C), and RNA-dependent protein kinase (PK-R). In addition, other serine/threonine protein kinases, including nematode Twitchin, have fibronectin-like, immunoglobulin C2-like domains.
G-Protein Coupled Receptors
G-protein coupled receptors (GPCRs) are integral membrane proteins characterized by the presence of seven hydrophobic transmembrane domains which span the plasma membrane and form a bundle of antiparallel alpha (α) helices. These proteins range in size from under 400 to over 1000 amino acids (Strosberg, A.D. (1991) Eur. J. Biochem. 196:1-10; Coughlin, S.R. (1994) Curr. Opin. Cell Biol. 6:191-197). The amino-terminus of the GPCR is extracellular, of variable length and often glycosylated; the carboxy-terminus is cytoplasmic and generally phosphorylated. Extracellular loops of the GPCR alternate with intracellular loops and link the transmembrane domains. The most conserved domains of GPCRs are the transmembrane domains and the first two cytoplasmic loops. The transmembrane domains account for structural and functional features of the receptor. In most cases, the bundle of α helices forms a binding pocket. In addition, the extracellular N-terminal segment or one or more of the three extracellular loops may also participate in ligand binding. Ligand binding activates the receptor by inducing a conformational change in intracellular portions of the receptor. The activated receptor, in turn, interacts with an intracellular heterotrimeric guanine nucleotide binding (G) protein complex which mediates further intracellular signaling activities, generally the production of second messengers such as cyclic AMP (cAMP), phospholipase C, inositol triphosphate, or interactions with ion channel proteins (Baldwin, J.M. (1994) Curr. Opin. Cell Biol. 6:180-190).
GPCRs include those for acetylcholine, adenosine, epinephrine and norepinephrine, bombesin, bradykinin, chemokines, dopamine, endothelin, γ-aminobutyric acid (GABA), follicle-stimulating hormone (FSH), glutamate, gonadotropin-releasing hormone (GnRH), hepatocyte growth factor, histamine, leukotrienes, melanocortins, neuropeptide Y, opioid peptides, opsins, prostanoids, serotonin, somatostatin, tachykinins, thrombin, thyrotropin-releasing hormone (TRH), vasoactive intestinal polypeptide family, vasopressin and oxytocin, and orphan receptors.
GPCR mutations, which may cause loss of function or constitutive activation, have been associated with numerous human diseases (Coughlin, supra). For instance, retinitis pigmentosa may arise from mutations in the rhodopsin gene. Rhodopsin is the retinal photoreceptor which is located within the discs of the eye rod cell. Parma, J. et al. (1993, Nature 365:649-651) report that somatic activating mutations in the thyrotropin receptor cause hyperfunctioning thyroid adenomas and suggest that certain GPCRs susceptible to constitutive activation may behave as proto-oncogenes.
Nuclear Receptors
Nuclear receptors bind small molecules such as hormones or second messengers, leading to increased receptor-binding affinity to specific chromosomal DNA elements. In addition, the affinity for other nuclear proteins may be altered. Such binding and protein-protein interactions may regulate and modulate gene expression. Examples of such receptors include the steroid hormone receptor family, the retinoic acid receptor family, and the thyroid hormone receptor family.
Ligand-Gated Receptor Ion Channels
Ligand-gated receptor ion channels fall into two categories. The first category, extracellular ligand-gated receptor ion channels (ELGs), rapidly transduce neurotransmitter-binding events into electrical signals, such as fast synaptic neurotransmission. ELG function is regulated by post-translational modification. The second category, intracellular ligand-gated receptor ion channels (ILGs), are activated by many intracellular second messengers and do not require post-translational modification(s) to effect a channel-opening response.
ELGs depolarize excitable cells to the threshold of action potential generation. In non-excitable cells, ELGs permit a limited calcium ion-influx during the presence of agonist. ELGs include channels directly gated by neurotransmitters such as acetylcholine, L-glutamate, glycine, ATP, serotonin, GABA, and histamine. ELG genes encode proteins having strong structural and functional similarities. ILGs are encoded by distinct and unrelated gene families and include receptors for cAMP, cGMP, calcium ions, ATP, and metabolites of arachidonic acid.
Macrophage Scavenger Receptors
Macrophage scavenger receptors with broad ligand specificity may participate in the binding of low density lipoproteins (LDL) and foreign antigens. Scavenger receptor types I and II are trimeric membrane proteins with each subunit containing a small N-terminal intracellular domain, a transmembrane domain, a large extracellular domain, and a C-terminal cysteine-rich domain. The extracellular domain contains a short spacer domain, an α-helical coiled-coil domain, and a triple helical collagenous domain. These receptors have been shown to bind a spectrum of ligands, including chemically modified lipoproteins and albumin, polyribonucleotides, polysaccharides, phospholipids, and asbestos (Matsumoto, A. et al. (1990) Proc. Natl. Acad. Sci. USA 87:9133-9137; Elomaa, O. et al. (1995) Cell 80:603-609). The scavenger receptors are thought to play a key role in atherogenesis by mediating uptake of modified LDL in arterial walls, and in host defense by binding bacterial endotoxins, bacteria, and protozoa.
T-Cell Receptors
T cells play a dual role in the immune system as effectors and regulators, coupling antigen recognition with the transmission of signals that induce cell death in infected cells and stimulate proliferation of other immune cells. Although a population of T cells can recognize a wide range of different antigens, an individual T cell can only recognize a single antigen and only when it is presented to the T cell receptor (TCR) as a peptide complexed with a major histocompatibility molecule (MHC) on the surface of an antigen presenting cell. The TCR on most T cells consists of immunoglobulin-like integral membrane glycoproteins containing two polypeptide subunits, α and β, of similar molecular weight. Both TCR subunits have an extracellular domain containing both variable and constant regions, a transmembrane domain that traverses the membrane once, and a short intracellular domain (Saito, H. et al. (1984) Nature 309:757-762). The genes for the TCR subunits are constructed through somatic rearrangement of different gene segments. Interaction of antigen in the proper MHC context with the TCR initiates signaling cascades that induce the proliferation, maturation, and function of cellular components of the immune system (Weiss, A. (1991) Annu. Rev. Genet. 25:487-510). Rearrangements in TCR genes and alterations in TCR expression have been noted in lymphomas, leukemias, autoimmune disorders, and immunodeficiency disorders (Aisenberg, A.C. et al. (1985) N. Engl. J. Med. 313:529-533; Weiss, supra).
Intracellular Signaling Molecules
Intracellular signaling is the general process by which cells respond to extracellular signals (hormones, neurotransmitters, growth and differentiation factors, etc.) through a cascade of biochemical reactions that begins with the binding of a signaling molecule to a cell membrane receptor and ends with the activation of an intracellular target molecule. Intermediate steps in the process involve the activation of various cytoplasmic proteins by phosphorylation via protein kinases, and their deactivation by protein phosphatases, and the eventual translocation of some of these activated proteins to the cell nucleus where the transcription of specific genes is triggered. The intracellular signaling process regulates all types of cell functions including cell proliferation, cell differentiation, and gene transcription, and involves a diversity of molecules including protein kinases and phosphatases, and second messenger molecules, such as cyclic nucleotides, calcium-calmodulin, inositol, and various mitogens, that regulate protein phosphorylation.
Protein Phosphorylation
Protein kinases and phosphatases play a key role in the intracellular signaling process by controlling the phosphorylation and activation of various signaling proteins. The high energy phosphate for this reaction is generally transferred from the adenosine triphosphate molecule (ATP) to a particular protein by a protein kinase and removed from that protein by a protein phosphatase. Protein kinases are roughly divided into two groups: those that phosphorylate tyrosine residues (protein tyrosine kinases, PTK) and those that phosphorylate serine or threonine residues (serine/threonine kinases, STK). A few protein kinases have dual specificity for serine/threonine and tyrosine residues. Almost all kinases contain a conserved 250-300 amino acid catalytic domain containing specific residues and sequence motifs characteristic of the kinase family (Hardie, G. and S. Hanks (1995) The Protein Kinase Facts Books, Vol 1:7-20, Academic Press, San Diego CA).
STKs include the second messenger dependent protein kinases such as the cyclic-AMP dependent protein kinases (PKA), involved in mediating hormone-induced cellular responses; calcium-calmodulin (CaM) dependent protein kinases, involved in regulation of smooth muscle contraction, glycogen breakdown, and neurotransmission; and the mitogen-activated protein kinases (MAP) which mediate signal transduction from the cell surface to the nucleus via phosphorylation cascades. Altered PKA expression is implicated in a variety of disorders and diseases including cancer, thyroid disorders, diabetes, atherosclerosis, and cardiovascular disease (Isselbacher, K. et al. (1994) Harrison's Principles of Internal Medicine, McGraw-Hill, New York NY, pp. 416-431, 1887).
PTKs are divided into transmembrane, receptor PTKs and nontransmembrane, non-receptor PTKs. Transmembrane PTKs are receptors for most growth factors. Non-receptor PTKs lack transmembrane regions and, instead, form complexes with the intracellular regions of cell surface receptors. Receptors that function through non-receptor PTKs include those for cytokines and hormones (growth hormone and prolactin) and antigen-specific receptors on T and B lymphocytes. Many of these PTKs were first identified as the products of mutant oncogenes in cancer cells in which their activation was no longer subject to normal cellular controls. In fact, about one third of the known oncogenes encode PTKs, and it is well known that cellular transformation (oncogenesis) is often accompanied by increased tyrosine phosphorylation activity (Charbonneau, H. and N.K. Tonks (1992) Annu. Rev. Cell Biol. 8:463-493).
An additional family of protein kinases previously thought to exist only in procaryotes is the histidine protein kinase family (HPK). HPKs bear little homology with mammalian STKs or PTKs but have distinctive sequence motifs of their own (Davie, J.R. et al. (1995) J. Biol. Chem. 270:19861-19867). A histidine residue in the N-terminal half of the molecule (region I) is an autophosphorylation site. Three additional motifs located in the C-terminal half of the molecule include an invariant asparagine residue in region II and two glycine-rich loops characteristic of nucleotide binding domains in regions III and IV. Recently a branched chain alpha-ketoacid dehydrogenase kinase has been found with characteristics of HPK in rat (Davie, supra).
Protein phosphatases regulate the effects of protein kinases by removing phosphate groups from molecules previously activated by kinases. The two principal categories of protein phosphatases are the protein (serine/threonine) phosphatases (PPs) and the protein tyrosine phosphatases (PTPs). PPs dephosphorylate phosphoserine/threonine residues and are important regulators of many cAMP-mediated hormone responses (Cohen, P. (1989) Annu. Rev. Biochem. 58:453-508). PTPs reverse the effects of protein tyrosine kinases and play a significant role in cell cycle and cell signaling processes (Charbonneau, supra). As previously noted, many PTKs are encoded by oncogenes, and oncogenesis is often accompanied by increased tyrosine phosphorylation activity. It is therefore possible that PTPs may prevent or reverse cell transformation and the growth of various cancers by controlling the levels of tyrosine phosphorylation in cells. This hypothesis is supported by studies showing that overexpression of PTPs can suppress transformation in cells, and that specific inhibition of PTPs can enhance cell transformation (Charbonneau, supra).
Phospholipid and Inositol-Phosphate Signaling
Inositol phospholipids (phosphoinositides) are involved in an intracellular signaling pathway that begins with binding of a signaling molecule to a G-protein linked receptor in the plasma membrane. This leads to the phosphorylation of phosphatidylinositol (PI) residues on the inner side of the plasma membrane to the biphosphate state (PIP2) by inositol kinases. Simultaneously, the G-protein linked receptor binding stimulates a trimeric G-protein which in turn activates a phosphoinositide-specific phospholipase C-β. Phospholipase C-β then cleaves PIP2 into two products, inositol triphosphate (IP3) and diacylglycerol. These two products act as mediators for separate signaling events.
IP3 diffuses through the cytosol to induce calcium release from the endoplasmic reticulum (ER), while diacylglycerol remains in the membrane and helps activate protein kinase C, an STK that phosphorylates selected proteins in the target cell. The calcium response initiated by IP3 is terminated by the dephosphorylation of IP3 by specific inositol phosphatases. Cellular responses that are mediated by this pathway are glycogen breakdown in the liver in response to vasopressin, smooth muscle contraction in response to acetylcholine, and thrombin-induced platelet aggregation.
Cyclic Nucleotide Signaling
Cyclic nucleotides (cAMP and cGMP) function as intracellular second messengers to transduce a variety of extracellular signals including hormones, light, and neurotransmitters. In particular, cyclic-AMP dependent protein kinases (PKA) are thought to account for all of the effects of cAMP in most mammalian cells, including various hormone-induced cellular responses. Visual excitation and the phototransmission of light signals in the eye are controlled by cyclic-GMP regulated, Ca2+-specific channels. Because of the importance of cellular levels of cyclic nucleotides in mediating these various responses, regulating the synthesis and breakdown of cyclic nucleotides is an important matter. Thus adenylyl cyclase, which synthesizes cAMP from ATP, is activated to increase cAMP levels in muscle by binding of adrenaline to β-adrenergic receptors, while activation of guanylate cyclase and increased cGMP levels in photoreceptors leads to reopening of the Ca2+-specific channels and recovery of the dark state in the eye. In contrast, hydrolysis of cyclic nucleotides by cAMP and cGMP-specific phosphodiesterases (PDEs) produces the opposite of these and other effects mediated by increased cyclic nucleotide levels. PDEs appear to be particularly important in the regulation of cyclic nucleotides, considering the diversity found in this family of proteins.
At least seven families of mammalian PDEs (PDE1-7) have been identified based on substrate specificity and affinity, sensitivity to cofactors, and sensitivity to inhibitory drugs (Beavo, J.A. (1995) Physiological Reviews 75:725-748). PDE inhibitors have been found to be particularly useful in treating various clinical disorders. Rolipram, a specific inhibitor of PDE4, has been used in the treatment of depression, and similar inhibitors are undergoing evaluation as anti-inflammatory agents. Theophylline is a nonspecific PDE inhibitor used in the treatment of bronchial asthma and other respiratory diseases (Banner, K.H. and C.P. Page (1995) Eur. Respir. J. 8:996-1000).
G-Protein Signaling
Guanine nucleotide binding proteins (G-proteins) are critical mediators of signal transduction between a particular class of extracellular receptors, the G-protein coupled receptors (GPCR), and intracellular second messengers such as cAMP and Ca2+. G-proteins are linked to the cytosolic side of a GPCR such that activation of the GPCR by ligand binding stimulates binding of the G-protein to GTP, inducing an "active" state in the G-protein. In the active state, the G-protein acts as a signal to trigger other events in the cell such as the increase of cAMP levels or the release of Ca2+ into the cytosol from the ER, which, in turn, regulate phosphorylation and activation of other intracellular proteins. Recycling of the G-protein to the inactive state involves hydrolysis of the bound GTP to GDP by a GTPase activity in the G-protein. (See Alberts, B. et al. (1994) Molecular Biology of the Cell, Garland Publishing, Inc., New York NY, pp. 734-759.) Two structurally distinct classes of G-proteins are recognized: heterotrimeric G-proteins, consisting of three different subunits, and monomeric, low molecular weight (LMW), G-proteins consisting of a single polypeptide chain.
The three polypeptide subunits of heterotrimeric G-proteins are the α, β, and γ subunits. The α subunit binds and hydrolyzes GTP. The β and γ subunits form a tight complex that anchors the protein to the inner side of the plasma membrane. The β subunits, also known as G-β proteins or β transducins, contain seven tandem repeats of the WD-repeat sequence motif, a motif found in many proteins with regulatory functions. Mutations and variant expression of β transducin proteins are linked with various disorders (Neer, E.J. et al. (1994) Nature 371:297-300; Margottin, F. et al. (1998) Mol. Cell 1:565-574).
LMW G-proteins are GTPases which regulate cell growth, cell cycle control, protein secretion, and intracellular vesicle interaction. They consist of single polypeptides which, like the α subunit of the heterotrimeric G-proteins, are able to bind and hydrolyze GTP, thus cycling between an inactive and an active state. At least sixty members of the LMW G-protein superfamily have been identified and are currently grouped into the six subfamilies of ras, rho, arf, sar1, ran, and rab. Activated ras genes were initially found in human cancers, and subsequent studies confirmed that ras function is critical in determining whether cells continue to grow or become differentiated. Other members of the LMW G-protein superfamily have roles in signal transduction that vary with the function of the activated genes and the locations of the G-proteins.
Guanine nucleotide exchange factors regulate the activities of LMW G-proteins by determining whether GTP or GDP is bound. GTPase-activating protein (GAP) binds to GTP-ras and induces it to hydrolyze GTP to GDP. In contrast, guanine nucleotide releasing protein (GNRP) binds to GDP-ras and induces the release of GDP and the binding of GTP.
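The GTP/GDP cycle described above can be pictured as a two-state switch. The following is a minimal sketch, assuming an idealized LMW G-protein whose only state is the bound nucleotide. The class and method names are hypothetical and do not correspond to any existing library.

```python
from enum import Enum

class Nucleotide(Enum):
    GDP = "GDP"  # bound GDP corresponds to the inactive state
    GTP = "GTP"  # bound GTP corresponds to the active state

class LmwGProtein:
    """Idealized LMW G-protein (for example ras) as a two-state switch."""

    def __init__(self):
        # Assumption: the protein starts in the inactive, GDP-bound form.
        self.bound = Nucleotide.GDP

    @property
    def active(self):
        return self.bound is Nucleotide.GTP

    def apply_gnrp(self):
        """GNRP acts on the GDP-bound form: GDP is released, GTP binds."""
        if self.bound is Nucleotide.GDP:
            self.bound = Nucleotide.GTP

    def apply_gap(self):
        """GAP acts on the GTP-bound form: GTP is hydrolyzed to GDP."""
        if self.bound is Nucleotide.GTP:
            self.bound = Nucleotide.GDP

ras = LmwGProtein()
ras.apply_gnrp()            # switch on
state_after_gnrp = ras.active
ras.apply_gap()             # switch off
state_after_gap = ras.active
```

The sketch ignores kinetics, intrinsic GTPase activity, and localization; it only captures which regulator moves the protein in which direction.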
Other regulators of G-protein signaling (RGS) also exist that act primarily by negatively regulating the G-protein pathway by an unknown mechanism (Druey, K.M. et al. (1996) Nature 379:742-746). Some 15 members of the RGS family have been identified. RGS family members are related structurally through similarities in an approximately 120 amino acid region termed the RGS domain and functionally by their ability to inhibit the interleukin (cytokine) induction of MAP kinase in cultured mammalian 293T cells (Druey, supra).
Calcium Signaling Molecules
Ca2+ is another second messenger molecule that is even more widely used as an intracellular mediator than cAMP. Two pathways exist by which Ca2+ can enter the cytosol in response to extracellular signals. One pathway acts primarily in nerve signal transduction where Ca2+ enters a nerve terminal through a voltage-gated Ca2+ channel. The second is a more ubiquitous pathway in which Ca2+ is released from the ER into the cytosol in response to binding of an extracellular signaling molecule to a receptor. Ca2+ directly activates regulatory enzymes, such as protein kinase C, which trigger signal transduction pathways. Ca2+ also binds to specific Ca2+-binding proteins (CBPs) such as calmodulin (CaM) which then activate multiple target proteins in the cell including enzymes, membrane transport pumps, and ion channels. CaM interactions are involved in a multitude of cellular processes including, but not limited to, gene regulation, DNA synthesis, cell cycle progression, mitosis, cytokinesis, cytoskeletal organization, muscle contraction, signal transduction, ion homeostasis, exocytosis, and metabolic regulation (Celio, M.R. et al. (1996) Guidebook to Calcium-binding Proteins, Oxford University Press, Oxford, UK, pp. 15-20). Some CBPs can serve as a storage depot for Ca2+ in an inactive state. Calsequestrin is one such CBP that is expressed in isoforms specific to cardiac muscle and skeletal muscle.
It is suggested that calsequestrin binds Ca2+ in a rapidly exchangeable state that is released during Ca2+-signaling conditions (Celio, M.R. et al. (1996) Guidebook to Calcium-binding Proteins, Oxford University Press, New York NY, pp. 222-224).
Cyclins
Cell division is the fundamental process by which all living things grow and reproduce. In most organisms, the cell cycle consists of three principal steps: interphase, mitosis, and cytokinesis. Interphase involves preparations for cell division, replication of the DNA, and production of essential proteins. In mitosis, the nuclear material is divided and separates to opposite sides of the cell. Cytokinesis is the final division and fission of the cell cytoplasm to produce the daughter cells.
The entry and exit of a cell from mitosis is regulated by the synthesis and destruction of a family of activating proteins called cyclins. Cyclins act by binding to and activating a group of cyclin-dependent protein kinases (Cdks) which then phosphorylate and activate selected proteins involved in the mitotic process. Several types of cyclins exist. (Ciechanover, A. (1994) Cell 79:13-21.) Two principal types are mitotic cyclin, or cyclin B, which controls entry of the cell into mitosis, and G1 cyclin, which controls events that drive the cell out of mitosis.
Signal Complex Scaffolding Proteins
Certain proteins in intracellular signaling pathways serve to link or cluster other proteins involved in the signaling cascade. A conserved protein domain called the PDZ domain has been identified in various membrane-associated signaling proteins. This domain has been implicated in receptor and ion channel clustering and in the targeting of multiprotein signaling complexes to specialized functional regions of the cytosolic face of the plasma membrane. (For a review of PDZ domain-containing proteins, see Ponting, C.P. et al. (1997) Bioessays 19:469-479.) A large proportion of PDZ domains are found in the eukaryotic MAGUK (membrane-associated guanylate kinase) protein family, members of which bind to the intracellular domains of receptors and channels. However, PDZ domains are also found in diverse membrane-localized proteins such as protein tyrosine phosphatases, serine/threonine kinases, G-protein cofactors, and synapse-associated proteins such as syntrophins and neuronal nitric oxide synthase (nNOS). Generally, about one to three PDZ domains are found in a given protein, although up to nine PDZ domains have been identified in a single protein.
Membrane Transport Molecules
The plasma membrane acts as a barrier to most molecules. Transport between the cytoplasm and the extracellular environment, and between the cytoplasm and lumenal spaces of cellular organelles requires specific transport proteins. Each transport protein carries a particular class of molecule, such as ions, sugars, or amino acids, and often is specific to a certain molecular species of the class. A variety of human inherited diseases are caused by a mutation in a transport protein. For example, cystinuria is an inherited disease that results from the inability to transport cystine, the disulfide-linked dimer of cysteine, from the urine into the blood. Accumulation of cystine in the urine leads to the formation of cystine stones in the kidneys.
Transport proteins are multi-pass transmembrane proteins, which either actively transport molecules across the membrane or passively allow them to cross. Active transport involves directional pumping of a solute across the membrane, usually against an electrochemical gradient.
Active transport is tightly coupled to a source of metabolic energy, such as ATP hydrolysis or an electrochemically favorable ion gradient. Passive transport involves the movement of a solute down its electrochemical gradient. Transport proteins can be further classified as either carrier proteins or channel proteins. Carrier proteins, which can function in active or passive transport, bind to a 5 specific solute to be transported and undergo a conformational change which transfers the bound solute across the membrane. Channel proteins, which only function in passive transport, form hydrophilic pores across the membrane. When the pores open, specific solutes, such as inorganic ions, pass through the membrane and down the electrochemical gradient of the solute.
Carrier proteins which transport a single solute from one side of the membrane to the other are called uniporters. In contrast, coupled transporters link the transfer of one solute with simultaneous or sequential transfer of a second solute, either in the same direction (symport) or in the opposite direction (antiport). For example, intestinal and kidney epithelia contain a variety of symporter systems driven by the sodium gradient that exists across the plasma membrane. Sodium moves into the cell down its electrochemical gradient and brings the solute into the cell with it. The sodium gradient that provides the driving force for solute uptake is maintained by the ubiquitous Na+/K+ ATPase. Sodium-coupled transporters include the mammalian glucose transporter (SGLT1), iodide transporter (NIS), and multivitamin transporter (SMVT). All three transporters have twelve putative transmembrane segments, extracellular glycosylation sites, and cytoplasmically-oriented N- and C-termini. NIS plays a crucial role in the evaluation, diagnosis, and treatment of various thyroid pathologies because it is the molecular basis for radioiodide thyroid-imaging techniques and for specific targeting of radioisotopes to the thyroid gland (Levy, O. et al. (1997) Proc. Natl. Acad. Sci. USA 94:5568-5573). SMVT is expressed in the intestinal mucosa, kidney, and placenta, and is implicated in the transport of the water-soluble vitamins, e.g., biotin and pantothenate (Prasad, P.D. et al. (1998) J. Biol. Chem. 273:7501-7506).
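The uniport, symport, and antiport distinction drawn above can be sketched as a small classification routine. This is an illustrative sketch only: the names `SoluteMove` and `classify_carrier` are hypothetical, and a transport cycle is reduced to a list of solutes with a direction relative to the cytoplasm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoluteMove:
    """One solute moved during a single transport cycle (hypothetical model)."""
    solute: str
    direction: str  # "in" or "out", relative to the cytoplasm


def classify_carrier(moves):
    """Classify a carrier as 'uniport', 'symport', or 'antiport'."""
    if not moves:
        raise ValueError("a transport cycle must move at least one solute")
    for move in moves:
        if move.direction not in ("in", "out"):
            raise ValueError(f"unknown direction: {move.direction!r}")
    # A single solute crossing the membrane is a uniport.
    if len(moves) == 1:
        return "uniport"
    # Coupled transport: same direction is symport, opposite is antiport.
    directions = {move.direction for move in moves}
    return "symport" if len(directions) == 1 else "antiport"


# Sodium-coupled glucose uptake, as described for SGLT1 above.
sglt1 = [SoluteMove("Na+", "in"), SoluteMove("glucose", "in")]
print(classify_carrier(sglt1))
```

Under this simplification, a sodium-coupled transporter such as SGLT1 is classified as a symporter because both solutes move into the cell in the same cycle.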
Transporters play a major role in the regulation of pH, excretion of drugs, and the cellular K+/Na+ balance. Monocarboxylate anion transporters are proton-coupled symporters with a broad substrate specificity that includes L-lactate, pyruvate, and the ketone bodies acetate, acetoacetate, and beta-hydroxybutyrate. At least seven isoforms have been identified to date. The isoforms are predicted to have twelve transmembrane (TM) helical domains with a large intracellular loop between TM6 and TM7, and play a critical role in maintaining intracellular pH by removing the protons that are produced stoichiometrically with lactate during glycolysis. The best characterized H(+)-monocarboxylate transporter is that of the erythrocyte membrane, which transports L-lactate and a wide range of other aliphatic monocarboxylates. Other cells possess H(+)-linked monocarboxylate transporters with differing substrate and inhibitor selectivities. In particular, cardiac muscle and tumor cells have transporters that differ in their Km values for certain substrates, including stereoselectivity for L- over D-lactate, and in their sensitivity to inhibitors. There are Na(+)-monocarboxylate cotransporters on the luminal surface of intestinal and kidney epithelia, which allow the uptake of lactate, pyruvate, and ketone bodies in these tissues. In addition, there are specific and selective transporters for organic cations and organic anions in organs including the kidney, intestine and liver. Organic anion transporters are selective for hydrophobic, charged molecules with electron-attracting side groups. Organic cation transporters, such as the ammonium transporter, mediate the secretion of a variety of drugs and endogenous metabolites, and contribute to the maintenance of intercellular pH. (Poole, R.C. and A.P. Halestrap (1993) Am. J. Physiol. 264:C761-C782; Price, N.T. et al. (1998) Biochem. J. 329:321-328; and Martinelle, K. and I. Haggstrom (1993) J. Biotechnol. 30:339-350.)
The largest and most diverse family of transport proteins known is the ATP-binding cassette (ABC) transporters. As a family, ABC transporters can transport substances that differ markedly in chemical structure and size, ranging from small molecules such as ions, sugars, amino acids, peptides, and phospholipids, to lipopeptides, large proteins, and complex hydrophobic drugs. ABC proteins consist of four modules: two nucleotide-binding domains (NBD), which hydrolyze ATP to supply the energy required for transport, and two membrane-spanning domains (MSD), each containing six putative transmembrane segments. These four modules may be encoded by a single gene, as is the case for the cystic fibrosis transmembrane regulator (CFTR), or by separate genes. When encoded by separate genes, each gene product contains a single NBD and MSD. These "half-molecules" form homo- and heterodimers, such as Tap1 and Tap2, the endoplasmic reticulum-based major histocompatibility (MHC) peptide transport system. Several genetic diseases are attributed to defects in ABC transporters, such as the following diseases and their corresponding proteins: cystic fibrosis (CFTR, an ion channel), adrenoleukodystrophy (adrenoleukodystrophy protein, ALDP), Zellweger syndrome (peroxisomal membrane protein-70, PMP70), and hyperinsulinemic hypoglycemia (sulfonylurea receptor, SUR). Overexpression of the multidrug resistance (MDR) protein, another ABC transporter, in human cancer cells makes the cells resistant to a variety of cytotoxic drugs used in chemotherapy (Taglight, D. and S. Michaelis (1998) Meth. Enzymol. 292:131-163).
Transport of fatty acids across the plasma membrane can occur by diffusion, a high capacity, low affinity process. However, under normal physiological conditions a significant fraction of fatty acid transport appears to occur via a high affinity, low capacity protein-mediated transport process. Fatty acid transport protein (FATP), an integral membrane protein with four transmembrane segments, is expressed in tissues exhibiting high levels of plasma membrane fatty acid flux, such as muscle, heart, and adipose. Expression of FATP is upregulated in 3T3-L1 cells during adipose conversion, and expression in COS7 fibroblasts elevates uptake of long-chain fatty acids (Hui, T.Y. et al. (1998) J. Biol. Chem. 273:27420-27429).
Ion Channels
The electrical potential of a cell is generated and maintained by controlling the movement of ions across the plasma membrane. The movement of ions requires ion channels, which form an ion-selective pore within the membrane. There are two basic types of ion channels, ion transporters and gated ion channels. Ion transporters utilize the energy obtained from ATP hydrolysis to actively transport an ion against the ion's concentration gradient. Gated ion channels allow passive flow of an ion down the ion's electrochemical gradient under restricted conditions. Together, these types of ion channels generate, maintain, and utilize an electrochemical gradient that is used in 1) electrical impulse conduction down the axon of a nerve cell, 2) transport of molecules into cells against concentration gradients, 3) initiation of muscle contraction, and 4) endocrine cell secretion.
Ion transporters generate and maintain the resting electrical potential of a cell. Utilizing the energy derived from ATP hydrolysis, they transport ions against the ion's concentration gradient. These transmembrane ATPases are divided into three families. The phosphorylated (P) class ion transporters, including Na+-K+ ATPase, Ca2+-ATPase, and H+-ATPase, are activated by a phosphorylation event. P-class ion transporters are responsible for maintaining resting potential distributions such that cytosolic concentrations of Na+ and Ca2+ are low and cytosolic concentration of K+ is high. The vacuolar (V) class of ion transporters includes H+ pumps on intracellular organelles, such as lysosomes and Golgi. V-class ion transporters are responsible for generating the low pH within the lumen of these organelles that is required for function. The coupling factor (F) class consists of H+ pumps in the mitochondria. F-class ion transporters utilize a proton gradient to generate ATP from ADP and inorganic phosphate (Pi).
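The relationship between an ion's concentration gradient and the membrane potential it favors can be illustrated with the Nernst equation, E = (RT/zF) ln([ion]outside/[ion]inside). The sketch below is not taken from the source; the concentrations and temperature are assumed textbook-style values, chosen only to show that low cytosolic Na+ and high cytosolic K+ give equilibrium potentials of opposite sign.

```python
import math

GAS_CONSTANT = 8.314462618      # J / (mol K)
FARADAY_CONSTANT = 96485.33212  # C / mol


def nernst_potential_mv(conc_out_mm, conc_in_mm, valence, temp_kelvin=310.15):
    """Equilibrium potential (mV) for one ion species via the Nernst equation."""
    if conc_out_mm <= 0 or conc_in_mm <= 0:
        raise ValueError("concentrations must be positive")
    if valence == 0:
        raise ValueError("valence must be non-zero")
    volts = (GAS_CONSTANT * temp_kelvin) / (valence * FARADAY_CONSTANT) * math.log(
        conc_out_mm / conc_in_mm
    )
    return volts * 1000.0


# Assumed, illustrative concentrations in mM (not values from this document).
e_k = nernst_potential_mv(conc_out_mm=5.0, conc_in_mm=140.0, valence=1)
e_na = nernst_potential_mv(conc_out_mm=145.0, conc_in_mm=12.0, valence=1)
print(f"K+ equilibrium potential:  {e_k:.1f} mV")   # negative
print(f"Na+ equilibrium potential: {e_na:.1f} mV")  # positive
```

With these assumed values the K+ potential is negative and the Na+ potential is positive, which is consistent with the ion distributions that the P-class transporters are described as maintaining.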
The resting potential of the cell is utilized in many processes involving carrier proteins and gated ion channels. Carrier proteins utilize the resting potential to transport molecules into and out of the cell. Amino acid and glucose transport into many cells is linked to sodium ion co-transport (symport) so that the movement of Na+ down an electrochemical gradient drives transport of the other molecule up a concentration gradient. Similarly, cardiac muscle links transfer of Ca2+ out of the cell with transport of Na+ into the cell (antiport). Ion channels share common structural and mechanistic themes. The channel consists of four or five subunits or protein monomers that are arranged like a barrel in the plasma membrane. Each subunit typically consists of six potential transmembrane segments (S1, S2, S3, S4, S5, and S6). The center of the barrel forms a pore lined by α-helices or β-strands. The side chains of the amino acid residues comprising the α-helices or β-strands establish the charge (cation or anion) selectivity of the channel. The degree of selectivity, or what specific ions are allowed to pass through the channel, depends on the diameter of the narrowest part of the pore.
Gated ion channels control ion flow by regulating the opening and closing of pores. These channels are categorized according to the manner of regulating the gating function. Mechanically-gated channels open pores in response to mechanical stress, voltage-gated channels open pores in response to changes in membrane potential, and ligand-gated channels open pores in the presence of a specific ion, nucleotide, or neurotransmitter.
Voltage-gated Na+ and K+ channels are necessary for the function of electrically excitable cells, such as nerve and muscle cells. Action potentials, which lead to neurotransmitter release and muscle contraction, arise from large, transient changes in the permeability of the membrane to Na+ and K+ ions. Depolarization of the membrane beyond the threshold level opens voltage-gated Na+ channels. Sodium ions flow into the cell, further depolarizing the membrane and opening more voltage-gated Na+ channels, which propagates the depolarization down the length of the cell. Depolarization also opens voltage-gated potassium channels. Consequently, potassium ions flow outward, which leads to repolarization of the membrane. Voltage-gated channels utilize charged residues in the fourth transmembrane segment (S4) to sense voltage change. The open state lasts only about 1 millisecond, at which time the channel spontaneously converts into an inactive state that cannot be opened irrespective of the membrane potential. Inactivation is mediated by the channel's N-terminus, which acts as a plug that closes the pore. The transition from an inactive to a closed state requires a return to resting potential.
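The closed, open, and inactive states described above can be sketched as a simple state machine. This is a minimal sketch under stated assumptions: the threshold and resting potentials are illustrative values not given in the source, the names `ChannelState` and `next_state` are hypothetical, and only the 1 millisecond open time comes from the text.

```python
from enum import Enum


class ChannelState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    INACTIVE = "inactive"


THRESHOLD_MV = -55.0    # assumed threshold, for illustration only
RESTING_MV = -70.0      # assumed resting potential, for illustration only
OPEN_DURATION_MS = 1.0  # "about 1 millisecond", per the text above


def next_state(state, membrane_mv, time_in_state_ms):
    """Return the next state of a single voltage-gated channel."""
    if state is ChannelState.CLOSED:
        # Depolarization beyond threshold opens the channel.
        if membrane_mv > THRESHOLD_MV:
            return ChannelState.OPEN
        return ChannelState.CLOSED
    if state is ChannelState.OPEN:
        # The open state spontaneously converts to the inactive state.
        if time_in_state_ms >= OPEN_DURATION_MS:
            return ChannelState.INACTIVE
        return ChannelState.OPEN
    # Inactive channels cannot open at any potential; they return to the
    # closed state only once the membrane is back at resting potential.
    if membrane_mv <= RESTING_MV:
        return ChannelState.CLOSED
    return ChannelState.INACTIVE


print(next_state(ChannelState.CLOSED, -40.0, 0.0))
print(next_state(ChannelState.INACTIVE, 30.0, 5.0))
```

The key design point is that the inactive branch never returns the open state, which mirrors the statement that an inactivated channel cannot be opened irrespective of the membrane potential.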
Voltage-gated Na+ channels are heterotrimeric complexes composed of a 260 kDa pore forming α subunit that associates with two smaller auxiliary subunits, β1 and β2. The β2 subunit is an integral membrane glycoprotein that contains an extracellular Ig domain, and its association with α and β1 subunits correlates with increased functional expression of the channel, a change in its gating properties, and an increase in whole cell capacitance due to an increase in membrane surface area. (Isom, L.L. et al. (1995) Cell 83:433-442.) Voltage-gated Ca2+ channels are involved in presynaptic neurotransmitter release, and heart and skeletal muscle contraction. The voltage-gated Ca2+ channels from skeletal muscle (L-type) and brain (N-type) have been purified, and though their functions differ dramatically, they have similar subunit compositions. The channels are composed of three subunits. The α1 subunit forms the membrane pore and voltage sensor, while the α2δ and β subunits modulate the voltage-dependence, gating properties, and the current amplitude of the channel. These subunits are encoded by at least six α1, one α2δ, and four β genes. A fourth subunit, γ, has been identified in skeletal muscle. (Walker, D. et al. (1998) J. Biol. Chem. 273:2361-2367; and Jay, S.D. et al. (1990) Science 248:490-492.)
Chloride channels are necessary in endocrine secretion and in regulation of cytosolic and organelle pH. In secretory epithelial cells, Cl- enters the cell across a basolateral membrane through an Na+, K+/Cl- cotransporter, accumulating in the cell above its electrochemical equilibrium concentration. Secretion of Cl- from the apical surface, in response to hormonal stimulation, leads to flow of Na+ and water into the secretory lumen. The cystic fibrosis transmembrane conductance regulator (CFTR) is a chloride channel encoded by the gene for cystic fibrosis, a common fatal genetic disorder in humans. Loss of CFTR function decreases transepithelial water secretion and, as a result, the layers of mucus that coat the respiratory tree, pancreatic ducts, and intestine are dehydrated and difficult to clear. The resulting blockage of these sites leads to pancreatic insufficiency, "meconium ileus", and devastating "chronic obstructive pulmonary disease" (Al-Awqati, Q. et al. (1992) J. Exp. Biol. 172:245-266). Many intracellular organelles contain H+-ATPase pumps that generate transmembrane pH and electrochemical differences by moving protons from the cytosol to the organelle lumen. If the membrane of the organelle is permeable to other ions, then the electrochemical gradient can be abrogated without affecting the pH differential. In fact, removal of the electrochemical barrier allows more H+ to be pumped across the membrane, increasing the pH differential. Cl- is the sole counterion of H+ translocation in a number of organelles, including chromaffin granules, Golgi vesicles, lysosomes, and endosomes. Functions that require a low vacuolar pH include uptake of small molecules such as biogenic amines in chromaffin granules, processing of vacuolar constituents such as pro-hormones by proteolytic enzymes, and protein degradation in lysosomes (Al-Awqati, supra).
Ligand-gated channels open their pores when an extracellular or intracellular mediator binds to the channel. Neurotransmitter-gated channels are channels that open when a neurotransmitter binds to their extracellular domain. These channels exist in the postsynaptic membrane of nerve or muscle cells. There are two types of neurotransmitter-gated channels. Sodium channels open in response to excitatory neurotransmitters, such as acetylcholine, glutamate, and serotonin. This opening causes an influx of Na+ and produces the initial localized depolarization that activates the voltage-gated channels and starts the action potential. Chloride channels open in response to inhibitory neurotransmitters, such as γ-aminobutyric acid (GABA) and glycine, leading to hyperpolarization of the membrane and the subsequent generation of an action potential.
Ligand-gated channels can be regulated by intracellular second messengers. Calcium-activated K+ channels are gated by internal calcium ions. In nerve cells, an influx of calcium during depolarization opens K+ channels to modulate the magnitude of the action potential (Ishi, T.M. et al. (1997) Proc. Natl. Acad. Sci. USA 94:11651-11656). Cyclic nucleotide-gated (CNG) channels are gated by cytosolic cyclic nucleotides. The best examples of these are the cAMP-gated Na+ channels involved in olfaction and the cGMP-gated cation channels involved in vision. Both systems involve ligand-mediated activation of a G-protein coupled receptor which then alters the level of cyclic nucleotide within the cell.
Ion channels are expressed in a number of tissues where they are implicated in a variety of processes. CNG channels, while abundantly expressed in photoreceptor and olfactory sensory cells, are also found in kidney, lung, pineal, retinal ganglion cells, testis, aorta, and brain. Calcium-activated K+ channels may be responsible for the vasodilatory effects of bradykinin in the kidney and for shunting excess K+ from brain capillary endothelial cells into the blood. They are also implicated in repolarizing granulocytes after agonist-stimulated depolarization (Ishi, supra). Ion channels have been the target for many drug therapies. Neurotransmitter-gated channels have been targeted in therapies for treatment of insomnia, anxiety, depression, and schizophrenia. Voltage-gated channels have been targeted in therapies for arrhythmia, ischemic stroke, head trauma, and neurodegenerative disease (Taylor, C.P. and L.S. Narasimhan (1997) Adv. Pharmacol. 39:47-98).
Disease Correlation
The etiology of numerous human diseases and disorders can be attributed to defects in the transport of molecules across membranes. Defects in the trafficking of membrane-bound transporters and ion channels are associated with several disorders, e.g., cystic fibrosis, glucose-galactose malabsorption syndrome, hypercholesterolemia, von Gierke disease, and certain forms of diabetes mellitus. Single-gene defect diseases resulting in an inability to transport small molecules across membranes include, e.g., cystinuria, iminoglycinuria, Hartnup disease, and Fanconi disease (van't Hoff, W.G. (1996) Exp. Nephrol. 4:253-262; Talente, G.M. et al. (1994) Ann. Intern. Med. 120:218-226; and Chillon, M. et al. (1995) New Engl. J. Med. 332:1475-1480).
Protein Modification and Maintenance Molecules
The cellular processes regulating modification and maintenance of protein molecules coordinate their conformation, stabilization, and degradation. Each of these processes is mediated by key enzymes or proteins such as proteases, protease inhibitors, transferases, isomerases, and molecular chaperones.
Proteases
Proteases cleave proteins and peptides at the peptide bond that forms the backbone of the peptide and protein chain. Proteolytic processing is essential to cell growth, differentiation, remodeling, and homeostasis as well as inflammation and immune response. Typical protein half-lives range from hours to a few days, so that within all living cells, precursor proteins are being cleaved to their active form, signal sequences proteolytically removed from targeted proteins, and aged or defective proteins degraded by proteolysis. Proteases function in bacterial, parasitic, and viral invasion and replication within a host. Four principal categories of mammalian proteases have been identified based on active site structure, mechanism of action, and overall three-dimensional structure. (Beynon, R.J. and J.S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University Press, New York NY, pp. 1-5).
The serine proteases (SPs) have a serine residue, usually within a conserved sequence, in an active site composed of the serine, an aspartate, and a histidine residue. SPs include the digestive enzymes trypsin and chymotrypsin, components of the complement cascade and the blood-clotting cascade, and enzymes that control extracellular protein degradation. The main SP sub-families are trypases, which cleave after arginine or lysine; aspartases, which cleave after aspartate; chymases, which cleave after phenylalanine or leucine; metases, which cleave after methionine; and serases, which cleave after serine. Enterokinase, the initiator of intestinal digestion, is a serine protease found in the intestinal brush border, where it cleaves the acidic propeptide from trypsinogen to yield active trypsin (Kitamoto, Y. et al. (1994) Proc. Natl. Acad. Sci. USA 91:7588-7592). Prolylcarboxypeptidase, a lysosomal serine peptidase that cleaves peptides such as angiotensin II and III and [des-Arg9] bradykinin, shares sequence homology with members of both the serine carboxypeptidase and prolylendopeptidase families (Tan, F. et al. (1993) J. Biol. Chem. 268:16631-16638).
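The cleavage preferences listed above for the SP sub-families can be sketched as a lookup table applied to a peptide written in one-letter amino acid code. This is a deliberately simplified sketch: the function name `cleave` and the example peptide are hypothetical, and the model ignores neighboring residues and structural context, both of which influence cleavage by real proteases.

```python
# Residues after which each SP sub-family cleaves, as listed in the text.
CLEAVE_AFTER = {
    "trypase": {"R", "K"},    # arginine or lysine
    "aspartase": {"D"},       # aspartate
    "chymase": {"F", "L"},    # phenylalanine or leucine
    "metase": {"M"},          # methionine
    "serase": {"S"},          # serine
}


def cleave(peptide, family):
    """Split a one-letter peptide after each residue the family recognizes."""
    residues = CLEAVE_AFTER[family]
    fragments = []
    start = 0
    for index, amino_acid in enumerate(peptide):
        # Cleaving after the final residue would yield an empty fragment.
        if amino_acid in residues and index < len(peptide) - 1:
            fragments.append(peptide[start:index + 1])
            start = index + 1
    fragments.append(peptide[start:])
    return fragments


# Hypothetical peptide, chosen only to contain each recognized residue.
example = "AGRMKDFSL"
for family in CLEAVE_AFTER:
    print(family, cleave(example, family))
```

Joining the returned fragments always reproduces the input peptide, so the routine only marks where the backbone would be cut and never drops residues.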
Cysteine proteases (CPs) have a cysteine as the major catalytic residue at an active site where catalysis proceeds via an intermediate thiol ester and is facilitated by adjacent histidine and aspartic acid residues. CPs are involved in diverse cellular processes ranging from the processing of precursor proteins to intracellular degradation. Mammalian CPs include lysosomal cathepsins and cytosolic calcium activated proteases, calpains. CPs are produced by monocytes, macrophages and other cells of the immune system which migrate to sites of inflammation and secrete molecules involved in tissue repair. Overabundance of these repair molecules plays a role in certain disorders. In autoimmune diseases such as rheumatoid arthritis, secretion of the cysteine peptidase cathepsin C degrades collagen, laminin, elastin and other structural proteins found in the extracellular matrix of bones.
Aspartic proteases are members of the cathepsin family of lysosomal proteases and include pepsin A, gastricsin, chymosin, renin, and cathepsins D and E. Aspartic proteases have a pair of aspartic acid residues in the active site, and are most active in the pH 2-3 range, in which one of the aspartate residues is ionized, the other un-ionized. Aspartic proteases include bacterial penicillopepsin, mammalian pepsin, renin, chymosin, and certain fungal proteases. Abnormal regulation and expression of cathepsins is evident in various inflammatory disease states. In cells isolated from inflamed synovia, the mRNA for stromelysin, cytokines, TIMP-1, cathepsin, gelatinase, and other molecules is preferentially expressed. Expression of cathepsins L and D is elevated in synovial tissues from patients with rheumatoid arthritis and osteoarthritis. Cathepsin L expression may also contribute to the influx of mononuclear cells which exacerbates the destruction of the rheumatoid synovium. (Keyszer, G.M. (1995) Arthritis Rheum. 38:976-984.) The increased expression and differential regulation of the cathepsins are linked to the metastatic potential of a variety of cancers and as such are of therapeutic and prognostic interest (Chambers, A.F. et al. (1993) Crit. Rev. Oncog. 4:95-114).
Metalloproteases have active sites that include two glutamic acid residues and one histidine residue that serve as binding sites for zinc. Carboxypeptidases A and B are the principal mammalian metalloproteases. Both are exoproteases of similar structure and active sites. Carboxypeptidase A, like chymotrypsin, prefers C-terminal aromatic and aliphatic side chains of hydrophobic nature, whereas carboxypeptidase B is directed toward basic arginine and lysine residues. Glycoprotease (GCP), or O-sialoglycoprotein endopeptidase, is a metallopeptidase which specifically cleaves O-sialoglycoproteins such as glycophorin A. Another metallopeptidase, placental leucine aminopeptidase (P-LAP), degrades several peptide hormones such as oxytocin and vasopressin, suggesting a role in maintaining homeostasis during pregnancy, and is expressed in several tissues (Rogi, T. et al. (1996) J. Biol. Chem. 271:56-61).
Ubiquitin proteases are associated with the ubiquitin conjugation system (UCS), a major pathway for the degradation of cellular proteins in eukaryotic cells and some bacteria. The UCS mediates the elimination of abnormal proteins and regulates the half-lives of important regulatory proteins that control cellular processes such as gene transcription and cell cycle progression. In the UCS pathway, proteins targeted for degradation are conjugated to ubiquitin, a small heat stable protein. The ubiquitinated protein is then recognized and degraded by the proteasome, a large, multisubunit proteolytic enzyme complex, and ubiquitin is released for reutilization by ubiquitin protease. The UCS is implicated in the degradation of mitotic cyclic kinases, oncoproteins, tumor suppressor genes such as p53, viral proteins, cell surface receptors associated with signal transduction, transcriptional regulators, and mutated or damaged proteins (Ciechanover, A. (1994) Cell 79:13-21).
A murine proto-oncogene, Unp, encodes a nuclear ubiquitin protease whose overexpression leads to oncogenic transformation of NIH3T3 cells, and the human homolog of this gene is consistently elevated in small cell tumors and adenocarcinomas of the lung (Gray, D.A. (1995) Oncogene 10:2179-2183).
Signal Peptidases
The mechanism for the translocation process into the endoplasmic reticulum (ER) involves the recognition of an N-terminal signal peptide on the elongating protein. The signal peptide directs the protein and attached ribosome to a receptor on the ER membrane. The polypeptide chain passes through a pore in the ER membrane into the lumen while the N-terminal signal peptide remains attached at the membrane surface. The process is completed when signal peptidase located inside the ER cleaves the signal peptide from the protein and releases the protein into the lumen.
Protease Inhibitors
Protease inhibitors and other regulators of protease activity control the activity and effects of proteases. Protease inhibitors have been shown to control pathogenesis in animal models of proteolytic disorders (Murphy, G. (1991) Agents Actions Suppl. 35:69-76). Low levels of the cystatins, low molecular weight inhibitors of the cysteine proteases, correlate with malignant progression of tumors. (Calkins, C. et al. (1995) Biol. Biochem. Hoppe Seyler 376:71-80). Serpins are inhibitors of mammalian plasma serine proteases. Many serpins serve to regulate the blood clotting cascade and/or the complement cascade in mammals. Sp32 is a positive regulator of the mammalian acrosomal protease, acrosin, that binds the proenzyme, proacrosin, and thereby aids in packaging the enzyme into the acrosomal matrix (Baba, T. et al. (1994) J. Biol. Chem. 269:10133-10140). The Kunitz family of serine protease inhibitors is characterized by one or more "Kunitz domains" containing a series of cysteine residues that are regularly spaced over approximately 50 amino acid residues and form three intrachain disulfide bonds. Members of this family include aprotinin, tissue factor pathway inhibitor (TFPI-1 and TFPI-2), inter-α-trypsin inhibitor, and bikunin. (Marlor, C.W. et al. (1997) J. Biol. Chem. 272:12202-12208.) Members of this family are potent inhibitors (in the nanomolar range) against serine proteases such as kallikrein and plasmin. Aprotinin has clinical utility in reduction of perioperative blood loss.
A major portion of all proteins made in eukaryotic cells is synthesized on the cytosolic surface of the endoplasmic reticulum (ER). Before these immature proteins are distributed to other organelles in the cell or are secreted, they must be transported into the interior lumen of the ER where post-translational modifications are performed. These modifications include protein folding, the formation of disulfide bonds, and N-linked glycosylations.
Protein Isomerases
Protein folding in the ER is aided by two principal types of protein isomerases, protein disulfide isomerase (PDI) and peptidyl-prolyl isomerase (PPI). PDI catalyzes the oxidation of free sulfhydryl groups in cysteine residues to form intramolecular disulfide bonds in proteins. PPI, an enzyme that catalyzes the isomerization of certain proline imidic bonds in oligopeptides and proteins, is considered to govern one of the rate limiting steps in the folding of many proteins to their final functional conformation. The cyclophilins represent a major class of PPI that was originally identified as the major receptor for the immunosuppressive drug cyclosporin A (Handschumacher, R.E. et al. (1984) Science 226:544-547).
Protein Glycosylation
The glycosylation of most soluble secreted and membrane-bound proteins by oligosaccharides linked to asparagine residues in proteins is also performed in the ER. This reaction is catalyzed by a membrane-bound enzyme, oligosaccharyl transferase. Although the exact purpose of this "N-linked" glycosylation is unknown, the presence of oligosaccharides tends to make a glycoprotein resistant to protease digestion. In addition, oligosaccharides attached to cell-surface proteins called selectins are known to function in cell-cell adhesion processes (Alberts, B. et al. (1994) Molecular Biology of the Cell, Garland Publishing Co., New York NY, p. 608). "O-linked" glycosylation of proteins also occurs in the ER by the addition of N-acetylgalactosamine to the hydroxyl group of a serine or threonine residue followed by the sequential addition of other sugar residues to the first. This process is catalyzed by a series of glycosyltransferases each specific for a particular donor sugar nucleotide and acceptor molecule (Lodish, H. et al. (1995) Molecular Cell Biology, W.H. Freeman and Co., New York NY, pp. 700-708). In many cases, both N- and O-linked oligosaccharides appear to be required for the secretion of proteins or the movement of plasma membrane glycoproteins to the cell surface.
An additional glycosylation mechanism operates in the ER specifically to target lysosomal enzymes to lysosomes and prevent their secretion. Lysosomal enzymes in the ER receive an N-linked oligosaccharide, like plasma membrane and secreted proteins, but are then phosphorylated on one or two mannose residues. The phosphorylation of mannose residues occurs in two steps, the first step being the addition of an N-acetylglucosamine phosphate residue by N-acetylglucosamine phosphotransferase, and the second the removal of the N-acetylglucosamine group by phosphodiesterase. The phosphorylated mannose residue then targets the lysosomal enzyme to a mannose 6-phosphate receptor which transports it to a lysosome vesicle (Lodish, supra, pp. 708-711).
Chaperones
Molecular chaperones are proteins that aid in the proper folding of immature proteins and refolding of improperly folded ones, the assembly of protein subunits, and in the transport of unfolded proteins across membranes. Chaperones are also called heat-shock proteins (hsp) because of their tendency to be expressed in dramatically increased amounts following brief exposure of cells to elevated temperatures. This latter property most likely reflects their need in the refolding of proteins that have become denatured by the high temperatures. Chaperones may be divided into several classes according to their location, function, and molecular weight, and include hsp60, TCP1, hsp70, hsp40 (also called DnaJ), and hsp90. For example, hsp90 binds to steroid hormone receptors, represses transcription in the absence of the ligand, and provides proper folding of the ligand-binding domain of the receptor in the presence of the hormone (Burston, S.G. and A.R. Clarke (1995) Essays Biochem. 29:125-136). Hsp60 and hsp70 chaperones aid in the transport and folding of newly synthesized proteins.
Hsp70 acts early in protein folding, binding a newly synthesized protein before it leaves the ribosome and transporting the protein to the mitochondria or ER before releasing the folded protein. Hsp60, along with hsp10, binds misfolded proteins and gives them the opportunity to refold correctly. All chaperones share an affinity for hydrophobic patches on incompletely folded proteins and the ability to hydrolyze ATP. The energy of ATP hydrolysis is used to release the hsp-bound protein in its properly folded state (Alberts, supra, pp. 214, 571-572).
Nucleic Acid Synthesis and Modification Molecules
Polymerases
DNA and RNA replication are critical processes for cell replication and function. DNA and RNA replication are mediated by the enzymes DNA and RNA polymerase, respectively, by a "templating" process in which the nucleotide sequence of a DNA or RNA strand is copied by complementary base-pairing into a complementary nucleic acid sequence of either DNA or RNA. However, there are fundamental differences between the two processes. DNA polymerase catalyzes the stepwise addition of a deoxyribonucleotide to the 3'-OH end of a polynucleotide strand (the primer strand) that is paired to a second (template) strand. The new DNA strand therefore grows in the 5' to 3' direction (Alberts, B. et al. (1994) The Molecular Biology of the Cell, Garland Publishing Inc., New York NY, pp. 251-254). The substrates for the polymerization reaction are the corresponding deoxynucleotide triphosphates which must base-pair with the correct nucleotide on the template strand in order to be recognized by the polymerase.
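The templating process described above, in which each incoming nucleotide must base-pair with the template and is added to the 3'-OH end of the growing strand, can be sketched as follows. This is an illustrative sketch only: the function name `synthesize_complement` is hypothetical, and the model omits primers, proofreading, and the enzymology of the polymerase.

```python
# Watson-Crick base pairs for DNA.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_complement(template_5_to_3):
    """Return the new strand, written 5'->3', copied from a DNA template.

    The template is supplied 5'->3'. Because the two strands are
    antiparallel, the template is read from its 3' end while the new
    strand is extended one nucleotide at a time at its own 3' end.
    """
    new_strand = []
    for base in reversed(template_5_to_3):
        if base not in DNA_PAIRS:
            raise ValueError(f"not a DNA base: {base!r}")
        # Only the correctly pairing nucleotide is added to the 3'-OH end.
        new_strand.append(DNA_PAIRS[base])
    return "".join(new_strand)


template = "ATGC"
copy = synthesize_complement(template)
print(copy)
# Copying the copy regenerates the original template sequence.
print(synthesize_complement(copy) == template)
```

Copying the new strand a second time returns the original sequence, which reflects the fact that either strand of a duplex can serve as the template for the other.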
Because DNA exists as a double-stranded helix, each of the two strands may serve as a template for the formation of a new complementary strand. Each of the two daughter cells of the dividing cell therefore inherits a new DNA double helix containing one old and one new strand. Thus, DNA is said to be replicated "semiconservatively" by DNA polymerase. In addition to the synthesis of new DNA, DNA polymerase is also involved in the repair of damaged DNA as discussed below under "Ligases."
In contrast to DNA polymerase, RNA polymerase uses a DNA template strand to "transcribe" DNA into RNA using ribonucleoside triphosphates as substrates. Like DNA polymerization, RNA polymerization proceeds in a 5' to 3' direction by addition of a ribonucleoside monophosphate to the 3'-OH end of a growing RNA chain. DNA transcription generates messenger RNAs (mRNA) that carry information for protein synthesis, as well as the transfer, ribosomal, and other RNAs that have structural or catalytic functions. In eukaryotes, three discrete RNA polymerases synthesize the three different types of RNA (Alberts, supra, pp. 367-368). RNA polymerase I makes the large ribosomal RNAs, RNA polymerase II makes the mRNAs that will be translated into proteins, and RNA polymerase III makes a variety of small, stable RNAs, including 5S ribosomal RNA and the transfer RNAs (tRNA). In all cases, RNA synthesis is initiated by binding of the RNA polymerase to a promoter region on the DNA and synthesis begins at a start site within the promoter. Synthesis is completed at a broad, general stop or termination region in the DNA where both the polymerase and the completed RNA chain are released.
Ligases
DNA repair is the process by which accidental base changes, such as those produced by oxidative damage, hydrolytic attack, or uncontrolled methylation of DNA, are corrected before replication or transcription of the DNA can occur. Because of the efficiency of the DNA repair process, fewer than one in one thousand accidental base changes causes a mutation (Alberts, supra, pp. 245-249). The three steps common to most types of DNA repair are (1) excision of the damaged or altered base or nucleotide by DNA nucleases, leaving a gap; (2) insertion of the correct nucleotide in this gap by DNA polymerase using the complementary strand as the template; and (3) sealing the break left between the inserted nucleotide(s) and the existing DNA strand by DNA ligase.
In the last reaction, DNA ligase uses the energy from ATP hydrolysis to activate the 5' end of the broken phosphodiester bond before forming the new bond with the 3'-OH of the DNA strand. In Bloom's syndrome, an inherited human disease, individuals are partially deficient in DNA ligation and consequently have an increased incidence of cancer (Alberts, supra, p. 247).
Nucleases
Nucleases comprise enzymes that hydrolyze either DNA (DNases) or RNA (RNases). They serve different purposes in nucleic acid metabolism. Nucleases hydrolyze the phosphodiester bonds between adjacent nucleotides either at internal positions (endonucleases) or at the terminal 3' or 5' nucleotide positions (exonucleases). A DNA exonuclease activity in DNA polymerase, for example, serves to remove improperly paired nucleotides attached to the 3'-OH end of the growing DNA strand by the polymerase and thereby serves a "proofreading" function. As mentioned above, DNA endonuclease activity is involved in the excision step of the DNA repair process.
RNases also serve a variety of functions. For example, RNase P is a ribonucleoprotein enzyme which cleaves the 5' end of pre-tRNAs as part of their maturation process. RNase H digests the RNA strand of an RNA/DNA hybrid. Such hybrids occur in cells invaded by retroviruses, and RNase H is an important enzyme in the retroviral replication cycle. Pancreatic RNase secreted by the pancreas into the intestine hydrolyzes RNA present in ingested foods. RNase activity in serum and cell extracts is elevated in a variety of cancers and infectious diseases (Schein, C.H. (1997) Nat. Biotechnol. 15:529-536). Regulation of RNase activity is being investigated as a means to control tumor angiogenesis, allergic reactions, viral infection and replication, and fungal infections.
Methylases
Methylation of specific nucleotides occurs in both DNA and RNA, and serves different functions in the two macromolecules. Methylation of cytosine residues to form 5-methyl cytosine in DNA occurs specifically at CG sequences which are base-paired with one another in the DNA double-helix. This pattern of methylation is passed from generation to generation during DNA replication by an enzyme called "maintenance methylase" that acts preferentially on those CG sequences that are base-paired with a CG sequence that is already methylated. Such methylation appears to distinguish active from inactive genes by preventing the binding of regulatory proteins that "turn on" the gene, while permitting the binding of proteins that inactivate the gene (Alberts, supra, pp. 448-451). In RNA metabolism, "tRNA methylase" produces one of several nucleotide modifications in tRNA that affect the conformation and base-pairing of the molecule and facilitate the recognition of the appropriate mRNA codons by specific tRNAs. The primary methylation pattern is the dimethylation of guanine residues to form N,N-dimethyl guanine.
Helicases and Single-Stranded Binding Proteins
Helicases are enzymes that destabilize and unwind double helix structures in both DNA and RNA. Since DNA replication occurs more or less simultaneously on both strands, the two strands must first separate to generate a replication "fork" for DNA polymerase to act on. Two types of replication proteins contribute to this process, DNA helicases and single-stranded binding proteins. DNA helicases hydrolyze ATP and use the energy of hydrolysis to separate the DNA strands. Single-stranded binding proteins (SSBs) then bind to the exposed DNA strands without covering the bases, thereby temporarily stabilizing them for templating by the DNA polymerase (Alberts, supra, pp. 255-256).
RNA helicases also alter and regulate RNA conformation and secondary structure. Like the DNA helicases, RNA helicases utilize energy derived from ATP hydrolysis to destabilize and unwind RNA duplexes. The most well-characterized and ubiquitous family of RNA helicases is the DEAD-box family, so named for the conserved B-type ATP-binding motif which is diagnostic of proteins in this family. Over 40 DEAD-box helicases have been identified in organisms as diverse as bacteria, insects, yeast, amphibians, mammals, and plants. DEAD-box helicases function in diverse processes such as translation initiation, splicing, ribosome assembly, and RNA editing, transport, and stability. Some DEAD-box helicases play tissue- and stage-specific roles in spermatogenesis and embryogenesis. Overexpression of the DEAD-box 1 protein (DDX1) may play a role in the progression of neuroblastoma (Nb) and retinoblastoma (Rb) tumors (Godbout, R. et al. (1998) J. Biol. Chem. 273:21161-21168). These observations suggest that DDX1 may promote or enhance tumor progression by altering the normal secondary structure and expression levels of RNA in cancer cells. Other DEAD-box helicases have been implicated either directly or indirectly in tumorigenesis (discussed in Godbout, supra). For example, murine p68 is mutated in ultraviolet light-induced tumors, and human DDX6 is located at a chromosomal breakpoint associated with B-cell lymphoma. Similarly, a chimeric protein comprised of DDX10 and NUP98, a nucleoporin protein, may be involved in the pathogenesis of certain myeloid malignancies.
Topoisomerases
Besides the need to separate DNA strands prior to replication, the two strands must be "unwound" from one another prior to their separation by DNA helicases. This function is performed by proteins known as DNA topoisomerases. DNA topoisomerase effectively acts as a reversible nuclease that hydrolyzes a phosphodiester bond in a DNA strand, permitting the two strands to rotate freely about one another to remove the strain of the helix, and then rejoins the original phosphodiester bond between the two strands. Two types of DNA topoisomerase exist, types I and II. DNA topoisomerase I causes a single-strand break in a DNA helix to allow the rotation of the two strands of the helix about the remaining phosphodiester bond in the opposite strand. DNA topoisomerase II causes a transient break in both strands of a DNA helix where two double helices cross over one another. This type of topoisomerase can efficiently separate two interlocked DNA circles (Alberts, supra, pp. 260-262). Type II topoisomerases are largely confined to proliferating cells in eukaryotes, such as cancer cells. For this reason they are targets for anticancer drugs. Topoisomerase II has been implicated in multi-drug resistance (MDR) as it appears to aid in the repair of DNA damage inflicted by DNA binding agents such as doxorubicin and vincristine.
Recombinases
Genetic recombination is the process of rearranging DNA sequences within an organism's genome to provide genetic variation for the organism in response to changes in the environment. DNA recombination allows variation in the particular combination of genes present in an individual's genome, as well as the timing and level of expression of these genes (see Alberts, supra, pp. 263-273). Two broad classes of genetic recombination are commonly recognized, general recombination and site-specific recombination. General recombination involves genetic exchange between any homologous pair of DNA sequences usually located on two copies of the same chromosome. The process is aided by enzymes called recombinases that "nick" one strand of a DNA duplex more or less randomly and permit exchange with the complementary strand of another duplex. The process does not normally change the arrangement of genes on a chromosome. In site-specific recombination, the recombinase recognizes specific nucleotide sequences present in one or both of the recombining molecules. Base-pairing is not involved in this form of recombination and therefore does not require DNA homology between the recombining molecules. Unlike general recombination, this form of recombination can alter the relative positions of nucleotide sequences in chromosomes.
Splicing Factors
Various proteins are necessary for processing of transcribed RNAs in the nucleus. Pre-mRNA processing steps include capping at the 5' end with methylguanosine, polyadenylating the 3' end, and splicing to remove introns. The primary RNA transcript from DNA is a faithful copy of the gene containing both exon and intron sequences, and the latter sequences must be cut out of the RNA transcript to produce an mRNA that codes for a protein. This "splicing" of the mRNA sequence takes place in the nucleus with the aid of a large, multicomponent ribonucleoprotein complex known as a spliceosome.
The spliceosomal complex is composed of five small nuclear ribonucleoprotein particles (snRNPs) designated U1, U2, U4, U5, and U6, and a number of additional proteins. Each snRNP contains a single species of snRNA and about ten proteins. The RNA components of some snRNPs recognize and base pair with intron consensus sequences. The protein components mediate spliceosome assembly and the splicing reaction. Autoantibodies to snRNP proteins are found in the blood of patients with systemic lupus erythematosus (Stryer, L. (1995) Biochemistry, W.H. Freeman and Company, New York NY, p. 863).
Adhesion Molecules
The surface of a cell is rich in transmembrane proteoglycans, glycoproteins, glycolipids, and receptors. These macromolecules mediate adhesion with other cells and with components of the extracellular matrix (ECM). The interaction of the cell with its surroundings profoundly influences cell shape, strength, flexibility, motility, and adhesion. These dynamic properties are intimately associated with signal transduction pathways controlling cell proliferation and differentiation, tissue construction, and embryonic development.
Cadherins
Cadherins comprise a family of calcium-dependent glycoproteins that function in mediating cell-cell adhesion in virtually all solid tissues of multicellular organisms. These proteins share multiple repeats of a cadherin-specific motif, and the repeats form the folding units of the cadherin extracellular domain. Cadherin molecules cooperate to form focal contacts, or adhesion plaques, between adjacent epithelial cells. The cadherin family includes the classical cadherins and protocadherins. Classical cadherins include the E-cadherin, N-cadherin, and P-cadherin subfamilies. E-cadherin is present on many types of epithelial cells and is especially important for embryonic development. N-cadherin is present on nerve, muscle, and lens cells and is also critical for embryonic development. P-cadherin is present on cells of the placenta and epidermis. Recent studies report that protocadherins are involved in a variety of cell-cell interactions (Suzuki, S.T. (1996) J. Cell Sci. 109:2609-2611). The intracellular anchorage of cadherins is regulated by their dynamic association with catenins, a family of cytoplasmic signal transduction proteins associated with the actin cytoskeleton. The anchorage of cadherins to the actin cytoskeleton appears to be regulated by protein tyrosine phosphorylation, and the cadherins are the target of phosphorylation-induced junctional disassembly (Aberle, H. et al. (1996) J. Cell. Biochem. 61:514-523).
Integrins
Integrins are ubiquitous transmembrane adhesion molecules that link the ECM to the internal cytoskeleton. Integrins are composed of two noncovalently associated transmembrane glycoprotein subunits called α and β. Integrins function as receptors that play a role in signal transduction. For example, binding of integrin to its extracellular ligand may stimulate changes in intracellular calcium levels or protein kinase activity (Sjaastad, M.D. and W.J. Nelson (1997) BioEssays 19:47-55). At least ten cell surface receptors of the integrin family recognize the ECM component fibronectin, which is involved in many different biological processes including cell migration and embryogenesis (Johansson, S. et al. (1997) Front. Biosci. 2:D126-D146).
Lectins
Lectins comprise a ubiquitous family of extracellular glycoproteins which bind cell surface carbohydrates specifically and reversibly, resulting in the agglutination of cells (reviewed in Drickamer, K. and M.E. Taylor (1993) Annu. Rev. Cell Biol. 9:237-264). This function is particularly important for activation of the immune response. Lectins mediate the agglutination and mitogenic stimulation of lymphocytes at sites of inflammation (Lasky, L.A. (1991) J. Cell. Biochem. 45:139-146; Paietta, E. et al. (1989) J. Immunol. 143:2850-2857).
Lectins are further classified into subfamilies based on carbohydrate-binding specificity and other criteria. The galectin subfamily, in particular, includes lectins that bind β-galactoside carbohydrate moieties in a thiol-dependent manner (reviewed in Hadari, Y.R. et al. (1998) J. Biol. Chem. 270:3447-3453). Galectins are widely expressed and developmentally regulated. Because all galectins lack an N-terminal signal peptide, it is suggested that galectins are externalized through an atypical secretory mechanism. Two classes of galectins have been defined based on molecular weight and oligomerization properties. Small galectins form homodimers and are about 14 to 16 kilodaltons in mass, while large galectins are monomeric and about 29 to 37 kilodaltons. Galectins contain a characteristic carbohydrate recognition domain (CRD). The CRD is about 140 amino acids and contains several stretches of about 1 to 10 amino acids which are highly conserved among all galectins. A particular 6-amino acid motif within the CRD contains conserved tryptophan and arginine residues which are critical for carbohydrate binding. The CRD of some galectins also contains cysteine residues which may be important for disulfide bond formation. Secondary structure predictions indicate that the CRD forms several β-sheets.
Galectins play a number of roles in diseases and conditions associated with cell-cell and cell-matrix interactions. For example, certain galectins associate with sites of inflammation and bind to cell surface immunoglobulin E molecules. In addition, galectins may play an important role in cancer metastasis. Galectin overexpression is correlated with the metastatic potential of cancers in humans and mice. Moreover, anti-galectin antibodies inhibit processes associated with cell transformation, such as cell aggregation and anchorage-independent growth (see, for example, Su, Z.-Z. et al. (1996) Proc. Natl. Acad. Sci. USA 93:7252-7257).
Selectins
Selectins, or LEC-CAMs, comprise a specialized lectin subfamily involved primarily in inflammation and leukocyte adhesion (reviewed in Lasky, supra). Selectins mediate the recruitment of leukocytes from the circulation to sites of acute inflammation and are expressed on the surface of vascular endothelial cells in response to cytokine signaling. Selectins bind to specific ligands on the leukocyte cell membrane and enable the leukocyte to adhere to and migrate along the endothelial surface. Binding of selectin to its ligand leads to polarized rearrangement of the actin cytoskeleton and stimulates signal transduction within the leukocyte (Brenner, B. et al. (1997) Biochem. Biophys. Res. Commun. 231:802-807; Hidari, K.I. et al. (1997) J. Biol. Chem. 272:28750-28756). Members of the selectin family possess three characteristic motifs: a lectin or carbohydrate recognition domain; an epidermal growth factor-like domain; and a variable number of short consensus repeats (SCR or "sushi" repeats) which are also present in complement regulatory proteins. The selectins include lymphocyte adhesion molecule-1 (Lam-1 or L-selectin), endothelial leukocyte adhesion molecule-1 (ELAM-1 or E-selectin), and granule membrane protein-140 (GMP-140 or P-selectin) (Johnston, G.I. et al. (1989) Cell 56:1033-1044).
Antigen Recognition Molecules
All vertebrates have developed sophisticated and complex immune systems that provide protection from viral, bacterial, fungal, and parasitic infections. A key feature of the immune system is its ability to distinguish foreign molecules, or antigens, from "self" molecules. This ability is mediated primarily by secreted and transmembrane proteins expressed by leukocytes (white blood cells) such as lymphocytes, granulocytes, and monocytes. Most of these proteins belong to the immunoglobulin (Ig) superfamily, members of which contain one or more repeats of a conserved structural domain. This Ig domain is comprised of antiparallel β sheets joined by a disulfide bond in an arrangement called the Ig fold. Members of the Ig superfamily include T-cell receptors, major histocompatibility (MHC) proteins, antibodies, and immune cell-specific surface markers such as CD4, CD8, and CD28.
MHC proteins are cell surface markers that bind to and present foreign antigens to T cells. MHC molecules are classified as either class I or class II. Class I MHC molecules (MHC I) are expressed on the surface of almost all cells and are involved in the presentation of antigen to cytotoxic T cells. For example, a cell infected with virus will degrade intracellular viral proteins and express the protein fragments bound to MHC I molecules on the cell surface. The MHC I/antigen complex is recognized by cytotoxic T-cells which destroy the infected cell and the virus within. Class II MHC molecules (MHC II) are expressed primarily on specialized antigen-presenting cells of the immune system, such as B-cells and macrophages. These cells ingest foreign proteins from the extracellular fluid and express MHC II/antigen complex on the cell surface. This complex activates helper T-cells, which then secrete cytokines and other factors that stimulate the immune response. MHC molecules also play an important role in organ rejection following transplantation. Rejection occurs when the recipient's T-cells respond to foreign MHC molecules on the transplanted organ in the same way as to self MHC molecules bound to foreign antigen. (Reviewed in Alberts, B. et al. (1994) Molecular Biology of the Cell. Garland Publishing, New York NY, pp. 1229-1246.)
Antibodies, or immunoglobulins, are either expressed on the surface of B-cells or secreted by B-cells into the circulation. Antibodies bind and neutralize foreign antigens in the blood and other extracellular fluids. The prototypical antibody is a tetramer consisting of two identical heavy polypeptide chains (H-chains) and two identical light polypeptide chains (L-chains) interlinked by disulfide bonds. This arrangement confers the characteristic Y-shape to antibody molecules. Antibodies are classified based on their H-chain composition. The five antibody classes, IgA, IgD, IgE, IgG and IgM, are defined by the α, δ, ε, γ, and μ H-chain types. There are two types of L-chains, κ and λ, either of which may associate as a pair with any H-chain pair. IgG, the most common class of antibody found in the circulation, is tetrameric, while the other classes of antibodies are generally variants or multimers of this basic structure.
H-chains and L-chains each contain an N-terminal variable region and a C-terminal constant region. The constant region consists of about 110 amino acids in L-chains and about 330 or 440 amino acids in H-chains. The amino acid sequence of the constant region is nearly identical among H- or L-chains of a particular class. The variable region consists of about 110 amino acids in both H- and L-chains. However, the amino acid sequence of the variable region differs among H- or L-chains of a particular class. Within each H- or L-chain variable region are three hypervariable regions of extensive sequence diversity, each consisting of about 5 to 10 amino acids. In the antibody molecule, the H- and L-chain hypervariable regions come together to form the antigen recognition site. (Reviewed in Alberts, supra, pp. 1206-1213 and 1216-1217.)
Both H-chains and L-chains contain repeated Ig domains. For example, a typical H-chain contains four Ig domains, three of which occur within the constant region and one of which occurs within the variable region and contributes to the formation of the antigen recognition site. Likewise, a typical L-chain contains two Ig domains, one of which occurs within the constant region and one of which occurs within the variable region. The immune system is capable of recognizing and responding to any foreign molecule that enters the body. Therefore, the immune system must be armed with a full repertoire of antibodies against all potential antigens. Such antibody diversity is generated by somatic rearrangement of gene segments encoding variable and constant regions. These gene segments are joined together by site-specific recombination which occurs between highly conserved DNA sequences that flank each gene segment. Because there are hundreds of different gene segments, millions of unique genes can be generated combinatorially. In addition, imprecise joining of these segments and an unusually high rate of somatic mutation within these segments further contribute to the generation of a diverse antibody population.
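The combinatorial arithmetic behind "hundreds of gene segments, millions of unique genes" can be sketched as follows. The segment classes (V, D, J) and every count below are illustrative assumptions chosen for round numbers; they are not taken from this specification, and the sketch ignores junctional imprecision and somatic mutation, which increase diversity further.

```python
from math import prod


def count_combinations(segment_counts: dict) -> int:
    """Number of distinct joined genes if one segment is picked per class."""
    return prod(segment_counts.values())


# Hypothetical segment counts, for illustration only.
heavy_segments = {"V": 50, "D": 25, "J": 6}
light_segments = {"V": 40, "J": 5}

heavy_genes = count_combinations(heavy_segments)   # 50 * 25 * 6
light_genes = count_combinations(light_segments)   # 40 * 5

# Any heavy chain may pair with any light chain.
antibody_combinations = heavy_genes * light_genes

if __name__ == "__main__":
    print(heavy_genes, light_genes, antibody_combinations)
```

With these assumed counts, 126 segments in total yield 1.5 million heavy/light pairings, which is the order-of-magnitude jump the paragraph describes.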
T-cell receptors are both structurally and functionally related to antibodies. (Reviewed in Alberts, supra, pp. 1228-1229.) T-cell receptors are cell surface proteins that bind foreign antigens and mediate diverse aspects of the immune response. A typical T-cell receptor is a heterodimer comprised of two disulfide-linked polypeptide chains called α and β. Each chain is about 280 amino acids in length and contains one variable region and one constant region. Each variable or constant region folds into an Ig domain. The variable regions from the α and β chains come together in the heterodimer to form the antigen recognition site. T-cell receptor diversity is generated by somatic rearrangement of gene segments encoding the α and β chains. T-cell receptors recognize small peptide antigens that are expressed on the surface of antigen-presenting cells and pathogen-infected cells. These peptide antigens are presented on the cell surface in association with major histocompatibility proteins which provide the proper context for antigen recognition.
Secreted and Extracellular Matrix Molecules
Protein secretion is essential for cellular function. Protein secretion is mediated by a signal peptide located at the amino terminus of the protein to be secreted. The signal peptide is comprised of about ten to twenty hydrophobic amino acids which target the nascent protein from the ribosome to the endoplasmic reticulum (ER). Proteins targeted to the ER may either proceed through the secretory pathway or remain in any of the secretory organelles such as the ER, Golgi apparatus, or lysosomes. Proteins that transit through the secretory pathway are either secreted into the extracellular space or retained in the plasma membrane. Secreted proteins are often synthesized as inactive precursors that are activated by post-translational processing events during transit through the secretory pathway. Such events include glycosylation, proteolysis, and removal of the signal peptide by a signal peptidase. Other events that may occur during protein transport include chaperone-dependent unfolding and folding of the nascent protein and interaction of the protein with a receptor or pore complex. Examples of secreted proteins with amino terminal signal peptides include receptors, extracellular matrix molecules, cytokines, hormones, growth and differentiation factors, neuropeptides, vasomediators, ion channels, transporters/pumps, and proteases. (Reviewed in Alberts, B. et al. (1994) Molecular Biology of The Cell. Garland Publishing, New York NY, pp. 557-560, 582-592.)
The extracellular matrix (ECM) is a complex network of glycoproteins, polysaccharides, proteoglycans, and other macromolecules that are secreted from the cell into the extracellular space. The ECM remains in close association with the cell surface and provides a supportive meshwork that profoundly influences cell shape, motility, strength, flexibility, and adhesion. In fact, adhesion of a cell to its surrounding matrix is required for cell survival except in the case of metastatic tumor cells, which have overcome the need for cell-ECM anchorage. This phenomenon suggests that the ECM plays a critical role in the molecular mechanisms of growth control and metastasis. (Reviewed in Ruoslahti, E. (1996) Sci. Am. 275:72-77.) Furthermore, the ECM determines the structure and physical properties of connective tissue and is particularly important for morphogenesis and other processes associated with embryonic development and pattern formation.
The collagens comprise a family of ECM proteins that provide structure to bone, teeth, skin, ligaments, tendons, cartilage, blood vessels, and basement membranes. Multiple collagen proteins have been identified. Three collagen molecules fold together in a triple helix stabilized by interchain disulfide bonds. Bundles of these triple helices then associate to form fibrils. Collagen primary structure consists of hundreds of (Gly-X-Y) repeats where about a third of the X and Y residues are Pro. Glycines are crucial to helix formation as the bulkier amino acid sidechains cannot fold into the triple helical conformation. Because of these strict sequence requirements, mutations in collagen genes have severe consequences. Osteogenesis imperfecta patients have brittle bones that fracture easily; in severe cases patients die in utero or at birth. Ehlers-Danlos syndrome patients have hyperelastic skin, hypermobile joints, and susceptibility to aortic and intestinal rupture. Chondrodysplasia patients have short stature and ocular disorders. Alport syndrome patients have hematuria, sensorineural deafness, and eye lens deformation. (Isselbacher, K.J. et al. (1994) Harrison's Principles of Internal Medicine. McGraw-Hill, Inc., New York NY, pp. 2105-2117; and Creighton, T.E. (1984) Proteins, Structures and Molecular Principles, W.H. Freeman and Company, New York NY, pp. 191-197.)
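The (Gly-X-Y) sequence constraint described above can be checked mechanically. The following is a minimal sketch, assuming one-letter amino acid codes and a sequence that starts in frame with a glycine; the function names and the short test peptides are hypothetical and are not collagen sequences from this specification.

```python
def is_gly_x_y(seq: str) -> bool:
    """True if every third residue, starting at the first, is glycine (G)."""
    seq = seq.upper()
    if not seq or len(seq) % 3 != 0:
        return False
    return all(seq[i] == "G" for i in range(0, len(seq), 3))


def proline_fraction_xy(seq: str) -> float:
    """Fraction of X and Y positions (non-glycine slots) occupied by Pro (P)."""
    xy_residues = [aa for i, aa in enumerate(seq.upper()) if i % 3 != 0]
    return sum(aa == "P" for aa in xy_residues) / len(xy_residues)


if __name__ == "__main__":
    normal = "GPAGPPGAK"   # illustrative peptide: three Gly-X-Y repeats
    mutant = "GPAAPPGAK"   # glycine at position 4 replaced by alanine
    print(is_gly_x_y(normal), proline_fraction_xy(normal))
    print(is_gly_x_y(mutant))
```

A single glycine substitution breaks the repeat, which is the kind of change the paragraph associates with severe consequences.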
Elastin and related proteins confer elasticity to tissues such as skin, blood vessels, and lungs. Elastin is a highly hydrophobic protein of about 750 amino acids that is rich in proline and glycine residues. Elastin molecules are highly cross-linked, forming an extensive extracellular network of fibers and sheets. Elastin fibers are surrounded by a sheath of microfibrils which are composed of a number of glycoproteins, including fibrillin. Mutations in the gene encoding fibrillin are responsible for Marfan's syndrome, a genetic disorder characterized by defects in connective tissue. In severe cases, the aortas of afflicted individuals are prone to rupture. (Reviewed in Alberts, supra, pp. 984-986.)
Fibronectin is a large ECM glycoprotein found in all vertebrates. Fibronectin exists as a dimer of two subunits, each containing about 2,500 amino acids. Each subunit folds into a rod-like structure containing multiple domains. The domains each contain multiple repeated modules, the most common of which is the type III fibronectin repeat. The type III fibronectin repeat is about 90 amino acids in length and is also found in other ECM proteins and in some plasma membrane and cytoplasmic proteins. Furthermore, some type III fibronectin repeats contain a characteristic tripeptide consisting of Arginine-Glycine-Aspartic acid (RGD). The RGD sequence is recognized by the integrin family of cell surface receptors and is also found in other ECM proteins. Disruption of both copies of the gene encoding fibronectin causes early embryonic lethality in mice. The mutant embryos display extensive morphological defects, including defects in the formation of the notochord, somites, heart, blood vessels, neural tube, and extraembryonic structures. (Reviewed in Alberts, supra, pp. 986-987.)
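Locating the RGD tripeptide in a protein sequence is a simple substring search. The sketch below assumes one-letter amino acid codes; the function name `find_motif` and the example peptides are hypothetical and are not fibronectin sequences from this specification. The presence of the tripeptide alone does not establish integrin binding.

```python
def find_motif(seq: str, motif: str = "RGD") -> list:
    """Return 1-based start positions of every occurrence of `motif`."""
    seq = seq.upper()
    motif = motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)          # convert 0-based index to 1-based
        start = seq.find(motif, start + 1)   # continue past the previous hit
    return positions


if __name__ == "__main__":
    peptide = "AVTGRGDSPASS"  # illustrative sequence only
    print(find_motif(peptide))
```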
Laminin is a major glycoprotein component of the basal lamina which underlies and supports epithelial cell sheets. Laminin is one of the first ECM proteins synthesized in the developing embryo. Laminin is an 850 kilodalton protein composed of three polypeptide chains joined in the shape of a cross by disulfide bonds. Laminin is especially important for angiogenesis and, in particular, for guiding the formation of capillaries. (Reviewed in Alberts, supra, pp. 990-991.)
There are many other types of proteinaceous ECM components, most of which can be classified as proteoglycans. Proteoglycans are composed of unbranched polysaccharide chains (glycosaminoglycans) attached to protein cores. Common proteoglycans include aggrecan, betaglycan, decorin, perlecan, serglycin, and syndecan-1. Some of these molecules not only provide mechanical support, but also bind to extracellular signaling molecules, such as fibroblast growth factor and transforming growth factor β, suggesting a role for proteoglycans in cell-cell communication and cell growth. (Reviewed in Alberts, supra, pp. 973-978.) Likewise, the glycoproteins tenascin-C and tenascin-R are expressed in developing and lesioned neural tissue and provide stimulatory and anti-adhesive (inhibitory) properties, respectively, for axonal growth. (Faissner, A. (1997) Cell Tissue Res. 290:331-341.)
Cytoskeletal Molecules
The cytoskeleton is a cytoplasmic network of protein fibers that mediate cell shape, structure, and movement. The cytoskeleton supports the cell membrane and forms tracks along which organelles and other elements move in the cytosol. The cytoskeleton is a dynamic structure that allows cells to adopt various shapes and to carry out directed movements. Major cytoskeletal fibers include the microtubules, the microfilaments, and the intermediate filaments. Motor proteins, including myosin, dynein, and kinesin, drive movement of or along the fibers. The motor protein dynamin drives the formation of membrane vesicles. Accessory or associated proteins modify the structure or activity of the fibers while cytoskeletal membrane anchors connect the fibers to the cell membrane.
Tubulins
Microtubules, cytoskeletal fibers with a diameter of about 24 nm, have multiple roles in the cell. Bundles of microtubules form cilia and flagella, which are whip-like extensions of the cell membrane that are necessary for sweeping materials across an epithelium and for swimming of sperm, respectively. Marginal bands of microtubules in red blood cells and platelets are important for these cells' pliability. Organelles, membrane vesicles, and proteins are transported in the cell along tracks of microtubules. For example, microtubules run through nerve cell axons, allowing bidirectional transport of materials and membrane vesicles between the cell body and the nerve terminal. Failure to supply the nerve terminal with these vesicles blocks the transmission of neural signals. Microtubules are also critical to chromosomal movement during cell division. Both stable and short-lived populations of microtubules exist in the cell.
Microtubules are polymers of GTP-binding tubulin protein subunits. Each subunit is a heterodimer of α- and β-tubulin, multiple isoforms of which exist. The hydrolysis of GTP is linked to the addition of tubulin subunits at the end of a microtubule. The subunits interact head to tail to form protofilaments; the protofilaments interact side to side to form a microtubule. A microtubule is polarized, one end ringed with α-tubulin and the other with β-tubulin, and the two ends differ in their rates of assembly. Generally, each microtubule is composed of 13 protofilaments, although 11- or 15-protofilament microtubules are sometimes found. Cilia and flagella contain doublet microtubules. Microtubules grow from specialized structures known as centrosomes or microtubule-organizing centers (MTOCs). MTOCs may contain one or two centrioles, which are pinwheel arrays of triplet microtubules. The basal body, the organizing center located at the base of a cilium or flagellum, contains one centriole. Gamma tubulin present in the MTOC is important for nucleating the polymerization of α- and β-tubulin heterodimers but does not polymerize into microtubules.
Microtubule-Associated Proteins
Microtubule-associated proteins (MAPs) have roles in the assembly and stabilization of microtubules. One major family of MAPs, assembly MAPs, can be identified in neurons as well as non-neuronal cells. Assembly MAPs are responsible for cross-linking microtubules in the cytosol. These MAPs are organized into two domains: a basic microtubule-binding domain and an acidic projection domain. The projection domain is the binding site for membranes, intermediate filaments, or other microtubules. Based on sequence analysis, assembly MAPs can be further grouped into two types: Type I and Type II. Type I MAPs, which include MAP1A and MAP1B, are large, filamentous molecules that co-purify with microtubules and are abundantly expressed in brain and testes. Type I MAPs contain several repeats of a positively charged amino acid sequence motif that binds and neutralizes negatively charged tubulin, leading to stabilization of microtubules. MAP1A and MAP1B are each derived from a single precursor polypeptide that is subsequently proteolytically processed to generate one heavy chain and one light chain.
Another light chain, LC3, is a 16.4 kDa molecule that binds MAP1A, MAP1B, and microtubules. It is suggested that LC3 is synthesized from a source other than the MAP1A or MAP1B transcripts, and that the expression of LC3 may be important in regulating the microtubule binding activity of MAP1A and MAP1B during cell proliferation (Mann, S.S. et al. (1994) J. Biol. Chem. 269:11492-11497).
Type II MAPs, which include MAP2a, MAP2b, MAP2c, MAP4, and Tau, are characterized by three to four copies of an 18-residue sequence in the microtubule-binding domain. MAP2a, MAP2b, and MAP2c are found only in dendrites, MAP4 is found in non-neuronal cells, and Tau is found in axons and dendrites of nerve cells. Alternative splicing of the Tau mRNA leads to the existence of multiple forms of Tau protein. Tau phosphorylation is altered in neurodegenerative disorders such as Alzheimer's disease, Pick's disease, progressive supranuclear palsy, corticobasal degeneration, and familial frontotemporal dementia and Parkinsonism linked to chromosome 17. The altered Tau phosphorylation leads to a collapse of the microtubule network and the formation of intraneuronal Tau aggregates (Spillantini, M.G. and M. Goedert (1998) Trends Neurosci. 21:428-433).
The protein pericentrin is found in the MTOC and has a role in microtubule assembly.
Actins
Microfilaments, cytoskeletal filaments with a diameter of about 7-9 nm, are vital to cell locomotion, cell shape, cell adhesion, cell division, and muscle contraction. Assembly and disassembly of the microfilaments allow cells to change their morphology. Microfilaments are the polymerized form of actin, the most abundant intracellular protein in the eukaryotic cell. Human cells contain six isoforms of actin. The three α-actins are found in different kinds of muscle, nonmuscle β-actin and nonmuscle γ-actin are found in nonmuscle cells, and another γ-actin is found in intestinal smooth muscle cells. G-actin, the monomeric form of actin, polymerizes into polarized, helical F-actin filaments, accompanied by the hydrolysis of ATP to ADP. Actin filaments associate to form bundles and networks, providing a framework to support the plasma membrane and determine cell shape. These bundles and networks are connected to the cell membrane. In muscle cells, thin filaments containing actin slide past thick filaments containing the motor protein myosin during contraction. A family of actin-related proteins exists that are not part of the actin cytoskeleton, but rather associate with microtubules and dynein.
Actin-Associated Proteins
Actin-associated proteins have roles in cross-linking, severing, and stabilization of actin filaments and in sequestering actin monomers. Several of the actin-associated proteins have multiple functions. Bundles and networks of actin filaments are held together by actin cross-linking proteins. These proteins have two actin-binding sites, one for each filament. Short cross-linking proteins promote bundle formation while longer, more flexible cross-linking proteins promote network formation. Calmodulin-like calcium-binding domains in actin cross-linking proteins allow calcium regulation of cross-linking. Group I cross-linking proteins have unique actin-binding domains and include the 30 kD protein, EF-1a, fascin, and scruin. Group II cross-linking proteins have a 7,000-MW actin-binding domain and include villin and dematin. Group III cross-linking proteins have pairs of a 26,000-MW actin-binding domain and include fimbrin, spectrin, dystrophin, ABP 120, and filamin.
Severing proteins regulate the length of actin filaments by breaking them into short pieces or by blocking their ends. Severing proteins include gCAP39, severin (fragmin), gelsolin, and villin. Capping proteins can cap the ends of actin filaments, but cannot break filaments. Capping proteins include CapZ and tropomodulin. The proteins thymosin and profilin sequester actin monomers in the cytosol, allowing a pool of unpolymerized actin to exist. The actin-associated proteins tropomyosin, troponin, and caldesmon regulate muscle contraction in response to calcium.
Intermediate Filaments and Associated Proteins
Intermediate filaments (IFs) are cytoskeletal fibers with a diameter of about 10 nm, intermediate between that of microfilaments and microtubules. IFs serve structural roles in the cell, reinforcing cells and organizing cells into tissues. IFs are particularly abundant in epidermal cells and in neurons. IFs are extremely stable, and, in contrast to microfilaments and microtubules, do not function in cell motility.
Five types of IF proteins are known in mammals. Type I and Type II proteins are the acidic and basic keratins, respectively. Heterodimers of the acidic and basic keratins are the building blocks of keratin IFs. Keratins are abundant in soft epithelia such as skin and cornea, hard epithelia such as nails and hair, and in epithelia that line internal body cavities. Mutations in keratin genes lead to epithelial diseases including epidermolysis bullosa simplex, bullous congenital ichthyosiform erythroderma (epidermolytic hyperkeratosis), non-epidermolytic and epidermolytic palmoplantar keratoderma, ichthyosis bullosa of Siemens, pachyonychia congenita, and white sponge nevus. Some of these diseases result in severe skin blistering. (See, e.g., Wawersik, M. et al. (1997) J. Biol. Chem. 272:32557-32565; and Corden L.D. and W.H. McLean (1996) Exp. Dermatol. 5:297-307.)
Type III IF proteins include desmin, glial fibrillary acidic protein, vimentin, and peripherin. Desmin filaments in muscle cells link myofibrils into bundles and stabilize sarcomeres in contracting muscle. Glial fibrillary acidic protein filaments are found in the glial cells that surround neurons and astrocytes. Vimentin filaments are found in blood vessel endothelial cells, some epithelial cells, and mesenchymal cells such as fibroblasts, and are commonly associated with microtubules. Vimentin filaments may have roles in keeping the nucleus and other organelles in place in the cell.
Type IV IFs include the neurofilaments and nestin. Neurofilaments, composed of three polypeptides NF-L, NF-M, and NF-H, are frequently associated with microtubules in axons. Neurofilaments are responsible for the radial growth and diameter of an axon, and ultimately for the speed of nerve impulse transmission. Changes in phosphorylation and metabolism of neurofilaments are observed in neurodegenerative diseases including amyotrophic lateral sclerosis, Parkinson's disease, and Alzheimer's disease (Julien, J.P. and W.E. Mushynski (1998) Prog. Nucleic Acid Res. Mol. Biol. 61:1-23). Type V IFs, the lamins, are found in the nucleus where they support the nuclear membrane.
IFs have a central α-helical rod region interrupted by short nonhelical linker segments. The rod region is bracketed, in most cases, by non-helical head and tail domains. The rod regions of intermediate filament proteins associate to form a coiled-coil dimer. A highly ordered assembly process leads from the dimers to the IFs. Neither ATP nor GTP is needed for IF assembly, unlike that of microfilaments and microtubules.
IF-associated proteins (IFAPs) mediate the interactions of IFs with one another and with other cell structures. IFAPs cross-link IFs into a bundle, into a network, or to the plasma membrane, and may cross-link IFs to the microfilament and microtubule cytoskeleton. Microtubules and IFs are in particular closely associated. IFAPs include BPAG1, plakoglobin, desmoplakin I, desmoplakin II, plectin, ankyrin, filaggrin, and lamin B receptor.
Cytoskeletal-Membrane Anchors
Cytoskeletal fibers are attached to the plasma membrane by specific proteins. These attachments are important for maintaining cell shape and for muscle contraction. In erythrocytes, the spectrin-actin cytoskeleton is attached to the cell membrane by three proteins, band 4.1, ankyrin, and adducin. Defects in this attachment result in abnormally shaped cells which are more rapidly degraded by the spleen, leading to anemia. In platelets, the spectrin-actin cytoskeleton is also linked to the membrane by ankyrin; a second actin network is anchored to the membrane by filamin. In muscle cells the protein dystrophin links actin filaments to the plasma membrane; mutations in the dystrophin gene lead to Duchenne muscular dystrophy. In adherens junctions and adhesion plaques the peripheral membrane proteins α-actinin and vinculin attach actin filaments to the cell membrane.
IFs are also attached to membranes by cytoskeletal-membrane anchors. The nuclear lamina is attached to the inner surface of the nuclear membrane by the lamin B receptor. Vimentin IFs are attached to the plasma membrane by ankyrin and plectin. Desmosome and hemidesmosome membrane junctions hold together epithelial cells of organs and skin. These membrane junctions allow shear forces to be distributed across the entire epithelial cell layer, thus providing strength and rigidity to the epithelium. IFs in epithelial cells are attached to the desmosome by plakoglobin and desmoplakins. The proteins that link IFs to hemidesmosomes are not known. Desmin IFs surround the sarcomere in muscle and are linked to the plasma membrane by paranemin, synemin, and ankyrin.
Myosin-related Motor Proteins
Myosins are actin-activated ATPases, found in eukaryotic cells, that couple hydrolysis of ATP with motion. Myosin provides the motor function for muscle contraction and intracellular movements such as phagocytosis and rearrangement of cell contents during mitotic cell division (cytokinesis). The contractile unit of skeletal muscle, termed the sarcomere, consists of highly ordered arrays of thin actin-containing filaments and thick myosin-containing filaments. Crossbridges form between the thick and thin filaments, and the ATP-dependent movement of myosin heads within the thick filaments pulls the thin filaments, shortening the sarcomere and thus the muscle fiber.
Myosins are composed of one or two heavy chains and associated light chains. Myosin heavy chains contain an amino-terminal motor or head domain, a neck that is the site of light-chain binding, and a carboxy-terminal tail domain. The tail domains may associate to form an α-helical coiled coil. Conventional myosins, such as those found in muscle tissue, are composed of two myosin heavy-chain subunits, each associated with two light-chain subunits that bind at the neck region and play a regulatory role. Unconventional myosins, believed to function in intracellular motion, may contain either one or two heavy chains and associated light chains. There is evidence for about 25 myosin heavy chain genes in vertebrates, more than half of them unconventional.
Dynein-related Motor Proteins
Dyneins are (-) end-directed motor proteins which act on microtubules. Two classes of dyneins, cytosolic and axonemal, have been identified. Cytosolic dyneins are responsible for translocation of materials along cytoplasmic microtubules, for example, transport from the nerve terminal to the cell body and transport of endocytic vesicles to lysosomes. Cytoplasmic dyneins are also reported to play a role in mitosis. Axonemal dyneins are responsible for the beating of flagella and cilia. Dynein on one microtubule doublet walks along the adjacent microtubule doublet. This sliding force produces bending forces that cause the flagellum or cilium to beat. Dyneins have a native mass between 1000 and 2000 kDa and contain either two or three force-producing heads driven by the hydrolysis of ATP. The heads are linked via stalks to a basal domain which is composed of a highly variable number of accessory intermediate and light chains.
Kinesin-related Motor Proteins
Kinesins are (+) end-directed motor proteins which act on microtubules. The prototypical kinesin molecule is involved in the transport of membrane-bound vesicles and organelles. This function is particularly important for axonal transport in neurons. Kinesin is also important in all cell types for the transport of vesicles from the Golgi complex to the endoplasmic reticulum. This role is critical for maintaining the identity and functionality of these secretory organelles.
Kinesins define a ubiquitous, conserved family of over 50 proteins that can be classified into at least 8 subfamilies based on primary amino acid sequence, domain structure, velocity of movement, and cellular function. (Reviewed in Moore, J.D. and S.A. Endow (1996) Bioessays 18:207-219; and Hoyt, A.M. (1994) Curr. Opin. Cell Biol. 6:63-68.) The prototypical kinesin molecule is a heterotetramer comprised of two heavy polypeptide chains (KHCs) and two light polypeptide chains (KLCs). The KHC subunits are typically referred to as "kinesin." KHC is about 1000 amino acids in length, and KLC is about 550 amino acids in length. Two KHCs dimerize to form a rod-shaped molecule with three distinct regions of secondary structure. At one end of the molecule is a globular motor domain that functions in ATP hydrolysis and microtubule binding. Kinesin motor domains are highly conserved and share over 70% identity. Beyond the motor domain is an α-helical coiled-coil region which mediates dimerization. At the other end of the molecule is a fan-shaped tail that associates with molecular cargo. The tail is formed by the interaction of the KHC C-termini with the two KLCs.
Members of the more divergent subfamilies of kinesins are called kinesin-related proteins (KRPs), many of which function during mitosis in eukaryotes (Hoyt, supra). Some KRPs are required for assembly of the mitotic spindle. In vivo and in vitro analyses suggest that these KRPs exert force on microtubules that comprise the mitotic spindle, resulting in the separation of spindle poles. Phosphorylation of KRP is required for this activity. Failure to assemble the mitotic spindle results in abortive mitosis and chromosomal aneuploidy, the latter condition being characteristic of cancer cells. In addition, a unique KRP, centromere protein E, localizes to the kinetochore of human mitotic chromosomes and may play a role in their segregation to opposite spindle poles.
Dynamin-related Motor Proteins
Dynamin is a large GTPase motor protein that functions as a "molecular pinchase," generating a mechanochemical force used to sever membranes. This activity is important in forming clathrin-coated vesicles from coated pits in endocytosis and in the biogenesis of synaptic vesicles in neurons. Binding of dynamin to a membrane leads to dynamin's self-assembly into spirals that may act to constrict a flat membrane surface into a tubule. GTP hydrolysis induces a change in conformation of the dynamin polymer that pinches the membrane tubule, leading to severing of the membrane tubule and formation of a membrane vesicle. Release of GDP and inorganic phosphate leads to dynamin disassembly. Following disassembly the dynamin may either dissociate from the membrane or remain associated to the vesicle and be transported to another region of the cell. Three homologous dynamin genes have been discovered, in addition to several dynamin-related proteins. Conserved dynamin regions are the N-terminal GTP-binding domain, a central pleckstrin homology domain that binds membranes, a central coiled-coil region that may activate dynamin's GTPase activity, and a C-terminal proline-rich domain that contains several motifs that bind SH3 domains on other proteins. Some dynamin-related proteins do not contain the pleckstrin homology domain or the proline-rich domain. (See McNiven, M.A. (1998) Cell 94:151-154; Scaife, R.M. and R.L. Margolis (1997) Cell. Signal. 9:395-401.)
The cytoskeleton is reviewed in Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY.
Ribosomal Molecules
Ribosomal RNAs (rRNAs) are assembled, along with ribosomal proteins, into ribosomes, which are cytoplasmic particles that translate messenger RNA into polypeptides. The eukaryotic ribosome is composed of a 60S (large) subunit and a 40S (small) subunit, which together form the 80S ribosome. In addition to the 18S, 28S, 5S, and 5.8S rRNAs, the ribosome also contains more than fifty proteins. The ribosomal proteins have a prefix which denotes the subunit to which they belong, either L (large) or S (small). Ribosomal protein activities include binding rRNA and organizing the conformation of the junctions between rRNA helices (Woodson, S.A. and N.B. Leontis (1998) Curr. Opin. Struct. Biol. 8:294-300; Ramakrishnan, V. and S.W. White (1998) Trends Biochem. Sci. 23:208-212). Three important sites are identified on the ribosome. The aminoacyl-tRNA site (A site) is where charged tRNAs (with the exception of the initiator-tRNA) bind on arrival at the ribosome. The peptidyl-tRNA site (P site) is where new peptide bonds are formed, as well as where the initiator tRNA binds. The exit site (E site) is where deacylated tRNAs bind prior to their release from the ribosome. (The ribosome is reviewed in Stryer, L. (1995) Biochemistry, W.H. Freeman and Company, New York NY, pp. 888-908; and Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY, pp. 119-138.)
Chromatin Molecules
The nuclear DNA of eukaryotes is organized into chromatin. Two types of chromatin are observed: euchromatin, some of which may be transcribed, and heterochromatin so densely packed that much of it is inaccessible to transcription. Chromatin packing thus serves to regulate protein expression in eukaryotes. Bacteria lack chromatin and the chromatin-packing level of gene regulation.
The fundamental unit of chromatin is the nucleosome of 200 DNA base pairs associated with two copies each of histones H2A, H2B, H3, and H4. Adjacent nucleosomes are linked by another class of histones, H1. Low molecular weight non-histone proteins called the high mobility group (HMG), associated with chromatin, may function in the unwinding of DNA and stabilization of single-stranded DNA. Chromodomain proteins function in compaction of chromatin into its transcriptionally silent heterochromatin form.
During mitosis, all DNA is compacted into heterochromatin and transcription ceases. Transcription in interphase begins with the activation of a region of chromatin. Active chromatin is decondensed. Decondensation appears to be accompanied by changes in binding coefficient, phosphorylation and acetylation states of chromatin histones. HMG proteins HMG13 and HMG17 selectively bind activated chromatin. Topoisomerases remove superhelical tension on DNA. The activated region decondenses, allowing gene regulatory proteins and transcription factors to assemble on the DNA.
Patterns of chromatin structure can be stably inherited, producing heritable patterns of gene expression. In mammals, one of the two X chromosomes in each female cell is inactivated by condensation to heterochromatin during zygote development. The inactive state of this chromosome is inherited, so that adult females are mosaics of clusters of paternal-X and maternal-X clonal cell groups. The condensed X chromosome is reactivated in meiosis.
Chromatin is associated with disorders of protein expression such as thalassemia, a genetic anemia resulting from the removal of the locus control region (LCR) required for decondensation of the globin gene locus.
For a review of chromatin structure and function see Alberts, B. et al. (1994) Molecular Biology of the Cell, third edition, Garland Publishing, Inc., New York NY, pp. 351-354, 433-439.
Electron Transfer Associated Molecules
Electron carriers such as cytochromes accept electrons from NADH or FADH2 and donate them to other electron carriers. Most electron-transferring proteins, except ubiquinone, are prosthetic groups such as flavins, heme, FeS clusters, and copper, bound to inner membrane proteins. Adrenodoxin, for example, is an FeS protein that forms a complex with NADPH:adrenodoxin reductase and cytochrome P450. Cytochromes contain a heme prosthetic group, a porphyrin ring containing a tightly bound iron atom. Electron transfer reactions play a crucial role in cellular energy production.
Energy is produced by the oxidation of glucose and fatty acids. Glucose is initially converted to pyruvate in the cytoplasm. Fatty acids and pyruvate are transported to the mitochondria for complete oxidation to CO2 coupled by enzymes to the transport of electrons from NADH and FADH2 to oxygen and to the synthesis of ATP (oxidative phosphorylation) from ADP and Pi.
Pyruvate is transported into the mitochondria and converted to acetyl-CoA for oxidation via the citric acid cycle, involving pyruvate dehydrogenase components, dihydrolipoyl transacetylase, and dihydrolipoyl dehydrogenase. Enzymes involved in the citric acid cycle include: citrate synthetase, aconitases, isocitrate dehydrogenase, alpha-ketoglutarate dehydrogenase complex including transsuccinylases, succinyl CoA synthetase, succinate dehydrogenase, fumarases, and malate dehydrogenase. Acetyl CoA is oxidized to CO2 with concomitant formation of NADH, FADH2, and GTP. In oxidative phosphorylation, the transfer of electrons from NADH and FADH2 to oxygen by dehydrogenases is coupled to the synthesis of ATP from ADP and Pi by the F0F1 ATPase complex in the mitochondrial inner membrane. Enzyme complexes responsible for electron transport and ATP synthesis include the F0F1 ATPase complex, ubiquinone(CoQ)-cytochrome c reductase, ubiquinone reductase, cytochrome b, cytochrome c1, FeS protein, and cytochrome c oxidase.
ATP synthesis requires membrane transport enzymes including the phosphate transporter and the ATP-ADP antiport protein. The ATP-binding cassette (ABC) superfamily has also been suggested as belonging to the mitochondrial transport group (Hogue, D.L. et al. (1999) J. Mol. Biol. 285:379-389). Brown fat uncoupling protein dissipates oxidative energy as heat, and may be involved in the fever response to infection and trauma (Cannon, B. et al. (1998) Ann. NY Acad. Sci. 856:171-187).
Mitochondria are oval-shaped organelles comprising an outer membrane, a tightly folded inner membrane, an intermembrane space between the outer and inner membranes, and a matrix inside the inner membrane. The outer membrane contains many porin molecules that allow ions and charged molecules to enter the intermembrane space, while the inner membrane contains a variety of transport proteins that transfer only selected molecules. Mitochondria are the primary sites of energy production in cells.
Mitochondria contain a small amount of DNA. Human mitochondrial DNA encodes 13 proteins, 22 tRNAs, and 2 rRNAs. Mitochondrial DNA-encoded proteins include NADH-Q reductase, a cytochrome reductase subunit, cytochrome oxidase subunits, and ATP synthase subunits.
Electron-transfer reactions also occur outside the mitochondria in locations such as the endoplasmic reticulum, which plays a crucial role in lipid and protein biosynthesis. Cytochrome b5 is a central electron donor for various reductive reactions occurring on the cytoplasmic surface of liver endoplasmic reticulum. Cytochrome b5 has been found in Golgi, plasma, endoplasmic reticulum (ER), and microbody membranes.
For a review of mitochondrial metabolism and regulation, see Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY, pp. 745-797 and Stryer (1995) Biochemistry, W.H. Freeman and Co., San Francisco CA, pp. 529-558, 988-989.
The majority of mitochondrial proteins are encoded by nuclear genes, are synthesized on cytosolic ribosomes, and are imported into the mitochondria. Nuclear-encoded proteins which are destined for the mitochondrial matrix typically contain positively charged amino terminal signal sequences. Import of these preproteins from the cytoplasm requires a multisubunit protein complex in the outer membrane known as the translocase of outer mitochondrial membrane (TOM; previously designated MOM; Pfanner, N. et al. (1996) Trends Biochem. Sci. 21:51-52) and at least three inner membrane proteins which comprise the translocase of inner mitochondrial membrane (TIM; previously designated MIM; Pfanner, supra). An inside-negative membrane potential across the inner mitochondrial membrane is also required for preprotein import. Preproteins are recognized by surface receptor components of the TOM complex and are translocated through a proteinaceous pore formed by other TOM components. Proteins targeted to the matrix are then recognized by the import machinery of the TIM complex. The import systems of the outer and inner membranes can function independently (Segui-Real, B. et al. (1993) EMBO J. 12:2211-2218).
Once precursor proteins are in the mitochondria, the leader peptide is cleaved by a signal peptidase to generate the mature protein. Most leader peptides are removed in a one step process by a protease termed mitochondrial processing peptidase (MPP) (Paces, V. et al. (1993) Proc. Natl. Acad. Sci. USA 90:5355-5358). In some cases a two-step process occurs in which MPP generates an intermediate precursor form which is cleaved by a second enzyme, mitochondrial intermediate peptidase, to generate the mature protein.
Mitochondrial dysfunction leads to impaired calcium buffering, generation of free radicals that may participate in deleterious intracellular and extracellular processes, changes in mitochondrial permeability, and oxidative damage, which is observed in several neurodegenerative diseases.
Neurodegenerative diseases linked to mitochondrial dysfunction include some forms of Alzheimer's disease, Friedreich's ataxia, familial amyotrophic lateral sclerosis, and Huntington's disease (Beal, M.F. (1998) Biochim. Biophys. Acta 1366:211-213). The myocardium is heavily dependent on oxidative metabolism, so mitochondrial dysfunction often leads to heart disease (DiMauro, S. and M. Hirano (1998) Curr. Opin. Cardiol. 13:190-197). Mitochondria are implicated in disorders of cell proliferation, since they play an important role in a cell's decision to proliferate or self-destruct through apoptosis. The oncoprotein Bcl-2, for example, promotes cell proliferation by stabilizing mitochondrial membranes so that apoptosis signals are not released (Susin, S.A. (1998) Biochim. Biophys. Acta 1366:151-165).
Transcription Factor Molecules
Multicellular organisms are comprised of diverse cell types that differ dramatically both in structure and function. The identity of a cell is determined by its characteristic pattern of gene expression, and different cell types express overlapping but distinctive sets of genes throughout development. Spatial and temporal regulation of gene expression is critical for the control of cell proliferation, cell differentiation, apoptosis, and other processes that contribute to organismal development. Furthermore, gene expression is regulated in response to extracellular signals that mediate cell-cell communication and coordinate the activities of different cell types. Appropriate gene regulation also ensures that cells function efficiently by expressing only those genes whose functions are required at a given time.
Transcriptional regulatory proteins are essential for the control of gene expression. Some of these proteins function as transcription factors that initiate, activate, repress, or terminate gene transcription. Transcription factors generally bind to the promoter, enhancer, and upstream regulatory regions of a gene in a sequence-specific manner, although some factors bind regulatory elements within or downstream of a gene's coding region. Transcription factors may bind to a specific region of DNA singly or as a complex with other accessory factors. (Reviewed in Lewin, B. (1990) Genes IV, Oxford University Press, New York NY, and Cell Press, Cambridge MA, pp. 554-570.)
The double helix structure and repeated sequences of DNA create topological and chemical features which can be recognized by transcription factors. These features are hydrogen bond donor and acceptor groups, hydrophobic patches, major and minor grooves, and regular, repeated stretches of sequence which induce distinct bends in the helix. Typically, transcription factors recognize specific DNA sequence motifs of about 20 nucleotides in length. Multiple, adjacent transcription factor-binding motifs may be required for gene regulation.
Many transcription factors incorporate DNA-binding structural motifs which comprise either α helices or β sheets that bind to the major groove of DNA. Four well-characterized structural motifs are helix-turn-helix, zinc finger, leucine zipper, and helix-loop-helix. Proteins containing these motifs may act alone as monomers, or they may form homo- or heterodimers that interact with DNA. The helix-turn-helix motif consists of two α helices connected at a fixed angle by a short chain of amino acids. One of the helices binds to the major groove. Helix-turn-helix motifs are exemplified by the homeobox motif which is present in homeodomain proteins. These proteins are critical for specifying the anterior-posterior body axis during development and are conserved throughout the animal kingdom. The Antennapedia and Ultrabithorax proteins of Drosophila melanogaster are prototypical homeodomain proteins (Pabo, C.O. and R.T. Sauer (1992) Annu. Rev. Biochem. 61:1053-1095).
The zinc finger motif, which binds zinc ions, generally contains tandem repeats of about 30 amino acids consisting of periodically spaced cysteine and histidine residues. Examples of this sequence pattern, designated C2H2 and C3HC4 ("RING" finger), have been described (Lewin, supra). Zinc finger proteins each contain an α helix and an antiparallel β sheet whose proximity and conformation are maintained by the zinc ion. Contact with DNA is made by the arginine preceding the α helix and by the second, third, and sixth residues of the α helix. Variants of the zinc finger motif include poorly defined cysteine-rich motifs which bind zinc or other metal ions. These motifs may not contain histidine residues and are generally nonrepetitive.
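The periodic spacing of cysteine and histidine residues in a C2H2 zinc finger can be illustrated with a simple pattern search. The sketch below is illustrative only: it assumes the commonly quoted simplified consensus C-X(2-4)-C-X(12)-H-X(3-5)-H, which is not taken from the text above, and the function name and test sequence are hypothetical. Curated profiles are more selective than this expression.

```python
import re

# Simplified C2H2 consensus (an assumption for illustration):
# Cys, 2-4 residues, Cys, 12 residues, His, 3-5 residues, His.
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_fingers(sequence):
    """Return (start, end, matched_text) for each non-overlapping candidate finger.

    Positions are 0-based and the end index is exclusive.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in C2H2_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Synthetic peptide containing one finger-like stretch (hypothetical data).
    peptide = "AACPECGKSFSQKSNLQKHQRTHTG"
    for start, end, text in find_c2h2_fingers(peptide):
        print(start, end, text)
```

A hit from such a search only flags a candidate; whether the stretch folds around a zinc ion and contacts DNA would have to be established separately.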
The leucine zipper motif comprises a stretch of amino acids rich in leucine which can form an amphipathic α helix. This structure provides the basis for dimerization of two leucine zipper proteins. The region adjacent to the leucine zipper is usually basic, and upon protein dimerization, is optimally positioned for binding to the major groove. Proteins containing such motifs are generally referred to as bZIP transcription factors.
The helix-loop-helix motif (HLH) consists of a short α helix connected by a loop to a longer α helix. The loop is flexible and allows the two helices to fold back against each other and to bind to DNA. The transcription factor Myc contains a prototypical HLH motif. Most transcription factors contain characteristic DNA binding motifs, and variations on the above motifs and new motifs have been and are currently being characterized (Faisst, S. and S. Meyer (1992) Nucleic Acids Res. 20:3-26).
Many neoplastic disorders in humans can be attributed to inappropriate gene expression. Malignant cell growth may result from either excessive expression of tumor promoting genes or insufficient expression of tumor suppressor genes (Cleary, M.L. (1992) Cancer Surv. 15:89-104). Chromosomal translocations may also produce chimeric loci which fuse the coding sequence of one gene with the regulatory regions of a second unrelated gene. Such an arrangement likely results in inappropriate gene transcription, potentially contributing to malignancy. In addition, the immune system responds to infection or trauma by activating a cascade of events that coordinate the progressive selection, amplification, and mobilization of cellular defense mechanisms. A complex and balanced program of gene activation and repression is involved in this process. However, hyperactivity of the immune system as a result of improper or insufficient regulation of gene expression may result in considerable tissue or organ damage. This damage is well documented in immunological responses associated with arthritis, allergens, heart attack, stroke, and infections (Isselbacher, K.J. et al. (1996) Harrison's Principles of Internal Medicine, 13/e, McGraw Hill, Inc. and Teton Data Systems Software).
Furthermore, the generation of multicellular organisms is based upon the induction and coordination of cell differentiation at the appropriate stages of development. Central to this process is differential gene expression, which confers the distinct identities of cells and tissues throughout the body. Failure to regulate gene expression during development can result in developmental disorders. Human developmental disorders caused by mutations in zinc finger-type transcriptional regulators include: urogenital developmental abnormalities associated with WT1; Greig cephalopolysyndactyly, Pallister-Hall syndrome, and postaxial polydactyly type A (GLI3); and Townes-Brocks syndrome, characterized by anal, renal, limb, and ear abnormalities (SALL1) (Engelkamp, D. and V. van Heyningen (1996) Curr. Opin. Genet. Dev. 6:334-342; Kohlhase, J. et al. (1999) Am. J. Hum. Genet. 64:435-445).
Cell Membrane Molecules
Eukaryotic cells are surrounded by plasma membranes which enclose the cell and maintain an environment inside the cell that is distinct from its surroundings. In addition, eukaryotic organisms are distinct from prokaryotes in possessing many intracellular organelle and vesicle structures. Many of the metabolic reactions which distinguish eukaryotic biochemistry from prokaryotic biochemistry take place within these structures. The plasma membrane and the membranes surrounding organelles and vesicles are composed of phosphoglycerides, fatty acids, cholesterol, phospholipids, glycolipids, proteoglycans, and proteins. These components confer identity and functionality to the membranes with which they associate.
Integral Membrane Proteins
The majority of known integral membrane proteins are transmembrane proteins (TM) which are characterized by an extracellular, a transmembrane, and an intracellular domain. TM domains are typically comprised of 15 to 25 hydrophobic amino acids which are predicted to adopt an α-helical conformation. TM proteins are classified as bitopic (Types I and II) and polytopic (Types III and IV) (Singer, S.J. (1990) Annu. Rev. Cell Biol. 6:247-296). Bitopic proteins span the membrane once while polytopic proteins contain multiple membrane-spanning segments. TM proteins function as cell-surface receptors, receptor-interacting proteins, transporters of ions or metabolites, ion channels, cell anchoring proteins, and cell type-specific surface antigens.
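The idea that a transmembrane segment is a run of roughly 15 to 25 hydrophobic residues can be sketched as a sliding-window hydropathy scan. The example below is a minimal illustration, not a method described in this document: it assumes the Kyte-Doolittle hydropathy scale with a 19-residue window and a mean-score cutoff of 1.6, and the function name and test peptide are hypothetical. Because scores are averaged over a window, reported spans tend to extend a few residues beyond the hydrophobic core.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not part of the source text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def predict_tm_segments(sequence, window=19, threshold=1.6):
    """Return merged (start, end) spans whose windowed mean hydropathy >= threshold.

    Positions are 0-based and the end index is exclusive. Residues missing
    from the scale contribute a score of 0.0.
    """
    seq = sequence.upper()
    segments = []
    for i in range(len(seq) - window + 1):
        mean = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq[i:i + window]) / window
        if mean >= threshold:
            if segments and i <= segments[-1][1]:
                # Overlaps or touches the previous span: extend it.
                segments[-1] = (segments[-1][0], i + window)
            else:
                segments.append((i, i + window))
    return segments


if __name__ == "__main__":
    # Hypothetical bitopic-like peptide: polar flanks around a 20-residue leucine core.
    peptide = "DKDKDKDKDK" + "L" * 20 + "DKDKDKDKDK"
    print(predict_tm_segments(peptide))
```

Under these assumptions a bitopic protein would be expected to yield one span and a polytopic protein several, although real predictions normally combine hydropathy with other evidence.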
Many membrane proteins (MPs) contain amino acid sequence motifs that target these proteins to specific subcellular sites. Examples of these motifs include PDZ domains, KDEL, RGD, NGR, and GSL sequence motifs, von Willebrand factor A (vWFA) domains, and EGF-like domains. RGD, NGR, and GSL motif-containing peptides have been used as drug delivery agents in targeted cancer treatment of tumor vasculature (Arap, W. et al. (1998) Science 279:377-380). Furthermore, MPs may also contain amino acid sequence motifs, such as the carbohydrate recognition domain (CRD), that mediate interactions with extracellular or intracellular molecules.
G-Protein Coupled Receptors
G-protein coupled receptors (GPCR) are a superfamily of integral membrane proteins which transduce extracellular signals. GPCRs include receptors for biogenic amines, lipid mediators of inflammation, peptide hormones, and sensory signal mediators. The structure of these highly-conserved receptors consists of seven hydrophobic transmembrane regions, an extracellular N-terminus, and a cytoplasmic C-terminus. Three extracellular loops alternate with three intracellular loops to link the seven transmembrane regions. Cysteine disulfide bridges connect the second and third extracellular loops. The most conserved regions of GPCRs are the transmembrane regions and the first two cytoplasmic loops. A conserved, acidic-Arg-aromatic residue triplet present in the second cytoplasmic loop may interact with G proteins. A GPCR consensus pattern is characteristic of most proteins belonging to this superfamily (ExPASy PROSITE document PS00237; and Watson, S. and S. Arkinstall (1994) The G-protein Linked Receptor Facts Book, Academic Press, San Diego CA, pp. 2-6). Mutations and changes in transcriptional activation of GPCR-encoding genes have been associated with neurological disorders such as schizophrenia, Parkinson's disease, Alzheimer's disease, drug addiction, and feeding disorders.
Scavenger Receptors
Macrophage scavenger receptors with broad ligand specificity may participate in the binding of low density lipoproteins (LDL) and foreign antigens. Scavenger receptors types I and II are trimeric membrane proteins with each subunit containing a small N-terminal intracellular domain, a transmembrane domain, a large extracellular domain, and a C-terminal cysteine-rich domain. The extracellular domain contains a short spacer region, an α-helical coiled-coil region, and a triple helical collagen-like region. These receptors have been shown to bind a spectrum of ligands, including chemically modified lipoproteins and albumin, polyribonucleotides, polysaccharides, phospholipids, and asbestos (Matsumoto, A. et al. (1990) Proc. Natl. Acad. Sci. USA 87:9133-9137; and Elomaa, O. et al. (1995) Cell 80:603-609). The scavenger receptors are thought to play a key role in atherogenesis by mediating uptake of modified LDL in arterial walls, and in host defense by binding bacterial endotoxins, bacteria, and protozoa.
Tetraspan Family Proteins
The transmembrane 4 superfamily (TM4SF) or tetraspan family is a multigene family encoding type III integral membrane proteins (Wright, M.D. and M.G. Tomlinson (1994) Immunol. Today 15:588-594). The TM4SF is comprised of membrane proteins which traverse the cell membrane four times. Members of the TM4SF include platelet and endothelial cell membrane proteins, melanoma-associated antigens, leukocyte surface glycoproteins, colonal carcinoma antigens, tumor-associated antigens, and surface proteins of the schistosome parasites (Jankowski, S.A. (1994) Oncogene 9:1205-1211). Members of the TM4SF share about 25-30% amino acid sequence identity with one another.
A number of TM4SF members have been implicated in signal transduction, control of cell adhesion, regulation of cell growth and proliferation, including development and oncogenesis, and cell motility, including tumor cell metastasis. Expression of TM4SF proteins is associated with a variety of tumors and the level of expression may be altered when cells are growing or activated.
Tumor Antigens
Tumor antigens are cell surface molecules that are differentially expressed in tumor cells relative to normal cells. Tumor antigens distinguish tumor cells immunologically from normal cells and provide diagnostic and therapeutic targets for human cancers (Takagi, S. et al. (1995) Int. J. Cancer 61:706-715; Liu, E. et al. (1992) Oncogene 7:1027-1032).
Leukocyte Antigens
Other types of cell surface antigens include those identified on leukocytic cells of the immune system. These antigens have been identified using systematic, monoclonal antibody (mAb)-based "shot gun" techniques. These techniques have resulted in the production of hundreds of mAbs directed against unknown cell surface leukocytic antigens. These antigens have been grouped into "clusters of differentiation" based on common immunocytochemical localization patterns in various differentiated and undifferentiated leukocytic cell types. Antigens in a given cluster are presumed to identify a single cell surface protein and are assigned a "cluster of differentiation" or "CD" designation. Some of the genes encoding proteins identified by CD antigens have been cloned and verified by standard molecular biology techniques. CD antigens have been characterized as both transmembrane proteins and cell surface proteins anchored to the plasma membrane via covalent attachment to fatty acid-containing glycolipids such as glycosylphosphatidylinositol (GPI). (Reviewed in Barclay, A.N. et al. (1995) The Leucocyte Antigen Facts Book, Academic Press, San Diego CA, pp. 17-20.)
Ion Channels
Ion channels are found in the plasma membranes of virtually every cell in the body. For example, chloride channels mediate a variety of cellular functions including regulation of membrane potentials and absorption and secretion of ions across epithelial membranes. Chloride channels also regulate the pH of organelles such as the Golgi apparatus and endosomes (see, e.g., Greger, R. (1988) Annu. Rev. Physiol. 50:111-122). Electrophysiological and pharmacological properties of chloride channels, including ion conductance, current-voltage relationships, and sensitivity to modulators, suggest that different chloride channels exist in muscles, neurons, fibroblasts, epithelial cells, and lymphocytes.
Many ion channels have sites for phosphorylation by one or more protein kinases including protein kinase A, protein kinase C, tyrosine kinase, and casein kinase II, all of which regulate ion channel activity in cells. Inappropriate phosphorylation of proteins in cells has been linked to changes in cell cycle progression and cell differentiation. Changes in the cell cycle have been linked to induction of apoptosis or cancer. Changes in cell differentiation have been linked to diseases and disorders of the reproductive system, immune system, skeletal muscle, and other organ systems.
Proton Pumps
Proton ATPases comprise a large class of membrane proteins that use the energy of ATP hydrolysis to generate an electrochemical proton gradient across a membrane. The resultant gradient may be used to transport other ions across the membrane (Na+, K+, or Cl-) or to maintain organelle pH. Proton ATPases are further subdivided into the mitochondrial F-ATPases, the plasma membrane ATPases, and the vacuolar ATPases. The vacuolar ATPases establish and maintain an acidic pH within various organelles involved in the processes of endocytosis and exocytosis (Mellman, I. et al. (1986) Annu. Rev. Biochem. 55:663-700).
Proton-coupled, 12 membrane-spanning domain transporters such as PEPT 1 and PEPT 2 are responsible for gastrointestinal absorption and for renal reabsorption of peptides using an electrochemical H+ gradient as the driving force. Another type of peptide transporter, the TAP transporter, is a heterodimer consisting of TAP 1 and TAP 2 and is associated with antigen processing. Peptide antigens are transported across the membrane of the endoplasmic reticulum by TAP so they can be expressed on the cell surface in association with MHC molecules. Each TAP protein consists of multiple hydrophobic membrane spanning segments and a highly conserved ATP-binding cassette (Boll, M. et al. (1996) Proc. Natl. Acad. Sci. USA 93:284-289). Pathogenic microorganisms, such as herpes simplex virus, may encode inhibitors of TAP-mediated peptide transport in order to evade immune surveillance (Marusina, K. and J.J. Monaco (1996) Curr. Opin. Hematol. 3:19-26).
ABC Transporters
The ATP-binding cassette (ABC) transporters, also called the "traffic ATPases", comprise a superfamily of membrane proteins that mediate transport and channel functions in prokaryotes and eukaryotes (Higgins, C.F. (1992) Annu. Rev. Cell Biol. 8:67-113). ABC proteins share a similar overall structure and significant sequence homology. All ABC proteins contain a conserved domain of approximately two hundred amino acid residues which includes one or more nucleotide binding domains. Mutations in ABC transporter genes are associated with various disorders, such as hyperbilirubinemia II/Dubin-Johnson syndrome, recessive Stargardt's disease, X-linked adrenoleukodystrophy, multidrug resistance, celiac disease, and cystic fibrosis.
Peripheral and Anchored Membrane Proteins
Some membrane proteins are not membrane-spanning but are attached to the plasma membrane via membrane anchors or interactions with integral membrane proteins. Membrane anchors are covalently joined to a protein post-translationally and include such moieties as prenyl, myristyl, and glycosylphosphatidylinositol groups. Membrane localization of peripheral and anchored proteins is important for their function in processes such as receptor-mediated signal transduction. For example, prenylation of Ras is required for its localization to the plasma membrane and for its normal and oncogenic functions in signal transduction.
Vesicle Coat Proteins
Intercellular communication is essential for the development and survival of multicellular organisms. Cells communicate with one another through the secretion and uptake of protein signaling molecules. The uptake of proteins into the cell is achieved by the endocytic pathway, in which the interaction of extracellular signaling molecules with plasma membrane receptors results in the formation of plasma membrane-derived vesicles that enclose and transport the molecules into the cytosol. These transport vesicles fuse with and mature into endosomal and lysosomal (digestive) compartments. The secretion of proteins from the cell is achieved by exocytosis, in which molecules inside of the cell proceed through the secretory pathway. In this pathway, molecules transit from the ER to the Golgi apparatus and finally to the plasma membrane, where they are secreted from the cell. Several steps in the transit of material along the secretory and endocytic pathways require the formation of transport vesicles. Specifically, vesicles form at the transitional endoplasmic reticulum (tER), the rim of Golgi cisternae, the face of the Trans-Golgi Network (TGN), the plasma membrane (PM), and tubular extensions of the endosomes. Vesicle formation occurs when a region of membrane buds off from the donor organelle. The membrane-bound vesicle contains proteins to be transported and is surrounded by a proteinaceous coat, the components of which are recruited from the cytosol. Two different classes of coat protein have been identified. Clathrin coats form on vesicles derived from the TGN and PM, whereas coatomer (COP) coats form on vesicles derived from the ER and Golgi. COP coats can be further classified as COPI, involved in retrograde traffic through the Golgi and from the Golgi to the ER, and COPII, involved in anterograde traffic from the ER to the Golgi (Mellman, supra).
In clathrin-based vesicle formation, adapter proteins bring vesicle cargo and coat proteins together at the surface of the budding membrane. Adapter protein-1 and -2 select cargo from the TGN and plasma membrane, respectively, based on molecular information encoded on the cytoplasmic tail of integral membrane cargo proteins. Adapter proteins also recruit clathrin to the bud site. Clathrin is a protein complex consisting of three large and three small polypeptide chains arranged in a three-legged structure called a triskelion. Multiple triskelions and other coat proteins appear to self-assemble on the membrane to form a coated pit. This assembly process may serve to deform the membrane into a budding vesicle. GTP-bound ADP-ribosylation factor (Arf) is also incorporated into the coated assembly. Another small G-protein, dynamin, forms a ring complex around the neck of the forming vesicle and may provide the mechanochemical force to seal the bud, thereby releasing the vesicle. The coated vesicle complex is then transported through the cytosol. During the transport process, Arf-bound GTP is hydrolyzed to GDP, and the coat dissociates from the transport vesicle (West, M.A. et al. (1997) J. Cell Biol. 138:1239-1254).
Vesicles which bud from the ER and the Golgi are covered with a protein coat similar to the clathrin coat of endocytic and TGN vesicles. The coat protein (COP) is assembled from cytosolic precursor molecules at specific budding regions on the organelle. The COP coat consists of two major components, a G-protein (Arf or Sar) and coat protomer (coatomer). Coatomer is an equimolar complex of seven proteins, termed alpha-, beta-, beta'-, gamma-, delta-, epsilon- and zeta-COP. The coatomer complex binds to dilysine motifs contained on the cytoplasmic tails of integral membrane proteins. These include the KKXX retrieval motif of membrane proteins of the ER and dibasic/diphenylalanine motifs of members of the p24 family. The p24 family of type I membrane proteins represents the major membrane proteins of COPI vesicles (Harter, C. and F.T. Wieland (1998) Proc. Natl. Acad. Sci. USA 95:11649-11654).
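The KKXX dilysine retrieval motif lends itself to a short sequence check. The sketch below assumes that the motif occupies the last four residues of the protein (two lysines followed by any two residues at the carboxyl terminus), which is the usual description of the motif but is not spelled out above; the function name and example peptides are hypothetical.

```python
import re

# Two lysines followed by exactly two residues at the C-terminus
# (C-terminal placement is an assumption for illustration).
KKXX_PATTERN = re.compile(r"KK[A-Z]{2}$")


def has_kkxx_retrieval_motif(sequence):
    """Return True if the sequence ends in a KKXX dilysine motif."""
    return bool(KKXX_PATTERN.search(sequence.strip().upper()))


if __name__ == "__main__":
    # Hypothetical cytoplasmic tails.
    for tail in ("MSTAAKKTN", "MKKTNAAAA"):
        print(tail, has_kkxx_retrieval_motif(tail))
```

A match indicates only that the sequence is compatible with coatomer binding; it does not show that the tail is cytoplasmic or that retrieval to the ER actually occurs.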
Organelle Associated Molecules
Eukaryotic cells are organized into various cellular organelles, which has the effect of separating specific molecules and their functions from one another and from the cytosol. Within the cell, various membrane structures surround and define these organelles while allowing them to interact with one another and the cell environment through both active and passive transport processes. Important cell organelles include the nucleus, the Golgi apparatus, the endoplasmic reticulum, mitochondria, peroxisomes, lysosomes, endosomes, and secretory vesicles.
Nucleus
The cell nucleus contains all of the genetic information of the cell in the form of DNA, and the components and machinery necessary for replication of DNA and for transcription of DNA into RNA. (See Alberts, B. et al. (1994) Molecular Biology of the Cell, Garland Publishing Inc., New York NY, pp. 335-399.) DNA is organized into compact structures in the nucleus by interactions with various DNA-binding proteins such as histones and non-histone chromosomal proteins. DNA-specific nucleases, DNAses, partially degrade these compacted structures prior to DNA replication or transcription. DNA replication takes place with the aid of DNA helicases which unwind the double-stranded DNA helix, and DNA polymerases that duplicate the separated DNA strands.
Transcriptional regulatory proteins are essential for the control of gene expression. Some of these proteins function as transcription factors that initiate, activate, repress, or terminate gene transcription. Transcription factors generally bind to the promoter, enhancer, and upstream regulatory regions of a gene in a sequence-specific manner, although some factors bind regulatory elements within or downstream of a gene's coding region. Transcription factors may bind to a specific region of DNA singly or as a complex with other accessory factors. (Reviewed in Lewin, B. (1990) Genes IV, Oxford University Press, New York NY, and Cell Press, Cambridge MA, pp. 554-570.) Many transcription factors incorporate DNA-binding structural motifs which comprise either α helices or β sheets that bind to the major groove of DNA. Four well-characterized structural motifs are helix-turn-helix, zinc finger, leucine zipper, and helix-loop-helix. Proteins containing these motifs may act alone as monomers, or they may form homo- or heterodimers that interact with DNA.
Many neoplastic disorders in humans can be attributed to inappropriate gene expression. Malignant cell growth may result from either excessive expression of tumor promoting genes or insufficient expression of tumor suppressor genes (Cleary, M.L. (1992) Cancer Surv. 15:89-104). Chromosomal translocations may also produce chimeric loci which fuse the coding sequence of one gene with the regulatory regions of a second unrelated gene. Such an arrangement likely results in inappropriate gene transcription, potentially contributing to malignancy.
In addition, the immune system responds to infection or trauma by activating a cascade of events that coordinate the progressive selection, amplification, and mobilization of cellular defense mechanisms. A complex and balanced program of gene activation and repression is involved in this process. However, hyperactivity of the immune system as a result of improper or insufficient regulation of gene expression may result in considerable tissue or organ damage. This damage is well documented in immunological responses associated with arthritis, allergens, heart attack, stroke, and infections (Isselbacher, K.J. et al. (1996) Harrison's Principles of Internal Medicine, 13/e, McGraw Hill, Inc. and Teton Data Systems Software).
Transcription of DNA into RNA also takes place in the nucleus, catalyzed by RNA polymerases. Three types of RNA polymerase exist. RNA polymerase I makes large ribosomal RNAs, while RNA polymerase III makes a variety of small, stable RNAs including 5S ribosomal RNA and the transfer RNAs (tRNA). RNA polymerase II transcribes genes that will be translated into proteins. The primary transcript of RNA polymerase II is called heterogeneous nuclear RNA (hnRNA), and must be further processed by splicing to remove non-coding sequences called introns. RNA splicing is mediated by small nuclear ribonucleoprotein complexes, or snRNPs, producing mature messenger RNA (mRNA) which is then transported out of the nucleus for translation into proteins.
Nucleolus
The nucleolus is a highly organized subcompartment in the nucleus that contains high concentrations of RNA and proteins and functions mainly in ribosomal RNA synthesis and assembly (Alberts, et al. supra, pp. 379-382). Ribosomal RNA (rRNA) is a structural RNA that is complexed with proteins to form ribonucleoprotein structures called ribosomes. Ribosomes provide the platform on which protein synthesis takes place. Ribosomes are assembled in the nucleolus initially from a large, 45S rRNA combined with a variety of proteins imported from the cytoplasm, as well as smaller, 5S rRNAs. Later processing of the immature ribosome results in formation of smaller ribosomal subunits which are transported from the nucleolus to the cytoplasm where they are assembled into functional ribosomes.
Endoplasmic Reticulum
In eukaryotes, proteins are synthesized within the endoplasmic reticulum (ER), delivered from the ER to the Golgi apparatus for post-translational processing and sorting, and transported from the Golgi to specific intracellular and extracellular destinations. Synthesis of integral membrane proteins, secreted proteins, and proteins destined for the lumen of a particular organelle occurs on the rough endoplasmic reticulum (ER).
The rough ER is so named because of the rough appearance in electron micrographs imparted by the attached ribosomes on which protein synthesis proceeds. Synthesis of proteins destined for the ER actually begins in the cytosol with the synthesis of a specific signal peptide which directs the growing polypeptide and its attached ribosome to the ER membrane where the signal peptide is removed and protein synthesis is completed. Soluble proteins destined for the ER lumen, for secretion, or for transport to the lumen of other organelles pass completely into the ER lumen. Transmembrane proteins destined for the ER or for other cell membranes are translocated across the ER membrane but remain anchored in the lipid bilayer of the membrane by one or more membrane-spanning α-helical regions.
Translocated polypeptide chains destined for other organelles or for secretion also fold and assemble in the ER lumen with the aid of certain "resident" ER proteins. Protein folding in the ER is aided by two principal types of protein isomerases, protein disulfide isomerase (PDI), and peptidyl-prolyl isomerase (PPI). PDI catalyzes the oxidation of free sulfhydryl groups in cysteine residues to form intramolecular disulfide bonds in proteins. PPI, an enzyme that catalyzes the isomerization of certain proline imide bonds in oligopeptides and proteins, is considered to govern one of the rate limiting steps in the folding of many proteins to their final functional conformation. The cyclophilins represent a major class of PPI that was originally identified as the major receptor for the immunosuppressive drug cyclosporin A (Handschumacher, R.E. et al. (1984) Science 226:544-547). Molecular "chaperones" such as BiP (binding protein) in the ER recognize incorrectly folded proteins as well as proteins not yet folded into their final form and bind to them, both to prevent improper aggregation between them, and to promote proper folding.
The "N-linked" glycosylation of most soluble secreted and membrane-bound proteins by oligosaccharides linked to asparagine residues in proteins is also performed in the ER. This reaction is catalyzed by a membrane-bound enzyme, oligosaccharyl transferase.
Golgi Apparatus
The Golgi apparatus is a complex structure that lies adjacent to the ER in eukaryotic cells and serves primarily as a sorting and dispatching station for products of the ER (Alberts, et al. supra, pp. 600-610). Additional posttranslational processing, principally additional glycosylation, also occurs in the Golgi. Indeed, the Golgi is a major site of carbohydrate synthesis, including most of the glycosaminoglycans of the extracellular matrix. N-linked oligosaccharides, added to proteins in the ER, are also further modified in the Golgi by the addition of more sugar residues to form complex N-linked oligosaccharides. "O-linked" glycosylation of proteins also occurs in the Golgi by the addition of N-acetylgalactosamine to the hydroxyl group of a serine or threonine residue followed by the sequential addition of other sugar residues to the first. This process is catalyzed by a series of glycosyltransferases each specific for a particular donor sugar nucleotide and acceptor molecule (Lodish, H. et al. (1995) Molecular Cell Biology, W.H. Freeman and Co., New York NY, pp. 700-708). In many cases, both N- and O-linked oligosaccharides appear to be required for the secretion of proteins or the movement of plasma membrane glycoproteins to the cell surface.
The terminal compartment of the Golgi is the Trans-Golgi Network (TGN), where both membrane and lumenal proteins are sorted for their final destination. Transport (or secretory) vesicles destined for intracellular compartments, such as lysosomes, bud off of the TGN. Other transport vesicles bud off containing proteins destined for the plasma membrane, such as receptors, adhesion molecules, and ion channels, and secretory proteins, such as hormones, neurotransmitters, and digestive enzymes.
Vacuoles
The vacuole system is a collection of membrane bound compartments in eukaryotic cells that functions in the processes of endocytosis and exocytosis. They include phagosomes, lysosomes, endosomes, and secretory vesicles. Endocytosis is the process in cells of internalizing nutrients, solutes or small particles (pinocytosis) or large particles such as internalized receptors, viruses, bacteria, or bacterial toxins (phagocytosis). Exocytosis is the process of transporting molecules to the cell surface. It facilitates placement or localization of membrane-bound receptors or other membrane proteins and secretion of hormones, neurotransmitters, digestive enzymes, wastes, etc.
A common property of all of these vacuoles is an acidic environment of approximately pH 4.5-5.0. This acidity is maintained by the presence of a proton ATPase that uses the energy of ATP hydrolysis to generate an electrochemical proton gradient across a membrane (Mellman, I. et al. (1986) Annu. Rev. Biochem. 55:663-700). Eukaryotic vacuolar proton ATPase (vp-ATPase) is a multimeric enzyme composed of 3-10 different subunits. One of these subunits is a highly hydrophobic polypeptide of approximately 16 kDa that is similar to the proteolipid component of vp-ATPases from eubacteria, fungi, and plant vacuoles (Mandel, M. et al. (1988) Proc. Natl. Acad. Sci. USA 85:5521-5524). The 16 kDa proteolipid component is the major subunit of the membrane portion of vp-ATPase and functions in the transport of protons across the membrane.
Lysosomes
Lysosomes are membranous vesicles containing various hydrolytic enzymes used for the controlled intracellular digestion of macromolecules. Lysosomes contain some 40 types of enzymes including proteases, nucleases, glycosidases, lipases, phospholipases, phosphatases, and sulfatases, all of which are acid hydrolases that function at a pH of about 5. Lysosomes are surrounded by a unique membrane containing transport proteins that allow the final products of macromolecule degradation, such as sugars, amino acids, and nucleotides, to be transported to the cytosol where they may be either excreted or reutilized by the cell. A vp-ATPase, such as that described above, maintains the acidic environment necessary for hydrolytic activity (Alberts, supra, pp. 610-611).
Endosomes
Endosomes are another type of acidic vacuole that is used to transport substances from the cell surface to the interior of the cell in the process of endocytosis. Like lysosomes, endosomes have an acidic environment provided by a vp-ATPase (Alberts et al. supra, pp. 610-618). Two types of endosomes are apparent based on tracer uptake studies that distinguish their time of formation in the cell and their cellular location. Early endosomes are found near the plasma membrane and appear to function primarily in the recycling of internalized receptors back to the cell surface. Late endosomes appear later in the endocytic process close to the Golgi apparatus and the nucleus, and appear to be associated with delivery of endocytosed material to lysosomes or to the TGN where they may be recycled. Specific proteins are associated with particular transport vesicles and their target compartments that may provide selectivity in targeting vesicles to their proper compartments. A cytosolic prenylated GTP-binding protein, Rab, is one such protein. Rabs 4, 5, and 11 are associated with the early endosome, whereas Rabs 7 and 9 associate with the late endosome.
Mitochondria
Mitochondria are oval-shaped organelles comprising an outer membrane, a tightly folded inner membrane, an intermembrane space between the outer and inner membranes, and a matrix inside the inner membrane. The outer membrane contains many porin molecules that allow ions and charged molecules to enter the intermembrane space, while the inner membrane contains a variety of transport proteins that transfer only selected molecules. Mitochondria are the primary sites of energy production in cells.
Energy is produced by the oxidation of glucose and fatty acids. Glucose is initially converted to pyruvate in the cytoplasm. Fatty acids and pyruvate are transported to the mitochondria for complete oxidation to CO2 coupled by enzymes to the transport of electrons from NADH and FADH2 to oxygen and to the synthesis of ATP (oxidative phosphorylation) from ADP and Pi. Pyruvate is transported into the mitochondria and converted to acetyl-CoA for oxidation via the citric acid cycle, involving pyruvate dehydrogenase components, dihydrolipoyl transacetylase, and dihydrolipoyl dehydrogenase. Enzymes involved in the citric acid cycle include: citrate synthetase, aconitases, isocitrate dehydrogenase, alpha-ketoglutarate dehydrogenase complex including transsuccinylases, succinyl CoA synthetase, succinate dehydrogenase, fumarases, and malate dehydrogenase. Acetyl CoA is oxidized to CO2 with concomitant formation of NADH, FADH2, and GTP. In oxidative phosphorylation, the transfer of electrons from NADH and FADH2 to oxygen by dehydrogenases is coupled to the synthesis of ATP from ADP and Pi by the F0F1 ATPase complex in the mitochondrial inner membrane. Enzyme complexes responsible for electron transport and ATP synthesis include the F0F1 ATPase complex, ubiquinone(CoQ)-cytochrome c reductase, ubiquinone reductase, cytochrome b, cytochrome c1, FeS protein, and cytochrome c oxidase.
Peroxisomes
Peroxisomes, like mitochondria, are a major site of oxygen utilization. They contain one or more enzymes, such as catalase and urate oxidase, that use molecular oxygen to remove hydrogen atoms from specific organic substrates in an oxidative reaction that produces hydrogen peroxide (Alberts, supra, pp. 574-577). Catalase oxidizes a variety of substrates including phenols, formic acid, formaldehyde, and alcohol and is important in peroxisomes of liver and kidney cells for detoxifying various toxic molecules that enter the bloodstream. Another major function of oxidative reactions in peroxisomes is the breakdown of fatty acids in a process called β oxidation. β oxidation results in shortening of the alkyl chain of fatty acids by blocks of two carbon atoms that are converted to acetyl CoA and exported to the cytosol for reuse in biosynthetic reactions.
Also like mitochondria, peroxisomes import their proteins from the cytosol using a specific signal sequence located near the C-terminus of the protein. The importance of this import process is evident in the inherited human disease Zellweger syndrome, in which a defect in importing proteins into peroxisomes leads to a peroxisomal deficiency resulting in severe abnormalities in the brain, liver, and kidneys, and death soon after birth. One form of this disease has been shown to be due to a mutation in the gene encoding a peroxisomal integral membrane protein called peroxisome assembly factor-1.
The discovery of new human molecules satisfies a need in the art by providing new compositions which are useful in the diagnosis, study, prevention, and treatment of diseases associated with, as well as effects of exogenous compounds on, the expression of human molecules.
SUMMARY OF THE INVENTION
The present invention relates to nucleic acid sequences comprising human diagnostic and therapeutic polynucleotides (dithp) as presented in the Sequence Listing. The dithp uniquely identify genes encoding human structural, functional, and regulatory molecules.
The invention provides an isolated polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d). In one alternative, the polynucleotide comprises a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275. In another alternative, the polynucleotide comprises at least 30 contiguous nucleotides of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; b) a polynucleotide comprising a naturally occurring polynucleotide comprising a polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d).
In another alternative, the polynucleotide comprises at least 60 contiguous nucleotides of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; b) a polynucleotide comprising a naturally occurring polynucleotide comprising a polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d). The invention further provides a composition for the detection of expression of human diagnostic and therapeutic polynucleotides comprising at least one isolated polynucleotide comprising a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d); and a detectable label.
The invention also provides a method for detecting a target polynucleotide in a sample, said target polynucleotide having a polynucleotide sequence of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence of a polynucleotide selected from the group consisting of SEQ ID NO: 1-275; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d). The method comprises a) amplifying said target polynucleotide or fragment thereof using polymerase chain reaction amplification, and b) detecting the presence or absence of said amplified target polynucleotide or fragment thereof, and, optionally, if present, the amount thereof.
The invention also provides a method for detecting a target polynucleotide in a sample, said target polynucleotide having a polynucleotide sequence of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d). The method comprises a) hybridizing the sample with a probe comprising at least 20 contiguous nucleotides comprising a sequence complementary to said target polynucleotide in the sample, and which probe specifically hybridizes to said target polynucleotide, under conditions whereby a hybridization complex is formed between said probe and said target polynucleotide, and b) detecting the presence or absence of said hybridization complex, and, optionally, if present, the amount thereof. In one alternative, the invention provides a composition comprising a target polynucleotide of the method, wherein said probe comprises at least 30 contiguous nucleotides. In one alternative, the invention provides a composition comprising a target polynucleotide of the method, wherein said probe comprises at least 60 contiguous nucleotides.
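The probe criterion recited above (a probe comprising at least 20, 30, or 60 contiguous nucleotides complementary to the target polynucleotide) may be illustrated computationally. The following Python sketch is offered by way of example only and does not form part of the disclosed methods: the function names, the example sequences, and the exact-match rule are assumptions made for illustration, and actual hybridization depends on experimental conditions and may tolerate mismatches.

```python
# Illustrative sketch only; all names and sequences below are hypothetical.
_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the complementary DNA strand, written 5'->3'."""
    return "".join(_PAIR[base] for base in reversed(seq.upper()))


def has_complementary_run(probe: str, target: str, min_len: int = 20) -> bool:
    """True if the probe contains at least `min_len` contiguous nucleotides
    that are exactly complementary to some stretch of the target."""
    probe = probe.upper()
    rc_target = reverse_complement(target)
    # A probe shorter than min_len yields an empty range, hence False.
    windows = range(len(probe) - min_len + 1)
    return any(probe[i:i + min_len] in rc_target for i in windows)


target = "ATGGCGTACCTTGACGATCGATCGGCTA"    # hypothetical 28-nt target
probe = reverse_complement(target[4:26])   # hypothetical 22-nt probe

print(has_complementary_run(probe, target, 20))  # True
print(has_complementary_run(probe, target, 30))  # False: probe is only 22 nt
```

Exact string matching is used here purely for simplicity; a sketch that models mismatch tolerance or melting temperature would require additional assumptions not stated in the text.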
The invention further provides a recombinant polynucleotide comprising a promoter sequence operably linked to an isolated polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d). In one alternative, the invention provides a cell transformed with the recombinant polynucleotide. In another alternative, the invention provides a transgenic organism comprising the recombinant polynucleotide.
The invention also provides a method for producing a human diagnostic and therapeutic polypeptide, the method comprising a) culturing a cell under conditions suitable for expression of the human diagnostic and therapeutic polypeptide, wherein said cell is transformed with a recombinant polynucleotide, said recombinant polynucleotide comprising an isolated polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; iii) a polynucleotide complementary to the polynucleotide of i); iv) a polynucleotide complementary to the polynucleotide of ii); and v) an RNA equivalent of i) through iv), and b) recovering the human diagnostic and therapeutic polypeptide so expressed. The invention additionally provides a method wherein the polypeptide has an amino acid sequence selected from the group consisting of SEQ ID NO:276-553.
The invention also provides an isolated human diagnostic and therapeutic polypeptide (DITHP) encoded by at least one polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275. The invention further provides a method of screening for a test compound that specifically binds to the polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553. The method comprises a) combining the polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553 with at least one test compound under suitable conditions, and b) detecting binding of the polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553 to the test compound, thereby identifying a compound that specifically binds to the polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553.
The invention further provides a microarray wherein at least one element of the microarray is an isolated polynucleotide comprising at least 30 contiguous nucleotides of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d). The invention also provides a method for generating a transcript image of a sample which contains polynucleotides. The method comprises a) labeling the polynucleotides of the sample, b) contacting the elements of the microarray with the labeled polynucleotides of the sample under conditions suitable for the formation of a hybridization complex, and c) quantifying the expression of the polynucleotides in the sample.
Additionally, the invention provides a method for screening a compound for effectiveness in altering expression of a target polynucleotide, wherein said target polynucleotide comprises a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b) ; and e) an RNA equivalent of a) through d). The method comprises a) exposing a sample comprising the target polynucleotide to a compound, b) detecting altered expression of the target polynucleotide, and c) comparing the expression of the target polynucleotide in the presence of varying amounts of the compound and in the absence of the compound.
The invention further provides a method for assessing toxicity of a test compound, said method comprising a) treating a biological sample containing nucleic acids with the test compound; b) hybridizing the nucleic acids of the treated biological sample with a probe comprising at least 20 contiguous nucleotides of a polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; iii) a polynucleotide complementary to the polynucleotide of i); iv) a polynucleotide complementary to the polynucleotide of ii); and v) an RNA equivalent of i) through iv). Hybridization occurs under conditions whereby a specific hybridization complex is formed between said probe and a target polynucleotide in the biological sample, said target polynucleotide comprising a polynucleotide sequence of a polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275; iii) a polynucleotide complementary to the polynucleotide of i); iv) a polynucleotide complementary to the polynucleotide of ii); and v) an RNA equivalent of i) through iv), and alternatively, the target polynucleotide comprises a polynucleotide sequence of a fragment of a polynucleotide selected from the group consisting of i-v above; c) quantifying the amount of hybridization complex; and d) comparing the amount of hybridization complex in the treated biological sample with the amount of hybridization complex in an untreated biological sample, wherein a difference in the amount of hybridization complex in the treated biological sample is indicative of toxicity of the test compound.
The invention further provides an isolated polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553. In one alternative, the invention provides an isolated polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:276-553.
The invention further provides an isolated polynucleotide encoding a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553. In one alternative, the polynucleotide encodes a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:276-553. In another alternative, the polynucleotide comprises a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-275.
Additionally, the invention provides an isolated antibody which specifically binds to a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553.
The invention further provides a composition comprising a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, and a pharmaceutically acceptable excipient. In one embodiment, the composition comprises a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553. The invention additionally provides a method of treating a disease or condition associated with decreased expression of functional DITHP, comprising administering to a patient in need of such treatment the composition.
The invention also provides a method for screening a compound for effectiveness as an agonist of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553. The method comprises a) exposing a sample comprising the polypeptide to a compound, and b) detecting agonist activity in the sample. In one alternative, the invention provides a composition comprising an agonist compound identified by the method and a pharmaceutically acceptable excipient. In another alternative, the invention provides a method of treating a disease or condition associated with decreased expression of functional DITHP, comprising administering to a patient in need of such treatment the composition.
Additionally, the invention provides a method for screening a compound for effectiveness as an antagonist of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553. The method comprises a) exposing a sample comprising the polypeptide to a compound, and b) detecting antagonist activity in the sample. In one alternative, the invention provides a composition comprising an antagonist compound identified by the method and a pharmaceutically acceptable excipient. In another alternative, the invention provides a method of treating a disease or condition associated with overexpression of functional DITHP, comprising administering to a patient in need of such treatment the composition.
The invention further provides a method of screening for a compound that modulates the activity of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553. The method comprises a) combining the polypeptide with at least one test compound under conditions permissive for the activity of the polypeptide, b) assessing the activity of the polypeptide in the presence of the test compound, and c) comparing the activity of the polypeptide in the presence of the test compound with the activity of the polypeptide in the absence of the test compound, wherein a change in the activity of the polypeptide in the presence of the test compound is indicative of a compound that modulates the activity of the polypeptide.
DESCRIPTION OF THE TABLES
Table 1 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with the sequence identification numbers (SEQ ID NO:s) and open reading frame identification numbers (ORF IDs) corresponding to polypeptides encoded by the template ID.
Table 2 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with their GenBank hits (GI Numbers), probability scores, and functional annotations corresponding to the GenBank hits.
Table 3 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with polynucleotide segments of each template sequence as defined by the indicated "start" and "stop" nucleotide positions. The reading frames of the polynucleotide segments and the Pfam hits, Pfam descriptions, and E-values corresponding to the polypeptide domains encoded by the polynucleotide segments are indicated.
Table 4 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with polynucleotide segments of each template sequence as defined by the indicated "start" and "stop" nucleotide positions. The reading frames of the polynucleotide segments are shown, and the polypeptides encoded by the polynucleotide segments constitute either signal peptide (SP) or transmembrane (TM) domains, as indicated. The membrane topology of the encoded polypeptide sequence is indicated, the N-terminus (N) listed as being oriented to either the cytosolic (N in) or non-cytosolic (N out) side of the cell membrane or organelle.
Table 5 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with component sequence identification numbers (component IDs) corresponding to each template. The component sequences, which were used to assemble the template sequences, are defined by the indicated "start" and "stop" nucleotide positions along each template.
Table 6 shows the tissue distribution profiles for the templates of the invention.
Table 7 shows the sequence identification numbers (SEQ ID NO:s) corresponding to the polypeptides of the present invention, along with the reading frames used to obtain the polypeptide segments, the lengths of the polypeptide segments, the "start" and "stop" nucleotide positions of the polynucleotide sequences used to define the encoded polypeptide segments, the GenBank hits (GI Numbers), probability scores, and functional annotations corresponding to the GenBank hits.
Table 8 summarizes the bioinformatics tools which are useful for analysis of the polynucleotides of the present invention. The first column of Table 8 lists analytical tools, programs, and algorithms, the second column provides brief descriptions thereof, the third column presents appropriate references, all of which are incorporated by reference herein in their entirety, and the fourth column presents, where applicable, the scores, probability values, and other parameters used to evaluate the strength of a match between two sequences (the higher the score, the greater the homology between two sequences).
DETAILED DESCRIPTION OF THE INVENTION
Before the nucleic acid sequences and methods are presented, it is to be understood that this invention is not limited to the particular machines, methods, and materials described. Although particular embodiments are described, machines, methods, and materials similar or equivalent to these embodiments may be used to practice the invention. The preferred machines, methods, and materials set forth are not intended to limit the scope of the invention which is limited only by the appended claims.
The singular forms "a", "an", and "the" include plural reference unless the context clearly dictates otherwise. All technical and scientific terms have the meanings commonly understood by one of ordinary skill in the art. All publications are incorporated by reference for the purpose of describing and disclosing the cell lines, vectors, and methodologies which are presented and which might be used in connection with the invention. Nothing in the specification is to be construed as an admission that the invention is not entitled to antedate such disclosure by virtue of prior invention.
Definitions
As used herein, the lower case "dithp" refers to a nucleic acid sequence, while the upper case "DITHP" refers to an amino acid sequence encoded by dithp. A "full-length" dithp refers to a nucleic acid sequence containing the entire coding region of a gene endogenously expressed in human tissue.
"Adjuvants" are materials such as Freund's adjuvant, mineral gels (aluminum hydroxide), and surface active substances (lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanin, and dinitrophenol) which may be administered to increase a host's immunological response.
"Allele" refers to an alternative form of a nucleic acid sequence. Alleles result from a "mutation," a change or an alternative reading of the genetic code. Any given gene may have none, one, or many allelic forms. Mutations which give rise to alleles include deletions, additions, or substitutions of nucleotides. Each of these changes may occur alone, or in combination with the others, one or more times in a given nucleic acid sequence. The present invention encompasses allelic dithp.
"Amino acid sequence" refers to a peptide, a polypeptide, or a protein of either natural or synthetic origin. The amino acid sequence is not limited to the complete, endogenous amino acid sequence and may be a fragment, epitope, variant, or derivative of a protein expressed by a nucleic acid sequence.
"Amplification" refers to the production of additional copies of a sequence and is carried out using polymerase chain reaction (PCR) technologies well known in the art. "Antibody" refers to intact molecules as well as to fragments thereof, such as Fab, F(ab')2, and Fv fragments, which are capable of binding the epitopic determinant. Antibodies that bind DITHP polypeptides can be prepared using intact polypeptides or using fragments containing small peptides of interest as the immunizing antigen. The polypeptide or peptide used to immunize an animal (e.g., a mouse, a rat, or a rabbit) can be derived from the translation of RNA, or synthesized chemically, and can be conjugated to a carrier protein if desired. Commonly used carriers that are chemically coupled to peptides include bovine serum albumin, thyroglobulin, and keyhole limpet hemocyanin (KLH). The coupled peptide is then used to immunize the animal.
"Antisense sequence" refers to a sequence capable of specifically hybridizing to a target sequence. The antisense sequence may include DNA, RNA, or any nucleic acid mimic or analog such as peptide nucleic acid (PNA); oligonucleotides having modified backbone linkages such as phosphorothioates, methylphosphonates, or benzylphosphonates; oligonucleotides having modified sugar groups such as 2'-methoxyethyl sugars or 2'-methoxyethoxy sugars; or oligonucleotides having modified bases such as 5-methyl cytosine, 2'-deoxyuracil, or 7-deaza-2'-deoxyguanosine.
"Antisense sequence" refers to a sequence capable of specifically hybridizing to a target sequence. The antisense sequence can be DNA, RNA, or any nucleic acid mimic or analog.
"Antisense technology" refers to any technology which relies on the specific hybridization of an antisense sequence to a target sequence.
A "bin" is a portion of computer memory space used by a computer program for storage of data, and bounded in such a manner that data stored in a bin may be retrieved by the program.
"Biologically active" refers to an amino acid sequence having a structural, regulatory, or biochemical function of a naturally occurring amino acid sequence.
"Clone joining" is a process for combining gene bins based upon the bins' containing sequence information from the same clone. The sequences may assemble into a primary gene transcript as well as one or more splice variants. "Complementary" describes the relationship between two single-stranded nucleic acid sequences that anneal by base-pairing (5'-A-G-T-3' pairs with its complement 3'-T-C-A-5'). A "component sequence" is a nucleic acid sequence selected by a computer program such as PHRED and used to assemble a consensus or template sequence from one or more component sequences.
A "consensus sequence" or "template sequence" is a nucleic acid sequence which has been assembled from overlapping sequences, using a computer program for fragment assembly such as the GELVJJBW fragment assembly system (Genetics Computer Group (GCG), Madison WI) or using a relational database management system (RDMS).
"Conservative amino acid substitutions" are those substitutions that, when made, least interfere with the properties of the original protein, i.e., the structure and especially the function of the protein is conserved and not significantly changed by such substitutions. The table below shows amino acids which may be substituted for an original amino acid in a protein and which are regarded as conservative substitutions.
Original Residue    Conservative Substitution
Ala                 Gly, Ser
Arg                 His, Lys
Asn                 Asp, Gln, His
Asp                 Asn, Glu
Cys                 Ala, Ser
Gln                 Asn, Glu, His
Glu                 Asp, Gln, His
Gly                 Ala
His                 Asn, Arg, Gln, Glu
Ile                 Leu, Val
Leu                 Ile, Val
Lys                 Arg, Gln, Glu
Met                 Leu, Ile
Phe                 His, Met, Leu, Trp, Tyr
Ser                 Cys, Thr
Thr                 Ser, Val
Trp                 Phe, Tyr
Tyr                 His, Phe, Trp
Val                 Ile, Leu, Thr
Conservative substitutions generally maintain (a) the structure of the polypeptide backbone in the area of the substitution, for example, as a beta sheet or alpha helical conformation, (b) the charge or hydrophobicity of the molecule at the target site, or (c) the bulk of the side chain.
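As a non-limiting sketch, the table of conservative substitutions above can be held as a simple lookup. The residue names are read here as the standard three-letter codes (for example Gln, Ile, Thr); the dictionary and function names are hypothetical.

```python
# Original residue -> substitutions regarded as conservative, as read from
# the table above. The mapping is directional: it is looked up by the
# ORIGINAL residue and is not assumed to be symmetric.
CONSERVATIVE = {
    "Ala": {"Gly", "Ser"},
    "Arg": {"His", "Lys"},
    "Asn": {"Asp", "Gln", "His"},
    "Asp": {"Asn", "Glu"},
    "Cys": {"Ala", "Ser"},
    "Gln": {"Asn", "Glu", "His"},
    "Glu": {"Asp", "Gln", "His"},
    "Gly": {"Ala"},
    "His": {"Asn", "Arg", "Gln", "Glu"},
    "Ile": {"Leu", "Val"},
    "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln", "Glu"},
    "Met": {"Leu", "Ile"},
    "Phe": {"His", "Met", "Leu", "Trp", "Tyr"},
    "Ser": {"Cys", "Thr"},
    "Thr": {"Ser", "Val"},
    "Trp": {"Phe", "Tyr"},
    "Tyr": {"His", "Phe", "Trp"},
    "Val": {"Ile", "Leu", "Thr"},
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if replacing `original` with `replacement` is listed as conservative."""
    return replacement in CONSERVATIVE.get(original, set())


print(is_conservative("Phe", "His"))  # True
print(is_conservative("His", "Phe"))  # False: Phe is not listed for His
```

Keeping the lookup directional preserves the table as written, in which some substitutions are listed for one residue of a pair but not the other.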
"Deletion" refers to a change in either a nucleic or amino acid sequence in which at least one nucleotide or amino acid residue, respectively, is absent.
"Derivative" refers to the chemical modification of a nucleic acid sequence, such as by replacement of hydrogen by an alkyl, acyl, amino, hydroxyl, or other group. "Differential expression" refers to increased or upregulated; or decreased, downregulated, or absent gene or protein expression, determined by comparing at least two different samples. Such comparisons may be carried out between, for example, a treated and an untreated sample, or a diseased and a normal sample. 5 The terms "element" and "array element" refer to a polynucleotide, polypeptide, or other chemical compound having a unique and defined position on a microarray.
"E-value" refers to the statistical probability that a match between two sequences occurred by chance.
"Exon shuffling" refers to the recombination of different coding regions (exons). Since an 0 exon may represent a structural or functional domain of the encoded protein, new proteins may be assembled through the novel reassortment of stable substructures, thus allowing acceleration of the evolution of new protein functions.
A "fragment" is a unique portion of dithp or DITHP which is identical in sequence to but shorter in length than the parent sequence. A fragment may comprise up to the entire length of the 5 defined sequence, minus one nucleotide/amino acid residue. For example, a fragment may comprise from 10 to 1000 contiguous amino acid residues or nucleotides. A fragment used as a probe, primer, antigen, therapeutic molecule, or for other purposes, may be at least 5, 10, 15, 16, 20, 25, 30, 40, 50, 60, 75, 100, 150, 250 or at least 500 contiguous amino acid residues or nucleotides in length. Fragments may be preferentially selected from certain regions of a molecule. For example, a o polypeptide fragment may comprise a certain length of contiguous amino acids selected from the first
250 or 500 amino acids (or first 25% or 50%) of a polypeptide as shown in a certain defined sequence. Clearly these lengths are exemplary, and any length that is supported by the specification, including the Sequence Listing and the figures, may be encompassed by the present embodiments. A fragment of dithp comprises a region of unique polynucleotide sequence that specifically 5 identifies dithp, for example, as distinct from any other sequence in the same genome. A fragment of dithp is useful, for example, in hybridization and amplification technologies and in analogous methods that distinguish dithp from related polynucleotide sequences. The precise length of a fragment of dithp and the region of dithp to which the fragment corresponds are routinely determinable by one of ordinary skill in the art based on the intended purpose for the fragment. 0 A fragment of DITHP is encoded by a fragment of dithp. A fragment of DITHP comprises a region of unique amino acid sequence that specifically identifies DITHP. For example, a fragment of DΠΉP is useful as an immunogenic peptide for the development of antibodies that specifically recognize DITHP. The precise length of a fragment of DITHP and the region of DITHP to which the fragment corresponds are routinely determinable by one of ordinary skill in the art based on the 5 intended purpose for the fragment. A "full length" nucleotide sequence is one containing at least a start site for translation to a protein sequence, followed by an open reading frame and a stop site, and encoding a "full length" polypeptide.
"Hit" refers to a sequence whose annotation will be used to describe a given template. Criteria for selecting the top hit are as follows: if the template has one or more exact nucleic acid matches, the top hit is the exact match with highest percent identity. If the template has no exact matches but has significant protein hits, the top hit is the protein hit with the lowest E-value. If the template has no significant protein hits, but does have significant non-exact nucleotide hits, the top hit is the nucleotide hit with the lowest E-value. "Homology" refers to sequence similarity either between a reference nucleic acid sequence and at least a fragment of a dithp or between a reference amino acid sequence and a fragment of a DITHP.
"Hybridization" refers to the process by which a strand of nucleotides anneals with a complementary strand through base pairing. Specific hybridization is an indication that two nucleic acid sequences share a high degree of identity. Specific hybridization complexes form under defined annealing conditions, and remain hybridized after the "washing" step. The defined hybridization conditions include the annealing conditions and the washing step(s), the latter of which is particularly important in determining the stringency of the hybridization process, with more stringent conditions allowing less non-specific binding, i.e., binding between pairs of nucleic acid probes that are not perfectly matched. Permissive conditions for annealing of nucleic acid sequences are routinely determinable and may be consistent among hybridization experiments, whereas wash conditions may be varied among experiments to achieve the desired stringency.
Generally, stringency of hybridization is expressed with reference to the temperature under which the wash step is carried out. Generally, such wash temperatures are selected to be about 5°C to 20°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. An equation for calculating Tm and conditions for nucleic acid hybridization is well known and can be found in Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold Spring Harbor Press, Plainview NY; specifically see volume 2, chapter 9.
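For illustration, the relationship between the Tm and the wash temperature described above can be expressed as a small calculation. The Tm is taken as an input, since the text refers to Sambrook et al. for the equation; the function name and parameters are hypothetical.

```python
from typing import Tuple


def wash_temperature_window(tm_celsius: float,
                            min_offset: float = 5.0,
                            max_offset: float = 20.0) -> Tuple[float, float]:
    """Return (lowest, highest) wash temperature in degrees C for a given Tm.

    The default offsets follow the "about 5 to 20 degrees C lower than Tm"
    range given in the text.
    """
    if min_offset > max_offset:
        raise ValueError("min_offset must not exceed max_offset")
    return (tm_celsius - max_offset, tm_celsius - min_offset)


low, high = wash_temperature_window(85.0)
print(low, high)  # 65.0 80.0
```

Within this window, wash temperatures closer to the Tm correspond to more stringent conditions.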
High stringency conditions for hybridization between polynucleotides of the present invention include wash conditions of 68°C in the presence of about 0.2 x SSC and about 0.1% SDS, for 1 hour. Alternatively, temperatures of about 65°C, 60°C, or 55°C may be used. SSC concentration may be varied from about 0.2 to 2 x SSC, with SDS being present at about 0.1%. Typically, blocking reagents are used to block non-specific hybridization. Such blocking reagents include, for instance, denatured salmon sperm DNA at about 100-200 μg/ml. Useful variations on these conditions will be readily apparent to those skilled in the art. Hybridization, particularly under high stringency conditions, may be suggestive of evolutionary similarity between the nucleotides. Such similarity is strongly indicative of a similar role for the nucleotides and their resultant proteins.
Other parameters, such as temperature, salt concentration, and detergent concentration may be varied to achieve the desired stringency. Denaturants, such as formamide at a concentration of about 35-50% v/v, may also be used under particular circumstances, such as RNA:DNA hybridizations. Appropriate hybridization conditions are routinely determinable by one of ordinary skill in the art.
"Immunologically active" or "immunogenic" describes the potential for a natural, recombinant, or synthetic peptide, epitope, polypeptide, or protein to induce antibody production in appropriate animals, cells, or cell lines.
"Insertion" or "addition" refers to a change in either a nucleic or amino acid sequence in which at least one nucleotide or residue, respectively, is added to the sequence.
"Labeling" refers to the covalent or noncovalent joining of a polynucleotide, polypeptide, or antibody with a reporter molecule capable of producing a detectable or measurable signal.
"Microarray" is any arrangement of nucleic acids, amino acids, antibodies, etc., on a substrate. The substrate may be a solid support such as beads, glass, paper, nitrocellulose, nylon, or an appropriate membrane.
"Linkers" are short stretches of nucleotide sequence which may be added to a vector or a dithp to create restriction endonuclease sites to facilitate cloning. "Polylmkers" are engineered to incorporate multiple restriction enzyme sites and to provide for the use of enzymes which leave 5' or 3' overhangs (e.g., BamHI, EcoRI, and HindHT) and those which provide blunt ends (e.g., EcoRV, SnaBL and Stul).
"Naturally occurring" refers to an endogenous polynucleotide or polypeptide that may be isolated from viruses or prokaryotic or eukaryotic cells.
"Nucleic acid sequence" refers to the specific order of nucleotides joined by phosphodiester bonds in a linear, polymeric arrangement. Depending on the number of nucleotides, the nucleic acid sequence can be considered an oligomer, oligonucleotide, or polynucleotide. The nucleic acid can be DNA, RNA, or any nucleic acid analog, such as PNA, may be of genomic or synthetic origin, may be either double-stranded or single-stranded, and can represent either the sense or antisense (complementary) strand.
"Oligomer" refers to a nucleic acid sequence of at least about 6 nucleotides and as many as about 60 nucleotides, preferably about 15 to 40 nucleotides, and most preferably between about 20 and 30 nucleotides, that may be used in hybridization or amplification technologies. Oligomers may be used as, e.g., primers for PCR, and are usually chemically synthesized. "Operably linked" refers to the situation in which a first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence. For instance, a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence. Generally, operably linked DNA sequences may be in close proximity or contiguous and, where necessary to join two protein coding regions, in the same reading frame.
"Peptide nucleic acid" (PNA) refers to a DNA mimic in which nucleotide bases are attached to a pseudopeptide backbone to increase stability. PNAs, also designated antigene agents, can prevent gene expression by targeting complementary messenger RNA.
The phrases "percent identity" and "% identity", as applied to polynucleotide sequences, refer to the percentage of residue matches between at least two polynucleotide sequences aligned using a standardized algorithm. Such an algorithm may insert, in a standardized and reproducible way, gaps in the sequences being compared in order to optimize alignment between two sequences, and therefore achieve a more meaningful comparison of the two sequences.
Percent identity between polynucleotide sequences may be determined using the default parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e sequence alignment program. This program is part of the LASERGENE software package, a suite of molecular biological analysis programs (DNASTAR, Madison WI). CLUSTAL V is described in Higgins, D.G. and Sharp, P.M. (1989) CABIOS 5:151-153 and in Higgins, D.G. et al. (1992) CABIOS 8:189-191. For pairwise alignments of polynucleotide sequences, the default parameters are set as follows: Ktuple=2, gap penalty=5, window=4, and "diagonals saved"=4. The "weighted" residue weight table is selected as the default. Percent identity is reported by CLUSTAL V as the "percent similarity" between aligned polynucleotide sequence pairs.
Alternatively, a suite of commonly used and freely available sequence comparison algorithms is provided by the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST) (Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410), which is available from several sources, including the NCBI, Bethesda, MD, and on the Internet at http://www.ncbi.nlm.nih.gov/BLAST/. The BLAST software suite includes various sequence analysis programs including "blastn," that is used to determine alignment between a known polynucleotide sequence and other sequences on a variety of databases. Also available is a tool called "BLAST 2 Sequences" that is used for direct pairwise comparison of two nucleotide sequences. "BLAST 2 Sequences" can be accessed and used interactively at http://www.ncbi.nlm.nih.gov/gorf/bl2/. The "BLAST 2 Sequences" tool can be used for both blastn and blastp (discussed below). BLAST programs are commonly used with gap and other parameters set to default settings. For example, to compare two nucleotide sequences, one may use blastn with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) set at default parameters. Such default parameters may be, for example:
Matrix: BLOSUM62
Reward for match: 1
Penalty for mismatch: -2
Open Gap: 5 and Extension Gap: 2 penalties
Gap x drop-off: 50
Expect: 10
Word Size: 11
Filter: on
Percent identity may be measured over the length of an entire defined sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined sequence, for instance, a fragment of at least 20, at least 30, at least 40, at least 50, at least 70, at least 100, or at least 200 contiguous nucleotides. Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in figures or Sequence Listings, may be used to describe a length over which percentage identity may be measured.
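As a minimal sketch of the "percentage of residue matches" between two aligned sequences, the following assumes that the alignment (including any gaps, written as "-") has already been produced by a standardized algorithm such as those named above. The choice of denominator, here the number of aligned columns, is an assumption of this sketch; alignment programs differ on this point, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


print(percent_identity("ACGT", "ACGA"))    # 75.0
print(percent_identity("AC-GT", "ACTGT"))  # 80.0
```

To measure identity over a fragment rather than an entire defined sequence, the same function may be applied to the corresponding slice of the alignment.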
Nucleic acid sequences that do not show a high degree of identity may nevertheless encode similar amino acid sequences due to the degeneracy of the genetic code. It is understood that changes in nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid sequences that all encode substantially the same protein. The phrases "percent identity" and "% identity", as applied to polypeptide sequences, refer to the percentage of residue matches between at least two polypeptide sequences aligned using a standardized algorithm. Methods of polypeptide sequence alignment are well-known. Some alignment methods take into account conservative amino acid substitutions. Such conservative substitutions, explained in more detail above, generally preserve the hydrophobicity and acidity of the substituted residue, thus preserving the structure (and therefore function) of the folded polypeptide.
Percent identity between polypeptide sequences may be determined using the default parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e sequence alignment program (described and referenced above). For pairwise alignments of polypeptide sequences using CLUSTAL V, the default parameters are set as follows: Ktuple=1, gap penalty=3, window=5, and "diagonals saved"=5. The PAM250 matrix is selected as the default residue weight table. As with polynucleotide alignments, the percent identity is reported by CLUSTAL V as the "percent similarity" between aligned polypeptide sequence pairs.
Alternatively the NCBI BLAST software suite may be used. For example, for a pairwise comparison of two polypeptide sequences, one may use the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) with blastp set at default parameters. Such default parameters may be, for example:
Matrix: BLOSUM62
Open Gap: 11 and Extension Gap: 1 penalty
Gap x drop-off: 50
Expect: 10
Word Size: 3
Filter: on
Percent identity may be measured over the length of an entire defined polypeptide sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined polypeptide sequence, for instance, a fragment of at least 15, at least 20, at least 30, at least 40, at least 50, at least 70 or at least 150 contiguous residues. Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in figures or Sequence Listings, may be used to describe a length over which percentage identity may be measured.
"Post-translational modification" of a DITHP may involve lipidation, glycosylation, phosphorylation, acetylation, racemization, proteolytic cleavage, and other modifications known in the art. These processes may occur synthetically or biochemically. Biochemical modifications will vary by cell type depending on the enzymatic milieu and the DITHP.
"Probe" refers to dithp or fragments thereof, which are used to detect identical, allelic or related nucleic acid sequences. Probes are isolated oligonucleotides or polynucleotides attached to a detectable label or reporter molecule. Typical labels include radioactive isotopes, ligands, chemiluminescent agents, and enzymes. "Primers" are short nucleic acids, usually DNA oligonucleotides, which may be annealed to a target polynucleotide by complementary base-pairing. The primer may then be extended along the target DNA strand by a DNA polymerase enzyme. Primer pairs can be used for amplification (and identification) of a nucleic acid sequence, e.g., by the polymerase chain reaction (PCR). Probes and primers as used in the present invention typically comprise at least 15 contiguous nucleotides of a known sequence. In order to enhance specificity, longer probes and primers may also be employed, such as probes and primers that comprise at least 20, 30, 40, 50, 60, 70, 80, 90, 100, or at least 150 consecutive nucleotides of the disclosed nucleic acid sequences. Probes and primers may be considerably longer than these examples, and it is understood that any length supported by the specification, including the figures and Sequence Listing, may be used.
Methods for preparing and using probes and primers are described in the references, for example Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold Spring Harbor Press, Plainview NY; Ausubel et al., 1987, Current Protocols in Molecular Biology. Greene Publ. Assoc. & Wiley-Intersciences, New York NY; Innis et al., 1990, PCR Protocols, A Guide to Methods and Applications, Academic Press, San Diego CA. PCR primer pairs can be derived from a known sequence, for example, by using computer programs intended for that purpose such as Primer (Version 0.5, 1991, Whitehead Institute for Biomedical Research, Cambridge MA).
Oligonucleotides for use as primers are selected using software known in the art for such purpose. For example, OLIGO 4.06 software is useful for the selection of PCR primer pairs of up to 100 nucleotides each, and for the analysis of oligonucleotides and larger polynucleotides of up to 5,000 nucleotides from an input polynucleotide sequence of up to 32 kilobases. Similar primer selection programs have incorporated additional features for expanded capabilities. For example, the PrimOU primer selection program (available to the public from the Genome Center at University of Texas South West Medical Center, Dallas TX) is capable of choosing specific primers from megabase sequences and is thus useful for designing primers on a genome-wide scope. The Primer3 primer selection program (available to the public from the Whitehead Institute/MIT Center for Genome Research, Cambridge MA) allows the user to input a "mispriming library," in which sequences to avoid as primer binding sites are user-specified. Primer3 is useful, in particular, for the selection of oligonucleotides for microarrays. (The source code for the latter two primer selection programs may also be obtained from their respective sources and modified to meet the user's specific needs.) The PrimeGen program (available to the public from the UK Human Genome Mapping Project Resource Centre, Cambridge UK) designs primers based on multiple sequence alignments, thereby allowing selection of primers that hybridize to either the most conserved or least conserved regions of aligned nucleic acid sequences. Hence, this program is useful for identification of both unique and conserved oligonucleotides and polynucleotide fragments.
The oligonucleotides and polynucleotide fragments identified by any of the above selection methods are useful in hybridization technologies, for example, as PCR or sequencing primers, microarray elements, or specific probes to identify fully or partially complementary polynucleotides in a sample of nucleic acids. Methods of oligonucleotide selection are not limited to those described above. "Purified" refers to molecules, either polynucleotides or polypeptides that are isolated or separated from their natural environment and are at least 60% free, preferably at least 75% free, and most preferably at least 90% free from other compounds with which they are naturally associated.
A "recombinant nucleic acid" is a sequence that is not naturally occurring or has a sequence that is made by an artificial combination of two or more otherwise separated segments of sequence. This artificial combination is often accomplished by chemical synthesis or, more commonly, by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques such as those described in Sambrook, supra. The term recombinant includes nucleic acids that have been altered solely by addition, substitution, or deletion of a portion of the nucleic acid. Frequently, a recombinant nucleic acid may include a nucleic acid sequence operably linked to a promoter sequence. Such a recombinant nucleic acid may be part of a vector that is used, for example, to transform a cell. Alternatively, such recombinant nucleic acids may be part of a viral vector, e.g., based on a vaccinia virus, that could be use to vaccinate a mammal wherein the recombinant nucleic acid is expressed, inducing a protective immunological response in the mammal.
"Regulatory element" refers to a nucleic acid sequence from nontranslated regions of a gene, and includes enhancers, promoters, introns, and 3' untranslated regions, which interact with host proteins to carry out or regulate transcription or translation.
"Reporter" molecules are chemical or biochemical moieties used for labeling a nucleic acid, an amino acid, or an antibody. They include radionuclides; enzymes; fluorescent, chemiluminescent, or chromogenic agents; substrates; cofactors; inhibitors; magnetic particles; and other moieties known in the art.
An "RNA equivalent," in reference to a DNA sequence, is composed of the same linear sequence of nucleotides as the reference DNA sequence with the exception that all occurrences of the nitrogenous base thymine are replaced with uracil, and the sugar backbone is composed of ribose instead of deoxyribose. "Sample" is used in its broadest sense. Samples may contain nucleic or amino acids, antibodies, or other materials, and may be derived from any source (e.g., bodily fluids including, but not limited to, saliva, blood, and urine; chromosome(s), organelles, or membranes isolated from a cell; genomic DNA, RNA, or cDNA in solution or bound to a substrate; and cleared cells or tissues or blots or imprints from such cells or tissues). "Specific binding" or "specifically binding" refers to the interaction between a protein or peptide and its agonist, antibody, antagonist, or other binding partner. The interaction is dependent upon the presence of a particular structure of the protein, e.g., the antigenic determinant or epitope, recognized by the binding molecule. For example, if an antibody is specific for epitope "A," the presence of a polypeptide containing epitope A, or the presence of free unlabeled A, in a reaction containing free labeled A and the antibody will reduce the amount of labeled A that binds to the antibody.
"Substitution" refers to the replacement of at least one nucleotide or amino acid by a different nucleotide or amino acid.
"Substrate" refers to any suitable rigid or semi-rigid support including, e.g., membranes, filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, tubing, plates, polymers, microparticles or capillaries. The substrate can have a variety of surface forms, such as wells, trenches, pins, channels and pores, to which polynucleotides or polypeptides are bound.
A "transcript image" refers to the collective pattern of gene expression by a particular tissue or cell type under given conditions at a given time. "Transformation" refers to a process by which exogenous DNA enters a recipient cell.
Transformation may occur under natural or artificial conditions using various methods well known in the art. Transformation may rely on any known method for the insertion of foreign nucleic acid sequences into a prokaryotic or eukaryotic host cell. The method is selected based on the host cell being transformed.
"Transformants" include stably transformed cells in which the inserted DNA is capable of replication either as an autonomously replicating plasmid or as part of the host chromosome, as well as cells which transiently express inserted DNA or RNA.
A "transgenic organism," as used herein, is any organism, including but not limited to animals and plants, in which one or more of the cells of the organism contains heterologous nucleic acid introduced by way of human intervention, such as by transgenic techniques well known in the art. The nucleic acid is introduced into the cell, directly or indirectly by introduction into a precursor of the cell, by way of deliberate genetic manipulation, such as by microinjection or by infection with a recombinant virus. The term genetic manipulation does not include classical cross-breeding, or in vitro fertilization, but rather is directed to the introduction of a recombinant DNA molecule. The transgenic organisms contemplated in accordance with the present invention include bacteria, cyanobacteria, fungi, and plants and animals. The isolated DNA of the present invention can be introduced into the host by methods known in the art, for example infection, transfection, transformation or transconjugation. Techniques for transferring the DNA of the present invention into such organisms are widely known and provided in references such as Sambrook et al. (1989), supra. A "variant" of a particular nucleic acid sequence is defined as a nucleic acid sequence having at least 25% sequence identity to the particular nucleic acid sequence over a certain length of one of the nucleic acid sequences using blastn with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07- 1999) set at default parameters. Such a pair of nucleic acids may show, for example, at least 30%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater sequence identity over a certain defined length. The variant may result in "conservative" amino acid changes which do not affect structural and/or chemical properties. 
A variant may be described as, for example, an "allelic" (as defined above), "splice," "species," or "polymorphic" variant. A splice variant may have significant identity to a reference molecule, but will generally have a greater or lesser number of polynucleotides due to alternate splicing of exons during mRNA processing. The corresponding polypeptide may possess additional functional domains or lack domains that are present in the reference molecule. Species variants are polynucleotide sequences that vary from one species to another. The resulting polypeptides generally will have significant amino acid identity relative to each other. A polymorphic variant is a variation in the polynucleotide sequence of a particular gene between individuals of a given species. Polymorphic variants also may encompass "single nucleotide polymorphisms" (SNPs) in which the polynucleotide sequence varies by one base. The presence of SNPs may be indicative of, for example, a certain population, a disease state, or a propensity for a disease state.
In an alternative, variants of the polynucleotides of the present invention may be generated through recombinant methods. One possible method is a DNA shuffling technique such as MOLECULARBREEDING (Maxygen Inc., Santa Clara CA; described in U.S. Patent Number 5,837,458; Chang, C-C et al. (1999) Nat. Biotechnol. 17:793-797; Christians, F.C et al. (1999) Nat. Biotechnol. 17:259-264; and Crameri, A. et al. (1996) Nat. Biotechnol. 14:315-319) to alter or improve the biological properties of DITHP, such as its biological or enzymatic activity or its ability to bind to other molecules or compounds. DNA shuffling is a process by which a library of gene variants is produced using PCR-mediated recombination of gene fragments. The library is then subjected to selection or screening procedures that identify those gene variants with the desired properties. These preferred variants may then be pooled and further subjected to recursive rounds of DNA shuffling and selection/screening. Thus, genetic diversity is created through "artificial" breeding and rapid molecular evolution. For example, fragments of a single gene containing random point mutations may be recombined, screened, and then reshuffled until the desired properties are optimized. Alternatively, fragments of a given gene may be recombined with fragments of homologous genes in the same gene family, either from the same or different species, thereby maximizing the genetic diversity of multiple naturally occurring genes in a directed and controllable manner.
A "variant" of a particular polypeptide sequence is defined as a polypeptide sequence having at least 40% sequence identity to the particular polypeptide sequence over a certain length of one of the polypeptide sequences using blastp with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) set at default parameters. Such a pair of polypeptides may show, for example, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater identity over a certain defined length of one of the polypeptides.
THE INVENTION
In a particular embodiment, cDNA sequences derived from human tissues and cell lines were aligned based on nucleotide sequence identity and assembled into "consensus" or "template" sequences which are designated by the template identification numbers (template IDs) in column 2 of Table 2. The sequence identification numbers (SEQ ID NO:s) corresponding to the template IDs are shown in column 1. The template sequences have similarity to GenBank sequences, or "hits," as designated by the GI Numbers in column 3. The statistical probability of each GenBank hit is indicated by a probability score in column 4, and the functional annotation corresponding to each GenBank hit is listed in column 5. The invention incorporates the nucleic acid sequences of these templates as disclosed in the Sequence Listing and the use of these sequences in the diagnosis and treatment of disease states characterized by defects in human molecules. The invention further utilizes these sequences in hybridization and amplification technologies, and in particular, in technologies which assess gene expression patterns correlated with specific cells or tissues and their responses in vivo or in vitro to pharmaceutical agents, toxins, and other treatments. In this manner, the sequences of the present invention are used to develop a transcript image for a particular cell or tissue.
Derivation of Nucleic Acid Sequences
cDNA was isolated from libraries constructed using RNA derived from normal and diseased human tissues and cell lines. The human tissues and cell lines used for cDNA library construction were selected from a broad range of sources to provide a diverse population of cDNAs representative of gene transcription throughout the human body. Descriptions of the human tissues and cell lines used for cDNA library construction are provided in the LIFESEQ database (Incyte Genomics, Inc. (Incyte), Palo Alto CA). Human tissues were broadly selected from, for example, cardiovascular, dermatologic, endocrine, gastrointestinal, hematopoietic/immune system, musculoskeletal, neural, reproductive, and urologic sources.
Cell lines used for cDNA library construction were derived from, for example, leukemic cells, teratocarcinomas, neuroepitheliomas, cervical carcinoma, lung fibroblasts, and endothelial cells. Such cell lines include, for example, THP-1, Jurkat, HUVEC, hNT2, WI38, HeLa, and other cell lines commonly used and available from public depositories (American Type Culture Collection, Manassas VA). Prior to mRNA isolation, cell lines were untreated, treated with a pharmaceutical agent such as 5'-aza-2'-deoxycytidine, treated with an activating agent such as lipopolysaccharide in the case of leukocytic cell lines, or, in the case of endothelial cell lines, subjected to shear stress.
Sequencing of the cDNAs
Methods for DNA sequencing are well known in the art. Conventional enzymatic methods employ the Klenow fragment of DNA polymerase I, SEQUENASE DNA polymerase (U.S. Biochemical Corporation, Cleveland OH), Taq polymerase (Applied Biosystems, Foster City CA), thermostable T7 polymerase (Amersham Pharmacia Biotech, Inc. (Amersham Pharmacia Biotech), Piscataway NJ), or combinations of polymerases and proofreading exonucleases such as those found in the ELONGASE amplification system (Life Technologies Inc. (Life Technologies), Gaithersburg MD), to extend the nucleic acid sequence from an oligonucleotide primer annealed to the DNA template of interest. Methods have been developed for the use of both single-stranded and double-stranded templates. Chain termination reaction products may be electrophoresed on urea-polyacrylamide gels and detected either by autoradiography (for radioisotope-labeled nucleotides) or by fluorescence (for fluorophore-labeled nucleotides). Automated methods for mechanized reaction preparation, sequencing, and analysis using fluorescence detection methods have been developed. Machines used to prepare cDNAs for sequencing can include the MICROLAB 2200 liquid transfer system (Hamilton Company (Hamilton), Reno NV), Peltier thermal cycler (PTC200; MJ Research, Inc. (MJ Research), Watertown MA), and ABI CATALYST 800 thermal cycler (Applied Biosystems). Sequencing can be carried out using, for example, the ABI 373 or 377 (Applied Biosystems) or MEGABACE 1000 (Molecular Dynamics, Inc. (Molecular Dynamics), Sunnyvale CA) DNA sequencing systems, or other automated and manual sequencing systems well known in the art.
The nucleotide sequences of the Sequence Listing have been prepared by current, state-of-the-art, automated methods and, as such, may contain occasional sequencing errors or unidentified nucleotides. Such unidentified nucleotides are designated by an N. These infrequent unidentified bases do not represent a hindrance to practicing the invention for those skilled in the art. Several methods employing standard recombinant techniques may be used to correct errors and complete the missing sequence information. (See, e.g., those described in Ausubel, F.M. et al. (1997) Short Protocols in Molecular Biology, John Wiley & Sons, New York NY; and Sambrook, J. et al. (1989) Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, Plainview NY.)
Assembly of cDNA Sequences
Human polynucleotide sequences may be assembled using programs or algorithms well known in the art. Sequences to be assembled are related, wholly or in part, and may be derived from a single or many different transcripts. Assembly of the sequences can be performed using such programs as PHRAP (Phil's Revised Assembly Program) and the GELVIEW fragment assembly system (GCG), or other methods known in the art.
Alternatively, cDNA sequences are used as "component" sequences that are assembled into "template" or "consensus" sequences as follows. Sequence chromatograms are processed, verified, and quality scores are obtained using PHRED. Raw sequences are edited using an editing pathway known as Block 1 (see, e.g., the LIFESEQ Assembled User Guide, Incyte Genomics, Palo Alto, CA). A series of BLAST comparisons is performed and low-information segments and repetitive elements (e.g., dinucleotide repeats, Alu repeats, etc.) are replaced by "n's", or masked, to prevent spurious matches. Mitochondrial and ribosomal RNA sequences are also removed. The processed sequences are then loaded into a relational database management system (RDMS) which assigns edited sequences to existing templates, if available. When additional sequences are added into the RDMS, a process is initiated which modifies existing templates or creates new templates from works in progress (i.e., nonfinal assembled sequences) containing queued sequences or the sequences themselves. After the new sequences have been assigned to templates, the templates can be merged into bins. If multiple templates exist in one bin, the bin can be split and the templates reannotated.
Once gene bins have been generated based upon sequence alignments, bins are "clone joined" based upon clone information. Clone joining occurs when the 5' sequence of one clone is present in one bin and the 3' sequence from the same clone is present in a different bin, indicating that the two bins should be merged into a single bin. Only bins which share at least two different clones are merged.
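The clone-joining rule described above can be sketched as follows. This is only an illustration under stated assumptions: reads are given as hypothetical (clone, end, bin) records with the end labeled "5'" or "3'", each clone contributes at most one read per end, and merging is done with a simple union-find structure. None of these names or data structures come from the LIFESEQ system.

```python
from collections import defaultdict


def clone_join(reads, min_shared=2):
    """Merge bins linked by at least `min_shared` different clones.

    reads: iterable of (clone_id, end, bin_id), where end is "5'" or "3'".
    Returns a sorted list of merged bin groups (each a sorted list).
    """
    ends_by_clone = defaultdict(dict)
    all_bins = set()
    for clone, end, bin_id in reads:
        ends_by_clone[clone][end] = bin_id
        all_bins.add(bin_id)

    # Collect the clones whose 5' and 3' reads fall in two different bins.
    support = defaultdict(set)
    for clone, by_end in ends_by_clone.items():
        five, three = by_end.get("5'"), by_end.get("3'")
        if five is not None and three is not None and five != three:
            support[tuple(sorted((five, three)))].add(clone)

    # Union-find over bins; join only pairs with enough supporting clones.
    parent = {b: b for b in all_bins}

    def find(b):
        while parent[b] != b:
            parent[b] = parent[parent[b]]
            b = parent[b]
        return b

    for (bin_a, bin_b), clones in support.items():
        if len(clones) >= min_shared:
            parent[find(bin_a)] = find(bin_b)

    groups = defaultdict(set)
    for b in all_bins:
        groups[find(b)].add(b)
    return sorted(sorted(g) for g in groups.values())
```

With two clones spanning bins A and B and a single clone spanning B and C, only A and B are merged, which mirrors the requirement that merged bins share at least two different clones.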
A resultant template sequence may contain either a partial or a full length open reading frame, or all or part of a genetic regulatory element. This variation is due in part to the fact that the full length cDNAs of many genes are several hundred, and sometimes several thousand, bases in length. With current technology, cDNAs comprising the coding regions of large genes cannot be cloned because of vector limitations, incomplete reverse transcription of the mRNA, or incomplete "second strand" synthesis. Template sequences may be extended to include additional contiguous sequences derived from the parent RNA transcript using a variety of methods known to those of skill in the art. Extension may thus be used to achieve the full length coding sequence of a gene.
Analysis of the cDNA Sequences
The cDNA sequences are analyzed using a variety of programs and algorithms which are well known in the art. (See, e.g., Ausubel, 1997, supra, Chapter 1.1; Meyers, R.A. (Ed.) (1995) Molecular Biology and Biotechnology, Wiley VCH, New York NY, pp. 856-853; and Table 8.) These analyses comprise both reading frame determinations, e.g., based on triplet codon periodicity for particular organisms (Fickett, J.W. (1982) Nucleic Acids Res. 10:5303-5318); analyses of potential start and stop codons; and homology searches.
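As an illustration of the start and stop codon analysis mentioned above, the following minimal sketch scans the three forward reading frames of a nucleotide sequence for ATG-initiated open reading frames. It is an assumption-laden simplification: it uses the standard stop codons, ignores the reverse strand, and does not implement the codon-periodicity test of Fickett; the function name and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (frame, start, end) for forward-strand ATG...stop ORFs.

    `start` is the index of the A in ATG and `end` is one past the stop
    codon, so seq[start:end] is the ORF including its stop codon.
    `min_codons` counts coding codons (ATG included, stop excluded).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs
```

A template with no in-frame stop after its first ATG yields no entry here, which corresponds to the partial open reading frames discussed for template sequences.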
Computer programs known to those of skill in the art for performing computer-assisted searches for amino acid and nucleic acid sequence similarity, include, for example, Basic Local Alignment Search Tool (BLAST; Altschul, S.F. (1993) J. Mol. Evol. 36:290-300; Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410). BLAST is especially useful in determining exact matches and comparing two sequence fragments of arbitrary but equal lengths, whose alignment is locally maximal and for which the alignment score meets or exceeds a threshold or cutoff score set by the user (Karlin, S. et al. (1988) Proc. Natl. Acad. Sci. USA 85:841-845). Using an appropriate search tool (e.g., BLAST or HMM), GenBank, SwissProt, BLOCKS, PFAM and other databases may be searched for sequences containing regions of homology to a query dithp or DITHP of the present invention.
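The notion of a locally maximal, equal-length segment pair whose score must meet a user-set cutoff can be sketched as below. This is not the BLAST algorithm, which uses word seeding, extension heuristics, and substitution matrices; it is an exhaustive scan of every ungapped diagonal with a running-sum (Kadane-style) maximum. The match and mismatch scores and the function names are illustrative assumptions.

```python
def best_segment_pair(a, b, match=5, mismatch=-4):
    """Highest-scoring ungapped, equal-length segment pair between a and b.

    Returns (score, start_in_a, start_in_b, length); (0, 0, 0, 0) if no
    positive-scoring pair exists.
    """
    best = (0, 0, 0, 0)
    # Diagonal d pairs a[i] with b[i - d]; an ungapped alignment stays on one diagonal.
    for d in range(-(len(b) - 1), len(a)):
        i = max(d, 0)
        j = i - d
        score, run_start = 0, i
        while i < len(a) and j < len(b):
            score += match if a[i] == b[j] else mismatch
            if score <= 0:
                # A non-positive prefix can never help; restart after it.
                score, run_start = 0, i + 1
            elif score > best[0]:
                best = (score, run_start, run_start - d, i - run_start + 1)
            i += 1
            j += 1
    return best


def meets_cutoff(a, b, cutoff):
    """True when the best segment pair score meets or exceeds the cutoff."""
    return best_segment_pair(a, b)[0] >= cutoff
```

The exhaustive scan costs time proportional to the product of the two sequence lengths, which is acceptable for a pair of fragments but not for database searches; that gap is what the heuristics in the cited BLAST publications address.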
Other approaches to the identification, assembly, storage, and display of nucleotide and polypeptide sequences are provided in "Relational Database for Storing Biomolecule Information,"
U.S.S.N. 08/947,845, filed October 9, 1997; "Project-Based Full-Length Biomolecular Sequence Database," U.S.S.N. 08/811,758, filed March 6, 1997; and "Relational Database and System for Storing Information Relating to Biomolecular Sequences," U.S.S.N. 09/034,807, filed March 4, 1998, all of which are incorporated by reference herein in their entirety.
Protein hierarchies can be assigned to the putative encoded polypeptide based on, e.g., motif, BLAST, or biological analysis. Methods for assigning these hierarchies are described, for example, in "Database System Employing Protein Function Hierarchies for Viewing Biomolecular Sequence Data," U.S.S.N. 08/812,290, filed March 6, 1997, incorporated herein by reference.
Identification of Human Diagnostic and Therapeutic Molecules Encoded by dithp
The identities of the DITHP encoded by the dithp of the present invention were obtained by analysis of the assembled cDNA sequences.
SEQ ID NO:276, SEQ ID NO:277, SEQ ID NO:278, SEQ ID NO:279, SEQ ID NO:280, SEQ ID NO:281, SEQ ID NO:282, SEQ ID NO:283, SEQ ID NO:284, SEQ ID NO:285, SEQ ID NO:286, SEQ ID NO:287, SEQ ID NO:288, SEQ ID NO:289, SEQ ID NO:290, and SEQ ID NO:291, encoded by SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, and SEQ ID NO:16, respectively, are, for example, human enzyme molecules.
SEQ ID NO:292, SEQ ID NO:293, SEQ ID NO:294, SEQ ID NO:295, and SEQ ID NO:296, encoded by SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, and SEQ ID NO:21, respectively, are, for example, extracellular information transmission molecules.
SEQ ID NO:297, SEQ ID NO:298, SEQ ID NO:299, SEQ ID NO:300, SEQ ID NO:301, and SEQ ID NO:302, encoded by SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, and SEQ ID NO:27, respectively, are, for example, receptor molecules.
SEQ ID NO:303, SEQ ID NO:304, SEQ ID NO:305, SEQ ID NO:306, SEQ ID NO:307, SEQ ID NO:308, SEQ ID NO:309, SEQ ID NO:310, SEQ ID NO:311, SEQ ID NO:312, SEQ ID NO:313, SEQ ID NO:314, SEQ ID NO:315, SEQ ID NO:316, SEQ ID NO:317, and SEQ ID NO:318, encoded by SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, and SEQ ID NO:43, respectively, are, for example, intracellular signaling molecules.
SEQ ID NO:319, SEQ ID NO:320, SEQ ID NO:321, SEQ ID NO:322, SEQ ID NO:323, SEQ ID NO:324, SEQ ID NO:325, SEQ ID NO:326, SEQ ID NO:327, SEQ ID NO:328, SEQ ID NO:329, SEQ ID NO:330, SEQ ID NO:331, SEQ ID NO:332, SEQ ID NO:333, SEQ ID NO:334, SEQ ID NO:335, SEQ ID NO:336, SEQ ID NO:337, SEQ ID NO:338, SEQ ID NO:339, SEQ ID NO:340, SEQ ID NO:341, SEQ ID NO:342, SEQ ID NO:343, SEQ ID NO:344, SEQ ID NO:345, SEQ ID NO:346, SEQ ID NO:347, SEQ ID NO:348, SEQ ID NO:349, SEQ ID NO:350, SEQ ID NO:351, SEQ ID NO:352, SEQ ID NO:353, SEQ ID NO:354, SEQ ID NO:355, SEQ ID NO:356, SEQ ID NO:357, SEQ ID NO:358, SEQ ID NO:359, SEQ ID NO:360, SEQ ID NO:361, SEQ ID NO:362, SEQ ID NO:363, SEQ ID NO:364, SEQ ID NO:365, SEQ ID NO:366, SEQ ID NO:367, SEQ ID NO:368, SEQ ID NO:369, SEQ ID NO:370, SEQ ID NO:371, SEQ ID NO:372, SEQ ID NO:373, SEQ ID NO:374, SEQ ID NO:375, SEQ ID NO:376, SEQ ID NO:377, SEQ ID NO:378, SEQ ID NO:379, SEQ ID NO:380, SEQ ID NO:381, SEQ ID NO:382, SEQ ID NO:383, SEQ ID NO:384, SEQ ID NO:385, SEQ ID NO:386, SEQ ID NO:387, SEQ ID NO:388, SEQ ID NO:389, SEQ ID NO:390, SEQ ID NO:391, SEQ ID NO:392, SEQ ID NO:393, SEQ ID NO:394, SEQ ID NO:395, SEQ ID NO:396, and SEQ ID NO:397, encoded by SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:56, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ ID NO:60, SEQ ID NO:61, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:65, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID NO:80, SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:83, SEQ ID NO:84, SEQ ID NO:85, SEQ ID NO:86, SEQ ID NO:87, SEQ ID NO:88, SEQ ID NO:89, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:94, SEQ ID NO:95, SEQ ID NO:96, SEQ ID NO:97, SEQ ID NO:98, SEQ ID NO:99, SEQ ID NO:100, SEQ ID NO:101, SEQ ID NO:102, SEQ ID NO:103, SEQ ID NO:104, SEQ ID NO:105, SEQ ID NO:106, SEQ ID NO:107, SEQ ID NO:108, SEQ ID NO:109, SEQ ID NO:110, SEQ ID NO:111, SEQ ID NO:112, SEQ ID NO:113, SEQ ID NO:114, SEQ ID NO:115, SEQ ID NO:116, SEQ ID NO:117, SEQ ID NO:118, SEQ ID NO:119, SEQ ID NO:120, and SEQ ID NO:121, respectively, are, for example, transcription factor molecules.
SEQ ID NO:398, SEQ ID NO:399, SEQ ID NO:400, SEQ ID NO:401, SEQ ID NO:402, SEQ ID NO:403, SEQ ID NO:404, and SEQ ID NO:405, encoded by SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124, SEQ ID NO:125, SEQ ID NO:126, SEQ ID NO:127, SEQ ID NO:128, and SEQ ID NO:129, respectively, are, for example, membrane transport molecules.
SEQ ID NO:406, SEQ ID NO:407, SEQ ID NO:408, SEQ ID NO:409, SEQ ID NO:410, SEQ ID NO:411, SEQ ID NO:412, SEQ ID NO:413, SEQ ID NO:414, SEQ ID NO:415, SEQ ID NO:416, SEQ ID NO:417, SEQ ID NO:418, SEQ ID NO:419, SEQ ID NO:420, SEQ ID NO:421, SEQ ID NO:422, and SEQ ID NO:423, encoded by SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:132, SEQ ID NO:133, SEQ ID NO:134, SEQ ID NO:135, SEQ ID NO:136, SEQ ID NO:137, SEQ ID NO:138, SEQ ID NO:139, SEQ ID NO:140, SEQ ID NO:141, SEQ ID NO:142, SEQ ID NO:143, SEQ ID NO:144, SEQ ID NO:145, SEQ ID NO:146, and SEQ ID NO:147, respectively, are, for example, protein modification and maintenance molecules.
SEQ ID NO:424, SEQ ID NO:425, SEQ ID NO:426, SEQ ID NO:427, SEQ ID NO:428, SEQ ID NO:429, SEQ ID NO:430, SEQ ID NO:431, SEQ ID NO:432, SEQ ID NO:433, SEQ ID NO:434, and SEQ ID NO:435, encoded by SEQ ID NO:148, SEQ ID NO:149, SEQ ID NO:150, SEQ ID NO:151, SEQ ID NO:152, SEQ ID NO:153, SEQ ID NO:154, SEQ ID NO:155, SEQ ID NO:156, SEQ ID NO:157, SEQ ID NO:158, and SEQ ID NO:159, respectively, are, for example, nucleic acid synthesis and modification molecules.
SEQ ID NO:436, encoded by SEQ ID NO:160, is, for example, an adhesion molecule.
SEQ ID NO:437, SEQ ID NO:438, and SEQ ID NO:439, encoded by SEQ ID NO:161, SEQ ID NO:162, and SEQ ID NO:163, respectively, are, for example, antigen recognition molecules.
SEQ ID NO:440, SEQ ID NO:441, SEQ ID NO:442, and SEQ ID NO:443, encoded by SEQ ID NO:164, SEQ ID NO:165, SEQ ID NO:166, and SEQ ID NO:167, respectively, are, for example, electron transfer associated molecules.
SEQ ID NO:444, SEQ ID NO:445, SEQ ID NO:446, SEQ ID NO:447, SEQ ID NO:448, and SEQ ID NO:449, encoded by SEQ ID NO:168, SEQ ID NO:169, SEQ ID NO:170, SEQ ID NO:171, SEQ ID NO:172, and SEQ ID NO:173, respectively, are, for example, secreted/extracellular matrix molecules.
SEQ ID NO:450, SEQ ID NO:451, SEQ ID NO:452, SEQ ID NO:453, SEQ ID NO:454, SEQ ID NO:455, SEQ ID NO:456, SEQ ID NO:457, SEQ ID NO:458, SEQ ID NO:459, SEQ ID NO:460, and SEQ ID NO:461, encoded by SEQ ID NO:174, SEQ ID NO:175, SEQ ID NO:176, SEQ ID NO:177, SEQ ID NO:178, SEQ ID NO:179, SEQ ID NO:180, SEQ ID NO:181, SEQ ID NO:182, SEQ ID NO:183, SEQ ID NO:184, and SEQ ID NO:185, respectively, are, for example, cytoskeletal molecules.
SEQ ID NO:462, SEQ ID NO:463, SEQ ID NO:464, SEQ ID NO:465, SEQ ID NO:466, SEQ ID NO:467, SEQ ID NO:468, SEQ ID NO:469, SEQ ID NO:470, and SEQ ID NO:471, encoded by SEQ ID NO:186, SEQ ID NO:187, SEQ ID NO:188, SEQ ID NO:189, SEQ ID NO:190, SEQ ID NO:191, SEQ ID NO:192, SEQ ID NO:193, SEQ ID NO:194, and SEQ ID NO:195, respectively, are, for example, cell membrane molecules.
SEQ ID NO:472, SEQ ID NO:473, SEQ ID NO:474, SEQ ID NO:475, SEQ ID NO:476, SEQ ID NO:477, SEQ ID NO:478, SEQ ID NO:479, SEQ ID NO:480, SEQ ID NO:481, SEQ ID NO:482, SEQ ID NO:483, SEQ ID NO:484, SEQ ID NO:485, SEQ ID NO:486, SEQ ID NO:487, SEQ ID NO:488, SEQ ID NO:489, SEQ ID NO:490, SEQ ID NO:491, SEQ ID NO:492, SEQ ID NO:493, SEQ ID NO:494, SEQ ID NO:495, SEQ ID NO:496, SEQ ID NO:497, SEQ ID NO:498, SEQ ID NO:499, SEQ ID NO:500, SEQ ID NO:501, SEQ ID NO:502, SEQ ID NO:503, SEQ ID NO:504, SEQ ID NO:505, SEQ ID NO:506, SEQ ID NO:507, SEQ ID NO:508, SEQ ID NO:509, SEQ ID NO:510, SEQ ID NO:511, SEQ ID NO:512, SEQ ID NO:513, SEQ ID NO:514, SEQ ID NO:515, SEQ ID NO:516, SEQ ID NO:517, and SEQ ID NO:518, encoded by SEQ ID NO:196, SEQ ID NO:197, SEQ ID NO:198, SEQ ID NO:199, SEQ ID NO:200, SEQ ID NO:201, SEQ ID NO:202, SEQ ID NO:203, SEQ ID NO:204, SEQ ID NO:205, SEQ ID NO:206, SEQ ID NO:207, SEQ ID NO:208, SEQ ID NO:209, SEQ ID NO:210, SEQ ID NO:211, SEQ ID NO:212, SEQ ID NO:213, SEQ ID NO:214, SEQ ID NO:215, SEQ ID NO:216, SEQ ID NO:217, SEQ ID NO:218, SEQ ID NO:219, SEQ ID NO:220, SEQ ID NO:221, SEQ ID NO:222, SEQ ID NO:223, SEQ ID NO:224, SEQ ID NO:225, SEQ ID NO:226, SEQ ID NO:227, SEQ ID NO:228, SEQ ID NO:229, SEQ ID NO:230, SEQ ID NO:231, SEQ ID NO:231, SEQ ID NO:232, SEQ ID NO:233, SEQ ID NO:234, SEQ ID NO:235, SEQ ID NO:236, SEQ ID NO:237, SEQ ID NO:238, SEQ ID NO:239, SEQ ID NO:240, and SEQ ID NO:241, respectively, are, for example, ribosomal molecules.
SEQ ID NO:519, SEQ ID NO:520, and SEQ ID NO:521, encoded by SEQ ID NO:242, SEQ ID NO:243, and SEQ ID NO:244, respectively, are, for example, chromatin molecules.
SEQ ID NO:522, SEQ ID NO:523, SEQ ID NO:524, SEQ ID NO:525, SEQ ID NO:526, SEQ ID NO:527, SEQ ID NO:528, SEQ ID NO:529, and SEQ ID NO:530, encoded by SEQ ID NO:245, SEQ ID NO:246, SEQ ID NO:247, SEQ ID NO:248, SEQ ID NO:249, SEQ ID NO:250, SEQ ID NO:251, SEQ ID NO:252, and SEQ ID NO:253, respectively, are, for example, organelle associated molecules.
SEQ ID NO:531, SEQ ID NO:532, SEQ ID NO:533, SEQ ID NO:534, SEQ ID NO:535, SEQ ID NO:536, SEQ ID NO:537, SEQ ID NO:538, SEQ ID NO:539, SEQ ID NO:540, SEQ ID NO:541, SEQ ID NO:542, SEQ ID NO:543, and SEQ ID NO:544, encoded by SEQ ID NO:254, SEQ ID NO:255, SEQ ID NO:256, SEQ ID NO:256, SEQ ID NO:257, SEQ ID NO:258, SEQ ID NO:259, SEQ ID NO:260, SEQ ID NO:261, SEQ ID NO:262, SEQ ID NO:263, SEQ ID NO:264, SEQ ID NO:265, and SEQ ID NO:266, respectively, are, for example, biochemical pathway molecules.
SEQ ID NO:545, SEQ ID NO:546, SEQ ID NO:547, SEQ ID NO:548, SEQ ID NO:549, SEQ ID NO:550, SEQ ID NO:551, SEQ ID NO:552, and SEQ ID NO:553, encoded by SEQ ID NO:267, SEQ ID NO:268, SEQ ID NO:269, SEQ ID NO:270, SEQ ID NO:271, SEQ ID NO:272, SEQ ID NO:273, SEQ ID NO:274, and SEQ ID NO:275, respectively, are, for example, molecules associated with growth and development.
Sequences of Human Diagnostic and Therapeutic Molecules
The dithp of the present invention may be used for a variety of diagnostic and therapeutic purposes. For example, a dithp may be used to diagnose a particular condition, disease, or disorder associated with human molecules. Such conditions, diseases, and disorders include, but are not limited to, a cell proliferative disorder, such as actinic keratosis, arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease (MCTD), myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis, primary thrombocythemia, and cancers including adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and, in particular, a cancer of the adrenal gland, bladder, bone, bone marrow, brain, breast, cervix, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary, pancreas, parathyroid, penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid, and uterus; an autoimmune/inflammatory disorder, such as inflammation, actinic keratosis, acquired immunodeficiency syndrome (AIDS), Addison's disease, adult respiratory distress syndrome, allergies, ankylosing spondylitis, amyloidosis, anemia, arteriosclerosis, asthma, atherosclerosis, autoimmune hemolytic anemia, autoimmune thyroiditis, bronchitis, bursitis, cholecystitis, cirrhosis, contact dermatitis, Crohn's disease, atopic dermatitis, dermatomyositis, diabetes mellitus, emphysema, erythroblastosis fetalis, erythema nodosum, atrophic gastritis, glomerulonephritis, Goodpasture's syndrome, gout, Graves' disease, Hashimoto's thyroiditis, paroxysmal nocturnal hemoglobinuria, hepatitis, hypereosinophilia, irritable bowel syndrome, episodic lymphopenia with lymphocytotoxins, mixed connective tissue disease (MCTD), multiple sclerosis, myasthenia gravis, myocardial or pericardial inflammation, myelofibrosis, osteoarthritis, osteoporosis, pancreatitis, polycythemia 
vera, polymyositis, psoriasis, Reiter's syndrome, rheumatoid arthritis, scleroderma, Sjogren's syndrome, systemic anaphylaxis, systemic lupus erythematosus, systemic sclerosis, primary thrombocythemia, thrombocytopenic purpura, ulcerative colitis, uveitis, Werner syndrome, complications of cancer, hemodialysis, and extracorporeal circulation, trauma, and hematopoietic cancer including lymphoma, leukemia, and myeloma; an infection caused by a viral agent classified as adenovirus, arenavirus, bunyavirus, calicivirus, coronavirus, filovirus, hepadnavirus, herpesvirus, flavivirus, orthomyxovirus, parvovirus, papovavirus, paramyxovirus, picornavirus, poxvirus, reovirus, retrovirus, rhabdovirus, or togavirus; an infection caused by a bacterial agent classified as pneumococcus, staphylococcus, streptococcus, bacillus, corynebacterium, clostridium, meningococcus, gonococcus, listeria, moraxella, kingella, haemophilus, legionella, bordetella, gram-negative enterobacterium including shigella, salmonella, or campylobacter, pseudomonas, vibrio, brucella, francisella, yersinia, bartonella, norcardium, actinomyces, mycobacterium, spirochaetale, rickettsia, chlamydia, or mycoplasma; an infection caused by a fungal agent classified as aspergillus, blastomyces, dermatophytes, cryptococcus, coccidioides, malasezzia, histoplasma, or other mycosis-causing fungal agent; and an infection caused by a parasite classified as plasmodium or malaria-causing, parasitic entamoeba, leishmania, trypanosoma, toxoplasma, pneumocystis carinii, intestinal protozoa such as giardia, trichomonas, tissue nematode such as trichinella, intestinal nematode such as ascaris, lymphatic filarial nematode, trematode such as schistosoma, and cestode such as tapeworm; a developmental disorder such as renal tubular acidosis, anemia, Cushing's syndrome, achondroplastic dwarfism, Duchenne and Becker muscular dystrophy, epilepsy, gonadal dysgenesis, WAGR syndrome (Wilms' tumor, aniridia, genitourinary 
abnormalities, and mental retardation), Smith-Magenis syndrome, myelodysplastic syndrome, hereditary mucoepithelial dysplasia, hereditary keratodermas, hereditary neuropathies such as Charcot-Marie-Tooth disease and neurofibromatosis, hypothyroidism, hydrocephalus, seizure disorders such as Sydenham's chorea and cerebral palsy, spina bifida, anencephaly, craniorachischisis, congenital glaucoma, cataract, and sensorineural hearing loss; an endocrine disorder such as a disorder of the hypothalamus and/or pituitary resulting from lesions such as a primary brain tumor, adenoma, infarction associated with pregnancy, hypophysectomy, aneurysm, vascular malformation, thrombosis, infection, immunological disorder, and complication due to head trauma; a disorder associated with hypopituitarism including hypogonadism, Sheehan syndrome, diabetes insipidus, Kallman's disease, Hand-Schuller-Christian disease, Letterer-Siwe disease, sarcoidosis, empty sella syndrome, and dwarfism; a disorder associated with hyperpituitarism including acromegaly, giantism, and syndrome of inappropriate antidiuretic hormone (ADH) secretion (SIADH) often caused by benign adenoma; a disorder associated with hypothyroidism including goiter, myxedema, acute thyroiditis associated with bacterial infection, subacute thyroiditis associated with viral infection, autoimmune thyroiditis (Hashimoto's disease), and cretinism; a disorder associated with hyperthyroidism including thyrotoxicosis and its various forms, Graves' disease, pretibial myxedema, toxic multinodular goiter, thyroid carcinoma, and Plummer's disease; a disorder associated with hyperparathyroidism including Conn disease (chronic hypercalcemia); a pancreatic disorder such as Type I or Type II diabetes mellitus and associated complications; a disorder associated with the adrenals such as hyperplasia, carcinoma, or adenoma of the adrenal cortex, hypertension associated with alkalosis, amyloidosis, hypokalemia, Cushing's disease, Liddle's 
syndrome, and Arnold-Healy-Gordon syndrome, pheochromocytoma tumors, and Addison's disease; a disorder associated with gonadal steroid hormones such as: in women, abnormal prolactin production, infertility, endometriosis, perturbation of the menstrual cycle, polycystic ovarian disease, hyperprolactinemia, isolated gonadotropin deficiency, amenorrhea, galactorrhea, hermaphroditism, hirsutism and virilization, breast cancer, and, in post-menopausal women, osteoporosis; and, in men, Leydig cell deficiency, male climacteric phase, and germinal cell aplasia, a hypergonadal disorder associated with Leydig cell tumors, androgen resistance associated with absence of androgen receptors, syndrome of 5 α-reductase, and gynecomastia; a metabolic disorder such as Addison's disease, cerebrotendinous xanthomatosis, congenital adrenal hyperplasia, coumarin resistance, cystic fibrosis, diabetes, fatty hepatocirrhosis, fructose-1,6-diphosphatase deficiency, galactosemia, goiter, glucagonoma, glycogen storage diseases, hereditary fructose intolerance, hyperadrenalism, hypoadrenalism, hyperparathyroidism, hypoparathyroidism, hypercholesterolemia, hyperthyroidism, hypoglycemia, hypothyroidism, hyperlipidemia, hyperlipemia, lipid myopathies, lipodystrophies, lysosomal storage diseases, mannosidosis, neuraminidase deficiency, obesity, pentosuria, phenylketonuria, pseudovitamin D-deficiency rickets; disorders of carbohydrate metabolism such as congenital type II dyserythropoietic anemia, diabetes, insulin-dependent diabetes mellitus, non-insulin-dependent diabetes mellitus, fructose-1,6-diphosphatase deficiency, galactosemia, glucagonoma, hereditary fructose intolerance, hypoglycemia, mannosidosis, neuraminidase deficiency, obesity, galactose epimerase deficiency, glycogen storage diseases, lysosomal storage diseases, fructosuria, pentosuria, and inherited abnormalities of pyruvate metabolism; disorders of lipid metabolism such as fatty liver, cholestasis, primary biliary cirrhosis, 
carnitine deficiency, carnitine palmitoyltransferase deficiency, myoadenylate deaminase deficiency, hypertriglyceridemia, lipid storage disorders such as Fabry's disease, Gaucher's disease, Niemann-Pick's disease, metachromatic leukodystrophy, adrenoleukodystrophy, GM2 gangliosidosis, and ceroid lipofuscinosis, abetalipoproteinemia, Tangier disease, hyperlipoproteinemia, diabetes mellitus, lipodystrophy, lipomatoses, acute panniculitis, disseminated fat necrosis, adiposis dolorosa, lipoid adrenal hyperplasia, minimal change disease, lipomas, atherosclerosis, hypercholesterolemia, hypercholesterolemia with hypertriglyceridemia, primary hypoalphalipoproteinemia, hypothyroidism, renal disease, liver disease, lecithin:cholesterol acyltransferase deficiency, cerebrotendinous xanthomatosis, sitosterolemia, hypocholesterolemia, Tay-Sachs disease, Sandhoff's disease, hyperlipidemia, hyperlipemia, lipid myopathies, and obesity; and disorders of copper metabolism such as Menke's disease, Wilson's disease, and Ehlers-Danlos syndrome type IX; a neurological disorder such as epilepsy, ischemic cerebrovascular disease, stroke, cerebral neoplasms, Alzheimer's disease, Pick's disease, Huntington's disease, dementia, Parkinson's disease and other extrapyramidal disorders, amyotrophic lateral sclerosis and other motor neuron disorders, progressive neural muscular atrophy, retinitis pigmentosa, hereditary ataxias, multiple sclerosis and other demyelinating diseases, bacterial and viral meningitis, brain abscess, subdural empyema, epidural abscess, suppurative intracranial thrombophlebitis, myelitis and radiculitis, viral central nervous system disease, prion diseases including kuru, Creutzfeldt-Jakob disease, and Gerstmann-Straussler-Scheinker syndrome, fatal familial insomnia, nutritional and metabolic diseases of the nervous system, neurofibromatosis, tuberous sclerosis, cerebelloretinal hemangioblastomatosis, encephalotrigeminal syndrome, mental retardation and other 
developmental disorder of the central nervous system, cerebral palsy, a neuroskeletal disorder, an autonomic nervous system disorder, a cranial nerve disorder, a spinal cord disease, muscular dystrophy and other neuromuscular disorder, a peripheral nervous system disorder, dermatomyositis and polymyositis, inherited, metabolic, endocrine, and toxic myopathy, myasthenia gravis, periodic paralysis, a mental disorder including mood, anxiety, and schizophrenic disorders, seasonal affective disorder (SAD), akathesia, amnesia, catatonia, diabetic neuropathy, tardive dyskinesia, dystonias, paranoid psychoses, postherpetic neuralgia, and Tourette's disorder; a gastrointestinal disorder including ulcerative colitis, gastric and duodenal ulcers, cystinuria, dibasicaminoaciduria, hypercystinuria, lysinuria, Hartnup disease, tryptophan malabsorption, methionine malabsorption, histidinuria, iminoglycinuria, dicarboxylicaminoaciduria, cystinosis, renal glycosuria, hypouricemia, familial hypophosphatemic rickets, congenital chloridorrhea, distal renal tubular acidosis, Menkes' disease, Wilson's disease, lethal diarrhea, juvenile pernicious anemia, folate malabsorption, adrenoleukodystrophy, hereditary myoglobinuria, and Zellweger syndrome; a transport disorder such as akinesia, amyotrophic lateral sclerosis, ataxia telangiectasia, cystic fibrosis, Becker's muscular dystrophy, Bell's palsy, Charcot-Marie-Tooth disease, diabetes mellitus, diabetes insipidus, diabetic neuropathy, Duchenne muscular dystrophy, hyperkalemic periodic paralysis, normokalemic periodic paralysis, Parkinson's disease, malignant hyperthermia, multidrug resistance, myasthenia gravis, myotonic dystrophy, catatonia, tardive dyskinesia, dystonias, peripheral neuropathy, cerebral neoplasms, prostate cancer, cardiac disorders associated with transport, e.g., angina, bradyarrhythmia, tachyarrhythmia, hypertension, Long QT syndrome, myocarditis, cardiomyopathy, nemaline myopathy, centronuclear myopathy, lipid myopathy, mitochondrial myopathy, thyrotoxic myopathy, ethanol myopathy, dermatomyositis, inclusion body myositis, infectious myositis, and polymyositis, neurological disorders associated with transport, e.g., Alzheimer's disease, amnesia, bipolar disorder, dementia, depression, epilepsy, Tourette's disorder, paranoid psychoses, and schizophrenia, and other disorders associated with transport, e.g., neurofibromatosis, postherpetic neuralgia, trigeminal neuropathy, sarcoidosis, sickle cell anemia, cataracts, infertility, pulmonary artery stenosis, sensorineural autosomal deafness, hyperglycemia, hypoglycemia, Graves' disease, goiter, glucose-galactose malabsorption syndrome, hypercholesterolemia, Cushing's disease, and Addison's disease; and a connective tissue disorder such as osteogenesis imperfecta, Ehlers-Danlos syndrome, chondrodysplasias, Marfan syndrome, Alport syndrome, familial aortic aneurysm, achondroplasia, mucopolysaccharidoses, osteoporosis, osteopetrosis, Paget's disease, rickets, osteomalacia, hyperparathyroidism, renal osteodystrophy, osteonecrosis, osteomyelitis, osteoma, osteoid osteoma, osteoblastoma, osteosarcoma, osteochondroma, chondroma, chondroblastoma, chondromyxoid fibroma, chondrosarcoma, fibrous cortical defect, nonossifying fibroma, fibrous dysplasia, fibrosarcoma, malignant fibrous histiocytoma, Ewing's sarcoma, primitive neuroectodermal tumor, giant cell 
tumor, osteoarthritis, rheumatoid arthritis, ankylosing spondyloarthritis, Reiter's syndrome, psoriatic arthritis, enteropathic arthritis, infectious arthritis, gout, gouty arthritis, calcium pyrophosphate crystal deposition disease, ganglion, synovial cyst, villonodular synovitis, systemic sclerosis, Dupuytren's contracture, hepatic fibrosis, lupus erythematosus, mixed connective tissue disease, epidermolysis bullosa simplex, bullous congenital ichthyosiform erythroderma (epidermolytic hyperkeratosis), non-epidermolytic and epidermolytic palmoplantar keratoderma, ichthyosis bullosa of Siemens, pachyonychia congenita, and white sponge nevus. The dithp can be used to detect the presence of, or to quantify the amount of, a dithp-related polynucleotide in a sample. This information is then compared to information obtained from appropriate reference samples, and a diagnosis is established. Alternatively, a polynucleotide complementary to a given dithp can inhibit or inactivate a therapeutically relevant gene related to the dithp.
Analysis of dithp Expression Patterns
The expression of dithp may be routinely assessed by hybridization-based methods to determine, for example, the tissue-specificity, disease-specificity, or developmental stage-specificity of dithp expression. For example, the level of expression of dithp may be compared among different cell types or tissues, among diseased and normal cell types or tissues, among cell types or tissues at different developmental stages, or among cell types or tissues undergoing various treatments. This type of analysis is useful, for example, to assess the relative levels of dithp expression in fully or partially differentiated cells or tissues, to determine if changes in dithp expression levels are correlated with the development or progression of specific disease states, and to assess the response of a cell or tissue to a specific therapy, for example, in pharmacological or toxicological studies. Methods for the analysis of dithp expression are based on hybridization and amplification technologies and include membrane-based procedures such as northern blot analysis, high-throughput procedures that utilize, for example, microarrays, and PCR-based procedures.
Hybridization and Genetic Analysis
The dithp, their fragments, or complementary sequences, may be used to identify the presence of and/or to determine the degree of similarity between two (or more) nucleic acid sequences. The dithp may be hybridized to naturally occurring or recombinant nucleic acid sequences under appropriately selected temperatures and salt concentrations. Hybridization with a probe based on the nucleic acid sequence of at least one of the dithp allows for the detection of nucleic acid sequences, including genomic sequences, which are identical or related to the dithp of the Sequence Listing.
Probes may be selected from non-conserved or unique regions of at least one of the polynucleotides of SEQ ID NO: 1-275 and tested for their ability to identify or amplify the target nucleic acid sequence using standard protocols.
Polynucleotide sequences that are capable of hybridizing, in particular, to those shown in SEQ ID NO: 1-275 and fragments thereof, can be identified using various conditions of stringency. (See, e.g., Wahl, G.M. and S.L. Berger (1987) Methods Enzymol. 152:399-407; Kimmel, A.R. (1987) Methods Enzymol. 152:507-511.) Hybridization conditions are discussed in "Definitions."
A probe for use in Southern or northern hybridization may be derived from a fragment of a dithp sequence, or its complement, that is up to several hundred nucleotides in length and is either single-stranded or double-stranded. Such probes may be hybridized in solution to biological materials such as plasmids, bacterial, yeast, or human artificial chromosomes, cleared or sectioned tissues, or to artificial substrates containing dithp. Microarrays are particularly suitable for identifying the presence of and detecting the level of expression for multiple genes of interest by examining gene expression correlated with, e.g., various stages of development, treatment with a drug or compound, or disease progression. An array analogous to a dot or slot blot may be used to arrange and link polynucleotides to the surface of a substrate using one or more of the following: mechanical (vacuum), chemical, thermal, or UV bonding procedures. Such an array may contain any number of dithp and may be produced by hand or by using available devices, materials, and machines.
Microarrays may be prepared, used, and analyzed using methods known in the art. (See, e.g., Brennan, T.M. et al. (1995) U.S. Patent No. 5,474,796; Schena, M. et al. (1996) Proc. Natl. Acad. Sci. USA 93:10614-10619; Baldeschweiler et al. (1995) PCT application WO95/251116; Shalon, D. et al. (1995) PCT application WO95/35505; Heller, R.A. et al. (1997) Proc. Natl. Acad. Sci. USA 94:2150-2155; and Heller, M.J. et al. (1997) U.S. Patent No. 5,605,662.)
Probes may be labeled by either PCR or enzymatic techniques using a variety of commercially available reporter molecules. For example, commercial kits are available for radioactive and chemiluminescent labeling (Amersham Pharmacia Biotech) and for alkaline phosphatase labeling (Life Technologies). Alternatively, dithp may be cloned into commercially available vectors for the production of RNA probes. Such probes may be transcribed in the presence of at least one labeled nucleotide (e.g., 32P-ATP, Amersham Pharmacia Biotech).
Additionally, the polynucleotides of SEQ ID NO: 1-275 or suitable fragments thereof can be used to isolate full length cDNA sequences utilizing hybridization and/or amplification procedures well known in the art, e.g., cDNA library screening, PCR amplification, etc. The molecular cloning of such full length cDNA sequences may employ the method of cDNA library screening with probes using the hybridization, stringency, washing, and probing strategies described above and in Ausubel, supra, Chapters 3, 5, and 6. These procedures may also be employed with genomic libraries to isolate genomic sequences of dithp in order to analyze, e.g., regulatory elements.
Genetic Mapping
Gene identification and mapping are important in the investigation and treatment of almost all conditions, diseases, and disorders. Cancer, cardiovascular disease, Alzheimer's disease, arthritis, diabetes, and mental illnesses are of particular interest. Each of these conditions is more complex than the single gene defects of sickle cell anemia or cystic fibrosis, with select groups of genes being predictive of predisposition for a particular condition, disease, or disorder. For example, cardiovascular disease may result from malfunctioning receptor molecules that fail to clear cholesterol from the bloodstream, and diabetes may result when a particular individual's immune system is activated by an infection and attacks the insulin-producing cells of the pancreas. In some studies, Alzheimer's disease has been linked to a gene on chromosome 21; other studies predict a different gene and location. Mapping of disease genes is a complex and reiterative process and generally proceeds from genetic linkage analysis to physical mapping.
As a condition is noted among members of a family, a genetic linkage map traces parts of chromosomes that are inherited in the same pattern as the condition. Statistics link the inheritance of particular conditions to particular regions of chromosomes, as defined by RFLP or other markers. (See, for example, Lander, E. S. and Botstein, D. (1986) Proc. Natl. Acad. Sci. USA 83:7353-7357.) Occasionally, genetic markers and their locations are known from previous studies. More often, however, the markers are simply stretches of DNA that differ among individuals. Examples of genetic linkage maps can be found in various scientific journals or at the Online Mendelian Inheritance in Man (OMIM) World Wide Web site.
In another embodiment of the invention, dithp sequences may be used to generate hybridization probes useful in chromosomal mapping of naturally occurring genomic sequences. Either coding or noncoding sequences of dithp may be used, and in some instances, noncoding sequences may be preferable over coding sequences. For example, conservation of a dithp coding sequence among members of a multi-gene family may potentially cause undesired cross hybridization during chromosomal mapping. The sequences may be mapped to a particular chromosome, to a specific region of a chromosome, or to artificial chromosome constructions, e.g., human artificial chromosomes (HACs), yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs), bacterial P1 constructions, or single chromosome cDNA libraries. (See, e.g., Harrington, J.J. et al. (1997) Nat. Genet. 15:345-355; Price, C.M. (1993) Blood Rev. 7:127-134; and Trask, B.J. (1991) Trends Genet. 7:149-154.)
Fluorescent in situ hybridization (FISH) may be correlated with other physical chromosome mapping techniques and genetic map data. (See, e.g., Meyers, supra, pp. 965-968.) Correlation between the location of dithp on a physical chromosomal map and a specific disorder, or a predisposition to a specific disorder, may help define the region of DNA associated with that disorder. The dithp sequences may also be used to detect polymorphisms that are genetically linked to the inheritance of a particular condition, disease, or disorder.
In situ hybridization of chromosomal preparations and genetic mapping techniques, such as linkage analysis using established chromosomal markers, may be used for extending existing genetic maps. Often the placement of a gene on the chromosome of another mammalian species, such as mouse, may reveal associated markers even if the number or arm of the corresponding human chromosome is not known. These new marker sequences can be mapped to human chromosomes and may provide valuable information to investigators searching for disease genes using positional cloning or other gene discovery techniques. Once a disease or syndrome has been crudely correlated by genetic linkage with a particular genomic region, e.g., ataxia-telangiectasia to 11q22-23, any sequences mapping to that area may represent associated or regulatory genes for further investigation. (See, e.g., Gatti, R.A. et al. (1988) Nature 336:577-580.) The nucleotide sequences of the subject invention may also be used to detect differences in chromosomal architecture due to translocation, inversion, etc., among normal, carrier, or affected individuals.
Once a disease-associated gene is mapped to a chromosomal region, the gene must be cloned in order to identify mutations or other alterations (e.g., translocations or inversions) that may be correlated with disease. This process requires a physical map of the chromosomal region containing the disease-gene of interest along with associated markers. A physical map is necessary for determining the nucleotide sequence of and order of marker genes on a particular chromosomal region. Physical mapping techniques are well known in the art and require the generation of overlapping sets of cloned DNA fragments from a particular organelle, chromosome, or genome. These clones are analyzed to reconstruct and catalog their order. Once the position of a marker is determined, the DNA from that region is obtained by consulting the catalog and selecting clones from that region.
The gene of interest is located through positional cloning techniques using hybridization or similar methods.
Diagnostic Uses
The dithp of the present invention may be used to design probes useful in diagnostic assays. Such assays, well known to those skilled in the art, may be used to detect or confirm conditions, disorders, or diseases associated with abnormal levels of dithp expression. Labeled probes developed from dithp sequences are added to a sample under hybridizing conditions of desired stringency. In some instances, dithp, or fragments or oligonucleotides derived from dithp, may be used as primers in amplification steps prior to hybridization. The amount of hybridization complex formed is quantified and compared with standards for that cell or tissue. If dithp expression varies significantly from the standard, the assay indicates the presence of the condition, disorder, or disease. Qualitative or quantitative diagnostic methods may include northern, dot blot, or other membrane or dip-stick based technologies or multiple-sample format technologies such as PCR, enzyme-linked immunosorbent assay (ELISA)-like, pin, or chip-based assays.
The probes described above may also be used to monitor the progress of conditions, disorders, or diseases associated with abnormal levels of dithp expression, or to evaluate the efficacy of a particular therapeutic treatment. The candidate probe may be identified from the dithp that are specific to a given human tissue and have not been observed in GenBank or other genome databases. Such a probe may be used in animal studies, preclinical tests, clinical trials, or in monitoring the treatment of an individual patient. In a typical process, standard expression is established by methods well known in the art for use as a basis of comparison, samples from patients affected by the disorder or disease are combined with the probe to evaluate any deviation from the standard profile, and a therapeutic agent is administered and effects are monitored to generate a treatment profile. Efficacy is evaluated by determining whether the expression progresses toward or returns to the standard normal pattern. Treatment profiles may be generated over a period of several days or several months. Statistical methods well known to those skilled in the art may be used to determine the significance of such therapeutic agents.
The polynucleotides are also useful for identifying individuals from minute biological samples, for example, by matching the RFLP pattern of a sample's DNA to that of an individual's DNA. The polynucleotides of the present invention can also be used to determine the actual base-by-base DNA sequence of selected portions of an individual's genome. These sequences can be used to prepare PCR primers for amplifying and isolating such selected DNA, which can then be sequenced. Using this technique, an individual can be identified through a unique set of DNA sequences. Once a unique ID database is established for an individual, positive identification of that individual can be made from extremely small tissue samples.
In a particular aspect, oligonucleotide primers derived from the dithp of the invention may be used to detect single nucleotide polymorphisms (SNPs). SNPs are substitutions, insertions and deletions that are a frequent cause of inherited or acquired genetic disease in humans. Methods of SNP detection include, but are not limited to, single-stranded conformation polymorphism (SSCP) and fluorescent SSCP (fSSCP) methods. In SSCP, oligonucleotide primers derived from dithp are used to amplify DNA using the polymerase chain reaction (PCR). The DNA may be derived, for example, from diseased or normal tissue, biopsy samples, bodily fluids, and the like. SNPs in the DNA cause differences in the secondary and tertiary structures of PCR products in single-stranded form, and these differences are detectable using gel electrophoresis in non-denaturing gels. In fSSCP, the oligonucleotide primers are fluorescently labeled, which allows detection of the amplimers in high-throughput equipment such as DNA sequencing machines. Additionally, sequence database analysis methods, termed in silico SNP (isSNP), are capable of identifying polymorphisms by comparing the sequences of individual overlapping DNA fragments which assemble into a common consensus sequence. These computer-based methods filter out sequence variations due to laboratory preparation of DNA and sequencing errors using statistical models and automated analyses of DNA sequence chromatograms. In the alternative, SNPs may be detected and characterized by mass spectrometry using, for example, the high throughput MASSARRAY system (Sequenom, Inc., San Diego CA).
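The in silico comparison of overlapping fragments against a common consensus can be illustrated with a minimal sketch. It assumes the fragments are already aligned to equal length, with "-" marking positions a fragment does not cover, and it handles substitutions only. The function name call_candidate_snps and the min_minor_count threshold are hypothetical; the simple count threshold merely stands in for the statistical models and chromatogram analyses mentioned above.

```python
from collections import Counter


def call_candidate_snps(aligned_reads, min_minor_count=2):
    """Return (position, consensus_base, variant_base, count) tuples.

    aligned_reads: equal-length strings that are already aligned;
    '-' marks positions that a fragment does not cover.
    min_minor_count: a variant base seen fewer times than this is
    treated as a likely sequencing error and ignored (an assumption,
    not the statistical model referred to in the text).
    """
    if len({len(read) for read in aligned_reads}) > 1:
        raise ValueError("reads must be pre-aligned to the same length")
    candidates = []
    for pos, column in enumerate(zip(*aligned_reads)):
        counts = Counter(base for base in column if base != "-")
        if len(counts) < 2:
            continue  # every covering fragment agrees at this position
        (consensus, _), *others = counts.most_common()
        for base, n in others:
            if n >= min_minor_count:
                candidates.append((pos, consensus, base, n))
    return candidates


reads = [
    "ACGTTGCA",
    "ACGTTGCA",
    "ACGATGCA",
    "ACGATGCA",
    "ACGTTGCT",  # single mismatch at the last position, filtered out
]
print(call_candidate_snps(reads))  # [(3, 'T', 'A', 2)]
```

Position 3 is reported because two independent fragments carry the variant base, whereas the lone mismatch at position 7 falls below the threshold.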
DNA-based identification techniques are critical in forensic technology. DNA sequences taken from very small biological samples such as tissues, e.g., hair or skin, or body fluids, e.g., blood, saliva, semen, etc., can be amplified using, e.g., PCR, to identify individuals. (See, e.g., Erlich, H. (1992) PCR Technology, Freeman and Co., New York, NY). Similarly, polynucleotides of the present invention can be used as polymorphic markers.
There is also a need for reagents capable of identifying the source of a particular tissue. Appropriate reagents can comprise, for example, DNA probes or primers prepared from the sequences of the present invention that are specific for particular tissues. Panels of such reagents can identify tissue by species and/or by organ type. In a similar fashion, these reagents can be used to screen tissue cultures for contamination.
The polynucleotides of the present invention can also be used as molecular weight markers on nucleic acid gels or Southern blots, as diagnostic probes for the presence of a specific mRNA in a particular cell type, in the creation of subtracted cDNA libraries which aid in the discovery of novel polynucleotides, in selection and synthesis of oligomers for attachment to an array or other support, and as an antigen to elicit an immune response.
Disease Model Systems Using dithp
The dithp of the invention or their mammalian homologs may be "knocked out" in an animal model system using homologous recombination in embryonic stem (ES) cells. Such techniques are well known in the art and are useful for the generation of animal models of human disease. (See, e.g., U.S. Patent Number 5,175,383 and U.S. Patent Number 5,767,337.) For example, mouse ES cells, such as the mouse 129/SvJ cell line, are derived from the early mouse embryo and grown in culture. The ES cells are transformed with a vector containing the gene of interest disrupted by a marker gene, e.g., the neomycin phosphotransferase gene (neo; Capecchi, M.R. (1989) Science 244:1288-1292). The vector integrates into the corresponding region of the host genome by homologous recombination. Alternatively, homologous recombination takes place using the Cre-loxP system to knock out a gene of interest in a tissue- or developmental stage-specific manner (Marth, J.D. (1996) Clin. Invest. 97:1999-2002; Wagner, K.U. et al. (1997) Nucleic Acids Res. 25:4323-4330). Transformed ES cells are identified and microinjected into mouse cell blastocysts such as those from the C57BL/6 mouse strain. The blastocysts are surgically transferred to pseudopregnant dams, and the resulting chimeric progeny are genotyped and bred to produce heterozygous or homozygous strains. Transgenic animals thus generated may be tested with potential therapeutic or toxic agents. The dithp of the invention may also be manipulated in vitro in ES cells derived from human blastocysts. Human ES cells have the potential to differentiate into at least eight separate cell lineages including endoderm, mesoderm, and ectodermal cell types. These cell lineages differentiate into, for example, neural cells, hematopoietic lineages, and cardiomyocytes (Thomson, J.A. et al. (1998) Science 282:1145-1147).
The dithp of the invention can also be used to create "knockin" humanized animals (pigs) or transgenic animals (mice or rats) to model human disease. With knockin technology, a region of dithp is injected into animal ES cells, and the injected sequence integrates into the animal cell genome. Transformed cells are injected into blastulae, and the blastulae are implanted as described above. Transgenic progeny or inbred lines are studied and treated with potential pharmaceutical agents to obtain information on treatment of a human disease. Alternatively, a mammal inbred to overexpress dithp, resulting, e.g., in the secretion of DITHP in its milk, may also serve as a convenient source of that protein (Janne, J. et al. (1998) Biotechnol. Annu. Rev. 4:55-74).
Screening Assays
DITHP encoded by polynucleotides of the present invention may be used to screen for molecules that bind to or are bound by the encoded polypeptides. The binding of the polypeptide and the molecule may activate (agonist), increase, inhibit (antagonist), or decrease activity of the polypeptide or the bound molecule. Examples of such molecules include antibodies, oligonucleotides, proteins (e.g., receptors), or small molecules.
Preferably, the molecule is closely related to the natural ligand of the polypeptide, e.g., a ligand or fragment thereof, a natural substrate, or a structural or functional mimetic. (See, Coligan et al., (1991) Current Protocols in Immunology 1(2): Chapter 5.) Similarly, the molecule can be closely related to the natural receptor to which the polypeptide binds, or to at least a fragment of the receptor, e.g., the active site. In either case, the molecule can be rationally designed using known techniques. Preferably, the screening for these molecules involves producing appropriate cells which express the polypeptide, either as a secreted protein or on the cell membrane. Preferred cells include cells from mammals, yeast, Drosophila, or E. coli. Cells expressing the polypeptide or cell membrane fractions which contain the expressed polypeptide are then contacted with a test compound and binding, stimulation, or inhibition of activity of either the polypeptide or the molecule is analyzed.
An assay may simply test binding of a candidate compound to the polypeptide, wherein binding is detected by a fluorophore, radioisotope, enzyme conjugate, or other detectable label. Alternatively, the assay may assess binding in the presence of a labeled competitor.
Additionally, the assay can be carried out using cell-free preparations, polypeptide/molecule affixed to a solid support, chemical libraries, or natural product mixtures. The assay may also simply comprise the steps of mixing a candidate compound with a solution containing a polypeptide, measuring polypeptide/molecule activity or binding, and comparing the polypeptide/molecule activity or binding to a standard.
Preferably, an ELISA assay using, e.g., a monoclonal or polyclonal antibody, can measure polypeptide level in a sample. The antibody can measure polypeptide level by either binding, directly or indirectly, to the polypeptide or by competing with the polypeptide for a substrate.
All of the above assays can be used in a diagnostic or prognostic context. The molecules discovered using these assays can be used to treat disease or to bring about a particular result in a patient (e.g., blood vessel growth) by activating or inhibiting the polypeptide/molecule. Moreover, the assays can discover agents which may inhibit or enhance the production of the polypeptide from suitably manipulated cells or tissues.
Transcript Imaging and Toxicological Testing
Another embodiment relates to the use of dithp to develop a transcript image of a tissue or cell type. A transcript image represents the global pattern of gene expression by a particular tissue or cell type. Global gene expression patterns are analyzed by quantifying the number of expressed genes and their relative abundance under given conditions and at a given time. (See Seilhamer et al., "Comparative Gene Transcript Analysis," U.S. Patent Number 5,840,484, expressly incorporated by reference herein.) Thus a transcript image may be generated by hybridizing the polynucleotides of the present invention or their complements to the totality of transcripts or reverse transcripts of a particular tissue or cell type. In one embodiment, the hybridization takes place in high-throughput format, wherein the polynucleotides of the present invention or their complements comprise a subset of a plurality of elements on a microarray. The resultant transcript image would provide a profile of gene activity pertaining to human molecules for diagnostics and therapeutics.
Transcript images which profile dithp expression may be generated using transcripts isolated from tissues, cell lines, biopsies, or other biological samples. The transcript image may thus reflect dithp expression in vivo, as in the case of a tissue or biopsy sample, or in vitro, as in the case of a cell line.
Transcript images which profile dithp expression may also be used in conjunction with in vitro model systems and preclinical evaluation of pharmaceuticals, as well as toxicological testing of industrial and naturally-occurring environmental compounds. All compounds induce characteristic gene expression patterns, frequently termed molecular fingerprints or toxicant signatures, which are indicative of mechanisms of action and toxicity (Nuwaysir, E. F. et al. (1999) Mol. Carcinog. 24:153-159; Steiner, S. and Anderson, N.L. (2000) Toxicol. Lett. 112-113:467-71, expressly incorporated by reference herein). If a test compound has a signature similar to that of a compound with known toxicity, it is likely to share those toxic properties. These fingerprints or signatures are most useful and refined when they contain expression information from a large number of genes and gene families. Ideally, a genome-wide measurement of expression provides the highest quality signature. Genes whose expression is not altered by any tested compounds are important as well, as the levels of expression of these genes are used to normalize the rest of the expression data. The normalization procedure is useful for comparison of expression data after treatment with different compounds. While the assignment of gene function to elements of a toxicant signature aids in interpretation of toxicity mechanisms, knowledge of gene function is not necessary for the statistical matching of signatures which leads to prediction of toxicity. (See, for example, Press Release 00-02 from the National Institute of Environmental Health Sciences, released February 29, 2000, available at http://www.niehs.nih.gov/oc/news/toxchip.htm.) Therefore, it is important and desirable in toxicological screening using toxicant signatures to include all expressed gene sequences.
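The normalization and statistical matching of signatures described above might be sketched as follows. This is only an illustration under stated assumptions: expression levels are scaled by the mean of a set of genes presumed to be unaffected by any compound, a signature is the log2 ratio of treated to control levels, and similarity is a Pearson correlation. The gene names, numeric values, and function names are all hypothetical, and the text does not specify which statistical measure is used.

```python
import math


def normalize(levels, reference_genes):
    """Scale raw levels by the mean level of the unaffected reference genes."""
    scale = sum(levels[g] for g in reference_genes) / len(reference_genes)
    return {gene: value / scale for gene, value in levels.items()}


def signature(treated, control, reference_genes):
    """Log2 ratio of normalized treated vs. control levels per non-reference gene."""
    t = normalize(treated, reference_genes)
    c = normalize(control, reference_genes)
    return {g: math.log2(t[g] / c[g]) for g in t if g not in reference_genes}


def similarity(sig_a, sig_b):
    """Pearson correlation over the genes shared by both signatures."""
    genes = sorted(sig_a.keys() & sig_b.keys())
    xs = [sig_a[g] for g in genes]
    ys = [sig_b[g] for g in genes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical hybridization intensities; ref1 and ref2 are assumed unaffected.
refs = {"ref1", "ref2"}
control = {"ref1": 100, "ref2": 100, "g1": 50, "g2": 50, "g3": 50}
known_toxicant = {"ref1": 200, "ref2": 200, "g1": 400, "g2": 100, "g3": 25}
test_compound = {"ref1": 50, "ref2": 50, "g1": 50, "g2": 25, "g3": 12.5}

known_sig = signature(known_toxicant, control, refs)
test_sig = signature(test_compound, control, refs)
print(similarity(known_sig, test_sig))
```

A correlation near 1 would suggest that the test compound shifts the same genes in the same direction as the known toxicant, even though the raw intensities of the two experiments differ in overall scale.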
In one embodiment, the toxicity of a test compound is assessed by treating a biological sample containing nucleic acids with the test compound. Nucleic acids that are expressed in the treated biological sample are hybridized with one or more probes specific to the polynucleotides of the present invention, so that transcript levels corresponding to the polynucleotides of the present invention may be quantified. The transcript levels in the treated biological sample are compared with levels from an untreated biological sample. Differences in the transcript levels between the two samples are indicative of a toxic response caused by the test compound in the treated sample.
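The comparison of treated and untreated transcript levels can be sketched as a simple fold-change screen. The probe names, the pseudocount, and the twofold cutoff below are assumptions chosen for illustration; the text does not prescribe any particular threshold or statistic.

```python
import math


def flag_changed_transcripts(treated, untreated, min_abs_log2_fc=1.0, pseudocount=1.0):
    """Return {probe: log2 fold change} for probes whose level changed enough.

    treated / untreated: mappings of probe name to quantified transcript level.
    pseudocount: added to both levels so that zero signals do not divide by zero.
    """
    changed = {}
    for probe in treated.keys() & untreated.keys():
        ratio = (treated[probe] + pseudocount) / (untreated[probe] + pseudocount)
        log2_fc = math.log2(ratio)
        if abs(log2_fc) >= min_abs_log2_fc:
            changed[probe] = log2_fc
    return changed


# Hypothetical quantified hybridization signals.
treated = {"probe_A": 799.0, "probe_B": 101.0, "probe_C": 24.0}
untreated = {"probe_A": 199.0, "probe_B": 99.0, "probe_C": 99.0}
print(flag_changed_transcripts(treated, untreated))
```

Here probe_A (fourfold up) and probe_C (fourfold down) are flagged, while probe_B stays within the cutoff. In practice, replicate samples and a significance test would be needed before attributing a difference to the test compound.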
Another particular embodiment relates to the use of DITHP encoded by polynucleotides of the present invention to analyze the proteome of a tissue or cell type. The term proteome refers to the global pattern of protein expression in a particular tissue or cell type. Each protein component of a proteome can be subjected individually to further analysis. Proteome expression patterns, or profiles, are analyzed by quantifying the number of expressed proteins and their relative abundance under given conditions and at a given time. A profile of a cell's proteome may thus be generated by separating and analyzing the polypeptides of a particular tissue or cell type. In one embodiment, the separation is achieved using two-dimensional gel electrophoresis, in which proteins from a sample are separated by isoelectric focusing in the first dimension, and then according to molecular weight by sodium dodecyl sulfate slab gel electrophoresis in the second dimension (Steiner and Anderson, supra). The proteins are visualized in the gel as discrete and uniquely positioned spots, typically by staining the gel with an agent such as Coomassie Blue or silver or fluorescent stains. The optical density of each protein spot is generally proportional to the level of the protein in the sample. The optical densities of equivalently positioned protein spots from different samples, for example, from biological samples either treated or untreated with a test compound or therapeutic agent, are compared to identify any changes in protein spot density related to the treatment. The proteins in the spots are partially sequenced using, for example, standard methods employing chemical or enzymatic cleavage followed by mass spectrometry. The identity of the protein in a spot may be determined by comparing its partial sequence, preferably of at least 5 contiguous amino acid residues, to the polypeptide sequences of the present invention. 
In some cases, further sequence data may be obtained for definitive protein identification.
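The identification step, in which a partial sequence of at least 5 contiguous residues is compared to reference polypeptide sequences, can be sketched as an exact substring search. The reference names and sequences below are invented placeholders, not sequences from the Sequence Listing, and exact matching is an assumption; a real comparison might tolerate mismatches.

```python
def match_partial_sequence(partial, references, min_len=5):
    """Return the sorted names of reference polypeptides containing `partial`.

    partial: partial amino acid sequence in one-letter code.
    references: mapping of reference name to full polypeptide sequence.
    min_len: minimum number of contiguous residues required for a comparison.
    """
    partial = partial.upper()
    if len(partial) < min_len:
        raise ValueError(f"need at least {min_len} contiguous residues")
    return sorted(name for name, seq in references.items() if partial in seq.upper())


# Hypothetical reference sequences for illustration only.
references = {
    "ref_1": "MKTAYIAKQRQISFVKSHFSRQ",
    "ref_2": "MSDNELQKLAEEWGGS",
}
print(match_partial_sequence("AKQRQ", references))  # ['ref_1']
```

A short peptide can occur in more than one protein by chance, which is consistent with the remark that further sequence data may be needed for definitive identification.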
A proteomic profile may also be generated using antibodies specific for DITHP to quantify the levels of DITHP expression. In one embodiment, the antibodies are used as elements on a microarray, and protein expression levels are quantified by exposing the microarray to the sample and detecting the levels of protein bound to each array element (Lueking, A. et al. (1999) Anal. Biochem. 270:103-11; Mendoza, L.G. et al. (1999) Biotechniques 27:778-88). Detection may be performed by a variety of methods known in the art, for example, by reacting the proteins in the sample with a thiol- or amino-reactive fluorescent compound and detecting the amount of fluorescence bound at each array element.
Toxicant signatures at the proteome level are also useful for toxicological screening, and should be analyzed in parallel with toxicant signatures at the transcript level. There is a poor correlation between transcript and protein abundances for some proteins in some tissues (Anderson, N.L. and Seilhamer, J. (1997) Electrophoresis 18:533-537), so proteome toxicant signatures may be useful in the analysis of compounds which do not significantly affect the transcript image, but which alter the proteomic profile. In addition, the analysis of transcripts in body fluids is difficult, due to rapid degradation of mRNA, so proteomic profiling may be more reliable and informative in such cases.
In another embodiment, the toxicity of a test compound is assessed by treating a biological sample containing proteins with the test compound. Proteins that are expressed in the treated biological sample are separated so that the amount of each protein can be quantified. The amount of each protein is compared to the amount of the corresponding protein in an untreated biological sample. A difference in the amount of protein between the two samples is indicative of a toxic response to the test compound in the treated sample. Individual proteins are identified by sequencing the amino acid residues of the individual proteins and comparing these partial sequences to the DITHP encoded by polynucleotides of the present invention.
In another embodiment, the toxicity of a test compound is assessed by treating a biological sample containing proteins with the test compound. Proteins from the biological sample are incubated with antibodies specific to the DITHP encoded by polynucleotides of the present invention. The amount of protein recognized by the antibodies is quantified. The amount of protein in the treated biological sample is compared with the amount in an untreated biological sample. A difference in the amount of protein between the two samples is indicative of a toxic response to the test compound in the treated sample.
Transcript images may be used to profile dithp expression in distinct tissue types. This process can be used to determine human molecule activity in a particular tissue type relative to this activity in a different tissue type. Transcript images may be used to generate a profile of dithp expression characteristic of diseased tissue. Transcript images of tissues before and after treatment may be used for diagnostic purposes, to monitor the progression of disease, and to monitor the efficacy of drug treatments for diseases which affect the activity of human molecules. Transcript images of cell lines can be used to assess human molecule activity and/or to identify cell lines that lack or misregulate this activity. Such cell lines may then be treated with pharmaceutical agents, and a transcript image following treatment may indicate the efficacy of these agents in restoring desired levels of this activity. A similar approach may be used to assess the toxicity of pharmaceutical agents as reflected by undesirable changes in human molecule activity. Candidate pharmaceutical agents may be evaluated by comparing their associated transcript images with those of pharmaceutical agents of known effectiveness.
Antisense Molecules
The polynucleotides of the present invention are useful in antisense technology. Antisense technology or therapy relies on the modulation of expression of a target protein through the specific binding of an antisense sequence to a target sequence encoding the target protein or directing its expression. (See, e.g., Agrawal, S., ed. (1996) Antisense Therapeutics, Humana Press Inc., Totowa NJ; Alama, A. et al. (1997) Pharmacol. Res. 36(3):171-178; Crooke, S.T. (1997) Adv. Pharmacol. 40:1-49; Sharma, H.W. and R. Narayanan (1995) Bioessays 17(12):1055-1063; and Lavrosky, Y. et al. (1997) Biochem. Mol. Med. 62(1):11-22.) An antisense sequence is a polynucleotide sequence capable of specifically hybridizing to at least a portion of the target sequence. Antisense sequences bind to cellular mRNA and/or genomic DNA, affecting translation and/or transcription. Antisense sequences can be DNA, RNA, or nucleic acid mimics and analogs. (See, e.g., Rossi, J.J. et al. (1991) Antisense Res. Dev. 1(3):285-288; Lee, R. et al. (1998) Biochemistry 37(3):900-1010; Pardridge, W.M. et al. (1995) Proc. Natl. Acad. Sci. USA 92(12):5592-5596; and Nielsen, P.E. and Haaima, G. (1997) Chem. Soc. Rev. 96:73-78.) Typically, the binding which results in modulation of expression occurs through hybridization or binding of complementary base pairs. Antisense sequences can also bind to DNA duplexes through specific interactions in the major groove of the double helix.
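Because an antisense sequence hybridizes to its target through complementary base pairing, a candidate DNA antisense sequence for a chosen region of a target is simply the reverse complement of that region. The following Python sketch illustrates this; the function name, the example target, and the choice of region are hypothetical and are not part of the disclosure.

```python
# Complement table covering DNA and RNA input; the output is written as DNA.
COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")

def antisense_dna(target, start, length):
    """Return a DNA antisense sequence (5'->3') for target[start:start+length].

    target may be an mRNA (with U) or a DNA sense strand (with T).
    """
    region = target[start:start + length]
    if len(region) != length:
        raise ValueError("requested region extends past the target sequence")
    # Complement each base, then reverse so the result reads 5'->3'.
    return region.translate(COMPLEMENT)[::-1]

# Hypothetical target fragment; the first six bases are used as the target region.
print(antisense_dna("AUGGCCAAGUUC", 0, 6))
```

The sketch addresses only sequence complementarity; it says nothing about which region of a real target would be effective.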
The polynucleotides of the present invention and fragments thereof can be used as antisense sequences to modify the expression of the polypeptide encoded by dithp. The antisense sequences can be produced ex vivo, such as by using any of the ABI nucleic acid synthesizer series (Applied Biosystems) or other automated systems known in the art. Antisense sequences can also be produced biologically, such as by transforming an appropriate host cell with an expression vector containing the sequence of interest. (See, e.g., Agrawal, supra.)

In therapeutic use, any gene delivery system suitable for introduction of the antisense sequences into appropriate target cells can be used. Antisense sequences can be delivered intracellularly in the form of an expression plasmid which, upon transcription, produces a sequence complementary to at least a portion of the cellular sequence encoding the target protein. (See, e.g., Slater, J.E. et al. (1998) J. Allergy Clin. Immunol. 102(3):469-475; and Scanlon, K.J. et al. (1995) 9(13):1288-1296.) Antisense sequences can also be introduced intracellularly through the use of viral vectors, such as retrovirus and adeno-associated virus vectors. (See, e.g., Miller, A.D. (1990) Blood 76:271; Ausubel, F.M. et al. (1995) Current Protocols in Molecular Biology, John Wiley & Sons, New York NY; Uckert, W. and W. Walther (1994) Pharmacol. Ther. 63(3):323-347.) Other gene delivery mechanisms include liposome-derived systems, artificial viral envelopes, and other systems known in the art. (See, e.g., Rossi, J.J. (1995) Br. Med. Bull. 51(1):217-225; Boado, R.J. et al. (1998) J. Pharm. Sci. 87(11):1308-1315; and Morris, M.C. et al. (1997) Nucleic Acids Res. 25(14):2730-2736.)
Expression

In order to express a biologically active DITHP, the nucleotide sequences encoding DITHP or fragments thereof may be inserted into an appropriate expression vector, i.e., a vector which contains the necessary elements for transcriptional and translational control of the inserted coding sequence in a suitable host. Methods which are well known to those skilled in the art may be used to construct expression vectors containing sequences encoding DITHP and appropriate transcriptional and translational control elements. These methods include in vitro recombinant DNA techniques, synthetic techniques, and in vivo genetic recombination. (See, e.g., Sambrook, supra, Chapters 4, 8, 16, and 17; and Ausubel, supra, Chapters 9, 10, 13, and 16.)
A variety of expression vector/host systems may be utilized to contain and express sequences encoding DITHP. These include, but are not limited to, microorganisms such as bacteria transformed with recombinant bacteriophage, plasmid, or cosmid DNA expression vectors; yeast transformed with yeast expression vectors; insect cell systems infected with viral expression vectors (e.g., baculovirus); plant cell systems transformed with viral expression vectors (e.g., cauliflower mosaic virus, CaMV, or tobacco mosaic virus, TMV) or with bacterial expression vectors (e.g., Ti or pBR322 plasmids); or animal (mammalian) cell systems. (See, e.g., Sambrook, supra; Ausubel, 1995, supra; Van Heeke, G. and S.M. Schuster (1989) J. Biol. Chem. 264:5503-5509; Bitter, G.A. et al. (1987) Methods Enzymol. 153:516-544; Scorer, C.A. et al. (1994) Bio/Technology 12:181-184; Engelhard, E.K. et al. (1994) Proc. Natl. Acad. Sci. USA 91:3224-3227; Sandig, V. et al. (1996) Hum. Gene Ther. 7:1937-1945; Takamatsu, N. (1987) EMBO J. 6:307-311; Coruzzi, G. et al. (1984) EMBO J. 3:1671-1680; Broglie, R. et al. (1984) Science 224:838-843; Winter, J. et al. (1991) Results Probl. Cell Differ. 17:85-105; The McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill, New York NY, pp. 191-196; Logan, J. and T. Shenk (1984) Proc. Natl. Acad. Sci. USA 81:3655-3659; and Harrington, J.J. et al. (1997) Nat. Genet. 15:345-355.) Expression vectors derived from retroviruses, adenoviruses, or herpes or vaccinia viruses, or from various bacterial plasmids, may be used for delivery of nucleotide sequences to the targeted organ, tissue, or cell population. (See, e.g., Di Nicola, M. et al. (1998) Cancer Gen. Ther. 5(6):350-356; Yu, M. et al. (1993) Proc. Natl. Acad. Sci. USA 90(13):6340-6344; Buller, R.M. et al. (1985) Nature 317(6040):813-815; McGregor, D.P. et al. (1994) Mol. Immunol. 31(3):219-226; and Verma, I.M. and N. Somia (1997) Nature 389:239-242.) The invention is not limited by the host cell employed.
For long-term production of recombinant proteins in mammalian systems, stable expression of DITHP in cell lines is preferred. For example, sequences encoding DITHP can be transformed into cell lines using expression vectors which may contain viral origins of replication and/or endogenous expression elements and a selectable marker gene on the same or on a separate vector. Any number of selection systems may be used to recover transformed cell lines. (See, e.g., Wigler, M. et al. (1977) Cell 11:223-232; Lowy, I. et al. (1980) Cell 22:817-823; Wigler, M. et al. (1980) Proc. Natl. Acad. Sci. USA 77:3567-3570; Colbere-Garapin, F. et al. (1981) J. Mol. Biol. 150:1-14; Hartman, S.C. and R.C. Mulligan (1988) Proc. Natl. Acad. Sci. USA 85:8047-8051; Rhodes, C.A. (1995) Methods Mol. Biol. 55:121-131.)

Therapeutic Uses of dithp
The dithp of the invention may be used for somatic or germline gene therapy. Gene therapy may be performed to (i) correct a genetic deficiency (e.g., in the cases of severe combined immunodeficiency (SCID)-X1 disease characterized by X-linked inheritance (Cavazzana-Calvo, M. et al. (2000) Science 288:669-672), severe combined immunodeficiency syndrome associated with an inherited adenosine deaminase (ADA) deficiency (Blaese, R.M. et al. (1995) Science 270:475-480; Bordignon, C. et al. (1995) Science 270:470-475), cystic fibrosis (Zabner, J. et al. (1993) Cell 75:207-216; Crystal, R.G. et al. (1995) Hum. Gene Therapy 6:643-666; Crystal, R.G. et al. (1995) Hum. Gene Therapy 6:667-703), thalassemias, familial hypercholesterolemia, and hemophilia resulting from Factor VIII or Factor IX deficiencies (Crystal, R.G. (1995) Science 270:404-410; Verma, I.M. and Somia, N. (1997) Nature 389:239-242)), (ii) express a conditionally lethal gene product (e.g., in the case of cancers which result from unregulated cell proliferation), or (iii) express a protein which affords protection against intracellular parasites (e.g., against human retroviruses, such as human immunodeficiency virus (HIV) (Baltimore, D. (1988) Nature 335:395-396; Poeschla, E. et al. (1996) Proc. Natl. Acad. Sci. USA 93:11395-11399), hepatitis B or C virus (HBV, HCV); fungal parasites, such as Candida albicans and Paracoccidioides brasiliensis; and protozoan parasites such as Plasmodium falciparum and Trypanosoma cruzi). In the case where a genetic deficiency in dithp expression or regulation causes disease, the expression of dithp from an appropriate population of transduced cells may alleviate the clinical manifestations caused by the genetic deficiency.

In a further embodiment of the invention, diseases or disorders caused by deficiencies in dithp are treated by constructing mammalian expression vectors comprising dithp and introducing these vectors by mechanical means into dithp-deficient cells.
Mechanical transfer technologies for use with cells in vivo or ex vitro include (i) direct DNA microinjection into individual cells, (ii) ballistic gold particle delivery, (iii) liposome-mediated transfection, (iv) receptor-mediated gene transfer, and (v) the use of DNA transposons (Morgan, R.A. and Anderson, W.F. (1993) Annu. Rev. Biochem. 62:191-217; Ivics, Z. (1997) Cell 91:501-510; Boulay, J-L. and Recipon, H. (1998) Curr. Opin. Biotechnol. 9:445-450).
Expression vectors that may be effective for the expression of dithp include, but are not limited to, the PCDNA 3.1, EPITAG, PRCCMV2, PREP, PVAX vectors (Invitrogen, Carlsbad CA), PCMV-SCRIPT, PCMV-TAG, PEGSH/PERV (Stratagene, La Jolla CA), and PTET-OFF, PTET-ON, PTRE2, PTRE2-LUC, PTK-HYG (Clontech, Palo Alto CA). The dithp of the invention may be expressed using (i) a constitutively active promoter (e.g., from cytomegalovirus (CMV), Rous sarcoma virus (RSV), SV40 virus, thymidine kinase (TK), or β-actin genes), (ii) an inducible promoter (e.g., the tetracycline-regulated promoter (Gossen, M. and Bujard, H. (1992) Proc. Natl. Acad. Sci. U.S.A. 89:5547-5551; Gossen, M. et al. (1995) Science 268:1766-1769; Rossi, F.M.V. and Blau, H.M. (1998) Curr. Opin. Biotechnol. 9:451-456), commercially available in the T-REX plasmid (Invitrogen); the ecdysone-inducible promoter (available in the plasmids PVGRXR and PIND; Invitrogen); the FK506/rapamycin inducible promoter; or the RU486/mifepristone inducible promoter (Rossi, F.M.V. and Blau, H.M., supra)), or (iii) a tissue-specific promoter or the native promoter of the endogenous gene encoding DITHP from a normal individual.

Commercially available liposome transformation kits (e.g., the PERFECT LIPID TRANSFECTION KIT, available from Invitrogen) allow one with ordinary skill in the art to deliver polynucleotides to target cells in culture and require minimal effort to optimize experimental parameters. In the alternative, transformation is performed using the calcium phosphate method (Graham, F.L. and Eb, A.J. (1973) Virology 52:456-467), or by electroporation (Neumann, E. et al. (1982) EMBO J. 1:841-845). The introduction of DNA to primary cells requires modification of these standardized mammalian transfection protocols.
In another embodiment of the invention, diseases or disorders caused by genetic defects with respect to dithp expression are treated by constructing a retrovirus vector consisting of (i) dithp under the control of an independent promoter or the retrovirus long terminal repeat (LTR) promoter, (ii) appropriate RNA packaging signals, and (iii) a Rev-responsive element (RRE) along with additional retrovirus cis-acting RNA sequences and coding sequences required for efficient vector propagation. Retrovirus vectors (e.g., PFB and PFBNEO) are commercially available (Stratagene) and are based on published data (Riviere, I. et al. (1995) Proc. Natl. Acad. Sci. U.S.A. 92:6733-6737), incorporated by reference herein. The vector is propagated in an appropriate vector producing cell line (VPCL) that expresses an envelope gene with a tropism for receptors on the target cells or a promiscuous envelope protein such as VSVg (Armentano, D. et al. (1987) J. Virol. 61:1647-1650; Bender, M.A. et al. (1987) J. Virol. 61:1639-1646; Adam, M.A. and Miller, A.D. (1988) J. Virol. 62:3802-3806; Dull, T. et al. (1998) J. Virol. 72:8463-8471; Zufferey, R. et al. (1998) J. Virol. 72:9873-9880). U.S. Patent Number 5,910,434 to Rigg ("Method for obtaining retrovirus packaging cell lines producing high transducing efficiency retroviral supernatant") discloses a method for obtaining retrovirus packaging cell lines and is hereby incorporated by reference. Propagation of retrovirus vectors, transduction of a population of cells (e.g., CD4+ T-cells), and the return of transduced cells to a patient are procedures well known to persons skilled in the art of gene therapy and have been well documented (Ranga, U. et al. (1997) J. Virol. 71:7020-7029; Bauer, G. et al. (1997) Blood 89:2259-2267; Bonyhadi, M. (1997) J. Virol. 71:4707-4716; Ranga, U. et al. (1998) Proc. Natl. Acad. Sci. U.S.A. 95:1201-1206; Su, L. (1997) Blood 89:2283-2290).
In the alternative, an adenovirus-based gene therapy delivery system is used to deliver dithp to cells which have one or more genetic abnormalities with respect to the expression of dithp. The construction and packaging of adenovirus-based vectors are well known to those with ordinary skill in the art. Replication defective adenovirus vectors have proven to be versatile for importing genes encoding immunoregulatory proteins into intact islets in the pancreas (Csete, M.E. et al. (1995) Transplantation 27:263-268). Potentially useful adenoviral vectors are described in U.S. Patent Number 5,707,618 to Armentano ("Adenovirus vectors for gene therapy"), hereby incorporated by reference. For adenoviral vectors, see also Antinozzi, P.A. et al. (1999) Annu. Rev. Nutr. 19:511-544 and Verma, I.M. and Somia, N. (1997) Nature 389:239-242, both incorporated by reference herein.

In another alternative, a herpes-based gene therapy delivery system is used to deliver dithp to target cells which have one or more genetic abnormalities with respect to the expression of dithp. The use of herpes simplex virus (HSV)-based vectors may be especially valuable for introducing dithp to cells of the central nervous system, for which HSV has a tropism. The construction and packaging of herpes-based vectors are well known to those with ordinary skill in the art. A replication-competent herpes simplex virus (HSV) type 1-based vector has been used to deliver a reporter gene to the eyes of primates (Liu, X. et al. (1999) Exp. Eye Res. 169:385-395). The construction of a HSV-1 virus vector has also been disclosed in detail in U.S. Patent Number 5,804,413 to DeLuca ("Herpes simplex virus strains for gene transfer"), which is hereby incorporated by reference. U.S. Patent Number 5,804,413 teaches the use of recombinant HSV d92 which consists of a genome containing at least one exogenous gene to be transferred to a cell under the control of the appropriate promoter for purposes including human gene therapy. Also taught by this patent are the construction and use of recombinant HSV strains deleted for ICP4, ICP27 and ICP22. For HSV vectors, see also Goins, W.F. et al. (1999) J. Virol. 73:519-532 and Xu, H. et al. (1994) Dev. Biol. 163:152-161, hereby incorporated by reference. The manipulation of cloned herpesvirus sequences, the generation of recombinant virus following the transfection of multiple plasmids containing different segments of the large herpesvirus genomes, the growth and propagation of herpesvirus, and the infection of cells with herpesvirus are techniques well known to those of ordinary skill in the art.

In another alternative, an alphavirus (positive, single-stranded RNA virus) vector is used to deliver dithp to target cells. The biology of the prototypic alphavirus, Semliki Forest Virus (SFV), has been studied extensively and gene transfer vectors have been based on the SFV genome (Garoff, H. and Li, K-J. (1998) Curr. Opin. Biotech. 9:464-469). During alphavirus RNA replication, a subgenomic RNA is generated that normally encodes the viral capsid proteins. This subgenomic RNA replicates to higher levels than the full-length genomic RNA, resulting in the overproduction of capsid proteins relative to the viral proteins with enzymatic activity (e.g., protease and polymerase). Similarly, inserting dithp into the alphavirus genome in place of the capsid-coding region results in the production of a large number of dithp RNAs and the synthesis of high levels of DITHP in vector transduced cells.
While alphavirus infection is typically associated with cell lysis within a few days, the ability to establish a persistent infection in hamster normal kidney cells (BHK-21) with a variant of Sindbis virus (SIN) indicates that the lytic replication of alphaviruses can be altered to suit the needs of the gene therapy application (Dryga, S.A. et al. (1997) Virology 228:74-83). The wide host range of alphaviruses will allow the introduction of dithp into a variety of cell types. The specific transduction of a subset of cells in a population may require the sorting of cells prior to transduction. The methods of manipulating infectious cDNA clones of alphaviruses, performing alphavirus cDNA and RNA transfections, and performing alphavirus infections, are well known to those with ordinary skill in the art.
Antibodies
Anti-DITHP antibodies may be used to analyze protein expression levels. Such antibodies include, but are not limited to, polyclonal, monoclonal, chimeric, single chain, and Fab fragments. For descriptions of and protocols for antibody technologies, see, e.g., Pound, J.D. (1998) Immunochemical Protocols, Humana Press, Totowa NJ.
The amino acid sequence encoded by the dithp of the Sequence Listing may be analyzed by appropriate software (e.g., LASERGENE NAVIGATOR software, DNASTAR) to determine regions of high immunogenicity. The optimal sequences for immunization are selected from the C-terminus, the N-terminus, and those intervening, hydrophilic regions of the polypeptide which are likely to be exposed to the external environment when the polypeptide is in its natural conformation. Analysis used to select appropriate epitopes is also described by Ausubel (1997, supra. Chapter 11.7). Peptides used for antibody induction do not need to have biological activity; however, they must be antigenic. Peptides used to induce specific antibodies may have an amino acid sequence consisting of at least five amino acids, preferably at least 10 amino acids, and most preferably at least 15 amino acids. A peptide which mimics an antigenic fragment of the natural polypeptide may be fused with another protein such as keyhole limpet hemocyanin (KLH; Sigma, St. Louis MO) for antibody production. A peptide encompassing an antigenic region may be expressed from a dithp, synthesized as described above, or purified from human cells.
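The selection of hydrophilic regions as candidate immunogens can be sketched with a sliding-window hydropathy scan. The example below is not the LASERGENE analysis referred to above; it substitutes the published Kyte-Doolittle hydropathy scale (low average values indicate hydrophilic stretches) as an assumption, and the function name, window size, and example sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values; more negative means more hydrophilic.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def most_hydrophilic_window(protein, window=15):
    """Return (start, peptide, mean hydropathy) of the most hydrophilic window.

    window=15 follows the 'most preferably at least 15 amino acids' length
    stated in the text; the scoring scale itself is an assumption.
    """
    if len(protein) < window:
        raise ValueError("protein is shorter than the window")
    best_start, best_score = 0, None
    for i in range(len(protein) - window + 1):
        score = sum(KD[aa] for aa in protein[i:i + window]) / window
        if best_score is None or score < best_score:
            best_start, best_score = i, score
    return best_start, protein[best_start:best_start + window], best_score

# Hypothetical sequence: a charged stretch flanked by leucine runs.
start, peptide, score = most_hydrophilic_window("LLLLLLLLLLKDEKRDEKRDLLLLL", window=5)
print(start, peptide, round(score, 2))
```

Hydrophilicity is only a proxy for surface exposure; a candidate peptide chosen this way would still need to be tested for antigenicity as the text describes.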
Procedures well known in the art may be used for the production of antibodies. Various hosts including mice, goats, and rabbits, may be immunized by injection with a peptide. Depending on the host species, various adjuvants may be used to increase immunological response.
In one procedure, peptides about 15 residues in length may be synthesized using an ABI 431A peptide synthesizer (Applied Biosystems) using fmoc-chemistry and coupled to KLH (Sigma) by reaction with m-maleimidobenzoyl-N-hydroxysuccinimide ester (Ausubel, 1995, supra). Rabbits are immunized with the peptide-KLH complex in complete Freund's adjuvant. The resulting antisera are tested for antipeptide activity by binding the peptide to plastic, blocking with 1% bovine serum albumin (BSA), reacting with rabbit antisera, washing, and reacting with radioiodinated goat anti-rabbit IgG. Antisera with antipeptide activity are tested for anti-DITHP activity using protocols well known in the art, including ELISA, radioimmunoassay (RIA), and immunoblotting.

In another procedure, isolated and purified peptide may be used to immunize mice (about 100 μg of peptide) or rabbits (about 1 mg of peptide). Subsequently, the peptide is radioiodinated and used to screen the immunized animals' B-lymphocytes for production of antipeptide antibodies. Positive cells are then used to produce hybridomas using standard techniques. About 20 mg of peptide is sufficient for labeling and screening several thousand clones. Hybridomas of interest are detected by screening with radioiodinated peptide to identify those fusions producing peptide-specific monoclonal antibody. In a typical protocol, wells of a multi-well plate (FAST, Becton-Dickinson, Palo Alto, CA) are coated with affinity-purified, specific rabbit-anti-mouse (or suitable anti-species IgG) antibodies at 10 mg/ml. The coated wells are blocked with 1% BSA, washed, and exposed to supernatants from hybridomas. After incubation, the wells are exposed to radiolabeled peptide at 1 mg/ml. Clones producing antibodies bind a quantity of labeled peptide that is detectable above background. Such clones are expanded and subjected to 2 cycles of cloning.
Cloned hybridomas are injected into pristane-treated mice to produce ascites, and monoclonal antibody is purified from the ascitic fluid by affinity chromatography on protein A (Amersham Pharmacia Biotech). Several procedures for the production of monoclonal antibodies, including in vitro production, are described in Pound (supra). Monoclonal antibodies with antipeptide activity are tested for anti-DITHP activity using protocols well known in the art, including ELISA, RIA, and immunoblotting.
Antibody fragments containing specific binding sites for an epitope may also be generated. For example, such fragments include, but are not limited to, the F(ab')2 fragments produced by pepsin digestion of the antibody molecule, and the Fab fragments generated by reducing the disulfide bridges of the F(ab')2 fragments. Alternatively, construction of Fab expression libraries in filamentous bacteriophage allows rapid and easy identification of monoclonal fragments with desired specificity (Pound, supra, Chaps. 45-47). Antibodies generated against polypeptide encoded by dithp can be used to purify and characterize full-length DITHP protein and its activity, binding partners, etc.
Assays Using Antibodies
Anti-DITHP antibodies may be used in assays to quantify the amount of DITHP found in a particular human cell. Such assays include methods utilizing the antibody and a label to detect expression level under normal or disease conditions. The peptides and antibodies of the invention may be used with or without modification or labeled by joining them, either covalently or noncovalently, with a reporter molecule.
Protocols for detecting and measuring protein expression using either polyclonal or monoclonal antibodies are well known in the art. Examples include ELISA, RIA, and fluorescence-activated cell sorting (FACS). Such immunoassays typically involve the formation of complexes between the DITHP and its specific antibody and the measurement of such complexes. These and other assays are described in Pound (supra).
Without further elaboration, it is believed that one skilled in the art can, using the preceding description, utilize the present invention to its fullest extent. The following preferred specific embodiments are, therefore, to be construed as merely illustrative, and not limitative of the remainder of the disclosure in any way whatsoever.
The disclosures of all patents, applications, and publications mentioned above and below, including U.S. Ser. No. 60/230,517, U.S. Ser. No. 60/230,599, U.S. Ser. No. 60/230,514, U.S. Ser. No. 60/231,167, U.S. Ser. No. 60/230,598, U.S. Ser. No. 60/230,988, U.S. Ser. No. 60/230,518, U.S. Ser. No. 60/230,515, U.S. Ser. No. 60/229,751, U.S. Ser. No. 60/230,610, U.S. Ser. No. 60/229,749, U.S. Ser. No. 60/229,750, U.S. Ser. No. 60/230,597, U.S. Ser. No. 60/230,505, U.S. Ser. No. 60/231,163, U.S. Ser. No. 60/229,747, U.S. Ser. No. 60/229,748, U.S. Ser. No. 60/230,583, U.S. Ser. No. 60/230,519, U.S. Ser. No. 60/230,595, U.S. Ser. No. 60/230,865, and U.S. Ser. No. 60/230,951, are hereby expressly incorporated by reference.
EXAMPLES

I. Construction of cDNA Libraries

RNA was purchased from CLONTECH Laboratories, Inc. (Palo Alto CA) or isolated from various tissues. Some tissues were homogenized and lysed in guanidinium isothiocyanate, while others were homogenized and lysed in phenol or in a suitable mixture of denaturants, such as TRIZOL (Life Technologies), a monophasic solution of phenol and guanidine isothiocyanate. The resulting lysates were centrifuged over CsCl cushions or extracted with chloroform. RNA was precipitated with either isopropanol or sodium acetate and ethanol, or by other routine methods. Phenol extraction and precipitation of RNA were repeated as necessary to increase RNA purity. In most cases, RNA was treated with DNase. For most libraries, poly(A+) RNA was isolated using oligo d(T)-coupled paramagnetic particles (Promega Corporation (Promega), Madison WI), OLIGOTEX latex particles (QIAGEN, Inc. (QIAGEN), Valencia CA), or an OLIGOTEX mRNA purification kit (QIAGEN). Alternatively, RNA was isolated directly from tissue lysates using other RNA isolation kits, e.g., the POLY(A)PURE mRNA purification kit (Ambion, Inc., Austin TX).

In some cases, Stratagene was provided with RNA and constructed the corresponding cDNA libraries. Otherwise, cDNA was synthesized and cDNA libraries were constructed with the UNIZAP vector system (Stratagene Cloning Systems, Inc. (Stratagene), La Jolla CA) or SUPERSCRIPT plasmid system (Life Technologies), using the recommended procedures or similar methods known in the art. (See, e.g., Ausubel, 1997, supra, Chapters 5.1 through 6.6.) Reverse transcription was initiated using oligo d(T) or random primers. Synthetic oligonucleotide adapters were ligated to double stranded cDNA, and the cDNA was digested with the appropriate restriction enzyme or enzymes. For most libraries, the cDNA was size-selected (300-1000 bp) using SEPHACRYL S1000, SEPHAROSE CL2B, or SEPHAROSE CL4B column chromatography (Amersham Pharmacia Biotech) or preparative agarose gel electrophoresis. cDNAs were ligated into compatible restriction enzyme sites of the polylinker of a suitable plasmid, e.g., PBLUESCRIPT plasmid (Stratagene), PSPORT1 plasmid (Life Technologies), PCDNA2.1 plasmid (Invitrogen, Carlsbad CA), PBK-CMV plasmid (Stratagene), or pINCY (Incyte Genomics, Palo Alto CA), or derivatives thereof. Recombinant plasmids were transformed into competent E. coli cells including XL1-Blue, XL1-BlueMRF, or SOLR from Stratagene or DH5α, DH10B, or ElectroMAX DH10B from Life Technologies.
II. Isolation of cDNA Clones
Plasmids were recovered from host cells by in vivo excision using the UNIZAP vector system (Stratagene) or by cell lysis. Plasmids were purified using at least one of the following: the Magic or WIZARD Minipreps DNA purification system (Promega); the AGTC Miniprep purification kit (Edge BioSystems, Gaithersburg MD); and the QIAWELL 8, QIAWELL 8 Plus, and QIAWELL 8 Ultra plasmid purification systems or the R.E.A.L. PREP 96 plasmid purification kit (QIAGEN). Following precipitation, plasmids were resuspended in 0.1 ml of distilled water and stored, with or without lyophilization, at 4 °C.
Alternatively, plasmid DNA was amplified from host cell lysates using direct link PCR in a high-throughput format. (Rao, V.B. (1994) Anal. Biochem. 216:1-14.) Host cell lysis and thermal cycling steps were carried out in a single reaction mixture. Samples were processed and stored in 384-well plates, and the concentration of amplified plasmid DNA was quantified fluorometrically using PICOGREEN dye (Molecular Probes, Inc. (Molecular Probes), Eugene OR) and a FLUOROSKAN II fluorescence scanner (Labsystems Oy, Helsinki, Finland).
III. Sequencing and Analysis

cDNA sequencing reactions were processed using standard methods or high-throughput instrumentation such as the ABI CATALYST 800 thermal cycler (Applied Biosystems) or the PTC-200 thermal cycler (MJ Research) in conjunction with the HYDRA microdispenser (Robbins Scientific Corp., Sunnyvale CA) or the MICROLAB 2200 liquid transfer system (Hamilton). cDNA sequencing reactions were prepared using reagents provided by Amersham Pharmacia Biotech or supplied in ABI sequencing kits such as the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (Applied Biosystems). Electrophoretic separation of cDNA sequencing reactions and detection of labeled polynucleotides were carried out using the MEGABACE 1000 DNA sequencing system (Molecular Dynamics); the ABI PRISM 373 or 377 sequencing system (Applied Biosystems) in conjunction with standard ABI protocols and base calling software; or other sequence analysis systems known in the art. Reading frames within the cDNA sequences were identified using standard methods (reviewed in Ausubel, 1997, supra, Chapter 7.7). Some of the cDNA sequences were selected for extension using the techniques disclosed in Example VIII.

IV. Assembly and Analysis of Sequences
Component sequences from chromatograms were subject to PHRED analysis and assigned a quality score. The sequences having at least a required quality score were subject to various preprocessing editing pathways to eliminate, e.g., low quality 3' ends, vector and linker sequences, polyA tails, Alu repeats, mitochondrial and ribosomal sequences, bacterial contamination sequences, and sequences smaller than 50 base pairs. In particular, low-information sequences and repetitive elements (e.g., dinucleotide repeats, Alu repeats, etc.) were replaced by "n's", or masked, to prevent spurious matches.
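Two of the preprocessing steps described above, discarding sequences smaller than 50 base pairs and masking dinucleotide repeats with "n", can be sketched in Python as follows. This is an illustration only, not the pipeline used to generate the sequences; the function names and the minimum repeat length that triggers masking are assumptions.

```python
import re

MIN_LENGTH = 50  # from the text: sequences smaller than 50 base pairs are eliminated

# Assumption: a dinucleotide repeat is masked once it spans at least 6 units
# (12 bases). Note that this pattern also matches homopolymer runs such as
# polyA stretches of 12 or more bases.
DINUC_REPEAT = re.compile(r"([ACGT]{2})\1{5,}", re.IGNORECASE)

def mask_repeats(seq):
    """Replace each dinucleotide repeat with a run of 'n' of the same length."""
    return DINUC_REPEAT.sub(lambda m: "n" * len(m.group(0)), seq)

def preprocess(reads):
    """Drop short reads and mask repeats in the rest.

    reads: dict mapping a read identifier to its nucleotide sequence.
    """
    kept = {}
    for name, seq in reads.items():
        if len(seq) < MIN_LENGTH:
            continue  # too short to be informative
        kept[name] = mask_repeats(seq)
    return kept

# Hypothetical read containing a (CA)6 repeat
print(mask_repeats("ACGT" + "CA" * 6 + "GGTC"))
```

Masking preserves sequence length and coordinates, which is why the repeat is overwritten with "n" rather than deleted.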
Processed sequences were then subject to assembly procedures in which the sequences were assigned to gene bins (bins). Each sequence could belong to only one bin. Sequences in each gene bin were assembled to produce consensus sequences (templates). Subsequent new sequences were added to existing bins using BLASTn (v.1.4 WashU) and CROSSMATCH. Candidate pairs were identified as all BLAST hits having a quality score greater than or equal to 150. Alignments of at least 82% local identity were accepted into the bin. The component sequences from each bin were assembled using a version of PHRAP. Bins with several overlapping component sequences were assembled using DEEP PHRAP. The orientation (sense or antisense) of each assembled template was determined based on the number and orientation of its component sequences. Template sequences as disclosed in the sequence listing correspond to sense strand sequences (the "forward" reading frames), to the best determination. The complementary (antisense) strands are inherently disclosed herein. The component sequences which were used to assemble each template consensus sequence are listed in Table 5, along with their positions along the template nucleotide sequences.
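The acceptance rule for adding a new sequence to an existing bin (a BLAST quality score of at least 150 and at least 82% local identity, with each sequence placed in only one bin) can be sketched as follows. The data layout and the tie-breaking rule, which picks the highest-scoring accepted hit, are assumptions for illustration; the text does not state how competing bins were resolved.

```python
MIN_SCORE = 150       # BLAST quality score threshold stated in the text
MIN_IDENTITY = 82.0   # percent local identity threshold stated in the text

def assign_to_bin(hits):
    """Choose a single bin for a new sequence, or None if no hit qualifies.

    hits: list of dicts with keys 'bin', 'score', and 'identity' (percent).
    Assumption: when several bins qualify, the hit with the highest score
    (then the highest identity) wins, so a sequence joins exactly one bin.
    """
    accepted = [h for h in hits
                if h["score"] >= MIN_SCORE and h["identity"] >= MIN_IDENTITY]
    if not accepted:
        return None
    best = max(accepted, key=lambda h: (h["score"], h["identity"]))
    return best["bin"]

# Hypothetical hits for one new sequence
hits = [
    {"bin": "bin_7", "score": 420, "identity": 97.5},
    {"bin": "bin_2", "score": 610, "identity": 80.0},  # fails identity
    {"bin": "bin_9", "score": 140, "identity": 99.0},  # fails score
]
print(assign_to_bin(hits))
```

A sequence for which no hit qualifies would start a new bin in a full pipeline; that step is omitted here.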
Bins were compared against each other and those having local similarity of at least 82% were combined and reassembled. Reassembled bins having templates of insufficient overlap (less than 95% local identity) were re-split. Assembled templates were also subject to analysis by STITCHER/EXON MAPPER algorithms which analyze the probabilities of the presence of splice variants, alternatively spliced exons, splice junctions, differential expression of alternatively spliced genes across tissue types or disease states, etc. These resulting bins were subject to several rounds of the above assembly procedures.
Once gene bins were generated based upon sequence alignments, bins were clone joined based upon clone information. If the 5' sequence of one clone was present in one bin and the 3' sequence from the same clone was present in a different bin, it was likely that the two bins actually belonged together in a single bin. The resulting combined bins underwent assembly procedures to regenerate the consensus sequences.
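Clone joining as described above amounts to merging any two bins that hold the 5' and 3' sequences of the same clone. A minimal sketch using a union-find structure is shown below; the identifiers and data layout are hypothetical, and the subsequent reassembly of the merged bins is not modeled.

```python
def clone_join(read_to_bin, clone_reads):
    """Merge bins linked by the 5' and 3' reads of a shared clone.

    read_to_bin: dict mapping a read identifier to its current bin id.
    clone_reads: dict mapping a clone id to a (5' read, 3' read) pair.
    Returns a dict mapping each read to its bin id after joining.
    """
    parent = {b: b for b in read_to_bin.values()}

    def find(b):
        # Follow parent links to the representative bin, compressing the path.
        while parent[b] != b:
            parent[b] = parent[parent[b]]
            b = parent[b]
        return b

    for five_prime, three_prime in clone_reads.values():
        if five_prime in read_to_bin and three_prime in read_to_bin:
            a, b = find(read_to_bin[five_prime]), find(read_to_bin[three_prime])
            if a != b:
                parent[max(a, b)] = min(a, b)  # keep the smaller bin id
    return {read: find(b) for read, b in read_to_bin.items()}

# Hypothetical reads: clones c1 and c2 link bins 1, 2, and 3; c3 lacks a 3' read.
read_to_bin = {"c1_5p": 1, "c1_3p": 2, "c2_5p": 2, "c2_3p": 3, "c3_5p": 4}
clone_reads = {"c1": ("c1_5p", "c1_3p"),
               "c2": ("c2_5p", "c2_3p"),
               "c3": ("c3_5p", "c3_3p")}
print(clone_join(read_to_bin, clone_reads))
```

Union-find is used because joins are transitive: if one clone links bins 1 and 2 and another links bins 2 and 3, all three end up in a single bin.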
The final assembled templates were subsequently annotated using the following procedure. Template sequences were analyzed using BLASTn (v2.0, NCBI) versus gbpri (GenBank version 124). "Hits" were defined as an exact match having from 95% local identity over 200 base pairs through 100% local identity over 100 base pairs, or a homolog match having an E-value, i.e., a probability score, of < 1 x 10⁻⁸. The hits were subject to frameshift FASTx versus GENPEPT (GenBank version 124). (See Table 8.) In this analysis, a homolog match was defined as having an E-value of < 1 x 10⁻⁸. The assembly method used above was described in "System and Methods for Analyzing Biomolecular Sequences," U.S.S.N. 09/276,534, filed March 25, 1999, and the LIFESEQ Gold user manual (Incyte), both incorporated by reference herein.
Following assembly, template sequences were subjected to motif, BLAST, and functional analyses, and categorized in protein hierarchies using methods described in, e.g., "Database System Employing Protein Function Hierarchies for Viewing Biomolecular Sequence Data," U.S.S.N. 08/812,290, filed March 6, 1997; "Relational Database for Storing Biomolecule Information," U.S.S.N. 08/947,845, filed October 9, 1997; "Project-Based Full-Length Biomolecular Sequence Database," U.S.S.N. 08/811,758, filed March 6, 1997; and "Relational Database and System for Storing Information Relating to Biomolecular Sequences," U.S.S.N. 09/034,807, filed March 4, 1998, all of which are incorporated by reference herein.
The template sequences were further analyzed by translating each template in all three forward reading frames and searching each translation against the Pfam database of hidden Markov model-based protein families and domains using the HMMER software package (available to the public from Washington University School of Medicine, St. Louis MO). Regions of templates which, when translated, contain similarity to Pfam consensus sequences are reported in Table 3, along with descriptions of Pfam protein domains and families. Only those Pfam hits with an E-value of ≤ 1 x 10⁻³ are reported. (See also World Wide Web site http://pfam.wustl.edu/ for detailed descriptions of Pfam protein domains and families.)
Additionally, the template sequences were translated in all three forward reading frames, and each translation was searched against hidden Markov models for signal peptides using the HMMER software package. Construction of hidden Markov models and their usage in sequence analysis has been described. (See, for example, Eddy, S.R. (1996) Curr. Opin. Str. Biol. 6:361-365.) Only those signal peptide hits with a cutoff score of 11 bits or greater are reported. A cutoff score of 11 bits or greater corresponds to at least about 91-94% true-positives in signal peptide prediction. Template sequences were also translated in all three forward reading frames, and each translation was searched against TMAP, a program that uses weight matrices to delineate transmembrane segments on protein sequences and determine orientation, with respect to the cell cytosol (Persson, B. and P. Argos (1994) J. Mol. Biol. 237:182-192; Persson, B. and P. Argos (1996) Protein Sci. 5:363-371). Regions of templates which, when translated, contain similarity to signal peptide or transmembrane consensus sequences are reported in Table 4.
The results of HMMER analysis as reported in Tables 3 and 4 may support the results of BLAST analysis as reported in Table 2 or may suggest alternative or additional properties of template-encoded polypeptides not previously uncovered by BLAST or other analyses.
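Translation of a template in all three forward reading frames, as used for the Pfam, signal peptide, and TMAP searches, can be sketched as follows. The sketch assumes the standard genetic code and renders any codon containing a non-ACGT character as "X"; the function name and the handling of ambiguous bases are illustrative choices, not details taken from the analysis described above.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_forward_frames(dna):
    """Translate a sense-strand sequence in forward frames 1, 2, and 3.

    Stop codons are written as '*'. Codons with characters other than
    A, C, G, T become 'X' (an assumption made for this sketch).
    """
    seq = dna.upper()
    frames = {}
    for offset in range(3):
        codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
        frames[offset + 1] = "".join(CODON_TABLE.get(c, "X") for c in codons)
    return frames


frames = translate_forward_frames("ATGGCCTAA")
```

Each of the three resulting protein strings would then be searched separately against the relevant hidden Markov models or weight matrices.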
Template sequences are further analyzed using the bioinformatics tools listed in Table 8, or using sequence analysis software known in the art such as MACDNASIS PRO software (Hitachi Software Engineering, South San Francisco CA) and LASERGENE software (DNASTAR). Template sequences may be further queried against public databases such as the GenBank rodent, mammalian, vertebrate, prokaryote, and eukaryote databases.
The template sequences were translated to derive the corresponding longest open reading frame as presented by the polypeptide sequences as reported in Table 7. Alternatively, a polypeptide of the invention may begin at any of the methionine residues within the full length translated polypeptide. Polypeptide sequences were subsequently analyzed by querying against the GenBank protein database (GENPEPT, (GenBank version 124)). Full length polynucleotide sequences are also analyzed using MACDNASIS PRO software (Hitachi Software Engineering, South San Francisco CA) and LASERGENE software (DNASTAR). Polynucleotide and polypeptide sequence alignments are generated using default parameters specified by the CLUSTAL algorithm as incorporated into the MEGALIGN multisequence alignment program (DNASTAR), which also calculates the percent identity between aligned sequences.
Table 7 shows sequences with homology to the polypeptides of the invention as identified by BLAST analysis against the GenBank protein (GENPEPT) database. Column 1 shows the polypeptide sequence identification number (SEQ ID NO:) for the polypeptide segments of the invention. Column 2 shows the reading frame used in the translation of the polynucleotide sequences encoding the polypeptide segments. Column 3 shows the length of the translated polypeptide segments. Columns 4 and 5 show the start and stop nucleotide positions of the polynucleotide sequences encoding the polypeptide segments. Column 6 shows the GenBank identification number (GI Number) of the nearest GenBank homolog. Column 7 shows the probability score for the match between each polypeptide and its GenBank homolog. Column 8 shows the annotation of the GenBank homolog.
V. Analysis of Polynucleotide Expression
Northern analysis is a laboratory technique used to detect the presence of a transcript of a gene and involves the hybridization of a labeled nucleotide sequence to a membrane on which RNAs from a particular cell type or tissue have been bound. (See, e.g., Sambrook, supra, ch. 7; Ausubel, 1995, supra, ch. 4 and 16.)
Analogous computer techniques applying BLAST were used to search for identical or related molecules in cDNA databases such as GenBank or LIFESEQ (Incyte Genomics). This analysis is much faster than multiple membrane-based hybridizations. In addition, the sensitivity of the computer search can be modified to determine whether any particular match is categorized as exact or similar. The basis of the search is the product score, which is defined as:
(BLAST Score x Percent Identity) / (5 x minimum {length(Seq. 1), length(Seq. 2)})
The product score takes into account both the degree of similarity between two sequences and the length of the sequence match. The product score is a normalized value between 0 and 100, and is calculated as follows: the BLAST score is multiplied by the percent nucleotide identity and the product is divided by (5 times the length of the shorter of the two sequences). The BLAST score is calculated by assigning a score of +5 for every base that matches in a high-scoring segment pair (HSP), and -4 for every mismatch. Two sequences may share more than one HSP (separated by gaps). If there is more than one HSP, then the pair with the highest BLAST score is used to calculate the product score. The product score represents a balance between fractional overlap and quality in a BLAST alignment. For example, a product score of 100 is produced only for 100% identity over the entire length of the shorter of the two sequences being compared. A product score of 70 is produced either by 100% identity and 70% overlap at one end, or by 88% identity and 100% overlap at the other. A product score of 50 is produced either by 100% identity and 50% overlap at one end, or 79% identity and 100% overlap.
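The product score calculation can be checked with a short worked sketch. It assumes a single ungapped HSP scored at +5 per match and -4 per mismatch, as stated above; the function names are illustrative.

```python
def blast_score(matches, mismatches):
    """BLAST score of one HSP: +5 per matching base, -4 per mismatch."""
    return 5 * matches - 4 * mismatches


def product_score(score, percent_identity, len_seq1, len_seq2):
    """Product score: (BLAST score x percent identity) divided by
    (5 x length of the shorter of the two sequences)."""
    return score * percent_identity / (5 * min(len_seq1, len_seq2))


# Shorter sequence of 100 bases in every case below.
full_match = product_score(blast_score(100, 0), 100, 100, 250)
partial_overlap = product_score(blast_score(70, 0), 100, 100, 250)
lower_identity = product_score(blast_score(88, 12), 88, 100, 100)
```

With these inputs, 100% identity over the full shorter sequence gives 100, and 100% identity over 70% of it gives 70. The case of 88% identity over the full length evaluates to just under 69 with this scoring, which is close to the value of 70 quoted in the text.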
VI. Tissue Distribution Profiling
A tissue distribution profile is determined for each template by compiling the cDNA library tissue classifications of its component cDNA sequences. Each component sequence is derived from a cDNA library constructed from a human tissue. Each human tissue is classified into one of the following categories: cardiovascular system; connective tissue; digestive system; embryonic structures; endocrine system; exocrine glands; genitalia, female; genitalia, male; germ cells; hemic and immune system; liver; musculoskeletal system; nervous system; pancreas; respiratory system; sense organs; skin; stomatognathic system; unclassified/mixed; or urinary tract. Template sequences, component sequences, and cDNA library/tissue information are found in the LIFESEQ GOLD database (Incyte Genomics, Palo Alto CA).
Table 6 shows the tissue distribution profile for the templates of the invention. For each template, the three most frequently observed tissue categories are shown in column 3, along with the percentage of component sequences belonging to each category. Only tissue categories with percentage values of ≥ 10% are shown. A tissue distribution of "widely distributed" in column 3 indicates percentage values of <10% in all tissue categories.
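The compilation of a tissue distribution profile can be sketched as below. The sketch assumes the tissue category of each component sequence is already available as a list of strings; the function name, the return format, and the use of an empty result to stand for "widely distributed" are illustrative assumptions.

```python
from collections import Counter


def tissue_profile(categories, top_n=3, min_percent=10.0):
    """Return up to top_n (category, percent) pairs with percent >= min_percent.

    categories: one tissue category per component sequence of a template.
    An empty list corresponds to a "widely distributed" template, i.e. no
    category among the most frequent ones reaches the threshold.
    """
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("template has no component sequences")
    profile = []
    for category, n in counts.most_common(top_n):
        percent = 100.0 * n / total
        if percent >= min_percent:
            profile.append((category, percent))
    return profile


components = ["nervous system"] * 6 + ["liver"] * 3 + ["skin"]
profile = tissue_profile(components)
label = profile if profile else "widely distributed"
```

For this hypothetical template the profile lists nervous system (60%), liver (30%), and skin (10%).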
VII. Transcript Image Analysis
Transcript images are generated as described in Seilhamer et al., "Comparative Gene Transcript Analysis," U.S. Patent Number 5,840,484, incorporated herein by reference.
VIII. Extension of Polynucleotide Sequences and Isolation of a Full-length cDNA
Oligonucleotide primers designed using a dithp of the Sequence Listing are used to extend the nucleic acid sequence. One primer is synthesized to initiate 5' extension of the template, and the other primer, to initiate 3' extension of the template. The initial primers may be designed using OLIGO 4.06 software (National Biosciences, Inc. (National Biosciences), Plymouth MN), or another appropriate program, to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to anneal to the target sequence at temperatures of about 68 °C to about 72 °C. Any stretch of nucleotides which would result in hairpin structures or primer-primer dimerizations is avoided. Selected human cDNA libraries are used to extend the sequence. If more than one extension is necessary or desired, additional or nested sets of primers are designed.
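Two of the primer criteria above, length and GC content, are simple enough to screen with a short sketch. The function names and the hard cutoffs are illustrative assumptions (the text gives "about" 22 to 30 nucleotides and "about" 50% GC). Annealing temperature, hairpin formation, and primer-primer dimerization are not evaluated here and would be left to a primer design program such as the one named above.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in a primer sequence."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_basic_criteria(primer, min_len=22, max_len=30, min_gc=0.5):
    """Check only primer length and GC content against the stated ranges."""
    return min_len <= len(primer) <= max_len and gc_fraction(primer) >= min_gc


candidate = "GCGCGCATATGCGCGCATATGC"   # hypothetical 22-mer, 14 of 22 bases G/C
at_rich = "ATATATATATATATATATATAT"     # hypothetical 22-mer with no G/C
```

Here the first candidate passes both checks, while the AT-rich sequence fails on GC content.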
High fidelity amplification is obtained by PCR using methods well known in the art. PCR is performed in 96-well plates using the PTC-200 thermal cycler (MJ Research). The reaction mix contains DNA template, 200 nmol of each primer, reaction buffer containing Mg2+, (NH4)2SO4, and β-mercaptoethanol, Taq DNA polymerase (Amersham Pharmacia Biotech), ELONGASE enzyme (Life Technologies), and Pfu DNA polymerase (Stratagene), with the following parameters for primer pair PCI A and PCI B: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C. In the alternative, the parameters for primer pair T7 and SK+ are as follows: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 57°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C.
The concentration of DNA in each well is determined by dispensing 100 μl PICOGREEN quantitation reagent (0.25% (v/v); Molecular Probes) dissolved in 1X Tris-EDTA (TE) and 0.5 μl of undiluted PCR product into each well of an opaque fluorimeter plate (Corning Incorporated (Corning), Corning NY), allowing the DNA to bind to the reagent. The plate is scanned in a FLUOROSKAN II (Labsystems Oy) to measure the fluorescence of the sample and to quantify the concentration of DNA. A 5 μl to 10 μl aliquot of the reaction mixture is analyzed by electrophoresis on a 1% agarose mini-gel to determine which reactions are successful in extending the sequence. The extended nucleotides are desalted and concentrated, transferred to 384-well plates, digested with CviJI cholera virus endonuclease (Molecular Biology Research, Madison WI), and sonicated or sheared prior to religation into pUC 18 vector (Amersham Pharmacia Biotech). For shotgun sequencing, the digested nucleotides are separated on low concentration (0.6 to 0.8%) agarose gels, fragments are excised, and agar digested with AGAR ACE (Promega). Extended clones are religated using T4 ligase (New England Biolabs, Inc., Beverly MA) into pUC 18 vector (Amersham Pharmacia Biotech), treated with Pfu DNA polymerase (Stratagene) to fill-in restriction site overhangs, and transfected into competent E. coli cells. Transformed cells are selected on antibiotic-containing media, and individual colonies are picked and cultured overnight at 37 °C in 384-well plates in LB/2x carbenicillin liquid media.
The cells are lysed, and DNA is amplified by PCR using Taq DNA polymerase (Amersham Pharmacia Biotech) and Pfu DNA polymerase (Stratagene) with the following parameters: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 72°C, 2 min; Step 5: steps 2, 3, and 4 repeated 29 times; Step 6: 72°C, 5 min; Step 7: storage at 4°C. DNA is quantified by PICOGREEN reagent (Molecular Probes) as described above. Samples with low DNA recoveries are reamplified using the same conditions as described above. Samples are diluted with 20% dimethylsulfoxide (1:2, v/v), and sequenced using DYENAMIC energy transfer sequencing primers and the DYENAMIC DIRECT kit (Amersham Pharmacia Biotech) or the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (Applied Biosystems).
In like manner, the dithp is used to obtain regulatory sequences (promoters, introns, and enhancers) using the procedure above, oligonucleotides designed for such extension, and an appropriate genomic library.
IX. Labeling of Probes and Southern Hybridization Analyses
Hybridization probes derived from the dithp of the Sequence Listing are employed for screening cDNAs, mRNAs, or genomic DNA. The labeling of probe nucleotides between 100 and 1000 nucleotides in length is specifically described, but essentially the same procedure may be used with larger cDNA fragments. Probe sequences are labeled at room temperature for 30 minutes using a T4 polynucleotide kinase, γ32P-ATP, and 0.5X One-Phor-All Plus (Amersham Pharmacia Biotech) buffer and purified using a ProbeQuant G-50 Microcolumn (Amersham Pharmacia Biotech). The probe mixture is diluted to 10⁷ dpm/μg/ml hybridization buffer and used in a typical membrane-based hybridization analysis.
The DNA is digested with a restriction endonuclease such as Eco RV and is electrophoresed through a 0.7% agarose gel. The DNA fragments are transferred from the agarose to nylon membrane (NYTRAN Plus, Schleicher & Schuell, Inc., Keene NH) using procedures specified by the manufacturer of the membrane. Prehybridization is carried out for three or more hours at 68 °C, and hybridization is carried out overnight at 68 °C. To remove non-specific signals, blots are sequentially washed at room temperature under increasingly stringent conditions, up to 0.1x saline sodium citrate (SSC) and 0.5% sodium dodecyl sulfate. After the blots are placed in a PHOSPHORIMAGER cassette (Molecular Dynamics) or are exposed to autoradiography film, hybridization patterns of standard and experimental lanes are compared. Essentially the same procedure is employed when screening RNA.
X. Chromosome Mapping of dithp
The cDNA sequences which were used to assemble SEQ ID NO:1-275 are compared with sequences from the Incyte LIFESEQ database and public domain databases using BLAST and other implementations of the Smith-Waterman algorithm. Sequences from these databases that match SEQ ID NO:1-275 are assembled into clusters of contiguous and overlapping sequences using assembly algorithms such as PHRAP (Table 8). Radiation hybrid and genetic mapping data available from public resources such as the Stanford Human Genome Center (SHGC), Whitehead Institute for Genome Research (WIGR), and Genethon are used to determine if any of the clustered sequences have been previously mapped. Inclusion of a mapped sequence in a cluster will result in the assignment of all sequences of that cluster, including its particular SEQ ID NO:, to that map location. The genetic map locations of SEQ ID NO:1-275 are described as ranges, or intervals, of human chromosomes. The map position of an interval, in centiMorgans, is measured relative to the terminus of the chromosome's p-arm. (The centiMorgan (cM) is a unit of measurement based on recombination frequencies between chromosomal markers. On average, 1 cM is roughly equivalent to 1 megabase (Mb) of DNA in humans, although this can vary widely due to hot and cold spots of recombination.) The cM distances are based on genetic markers mapped by Genethon which provide boundaries for radiation hybrid markers whose sequences were included in each of the clusters.
XI. Microarray Analysis
Probe Preparation from Tissue or Cell Samples
Total RNA is isolated from tissue samples using the guanidinium thiocyanate method and polyA+ RNA is purified using the oligo (dT) cellulose method. Each polyA+ RNA sample is reverse transcribed using MMLV reverse-transcriptase, 0.05 pg/μl oligo-dT primer (21mer), 1X first strand buffer, 0.03 units/μl RNase inhibitor, 500 μM dATP, 500 μM dGTP, 500 μM dTTP, 40 μM dCTP, 40 μM dCTP-Cy3 (BDS) or dCTP-Cy5 (Amersham Pharmacia Biotech). The reverse transcription reaction is performed in a 25 ml volume containing 200 ng polyA+ RNA with GEMBRIGHT kits (Incyte). Specific control polyA+ RNAs are synthesized by in vitro transcription from non-coding yeast genomic DNA (W. Lei, unpublished). As quantitative controls, the control mRNAs at 0.002 ng, 0.02 ng, 0.2 ng, and 2 ng are diluted into reverse transcription reaction at ratios of 1:100,000, 1:10,000, 1:1000, 1:100 (w/w) to sample mRNA respectively. The control mRNAs are diluted into reverse transcription reaction at ratios of 1:3, 3:1, 1:10, 10:1, 1:25, 25:1 (w/w) to sample mRNA differential expression patterns. After incubation at 37° C for 2 hr, each reaction sample (one with Cy3 and another with Cy5 labeling) is treated with 2.5 ml of 0.5M sodium hydroxide and incubated for 20 minutes at 85° C to stop the reaction and degrade the RNA. Probes are purified using two successive CHROMA SPIN 30 gel filtration spin columns (CLONTECH Laboratories, Inc. (CLONTECH), Palo Alto CA) and after combining, both reaction samples are ethanol precipitated using 1 ml of glycogen (1 mg/ml), 60 ml sodium acetate, and 300 ml of 100% ethanol. The probe is then dried to completion using a SpeedVAC (Savant Instruments Inc., Holbrook NY) and resuspended in 14 μl 5X SSC/0.2% SDS.
Microarray Preparation
Sequences of the present invention are used to generate array elements. Each array element is amplified from bacterial cells containing vectors with cloned cDNA inserts. PCR amplification uses primers complementary to the vector sequences flanking the cDNA insert. Array elements are amplified in thirty cycles of PCR from an initial quantity of 1-2 ng to a final quantity greater than 5 μg. Amplified array elements are then purified using SEPHACRYL-400 (Amersham Pharmacia Biotech).
Purified array elements are immobilized on polymer-coated glass slides. Glass microscope slides (Corning) are cleaned by ultrasound in 0.1% SDS and acetone, with extensive distilled water washes between and after treatments. Glass slides are etched in 4% hydrofluoric acid (VWR Scientific Products Corporation (VWR), West Chester, PA), washed extensively in distilled water, and coated with 0.05% aminopropyl silane (Sigma) in 95% ethanol. Coated slides are cured in a 110°C oven.
Array elements are applied to the coated glass substrate using a procedure described in US Patent No. 5,807,522, incorporated herein by reference. 1 μl of the array element DNA, at an average concentration of 100 ng/μl, is loaded into the open capillary printing element by a high-speed robotic apparatus. The apparatus then deposits about 5 nl of array element sample per slide.
Microarrays are UV-crosslinked using a STRATALINKER UV-crosslinker (Stratagene). Microarrays are washed at room temperature once in 0.2% SDS and three times in distilled water. Non-specific binding sites are blocked by incubation of microarrays in 0.2% casein in phosphate buffered saline (PBS) (Tropix, Inc., Bedford, MA) for 30 minutes at 60° C followed by washes in 0.2% SDS and distilled water as before.
Hybridization
Hybridization reactions contain 9 μl of probe mixture consisting of 0.2 μg each of Cy3 and Cy5 labeled cDNA synthesis products in 5X SSC, 0.2% SDS hybridization buffer. The probe mixture is heated to 65° C for 5 minutes and is aliquoted onto the microarray surface and covered with a 1.8 cm2 coverslip. The arrays are transferred to a waterproof chamber having a cavity just slightly larger than a microscope slide. The chamber is kept at 100% humidity internally by the addition of 140 μl of 5x SSC in a corner of the chamber. The chamber containing the arrays is incubated for about 6.5 hours at 60° C. The arrays are washed for 10 min at 45° C in a first wash buffer (1X SSC, 0.1% SDS), three times for 10 minutes each at 45° C in a second wash buffer (0.1X SSC), and dried.
Detection
Reporter-labeled hybridization complexes are detected with a microscope equipped with an Innova 70 mixed gas 10 W laser (Coherent, Inc., Santa Clara CA) capable of generating spectral lines at 488 nm for excitation of Cy3 and at 632 nm for excitation of Cy5. The excitation laser light is focused on the array using a 20X microscope objective (Nikon, Inc., Melville NY). The slide containing the array is placed on a computer-controlled X-Y stage on the microscope and raster-scanned past the objective. The 1.8 cm x 1.8 cm array used in the present example is scanned with a resolution of 20 micrometers.
In two separate scans, a mixed gas multiline laser excites the two fluorophores sequentially. Emitted light is split, based on wavelength, into two photomultiplier tube detectors (PMT R1477, Hamamatsu Photonics Systems, Bridgewater NJ) corresponding to the two fluorophores. Appropriate filters positioned between the array and the photomultiplier tubes are used to filter the signals. The emission maxima of the fluorophores used are 565 nm for Cy3 and 650 nm for Cy5. Each array is typically scanned twice, one scan per fluorophore using the appropriate filters at the laser source, although the apparatus is capable of recording the spectra from both fluorophores simultaneously. The sensitivity of the scans is typically calibrated using the signal intensity generated by a cDNA control species added to the probe mix at a known concentration. A specific location on the array contains a complementary DNA sequence, allowing the intensity of the signal at that location to be correlated with a weight ratio of hybridizing species of 1:100,000. When two probes from different sources (e.g., representing test and control cells), each labeled with a different fluorophore, are hybridized to a single array for the purpose of identifying genes that are differentially expressed, the calibration is done by labeling samples of the calibrating cDNA with the two fluorophores and adding identical amounts of each to the hybridization mixture.
The output of the photomultiplier tube is digitized using a 12-bit RTI-835H analog-to-digital (A/D) conversion board (Analog Devices, Inc., Norwood, MA) installed in an IBM-compatible PC computer. The digitized data are displayed as an image where the signal intensity is mapped using a linear 20-color transformation to a pseudocolor scale ranging from blue (low signal) to red (high signal). The data are also analyzed quantitatively. Where two different fluorophores are excited and measured simultaneously, the data are first corrected for optical crosstalk (due to overlapping emission spectra) between the fluorophores using each fluorophore's emission spectrum.
A grid is superimposed over the fluorescence signal image such that the signal from each spot is centered in each element of the grid. The fluorescence signal within each element is then integrated to obtain a numerical value corresponding to the average intensity of the signal. The software used for signal analysis is the GEMTOOLS gene expression analysis program (Incyte).
XII. Complementary Nucleic Acids
Sequences complementary to the dithp are used to detect, decrease, or inhibit expression of the naturally occurring nucleotide. The use of oligonucleotides comprising from about 15 to 30 base pairs is typical in the art. However, smaller or larger sequence fragments can also be used. Appropriate oligonucleotides are designed from the dithp using OLIGO 4.06 software (National Biosciences) or other appropriate programs and are synthesized using methods standard in the art or ordered from a commercial supplier. To inhibit transcription, a complementary oligonucleotide is designed from the most unique 5' sequence and used to prevent transcription factor binding to the promoter sequence. To inhibit translation, a complementary oligonucleotide is designed to prevent ribosomal binding and processing of the transcript.
XIII. Expression of DITHP
Expression and purification of DITHP is accomplished using bacterial or virus-based expression systems. For expression of DITHP in bacteria, cDNA is subcloned into an appropriate vector containing an antibiotic resistance gene and an inducible promoter that directs high levels of cDNA transcription. Examples of such promoters include, but are not limited to, the trp-lac (tac) hybrid promoter and the T5 or T7 bacteriophage promoter in conjunction with the lac operator regulatory element. Recombinant vectors are transformed into suitable bacterial hosts, e.g., BL21(DE3). Antibiotic resistant bacteria express DITHP upon induction with isopropyl beta-D-thiogalactopyranoside (IPTG). Expression of DITHP in eukaryotic cells is achieved by infecting insect or mammalian cell lines with recombinant Autographa californica nuclear polyhedrosis virus (AcMNPV), commonly known as baculovirus. The nonessential polyhedrin gene of baculovirus is replaced with cDNA encoding DITHP by either homologous recombination or bacterial-mediated transposition involving transfer plasmid intermediates. Viral infectivity is maintained and the strong polyhedrin promoter drives high levels of cDNA transcription. Recombinant baculovirus is used to infect Spodoptera frugiperda (Sf9) insect cells in most cases, or human hepatocytes, in some cases. Infection of the latter requires additional genetic modifications to baculovirus. (See e.g., Engelhard, supra; and Sandig, supra.)
In most expression systems, DITHP is synthesized as a fusion protein with, e.g., glutathione S-transferase (GST) or a peptide epitope tag, such as FLAG or 6-His, permitting rapid, single-step, affinity-based purification of recombinant fusion protein from crude cell lysates. GST, a 26-kilodalton enzyme from Schistosoma japonicum, enables the purification of fusion proteins on immobilized glutathione under conditions that maintain protein activity and antigenicity (Amersham Pharmacia Biotech). Following purification, the GST moiety can be proteolytically cleaved from DITHP at specifically engineered sites. FLAG, an 8-amino acid peptide, enables immunoaffinity purification using commercially available monoclonal and polyclonal anti-FLAG antibodies (Eastman Kodak Company, Rochester NY). 6-His, a stretch of six consecutive histidine residues, enables purification on metal-chelate resins (QIAGEN). Methods for protein expression and purification are discussed in Ausubel (1995, supra, ch. 10 and 16). Purified DITHP obtained by these methods can be used directly in the following activity assay.
XIV. Demonstration of DITHP Activity
DITHP activity is demonstrated through a variety of specific assays, some of which are outlined below.
Oxidoreductase activity of DITHP is measured by the increase in extinction coefficient of NAD(P)H coenzyme at 340 nm for the measurement of oxidation activity, or the decrease in extinction coefficient of NAD(P)H coenzyme at 340 nm for the measurement of reduction activity (Dalziel, K. (1963) J. Biol. Chem. 238:2850-2858). One of three substrates may be used: Asn-βGal, biocytidine, or ubiquinone-10. The respective subunits of the enzyme reaction, for example, cytochrome c1-b oxidoreductase and cytochrome c, are reconstituted. The reaction mixture contains a) 1-2 mg/ml DITHP; and b) 15 mM substrate, 2.4 mM NAD(P)+ in 0.1 M phosphate buffer, pH 7.1 (oxidation reaction), or 2.0 mM NAD(P)H, in 0.1 M Na2HPO4 buffer, pH 7.4 (reduction reaction); in a total volume of 0.1 ml. Changes in absorbance at 340 nm (A340) are measured at 23.5 °C using a recording spectrophotometer (Shimadzu Scientific Instruments, Inc., Pleasanton CA). The amount of NAD(P)H is stoichiometrically equivalent to the amount of substrate initially present, and the change in A340 is a direct measure of the amount of NAD(P)H produced; ΔA340 = 6620[NADH]. Oxidoreductase activity of DITHP is proportional to the amount of NAD(P)H present in the assay.
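The relation ΔA340 = 6620[NADH] can be rearranged to estimate the NADH formed from a measured absorbance change, as in the sketch below. The factor 6620 is taken as given in the text. Treating it as a molar coefficient in M⁻¹ cm⁻¹ with a 1 cm light path, so that the result is in mol/l, is an assumption of this sketch, as are the function names.

```python
FACTOR_FROM_TEXT = 6620.0  # from "ΔA340 = 6620[NADH]"; units assumed M^-1 cm^-1, 1 cm path


def nadh_concentration_molar(delta_a340, factor=FACTOR_FROM_TEXT):
    """Estimate [NADH] in mol/l from the change in absorbance at 340 nm."""
    return delta_a340 / factor


def nadh_amount_nmol(delta_a340, volume_ml=0.1, factor=FACTOR_FROM_TEXT):
    """Estimate nmol of NADH in the assay (0.1 ml total volume per the text)."""
    molar = nadh_concentration_molar(delta_a340, factor)
    # mol/l x ml = 1e-3 mol; converting to nmol multiplies by 1e9.
    return molar * volume_ml * 1e6


concentration = nadh_concentration_molar(0.662)  # hypothetical reading
amount = nadh_amount_nmol(0.662)
```

Under these assumptions, a hypothetical ΔA340 of 0.662 corresponds to about 0.1 mM NADH, or about 10 nmol in the 0.1 ml reaction.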
Transferase activity of DITHP is measured through assays such as a methyl transferase assay in which the transfer of radiolabeled methyl groups between a donor substrate and an acceptor substrate is measured (Bokar, J.A. et al. (1994) J. Biol. Chem. 269:17697-17704). Reaction mixtures (50 μl final volume) contain 15 mM HEPES, pH 7.9, 1.5 mM MgCl2, 10 mM dithiothreitol, 3% polyvinylalcohol, 1.5 μCi [methyl-3H]AdoMet (0.375 μM AdoMet) (DuPont-NEN), 0.6 μg DITHP, and acceptor substrate (0.4 μg [35S]RNA or 6-mercaptopurine (6-MP) to 1 mM final concentration). Reaction mixtures are incubated at 30 °C for 30 minutes, then 65 °C for 5 minutes. The products are separated by chromatography or electrophoresis and the level of methyl transferase activity is determined by quantification of methyl-3H recovery.
DITHP hydrolase activity is measured by the hydrolysis of appropriate synthetic peptide substrates conjugated with various chromogenic molecules in which the degree of hydrolysis is quantified by spectrophotometric (or fluorometric) absorption of the released chromophore (Beynon, R.J. and J.S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University Press, New York NY, pp. 25-55). Peptide substrates are designed according to the category of protease activity as endopeptidase (serine, cysteine, aspartic proteases), aminopeptidase (leucine aminopeptidase), or carboxypeptidase (Carboxypeptidase A and B, procollagen C-proteinase).
DITHP isomerase activity such as peptidyl prolyl cis/trans isomerase activity can be assayed by an enzyme assay described by Rahfeld, J.U., et al. (1994) (FEBS Lett. 352:180-184). The assay is performed at 10 °C in 35 mM HEPES buffer, pH 7.8, containing chymotrypsin (0.5 mg/ml) and DITHP at a variety of concentrations. Under these assay conditions, the substrate, Suc-Ala-Xaa-Pro-Phe-4-NA, is in equilibrium with respect to the prolyl bond, with 80-95% in trans and 5-20% in cis conformation. An aliquot (2 μl) of the substrate dissolved in dimethyl sulfoxide (10 mg/ml) is added to the reaction mixture described above. Only the cis isomer of the substrate is a substrate for cleavage by chymotrypsin. Thus, as the substrate is isomerized by DITHP, the product is cleaved by chymotrypsin to produce 4-nitroanilide, which is detected by its absorbance at 390 nm. 4-Nitroanilide appears in a time-dependent and a DITHP concentration-dependent manner.
An assay for DITHP activity associated with growth and development measures cell proliferation as the amount of newly initiated DNA synthesis in Swiss mouse 3T3 cells. A plasmid containing polynucleotides encoding DITHP is transfected into quiescent 3T3 cultured cells using methods well known in the art. The transiently transfected cells are then incubated in the presence of [3H]thymidine, a radioactive DNA precursor. Where applicable, varying amounts of DITHP ligand are added to the transfected cells. Incorporation of [3H]thymidine into acid-precipitable DNA is measured over an appropriate time interval, and the amount incorporated is directly proportional to the amount of newly synthesized DNA.
Growth factor activity of DITHP is measured by the stimulation of DNA synthesis in Swiss mouse 3T3 cells (McKay, I. and I. Leigh, eds. (1993) Growth Factors: A Practical Approach, Oxford University Press, New York NY). Initiation of DNA synthesis indicates the cells' entry into the mitotic cycle and their commitment to undergo later division. 3T3 cells are competent to respond to most growth factors, not only those that are mitogenic, but also those that are involved in embryonic induction. This competence is possible because the in vivo specificity demonstrated by some growth factors is not necessarily inherent but is determined by the responding tissue. In this assay, varying amounts of DITHP are added to quiescent 3T3 cultured cells in the presence of [3H]thymidine, a radioactive DNA precursor. DITHP for this assay can be obtained by recombinant means or from biochemical preparations. Incorporation of [3H]thymidine into acid-precipitable DNA is measured over an appropriate time interval, and the amount incorporated is directly proportional to the amount of newly synthesized DNA. A linear dose-response curve over at least a hundred-fold DITHP concentration range is indicative of growth factor activity.
One unit of activity per milliliter is defined as the concentration of DITHP producing a 50% response level, where 100% represents maximal incorporation of [3H]thymidine into acid-precipitable DNA.
Alternatively, an assay for cytokine activity of DITHP measures the proliferation of leukocytes. In this assay, the amount of tritiated thymidine incorporated into newly synthesized DNA is used to estimate proliferative activity. Varying amounts of DITHP are added to cultured leukocytes, such as granulocytes, monocytes, or lymphocytes, in the presence of [3H]thymidine, a radioactive DNA precursor. DITHP for this assay can be obtained by recombinant means or from biochemical preparations. Incorporation of [3H]thymidine into acid-precipitable DNA is measured over an appropriate time interval, and the amount incorporated is directly proportional to the amount of newly synthesized DNA. A linear dose-response curve over at least a hundred-fold DITHP concentration range is indicative of DITHP activity. One unit of activity per milliliter is conventionally defined as the concentration of DITHP producing a 50% response level, where 100% represents maximal incorporation of [3H]thymidine into acid-precipitable DNA.
An alternative assay for DITHP cytokine activity utilizes a Boyden micro chamber (Neuroprobe, Cabin John MD) to measure leukocyte chemotaxis (Vicari, supra). In this assay, about 10^5 migratory cells such as macrophages or monocytes are placed in cell culture media in the upper compartment of the chamber. Varying dilutions of DITHP are placed in the lower compartment. The two compartments are separated by a 5 or 8 micron pore polycarbonate filter (Nucleopore, Pleasanton CA). After incubation at 37 °C for 80 to 120 minutes, the filters are fixed in methanol and stained with appropriate labeling agents. Cells which migrate to the other side of the filter are counted using standard microscopy. The chemotactic index is calculated by dividing the number of migratory cells counted when DITHP is present in the lower compartment by the number of migratory cells counted when only media is present in the lower compartment. The chemotactic index is proportional to the activity of DITHP.
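The chemotactic index defined above is a simple ratio, and a minimal sketch of the arithmetic may make the definition concrete. The function name and the cell counts below are hypothetical and chosen only for illustration; they are not data from the specification.

```python
def chemotactic_index(cells_with_dithp, cells_media_only):
    """Migrated cells with DITHP in the lower compartment, divided by
    migrated cells when only media is in the lower compartment."""
    if cells_media_only <= 0:
        raise ValueError("media-only control count must be positive")
    return cells_with_dithp / cells_media_only


# Hypothetical counts for a DITHP dilution series (illustrative only).
control_count = 40
counts_by_dilution = {"1:10": 200, "1:100": 120, "1:1000": 60}

indices = {
    dilution: chemotactic_index(count, control_count)
    for dilution, count in counts_by_dilution.items()
}
```

An index near 1 would indicate no migration beyond the media-only background under these assumptions.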
Alternatively, cell lines or tissues transformed with a vector containing dithp can be assayed for DITHP activity by immunoblotting. Cells are denatured in SDS in the presence of β-mercaptoethanol, nucleic acids removed by ethanol precipitation, and proteins purified by acetone precipitation. Pellets are resuspended in 20 mM tris buffer at pH 7.5 and incubated with Protein G-Sepharose pre-coated with an antibody specific for DITHP. After washing, the Sepharose beads are boiled in electrophoresis sample buffer, and the eluted proteins subjected to SDS-PAGE. The SDS-PAGE is transferred to a nitrocellulose membrane for immunoblotting, and the DITHP activity is assessed by visualizing and quantifying bands on the blot using the antibody specific for DITHP as the primary antibody and 125I-labeled IgG specific for the primary antibody as the secondary antibody.
DITHP kinase activity is measured by phosphorylation of a protein substrate using γ-labeled [32P]-ATP and quantitation of the incorporated radioactivity using a radioisotope counter. DITHP is incubated with the protein substrate, [32P]-ATP, and an appropriate kinase buffer. The [32P] incorporated into the product is separated from free [32P]-ATP by electrophoresis and the incorporated [32P] is counted. The amount of [32P] recovered is proportional to the kinase activity of DITHP in the assay. A determination of the specific amino acid residue phosphorylated is made by phosphoamino acid analysis of the hydrolyzed protein.
In the alternative, DITHP activity is measured by the increase in cell proliferation resulting from transformation of a mammalian cell line such as COS7, HeLa or CHO with a eukaryotic expression vector encoding DITHP. Eukaryotic expression vectors are commercially available, and the techniques to introduce them into cells are well known to those skilled in the art. The cells are incubated for 48-72 hours after transformation under conditions appropriate for the cell line to allow expression of DITHP. Phase microscopy is then used to compare the mitotic index of transformed versus control cells. An increase in the mitotic index indicates DITHP activity.
In a further alternative, an assay for DITHP signaling activity is based upon the ability of GPCR family proteins to modulate G protein-activated second messenger signal transduction pathways (e.g., cAMP; Gaudin, P. et al. (1998) J. Biol. Chem. 273:4990-4996). A plasmid encoding full length DITHP is transfected into a mammalian cell line (e.g., Chinese hamster ovary (CHO) or human embryonic kidney (HEK-293) cell lines) using methods well known in the art. Transfected cells are grown in 12-well trays in culture medium for 48 hours, then the culture medium is discarded, and the attached cells are gently washed with PBS. The cells are then incubated in culture medium with or without ligand for 30 minutes, then the medium is removed and cells lysed by treatment with 1 M perchloric acid. The cAMP levels in the lysate are measured by radioimmunoassay using methods well known in the art. Changes in the levels of cAMP in the lysate from cells exposed to ligand compared to those without ligand are proportional to the amount of DITHP present in the transfected cells.
Alternatively, an assay for DITHP protein phosphatase activity measures the hydrolysis of p-nitrophenyl phosphate (PNPP). DITHP is incubated together with PNPP in HEPES buffer pH 7.5, in the presence of 0.1% β-mercaptoethanol at 37 °C for 60 min. The reaction is stopped by the addition of 6 ml of 10 N NaOH, and the increase in light absorbance of the reaction mixture at 410 nm resulting from the hydrolysis of PNPP is measured using a spectrophotometer. The increase in light absorbance is proportional to the phosphatase activity of DITHP in the assay (Diamond, R.H. et al. (1994) Mol. Cell. Biol. 14:3752-3762).
An alternative assay measures DITHP-mediated G-protein signaling activity by monitoring the mobilization of Ca++ as an indicator of the signal transduction pathway stimulation. (See, e.g., Grynkiewicz, G. et al. (1985) J. Biol. Chem. 260:3440; McColl, S. et al. (1993) J. Immunol. 150:4550-4555; and Aussel, C. et al. (1988) J. Immunol. 140:215-220.) The assay requires preloading neutrophils or T cells with a fluorescent dye such as FURA-2 or BCECF (Universal Imaging Corp, Westchester PA) whose emission characteristics are altered by Ca++ binding. When the cells are exposed to one or more activating stimuli artificially (e.g., anti-CD3 antibody ligation of the T cell receptor) or physiologically (e.g., by allogeneic stimulation), Ca++ flux takes place. This flux can be observed and quantified by assaying the cells in a fluorometer or fluorescence-activated cell sorter. Measurements of Ca++ flux are compared between cells in their normal state and those transfected with DITHP. Increased Ca++ mobilization attributable to increased DITHP concentration is proportional to DITHP activity.
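For ratiometric dyes such as FURA-2, fluorescence readings are commonly converted to a free Ca++ concentration with the ratio equation attributed to the Grynkiewicz reference cited above, [Ca++] = Kd × β × (R − Rmin) / (Rmax − R). The sketch below assumes that equation applies; the function name and all numeric values are hypothetical, and the calibration constants (Kd, β, Rmin, Rmax) would have to be determined for the instrument and dye lot actually used.

```python
def calcium_from_ratio(r, r_min, r_max, kd_nm, beta):
    """Estimate free Ca++ (nM) from a dual-excitation fluorescence ratio.

    r      : measured fluorescence ratio
    r_min  : ratio at zero Ca++ (calibration)
    r_max  : ratio at saturating Ca++ (calibration)
    kd_nm  : dye dissociation constant in nM (calibration)
    beta   : free/bound fluorescence ratio at the second wavelength
    """
    if not (r_min < r < r_max):
        raise ValueError("ratio must lie strictly between r_min and r_max")
    return kd_nm * beta * (r - r_min) / (r_max - r)


# Illustrative comparison of control cells and DITHP-transfected cells.
calibration = dict(r_min=1.0, r_max=4.0, kd_nm=224.0, beta=1.0)
control_nm = calcium_from_ratio(1.5, **calibration)
transfected_nm = calcium_from_ratio(2.0, **calibration)
fold_increase = transfected_nm / control_nm
```

Comparing the two estimates, rather than the raw ratios, keeps the comparison independent of dye loading under these assumptions.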
DITHP transport activity is assayed by measuring uptake of labeled substrates into Xenopus laevis oocytes. Oocytes at stages V and VI are injected with DITHP mRNA (10 ng per oocyte) and incubated for 3 days at 18 °C in OR2 medium (82.5 mM NaCl, 2.5 mM KCl, 1 mM CaCl2, 1 mM MgCl2, 1 mM Na2HPO4, 5 mM Hepes, 3.8 mM NaOH, 50 μg/ml gentamycin, pH 7.8) to allow expression of DITHP protein. Oocytes are then transferred to standard uptake medium (100 mM NaCl, 2 mM KCl, 1 mM CaCl2, 1 mM MgCl2, 10 mM Hepes/Tris pH 7.5). Uptake of various substrates (e.g., amino acids, sugars, drugs, ions, and neurotransmitters) is initiated by adding labeled substrate (e.g., radiolabeled with 3H, fluorescently labeled with rhodamine, etc.) to the oocytes. After incubating for 30 minutes, uptake is terminated by washing the oocytes three times in Na+-free medium, measuring the incorporated label, and comparing with controls. DITHP transport activity is proportional to the level of internalized labeled substrate.
DITHP transferase activity is demonstrated by a test for galactosyltransferase activity. This can be determined by measuring the transfer of radiolabeled galactose from UDP-galactose to a GlcNAc-terminated oligosaccharide chain (Kolbinger, F. et al. (1998) J. Biol. Chem. 273:58-65). The sample is incubated with 14 μl of assay stock solution (180 mM sodium cacodylate, pH 6.5, 1 mg/ml bovine serum albumin, 0.26 mM UDP-galactose, 2 μl of UDP-[3H]galactose), 1 μl of MnCl2 (500 mM), and 2.5 μl of GlcNAcβO-(CH2)8-CO2Me (37 mg/ml in dimethyl sulfoxide) for 60 minutes at 37 °C. The reaction is quenched by the addition of 1 ml of water and loaded on a C18 Sep-Pak cartridge (Waters), and the column is washed twice with 5 ml of water to remove unreacted UDP-[3H]galactose. The [3H]galactosylated GlcNAcβO-(CH2)8-CO2Me remains bound to the column during the water washes and is eluted with 5 ml of methanol. Radioactivity in the eluted material is measured by liquid scintillation counting and is proportional to galactosyltransferase activity in the starting sample.
In the alternative, DITHP induction by heat or toxins may be demonstrated using primary cultures of human fibroblasts or human cell lines such as CCL-13, HEK293, or HEP G2 (ATCC). To heat induce DITHP expression, aliquots of cells are incubated at 42 °C for 15, 30, or 60 minutes. Control aliquots are incubated at 37 °C for the same time periods. To induce DITHP expression by toxins, aliquots of cells are treated with 100 μM arsenite or 20 mM azetidine-2-carboxylic acid for 0, 3, 6, or 12 hours. After exposure to heat, arsenite, or the amino acid analogue, samples of the treated cells are harvested and cell lysates prepared for analysis by western blot. Cells are lysed in lysis buffer containing 1% Nonidet P-40, 0.15 M NaCl, 50 mM Tris-HCl, 5 mM EDTA, 2 mM N-ethylmaleimide, 2 mM phenylmethylsulfonyl fluoride, 1 mg/ml leupeptin, and 1 mg/ml pepstatin. Twenty micrograms of the cell lysate is separated on an 8% SDS-PAGE gel and transferred to a membrane. After blocking with 5% nonfat dry milk/phosphate-buffered saline for 1 h, the membrane is incubated overnight at 4 °C or at room temperature for 2-4 hours with a 1:1000 dilution of anti-DITHP serum in 2% nonfat dry milk/phosphate-buffered saline. The membrane is then washed and incubated with a 1:1000 dilution of horseradish peroxidase-conjugated goat anti-rabbit IgG in 2% dry milk/phosphate-buffered saline. After washing with 0.1% Tween 20 in phosphate-buffered saline, the DITHP protein is detected and compared to controls using chemiluminescence.
Alternatively, DITHP protease activity is measured by the hydrolysis of appropriate synthetic peptide substrates conjugated with various chromogenic molecules in which the degree of hydrolysis is quantified by spectrophotometric (or fluorometric) absorption of the released chromophore (Beynon, R.J. and J.S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University Press, New York, NY, pp. 25-55). Peptide substrates are designed according to the category of protease activity as endopeptidase (serine, cysteine, aspartic proteases, or metalloproteases), aminopeptidase (leucine aminopeptidase), or carboxypeptidase (carboxypeptidases A and B, procollagen C-proteinase). Commonly used chromogens are 2-naphthylamine, 4-nitroaniline, and furylacrylic acid. Assays are performed at ambient temperature and contain an aliquot of the enzyme and the appropriate substrate in a suitable buffer. Reactions are carried out in an optical cuvette, and the increase/decrease in absorbance of the chromogen released during hydrolysis of the peptide substrate is measured. The change in absorbance is proportional to the DITHP protease activity in the assay.
In the alternative, an assay for DITHP protease activity takes advantage of fluorescence resonance energy transfer (FRET) that occurs when one donor and one acceptor fluorophore with an appropriate spectral overlap are in close proximity. A flexible peptide linker containing a cleavage site specific for PRTS is fused between a red-shifted variant (RSGFP4) and a blue variant (BFP5) of Green Fluorescent Protein. This fusion protein has spectral properties that suggest energy transfer is occurring from BFP5 to RSGFP4. When the fusion protein is incubated with DITHP, the substrate is cleaved, and the two fluorescent proteins dissociate. This is accompanied by a marked decrease in energy transfer which is quantified by comparing the emission spectra before and after the addition of DITHP (Mitra, R.D. et al. (1996) Gene 173:13-17). This assay can also be performed in living cells. In this case the fluorescent substrate protein is expressed constitutively in cells and DITHP is introduced on an inducible vector so that FRET can be monitored in the presence and absence of DITHP (Sagot, I. et al. (1999) FEBS Lett. 447:53-57).
A method to determine the nucleic acid binding activity of DITHP involves a polyacrylamide gel mobility-shift assay. In preparation for this assay, DITHP is expressed by transforming a mammalian cell line such as COS7, HeLa or CHO with a eukaryotic expression vector containing DITHP cDNA. The cells are incubated for 48-72 hours after transformation under conditions appropriate for the cell line to allow expression and accumulation of DITHP. Extracts containing solubilized proteins can be prepared from cells expressing DITHP by methods well known in the art.
Portions of the extract containing DITHP are added to [32P]-labeled RNA or DNA. Radioactive nucleic acid can be synthesized in vitro by techniques well known in the art. The mixtures are incubated at 25 °C in the presence of RNase- and DNase-inhibitors under buffered conditions for 5-10 minutes. After incubation, the samples are analyzed by polyacrylamide gel electrophoresis followed by autoradiography. The presence of a band on the autoradiogram indicates the formation of a complex between DITHP and the radioactive transcript. A band of similar mobility will not be present in samples prepared using control extracts prepared from untransformed cells.
In the alternative, a method to determine the methylase activity of a DITHP measures transfer of radiolabeled methyl groups between a donor substrate and an acceptor substrate. Reaction mixtures (50 μl final volume) contain 15 mM HEPES, pH 7.9, 1.5 mM MgCl2, 10 mM dithiothreitol, 3% polyvinylalcohol, 1.5 μCi [methyl-3H]AdoMet (0.375 μM AdoMet) (DuPont-NEN), 0.6 μg DITHP, and acceptor substrate (e.g., 0.4 μg [35S]RNA, or 6-mercaptopurine (6-MP) to 1 mM final concentration). Reaction mixtures are incubated at 30 °C for 30 minutes, then 65 °C for 5 minutes. Analysis of [methyl-3H]RNA is as follows: 1) 50 μl of 2 x loading buffer (20 mM Tris-HCl, pH 7.6, 1 M LiCl, 1 mM EDTA, 1% sodium dodecyl sulphate (SDS)) and 50 μl oligo d(T)-cellulose (10 mg/ml in 1 x loading buffer) are added to the reaction mixture, and incubated at ambient temperature with shaking for 30 minutes; 2) reaction mixtures are transferred to a 96-well filtration plate attached to a vacuum apparatus; 3) each sample is washed sequentially with three 2.4 ml aliquots of 1 x oligo d(T) loading buffer containing 0.5% SDS, 0.1% SDS, or no SDS; and 4) RNA is eluted with 300 μl of water into a 96-well collection plate, transferred to scintillation vials containing liquid scintillant, and radioactivity determined. Analysis of [methyl-3H]6-MP is as follows: 1) 500 μl 0.5 M borate buffer, pH 10.0, and then 2.5 ml of 20% (v/v) isoamyl alcohol in toluene are added to the reaction mixtures; 2) the samples are mixed by vigorous vortexing for ten seconds; 3) after centrifugation at 700g for 10 minutes, 1.5 ml of the organic phase is transferred to scintillation vials containing 0.5 ml absolute ethanol and liquid scintillant, and radioactivity determined; and 4) results are corrected for the extraction of 6-MP into the organic phase (approximately 41%).
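The final correction step of the 6-MP analysis can be sketched as follows. The specification states only that results are corrected for an extraction of approximately 41%; the additional scaling from the counted 1.5 ml aliquot to the full 2.5 ml organic phase, the background subtraction, and the function name are assumptions made here for illustration.

```python
ORGANIC_PHASE_ML = 2.5        # isoamyl alcohol/toluene added (from the text)
COUNTED_ALIQUOT_ML = 1.5      # organic phase transferred to the vial (from the text)
EXTRACTION_EFFICIENCY = 0.41  # approximate fraction extracted (from the text)


def corrected_counts(counted_dpm, background_dpm=0.0):
    """Estimate total methylated 6-MP counts in the reaction.

    Assumes (not stated in the text) that the counted aliquot is scaled to
    the whole organic phase before dividing by the extraction efficiency.
    """
    net = counted_dpm - background_dpm
    if net < 0:
        raise ValueError("background exceeds measured counts")
    whole_phase = net * ORGANIC_PHASE_ML / COUNTED_ALIQUOT_ML
    return whole_phase / EXTRACTION_EFFICIENCY


# Hypothetical reading: 1280 dpm counted with a 50 dpm blank.
estimate = corrected_counts(1280.0, background_dpm=50.0)
```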
An assay for adhesion activity of DITHP measures the disruption of cytoskeletal filament networks upon overexpression of DITHP in cultured cell lines (Rezniczek, G.A. et al. (1998) J. Cell Biol. 141:209-225). cDNA encoding DITHP is subcloned into a mammalian expression vector that drives high levels of cDNA expression. This construct is transfected into cultured cells, such as rat kangaroo PtK2 or rat bladder carcinoma 804G cells. Actin filaments and intermediate filaments such as keratin and vimentin are visualized by immunofluorescence microscopy using antibodies and techniques well known in the art. The configuration and abundance of cytoskeletal filaments can be assessed and quantified using confocal imaging techniques. In particular, the bundling and collapse of cytoskeletal filament networks is indicative of DITHP adhesion activity.
Alternatively, an assay for DITHP activity measures the expression of DITHP on the cell surface. cDNA encoding DITHP is transfected into a non-leukocytic cell line. Cell surface proteins are labeled with biotin (de la Fuente, M.A. et al. (1997) Blood 90:2398-2405). Immunoprecipitations are performed using DITHP-specific antibodies, and immunoprecipitated samples are analyzed using SDS-PAGE and immunoblotting techniques. The ratio of labeled immunoprecipitant to unlabeled immunoprecipitant is proportional to the amount of DITHP expressed on the cell surface.
Alternatively, an assay for DITHP activity measures the amount of cell aggregation induced by overexpression of DITHP. In this assay, cultured cells such as NIH3T3 are transfected with cDNA encoding DITHP contained within a suitable mammalian expression vector under control of a strong promoter. Cotransfection with cDNA encoding a fluorescent marker protein, such as Green Fluorescent Protein (CLONTECH), is useful for identifying stable transfectants. The amount of cell agglutination, or clumping, associated with transfected cells is compared with that associated with untransfected cells. The amount of cell agglutination is a direct measure of DITHP activity.
DITHP may recognize and precipitate antigen from serum. This activity can be measured by the quantitative precipitin reaction (Golub, E.S. et al. (1987) Immunology: A Synthesis, Sinauer Associates, Sunderland MA, pages 113-115). DITHP is isotopically labeled using methods known in the art. Various serum concentrations are added to constant amounts of labeled DITHP. DITHP-antigen complexes precipitate out of solution and are collected by centrifugation. The amount of precipitable DITHP-antigen complex is proportional to the amount of radioisotope detected in the precipitate. The amount of precipitable DITHP-antigen complex is plotted against the serum concentration. For various serum concentrations, a characteristic precipitation curve is obtained, in which the amount of precipitable DITHP-antigen complex initially increases proportionately with increasing serum concentration, peaks at the equivalence point, and then decreases proportionately with further increases in serum concentration. Thus, the amount of precipitable DITHP-antigen complex is a measure of DITHP activity which is characterized by sensitivity to both limiting and excess quantities of antigen.
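The equivalence point of the precipitin curve described above is simply the serum concentration at which the precipitated label is greatest. A minimal sketch, using a hypothetical function name and made-up readings, could locate it as follows.

```python
def equivalence_point(serum_concentrations, precipitated_counts):
    """Return (serum concentration, counts) at the peak of the precipitin curve."""
    if len(serum_concentrations) != len(precipitated_counts):
        raise ValueError("inputs must have the same length")
    if not precipitated_counts:
        raise ValueError("at least one measurement is required")
    peak = max(range(len(precipitated_counts)),
               key=lambda i: precipitated_counts[i])
    return serum_concentrations[peak], precipitated_counts[peak]


# Hypothetical serum dilution series (arbitrary units) and precipitated cpm.
serum = [1, 2, 4, 8, 16]
cpm = [100, 250, 400, 300, 150]
peak_serum, peak_cpm = equivalence_point(serum, cpm)
```

With sparse dilution series the true equivalence point may lie between two measured concentrations, so this returns only the nearest sampled point.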
A microtubule motility assay for DITHP measures motor protein activity. In this assay, recombinant DITHP is immobilized onto a glass slide or similar substrate. Taxol-stabilized bovine brain microtubules (commercially available) in a solution containing ATP and cytosolic extract are perfused onto the slide. Movement of microtubules as driven by DITHP motor activity can be visualized and quantified using video-enhanced light microscopy and image analysis techniques. DITHP motor protein activity is directly proportional to the frequency and velocity of microtubule movement.
Alternatively, an assay for DITHP measures the formation of protein filaments in vitro. A solution of DITHP at a concentration greater than the "critical concentration" for polymer assembly is applied to carbon-coated grids. Appropriate nucleation sites may be supplied in the solution. The grids are negative stained with 0.7% (w/v) aqueous uranyl acetate and examined by electron microscopy. The appearance of filaments of approximately 25 nm (microtubules), 8 nm (actin), or 10 nm (intermediate filaments) is a demonstration of protein activity.
DITHP electron transfer activity is demonstrated by oxidation or reduction of NADP. Substrates such as Asn-βGal, biocytidine, or ubiquinone-10 may be used. The reaction mixture contains 1-2 mg/ml HORP, 15 mM substrate, and 2.4 mM NAD(P)+ in 0.1 M phosphate buffer, pH 7.1 (oxidation reaction), or 2.0 mM NAD(P)H, in 0.1 M Na2HPO4 buffer, pH 7.4 (reduction reaction); in a total volume of 0.1 ml. FAD may be included with NAD, according to methods well known in the art. Changes in absorbance are measured using a recording spectrophotometer. The amount of NAD(P)H is stoichiometrically equivalent to the amount of substrate initially present, and the change in A340 is a direct measure of the amount of NAD(P)H produced; ΔA340 = 6620[NADH]. DITHP activity is proportional to the amount of NAD(P)H present in the assay. The increase in extinction coefficient of NAD(P)H coenzyme at 340 nm is a measure of oxidation activity, or the decrease in extinction coefficient of NAD(P)H coenzyme at 340 nm is a measure of reduction activity (Dalziel, K. (1963) J. Biol. Chem. 238:2850-2858).
DITHP transcription factor activity is measured by its ability to stimulate transcription of a reporter gene (Liu, H.Y. et al. (1997) EMBO J. 16:5289-5298). The assay entails the use of a well characterized reporter gene construct, LexAop-LacZ, that consists of LexA DNA transcriptional control elements (LexAop) fused to sequences encoding the E. coli LacZ enzyme. The methods for constructing and expressing fusion genes, introducing them into cells, and measuring LacZ enzyme activity, are well known to those skilled in the art. Sequences encoding DITHP are cloned into a plasmid that directs the synthesis of a fusion protein, LexA-DITHP, consisting of DITHP and a DNA binding domain derived from the LexA transcription factor. The resulting plasmid, encoding a LexA-DITHP fusion protein, is introduced into yeast cells along with a plasmid containing the LexAop-LacZ reporter gene. The amount of LacZ enzyme activity associated with LexA-DITHP transfected cells, relative to control cells, is proportional to the amount of transcription stimulated by the DITHP.
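For the electron transfer assay described above, the relation ΔA340 = 6620[NADH] is a Beer-Lambert conversion from absorbance change to concentration. The sketch below uses the coefficient as printed in this specification and leaves it as a parameter; a value of 6220 M^-1 cm^-1 is widely cited for NADH at 340 nm, so the appropriate figure should be confirmed before use. The 1 cm path length and the function names are assumptions.

```python
def nadh_molar(delta_a340, epsilon=6620.0, path_cm=1.0):
    """Convert a change in A340 to NAD(P)H concentration in mol/l.

    epsilon defaults to the coefficient printed in the specification;
    the 1 cm light path is an assumption.
    """
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return delta_a340 / (epsilon * path_cm)


def nadh_nmol(delta_a340, volume_ml=0.1, **kwargs):
    """Amount of NAD(P)H in nmol for the 0.1 ml reaction volume in the text."""
    # mol/l * ml -> mmol; * 1e6 -> nmol
    return nadh_molar(delta_a340, **kwargs) * volume_ml * 1e6


# Hypothetical absorbance change of 0.662 in the oxidation reaction.
concentration = nadh_molar(0.662)
amount = nadh_nmol(0.662)
```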
Chromatin activity of DITHP is demonstrated by measuring sensitivity to DNase I (Dawson, B.A. et al. (1989) J. Biol. Chem. 264:12830-12837). Samples are treated with DNase I, followed by insertion of a cleavable biotinylated nucleotide analog, 5-[(N-biotinamido)hexanoamido-ethyl-1,3-thiopropionyl-3-aminoallyl]-2'-deoxyuridine 5'-triphosphate using nick-repair techniques well known to those skilled in the art. Following purification and digestion with EcoRI restriction endonuclease, biotinylated sequences are affinity isolated by sequential binding to streptavidin and biotincellulose.
Another specific assay demonstrates the ion conductance capacity of DITHP using an electrophysiological assay. DITHP is expressed by transforming a mammalian cell line such as COS7, HeLa or CHO with a eukaryotic expression vector encoding DITHP. Eukaryotic expression vectors are commercially available, and the techniques to introduce them into cells are well known to those skilled in the art. A small amount of a second plasmid, which expresses any one of a number of marker genes such as β-galactosidase, is co-transformed into the cells in order to allow rapid identification of those cells which have taken up and expressed the foreign DNA. The cells are incubated for 48-72 hours after transformation under conditions appropriate for the cell line to allow expression and accumulation of DITHP and β-galactosidase. Transformed cells expressing β-galactosidase are stained blue when a suitable colorimetric substrate is added to the culture media under conditions that are well known in the art. Stained cells are tested for differences in membrane conductance due to various ions by electrophysiological techniques that are well known in the art. Untransformed cells, and/or cells transformed with either vector sequences alone or β-galactosidase sequences alone, are used as controls and tested in parallel. The contribution of DITHP to cation or anion conductance can be shown by incubating the cells with antibodies specific for DITHP. The antibodies will bind to the extracellular side of DITHP, thereby blocking the pore in the ion channel, and the associated conductance.
XV. Functional Assays
DITHP function is assessed by expressing dithp at physiologically elevated levels in mammalian cell culture systems. cDNA is subcloned into a mammalian expression vector containing a strong promoter that drives high levels of cDNA expression. Vectors of choice include pCMV SPORT (Life Technologies) and pCR3.1 (Invitrogen Corporation, Carlsbad CA), both of which contain the cytomegalovirus promoter. 5-10 μg of recombinant vector are transiently transfected into a human cell line, preferably of endothelial or hematopoietic origin, using either liposome formulations or electroporation. 1-2 μg of an additional plasmid containing sequences encoding a marker protein are co-transfected.
Expression of a marker protein provides a means to distinguish transfected cells from nontransfected cells and is a reliable predictor of cDNA expression from the recombinant vector. Marker proteins of choice include, e.g., Green Fluorescent Protein (GFP; CLONTECH), CD64, or a CD64-GFP fusion protein. Flow cytometry (FCM), an automated laser optics-based technique, is used to identify transfected cells expressing GFP or CD64-GFP and to evaluate the apoptotic state of the cells and other cellular properties.
FCM detects and quantifies the uptake of fluorescent molecules that diagnose events preceding or coincident with cell death. These events include changes in nuclear DNA content as measured by staining of DNA with propidium iodide; changes in cell size and granularity as measured by forward light scatter and 90 degree side light scatter; down-regulation of DNA synthesis as measured by decrease in bromodeoxyuridine uptake; alterations in expression of cell surface and intracellular proteins as measured by reactivity with specific antibodies; and alterations in plasma membrane composition as measured by the binding of fluorescein-conjugated Annexin V protein to the cell surface. Methods in flow cytometry are discussed in Ormerod, M. G. (1994) Flow Cytometry, Oxford, New York NY.
The influence of DITHP on gene expression can be assessed using highly purified populations of cells transfected with sequences encoding DITHP and either CD64 or CD64-GFP. CD64 and CD64-GFP are expressed on the surface of transfected cells and bind to conserved regions of human immunoglobulin G (IgG). Transfected cells are efficiently separated from nontransfected cells using magnetic beads coated with either human IgG or antibody against CD64 (DYNAL, Inc., Lake Success NY). mRNA can be purified from the cells using methods well known by those of skill in the art. Expression of mRNA encoding DITHP and other genes of interest can be analyzed by northern analysis or microarray techniques.
XVI. Production of Antibodies
DITHP substantially purified using polyacrylamide gel electrophoresis (PAGE; see, e.g., Harrington, M.G. (1990) Methods Enzymol. 182:488-495), or other purification techniques, is used to immunize rabbits and to produce antibodies using standard protocols.
Alternatively, the DITHP amino acid sequence is analyzed using LASERGENE software (DNASTAR) to determine regions of high immunogenicity, and a corresponding peptide is synthesized and used to raise antibodies by means known to those of skill in the art. Methods for selection of appropriate epitopes, such as those near the C-terminus or in hydrophilic regions, are well described in the art. (See, e.g., Ausubel, 1995, supra, Chapter 11.)
Typically, peptides 15 residues in length are synthesized using an ABI 431A peptide synthesizer (Applied Biosystems) using fmoc-chemistry and coupled to KLH (Sigma) by reaction with N-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS) to increase immunogenicity. (See, e.g., Ausubel, supra.) Rabbits are immunized with the peptide-KLH complex in complete Freund's adjuvant. Resulting antisera are tested for antipeptide activity by, for example, binding the peptide to plastic, blocking with 1% BSA, reacting with rabbit antisera, washing, and reacting with radioiodinated goat anti-rabbit IgG. Antisera with antipeptide activity are tested for anti-DITHP activity using protocols well known in the art, including ELISA, RIA, and immunoblotting.
XVII. Purification of Naturally Occurring DITHP Using Specific Antibodies
Naturally occurring or recombinant DITHP is substantially purified by immunoaffinity chromatography using antibodies specific for DITHP. An immunoaffinity column is constructed by covalently coupling anti-DITHP antibody to an activated chromatographic resin, such as CNBr-activated SEPHAROSE (Amersham Pharmacia Biotech). After the coupling, the resin is blocked and washed according to the manufacturer's instructions.
Media containing DITHP are passed over the immunoaffinity column, and the column is washed under conditions that allow the preferential absorbance of DITHP (e.g., high ionic strength buffers in the presence of detergent). The column is eluted under conditions that disrupt antibody/DITHP binding (e.g., a buffer of pH 2 to pH 3, or a high concentration of a chaotrope, such as urea or thiocyanate ion), and DITHP is collected.
XVIII. Identification of Molecules Which Interact with DITHP
DITHP, or biologically active fragments thereof, are labeled with 125I Bolton-Hunter reagent. (See, e.g., Bolton, A.E. and W.M. Hunter (1973) Biochem. J. 133:529-539.) Candidate molecules previously arrayed in the wells of a multi-well plate are incubated with the labeled DITHP, washed, and any wells with labeled DITHP complex are assayed. Data obtained using different concentrations of DITHP are used to calculate values for the number, affinity, and association of DITHP with the candidate molecules.
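The specification does not name the method used to derive the number and affinity of binding sites from the concentration series. One conventional option, shown here purely as an assumption, is a Scatchard analysis for a single class of sites, in which bound/free is plotted against bound; the slope is -1/Kd and the x-intercept is the number of sites (Bmax). The function name and data are hypothetical.

```python
def scatchard_fit(bound, free):
    """Least-squares Scatchard fit assuming one class of binding sites.

    Fits bound/free = (Bmax - bound) / Kd and returns (Kd, Bmax) in the
    same concentration units as the inputs.
    """
    if len(bound) != len(free) or len(bound) < 2:
        raise ValueError("need two or more paired measurements")
    x = list(bound)
    y = [b / f for b, f in zip(bound, free)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or sxy == 0:
        raise ValueError("data do not define a sloped line")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    kd = -1.0 / slope
    bmax = -intercept / slope
    return kd, bmax


# Synthetic, noise-free data generated from Kd = 2 and Bmax = 10 (arbitrary units).
free_ligand = [0.5, 1.0, 2.0, 4.0, 8.0]
bound_ligand = [10.0 * f / (2.0 + f) for f in free_ligand]
kd_est, bmax_est = scatchard_fit(bound_ligand, free_ligand)
```

Real binding data are noisy and may reflect more than one class of sites, in which case nonlinear fitting is usually preferred over this linearization.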
Alternatively, molecules interacting with DITHP are analyzed using the yeast two-hybrid system as described in Fields, S. and O. Song (1989) Nature 340:245-246, or using commercially available kits based on the two-hybrid system, such as the MATCHMAKER system (CLONTECH). DITHP may also be used in the PATHCALLING process (CuraGen Corp., New Haven CT) which employs the yeast two-hybrid system in a high-throughput manner to determine all interactions between the proteins encoded by two large libraries of genes (Nandabalan, K. et al. (2000) U.S. Patent No. 6,057,101).
All publications and patents mentioned in the above specification are herein incorporated by reference. Various modifications and variations of the described method and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the above-described modes for carrying out the invention which are obvious to those skilled in the field of molecular biology or related fields are intended to be within the scope of the following claims.
TABLE 1
SEQ ID NO: Template ID SEQ ID NO: ORF ID
1 LG:405741.3:2000SEP08 276 LG:405741.3.orf3:2000SEP08
2 LG:337194.1:2000SEP08 277 LG:337194.1.orf1:2000SEP08
3 LG:017108.4:2000SEP08 278 LG:017108.4.orf2:2000SEP08
4 LG:372569.5:2000SEP08 279 LG:372569.5.orf1:2000SEP08
5 LG:968765.1:2000SEP08 280 LG:968765.1.orf2:2000SEP08
6 LG:255999.16:2000SEP08 281 LG:255999.16.orf3:2000SEP08
7 LG:977820.9:2000SEP08 282 LG:977820.9.orf2:2000SEP08
8 LI:1071608.1:2000SEP08 283 LI:1071608.1.orf3:2000SEP08
9 LI:1074023.1:2000SEP08 284 LI:1074023.1.orf3:2000SEP08
10 LI:453570.1:2000SEP08 285 LI:453570.1.orf3:2000SEP08
11 LI:072072.1:2000SEP08 286 LI:072072.1.orf2:2000SEP08
12 LI:148565.4:2000SEP08 287 LI:148565.4.orf1:2000SEP08
13 LI:368626.4:2000SEP08 288 LI:368626.4.orf1:2000SEP08
14 LI:346123.1:2000SEP08 289 LI:346123.1.orf2:2000SEP08
15 LI:335795.11:2000SEP08 290 LI:335795.11.orf2:2000SEP08
16 LI:246023.2:2000SEP08 291 LI:246023.2.orf1:2000SEP08
17 LG:1100661.1:2000SEP08 292 LG:1100661.1.orf3:2000SEP08
18 LG:475856.1:2000SEP08 293 LG:475856.1.orf1:2000SEP08
19 LG:1015343.1:2000SEP08 294 LG:1015343.1.orf1:2000SEP08
20 LG:1400575.1:2000SEP08 295 LG:1400575.1.orf2:2000SEP08
21 LG:1080545.1:2000SEP08 296 LG:1080545.1.orf2:2000SEP08
22 LG:213947.1:2000SEP08 297 LG:213947.1.orf1:2000SEP08
23 LI:720641.1:2000SEP08 298 LI:720641.1.orf2:2000SEP08
24 LI:1023894.1:2000SEP08 299 LI:1023894.1.orf3:2000SEP08
25 LI:734904.1:2000SEP08 300 LI:734904.1.orf3:2000SEP08
26 LI:1178118.1:2000SEP08 301 LI:1178118.1.orf3:2000SEP08
27 LI:213947.1:2000SEP08 302 LI:213947.1.orf3:2000SEP08
28 LG:407304.1:2000SEP08 303 LG:407304.1.orf1:2000SEP08
29 LG:337358.1:2000SEP08 304 LG:337358.1.orf2:2000SEP08
30 LG:986090.1:2000SEP08 305 LG:986090.1.orf1:2000SEP08
31 LG:123250.1:2000SEP08 306 LG:123250.1.orf2:2000SEP08
32 LG:1028774.2:2000SEP08 307 LG:1028774.2.orf1:2000SEP08
33 LG:338927.6:2000SEP08 308 LG:338927.6.orf2:2000SEP08
34 LG:332944.2:2000SEP08 309 LG:332944.2.orf1:2000SEP08
35 LI:347174.5:2000SEP08 310 LI:347174.5.orf2:2000SEP08
36 LI:477070.1:2000SEP08 311 LI:477070.1.orf2:2000SEP08
37 LI:723144.1:2000SEP08 312 LI:723144.1.orf2:2000SEP08
38 LI:1007188.1:2000SEP08 313 LI:1007188.1.orf1:2000SEP08
39 LI:1024412.1:2000SEP08 314 LI:1024412.1.orf3:2000SEP08
40 LI:284797.3:2000SEP08 315 LI:284797.3.orf3:2000SEP08
41 LI:1092901.1:2000SEP08 316 LI:1092901.1.orf3:2000SEP08
42 LI:228930.1:2000SEP08 317 LI:228930.1.orf2:2000SEP08
43 LI:722913.1:2000SEP08 318 LI:722913.1.orf1:2000SEP08
44 LG:457478.1:2000SEP08 319 LG:457478.1.orf3:2000SEP08
45 LG:358719.1:2000SEP08 320 LG:358719.1.orf3:2000SEP08
46 LG:105160.5:2000SEP08 321 LG:105160.5.orf1:2000SEP08
47 LG:400705.1:2000SEP08 322 LG:400705.1.orf1:2000SEP08
48 LG:221977.1:2000SEP08 323 LG:221977.1.orf3:2000SEP08
49 LG:898771.1:2000SEP08 324 LG:898771.1.orf1:2000SEP08
50 LI:457478.1:2000SEP08 325 LI:457478.1.orf2:2000SEP08
51 LI:125140.1:2000SEP08 326 LI:125140.1.orf3:2000SEP08
52 LI:021095.2:2000SEP08 327 LI:021095.2.orf1:2000SEP08
53 LI:888730.1:2000SEP08 328 LI:888730.1.orf2:2000SEP08
53 LI:888730.1:2000SEP08 329 LI:888730.1.orf3:2000SEP08
54 LI:358719.1:2000SEP08 330 LI:358719.1.orf2:2000SEP08
55 LI:351342.3:2000SEP08 331 LI:351342.3.orf3:2000SEP08
56 LI:256099.2:2000SEP08 332 LI:256099.2.orf3:2000SEP08
57 LI:2051991.1:2000SEP08 333 LI:2051991.1.orf1:2000SEP08
58 LG:980769.1:2000SEP08 334 LG:980769.1.orf3:2000SEP08
59 LG:332474.3:2000SEP08 335 LG:332474.3.orf3:2000SEP08
60 LG:1087707.1:2000SEP08 336 LG:1087707.1.orf1:2000SEP08
61 LG:415349.1:2000SEP08 337 LG:415349.1.orf2:2000SEP08
62 LG:132420.2:2000SEP08 338 LG:132420.2.orf2:2000SEP08
63 LG:394201.1:2000SEP08 339 LG:394201.1.orf2:2000SEP08
64 LG:1060884.1:2000SEP08 340 LG:1060884.1.orf3:2000SEP08
65 LG:242191.1:2000SEP08 341 LG:242191.1.orf2:2000SEP08
66 LG:1063762.3:2000SEP08 342 LG:1063762.3.orf1:2000SEP08
67 LG:1100856.1:2000SEP08 343 LG:1100856.1.orf2:2000SEP08
68 LG:979390.2:2000SEP08 344 LG:979390.2.orf1:2000SEP08
69 LG:1400447.1:2000SEP08 345 LG:1400447.1.orf1:2000SEP08
70 LG:1400562.1:2000SEP08 346 LG:1400562.1.orf3:2000SEP08
71 LG:1076130.1:2000SEP08 347 LG:1076130.1.orf3:2000SEP08
72 LG:1064459.1:2000SEP08 348 LG:1064459.1.orf2:2000SEP08
73 LG:1079415.14:2000SEP08 349 LG:1079415.14.orf2:2000SEP08
74 LG:1329431.3:2000SEP08 350 LG:1329431.3.orf2:2000SEP08
75 LG:1088431.2:2000SEP08 351 LG:1088431.2.orf1:2000SEP08
76 LG:1329462.2:2000SEP08 352 LG:1329462.2.orf1:2000SEP08
77 LI:393468.1:2000SEP08 353 LI:393468.1.orf2:2000SEP08
78 LI:722577.1:2000SEP08 354 LI:722577.1.orf1:2000SEP08
79 LI:322783.16:2000SEP08 355 LI:322783.16.orf1:2000SEP08
80 LI:901355.2:2000SEP08 356 LI:901355.2.orf2:2000SEP08
81 LI:038859.2:2000SEP08 357 LI:038859.2.orf1:2000SEP08
82 LI:1046117.1:2000SEP08 358 LI:1046117.1.orf1:2000SEP08
83 LI:801015.1:2000SEP08 359 LI:801015.1.orf1:2000SEP08
84 LI:1175590.1:2000SEP08 360 LI:1175590.1.orf3:2000SEP08
85 LI:1170585.2:2000SEP08 361 LI:1170585.2.orf2:2000SEP08
86 LI:719531.2:2000SEP08 362 LI:719531.2.orf1:2000SEP08
87 LI:794623.1:2000SEP08 363 LI:794623.1.orf2:2000SEP08
88 LI:1173119.1:2000SEP08 364 LI:1173119.1.orf3:2000SEP08
89 LI:1093285.1:2000SEP08 365 LI:1093285.1.orf1:2000SEP08
90 LI:1091881.1:2000SEP08 366 LI:1091881.1.orf1:2000SEP08
91 LI:1091617.1:2000SEP08 367 LI:1091617.1.orf2:2000SEP08
92 LI:1082344.1:2000SEP08 368 LI:1082344.1.orf1:2000SEP08
93 LI:1166249.1:2000SEP08 369 LI:1166249.1.orf3:2000SEP08
94 LI:799675.1:2000SEP08 370 LI:799675.1.orf1:2000SEP08
95 LI:1178899.1:2000SEP08 371 LI:1178899.1.orf2:2000SEP08
96 LI:1169241.1:2000SEP08 372 LI:1169241.1.orf3:2000SEP08
97 LI:1180090.1:2000SEP08 373 LI:1180090.1.orf1:2000SEP08
[Table 1 entries for SEQ ID NO: 98 through 146 are illegible in the source.]
147 LI:1180418.1:2000SEP08 423 LI:1180418.1.orf2:2000SEP08
148 LG:232648.1:2000SEP08 424 LG:232648.1.orf1:2000SEP08
149 LG:1078420.1:2000SEP08 425 LG:1078420.1.orf2:2000SEP08
150 LG:1397599.1:2000SEP08 426 LG:1397599.1.orf2:2000SEP08
151 LG:1397655.2:2000SEP08 427 LG:1397655.2.orf1:2000SEP08
152 LG:241055.1:2000SEP08 428 LG:241055.1.orf1:2000SEP08
153 LG:1101065.1:2000SEP08 429 LG:1101065.1.orf1:2000SEP08
154 LG:475629.1:2000SEP08 430 LG:475629.1.orf2:2000SEP08
155 LI:348991.1:2000SEP08 431 LI:348991.1.orf1:2000SEP08
156 LI:475629.1:2000SEP08 432 LI:475629.1.orf3:2000SEP08
157 LI:261331.1:2000SEP08 433 LI:261331.1.orf1:2000SEP08
158 LI:815686.1:2000SEP08 434 LI:815686.1.orf3:2000SEP08
159 LI:1167327.2:2000SEP08 435 LI:1167327.2.orf3:2000SEP08
160 LI:758009.3:2000SEP08 436 LI:758009.3.orf2:2000SEP08
161 LG:331593.1:2000SEP08 437 LG:331593.1.orf3:2000SEP08
162 LI:1094174.1:2000SEP08 438 LI:1094174.1.orf2:2000SEP08
163 LI:814362.1:2000SEP08 439 LI:814362.1.orf3:2000SEP08
164 LI:219542.1:2000SEP08 440 LI:219542.1.orf3:2000SEP08
165 LI:726197.1:2000SEP08 441 LI:726197.1.orf3:2000SEP08
166 LI:1075314.1:2000SEP08 442 LI:1075314.1.orf1:2000SEP08
167 LI:437883.1:2000SEP08 443 LI:437883.1.orf2:2000SEP08
168 LG:336265.1:2000SEP08 444 LG:336265.1.orf3:2000SEP08
169 LG:407788.2:2000SEP08 445 LG:407788.2.orf3:2000SEP08
170 LG:1326925.1:2000SEP08 446 LG:1326925.1.orf2:2000SEP08
171 LI:332655.2:2000SEP08 447 LI:332655.2.orf1:2000SEP08
172 LI:1184621.4:2000SEP08 448 LI:1184621.4.orf2:2000SEP08
173 LI:2051386.1:2000SEP08 449 LI:2051386.1.orf2:2000SEP08
174 LG:362757.1:2000SEP08 450 LG:362757.1.orf1:2000SEP08
175 LG:406770.1:2000SEP08 451 LG:406770.1.orf3:2000SEP08
176 LG:1094640.1:2000SEP08 452 LG:1094640.1.orf3:2000SEP08
177 LG:001929.1:2000SEP08 453 LG:001929.1.orf1:2000SEP08
178 LI:401322.1:2000SEP08 454 LI:401322.1.orf3:2000SEP08
179 LI:208748.1:2000SEP08 455 LI:208748.1.orf1:2000SEP08
180 LI:407242.1:2000SEP08 456 LI:407242.1.orf2:2000SEP08
181 LI:403409.1:2000SEP08 457 LI:403409.1.orf3:2000SEP08
182 LI:450798.1:2000SEP08 458 LI:450798.1.orf3:2000SEP08
183 LI:410317.1:2000SEP08 459 LI:410317.1.orf1:2000SEP08
184 LI:340268.1:2000SEP08 460 LI:340268.1.orf3:2000SEP08
185 LI:2051671.1:2000SEP08 461 LI:2051671.1.orf3:2000SEP08
186 LG:998844.1:2000SEP08 462 LG:998844.1.orf3:2000SEP08
187 LG:1043787.1:2000SEP08 463 LG:1043787.1.orf1:2000SEP08
188 LG:1098931.16:2000SEP08 464 LG:1098931.16.orf1:2000SEP08
189 LG:199423.2:2000SEP08 465 LG:199423.2.orf1:2000SEP08
190 LI:1075297.1:2000SEP08 466 LI:1075297.1.orf2:2000SEP08
191 LI:1043321.1:2000SEP08 467 LI:1043321.1.orf3:2000SEP08
192 LI:297070.1:2000SEP08 468 LI:297070.1.orf1:2000SEP08
193 LI:1085041.1:2000SEP08 469 LI:1085041.1.orf3:2000SEP08
194 LI:1071544.1:2000SEP08 470 LI:1071544.1.orf2:2000SEP08
195 LI:2052480.1:2000SEP08 471 LI:2052480.1.orf2:2000SEP08
196 LG:450105.1:2000SEP08 472 LG:450105.1.orf2:2000SEP08
197 LG:450581.1:2000SEP08 473 LG:450581.1.orf2:2000SEP08
198 LG:450887.1:2000SEP08 474 LG:450887.1.orf3:2000SEP08
199 LG:460809.1:2000SEP08 475 LG:460809.1.orf3:2000SEP08
200 LG:452089.1:2000SEP08 476 LG:452089.1.orf2:2000SEP08
201 LG:1099416.1:2000SEP08 477 LG:1099416.1.orf3:2000SEP08
202 LG:255713.1:2000SEP08 478 LG:255713.1.orf1:2000SEP08
203 LG:998903.1:2000SEP08 479 LG:998903.1.orf1:2000SEP08
204 LG:1119656.1:2000SEP08 480 LG:1119656.1.orf2:2000SEP08
205 LG:1096907.1:2000SEP08 481 LG:1096907.1.orf1:2000SEP08
206 LG:1323741.1:2000SEP08 482 LG:1323741.1.orf2:2000SEP08
207 LG:1098372.1:2000SEP08 483 LG:1098372.1.orf1:2000SEP08
208 LG:1006783.1:2000SEP08 484 LG:1006783.1.orf1:2000SEP08
209 LG:1097562.1:2000SEP08 485 LG:1097562.1.orf2:2000SEP08
210 LG:998868.1:2000SEP08 486 LG:998868.1.orf2:2000SEP08
211 LG:1063383.1:2000SEP08 487 LG:1063383.1.orf1:2000SEP08
212 LG:1400567.1:2000SEP08 488 LG:1400567.1.orf2:2000SEP08
213 LI:449404.1:2000SEP08 489 LI:449404.1.orf1:2000SEP08
214 LI:449941.2:2000SEP08 490 LI:449941.2.orf1:2000SEP08
215 LI:450229.1:2000SEP08 491 LI:450229.1.orf1:2000SEP08
216 LI:450399.3:2000SEP08 492 LI:450399.3.orf3:2000SEP08
217 LI:455771.1:2000SEP08 493 LI:455771.1.orf3:2000SEP08
218 LI:720459.1:2000SEP08 494 LI:720459.1.orf1:2000SEP08
219 LI:723156.1:2000SEP08 495 LI:723156.1.orf3:2000SEP08
220 LI:728055.1:2000SEP08 496 LI:728055.1.orf3:2000SEP08
221 LI:1020789.1:2000SEP08 497 LI:1020789.1.orf1:2000SEP08
222 LI:1071728.1:2000SEP08 498 LI:1071728.1.orf1:2000SEP08
223 LI:1084329.1:2000SEP08 499 LI:1084329.1.orf2:2000SEP08
224 LI:246422.1:2000SEP08 500 LI:246422.1.orf1:2000SEP08
225 LI:1086066.1:2000SEP08 501 LI:1086066.1.orf2:2000SEP08
226 LI:223142.1:2000SEP08 502 LI:223142.1.orf3:2000SEP08
227 LI:885368.1:2000SEP08 503 LI:885368.1.orf2:2000SEP08
228 LI:481782.1:2000SEP08 504 LI:481782.1.orf2:2000SEP08
229 LI:1093813.1:2000SEP08 505 LI:1093813.1.orf1:2000SEP08
230 LI:449413.2:2000SEP08 506 LI:449413.2.orf3:2000SEP08
231 LI:450105.1:2000SEP08 507 LI:450105.1.orf1:2000SEP08
231 LI:450105.1:2000SEP08 508 LI:450105.1.orf3:2000SEP08
232 LI:814285.1:2000SEP08 509 LI:814285.1.orf1:2000SEP08
233 LI:1142855.1:2000SEP08 510 LI:1142855.1.orf2:2000SEP08
234 LI:817330.1:2000SEP08 511 LI:817330.1.orf3:2000SEP08
235 LI:817845.1:2000SEP08 512 LI:817845.1.orf2:2000SEP08
236 LI:460809.1:2000SEP08 513 LI:460809.1.orf1:2000SEP08
237 LI:815874.1:2000SEP08 514 LI:815874.1.orf3:2000SEP08
238 LI:255713.1:2000SEP08 515 LI:255713.1.orf2:2000SEP08
239 LI:035973.1:2000SEP08 516 LI:035973.1.orf1:2000SEP08
240 LI:1138110.1:2000SEP08 517 LI:1138110.1.orf3:2000SEP08
241 LI:2049074.1:2000SEP08 518 LI:2049074.1.orf2:2000SEP08
242 LI:1092460.1:2000SEP08 519 LI:1092460.1.orf1:2000SEP08
243 LI:399421.1:2000SEP08 520 LI:399421.1.orf3:2000SEP08
244 LI:816655.2:2000SEP08 521 LI:816655.2.orf2:2000SEP08
245 LG:414732.1:2000SEP08 522 LG:414732.1.orf1:2000SEP08
246 LG:1140250.1:2000SEP08 523 LG:1140250.1.orf3:2000SEP08
247 LG:174022.1:2000SEP08 524 LG:174022.1.orf1:2000SEP08
248 LI:002811.1:2000SEP08 525 LI:002811.1.orf1:2000SEP08
249 LI:414732.2:2000SEP08 526 LI:414732.2.orf1:2000SEP08
250 LI:1019920.1:2000SEP08 527 LI:1019920.1.orf1:2000SEP08
251 LI:1038336.1:2000SEP08 528 LI:1038336.1.orf2:2000SEP08
252 LI:1177772.11:2000SEP08 529 LI:1177772.11.orf1:2000SEP08
253 LI:205642.2:2000SEP08 530 LI:205642.2.orf3:2000SEP08
254 LG:449685.1:2000SEP08 531 LG:449685.1.orf3:2000SEP08
255 LG:453922.1:2000SEP08 532 LG:453922.1.orf1:2000SEP08
256 LG:476342.3:2000SEP08 533 LG:476342.3.orf2:2000SEP08
256 LG:476342.3:2000SEP08 534 LG:476342.3.orf3:2000SEP08
257 LI:336801.1:2000SEP08 535 LI:336801.1.orf1:2000SEP08
258 LI:449685.1:2000SEP08 536 LI:449685.1.orf2:2000SEP08
259 LI:476342.1:2000SEP08 537 LI:476342.1.orf1:2000SEP08
260 LI:1072804.1:2000SEP08 538 LI:1072804.1.orf2:2000SEP08
261 LI:455450.1:2000SEP08 539 LI:455450.1.orf1:2000SEP08
262 LI:1073699.1:2000SEP08 540 LI:1073699.1.orf1:2000SEP08
263 LI:1013729.1:2000SEP08 541 LI:1013729.1.orf3:2000SEP08
264 LI:2050322.2:2000SEP08 542 LI:2050322.2.orf2:2000SEP08
265 LI:891327.1:2000SEP08 543 LI:891327.1.orf3:2000SEP08
266 LI:2053076.1:2000SEP08 544 LI:2053076.1.orf3:2000SEP08
267 LG:220085.1:2000SEP08 545 LG:220085.1.orf3:2000SEP08
268 LG:406709.1:2000SEP08 546 LG:406709.1.orf3:2000SEP08
269 LG:347863.9:2000SEP08 547 LG:347863.9.orf3:2000SEP08
270 LI:1073027.1:2000SEP08 548 LI:1073027.1.orf1:2000SEP08
271 LI:347635.1:2000SEP08 549 LI:347635.1.orf2:2000SEP08
272 LI:013685.1:2000SEP08 550 LI:013685.1.orf2:2000SEP08
273 LI:406709.1:2000SEP08 551 LI:406709.1.orf3:2000SEP08
274 LI:2052938.1:2000SEP08 552 LI:2052938.1.orf1:2000SEP08
275 LI:213208.1:2000SEP08 553 LI:213208.1.orf3:2000SEP08
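The identifiers in Table 1 follow a regular pattern: each ORF ID is the corresponding Template ID with an `.orfN` segment inserted before the date stamp. The sketch below shows one way to split such identifiers and recover a Template ID from an ORF ID. The field names (`prefix`, `cluster`, `version`, `orf`, `date`) and the helper names are illustrative assumptions and do not come from the specification.

```python
import re
from typing import NamedTuple, Optional


class TableId(NamedTuple):
    prefix: str          # "LG" or "LI", as printed in Table 1
    cluster: str         # digits before the first dot (leading zeros kept)
    version: str         # digits after the first dot
    orf: Optional[int]   # number after ".orf", or None for a Template ID
    date: str            # date stamp such as "2000SEP08"


# Pattern inferred from the table rows; it is an assumption, not a published format.
_ID_PATTERN = re.compile(
    r"^(LG|LI):(\d+)\.(\d+)(?:\.orf(\d))?:(\d{4}[A-Z]{3}\d{2})$"
)


def parse_id(text: str) -> TableId:
    """Split a Template ID or ORF ID into its visible components."""
    match = _ID_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized identifier: {text!r}")
    prefix, cluster, version, orf, date = match.groups()
    return TableId(prefix, cluster, version, int(orf) if orf else None, date)


def template_of(orf_id: str) -> str:
    """Return the Template ID obtained by dropping the .orfN segment."""
    p = parse_id(orf_id)
    return f"{p.prefix}:{p.cluster}.{p.version}:{p.date}"


example = parse_id("LG:405741.3.orf3:2000SEP08")
example_template = template_of("LG:405741.3.orf3:2000SEP08")
```

Applied to the first row of Table 1, `template_of` returns the Template ID listed for SEQ ID NO: 1, which gives a quick consistency check between the two columns.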
Table 2
SEQ ID NO: Template ID GI Number Probability Score Annotation
1 LG:405741.3:2000SEP08 g10439273 1.00E-43 Homo sapiens cDNA: FLJ22761 fis, clone KAIA0893.
2 LG:337194.1:2000SEP08 g10439591 0 Homo sapiens cDNA: FLJ23031 fis, clone LNG01932.
3 LG:017108.4:2000SEP08 g7687936 2.00E-18 (fl) (Leishmania major) possible adenylate kinase
4 LG:372569.5:2000SEP08 g3273307 0 (fl) (Rattus norvegicus) Lysophospholipase
5 LG:968765.1:2000SEP08 g1710247 1.00E-154 Human protein disulfide isomerase-related protein P5 mRNA, partial cds.
6 LG:255999.16:2000SEP08 g8132761 3.00E-17 Homo sapiens glutathione transferase omega (GSTO1) mRNA, complete cds.
7 LG:977820.9:2000SEP08 g14017908 0 Homo sapiens mRNA for KIAA1846 protein, partial cds.
8 LI:1071608.1:2000SEP08 g13529277 6.00E-96 Homo sapiens, aldo-keto reductase family 1, member A1 (aldehyde reductase), clone MGC:12529, mRNA, complete cds.
9 LI:1074023.1:2000SEP08 g1124877 2.00E-99 Macaca mulatta GST-pi enzyme mRNA, complete cds.
10 LI:453570.1:2000SEP08 g219663 1.00E-11 Human mRNA for lactoyl glutathione lyase.
11 LI:072072.1:2000SEP08 g10434968 1.00E-102 Homo sapiens cDNA FLJ13105 fis, clone NT2RP3002351, weakly similar to Human mRNA for NAD-dependent methylene tetrahydrofolate dehydrogenase cyclohydrolase (EC 1.5.1.15).
12 LI:148565.4:2000SEP08 g1185554 9.00E-33 (fl) (Zea mays) glyceraldehyde-3-phosphate dehydrogenase
13 LI:368626.4:2000SEP08 g10441003 0 Homo sapiens epidermal lipoxygenase (ALOXE3) mRNA, complete cds.
14 LI:346123.1:2000SEP08 g546735 1.00E-27 (fl) (Stellaria longipes) triose phosphate isomerase, TPI=5.3.1.1 (Stellaria longipes, Goldie, Peptide, 257 aa)
15 LI:335795.11:2000SEP08 g339679 3.00E-09 Human threonyl-tRNA synthetase mRNA, complete cds.
16 LI:246023.2:2000SEP08 g10434527 0 Homo sapiens cDNA FLJ12816 fis, clone NT2RP2002609, weakly similar to 2-HYDROXYMUCONIC SEMIALDEHYDE HYDROLASE (EC 3.1.1.-).
17 LG:1100661.1:2000SEP08 g189410 1.00E-105 Human oxytocin mRNA, complete cds.
18 LG:475856.1:2000SEP08 g13623353 2.00E-25 Homo sapiens, Similar to zinc finger protein 136 (clone pHZ-20), clone MGC:10647, mRNA, complete cds.
19 LG:1015343.1:2000SEP08 g342298 1.00E-117 M.fascicularis somatostatin I mRNA, complete cds.
20 LG:1400575.1:2000SEP08 g13623353 4.00E-55 Homo sapiens, Similar to zinc finger protein 136 (clone pHZ-20), clone MGC:10647, mRNA, complete cds.
21 LG:1080545.1:2000SEP08 g12052982 1.00E-88 Homo sapiens mRNA; cDNA DKFZp434I1610 (from clone DKFZp434I1610); complete cds.
22 LG:213947.1:2000SEP08 g7262613 1.00E-12 (fl) (Homo sapiens) candidate taste receptor T2R7
23 LI:720641.1:2000SEP08 g1419016 2.00E-69 (fl) (Mus musculus) odorant receptor
24 LI:1023894.1:2000SEP08 g7158201 1.00E-43 (fl) (Rattus norvegicus) cytokine receptor-like protein CYRL
25 LI:734904.1:2000SEP08 g12276181 0 Homo sapiens FKSG35 (FKSG35) mRNA, complete cds.
26 LI:1178118.1:2000SEP08 g7239175 2.00E-59 Homo sapiens vanilloid receptor gene, partial sequence; CARKL and CTNS genes, complete cds; TIP1 gene, partial cds; P2X5b and P2X5a genes, complete cds; and HUMINAE gene, partial
27 LI:213947.1:2000SEP08 g7262613 8.00E-13 (fl) (Homo sapiens) candidate taste receptor T2R7
28 LG:407304.1:2000SEP08 g29924 3.00E-12 H.sapiens DNA for cGMP phosphodiesterase (exons 4-22).
29 LG:337358.1:2000SEP08 g9857401 0 Homo sapiens tumor endothelial marker 2 (TEM2) mRNA, complete cds.
30 LG:986090.1:2000SEP08 g189952 8.00E-10 Human phospholipase A2 mRNA, complete cds.
31 LG:123250.1:2000SEP08 g4225847 3.00E-07 Homo sapiens calcium- and diacylglycerol-regulated guanine nucleotide exchange factor I (CalDAG-GEFI) mRNA, complete
32 LG:1028774.2:2000SEP08 g4885696 0 (5' incom) (Homo sapiens) unknown
33 LG:338927.6:2000SEP08 g7209308 5.00E-15 Homo sapiens mRNA for FLJ00004 protein, partial cds.
34 LG:332944.2:2000SEP08 g5823454 1.00E-75 (fl) (Homo sapiens) GTPase-activating protein 6 isoform 4
35 LI:347174.5:2000SEP08 g1457955 4.00E-07 Human small GTP-binding protein rab30.
36 LI:477070.1:2000SEP08 g169931 5.00E-80 (fl) (Glycine max) Glycine max calcium dependent protein kinase mRNA
37 LI:723144.1:2000SEP08 g178163 1.00E-44 Human ADP-ribosylation factor 1 mRNA, complete cds.
38 LI:1007188.1:2000SEP08 g190039 5.00E-10 Homo sapiens phospholipase C-beta-2 mRNA, complete cds.
39 LI:1024412.1:2000SEP08 g3149953 5.00E-31 Homo sapiens mRNA for G-protein gamma 7, complete cds.
40 LI:284797.3:2000SEP08 g13183337 0 Homo sapiens calneuron 1 (CALN1) mRNA, complete cds.
41 LI:1092901.1:2000SEP08 g7717240 1.00E-41 Homo sapiens chromosome 21 segment HS21C001.
42 LI:228930.1:2000SEP08 g9438228 0 Homo sapiens phospholipase C beta 1 mRNA, complete cds.
43 LI:722913.1:2000SEP08 g3450893 1.00E-107 (fl) (Avena fatua) ras-like small monomeric GTP-binding protein
44 LG:457478.1:2000SEP08 g13279025 1.00E-108 Homo sapiens, Similar to chromobox homolog 2 (Drosophila Pc class), clone MGC:10561, mRNA, complete cds.
45 LG:358719.1:2000SEP08 g9652377 0 Homo sapiens chromosome 13, partial sequence and human adenovirus type 5 E1A nucleoprotein gene, partial cds.
46 LG:105160.5:2000SEP08 g286024 4.00E-06 Human mRNA for transcription factor, E4TF1-53, complete cds.
47 LG:400705.1:2000SEP08 g488286 2.00E-48 Human basic helix-loop-helix transcription factor mRNA, complete cds.
48 LG:221977.1:2000SEP08 g188909 2.00E-89 Human t(8;17) BCL3/myc gene translocation.
49 LG:898771.1:2000SEP08 g14042881 2.00E-52 Homo sapiens cDNA FLJ14977 fis, clone THYRO1001809, highly similar to MYOCYTE NUCLEAR FACTOR.
50 LI:457478.1:2000SEP08 g13279025 1.00E-108 Homo sapiens, Similar to chromobox homolog 2 (Drosophila Pc class), clone MGC:10561, mRNA, complete cds.
51 LI:125140.1:2000SEP08 g4128144 3.00E-32 Homo sapiens RP58 gene, complete CDS.
52 LI:021095.2:2000SEP08 g37057 1.00E-70 Human mRNA for general transcription factor IIB.
53 LI:888730.1:2000SEP08 g10439413 1.00E-102 Homo sapiens cDNA: FLJ22881 fis, clone KAT03571, highly similar to HUMFERL Human ferritin L chain mRNA.
54 LI:358719.1:2000SEP08 g9652377 1.00E-142 Homo sapiens chromosome 13, partial sequence and human adenovirus type 5 E1A nucleoprotein gene, partial cds.
55 LI:351342.3:2000SEP08 g14042881 0 Homo sapiens cDNA FLJ14977 fis, clone THYRO1001809, highly similar to MYOCYTE NUCLEAR FACTOR.
56 LI:256099.2:2000SEP08 g1561727 1.00E-148 Human transcription factor RTEF-1 (RTEF1) mRNA, complete cds.
57 LI:2051991.1:2000SEP08 g2275152 1.00E-116 Homo sapiens DNA-binding protein mRNA, complete cds.
58 LG:980769.1:2000SEP08 g14330447 0 Homo sapiens mRNA for zinc finger protein RINZF (RINZF gene).
59 LG:332474.3:2000SEP08 g9280077 2.00E-60 Macaca fascicularis brain cDNA, clone:QccE-14453.
60 LG:1087707.1:2000SEP08 g347905 3.00E-41 Human zinc finger protein (ZNF141) mRNA, complete cds.
61 LG:415349.1:2000SEP08 g2244657 1.00E-43 H. sapiens DNA fragment located on chromosome Xq24 containing CpG islands.
62 LG:132420.2:2000SEP08 g1514586 1.00E-28 H. sapiens pseudogene for kruppel-like protein (ZNF75b).
63 LG:394201.1:2000SEP08 g2306771 2.00E-10 Human zinc finger protein (LD5-1) gene, exons 4, 5 and 6, and complete cds.
64 LG:1060884.1:2000SEP08 g340443 1.00E-05 Human zinc finger protein 41 (ZNF41) gene, 3' end.
65 LG:242191.1:2000SEP08 g13097776 3.00E-14 Homo sapiens, clone IMAGE:3355405, mRNA, partial cds.
66 LG:1063762.3:2000SEP08 g7022522 2.00E-27 Homo sapiens cDNA FLJ10469 fis, clone NT2RP2000008, weakly similar to ZINC FINGER PROTEIN 84.
67 LG:1100856.1:2000SEP08 g2190183 1.00E-81 Homo sapiens mRNA for zinc finger protein, complete cds.
68 LG:979390.2:2000SEP08 g12804322 5.00E-09 Homo sapiens, clone MGC:4054, mRNA, complete cds.
69 LG:1400447.1:2000SEP08 g487782 4.00E-23 Human zinc finger protein ZNF133.
70 LG:1400562.1:2000SEP08 g454818 1.00E-135 Human Krueppel-related DNA-binding protein (PF4) mRNA, 5'
71 LG:1076130.1:2000SEP08 g5262556 1.00E-14 Homo sapiens mRNA; cDNA DKFZp569D2231 (from clone DKFZp569D2231); partial cds.
72 LG:1064459.1:2000SEP08 g13938260 5.00E-21 Homo sapiens, clone MGC:15514, mRNA, complete cds.
73 LG:1079415.14:2000SEP08 g1049300 3.00E-41 Human KRAB zinc finger protein (ZNF177) mRNA, complete cds.
74 LG:1329431.3:2000SEP08 g4164082 1.00E-155 Homo sapiens zinc finger protein EZNF (EZNF) mRNA, complete cds.
75 LG:1088431.2:2000SEP08 g10439208 1.00E-166 Homo sapiens cDNA: FLJ22713 fis, clone HSI13536.
76 LG:1329462.2:2000SEP08 g10439974 0 Homo sapiens cDNA: FLJ23327 fis, clone HEP12630, highly similar to HSZNF37 Homo sapiens ZNF37A mRNA for zinc finger protein.
77 LI:393468.1:2000SEP08 g2618752 1.00E-112 (fl) (Takifugu rubripes) zinc finger protein
78 LI:722577.1:2000SEP08 g5107180 3.00E-31 (fl) (Lycopersicon esculentum) small zinc finger-like protein
79 LI:322783.16:2000SEP08 g12053166 0 Homo sapiens mRNA; cDNA DKFZp43401427 (from clone DKFZp43401427); complete cds.
80 LI:901355.2:2000SEP08 g8163823 3.00E-79 Homo sapiens krueppel-like zinc finger protein HZF2 mRNA, complete cds.
81 LI:038859.2:2000SEP08 g14330447 0 Homo sapiens mRNA for zinc finger protein RINZF (RINZF gene).
82 LI:1046117.1:2000SEP08 g7023215 8.00E-12 Homo sapiens cDNA FLJ10891 fis, clone NT2RP4002078, weakly similar to ZINC FINGER PROTEIN 91.
83 LI:801015.1:2000SEP08 g13623353 6.00E-42 Homo sapiens, Similar to zinc finger protein 136 (clone pHZ-20), clone MGC:10647, mRNA, complete cds.
84 LI:1175590.1:2000SEP08 g10432937 0 Homo sapiens cDNA FLJ11637 fis, clone HEMBA1004321, weakly similar to ZINC FINGER PROTEIN 184.
85 LI:1170585.2:2000SEP08 g10439974 2.00E-10 Homo sapiens cDNA: FLJ23327 fis, clone HEP12630, highly similar to HSZNF37 Homo sapiens ZNF37A mRNA for zinc finger protein.
86 LI:719531.2:2000SEP08 g6650686 7.00E-47 Homo sapiens Y-linked zinc finger protein (ZFY) gene, complete
87 LI:794623.1:2000SEP08 g12804414 3.00E-93 Homo sapiens, Similar to hypothetical protein FLJ10891, clone MGC:925, mRNA, complete cds.
88 LI:1173119.1:2000SEP08 g7959276 2.00E-32 Homo sapiens mRNA for KIAA1508 protein, partial cds.
89 LI:1093285.1:2000SEP08 g10436361 2.00E-62 Homo sapiens cDNA FLJ14012 fis, clone Y79AA1002482, moderately similar to ZINC FINGER PROTEIN 91.
90 LI:1091881.1:2000SEP08 g9802036 1.00E-129 Homo sapiens zinc finger protein SBZF3 mRNA, complete cds.
91 LI:1091617.1:2000SEP08 g5730195 1.00E-21 Homo sapiens partial gene encoding novel Kruppel-type zinc finger, exon 1.
92 LI:1082344.1:2000SEP08 g340473 2.00E-50 Homo sapiens DNA-binding protein (ZNF) gene, partial cds.
93 LI:1166249.1:2000SEP08 g186773 1.00E-100 Human Kruppel related zinc finger protein (HTF10) mRNA, complete cds.
94 LI:799675.1:2000SEP08 g10434780 1.00E-32 Homo sapiens cDNA FLJ12985 fis, clone NT2RP3000050, moderately similar to ZINC FINGER PROTEIN 91.
95 LI:1178899.1:2000SEP08 g10439974 0 Homo sapiens cDNA: FLJ23327 fis, clone HEP12630, highly similar to HSZNF37 Homo sapiens ZNF37A mRNA for zinc finger protein.
96 LI:1169241.1:2000SEP08 g9502201 0 Homo sapiens endothelial zinc finger protein induced by tumor necrosis factor alpha (EZFIT) mRNA, complete cds.
97 LI:1180090.1:2000SEP08 g10433741 1.00E-24 Homo sapiens cDNA FLJ12298 fis, clone MAMMA1001837, weakly similar to ZINC FINGER PROTEIN 29.
98 LI:2049322.1:2000SEP08 g10436361 3.00E-76 Homo sapiens cDNA FLJ14012 fis, clone Y79AA1002482, moderately similar to ZINC FINGER PROTEIN 91.
99 LI:809074.1:2000SEP08 g10434194 1.00E-175 Homo sapiens cDNA FLJ12606 fis, clone NT2RM4001483, moderately similar to ZINC FINGER PROTEIN 136.
100 LI:805158.1:2000SEP08 g10437946 2.00E-53 Homo sapiens cDNA: FLJ21782 fis, clone HEP00266, highly similar to AF118063 Homo sapiens PRO1400 mRNA.
101 LI:1172697.1:2000SEP08 g10435737 0 Homo sapiens cDNA FLJ13659 fis, clone PLACE1011576, moderately similar to Human Kruppel related zinc finger protein (HTF10) mRNA.
102 LI:1174107.2:2000SEP08 g13623586 Homo sapiens, Similar to zinc finger protein 254, clone MGC:10544, mRNA, complete cds.
103 LI:1177434.2:2000SEP08 g9968289 1.00E-43 Homo sapiens mRNA for zinc finger protein (ZNF304 gene).
104 LI:1184255.1:2000SEP08 g10436605 2.00E-25 Homo sapiens cDNA FLJ14206 fis, clone NT2RP3003157.
105 LI:1164555.1:2000SEP08 g13938350 9.00E-64 Homo sapiens, Similar to zinc finger protein 268, clone IMAGE:3352268, mRNA, partial cds.
106 LI:238666.4:2000SEP08 g7022522 2.00E-27 Homo sapiens cDNA FLJ10469 fis, clone NT2RP2000008, weakly similar to ZINC FINGER PROTEIN 84.
107 LI:1166752.1:2000SEP08 g10434046 9.00E-09 Homo sapiens cDNA FLJ12515 fis, clone NT2RM2001771, moderately similar to ZINC FINGER PROTEIN 135.
108 LI:2049654.1:2000SEP08 g487782 4.00E-23 Human zinc finger protein ZNF133.
109 LI:242665.2:2000SEP08 g6650686 5.00E-16 Homo sapiens Y-linked zinc finger protein (ZFY) gene, complete
110 LI:208637.1:2000SEP08 g2244657 5.00E-43 H. sapiens DNA fragment located on chromosome Xq24 containing CpG islands.
111 LI:2051808.1:2000SEP08 g1399027 1.00E-175 Human cysteine-rich protein 2 (hCRP2) mRNA, complete cds.
112 LI:1175136.1:2000SEP08 g12803656 1.00E-47 Homo sapiens, Similar to zinc finger protein homologous to mouse Zfp93, clone MGC:3594, mRNA, complete cds.
113 LI:1177337.1:2000SEP08 g10435737 1.00E-147 Homo sapiens cDNA FLJ13659 fis, clone PLACE1011576, moderately similar to Human Kruppel related zinc finger protein (HTF10) mRNA.
114 LI:1165056.1:2000SEP08 g10436361 1.00E-65 Homo sapiens cDNA FLJ14012 fis, clone Y79AA1002482, moderately similar to ZINC FINGER PROTEIN 91.
115 LI:1175250.1:2000SEP08 g10434194 0 Homo sapiens cDNA FLJ12606 fis, clone NT2RM4001483, moderately similar to ZINC FINGER PROTEIN 136.
116 LI:1183192.1:2000SEP08 g13938260 8.00E-21 Homo sapiens, clone MGC:15514, mRNA, complete cds.
117 LI:1183325.1:2000SEP08 g10435640 0 Homo sapiens cDNA FLJ13590 fis, clone PLACE1009398, moderately similar to ZINC FINGER PROTEIN 135.
118 LI:1178269.2:2000SEP08 g4753759 5.00E-16 Homo sapiens OZF gene exon 1.
119 LI:813422.1:2000SEP08 g10439208 2.00E-83 Homo sapiens cDNA: FLJ22713 fis, clone HSI13536.
120 LI:1093049.6:2000SEP08 g881563 4.00E-13 Human zinc finger containing protein ZNF157 (ZNF157) mRNA, complete cds.
121 LI:202192.4:2000SEP08 g1399185 7.00E-10 (fl) (Gallus gallus) zinc finger 5 protein
122 LG:1041854.1:2000SEP08 g12803168 2.00E-69 Homo sapiens, ATP synthase, H+ transporting, mitochondrial F1 complex, delta subunit, clone MGC:8347, mRNA, complete cds.
123 LG:1100502.1:2000SEP08 g3659900 2.00E-92 Homo sapiens F1F0-type ATP synthase subunit g mRNA, complete cds.
124 LI:726414.1:2000SEP08 g3319340 4.00E-68 (fl) (Arabidopsis thaliana) contains similarity to E. coli cation transport protein ChaC (GB:D90756)
125 LI:400517.4:2000SEP08 g10439793 0 Homo sapiens cDNA: FLJ23188 fis, clone LNG12038.
126 LI:1078917.1:2000SEP08 g339468 9.00E-29 Human transferrin mRNA, 3' end.
127 LI:1012560.1:2000SEP08 g9957541 0 Homo sapiens connexin 59 (CX59) gene, complete cds.
128 LI:427997.4:2000SEP08 g6996442 1.00E-58 (fl) (Homo sapiens) CTL1 protein
129 LI:197899.1:2000SEP08 g14042128 0 Homo sapiens cDNA FLJ14541 fis, clone NT2RM2001499, moderately similar to LOW-AFFINITY CATIONIC AMINO ACID TRANSPORTER-2.
130 LG:334199.1:2000SEP08 g14017846 7.00E-39 Homo sapiens mRNA for KIAA1815 protein, partial cds.
131 LG:334345.1:2000SEP08 g14336735 4.00E-08 Homo sapiens 16p13.3 sequence section 5 of 8.
132 LG:228092.1:2000SEP08 g13529112 0 Homo sapiens, clone IMAGE:3930327, mRNA, partial cds.
133 LG:098580.1:2000SEP08 g4096351 3.00E-09 Human apoptotic cysteine protease Mih1/TX isoform delta (mih1/Tx) mRNA, complete cds.
134 LG:969572.1:2000SEP08 g177889 2.00E-07 Human alpha-2-thiol proteinase inhibitor mRNA, complete coding sequence.
135 LG:196958.1:2000SEP08 g1182066 1.00E-07 Human tryptase mRNA, complete cds.
136 LG:1087811.1:2000SEP08 g2565302 1.00E-130 Macaca mulatta cyclophilin A mRNA, complete cds.
137 LG:1327885.1:2000SEP08 g13182746 0 Homo sapiens microsomal signal peptidase subunit mRNA, complete cds.
138 LI:449393.1:2000SEP08 g14348899 2.00E-07 Homo sapiens heat shock protein mRNA, complete cds.
139 LI:897616.1:2000SEP08 g337369 1.00E-103 Human rapamycin- and FK506-binding protein, complete cds.
140 LI:736860.1:2000SEP08 g475922 1.00E-24 (fl) (Zea mays) proteinase inhibitor
141 LI:027066.6:2000SEP08 g10432866 2.00E-95 Homo sapiens cDNA FLJ11583 fis, clone HEMBA1003680, weakly similar to PUTATIVE AMINOPEPTIDASE ZK353.6 IN CHROMOSOME III (EC 3.4.11.-).
142 LI:1074263.1:2000SEP08 g2281120 6.00E-38 Saimiri sciureus cystatin C mRNA, complete cds.
143 LI:334345.1:2000SEP08 g14336735 4.00E-08 Homo sapiens 16p13.3 sequence section 5 of 8.
144 LI:1093914.1:2000SEP08 g5926696 1.00E-175 Homo sapiens genomic DNA, chromosome 6p21.3, HLA Class I region, section 8/20.
145 LI:1188168.1:2000SEP08 g10503945 0 Homo sapiens calpain-like protease CAPN10e mRNA, complete
146 LI:1065168.1:2000SEP08 g12653472 6.00E-90 Homo sapiens, proteasome (prosome, macropain) subunit, beta type, 1, clone MGC:8505, mRNA, complete cds.
147 LI:1180418.1:2000SEP08 g2565300 1.00E-130 Cercopithecus aethiops cyclophilin A mRNA, complete cds.
148 LG:232648.1:2000SEP08 g14336723 0 Homo sapiens 16p13.3 sequence section 4 of 8.
149 LG:1078420.1:2000SEP08 g1263080 1.00E-145 Human mariner1 transposase gene, complete consensus
150 LG:1397599.1:2000SEP08 g2104909 5.00E-97 Human endogenous retrovirus H D1 leader region/integrase-derived ORF1, ORF2, and putative envelope protein mRNA,
151 LG:1397655.2:2000SEP08 g2104909 1.00E-101 Human endogenous retrovirus H D1 leader region/integrase-derived ORF1, ORF2, and putative envelope protein mRNA,
152 LG:241055.1:2000SEP08 g1263080 1.00E-173 Human mariner1 transposase gene, complete consensus
153 LG:1101065.1:2000SEP08 g2226003 9.00E-44 Human Tigger1 transposable element, complete consensus sequence.
154 LG:475629.1:2000SEP08 g4185140 1.00E-39 (fl) (Arabidopsis thaliana) putative small nuclear
155 LI:348991.1:2000SEP08 g31394 3.00E-80 Human humFib mRNA for fibrillarin.
156 LI:475629.1:2000SEP08 g4185140 1.00E-33 (fl) (Arabidopsis thaliana) putative small nuclear
157 LI:261331.1:2000SEP08 g1263080 3.00E-88 Human mariner1 transposase gene, complete consensus
158 LI:815686.1:2000SEP08 g1698454 1.00E-150 Human mariner2 transposable element, complete consensus sequence.
159 LI:1167327.2:2000SEP08 g2104909 3.00E-83 Human endogenous retrovirus H D1 leader region/integrase-derived ORF1, ORF2, and putative envelope protein mRNA,
160 LI:758009.3:2000SEP08 g9650710 0 Homo sapiens mRNA for HEF like Protein (HEFL gene).
161 LG:331593.1:2000SEP08 g4106980 6.00E-11 (fl) (Homo sapiens) immunoglobulin-like transcript 10 protein
162 LI:1094174.1:2000SEP08 g3273727 4.00E-60 Homo sapiens MHC class 1 region.
163 LI:814362.1:2000SEP08 g261239 1.00E-149 immunoglobulin M light chain V region=anti-lipid A antibody (human, hybridoma cell line HR78, mRNA Partial, 460 nt).
164 LI:219542.1:2000SEP08 g30150 1.00E-19 H.sapiens coxVIIb mRNA for cytochrome c oxidase subunit VIIb.
165 LI:726197.1:2000SEP08 g2114207 1.00E-48 (fl) (Oryza sativa) glutaredoxin
166 LI:1075314.1:2000SEP08 g2588778 1.00E-100 Homo sapiens mRNA for cytochrome b large subunit of complex II, complete cds.
167 LI:437883.1:2000SEP08 g986883 1.00E-146 Human nuclear-encoded mitochondrial NADH-ubiquinone reductase 24Kd subunit mRNA, complete cds.
168 LG:336265.1:2000SEP08 g159649 2.00E-43 (fl) (Ascaris suum) putative
169 LG:407788.2:2000SEP08 g12052773 0 Homo sapiens mRNA; cDNA DKFZp564B052 (from clone DKFZp564B052); complete cds.
170 LG:1326925.1:2000SEP08 g2853300 1.00E-135 Homo sapiens mucin (MUC3) mRNA, partial cds.
171 LI:332655.2:2000SEP08 g6996452 3.00E-07 Homo sapiens SPP2 gene for secreted phosphoprotein 24 precursor, exons 1-8.
172 LI:1184621.4:2000SEP08 g12052773 3.00E-62 Homo sapiens mRNA; cDNA DKFZp564B052 (from clone DKFZp564B052); complete cds.
173 LI:2051386.1:2000SEP08 g3228236 4.00E-49 Homo sapiens UHS KerB gene.
174 LG:362757.1:2000SEP08 g1419370 7.00E-74 (fl) (Zea mays) actin depolymerizing factor
175 LG:406770.1:2000SEP08 g508483 3.00E-48 Homo sapiens GATA-4 mRNA, complete cds.
176 LG:1094640.1:2000SEP08 g13249136 0 Homo sapiens chromosome 2 unknown sequence.
177 LG:001929.1:2000SEP08 g908802 1.00E-44 Homo sapiens keratin 6 isoform K6e (KRT6E) mRNA, complete
178 LI:401322.1:2000SEP08 g13623540 6.00E-23 Homo sapiens, tubulin alpha 1, clone MGC:12832, mRNA, complete cds.
179 LI:208748.1:2000SEP08 g10433083 0 Homo sapiens cDNA FLJ11756 fis, clone HEMBA1005595, weakly similar to DYNEIN HEAVY CHAIN, CYTOSOLIC.
180 LI:407242.1:2000SEP08 g2282582 4.00E-71 (fl) (Mus musculus) actin-binding protein
181 LI:403409.1:2000SEP08 g8896163 0 Homo sapiens kinesin-like protein GAKIN mRNA, complete cds.
182 LI:450798.1:2000SEP08 g12804324 2.00E-15 Homo sapiens, clone IMAGE:2823044, mRNA, partial cds.
183 LI:410317.1:2000SEP08 g186699 1.00E-53 Human 56k cytoskeletal type II keratin mRNA.
184 LI:340268.1:2000SEP08 g12653054 4.00E-68 Homo sapiens, actin, gamma 1, clone MGC:8332, mRNA, complete cds.
185 LI:2051671.1:2000SEP08 g63805 1.00E-38 (fl) (Gallus gallus) tensin
186 LG:998844.1:2000SEP08 g7259234 3.00E-22 (fl) (Mus musculus) contains transmembrane (TM) region
187 LG:1043787.1:2000SEP08 g311699 1.00E-156 H.sapiens GPx-4 mRNA for phospholipid hydroperoxide glutathione peroxidase.
188 LG:1098931.16:2000SEP08 g2138329 4.00E-48 Human acetyl-CoA carboxylase (ACC2) mRNA, complete cds.
189 LG:199423.2:2000SEP08 g4520343 1.00E-12 Homo sapiens mRNA for N-copine, complete cds.
190 LI:1075297.1:2000SEP08 g13543530 1.00E-36 Homo sapiens, microsomal glutathione S-transferase 1, clone MGC:14525, mRNA, complete cds.
191 LI:1043321.1:2000SEP08 g546517 3.00E-71 stearoyl-CoA desaturase (human, adipose tissue, mRNA Partial, 712 nt).
192 LI:297070.1:2000SEP08 g307297 8.00E-26 Human I beta 1-6 N-acetylglucosaminyltransferase mRNA, complete cds.
193 LI:1085041.1:2000SEP08 g13543567 8.00E-34 Homo sapiens, prostaglandin D2 synthase (21 kD, brain), clone MGC:14559, mRNA, complete cds.
194 LI:1071544.1:2000SEP08 g35069 1.00E-161 H.sapiens RNA for nm23-H2 gene.
195 LI:2052480.1:2000SEP08 g10439273 0 Homo sapiens cDNA: FLJ22761 fis, clone KAIA0893.
196 LG:450105.1:2000SEP08 g414348 8.00E-09 Human homolog of yeast ribosomal protein S28, complete cds.
197 LG:450581.1:2000SEP08 g2739219 7.00E-28 (fl) (Hordeum vulgare) rpS28
198 LG:450887.1:2000SEP08 g7629994 1.00E-40 (fl) (Arabidopsis thaliana) 60S RIBOSOMAL PROTEIN L36 homolog
199 LG:460809.1:2000SEP08 g36129 5.00E-55 Human mRNA for ribosomal protein L31.
200 LG:452089.1:2000SEP08 g7340874 4.00E-88 (fl) (Oryza sativa) ESTs D15590(C0900),D48950(S15542),D22684(C0900) correspond to a region of the predicted gene. -Similar to Arabidopsis thaliana 60S ribosomal protein L11A (L16A). (P42795)
201 LG:1099416.1:2000SEP08 g292440 1.00E-85 Homo sapiens ribosomal protein L37 mRNA, complete cds.
202 LG:255713.1:2000SEP08 g36129 3.00E-51 Human mRNA for ribosomal protein L31.
203 LG:998903.1:2000SEP08 g5106775 4.00E-69 (fl) (Hordeum vulgare) ribosomal protein S12
204 LG:1119656.1:2000SEP08 g1220360 6.00E-13 Homo sapiens (clone cori-1c15) S29 ribosomal protein mRNA, complete cds.
205 LG:1096907.1:2000SEP08 g292438 8.00E-98 Homo sapiens ribosomal protein L37a (RPL37A) mRNA, complete cds.
206 LG:1323741.1:2000SEP08 g562073 1.00E-127 Human ribosomal protein L35 mRNA, complete cds.
207 LG:1098372.1:2000SEP08 g31061 1.00E-121 Human mRNA for Epstein-Barr virus small RNAs (EBERs) associated protein (EAP).
208 LG:1006783.1:2000SEP08 g550014 1.00E-161 Human ribosomal protein L21 mRNA, complete cds.
209 LG:1097562.1:2000SEP08 g488414 1.00E-148 H.sapiens mRNA for ribosomal protein L30.
210 LG:998868.1:2000SEP08 g483431 1.00E-134 (fl) (Oryza sativa) cyc07
211 LG:1063383.1:2000SEP08 g36131 8.00E-72 Human mRNA for ribosomal protein L32.
212 LG:1400567.1:2000SEP08 g505472 1.00E-107 H.sapiens mRNA for ribosomal protein L31.
213 LI:449404.1:2000SEP08 g13905003 1.00E-29 Homo sapiens, ribosomal protein S14, clone MGC:5275, mRNA, complete cds.
214 LI:449941.2:2000SEP08 g968902 1.00E-106 (fl) (Oryza sativa) ribosomal protein S8
215 LI:450229.1:2000SEP08 g4588906 7.00E-97 (fl) (Secale cereale) ribosomal protein S7
216 LI:450399.3:2000SEP08 g36125 2.00E-10 Human mRNA for ribosomal protein L17.
217 LI:455771.1:2000SEP08 g414348 2.00E-19 Human homolog of yeast ribosomal protein S28, complete cds.
218 LI:720459.1:2000SEP08 g13278716 1.00E-130 Homo sapiens, ribosomal protein L6, clone MGC:1635, mRNA, complete cds.
219 LI:723156.1:2000SEP08 g915313 3.00E-53 (fl) (Nicotiana glutinosa) ribosomal protein L31
220 LI:728055.1:2000SEP08 g14044115 1.00E-140 Homo sapiens, ribosomal protein S16, clone MGC:15283, mRNA, complete cds.
221 LI:1020789.1:2000SEP08 g38422 1.00E-146 Homo sapiens mRNA for ribosomal protein S18.
222 LI:1071728.1:2000SEP08 g13960132 1.00E-139 Homo sapiens, ribosomal protein S20, clone MGC:4151, mRNA, complete cds.
223 LI:1084329.1:2000SEP08 g14043190 1.00E-156 Homo sapiens, clone MGC:15572, mRNA, complete cds.
224 LI:246422.1:2000SEP08 g409069 1.00E-136 Human mRNA for HBp15/L22, complete cds.
225 LI:1086066.1:2000SEP08 g550020 1.00E-170 Human ribosomal protein S5 mRNA, complete cds.
226 LI:223142.1:2000SEP08 g12653460 2.00E-20 Homo sapiens, ribosomal protein L17, clone MGC:8457, mRNA, complete cds.
227 LI:885368.1:2000SEP08 g2150130 1.00E-66 (fl) (Arabidopsis thaliana) cytoplasmic ribosomal protein S15a
228 LI:481782.1:2000SEP08 g4115717 8.00E-06 Chlorocebus aethiops mRNA for ribosomal protein S4X, complete cds.
229 LI:1093813.1:2000SEP08 g562073 1.00E-125 Human ribosomal protein L35 mRNA, complete cds.
230 LI:449413.2:2000SEP08 g414348 2.00E-15 Human homolog of yeast ribosomal protein S28, complete cds.
231 LI:450105.1:2000SEP08 g414348 3.00E-08 Human homolog of yeast ribosomal protein S28, complete cds.
232 LI:814285.1:2000SEP08 g14250761 2.00E-59 Homo sapiens, clone MGC:14308, mRNA, complete cds.
233 LI:1142855.1:2000SEP08 g10439463 1.00E-146 Homo sapiens cDNA: FLJ22926 fis, clone KAT06984, highly similar to HUMPPARP1 Human acidic ribosomal phosphoprotein P1
234 LI:817330.1:2000SEP08 g5106775 8.00E-28 (fl) (Hordeum vulgare) ribosomal protein S12
235 LI:817845.1:2000SEP08 g9759463 3.00E-60 (fl) (Arabidopsis thaliana) 40S ribosomal protein S19
236 LI:460809.1:2000SEP08 g505472 7.00E-55 H.sapiens mRNA for ribosomal protein L31.
237 LI:815874.1:2000SEP08 g292442 1.00E-144 Homo sapiens ribosomal protein S20 (RPS20) mRNA, complete cds.
238 LI:255713.1:2000SEP08 g36129 4.00E-52 Human mRNA for ribosomal protein L31.
239 LI:035973.1:2000SEP08 g292440 7.00E-66 Homo sapiens ribosomal protein L37 mRNA, complete cds.
240 LI:1138110.1:2000SEP08 g1220360 6.00E-13 Homo sapiens (clone cori-1c15) S29 ribosomal protein mRNA, complete cds.
241 LI:2049074.1:2000SEP08 g13938411 3.00E-95 Homo sapiens, ribosomal protein L13, clone MGC:15415, mRNA, complete cds.
242 LI:1092460.1:2000SEP08 g12652698 1.00E-154 Homo sapiens, purine-rich element binding protein B, clone MGC:1947, mRNA, complete cds.
243 LI:399421.1:2000SEP08 g1568556 1.00E-35 H.sapiens H2B/I gene.
244 LI:816655.2:2000SEP08 g10439444 1.00E-127 Homo sapiens cDNA: FLJ22909 fis, clone KAT05694, highly similar to HUMHMG17 Human non-histone chromosomal protein HMG-17 mRNA.
245 LG:414732.1:2000SEP08 g183232 1.00E-140 Human beta-glucuronidase mRNA, complete cds.
246 LG:1140250.1:2000SEP08 g8101070 4.00E-91 Homo sapiens golgin-like protein (GLP) gene, complete cds.
247 LG:174022.1:2000SEP08 g2906145 1.00E-160 Homo sapiens malate dehydrogenase precursor (MDH) mRNA, nuclear gene encoding mitochondrial protein, complete cds.
248 LI:002811.1:2000SEP08 g5702305 7.00E-66 Homo sapiens vault protein mRNA, complete cds.
249 LI:414732.2:2000SEP08 g183232 1.00E-142 Human beta-glucuronidase mRNA, complete cds.
250 LI:1019920.1:2000SEP08 g4164451 6.00E-71 Homo sapiens NADH-ubiquinone oxidoreductase B14.5A subunit mRNA, nuclear gene encoding mitochondrial protein.
251 LI:1038336.1:2000SEP08 g4164443 2.00E-24 Homo sapiens NADH:ubiquinone oxidoreductase B9 subunit mRNA, nuclear gene encoding mitochondrial protein,
252 LI:1177772.11:2000SEP08 g7211437 1.00E-141 Homo sapiens golgin-67 (GOLGA5) mRNA, complete cds.
253 LI:205642.2:2000SEP08 g4510363 3.00E-71 (fl) (Arabidopsis thaliana) putative ubiquitin-conjugating enzyme
254 LG:449685.1:2000SEP08 g7248411 8.00E-38 (fl) (Oryza sativa) ESTs C99632(E20954),C99633(E20954) correspond to a region of the predicted gene. -Similar to Arabidopsis thaliana putative pathogenesis-related protein
255 LG:453922.1:2000SEP08 g3789950 2.00E-55 (fl) (Oryza sativa) translation initiation factor
256 LG:476342.3:2000SEP08 g790641 3.00E-25 (fl) (Hordeum vulgare) gamma-thionin
257 LI:336801.1:2000SEP08 g5912457 1.00E-86 (fl) (Homo sapiens) dJ1068E13.2 (novel protein similar to bovine SCP2 (Sterol Carrier Protein 2) and part of HSD17B4 (hydroxysteroid (17-beta) dehydrogenase 4))
258 LI:449685.1:2000SEP08 g7248411 9.00E-33 (fl) (Oryza sativa) ESTs C99632(E20954),C99633(E20954) correspond to a region of the predicted gene. -Similar to Arabidopsis thaliana putative pathogenesis-related protein
259 LI:476342.1:2000SEP08 g790641 2.00E-25 (fl) (Hordeum vulgare) gamma-thionin
260 LI:1072804.1:2000SEP08 g453189 2.00E-58 (fl) (Zea mays) acyl carrier protein
261 LI:455450.1:2000SEP08 g4105111 2.00E-43 (fl) (Hordeum vulgare) dehydrin 6
262 LI:1073699.1:2000SEP08 g219661 1.00E-43 Human mRNA for growth inhibitory factor.
263 LI:1013729.1:2000SEP08 g182353 6.00E-41 Human fatty acid binding protein homologue (PA-FABP) mRNA, complete cds.
264 LI:2050322.2:2000SEP08 g10439685 1.00E-115 Homo sapiens cDNA: FLJ23107 fis, clone LNG07738.
265 LI:891327.1:2000SEP08 g8570523 0 Homo sapiens genomic DNA, chromosome 1q22-q23, CD1 region, section 3/4.
266 LI:2053076.1:2000SEP08 g3879684 7.00E-20 (fl) (Caenorhabditis elegans) (Z74042) predicted using Genefinder-Similarity to Haemophilus 3-oxoacyl-(acyl-carrier protein) reductase (SW:FABG_HAEIN), contains similarity to Pfam domain: PF00106 (short chain dehydrogenase), Score=170.5, E-value=9.2e-48, N=1 ~cDNA EST yk470b2.3 comes from this gene-cDNA EST yk470b2.5 comes from this gene
267 LG:220085.1:2000SEP08 g325464 4.00E-98 Human endogenous retrovirus type C oncovirus sequence.
268 LG:406709.1:2000SEP08 g5106556 8.00E-11 Homo sapiens MLL septin-like fusion protein (MSF) mRNA, complete cds.
269 LG:347863.9:2000SEP08 g6708478 2.00E-22 (3' incom)(Mus musculus) formin-like protein
270 LI:1073027.1:2000SEP08 g14198256 1.00E-116 Homo sapiens, clone MGC:5243, mRNA, complete cds.
271 LI:347635.1:2000SEP08 g11527996 0 Homo sapiens NOTCH2 protein (NOTCH2) mRNA, complete cds.
272 LI:013685.1:2000SEP08 g4868434 3.00E-79 Homo sapiens apoptosis related protein APR-2 mRNA, complete
273 LI:406709.1:2000SEP08 g5106556 8.00E-08 Homo sapiens MLL septin-like fusion protein (MSF) mRNA, complete cds.
274 LI:2052938.1:2000SEP08 g2662080 1.00E-08 Homo sapiens KIAA0400 mRNA, complete cds.
275 LI:213208.1:2000SEP08 g179303 2.00E-12 Human B12 protein mRNA, complete cds.
TABLE 3
SEQ ID NO: Template ID Start Stop Frame Pfam Hit Pfam Description E-value
1 LG:405741.3:2000SEP08 204 380 forward 3 hexokinase Hexokinase 2.10E-30
3 LG:017108.4:2000SEP08 335 574 forward 2 adenylatekinase Adenylate kinase 2.40E-09
4 LG:372569.5:2000SEP08 1340 1441 forward 2 ank Ank repeat 9.50E-05
4 LG:372569.5:2000SEP08 202 1125 forward 1 Asparaginase Asparaginase 4.60E-49
5 LG:968765.1:2000SEP08 182 433 forward 2 thiored Thioredoxin 5.90E-40
7 LG:977820.9:2000SEP08 11 844 forward 2 AMP-binding AMP-binding enzyme 1.70E-13
8 LI:1071608.1:2000SEP08 267 539 forward 3 aldo_ket_red Aldo/keto reductase family 1.90E-43
9 LI:1074023.1:2000SEP08 294 575 forward 3 GST Glutathione S-transferase, C-terminal 3.30E-19
9 LI:1074023.1:2000SEP08 60 278 forward 3 GST_N Glutathione S-transferase, N-terminal 9.90E-20
10 LI:453570.1:2000SEP08 186 626 forward 3 Glyoxalase Glyoxalase/Bleomycin resistance protein/Dioxygenase superfamily 1.10E-36
11 LI:072072.1:2000SEP08 535 729 forward 1 zf-DHHC DHHC zinc finger domain 2.50E-25
12 LI:148565.4:2000SEP08 449 580 forward 2 gpdh Glyceraldehyde 3-phosphate dehydrogenase, NAD binding domain 1.90E-14
12 LI:148565.4:2000SEP08 400 453 forward 1 gpdh Glyceraldehyde 3-phosphate dehydrogenase, NAD binding domain 3.50E-08
13 LI:368626.4:2000SEP08 105 239 forward 3 lipoxygenase Lipoxygenase 1.20E-13
13 LI:368626.4:2000SEP08 226 462 forward 1 lipoxygenase Lipoxygenase 2.50E-05
14 LI:346123.1:2000SEP08 99 302 forward 3 TIM Triosephosphate isomerase 1.20E-08
14 LI:346123.1:2000SEP08 373 492 forward 1 TIM Triosephosphate isomerase 1.70E-06
15 LI:335795.11:2000SEP08 512 706 forward 2 TGS TGS domain 1.30E-11
15 LI:335795.11:2000SEP08 1203 2375 forward 3 tRNA-synt_2b tRNA synthetase class II (G, H, P, S and T) 2.50E-19
16 LI:246023.2:2000SEP08 319 1050 forward 1 abhydrolase alpha/beta hydrolase fold 8.20E-20
17 LG:1100661.1:2000SEP08 138 371 forward 3 hormone5 Neurohypophysial hormones, C-terminal Domain 3.00E-45
18 LG:475856.1:2000SEP08 139 207 forward 1 zf-C2H2 Zinc finger, C2H2 type 4.30E-05
20 LG:1400575.1:2000SEP08 152 304 forward 2 KRAB KRAB box 2.30E-16
20 LG:1400575.1:2000SEP08 440 508 forward 2 zf-C2H2 Zinc finger, C2H2 type 1.40E-07
21 LG:1080545.1:2000SEP08 221 289 forward 2 zf-C2H2 Zinc finger, C2H2 type 1.90E-06
23 LI:720641.1:2000SEP08 236 985 forward 2 7tm_1 7 transmembrane receptor (rhodopsin 1.60E-27
25 LI:734904.1:2000SEP08 1197 1256 forward 3 7tm_1 7 transmembrane receptor (rhodopsin 3.90E-06
25 LI:734904.1:2000SEP08 964 1182 forward 1 7tm_1 7 transmembrane receptor (rhodopsin 4.20E-05
26 LI:1178118.1:2000SEP08 753 821 forward 3 zf-C2H2 Zinc finger, C2H2 type 2.60E-06
29 LG:337358.1:2000SEP08 701 1438 forward 2 ras Ras family 5.50E-32
30 LG:986090.1:2000SEP08 85 798 forward 1 14-3-3 14-3-3 proteins 1.30E-145
31 LG:123250.1:2000SEP08 812 1315 forward 2 RasGEF RasGEF domain 1.30E-07
33 LG:338927.6:2000SEP08 968 1219 forward 2 PH PH domain 3.60E-07
34 LG:332944.2:2000SEP08 700 1185 forward 1 RhoGAP RhoGAP domain 3.00E-22
35 LI:347174.5:2000SEP08 23 526 forward 2 arf ADP-ribosylation factor family 2.80E-04
35 LI:347174.5:2000SEP08 35 640 forward 2 ras Ras family 5.70E-78
36 LI:477070.1:2000SEP08 83 169 forward 2 efhand EF hand 7.80E-08
37 LI:723144.1:2000SEP08 68 604 forward 2 arf ADP-ribosylation factor family 3.80E-129
37 LI:723144.1:2000SEP08 119 625 forward 2 ras Ras family 1.60E-04
38 LI:1007188.1:2000SEP08 295 534 forward 1 PI-PLC-X Phosphatidylinositol-specific phospholipase C, X domain 2.30E-41
39 LI:1024412.1:2000SEP08 117 281 forward 3 G-gamma GGL domain 4.50E-36
40 LI:284797.3:2000SEP08 291 377 forward 3 efhand EF hand 2.50E-08
43 LI:722913.1:2000SEP08 172 744 forward 1 arf ADP-ribosylation factor family 6.60E-105
44 LG:457478.1:2000SEP08 98 220 forward 2 chromo 'chromo' (CHRromatin Organization Modifier) domain 1.70E-24
45 LG:358719.1:2000SEP08 285 1151 forward 3 Adeno_E1A Early E1A protein 6.00E-201
47 LG:400705.1:2000SEP08 277 435 forward 1 HLH Helix-loop-helix DNA-binding domain 1.10E-17
49 LG:898771.1:2000SEP08 79 291 forward 1 FHA FHA domain 1.50E-14
49 LG:898771.1:2000SEP08 1825 2112 forward 1 Fork_head Fork head domain 7.10E-65
50 LI:457478.1:2000SEP08 98 220 forward 2 chromo 'chromo' (CHRromatin Organization Modifier) domain 1.70E-24
51 LI:125140.1:2000SEP08 143 469 forward 2 BTB BTB/POZ domain 8.40E-06
53 LI:888730.1:2000SEP08 337 606 forward 1 ferritin Ferritin 1.80E-44
53 LI:888730.1:2000SEP08 225 344 forward 3 ferritin Ferritin 9.00E-17
54 LI:358719.1:2000SEP08 291 1109 forward 3 Adeno_E1A Early E1A protein 5.60E-14
54 LI:358719.1:2000SEP08 295 1161 forward 1 Adeno_E1A Early E1A protein 2.60E-10
117 LI:1183325.1:2000SEP08 1687 1755 forward 1 zf-C2H2 Zinc finger, C2H2 type 9.90E-07
118 LI:1178269.2:2000SEP08 150 332 forward 3 KRAB KRAB box 2.80E-20
119 LI:813422.1:2000SEP08 118 288 forward 1 KRAB KRAB box 1.80E-21
120 LI:1093049.6:2000SEP08 455 643 forward 2 KRAB KRAB box 2.10E-38
122 LG:1041854.1:2000SEP08 489 629 forward 3 ATP-synt_DE ATP synthase, Delta/Epsilon chain, long alpha-helix domain 1.50E-05
122 LG:1041854.1:2000SEP08 231 485 forward 3 ATP-synt_DE_N ATP synthase, Delta/Epsilon chain, beta-sandwich domain 1.00E-29
126 LI:1078917.1:2000SEP08 8 421 forward 2 transferrin Transferrin 1.60E-37
126 LI:1078917.1:2000SEP08 405 548 forward 3 transferrin Transferrin 1.50E-11
127 LI:1012560.1:2000SEP08 372 884 forward 3 connexin Connexin 6.50E-12
127 LI:1012560.1:2000SEP08 191 748 forward 2 connexin Connexin 9.60E-11
131 LG:334345.1:2000SEP08 153 314 forward 3 trypsin Trypsin 1 JOE-21
131 LG:334345.1:2000SEP08 293 430 forward 2 trypsin Trypsin 2.60E-21
133 LG:098580.1:2000SEP08 120 377 forward 3 ICE_p10 ICE-like protease (caspase) p10 domain 4.90E-31
134 LG:969572.1:2000SEP08 225 539 forward 3 cystatin Cystatin domain 9.70E-38
135 LG:196958.1:2000SEP08 401 547 forward 2 trypsin Trypsin 3.20E-23
135 LG:196958.1:2000SEP08 889 1029 forward 1 trypsin Trypsin 1.60E-22
136 LG:1087811.1:2000SEP08 235 387 forward 1 pro_isomerase Cyclophilin type peptidyl-prolyl cis-trans isomerase 1.10E-26
136 LG:1087811.1:2000SEP08 396 539 forward 3 pro_isomerase Cyclophilin type peptidyl-prolyl cis-trans isomerase 5.50E-17
136 LG:1087811.1:2000SEP08 539 667 forward 2 pro_isomerase Cyclophilin type peptidyl-prolyl cis-trans isomerase 3.60E-10
137 LG:1327885.1:2000SEP08 93 554 forward 3 Peptidase_S26 Signal peptidase I 2.10E-61
138 LI:449393.1:2000SEP08 90 788 forward 3 cpn60_TCP1 TCP-1/cpn60 chaperonin family 9.80E-66
139 LI:897616.1:2000SEP08 183 467 forward 3 FKBP FKBP-type peptidyl-prolyl cis-trans isomerases 1.10E-59
140 LI:736860.1:2000SEP08 58 237 forward 1 potato_inhibit Potato inhibitor I family 3.70E-26
142 LI:1074263.1:2000SEP08 99 416 forward 3 cystatin Cystatin domain 3.10E-43
143 LI:334345.1:2000SEP08 382 519 forward 1 trypsin Trypsin 2.60E-21
143 LI:334345.1:2000SEP08 248 409 forward 2 trypsin Trypsin 5.00E-20
145 LI:1188168.1:2000SEP08 309 1091 forward 3 Peptidase_C2 Calpain family cysteine protease 4.00E-10
146 LI:1065168.1:2000SEP08 226 408 forward 1 proteasome Proteasome A-type and B-type 2.70E-16
146 LI:1065168.1:2000SEP08 113 226 forward 2 proteasome Proteasome A-type and B-type 3.70E-08
147 LI:1180418.1:2000SEP08 235 387 forward 1 pro_isomerase Cyclophilin type peptidyl-prolyl cis-trans isomerase 1.10E-26
147 LI:1180418.1:2000SEP08 396 539 forward 3 pro_isomerase Cyclophilin type peptidyl-prolyl cis-trans isomerase 5.50E-17
147 LI:1180418.1:2000SEP08 539 667 forward 2 pro_isomerase Cyclophilin type peptidyl-prolyl cis-trans isomerase 3.60E-10
148 LG:232648.1:2000SEP08 277 774 forward 1 PseudoU_synth_5 ! RNA pseudouridylate synthase 4.20E-16
152 LG:241055.1:2000SEP08 742 972 forward 1 Transposase_1 Transposase 3.10E-26
154 LG:475629.1:2000SEP08 164 364 forward 2 Sm Sm protein 5.20E-16
155 LI:348991.1:2000SEP08 106 801 forward 1 Fibrillarin Fibrillarin 3.40E-167
156 LI:475629.1:2000SEP08 162 356 forward 3 Sm Sm protein 7.70E-06
160 LI:758009.3:2000SEP08 202 375 forward 1 SH3 SH3 domain 1.70E-09
161 LG:331593.1:2000SEP08 306 476 forward 3 ig Immunoglobulin domain 6.80E-05
163 LI:814362.1:2000SEP08 138 380 forward 3 ig Immunoglobulin domain 2.70E-09
165 LI:726197.1:2000SEP08 60 242 forward 3 glutaredoxin Glutaredoxin 8.30E-25
166 LI:1075314.1:2000SEP08 148 510 forward 1 Sdh_cyt Succinate dehydrogenase cytochrome b subunit 6.90E-44
167 LI:437883.1:2000SEP08 176 640 forward 2 complex1_24kD Respiratory-chain NADH dehydrogenase 24 Kd subunit 1.20E-14
167 LI:437883.1:2000SEP08 135 623 forward 3 complex1_24kD Respiratory-chain NADH dehydrogenase 24 Kd subunit 3.60E-04
168 LG:336265.1:2000SEP08 60 239 forward 3 Collagen Collagen triple helix repeat (20 copies) 2.20E-10
168 LG:336265.1:2000SEP08 439 612 forward 1 Collagen Collagen triple helix repeat (20 copies) 3.50E-04
169 LG:407788.2:2000SEP08 984 1163 forward 3 Collagen Collagen triple helix repeat (20 copies) 4.30E-07
171 LI:332655.2:2000SEP08 415 513 forward 1 ank Ank repeat 3.10E-10
174 LG:362757.1:2000SEP08 97 480 forward 1 cofilin_ADF Cofilin/tropomyosin-type actin-binding proteins 3.50E-47
175 LG:406770.1:2000SEP08 101 220 forward 2 GATA GATA zinc finger 2.80E-20
177 LG:001929.1:2000SEP08 373 1314 forward 1 filament Intermediate filament protein 1.60E-119
178 LI:401322.1:2000SEP08 161 271 forward 2 tubulin Tubulin/FtsZ family 4.70E-14
180 LI:407242.1:2000SEP08 224 568 forward 2 BTB BTB/POZ domain 3.40E-27
180 LI:407242.1:2000SEP08 1655 1798 forward 2 Kelch Kelch motif 1.70E-10
181 LI:403409.1:2000SEP08 1458 1652 forward 3 FHA FHA domain 3.00E-04
181 LI:403409.1:2000SEP08 78 1193 forward 3 kinesin Kinesin motor domain 1.70E-170
182 LI:450798.1:2000SEP08 96 260 forward 3 tubulin Tubulin/FtsZ family 2.40E-25
182 LI:450798.1:2000SEP08 266 484 forward 2 tubulin Tubulin/FtsZ family 3.00E-20
183 LI:410317.1:2000SEP08 205 564 forward 1 filament Intermediate filament protein 5.60E-32
183 LI:410317.1:2000SEP08 545 745 forward 2 filament Intermediate filament protein 2.40E-29
183 LI:410317.1:2000SEP08 81 206 forward 3 filament Intermediate filament protein 9.40E-08
184 LI:340268.1:2000SEP08 410 556 forward 2 actin Actin 1.90E-19
184 LI:340268.1:2000SEP08 93 716 forward 3 actin Actin 5.80E-17
184 LI:340268.1:2000SEP08 292 399 forward 1 actin Actin 2.80E-13
187 LG:1043787.1:2000SEP08 109 432 forward 1 GSHPx Glutathione peroxidase 4.50E-64
188 LG:1098931.16:2000SEP08 124 228 forward 1 Carboxyl_trans Carboxyl transferase domain 3.00E-06
190 LI:1075297.1:2000SEP08 251 517 forward 2 MAPEG MAPEG family 9.50E-37
192 LI:297070.1:2000SEP08 784 1083 forward 1 Branch Core-2/I-Branching enzyme 3.60E-27
193 LI:1085041.1:2000SEP08 168 614 forward 3 lipocalin Lipocalin / cytosolic fatty-acid binding protein family 1.20E-37
194 LI:1071544.1:2000SEP08 131 436 forward 2 NDK Nucleoside diphosphate kinases 2.60E-76
194 LI:1071544.1:2000SEP08 436 567 forward 1 NDK Nucleoside diphosphate kinases 9.30E-24
195 LI:2052480.1:2000SEP08 1000 1773 forward 1 hexokinase Hexokinase 6.40E-99
195 LI:2052480.1:2000SEP08 513 1019 forward 3 hexokinase Hexokinase 3.50E-51
195 LI:2052480.1:2000SEP08 155 514 forward 2 hexokinase Hexokinase 1.40E-18
196 LG:450105.1:2000SEP08 86 490 forward 2 Ribosomal_S12 Ribosomal protein S12 6.60E-78
197 LG:450581.1:2000SEP08 82 276 forward 1 Ribosomal_S28e Ribosomal protein S28e 5.00E-42
198 LG:450887.1:2000SEP08 48 344 forward 3 Ribosomal_L36e Ribosomal protein L36e 6.90E-41
199 LG:460809.1:2000SEP08 3 236 forward 3 Ribosomal_L31e Ribosomal protein L31e 6.00E-17
200 LG:452089.1:2000SEP08 107 268 forward 2 Ribosomal_L5 Ribosomal protein L5 2.30E-25
200 LG:452089.1:2000SEP08 278 577 forward 2 Ribosomal_L5_C ribosomal L5P family C-terminus 2.60E-59
201 LG:1099416.1:2000SEP08 318 479 forward 3 Ribosomal_L37e Ribosomal protein L37e 1.60E-13
202 LG:255713.1:2000SEP08 118 396 forward 1 Ribosomal_L31e Ribosomal protein L31e 2.30E-14
203 LG:998903.1:2000SEP08 130 426 forward 1 Ribosomal_L7Ae Ribosomal protein L7Ae/L30e/S12e/Gadd45 family 8.40E-37
205 LG:1096907.1:2000SEP08 25 297 forward 1 Ribosomal_L37ae Ribosomal L37ae protein family 3.20E-58
206 LG:1323741.1:2000SEP08 134 325 forward 2 Ribosomal_L29 Ribosomal L29 protein 1.70E-15
207 LG:1098372.1:2000SEP08 64 384 forward 1 Ribosomal_L22e Ribosomal L22e protein family 2.00E-27
207 LG:1098372.1:2000SEP08 53 391 forward 2 Ribosomal_L22e Ribosomal L22e protein family 1.70E-06
208 LG:1006783.1:2000SEP08 16 312 forward 1 Ribosomal_L21e Ribosomal protein L21e 1.50E-69
209 LG:1097562.1:2000SEP08 137 430 forward 2 Ribosomal_L7Ae Ribosomal protein L7Ae/L30e/S12e/Gadd45 family 2.40E-37
210 LG:998868.1:2000SEP08 32 667 forward 2 Ribosomal_S3Ae Ribosomal S3Ae family 1.60E-145
212 LG:1400567.1:2000SEP08 38 271 forward 2 Ribosomal_L31e Ribosomal protein L31e 1.30E-07
213 LI:449404.1:2000SEP08 175 531 forward 1 Ribosomal_S11 Ribosomal protein S11 6.90E-77
214 LI:449941.2:2000SEP08 61 438 forward 1 Ribosomal_S8e Ribosomal protein S8e 4.70E-84
215 LI:450229.1:2000SEP08 85 648 forward 1 Ribosomal_S7e Ribosomal protein S7e 4.60E-83
216 LI:450399.3:2000SEP08 81 446 forward 3 Ribosomal_L14 Ribosomal protein L14p/L23e 1.20E-53
217 LI:455771.1:2000SEP08 69 473 forward 3 Ribosomal_S12 Ribosomal protein S12 6.60E-78
218 LI:720459.1:2000SEP08 64 564 forward 1 Ribosomal_L6e Ribosomal protein L6e 8.80E-117
219 LI:723156.1:2000SEP08 96 380 forward 3 Ribosomal_L31e Ribosomal protein L31e 8.40E-62
220 LI:728055.1:2000SEP08 81 479 forward 3 Ribosomal_S9 Ribosomal protein S9/S16 2.00E-89
221 LI:1020789.1:2000SEP08 70 441 forward 1 Ribosomal_S13 Ribosomal protein S13/S18 4.50E-73
222 LI:1071728.1:2000SEP08 139 426 forward 1 Ribosomal_S10 Ribosomal protein S10p/S20e 2.80E-54
223 LI:1084329.1:2000SEP08 200 439 forward 2 Ribosomal_L23 Ribosomal protein L23 5.00E-32
224 LI:246422.1:2000SEP08 64 411 forward 1 Ribosomal_L22e Ribosomal L22e protein family 3.20E-69
225 LI:1086066.1:2000SEP08 209 571 forward 2 Ribosomal_S7 Ribosomal protein S7p/S5e 2.30E-25
227 LI:885368.1:2000SEP08 89 466 forward 2 Ribosomal_S8 Ribosomal protein S8 1.60E-60
228 LI:481782.1:2000SEP08 263 550 forward 2 Ribosomal_S4e Ribosomal family S4e 2.50E-27
229 LI:1093813.1:2000SEP08 43 234 forward 1 Ribosomal_L29 Ribosomal L29 protein 1.70E-15
230 LI:449413.2:2000SEP08 90 494 forward 3 Ribosomal_S12 Ribosomal protein S12 6.60E-78
TABLE 3
SEQ ID NO: Template ID Start Stop Frame Pfam Hit Pfam Description E-value
231 LI:450105.1:2000SEP08 274 483 forward 1 Ribosomal_S12 Ribosomal protein S12 5.60E-37
231 LI:450105.1:2000SEP08 150 278 forward 3 Ribosomal_S12 Ribosomal protein S12 5.00E-15
231 LI:450105.1:2000SEP08 86 169 forward 2 Ribosomal_S12 Ribosomal protein S12 1.40E-06
234 LI:817330.1:2000SEP08 138 389 forward 3 Ribosomal_L7Ae Ribosomal protein L7Ae/L30e/S12e/Gadd45 family 1.50E-05
235 LI:817845.1:2000SEP08 83 502 forward 2 Ribosomal_S19e Ribosomal protein S19e 2.10E-101
236 LI:460809.1:2000SEP08 1 306 forward 1 Ribosomal_L31e Ribosomal protein L31e 4.20E-19
237 LI:815874.1:2000SEP08 237 464 forward 3 Ribosomal_S10 Ribosomal protein S10p/S20e 3.90E-27
239 LI:035973.1:2000SEP08 322 483 forward 1 Ribosomal_L37e Ribosomal protein L37e 5.50E-09
241 LI:2049074.1:2000SEP08 225 569 forward 3 Ribosomal_L13e Ribosomal protein L13e 2.80E-49
241 LI:2049074.1:2000SEP08 97 180 forward 1 Ribosomal_L13e Ribosomal protein L13e 1.20E-12
241 LI:2049074.1:2000SEP08 44 229 forward 2 Ribosomal_L13e Ribosomal protein L13e 1.10E-10
242 LI:1092460.1:2000SEP08 34 411 forward 1 histone Core histone H2A/H2B/H3/H4 4.50E-47
244 LI:816655.2:2000SEP08 136 393 forward 1 HMG14_17 HMG14 and HMG17 5.50E-24
245 LG:414732.1:2000SEP08 79 534 forward 1 Glyco_hydro_2_N Glycosyl hydrolases family 2, sugar binding domain 2.70E-10
247 LG:174022.1:2000SEP08 104 382 forward 2 ldh lactate/malate dehydrogenase, NAD binding domain 1.10E-35
247 LG:174022.1:2000SEP08 363 446 forward 3 ldh lactate/malate dehydrogenase, NAD binding domain 1.70E-08
247 LG:174022.1:2000SEP08 31 108 forward 1 ldh lactate/malate dehydrogenase, NAD binding domain 2.40E-06
249 LI:414732.2:2000SEP08 79 534 forward 1 Glyco_hydro_2_N Glycosyl hydrolases family 2, sugar binding domain 2.70E-10
255 LG:453922.1:2000SEP08 175 477 forward 1 SUI1 Translation initiation factor SUI1 4.20E-58
256 LG:476342.3:2000SEP08 190 330 forward 1 Gamma-thionin Gamma-thionins family 1.70E-19
257 LI:336801.1:2000SEP08 211 543 forward 1 SCP2 SCP-2 sterol transfer family 2.00E-28
259 LI:476342.1:2000SEP08 159 299 forward 3 Gamma-thionin Gamma-thionins family 1.70E-19
260 LI:1072804.1:2000SEP08 278 481 forward 2 pp-binding Phosphopantetheine attachment site 3.90E-14
261 LI:455450.1:2000SEP08 1 426 forward 1 dehydrin Dehydrin 4.20E-41
262 LI:1073699.1:2000SEP08 51 248 forward 3 metalthio Metallothionein 1.70E-25
263 LI:1013729.1:2000SEP08 33 392 forward 3 lipocalin Lipocalin / cytosolic fatty-acid binding protein family 1.30E-24
264 LI:2050322.2:2000SEP08 109 318 forward 1 rrm RNA recognition motif, (a.k.a. RRM, RBD, or RNP domain) 5.00E-24
265 LI:891327.1:2000SEP08 303 551 forward 3 HIN HIN-200/IF120X domain 1.60E-11
267 LG:220085.1:2000SEP08 228 521 forward 3 rvp Retroviral aspartyl protease 2.10E-15
267 LG:220085.1:2000SEP08 728 853 forward 2 rvt Reverse transcriptase (RNA-dependent DNA polymerase) 1.10E-04
268 LG:406709.1:2000SEP08 198 611 forward 3 GTP_CDC Cell division protein 3.70E-70
268 LG:406709.1:2000SEP08 635 1018 forward 2 GTP_CDC Cell division protein 1.30E-44
270 LI:1073027.1:2000SEP08 16 525 forward 1 ThiJ ThiJ/PfpI family 6.10E-54
271 LI:347635.1:2000SEP08 515 610 forward 2 EGF EGF-like domain 6.70E-09
271 LI:347635.1:2000SEP08 790 894 forward 1 EGF EGF-like domain 1.10E-04
271 LI:347635.1:2000SEP08 1251 1346 forward 3 EGF EGF-like domain 1.20E-04
273 LI:406709.1:2000SEP08 374 835 forward 2 GTP_CDC Cell division protein 5.20E-39
273 LI:406709.1:2000SEP08 852 1031 forward 3 GTP_CDC Cell division protein 7.60E-27
275 LI:213208.1:2000SEP08 279 491 forward 3 K_tetra K+ channel tetramerisation domain 1.10E-07
TABLE 4
SEQ ID NO Template ID Start Stop Frame Domain Topology
LG:405741.3:2000SEP08 76 162 forward 1 TM Nin
LG:405741.3:2000SEP08 499 585 forward 1 TM Nin
LG:405741.3:2000SEP08 1000 1086 forward 1 TM Nin
LG:405741.3:2000SEP08 647 733 forward 2 TM Nout
LG:405741.3:2000SEP08 995 1081 forward 2 TM Nout
LG:405741.3:2000SEP08 537 593 forward 3 TM Nin
LG:405741.3:2000SEP08 1029 1115 forward 3 TM Nin
2 LG:337194.1:2000SEP08 208 294 forward 1 TM
2 LG:337194.1:2000SEP08 331 399 forward 1 TM
2 LG:337194.1:2000SEP08 1129 1215 forward 1 TM
2 LG:337194.1:2000SEP08 1501 1551 forward 1 TM
2 LG:337194.1:2000SEP08 1639 1710 forward 1 TM
2 LG:337194.1:2000SEP08 1801 1887 forward 1 TM
2 LG:337194.1:2000SEP08 1948 1998 forward 1 TM
2 LG:337194.1:2000SEP08 188 271 forward 2 TM Nin
2 LG:337194.1:2000SEP08 1472 1540 forward 2 TM Nin
2 LG:337194.1:2000SEP08 1568 1624 forward 2 TM Nin
2 LG:337194.1:2000SEP08 1622 1708 forward 2 TM Nin
2 LG:337194.1:2000SEP08 1793 1855 forward 2 TM Nin
2 LG:337194.1:2000SEP08 1895 1957 forward 2 TM Nin
2 LG:337194.1:2000SEP08 1988 2050 forward 2 TM Nin
2 LG:337194.1:2000SEP08 2081 2143 forward 2 TM Nin
2 LG:337194.1:2000SEP08 204 290 forward 3 TM Nin
2 LG:337194.1:2000SEP08 339 392 forward 3 TM Nin
2 LG:337194.1:2000SEP08 678 764 forward 3 TM Nin
2 LG:337194.1:2000SEP08 813 899 forward 3 TM Nin
2 LG:337194.1:2000SEP08 1086 1172 forward 3 TM Nin
2 LG:337194.1:2000SEP08 1455 1541 forward 3 TM Nin
2 LG:337194.1:2000SEP08 1848 1913 forward 3 TM Nin
2 LG:337194.1:2000SEP08 2106 2159 forward 3 TM Nin
4 LG:372569.5:2000SEP08 400 465 forward 1 TM
4 LG:372569.5:2000SEP08 544 594 forward 1 TM
4 LG:372569.5:2000SEP08 985 1044 forward 1 TM
4 LG:372569.5:2000SEP08 135 221 forward 3 TM Nout
5 LG:968765.1:2000SEP08 131 190 forward 2 TM Nout
6 LG:255999.16:2000SEP08 221 277 forward 2 TM Nout
7 LG:977820.9:2000SEP08 1267 1338 forward 1 TM Nout
8 LI:1071608.1:2000SEP08 31 117 forward 1 TM Nin
8 LI:1071608.1:2000SEP08 319 405 forward 1 TM Nin
8 LI:1071608.1:2000SEP08 108 155 forward 3 TM Nout
10 LI:453570.1:2000SEP08 361 447 forward 1 TM
11 LI:072072.1:2000SEP08 325 396 forward 1 TM Nout
11 LI:072072.1:2000SEP08 664 750 forward 1 TM Nout
11 LI:072072.1:2000SEP08 1489 1566 forward 1 TM Nout
11 LI:072072.1:2000SEP08 1624 1710 forward 1 TM Nout
11 LI:072072.1:2000SEP08 1810 1872 forward 1 TM Nout
11 LI:072072.1:2000SEP08 1900 1962 forward 1 TM Nout
11 LI:072072.1:2000SEP08 2119 2187 forward 1 TM Nout
11 LI:072072.1:2000SEP08 227 313 forward 2 TM
11 LI:072072.1:2000SEP08 620 682 forward 2 TM
11 LI:072072.1:2000SEP08 695 757 forward 2 TM
11 LI:072072.1:2000SEP08 800 886 forward 2 TM
11 LI:072072.1:2000SEP08 1508 1594 forward 2 TM
11 LI:072072.1:2000SEP08 1658 1744 forward 2 TM
11 LI:072072.1:2000SEP08 1931 2017 forward 2 TM
11 LI:072072.1:2000SEP08 2699 2782 forward 2 TM
11 LI:072072.1:2000SEP08 312 398 forward 3 TM Nout
11 LI:072072.1:2000SEP08 969 1055 forward 3 TM Nout
11 LI:072072.1:2000SEP08 1449 1508 forward 3 TM Nout
11 LI:072072.1:2000SEP08 1674 1745 forward 3 TM Nout
11 LI:072072.1:2000SEP08 1917 1985 forward 3 TM Nout
12 LI:148565.4:2000SEP08 1350 1436 forward 3 TM Nout
13 LI:368626.4:2000SEP08 1219 1287 forward 1 TM Nout
13 LI:368626.4:2000SEP08 590 673 forward 2 TM Nout
13 LI:368626.4:2000SEP08 671 739 forward 2 TM Nout
13 LI:368626.4:2000SEP08 908 970 forward 2 TM Nout
13 LI:368626.4:2000SEP08 108 167 forward 3 TM Nout
13 LI:368626.4:2000SEP08 777 863 forward 3 TM Nout
14 LI:346123.1:2000SEP08 199 246 forward 1 TM Nin
14 LI:346123.1:2000SEP08 738 824 forward 3 TM Nout
14 LI:346123.1:2000SEP08 981 1049 forward 3 TM Nout
15 LI:335795.11:2000SEP08 1483 1551 forward 1 TM Nin
15 LI:335795.11:2000SEP08 2791 2841 forward 1 TM Nin
15 LI:335795.11:2000SEP08 2965 3030 forward 1 TM Nin
15 LI:335795.11:2000SEP08 1691 1747 forward 2 TM Nin
15 LI:335795.11:2000SEP08 1952 2038 forward 2 TM Nin
15 LI:335795.11:2000SEP08 2492 2566 forward 2 TM Nin
15 LI:335795.11:2000SEP08 3035 3097 forward 2 TM Nin
15 LI:335795.11:2000SEP08 1665 1751 forward 3 TM Nin
15 LI:335795.11:2000SEP08 2418 2504 forward 3 TM Nin
15 LI:335795.11:2000SEP08 2742 2822 forward 3 TM Nin
16 LI:246023.2:2000SEP08 244 303 forward 1 TM Nout
16 LI:246023.2:2000SEP08 439 489 forward 1 TM Nout
17 LG:1100661.1:2000SEP08 21 83 forward 3 TM Nout
18 LG:475856.1:2000SEP08 454 540 forward 1 TM Nout
18 LG:475856.1:2000SEP08 29 106 forward 2 TM Nin
18 LG:475856.1:2000SEP08 449 535 forward 2 TM Nin
18 LG:475856.1:2000SEP08 447 533 forward 3 TM
19 LG:1015343.1:2000SEP08 97 159 forward 1 TM
19 LG:1015343.1:2000SEP08 431 499 forward 2 TM Nin
19 LG:1015343.1:2000SEP08 411 482 forward 3 TM Nout
22 LG:213947.1:2000SEP08 100 186 forward 1 TM Nout
22 LG:213947.1:2000SEP08 244 330 forward 1 TM Nout
23 LI:720641.1:2000SEP08 664 732 forward 1 TM Nout
23 LI:720641.1:2000SEP08 197 262 forward 2 TM Nin
23 LI:720641.1:2000SEP08 293 352 forward 2 TM Nin
23 LI:720641.1:2000SEP08 413 484 forward 2 TM Nin
23 LI:720641.1:2000SEP08 533 619 forward 2 TM Nin
23 LI:720641.1:2000SEP08 683 769 forward 2 TM Nin
23 LI:720641.1:2000SEP08 920 979 forward 2 TM Nin
23 LI:720641.1:2000SEP08 132 218 forward 3 TM Nout
23 LI:720641.1:2000SEP08 519 569 forward 3 TM Nout
23 LI:720641.1:2000SEP08 642 728 forward 3 TM Nout
25 LI:734904.1:2000SEP08 58 132 forward 1 TM Nout
25 LI:734904.1:2000SEP08 196 243 forward 1 TM Nout
25 LI:734904.1:2000SEP08 508 573 forward 1 TM Nout
25 LI:734904.1:2000SEP08 886 972 forward 1 TM Nout
25 LI:734904.1:2000SEP08 50 136 forward 2 TM Nout
25 LI:734904.1:2000SEP08 578 655 forward 2 TM Nout
25 LI:734904.1:2000SEP08 1277 1342 forward 2 TM Nout
25 LI:734904.1:2000SEP08 1491 1547 forward 3 TM Nout
26 LI:1178118.1:2000SEP08 1054 1140 forward 1 TM Nin
26 LI:1178118.1:2000SEP08 1411 1488 forward 1 TM Nin
26 LI:1178118.1:2000SEP08 1612 1662 forward 1 TM Nin
26 LI:1178118.1:2000SEP08 1891 1977 forward 1 TM Nin
26 LI:1178118.1:2000SEP08 2041 2127 forward 1 TM Nin
26 LI:1178118.1:2000SEP08 2398 2475 forward 1 TM Nin
26 LI:1178118.1:2000SEP08 1394 1468 forward 2 TM Nin
26 LI:1178118.1:2000SEP08 1514 1600 forward 2 TM Nin
26 LI:1178118.1:2000SEP08 1883 1945 forward 2 TM Nin
26 LI:1178118.1:2000SEP08 2351 2437 forward 2 TM Nin
26 LI:1178118.1:2000SEP08 1641 1703 forward 3 TM Nin
26 LI:1178118.1:2000SEP08 1995 2081 forward 3 TM Nin
26 LI:1178118.1:2000SEP08 2472 2534 forward 3 TM Nin
27 LI:213947.1:2000SEP08 63 149 forward 3 TM Nout
27 LI:213947.1:2000SEP08 198 284 forward 3 TM Nout
28 LG:407304.1:2000SEP08 157 243 forward 1 TM Nout
28 LG:407304.1:2000SEP08 107 193 forward 2 TM Nin
29 LG:337358.1:2000SEP08 2452 2508 forward 1 TM Nin
29 LG:337358.1:2000SEP08 2767 2853 forward 1 TM Nin
29 LG:337358.1:2000SEP08 884 970 forward 2 TM
29 LG:337358.1:2000SEP08 1556 1642 forward 2 TM
29 LG:337358.1:2000SEP08 2096 2173 forward 2 TM
29 LG:337358.1:2000SEP08 2756 2842 forward 2 TM
29 LG:337358.1:2000SEP08 2730 2816 forward 3 TM Nin
30 LG:986090.1:2000SEP08 583 630 forward 1 TM Nin
30 LG:986090.1:2000SEP08 197 271 forward 2 TM Nout
30 LG:986090.1:2000SEP08 521 607 forward 2 TM Nout
31 LG:123250.1:2000SEP08 1223 1309 forward 2 TM Nout
32 LG:1028774.2:2000SEP08 313 399 forward 1 TM Nin
32 LG:1028774.2:2000SEP08 934 1020 forward 1 TM Nin
32 LG:1028774.2:2000SEP08 554 640 forward 2 TM Nout
32 LG:1028774.2:2000SEP08 1010 1093 forward 2 TM Nout
32 LG:1028774.2:2000SEP08 1154 1240 forward 2 TM Nout
33 LG:338927.6:2000SEP08 17 88 forward 2 TM Nout
34 LG:332944.2:2000SEP08 13 66 forward 1 TM Nin
34 LG:332944.2:2000SEP08 1609 1659 forward 1 TM Nin
34 LG:332944.2:2000SEP08 1852 1938 forward 1 TM Nin
34 LG:332944.2:2000SEP08 1154 1240 forward 2 TM Nout
34 LG:332944.2:2000SEP08 2339 2407 forward 2 TM Nout
34 LG:332944.2:2000SEP08 1785 1856 forward 3 TM Nin
35 LI:347174.5:2000SEP08 640 702 forward 1 TM Nin
35 LI:347174.5:2000SEP08 721 783 forward 1 TM Nin
35 LI:347174.5:2000SEP08 1330 1407 forward 1 TM Nin
35 LI:347174.5:2000SEP08 743 808 forward 2 TM Nin
35 LI:347174.5:2000SEP08 854 940 forward 2 TM Nin
35 LI:347174.5:2000SEP08 753 839 forward 3 TM
35 LI:347174.5:2000SEP08 873 959 forward 3 TM
37 LI:723144.1:2000SEP08 396 473 forward 3 TM Nout
40 LI:284797.3:2000SEP08 994 1044 forward 1 TM Nin
40 LI:284797.3:2000SEP08 1426 1476 forward 1 TM Nin
40 LI:284797.3:2000SEP08 1564 1632 forward 1 TM Nin
40 LI:284797.3:2000SEP08 998 1060 forward 2 TM Nin
40 LI:284797.3:2000SEP08 1796 1858 forward 2 TM Nin
40 LI:284797.3:2000SEP08 735 821 forward 3 TM Nin
40 LI:284797.3:2000SEP08 969 1028 forward 3 TM Nin
40 LI:284797.3:2000SEP08 1422 1508 forward 3 TM Nin
41 LI:1092901.1:2000SEP08 92 160 forward 2 TM Nout
41 LI:1092901.1:2000SEP08 473 559 forward 2 TM Nout
41 LI:1092901.1:2000SEP08 495 578 forward 3 TM Nin
42 LI:228930.1:2000SEP08 141 197 forward 3 TM Nin
43 LI:722913.1:2000SEP08 142 201 forward 1 TM Nin
43 LI:722913.1:2000SEP08 485 541 forward 2 TM Nout
44 LG:457478.1:2000SEP08 454 540 forward 1 TM Nout
44 LG:457478.1:2000SEP08 282 329 forward 3 TM
45 LG:358719.1:2000SEP08 109 195 forward 1 TM Nin
45 LG:358719.1:2000SEP08 346 426 forward 1 TM Nin
45 LG:358719.1:2000SEP08 691 762 forward 1 TM Nin
45 LG:358719.1:2000SEP08 772 852 forward 1 TM Nin
45 LG:358719.1:2000SEP08 913 969 forward 1 TM Nin
45 LG:358719.1:2000SEP08 35 97 forward 2 TM
45 LG:358719.1:2000SEP08 125 187 forward 2 TM
46 LG:105160.5:2000SEP08 205 255 forward 1 TM Nout
46 LG:105160.5:2000SEP08 737 793 forward 2 TM Nin
47 LG:400705.1:2000SEP08 871 948 forward 1 TM Nout
47 LG:400705.1:2000SEP08 867 953 forward 3 TM Nout
48 LG:221977.1:2000SEP08 76 126 forward 1 TM Nout
48 LG:221977.1:2000SEP08 1208 1276 forward 2 TM Nout
49 LG:898771.1:2000SEP08 910 996 forward 1 TM Nout
49 LG:898771.1:2000SEP08 767 823 forward 2 TM Nout
49 LG:898771.1:2000SEP08 905 973 forward 2 TM Nout
49 LG:898771.1:2000SEP08 894 962 forward 3 TM Nin
50 LI:457478.1:2000SEP08 284 331 forward 2 TM Nout
50 LI:457478.1:2000SEP08 456 542 forward 3 TM Nout
53 LI:888730.1:2000SEP08 55 123 forward 1 TM Nout
53 LI:888730.1:2000SEP08 262 324 forward 1 TM Nout
64 LG:1060884.1:2000SEP08 595 681 forward 1 TM Nin
64 LG:1060884.1:2000SEP08 17 103 forward 2 TM Nin
64 LG:1060884.1:2000SEP08 101 166 forward 2 TM Nin
64 LG:1060884.1:2000SEP08 236 301 forward 2 TM Nin
64 LG:1060884.1:2000SEP08 42 110 forward 3 TM Nout
68 LG:979390.2:2000SEP08 190 240 forward 1 TM Nout
68 LG:979390.2:2000SEP08 976 1029 forward 1 TM Nout
68 LG:979390.2:2000SEP08 857 919 forward 2 TM
69 LG:1400447.1:2000SEP08 209 280 forward 2 TM Nin
70 LG:1400562.1:2000SEP08 337 411 forward 1 TM Nout
70 LG:1400562.1:2000SEP08 433 519 forward 1 TM Nout
70 LG:1400562.1:2000SEP08 573 629 forward 3 TM Nin
71 LG:1076130.1:2000SEP08 481 531 forward 1 TM Nout
72 LG:1064459.1:2000SEP08 747 833 forward 3 TM Nout
73 LG:1079415.14:2000SEP08 86 172 forward 2 TM Nout
73 LG:1079415.14:2000SEP08 24 110 forward 3 TM Nout
75 LG:1088431.2:2000SEP08 403 465 forward 1 TM Nout
75 LG:1088431.2:2000SEP08 378 455 forward 3 TM Nout
76 LG:1329462.2:2000SEP08 55 141 forward 1 TM Nout
76 LG:1329462.2:2000SEP08 313 366 forward 1 TM Nout
76 LG:1329462.2:2000SEP08 300 383 forward 3 TM Nin
78 LI:722577.1:2000SEP08 236 310 forward 2 TM Nout
78 LI:722577.1:2000SEP08 320 406 forward 2 TM Nout
78 LI:722577.1:2000SEP08 9 95 forward 3 TM Nin
78 LI:722577.1:2000SEP08 333 407 forward 3 TM Nin
79 LI:322783.16:2000SEP08 169 231 forward 1 TM Nin
79 LI:322783.16:2000SEP08 256 318 forward 1 TM Nin
79 LI:322783.16:2000SEP08 218 304 forward 2 TM Nout
79 LI:322783.16:2000SEP08 326 412 forward 2 TM Nout
79 LI:322783.16:2000SEP08 72 134 forward 3 TM Nin
79 LI:322783.16:2000SEP08 150 212 forward 3 TM Nin
79 LI:322783.16:2000SEP08 270 356 forward 3 TM Nin
81 LI:038859.2:2000SEP08 679 765 forward 1 TM Nout
81 LI:038859.2:2000SEP08 47 130 forward 2 TM Nout
81 LI:038859.2:2000SEP08 260 346 forward 2 TM Nout
81 LI:038859.2:2000SEP08 99 161 forward 3 TM Nout
81 LI:038859.2:2000SEP08 177 239 forward 3 TM Nout
82 LI:1046117.1:2000SEP08 351 431 forward 3 TM Nout
85 LI:1170585.2:2000SEP08 141 203 forward 3 TM Nout
86 LI:719531.2:2000SEP08 81 134 forward 3 TM Nout
90 LI:1091881.1:2000SEP08 53 133 forward 2 TM Nout
93 LI:1166249.1:2000SEP08 11 88 forward 2 TM Nin
93 LI:1166249.1:2000SEP08 395 481 forward 2 TM Nin
94 LI:799675.1:2000SEP08 25 99 forward 1 TM Nout
94 LI:799675.1:2000SEP08 403 480 forward 1 TM Nout
94 LI:799675.1:2000SEP08 98 184 forward 2 TM Nout
94 LI:799675.1:2000SEP08 383 442 forward 2 TM Nout
94 LI:799675.1:2000SEP08 120 203 forward 3 TM Nout
95 LI:1178899.1:2000SEP08 73 159 forward 1 TM Nout
95 LI:1178899.1:2000SEP08 331 384 forward 1 TM Nout
95 LI:1178899.1:2000SEP08 315 398 forward 3 TM Nout
100 LI:805158.1:2000SEP08 22 99 forward 1 TM
102 LI:1174107.2:2000SEP08 300 362 forward 3 TM Nout
102 LI:1174107.2:2000SEP08 375 437 forward 3 TM Nout
103 LI:1177434.2:2000SEP08 1023 1091 forward 3 TM Nout
104 LI:1184255.1:2000SEP08 611 694 forward 2 TM Nout
107 LI:1166752.1:2000SEP08 31 93 forward 1 TM Nin
107 LI:1166752.1:2000SEP08 112 174 forward 1 TM Nin
107 LI:1166752.1:2000SEP08 193 255 forward 1 TM Nin
107 LI:1166752.1:2000SEP08 263 340 forward 2 TM Nout
107 LI:1166752.1:2000SEP08 33 110 forward 3 TM Nin
107 LI:1166752.1:2000SEP08 117 173 forward 3 TM Nin
107 LI:1166752.1:2000SEP08 219 305 forward 3 TM Nin
109 LI:242665.2:2000SEP08 81 137 forward 3 TM Nout
110 LI:208637.1:2000SEP08 304 390 forward 1 TM Nout
110 LI:208637.1:2000SEP08 2449 2535 forward 1 TM Nout
110 LI:208637.1:2000SEP08 2983 3069 forward 1 TM Nout
110 LI:208637.1:2000SEP08 3439 3525 forward 1 TM Nout
110 LI:208637.1:2000SEP08 3538 3624 forward 1 TM Nout
110 LI:208637.1:2000SEP08 3793 3879 forward 1 TM Nout
110 LI:208637.1:2000SEP08 3889 3945 forward 1 TM Nout
110 LI:208637.1:2000SEP08 4081 4152 forward 1 TM Nout
110 LI:208637.1:2000SEP08 4159 4212 forward 1 TM Nout
110 LI:208637.1:2000SEP08 4435 4521 forward 1 TM Nout
110 LI:208637.1:2000SEP08 4900 4953 forward 1 TM Nout
110 LI:208637.1:2000SEP08 278 340 forward 2 TM Nout
110 LI:208637.1:2000SEP08 353 415 forward 2 TM Nout
110 LI:208637.1:2000SEP08 503 553 forward 2 TM Nout
110 LI:208637.1:2000SEP08 566 628 forward 2 TM Nout
110 LI:208637.1:2000SEP08 653 715 forward 2 TM Nout
110 LI:208637.1:2000SEP08 764 850 forward 2 TM Nout
110 LI:208637.1:2000SEP08 899 952 forward 2 TM Nout
110 LI:208637.1:2000SEP08 1721 1804 forward 2 TM Nout
110 LI:208637.1:2000SEP08 2393 2443 forward 2 TM Nout
110 LI:208637.1:2000SEP08 2903 2977 forward 2 TM Nout
110 LI:208637.1:2000SEP08 3011 3082 forward 2 TM Nout
110 LI:208637.1:2000SEP08 3119 3205 forward 2 TM Nout
110 LI:208637.1:2000SEP08 3299 3370 forward 2 TM Nout
110 LI:208637.1:2000SEP08 3404 3478 forward 2 TM Nout
110 LI:208637.1:2000SEP08 4238 4291 forward 2 TM Nout
110 LI:208637.1:2000SEP08 4301 4375 forward 2 TM Nout
110 LI:208637.1:2000SEP08 381 431 forward 3 TM Nin
110 LI:208637.1:2000SEP08 2094 2165 forward 3 TM Nin
110 LI:208637.1:2000SEP08 2547 2630 forward 3 TM Nin
110 LI:208637.1:2000SEP08 2877 2948 forward 3 TM Nin
110 LI:208637.1:2000SEP08 2982 3068 forward 3 TM Nin
110 LI:208637.1:2000SEP08 3186 3272 forward 3 TM Nin
110 LI:208637.1:2000SEP08 3273 3359 forward 3 TM Nin
110 LI:208637.1:2000SEP08 3369 3431 forward 3 TM Nin
110 LI:208637.1:2000SEP08 3453 3515 forward 3 TM Nin
110 LI:208637.1:2000SEP08 3528 3599 forward 3 TM Nin
110 LI:208637.1:2000SEP08 3666 3752 forward 3 TM Nin
110 LI:208637.1:2000SEP08 3930 4016 forward 3 TM Nin
110 LI:208637.1:2000SEP08 4242 4328 forward 3 TM Nin
110 LI:208637.1:2000SEP08 4440 4526 forward 3 TM Nin
110 LI:208637.1:2000SEP08 4695 4775 forward 3 TM Nin
111 LI:2051808.1:2000SEP08 775 852 forward 1 TM Nin
111 LI:2051808.1:2000SEP08 758 844 forward 2 TM Nin
113 LI:1177337.1:2000SEP08 1639 1713 forward 1 TM Nout
113 LI:1177337.1:2000SEP08 846 932 forward 3 TM Nout
113 LI:1177337.1:2000SEP08 1620 1670 forward 3 TM Nout
114 LI:1165056.1:2000SEP08 918 1004 forward 3 TM Nout
115 LI:1175250.1:2000SEP08 596 652 forward 2 TM Nin
115 LI:1175250.1:2000SEP08 917 1003 forward 2 TM Nin
116 LI:1183192.1:2000SEP08 1405 1491 forward 1 TM Nout
116 LI:1183192.1:2000SEP08 1319 1378 forward 2 TM Nin
116 LI:1183192.1:2000SEP08 1449 1529 forward 3 TM
117 LI:1183325.1:2000SEP08 554 607 forward 2 TM Nin
119 LI:813422.1:2000SEP08 61 138 forward 1 TM Nout
119 LI:813422.1:2000SEP08 493 573 forward 1 TM Nout
119 LI:813422.1:2000SEP08 682 753 forward 1 TM Nout
119 LI:813422.1:2000SEP08 1123 1185 forward 1 TM Nout
119 LI:813422.1:2000SEP08 1198 1260 forward 1 TM Nout
119 LI:813422.1:2000SEP08 1555 1632 forward 1 TM Nout
119 LI:813422.1:2000SEP08 692 751 forward 2 TM Nout
119 LI:813422.1:2000SEP08 1106 1183 forward 2 TM Nout
119 LI:813422.1:2000SEP08 1553 1615 forward 2 TM Nout
119 LI:813422.1:2000SEP08 57 143 forward 3 TM
119 LI:813422.1:2000SEP08 255 341 forward 3 TM
119 LI:813422.1:2000SEP08 492 575 forward 3 TM
119 LI:813422.1:2000SEP08 585 635 forward 3 TM
119 LI:813422.1:2000SEP08 705 779 forward 3 TM
119 LI:813422.1:2000SEP08 1113 1175 forward 3 TM
119 LI:813422.1:2000SEP08 1452 1499 forward 3 TM
120 LI:1093049.6:2000SEP08 25 87 forward 1 TM Nin
120 LI:1093049.6:2000SEP08 121 183 forward 1 TM Nin
120 LI:1093049.6:2000SEP08 424 480 forward 1 TM Nin
120 LI:1093049.6:2000SEP08 925 1011 forward 1 TM Nin
120 LI:1093049.6:2000SEP08 1069 1152 forward 1 TM Nin
120 LI:1093049.6:2000SEP08 1204 1290 forward 1 TM Nin
120 LI:1093049.6:2000SEP08 92 142 forward 2 TM Nin
120 LI:1093049.6:2000SEP08 815 883 forward 2 TM Nin
120 LI:1093049.6:2000SEP08 968 1039 forward 2 TM Nin
120 LI:1093049.6:2000SEP08 1235 1318 forward 2 TM Nin
120 LI:1093049.6:2000SEP08 27 89 forward 3 TM Nin
120 LI:1093049.6:2000SEP08 105 167 forward 3 TM Nin
120 LI:1093049.6:2000SEP08 939 1010 forward 3 TM Nin
128 LI:427997.4:2000SEP08 786 869 forward 3 TM Nin
128 LI:427997.4:2000SEP08 1665 1751 forward 3 TM Nin
128 LI:427997.4:2000SEP08 1788 1874 forward 3 TM Nin
128 LI:427997.4:2000SEP08 2169 2249 forward 3 TM Nin
129 LI:197899.1:2000SEP08 205 291 forward 1 TM Nin
129 LI:197899.1:2000SEP08 301 381 forward 1 TM Nin
129 LI:197899.1:2000SEP08 439 513 forward 1 TM Nin
129 LI:197899.1:2000SEP08 604 690 forward 1 TM Nin
129 LI:197899.1:2000SEP08 688 774 forward 1 TM Nin
129 LI:197899.1:2000SEP08 988 1044 forward 1 TM Nin
129 LI:197899.1:2000SEP08 1525 1584 forward 1 TM Nin
129 LI:197899.1:2000SEP08 2029 2115 forward 1 TM Nin
129 LI:197899.1:2000SEP08 197 283 forward 2 TM Nout
129 LI:197899.1:2000SEP08 326 412 forward 2 TM Nout
129 LI:197899.1:2000SEP08 824 910 forward 2 TM Nout
129 LI:197899.1:2000SEP08 956 1042 forward 2 TM Nout
129 LI:197899.1:2000SEP08 1094 1180 forward 2 TM Nout
129 LI:197899.1:2000SEP08 1241 1303 forward 2 TM Nout
129 LI:197899.1:2000SEP08 1328 1390 forward 2 TM Nout
129 LI:197899.1:2000SEP08 1529 1615 forward 2 TM Nout
129 LI:197899.1:2000SEP08 1628 1702 forward 2 TM Nout
129 LI:197899.1:2000SEP08 1727 1789 forward 2 TM Nout
129 LI:197899.1:2000SEP08 1805 1867 forward 2 TM Nout
129 LI:197899.1:2000SEP08 2075 2140 forward 2 TM Nout
129 LI:197899.1:2000SEP08 90 140 forward 3 TM
129 LI:197899.1:2000SEP08 321 407 forward 3 TM
129 LI:197899.1:2000SEP08 951 1016 forward 3 TM
129 LI:197899.1:2000SEP08 1059 1133 forward 3 TM
129 LI:197899.1:2000SEP08 1485 1568 forward 3 TM
130 LG:334199.1:2000SEP08 80 154 forward 2 TM Nout
131 LG:334345.1:2000SEP08 503 586 forward 2 TM Nin
131 LG:334345.1:2000SEP08 60 125 forward 3 TM Nin
132 LG:228092.1:2000SEP08 319 402 forward 1 TM Nin
132 LG:228092.1:2000SEP08 511 597 forward 1 TM Nin
132 LG:228092.1:2000SEP08 818 877 forward 2 TM
132 LG:228092.1:2000SEP08 333 419 forward 3 TM Nin
132 LG:228092.1:2000SEP08 1113 1184 forward 3 TM Nin
133 LG:098580.1:2000SEP08 520 606 forward 1 TM Nout
133 LG:098580.1:2000SEP08 640 708 forward 1 TM Nout
133 LG:098580.1:2000SEP08 494 580 forward 2 TM Nout
133 LG:098580.1:2000SEP08 668 739 forward 2 TM Nout
133 LG:098580.1:2000SEP08 513 599 forward 3 TM Nout
133 LG:098580.1:2000SEP08 636 722 forward 3 TM Nout
134 LG:969572.1:2000SEP08 55 141 forward 1 TM Nout
134 LG:969572.1:2000SEP08 256 342 forward 1 TM Nout
135 LG:196958.1:2000SEP08 287 367 forward 2 TM Nin
135 LG:196958.1:2000SEP08 1007 1057 forward 2 TM Nin
135 LG:196958.1:2000SEP08 1247 1333 forward 2 TM Nin
135 LG:196958.1:2000SEP08 1281 1367 forward 3 TM Nout
136 LG:1087811.1:2000SEP08 799 855 forward 1 TM Nout
136 LG:1087811.1:2000SEP08 728 814 forward 2 TM Nout
137 LG:1327885.1:2000SEP08 84 167 forward 3 TM Nin
137 LG:1327885.1:2000SEP08 471 548 forward 3 TM Nin
139 LI:897616.1:2000SEP08 63 149 forward 3 TM Nout
141 LI:027066.6:2000SEP08 707 793 forward 2 TM Nout
142 LI:1074263.1:2000SEP08 21 101 forward 3 TM Nout
143 LI:334345.1:2000SEP08 592 675 forward 1 TM Nin
143 LI:334345.1:2000SEP08 128 202 forward 2 TM Nout
144 LI:1093914.1:2000SEP08 772 825 forward 1 TM Nout
144 LI:1093914.1:2000SEP08 479 550 forward 2 TM Nout
144 LI:1093914.1:2000SEP08 324 374 forward 3 TM Nin
145 LI:1188168.1:2000SEP08 366 449 forward 3 TM Nin
145 LI:1188168.1:2000SEP08 2598 2645 forward 3 TM Nin
146 LI:1065168.1:2000SEP08 295 381 forward 1 TM Nin
146 LI:1065168.1:2000SEP08 194 280 forward 2 TM Nout
147 LI:1180418.1:2000SEP08 799 855 forward 1 TM Nout
149 LG:1078420.1:2000SEP08 707 784 forward 2 TM Nout
149 LG:1078420.1:2000SEP08 669 749 forward 3 TM Nin
150 LG:1397599.1:2000SEP08 46 120 forward 1 TM Nout
150 LG:1397599.1:2000SEP08 178 261 forward 1 TM Nout
150 LG:1397599.1:2000SEP08 29 115 forward 2 TM Nout
150 LG:1397599.1:2000SEP08 149 232 forward 2 TM Nout
150 LG:1397599.1:2000SEP08 24 110 forward 3 TM Nin
150 LG:1397599.1:2000SEP08 141 191 forward 3 TM Nin
151 LG:1397655.2:2000SEP08 406 489 forward 1 TM Nout
151 LG:1397655.2:2000SEP08 131 190 forward 2 TM Nout
151 LG:1397655.2:2000SEP08 401 451 forward 2 TM Nout
151 LG:1397655.2:2000SEP08 444 527 forward 3 TM Nout
152 LG:241055.1:2000SEP08 19 90 forward 1 TM Nout
152 LG:241055.1:2000SEP08 172 255 forward 1 TM Nout
152 LG:241055.1:2000SEP08 1045 1110 forward 1 TM Nout
152 LG:241055.1:2000SEP08 51 113 forward 3 TM Nout
152 LG:241055.1:2000SEP08 162 224 forward 3 TM Nout
153 LG:1101065.1:2000SEP08 1 57 forward 1 TM Nout
153 LG:1101065.1:2000SEP08 11 73 forward 2 TM Nin
153 LG:1101065.1:2000SEP08 92 139 forward 2 TM Nin
153 LG:1101065.1:2000SEP08 732 809 forward 3 TM Nout
155 LI:348991.1:2000SEP08 835 900 forward 1 TM Nin
155 LI:348991.1:2000SEP08 803 889 forward 2 TM Nout
155 LI:348991.1:2000SEP08 843 899 forward 3 TM Nout
158 LI:815686.1:2000SEP08 163 246 forward 1 TM Nin
158 LI:815686.1:2000SEP08 673 759 forward 1 TM Nin
158 LI:815686.1:2000SEP08 868 954 forward 1 TM Nin
158 LI:815686.1:2000SEP08 1108 1173 forward 1 TM Nin
158 LI:815686.1:2000SEP08 1201 1287 forward 1 TM Nin
158 LI:815686.1:2000SEP08 1369 1452 forward 1 TM Nin
158 LI:815686.1:2000SEP08 1597 1659 forward 1 TM Nin
158 LI:815686.1:2000SEP08 164 250 forward 2 TM Nout
158 LI:815686.1:2000SEP08 662 748 forward 2 TM Nout
158 LI:815686.1:2000SEP08 824 910 forward 2 TM Nout
158 LI:815686.1:2000SEP08 938 988 forward 2 TM Nout
158 LI:815686.1:2000SEP08 1031 1117 forward 2 TM Nout
158 LI:815686.1:2000SEP08 1118 1192 forward 2 TM Nout
158 LI:815686.1:2000SEP08 1559 1645 forward 2 TM Nout
158 LI:815686.1:2000SEP08 12 74 forward 3 TM Nin
158 LI:815686.1:2000SEP08 474 557 forward 3 TM Nin
158 LI:815686.1:2000SEP08 813 899 forward 3 TM Nin
158 LI:815686.1:2000SEP08 1110 1196 forward 3 TM Nin
158 LI:815686.1:2000SEP08 1389 1451 forward 3 TM Nin
158 LI:815686.1:2000SEP08 1470 1532 forward 3 TM Nin
159 LI:1167327.2:2000SEP08 31 117 forward 1 TM Nout
159 LI:1167327.2:2000SEP08 151 228 forward 1 TM Nout
159 LI:1167327.2:2000SEP08 23 109 forward 2 TM Nout
159 LI:1167327.2:2000SEP08 263 349 forward 2 TM Nout
159 LI:1167327.2:2000SEP08 48 122 forward 3 TM Nout
159 LI:1167327.2:2000SEP08 150 236 forward 3 TM Nout
161 LG:331593.1:2000SEP08 902 973 forward 2 TM Nout
161 LG:331593.1:2000SEP08 579 656 forward 3 TM Nout
162 LI:1094174.1:2000SEP08 1270 1329 forward 1 TM Nout
162 LI:1094174.1:2000SEP08 1656 1742 forward 3 TM Nin
163 LI:814362.1:2000SEP08 331 417 forward 1 TM Nout
163 LI:814362.1:2000SEP08 383 469 forward 2 TM Nout
163 LI:814362.1:2000SEP08 18 95 forward 3 TM Nin
164 LI:219542.1:2000SEP08 13 72 forward 1 TM Nout
164 LI:219542.1:2000SEP08 265 345 forward 1 TM Nout
164 LI:219542.1:2000SEP08 279 365 forward 3 TM Nin
165 LI:726197.1:2000SEP08 187 267 forward 1 TM Nin
166 LI:1075314.1:2000SEP08 235 297 forward 1 TM Nin
166 LI:1075314.1:2000SEP08 325 387 forward 1 TM Nin
166 LI:1075314.1:2000SEP08 580 642 forward 1 TM Nin
166 LI:1075314.1:2000SEP08 694 765 forward 1 TM Nin
166 LI:1075314.1:2000SEP08 422 484 forward 2 TM Nout
166 LI:1075314.1:2000SEP08 509 571 forward 2 TM Nout
166 LI:1075314.1:2000SEP08 657 737 forward 3 TM Nout
168 LG:336265.1:2000SEP08 868 930 forward 1 TM Nout
168 LG:336265.1:2000SEP08 943 1005 forward 1 TM Nout
168 LG:336265.1:2000SEP08 848 919 forward 2 TM Nout
168 LG:336265.1:2000SEP08 965 1024 forward 2 TM Nout
168 LG:336265.1:2000SEP08 825 911 forward 3 TM Nin
168 LG:336265.1:2000SEP08 951 1013 forward 3 TM Nin
168 LG:336265.1:2000SEP08 1038 1100 forward 3 TM Nin
168 LG:336265.1:2000SEP08 1497 1583 forward 3 TM Nin
169 LG:407788.2:2000SEP08 253 336 forward 1 TM Nout
169 LG:407788.2:2000SEP08 493 558 forward 1 TM Nout
169 LG:407788.2:2000SEP08 562 648 forward 1 TM Nout
169 LG:407788.2:2000SEP08 278 355 forward 2 TM
169 LG:407788.2:2000SEP08 542 604 forward 2 TM
TABLE 4
SEQ ID NO Template ID Start Stop Frame Domain Topology
225 LI:1086066.1:2000SEP08 99 185 forward 3 TM Nout
226 LI:223142.1:2000SEP08 826 888 forward 1 TM
226 LI:223142.1:2000SEP08 973 1059 forward 1 TM
226 LI:223142.1:2000SEP08 770 841 forward 2 TM Nin
226 LI:223142.1:2000SEP08 995 1051 forward 2 TM Nin
226 LI:223142.1:2000SEP08 798 857 forward 3 TM Nout
227 LI:885368.1:2000SEP08 540 608 forward 3 TM Nin
230 LI:449413.2:2000SEP08 349 435 forward 1 TM Nout
231 LI:450105.1:2000SEP08 208 294 forward 1 TM Nin
233 LI:1142855.1:2000SEP08 17 91 forward 2 TM Nout
233 LI:1142855.1:2000SEP08 245 295 forward 2 TM Nout
233 LI:1142855.1:2000SEP08 344 430 forward 2 TM Nout
233 LI:1142855.1:2000SEP08 51 98 forward 3 TM Nout
235 LI:817845.1:2000SEP08 460 513 forward 1 TM Nin
235 LI:817845.1:2000SEP08 515 601 forward 2 TM Nin
237 LI:815874.1:2000SEP08 467 550 forward 2 TM Nin
237 LI:815874.1:2000SEP08 513 587 forward 3 TM Nout
238 LI:255713.1:2000SEP08 621 674 forward 3 TM
239 LI:035973.1:2000SEP08 790 849 forward 1 TM Nin
239 LI:035973.1:2000SEP08 626 712 forward 2 TM Nout
239 LI:035973.1:2000SEP08 642 728 forward 3 TM Nin
240 LI:1138110.1:2000SEP08 22 108 forward 1 TM Nin
240 LI:1138110.1:2000SEP08 154 240 forward 1 TM Nin
240 LI:1138110.1:2000SEP08 47 121 forward 2 TM Nout
240 LI:1138110.1:2000SEP08 170 244 forward 2 TM Nout
240 LI:1138110.1:2000SEP08 51 110 forward 3 TM Nin
240 LI:1138110.1:2000SEP08 195 272 forward 3 TM Nin
242 LI:1092460.1:2000SEP08 181 243 forward 1 TM Nout
243 LI:399421.1:2000SEP08 310 396 forward 1 TM Nin
243 LI:399421.1:2000SEP08 1681 1767 forward 1 TM Nin
243 LI:399421.1:2000SEP08 1900 1950 forward 1 TM Nin
243 LI:399421.1:2000SEP08 59 112 forward 2 TM Nout
243 LI:399421.1:2000SEP08 593 664 forward 2 TM Nout
243 LI:399421.1:2000SEP08 797 853 forward 2 TM Nout
243 LI:399421.1:2000SEP08 1445 1519 forward 2 TM Nout
243 LI:399421.1:2000SEP08 1640 1705 forward 2 TM Nout
243 LI:399421.1:2000SEP08 2000 2074 forward 2 TM Nout
243 LI:399421.1:2000SEP08 666 752 forward 3 TM Nout
243 LI:399421.1:2000SEP08 1461 1517 forward 3 TM Nout
243 LI:399421.1:2000SEP08 1893 1955 forward 3 TM Nout
244 LI:816655.2:2000SEP08 373 435 forward 1 TM Nin
244 LI:816655.2:2000SEP08 466 528 forward 1 TM Nin
244 LI:816655.2:2000SEP08 871 948 forward 1 TM Nin
244 LI:816655.2:2000SEP08 1099 1185 forward 1 TM Nin
244 LI:816655.2:2000SEP08 344 427 forward 2 TM Nin
244 LI:816655.2:2000SEP08 1127 1213 forward 2 TM Nin
244 LI:816655.2:2000SEP08 453 539 forward 3 TM Nin
244 LI:816655.2:2000SEP08 1062 1127 forward 3 TM Nin
245 LG:414732.1:2000SEP08 40 93 forward 1 TM Nout
TABLE 4
SEQ ID NO Template ID Start Stop Frame Domain Topology
258 LI:449685.1:2000SEP08 499 552 forward 1 TM
258 LI:449685.1:2000SEP08 446 511 forward 2 TM Nout
258 LI:449685.1:2000SEP08 132 218 forward 3 TM Nin
259 LI:476342.1:2000SEP08 39 122 forward 3 TM Nout
260 LI: 1072804.1:2000SEP08 500 586 forward 2 TM Nout
260 LI: 1072804.1:2000SEP08 276 332 forward 3 TM Nout
261 LI:455450.1:2000SEP08 422 490 forward 2 TM Nout
263 LI:1013729.1:2000SEP08 475 540 forward 1 TM Nin
263 LI:1013729.1:2000SEP08 470 556 forward 2 TM Nout
263 LI:1013729.1:2000SEP08 507 554 forward 3 TM Nin
264 LI:2050322.2:2000SEP08 1103 1156 forward 2 TM Nout
266 LI:2053076.1:2000SEP08 112 198 forward 1 TM Nin
266 LI:2053076.1:2000SEP08 490 576 forward 1 TM Nin
266 LI:2053076.1:2000SEP08 62 148 forward 2 TM Nin
266 LI:2053076.1:2000SEP08 720 806 forward 3 TM
268 LG:406709.1:2000SEP08 183 233 forward 3 TM Nin
269 LG:347863.9:2000SEP08 125 211 forward 2 TM Nout
269 LG:347863.9:2000SEP08 497 562 forward 2 TM Nout
269 LG:347863.9:2000SEP08 24 110 forward 3 TM Nout
269 LG:347863.9:2000SEP08 156 230 forward 3 TM Nout
271 LI:347635.1:2000SEP08 664 735 forward 1 TM Nin
271 LI:347635.1:2000SEP08 1468 1554 forward 1 TM Nin
271 LI:347635.1:2000SEP08 815 901 forward 2 TM Nout
271 LI:347635.1:2000SEP08 1547 1597 forward 2 TM Nout
271 LI:347635.1:2000SEP08 414 488 forward 3 TM Nout
271 LI:347635.1:2000SEP08 600 650 forward 3 TM Nout
272 LI:013685.1:2000SEP08 22 108 forward 1 TM Nout
272 LI:013685.1:2000SEP08 1483 1536 forward 1 TM Nout
272 LI:013685.1:2000SEP08 221 304 forward 2 TM Nout
272 LI:013685.1:2000SEP08 653 700 forward 2 TM Nout
272 LI:013685.1:2000SEP08 711 767 forward 3 TM Nin
273 LI:406709.1:2000SEP08 183 233 forward 3 TM Nin
274 LI:2052938.1:2000SEP08 437 505 forward 2 TM
TABLE 5
SEQ ID NO: Template ID Component ID Start Stop
1 LG:405741.3:2000SEP08 g4076447 772 1210
1 LG:405741.3:2000SEP08 g6473689 757 1199
1 LG:405741.3:2000SEP08 g764385 877 1199
1 LG:405741.3:2000SEP08 g4332695 906 1198
1 LG:405741.3:2000SEP08 g5742159 755 1198
1 LG:405741.3:2000SEP08 g4326577 1008 1198
1 LG:405741.3:2000SEP08 g1721162 906 1197
1 LG:405741.3:2000SEP08 g4393490 746 1196
1 LG:405741.3:2000SEP08 g4004428 793 1196
1 LG:405741.3:2000SEP08 g6658709 732 1196
1 LG:405741.3:2000SEP08 g4510098 936 1196
1 LG:405741.3:2000SEP08 g2224160 838 1196
1 LG:405741.3:2000SEP08 g1391615 774 1193
1 LG:405741.3:2000SEP08 2605255H1 953 1194
1 LG:405741.3:2000SEP08 g2986591 948 1193
1 LG:405741.3:2000SEP08 1632154H1 985 1193
1 LG:405741.3:2000SEP08 g4327028 998 1193
1 LG:405741.3:2000SEP08 g1225424 827 1193
1 LG:405741.3:2000SEP08 g2705577 772 1193
1 LG:405741.3:2000SEP08 g6710344 797 1193
1 LG:405741.3:2000SEP08 2888488H1 1072 1193
1 LG:405741.3:2000SEP08 2605255F6 953 1193
1 LG:405741.3:2000SEP08 238539 6 733 1185
1 LG:405741.3:2000SEP08 g1080925 861 1180
1 LG:405741.3:2000SEP08 7624412H1 973 1169
1 LG:405741.3:2000SEP08 7402118H1 725 1164
1 LG:405741.3:2000SEP08 614864T6 629 1158
1 LG:405741.3:2000SEP08 2605255T6 953 1156
1 LG:405741.3:2000SEP08 1437574T6 823 1155
1 LG:405741.3:2000SEP08 3696915T6 1057 1149
1 LG:405741.3:2000SEP08 1501410T6 666 1143
1 LG:405741.3:2000SEP08 2779559T6 711 1134
1 LG:405741.3:2000SEP08 6848176H1 599 1115
1 LG:405741.3:2000SEP08 7624412J1 934 1097
1 LG:405741.3:2000SEP08 g4310285 632 1075
1 LG:405741.3:2000SEP08 g3917108 632 1019
1 LG:405741.3:2000SEP08 614864R6 481 960
1 LG:405741.3:2000SEP08 345217H1 759 955
1 LG:405741.3:2000SEP08 1581267F6 479 953
1 LG:405741.3:2000SEP08 238539H1 733 945
1 LG:405741.3:2000SEP08 6822694J1 406 888
1 LG:405741.3:2000SEP08 6822694H1 406 888
1 LG:405741.3:2000SEP08 g1272796 337 789
1 LG:405741.3:2000SEP08 614864H1 481 725
1 LG:405741.3:2000SEP08 5655089H1 246 715
1 LG:405741.3:2000SEP08 1581267H1 479 693
1 LG:405741.3:2000SEP08 5327332H1 411 696
1 LG:405741.3:2000SEP08 1581267T6 600 684
1 LG:405741.3:2000SEP08 1504659T1 571 684
TABLE 5
SEQ ID NO: Template ID Component ID Start Stop
1 LG:405741.3:2000SEP08 g1401461 347 680
1 LG:405741.3:2000SEP08 5327610H1 411 638
1 LG:405741.3:2000SEP08 g1389301 265 633
1 LG:405741.3:2000SEP08 g1081018 265 634
1 LG:405741.3:2000SEP08 g762007 337 609
1 LG:405741.3:2000SEP08 5392809H1 212 423
1 LG:405741.3:2000SEP08 4181142H1 153 417
1 LG:405741.3:2000SEP08 5288480F6 1 385
1 LG:405741.3:2000SEP08 5288480H1 1 256
2 LG:337194.1:2000SEP08 g2877617 1127 1490
2 LG:337194.1:2000SEP08 1391371H1 1238 1516
2 LG:337194.1:2000SEP08 3216293F6 1284 1847
2 LG:337194.1:2000SEP08 3216293H1 1284 1525
2 LG:337194.1:2000SEP08 5973325H1 1298 1517
2 LG:337194.1:2000SEP08 842273R1 1467 2031
2 LG:337194.1:2000SEP08 842273H1 1467 1724
2 LG:337194.1:2000SEP08 4758225H1 1475 1740
2 LG:337194.1:2000SEP08 1312387F6 1522 1859
2 LG:337194.1:2000SEP08 1312387H1 1522 1737
2 LG:337194.1:2000SEP08 999991T6 1636 2287
2 LG:337194.1:2000SEP08 1000054R1 1639 2093
2 LG:337194.1:2000SEP08 999991R6 1639 2146
2 LG:337194.1:2000SEP08 1000054H1 1639 1863
2 LG:337194.1:2000SEP08 6537357H1 1647 2055
2 LG:337194.1:2000SEP08 4351723H1 1689 2024
2 LG:337194.1:2000SEP08 6412057H1 1752 2160
2 LG:337194.1:2000SEP08 g1123225 1763 1843
2 LG:337194.1:2000SEP08 1312387T6 1795 2276
2 LG:337194.1:2000SEP08 6415837H1 1887 2175
2 LG:337194.1:2000SEP08 g4734612 1 439
2 LG:337194.1:2000SEP08 2673870F6 1 458
2 LG:337194.1:2000SEP08 2673870H1 1 224
2 LG:337194.1:2000SEP08 3040985H1 6 291
2 LG:337194.1:2000SEP08 6551863H1 37 235
2 LG:337194.1:2000SEP08 g5839202 51 395
2 LG:337194.1:2000SEP08 4213969F6 154 733
2 LG:337194.1:2000SEP08 4213969H1 154 402
2 LG:337194.1:2000SEP08 6339716H1 175 639
2 LG:337194.1:2000SEP08 6339716F6 175 772
2 LG:337194.1:2000SEP08 g2036487 191 444
2 LG:337194.1:2000SEP08 g2037866 216 512
2 LG:337194.1:2000SEP08 7762752H1 313 861
2 LG:337194.1:2000SEP08 g5450422 402 855
2 LG:337194.1:2000SEP08 g5639085 426 853
2 LG:337194.1:2000SEP08 g3678295 429 852
2 LG:337194.1:2000SEP08 g3924212 442 855
2 LG:337194.1:2000SEP08 g3678424 447 851
2 LG:337194.1:2000SEP08 3417306H1 485 725
2 LG:337194.1:2000SEP08 3417306F6 485 882
TABLE 5
SEQ ID NO: Template ID Component ID Start Stop
4 LG:372569.5:2000SEP08 2014656H1 1218 1484
4 LG:372569.5:2000SEP08 6820141J1 1254 1820
4 LG:372569.5:2000SEP08 g1313809 1077 1249
4 LG:372569.5:2000SEP08 g3873022 1076 1361
4 LG:372569.5:2000SEP08 g3429599 1146 1389
4 LG:372569.5:2000SEP08 7601833J1 1038 1395
4 LG:372569.5:2000SEP08 3752338H1 1 239
5 LG:968765.1:2000SEP08 5334376F8 197 709
5 LG:968765.1:2000SEP08 g1618573 1 112
5 LG:968765.1:2000SEP08 6796390F8 1 533
5 LG:968765.1:2000SEP08 6796390H1 1 240
5 LG:968765.1:2000SEP08 5334376H1 197 356
6 LG:255999.16:2000SEP08 7075546H1 1 533
6 LG:255999.16:2000SEP08 7075583H1 1 441
7 LG:977820.9:2000SEP08 7655023H1 1 578
7 LG:977820.9:2000SEP08 7655023J1 1 566
7 LG:977820.9:2000SEP08 g2835908 58 346
7 LG:977820.9:2000SEP08 3503804H1 139 451
7 LG:977820.9:2000SEP08 7644176J1 163 724
7 LG:977820.9:2000SEP08 6950721R8 213 815
7 LG:977820.9:2000SEP08 6867192H1 231 818
7 LG:977820.9:2000SEP08 6464824H1 609 1241
7 LG:977820.9:2000SEP08 6825706H1 904 1203
7 LG:977820.9:2000SEP08 6772076H1 911 1428
7 LG:977820.9:2000SEP08 1513539H1 1091 1293
7 LG:977820.9:2000SEP08 6935925H1 1108 1407
7 LG:977820.9:2000SEP08 723201H1 1114 1341
8 LI:1071608.1:2000SEP08 g1260446 2 316
8 LI:1071608.1:2000SEP08 6791379H1 1 397
8 LI:1071608.1:2000SEP08 g1614819 215 655
8 LI:1071608.1:2000SEP08 g1647514 244 543
9 LI:1074023.1:2000SEP08 6796546H1 1 475
9 LI:1074023.1:2000SEP08 6796546F8 1 510
9 LI:1074023.1:2000SEP08 6790876H1 9 540
9 LI:1074023.1:2000SEP08 6791780F8 9 487
9 LI:1074023.1:2000SEP08 6791780H1 9 539
9 LI:1074023.1:2000SEP08 6790876F8 9 619
9 LI:1074023.1:2000SEP08 6790876T8 16 673
9 LI:1074023.1:2000SEP08 6796546T8 175 651
10 LI:453570.1:2000SEP08 5911492T7 136 691
10 LI:453570.1:2000SEP08 5911492T9 305 681
10 LI:453570.1:2000SEP08 5911492T8 303 634
10 LI:453570.1:2000SEP08 5911492F8 1 467
10 LI:453570.1:2000SEP08 5911492F7 1 449
10 LI:453570.1:2000SEP08 5911492H1 1 271
11 LI:072072.1:2000SEP08 71678830V1 2203 2749
11 LI:072072.1:2000SEP08 71677535V1 2309 2797
11 LI:072072.1:2000SEP08 5306756H1 1174 1333
11 LI:072072.1:2000SEP08 5306856H1 1174 1341
TABLE 5
SEQ ID NO: Template ID Component ID Start Stop
11 LI:072072.1:2000SEP08 8112368H1 1180 1720
11 LI:072072.1:2000SEP08 g6837302 1185 1568
11 LI:072072.1:2000SEP08 g5634396 1273 1588
11 LI:072072.1:2000SEP08 g2167859 1292 1598
11 LI:072072.1:2000SEP08 71680895V1 2438 2886
11 LI:072072.1:2000SEP08 2531179H1 418 669
11 LI:072072.1:2000SEP08 g994114 2599 2896
11 LI:072072.1:2000SEP08 71677855V1 2653 3158
11 LI:072072.1:2000SEP08 71676834V1 2672 3256
11 LI:072072.1:2000SEP08 g4739904 2668 2904
11 LI:072072.1:2000SEP08 g2035952 2121 2368
11 LI:072072.1:2000SEP08 g1138172 321 623
11 LI:072072.1:2000SEP08 g3918602 323 704
11 LI:072072.1:2000SEP08 8118160H1 1 508
11 LI:072072.1:2000SEP08 7459387H1 32 573
11 LI:072072.1:2000SEP08 7594566H1 68 661
11 LI:072072.1:2000SEP08 g7276911 321 782
11 LI:072072.1:2000SEP08 71677001V1 1936 2597
11 LI:072072.1:2000SEP08 7952295H2 2109 2730
11 LI:072072.1:2000SEP08 5626631R8 2114 2637
11 LI:072072.1:2000SEP08 g1494233 2552 2812
11 LI:072072.1:2000SEP08 6388364F8 2493 2639
11 LI:072072.1:2000SEP08 71679256V1 2531 2885
11 LI:072072.1:2000SEP08 71681012V1 2537 3209
11 LI:072072.1:2000SEP08 g2167858 862 1363
11 LI:072072.1:2000SEP08 5968471H1 894 1444
11 LI:072072.1:2000SEP08 6922455H1 1101 1573
11 LI:072072.1:2000SEP08 6609645H1 2444 2858
11 LI:072072.1:2000SEP08 7380154H1 2457 2930
11 LI:072072.1:2000SEP08 7601926H1 1897 2483
11 LI:072072.1:2000SEP08 7111074H1 547 982
11 LI:072072.1:2000SEP08 2729623F6 687 1145
11 LI:072072.1:2000SEP08 2729623H1 687 942
11 LI:072072.1:2000SEP08 g1156558 751 1080
11 LI:072072.1:2000SEP08 3377873H1 1499 1687
11 LI:072072.1:2000SEP08 2133730F6 1673 2027
11 LI:072072.1:2000SEP08 2133730H1 1673 1943
11 LI:072072.1:2000SEP08 2531179F7 418 625
11 LI:072072.1:2000SEP08 6344340H1 420 695
11 LI:072072.1:2000SEP08 7645365J1 1948 2340
11 LI:072072.1:2000SEP08 2312253H1 1939 2191
11 LI:072072.1:2000SEP08 2741 34F6 2023 2365
11 LI:072072.1:2000SEP08 7645365H1 2139 2769
11 LI:072072.1:2000SEP08 71679742V1 1117 1787
11 LI:072072.1:2000SEP08 g4703402 1139 1586
11 LI:072072.1:2000SEP08 6388364H1 2397 2637
11 LI:072072.1:2000SEP08 71680692V1 2348 2918
11 LI:072072.1:2000SEP08 6242978H1 2382 2637
11 LI:072072.1:2000SEP08 5626631H1 2114 2436
TABLE 5
SEQ ID NO Template ID Component ID Start Stop
14 LI:346123.1:2000SEP08 71635936V1 44 446
14 LI:346123.1:2000SEP08 71637681V1 44 435
14 LI:346123.1:2000SEP08 71634551V1 339 923
14 LI:346123.1:2000SEP08 71637225V1 587 1116
14 LI:346123.1:2000SEP08 71634188V1 1 588
14 LI:346123.1:2000SEP08 71634234V1 44 508
14 LI:346123.1:2000SEP08 71638668V1 44 509
14 LI:346123.1:2000SEP08 5909342H1 44 305
15 LI:335795.11:2000SEP08 5501337H1 1731 1984
15 LI:335795.11:2000SEP08 g1192132 2915 3230
15 LI:335795.11:2000SEP08 g820818 2950 3235
15 LI:335795.11:2000SEP08 3052621H1 2945 3231
15 LI:335795.11:2000SEP08 g4078441 2959 3414
15 LI:335795.11:2000SEP08 g1149247 3015 3228
15 LI:335795.11:2000SEP08 71031243V1 2436 3112
15 LI:335795.11:2000SEP08 4445762H1 2435 2685
15 LI:335795.11:2000SEP08 5760179H1 1532 1795
15 LI:335795.11:2000SEP08 6495832H1 1544 2166
15 LI:335795.11:2000SEP08 7113995H1 1880 2485
15 LI:335795.11:2000SEP08 5329068H1 2059 2319
15 LI:335795.11:2000SEP08 70911787V1 2592 3205
15 LI:335795.11:2000SEP08 g3240207 70 399
15 LI:335795.11:2000SEP08 6781168H1 1129 1543
15 LI:335795.11:2000SEP08 2290180R6 2819 3056
15 LI:335795.11:2000SEP08 5310325H1 2829 3096
15 LI:335795.11:2000SEP08 g1162453 3038 3227
15 LI:335795.11:2000SEP08 3368787H1 3072 3227
15 LI:335795.11:2000SEP08 g875549 3163 3231
15 LI:335795.11:2000SEP08 5672928F8 2586 3068
15 LI:335795.11:2000SEP08 6353715H1 2410 2701
15 LI:335795.11:2000SEP08 70911318V1 2428 2846
15 LI:335795.11:2000SEP08 4846483T6 2494 2966
15 LI:335795.11:2000SEP08 g1516482 2506 2923
15 LI:335795.11:2000SEP08 1623494T6 2480 2882
15 LI:335795.11:2000SEP08 6778208J1 378 1010
15 LI:335795.11:2000SEP08 g2985527 577 916
15 LI:335795.11:2000SEP08 70911560V1 2403 3101
15 LI:335795.11:2000SEP08 g6229165 99 397
15 LI:335795.11:2000SEP08 3282810F6 281 712
15 LI:335795.11:2000SEP08 3282810H1 281 530
15 LI:335795.11:2000SEP08 5691909H1 2324 2632
15 LI:335795.11:2000SEP08 70911658V1 2346 2848
15 LI:335795.11:2000SEP08 7116005H2 1880 2531
15 LI:335795.11:2000SEP08 g4969831 2395 2867
15 LI:335795.11:2000SEP08 3171081H1 2395 2680
15 LI:335795.11:2000SEP08 3171374H1 2396 2681
15 LI:335795.11:2000SEP08 1287842F1 2402 2974
15 LI:335795.11:2000SEP08 4743929H1 1270 1497
15 LI:335795.11:2000SEP08 5351789H1 1398 1613
TABLE 5
SEQ ID NO Template ID Component ID Start Stop
15 LI:335795.11:2000SEP08 5883285H1 2798 3131
15 LI:335795.11:2000SEP08 2290180T6 2819 3190
15 LI:335795.11:2000SEP08 3246061H1 930 1185
15 LI:335795.11:2000SEP08 2245068H1 2385 2637
15 LI:335795.11:2000SEP08 3959886H2 1629 1906
15 LI:335795.11:2000SEP08 7114191H2 1880 2325
15 LI:335795.11:2000SEP08 2375912H1 1867 2130
15 LI:335795.11:2000SEP08 4639879H1 1849 2099
15 LI:335795.11:2000SEP08 2881484F6 1589 2066
15 LI:335795.11:2000SEP08 70914875V1 2716 3243
15 LI:335795.11:2000SEP08 g5744770 2542 2931
15 LI:335795.11:2000SEP08 5610413H1 2543 2747
15 LI:335795.11:2000SEP08 70913456V1 2577 3227
15 LI:335795.11:2000SEP08 g4137530 2541 2929
15 LI:335795.11:2000SEP08 g3052971 2773 3227
15 LI:335795.11:2000SEP08 g7156689 2794 3229
15 LI:335795.11:2000SEP08 5881091H1 2797 3082
15 LI:335795.11:2000SEP08 g6575589 2737 3226
15 LI:335795.11:2000SEP08 g2155531 2744 3230
15 LI:335795.11:2000SEP08 1368211H1 2192 2445
15 LI:335795.11:2000SEP08 1368211R1 2192 2671
15 LI:335795.11:2000SEP08 g1516483 2206 2651
15 LI:335795.11:2000SEP08 7113914H1 1880 2419
15 LI:335795.11:2000SEP08 7113460H1 1880 2359
15 LI:335795.11:2000SEP08 70913301V1 1867 2459
15 LI:335795.11:2000SEP08 6518284H1 1837 2391
15 LI:335795.11:2000SEP08 2375912F6 1867 2457
15 LI:335795.11:2000SEP08 71272229V1 1867 2494
15 LI:335795.11:2000SEP08 3187232H1 2453 2783
15 LI:335795.11:2000SEP08 5271263H1 2453 2715
15 LI:335795.11:2000SEP08 g3433280 2464 2923
15 LI:335795.11:2000SEP08 g1187857 2450 2606
15 LI:335795.11:2000SEP08 7613963H1 2698 3010
15 LI:335795.11:2000SEP08 5438870T9 2711 3320
15 LI:335795.11:2000SEP08 g2834848 2692 2923
15 LI:335795.11:2000SEP08 432992H1 2692 2911
15 LI:335795.11:2000SEP08 2290180H1 2831 3110
15 LI:335795.11:2000SEP08 g3098814 2842 3227
15 LI:335795.11:2000SEP08 g4876651 2884 2950
15 LI:335795.11:2000SEP08 796575H1 2891 3176
15 LI:335795.11:2000SEP08 g5529523 2908 3227
15 LI:335795.11:2000SEP08 70913232V1 2661 3226
15 LI:335795.11:2000SEP08 5468292H1 2665 2897
15 LI:335795.11:2000SEP08 2347807H1 2675 2931
15 LI:335795.11:2000SEP08 71666722V1 2650 2760
15 LI:335795.11:2000SEP08 2375912T6 2661 3188
15 LI:335795.11:2000SEP08 1623494F6 2160 2620
15 LI:335795.11:2000SEP08 1623494H1 2160 2392
15 LI:335795.11:2000SEP08 2972467H2 2184 2495
TABLE 5
SEQ ID NO Template ID Component ID Start Stop
15 LI:335795.11:2000SEP08 3881684H1 2190 2484
15 LI:335795.11:2000SEP08 1287842H1 2402 2669
15 LI:335795.11:2000SEP08 6944693H1 1728 2354
15 LI:335795.11:2000SEP08 g2018726 2008 2383
15 LI:335795.11:2000SEP08 6888836J1 2219 2850
15 LI:335795.11:2000SEP08 2704991H1 2230 2526
15 LI:335795.11:2000SEP08 7646995H1 1604 1746
15 LI:335795.11:2000SEP08 6888836H1 1644 2135
15 LI:335795.11:2000SEP08 5189887H1 2148 2390
15 LI:335795.11:2000SEP08 5831568H2 383 627
15 LI:335795.11:2000SEP08 6379449H1 432 718
15 LI:335795.11:2000SEP08 7112844H2 1880 2455
15 LI:335795.11:2000SEP08 5376792H1 2358 2576
15 LI:335795.11:2000SEP08 g6300247 1 198
15 LI:335795.11:2000SEP08 g6302837 1 200
15 LI:335795.11:2000SEP08 4959826T8 14 284
15 LI:335795.11:2000SEP08 71271067V1 2055 2685
15 LI:335795.11:2000SEP08 3526296H1 2437 2727
15 LI:335795.11:2000SEP08 5267964H1 2437 2708
15 LI:335795.11:2000SEP08 1491592H1 2443 2648
15 LI:335795.11:2000SEP08 g1165370 2450 2813
15 LI:335795.11:2000SEP08 2918889T6 2388 2956
15 LI:335795.11:2000SEP08 5374261T6 2354 2891
15 LI:335795.11:2000SEP08 7114829H1 1880 2228
15 LI:335795.11:2000SEP08 70911039V1 1867 2479
15 LI:335795.11:2000SEP08 71271501V1 1867 2500
15 LI:335795.11:2000SEP08 70913941V1 1866 2460
15 LI:335795.11:2000SEP08 4959826F8 14 404
15 LI:335795.11:2000SEP08 g6229013 62 397
15 LI:335795.11:2000SEP08 5681962H1 2232 2513
15 LI:335795.11:2000SEP08 3089367H1 2259 2541
15 LI:335795.11:2000SEP08 1787665H1 2256 2311
15 LI:335795.11:2000SEP08 71271124V1 2259 2848
15 LI:335795.11:2000SEP08 7032667H1 2271 2784
15 LI:335795.11:2000SEP08 7345758H1 2275 2886
15 LI:335795.11:2000SEP08 3451289T6 2282 2865
15 LI:335795.11:2000SEP08 3012053T6 2294 2887
15 LI:335795.11:2000SEP08 3761546H1 2295 2636
15 LI:335795.11:2000SEP08 2881484H1 1589 1867
15 LI:335795.11:2000SEP08 7134302H1 1903 2436
15 LI:335795.11:2000SEP08 g5232793 1960 2293
15 LI:335795.11:2000SEP08 625726H1 1982 2277
15 LI:335795.11:2000SEP08 70912155V1 1997 2607
15 LI:335795.11:2000SEP08 3468658H1 1998 2298
15 LI:335795.11:2000SEP08 7115807H2 1881 2547
15 LI:335795.11:2000SEP08 7113392H1 1885 2253
15 LI:335795.11:2000SEP08 71271025V1 1890 2400
15 LI:335795.11:2000SEP08 5593330H1 2082 2287
15 LI:335795.11:2000SEP08 5268570H1 2145 2423
TABLE 5
SEQ ID NO Template ID Component ID Start Stop
15 LI:335795.11:2000SEP08 3801457H1 1112 1357
15 LI:335795.11:2000SEP08 60111227B1 2597 3200
15 LI:335795.11:2000SEP08 2881484T6 2597 3202
15 LI:335795.11:2000SEP08 g2141943 2648 3162
15 LI:335795.11:2000SEP08 71667206V1 2650 2762
15 LI:335795.11:2000SEP08 71667114V1 2650 2817
15 LI:335795.11:2000SEP08 6046416H1 1107 1627
15 LI:335795.11:2000SEP08 6046416J1 1183 1735
15 LI:335795.11:2000SEP08 3012053F6 1214 1615
15 LI:335795.11:2000SEP08 5376784H1 1231 1458
16 LI:246023.2:2000SEP08 71081207V1 433 1065
16 LI:246023.2:2000SEP08 8113269H1 442 1036
16 LI:246023.2:2000SEP08 7435763H1 446 1007
16 LI:246023.2:2000SEP08 71253312V1 441 1089
16 LI:246023.2:2000SEP08 71252981V1 293 932
16 LI:246023.2:2000SEP08 6567190H1 317 846
16 LI:246023.2:2000SEP08 859574H1 317 552
16 LI:246023.2:2000SEP08 3761187H1 353 561
16 LI:246023.2:2000SEP08 4863263H1 428 691
16 LI:246023.2:2000SEP08 71253372V1 435 1109
16 LI:246023.2:2000SEP08 6610019H1 49 566
16 LI:246023.2:2000SEP08 6022017H1 731 1044
16 LI:246023.2:2000SEP08 3695673H1 760 1042
16 LI:246023.2:2000SEP08 3520128H1 770 1040
16 LI:246023.2:2000SEP08 909243H1 793 896
16 LI:246023.2:2000SEP08 2627223H1 926 1157
16 LI:246023.2:2000SEP08 2728422H1 10 254
16 LI:246023.2:2000SEP08 2662990H1 11 238
16 LI:246023.2:2000SEP08 7716637H1 665 1281
16 LI:246023.2:2000SEP08 5660534F8 268 702
16 LI:246023.2:2000SEP08 71081365V1 285 894
16 LI:246023.2:2000SEP08 2663406H1 11 256
16 LI:246023.2:2000SEP08 4239284H1 12 289
16 LI:246023.2:2000SEP08 7729208H1 510 990
16 LI:246023.2:2000SEP08 71082814V1 539 1147
16 LI:246023.2:2000SEP08 7674440H2 542 1026
16 LI:246023.2:2000SEP08 7715837J1 589 1233
16 LI:246023.2:2000SEP08 7716637J1 589 1216
16 LI:246023.2:2000SEP08 7325365H1 602 1173
16 LI:246023.2:2000SEP08 71252939V1 606 1185
16 LI:246023.2:2000SEP08 3077475F7 616 869
16 LI:246023.2:2000SEP08 7726911H1 624 1124
16 LI:246023.2:2000SEP08 71252962V1 642 1212
16 LI:246023.2:2000SEP08 5896708H1 645 970
16 LI:246023.2:2000SEP08 3044820H1 40 314
16 LI:246023.2:2000SEP08 71083664V1 47 658
16 LI:246023.2:2000SEP08 7579683H1 48 496
16 LI:246023.2:2000SEP08 3225085H1 43 314
16 LI:246023.2:2000SEP08 7217489H1 49 507
TABLE 5
SEQ ID NO Template ID Component ID Start Stop
16 LI:26023.2:2000SEP08 4982031HI 50 203
16 LI:246023.2:2000SEP08 2719787H1 2 233
16 LI:246023.2:2000SEP08 7589302H2 7 574
16 U;246023.2:2000SEP08 2728422F6 10 513
16 LI:246023.2:2000SEP08 6077914F8 1 626
16 LI:246023.2:2000SEP08 3634404H1 1 309
16 LI:246023.2:2000SEP08 6077914H1 1 326
16 LI:246023.2:2000SEP08 7003247H1 49 605
16 LI:246023.2:2000SEP08 3119527H1 45 140
16 LI:246023.2:2000SEP08 880437H1 50 193
16 LI:246023.2:2000SEP08 4158283H1 67 311
16 LI:246023.2:2000SEP08 7219095H1 71 623
16 LI:246023.2:2000SEP08 5660534H1 199 439
16 LI:246023.2:2000SEP08 7641994J1 211 468
16 LI:246023.2:2000SEP08 7641994H1 211 497
16 LI:246023.2:2000SEP08 3348779H1 216 473
16 LI:246023.2:2000SEP08 71089987V1 225 508
16 LI:246023.2:2000SEP08 6609320H1 236 740
16 LI:246023.2:2000SEP08 7262974H1 49 549
16 LI:246023.2:2000SEP08 7169252H1 47 597
16 LI:246023.2:2000SEP08 3448653H1 48 199
16 LI:246023.2:2000SEP08 3147889H1 50 276
16 LI:246023.2:2000SEP08 4448069H1 50 315
16 LI:246023.2:2000SEP08 880437R1 50 637
17 LG:1100661.1 2000SEP08 6790680H1 1 294
17 LG:1100661.1 2000SEP08 6791384H1 1 293
17 LG:1100661.1 2000SEP08 6798074F8 3 459
17 LG:1100661.1 2000SEP08 6798074H1 3 308
17 LG:1100661.1 2000SEP08 6792683T8 14 407
17 LG:1100661.1 2000SEP08 6792683F8 7 457
17 LG:1100661.1 2000SEP08 6795817F8 9 514
17 LG:1100661.1 2000SEP08 6792683H1 9 348
17 LG:1100661.1 2000SEP08 6795817H1 15 311
17 LG:1100661.1 2000SEP08 6795817T8 249 414
17 LG:1100661.1 2000SEP08 6790621H1 263 466
17 LG:1100661.1 2000SEP08 6794755F8 263 466
17 LG:1100661.1 2000SEP08 6794755H1 263 466
18 LG:475856.1:2000SEP08 7761183H1 1 614
18 LG:475856.1:2000SEP08 5779813H1 373 493
18 LG:475856.1:2000SEP08 5779913H1 374 493
18 LG:475856.1:2000SEP08 5779813T6 373 883
19 LG:1015343.1 2000SEP08 6798647T8 1 541
19 LG:1015343.1 2000SEP08 6798647H1 1 434
19 LG:1015343.1 2000SEP08 6798647F8 1 615
20 LG:1400575.1 2000SEP08 5494669R6 596 908
20 LG:1400575.1 2000SEP08 4645588F9 1 571
20 LG:1400575.1 2000SEP08 7273781H1 8 564
20 LG:1400575.1 2000SEP08 2383223F6 11 483
20 LG:1400575.1 2000SEP08 7467241H1 24 579
25 LI:734904.1 2000SEP08 5080305H1 460 620
25 LI:734904.1 2000SEP08 55001377H1 1186 1785
25 LI:734904.1 2000SEP08 55001371H1 1186 1810
25 LI:734904.1 2000SEP08 g4069540 324 760
25 LI:734904.1 2000SEP08 5954166H1 334 675
25 LI:734904.1 2000SEP08 5954266H1 447 675
25 LI:734904.1 2000SEP08 55001372J1 1186 1799
25 LI:734904.1 2000SEP08 55001373J1 1186 1799
25 LI:734904.1 2000SEP08 55001380H1 1186 1794
25 LI:734904.1 2000SEP08 55001380J1 1186 1798
25 LI:734904.1 2000SEP08 55001374J1 1186 1789
25 LI:734904.1 2000SEP08 5155483H1 444 713
25 LI:734904.1 2000SEP08 71470137V1 1145 1668
25 LI:734904.1 2000SEP08 71468740V1 1145 1670
25 LI:734904.1 2000SEP08 71470749V1 1145 1612
25 LI:734904.1 2000SEP08 71471307V1 1145 1589
25 LI:734904.1 2000SEP08 71471777V1 1145 1553
25 LI:734904.1 2000SEP08 71471014V1 1145 1498
25 LI:734904.1 2000SEP08 71528917V1 1145 1473
25 LI:734904.1 2000SEP08 55009941J1 1077 1265
25 LI:734904.1 2000SEP08 7726872H1 1166 1614
25 LI:734904.1 2000SEP08 5090822F6 1168 1618
25 LI:734904.1 2000SEP08 71469065V1 1143 1712
25 LI:734904.1 2000SEP08 71472073V1 1144 1661
25 LI:734904.1 2000SEP08 8100283H1 969 1556
25 LI:734904.1 2000SEP08 2685561H1 8 109
25 LI:734904.1 2000SEP08 1921627H1 91 347
25 LI:734904.1 2000SEP08 2767577H1 1 184
25 LI:734904.1 2000SEP08 g1921766 389 812
25 LI:734904.1 2000SEP08 55001379H1 1186 1546
25 LI:734904.1 2000SEP08 5734758F6 292 886
25 LI:734904.1 2000SEP08 5734758H1 292 571
25 LI:734904.1 2000SEP08 g3214644 319 743
25 LI:734904.1 2000SEP08 5807132H1 190 426
25 LI:734904.1 2000SEP08 2554815H1 205 445
25 LI:734904.1 2000SEP08 55001378H1 1187 1759
25 LI:734904.1 2000SEP08 866814H1 98 406
25 LI:734904.1 2000SEP08 g6836844 1599 1975
25 LI:734904.1 2000SEP08 3483164H1 1645 1895
25 LI:734904.1 2000SEP08 60211015U1 501 1050
25 LI:734904.1 2000SEP08 g7157221 1581 1994
25 LI:734904.1 2000SEP08 5136773H2 1534 1802
25 LI:734904.1 2000SEP08 g7237043 1536 1990
25 LI:734904.1 2000SEP08 71468287V1 1560 2008
25 LI:734904.1 2000SEP08 7618825J1 1565 1947
25 LI:734904.1 2000SEP08 71470403V1 1145 1771
25 LI:734904.1 2000SEP08 8103182H1 951 1601
25 LI:734904.1 2000SEP08 7736288J1 496 1110
25 LI:734904.1 2000SEP08 2914008H1 641 787
26 LI:1178118.1 2000SEP08 70885893V1 1270 1460
26 LI:1178118.1 2000SEP08 70886871V1 1273 1679
26 LI:1178118.1 2000SEP08 71226596V1 1592 1679
26 LI:1178118.1 2000SEP08 71225755V1 2203 2459
26 LI:1178118.1 2000SEP08 70856256V1 1294 1679
26 LI:1178118.1 2000SEP08 70885056V1 1441 1676
26 LI:1178118.1 2000SEP08 70856069V1 1071 1519
26 LI:1178118.1 2000SEP08 70885472V1 1993 2470
26 LI:1178118.1 2000SEP08 70858123V1 1489 1679
26 LI:1178118.1 2000SEP08 70888070V1 897 1476
26 LI:1178118.1 2000SEP08 70885504V1 2008 2459
26 LI:1178118.1 2000SEP08 70856838V1 1480 2130
26 LI:1178118.1 2000SEP08 3918792H1 1981 2111
26 LI:1178118.1 2000SEP08 4246988H1 1982 2116
26 LI:1178118.1 2000SEP08 70856171V1 1981 2282
26 LI:1178118.1 2000SEP08 g1691161 1982 2342
26 LI:1178118.1 2000SEP08 2878289T6 1984 2435
26 LI:1178118.1 2000SEP08 5768870H1 1993 2459
26 LI:1178118.1 2000SEP08 70176041V1 2300 2487
26 LI:1178118.1 2000SEP08 g2079789 1995 2471
26 LI:1178118.1 2000SEP08 612729R6 2007 2297
26 LI:1178118.1 2000SEP08 612729H1 2007 2190
26 LI:1178118.1 2000SEP08 g2180032 2018 2409
26 LI:1178118.1 2000SEP08 70855176V1 2043 2470
26 LI:1178118.1 2000SEP08 g2566462 2091 2404
26 LI:1178118.1 2000SEP08 g4175126 2108 2483
26 LI:1178118.1 2000SEP08 g3433262 2108 2486
26 LI:1178118.1 2000SEP08 7737658J1 2142 2616
26 LI:1178118.1 2000SEP08 g832488 2155 2478
26 LI:1178118.1 2000SEP08 g1018319 2178 2472
26 LI:1178118.1 2000SEP08 g2594405 2216 2459
26 LI:1178118.1 2000SEP08 70856350V1 1579 2063
26 LI:1178118.1 2000SEP08 70856794V1 1547 2160
26 LI:1178118.1 2000SEP08 70858103V1 1358 1679
26 LI:1178118.1 2000SEP08 70856126V1 1457 1679
26 LI:1178118.1 2000SEP08 70885049V1 1449 1679
26 LI:1178118.1 2000SEP08 70855594V1 1371 1679
26 LI:1178118.1 2000SEP08 70887152V1 2015 2396
26 LI:1178118.1 2000SEP08 3918441H1 1981 2111
26 LI:1178118.1 2000SEP08 5508142R6 1981 2185
26 LI:1178118.1 2000SEP08 1242730H1 1981 2033
26 LI:1178118.1 2000SEP08 70818712V1 1378 1679
26 LI:1178118.1 2000SEP08 70886942V1 1204 1656
26 LI:1178118.1 2000SEP08 6880021F8 90 724
26 LI:1178118.1 2000SEP08 3493676F6 97 456
26 LI:1178118.1 2000SEP08 3493676H1 98 348
26 LI:1178118.1 2000SEP08 7155878H1 355 875
26 LI:1178118.1 2000SEP08 6176589H1 573 842
26 LI:1178118.1 2000SEP08 6880021J1 693 1325
28 LG:407304.1 2000SEP08 2179865H1 75 346
28 LG:407304.1 2000SEP08 4970021H1 162 432
28 LG:407304.1 2000SEP08 4706031T6 169 610
28 LG:407304.1 2000SEP08 7309340H1 199 496
28 LG:407304.1 2000SEP08 g5904385 232 695
28 LG:407304.1 2000SEP08 2669002H1 249 504
28 LG:407304.1 2000SEP08 g3307401 344 692
28 LG:407304.1 2000SEP08 7006680H1 507 618
28 LG:407304.1 2000SEP08 5423107H1 523 578
28 LG:407304.1 2000SEP08 5067631H1 524 816
28 LG:407304.1 2000SEP08 5422307H1 523 578
28 LG:407304.1 2000SEP08 5067631F6 524 752
28 LG:407304.1 2000SEP08 6534646H1 543 764
28 LG:407304.1 2000SEP08 6914529J1 590 661
28 LG:407304.1 2000SEP08 6914529H1 590 661
28 LG:407304.1 2000SEP08 440475H1 716 845
28 LG:407304.1 2000SEP08 g1894072 758 1079
29 LG:337358.1 2000SEP08 2244837H1 2959 3211
29 LG:337358.1 2000SEP08 6019194H1 2844 3383
29 LG:337358.1 2000SEP08 6019119H1 2845 3350
29 LG:337358.1 2000SEP08 4089922T6 2917 3177
29 LG:337358.1 2000SEP08 4090015T6 2950 3177
29 LG:337358.1 2000SEP08 1723941H1 2858 3068
29 LG:337358.1 2000SEP08 676035T6 2952 3362
29 LG:337358.1 2000SEP08 1723941F6 2858 3272
29 LG:337358.1 2000SEP08 4255978H1 1384 1624
29 LG:337358.1 2000SEP08 3447804H2 1426 1637
29 LG:337358.1 2000SEP08 7077206H1 1 342
29 LG:337358.1 2000SEP08 7582983H1 59 506
29 LG:337358.1 2000SEP08 7228466H1 260 871
29 LG:337358.1 2000SEP08 6698505H1 520 993
29 LG:337358.1 2000SEP08 6698505F8 520 937
29 LG:337358.1 2000SEP08 7125751F8 543 1072
29 LG:337358.1 2000SEP08 7125751H1 543 1019
29 LG:337358.1 2000SEP08 7178269H1 616 1097
29 LG:337358.1 2000SEP08 6054168H1 648 1198
29 LG:337358.1 2000SEP08 7074882H1 858 1407
29 LG:337358.1 2000SEP08 5386919H1 1167 1274
29 LG:337358.1 2000SEP08 7594269H1 1264 1853
29 LG:337358.1 2000SEP08 5680326H1 1334 1595
29 LG:337358.1 2000SEP08 2228689H1 1493 1738
29 LG:337358.1 2000SEP08 7180676H1 1499 2038
29 LG:337358.1 2000SEP08 5404677H1 1547 1716
29 LG:337358.1 2000SEP08 4091424H1 1576 1825
29 LG:337358.1 2000SEP08 5999641H1 1601 2107
29 LG:337358.1 2000SEP08 4090015F6 1672 2208
29 LG:337358.1 2000SEP08 4090015H1 1672 1939
29 LG:337358.1 2000SEP08 4837487H1 1702 1992
29 LG:337358.1 2000SEP08 5388090H1 1734 2034
29 LG:337358.1 2000SEP08 5662028H1 1741 1940
29 LG:337358.1 2000SEP08 4091549H1 1753 2019
29 LG:337358.1 2000SEP08 5664250H1 1783 2062
29 LG:337358.1 2000SEP08 3274759H1 1856 2105
29 LG:337358.1 2000SEP08 7074659H1 1918 2416
29 LG:337358.1 2000SEP08 676035H1 1940 2204
29 LG:337358.1 2000SEP08 676035R6 1940 2238
29 LG:337358.1 2000SEP08 7678201J1 1952 2366
29 LG:337358.1 2000SEP08 3969273H1 1980 2253
29 LG:337358.1 2000SEP08 g826435 2014 2316
29 LG:337358.1 2000SEP08 4091025H1 2078 2357
29 LG:337358.1 2000SEP08 2822209H1 2127 2432
29 LG:337358.1 2000SEP08 4785179H1 2197 2465
29 LG:337358.1 2000SEP08 5107431H1 2206 2276
29 LG:337358.1 2000SEP08 1818504F6 2254 2635
29 LG:337358.1 2000SEP08 1818504H1 2254 2533
29 LG:337358.1 2000SEP08 7119412H1 2257 2455
29 LG:337358.1 2000SEP08 6505231H1 2258 2465
29 LG:337358.1 2000SEP08 4129969H2 2336 2582
29 LG:337358.1 2000SEP08 5026554H1 2355 2602
29 LG:337358.1 2000SEP08 5998266H1 2385 2853
29 LG:337358.1 2000SEP08 776745H1 2411 2648
29 LG:337358.1 2000SEP08 776745R6 2411 2677
29 LG:337358.1 2000SEP08 775606R1 2411 2957
29 LG:337358.1 2000SEP08 775606H1 2411 2641
29 LG:337358.1 2000SEP08 5875073F6 2512 3100
29 LG:337358.1 2000SEP08 5875073H1 2513 2729
29 LG:337358.1 2000SEP08 6122560H1 2523 3099
29 LG:337358.1 2000SEP08 6129231H1 2523 2895
29 LG:337358.1 2000SEP08 2020216H1 2530 2635
29 LG:337358.1 2000SEP08 5692486H1 2571 2807
29 LG:337358.1 2000SEP08 5493633H1 2594 2882
29 LG:337358.1 2000SEP08 4093974H1 2631 2770
29 LG:337358.1 2000SEP08 4093981H1 2632 2821
29 LG:337358.1 2000SEP08 7237277H1 2632 2939
29 LG:337358.1 2000SEP08 4089922F6 2653 3048
29 LG:337358.1 2000SEP08 4089922H1 2653 2851
29 LG:337358.1 2000SEP08 3961245H2 2667 2865
29 LG:337358.1 2000SEP08 856602H1 2740 2913
29 LG:337358.1 2000SEP08 7129582H1 2754 3165
29 LG:337358.1 2000SEP08 3316652H1 2802 3114
29 LG:337358.1 2000SEP08 5758353H1 2819 3105
29 LG:337358.1 2000SEP08 4714095H1 2818 3045
29 LG:337358.1 2000SEP08 620821H1 2832 3106
29 LG:337358.1 2000SEP08 7342523H1 2996 3373
29 LG:337358.1 2000SEP08 5813971H1 2996 3204
29 LG:337358.1 2000SEP08 5823037H1 2996 3166
29 LG:337358.1 2000SEP08 5815923H1 2996 3186
29 LG:337358.1 2000SEP08 5816950H1 2997 3183
32 LG:1028774.2:2000SEP08 5645467H1 795 1036
32 LG:1028774.2:2000SEP08 3202082H1 956 1198
32 LG:1028774.2:2000SEP08 6319651H1 1011 1286
32 LG:1028774.2:2000SEP08 g2901003 1192 1363
32 LG:1028774.2:2000SEP08 6493972H1 45 549
32 LG:1028774.2:2000SEP08 7380087H1 155 655
32 LG:1028774.2:2000SEP08 6834528H1 57 616
32 LG:1028774.2:2000SEP08 268982H1 369 612
32 LG:1028774.2:2000SEP08 6783027H1 197 753
32 LG:1028774.2:2000SEP08 7260955H1 510 695
32 LG:1028774.2:2000SEP08 6131194H1 600 915
32 LG:1028774.2:2000SEP08 667680R6 69 352
32 LG:1028774.2:2000SEP08 667680H1 90 352
32 LG:1028774.2:2000SEP08 5195209H1 95 333
32 LG:1028774.2:2000SEP08 4623051H1 60 319
32 LG:1028774.2:2000SEP08 7656667J1 21 547
32 LG:1028774.2:2000SEP08 6753046H1 1 540
32 LG:1028774.2:2000SEP08 6245830H1 136 472
32 LG:1028774.2:2000SEP08 4699256H1 179 449
32 LG:1028774.2:2000SEP08 g5754453 1114 1623
32 LG:1028774.2:2000SEP08 g5365284 1132 1623
32 LG:1028774.2:2000SEP08 g3751680 1201 1623
32 LG:1028774.2:2000SEP08 g5887570 1126 1623
32 LG:1028774.2:2000SEP08 g5512642 1139 1623
33 LG:338927.6:2000SEP08 6810281H1 375 959
33 LG:338927.6:2000SEP08 g1195398 515 717
33 LG:338927.6:2000SEP08 g1192273 540 755
33 LG:338927.6:2000SEP08 7712077H1 624 1250
33 LG:338927.6:2000SEP08 g4189046 820 1087
33 LG:338927.6:2000SEP08 6804367H1 839 1254
33 LG:338927.6:2000SEP08 3341542H1 970 1210
33 LG:338927.6:2000SEP08 7712077J1 1 617
33 LG:338927.6:2000SEP08 6810281J1 220 846
34 LG:332944.2:2000SEP08 3288409H1 1602 1862
34 LG:332944.2:2000SEP08 3288409F6 1602 1857
34 LG:332944.2:2000SEP08 6582988H1 1677 2277
34 LG:332944.2:2000SEP08 6039693H1 1683 2235
34 LG:332944.2:2000SEP08 1952105H1 1691 1942
34 LG:332944.2:2000SEP08 7649251J2 1699 2318
34 LG:332944.2:2000SEP08 7054847H1 1703 2307
34 LG:332944.2:2000SEP08 6890168J1 1766 2401
34 LG:332944.2:2000SEP08 4216720H1 1804 2071
34 LG:332944.2:2000SEP08 094964H1 1872 2106
34 LG:332944.2:2000SEP08 2414057H1 1894 2140
34 LG:332944.2:2000SEP08 2414057R6 1894 2145
34 LG:332944.2:2000SEP08 1952625H1 1962 2209
34 LG:332944.2:2000SEP08 6912541H1 1967 2446
34 LG:332944.2:2000SEP08 g3419210 2360 2734
34 LG:332944.2:2000SEP08 4216326H1 1984 2252
49 LG:898771.1 2000SEP08 2240520F6 120 446
49 LG:898771.1 2000SEP08 2240520H1 120 221
49 LG:898771.1 2000SEP08 6862047H1 178 457
49 LG:898771.1 2000SEP08 g2209650 294 787
49 LG:898771.1 2000SEP08 6216076H1 320 801
49 LG:898771.1 2000SEP08 7712493J1 396 857
49 LG:898771.1 2000SEP08 7712493H1 396 857
49 LG:898771.1 2000SEP08 7039186H1 419 987
49 LG:898771.1 2000SEP08 6978931R8 464 1158
49 LG:898771.1 2000SEP08 g2209760 571 1048
49 LG:898771.1 2000SEP08 6895681R8 914 1471
49 LG:898771.1 2000SEP08 6895723R8 935 1265
49 LG:898771.1 2000SEP08 6978931F8 1103 1668
49 LG:898771.1 2000SEP08 6978931H1 1317 1666
49 LG:898771.1 2000SEP08 7204172H1 1325 1860
49 LG:898771.1 2000SEP08 7714080H1 1623 2053
49 LG:898771.1 2000SEP08 6819816H1 1692 1890
49 LG:898771.1 2000SEP08 6836688H1 1707 2284
49 LG:898771.1 2000SEP08 6349781H2 1758 1940
49 LG:898771.1 2000SEP08 7280863H1 1801 2204
49 LG:898771.1 2000SEP08 6210267H1 2195 2297
49 LG:898771.1 2000SEP08 7238641H1 2189 2303
50 LI:457478.1:2000SEP08 g1618619 1 235
50 LI:457478.1:2000SEP08 6850916H1 1 563
50 LI:457478.1:2000SEP08 2051259H1 81 365
50 LI:457478.1:2000SEP08 3036424H1 161 446
50 LI:457478.1:2000SEP08 g2356855 180 538
50 LI:457478.1:2000SEP08 g4509558 344 571
51 LI:125140.1:2000SEP08 g6398056 1 248
51 LI:125140.1:2000SEP08 7644338H1 1 412
51 LI:125140.1:2000SEP08 4434081F6 329 813
51 LI:125140.1:2000SEP08 4434081H1 330 600
51 LI:125140.1:2000SEP08 3371506H1 479 638
51 LI:125140.1:2000SEP08 6832669H1 586 1146
51 LI:125140.1:2000SEP08 7699223J1 636 1233
51 LI:125140.1:2000SEP08 1998681H1 710 964
51 LI:125140.1:2000SEP08 7644338J1 846 1410
52 LI:021095.2:2000SEP08 3232873H1 1 271
53 LI:888730.1:2000SEP08 6798649H1 3 303
53 LI:888730.1:2000SEP08 6790981F8 1 502
53 LI:888730.1:2000SEP08 6795715F8 3 576
53 LI:888730.1:2000SEP08 6795715H1 3 515
53 LI:888730.1:2000SEP08 6798649F8 3 493
53 LI:888730.1:2000SEP08 8081234H2 410 934
53 LI:888730.1:2000SEP08 6798649T8 604 792
53 LI:888730.1:2000SEP08 8081303H2 337 874
53 LI:888730.1:2000SEP08 g1614937 615 921
53 LI:888730.1:2000SEP08 6793426T8 667 798
53 LI:888730.1:2000SEP08 g1616171 724 919
54 LI:358719.1:2000SEP08 3585294F6 103 544
54 LI:358719.1:2000SEP08 70247644V1 584 837
54 LI:358719.1:2000SEP08 71645502V1 620 1030
54 LI:358719.1:2000SEP08 71642987V1 615 1337
54 LI:358719.1:2000SEP08 70258411V1 664 880
54 LI:358719.1:2000SEP08 71666717V1 680 977
54 LI:358719.1:2000SEP08 5674640T6 718 1228
54 LI:358719.1:2000SEP08 3315732T6 719 1291
54 LI:358719.1:2000SEP08 71642544V1 722 1259
54 LI:358719.1:2000SEP08 70249638V1 358 919
54 LI:358719.1:2000SEP08 70256178V1 254 701
54 LI:358719.1:2000SEP08 70249314V1 444 531
54 LI:358719.1:2000SEP08 71684549V1 472 718
54 LI:358719.1:2000SEP08 71641793V1 253 882
54 LI:358719.1:2000SEP08 71644835V1 250 813
54 LI:358719.1:2000SEP08 3461069H1 250 503
54 LI:358719.1:2000SEP08 71686758V1 249 463
54 LI:358719.1:2000SEP08 70249514V1 1103 1265
54 LI:358719.1:2000SEP08 71649473V1 1120 1205
54 LI:358719.1:2000SEP08 70256285V1 254 733
54 LI:358719.1:2000SEP08 70257039V2 254 628
54 LI:358719.1:2000SEP08 3593421H1 283 591
54 LI:358719.1:2000SEP08 3592638H1 283 564
54 LI:358719.1:2000SEP08 3459843H1 283 516
54 LI:358719.1:2000SEP08 70255330V1 292 787
54 LI:358719.1:2000SEP08 71501490V1 301 800
54 LI:358719.1:2000SEP08 71526655V1 425 607
54 LI:358719.1:2000SEP08 3590122H1 1 317
54 LI:358719.1:2000SEP08 70506883V1 825 1297
54 LI:358719.1:2000SEP08 3460259H1 252 497
54 LI:358719.1:2000SEP08 71640275V1 253 800
54 LI:358719.1:2000SEP08 3591414H1 253 580
54 LI:358719.1:2000SEP08 5674646F6 253 566
54 LI:358719.1:2000SEP08 3462962H1 252 377
54 LI:358719.1:2000SEP08 3461227H1 253 507
54 LI:358719.1:2000SEP08 3585880H1 253 358
54 LI:358719.1:2000SEP08 70256157V1 254 830
54 LI:358719.1:2000SEP08 70255182V1 254 808
54 LI:358719.1:2000SEP08 3315732F6 254 772
54 LI:358719.1:2000SEP08 70255417V1 846 1324
54 LI:358719.1:2000SEP08 70255818V1 860 1315
54 LI:358719.1:2000SEP08 70254446V1 869 1297
54 LI:358719.1:2000SEP08 70254414V1 870 1274
54 LI:358719.1:2000SEP08 70251204V1 254 349
54 LI:358719.1:2000SEP08 7274539H1 283 763
54 LI:358719.1:2000SEP08 3592746H1 253 567
54 LI:358719.1:2000SEP08 3589295H1 253 562
54 LI:358719.1:2000SEP08 70255284V1 254 583
54 LI:358719.1:2000SEP08 5674847H1 253 522
56 LI:256099.2:2000SEP08 70191812V1 620 1231
56 LI:256099.2:2000SEP08 71621090V1 627 765
56 LI:256099.2:2000SEP08 496221H1 176 404
56 LI:256099.2:2000SEP08 71191285V1 1222 1871
56 LI:256099.2:2000SEP08 71616554V1 466 1208
56 LI:256099.2:2000SEP08 71132568V1 1227 1702
56 LI:256099.2:2000SEP08 71129593V1 1232 1446
56 LI:256099.2:2000SEP08 2721125H1 1237 1497
56 LI:256099.2:2000SEP08 71619624V1 1249 1895
56 LI:256099.2:2000SEP08 70191992V1 389 925
56 LI:256099.2:2000SEP08 3774992H1 389 721
56 LI:256099.2:2000SEP08 6934512H1 1368 1878
56 LI:256099.2:2000SEP08 71132775V1 1371 1876
56 LI:256099.2:2000SEP08 220732R1 811 1438
56 LI:256099.2:2000SEP08 71131234V1 817 1443
56 LI:256099.2:2000SEP08 71304572V1 1267 1599
56 LI:256099.2:2000SEP08 71131838V1 48 351
56 LI:256099.2:2000SEP08 2174518H1 51 315
56 LI:256099.2:2000SEP08 3572842H1 78 369
56 LI:256099.2:2000SEP08 g1561727 80 1907
56 LI:256099.2:2000SEP08 876929H1 235 348
56 LI:256099.2:2000SEP08 71616214V1 1216 1880
56 LI:256099.2:2000SEP08 71190402V1 994 1648
56 LI:256099.2:2000SEP08 71191544V1 996 1630
56 LI:256099.2:2000SEP08 70192173V1 1000 1583
56 LI:256099.2:2000SEP08 71131089V1 152 499
56 LI:256099.2:2000SEP08 3947429H1 151 449
56 LI:256099.2:2000SEP08 70191815V1 807 1341
56 LI:256099.2:2000SEP08 220732H1 812 1039
56 LI:256099.2:2000SEP08 71617637V1 1204 1895
56 LI:256099.2:2000SEP08 70193242V1 1224 1508
56 LI:256099.2:2000SEP08 71191065V1 151 692
56 LI:256099.2:2000SEP08 71188528V1 151 829
56 LI:256099.2:2000SEP08 71620277V1 768 1449
56 LI:256099.2:2000SEP08 71620252V1 768 1448
56 LI:256099.2:2000SEP08 70192498V1 342 797
56 LI:256099.2:2000SEP08 3739474H1 501 823
56 LI:256099.2:2000SEP08 3383537H1 576 841
56 LI:256099.2:2000SEP08 71657051V1 589 1003
56 LI:256099.2:2000SEP08 70192196V1 602 1136
56 LI:256099.2:2000SEP08 71620637V1 342 1049
56 LI:256099.2:2000SEP08 71618785V1 342 980
56 LI:256099.2:2000SEP08 71129139V1 1183 1325
56 LI:256099.2:2000SEP08 71616871V1 1204 1897
56 LI:256099.2:2000SEP08 g1384884 1412 1892
56 LI:256099.2:2000SEP08 71597459V1 955 1716
56 LI:256099.2:2000SEP08 71622214V1 982 1359
56 LI:256099.2:2000SEP08 5168528H1 970 1301
56 LI:256099.2:2000SEP08 71190616V1 151 721
56 LI:256099.2:2000SEP08 g1616070 183 335
56 LI:256099.2:2000SEP08 7710212H1 198 761
56 LI:256099.2:2000SEP08 6947589H1 217 785
56 LI:256099.2:2000SEP08 832562R1 106 611
56 LI:256099.2:2000SEP08 71133469V1 1024 1470
56 LI:256099.2:2000SEP08 6538702H1 1047 1447
56 LI:256099.2:2000SEP08 2757434H1 1081 1377
56 LI:256099.2:2000SEP08 71619911V1 1012 1710
56 LI:256099.2:2000SEP08 71191068V1 1017 1374
56 LI:256099.2:2000SEP08 70191312V1 342 861
56 LI:256099.2:2000SEP08 1461274H1 492 767
56 LI:256099.2:2000SEP08 71129244V1 520 1195
56 LI:256099.2:2000SEP08 70191690V1 342 782
56 LI:256099.2:2000SEP08 5529378H1 1313 1558
56 LI:256099.2:2000SEP08 g1388447 1304 1623
56 LI:256099.2:2000SEP08 71132084V1 1280 1850
56 LI:256099.2:2000SEP08 220732F1 1279 1891
56 LI:256099.2:2000SEP08 71130570V1 1172 1308
56 LI:256099.2:2000SEP08 71190752V1 1165 1476
56 LI:256099.2:2000SEP08 71618692V1 342 968
56 LI:256099.2:2000SEP08 70194001V1 1374 1714
56 LI:256099.2:2000SEP08 71129545V1 1383 1891
56 LI:256099.2:2000SEP08 70191322V1 1410 1843
56 LI:256099.2:2000SEP08 71618890V1 1426 1915
56 LI:256099.2:2000SEP08 70192505V1 1426 1891
56 LI:256099.2:2000SEP08 71188839V1 1428 2011
56 LI:256099.2:2000SEP08 71131985V1 602 1013
56 LI:256099.2:2000SEP08 832562H1 106 213
56 LI:256099.2:2000SEP08 1784941H1 106 349
56 LI:256099.2:2000SEP08 71190661V1 151 751
56 LI:256099.2:2000SEP08 5754860H1 147 651
56 LI:256099.2:2000SEP08 71130477V1 818 1508
56 LI:256099.2:2000SEP08 g1616064 183 514
56 LI:256099.2:2000SEP08 2346852F6 48 551
56 LI:256099.2:2000SEP08 2346852H1 48 286
56 LI:256099.2:2000SEP08 7688858H1 1 469
56 LI:256099.2:2000SEP08 71175881V1 1176 1602
56 LI:256099.2:2000SEP08 g1403337 271 1705
56 LI:256099.2:2000SEP08 2568292H1 1263 1522
56 LI:256099.2:2000SEP08 71188630V1 1165 1477
56 LI:256099.2:2000SEP08 71619837V1 797 1429
56 LI:256099.2:2000SEP08 71621179V1 762 1454
56 LI:256099.2:2000SEP08 832627H1 106 334
56 LI:256099.2:2000SEP08 71191604V1 1267 1866
56 LI:256099.2:2000SEP08 71596620V1 1267 1367
56 LI:256099.2:2000SEP08 71615643V1 737 1448
56 LI:256099.2:2000SEP08 71189450V1 151 729
56 LI:256099.2:2000SEP08 3947429F6 151 638
56 LI:256099.2:2000SEP08 71189442V1 151 751
56 LI:256099.2:2000SEP08 71191531V1 151 654
56 LI:256099.2:2000SEP08 71188842V1 151 718
56 LI:256099.2:2000SEP08 71620734V1 1329 1928
56 LI:256099.2:2000SEP08 7688858J1 1326 1818
56 LI:256099.2:2000SEP08 70191412V1 1342 1604
56 LI:256099.2:2000SEP08 7355493H1 423 1112
56 LI:256099.2:2000SEP08 70192354V1 1107 1675
56 LI:256099.2:2000SEP08 1620373V1 829 1585
56 LI:256099.2:2000SEP08 71188152V1 827 1180
56 LI:256099.2:2000SEP08 71617256V1 826 1422
56 LI:256099.2:2000SEP08 71617076V1 848 1437
56 LI:256099.2:2000SEP08 71620139V1 843 1461
56 LI:256099.2:2000SEP08 71617242V1 845 1600
56 LI:256099.2:2000SEP08 71619967V1 846 1493
56 LI:256099.2:2000SEP08 71617491V1 850 1476
56 LI:256099.2:2000SEP08 71155601V1 860 1376
56 LI:256099.2:2000SEP08 3880990H1 880 1195
56 LI:256099.2:2000SEP08 71593959V1 888 1417
56 LI:256099.2:2000SEP08 71618259V1 1154 1650
56 LI:256099.2:2000SEP08 3852851H1 1134 1386
56 LI:256099.2:2000SEP08 7930642H1 1149 1696
56 LI:256099.2:2000SEP08 g1382754 1152 1555
56 LI:256099.2:2000SEP08 71618055V1 1084 1830
56 LI:256099.2:2000SEP08 71131975V1 1306 1521
56 LI:256099.2:2000SEP08 3947429T6 1312 1868
56 LI:256099.2:2000SEP08 g1494168 1330 1496
56 LI:256099.2:2000SEP08 70192164V1 1340 1798
56 LI:256099.2:2000SEP08 4623607T6 1344 1860
56 LI:256099.2:2000SEP08 71621424V1 1362 1996
56 LI:256099.2:2000SEP08 067638H1 1361 1543
56 LI:256099.2:2000SEP08 70191215V1 342 783
56 LI:256099.2:2000SEP08 70191765V1 342 450
56 LI:256099.2:2000SEP08 70191531V1 342 736
56 LI:256099.2:2000SEP08 6874432H1 353 1018
56 LI:256099.2:2000SEP08 5302829H1 89 368
56 LI:256099.2:2000SEP08 5329667H1 102 241
56 LI:256099.2:2000SEP08 71618790V1 1428 1886
56 LI:256099.2:2000SEP08 71130063V1 1437 1894
56 LI:256099.2:2000SEP08 71617089V1 1441 1887
56 LI:256099.2:2000SEP08 71616940V1 1440 2011
56 LI:256099.2:2000SEP08 6434868H1 1453 1897
56 LI:256099.2:2000SEP08 4402160H1 1462 1719
56 LI:256099.2:2000SEP08 g2318383 1463 1896
56 LI:256099.2:2000SEP08 71131110V1 1463 1892
56 LI:256099.2:2000SEP08 g1615965 1469 1893
56 LI:256099.2:2000SEP08 g4899694 1471 1898
56 LI:256099.2:2000SEP08 70192423V1 1472 1886
56 LI:256099.2:2000SEP08 g1382698 1474 1895
56 LI:256099.2:2000SEP08 g4898934 1486 1898
58 LG:980769.1:2000SEP08 g2574686 551 830
58 LG:980769.1:2000SEP08 6777292H1 214 830
59 LG:332474.3:2000SEP08 3334515H1 24 284
59 LG:332474.3:2000SEP08 4833889F8 60 462
59 LG:332474.3:2000SEP08 4833889F9 60 676
59 LG:332474.3:2000SEP08 4833889H1 60 313
59 LG:332474.3:2000SEP08 7584265H1 62 511
59 LG:332474.3:2000SEP08 6858863H1 130 574
59 LG:332474.3:2000SEP08 3334515F6 24 420
59 LG:332474.3:2000SEP08 2290063H1 1 258
59 LG:332474.3:2000SEP08 6858863F8 132 698
59 LG:332474.3:2000SEP08 g1975674 260 544
59 LG:332474.3:2000SEP08 g1991774 357 673
59 LG:332474.3:2000SEP08 6858863T8 392 1015
59 LG:332474.3:2000SEP08 4833889T8 464 869
59 LG:332474.3:2000SEP08 3334515T6 481 1035
59 LG:332474.3:2000SEP08 5428021F7 521 1069
59 LG:332474.3:2000SEP08 5428021H1 521 776
59 LG:332474.3:2000SEP08 g5848557 739 930
59 LG:332474.3:2000SEP08 g5233487 771 1093
59 LG:332474.3:2000SEP08 4773368H1 945 1088
59 LG:332474.3:2000SEP08 4773368F6 945 1088
60 LG:1087707.1:2000SEP08 4333661H1 428 684
60 LG:1087707.1:2000SEP08 7644435J1 536 618
60 LG:1087707.1:2000SEP08 2205615T6 672 1197
60 LG:1087707.1:2000SEP08 g6397537 782 1231
60 LG:1087707.1:2000SEP08 4150193F6 296 804
60 LG:1087707.1:2000SEP08 4150193H1 296 556
60 LG:1087707.1:2000SEP08 4914013H1 331 585
60 LG:1087707.1:2000SEP08 5044566H1 380 657
60 LG:1087707.1:2000SEP08 5044566F6 380 568
60 LG:1087707.1:2000SEP08 g6661611 1 463
61 LG:415349.1:2000SEP08 g5178218 525 946
61 LG:415349.1:2000SEP08 g4190232 540 1005
61 LG:415349.1:2000SEP08 g6710036 534 989
61 LG:415349.1:2000SEP08 g4852090 530 986
61 LG:415349.1:2000SEP08 g4305494 520 985
61 LG:415349.1:2000SEP08 7617296J1 463 959
61 LG:415349.1:2000SEP08 g6993309 575 1024
61 LG:415349.1:2000SEP08 g5914430 565 1016
61 LG:415349.1:2000SEP08 g3679677 542 1005
61 LG:415349.1:2000SEP08 6991133H1 1 269
61 LG:415349.1:2000SEP08 6991888H1 5 493
61 LG:415349.1:2000SEP08 3101689H1 26 294
61 LG:415349.1:2000SEP08 3101689F6 1 282
61 LG:415349.1:2000SEP08 7355332H1 166 748
62 LG:132420.2:2000SEP08 4435742T9 55 677
62 LG:132420.2:2000SEP08 6560279T8 53 677
62 LG:132420.2:2000SEP08 4435742F8 1 558
68 LG:979390.2:2000SEP08 g991279 1139 1387
68 LG:979390.2:2000SEP08 g922216 1024 1410
68 LG:979390.2:2000SEP08 g989914 1153 1399
69 LG:1400447.1:2000SEP08 1375814H1 261 292
69 LG:1400447.1:2000SEP08 1379688F6 261 374
69 LG:1400447.1:2000SEP08 7373664H1 1 396
69 LG:1400447.1:2000SEP08 1375814F1 53 431
69 LG:1400447.1:2000SEP08 1379688T6 96 629
69 LG:1400447.1:2000SEP08 g697373 422 636
70 LG:1400562.1:2000SEP08 734146R6 50 463
70 LG:1400562.1:2000SEP08 920342T6 50 463
70 LG:1400562.1:2000SEP08 913717T6 50 463
70 LG:1400562.1:2000SEP08 870924R6 50 422
70 LG:1400562.1:2000SEP08 746315R6 50 415
70 LG:1400562.1:2000SEP08 840754T6 83 463
70 LG:1400562.1:2000SEP08 871602R6 50 364
70 LG:1400562.1:2000SEP08 734229R6 50 460
70 LG:1400562.1:2000SEP08 861612T6 50 463
70 LG:1400562.1:2000SEP08 751054R6 50 463
70 LG:1400562.1:2000SEP08 914213R6 50 364
70 LG:1400562.1:2000SEP08 736850T6 77 463
70 LG:1400562.1:2000SEP08 3637162H1 38 329
70 LG:1400562.1:2000SEP08 919277R6 50 419
70 LG:1400562.1:2000SEP08 905555R6 50 419
70 LG:1400562.1:2000SEP08 913717R6 50 355
70 LG:1400562.1:2000SEP08 4162946H1 18 262
70 LG:1400562.1:2000SEP08 849924R6 50 463
70 LG:1400562.1:2000SEP08 5490338H1 1 274
70 LG:1400562.1:2000SEP08 6937225H1 35 536
70 LG:1400562.1:2000SEP08 3637162F6 38 490
70 LG:1400562.1:2000SEP08 g1696681 40 420
70 LG:1400562.1:2000SEP08 3637162T6 1 2 740
70 LG:1400562.1:2000SEP08 768130R6 50 463
70 LG:1400562.1:2000SEP08 833958R6 50 450
70 LG:1400562.1:2000SEP08 871602T6 50 463
70 LG:1400562.1:2000SEP08 746815R6 50 463
70 LG:1400562.1:2000SEP08 746908R6 50 364
70 LG:1400562.1:2000SEP08 840754R6 50 449
70 LG:1400562.1:2000SEP08 1708790H1 63 264
70 LG:1400562.1:2000SEP08 920553T6 50 463
70 LG:1400562.1:2000SEP08 4162946F6 18 381
70 LG:1400562.1:2000SEP08 900986T6 50 463
70 LG:1400562.1:2000SEP08 900986R6 50 364
70 LG:1400562.1:2000SEP08 919277T6 50 463
70 LG:1400562.1:2000SEP08 909272T6 50 463
70 LG:1400562.1:2000SEP08 808626R6 50 463
70 LG:1400562.1:2000SEP08 754651R6 50 364
70 LG:1400562.1:2000SEP08 905555T6 50 463
70 LG:1400562.1:2000SEP08 920553R6 50 447
70 LG:1400562.1:2000SEP08 849924T6 50 463
70 LG:1400562.1:2000SEP08 864660R6 50 463
70 LG:1400562.1:2000SEP08 6206095H1 28 381
70 LG:1400562.1:2000SEP08 900841T6 50 463
70 LG:1400562.1:2000SEP08 905554T6 50 463
70 LG:1400562.1:2000SEP08 745305R6 50 463
70 LG:1400562.1:2000SEP08 743895R6 50 463
70 LG:1400562.1:2000SEP08 909272R6 50 449
70 LG:1400562.1:2000SEP08 864660T6 50 463
70 LG:1400562.1:2000SEP08 734146H1 50 279
70 LG:1400562.1:2000SEP08 920342R6 50 364
70 LG:1400562.1:2000SEP08 905554R6 50 364
70 LG:1400562.1:2000SEP08 914213T6 50 463
70 LG:1400562.1:2000SEP08 736850R6 50 422
70 LG:1400562.1:2000SEP08 6200892T8 50 315
70 LG:1400562.1:2000SEP08 870924T6 50 463
70 LG:1400562.1:2000SEP08 861612R6 50 463
71 LG:1076130.1:2000SEP08 6311083F8 168 544
71 LG:1076130.1:2000SEP08 g6076180 476 810
71 LG:1076130.1:2000SEP08 3042046T6 258 768
71 LG:1076130.1:2000SEP08 5543192T6 228 767
71 LG:1076130.1:2000SEP08 6311083H1 168 725
71 LG:1076130.1:2000SEP08 g1261341 547 722
71 LG:1076130.1:2000SEP08 6936847R8 1 630
72 LG:1064459.1:2000SEP08 3128415F6 1 253
72 LG:1064459.1:2000SEP08 7155878H1 50 571
72 LG:1064459.1:2000SEP08 7278205H1 105 645
72 LG:1064459.1:2000SEP08 6176589F8 269 918
72 LG:1064459.1:2000SEP08 3201201T6 276 798
72 LG:1064459.1:2000SEP08 6880021J1 389 1021
72 LG:1064459.1:2000SEP08 2878289F6 593 1098
72 LG:1064459.1:2000SEP08 g2180031 602 957
72 LG:1064459.1:2000SEP08 2870408H1 593 869
72 LG:1064459.1:2000SEP08 2870408F7 593 1030
72 LG:1064459.1:2000SEP08 2638526H1 460 579
72 LG:1064459.1:2000SEP08 g1059584 630 901
72 LG:1064459.1:2000SEP08 2870424H1 596 806
72 LG:1064459.1:2000SEP08 2878289H1 593 873
73 LG:1079415.14:2000SEP08 7358860H1 1 370
73 LG:1079415.14:2000SEP08 g2166139 68 561
74 LG:1329431.3:2000SEP08 g2834309 46 465
74 LG:1329431.3:2000SEP08 3281151H1 328 597
74 LG:1329431.3:2000SEP08 3281151F6 327 763
74 LG:1329431.3:2000SEP08 7678170H1 1 603
74 LG:1329431.3:2000SEP08 6480874H1 297 763
75 LG:1088431.2:2000SEP08 893591H1 25 304
75 LG:1088431.2:2000SEP08 6593962H1 1 161
75 LG:1088431.2:2000SEP08 6593962F8 1 161
75 LG:1088431.2:2000SEP08 894136H1 24 191
75 LG:1088431.2:2000SEP08 3685359H1 45 347
75 LG:1088431.2:2000SEP08 3685359F6 45 504
75 LG:1088431.2:2000SEP08 3685359T6 267 545
76 LG:1329462.2:2000SEP08 1567760H1 1011 1230
76 LG:1329462.2:2000SEP08 7430912H1 986 1295
76 LG:1329462.2:2000SEP08 6732779H1 807 985
76 LG:1329462.2:2000SEP08 6844540T8 1099 1277
76 LG:1329462.2:2000SEP08 6537070H1 18 539
76 LG:1329462.2:2000SEP08 g2987926 1043 1235
76 LG:1329462.2:2000SEP08 7007661H1 18 482
76 LG:1329462.2:2000SEP08 7218223H1 1 510
76 LG:1329462.2:2000SEP08 7437976H1 54 324
76 LG:1329462.2:2000SEP08 7634846J1 417 973
76 LG:1329462.2:2000SEP08 5200862F6 573 1082
76 LG:1329462.2:2000SEP08 4630864F8 624 870
76 LG:1329462.2:2000SEP08 6771021J1 733 1285
76 LG:1329462.2:2000SEP08 5387945T6 1004 1636
76 LG:1329462.2:2000SEP08 1567764F6 1011 1349
76 LG:1329462.2:2000SEP08 6939863R8 1166 1547
76 LG:1329462.2:2000SEP08 g4438778 1218 1603
76 LG:1329462.2:2000SEP08 4043836F6 1353 1596
76 LG:1329462.2:2000SEP08 4043836H1 1354 1593
76 LG:1329462.2:2000SEP08 4043836F9 1359 1596
77 LI:393468.1:2000SEP08 7619106J1 444 979
77 LI:393468.1:2000SEP08 g6040695 217 669
77 LI:393468.1:2000SEP08 g6836681 222 664
77 LI:393468.1:2000SEP08 g3918185 266 664
77 LI:393468.1:2000SEP08 g4525302 265 664
77 LI:393468.1:2000SEP08 g6991355 260 663
77 LI:393468.1:2000SEP08 8023621J1 47 655
77 LI:393468.1:2000SEP08 g4124201 284 646
77 LI:393468.1:2000SEP08 g4371399 169 646
77 LI:393468.1:2000SEP08 g3144307 196 644
77 LI:393468.1:2000SEP08 1657896F6 1 245
77 LI:393468.1:2000SEP08 1657896H1 1 234
78 LI:722577.1:2000SEP08 6270214F8 1 591
78 LI:722577.1:2000SEP08 6270214H2 1 514
78 LI:722577.1:2000SEP08 6270214T8 33 484
79 LI:322783.16:2000SEP08 6461340H1 1 538
79 LI:322783.16:2000SEP08 6461140H2 2 393
79 LI:322783.16:2000SEP08 g1259601 8 227
79 LI:322783.16:2000SEP08 g1988731 38 314
79 LI:322783.16:2000SEP08 6526629H1 170 703
79 LI:322783.16:2000SEP08 3483529H1 266 590
79 LI:322783.16:2000SEP08 g705596 332 672
79 LI:322783.16:2000SEP08 5703243H1 382 663
79 LI:322783.16:2000SEP08 7604365H1 311 688
80 LI:901355.2:2000SEP08 4664370F6 1 571
80 LI:901355.2:2000SEP08 8004786H1 34 554
ID
CD 0θ 0θ 0o 0θ 00 CT3 C» CT3 CT3 CD 0θ 0o 00 0θ 00 0θ 0o CP 00 O0 CT3 0θ 0θ 0o CD 0o 0θ 00 0o CD 00 0θ 0o ro Oo CT3 0o θo Oo CD s| sj sJ O CJl Cn θι Uι Cn θl Oι Ui Cn θl Oι Oι Ui Oι Oι 4N 45* 4N 45* 45* Cθ G Cθ NT NT — ■ —■ —■ -- -- — —• — . —. _. — - — . — J _. si sl si SI ) no no no , o ΓD CD o CD o o o CD o o CD o o
O o ( ) CO CO O CO CO CO O CO CO CO CO CO CO GO
Sl SI SJ sl 45* 45* 45* o Sl sl sl SI Sl Sl Sl sl Sl sl I sl Sl Sl Sl I Sl SI Sl 45* 45* (» CD CD CD CD ro D CD CD CD CD O D CD
CO co CO C —O ' co h CN CN rn o o T o D O c ) rn cn n rn rn ΓD c , o h Ch CD no CD no CD ro c» CD CD c» 00 O 00 CD
NJ NT NT CO rn cn cn cn cn rn n rn rn cn cn cn cn rn n rn cn cn rn cn cn rn cn n cn rn cn cn rn cn rn cn rn cn
CO O CO D D CD CD CD CD CD CD CD D CD CD CD CD CD • -o o o • rn cn rn — " o o o o o o o o o o o o o 1_1 _. NT Oi CJl cn cn oi Oi Ol Ol cn cn cn cn Ol Ol cn o o o SI NT NT NT NT NT NT O NT NT NT NT NT NT NT φ
NT NT NT NT NT io NJ NT ro NT NJ NO NJ io NO N NJ NT NT NT NT NT — ' NT NT NT NT NT NT NT NT NT NT NT NT NT NT
NT NT NT NT NT NT NT NJ NT NT NT NT NT NT N) NO NT NT NT NT NT NT NT NT NT NT ΓT o NT NT < ) ( ) r » ( ) r ) ( ) Ό
C T < ) CJ t ) C T ( ) C) CJ CD r T CT r T r T r T T r T c > c > ( > o ( )
( ) ( ) < ) < ) ( ) < CΛ ) < ) < ) ( )
( ) ( ) ( ) < ) ( ) ( > ( ) ( ) < ) ( ) Q c ) ( > < ) < ) c > ( )
( ) ( ) CO CΛ C < ) < ) ( ) < ) c ) r T CΛ CΛ CΛ
CΛ CΛ CΛ CΛ CΛ CΛ CΛ CΛ CΛ (Λ CΛ CO CO CΛ CΛ CΛ CΛ CΛ m CΛ ( CΛ ) CΛ CΛ CΛ CΛ CΛ CΛ CΛ CΛ CΛ CΛ CΛ CΛ CΛ CΛ
CΛ CΛ CΛ CΛ CΛ CΛ 111 m in rπ ro CΛ III m rπ m πi in rπ m m m III m m m m in m 1/ IJ CJ (/ m m m 111 m m m m πι m rn in m m m m in m m m TJ IJ TJ 111 u TJ u "II π TJ Tl u Tl u l Tl -u 11
0 TJ u π u u TJ TJ u CJ TJ Tl TJ TJ TJ u U u u TJ TJ TJ π TJ TJ TJ ΓT CD u u O n r T ΓT CD o n CD o CD O
C T C J CJO CD CD CD C T < ) c ) ΓT J C T c ) C T o CD no CD CD CO CD (D CD CD CD CD CO CD no CD CD
CO CO CD CD CD CJO 00 CD CD 00 00 oo CD oo oo 00 00 00 CO oo 00 CD CD < 00 > n
00 00 CD CD
O
CJI O NO — ' * 45* CO NT NT — — ' — ' rτι rn Is Is rs ^ 01 0 45* 0 45* 45* 00 01 01 00 00 -5
SJ sl 00 CO — ' 45 . co co -- CJ1 ^ ; ι CD — ' NO NO —n - -H ^ 8J _ —. — ' Ol CO O Oo NT sI — ' SI NO O NO Q O 00 NO -' si oo 4 o C O O O 45* — ' — ' sJ Oo NT O — ' sl cn ϋi Ol .
45* sl CD Go NT 45* —• NT — ' N0 O S1 S1 4S. N0 CO 01 C0 C0 — ' 45* NT NT NT NT NT 4S. SI SI SJ N0 NT — ' Oι Go CD O Oo θo OO OO OO O CD OO CD O O O -5 Ol sl —■ CO O NT —π 45* Oo O sj Oo rO CD O Oi NT Go — ' 00 45* — • CD O si sl s] 0 45* 01 00 Cn NT — ' — — ' Ol NT Oθ 45* O O Ol NT NT 4s. 4N θl O sI O O O O sl o CJl 4-* sl CD OO O si O O O O sl 45* NT — ' O CD N — ' O 0ι 0ι Uι O Ul sl G0 0ι sl θ 00 00 C0 sj CD O Cn 00 O s| sl O -(5
CΛ m
0
TABLE 5
SEQ ID NO Template ID Component ID Start Stop
95 LI:1178899.1:2000SEP08 5200862H1 594 813
95 LI:1178899.1:2000SEP08 6537070H1 1 560
95 LI:1178899.1:2000SEP08 1567760H1 1030 1249
96 LI:1169241.1:2000SEP08 g3785390 229 632
96 LI:1169241.1:2000SEP08 g316080 250 495
96 LI:1169241.1:2000SEP08 3079252H2 428 721
96 LI:1169241.1:2000SEP08 7585969H2 1 361
96 LI:1169241.1:2000SEP08 60113610D2 1 415
96 LI:1169241.1:2000SEP08 60120205D2 1 451
96 LI:1169241.1:2000SEP08 g4684943 542 632
96 LI:1169241.1:2000SEP08 60106944D1 1 299
96 LI:1169241.1:2000SEP08 60209330U1 322 610
96 LI:1169241.1:2000SEP08 g4328800 542 632
96 LI:1169241.1:2000SEP08 3077177F6 429 610
96 LI:1169241.1:2000SEP08 3077177H1 429 610
96 LI:1169241.1:2000SEP08 60209329U1 322 610
96 LI:1169241.1:2000SEP08 60106937D1 1 323
97 LI:1180090.1:2000SEP08 5326859T9 248 828
97 LI:1180090.1:2000SEP08 5324689T9 300 825
97 LI:1180090.1:2000SEP08 5326859F8 162 747
97 LI:1180090.1:2000SEP08 5311657F8 1 650
97 LI:1180090.1:2000SEP08 5326859H1 162 398
98 LI:2049322.1:2000SEP08 1698730H1 164 358
98 LI:2049322.1:2000SEP08 4727660H1 1 266
98 LI:2049322.1:2000SEP08 4822628H1 1 282
98 LI:2049322.1:2000SEP08 1322674H1 1 255
98 LI:2049322.1:2000SEP08 1798649F6 5 360
98 LI:2049322.1:2000SEP08 g3736671 86 515
98 LI:2049322.1:2000SEP08 1698965F6 164 528
98 LI:2049322.1:2000SEP08 1698965T6 178 499
98 LI:2049322.1:2000SEP08 1698965H1 164 395
98 LI:2049322.1:2000SEP08 1798649H1 5 279
99 LI:809074.1:2000SEP08 455966H1 20 255
99 LI:809074.1:2000SEP08 5998240F8 1 398
99 LI:809074.1:2000SEP08 5998240T8 1 408
99 LI:809074.1:2000SEP08 5998240H1 1 484
99 LI:809074.1:2000SEP08 455162R6 20 521
99 LI:809074.1:2000SEP08 3206542H1 25 202
99 LI:809074.1:2000SEP08 7652675J1 47 172
99 LI:809074.1:2000SEP08 g2401959 129 486
99 LI:809074.1:2000SEP08 460544H1 20 245
99 LI:809074.1:2000SEP08 g5101146 180 486
99 LI:809074.1:2000SEP08 458061H1 20 259
99 LI:809074.1:2000SEP08 455162H1 20 272
99 LI:809074.1:2000SEP08 460544R6 20 326
99 LI:809074.1:2000SEP08 4663964H1 37 289
99 LI:809074.1:2000SEP08 g1807165 88 278
99 LI:809074.1:2000SEP08 3614680H1 155 272
100 LI:805158.1:2000SEP08 4028040F6 1 366
TABLE 5
SEQ ID NO Template ID Component ID Start Stop
103 LI:1177434.2:2000SEP08 097477H1 844 1032
103 LI:1177434.2:2000SEP08 5310369H1 1 243
104 LI:1184255.1:2000SEP08 5261341H1 1 219
104 LI:1184255.1:2000SEP08 5261341F6 1 479
104 LI:1184255.1:2000SEP08 5953264H1 50 175
104 LI:1184255.1:2000SEP08 5953264F8 51 779
104 LI:1184255.1:2000SEP08 1852536H1 283 570
104 LI:1184255.1:2000SEP08 g4070788 336 774
104 LI:1184255.1:2000SEP08 g2806837 418 773
104 LI:1184255.1:2000SEP08 g3148317 459 771
105 LI:1164555.1:2000SEP08 2813750F6 1 471
105 LI:1164555.1:2000SEP08 2813750T6 1 487
106 LI:238666.4:2000SEP08 3929925H1 630 905
106 LI:238666.4:2000SEP08 111294R6 899 1320
106 LI:238666.4:2000SEP08 g2046867 368 720
106 LI:238666.4:2000SEP08 653171H1 392 542
106 LI:238666.4:2000SEP08 4140846F9 453 1035
106 LI:238666.4:2000SEP08 4140846H1 454 735
106 LI:238666.4:2000SEP08 4030602F8 556 1087
106 LI:238666.4:2000SEP08 111294R1 899 1320
106 LI:238666.4:2000SEP08 4030602H1 573 653
106 LI:238666.4:2000SEP08 2495993H1 952 1192
106 LI:238666.4:2000SEP08 6549845H1 1 407
106 LI:238666.4:2000SEP08 6549845F8 1 688
106 LI:238666.4:2000SEP08 7273596H1 269 887
107 LI:1166752.1:2000SEP08 5500807H1 1 180
107 LI:1166752.1:2000SEP08 5497912H1 1 260
107 LI:1166752.1:2000SEP08 1004710H1 146 384
107 LI:1166752.1:2000SEP08 2666231H1 270 523
107 LI:1166752.1:2000SEP08 5500805H1 1 271
107 LI:1166752.1:2000SEP08 2866837H1 344 654
107 LI:1166752.1:2000SEP08 5500805F8 1 581
107 LI:1166752.1:2000SEP08 2666231F6 165 523
107 LI:1166752.1:2000SEP08 5497912F9 4 439
108 LI:2049654.1:2000SEP08 7373664H1 1 344
108 LI:2049654.1:2000SEP08 1375814F1 1 379
108 LI:2049654.1:2000SEP08 1375814F6 1 361
108 LI:2049654.1:2000SEP08 7705726J1 38 586
108 LI:2049654.1:2000SEP08 1379688T6 41 577
108 LI:2049654.1:2000SEP08 g697373 370 584
108 LI:2049654.1:2000SEP08 1379688F6 1 322
108 LI:2049654.1:2000SEP08 1375814H1 1 240
108 LI:2049654.1:2000SEP08 1379688H1 1 222
109 LI:242665.2:2000SEP08 4155245F6 1 472
109 LI:242665.2:2000SEP08 4155245H1 1 249
109 LI:242665.2:2000SEP08 2815027H1 207 439
110 LI:208637.1:2000SEP08 2045901H1 3997 4170
110 LI:208637.1:2000SEP08 511147H1 4004 4308
110 LI:208637.1:2000SEP08 511147R6 4004 4294
110 LI:208637.1:2000SEP08 g6993309 1 450
110 LI:208637.1:2000SEP08 g5914430 9 460
110 LI:208637.1:2000SEP08 g3679677 20 483
110 LI:208637.1:2000SEP08 g4190232 20 485
110 LI:208637.1:2000SEP08 g6710036 36 491
110 LI:208637.1:2000SEP08 g4852090 39 495
110 LI:208637.1:2000SEP08 g4305494 40 505
110 LI:208637.1:2000SEP08 g5178218 79 500
110 LI:208637.1:2000SEP08 7355332H1 277 856
110 LI:208637.1:2000SEP08 7723481J2 328 846
110 LI:208637.1:2000SEP08 1551301R6 3397 3627
110 LI:208637.1:2000SEP08 1551301H1 3397 3542
110 LI:208637.1:2000SEP08 g6570747 3561 3805
110 LI:208637.1:2000SEP08 3415286H1 4263 4530
110 LI:208637.1:2000SEP08 70455975V1 4291 4710
110 LI:208637.1:2000SEP08 g787701 4288 4618
110 LI:208637.1:2000SEP08 446023H1 4004 4246
110 LI:208637.1:2000SEP08 442886H1 4004 4246
110 LI:208637.1:2000SEP08 2155512H1 4520 4770
110 LI:208637.1:2000SEP08 g2583420 4529 4941
110 LI:208637.1:2000SEP08 g2155854 4530 5030
110 LI:208637.1:2000SEP08 3417104H1 4531 4765
110 LI:208637.1:2000SEP08 g787685 4288 4530
110 LI:208637.1:2000SEP08 2445053F6 4291 4804
110 LI:208637.1:2000SEP08 2445053H1 4291 4526
110 LI:208637.1:2000SEP08 1662789H1 4296 4528
110 LI:208637.1:2000SEP08 5217068H1 4298 4560
110 LI:208637.1:2000SEP08 1663583F6 4296 4628
110 LI:208637.1:2000SEP08 4531059H1 3128 3402
110 LI:208637.1:2000SEP08 7081386H1 3147 3663
110 LI:208637.1:2000SEP08 5394846H1 3949 4250
110 LI:208637.1:2000SEP08 4536525H1 3962 4215
110 LI:208637.1:2000SEP08 7598517H1 3989 4561
110 LI:208637.1:2000SEP08 5394325H1 3992 4281
110 LI:208637.1:2000SEP08 042108H1 3999 4207
110 LI:208637.1:2000SEP08 6991133H1 753 1021
110 LI:208637.1:2000SEP08 8039312J1 930 1424
110 LI:208637.1:2000SEP08 7617296J1 946 1442
110 LI:208637.1:2000SEP08 8039151J1 1454 2144
110 LI:208637.1:2000SEP08 8040751H1 1209 1852
110 LI:208637.1:2000SEP08 6623125H1 1930 2419
110 LI:208637.1:2000SEP08 6819912F8 1622 2159
110 LI:208637.1:2000SEP08 6896576H1 1625 2117
110 LI:208637.1:2000SEP08 8039312H1 1707 2308
110 LI:208637.1:2000SEP08 7984277H1 1888 2495
110 LI:208637.1:2000SEP08 6196362H1 3870 4234
110 LI:208637.1:2000SEP08 4880402H1 3874 4140
110 LI:208637.1:2000SEP08 3468193H1 3894 4139
110 LI:208637.1:2000SEP08 365246R6 3906 4235
110 LI:208637.1:2000SEP08 g2079127 3910 4299
110 LI:208637.1:2000SEP08 6480435H1 2338 2879
110 LI:208637.1:2000SEP08 6425484H1 2026 2536
110 LI:208637.1:2000SEP08 7617296H1 2303 2945
110 LI:208637.1:2000SEP08 2555627F6 2024 2590
110 LI:208637.1:2000SEP08 2555627H1 2025 2270
110 LI:208637.1:2000SEP08 6425484F8 2026 2547
110 LI:208637.1:2000SEP08 5924864H1 4755 5056
110 LI:208637.1:2000SEP08 860218H1 4757 5000
110 LI:208637.1:2000SEP08 323803H1 4764 5065
110 LI:208637.1:2000SEP08 4126367H1 4764 5039
110 LI:208637.1:2000SEP08 70453526V1 4771 5224
110 LI:208637.1:2000SEP08 70453430V1 4772 5203
110 LI:208637.1:2000SEP08 2721776H1 4637 4887
110 LI:208637.1:2000SEP08 g2209684 4651 4944
110 LI:208637.1:2000SEP08 1500756T6 4662 5165
110 LI:208637.1:2000SEP08 g2261582 4671 4944
110 LI:208637.1:2000SEP08 3408148H1 4722 4849
110 LI:208637.1:2000SEP08 2873391H1 4726 5009
110 LI:208637.1:2000SEP08 4763672H1 4728 4949
110 LI:208637.1:2000SEP08 2555275H1 4735 4997
110 LI:208637.1:2000SEP08 g2912948 4746 5181
110 LI:208637.1:2000SEP08 6783219H1 4752 5202
110 LI:208637.1:2000SEP08 g2583727 4774 5202
110 LI:208637.1:2000SEP08 g4283835 4785 5204
110 LI:208637.1:2000SEP08 g4437576 4787 5203
110 LI:208637.1:2000SEP08 g2583781 4788 5213
110 LI:208637.1:2000SEP08 g4874731 4794 5209
110 LI:208637.1:2000SEP08 g3094261 4797 5204
110 LI:208637.1:2000SEP08 7338219H1 4801 5207
110 LI:208637.1:2000SEP08 g2576977 4800 5213
110 LI:208637.1:2000SEP08 7337919H1 4800 5203
110 LI:208637.1:2000SEP08 511147T6 4805 5164
110 LI:208637.1:2000SEP08 g4112440 4822 5206
110 LI:208637.1:2000SEP08 365246T6 4823 5166
110 LI:208637.1:2000SEP08 1920661 4826 5203
110 LI:208637.1:2000SEP08 g2188517 4835 5203
110 LI:208637.1:2000SEP08 g1941487 4875 5211
110 LI:208637.1:2000SEP08 4506729H1 4875 5150
110 LI:208637.1:2000SEP08 g787649 4882 5180
110 LI:208637.1:2000SEP08 g3898699 4907 5203
110 LI:208637.1:2000SEP08 1663583T6 4919 5163
110 LI:208637.1:2000SEP08 4144853H1 4937 5203
110 LI:208637.1:2000SEP08 2607654T6 4966 5158
110 LI:208637.1:2000SEP08 2607654F6 4971 5203
110 LI:208637.1:2000SEP08 2607654H1 4971 5203
110 LI:208637.1:2000SEP08 3470176H1 4976 5203
110 LI:208637.1:2000SEP08 g787635 4983 5203
110 LI:208637.1:2000SEP08 g7278523 5067 5203
110 LI:208637.1:2000SEP08 g664248 5113 5204
110 LI:208637.1:2000SEP08 1680896H1 5116 5205
110 LI:208637.1:2000SEP08 2419545H1 5153 5203
110 LI:208637.1:2000SEP08 4277393H1 3811 3936
110 LI:208637.1:2000SEP08 7617468H1 498 1093
110 LI:208637.1:2000SEP08 5664828H1 3835 4123
110 LI:208637.1:2000SEP08 3101689H1 728 996
110 LI:208637.1:2000SEP08 3101689F6 740 1055
110 LI:208637.1:2000SEP08 6991888H1 532 1017
110 LI:208637.1:2000SEP08 g2007169 2014 2311
110 LI:208637.1:2000SEP08 2851540H1 4017 4297
110 LI:208637.1:2000SEP08 2723689H1 3827 4067
110 LI:208637.1:2000SEP08 3270687H1 4024 4267
110 LI:208637.1:2000SEP08 2287401H1 4059 4328
110 LI:208637.1:2000SEP08 2968020H1 4104 4400
110 LI:208637.1:2000SEP08 2536769H1 4129 4374
110 LI:208637.1:2000SEP08 6517121H1 4161 4495
110 LI:208637.1:2000SEP08 2440915H1 4154 4402
110 LI:208637.1:2000SEP08 6480435F9 2353 2948
110 LI:208637.1:2000SEP08 2730768H1 2435 2677
110 LI:208637.1:2000SEP08 4071601H1 2650 2963
110 LI:208637.1:2000SEP08 4071601F8 2649 3200
110 LI:208637.1:2000SEP08 7251820F8 2682 3228
110 LI:208637.1:2000SEP08 g1941486 4250 4448
110 LI:208637.1:2000SEP08 3293726H1 4547 4819
110 LI:208637.1:2000SEP08 6441190H1 4544 4790
110 LI:208637.1:2000SEP08 5025609H1 4548 4800
110 LI:208637.1:2000SEP08 2876024H1 4580 4874
110 LI:208637.1:2000SEP08 3799471H1 4592 4878
110 LI:208637.1:2000SEP08 2445053T6 4634 5165
110 LI:208637.1:2000SEP08 g4002423 4633 5088
110 LI:208637.1:2000SEP08 g4071919 4637 5085
110 LI:208637.1:2000SEP08 g2013196 3733 4051
110 LI:208637.1:2000SEP08 g1920720 3752 4136
110 LI:208637.1:2000SEP08 5399720H1 3259 3514
110 LI:208637.1:2000SEP08 3295282H1 3255 3504
110 LI:208637.1:2000SEP08 2777229H1 4401 4658
110 LI:208637.1:2000SEP08 3720127H1 4469 4781
110 LI:208637.1:2000SEP08 g2270753 4489 4939
110 LI:208637.1:2000SEP08 1366744R1 4408 4815
110 LI:208637.1:2000SEP08 2607995H1 4407 4669
110 LI:208637.1:2000SEP08 1366744H1 4408 4707
110 LI:208637.1:2000SEP08 1366966H1 4408 4667
110 LI:208637.1:2000SEP08 6896713H1 1588 1731
110 LI:208637.1:2000SEP08 6896713F8 1616 2181
110 LI:208637.1:2000SEP08 6896576F8 1616 2097
110 LI:208637.1:2000SEP08 7251820H1 2689 3224
110 LI:208637.1:2000SEP08 5725740H1 3821 4357
110 LI:208637.1:2000SEP08 1267290F1 4348 4626
TABLE 5
SEQ ID NO: Template ID Component ID Start Stop
128 LI:427997.4:2000SEP08 g2969311 1954 2238
128 LI:427997.4:2000SEP08 1853191F6 1722 2234
128 LI:427997.4:2000SEP08 1853191H1 1722 1976
128 LI:427997.4:2000SEP08 3964084H1 1018 1310
128 LI:427997.4:2000SEP08 70059533V1 1338 1601
128 LI:427997.4:2000SEP08 g1981676 1338 1670
128 LI:427997.4:2000SEP08 7059995H1 802 1287
128 LI:427997.4:2000SEP08 5376913H1 683 938
128 LI:427997.4:2000SEP08 1683557F6 1687 2243
128 LI:427997.4:2000SEP08 70061871V1 1816 2167
128 LI:427997.4:2000SEP08 g5858285 1917 2240
128 LI:427997.4:2000SEP08 3409241T6 1927 2383
128 LI:427997.4:2000SEP08 5669576H1 1947 2137
128 LI:427997.4:2000SEP08 3628024H1 1952 2267
128 LI:427997.4:2000SEP08 g5855887 1917 2239
128 LI:427997.4:2000SEP08 2408847H1 1898 2148
128 LI:427997.4:2000SEP08 70059242V1 1904 2238
128 LI:427997.4:2000SEP08 960516H1 1875 2176
128 LI:427997.4:2000SEP08 960387T1 1875 2201
128 LI:427997.4:2000SEP08 2086141H1 1880 2143
128 LI:427997.4:2000SEP08 70062540V1 1876 2238
128 LI:427997.4:2000SEP08 3751266T6 1741 2203
128 LI:427997.4:2000SEP08 3321818T6 1746 2201
128 LI:427997.4:2000SEP08 2298374T6 1751 2198
128 LI:427997.4:2000SEP08 70527491V1 854 1515
128 LI:427997.4:2000SEP08 71265279V1 898 1591
128 LI:427997.4:2000SEP08 4239442T8 1900 2310
128 LI:427997.4:2000SEP08 71265775V1 804 1384
128 LI:427997.4:2000SEP08 4177726H1 811 1115
128 LI:427997.4:2000SEP08 g2541182 2122 2440
128 LI:427997.4:2000SEP08 g3039868 2163 2241
128 LI:427997.4:2000SEP08 223027H1 2079 2238
128 LI:427997.4:2000SEP08 223027F1 2080 2238
128 LI:427997.4:2000SEP08 70524882V1 1575 1741
128 LI:427997.4:2000SEP08 71120713V1 1322 1901
128 LI:427997.4:2000SEP08 71118117V1 1327 1887
128 LI:427997.4:2000SEP08 71118747V1 1311 1976
128 LI:427997.4:2000SEP08 70527304V1 1317 1985
128 LI:427997.4:2000SEP08 2298374H1 1639 1911
128 LI:427997.4:2000SEP08 1361272F1 1673 2232
128 LI:427997.4:2000SEP08 71119759V1 731 1250
128 LI:427997.4:2000SEP08 71266124V1 732 1382
128 LI:427997.4:2000SEP08 71118641V1 731 1327
128 LI:427997.4:2000SEP08 70530256V1 467 1053
128 LI:427997.4:2000SEP08 1915736R6 731 1151
128 LI:427997.4:2000SEP08 71266413V1 932 1624
128 LI:427997.4:2000SEP08 g1980460 2011 2329
128 LI:427997.4:2000SEP08 223027R1 2080 2238
128 LI:427997.4:2000SEP08 3934758F6 904 1427
128 LI:427997.4:2000SEP08 71118652V1 1424 2039
128 LI:427997.4:2000SEP08 3934357H1 904 1216
128 LI:427997.4:2000SEP08 1336611H1 1714 1957
128 LI:427997.4:2000SEP08 5290693H1 1330 1626
128 LI:427997.4:2000SEP08 70060883V1 1338 1745
128 LI:427997.4:2000SEP08 71117317V1 1166 1740
128 LI:427997.4:2000SEP08 71119085V1 1213 1841
128 LI:427997.4:2000SEP08 3792653H1 1240 1449
128 LI:427997.4:2000SEP08 6329373H1 1251 1 96
128 LI:427997.4:2000SEP08 70062844V1 1485 1881
128 LI:427997.4:2000SEP08 2273538H1 1510 1779
128 LI:427997.4:2000SEP08 3022474H1 1556 1849
128 LI:427997.4:2000SEP08 71120487V1 1115 1610
128 LI:427997.4:2000SEP08 71265619V1 1043 1705
128 LI:427997.4:2000SEP08 5946829H1 1056 1341
128 LI:427997.4:2000SEP08 71266345V1 1110 1 65
128 LI:427997.4:2000SEP08 71120506V1 1420 1873
128 LI:427997.4:2000SEP08 71119165V1 893 1508
128 LI:427997.4:2000SEP08 3173959T6 1600 2191
128 LI:427997.4:2000SEP08 1915736H1 731 980
128 LI:427997.4:2000SEP08 71265549V1 731 1218
128 LI:427997.4:2000SEP08 g3151378 1976 2242
128 LI:427997.4:2000SEP08 71119545V1 1283 1970
128 LI:427997.4:2000SEP08 3173959H1 1338 1605
128 LI:427997.4:2000SEP08 1683573F6 1687 2102
128 LI:427997.4:2000SEP08 1683573H1 1687 1923
128 LI:427997.4:2000SEP08 1632340H1 1721 1938
128 LI:427997.4:2000SEP08 614199H1 1722 1949
128 LI:427997.4:2000SEP08 70059326V1 1735 2238
128 LI:427997.4:2000SEP08 2298374R6 1639 2093
128 LI:427997.4:2000SEP08 3408296H1 1578 1828
128 LI:427997.4:2000SEP08 71119740V1 1148 1440
128 LI:427997.4:2000SEP08 4174878H1 1152 1436
128 LI:427997.4:2000SEP08 3751266F6 1 358
128 LI:427997.4:2000SEP08 71266530V1 732 1262
128 LI:427997.4:2000SEP08 71264808V1 753 1279
128 LI:427997.4:2000SEP08 5296714H1 765 1049
128 LI:427997.4:2000SEP08 71120676V1 1562 1948
128 LI:427997.4:2000SEP08 70529734V1 461 579
128 LI:427997.4:2000SEP08 3751266H1 1 296
128 LI:427997.4:2000SEP08 3321818F6 103 497
128 LI:427997.4:2000SEP08 3321818H1 104 350
128 LI:427997.4:2000SEP08 6351045H2 329 660
128 LI:427997.4:2000SEP08 3662984H1 1018 1270
128 LI:427997.4:2000SEP08 70529227V1 1021 1617
128 LI:427997.4:2000SEP08 70061612V1 1338 1898
128 LI:427997.4:2000SEP08 70059581V1 1338 1831
128 LI:427997.4:2000SEP08 70058920V1 1338 1778
128 LI:427997.4:2000SEP08 70530468V1 1336 1867
137 LG:1327885.1:2000SEP08 6798095F8 1 519
138 LI:449393.1:2000SEP08 6272292H2 1 507
138 LI:449393.1:2000SEP08 6272292F8 1 642
138 LI:449393.1:2000SEP08 5910821F8 365 793
138 LI:449393.1:2000SEP08 5910821T9 365 662
138 LI:449393.1:2000SEP08 5910821T8 365 636
138 LI:449393.1:2000SEP08 5910821H1 365 676
139 LI:897616.1:2000SEP08 6790830H1 1 523
139 LI:897616.1:2000SEP08 3649045H1 1 143
139 LI:897616.1:2000SEP08 6796674H1 2 519
139 LI:897616.1:2000SEP08 6790830T8 1 498
139 LI:897616.1:2000SEP08 6796674T8 2 442
139 LI:897616.1:2000SEP08 6796674F8 2 525
139 LI:897616.1:2000SEP08 6796548T8 19 470
139 LI:897616.1:2000SEP08 6796548H1 21 449
139 LI:897616.1:2000SEP08 6796548F8 21 526
139 LI:897616.1:2000SEP08 6790830F8 30 539
140 LI:736860.1:2000SEP08 6274719H2 1 274
140 LI:736860.1:2000SEP08 6274719T8 1 331
140 LI:736860.1:2000SEP08 6274719F8 15 431
141 LI:027066.6:2000SEP08 2101767R6 1 437
141 LI:027066.6:2000SEP08 6828854H1 307 892
141 LI:027066.6:2000SEP08 7652140H1 645 947
141 LI:027066.6:2000SEP08 6080233H1 640 1139
141 LI:027066.6:2000SEP08 7194730H1 641 1114
141 LI:027066.6:2000SEP08 6827816J1 843 962
141 LI:027066.6:2000SEP08 2593628F6 855 1205
141 LI:027066.6:2000SEP08 2593628H1 855 1102
141 LI:027066.6:2000SEP08 4725176H1 870 1111
141 LI:027066.6:2000SEP08 3291113H1 910 1170
141 LI:027066.6:2000SEP08 5328747H1 944 1097
141 LI:027066.6:2000SEP08 3599076H1 983 1254
141 LI:027066.6:2000SEP08 1821378H1 1015 1120
141 LI:027066.6:2000SEP08 3013226H1 1029 1326
141 LI:027066.6:2000SEP08 2776582H1 1176 1429
142 LI:1074263.1:2000SEP08 6792763H1 1 460
142 LI:1074263.1:2000SEP08 6792018T8 1 567
142 LI:1074263.1:2000SEP08 6792018F8 1 649
142 LI:1074263.1:2000SEP08 6792018H1 1 406
142 LI:1074263.1:2000SEP08 6796374H1 11 503
142 LI:1074263.1:2000SEP08 6791238H1 41 618
142 LI:1074263.1:2000SEP08 6791238F8 41 637
142 LI:1074263.1:2000SEP08 6795926H1 42 461
142 LI:1074263.1:2000SEP08 6791238T8 55 595
142 LI:1074263.1:2000SEP08 6792967H1 46 595
142 LI:1074263.1:2000SEP08 6794141H1 152 652
142 LI:1074263.1:2000SEP08 6796228H1 145 654
142 LI:1074263.1:2000SEP08 6794232H1 145 661
142 LI:1074263.1:2000SEP08 6798243H1 147 663
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 po oo ps po pa c- po po po pα po m
NT fό fό iO f NT i i NT NT NT i i NT NT NT NT NT NT NT
O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O Q O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O →- O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O Φ cn cn cn cn cn cn cHO cn cΛ cn cn cn cn cn cn cn cn cn cn cn cn c i cn cn cn cn cn cn cn cn cn cn cn cn cn cn cϊ cn cn cΛ m T)mT)mT3 mT3mT3mTJmTDmTJmTJ TJ mTJmTJmTJmTJmTJmTJmTJmTJ mTJmTJmTJ mT mT)m-0 mTJmTJmT3 mT:mT3mTJ mTJmTJ TmJ mTJ mT) mTJmT!m^mmm mmmm m m m mπ
O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O
Ol NT NT NT NT NO rO - — — — — — — — — — — — — — — — — — . , 4s. 4_ ._. 4_ 4S. 45* —
O sl CTO NO NT NO - O Ol CO CO CO CO GO CO GO CO CO CO GO GO GO GO CO GO ^1 si sl sj O O CO GO GO NT NO NO O sl sj o O 00 sj NT sj g g o o 4- α. Q
— NT NT J— NT NT NT O NO CΠ NT NO - O — Cn — CO — ^ ^ O Jfc. 4S. NT N0 N0 4N NT N0 NT NT — — NO CO , , O _ sl O sj s sl Go sj sJ Cπ Ol CΛ
O O Co rθ NT JS. O NT 4s. O sl o rθ - NT GJ - — O rT^ N^ r^ j O CJl NO sJ O Oi ϋi NT sl Ol Co sj Js. - . 4N — sI NO CO
O NT Cn — ' si — ' O O SJ NO — ' Co oo cn _ O S! O N IN-' ^ UJ -p' — 4— O O — O O NT G0 J_ 00 O O sJ ^ N0 00 θ s-l _ r 0o0 _ _ r Ool r NoT 4 s.l s J_i ^ 0'
TABLE 5
SEQ ID NO: Template ID Component ID Start Stop
147 LI:1180418.1:2000SEP08 3489279H1 51 315
147 LI:1180418.1:2000SEP08 5340183T8 58 630
147 LI:1180418.1:2000SEP08 4848110H1 65 154
147 LI:1180418.1:2000SEP08 853227H1 68 315
147 LI:1180418.1:2000SEP08 71599552V1 779 1019
147 LI:1180418.1:2000SEP08 71602555V1 788 1019
147 LI:1180418.1:2000SEP08 71602383V1 830 1019
148 LG:232648.1:2000SEP08 3437327H1 126 312
148 LG:232648.1:2000SEP08 2471325H1 124 353
148 LG:232648.1:2000SEP08 1665878H1 119 357
148 LG:232648.1:2000SEP08 7679565H1 1 550
148 LG:232648.1:2000SEP08 6038768H1 105 319
148 LG:232648.1:2000SEP08 2625304H1 474 733
148 LG:232648.1:2000SEP08 6804355 8 379 543
148 LG:232648.1:2000SEP08 6804355F8 379 542
148 LG:232648.1:2000SEP08 6804355H1 379 543
148 LG:232648.1:2000SEP08 6804355J1 362 543
148 LG:232648.1:2000SEP08 3742927H1 126 415
148 LG:232648.1:2000SEP08 6038768F8 129 724
148 LG:232648.1:2000SEP08 2464965H1 130 363
148 LG:232648.1:2000SEP08 3856515F8 321 894
148 LG:232648.1:2000SEP08 g2556959 1815 2142
148 LG:232648.1:2000SEP08 484034R1 1768 2142
148 LG:232648.1:2000SEP08 484034T7 1770 2063
148 LG:232648.1:2000SEP08 g5447846 1781 2141
148 LG:232648.1:2000SEP08 7700719H1 1782 2134
148 LG:232648.1:2000SEP08 1619047H1 1765 1982
148 LG:232648.1:2000SEP08 6213262H1 1767 2049
148 LG:232648.1:2000SEP08 g3330066 1769 2145
148 LG:232648.1:2000SEP08 7239656H1 1666 2125
148 LG:232648.1:2000SEP08 7056066H1 1674 2142
148 LG:232648.1:2000SEP08 484034F1 1766 2082
148 LG:232648.1:2000SEP08 6127977H1 1697 2152
148 LG:232648.1:2000SEP08 g4902141 1721 2130
148 LG:232648.1:2000SEP08 7450786T1 1733 2010
148 LG:232648.1:2000SEP08 4505914H1 1733 1992
148 LG:232648.1:2000SEP08 5903286H1 1733 2005
148 LG:232648.1:2000SEP08 4664636H1 1748 2000
148 LG:232648.1:2000SEP08 4664658H1 1748 2001
148 LG:232648.1:2000SEP08 g885174 1754 2162
148 LG:232648.1:2000SEP08 g2726406 1753 2095
148 LG:232648.1:2000SEP08 3739527H1 1850 2141
148 LG:232648.1:2000SEP08 g4989863 1868 2142
148 LG:232648.1:2000SEP08 g2009754 1846 2089
148 LG:232648.1:2000SEP08 3739527F7 1849 2142
148 LG:232648.1:2000SEP08 6017442H1 1475 2066
148 LG:232648.1:2000SEP08 2951610H1 1825 2111
148 LG:232648.1:2000SEP08 632399R6 2015 2152
148 LG:232648.1:2000SEP08 632399T6 2015 2093
148 LG:232648.1:2000SEP08 602162T6 1926 2102
148 LG:232648.1:2000SEP08 602162R6 1926 2145
148 LG:232648.1:2000SEP08 484034H1 1960 2142
148 LG:232648.1:2000SEP08 7941209H1 1973 2145
148 LG:232648.1:2000SEP08 g2224223 1885 2152
148 LG:232648.1:2000SEP08 g4150209 2042 2148
148 LG:232648.1:2000SEP08 632399H1 2015 2150
148 LG:232648.1:2000SEP08 g2398138 2020 2145
148 LG:232648.1:2000SEP08 g5672245 2029 2142
148 LG:232648.1:2000SEP08 g3425208 2029 2153
148 LG:232648.1:2000SEP08 6850278H1 1470 2022
148 LG:232648.1:2000SEP08 3856515T8 1467 2042
148 LG:232648.1:2000SEP08 6544359H1 1464 2015
148 LG:232648.1:2000SEP08 586344H1 1536 1803
148 LG:232648.1:2000SEP08 g896529 1540 1811
148 LG:232648.1:2000SEP08 2740488H1 1505 1741
148 LG:232648.1:2000SEP08 1292945H1 1482 1742
148 LG:232648.1:2000SEP08 6038768T8 1488 2041
148 LG:232648.1:2000SEP08 1292945F6 1482 1858
148 LG:232648.1:2000SEP08 3397313H1 1452 1699
148 LG:232648.1:2000SEP08 1665878T6 1656 2104
148 LG:232648.1:2000SEP08 2781935H1 1656 1900
148 LG:232648.1:2000SEP08 5094116H1 1578 1839
148 LG:232648.1:2000SEP08 1292945T6 1642 2104
148 LG:232648.1:2000SEP08 5307388H1 1542 1772
148 LG:232648.1:2000SEP08 5306755H1 1542 1742
148 LG:232648.1:2000SEP08 4857661H1 1547 1685
148 LG:232648.1:2000SEP08 5161538H1 1551 1781
148 LG:232648.1:2000SEP08 5306855H1 1542 1658
148 LG:232648.1:2000SEP08 1932006F6 563 990
148 LG:232648.1:2000SEP08 g774443 670 890
148 LG:232648.1:2000SEP08 3083767H1 747 998
148 LG:232648.1:2000SEP08 3616968H1 839 1082
148 LG:232648.1:2000SEP08 522198H1 1013 1244
148 LG:232648.1:2000SEP08 7679565J1 489 1100
148 LG:232648.1:2000SEP08 963051R2 521 825
148 LG:232648.1:2000SEP08 963051H1 521 857
148 LG:232648.1:2000SEP08 7234232H1 550 955
148 LG:232648.1:2000SEP08 1932006H1 563 823
148 LG:232648.1:2000SEP08 5902804H1 1419 1643
148 LG:232648.1:2000SEP08 g1962798 1223 1382
148 LG:232648.1:2000SEP08 6161987H1 1444 1946
148 LG:232648.1:2000SEP08 820544H1 1222 1371
148 LG:232648.1:2000SEP08 820544R1 1222 1771
148 LG:232648.1:2000SEP08 6429960H1 1269 1595
148 LG:232648.1:2000SEP08 g1548524 1291 1786
148 LG:232648.1:2000SEP08 6409782H1 1372 1651
148 LG:232648.1:2000SEP08 1535447H1 1199 1405
148 LG:232648.1:2000SEP08 6572961H1 1201 1706
148 LG:232648.1:2000SEP08 6077657H1 1211 1283
148 LG:232648.1:2000SEP08 7700719J1 1064 1673
148 LG:232648.1:2000SEP08 3948553H1 1072 1347
148 LG:232648.1:2000SEP08 6291826H1 1102 1240
148 LG:232648.1:2000SEP08 6149863H1 1119 1540
148 LG:232648.1:2000SEP08 g888070 1184 1444
149 LG:1078420.1:2000SEP08 3199584F6 5 508
149 LG:1078420.1:2000SEP08 2617918F6 120 367
149 LG:1078420.1:2000SEP08 3004192F6 5 578
149 LG:1078420.1:2000SEP08 2966808F6 5 585
149 LG:1078420.1:2000SEP08 3051156F6 5 593
149 LG:1078420.1:2000SEP08 4331527F6 1 586
149 LG:1078420.1:2000SEP08 4331527H1 1 140
149 LG:1078420.1:2000SEP08 2929058F6 5 594
149 LG:1078420.1:2000SEP08 2990863F6 5 528
149 LG:1078420.1:2000SEP08 2929058T6 522 972
149 LG:1078420.1:2000SEP08 3076784F6 5 555
149 LG:1078420.1:2000SEP08 g3428455 17 505
149 LG:1078420.1:2000SEP08 g3674918 17 479
149 LG:1078420.1:2000SEP08 3001577F6 5 585
150 LG:1397599.1:2000SEP08 g678960 481 669
150 LG:1397599.1:2000SEP08 7082955H1 559 696
150 LG:1397599.1:2000SEP08 4625011T8 42 628
150 LG:1397599.1:2000SEP08 2807456F6 1 506
150 LG:1397599.1:2000SEP08 4625011F9 1 506
150 LG:1397599.1:2000SEP08 7630459J1 1 474
150 LG:1397599.1:2000SEP08 6750082H1 202 309
150 LG:1397599.1:2000SEP08 2807456H1 1 247
150 LG:1397599.1:2000SEP08 2807456T6 120 669
150 LG:1397599.1:2000SEP08 7660359H1 71 660
151 LG:1397655.2:2000SEP08 587588R6 414 749
151 LG:1397655.2:2000SEP08 7382264H1 338 738
151 LG:1397655.2:2000SEP08 7237485H1 1 486
151 LG:1397655.2:2000SEP08 g2806538 579 969
151 LG:1397655.2:2000SEP08 g3237896 643 968
151 LG:1397655.2:2000SEP08 g4149271 782 968
151 LG:1397655.2:2000SEP08 7658736J1 338 819
151 LG:1397655.2:2000SEP08 2839106T6 356 937
151 LG:1397655.2:2000SEP08 587588T6 414 925
151 LG:1397655.2:2000SEP08 2406278H1 660 890
151 LG:1397655.2:2000SEP08 g3179231 631 977
151 LG:1397655.2:2000SEP08 g3110195 508 971
152 LG:241055.1:2000SEP08 2600825F6 12 305
152 LG:241055.1:2000SEP08 2662733H1 1 224
152 LG:241055.1:2000SEP08 5732381F6 117 580
152 LG:241055.1:2000SEP08 2600825H1 9 89
152 LG:241055.1:2000SEP08 g2809760 227 688
152 LG:241055.1:2000SEP08 6137035H1 253 558
152 LG:241055.1:2000SEP08 4302282H1 255 537
TABLE 5
SEQ ID NO: Template ID Component ID Start Stop 171 L1:332655.2:2000SEP08 70029991 Dl 243 727 171 LI:332655.2:2000SEP08 70094732V1 265 796 171 U:332655.2:2000SEP08 3096881 HI 265 401 171 LI:332655.2:2000SEP08 70032023D1 276 777 171 LI:332655.2:2000SEP08 3098886H1 275 541 171 U:332655.2:2000SEP08 70030900D1 283 725 171 U:332655.2:2000SEP08 70099139V1 312 844 171 LI:332655.2:2000SEP08 7600091 HI 314 782 171 LI:332ό55.2:20uuSEP08 70033443D1 323 796 171 LI:332655.2:2000SEP08 70032717D1 330 860 171 LI:332655.2:2000SEP08 70094884V1 349 750 171 LI:332655.2:2000SEP08 70097593V1 374 841 171 LI:332655.2:2000SEP08 70100180V1 390 802 171 LI:332655.2:2000SEP08 70031426D1 399 950 171 U:332655.2:2000SEP08 70031765D1 400 958 171 LI:332655.2:2000SEP08 70032527D1 401 881 171 U:332655.2:2000SEP08 70094406V1 433 854 171 LI:332655.2:2000SEP08 g1959326 479 914 171 LI:332655.2:2000SEP08 70030000D1 484 976 171 U:332655.2:2000SEP08 5766510T8 484 616 171 LI:332655.2:2000SEP08 5070935H1 498 757 171 L1:332655.2:2000SEP08 70095137V1 506 919 171 LI:332655.2:2000SEP08 70094797V1 512 951 171 LI:332655.2:2000SEP08 70034268D1 540 1093 171 LI:332655.2:2000SEP08 70033906D1 568 1184 171 L1:332655.2:2000SEP08 70094604V1 589 1099 ~ 171 LI:332655.2:2000SEP08 70099025V1 588 1066 171 LI:332655.2:2000SEP08 570522H1 606 876 171 LI:332655,2:2000SEP08 70096090V1 633 1180 171 LI:332655.2:2000SEP08 70031980D1 636 1144 171 U:332655.2:2000SEP08 3872248T6 663 1142 171 U:332655.2:2000SEP08 70030235D1 675 1092 171 LI:332655.2:2000SEP08 70034276D1 700 1144 171 U:332655.2:2000SEP08 2850393T6 700 1142 171 L1:332655.2:2000SEP08 70033742D1 709 1144 171 LI:332655.2:2000SEP08 g2835010 716 1069 171 LI:332655.2:2000SEP08 70100118V1 734 1144 171 LI:332655.2:2000SEP08 70099040V1 737 1157 171 LI:332655.2:2000SEP08 70094459V1 739 1144 171 U:332655.2:2000SEP08 70095263V1 753 1144 171 LI:332655.2:2000SEP08 70097249V1 763 1102 171 LI:332655.2:2000SEP08 70097115V1 764 1181 171 LI:332655.2:2000SEP08 70096990V1 782 1144 171 
LI:332655.2:2000SEP08 70098090V1 832 1144
171 LI:332655.2:2000SEP08 g4763947 847 1144
171 LI:332655.2:2000SEP08 70095213V1 901 1370
171 LI:332655.2:2000SEP08 70097499V1 1 333
171 LI:332655.2:2000SEP08 70096744V1 1 351
171 LI:332655.2:2000SEP08 70095985V1 1 353
TABLE 5
SEQ ID NO: Template ID Component ID Start Stop
171 LI:332655.2:2000SEP08 70099142V1 1 440
171 LI:332655.2:2000SEP08 2850393F6 1 499
171 LI:332655.2:2000SEP08 70099114V1 1 442
171 LI:332655.2:2000SEP08 2850393H1 2 289
171 LI:332655.2:2000SEP08 861289H1 3 241
171 LI:332655.2:2000SEP08 70097343V1 3 476
171 LI:332655.2:2000SEP08 70098803V1 3 563
171 LI:332655.2:2000SEP08 2287640 6 5 485
171 LI:332655.2:2000SEP08 2287640H1 5 255
171 LI:332655.2:2000SEP08 3872248H1 11 301
171 LI:332655.2:2000SEP08 70032638D1 11 547
171 LI:332655.2:2000SEP08 3872248F6 11 517
171 LI:332655.2:2000SEP08 70095217V1 19 467
171 LI:332655.2:2000SEP08 70096220V1 44 487
171 LI:332655.2:2000SEP08 70032006D1 67 512
171 LI:332655.2:2000SEP08 70096911V1 82 538
171 LI:332655.2:2000SEP08 70032045D1 107 611
171 LI:332655.2:2000SEP08 70031433D1 110 658
171 LI:332655.2:2000SEP08 3403385H1 114 366
171 LI:332655.2:2000SEP08 1821848H1 133 370
171 LI:332655.2:2000SEP08 3531915H1 153 445
171 LI:332655.2:2000SEP08 2059050H1 160 405
171 LI:332655.2:2000SEP08 5742858H1 164 474
171 LI:332655.2:2000SEP08 2733553H1 185 415
171 LI:332655.2:2000SEP08 5025820H1 188 470
171 LI:332655.2:2000SEP08 70099520V1 192 521
171 LI:332655.2:2000SEP08 g1947735 206 551
171 LI:332655.2:2000SEP08 70095482V1 907 1144
171 LI:332655.2:2000SEP08 70095836V1 915 1144
171 LI:332655.2:2000SEP08 70099681V1 928 1144
171 LI:332655.2:2000SEP08 g4986367 978 1144
171 LI:332655.2:2000SEP08 g5661482 980 1142
171 LI:332655.2:2000SEP08 516193H1 1027 1136
172 LI:1184621.4:2000SEP08 8014017H1 295 895
172 LI:1184621.4:2000SEP08 7746161H1 484 1101
172 LI:1184621.4:2000SEP08 3355568H1 1 280
172 LI:1184621.4:2000SEP08 5767076H1 78 608
172 LI:1184621.4:2000SEP08 3355568F6 1 342
173 LI:2051386.1:2000SEP08 g1988131 86 329
173 LI:2051386.1:2000SEP08 8038751H1 1 616
173 LI:2051386.1:2000SEP08 4992337H1 204 321
173 LI:2051386.1:2000SEP08 3433251H1 203 459
173 LI:2051386.1:2000SEP08 6893208J1 349 658
173 LI:2051386.1:2000SEP08 5960485H1 386 793
173 LI:2051386.1:2000SEP08 g1383226 514 950
173 LI:2051386.1:2000SEP08 761331U1 624 1207
173 LI:2051386.1:2000SEP08 g1383397 827 1165
174 LG:362757.1:2000SEP08 4084603H1 1 242
174 LG:362757.1:2000SEP08 6273740F8 1 673
174 LG:362757.1:2000SEP08 6274096H1 1 473
174 LG:362757.1:2000SEP08 6274096T8 28 674
175 LG:406770.1:2000SEP08 g4983713 1990 2080
175 LG:406770.1:2000SEP08 7010884H1 1171 1471
175 LG:406770.1:2000SEP08 938514R6 1 258
175 LG:406770.1:2000SEP08 938514H1 1 320
175 LG:406770.1:2000SEP08 938514R1 1 498
175 LG:406770.1:2000SEP08 4176087H1 110 414
175 LG:406770.1:2000SEP08 g1058461 299 571
175 LG:406770.1:2000SEP08 g1058474 300 538
175 LG:406770.1:2000SEP08 g2278147 1750 2097
175 LG:406770.1:2000SEP08 g1687124 1945 2078
175 LG:406770.1:2000SEP08 g1687230 1969 2078
175 LG:406770.1:2000SEP08 g1194732 1806 2085
175 LG:406770.1:2000SEP08 g3721294 1871 2082
175 LG:406770.1:2000SEP08 630365H1 1898 2078
175 LG:406770.1:2000SEP08 g1058378 1091 1471
175 LG:406770.1:2000SEP08 g1229629 1109 1482
175 LG:406770.1:2000SEP08 4981119H1 1003 1266
175 LG:406770.1:2000SEP08 4981119F6 1003 1568
175 LG:406770.1:2000SEP08 g6025310 1012 1474
175 LG:406770.1:2000SEP08 g1058364 1090 1472
175 LG:406770.1:2000SEP08 235785R6 712 917
175 LG:406770.1:2000SEP08 235785H1 712 915
175 LG:406770.1:2000SEP08 584125H1 799 1085
175 LG:406770.1:2000SEP08 7216683H1 835 1366
175 LG:406770.1:2000SEP08 284730H1 843 1046
175 LG:406770.1:2000SEP08 g3148103 1110 1473
175 LG:406770.1:2000SEP08 1886580H1 1193 1473
175 LG:406770.1:2000SEP08 g1225171 1373 1473
175 LG:406770.1:2000SEP08 6534885H1 1412 1949
175 LG:406770.1:2000SEP08 4050307T6 1469 2042
175 LG:406770.1:2000SEP08 g1687281 1482 1975
175 LG:406770.1:2000SEP08 g1687155 1482 1975
175 LG:406770.1:2000SEP08 g4896587 1600 2061
175 LG:406770.1:2000SEP08 235785T6 1627 2044
175 LG:406770.1:2000SEP08 g2277685 1723 2083
175 LG:406770.1:2000SEP08 g1272076 613 984
175 LG:406770.1:2000SEP08 7610621H1 397 941
175 LG:406770.1:2000SEP08 4050307F6 416 892
175 LG:406770.1:2000SEP08 4050307H1 416 711
175 LG:406770.1:2000SEP08 541 36H1 451 678
176 LG:1094640.1:2000SEP08 4632658F6 1 436
176 LG:1094640.1:2000SEP08 4632658T6 254 704
176 LG:1094640.1:2000SEP08 g1784417 467 723
176 LG:1094640.1:2000SEP08 3011594H1 547 804
176 LG:1094640.1:2000SEP08 3011594F6 547 814
177 LG:001929.1:2000SEP08 7602642J1 504 1104
177 LG:001929.1:2000SEP08 g6570839 604 1029
178 LI:401322.1:2000SEP08 70499609V1 394 532
178 LI:401322.1:2000SEP08 70495192V1 467 817
178 LI:401322.1:2000SEP08 g3202357 462 808
178 LI:401322.1:2000SEP08 748032H1 581 800
179 LI:208748.1:2000SEP08 g5849939 2047 2505
179 LI:208748.1:2000SEP08 g5886544 2052 2505
179 LI:208748.1:2000SEP08 g7316701 2056 2509
179 LI:208748.1:2000SEP08 5547767T8 2062 2494
179 LI:208748.1:2000SEP08 g3897094 2067 2506
179 LI:208748.1:2000SEP08 g5589572 2075 2507
179 LI:208748.1:2000SEP08 7363924H1 1 421
179 LI:208748.1:2000SEP08 7216253H1 16 372
179 LI:208748.1:2000SEP08 2997925H1 152 412
179 LI:208748.1:2000SEP08 7216779H1 182 673
179 LI:208748.1:2000SEP08 4029524H1 317 583
179 LI:208748.1:2000SEP08 7677411H1 394 749
179 LI:208748.1:2000SEP08 70880160V1 433 910
179 LI:208748.1:2000SEP08 6976066H1 615 1197
179 LI:208748.1:2000SEP08 70881073V1 616 1048
179 LI:208748.1:2000SEP08 5589484F6 679 1082
179 LI:208748.1:2000SEP08 5589484H1 679 878
179 LI:208748.1:2000SEP08 5547767F8 699 1119
179 LI:208748.1:2000SEP08 5547767H1 699 903
179 LI:208748.1:2000SEP08 7690958J1 711 1197
179 LI:208748.1:2000SEP08 70879145V1 870 1332
179 LI:208748.1:2000SEP08 70882772V1 1028 1600
179 LI:208748.1:2000SEP08 70879812V1 1037 1688
179 LI:208748.1:2000SEP08 2289869H1 1045 1287
179 LI:208748.1:2000SEP08 3572255F6 1066 1491
179 LI:208748.1:2000SEP08 3572255H1 1066 1372
179 LI:208748.1:2000SEP08 g572859 1099 1487
179 LI:208748.1:2000SEP08 2866482H1 1162 1459
179 LI:208748.1:2000SEP08 7409011H1 1201 1820
179 LI:208748.1:2000SEP08 70882160V1 1231 1870
179 LI:208748.1:2000SEP08 7207904H1 1303 1812
179 LI:208748.1:2000SEP08 4779792H1 1332 1597
179 LI:208748.1:2000SEP08 2314765R6 1351 1756
179 LI:208748.1:2000SEP08 2314765H1 1351 1588
179 LI:208748.1:2000SEP08 70882288V1 1361 1839
179 LI:208748.1:2000SEP08 1573702H1 1390 1615
179 LI:208748.1:2000SEP08 1217087H1 1398 1631
179 LI:208748.1:2000SEP08 2741444F6 1408 1746
179 LI:208748.1:2000SEP08 2741444H1 1408 1662
179 LI:208748.1:2000SEP08 70879122V1 1458 2010
179 LI:208748.1:2000SEP08 6123696H1 1464 1941
179 LI:208748.1:2000SEP08 6127196H1 1464 1926
179 LI:208748.1:2000SEP08 3448750H1 1465 1730
179 LI:208748.1:2000SEP08 4209522H1 1473 1759
179 LI:208748.1:2000SEP08 767485H1 1505 1746
179 LI:208748.1:2000SEP08 6288827H1 1530 1812
179 LI:208748.1:2000SEP08 7240771H1 1537 2100
179 LI:208748.1:2000SEP08 4406924H1 1541 1808
179 LI:208748.1:2000SEP08 3019766H1 1584 1862
179 LI:208748.1:2000SEP08 4942607H1 1618 1904
179 LI:208748.1:2000SEP08 2642519F6 1651 2088
179 LI:208748.1:2000SEP08 2642519H1 1651 1884
179 LI:208748.1:2000SEP08 252812H1 1671 1973
179 LI:208748.1:2000SEP08 70881747V1 1719 1901
179 LI:208748.1:2000SEP08 70882562V1 1741 2165
179 LI:208748.1:2000SEP08 5189967H1 1755 2027
179 LI:208748.1:2000SEP08 7933991H1 1782 2352
179 LI:208748.1:2000SEP08 70882068V1 1815 2267
179 LI:208748.1:2000SEP08 2314765T6 1846 2467
179 LI:208748.1:2000SEP08 999167H1 1852 2104
179 LI:208748.1:2000SEP08 4029524T8 1863 2407
179 LI:208748.1:2000SEP08 g6697993 1922 2505
179 LI:208748.1:2000SEP08 2794808F6 1931 2431
179 LI:208748.1:2000SEP08 2794808H1 1931 2193
179 LI:208748.1:2000SEP08 2741444T6 1939 2462
179 LI:208748.1:2000SEP08 g5913894 1947 2333
179 LI:208748.1:2000SEP08 70881171V1 1968 2507
179 LI:208748.1:2000SEP08 70882247V1 1987 2503
179 LI:208748.1:2000SEP08 g5396272 2002 2469
179 LI:208748.1:2000SEP08 g3674187 2005 2466
179 LI:208748.1:2000SEP08 6855636H1 2019 2522
179 LI:208748.1:2000SEP08 g5636152 2028 2507
179 LI:208748.1:2000SEP08 2794808T6 2032 2468
179 LI:208748.1:2000SEP08 g5636140 2035 2507
179 LI:208748.1:2000SEP08 3315324H2 2039 2284
179 LI:208748.1:2000SEP08 g2563964 2044 2507
179 LI:208748.1:2000SEP08 g5888147 2080 2505
179 LI:208748.1:2000SEP08 g3899761 2091 2507
179 LI:208748.1:2000SEP08 6912743J1 2094 2492
179 LI:208748.1:2000SEP08 g2458063 2103 2510
179 LI:208748.1:2000SEP08 g1140592 2103 2501
179 LI:208748.1:2000SEP08 g3932768 2171 2506
179 LI:208748.1:2000SEP08 g4739395 2179 2501
179 LI:208748.1:2000SEP08 544703H1 2189 2418
179 LI:208748.1:2000SEP08 g2805935 2196 2333
179 LI:208748.1:2000SEP08 g565556 2208 2506
179 LI:208748.1:2000SEP08 3572255T6 2275 2467
179 LI:208748.1:2000SEP08 g2432032 2342 2507
180 LI:407242.1:2000SEP08 1271760H1 1 237
180 LI:407242.1:2000SEP08 7292121H1 38 497
180 LI:407242.1:2000SEP08 3237762H1 104 309
180 LI:407242.1:2000SEP08 2862509H1 151 429
180 LI:407242.1:2000SEP08 2862509F6 151 580
180 LI:407242.1:2000SEP08 5457948H1 217 480
180 LI:407242.1:2000SEP08 065132H1 523 692
180 LI:407242.1:2000SEP08 6254643H1 532 776
180 LI:407242.1:2000SEP08 6987868H1 549 1091
180 LI:407242.1:2000SEP08 6984055H1 549 1016
180 LI:407242.1:2000SEP08 5089291H1 680 943
180 LI:407242.1:2000SEP08 3232664H2 765 1004
180 LI:407242.1:2000SEP08 g2001171 787 1056
180 LI:407242.1:2000SEP08 4285194H1 873 1154
180 LI:407242.1:2000SEP08 7180376H1 951 1496
180 LI:407242.1:2000SEP08 482811H1 979 1187
180 LI:407242.1:2000SEP08 483426H1 979 1208
180 LI:407242.1:2000SEP08 482811R6 979 1466
180 LI:407242.1:2000SEP08 g760448 1499 1738
180 LI:407242.1:2000SEP08 7158891H1 1631 2045
180 LI:407242.1:2000SEP08 5260775H1 1659 1901
180 LI:407242.1:2000SEP08 g4902429 1757 1897
180 LI:407242.1:2000SEP08 7206772H1 1880 2043
180 LI:407242.1:2000SEP08 894787H1 1914 2027
180 LI:407242.1:2000SEP08 2862509T6 1164 1533
180 LI:407242.1:2000SEP08 6771708H1 1198 1755
180 LI:407242.1:2000SEP08 g4525016 1223 1533
180 LI:407242.1:2000SEP08 g4395234 1229 1533
180 LI:407242.1:2000SEP08 4041841H1 1287 1561
180 LI:407242.1:2000SEP08 743874H1 1330 1577
180 LI:407242.1:2000SEP08 3355268H1 1342 1537
180 LI:407242.1:2000SEP08 2725709H1 1387 1635
180 LI:407242.1:2000SEP08 2725709F6 1387 1745
180 LI:407242.1:2000SEP08 1952718R6 1932 2027
180 LI:407242.1:2000SEP08 1952718H1 1932 2027
181 LI:403409.1:2000SEP08 71599134V1 772 1403
181 LI:403409.1:2000SEP08 70522973V1 793 1395
181 LI:403409.1:2000SEP08 71602119V1 891 1390
181 LI:403409.1:2000SEP08 6532643H1 803 1384
181 LI:403409.1:2000SEP08 70522446V1 733 1374
181 LI:403409.1:2000SEP08 70523188V1 696 1360
181 LI:403409.1:2000SEP08 70524769V1 653 1352
181 LI:403409.1:2000SEP08 71556912V1 924 1343
181 LI:403409.1:2000SEP08 70522614V1 765 1339
181 LI:403409.1:2000SEP08 70522594V1 717 1338
181 LI:403409.1:2000SEP08 6329688H1 705 1325
181 LI:403409.1:2000SEP08 71603353V1 768 1418
181 LI:403409.1:2000SEP08 71603455V1 1252 1415
181 LI:403409.1:2000SEP08 71601114V1 724 1408
181 LI:403409.1:2000SEP08 71603450V1 1284 1460
181 LI:403409.1:2000SEP08 71598741V1 892 1454
181 LI:403409.1:2000SEP08 70522814V1 774 1452
181 LI:403409.1:2000SEP08 70646182V1 896 1439
181 LI:403409.1:2000SEP08 71599649V1 913 1435
181 LI:403409.1:2000SEP08 71598104V1 822 1430
181 LI:403409.1:2000SEP08 70522541V1 755 1409
181 LI:403409.1:2000SEP08 70522436V1 771 1422
181 LI:403409.1:2000SEP08 71598878V1 781 1510
181 LI:403409.1:2000SEP08 71600060V1 923 1503
181 LI:403409.1:2000SEP08 70526615V1 947 1493
181 LI:403409.1:2000SEP08 71599305V1 1044 1491
181 LI:403409.1:2000SEP08 71603274V1 949 1492
181 LI:403409.1:2000SEP08 71601895V1 910 1488
181 LI:403409.1:2000SEP08 7068876H1 1069 1483
181 LI:403409.1:2000SEP08 70646150V1 1000 1479
181 LI:403409.1:2000SEP08 71602212V1 862 1475
181 LI:403409.1:2000SEP08 70522443V1 933 1622
181 LI:403409.1:2000SEP08 71603173V1 979 1606
181 LI:403409.1:2000SEP08 71598064V1 895 1581
181 LI:403409.1:2000SEP08 4068325T6 1018 1641
181 LI:403409.1:2000SEP08 71600461V1 918 1430
181 LI:403409.1:2000SEP08 71601649V1 849 1425
181 LI:403409.1:2000SEP08 71601959V1 1803 2288
181 LI:403409.1:2000SEP08 g1998847 1973 2275
181 LI:403409.1:2000SEP08 6869946H1 1717 2275
181 LI:403409.1:2000SEP08 g4510725 1831 2272
181 LI:403409.1:2000SEP08 3541104H1 1991 2272
181 LI:403409.1:2000SEP08 70525434V1 1668 2257
181 LI:403409.1:2000SEP08 71600071V1 1438 2211
181 LI:403409.1:2000SEP08 6298042H1 1871 2146
181 LI:403409.1:2000SEP08 70531485V1 1972 2140
181 LI:403409.1:2000SEP08 71570166V1 1946 2130
181 LI:403409.1:2000SEP08 71572989V1 1947 2130
181 LI:403409.1:2000SEP08 71599459V1 1419 2126
181 LI:403409.1:2000SEP08 7468129H1 1672 2123
181 LI:403409.1:2000SEP08 71599020V1 1377 2056
181 LI:403409.1:2000SEP08 71599030V1 1330 2056
181 LI:403409.1:2000SEP08 71600742V1 1513 2002
181 LI:403409.1:2000SEP08 70526344V1 1387 1988
181 LI:403409.1:2000SEP08 70522547V1 1396 1983
181 LI:403409.1:2000SEP08 71600018V1 1280 1965
181 LI:403409.1:2000SEP08 71598137V1 1757 1940
181 LI:403409.1:2000SEP08 6412487H1 1494 1931
181 LI:403409.1:2000SEP08 71559794V1 1463 1905
181 LI:403409.1:2000SEP08 71602804V1 1135 1832
181 LI:403409.1:2000SEP08 71600670V1 1402 1823
181 LI:403409.1:2000SEP08 5534143H1 1578 1820
181 LI:403409.1:2000SEP08 71602415V1 1308 1755
181 LI:403409.1:2000SEP08 71601115V1 1097 1726
181 LI:403409.1:2000SEP08 71600852V1 1086 1708
181 LI:403409.1:2000SEP08 70522029V1 1187 1709
181 LI:403409.1:2000SEP08 g1383466 1395 1696
181 LI:403409.1:2000SEP08 g3254782 1346 1682
181 LI:403409.1:2000SEP08 g1400213 1337 1682
181 LI:403409.1:2000SEP08 71597884V1 1113 1675
181 LI:403409.1:2000SEP08 70525805V1 1088 1643
181 LI:403409.1:2000SEP08 70522266V1 920 1644
181 LI:403409.1:2000SEP08 71602346V1 1075 1639
181 LI:403409.1:2000SEP08 70525328V1 1039 1641
181 LI:403409.1:2000SEP08 6141236T8 989 1579
181 LI:403409.1:2000SEP08 70523039V1 862 1577
181 LI:403409.1:2000SEP08 71599563V1 934 1577
181 LI:403409.1:2000SEP08 3256362H1 1308 1554
181 LI:403409.1:2000SEP08 71582870V1 1082 1322
181 LI:403409.1:2000SEP08 70523160V1 587 1316
181 LI:403409.1:2000SEP08 g6570650 881 1310
181 LI:403409.1:2000SEP08 71602205V1 830 1306
181 LI:403409.1:2000SEP08 71599405V1 861 1305
181 LI:403409.1:2000SEP08 71599960V1 738 1299
181 LI:403409.1:2000SEP08 71600217V1 738 1298
181 LI:403409.1:2000SEP08 71598742V1 553 1294
181 LI:403409.1:2000SEP08 70525194V1 724 1283
181 LI:403409.1:2000SEP08 71601307V1 716 1270
181 LI:403409.1:2000SEP08 70524986V1 585 1261
181 LI:403409.1:2000SEP08 70524908V1 682 1254
181 LI:403409.1:2000SEP08 70524071V1 682 1253
181 LI:403409.1:2000SEP08 71598545V1 680 1248
181 LI:403409.1:2000SEP08 6333709H1 705 1240
181 LI:403409.1:2000SEP08 7377730H1 697 1236
181 LI:403409.1:2000SEP08 71600305V1 716 1196
181 LI:403409.1:2000SEP08 71601960V1 627 1190
181 LI:403409.1:2000SEP08 71602858V1 513 1189
181 LI:403409.1:2000SEP08 71601208V1 616 1179
181 LI:403409.1:2000SEP08 7080559H1 760 1170
181 LI:403409.1:2000SEP08 71598389V1 650 1156
181 LI:403409.1:2000SEP08 71571578V1 747 1126
181 LI:403409.1:2000SEP08 71601452V1 535 im
181 LI:403409.1:2000SEP08 71598584V1 402 1113
181 LI:403409.1:2000SEP08 4068325F6 515 1108
181 LI:403409.1:2000SEP08 71599561V1 412 1098
181 LI:403409.1:2000SEP08 71603370V1 588 1074
181 LI:403409.1:2000SEP08 71598865V1 713 1079
181 LI:403409.1:2000SEP08 71602389V1 520 1073
181 LI:403409.1:2000SEP08 71603466V1 374 1066
181 LI:403409.1:2000SEP08 71598103V1 595 1051
181 LI:403409.1:2000SEP08 71600023V1 485 1043
181 LI:403409.1:2000SEP08 71600063V1 486 1043
181 LI:403409.1:2000SEP08 71601379V1 339 1041
181 LI:403409.1:2000SEP08 71600010V1 353 1027
181 LI:403409.1:2000SEP08 71599332V1 386 1008
181 LI:403409.1:2000SEP08 71602290V1 485 982
181 LI:403409.1:2000SEP08 71599504V1 282 941
181 LI:403409.1:2000SEP08 70525699V1 339 890
181 LI:403409.1:2000SEP08 373638H1 663 880
181 LI:403409.1:2000SEP08 3751233H1 577 867
181 LI:403409.1:2000SEP08 5812190H1 644 845
181 LI:403409.1:2000SEP08 5812189H1 644 840
181 LI:403409.1:2000SEP08 3629455H1 560 835
181 LI:403409.1:2000SEP08 4068325H1 517 804
181 LI:403409.1:2000SEP08 311752H1 657 744
181 LI:403409.1:2000SEP08 70535025V1 354 705
181 LI:403409.1:2000SEP08 6141236F8 28 633
181 LI:403409.1:2000SEP08 70525881V1 102 628
181 LI:403409.1:2000SEP08 3409854H1 357 618
181 LI:403409.1:2000SEP08 6328822H1 47 592
181 LI:403409.1:2000SEP08 70522980V1 1 589
181 LI:403409.1:2000SEP08 71599955V1 1 538
181 LI:403409.1:2000SEP08 70524819V1 1 487
181 LI:403409.1:2000SEP08 3323733H1 192 461
181 LI:403409.1:2000SEP08 71601789V1 1 452
181 LI:403409.1:2000SEP08 71598908V1 1 435
181 LI:403409.1:2000SEP08 6134014H1 110 401
181 LI:403409.1:2000SEP08 6141236H1 28 370
181 LI:403409.1:2000SEP08 3388023H1 89 361
181 LI:403409.1:2000SEP08 2723940F6 1 333
181 LI:403409.1:2000SEP08 3126704H1 22 298
181 LI:403409.1:2000SEP08 g1998848 22 296
181 LI:403409.1:2000SEP08 5643224H1 22 275
181 LI:403409.1:2000SEP08 3256088H1 24 259
181 LI:403409.1:2000SEP08 5499626H1 59 250
181 LI:403409.1:2000SEP08 1728945H1 26 246
181 LI:403409.1:2000SEP08 2723940H1 1 243
181 LI:403409.1:2000SEP08 5500309H1 25 226
181 LI:403409.1:2000SEP08 5499909H1 1 174
181 LI:403409.1:2000SEP08 5500026H1 3 160
182 LI:450798.1:2000SEP08 70998432V1 348 871
182 LI:450798.1:2000SEP08 71300061V1 423 1033
182 LI:450798.1:2000SEP08 70996277V1 454 1126
182 LI:450798.1:2000SEP08 5906243F8 1 150
182 LI:450798.1:2000SEP08 71298861V1 1 381
182 LI:450798.1:2000SEP08 70997953V1 1252 1785
182 LI:450798.1:2000SEP08 70998492V1 1291 1779
182 LI:450798.1:2000SEP08 71299576V1 1276 1779
182 LI:450798.1:2000SEP08 70998290V1 1301 1754
182 LI:450798.1:2000SEP08 71001007V1 1487 1666
182 LI:450798.1:2000SEP08 71002412V1 1487 1665
182 LI:450798.1:2000SEP08 71284994V1 1622 1784
182 LI:450798.1:2000SEP08 70996506V1 1158 1779
182 LI:450798.1:2000SEP08 71299572V1 1134 1758
182 LI:450798.1:2000SEP08 70998139V1 1160 1808
182 LI:450798.1:2000SEP08 70996451V1 1143 1779
182 LI:450798.1:2000SEP08 71299204V1 694 1129
182 LI:450798.1:2000SEP08 70995811V1 707 1320
182 LI:450798.1:2000SEP08 70996755V1 711 1216
182 LI:450798.1:2000SEP08 71298736V1 719 1271
182 LI:450798.1:2000SEP08 71298890V1 735 1345
182 LI:450798.1:2000SEP08 70995249V1 749 1340
182 LI:450798.1:2000SEP08 70995662V1 1 528
182 LI:450798.1:2000SEP08 70998510V1 1 541
182 LI:450798.1:2000SEP08 70997263V1 269 814
182 LI:450798.1:2000SEP08 70995503V1 328 878
182 LI:450798.1:2000SEP08 70997710V1 344 979
182 LI:450798.1:2000SEP08 70996031V1 344 978
182 LI:450798.1:2000SEP08 70998404V1 1 491
182 LI:450798.1:2000SEP08 70995922V1 1 442
182 LI:450798.1:2000SEP08 5906243H1 1 293
182 LI:450798.1:2000SEP08 5906243F6 2 567
182 LI:450798.1:2000SEP08 70996045V1 194 666
182 LI:450798.1:2000SEP08 70997325V1 1144 1702
182 LI:450798.1:2000SEP08 70997726V1 1145 1791
182 LI:450798.1:2000SEP08 71004095V1 1143 1376
182 LI:450798.1:2000SEP08 71003787V1 1143 1374
182 LI:450798.1:2000SEP08 70998190V1 1173 1779
182 LI:450798.1:2000SEP08 70995151V1 755 960
182 LI:450798.1:2000SEP08 70997688V1 765 1497
182 LI:450798.1:2000SEP08 70997481V1 775 1374
182 LI:450798.1:2000SEP08 70997157V1 794 1372
182 LI:450798.1:2000SEP08 70954505V1 799 1033
182 LI:450798.1:2000SEP08 70954308V1 805 1204
182 LI:450798.1:2000SEP08 70995706V1 664 1316
182 LI:450798.1:2000SEP08 70996760V1 541 1112
182 LI:450798.1:2000SEP08 70996660V1 539 1114
182 LI:450798.1:2000SEP08 70997885V1 565 1098
182 LI:450798.1:2000SEP08 71003084V1 569 730
182 LI:450798.1:2000SEP08 6594478H1 647 1209
182 LI:450798.1:2000SEP08 70995050V1 651 1339
182 LI:450798.1:2000SEP08 71032609V1 495 1112
182 LI:450798.1:2000SEP08 70996627V1 807 1350
182 LI:450798.1:2000SEP08 70997576V1 831 1403
182 LI:450798.1:2000SEP08 70997754V1 846 1538
182 LI:450798.1:2000SEP08 71299090V1 850 1467
182 LI:450798.1:2000SEP08 70998737V1 850 1470
182 LI:450798.1:2000SEP08 70996977V1 935 1569
182 LI:450798.1:2000SEP08 71300385V1 938 1609
182 LI:450798.1:2000SEP08 71298713V1 972 1305
182 LI:450798.1:2000SEP08 71300270V1 985 1677
182 LI:450798.1:2000SEP08 5906243T9 1026 1666
182 LI:450798.1:2000SEP08 70995801V1 1077 1679
182 LI:450798.1:2000SEP08 70998862V1 1116 1779
182 LI:450798.1:2000SEP08 70998830V1 1156 1779
183 LI:410317.1:2000SEP08 4999885F6 264 643
186 LG:998844.1:2000SEP08 g6569256 1 394
186 LG:998844.1:2000SEP08 6764364J1 1 294
186 LG:998844.1:2000SEP08 g273443 2 422
187 LG:1043787.1:2000SEP08 6794461F8 1 591
187 LG:1043787.1:2000SEP08 6794461H1 2 449
187 LG:1043787.1:2000SEP08 6798630H1 24 445
187 LG:1043787.1:2000SEP08 6798630F8 24 481
187 LG:1043787.1:2000SEP08 6798660H1 24 340
187 LG:1043787.1:2000SEP08 6798660F8 24 591
187 LG:1043787.1:2000SEP08 6794461T8 79 732
187 LG:1043787.1:2000SEP08 6798660T8 241 696
188 LG:1098931.16:2000SEP08 7664285J1 1 582
188 LG:1098931.16:2000SEP08 1671805H1 291 515
189 LG:199423.2:2000SEP08 2757508H1 1 257
190 LI:1075297.1:2000SEP08 5334749F6 1 510
190 LI:1075297.1:2000SEP08 5334749F8 1 594
190 LI:1075297.1:2000SEP08 5334749H1 1 128
190 LI:1075297.1:2000SEP08 5334749T8 245 810
191 LI:1043321.1:2000SEP08 6796976H1 1 254
191 LI:1043321.1:2000SEP08 6796976F8 45 586
191 LI:1043321.1:2000SEP08 6796976T8 218 780
192 LI:297070.1:2000SEP08 7042156H1 457 984
192 LI:297070.1:2000SEP08 7766217J1 464 1125
192 LI:297070.1:2000SEP08 1564083H1 682 869
192 LI:297070.1:2000SEP08 g3676974 1 430
192 LI:297070.1:2000SEP08 2640925H1 228 476
192 LI:297070.1:2000SEP08 7766217H1 263 857
192 LI:297070.1:2000SEP08 7042156F8 457 983
192 LI:297070.1:2000SEP08 5539166F6 756 1096
192 LI:297070.1:2000SEP08 5539166H1 756 965
193 LI:1085041.1:2000SEP08 6796417H1 1 340
193 LI:1085041.1:2000SEP08 6796417F8 5 601
193 LI:1085041.1:2000SEP08 6796417T8 139 612
193 LI:1085041.1:2000SEP08 6793856T8 331 640
193 LI:1085041.1:2000SEP08 6793856H1 332 481
194 LI:1071544.1:2000SEP08 6792174F8 1 578
194 LI:1071544.1:2000SEP08 6792174H1 1 570
194 LI:1071544.1:2000SEP08 6792174T8 2 616
195 LI:2052480.1:2000SEP08 70844959V1 1025 1696
195 LI:2052480.1:2000SEP08 2243248H1 1114 1368
195 LI:2052480.1:2000SEP08 71045682V1 411 572
195 LI:2052480.1:2000SEP08 71223029V1 493 1123
195 LI:2052480.1:2000SEP08 g657732 539 917
195 LI:2052480.1:2000SEP08 70844094V1 376 1062
195 LI:2052480.1:2000SEP08 71224053V1 258 787
195 LI:2052480.1:2000SEP08 6787834H1 293 735
195 LI:2052480.1:2000SEP08 5072694H1 28 118
195 LI:2052480.1:2000SEP08 g3756288 1287 1719
195 LI:2052480.1:2000SEP08 4198326F6 1289 1816
195 LI:2052480.1:2000SEP08 2924556H1 895 1117
195 LI:2052480.1:2000SEP08 70846478V1 763 1171
195 LI:2052480.1:2000SEP08 2925554H1 1192 1483
195 LI:2052480.1:2000SEP08 g7045146 1249 1684
195 LI:2052480.1:2000SEP08 4198326H1 1289 1591
195 LI:2052480.1:2000SEP08 5072595H1 29 120
195 LI:2052480.1:2000SEP08 70846219V1 38 642
195 LI:2052480.1:2000SEP08 5072595F8 72 225
195 LI:2052480.1:2000SEP08 71223294V1 82 618
195 LI:2052480.1:2000SEP08 g4852852 792 950
195 LI:2052480.1:2000SEP08 70845403V1 817 1400
195 LI:2052480.1:2000SEP08 3696915F6 824 1268
195 LI:2052480.1:2000SEP08 3696915H1 824 1120
195 LI:2052480.1:2000SEP08 70845925V1 891 1417
195 LI:2052480.1:2000SEP08 70844782V1 740 1140
195 LI:2052480.1:2000SEP08 6934749H1 760 1346
195 LI:2052480.1:2000SEP08 71223146V1 553 1086
195 LI:2052480.1:2000SEP08 70844786V1 732 1359
195 LI:2052480.1:2000SEP08 g1721215 1326 1488
195 LI:2052480.1:2000SEP08 7424361T1 1323 1539
195 LI:2052480.1:2000SEP08 70845065V1 927 1522
195 LI:2052480.1:2000SEP08 71224205V1 936 1539
195 LI:2052480.1:2000SEP08 70842872V1 1019 1466
195 LI:2052480.1:2000SEP08 70845463V1 1291 1784
195 LI:2052480.1:2000SEP08 70845678V1 1147 1763
195 LI:2052480.1:2000SEP08 g657793 1 55
195 LI:2052480.1:2000SEP08 1501410F6 1 288
195 LI:2052480.1:2000SEP08 1501410H1 1 208
195 LI:2052480.1:2000SEP08 70846047V1 10 560
195 LI:2052480.1:2000SEP08 71223828V1 904 1354
196 LG:450105.1:2000SEP08 5912415F8 1 376
196 LG:450105.1:2000SEP08 5912415H1 1 299
196 LG:450105.1:2000SEP08 5912415F6 12 565
196 LG:450105.1:2000SEP08 5912415T9 66 535
197 LG:450581.1:2000SEP08 5906909T6 1 410
197 LG:450581.1:2000SEP08 5906909F6 1 501
197 LG:450581.1:2000SEP08 5906909T9 1 432
197 LG:450581.1:2000SEP08 5906909T8 1 359
197 LG:450581.1:2000SEP08 5906909F8 10 484
197 LG:450581.1:2000SEP08 5906909H1 10 302
198 LG:450887.1:2000SEP08 5911592T6 1 523
198 LG:450887.1:2000SEP08 5911592H1 1 290
198 LG:450887.1:2000SEP08 5911592T8 1 473
198 LG:450887.1:2000SEP08 5911592F8 1 569
198 LG:450887.1:2000SEP08 5911592T9 1 473
198 LG:450887.1:2000SEP08 5911592F6 1 565
199 LG:460809.1:2000SEP08 4119207F6 1 336
199 LG:460809.1:2000SEP08 4119207T6 1 336
199 LG:460809.1:2000SEP08 4119207H1 1 175
200 LG:452089.1:2000SEP08 5905252H1 35 313
200 LG:452089.1:2000SEP08 5905252T9 236 685
200 LG:452089.1:2000SEP08 5905252T6 376 823
200 LG:452089.1:2000SEP08 5905252F8 1 497
200 LG:452089.1:2000SEP08 5905252F6 35 576
201 LG:1099416.1:2000SEP08 6729842H1 1 412
201 LG:1099416.1:2000SEP08 5401350H1 1 105
201 LG:1099416.1:2000SEP08 6057617H1 56 643
201 LG:1099416.1:2000SEP08 5401350T9 82 666
201 LG:1099416.1:2000SEP08 g3214092 406 782
201 LG:1099416.1:2000SEP08 3524102H1 479 779
202 LG:255713.1:2000SEP08 3439989F6 1 433
202 LG:255713.1:2000SEP08 3439989H1 1 242
202 LG:255713.1:2000SEP08 g1689833 19 402
202 LG:255713.1:2000SEP08 2272896H1 21 157
202 LG:255713.1:2000SEP08 4884069T6 39 468
202 LG:255713.1:2000SEP08 4541467H1 50 115
202 LG:255713.1:2000SEP08 3439989T6 204 480
202 LG:255713.1:2000SEP08 4644925H1 239 467
203 LG:998903.1:2000SEP08 6271004T8 1 378
203 LG:998903.1:2000SEP08 6271004H2 1 277
203 LG:998903.1:2000SEP08 6271004F8 1 604
204 LG:1119656.1:2000SEP08 7741826H1 1 617
204 LG:1119656.1:2000SEP08 4969073H1 26 111
204 LG:1119656.1:2000SEP08 go199339 69 339
205 LG:1096907.1:2000SEP08 6793205H1 1 353
205 LG:1096907.1:2000SEP08 6938916H1 2 455
205 LG:1096907.1:2000SEP08 6793205F8 17 330
205 LG:1096907.1:2000SEP08 6791360F8 18 350
205 LG:1096907.1:2000SEP08 6798621F8 18 350
205 LG:1096907.1:2000SEP08 6791360T8 18 245
205 LG:1096907.1:2000SEP08 6798621H1 18 350
205 LG:1096907.1:2000SEP08 6791360H1 19 350
205 LG:1096907.1:2000SEP08 6793393F8 21 346
205 LG:1096907.1:2000SEP08 g1667827 30 408
205 LG:1096907.1:2000SEP08 g1667806 398 453
205 LG:1096907.1:2000SEP08 g1798962 25 283
206 LG:1323741.1:2000SEP08 6795323T8 101 459
206 LG:1323741.1:2000SEP08 6790451F8 98 528
206 LG:1323741.1:2000SEP08 6790451T8 98 420
206 LG:1323741.1:2000SEP08 6794426H1 101 532
206 LG:1323741.1:2000SEP08 6796814T8 101 427
206 LG:1323741.1:2000SEP08 6795323F8 101 533
206 LG:1323741.1:2000SEP08 6796814F8 101 577
206 LG:1323741.1:2000SEP08 6795425T8 101 431
206 LG:1323741.1:2000SEP08 6795425F8 101 534
206 LG:1323741.1:2000SEP08 6795323H1 102 505
206 LG:1323741.1:2000SEP08 6795425H1 104 432
206 LG:1323741.1:2000SEP08 6796814H1 104 505
206 LG:1323741.1:2000SEP08 6790451H1 108 528
206 LG:1323741.1:2000SEP08 4194768H1 110 429
206 LG:1323741.1:2000SEP08 5220546H1 110 285
206 LG:1323741.1:2000SEP08 g1614987 116 513
206 LG:1323741.1:2000SEP08 5163260H1 122 370
206 LG:1323741.1:2000SEP08 2906282H1 148 391
206 LG:1323741.1:2000SEP08 732838H1 162 243
206 LG:1323741.1:2000SEP08 g1614885 204 545
206 LG:1323741.1:2000SEP08 5794252H1 1 268
206 LG:1323741.1:2000SEP08 5794252F6 1 268
206 LG:1323741.1:2000SEP08 5792265H1 20 268
206 LG:1323741.1:2000SEP08 5790210H1 1 268
206 LG:1323741.1:2000SEP08 5794810H1 1 268
207 LG:1098372.1:2000SEP08 3033193F6 1 272
207 LG:1098372.1:2000SEP08 3033193H1 1 216
207 LG:1098372.1:2000SEP08 5664835T9 31 414
207 LG:1098372.1:2000SEP08 3033193T6 129 443
208 LG:1006783.1:2000SEP08 6792143T8 1 426
208 LG:1006783.1:2000SEP08 6792143F8 1 552
208 LG:1006783.1:2000SEP08 6792143H1 1 532
209 LG:1097562.1:2000SEP08 6796616H1 1 454
209 LG:1097562.1:2000SEP08 6796616T8 1 408
209 LG:1097562.1:2000SEP08 6796616F8 1 510
209 LG:1097562.1:2000SEP08 6798275T8 315 406
209 LG:1097562.1:2000SEP08 6798275H1 1 492
210 LG:998868.1:2000SEP08 6269958F8 1 345
210 LG:998868.1:2000SEP08 6269958H1 1 469
210 LG:998868.1:2000SEP08 6269958T8 293 868
211 LG:1063383.1:2000SEP08 198987H1 1 162
211 LG:1063383.1:2000SEP08 198987R6 1 503
211 LG:1063383.1:2000SEP08 g2885034 702 894
211 LG:1063383.1:2000SEP08 176170H1 12 334
211 LG:1063383.1:2000SEP08 7756883J1 73 671
211 LG:1063383.1:2000SEP08 2531743H1 109 335
211 LG:1063383.1:2000SEP08 146685R1 113 609
211 LG:1063383.1:2000SEP08 146685H1 121 291
211 LG:1063383.1:2000SEP08 4935334H1 213 487
211 LG:1063383.1:2000SEP08 6327776H1 328 623
211 LG:1063383.1:2000SEP08 146685F1 360 1010
211 LG:1063383.1:2000SEP08 198987T6 371 971
211 LG:1063383.1:2000SEP08 1544811H1 458 635
211 LG:1063383.1:2000SEP08 3567613H1 537 852
211 LG:1063383.1:2000SEP08 g3254441 595 1010
211 LG:1063383.1:2000SEP08 2232291H1 617 857
211 LG:1063383.1:2000SEP08 5508171H1 814 1018
212 LG:1400567.1:2000SEP08 7124338H1 138 589
212 LG:1400567.1:2000SEP08 6096304H1 249 408
212 LG:1400567.1:2000SEP08 g5672428 1 366
212 LG:1400567.1:2000SEP08 g6473179 38 363
213 LI:449404.1:2000SEP08 5908301H1 1 311
213 LI:449404.1:2000SEP08 5908301F8 1 519
213 LI:449404.1:2000SEP08 6271267F8 24 643
213 LI:449404.1:2000SEP08 6271267H2 24 492
213 LI:449404.1:2000SEP08 5908301T9 248 582
214 LI:449941.2:2000SEP08 5911361F8 1 206
214 LI:449941.2:2000SEP08 5911361H1 1 301
214 LI:449941.2:2000SEP08 5911361T8 238 780
214 LI:449941.2:2000SEP08 5911361T9 249 710
215 LI:450229.1:2000SEP08 6268506H1 1 474
215 LI:450229.1:2000SEP08 5913065F7 1 509
215 LI:450229.1:2000SEP08 6268506F8 1 589
215 LI:450229.1:2000SEP08 5913065F8 1 567
215 LI:450229.1:2000SEP08 6268506T8 1 600
215 LI:450229.1:2000SEP08 5913065H1 7 303
215 LI:450229.1:2000SEP08 5913065T7 118 715
216 LI:450399.3:2000SEP08 71464335V1 1 627
216 LI:450399.3:2000SEP08 71452422V1 1 248
216 LI:450399.3:2000SEP08 71457485V1 1 615
216 LI:450399.3:2000SEP08 71465203V1 1 612
216 LI:450399.3:2000SEP08 71465621V1 1 603
216 LI:450399.3:2000SEP08 71467288V1 1 526
216 LI:450399.3:2000SEP08 5910662F8 1 510
216 LI:450399.3:2000SEP08 71449251V1 1 232
216 LI:450399.3:2000SEP08 71461359V1 1 488
216 LI:450399.3:2000SEP08 71457006V1 1 318
216 LI:450399.3:2000SEP08 71465838V1 1 511
216 LI:450399.3:2000SEP08 71440536V1 1 405
216 LI:450399.3:2000SEP08 71455851V1 1 248
216 LI:450399.3:2000SEP08 5910662F6 1 552
216 LI:450399.3:2000SEP08 71455734V1 1 143
216 LI:450399.3:2000SEP08 5910662H1 1 295
216 LI:450399.3:2000SEP08 5910662T6 9 601
216 LI:450399.3:2000SEP08 71464857V1 17 469
216 LI:450399.3:2000SEP08 71458424V1 53 155
216 LI:450399.3:2000SEP08 5910662T9 67 572
216 LI:450399.3:2000SEP08 5910662T8 146 589
216 LI:450399.3:2000SEP08 71439746V1 173 327
216 LI:450399.3:2000SEP08 71437264V1 205 476
216 LI:450399.3:2000SEP08 71439082V1 353 504
216 LI:450399.3:2000SEP08 71463937V1 367 571
216 LI:450399.3:2000SEP08 71459908V1 430 693
217 LI:455771.1:2000SEP08 5911540F8 1 460
217 LI:455771.1:2000SEP08 5911540H1 1 250
217 LI:455771.1:2000SEP08 5911540T9 27 568
217 LI:455771.1:2000SEP08 5911540T8 78 569
218 LI:720459.1:2000SEP08 8081371H2 1 347
218 LI:720459.1:2000SEP08 6571208H1 1 472
218 LI:720459.1:2000SEP08 6571208F8 1 489
218 LI:720459.1:2000SEP08 6571208T8 11 564
219 LI:723156.1:2000SEP08 6270651T8 1 429
219 LI:723156.1:2000SEP08 6270651H2 26 533
219 LI:723156.1:2000SEP08 6270651F8 26 508
220 LI:728055.1:2000SEP08 3565507T6 98 504
220 LI:728055.1:2000SEP08 6793060H1 1 476
221 LI:1020789.1:2000SEP08 6794842H1 1 442
221 LI:1020789.1:2000SEP08 6794842T8 1 405
221 LI:1020789.1:2000SEP08 6794842F8 1 316
222 LI:1071728.1:2000SEP08 6790925T8 1 375
222 LI:1071728.1:2000SEP08 6792395T8 1 380
222 LI:1071728.1:2000SEP08 6792395F8 1 498
222 LI:1071728.1:2000SEP08 6792395H1 1 487
222 LI:1071728.1:2000SEP08 6790925F8 1 482
222 LI:1071728.1:2000SEP08 6791755T8 1 256
222 LI:1071728.1:2000SEP08 6791755H1 1 485
222 LI:1071728.1:2000SEP08 6790925H1 1 163
222 LI:1071728.1:2000SEP08 6791755F8 1 482
223 LI:1084329.1:2000SEP08 6791107H1 1 256
223 LI:1084329.1:2000SEP08 6797814F8 1 498
223 LI:1084329.1:2000SEP08 6797814H1 1 498
223 LI:1084329.1:2000SEP08 6791987F8 1 496
223 LI:1084329.1:2000SEP08 6791987T8 1 394
223 LI:1084329.1:2000SEP08 6791987H1 1 496
223 LI:1084329.1:2000SEP08 6791107F8 12 522
223 LI:1084329.1:2000SEP08 6798578F8 82 500
224 LI:246422.1:2000SEP08 71544580V1 1 429
224 LI:246422.1:2000SEP08 71546409V1 1 413
224 LI:246422.1:2000SEP08 3033193F6 1 272
224 LI:246422.1:2000SEP08 3033193H1 1 216
224 LI:246422.1:2000SEP08 5664835T9 31 413
224 LI:246422.1:2000SEP08 3033193T6 129 442
224 LI:246422.1:2000SEP08 71544508V1 133 413
224 LI:246422.1:2000SEP08 71543126V1 152 302
224 LI:246422.1:2000SEP08 71545653V1 175 413
224 LI:246422.1:2000SEP08 71543870V1 318 439
224 LI:246422.1:2000SEP08 71545652V1 357 413
225 LI:1086066.1:2000SEP08 6793877T8 194 601
225 LI:1086066.1:2000SEP08 6793877H1 1 604
226 LI:223142.1:2000SEP08 797887R6 222 612
226 LI:223142.1:2000SEP08 5377283T9 59 591
226 LI:223142.1:2000SEP08 g4078031 284 558
226 LI:223142.1:2000SEP08 797887H1 222 430
226 LI:223142.1:2000SEP08 5377283H1 1 231
226 LI:223142.1:2000SEP08 5443396T9 177 700
226 LI:223142.1:2000SEP08 g5675714 214 679
226 LI:223142.1:2000SEP08 g1313047 216 648
226 LI:223142.1:2000SEP08 5443396H1 1068 1207
226 LI:223142.1:2000SEP08 5443396F9 694 1207
243 LI:399421.1:2000SEP08 71672327V1 1651 2327
243 LI:399421.1:2000SEP08 71671131V1 1685 2101
243 LI:399421.1:2000SEP08 71670034V1 1810 2552
243 LI:399421.1:2000SEP08 71669677V1 1860 2565
243 LI:399421.1:2000SEP08 71669977V1 1868 2471
243 LI:399421.1:2000SEP08 71669922V1 1868 2262
243 LI:399421.1:2000SEP08 71670815V1 1868 2174
243 LI:399421.1:2000SEP08 71674026V1 2090 2527
243 LI:399421.1:2000SEP08 71672887V1 2102 2577
243 LI:399421.1:2000SEP08 71669725V1 1437 1936
243 LI:399421.1:2000SEP08 71671316V1 1451 2129
243 LI:399421.1:2000SEP08 71673713V1 1469 2054
243 LI:399421.1:2000SEP08 71672796V1 1 665
243 LI:399421.1:2000SEP08 71672127V1 52 714
243 LI:399421.1:2000SEP08 g7455758 83 458
243 LI:399421.1:2000SEP08 g2457980 96 469
243 LI:399421.1:2000SEP08 71672645V1 167 936
243 LI:399421.1:2000SEP08 71671802V1 194 796
243 LI:399421.1:2000SEP08 71674174V1 245 1000
243 LI:399421.1:2000SEP08 71671116V1 244 1036
243 LI:399421.1:2000SEP08 71672884V1 286 931
243 LI:399421.1:2000SEP08 71672574V1 323 1079
243 LI:399421.1:2000SEP08 71671331V1 332 984
243 LI:399421.1:2000SEP08 71672409V1 338 774
243 LI:399421.1:2000SEP08 g2063420 349 598
243 LI:399421.1:2000SEP08 71673338V1 356 711
243 LI:399421.1:2000SEP08 71542839V1 428 896
243 LI:399421.1:2000SEP08 71674325V1 1175 1690
243 LI:399421.1:2000SEP08 71670040V1 1194 1691
243 LI:399421.1:2000SEP08 71670586V1 1205 1690
243 LI:399421.1:2000SEP08 71672111V1 1244 1929
243 LI:399421.1:2000SEP08 71670467V1 1282 1691
243 LI:399421.1:2000SEP08 71670695V1 1284 1690
243 LI:399421.1:2000SEP08 71670674V1 1283 1648
243 LI:399421.1:2000SEP08 71675221V1 1314 1959
243 LI:399421.1:2000SEP08 g3734807 1316 1763
243 LI:399421.1:2000SEP08 71670506V1 1327 2017
243 LI:399421.1:2000SEP08 71673468V1 1331 1674
243 LI:399421.1:2000SEP08 71674268V1 1331 1674
243 LI:399421.1:2000SEP08 71670681V1 1343 1690
243 LI:399421.1:2000SEP08 71674973V1 1363 1690
243 LI:399421.1:2000SEP08 71671128V1 1369 2118
243 LI:399421.1:2000SEP08 71554925V1 1375 1661
243 LI:399421.1:2000SEP08 71674341V1 1420 2112
243 LI:399421.1:2000SEP08 71673770V1 1433 2043
243 LI:399421.1:2000SEP08 71675202V1 819 1516
243 LI:399421.1:2000SEP08 71540274V1 1169 1661
243 LI:399421.1:2000SEP08 71670585V1 1159 1805
243 LI:399421.1:2000SEP08 71674048V1 1174 1679
243 LI:399421.1:2000SEP08 71673538V1 841 1397
243 LI:399421.1:2000SEP08 71673548V1 886 1614
243 LI:399421.1:2000SEP08 71675105V1 900 1584
243 LI:399421.1:2000SEP08 71670075V1 906 1546
243 LI:399421.1:2000SEP08 71671772V1 919 1677
243 LI:399421.1:2000SEP08 71671422V1 936 1669
243 LI:399421.1:2000SEP08 71674671V1 762 1357
243 LI:399421.1:2000SEP08 71672673V1 982 1317
243 LI:399421.1:2000SEP08 71672580V1 1086 1816
243 LI:399421.1:2000SEP08 71671338V1 1091 1690
243 LI:399421.1:2000SEP08 71670081V1 580 1262
243 LI:399421.1:2000SEP08 3532689F6 1 381
243 LI:399421.1:2000SEP08 71671278V1 1 650
243 LI:399421.1:2000SEP08 71673438V1 356 890
243 LI:399421.1:2000SEP08 71675265V1 782 1353
243 LI:399421.1:2000SEP08 71674149V1 1141 1657
243 LI:399421.1:2000SEP08 71541507V1 1169 1669
243 LI:399421.1:2000SEP08 71670447V1 1179 1936
243 LI:399421.1:2000SEP08 71671453V1 982 1317
243 LI:399421.1:2000SEP08 71671269V1 1082 1690
243 LI:399421.1:2000SEP08 71669745V1 381 997
243 LI:399421.1:2000SEP08 71672102V1 1 613
243 LI:399421.1:2000SEP08 71673585V1 362 1095
243 LI:399421.1:2000SEP08 71673905V1 789 1516
243 LI:399421.1:2000SEP08 71670267V1 445 1133
243 LI:399421.1:2000SEP08 71671650V1 1028 1753
243 LI:399421.1:2000SEP08 71670603V1 1027 1496
243 LI:399421.1:2000SEP08 71674515V1 1147 1560
243 LI:399421.1:2000SEP08 71541489V1 1150 1691
243 LI:399421.1:2000SEP08 71674543V1 1051 1694
243 LI:399421.1:2000SEP08 71674611V1 1077 1456
243 LI:399421.1:2000SEP08 71673288V1 399 1008
243 LI:399421.1:2000SEP08 71672026V1 573 1294
243 LI:399421.1:2000SEP08 71538613V1 583 757
243 LI:399421.1:2000SEP08 71673555V1 610 1096
243 LI:399421.1:2000SEP08 71671221V1 649 1367
243 LI:399421.1:2000SEP08 71671332V1 675 1202
243 LI:399421.1:2000SEP08 71548524V1 703 1054
243 LI:399421.1:2000SEP08 71670624V1 721 1063
243 LI:399421.1:2000SEP08 71673372V1 724 1359
243 LI:399421.1:2000SEP08 71673387V1 723 1127
243 LI:399421.1:2000SEP08 71671687V1 751 1342
243 LI:399421.1:2000SEP08 71669849V1 1109 1690
243 LI:399421.1:2000SEP08 g2458468 25 469
243 LI:399421.1:2000SEP08 g6029855 14 483
243 LI:399421.1:2000SEP08 g3785048 14 329
243 LI:399421.1:2000SEP08 71547880V1 23 310
244 LI:816655.2:2000SEP08 2307022H1 999 1182
244 LI:816655.2:2000SEP08 g6074114 1000 1285
244 LI:816655.2:2000SEP08 g2779404 821 1286
244 LI:816655.2:2000SEP08 g4095635 821 1286
244 LI:816655.2:2000SEP08 4304173H1 821 1129
244 LI:816655.2:2000SEP08 g4088858 822 1290
244 LI:816655.2:2000SEP08 g5398034 834 1285
244 LI:816655.2:2000SEP08 2939062H1 725 1030
244 LI:816655.2:2000SEP08 1602631H1 1 191
244 LI:816655.2:2000SEP08 g6464173 970 1285
244 LI:816655.2:2000SEP08 4606258H1 983 1251
244 LI:816655.2:2000SEP08 1212531H1 975 1206
244 LI:816655.2:2000SEP08 g1376690 975 1157
244 LI:816655.2:2000SEP08 g6464528 976 1285
244 LI:816655.2:2000SEP08 833443H1 990 1054
244 LI:816655.2:2000SEP08 1292202H1 981 1183
244 LI:816655.2:2000SEP08 g1448481 990 1205
244 LI:816655.2:2000SEP08 7128006H1 992 1299
244 LI:816655.2:2000SEP08 70998240V1 508 1123
244 LI:816655.2:2000SEP08 70995534V1 513 1115
244 LI:816655.2:2000SEP08 70996937V1 509 1181
244 LI:816655.2:2000SEP08 70248846V1 513 1025
244 LI:816655.2:2000SEP08 g3415495 792 1285
244 LI:816655.2:2000SEP08 292307H1 791 1139
244 LI:816655.2:2000SEP08 2994979H1 776 1138
244 LI:816655.2:2000SEP08 2227862H1 777 1076
244 LI:816655.2:2000SEP08 4513074H1 777 1071
244 LI:816655.2:2000SEP08 g2806970 778 1286
244 LI:816655.2:2000SEP08 70996274V1 632 1283
244 LI:816655.2:2000SEP08 g2558287 634 1087
244 LI:816655.2:2000SEP08 3318870H1 749 1054
244 LI:816655.2:2000SEP08 6747339H1 562 1193
244 LI:816655.2:2000SEP08 5771213H1 562 1059
244 LI:816655.2:2000SEP08 1215646H1 562 839
244 LI:816655.2:2000SEP08 4610433H1 562 838
244 LI:816655.2:2000SEP08 7084557H1 569 1171
244 LI:816655.2:2000SEP08 5444485F8 575 1158
244 LI:816655.2:2000SEP08 5444485F9 576 1266
244 LI:816655.2:2000SEP08 6721894H1 590 1135
244 LI:816655.2:2000SEP08 5716571H1 629 1140
244 LI:816655.2:2000SEP08 1528653H1 933 1160
244 LI:816655.2:2000SEP08 3239517H1 937 1226
244 LI:816655.2:2000SEP08 1666325H1 941 1180
244 LI:816655.2:2000SEP08 574249H1 948 1217
244 LI:816655.2:2000SEP08 1357273H1 947 1099
244 LI:816655.2:2000SEP08 1344137H1 954 1234
244 LI:816655.2:2000SEP08 4698462H1 954 1224
244 LI:816655.2:2000SEP08 g4109989 965 1285
244 LI:816655.2:2000SEP08 g4522919 963 1285
244 LI:816655.2:2000SEP08 71298721V1 644 1221
244 LI:816655.2:2000SEP08 032779H1 741 1055
TABLE 5
SEQ ID NO: Template ID Component ID Start Stop
256 LG:476342.3:2000SEP08 5911370T8 1 438
256 LG:476342.3:2000SEP08 5911370F8 1 537
256 LG:476342.3:2000SEP08 5911370H1 1 286
256 LG:476342.3:2000SEP08 5911370F6 5 182
256 LG:476342.3:2000SEP08 6273147H1 9 534
256 LG:476342.3:2000SEP08 6271829F8 10 550
256 LG:476342.3:2000SEP08 6271829H1 10 534
256 LG:476342.3:2000SEP08 6271829T8 10 462
256 LG:476342.3:2000SEP08 6269367H1 14 496
256 LG:476342.3:2000SEP08 6268536H1 39 534
256 LG:476342.3:2000SEP08 6268536T8 39 431
256 LG:476342.3:2000SEP08 6268536F8 39 553
256 LG:476342.3:2000SEP08 5911370T6 98 495
257 LI:336801.1:2000SEP08 8032401J1 78 639
257 LI:336801.1:2000SEP08 g3048347 322 669
257 LI:336801.1:2000SEP08 g3049456 283 669
257 LI:336801.1:2000SEP08 g3240957 350 669
257 LI:336801.1:2000SEP08 g3921826 319 664
257 LI:336801.1:2000SEP08 g3888653 324 664
257 LI:336801.1:2000SEP08 g5768260 222 656
257 LI:336801.1:2000SEP08 g2115199 424 656
257 LI:336801.1:2000SEP08 g2458387 257 655
257 LI:336801.1:2000SEP08 g2458406 245 655
257 LI:336801.1:2000SEP08 g2882233 229 654
257 LI:336801.1:2000SEP08 g2907339 248 654
257 LI:336801.1:2000SEP08 g5850219 198 653
257 LI:336801.1:2000SEP08 g3149320 231 651
257 LI:336801.1:2000SEP08 g5765617 277 652
257 LI:336801.1:2000SEP08 g4648089 195 651
257 LI:336801.1:2000SEP08 g3753479 240 651
257 LI:336801.1:2000SEP08 g5393508 293 756
257 LI:336801.1:2000SEP08 g3037471 296 593
257 LI:336801.1:2000SEP08 g2115406 1 365
258 LI:449685.1:2000SEP08 5912933T7 1 616
258 LI:449685.1:2000SEP08 5912933F7 13 617
258 LI:449685.1:2000SEP08 5912933F8 13 464
258 LI:449685.1:2000SEP08 5912933H1 13 318
258 LI:449685.1:2000SEP08 5912933T9 44 587
259 LI:476342.1:2000SEP08 5321234F9 6 484
259 LI:476342.1:2000SEP08 5914061F8 6 444
259 LI:476342.1:2000SEP08 5913683H1 1 281
259 LI:476342.1:2000SEP08 5320619F9 1 372
259 LI:476342.1:2000SEP08 5913683F6 4 358
259 LI:476342.1:2000SEP08 5914061H1 6 264
259 LI:476342.1:2000SEP08 5913683T6 28 431
259 LI:476342.1:2000SEP08 5913683F8 27 200
259 LI:476342.1:2000SEP08 6269343F8 36 444
259 LI:476342.1:2000SEP08 6269343H1 36 444
259 LI:476342.1:2000SEP08 71637646V1 51 300
259 LI:476342.1:2000SEP08 6269670F8 59 444
259 LI:476342.1:2000SEP08 6269670T8 59 384
259 LI:476342.1:2000SEP08 6269670H1 59 444
259 LI:476342.1:2000SEP08 71605057V1 228 376
260 LI:1072804.1:2000SEP08 5910555F6 1 640
260 LI:1072804.1:2000SEP08 5910555F8 1 587
260 LI:1072804.1:2000SEP08 5910555T6 1 575
260 LI:1072804.1:2000SEP08 5910555T8 85 489
260 LI:1072804.1:2000SEP08 5910555T9 36 456
260 LI:1072804.1:2000SEP08 5910555H1 1 194
261 LI:455450.1:2000SEP08 5911845T6 1 432
261 LI:455450.1:2000SEP08 5911845F8 1 588
261 LI:455450.1:2000SEP08 5911845F6 1 484
261 LI:455450.1:2000SEP08 5911845H1 1 254
261 LI:455450.1:2000SEP08 5911845T8 22 443
262 LI:1073699.1:2000SEP08 6794742H1 22 365
262 LI:1073699.1:2000SEP08 6796418H1 22 365
262 LI:1073699.1:2000SEP08 6794009H1 22 297
262 LI:1073699.1:2000SEP08 6797985H1 59 383
262 LI:1073699.1:2000SEP08 6797985T8 59 257
262 LI:1073699.1:2000SEP08 6791482F8 59 365
262 LI:1073699.1:2000SEP08 6794247H1 1 362
262 LI:1073699.1:2000SEP08 6796383H1 22 365
262 LI:1073699.1:2000SEP08 6796341H1 59 366
262 LI:1073699.1:2000SEP08 6791482H1 59 365
262 LI:1073699.1:2000SEP08 6791482T8 59 255
262 LI:1073699.1:2000SEP08 6797985F8 59 390
262 LI:1073699.1:2000SEP08 6796341T8 59 249
262 LI:1073699.1:2000SEP08 6796341F8 59 365
263 LI:1013729.1:2000SEP08 8081248H2 1 644
263 LI:1013729.1:2000SEP08 6795338H1 49 567
263 LI:1013729.1:2000SEP08 6795338F8 49 643
263 LI:1013729.1:2000SEP08 6795338T8 376 468
264 LI:2050322.2:2000SEP08 1232566F1 464 1068
264 LI:2050322.2:2000SEP08 6430389H1 677 1254
264 LI:2050322.2:2000SEP08 1808715T6 691 1216
264 LI:2050322.2:2000SEP08 041912H1 694 986
264 LI:2050322.2:2000SEP08 5802291H1 646 971
264 LI:2050322.2:2000SEP08 6845563H1 795 1216
264 LI:2050322.2:2000SEP08 g3539276 814 1263
264 LI:2050322.2:2000SEP08 g5857161 818 1255
264 LI:2050322.2:2000SEP08 g7457026 823 1258
264 LI:2050322.2:2000SEP08 g4736196 823 1256
264 LI:2050322.2:2000SEP08 6019406H1 826 1259
264 LI:2050322.2:2000SEP08 g2883372 834 1257
264 LI:2050322.2:2000SEP08 955429H1 834 1147
264 LI:2050322.2:2000SEP08 5196594H1 835 1063
264 LI:2050322.2:2000SEP08 g6640997 837 1256
264 LI:2050322.2:2000SEP08 g5548433 837 1257
264 LI:2050322.2:2000SEP08 g2896497 854 1256
264 LI:2050322.2:2000SEP08 1475376R6 473 961
264 LI:2050322.2:2000SEP08 4598585H1 473 767
264 LI:2050322.2:2000SEP08 1475376H1 473 674
264 LI:2050322.2:2000SEP08 2651228H1 502 759
264 LI:2050322.2:2000SEP08 4658940H1 537 768
264 LI:2050322.2:2000SEP08 4939484H1 543 780
264 LI:2050322.2:2000SEP08 2715435H1 558 813
264 LI:2050322.2:2000SEP08 1559421H1 599 821
264 LI:2050322.2:2000SEP08 6019901H1 603 1188
264 LI:2050322.2:2000SEP08 g652230 786 1241
264 LI:2050322.2:2000SEP08 2505229T6 672 1219
264 LI:2050322.2:2000SEP08 6430122H1 677 1272
264 LI:2050322.2:2000SEP08 3143329H1 779 1109
264 LI:2050322.2:2000SEP08 5754935T8 659 1139
264 LI:2050322.2:2000SEP08 4934014H1 661 961
264 LI:2050322.2:2000SEP08 1655116T6 666 1216
264 LI:2050322.2:2000SEP08 5525720H2 666 988
264 LI:2050322.2:2000SEP08 2734436H1 1091 1256
264 LI:2050322.2:2000SEP08 6586205H1 1 473
264 LI:2050322.2:2000SEP08 5291518F6 4 400
264 LI:2050322.2:2000SEP08 203885H1 289 541
264 LI:2050322.2:2000SEP08 5291518H1 4 254
264 LI:2050322.2:2000SEP08 71304357V1 291 961
264 LI:2050322.2:2000SEP08 70789164V1 303 535
264 LI:2050322.2:2000SEP08 g6577740 907 1256
264 LI:2050322.2:2000SEP08 g1406653 910 1259
264 LI:2050322.2:2000SEP08 5332444H1 919 1119
264 LI:2050322.2:2000SEP08 g3871471 921 1255
264 LI:2050322.2:2000SEP08 4737914H1 903 1097
264 LI:2050322.2:2000SEP08 g7319136 904 1254
264 LI:2050322.2:2000SEP08 g6036700 1059 1257
264 LI:2050322.2:2000SEP08 2358421H1 1061 1248
264 LI:2050322.2:2000SEP08 g4900806 1064 1248
264 LI:2050322.2:2000SEP08 g1885931 1076 1248
264 LI:2050322.2:2000SEP08 2358234H1 1078 1248
264 LI:2050322.2:2000SEP08 g3001843 1083 1259
264 LI:2050322.2:2000SEP08 2734436T6 1084 1218
264 LI:2050322.2:2000SEP08 4822254H1 1085 1352
264 LI:2050322.2:2000SEP08 2734436F6 1091 1256
264 LI:2050322.2:2000SEP08 3674083H1 469 689
264 LI:2050322.2:2000SEP08 g5865341 865 1256
264 LI:2050322.2:2000SEP08 g1425520 868 1259
264 LI:2050322.2:2000SEP08 g4004421 871 1255
264 LI:2050322.2:2000SEP08 g4266711 875 1264
264 LI:2050322.2:2000SEP08 g4003813 875 1255
264 LI:2050322.2:2000SEP08 g1953836 879 1255
264 LI:2050322.2:2000SEP08 g3539275 856 1256
264 LI:2050322.2:2000SEP08 6723335H1 858 1256
264 LI:2050322.2:2000SEP08 7020250H1 425 923
264 LI:2050322.2:2000SEP08 7934249H1 428 1097
264 LI:2050322.2:2000SEP08 7965961H1 447 1152
264 LI:2050322.2:2000SEP08 4185920H1 454 556
264 LI:2050322.2:2000SEP08 g6475407 1096 1256
264 LI:2050322.2:2000SEP08 3676421H1 469 761
264 LI:2050322.2:2000SEP08 70683609V1 465 1027
264 LI:2050322.2:2000SEP08 70683377V1 390 477
264 LI:2050322.2:2000SEP08 5393080H1 348 633
264 LI:2050322.2:2000SEP08 g2824231 967 1254
264 LI:2050322.2:2000SEP08 4185282H1 979 1214
264 LI:2050322.2:2000SEP08 2756901H1 985 1262
264 LI:2050322.2:2000SEP08 g2789017 1005 1256
264 LI:2050322.2:2000SEP08 3238912H1 1032 1256
264 LI:2050322.2:2000SEP08 g2752802 1046 1256
264 LI:2050322.2:2000SEP08 g4606954 941 1265
264 LI:2050322.2:2000SEP08 g1952200 947 1254
264 LI:2050322.2:2000SEP08 5392871H1 346 633
264 LI:2050322.2:2000SEP08 g5862585 729 1180
264 LI:2050322.2:2000SEP08 g5755440 739 1246
264 LI:2050322.2:2000SEP08 3859660H1 718 1016
264 LI:2050322.2:2000SEP08 2505229F6 630 1215
264 LI:2050322.2:2000SEP08 2505229H1 630 875
264 LI:2050322.2:2000SEP08 1811086T6 630 1227
264 LI:2050322.2:2000SEP08 g1527387 649 1101
264 LI:2050322.2:2000SEP08 2216534H1 629 887
264 LI:2050322.2:2000SEP08 4367794H1 624 899
264 LI:2050322.2:2000SEP08 2208322H1 629 896
264 LI:2050322.2:2000SEP08 1232566H1 464 707
264 LI:2050322.2:2000SEP08 1811086H1 617 878
264 LI:2050322.2:2000SEP08 2358421T6 702 1213
264 LI:2050322.2:2000SEP08 8056748J1 707 1257
264 LI:2050322.2:2000SEP08 70685857V1 606 797
264 LI:2050322.2:2000SEP08 1505277H1 615 892
264 LI:2050322.2:2000SEP08 4459805H1 616 897
264 LI:2050322.2:2000SEP08 1811086F6 617 1119
265 LI:891327.1:2000SEP08 4906137H2 1 289
265 LI:891327.1:2000SEP08 5427025T8 60 579
265 LI:891327.1:2000SEP08 5427025F8 61 677
265 LI:891327.1:2000SEP08 4906137F6 1 401
265 LI:891327.1:2000SEP08 5427025H1 61 343
265 LI:891327.1:2000SEP08 4516212H1 361 605
266 LI:2053076.1:2000SEP08 5617302H1 197 489
266 LI:2053076.1:2000SEP08 70536520V1 297 777
266 LI:2053076.1:2000SEP08 5617302R8 336 806
266 LI:2053076.1:2000SEP08 4317660T8 404 969
266 LI:2053076.1:2000SEP08 813140H1 451 710
266 LI:2053076.1:2000SEP08 4731735H1 649 916
266 LI:2053076.1:2000SEP08 5086779F8 1 563
266 LI:2053076.1:2000SEP08 7218403H1 551 992
266 LI:2053076.1:2000SEP08 4317660H1 252 508
266 LI:2053076.1:2000SEP08 g1880685 253 633
266 LI:2053076.1:2000SEP08 4179193H1 260 526
266 LI:2053076.1:2000SEP08 4935339H1 102 368
266 LI:2053076.1:2000SEP08 70535152V1 103 640
266 LI:2053076.1:2000SEP08 8037810H1 208 817
266 LI:2053076.1:2000SEP08 8037810J1 220 852
266 LI:2053076.1:2000SEP08 g1967076 225 580
266 LI:2053076.1:2000SEP08 4317660F8 255 695
266 LI:2053076.1:2000SEP08 7402448H1 1 297
266 LI:2053076.1:2000SEP08 6029531H1 2 317
266 LI:2053076.1:2000SEP08 5086779H1 1 246
266 LI:2053076.1:2000SEP08 6824284H1 49 510
266 LI:2053076.1:2000SEP08 4935339F6 103 396
267 LG:220085.1:2000SEP08 6541929H1 432 928
267 LG:220085.1:2000SEP08 g314066 261 662
267 LG:220085.1:2000SEP08 7083405H1 1 529
267 LG:220085.1:2000SEP08 g712155 128 380
267 LG:220085.1:2000SEP08 g712130 26 364
268 LG:406709.1:2000SEP08 g3738020 1024 1268
268 LG:406709.1:2000SEP08 g5596083 853 1268
268 LG:406709.1:2000SEP08 2007067H1 849 1063
268 LG:406709.1:2000SEP08 2010757H1 1038 1123
268 LG:406709.1:2000SEP08 6026715T8 609 1171
268 LG:406709.1:2000SEP08 g2035382 205 428
268 LG:406709.1:2000SEP08 6026715H1 1 224
268 LG:406709.1:2000SEP08 6026715F6 1 640
268 LG:406709.1:2000SEP08 g2115758 846 996
268 LG:406709.1:2000SEP08 7684247H1 129 721
268 LG:406709.1:2000SEP08 6026715F8 1 523
268 LG:406709.1:2000SEP08 6026715T6 719 1038
268 LG:406709.1:2000SEP08 g3052844 801 1268
268 LG:406709.1:2000SEP08 g5363052 814 1269
268 LG:406709.1:2000SEP08 2013065H1 1038 1288
268 LG:406709.1:2000SEP08 2013065T6 1038 1272
268 LG:406709.1:2000SEP08 g3232595 827 1272
268 LG:406709.1:2000SEP08 g4223124 1017 1270
268 LG:406709.1:2000SEP08 g2162155 894 1268
268 LG:406709.1:2000SEP08 2013065R6 1038 1323
268 LG:406709.1:2000SEP08 g5395578 797 1263
268 LG:406709.1:2000SEP08 g4487277 941 1265
269 LG:347863.9:2000SEP08 7966127H1 1 601
269 LG:347863.9:2000SEP08 5307074H1 227 442
269 LG:347863.9:2000SEP08 4947827H1 248 316
269 LG:347863.9:2000SEP08 4947827F8 266 745
269 LG:347863.9:2000SEP08 2936425H1 487 750
270 LI:1073027.1:2000SEP08 6792866H1 1 136
270 LI:1073027.1:2000SEP08 6792866F8 1 541
273 LI:406709.1:2000SEP08 7684247H1 129 724
273 LI:406709.1:2000SEP08 6026715F8 1 525
273 LI:406709.1:2000SEP08 g2035382 205 430
273 LI:406709.1:2000SEP08 6026715H1 1 224
273 LI:406709.1:2000SEP08 2013065H1 1043 1295
273 LI:406709.1:2000SEP08 2013065R6 1043 1330
273 LI:406709.1:2000SEP08 g3232595 830 1279
273 LI:406709.1:2000SEP08 g2162155 898 1275
273 LI:406709.1:2000SEP08 g5596083 857 1275
273 LI:406709.1:2000SEP08 g3738020 1029 1275
273 LI:406709.1:2000SEP08 g4487277 945 1272
273 LI:406709.1:2000SEP08 g5395578 800 1270
274 LI:2052938.1:2000SEP08 71746272V1 150 745
274 LI:2052938.1:2000SEP08 6303292H1 125 424
274 LI:2052938.1:2000SEP08 71746949V1 150 747
274 LI:2052938.1:2000SEP08 71747212V1 150 742
274 LI:2052938.1:2000SEP08 71745617V1 149 731
274 LI:2052938.1:2000SEP08 g2955524 286 736
274 LI:2052938.1:2000SEP08 g4740512 304 734
274 LI:2052938.1:2000SEP08 71744444V1 150 574
274 LI:2052938.1:2000SEP08 71741423V1 150 553
274 LI:2052938.1:2000SEP08 2746019H1 261 490
274 LI:2052938.1:2000SEP08 g5231739 265 731
274 LI:2052938.1:2000SEP08 g6569347 274 734
274 LI:2052938.1:2000SEP08 5428676H1 150 401
274 LI:2052938.1:2000SEP08 71741551V1 171 225
274 LI:2052938.1:2000SEP08 71746335V1 152 592
274 LI:2052938.1:2000SEP08 55005970J1 247 829
274 LI:2052938.1:2000SEP08 2311091 6 409 732
274 LI:2052938.1:2000SEP08 2311091H1 409 653
274 LI:2052938.1:2000SEP08 71740956V1 419 719
274 LI:2052938.1:2000SEP08 633969H1 446 695
274 LI:2052938.1:2000SEP08 g6569255 472 734
274 LI:2052938.1:2000SEP08 5428676F6 150 641
274 LI:2052938.1:2000SEP08 6148913H1 1 463
274 LI:2052938.1:2000SEP08 7285289H1 1 437
275 LI:213208.1:2000SEP08 g5665419 1 440
275 LI:213208.1:2000SEP08 g5432007 2 419
275 LI:213208.1:2000SEP08 616667H1 13 132
275 LI:213208.1:2000SEP08 7677770H2 18 498
275 LI:213208.1:2000SEP08 g1364420 25 208
275 LI:213208.1:2000SEP08 g4734886 35 383
275 LI:213208.1:2000SEP08 3407953m 74 340
275 LI:213208.1:2000SEP08 g2354840 83 383
275 LI:213208.1:2000SEP08 g4685246 129 383
275 LI:213208.1:2000SEP08 g3149184 242 383
TABLE 6
SEQ ID NO: Template ID Tissue Distribution
1 LG:405741.3:2000SEP08 Digestive System - 31%, Urinary Tract - 25%
2 LG:337194.1:2000SEP08 Germ Cells - 48%, Unclassified/Mixed - 12%, Skin - 10%
3 LG:017108.4:2000SEP08 Sense Organs - 81%, Respiratory System - 13%
4 LG:372569.5:2000SEP08 Liver - 53%, Cardiovascular System - 16%
5 LG:968765.1:2000SEP08 Liver - 75%, Female Genitalia - 17%
6 LG:255999.16:2000SEP08 Nervous System - 100%
7 LG:977820.9:2000SEP08 Embryonic Structures - 22%, Pancreas - 22%, Musculoskeletal System - 15%
8 LI:1071608.1:2000SEP08 Liver - 77%, Pancreas - 13%
9 LI:1074023.1:2000SEP08 Liver - 100%
10 LI:453570.1:2000SEP08 Nervous System - 100%
11 LI:072072.1:2000SEP08 Hemic and Immune System - 17%, Germ Cells - 14%, Liver - 12%
12 LI:148565.4:2000SEP08 Nervous System - 57%, Male Genitalia - 43%
13 LI:368626.4:2000SEP08 Skin - 94%
14 LI:346123.1:2000SEP08 Exocrine Glands - 63%, Nervous System - 38%
15 LI:335795.11:2000SEP08 Female Genitalia - 20%, Nervous System - 15%
16 LI:246023.2:2000SEP08 Connective Tissue - 19%, Endocrine System - 18%
17 LG:1100661.1:2000SEP08 Liver - 100%
18 LG:475856.1:2000SEP08 Nervous System - 67%, Hemic and Immune System - 33%
19 LG:1015343.1:2000SEP08 Liver - 100%
20 LG:1400575.1:2000SEP08 Respiratory System - 29%, Male Genitalia - 24%, Endocrine System - 24%
21 LG:1080545.1:2000SEP08 Germ Cells - 35%, Urinary Tract - 15%, Liver - 12%, Pancreas - 12%
22 LG:213947.1:2000SEP08 Respiratory System - 50%, Hemic and Immune System - 50%
23 LI:720641.1:2000SEP08 Nervous System - 100%
24 LI:1023894.1:2000SEP08 Liver - 100%
25 LI:734904.1:2000SEP08 Unclassified/Mixed - 23%, Sense Organs - 20%
26 LI:1178118.1:2000SEP08 Sense Organs - 38%, Unclassified/Mixed - 17%, Endocrine System - 12%
27 LI:213947.1:2000SEP08 Respiratory System - 50%, Hemic and Immune System - 50%
28 LG:407304.1:2000SEP08 Endocrine System - 20%, Pancreas - 14%, Embryonic Structures - 14%
29 LG:337358.1:2000SEP08 Nervous System - 34%, Male Genitalia - 26%
30 LG:986090.1:2000SEP08 Nervous System - 100%
31 LG:123250.1:2000SEP08 Musculoskeletal System - 32%, Hemic and Immune System - 32%, Respiratory System - 26%
32 LG:1028774.2:2000SEP08 Unclassified/Mixed - 23%, Female Genitalia - 13%, Male Genitalia - 12%
33 LG:338927.6:2000SEP08 Skin - 64%, Hemic and Immune System - 14%
34 LG:332944.2:2000SEP08 Endocrine System - 45%, Musculoskeletal System - 22%
35 LI:347174.5:2000SEP08 Skin - 39%, Male Genitalia - 14%
36 LI:477070.1:2000SEP08 Nervous System - 100%
37 LI:723144.1:2000SEP08 Nervous System - 100%
38 LI:1007188.1:2000SEP08 Liver - 100%
39 LI:1024412.1:2000SEP08 Liver - 100%
40 LI:284797.3:2000SEP08 Digestive System - 42%, Nervous System - 34%, Hemic and Immune System - 13%
41 LI:1092901.1:2000SEP08 Male Genitalia - 75%, Nervous System - 25%
42 LI:228930.1:2000SEP08 Nervous System - 55%, Nervous System - 18%, Respiratory System - 14%, Hemic and Immune System - 14%
43 LI:722913.1:2000SEP08 Nervous System - 100%
44 LG:457478.1:2000SEP08 Nervous System - 100%
45 LG:358719.1:2000SEP08 Urinary Tract - 96%
46 LG:105160.5:2000SEP08 Urinary Tract - 54%, Hemic and Immune System - 23%, Male Genitalia - 15%
47 LG:400705.1:2000SEP08 Endocrine System - 34%, Female Genitalia - 13%, Cardiovascular System - 11%, Urinary Tract - 11%, Hemic and Immune System - 11%
48 LG:221977.1:2000SEP08 Hemic and Immune System - 42%, Germ Cells - 16%
49 LG:898771.1:2000SEP08 Liver - 19%
50 LI:457478.1:2000SEP08 Liver - 34%, Pancreas - 31%, Nervous System - 17%
51 LI:125140.1:2000SEP08 Exocrine Glands - 81%
52 LI:021095.2:2000SEP08 Digestive System - 100%
53 LI:888730.1:2000SEP08 Liver - 81%, Endocrine System - 15%
54 LI:358719.1:2000SEP08 Urinary Tract - 96%
55 LI:351342.3:2000SEP08 Exocrine Glands - 48%, Female Genitalia - 15%
56 LI:256099.2:2000SEP08 Cardiovascular System - 27%, Germ Cells - 23%, Exocrine Glands - 15%
57 LI:2051991.1:2000SEP08 Hemic and Immune System - 47%, Urinary Tract - 21%, Digestive System - 16%, Respiratory System - 16%
58 LG:980769.1:2000SEP08 Unclassified/Mixed - 68%, Male Genitalia - 11%, Hemic and Immune System - 11%
59 LG:332474.3:2000SEP08 Nervous System - 50%, Urinary Tract - 44%
60 LG:1087707.1:2000SEP08 Stomatognathic System - 74%
61 LG:415349.1:2000SEP08 Urinary Tract - 33%, Embryonic Structures - 21%, Unclassified/Mixed - 19%
62 LG:132420.2:2000SEP08 Respiratory System - 30%, Male Genitalia - 20%, Female Genitalia - 20%, Digestive System - 20%
63 LG:394201.1:2000SEP08 Embryonic Structures - 86%, Respiratory System - 14%
64 LG:1060884.1:2000SEP08 Germ Cells - 30%, Connective Tissue - 24%, Respiratory System - 15%
65 LG:242191.1:2000SEP08 Sense Organs - 44%, Endocrine System - 15%
66 LG:1063762.3:2000SEP08 Endocrine System - 23%, Embryonic Structures - 23%, Musculoskeletal System - 15%
67 LG:1100856.1:2000SEP08 Liver - 100%
68 LG:979390.2:2000SEP08 Liver - 26%, Pancreas - 26%, Connective Tissue - 20%
69 LG:1400447.1:2000SEP08 Respiratory System - 71%, Nervous System - 14%, Hemic and Immune System - 14%
70 LG:1400562.1:2000SEP08 Exocrine Glands - 19%, Respiratory System - 15%, Nervous System - 14%
71 LG:1076130.1:2000SEP08 Female Genitalia - 31%, Cardiovascular System - 25%, Exocrine Glands - 25%
72 LG:1064459.1:2000SEP08 Sense Organs - 62%, Endocrine System - 15%
73 LG:1079415.14:2000SEP08 Embryonic Structures - 90%, Nervous System - 10%
74 LG:1329431.3:2000SEP08 Respiratory System - 38%, Male Genitalia - 25%, Digestive System - 25%
75 LG:1088431.2:2000SEP08 Exocrine Glands - 50%, Cardiovascular System - 25%, Urinary Tract - 25%
76 LG:1329462.2:2000SEP08 Female Genitalia - 25%, Digestive System - 21%, Liver - 19%
77 LI:393468.1:2000SEP08 Unclassified/Mixed - 53%, Unclassified/Mixed - 31%, Urinary Tract - 12%
78 LI:722577.1:2000SEP08 Nervous System - 100%
79 LI:322783.16:2000SEP08 Connective Tissue - 88%
80 LI:901355.2:2000SEP08 Male Genitalia - 97%
81 LI:038859.2:2000SEP08 Unclassified/Mixed - 61%, Female Genitalia - 19%
82 LI:1046117.1:2000SEP08 Sense Organs - 85%, Germ Cells - 15%
83 LI:801015.1:2000SEP08 Male Genitalia - 100%
84 LI:1175590.1:2000SEP08 Musculoskeletal System - 57%, Endocrine System - 43%
85 LI:1170585.2:2000SEP08 Endocrine System - 38%, Endocrine System - 28%, Musculoskeletal System - 13%
86 LI:719531.2:2000SEP08 Hemic and Immune System - 100%
87 LI:794623.1:2000SEP08 Urinary Tract - 75%, Female Genitalia - 25%
88 LI:1173119.1:2000SEP08 Digestive System - 32%, Liver - 26%, Respiratory System - 19%
89 LI:1093285.1:2000SEP08 Respiratory System - 36%, Female Genitalia - 32%, Digestive System - 28%
90 LI:1091881.1:2000SEP08 Hemic and Immune System - 45%, Female Genitalia - 27%, Respiratory System - 27%
91 LI:1091617.1:2000SEP08 Nervous System - 67%, Male Genitalia - 33%
92 LI:1082344.1:2000SEP08 Musculoskeletal System - 73%, Digestive System - 27%
93 LI:1166249.1:2000SEP08 Liver - 34%, Exocrine Glands - 29%, Endocrine System - 17%
94 LI:799675.1:2000SEP08 Female Genitalia - 70%, Endocrine System - 12%, Exocrine Glands - 10%
95 LI:1178899.1:2000SEP08 Female Genitalia - 69%, Female Genitalia - 16%
96 LI:1169241.1:2000SEP08 Unclassified/Mixed - 55%, Musculoskeletal System - 26%, Nervous System - 13%
97 LI:1180090.1:2000SEP08 Connective Tissue - 64%, Urinary Tract - 29%
98 LI:2049322.1:2000SEP08 Urinary Tract - 54%, Digestive System - 21%, Respiratory System - 13%, Male Genitalia - 13%
99 LI:809074.1:2000SEP08 Skin - 76%
100 LI:805158.1:2000SEP08 Exocrine Glands - 38%, Nervous System - 23%, Male Genitalia - 23%
101 LI:1172697.1:2000SEP08 Nervous System - 27%, Connective Tissue - 16%, Digestive System - 13%
102 LI:1174107.2:2000SEP08 Sense Organs - 69%, Unclassified/Mixed - 26%
103 LI:1177434.2:2000SEP08 Unclassified/Mixed - 42%, Female Genitalia - 27%, Embryonic Structures - 19%
104 LI:1184255.1:2000SEP08 Skin - 36%, Unclassified/Mixed - 34%, Connective Tissue - 18%
105 LI:1164555.1:2000SEP08 Female Genitalia - 100%
106 LI:238666.4:2000SEP08 Endocrine System - 29%, Embryonic Structures - 29%, Exocrine Glands - 12%
107 LI:1166752.1:2000SEP08 Endocrine System - 32%, Exocrine Glands - 26%, Nervous System - 21%, Urinary Tract - 21%
108 LI:2049654.1:2000SEP08 Respiratory System - 50%, Female Genitalia - 25%, Hemic and Immune System - 17%
109 LI:242665.2:2000SEP08 Endocrine System - 67%, Female Genitalia - 33%
110 LI:208637.1:2000SEP08 Cardiovascular System - 17%, Stomatognathic System - 16%, Liver - 11%
111 LI:2051808.1:2000SEP08 Liver - 100%
112 LI:1175136.1:2000SEP08 Cardiovascular System - 100%
113 LI:1177337.1:2000SEP08 Unclassified/Mixed - 40%, Nervous System - 30%, Respiratory System - 15%, Hemic and Immune System - 15%
114 LI:1165056.1:2000SEP08 Female Genitalia - 58%, Nervous System - 12%, Nervous System - 10%
115 LI:1175250.1:2000SEP08 Germ Cells - 83%, Digestive System - 10%
116 LI:1183192.1:2000SEP08 Cardiovascular System - 62%, Urinary Tract - 21%
117 LI:1183325.1:2000SEP08 Female Genitalia - 38%, Connective Tissue - 20%, Male Genitalia - 20%
119 LI:813422.1:2000SEP08 Sense Organs - 44%, Connective Tissue - 33%
120 LI:1093049.6:2000SEP08 Germ Cells - 20%, Endocrine System - 15%, Unclassified/Mixed - 14%
121 LI:202192.4:2000SEP08 Female Genitalia - 92%
122 LG:1041854.1:2000SEP08 Liver - 82%, Unclassified/Mixed - 18%
123 LG:1100502.1:2000SEP08 Liver - 97%
124 LI:726414.1:2000SEP08 Nervous System - 100%
125 LI:400517.4:2000SEP08 Stomatognathic System - 58%, Embryonic Structures - 18%
126 LI:1078917.1:2000SEP08 Liver - 100%
127 LI:1012560.1:2000SEP08 Unclassified/Mixed - 40%, Nervous System - 30%, Male Genitalia - 15%, Digestive System - 15%
128 LI:427997.4:2000SEP08 Liver - 15%, Male Genitalia - 14%, Embryonic Structures - 11%
129 LI:197899.1:2000SEP08 Germ Cells - 24%, Male Genitalia - 19%, Unclassified/Mixed - 18%
130 LG:334199.1:2000SEP08 Unclassified/Mixed - 43%, Endocrine System - 17%, Liver - 12%
131 LG:334345.1:2000SEP08 Nervous System - 100%
132 LG:228092.1:2000SEP08 Liver - 33%, Unclassified/Mixed - 20%, Germ Cells - 16%
133 LG:098580.1:2000SEP08 Unclassified/Mixed - 59%, Cardiovascular System - 28%, Endocrine System - 14%
134 LG:969572.1:2000SEP08 Hemic and Immune System - 100%
135 LG:196958.1:2000SEP08 Hemic and Immune System - 53%, Germ Cells - 26%, Musculoskeletal System - 13%
136 LG:1087811.1:2000SEP08 Hemic and Immune System - 21%, Connective Tissue - 11%
137 LG:1327885.1:2000SEP08 Liver - 100%
138 LI:449393.1:2000SEP08 Nervous System - 100%
139 LI:897616.1:2000SEP08 Liver - 98%
140 LI:736860.1:2000SEP08 Nervous System - 100%
141 LI:027066.6:2000SEP08 Respiratory System - 34%, Digestive System - 26%, Musculoskeletal System - 14%
142 LI:1074263.1:2000SEP08 Liver - 100%
143 LI:334345.1:2000SEP08 Nervous System - 100%
144 LI:1093914.1:2000SEP08 Nervous System - 52%, Unclassified/Mixed - 17%, Respiratory System - 13%
145 LI:1188168.1:2000SEP08 Exocrine Glands - 13%
146 LI:1065168.1:2000SEP08 Liver - 100%
147 LI:1180418.1:2000SEP08 Hemic and Immune System - 15%, Respiratory System - 13%, Musculoskeletal System - 13%
148 LG:232648.1:2000SEP08 Exocrine Glands - 15%, Female Genitalia - 14%, Hemic and Immune System - 11%
149 LG:1078420.1:2000SEP08 Urinary Tract - 23%, Musculoskeletal System - 20%, Female Genitalia - 17%
150 LG:1397599.1:2000SEP08 Liver - 35%, Connective Tissue - 27%, Urinary Tract - 15%
151 LG:1397655.2:2000SEP08 Embryonic Structures - 53%, Female Genitalia - 19%, Urinary Tract - 11%
152 LG:241055.1:2000SEP08 Skin - 20%, Exocrine Glands - 17%, Hemic and Immune System - 13%, Endocrine System - 13%
153 LG:1101065.1:2000SEP08 Sense Organs - 42%, Unclassified/Mixed - 10%
154 LG:475629.1:2000SEP08 Nervous System - 100%
155 LI:348991.1:2000SEP08 Nervous System - 42%, Endocrine System - 25%, Male Genitalia - 21%
156 LI:475629.1:2000SEP08 Female Genitalia - 75%, Nervous System - 25%
157 LI:261331.1:2000SEP08 Hemic and Immune System - 100%
158 LI:815686.1:2000SEP08 Urinary Tract - 33%, Pancreas - 14%, Connective Tissue - 12%
159 LI:1167327.2:2000SEP08 Connective Tissue - 26%, Musculoskeletal System - 23%, Female Genitalia - 23%
160 LI:758009.3:2000SEP08 Respiratory System - 94%
161 LG:331593.1:2000SEP08 Hemic and Immune System - 39%, Unclassified/Mixed - 24%, Nervous System - 24%
162 LI:1094174.1:2000SEP08 Stomatognathic System - 30%, Musculoskeletal System - 11%
163 LI:814362.1:2000SEP08 Female Genitalia - 21%, Musculoskeletal System - 14%, Hemic and Immune System - 13%
164 LI:219542.1:2000SEP08 Unclassified/Mixed - 47%, Germ Cells - 31%, Male Genitalia - 20%
165 LI:726197.1:2000SEP08 Nervous System - 100%
166 LI:1075314.1:2000SEP08 Liver - 100%
167 LI:437883.1:2000SEP08 Liver - 98%
168 LG:336265.1:2000SEP08 Endocrine System - 19%, Musculoskeletal System - 18%, Embryonic Structures - 13%
169 LG:407788.2:2000SEP08 Embryonic Structures - 50%, Endocrine System - 22%, Male Genitalia - 11%, Digestive System - 11%
170 LG:1326925.1:2000SEP08 Liver - 47%, Digestive System - 42%, Male Genitalia - 11%
171 LI:332655.2:2000SEP08 Pancreas - 26%, Digestive System - 17%, Female Genitalia - 13%
172 LI:1184621.4:2000SEP08 Cardiovascular System - 71%, Endocrine System - 23%
173 LI:2051386.1:2000SEP08 Skin - 32%, Nervous System - 21%, Liver - 21%
174 LG:362757.1:2000SEP08 Connective Tissue - 78%, Nervous System - 22%
175 LG:406770.1:2000SEP08 Unclassified/Mixed - 29%, Urinary Tract - 21%, Female Genitalia - 16%
176 LG:1094640.1:2000SEP08 Musculoskeletal System - 87%, Digestive System - 13%
177 LG:001929.1:2000SEP08 Stomatognathic System - 56%, Skin - 20%, Digestive System - 13%
178 LI:401322.1:2000SEP08 Sense Organs - 45%, Liver - 20%, Skin - 15%
179 LI:208748.1:2000SEP08 Unclassified/Mixed - 14%, Germ Cells - 13%, Connective Tissue - 10%
180 LI:407242.1:2000SEP08 Connective Tissue - 27%, Nervous System - 12%
181 LI:403409.1:2000SEP08 Stomatognathic System - 51%, Respiratory System - 10%
182 LI:450798.1:2000SEP08 Female Genitalia - 97%
183 LI:410317.1:2000SEP08 Skin - 64%, Hemic and Immune System - 13%
184 LI:340268.1:2000SEP08 Urinary Tract - 33%, Nervous System - 25%, Digestive System - 25%
185 LI:2051671.1:2000SEP08 Pancreas - 18%, Respiratory System - 15%, Musculoskeletal System - 13%, Digestive System - 13%
186 LG:998844.1:2000SEP08 Germ Cells - 93%
187 LG:1043787.1:2000SEP08 Liver - 100%
188 LG:1098931.16:2000SEP08 Urinary Tract - 67%, Female Genitalia - 33%
189 LG:199423.2:2000SEP08 Hemic and Immune System - 100%
190 LI:1075297.1:2000SEP08 Hemic and Immune System - 100%
191 LI:1043321.1:2000SEP08 Liver - 100%
192 LI:297070.1:2000SEP08 Urinary Tract - 65%, Embryonic Structures - 15%
193 LI:1085041.1:2000SEP08 Liver - 100%
194 LI:1071544.1:2000SEP08 Liver - 100%
195 LI:2052480.1:2000SEP08 Digestive System - 39%, Digestive System - 21%, Liver - 17%
196 LG:450105.1:2000SEP08 Nervous System - 100%
197 LG:450581.1:2000SEP08 Nervous System - 100%
198 LG:450887.1:2000SEP08 Nervous System - 100%
199 LG:460809.1:2000SEP08 Exocrine Glands - 100%
200 LG:452089.1:2000SEP08 Nervous System - 100%
201 LG:1099416.1:2000SEP08 Embryonic Structures - 60%, Digestive System - 27%, Nervous System - 13%
202 LG:255713.1:2000SEP08 Male Genitalia - 44%, Respiratory System - 31%, Endocrine System - 25%
203 LG:998903.1:2000SEP08 Nervous System - 100%
204 LG:1119656.1:2000SEP08 Urinary Tract - 57%, Female Genitalia - 29%, Hemic and Immune System - 14%
205 LG:1096907.1:2000SEP08 Liver - 90%
206 LG:1323741.1 :2000SEP08 Liver - 52%, Connective Tissue - 32%
207 LG:1098372.1 :2000SEP08 Nervous System - 50%, Hemic and Immune System - 50%
208 LG:1006783.1 :2000SEP08 Liver - 100%
209 LG:1097562.1 :2000SEP08 Liver - 100%
210 LG:998868.1:2000SEP08 Nervous System - 100%
211 LG:1063383.1 :2000SEP08 Male Genitalia - 25%, Hemic and Immune System - 21%, Exocrine Glands - 14%, Urinary Tract - 14%
212 LG:1400567.1:2000SEP08 Digestive System -57%, Female Genitalia - 29%, Hemic and Immune System - 14%
213 Ll:449404.1 :2000SEP08 Nervous System - 100%
214 Ll:449941 ,2:2000SEP08 Nervous System - 100%
215 LI:450229.1:2000SEP08 Nervous System - 100%
216 LI:450399.3:2000SEP08 Nervous System - 100%
217 LI:455771.1:2000SEP08 Nervous System - 100%
218 LI:720459.1:2000SEP08 Endocrine System - 92%
219 LI:723156.1:2000SEP08 Nervous System - 100%
220 LI:728055.1:2000SEP08 Liver - 96%
221 LI:1020789.1:2000SEP08 Liver - 100%
222 LI:1071728.1:2000SEP08 Liver - 100%
223 LI:1084329.1:2000SEP08 Liver - 100%
224 LI:246422.1:2000SEP08 Hemic and Immune System - 67%, Nervous System - 33%
225 LI:1086066.1:2000SEP08 Liver - 100%
226 LI:223142.1:2000SEP08 Germ Cells - 51%, Female Genitalia - 24%
227 LI:885368.1:2000SEP08 Nervous System - 67%, Female Genitalia - 33%
228 LI:481782.1:2000SEP08 Cardiovascular System - 63%, Nervous System - 38%
229 LI:1093813.1:2000SEP08 Liver - 100%
230 LI:449413.2:2000SEP08 Nervous System - 100%
231 LI:450105.1:2000SEP08 Nervous System - 100%
232 LI:814285.1:2000SEP08 Liver - 71%, Male Genitalia - 29%
233 LI:1142855.1:2000SEP08 Digestive System - 83%, Nervous System - 17%
234 LI:817330.1:2000SEP08 Nervous System - 100%
235 LI:817845.1:2000SEP08 Nervous System - 100%
236 LI:460809.1:2000SEP08 Exocrine Glands - 83%, Nervous System - 17%
237 LI:815874.1:2000SEP08 Musculoskeletal System - 90%
238 LI:255713.1:2000SEP08 Male Genitalia - 28%, Endocrine System - 21%, Respiratory System - 21%
239 LI:035973.1:2000SEP08 Digestive System - 60%, Embryonic Structures - 24%, Digestive System - 10%
240 LI:1138110.1:2000SEP08 Hemic and Immune System - 86%
241 LI:2049074.1:2000SEP08 Liver - 98%
242 LI:1092460.1:2000SEP08 Liver - 100%
243 LI:399421.1:2000SEP08 Unclassified/Mixed - 69%, Male Genitalia - 23%
244 LI:816655.2:2000SEP08 Female Genitalia - 11%
245 LG:414732.1:2000SEP08 Endocrine System - 82%, Nervous System - 18%
246 LG:1140250.1:2000SEP08 Respiratory System - 100%
247 LG:174022.1:2000SEP08 Sense Organs - 73%, Liver - 25%
248 LI:002811.1:2000SEP08 Nervous System - 33%, Endocrine System - 25%, Cardiovascular System - 21%, Female Genitalia - 21%
249 LI:414732.2:2000SEP08 Endocrine System - 80%, Nervous System - 20%
250 LI:1019920.1:2000SEP08 Liver - 100%
251 LI:1038336.1:2000SEP08 Nervous System - 100%
252 LI:1177772.11:2000SEP08 Male Genitalia - 37%, Female Genitalia - 22%
253 LI:205642.2:2000SEP08 Nervous System - 100%
254 LG:449685.1:2000SEP08 Nervous System - 100%
255 LG:453922.1:2000SEP08 Nervous System - 100%
256 LG:476342.3:2000SEP08 Nervous System - 100%
257 LI:336801.1:2000SEP08 Germ Cells - 70%, Unclassified/Mixed - 20%
258 LI:449685.1:2000SEP08 Nervous System - 100%
259 LI:476342.1:2000SEP08 Connective Tissue - 75%, Nervous System - 25%
260 LI:1072804.1:2000SEP08 Nervous System - 100%
261 LI:455450.1:2000SEP08 Nervous System - 100%
262 LI:1073699.1:2000SEP08 Liver - 100%
263 LI:1013729.1:2000SEP08 Liver - 73%, Endocrine System - 27%
264 LI:2050322.2:2000SEP08 Pancreas - 35%
265 LI:891327.1:2000SEP08 Digestive System - 50%, Hemic and Immune System - 50%
266 LI:2053076.1:2000SEP08 Male Genitalia - 55%, Digestive System - 14%, Digestive System - 12%
267 LG:220085.1:2000SEP08 Hemic and Immune System - 57%, Digestive System - 29%, Nervous System - 14%
268 LG:406709.1:2000SEP08 Unclassified/Mixed - 66%, Male Genitalia - 29%
269 LG:347863.9:2000SEP08 Hemic and Immune System - 67%, Digestive System - 33%
270 LI:1073027.1:2000SEP08 Liver - 100%
271 LI:347635.1:2000SEP08 Female Genitalia - 49%, Musculoskeletal System - 22%
272 LI:013685.1:2000SEP08 Male Genitalia - 34%, Embryonic Structures - 28%, Endocrine System - 14%
273 LI:406709.1:2000SEP08 Unclassified/Mixed - 63%, Male Genitalia - 31%
274 LI:2052938.1:2000SEP08 Germ Cells - 63%, Endocrine System - 20%
275 LI:213208.1:2000SEP08 Germ Cells - 68%
TABLE 7
SEQ ID NO: Frame Length Start Stop GI Number Probability Score Annotation
276 3 133 3 401 g8996018 4.00E-21 hexokinase 1 isoform td
276 3 133 3 401 g8996017 4.00E-21 hexokinase 1 isoform ta/tb
276 3 133 3 401 g34670 4.00E-21 hexokinase type 1
277 1 160 250 729 g870752 2.00E-22 N-acetylglucosaminyltransferase V
277 1 160 250 729 g4545222 2.00E-22 alpha-1,3(6)-mannosylglycoprotein beta-1,6-N-acetylglucosaminyltransferase
277 1 160 250 729 g349091 5.00E-21 N-acetylglucosaminyltransferase V
278 2 125 239 613 g7687936 1.00E-18 possible adenylate kinase
278 2 125 239 613 g10177920 3.00E-15 contains similarity to adenylate kinase~gene_id:MCA23.18
278 2 125 239 613 g10176815 1.00E-14 adenylate kinase-like
279 1 373 199 1317 g3273307 1.00E-148 Lysophospholipase
279 1 373 199 1317 g7290456 3.00E-82 CG6428 gene product
279 1 373 199 1317 g3874557 1.00E-81 (Z81041) predicted using Genefinder-Similarity to E.coli L-asparaginase (SW:P18840), contains similarity to Pfam domain: PF00023 (Ank repeat), Score=65.5, E-value=3.7e-16, N=2; PF00710 (Asparaginase), Score=174.7, E-value=5.1e-49, N=1~cDNA EST yk9f7.3 comes from this gene-cDNA EST yk25c6.5 comes from this gene-cDNA EST yk128d6.3 comes from this gene-cDNA EST yk152f8.5 comes from this gene-cDNA EST yk152f8.3 comes from this gene-cDNA EST yk348d9.3 comes from this gene-cDNA EST yk348d9.5 comes from this gene-cDNA EST yk225c12.3 comes from this gene-cDNA EST yk225c12.5 comes from this gene-cDNA EST yk430c7.5 comes from this gene
280 2 227 29 709 g488838 1.00E-105 CaBP1
280 2 227 29 709 g13905146 1.00E-105 Similar to protein disulfide isomerase-related protein
280 2 227 29 709 g12838858 1.00E-104 putative
282 2 281 17 859 g6996429 1.00E-111 dJ568C11.3 (novel AMP-binding enzyme similar to acetyl-coenzyme A synthetase (acetate-coA ligase))
282 2 281 17 859 g12697774 1.00E-106 acetyl-CoA synthetase 2
282 2 281 17 859 g12697772 1.00E-104 acetyl-CoA synthetase 2
283 3 115 198 542 g399660 3.00E-51 aldehyde reductase
283 3 115 198 542 g7677318 8.00E-51 aldehyde reductase
283 3 115 198 542 g12848322 8.00E-51 putative
284 3 214 30 671 g56336 1.00E-104 glutathione S-transferase (aa 1-209)
284 3 214 30 671 g459939 1.00E-104 glutathione S-transferase
284 3 214 30 671 g695303 2.00E-96 GST pi enzyme
285 3 212 3 638 g2909424 8.00E-86 Glyoxalase 1
285 3 212 3 638 g12744892 5.00E-84 glyoxalase 1
285 3 212 3 638 g2113825 3.00E-83 Glyoxalase 1
286 2 183 44 592 g12856270 7.00E-06 putative
286 2 183 44 592 g10434969 7.00E-06 unnamed protein product
287 1 164 1 492 g414607 1.00E-09 glyceraldehyde-3-phosphate dehydrogenase
287 1 164 1 492 g409575 1.00E-09 glyceraldehyde-3-phosphate dehydrogenase
287 1 164 1 492 g312179 3.00E-09 glyceraldehyde 3-phosphate dehydrogenase (phosphorylating)
288 1 154 124 585 g13378170 7.00E-29 arachidonate lipoxygenase 3
288 1 154 124 585 g10799676 7.00E-29 lipoxygenase-3
288 1 154 124 585 g10441004 7.00E-29 epidermal lipoxygenase
289 2 148 2 445 g217974 1.00E-06 triosephosphate isomerase
289 2 148 2 445 g168647 1.00E-06 triosephosphate isomerase 1
290 2 291 485 1357 g12847081 1.00E-111 putative
290 2 291 485 1357 g12653491 1.00E-110 Similar to threonyl-tRNA synthetase
290 2 291 485 1357 g1464742 1.00E-109 threonyl-tRNA synthetase
291 1 354 1 1062 g10434528 0 unnamed protein product
291 1 354 1 1062 g13278319 1.00E-137 Similar to hypothetical protein FLJ12816
291 1 354 1 1062 g4929585 1.00E-111 CGI-58 protein
292 3 132 3 398 g57468 4.00E-63 oxytocin
292 3 132 3 398 g205900 4.00E-63 oxytocin/neurophysin
292 3 132 3 398 g205894 4.00E-63 oxytocin/neurophysin precursor
293 1 100 1 300 g6467206 2.00E-25 gonadotropin inducible transcription repressor-4
293 1 100 1 300 g6330394 8.00E-25 KIAA1198 protein
293 1 100 1 300 g12804721 3.00E-24 Unknown (protein for MGC:2663)
294 1 146 1 438 g57279 1.00E-51 pre-prosomatostatin
294 1 146 1 438 g297530 1.00E-51 preprosomatostatin
294 1 146 1 438 g207031 1.00E-51 somatostatin precursor
295 2 285 110 964 g6467206 1.00E-101 gonadotropin inducible transcription repressor-4
295 2 285 110 964 g13623354 4.00E-95 Similar to zinc finger protein 136 (clone pHZ-20)
295 2 285 110 964 g6330394 2.00E-93 KIAA1198 protein
296 2 213 2 640 g12052983 4.00E-71 hypothetical protein
296 2 213 2 640 g5262560 2.00E-68 hypothetical protein
296 2 213 2 640 g10434856 6.00E-68 unnamed protein product
297 1 95 94 378 g7262613 7.00E-13 candidate taste receptor T2R7
297 1 95 94 378 g7262619 2.00E-12 candidate taste receptor T2R10
297 1 95 94 378 g7262615 3.00E-12 candidate taste receptor T2R8
298 2 328 50 1033 g32093 8.00E-69 HGMP07J
298 2 328 50 1033 g1419016 7.00E-68 odorant receptor
298 2 328 50 1033 g11692549 2.00E-66 odorant receptor K30
299 3 101 660 962 g7158201 1.00E-25 cytokine receptor-like protein CYRL
301 3 427 348 1628 g14495650 2.00E-75 (BC009433) zinc finger protein 331; zinc finger protein 463
301 3 427 348 1628 g8575775 2.00E-75 KRAB zinc finger protein
301 3 427 348 1628 g13939858 2.00E-75 RITA
302 3 95 45 329 g7262613 7.00E-13 candidate taste receptor T2R7
302 3 95 45 329 g7262619 2.00E-12 candidate taste receptor T2R10
302 3 95 45 329 g7262615 3.00E-12 candidate taste receptor T2R8
304 2 407 221 1441 g9857402 1.00E-155 tumor endothelial marker 2
304 2 407 221 1441 g4092830 1.00E-155 dJ569D19.1 (similar to mouse Ras, Dexamethasone-induced 1 (Ras-related protein, RASD1, DEXRAS1))
304 2 407 221 1441 g5059122 1.00E-145 Rhes protein
305 1 277 1 831 g1519251 1.00E-131 GF14-C protein
305 1 277 1 831 g2921512 1.00E-123 GF14 protein
305 1 277 1 831 g7271253 1.00E-123 14-3-3-like protein
306 2 208 2 625 g6358507 7.00E-17 guanine exchange factor MCG7 isoform 2
306 2 208 2 625 g6358505 7.00E-17 guanine exchange factor MCG7 isoform 1
306 2 208 2 625 g4225848 7.00E-17 calcium- and diacylglycerol-regulated guanine nucleotide exchange factor I
308 2 205 635 1249 g3599940 5.00E-78 faciogenital dysplasia protein 2
308 2 205 635 1249 g3342246 3.00E-50 actin-filament binding protein Frabin
308 2 205 635 1249 g3599944 1.00E-48 faciogenital dysplasia protein
309 1 545 1 1635 g5823454 5.00E-75 GTPase-activating protein 6 isoform 4
309 1 545 1 1635 g7243304 2.00E-74 rho-type GTPase-activating protein isoform 3
309 1 545 1 1635 g5881233 2.00E-74 rho GTPase-activating protein 6 isoform 5
310 2 212 2 637 g437985 1.00E-105 Rab 12 protein
310 2 212 2 637 g206531 9.00E-96 RAB12
310 2 212 2 637 g12851149 4.00E-75 putative
311 2 189 2 568 g169931 3.00E-80 Glycine max calcium dependent protein kinase mRNA
311 2 189 2 568 g2501764 1.00E-75 calmodulin-like domain protein kinase isoenzyme beta
311 2 189 2 568 g7321076 5.00E-75 calmodulin-domain protein kinase CDPK isoform 4 (CPK4)
312 2 202 2 607 g2293566 1.00E-101 ADP-ribosylation factor 1
312 2 202 2 607 g2275195 1.00E-100 ADP-ribosylation factor 1
312 2 202 2 607 g166586 1.00E-100 ADP-ribosylation factor
313 1 178 1 534 g5689475 1.00E-67 KIAA1069 protein
313 1 178 1 534 g6705987 3.00E-48 phospholipase C-L2
313 1 178 1 534 g5689521 4.00E-48 KIAA1092 protein
314 3 77 75 305 g385168 3.00E-33 G-protein gamma subunit
314 3 77 75 305 g3450746 4.00E-31 GBG7_HUMAN
314 3 77 75 305 g3149954 4.00E-31 G-protein gamma 7
315 3 213 195 833 g13183338 1.00E-107 calneuron 1
315 3 213 195 833 g7670344 1.00E-107 unnamed protein product
315 3 213 195 833 g13183340 1.00E-107 calneuron 1
317 2 235 47 751 g9368450 1.00E-109 phospholipase C-beta-1b
317 2 235 47 751 g9368448 1.00E-109 phospholipase C-beta-1a
317 2 235 47 751 g206218 1.00E-108 phospholipase C-l
318 1 208 118 741 g3450893 1.00E-87 ras-like small monomeric GTP-binding protein
318 1 208 118 741 g7268592 4.00E-86 SAR1/GTP-binding secretory factor
318 1 208 118 741 g2104550 4.00E-86 AGAA.4
320 3 302 258 1163 g4584382 1.00E-152 E1a protein from 13s mrna (32k, regulation and transformation)
320 3 302 258 1163 g209814 1.00E-149 32 kD protein
320 3 302 258 1163 g4584383 1.00E-115 E1a protein from 12s mrna (26k, regulation and transformation)
321 1 162 1 486 g531901 2.00E-36 nuclear respiratory factor-2 subunit gamma 2
321 1 162 1 486 g531897 2.00E-36 nuclear respiratory factor-2 subunit beta 2
321 1 162 1 486 g286025 2.00E-36 E4TF1-53
322 1 195 34 618 g998899 5.00E-36 scleraxis=basic helix-loop-helix transcription factor (mice, embryos, Peptide, 207 aa)
322 1 195 34 618 g2155242 8.00E-22 paraxis
322 1 195 34 618 g1813563 2.00E-21 paraxis
324 1 262 1513 2298 g508528 1.00E-136 myocyte nuclear factor
324 1 262 1513 2298 g2289235 1.00E-119 myocyte nuclear factor-beta
324 1 262 1513 2298 g33854 2.00E-96 transcription factor ILF
326 3 408 165 1388 g6979924 1.00E-62 RP58
326 3 408 165 1388 g4959903 1.00E-62 transcriptional repressor RP58
326 3 408 165 1388 g4128145 1.00E-62 RP58 protein
327 1 59 1 177 g55624 2.00E-19 alpha initiation factor
327 1 59 1 177 g37058 2.00E-19 IIB protein
327 1 59 1 177 g339490 2.00E-19 transcription factor
329 3 104 144 455 g666914 8.00E-22 ferritin L-subunit
329 3 104 144 455 g309234 8.00E-22 ferritin light chain
329 3 104 144 455 g204133 8.00E-22 ferritin light chain
330 2 168 452 955 g4584382 2.00E-05 E1a protein from 13s mrna (32k, regulation and transformation)
335 3 122 399 764 g4309888 2.00E-54 similar to zinc finger proteins; similar to protein S47071 (PID:g631503), match to EST AA339462 (NID:g1991774)
335 3 122 399 764 g9502403 2.00E-06 Hypothetical zinc finger-like protein
336 1 137 370 780 g186774 8.00E-27 zinc finger protein
336 1 137 370 780 g2384653 1.00E-25 Krueppel family zinc finger protein
336 1 137 370 780 g14348591 3.00E-25 KRAB zinc finger protein
338 2 166 2 499 g11345048 6.00E-26 SCAN domain-containing protein 2
338 2 166 2 499 g11320940 6.00E-26 SCAND2
338 2 166 2 499 g12859721 2.00E-25 putative
340 3 198 372 965 g7630121 1.00E-19 zinc finger protein 92
340 3 198 372 965 g1401082 1.00E-19 kruppel-type zinc finger protein
340 3 198 372 965 g2924250 3.00E-17 dJ29K1.2
341 2 439 341 1657 g3638956 0 zinc finger-like; similar to P52742 (PID:g1731411)
341 2 439 341 1657 g7670496 1.00E-170 unnamed protein product
341 2 439 341 1657 g1020145 7.00E-75 DNA binding protein
342 1 404 61 1272 g506502 1.00E-141 NK10
342 1 404 61 1272 g3135968 8.00E-71 b34l8.1 (Kruppel related Zinc Finger protein 184)
342 1 404 61 1272 g3970712 2.00E-69 zinc finger protein 10
343 2 139 2 418 g13097465 6.00E-62 RIKEN cDNA 3110024A21 gene
343 2 139 2 418 g12837667 4.00E-61 putative
343 2 139 2 418 g2190184 3.00E-54 zinc finger protein
345 1 128 229 612 g9502403 2.00E-07 Hypothetical zinc finger-like protein
346 3 146 99 536 g2739353 9.00E-56 ZNF91L
346 3 146 99 536 g7959207 7.00E-50 KIAA1473 protein
346 3 146 99 536 g3342002 9.00E-50 hematopoietic cell derived zinc finger protein
347 3 250 3 752 g2843171 3.00E-71 zinc finger protein
347 3 250 3 752 g55471 6.00E-71 Zfp-29
347 3 250 3 752 g12855698 6.00E-71 putative
348 2 365 2 1096 g3135968 4.00E-84 b34l8.1 (Kruppel related Zinc Finger protein 184)
348 2 365 2 1096 g5640017 7.00E-84 zinc finger protein ZFP113
348 2 365 2 1096 g1769491 2.00E-80 kruppel-related zinc finger protein
350 2 185 209 763 g4164083 1.00E-52 zinc finger protein EZNF
350 2 185 209 763 g2970038 1.00E-52 HKL1
350 2 185 209 763 g6007769 6.00E-42 KID1
351 1 89 127 393 g7023216 3.00E-18 unnamed protein product
351 1 89 127 393 g12804415 8.00E-18 Similar to hypothetical protein FLJ10891
351 1 89 127 393 g13752754 2.00E-16 zinc finger 1111
353 2 322 2 967 g2618752 1.00E-105 zinc finger protein
353 2 322 2 967 g6177785 4.00E-38 HKR1
353 2 322 2 967 g13325427 4.00E-38 Unknown (protein for IMAGE:3928207)
354 1 115 1 345 g5107180 1.00E-31 small zinc finger-like protein
354 1 115 1 345 g5107088 5.00E-30 small zinc finger-like protein
354 1 115 1 345 g5107174 3.00E-27 small zinc finger-like protein
355 1 108 379 702 g7159800 3.00E-58 dJ351K20.1.2 (novel C3HC4 type Zinc finger (RING finger) protein (isoform 2))
355 1 108 379 702 g7159799 3.00E-58 dJ351K20.1.1 (novel C3HC4 type Zinc finger (RING finger) protein (isoform 1))
355 1 108 379 702 g14198342 3.00E-58 hypothetical protein DKFZp43401427
356 2 166 74 571 g8163824 1.00E-37 krueppel-like zinc finger protein HZF2
356 2 166 74 571 g498723 4.00E-32 zinc finger protein
356 2 166 74 571 g4235144 1.00E-28 BC39498J
358 1 127 1 381 g13623633 1.00E-18 Unknown (protein for MGC:13105)
358 1 127 1 381 g4567180 1.00E-13 BC37295_2 (partial)
358 1 127 1 381 g12804721 6.00E-13 Unknown (protein for MGC:2663)
359 1 78 1 234 g487785 4.00E-16 zinc finger protein ZNF136
359 1 78 1 234 g13623607 4.00E-16 zinc finger protein 136 (clone pHZ-20)
359 1 78 1 234 g13623354 1.00E-15 Similar to zinc finger protein 136 (clone pHZ-20)
360 3 158 3 476 g14424716 2.00E-59 hypothetical protein FLJ11637
360 3 158 3 476 g10432938 2.00E-59 unnamed protein product
360 3 158 3 476 g9187356 2.00E-30 hypothetical protein, similar to (AB021644) GONADOTROPIN INDUCIBLE TRANSCRIPTION REPRESSOR-4
361 2 115 443 787 g7576274 2.00E-48 bA393J16.3 (novel KRAB box containing zinc finger gene)
361 2 115 443 787 g12841623 5.00E-34 putative
361 2 115 443 787 g488551 2.00E-25 zinc finger protein ZNF132
363 2 64 53 244 g7023216 1.00E-1 unnamed protein product
363 2 64 53 244 g12804415 1.00E-19 Similar to hypothetical protein FLJ10891
363 2 64 53 244 g13752754 7.00E-17 zinc finger 1111
364 3 255 3 767 g488551 2.00E-80 zinc finger protein ZNF132
364 3 255 3 767 g9968290 2.00E-78 zinc finger protein 304
364 3 255 3 767 g14249844 2.00E-78 Similar to hypothetical protein FLJ23233
365 1 97 253 543 g7023417 2.00E-36 unnamed protein product
365 1 97 253 543 g14042715 2.00E-36 unnamed protein product
365 1 97 253 543 g11917507 2.00E-36 HPF1 protein
366 1 158 1 474 g9802037 7.00E-61 zinc finger protein SBZF3
366 1 158 1 474 g4235144 1.00E-36 BC39498J
366 1 158 1 474 g186774 3.00E-35 zinc finger protein
367 2 122 185 550 g5730196 2.00E-29 Kruppel-type zinc finger
367 2 122 185 550 g12849906 4.00E-28 putative
367 2 122 185 550 g55483 2.00E-24 Zfp-1 protein (AA 1-424)
368 1 242 1 726 g220637 1.00E-67 zinc finger protein
368 1 242 1 726 g4559318 2.00E-62 BC273239J
368 1 242 1 726 g6467206 3.00E-62 gonadotropin inducible transcription repressor-4
369 3 92 48 323 g3342002 4.00E-37 hematopoietic cell derived zinc finger protein
369 3 92 48 323 g7959207 1.00E-35 KIAA1473 protein
369 3 92 48 323 g186774 2.00E-35 zinc finger protein
370 1 85 214 468 g7023216 4.00E-13 unnamed protein product
370 1 85 214 468 g12804415 4.00E-13 Similar to hypothetical protein FLJ10891
370 1 85 214 468 g13752754 6.00E-12 zinc finger 1111
372 3 206 3 620 g4567180 1.00E-102 BC37295_2 (partial)
372 3 206 3 620 g9502202 2.00E-99 endothelial zinc finger protein induced by tumor necrosis factor
372 3 206 3 620 g13879240 1.00E-16 Similar to zinc finger protein 46
373 1 206 121 738 g7981299 3.00E-41 dJ31316.6 (zinc finger protein 165)
373 1 206 121 738 g683471 3.00E-41 ZNF165
373 1 206 121 738 g4154166 3.00E-41 zinc finger protein
374 1 160 49 528 g13752754 1.00E-28 zinc finger 1111
374 1 160 49 528 g14348588 5.00E-28 KRAB zinc finger protein
374 1 160 49 528 g12654015 3.00E-27 Similar to hypothetical protein FLJ10891
375 2 115 140 484 g10434195 5.00E-57 unnamed protein product
375 2 115 140 484 g13529188 2.00E-36 Unknown (protein for MGC:12466)
375 2 115 140 484 g13623354 5.00E-33 Similar to zinc finger protein 136 (clone pHZ-20)
376 2 120 392 751 g14042186 3.00E-35 unnamed protein product
376 2 120 392 751 g55475 2.00E-32 Zink-finger protein 37
376 2 120 392 751 g53457 2.00E-32 zinc finger protein (AA 1-411)
377 1 273 73 891 g10435738 1.00E-142 unnamed protein product
377 1 273 73 891 g3342002 4.00E-87 hematopoietic cell derived zinc finger protein
377 1 273 73 891 g7959207 1.00E-85 KIAA1473 protein
378 3 132 48 443 g13623587 2.00E-45 Similar to zinc finger protein 254
378 3 132 48 443 g10435738 3.00E-35 unnamed protein product
378 3 132 48 443 g3342002 2.00E-34 hematopoietic cell derived zinc finger protein
379 2 233 563 1261 g2689444 5.00E-76 ZNF134
379 2 233 563 1261 g488553 7.00E-69 zinc finger protein ZNF134
379 2 233 563 1261 g10440218 3.00E-68 unnamed protein product
380 2 140 2 421 g7023417 2.00E-48 unnamed protein product
380 2 140 2 421 g14042715 2.00E-48 unnamed protein product
380 2 140 2 421 g11917507 2.00E-48 HPF1 protein
381 3 153 27 485 g10434781 9.00E-48 unnamed protein product
381 3 153 27 485 g13938351 5.00E-47 Similar to zinc finger protein 268
381 3 153 27 485 g7023216 3.00E-45 unnamed protein product
382 1 420 61 1320 g506502 1.00E-138 NK10
382 1 420 61 1320 g3135968 1.00E-66 b3418.1 (Kruppel related Zinc Finger protein 184)
382 1 420 61 1320 g1769491 8.00E-65 kruppel-related zinc finger protein
384 3 186 3 560 g9502403 3.00E-07 Hypothetical zinc finger-like protein
386 1 476 43 1470 g2085786 0 similar to zinc finger 5 protein from Gallus gallus, U51640 (PID:g1399185)
386 1 476 43 1470 g4454855 0 zinc finger transcription factor Kaiso
386 1 476 43 1470 g1399187 2.00E-20 zinc finger 5 protein
387 1 292 1 876 g487284 1.00E-109 CRP2 (cysteine-rich protein 2)
387 1 292 1 876 g13938064 1.00E-108 RIKEN cDNA 0610010123 gene
387 1 292 1 876 g12832503 1.00E-108 putative
389 1 171 1 513 g186774 1.00E-71 zinc finger protein
389 1 171 1 513 g2723316 8.00E-71 Zinc-finger protein
389 1 171 1 513 g1017722 2.00E-69 repressor transcriptional factor
390 2 336 476 1483 g13752754 6.00E-72 zinc finger 1111
390 2 336 476 1483 g14348588 4.00E-69 KRAB zinc finger protein
390 2 336 476 1483 g10440398 4.00E-69 FLJ00032 protein
391 1 375 121 1245 g10434195 0 unnamed protein product
391 1 375 121 1245 g13529188 1.00E-128 Unknown (protein for MGC:12466)
391 1 375 121 1245 g6330394 1.00E-120 KIAA1198 protein
392 3 444 84 1415 g3135968 1.00E-102 b34l8.1 (Kruppel related Zinc Finger protein 184)
392 3 444 84 1415 g14042550 1.00E-100 unnamed protein product
392 3 444 84 1415 g13937909 1.00E-100 Similar to KIAA0961 protein
393 1 620 235 2094 g14042330 0 unnamed protein product
393 1 620 235 2094 g3882317 0 KIAA0798 protein
393 1 620 235 2094 g12804011 0 KIAA0798 gene product
394 3 134 3 404 g487787 1.00E-13 zinc finger protein ZNF140
394 3 134 3 404 g13752754 5.00E-13 zinc finger 1111
394 3 134 3 404 g7023216 7.00E-13 unnamed protein product
396 2 114 413 754 g12853416 3.00E-24 putative
396 2 114 413 754 g13529497 6.00E-23 Unknown (protein for MGC:6652)
396 2 114 413 754 g4514561 1.00E-21 KRAB-containing zinc-finger protein KRAZ2
398 3 208 3 626 g457929 1.00E-65 delta subunit of F1F0 ATPase
398 3 208 3 626 g14198434 3.00E-63 RIKEN cDNA 0610008F14 gene
398 3 208 3 626 g12857538 3.00E-63 putative
399 3 104 51 362 g12859516 9.00E-49 putative
399 3 104 51 362 g12859507 9.00E-49 putative
399 3 104 51 362 g12859498 9.00E-49 putative
400 1 284 1 852 g3319340 3.00E-68 contains similarity to E. coli cation transport protein ChaC (GB:D90756)
400 1 284 1 852 g7270031 9.00E-67 predicted protein
400 1 284 1 852 g2827524 9.00E-67 predicted protein
401 2 297 2 892 g7290145 7.00E-46 EG:8D8.3 gene product
401 2 297 2 892 g2950398 7.00E-46 /prediction=(method:"genscan", version:"1.0", score:"294.38")~/match=(desc:"THIAZIDE-SENSITIVE SODIUM-CHLORIDE COTRANSPORTER (NA-CL SYMPORTER)", species:"HOMO SAPIENS (HUMAN)", ranges:(query:33174..33518, target:SWISS-PROT::P55017:246..132, score:"258.00"), (query:33015..33149, target:SWISS-PROT::P55017:304..260, score:"75.00"), (query:32642..32761, target:SWISS-PROT::P55017:378..339, score:"121.00"), (query:32339..32503, target:SWISS-PROT::P55017:498..444, score:"76.00"), (query:32122..32268, target:SWISS-PROT::P55017:548..500, score:"97.00"), (query:31494..31625, target:SWISS-PROT::P55017:617..574, score:"55.00")), method:"blastx", version:"1.4.9")~/match=(desc:"BUMETANIDE-SENSITIVE SODIUM-(POTASSIUM)-CHLORIDE COTRANSPORTER 1 (BASOLATERAL NA-K-CL SYMPORTER)", species:"HOMO SAPIENS (HUMAN)", ranges:(query:33174..33512, target:SWISS-PROT::P55011:395..283, score:"247.00"), (query:33012..33149, target:SWISS-PROT::P55011:454..409, score:"97.00"), (query:32642..32827, target:SWISS-PROT::P55011:525..464, score:"154.00"), (query:32339..32515, target:SWISS-PROT::P55011:644..586, score:"93.00"), (query:32122..32268, target:SWISS-
401 2 297 2 892 g1086832 3.00E-37 coded for by C. elegans cDNA cm13g1; Similar to bumetanide-sensitive Na-K-Cl cotransporter
402 2 148 2 445 g5101 6 7.00E-84 TF
402 2 148 2 445 g1854476 7.00E-84 transferrin
402 2 148 2 445 g14250269 8.00E-72 Unknown (protein for IMAGE:3592890)
403 3 165 651 1145 g9957542 6.00E-86 connexin 59
403 3 165 651 1145 g14189950 6.00E-86 connexin 58
403 3 165 651 1145 g10946367 7.00E-36 connexin 55.5
404 3 285 3 857 g6996442 8.00E-49 CTL1 protein
404 3 285 3 857 g6996589 2.00E-47 CTL1 protein
404 3 285 3 857 g6996587 7.00E-39 CTL1 protein
405 2 414 749 1990 g14042129 0 unnamed protein product
405 2 414 749 1990 g2116552 1.00E-158 cationic amino acid transporter 3
405 2 414 749 1990 g1575776 1.00E-154 cationic amino acid transporter
407 2 188 2 565 g5921501 9.00E-21 distal intestinal serine protease
407 2 188 2 565 g4753837 8.00E-18 tryptase
407 2 188 2 565 g4753835 4.00E-17 tryptase
409 3 109 60 386 g999454 5.00E-30 TX protease precursor
409 3 109 60 386 g903934 5.00E-30 cysteine protease
409 3 109 60 386 g886050 5.00E-30 Ich-2
410 3 235 3 707 g205308 1.00E-140 alpha-1 major acute phase protein prepeptide
410 3 235 3 707 g207341 1.00E-133 T-kininogen
410 3 235 3 707 g205085 1.00E-131 LMW T-kininogen 1 precursor
411 2 210 11 640 g13516326 9.00E-20 marapsin
411 2 210 11 640 g12841953 1.00E-17 putative
411 2 210 11 640 g12836503 7.00E-16 putative
413 3 184 3 554 g4586674 2.00E-99 signal peptidase 21 kDa subunit
413 3 184 3 554 g12841311 2.00E-99 putative
413 3 184 3 554 g164084 2.00E-98 signal peptidase 21 kDa subunit
414 3 263 3 791 g6957716 1.00E-128 putative chaperonin
414 3 263 3 791 g14423532 1.00E-128 putative chaperonin
414 3 263 3 791 g9755653 1.00E-125 TCP-1 chaperonin-like protein
415 3 163 3 491 g433783 7.00E-76 binding protein
415 3 163 3 491 g337370 3.00E-74 rapamycin- and FK506-binding protein
415 3 163 3 491 g13097252 3.00E-74 Similar to FK506 binding protein 2 (13 kDa)
418 3 142 3 428 g736290 1.00E-66 precursor cystatin C C-terminal fragment (128 AA) (1 is 2nd base in codon)
418 3 142 3 428 g497415 2.00E-59 cystatin C
418 3 142 3 428 g12852172 2.00E-59 putative
419 1 191 88 660 g5921501 9.00E-21 distal intestinal serine protease
419 1 191 88 660 g4753837 8.00E-18 tryptase
419 1 191 88 660 g4753835 4.00E-17 tryptase
421 2 152 749 1204 g14043131 4.00E-20 Unknown (protein for IMAGE:2967328)
421 2 152 749 1204 g14017907 4.00E-20 KIAA1845 protein
421 2 152 749 1204 g13279050 4.00E-20 calpain 10
422 1 91 205 477 g56998 3.00E-33 proteasome subunit RC5
422 1 91 205 477 g3790135 6.00E-33 dJ191N21.3 (proteasome subunit HC5)
422 1 91 205 477 g220026 6.00E-33 proteasome subunit C5
424 1 362 85 1170 g5748546 1.00E-159 C321D2.1 (Ribosomal Large Subunit Pseudouridine Synthase (EC 4.2.1.70, Pseudouridylate Synthase, Uracil Hydrolase) LIKE protein)
424 1 362 85 1170 g14336724 1.00E-159 ribosomal large subunit pseudouridine synthase C like
424 1 362 85 1170 g12845023 1.00E-63 putative
425 2 169 119 625 g1263081 2.00E-77 mariner transposase
425 2 169 119 625 g3005702 1.00E-76 unknown
425 2 169 119 625 g2231380 9.00E-76 orf; encodes putative chimeric protein with SET domain in N- terminus with similarity to several other human, Drosophila, nematode and yeast proteins
428 1 131 184 576 g3005702 1.00E-44 unknown
428 1 131 184 576 g2231380 1.00E-44 orf; encodes putative chimeric protein with SET domain in N- terminus with similarity to several other human, Drosophila, nematode and yeast proteins
428 1 131 184 576 g1263081 1.00E-44 mariner transposase
430 2 101 74 376 g4185140 3.00E-40 putative small nuclear ribonucleoprotein E
430 2 101 74 376 g7269933 3.00E-38 small nuclear ribonucleoprotein homolog
430 2 101 74 376 g35105 3.00E-28 snRNP E protein (AA 1-92)
431 1 274 1 822 g3399667 1.00E-116 FBRL_HUMAN; 34 KD NUCLEOLAR SCLERODERMA ANTIGEN
431 1 274 1 822 g31395 1.00E-116 fibrillarin
431 1 274 1 822 g182592 1.00E-116 fibrillarin
432 3 110 24 353 g7269933 7.00E-26 small nuclear ribonucleoprotein homolog
432 3 110 24 353 g4185140 7.00E-26 putative small nuclear ribonucleoprotein E
432 3 110 24 353 g35105 8.00E-20 snRNP E protein (AA 1-92)
433 1 104 139 450 g2231380 4.00E-42 orf; encodes putative chimeric protein with SET domain in N- terminus with similarity to several other human, Drosophila, nematode and yeast proteins
433 1 104 139 450 g3005702 2.00E-41 unknown
433 1 104 139 450 g14286268 2.00E-41 SET domain and mariner transposase fusion gene
435 3 74 453 674 g2104910 4.00E-31 ORF derived from Dl leader region and integrase coding region
435 3 74 453 674 g4959374 2.00E-21 pol protein
435 3 74 453 674 g2104914 2.00E-21 ORF derived from protease and integrase coding regions
436 2 187 293 853 g9650711 1.00E-72 HEF like Protein
436 2 187 293 853 g12964245 1.00E-72 dJ1167H4.4 (HEF like protein (HEFL))
436 2 187 293 853 g14042680 3.00E-67 unnamed protein product
437 3 187 147 707 g4106980 1.00E-06 immunoglobulin-like transcript 10 protein
437 3 187 147 707 g3776468 2.00E-06 immunoglobulin-like transcript 10 protein
437 3 187 147 707 g2645890 3.00E-06 IGSF1
439 3 175 3 527 g598166 5.00E-66 immunoglobulin kappa chain variable region
439 3 175 3 527 g5360673 2.00E-64 anti-Entamoeba histolytica immunoglobulin kappa light chain
439 3 175 3 527 g261240 5.00E-64 immunoglobulin M light chain V region=anti-lipid A antibody (human, hybridoma cell line HR78, Peptide Partial, 141 aa)
440 3 87 159 419 g30151 2.00E-26 cytochrome c oxidase subunit VIIb
440 3 87 159 419 g12834072 2.00E-25 putative
440 3 87 159 419 g12832690 2.00E-25 putative
441 3 119 3 359 g2114207 6.00E-49 glutaredoxin
441 3 119 3 359 g485953 4.00E-48 glutaredoxin
441 3 119 3 359 g10178147 2.00E-39 glutaredoxin-like protein
442 1 226 1 678 g1518874 2.00E-78 integral membrane protein CII-3
442 1 226 1 678 g13543226 3.00E-77 Similar to RIKEN cDNA 0610010E03 gene
442 1 226 1 678 g12849813 2.00E-76 putative
443 2 152 2 457 g205628 1.00E-27 24-kDa mitochondrial NADH dehydrogenase precursor (EC
443 2 152 2 457 g12850902 1.00E-27 putative
443 2 152 2 457 g3123721 2.00E-27 24-kDa subunit of complex 1
445 3 169 903 1409 g5102636 3.00E-06 dJ682J15.1 (novel Collagen triple helix repeat containing protein)
445 3 169 903 1409 g12052774 3.00E-06 hypothetical protein
447 1 194 304 885 g12856559 1.00E-85 putative
447 1 194 304 885 g12856631 3.00E-85 putative
447 1 194 304 885 g12849896 3.00E-85 putative
450 1 160 1 480 g1419370 4.00E-74 actin depolymerizing factor
450 1 160 1 480 g10441256 5.00E-46 actin-depolymerizing factor 1
450 1 160 1 480 g9757910 2.00E-44 actin depolymerizing factor 4
452 3 267 3 803 g8249467 3.00E-31 titin
452 3 267 3 803 g1017427 7.00E-28 elastic titin
452 3 267 3 803 g9826 1.00E-07 1 1-1 polypeptide
453 1 525 37 1611 g52785 1.00E-109 57 kd keratin (aa 1-524)
453 1 525 37 1611 g386850 1.00E-109 keratin K5
453 1 525 37 1611 g34073 1.00E-109 cytokeratin 4 (408 AA)
455 1 154 1 462 g8777465 5.00E-58 cytoplasmic dynein heavy chain
455 1 154 1 462 g12852400 2.00E-33 putative
455 1 154 1 462 g3876099 4.00E-16 (Z75536) similar to dynein heavy chain; cDNA EST yk13d11.3 comes from this gene; cDNA EST yk13d11.5 comes from this gene
456 2 596 56 1843 g3769362 1.00E-70 ectoderm-neural cortex-1 protein
456 2 596 56 1843 g3309573 1.00E-70 nuclear matrix protein NRP/B
456 2 596 56 1843 g2282582 1.00E-70 actin-binding protein
457 3 762 3 2288 g8896164 0 kinesin-like protein GAKIN
457 3 762 3 2288 g10697238 0 KIF13A
457 3 762 3 2288 g12054032 0 KINESIN-13A2
458 3 255 3 767 g4415996 2.00E-36 beta-tubulin 4
458 3 255 3 767 g4098331 2.00E-36 beta-tubulin 5
458 3 255 3 767 g4098319 2.00E-36 beta-tubulin 1
459 1 156 211 678 g386847 2.00E-39 keratin
459 1 156 211 678 g34069 2.00E-39 keratin
459 1 156 211 678 g914833 3.00E-39 keratin type 11
460 3 216 465 1112 g9864780 3.00E-14 beta-actin
460 3 216 465 1112 g4204812 4.00E-14 actin
460 3 216 465 1112 g8895873 5.00E-14 actin
461 3 160 3 482 g63805 7.00E-31 tensin
461 3 160 3 482 g619577 7.00E-31 cardiac muscle tensin
461 3 160 3 482 g212755 7.00E-31 tensin
462 3 97 3 293 g7259234 3.00E-22 contains transmembrane (TM) region
462 3 97 3 293 g12861877 3.00E-22 putative
462 3 97 3 293 g12837694 3.00E-22 putative
463 1 124 211 582 g7023973 1.00E-72 phospholipid hydroperoxide glutathione peroxidase
463 1 124 211 582 g406111 1.00E-72 phospholipid hydroperoxide glutathione peroxidase
463 1 124 211 582 g1063636 1.00E-72 phospholipid hydroperoxide glutathione peroxidase
464 1 68 25 228 g452316 6.00E-14 acetyl-CoA carboxylase
464 1 68 25 228 g2138330 6.00E-14 acetyl-CoA carboxylase
464 1 68 25 228 g1399290 6.00E-14 acetyl-CoA carboxylase beta
465 1 85 1 255 g5670328 2.00E-22 copine III
465 1 85 1 255 g3327086 2.00E-22 KIAA0636 protein
465 1 85 1 255 g6453711 8.00E-21 copine VII protein
466 2 172 2 517 g204491 8.00E-84 glutathione S-transferase
466 2 172 2 517 g5762309 3.00E-82 microsomal glutathione S-transferase
466 2 172 2 517 g12836829 8.00E-82 putative
468 1 110 604 933 g9650954 2.00E-26 beta-1,6-N-acetylglucosaminyltransferase B
468 1 110 604 933 g12860327 1.00E-12 putative
468 1 110 604 933 g9650956 3.00E-08 beta-1,6-N-acetylglucosaminyltransferase A
469 3 212 3 638 g206117 1.00E-97 prostaglandin H2 D-isomerase
469 3 212 3 638 g206115 1.00E-97 prostaglandin D synthetase
469 3 212 3 638 g895868 4.00E-86 prostaglandin D synthetase
470 2 178 2 535 g53354 1.00E-56 nucleoside diphosphate kinase B
470 2 178 2 535 g4467843 1.00E-56 NM23-H2 protein
470 2 178 2 535 g349476 1.00E-56 c-myc transcription factor
471 2 152 113 568 g163152 1.00E-13 hexokinase 1
471 2 152 113 568 g8996018 3.00E-13 hexokinase 1 isoform td
471 2 152 113 568 g8996017 3.00E-13 hexokinase 1 isoform ta/tb
472 2 164 2 493 g643074 5.00E-76 putative 40S ribosomal protein s12
472 2 164 2 493 g6716785 2.00E-75 40s ribosomal protein S23
472 2 164 2 493 g14532718 8.00E-75 (AY039983) unknown protein
474 3 122 3 368 g7629994 2.00E-34 60S RIBOSOMAL PROTEIN L36 homolog
474 3 122 3 368 g7413634 2.00E-33 60S ribosomal protein-like
474 3 122 3 368 g3236242 2.00E-33 60S ribosomal protein L36
475 3 101 3 305 g14586963 1.00E-22 (AF362574) M75
475 3 101 3 305 g57115 1.00E-22 ribosomal protein L31 (AA 1-125)
475 3 101 3 305 g36130 1.00E-22 ribosomal protein L31 (AA 1-125)
476 2 207 14 634 g7340874 1.00E-77 ESTs D15590(C0900),D48950(S15542),D22684(C0900) correspond to a region of the predicted gene. -Similar to Arabidopsis thaliana 60S ribosomal protein L11A (L16A). (P42795)
476 2 207 14 634 g14517470 6.00E-77 (AY039570) AT4g18730/F28A21_140
476 2 207 14 634 g9758681 6.00E-77 ribosomal protein L11-like
477 3 83 303 551 g12842823 2.00E-18 putative
477 3 83 303 551 g57121 4.00E-18 ribosomal protein L37
477 3 83 303 551 g461232 4.00E-18 ribosomal protein L37
478 1 75 256 480 g14586963 2.00E-09 (AF362574) M75
478 1 75 256 480 g57115 2.00E-09 ribosomal protein L31 (AA 1-125)
478 1 75 256 480 g36130 2.00E-09 ribosomal protein L31 (AA 1-125)
479 1 162 1 486 g5106775 1.00E-64 ribosomal protein S12
479 1 162 1 486 g4263712 3.00E-46 40S ribosomal protein S12
479 1 162 1 486 g6587799 1.00E-45 Strong similarity to gb|AF067732 ribosomal protein S12 from Hordeum vulgare. ESTs gb|T41772, gb|T42570, gb|AI999345, gb|T20784, gb|F20068 come from this gene.
481 1 108 1 324 g57123 7.00E-46 ribosomal protein L37a (AA 1-92)
481 1 108 1 324 g36134 7.00E-46 ribosomal protein L37a
481 1 108 1 324 g312414 7.00E-46 ribosomal protein L37a
482 2 142 68 493 g57702 3.00E-37 ribosomal protein L35 (AA 1-123)
482 2 142 68 493 g12849009 3.00E-37 putative
482 2 142 68 493 g12846227 3.00E-37 putative
483 1 82 70 315 g409074 4.00E-31 HBp15/L22
483 1 82 70 315 g409072 4.00E-31 HBp15/L22
483 1 82 70 315 g409070 4.00E-31 HBp15/L22
484 1 163 7 495 g12858199 3.00E-91 putative
484 1 163 7 495 g12842650 3.00E-91 putative
484 1 163 7 495 g12833292 3.00E-91 putative
485 2 118 110 463 g488415 9.00E-61 ribosomal protein L30
485 2 118 110 463 g3115336 9.00E-61 ribosomal protein L30
485 2 118 110 463 g206728 9.00E-61 ribosomal protein L30
486 2 260 2 781 g483431 1.00E-123 cyc07
486 2 260 2 781 g4079800 1.00E-122 S-phase-specific ribosomal protein
486 2 260 2 781 g6714564 1.00E-115 cyc07
487 1 115 640 984 g57117 1.00E-14 ribosomal protein L32
487 1 115 640 984 g36132 1.00E-14 rpL32 (aa 1-135)
487 1 115 640 984 g200781 1.00E-14 ribosomal protein L32-3A
488 2 57 125 295 g14586963 5.00E-19 (AF362574) M75
488 2 57 125 295 g57115 5.00E-19 ribosomal protein L31 (AA 1-125)
488 2 57 125 295 g36130 5.00E-19 ribosomal protein L31 (AA 1-125)
489 1 167 46 546 g4886269 9.00E-61 putative ribosomal protein S14
489 1 167 46 546 g12322890 3.00E-60 putative 40S ribosomal protein s14; 67401-66292
489 1 167 46 546 g4678226 1.00E-59 40S ribosomal protein S14
490 1 212 1 636 g1498053 3.00E-82 ribosomal protein S8
490 1 212 1 636 g968902 3.00E-73 ribosomal protein S8
490 1 212 1 636 g3264759 6.00E-73 40S ribosomal protein S8
491 1 217 1 651 g4588906 6.00E-97 ribosomal protein S7
491 1 217 1 651 g4128206 9.00E-83 40S ribosome protein S7
491 1 217 1 651 g3851636 9.00E-83 unknown
492 3 148 3 446 g13489168 5.00E-77 60S ribosomal protein L17
492 3 148 3 446 g13430182 3.00E-76 ribosomal protein L17
492 3 148 3 446 g14596111 2.00E-75 (AY042843) 60S ribosomal protein L17
493 3 158 3 476 g643074 5.00E-76 putative 40S ribosomal protein s12
493 3 158 3 476 g6716785 1.00E-75 40s ribosomal protein S23
493 3 158 3 476 g14532718 7.00E-75 (AY039983) unknown protein
494 1 188 1 564 g1490384 1.00E-104 ribosomal protein L6
494 1 188 1 564 g695638 1.00E-100 M-TAXREB107
494 1 188 1 564 g14210106 1.00E-100 ribosomal protein L6
495 3 144 3 434 g915313 2.00E-43 ribosomal protein L31
495 3 144 3 434 g7229709 3.00E-43 80S ribosomal protein L31
495 3 144 3 434 g2982295 3.00E-42 probable 60S ribosomal protein L31
496 3 159 3 479 g57714 5.00E-78 ribosomal protein S16 (AA 1-146)
496 3 159 3 479 g338447 5.00E-78 RPS16
496 3 159 3 479 g14044116 5.00E-78 ribosomal protein S16
497 1 147 1 441 g6006558 1.00E-74 ribosomal protein S18
497 1 147 1 441 g433447 1.00E-74 ribosomal protein S18
497 1 147 1 441 g4050105 1.00E-74 RPS18
498 1 130 58 447 g57720 8.00E-63 ribosomal protein S20 (AA 1-119)
498 1 130 58 447 g292443 8.00E-63 ribosomal protein S20
498 1 130 58 447 g13960133 8.00E-63 ribosomal protein S20
499 2 149 2 448 g57690 2.00E-67 ribosomal protein L23a
499 2 149 2 448 g404015 2.00E-67 ribosomal protein L23a
499 2 149 2 448 g306549 2.00E-67 homology to rat ribosomal protein L23
500 1 115 70 414 g409074 2.00E-40 HBp15/L22
500 1 115 70 414 g409072 2.00E-40 HBp15/L22
500 1 115 70 414 g409070 2.00E-40 HBp15/L22
501 2 178 2 535 g3717978 6.00E-86 5S ribosomal protein
501 2 178 2 535 g1685071 6.00E-86 ribosomal protein S5
501 2 178 2 535 g12861440 6.00E-86 putative
502 3 153 207 665 g57682 2.00E-18 ribosomal protein L17
502 3 153 207 665 g57111 2.00E-18 ribosomal protein L22
502 3 153 207 665 g34199 2.00E-18 putative ribosomal protein (AA 1-184)
503 2 155 2 466 g14596085 7.00E-67 (AY042830) Putative 40S ribosomal protein S15A
503 2 155 2 466 g9757906 7.00E-67 40S ribosomal protein S15A
503 2 155 2 466 g8439890 7.00E-67 Strong similarity to 40S ribosomal protein S15A from Arabidopsis thaliana gb|L27461. EST gb|R30315 comes from this gene.
504 2 98 272 565 g2331301 4.00E-41 ribosomal protein S4 type 1
504 2 98 272 565 g2345154 3.00E-40 ribsomal protein S4
504 2 98 272 565 g457803 2.00E-38 ribosomal protein S4
505 1 132 1 396 g57702 3.00E-37 ribosomal protein L35 (AA 1-123)
505 1 132 1 396 g12849009 3.00E-37 putative
505 1 132 1 396 g12846227 3.00E-37 putative
506 3 163 12 500 g643074 5.00E-76 putative 40S ribosomal protein s12
506 3 163 12 500 g6716785 2.00E-75 40s ribosomal protein S23
506 3 163 12 500 g14532718 8.00E-75 (AY039983) unknown protein
507 1 119 139 495 g14532718 7.00E-33 (AY039983) unknown protein
507 1 119 139 495 g7413571 7.00E-33 putative protein
507 1 119 139 495 g6716785 7.00E-33 40s ribosomal protein S23
508 3 119 3 359 g14532718 7.00E-15 (AY039983) unknown protein
508 3 119 3 359 g7413571 7.00E-15 putative protein
508 3 119 3 359 g6716785 7.00E-15 40s ribosomal protein S23
509 1 107 334 654 g554269 2.00E-07 ribosomal protein L7
509 1 107 334 654 g36140 2.00E-07 ribosomal protein L7
509 1 107 334 654 g35903 2.00E-07 ribosomal protein L7
511 3 82 111 356 g5106775 7.00E-28 ribosomal protein S12
511 3 82 111 356 g6587799 6.00E-17 Strong similarity to gb|AF067732 ribosomal protein S12 from Hordeum vulgare. ESTs gb|T41772, gb|T42570, gb|AI999345, gb|T20784, gb|F20068 come from this gene.
511 3 82 111 356 g14190453 6.00E-17 At1g15930/T24D18_3
512 2 166 2 499 g9759463 2.00E-60 40S ribosomal protein S19
512 2 166 2 499 g6513924 4.00E-60 putative 40S ribosomal protein S19
512 2 166 2 499 g13878029 4.00E-60 putative 40S ribosomal protein S19
513 1 125 1 375 g14586963 2.00E-22 (AF362574) M75
513 1 125 1 375 g57115 2.00E-22 ribosomal protein L31 (AA 1-125)
513 1 125 1 375 g36130 2.00E-22 ribosomal protein L31 (AA 1-125)
514 3 91 216 488 g57720 2.00E-30 ribosomal protein S20 (AA 1-119)
514 3 91 216 488 g292443 2.00E-30 ribosomal protein S20
514 3 91 216 488 g214758 2.00E-30 ribosomal protein S22, 40S subunit
516 1 100 304 603 g12842823 3.00E-17 putative
516 1 100 304 603 g57121 7.00E-17 ribosomal protein L37
516 1 100 304 603 g461232 7.00E-17 ribosomal protein L37
518 2 188 2 565 g3869148 1.00E-10 robosomal protein L13
518 2 188 2 565 g29383 1.00E-10 BBC1
518 2 188 2 565 g14043668 1.00E-10 ribosomal protein L13
519 1 143 1 429 g63466 2.00E-56 histone H2A
519 1 143 1 429 g6094631 2.00E-56 histone H2A.F
519 1 143 1 429 g3420799 2.00E-56 histone H2A.F/Z variant
520 3 197 951 1541 g14549638 2.00E-13 (AF255740) histone H2A variant
520 3 197 951 1541 g64777 2.00E-13 histone H2A (aa 1-1-30)
520 3 197 951 1541 g64325 2.00E-13 histone H2A
522 1 153 202 660 g183233 1.00E-33 beta-glucuronidase precursor (EC 3.2.1.31)
522 1 153 202 660 g14346709 1.00E-33 unnamed protein product
522 1 153 202 660 g3549609 2.00E-33 beta-glucuronidase
523 3 86 33 290 g8101071 3.00E-20 golgin-like protein
523 3 86 33 290 g8099669 3.00E-20 golgin-like protein
523 3 86 33 290 g7644350 2.00E-17 golgi matrix protein GM130
526 1 153 205 663 g183233 1.00E-33 beta-glucuronidase precursor (EC 3.2.1.31)
526 1 153 205 663 g14346709 1.00E-33 unnamed protein product
526 1 153 205 663 g3549609 2.00E-33 beta-glucuronidase
527 1 119 1 357 g4050095 2.00E-56 NADH oxidoreductase
527 1 119 1 357 g12845638 2.00E-56 putative
527 1 119 1 357 g12834155 2.00E-56 putative
528 2 62 35 220 g4454682 5.00E-07 NADH-ubiquinone oxidoreductase subunit B9 homolog
528 2 62 35 220 g4164444 5.00E-07 NADH:ubiquinone oxidoreductase B9 subunit
528 2 62 35 220 g248 1.00E-05 NADH dehydrogenase
529 1 381 2812 3954 g8101071 5.00E-18 golgin-like protein
529 1 381 2812 3954 g8099669 5.00E-18 golgin-like protein
529 1 381 2812 3954 g7644350 2.00E-17 golgi matrix protein GM130
530 3 173 3 521 g4510363 1.00E-67 putative ubiquitin-conjugating enzyme
530 3 173 3 521 g14596117 8.00E-67 (AY042846) Unknown protein
530 3 173 3 521 g4886271 8.00E-67 putative DNA-binding protein
531 3 124 3 374 g7248411 3.00E-38 ESTs C99632(E20954),C99633(E20954) correspond to a region of the predicted gene. -Similar to Arabidopsis thaliana putative pathogenesis-related protein (U20347)
531 3 124 3 374 g7248405 1.00E-30 ESTs AU082419(E61744),AU031498(E61744) correspond to a region of the predicted gene. -Similar to Arabidopsis thaliana putative pathogenesis-related protein (U20347)
531 3 124 3 374 g6715639 4.00E-25 T25K16.16
532 1 135 88 492 g3789950 7.00E-56 translation initiation factor
532 1 135 88 492 g20238 7.00E-56 GOS2
532 1 135 88 492 g14194275 7.00E-56 translational initiation factor eIF1
535 1 172 82 597 g5912457 5.00E-87 dJ1068E13.2 (novel protein similar to bovine SCP2 (Sterol Carrier Protein 2) and part of HSD17B4 (hydroxysteroid (17-beta) dehydrogenase 4))
535 1 172 82 597 g12838636 1.00E-59 putative
535 1 172 82 597 g2315981 1.00E-35 17-beta-hydroxysteroid dehydrogenase type IV
538 2 168 2 505 g453189 1.00E-58 acyl carrier protein
538 2 168 2 505 g166971 4.00E-49 acyl carrier protein III
538 2 168 2 505 g166969 7.00E-41 acyl carrier protein II
541 3 146 3 440 g546420 1.00E-60 C-FABP=cutaneous fatty acid-binding protein (rats, Sprague- Dawley, skin, Peptide, 135 aa)
541 3 146 3 440 g1836058 4.00E-60 DA11=15.2 kDa fatty acid binding protein/FABP/C-FAPB homolog (rats, Sprague-Dawley, sciatic nerve traumatized, dorsal root ganglia, Peptide, 135 aa)
541 3 146 3 440 g533124 5.00E-60 lipid-binding protein
542 2 142 290 715 g7671659 1.00E-26 dJ1069P2.3.4 (novel PABPC1 (poly(A)-binding protein, cytoplasmic 1) (PABPL1) like protein (putative isoform 4))
542 2 142 290 715 g7671658 1.00E-26 dJ1069P2.3.3 (novel PABPC1 (poly(A)-binding protein, cytoplasmic 1) (PABPL1) like protein (putative isoform 3))
542 2 142 290 715 g7671657 1.00E-26 dJ1069P2.3.2 (novel PABPC1 (poly(A)-binding protein, cytoplasmic 1) (PABPL1) like protein (putative isoform 2))
543 3 175 3 527 g8176525 2.00E-23 interferon-inducible myeloid differentiation transcriptional
543 3 175 3 527 g6644297 2.00E-23 IFI16b
543 3 175 3 527 g184569 2.00E-23 interferon-gamma induced protein
544 3 96 639 926 g3879684 3.00E-05 (Z74042) predicted using Genefinder; Similarity to Haemophilus 3-oxoacyl-(acyl-carrier protein) reductase (SW:FABG_HAEIN), contains similarity to Pfam domain: PF00106 (short chain dehydrogenase), Score=170.5, E-value=9.2e-48, N=1; cDNA EST yk470b2.3 comes from this gene; cDNA EST yk470b2.5 comes from
545 3 139 219 635 g325465 9.00E-45 (Human endogenous retrovirus type C oncovirus sequence.), gene product
545 3 139 219 635 g2393895 2.00E-21 protease/polymerase
545 3 139 219 635 g334989 8.00E-20 gag protein
546 3 219 3 659 g12855728 4.00E-72 putative
546 3 219 3 659 g12839570 4.00E-72 putative
546 3 219 3 659 g14530111 7.00E-50 (AJ312322) OVARIAN/Breast septin beta
547 3 161 42 524 g6708478 1.00E-22 formin-like protein
547 3 161 42 524 g4101720 1.00E-21 lymphocyte specific formin related protein
547 3 161 42 524 g7294416 2.00E-09 CG6807 gene product
548 1 177 1 531 g5478757 1.00E-91 fertility protein SP22
548 1 177 1 531 g5478755 1.00E-91 fertility protein SP22
548 1 177 1 531 g3250916 1.00E-91 CAP1
549 2 241 2 724 g11527997 1.00E-150 NOTCH2 protein
549 2 241 2 724 g11275978 1.00E-150 NOTCH 2
549 2 241 2 724 g287990 1.00E-149 Motch B
551 3 236 3 710 g12855728 1.00E-25 putative
551 3 236 3 710 g12839570 1.00E-25 putative
551 3 236 3 710 g14530111 7.00E-14 (AJ312322) OVARIAN/Breast septin beta
552 1 137 217 627 g4406393 8.00E-47 differentiation enhancing factor 1
552 1 137 217 627 g4063616 8.00E-47 ADP-ribosylation factor-directed GTPase activating protein
552 1 137 217 627 g4063614 8.00E-47 ADP-ribosylation factor-directed GTPase activating protein
553 3 150 48 497 g14042295 2.00E-30 unnamed protein product
553 3 150 48 497 g11640564 2.00E-30 MSTP028
553 3 150 48 497 g13905272 3.00E-30 Similar to tumor necrosis factor, alpha-induced protein 1
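The coordinate columns of Table 7 appear to be arithmetically related. In the rows listed above, Stop equals Start + 3 × Length − 1, and Frame equals ((Start − 1) mod 3) + 1. This is an observation from the tabulated values, not a rule stated in the source. The following Python sketch parses one row and checks that relationship. The names `Table7Row`, `parse_row`, and `is_consistent` are hypothetical and chosen only for illustration, and the parser assumes a cleanly extracted, whitespace-delimited row with all eight columns present.

```python
from typing import NamedTuple


class Table7Row(NamedTuple):
    """One hit from Table 7 (field names are illustrative assumptions)."""
    seq_id: int         # SEQ ID NO of the polypeptide
    frame: int          # reading frame (1, 2, or 3)
    length: int         # length in amino acids
    start: int          # first nucleotide of the translated region
    stop: int           # last nucleotide of the translated region
    gi: str             # GenBank identifier, e.g. "g3769362"
    probability: float  # probability score of the match
    annotation: str     # free-text annotation of the matched sequence


def parse_row(line: str) -> Table7Row:
    """Split a row into seven fixed columns plus the free-text annotation."""
    parts = line.split(maxsplit=7)
    seq_id, frame, length, start, stop = (int(p) for p in parts[:5])
    return Table7Row(seq_id, frame, length, start, stop,
                     parts[5], float(parts[6]), parts[7])


def is_consistent(row: Table7Row) -> bool:
    """Check the assumed relationship between frame, length, start and stop."""
    spans_codons = row.stop == row.start + 3 * row.length - 1
    frame_matches = row.frame == (row.start - 1) % 3 + 1
    return spans_codons and frame_matches


row = parse_row("456 2 596 56 1843 g3769362 1.00E-70 "
                "ectoderm-neural cortex-1 protein")
print(row.seq_id, row.gi, is_consistent(row))
```

A check of this kind can flag rows in which extraction has split or fused digits in the Length, Start, or Stop columns.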
Table 8
Program | Description | Reference | Parameter Threshold
ABI FACTURA | A program that removes vector sequences and masks ambiguous bases in nucleic acid sequences. | Applied Biosystems, Foster City, CA. |
ABI/PARACEL FDF | A Fast Data Finder useful in comparing and annotating amino acid or nucleic acid sequences. | Applied Biosystems, Foster City, CA; Paracel Inc., Pasadena, CA. | Mismatch <50%
ABI AutoAssembler | A program that assembles nucleic acid sequences. | Applied Biosystems, Foster City, CA. |
BLAST | A Basic Local Alignment Search Tool useful in sequence similarity search for amino acid and nucleic acid sequences. BLAST includes five functions: blastp, blastn, blastx, tblastn, and tblastx. | Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410; Altschul, S.F. et al. (1997) Nucleic Acids Res. 25:3389-3402. | ESTs: Probability value=1.0E-8 or less. Full Length sequences: Probability value=1.0E-10 or less
FASTA | A Pearson and Lipman algorithm that searches for similarity between a query sequence and a group of sequences of the same type. FASTA comprises at least five functions: fasta, tfasta, fastx, tfastx, and ssearch. | Pearson, W.R. and D.J. Lipman (1988) Proc. Natl. Acad Sci. USA 85:2444-2448; Pearson, W.R. (1990) Methods Enzymol. 183:63-98; and Smith, T.F. and M.S. Waterman (1981) Adv. Appl. Math. 2:482-489. | ESTs: fasta E value=1.06E-6. Assembled ESTs: fasta Identity=95% or greater and Match length=200 bases or greater; fastx E value=1.0E-8 or less. Full Length sequences: fastx score=100 or greater
BLIMPS | A BLocks IMProved Searcher that matches a sequence against those in BLOCKS, PRINTS, DOMO, PRODOM, and PFAM databases to search for gene families, sequence homology, and structural fingerprint regions. | Henikoff, S. and J.G. Henikoff (1991) Nucleic Acids Res. 19:6565-6572; Henikoff, J.G. and S. Henikoff (1996) Methods Enzymol. 266:88-105; and Attwood, T.K. et al. (1997) J. Chem. Inf. Comput. Sci. 37:417-424. | Probability value=1.0E-3 or less
HMMER | An algorithm for searching a query sequence against hidden Markov model (HMM)-based databases of protein family consensus sequences, such as PFAM. | Krogh, A. et al. (1994) J. Mol. Biol. 235:1501-1531; Sonnhammer, E.L.L. et al. (1988) Nucleic Acids Res. 26:320-322; Durbin, R. et al. (1998) Our World View, in a Nutshell, Cambridge Univ. Press, pp. 1-350. | PFAM hits: Probability value=1.0E-3 or less. Signal peptide hits: Score=0 or greater
ProfileScan | An algorithm that searches for structural and sequence motifs in protein sequences that match sequence patterns defined in Prosite. | Gribskov, M. et al. (1988) CABIOS 4:61-66; Gribskov, M. et al. (1989) Methods Enzymol. 183:146-159; Bairoch, A. et al. (1997) Nucleic Acids Res. 25:217-221. | Normalized quality score≥GCG-specified "HIGH" value for that particular Prosite motif. Generally, score=1.4-2.1
Phred | A base-calling algorithm that examines automated sequencer traces with high sensitivity and probability. | Ewing, B. et al. (1998) Genome Res. 8:175-185; Ewing, B. and P. Green (1998) Genome Res. 8:186-194. |
Phrap | A Phils Revised Assembly Program including SWAT and CrossMatch, programs based on efficient implementation of the Smith-Waterman algorithm, useful in searching sequence homology and assembling DNA sequences. | Smith, T.F. and M.S. Waterman (1981) Adv. Appl. Math. 2:482-489; Smith, T.F. and M.S. Waterman (1981) J. Mol. Biol. 147:195-197; and Green, P., University of Washington, Seattle, WA. | Score=120 or greater; Match length=56 or greater
Consed | A graphical tool for viewing and editing Phrap assemblies. | Gordon, D. et al. (1998) Genome Res. 8:195-202. |
SPScan | A weight matrix analysis program that scans protein sequences for the presence of secretory signal peptides. | Nielson, H. et al. (1997) Protein Engineering 10:1-6; Claverie, J.M. and S. Audic (1997) CABIOS 12:431-439. | Score=3.5 or greater
TMAP | A program that uses weight matrices to delineate transmembrane segments on protein sequences and determine orientation. | Persson, B. and P. Argos (1994) J. Mol. Biol. 237:182-192; Persson, B. and P. Argos (1996) Protein Sci. 5:363-371. |
TMHMMER | A program that uses a hidden Markov model (HMM) to delineate transmembrane segments on protein sequences and determine orientation. | Sonnhammer, E.L. et al. (1998) Proc. Sixth Intl. Conf. on Intelligent Systems for Mol. Biol., Glasgow et al., eds., The Am. Assoc. for Artificial Intelligence Press, Menlo Park, CA, pp. 175-182. |
Motifs | A program that searches amino acid sequences for patterns that matched those defined in Prosite. | Bairoch, A. et al. (1997) Nucleic Acids Res. 25:217-221; Wisconsin Package Program Manual, version 9, page M51-59, Genetics Computer Group, Madison, WI. |
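The BLAST entry of Table 8 gives probability-value cutoffs of 1.0E-8 or less for ESTs and 1.0E-10 or less for full length sequences. As a minimal sketch of how such cutoffs could be applied to a list of hits, the Python below filters (GI number, probability) pairs against the cutoff for a given sequence type. The names `BLAST_THRESHOLDS`, `passes_blast_threshold`, and `filter_hits`, and the pair representation of a hit, are assumptions made for illustration and do not come from the source.

```python
# Probability-value cutoffs taken from the BLAST entry of Table 8.
# The dictionary keys are hypothetical labels for the two sequence types.
BLAST_THRESHOLDS = {
    "est": 1.0e-8,
    "full_length": 1.0e-10,
}


def passes_blast_threshold(probability: float, sequence_type: str) -> bool:
    """Return True when the probability is at or below the cutoff."""
    return probability <= BLAST_THRESHOLDS[sequence_type]


def filter_hits(hits, sequence_type):
    """Keep only (gi, probability) pairs that satisfy the cutoff."""
    return [hit for hit in hits
            if passes_blast_threshold(hit[1], sequence_type)]


# Example hits written as (GI number, probability) pairs.
hits = [("g8896164", 0.0), ("g30151", 2.0e-26), ("g2645890", 3.0e-6)]
for gi, probability in filter_hits(hits, "est"):
    print(gi, probability)
```

Keeping the cutoffs in a single mapping keeps the two sequence types side by side, so the same filter can be reused for either one.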

Claims

CLAIMS What is claimed is:
1. An isolated polynucleotide selected from the group consisting of: a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275, b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-275, c) a polynucleotide complementary to the polynucleotide of a), d) a polynucleotide complementary to the polynucleotide of b), and e) an RNA equivalent of a)-d).
2. An isolated polynucleotide of claim 1, selected from the group consisting of SEQ ID NO: 1-275.
3. An isolated polynucleotide comprising at least 30 contiguous nucleotides of a polynucleotide of claim 1.
4. An isolated polynucleotide comprising at least 60 contiguous nucleotides of a polynucleotide of claim 1.
5. A composition for the detection of expression of diagnostic and therapeutic polynucleotides comprising at least one of the polynucleotides of claim 1 and a detectable label.
6. A method for detecting a target polynucleotide in a sample, said target polynucleotide having a sequence of a polynucleotide of claim 1, the method comprising: a) amplifying said target polynucleotide or fragment thereof using polymerase chain reaction amplification, and b) detecting the presence or absence of said amplified target polynucleotide or fragment thereof, and, optionally, if present, the amount thereof.
7. A method for detecting a target polynucleotide in a sample, said target polynucleotide having a sequence of a polynucleotide of claim 1, the method comprising: a) hybridizing the sample with a probe comprising at least 20 contiguous nucleotides comprising a sequence complementary to said target polynucleotide in the sample, and which probe specifically hybridizes to said target polynucleotide, under conditions whereby a hybridization complex is formed between said probe and said target polynucleotide or fragments thereof, and b) detecting the presence or absence of said hybridization complex, and, optionally, if present, the amount thereof.
8. A method of claim 7, wherein the probe comprises at least 30 contiguous nucleotides.
9. A method of claim 7, wherein the probe comprises at least 60 contiguous nucleotides.
10. A recombinant polynucleotide comprising a promoter sequence operably linked to a polynucleotide of claim 1.
11. A cell transformed with a recombinant polynucleotide of claim 10.
12. A transgenic organism comprising a recombinant polynucleotide of claim 10.
13. A method for producing a diagnostic and therapeutic polypeptide encoded by a polynucleotide of claim 1, the method comprising: a) culturing a cell under conditions suitable for expression of the diagnostic and therapeutic polypeptide, wherein said cell is transformed with a recombinant polynucleotide comprising a promoter sequence operably linked to a polynucleotide of claim 1, and b) recovering the diagnostic and therapeutic polypeptide so expressed.
14. A method of claim 13, wherein the polypeptide has an amino acid sequence selected from the group consisting of SEQ ID NO:276-553.
15. An isolated diagnostic and therapeutic polypeptide (DITHP) encoded by at least one of the polynucleotides of claim 2.
16. A method of screening for a test compound that specifically binds to the polypeptide of claim 15, the method comprising: a) combining the polypeptide of claim 15 with at least one test compound under suitable conditions, and b) detecting binding of the polypeptide of claim 15 to the test compound, thereby identifying a compound that specifically binds to the polypeptide of claim 15.
17. A microarray wherein at least one element of the microarray is a polynucleotide of claim 3.
18. A method for generating a transcript image of a sample which contains polynucleotides, the method comprising: a) labeling the polynucleotides of the sample, b) contacting the elements of the microarray of claim 17 with the labeled polynucleotides of the sample under conditions suitable for the formation of a hybridization complex, and c) quantifying the expression of the polynucleotides in the sample.
19. A method for screening a compound for effectiveness in altering expression of a target polynucleotide, wherein said target polynucleotide comprises a polynucleotide sequence of a polynucleotide of claim 1, the method comprising: a) exposing a sample comprising the target polynucleotide to a compound, under conditions suitable for the expression of the target polynucleotide, b) detecting altered expression of the target polynucleotide, and c) comparing the expression of the target polynucleotide in the presence of varying amounts of the compound and in the absence of the compound.
20. A method for assessing toxicity of a test compound, said method comprising: a) treating a biological sample containing nucleic acids with the test compound, b) hybridizing the nucleic acids of the treated biological sample with a probe comprising at least 20 contiguous nucleotides of a polynucleotide of claim 1 under conditions whereby a specific hybridization complex is formed between said probe and a target polynucleotide in the biological sample, said target polynucleotide comprising a polynucleotide sequence of a polynucleotide of claim 1 or fragment thereof, c) quantifying the amount of hybridization complex, and d) comparing the amount of hybridization complex in the treated biological sample with the amount of hybridization complex in an untreated biological sample, wherein a difference in the amount of hybridization complex in the treated biological sample is indicative of toxicity of the test compound.
21. An array comprising different nucleotide molecules affixed in distinct physical locations on a solid substrate, wherein at least one of said nucleotide molecules comprises a first oligonucleotide or polynucleotide sequence specifically hybridizable with at least 30 contiguous nucleotides of a target polynucleotide, and wherein said target polynucleotide is a polynucleotide of claim 1.
22. An array of claim 21, wherein said first oligonucleotide or polynucleotide sequence is completely complementary to at least 30 contiguous nucleotides of said target polynucleotide.
23. An array of claim 21, wherein said first oligonucleotide or polynucleotide sequence is completely complementary to at least 60 contiguous nucleotides of said target polynucleotide.
24. An array of claim 21, wherein said first oligonucleotide or polynucleotide sequence is completely complementary to said target polynucleotide.
25. An array of claim 21, which is a microarray.
26. An array of claim 21, further comprising said target polynucleotide hybridized to a nucleotide molecule comprising said first oligonucleotide or polynucleotide sequence.
27. An array of claim 21, wherein a linker joins at least one of said nucleotide molecules to said solid substrate.
28. An array of claim 21, wherein each distinct physical location on the substrate contains multiple nucleotide molecules, and the multiple nucleotide molecules at any single distinct physical location have the same sequence, and each distinct physical location on the substrate contains nucleotide molecules having a sequence which differs from the sequence of nucleotide molecules at another distinct physical location on the substrate.
29. An isolated polypeptide selected from the group consisting of: a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553.
30. An isolated polypeptide of claim 29, having a sequence selected from the group consisting of SEQ ID NO:276-553.
31. An isolated polynucleotide encoding a polypeptide of claim 29.
32. An isolated polynucleotide encoding a polypeptide of claim 30.
33. An isolated polynucleotide of claim 32, having a sequence selected from the group consisting of SEQ ID NO: 1-275.
34. An isolated antibody which specifically binds to a diagnostic and therapeutic polypeptide of claim 29.
35. A diagnostic test for a condition or disease associated with the expression of DITHP in a biological sample, the method comprising: a) combining the biological sample with an antibody of claim 34, under conditions suitable for the antibody to bind the polypeptide and form an antibody:polypeptide complex, and b) detecting the complex, wherein the presence of the complex correlates with the presence of the polypeptide in the biological sample.
36. The antibody of claim 34, wherein the antibody is: a) a chimeric antibody, b) a single chain antibody, c) a Fab fragment, d) a F(ab')2 fragment, or e) a humanized antibody.
37. A composition comprising an antibody of claim 34 and an acceptable excipient.
38. A method of diagnosing a condition or disease associated with the expression of DITHP in a subject, comprising administering to said subject an effective amount of the composition of claim 37.
39. A composition of claim 37, wherein the antibody is labeled.
40. A method of diagnosing a condition or disease associated with the expression of DITHP in a subject, comprising administering to said subject an effective amount of the composition of claim 39.
41. A method of preparing a polyclonal antibody with the specificity of the antibody of claim 34, the method comprising: a) immunizing an animal with a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, or an immunogenic fragment thereof, under conditions to elicit an antibody response, b) isolating antibodies from said animal, and c) screening the isolated antibodies with the polypeptide, thereby identifying a polyclonal antibody which binds specifically to a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553.
42. An antibody produced by a method of claim 41.
43. A composition comprising the antibody of claim 42 and a suitable carrier.
44. A method of making a monoclonal antibody with the specificity of the antibody of claim 34, the method comprising: a) immunizing an animal with a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553, or an immunogenic fragment thereof, under conditions to elicit an antibody response, b) isolating antibody producing cells from the animal, c) fusing the antibody producing cells with immortalized cells to form monoclonal antibody-producing hybridoma cells, d) culturing the hybridoma cells, and e) isolating from the culture monoclonal antibody which binds specifically to a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553.
45. A monoclonal antibody produced by a method of claim 44.
46. A composition comprising the antibody of claim 45 and a suitable carrier.
47. The antibody of claim 34, wherein the antibody is produced by screening a Fab expression library.
48. The antibody of claim 34, wherein the antibody is produced by screening a recombinant immunoglobulin library.
49. A method of detecting a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553 in a sample, the method comprising: a) incubating the antibody of claim 34 with a sample under conditions to allow specific binding of the antibody and the polypeptide, and b) detecting specific binding, wherein specific binding indicates the presence of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553 in the sample.
50. A method of purifying a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553 from a sample, the method comprising: a) incubating the antibody of claim 34 with a sample under conditions to allow specific binding of the antibody and the polypeptide, and b) separating the antibody from the sample and obtaining the purified polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:276-553.
51. A composition comprising a polypeptide of claim 29 and a pharmaceutically acceptable excipient.
52. A composition of claim 51, wherein the polypeptide has an amino acid sequence of SEQ ID NO:276-553.
53. A method for treating a disease or condition associated with decreased expression of functional DITHP, comprising administering to a patient in need of such treatment the composition of claim 51.
54. A method for screening a compound for effectiveness as an agonist of a polypeptide of claim 29, the method comprising: a) exposing a sample comprising a polypeptide of claim 29 to a compound, and b) detecting agonist activity in the sample.
55. A composition comprising an agonist compound identified by a method of claim 54 and a pharmaceutically acceptable excipient.
56. A method for treating a disease or condition associated with decreased expression of functional DITHP, comprising administering to a patient in need of such treatment a composition of claim 55.
57. A method for screening a compound for effectiveness as an antagonist of a polypeptide of claim 29, the method comprising: a) exposing a sample comprising a polypeptide of claim 29 to a compound, and b) detecting antagonist activity in the sample.
58. A composition comprising an antagonist compound identified by a method of claim 57 and a pharmaceutically acceptable excipient.
59. A method for treating a disease or condition associated with overexpression of functional DITHP, comprising administering to a patient in need of such treatment a composition of claim 58.
60. A method of screening for a compound that modulates the activity of the polypeptide of claim 29, said method comprising: a) combining the polypeptide of claim 29 with at least one test compound under conditions permissive for the activity of the polypeptide of claim 29, b) assessing the activity of the polypeptide of claim 29 in the presence of the test compound, and c) comparing the activity of the polypeptide of claim 29 in the presence of the test compound with the activity of the polypeptide of claim 29 in the absence of the test compound, wherein a change in the activity of the polypeptide of claim 29 in the presence of the test compound is indicative of a compound that modulates the activity of the polypeptide of claim 29.
EP01966454A 2000-09-05 2001-08-29 Molecules for diagnostics and therapeutics Withdrawn EP1364015A2 (en)

Applications Claiming Priority (45)

Application Number Priority Date Filing Date Title
US22974700P 2000-09-05 2000-09-05
US22975100P 2000-09-05 2000-09-05
US22975000P 2000-09-05 2000-09-05
US22974900P 2000-09-05 2000-09-05
US23058300P 2000-09-05 2000-09-05
US22974800P 2000-09-05 2000-09-05
US229750P 2000-09-05
US229751P 2000-09-05
US229747P 2000-09-05
US230583P 2000-09-05
US229748P 2000-09-05
US229749P 2000-09-05
US23061000P 2000-09-06 2000-09-06
US23098800P 2000-09-06 2000-09-06
US23059900P 2000-09-06 2000-09-06
US23051800P 2000-09-06 2000-09-06
US23086500P 2000-09-06 2000-09-06
US23051900P 2000-09-06 2000-09-06
US23059500P 2000-09-06 2000-09-06
US23051500P 2000-09-06 2000-09-06
US23051400P 2000-09-06 2000-09-06
US23059800P 2000-09-06 2000-09-06
US23051700P 2000-09-06 2000-09-06
US23050500P 2000-09-06 2000-09-06
US23059700P 2000-09-06 2000-09-06
US230515P 2000-09-06
US230598P 2000-09-06
US230610P 2000-09-06
US230597P 2000-09-06
US230517P 2000-09-06
US230988P 2000-09-06
US230518P 2000-09-06
US230505P 2000-09-06
US230865P 2000-09-06
US230599P 2000-09-06
US230514P 2000-09-06
US230519P 2000-09-06
US230595P 2000-09-06
US23116700P 2000-09-07 2000-09-07
US23095100P 2000-09-07 2000-09-07
US23116300P 2000-09-07 2000-09-07
US231163P 2000-09-07
US231167P 2000-09-07
US230951P 2000-09-07
PCT/US2001/027127 WO2002020754A2 (en) 2000-09-05 2001-08-29 Molecules for diagnostics and therapeutics

Publications (1)

Publication Number Publication Date
EP1364015A2 (en) 2003-11-26

Family

ID=27586647

Family Applications (1)

Application Number Title Priority Date Filing Date
EP01966454A Withdrawn EP1364015A2 (en) 2000-09-05 2001-08-29 Molecules for diagnostics and therapeutics

Country Status (3)

Country Link
EP (1) EP1364015A2 (en)
CA (1) CA2421265A1 (en)
WO (1) WO2002020754A2 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020102637A1 (en) * 2000-12-21 2002-08-01 Chunhua Yan Isolated human transporter proteins, nucleic acid molecules encoding human transporter proteins, and uses thereof
WO2002074906A2 (en) * 2001-03-16 2002-09-26 Eli Lilly And Company Lp mammalian proteins; related reagents
US8163892B2 (en) 2002-07-08 2012-04-24 Oncolys Biopharma, Inc. Oncolytic virus replicating selectively in tumor cells
WO2005019258A2 (en) * 2003-08-11 2005-03-03 Genentech, Inc. Compositions and methods for the treatment of immune related diseases

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001521383A (en) * 1997-04-08 2001-11-06 ヒューマン ジノーム サイエンシーズ,インコーポレイテッド 20 human secretory proteins
CA2287427A1 (en) * 1997-04-22 1998-10-29 Damien J. Dunnington Homogeneous fluorescence assay for measuring the effect of compounds on gene expression
JP2001523453A (en) * 1997-11-13 2001-11-27 ジェンセット Extended cDNA of secreted protein
CA2395295A1 (en) * 2000-01-31 2001-08-02 Human Genome Sciences, Inc. Nucleic acids, proteins and antibodies
WO2001090325A2 (en) * 2000-05-19 2001-11-29 Millennium Pharmaceuticals, Inc. 50365, a hexokinase family member and uses thereof
JP2004527209A (en) * 2000-07-21 2004-09-09 ソーントン、マイケル Human kinase

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0220754A3 *

Also Published As

Publication number Publication date
WO2002020754A3 (en) 2003-09-25
WO2002020754A2 (en) 2002-03-14
CA2421265A1 (en) 2002-03-14

Similar Documents

Publication Publication Date Title
WO2004023973A2 (en) Molecules for diagnostics and therapeutics
CA2442705A1 (en) Molecules for diagnostics and therapeutics
US20040115629A1 (en) Molecules for diagnostics and therapeutics
US20040014087A1 (en) Molecules for diagnostics and therapeutics
EP1268758A2 (en) Molecules for diagnostics and therapeutics
US20040048253A1 (en) Molecules for diagnostics and therapeutics
JP2003529325A (en) Human transport protein
JP2004500114A (en) Transcription factor
WO2003062376A2 (en) Molecules for diagnostics and therapeutics
JP2003532419A (en) Cytoskeletal binding protein
WO2001062927A2 (en) Polypeptides and corresponding polynucleotides for diagnostics and therapeutics
WO2002020754A2 (en) Molecules for diagnostics and therapeutics
EP1265998A2 (en) Polypeptides and corresponding polynucleotides for diagnostics and therapeutics
JP2003517290A (en) Human transcription regulatory protein
WO2001021836A2 (en) Molecules for diagnostics and therapeutics
WO2003062385A2 (en) Secretory molecules
CA2434677A1 (en) Molecules for diagnostics and therapeutics
JP2005508636A (en) Nucleic acid binding protein
JP2004511208A (en) RNA metabolism protein
US20040171012A1 (en) Nucleic acid-associated proteins
JP2004509610A (en) Nuclear hormone receptor
US20040101884A1 (en) Molecules for disease detection and treatment
EP1224275A2 (en) Molecules for diagnostics and therapeutics
JP2005511028A (en) Disease detection and therapeutic molecules
WO2002078420A2 (en) Molecules for disease detection and treatment

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20030327

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the european patent

Extension state: AL LT LV MK RO SI

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20050125

RIN1 Information on inventor provided before grant (corrected)

Inventor name: INMAN, REBEKAH, R

Inventor name: AU, ALAN P.

Inventor name: CHANG, SIMON, C

Inventor name: CHEN, ALICE,J.

Inventor name: MARWAHA, RAKESH

Inventor name: DAFFO, ABEL

Inventor name: FLORES, VINCENT

Inventor name: PANZER, SCOTT,R

Inventor name: DAVID, MARIE,H

Inventor name: PERALTA, CAREYNA, H

Inventor name: GERSTIN, EDWARD, H,JR.

Inventor name: LINCOLN, ANN M. ROSEBERRY

Inventor name: HARRIS, BERNARD

Inventor name: ROHATGI, SAMEER D.

Inventor name: BRADLEY, DIANA L.

Inventor name: MOMIYAMA, MONIKA G.

Inventor name: DAHL, CHRISTOHER, R

Inventor name: YAP, PIERRE E.

Inventor name: LIU, TOOMY F.

Inventor name: GIETZEN, DARRYL

Inventor name: WRIGHT, RACHEL J.

Inventor name: YU, JIMMY, Y

Inventor name: JONES, ANISSA,LEE

Inventor name: HILLMAN, JENNIFER,L

Inventor name: CHALUP, MICHAEL, S

Inventor name: DUFOUR, GERARD, E

Inventor name: ALTUS, CHRISTINA, M

Inventor name: LINCOLN, STEPHEN,E

Inventor name: STUART, JACKSON