MOLECULES FOR DIAGNOSTICS AND THERAPEUTICS
TECHNICAL FIELD
The present invention relates to human molecules and to the use of these sequences in the diagnosis, study, prevention, and treatment of diseases associated with, as well as effects of exogenous compounds on, the expression of human molecules.
BACKGROUND OF THE INVENTION
The human genome comprises thousands of genes, many encoding gene products that function in the maintenance and growth of the various cells and tissues in the body. Aberrant expression of, or mutations in, these genes and their products are the cause of, or are associated with, a variety of human diseases such as cancer and other cell proliferative disorders, autoimmune/inflammatory disorders, infections, developmental disorders, endocrine disorders, metabolic disorders, neurological disorders, gastrointestinal disorders, transport disorders, and connective tissue disorders. The identification of these genes and their products is the basis of an ever-expanding effort to find markers for early detection of diseases, and targets for their prevention and treatment. Therefore, these genes and their products are useful as diagnostics and therapeutics. These genes may encode, for example, enzyme molecules, molecules associated with growth and development, biochemical pathway molecules, extracellular information transmission molecules, receptor molecules, intracellular signaling molecules, membrane transport molecules, protein modification and maintenance molecules, nucleic acid synthesis and modification molecules, adhesion molecules, antigen recognition molecules, secreted and extracellular matrix molecules, cytoskeletal molecules, ribosomal molecules, electron transfer associated molecules, transcription factor molecules, chromatin molecules, cell membrane molecules, and organelle associated molecules.
For example, cancer represents a type of cell proliferative disorder that affects nearly every tissue in the body. A wide variety of molecules, either aberrantly expressed or mutated, can be the cause of, or involved with, various cancers because tissue growth involves complex and ordered patterns of cell proliferation, cell differentiation, and apoptosis. Cell proliferation must be regulated to maintain both the number of cells and their spatial organization. This regulation depends upon the appropriate expression of proteins which control cell cycle progression in response to extracellular signals such as growth factors and other mitogens, and intracellular cues such as DNA damage or nutrient starvation. Molecules which directly or indirectly modulate cell cycle progression fall into several categories, including growth factors and their receptors, second messenger and signal transduction proteins, oncogene products, tumor-suppressor proteins, and mitosis-promoting factors.
Aberrant expression or mutations in any of these gene products can result in cell proliferative disorders such as cancer. Oncogenes are genes generally derived from normal genes that, through abnormal expression or mutation, can effect the transformation of a normal cell to a malignant one (oncogenesis). Oncoproteins, encoded by oncogenes, can affect cell proliferation in a variety of ways and include growth factors, growth factor receptors, intracellular signal transducers, nuclear transcription factors, and cell-cycle control proteins. In contrast, tumor-suppressor genes are involved in inhibiting cell proliferation. Mutations which cause reduced function or loss of function in tumor-suppressor genes result in aberrant cell proliferation and cancer. Although many different genes and their products have been found to be associated with cell proliferative disorders such as cancer, many more may exist that are yet to be discovered.
DNA-based arrays can provide a simple way to explore the expression of a single polymorphic gene or a large number of genes. When the expression of a single gene is explored, DNA-based arrays are employed to detect the expression of specific gene variants. For example, a p53 tumor suppressor gene array is used to determine whether individuals are carrying mutations that predispose them to cancer. A cytochrome P450 gene array is useful to determine whether individuals have one of a number of specific mutations that could result in increased drug metabolism, drug resistance, or drug toxicity.
DNA-based array technology is especially relevant for the rapid screening of expression of a large number of genes. There is a growing awareness that gene expression is affected in a global fashion. A genetic predisposition, disease or therapeutic treatment may affect, directly or indirectly, the expression of a large number of genes. In some cases the interactions may be expected, such as when the genes are part of the same signaling pathway. In other cases, such as when the genes participate in separate signaling pathways, the interactions may be totally unexpected. Therefore, DNA-based arrays can be used to investigate how genetic predisposition, disease, or therapeutic treatment affects the expression of a large number of genes.
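In its simplest form, the multi-gene comparison described above reduces to comparing each gene's signal under two conditions. The following is a minimal sketch, assuming normalized per-gene intensities are already available. The gene names and values are hypothetical. The log2 ratio with a pseudocount and the two-fold cutoff are common conventions chosen here for illustration; this document does not describe them as its method.

```python
import math
from typing import Dict, List


def log2_fold_changes(control: Dict[str, float],
                      treated: Dict[str, float],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """log2 ratio of treated to control signal for every gene measured in both.

    The pseudocount keeps the ratio finite when a signal is zero.
    """
    shared = sorted(set(control) & set(treated))
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in shared
    }


def flag_changed(fold_changes: Dict[str, float], threshold: float = 1.0) -> List[str]:
    """Genes whose absolute log2 fold change meets the threshold (1.0 means two-fold)."""
    return sorted(g for g, fc in fold_changes.items() if abs(fc) >= threshold)


if __name__ == "__main__":
    # Hypothetical normalized intensities for three genes on one array.
    control = {"GENE_A": 100.0, "GENE_B": 50.0, "GENE_C": 10.0}
    treated = {"GENE_A": 103.0, "GENE_B": 199.0, "GENE_C": 1.0}
    fc = log2_fold_changes(control, treated)
    print(flag_changed(fc))  # ['GENE_B', 'GENE_C']
```

In this sketch, GENE_B is flagged as increased and GENE_C as decreased. A real screen would also need to account for variability between replicate arrays, which is omitted here.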
Enzyme Molecules
The cellular processes of biogenesis and biodegradation involve a number of key enzyme classes, including oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases. Each of these enzyme classes comprises numerous substrate-specific enzymes having precise and well-regulated functions. These enzymes function by facilitating metabolic processes such as glycolysis, the tricarboxylic acid cycle, and fatty acid metabolism; by synthesizing or degrading amino acids, steroids, phospholipids, alcohols, etc.; by regulating cell signaling, proliferation, inflammation, apoptosis, etc.; and by catalyzing critical steps in DNA replication and repair and in the process of translation.
Oxidoreductases
Many pathways of biogenesis and biodegradation require oxidoreductase (dehydrogenase or reductase) activity, coupled to the reduction or oxidation of a donor or acceptor cofactor. Potential cofactors include cytochromes, oxygen, disulfide, iron-sulfur proteins, flavin adenine dinucleotide (FAD), and the nicotinamide adenine dinucleotides NAD and NADP (Newsholme, E.A. and A.R. Leech (1983) Biochemistry for the Medical Sciences, John Wiley and Sons, Chichester, U.K., pp. 779-793). Reductase activity catalyzes the transfer of electrons between substrate(s) and cofactor(s) with concurrent oxidation of the cofactor. The reverse dehydrogenase reaction catalyzes the reduction of a cofactor and consequent oxidation of the substrate. Oxidoreductase enzymes are a broad superfamily of proteins that catalyze numerous reactions in all cells of organisms ranging from bacteria to plants to humans. These reactions include metabolism of sugar, certain detoxification reactions in the liver, and the synthesis or degradation of fatty acids, amino acids, glucocorticoids, estrogens, androgens, and prostaglandins. Different family members are named according to the direction in which their reactions are typically catalyzed; thus they may be referred to as oxidoreductases, oxidases, reductases, or dehydrogenases. In addition, family members often have distinct cellular localizations, including the cytosol, the plasma membrane, mitochondrial inner or outer membrane, and peroxisomes.
Short-chain alcohol dehydrogenases (SCADs) are a family of dehydrogenases that share only 15% to 30% sequence identity, with similarity predominantly in the coenzyme-binding and substrate-binding domains. In addition to their well-known role in the detoxification of ethanol, SCADs are also involved in the synthesis and degradation of fatty acids, steroids, and some prostaglandins, and are therefore implicated in a variety of disorders such as lipid storage disease, myopathy, SCAD deficiency, and certain genetic disorders. For example, retinol dehydrogenase is a SCAD-family member (Simon, A. et al. (1995) J. Biol. Chem. 270:1107-1112) that converts retinol to retinal, the precursor of retinoic acid. Retinoic acid, a regulator of differentiation and apoptosis, has been shown to down-regulate genes involved in cell proliferation and inflammation (Chai, X. et al. (1995) J. Biol. Chem. 270:3900-3904). In addition, retinol dehydrogenase has been linked to hereditary eye diseases such as autosomal recessive childhood-onset severe retinal dystrophy (Simon, A. et al. (1996) Genomics 36:424-430).
Propagation of nerve impulses, modulation of cell proliferation and differentiation, induction of the immune response, and tissue homeostasis involve neurotransmitter metabolism (Weiss, B. (1991) Neurotoxicology 12:379-386; Collins, S.M. et al. (1992) Ann. N.Y. Acad. Sci. 664:415-424; Brown, J.K. and H. Imam (1991) J. Inherit. Metab. Dis. 14:436-458). Many pathways of neurotransmitter metabolism require oxidoreductase activity, coupled to reduction or oxidation of a cofactor such as NAD+/NADH (Newsholme, E.A. and A.R. Leech (1983) Biochemistry for the Medical Sciences, John Wiley and Sons, Chichester, U.K., pp. 779-793). Degradation of catecholamines (epinephrine or norepinephrine) requires alcohol dehydrogenase (in the brain) or aldehyde dehydrogenase (in peripheral tissue). NAD+-dependent aldehyde dehydrogenase oxidizes 5-hydroxyindole-3-acetaldehyde to 5-hydroxyindole-3-acetate (the product of 5-hydroxytryptamine (serotonin) metabolism) in the brain, blood platelets, liver, and pulmonary endothelium (Newsholme, supra, p. 786). Other neurotransmitter degradation pathways that utilize NAD+/NADH-dependent oxidoreductase activity include those of L-DOPA (precursor of dopamine, a neuronal excitatory compound), glycine (an inhibitory neurotransmitter in the brain and spinal cord), histamine (liberated from mast cells during the inflammatory response), and taurine (an inhibitory neurotransmitter of the brain stem, spinal cord, and retina) (Newsholme, supra, pp. 790, 792). Epigenetic or genetic defects in neurotransmitter metabolic pathways can result in a spectrum of disease states in different tissues, including Parkinson disease and inherited myoclonus (McCance, K.L. and S.E. Huether (1994) Pathophysiology, Mosby-Year Book, Inc., St. Louis MO, pp. 402-404; Gundlach, A.L. (1990) FASEB J. 4:2761-2766).
Tetrahydrofolate is a derivatized glutamate molecule that acts as a carrier, providing activated one-carbon units to a wide variety of biosynthetic reactions, including synthesis of purines, pyrimidines, and the amino acid methionine. Tetrahydrofolate is generated by the activity of a holoenzyme complex called tetrahydrofolate synthase, which includes three enzyme activities: tetrahydrofolate dehydrogenase, tetrahydrofolate cyclohydrolase, and tetrahydrofolate synthetase. Thus, tetrahydrofolate dehydrogenase plays an important role in generating building blocks for nucleic acids and amino acids, which are crucial to proliferating cells.
3-Hydroxyacyl-CoA dehydrogenase (3HACD) is involved in fatty acid metabolism. It catalyzes the oxidation of 3-hydroxyacyl-CoA to 3-oxoacyl-CoA, with concomitant reduction of NAD+ to NADH, in the mitochondria and peroxisomes of eukaryotic cells. In peroxisomes, 3HACD and enoyl-CoA hydratase form an enzyme complex called bifunctional enzyme, defects in which are associated with peroxisomal bifunctional enzyme deficiency. This interruption in fatty acid metabolism leads to accumulation of very-long-chain fatty acids, disrupting development of the brain, bone, and adrenal glands. Infants born with this deficiency typically die within 6 months (Watkins, P. et al. (1989) J. Clin. Invest. 83:771-777; Online Mendelian Inheritance in Man (OMIM), #261515). The neurodegeneration that is characteristic of Alzheimer's disease involves development of extracellular plaques in certain brain regions. A major protein component of these plaques is the peptide amyloid-β (Aβ), which is one of several cleavage products of amyloid precursor protein (APP). 3HACD has been shown to bind the Aβ peptide and is overexpressed in neurons affected in Alzheimer's disease. In addition, an antibody against 3HACD can block the toxic effects of Aβ in a cell culture model of Alzheimer's disease (Yan, S. et al. (1997) Nature 389:689-695; OMIM, #602057).
Steroids, such as estrogen, testosterone, corticosterone, and others, are generated from a common precursor, cholesterol, and are interconverted into one another. A wide variety of enzymes act upon cholesterol, including a number of dehydrogenases. Steroid dehydrogenases, such as the hydroxysteroid dehydrogenases, are involved in hypertension, fertility, and cancer (Duax, W.L. and D. Ghosh (1997) Steroids 62:95-100). One such dehydrogenase is 3-oxo-5α-steroid dehydrogenase (OASD), a microsomal membrane protein highly expressed in prostate and other androgen-responsive tissues. OASD catalyzes the conversion of testosterone into dihydrotestosterone, which is the most potent androgen. Dihydrotestosterone is essential for the formation of the male phenotype during embryogenesis, as well as for proper androgen-mediated growth of tissues such as the prostate and male genitalia. A defect in OASD that prevents the conversion of testosterone into dihydrotestosterone leads to a rare form of male pseudohermaphroditism, characterized by defective formation of the external genitalia (Andersson, S. et al. (1991) Nature 354:159-161; Labrie, F. et al. (1992) Endocrinology 131:1571-1573; OMIM #264600). Thus, OASD plays a central role in sexual differentiation and androgen physiology. 17β-hydroxysteroid dehydrogenase type 6 (17βHSD6) plays an important role in the regulation of the male reproductive hormone dihydrotestosterone (DHTT). 17βHSD6 acts to reduce levels of DHTT by oxidizing a precursor of DHTT, 3α-diol, to androsterone, which is readily glucuronidated and removed from tissues. 17βHSD6 is active with both androgen and estrogen substrates when expressed in embryonic kidney 293 cells. At least five other isozymes of 17βHSD have been identified that catalyze oxidation and/or reduction reactions in various tissues with preferences for different steroid substrates (Biswas, M.G. and D.W. Russell (1997) J. Biol. Chem. 272:15959-15966). For example, 17βHSD1 preferentially reduces estrone to estradiol and is abundant in the ovary and placenta. 17βHSD2 catalyzes oxidation of androgens and is present in the endometrium and placenta. 17βHSD3 is exclusively a reductive enzyme in the testis (Geissler, W.M. et al. (1994) Nat. Genet. 7:34-39). An excess of androgens such as DHTT can contribute to certain disease states such as benign prostatic hyperplasia and prostate cancer.
Oxidoreductases are components of the fatty acid metabolism pathways in mitochondria and peroxisomes. The main beta-oxidation pathway degrades both saturated and unsaturated fatty acids, while the auxiliary pathway performs additional steps required for the degradation of unsaturated fatty acids. The auxiliary beta-oxidation enzyme 2,4-dienoyl-CoA reductase catalyzes the removal of even-numbered double bonds from unsaturated fatty acids prior to their entry into the main beta-oxidation pathway. The enzyme may also remove odd-numbered double bonds from unsaturated fatty acids (Koivuranta, K.T. et al. (1994) Biochem. J. 304:787-792; Smeland, T.E. et al. (1992) Proc. Natl. Acad. Sci. USA 89:6673-6677). 2,4-Dienoyl-CoA reductase is located in both mitochondria and peroxisomes. Inherited deficiencies in mitochondrial and peroxisomal beta-oxidation enzymes are associated with severe diseases, some of which manifest soon after birth and lead to death within a few years. Defects in beta-oxidation are associated with Reye's syndrome, Zellweger syndrome, neonatal adrenoleukodystrophy, infantile Refsum's disease, acyl-CoA oxidase deficiency, and bifunctional protein deficiency (Suzuki, Y. et al. (1994) Am. J. Hum. Genet. 54:36-43; Hoefler, supra; Cotran, R.S. et al. (1994) Robbins Pathologic Basis of Disease, W.B. Saunders Co., Philadelphia PA, p. 866). Peroxisomal beta-oxidation is impaired in cancerous tissue. Although neoplastic human breast epithelial cells have the same number of peroxisomes as normal cells, their fatty acyl-CoA oxidase activity is lower than in control tissue (el Bouhtoury, F. et al. (1992) J. Pathol. 166:27-35). Human colon carcinomas have fewer peroxisomes than normal colon tissue and have lower fatty acyl-CoA oxidase and bifunctional enzyme (including enoyl-CoA hydratase) activities than normal tissue (Cable, S. et al. (1992) Virchows Arch. B Cell Pathol. Incl. Mol. Pathol. 62:221-226).
Another important oxidoreductase is isocitrate dehydrogenase, which catalyzes the conversion of isocitrate to α-ketoglutarate, an intermediate of the citric acid cycle. Isocitrate dehydrogenase can be either NAD- or NADP-dependent, and is found in the cytosol, mitochondria, and peroxisomes. Activity of isocitrate dehydrogenase is regulated developmentally and by hormones, neurotransmitters, and growth factors.
Hydroxypyruvate reductase (HPR), a peroxisomal 2-hydroxyacid dehydrogenase in the glycolate pathway, catalyzes the conversion of hydroxypyruvate to glycerate with the oxidation of either NADH or NADPH. The reverse dehydrogenase reaction reduces NAD+ or NADP+. HPR recycles nucleotides and bases back into pathways leading to the synthesis of ATP and GTP. ATP and GTP are used to produce DNA and RNA and to control various aspects of signal transduction and energy metabolism. Inhibitors of purine nucleotide biosynthesis have long been employed as antiproliferative agents to treat cancer and viral diseases. HPR also regulates the biochemical synthesis of serine and the cellular serine levels available for protein synthesis.
The mitochondrial electron transport (or respiratory) chain is a series of oxidoreductase-type enzyme complexes in the mitochondrial membrane that is responsible for the transport of electrons from NADH through a series of redox centers within these complexes to oxygen, and for coupling this oxidation to the synthesis of ATP (oxidative phosphorylation). ATP then provides the primary source of energy for driving a cell's many energy-requiring reactions. The key complexes in the respiratory chain are NADH:ubiquinone oxidoreductase (complex I), succinate:ubiquinone oxidoreductase (complex II), ubiquinol:cytochrome c oxidoreductase (complex III), cytochrome c oxidase (complex IV), and ATP synthase (complex V) (Alberts, B. et al. (1994) Molecular Biology of the Cell, Garland Publishing, Inc., New York NY, pp. 677-678). All of these complexes are located on the inner matrix side of the mitochondrial membrane except complex II, which is on the cytosolic side. Complex II transports electrons generated in the citric acid cycle to the respiratory chain. The electrons generated by oxidation of succinate to fumarate in the citric acid cycle are transferred through electron carriers in complex II to membrane-bound ubiquinone (Q). Transcriptional regulation of the nuclear-encoded genes for these complexes appears to be the predominant means for controlling the biogenesis of respiratory enzymes. Defects and altered expression of enzymes in the respiratory chain are associated with a variety of disease conditions.
Other dehydrogenase activities using NAD as a cofactor are also important in mitochondrial function. 3-hydroxyisobutyrate dehydrogenase (3HBD), important in valine catabolism, catalyzes the NAD-dependent oxidation of 3-hydroxyisobutyrate to methylmalonate semialdehyde within mitochondria. Elevated levels of 3-hydroxyisobutyrate have been reported in a number of disease states, including ketoacidosis, methylmalonic acidemia, and other disorders associated with deficiencies in methylmalonate semialdehyde dehydrogenase (Rougraff, P.M. et al. (1989) J. Biol. Chem. 264:5899-5903).
Another mitochondrial dehydrogenase important in amino acid metabolism is the enzyme isovaleryl-CoA dehydrogenase (IVD). IVD is involved in leucine metabolism and catalyzes the oxidation of isovaleryl-CoA to 3-methylcrotonyl-CoA. Human IVD is a tetrameric flavoprotein that is encoded in the nucleus and synthesized in the cytosol as a 45 kDa precursor with a mitochondrial import signal sequence. A genetic deficiency, caused by a mutation in the gene encoding IVD, results in the condition known as isovaleric acidemia. This mutation results in inefficient mitochondrial import and processing of the IVD precursor (Vockley, J. et al. (1992) J. Biol. Chem. 267:2494-2501).
Transferases
Transferases are enzymes that catalyze the transfer of molecular groups. The reaction may involve an oxidation, reduction, or cleavage of covalent bonds, and is often specific to a substrate or to particular sites on a type of substrate. Transferases participate in reactions essential to such functions as synthesis and degradation of cell components and regulation of cell functions, including cell signaling, cell proliferation, inflammation, apoptosis, secretion, and excretion. Transferases are involved in key steps in disease processes involving these functions. Transferases are frequently classified according to the type of group transferred. For example, methyl transferases transfer one-carbon methyl groups, amino transferases transfer nitrogenous amino groups, and similarly denominated enzymes transfer aldehyde or ketone, acyl, glycosyl, alkyl or aryl, isoprenyl, saccharyl, phosphorus-containing, sulfur-containing, or selenium-containing groups, as well as small enzymatic groups such as coenzyme A.
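Classification by transferred group is also the basis of the Enzyme Commission (EC) numbering of transferases. In an EC 2.x.x.x number, the second field names the kind of group moved. The following is a minimal lookup sketch, assuming well-formed EC strings. The table covers subclasses 2.1 through 2.9 only, and the function name `transferase_group` is illustrative. Isoprenyl and saccharyl transfers have no subclasses of their own. They fall under the broader alkyl/aryl and glycosyl subclasses.

```python
from typing import Dict

# Second field of an EC 2.x.x.x number -> kind of group transferred.
TRANSFERASE_SUBCLASSES: Dict[int, str] = {
    1: "one-carbon groups (e.g. methyl)",
    2: "aldehyde or ketone groups",
    3: "acyl groups",
    4: "glycosyl groups",
    5: "alkyl or aryl groups other than methyl",
    6: "nitrogenous groups (e.g. amino)",
    7: "phosphorus-containing groups",
    8: "sulfur-containing groups",
    9: "selenium-containing groups",
}


def transferase_group(ec_number: str) -> str:
    """Return the transferred group for a transferase EC number such as '2.6.1.64'."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    fields = text.split(".")
    if len(fields) < 2 or fields[0] != "2":
        raise ValueError(f"not a transferase (EC 2) number: {ec_number!r}")
    try:
        return TRANSFERASE_SUBCLASSES[int(fields[1])]
    except (ValueError, KeyError):
        raise ValueError(f"unrecognized transferase subclass: {ec_number!r}") from None


if __name__ == "__main__":
    # Glutamine transaminase K is listed in ExPASy ENZYME as EC 2.6.1.64.
    print(transferase_group("EC 2.6.1.64"))  # nitrogenous groups (e.g. amino)
```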
Acyl transferases include peroxisomal carnitine octanoyl transferase, which is involved in the fatty acid beta-oxidation pathway, and mitochondrial carnitine palmitoyl transferases, involved in fatty acid metabolism and transport. Choline O-acetyl transferase catalyzes the biosynthesis of the neurotransmitter acetylcholine.
Amino transferases play key roles in protein synthesis and degradation, and they contribute to other processes as well. For example, the amino transferase 5-aminolevulinic acid synthase catalyzes the addition of succinyl-CoA to glycine, the first step in heme biosynthesis. Other amino transferases participate in pathways important for neurological function and metabolism. For example, glutamine-phenylpyruvate amino transferase, also known as glutamine transaminase K (GTK), catalyzes several reactions with a pyridoxal phosphate cofactor. GTK catalyzes the reversible conversion of L-glutamine and phenylpyruvate to 2-oxoglutaramate and L-phenylalanine. Other amino acid substrates for GTK include L-methionine, L-histidine, and L-tyrosine. GTK also catalyzes the conversion of kynurenine to kynurenic acid, a tryptophan metabolite that is an antagonist of the N-methyl-D-aspartate (NMDA) receptor in the brain and may exert a neuromodulatory function. Alteration of the kynurenine metabolic pathway may be associated with several neurological disorders. GTK also plays a role in the metabolism of halogenated xenobiotics conjugated to glutathione, leading to nephrotoxicity in rats and neurotoxicity in humans. GTK is expressed in kidney, liver, and brain. Both human and rat GTKs contain a putative pyridoxal phosphate binding site (ExPASy ENZYME: EC 2.6.1.64; Perry, S.J. et al. (1993) Mol. Pharmacol. 43:660-665; Perry, S. et al. (1995) FEBS Lett. 360:277-280; and Alberati-Giani, D. et al. (1995) J. Neurochem. 64:1448-1455). A second amino transferase associated with this pathway is kynurenine/α-aminoadipate amino transferase (AadAT). AadAT catalyzes the reversible conversion of α-aminoadipate and α-ketoglutarate to α-ketoadipate and L-glutamate during lysine metabolism. AadAT also catalyzes the transamination of kynurenine to kynurenic acid. A cytosolic AadAT is expressed in rat kidney, liver, and brain (Nakatani, Y. et al. (1970) Biochim. Biophys. Acta 198:219-228; Buchli, R. et al. (1995) J. Biol. Chem. 270:29330-29335).
Glycosyl transferases include the mammalian UDP-glucuronosyl transferases, a family of membrane-bound microsomal enzymes catalyzing the transfer of glucuronic acid to lipophilic substrates in reactions that play important roles in detoxification and excretion of drugs, carcinogens, and other foreign substances. Another mammalian glycosyl transferase, UDP-galactose-ceramide galactosyl transferase, catalyzes the transfer of galactose to ceramide in the synthesis of galactocerebrosides in myelin membranes of the nervous system. The UDP-glycosyl transferases share a conserved signature domain of about 50 amino acid residues (PROSITE: PDOC00359, http://expasy.hcuge.ch/sprot/prosite.html).
Methyl transferases are involved in a variety of pharmacologically important processes. Nicotinamide N-methyl transferase catalyzes the N-methylation of nicotinamides and other pyridines, an important step in the cellular handling of drugs and other foreign compounds.
Phenylethanolamine N-methyl transferase catalyzes the conversion of noradrenaline to adrenaline. 6-O-methylguanine-DNA methyl transferase reverses the methylation of guanine in DNA, a lesion that is an important step in carcinogenesis. Uroporphyrinogen-III C-methyl transferase, which catalyzes the transfer of two methyl groups from S-adenosyl-L-methionine to uroporphyrinogen III, is the first specific enzyme in the biosynthesis of cobalamin, a dietary vitamin whose uptake is deficient in pernicious anemia. Protein-arginine methyl transferases catalyze the posttranslational methylation of arginine residues in proteins, resulting in the mono- and dimethylation of arginine on the guanidino group. Substrates include histones, myelin basic protein, and heterogeneous nuclear ribonucleoproteins involved in mRNA processing, splicing, and transport. Protein-arginine methyl transferase interacts with proteins upregulated by mitogens, with proteins involved in chronic lymphocytic leukemia, and with interferon, suggesting an important role for methylation in cytokine receptor signaling (Lin, W.-J. et al. (1996) J. Biol. Chem. 271:15034-15044; Abramovich, C. et al. (1997) EMBO J. 16:260-266; and Scott, H.S. et al. (1998) Genomics 48:330-340).
Phosphotransferases catalyze the transfer of high-energy phosphate groups and are important in energy-requiring and -releasing reactions. The metabolic enzyme creatine kinase catalyzes the reversible phosphate transfer between creatine/creatine phosphate and ATP/ADP. Glycocyamine kinase catalyzes phosphate transfer from ATP to guanidoacetate, and arginine kinase catalyzes phosphate transfer from ATP to arginine. A cysteine-containing active site is conserved in this family (PROSITE: PDOC00103). Prenyl transferases are heterodimers, consisting of an alpha and a beta subunit, that catalyze the transfer of an isoprenyl group. An example of a prenyl transferase is the mammalian protein farnesyl transferase. The alpha subunit of farnesyl transferase consists of 5 repeats of 34 amino acids each, with each repeat containing an invariant tryptophan (PROSITE: PDOC00703).
Saccharyl transferases are glycating enzymes involved in a variety of metabolic processes. Oligosaccharyl transferase-48, for example, is a receptor for advanced glycation endproducts. Accumulation of these endproducts is observed in vascular complications of diabetes, macrovascular disease, renal insufficiency, and Alzheimer's disease (Thornalley, P.J. (1998) Cell Mol. Biol. (Noisy-le-Grand) 44:1013-1023).
Coenzyme A (CoA) transferase catalyzes the transfer of CoA between two carboxylic acids. Succinyl CoA:3-oxoacid CoA transferase, for example, transfers CoA from succinyl-CoA to a recipient such as acetoacetate. Acetoacetate is essential to the metabolism of ketone bodies, which accumulate in tissues affected by metabolic disorders such as diabetes (PROSITE: PDOC00980).
Hydrolases
Hydrolysis is the breaking of a covalent bond in a substrate by introduction of a molecule of water. The reaction involves a nucleophilic attack by the water molecule's oxygen atom on a target bond in the substrate. The water molecule is split across the target bond, breaking the bond and generating two product molecules. Hydrolases participate in reactions essential to such functions as synthesis and degradation of cell components and regulation of cell functions, including cell signaling, cell proliferation, inflammation, apoptosis, secretion, and excretion. Hydrolases are involved in key steps in disease processes involving these functions. Hydrolytic enzymes, or hydrolases, may be grouped by substrate specificity into classes including phosphatases, peptidases, lysophospholipases, phosphodiesterases, glycosidases, and glyoxalases.
Phosphatases hydrolytically remove phosphate groups from proteins, an energy-providing step that regulates many cellular processes, including intracellular signaling pathways that in turn control cell growth and differentiation, cell-cell contact, the cell cycle, and oncogenesis.
Lysophospholipases (LPLs) regulate intracellular lipids by catalyzing the hydrolysis of ester bonds to remove an acyl group, a key step in lipid degradation. Small LPL isoforms, approximately 15-30 kDa, function as hydrolases; larger isoforms function both as hydrolases and as transacylases. A particular substrate for LPLs, lysophosphatidylcholine, causes lysis of cell membranes. LPL activity is regulated by signaling molecules important in numerous pathways, including the inflammatory response.
Peptidases, also called proteases, cleave peptide bonds that form the backbone of peptide or protein chains. Proteolytic processing is essential to cell growth, differentiation, remodeling, and homeostasis as well as inflammation and immune response. Since typical protein half-lives range from hours to a few days, peptidases are continually cleaving precursor proteins to their active form, removing signal sequences from targeted proteins, and degrading aged or defective proteins. Peptidases function in bacterial, parasitic, and viral invasion and replication within a host. Examples of peptidases include trypsin and chymotrypsin (components of the complement cascade and the blood-clotting cascade), lysosomal cathepsins, calpains, pepsin, renin, and chymosin (Beynon, R.J. and J.S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University Press, New York NY, pp. 1-5).
The phosphodiesterases catalyze the hydrolysis of one of the two ester bonds in a phosphodiester compound. Phosphodiesterases are therefore crucial to a variety of cellular processes. Phosphodiesterases include DNA and RNA endo- and exonucleases, which are essential to cell growth and replication as well as protein synthesis. Another phosphodiesterase is acid sphingomyelinase, which hydrolyzes the membrane phospholipid sphingomyelin to ceramide and phosphorylcholine. Phosphorylcholine is used in the synthesis of phosphatidylcholine, which is involved in numerous intracellular signaling pathways. Ceramide is an essential precursor for the generation of gangliosides, membrane lipids found in high concentration in neural tissue. Defective acid sphingomyelinase phosphodiesterase leads to a build-up of sphingomyelin molecules in lysosomes, resulting in Niemann-Pick disease.
Glycosidases catalyze the cleavage of the glycosidic bonds of glycosides, which are compounds that contain one or more sugars. Mammalian lactase-phlorizin hydrolase, for example, is an intestinal enzyme that splits lactose. Mammalian beta-galactosidase removes the terminal galactose from gangliosides, glycoproteins, and glycosaminoglycans, and deficiency of this enzyme is associated with a gangliosidosis known as Morquio disease type B. Vertebrate lysosomal alpha-glucosidase, which hydrolyzes glycogen, maltose, and isomaltose, and vertebrate intestinal sucrase-isomaltase, which hydrolyzes sucrose, maltose, and isomaltose, are widely distributed members of this family with highly conserved sequences at their active sites.
The glyoxalase system is involved in gluconeogenesis, the production of glucose from storage compounds in the body. It consists of glyoxalase I, which catalyzes the formation of S-D-lactoylglutathione from methylglyoxal, a side product of triose-phosphate energy metabolism, and glyoxalase II, which hydrolyzes S-D-lactoylglutathione to D-lactic acid and reduced glutathione. Glyoxalases are involved in hyperglycemia, non-insulin-dependent diabetes mellitus, the detoxification of bacterial toxins, and the control of cell proliferation and microtubule assembly.
Lyases
Lyases are a class of enzymes that catalyze the cleavage of C-C, C-O, C-N, C-S, C-(halide), P-O, or other bonds without hydrolysis or oxidation to form two molecules, at least one of which contains a double bond (Stryer, L. (1995) Biochemistry. W.H. Freeman and Co., New York NY, p.620). Lyases are critical components of cellular biochemistry, with roles in metabolic energy production, including fatty acid metabolism, as well as other diverse enzymatic processes. Further classification of lyases reflects the type of bond cleaved as well as the nature of the cleaved group. The group of C-C lyases includes carboxyl-lyases (decarboxylases), aldehyde-lyases (aldolases), oxo-acid-lyases, and others. The C-O lyase group includes hydro-lyases, lyases acting on polysaccharides, and other lyases. The C-N lyase group includes ammonia-lyases, amidine-lyases, amine-lyases (deaminases), and other lyases.
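The bond-based classification of lyases described above can be pictured as a simple lookup table. The following is a minimal sketch, assuming only the groups the text enumerates; the names `LYASE_GROUPS` and `lyase_groups_for_bond` are hypothetical and do not correspond to any database or library.

```python
# Hypothetical lookup table: lyase groups keyed by the bond they cleave,
# restricted to the three bond types the text subdivides.
LYASE_GROUPS: dict[str, list[str]] = {
    "C-C": ["carboxyl-lyases (decarboxylases)", "aldehyde-lyases (aldolases)",
            "oxo-acid-lyases", "other C-C lyases"],
    "C-O": ["hydro-lyases", "lyases acting on polysaccharides",
            "other C-O lyases"],
    "C-N": ["ammonia-lyases", "amidine-lyases", "amine-lyases (deaminases)",
            "other C-N lyases"],
}


def lyase_groups_for_bond(bond: str) -> list[str]:
    """Return the lyase groups that cleave `bond`.

    Bond types the text mentions but does not subdivide (C-S, C-halide,
    P-O) return an empty list.
    """
    return LYASE_GROUPS.get(bond.upper(), [])


print(lyase_groups_for_bond("c-n"))
```

A dictionary keeps the bond type as the primary key, mirroring how the text first sorts lyases by bond cleaved and only then by the nature of the cleaved group.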
Proper regulation of lyases is critical to normal physiology. For example, mutation-induced deficiencies in uroporphyrinogen decarboxylase can lead to photosensitive cutaneous lesions in the genetically linked disorder familial porphyria cutanea tarda (Mendez, M. et al. (1998) Am. J. Genet. 63:1363-1375). Similarly, adenosine deaminase (ADA) deficiency stems from genetic mutations in the ADA gene and results in the disorder severe combined immunodeficiency disease (SCID) (Hershfield, M.S. (1998) Semin. Hematol. 35:291-298).
Isomerases
Isomerases are a class of enzymes that catalyze geometric or structural changes within a molecule to form a single product. This class includes racemases and epimerases, cis-trans isomerases, intramolecular oxidoreductases, intramolecular transferases (mutases), and intramolecular lyases. Isomerases are critical components of cellular biochemistry, with roles in metabolic energy production, including glycolysis, as well as other diverse enzymatic processes (Stryer, L. (1995) Biochemistry. W.H. Freeman and Co., New York NY, pp.483-507).
Racemases are a subset of isomerases that catalyze inversion of a molecule's configuration around the asymmetric carbon atom in a substrate having a single center of asymmetry, thereby interconverting two enantiomers. Epimerases are another subset of isomerases that catalyze inversion of configuration around an asymmetric carbon atom in a substrate with more than one center of asymmetry, thereby interconverting two epimers. Racemases and epimerases can act on amino acids and derivatives, hydroxy acids and derivatives, and carbohydrates and derivatives. The interconversion of UDP-galactose and UDP-glucose is catalyzed by UDP-galactose 4'-epimerase. Proper regulation and function of this epimerase is essential to the synthesis of glycoproteins and glycolipids. Elevated blood galactose levels have been correlated with UDP-galactose 4'-epimerase deficiency in infant screening programs (Gitzelmann, R. (1972) Helv. Paediat. Acta 27:125-130).
Intramolecular oxidoreductases are also classed as isomerases. Oxidoreductases in general catalyze the reversible transfer of electrons from a substrate that becomes oxidized to a substrate that becomes reduced. This class of enzymes includes dehydrogenases, hydroxylases, oxidases, oxygenases, peroxidases, and reductases. Proper maintenance of oxidoreductase levels is physiologically important. For example, genetically linked deficiencies in lipoamide dehydrogenase can result in lactic acidosis (Robinson, B.H. et al. (1977) Pediat. Res. 11:1198-1202).
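The distinction drawn above between racemases and epimerases reduces to counting the asymmetric centers in the substrate. The following is a minimal sketch, assuming the enzyme inverts exactly one center; the function name `inversion_enzyme_class` is hypothetical.

```python
def inversion_enzyme_class(n_asymmetric_centers: int) -> str:
    """Classify an enzyme that inverts configuration at one asymmetric carbon.

    Follows the definitions in the text: one center -> racemase,
    more than one center -> epimerase.
    """
    if n_asymmetric_centers < 1:
        raise ValueError("inversion requires at least one asymmetric carbon")
    if n_asymmetric_centers == 1:
        # Inverting the only center yields the mirror image (enantiomer).
        return "racemase"
    # Inverting one of several centers yields a diastereomer (epimer).
    return "epimerase"


# Alanine has one asymmetric carbon; a hexose such as galactose has several.
print(inversion_enzyme_class(1), inversion_enzyme_class(4))
```

Under these definitions, alanine racemase falls in the first branch, while UDP-galactose 4'-epimerase, which inverts a single center of a sugar bearing several, falls in the second.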
Another subgroup of isomerases is the intramolecular transferases (mutases). Transferases transfer a chemical group from one compound (the donor) to another compound (the acceptor); mutases carry out the equivalent transfer between positions within a single molecule. The types of groups transferred by these enzymes include acyl groups, amino groups, phosphate groups (phosphotransferases or phosphomutases), and others. The transferase carnitine palmitoyltransferase is an important component of fatty acid metabolism. Genetically linked deficiencies in this transferase can lead to myopathy (Scriver, C.R. et al. (1995) The Metabolic and Molecular Basis of Inherited Disease. McGraw-Hill, New York NY, pp.1501-1533). Yet another subgroup of isomerases is the topoisomerases, enzymes that alter the topological state of DNA. Defects in topoisomerases or their regulation can disrupt normal physiology. For example, reduced levels of topoisomerase II have been correlated with some of the DNA processing defects associated with the disorder ataxia-telangiectasia (Singh, S.P. et al. (1988) Nucleic Acids Res. 16:3919-3929).
Ligases
Ligases catalyze the formation of a bond between two substrate molecules. The process involves the hydrolysis of a pyrophosphate bond in ATP or a similar energy donor. Ligases are classified by the type of bond they form, which can include carbon-oxygen, carbon-sulfur, carbon-nitrogen, carbon-carbon, and phosphoric ester bonds.
Ligases forming carbon-oxygen bonds include the aminoacyl-transfer RNA (tRNA) synthetases, which are important RNA-associated enzymes with roles in translation. Protein biosynthesis depends on each amino acid forming a linkage with the appropriate tRNA. The aminoacyl-tRNA synthetases are responsible for the activation and correct attachment of an amino acid to its cognate tRNA. The 20 aminoacyl-tRNA synthetase enzymes can be divided into two structural classes, each characterized by a distinctive topology of the catalytic domain. Class I enzymes contain a catalytic domain based on the nucleotide-binding Rossmann fold. Class II enzymes contain a central catalytic domain, which consists of a seven-stranded antiparallel β-sheet motif, as well as N- and C-terminal regulatory domains. Class II enzymes are separated into two groups based on the heterodimeric or homodimeric structure of the enzyme; the latter group is further subdivided by the structure of the N- and C-terminal regulatory domains (Hartlein, M. and S. Cusack (1995) J. Mol. Evol. 40:519-530). Autoantibodies against aminoacyl-tRNA synthetases are generated by patients with dermatomyositis and polymyositis, and correlate strongly with complicating interstitial lung disease (ILD). These antibodies appear to be generated in response to viral infection, and coxsackie virus has been used to induce experimental viral myositis in animals.
Ligases forming carbon-sulfur bonds (acid-thiol ligases) mediate a large number of cellular biosynthetic and intermediary metabolic processes that involve the intermolecular transfer of carbon atom-containing substrates (carbon substrates).
Examples of such reactions include the tricarboxylic acid cycle, synthesis of fatty acids and long-chain phospholipids, synthesis of alcohols and aldehydes, synthesis of intermediary metabolites, and reactions involved in amino acid degradation pathways. Some of these reactions require an input of energy, usually in the form of conversion of ATP either to ADP and phosphate or to AMP and pyrophosphate.
In many cases, a carbon substrate is derived from a small molecule containing at least two carbon atoms. The carbon substrate is often covalently bound to a larger molecule that acts as a carbon substrate carrier molecule within the cell. In the biosynthetic mechanisms described above, the carrier molecule is coenzyme A. Coenzyme A (CoA) is structurally related to derivatives of the nucleotide ADP and consists of 4'-phosphopantetheine linked via a phosphodiester bond to the alpha phosphate group of adenosine 3',5'-bisphosphate. The terminal thiol group of 4'-phosphopantetheine acts as the site for carbon substrate bond formation. The predominant carbon substrates that utilize CoA as a carrier molecule during biosynthesis and intermediary metabolism in the cell are acetyl, succinyl, and propionyl moieties, collectively referred to as acyl groups. Other carbon substrates include enoyl lipid, which acts as a fatty acid oxidation intermediate, and carnitine, which acts as a regulator of acetyl-CoA flux and as a carrier for mitochondrial acyl group transfer. Acyl-CoA and acetyl-CoA are synthesized in the cell by acyl-CoA synthetase and acetyl-CoA synthetase, respectively.
Activation of fatty acids is mediated by at least three forms of acyl-CoA synthetase activity: i) acetyl-CoA synthetase, which activates acetate and several other low molecular weight carboxylic acids and is found in muscle mitochondria and the cytosol of other tissues; ii) medium-chain acyl-CoA synthetase, which activates fatty acids containing between four and eleven carbon atoms (predominantly from dietary sources) and is present only in liver mitochondria; and iii) long-chain acyl-CoA synthetase, which is specific for fatty acids with between six and twenty carbon atoms and is found in microsomes and mitochondria. Proteins associated with acyl-CoA synthetase activity have been identified from many sources, including bacteria, yeast, plants, mouse, and man. The activity of acyl-CoA synthetase may be modulated by phosphorylation of the enzyme by cAMP-dependent protein kinase.
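Because the chain-length windows stated above overlap, a given fatty acid may be a substrate for more than one synthetase form. The following is a minimal sketch using the ranges given in the text; it assumes acetyl-CoA synthetase covers only acetate (C2), since the text gives no numeric range for its "other low molecular weight carboxylic acids". The names `SYNTHETASE_RANGES` and `candidate_synthetases` are hypothetical.

```python
# Inclusive carbon-count windows taken from the text. The C2-only window for
# acetyl-CoA synthetase is an assumption, not a stated range.
SYNTHETASE_RANGES: dict[str, tuple[int, int]] = {
    "acetyl-CoA synthetase": (2, 2),
    "medium-chain acyl-CoA synthetase": (4, 11),
    "long-chain acyl-CoA synthetase": (6, 20),
}


def candidate_synthetases(n_carbons: int) -> list[str]:
    """Return every synthetase form whose stated window includes n_carbons.

    The windows overlap (C6-C11), so several forms can match.
    """
    return [name for name, (low, high) in SYNTHETASE_RANGES.items()
            if low <= n_carbons <= high]


print(candidate_synthetases(8))   # octanoate falls in the overlap
print(candidate_synthetases(16))  # palmitate
```

Returning a list rather than a single name keeps the overlap visible instead of silently picking one form.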
Ligases forming carbon-nitrogen bonds include amide synthases such as glutamine synthetase (glutamate-ammonia ligase), which catalyzes the amination of glutamic acid to glutamine by ammonia using the energy of ATP hydrolysis. Glutamine is the primary source of the amino group in various amide transfer reactions involved in de novo pyrimidine nucleotide synthesis and in purine and pyrimidine ribonucleotide interconversions. Overexpression of glutamine synthetase has been observed in primary liver cancer (Christa, L. et al. (1994) Gastroent. 106:1312-1320).
Acid-amino-acid ligases (peptide synthases) are represented by the ubiquitin proteases, which are associated with the ubiquitin conjugation system (UCS), a major pathway for the degradation of cellular proteins in eukaryotic cells and some bacteria. The UCS mediates the elimination of abnormal proteins and regulates the half-lives of important regulatory proteins that control cellular processes such as gene transcription and cell cycle progression. In the UCS pathway, proteins targeted for degradation are conjugated to ubiquitin (Ub), a small heat-stable protein. Ub is first activated by a ubiquitin-activating enzyme (E1) and then transferred to one of several Ub-conjugating enzymes (E2). E2 then links the Ub molecule through its C-terminal glycine to an internal lysine (acceptor lysine) of a target protein. The ubiquitinated protein is then recognized and degraded by the proteasome, a large, multisubunit proteolytic enzyme complex, and ubiquitin is released for reutilization by ubiquitin protease. The UCS is implicated in the degradation of mitotic cyclins, oncoproteins, tumor suppressor proteins such as p53, viral proteins, cell surface receptors associated with signal transduction, transcriptional regulators, and mutated or damaged proteins (Ciechanover, A. (1994) Cell 79:13-21).
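The ordered E1, E2, conjugation, and degradation steps of the UCS described above, together with the recycling of ubiquitin, can be followed with simple bookkeeping. The following is a minimal sketch that tracks counts only, with no kinetics; the class `UbiquitinSystem` and its method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class UbiquitinSystem:
    """Hypothetical count-only model of the UCS steps in the text."""
    free_ub: int                                          # Ub available for reuse
    e1_loaded: int = 0                                    # Ub activated by E1
    e2_loaded: int = 0                                    # Ub handed on to an E2
    tagged: dict[str, int] = field(default_factory=dict)  # target -> Ub attached
    degraded: list[str] = field(default_factory=list)

    def activate(self) -> None:
        if self.free_ub == 0:
            raise RuntimeError("no free ubiquitin")
        self.free_ub -= 1
        self.e1_loaded += 1

    def transfer_to_e2(self) -> None:
        if self.e1_loaded == 0:
            raise RuntimeError("no E1-bound ubiquitin")
        self.e1_loaded -= 1
        self.e2_loaded += 1

    def conjugate(self, target: str) -> None:
        # E2 links Ub to an acceptor lysine on the target protein.
        if self.e2_loaded == 0:
            raise RuntimeError("no E2-bound ubiquitin")
        self.e2_loaded -= 1
        self.tagged[target] = self.tagged.get(target, 0) + 1

    def degrade(self, target: str) -> None:
        # The proteasome destroys the target; the attached Ub is released.
        n_ub = self.tagged.pop(target)
        self.degraded.append(target)
        self.free_ub += n_ub


ucs = UbiquitinSystem(free_ub=3)
for _ in range(2):
    ucs.activate()
    ucs.transfer_to_e2()
    ucs.conjugate("p53")
ucs.degrade("p53")
print(ucs.free_ub, ucs.degraded)  # 3 ['p53']
```

The final state shows that the ubiquitin pool returns to its starting size once the target is degraded, which is the reutilization step the text attributes to ubiquitin protease.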
A murine proto-oncogene, Unp, encodes a nuclear ubiquitin protease whose overexpression leads to oncogenic transformation of NIH3T3 cells, and the human homolog of this gene is consistently elevated in small cell tumors and adenocarcinomas of the lung (Gray, D.A. (1995) Oncogene 10:2179-2183).
Cyclo-ligases and other carbon-nitrogen ligases comprise various enzymes and enzyme complexes that participate in the de novo purine and pyrimidine biosynthetic pathways. Because these pathways are critical to the synthesis of nucleotides for both RNA and DNA synthesis, many of these enzymes have been the targets of clinical agents for the treatment of cell proliferative disorders such as cancer and of infectious diseases.
Purine biosynthesis occurs de novo from the amino acids glycine and glutamine and other small molecules. Three of the key reactions in this process are catalyzed by a trifunctional enzyme composed of glycinamide ribonucleotide synthetase (GARS), aminoimidazole ribonucleotide synthetase (AIRS), and glycinamide ribonucleotide transformylase (GART). Together these three enzymes combine ribosylamine phosphate with glycine to yield phosphoribosyl aminoimidazole, a precursor to both adenylate and guanylate nucleotides. This trifunctional protein has been implicated in the pathology of Down syndrome (Aimi, J. et al. (1990) Nucleic Acids Res. 18:6665-6672). Adenylosuccinate synthetase catalyzes a later step in purine biosynthesis that converts inosinic acid to adenylosuccinate, a key step on the path to ATP synthesis. This enzyme is also similar to another carbon-nitrogen ligase, argininosuccinate synthetase, which catalyzes a similar reaction in the urea cycle (Powell, S.M. et al. (1992) FEBS Lett. 303:4-10).
Like the de novo biosynthesis of purines, de novo synthesis of the pyrimidine nucleotides uridylate and cytidylate also arises from a common precursor, in this instance the nucleotide orotidylate, derived from orotate and phosphoribosyl pyrophosphate (PRPP). Again, a trifunctional enzyme comprising three carbon-nitrogen ligases plays a key role in the process. In this case the enzymes aspartate transcarbamylase (ATCase), carbamyl phosphate synthetase II, and dihydroorotase (DHOase) are encoded by a single gene called CAD. Together these three enzymes combine the initial reactants in pyrimidine biosynthesis, glutamine, CO2, and ATP, to form dihydroorotate, the precursor to orotate and orotidylate (Iwahana, H. et al. (1996) Biochem. Biophys. Res. Commun. 219:249-255). Further steps then lead to the synthesis of uridine nucleotides from orotidylate. Cytidine nucleotides are derived from uridine 5'-triphosphate (UTP) by the amidation of UTP by the enzyme CTP synthetase, using glutamine as the amino donor. Regulatory mutations in human CTP synthetase are believed to confer multi-drug resistance to agents widely used in cancer therapy (Yamauchi, M. et al. (1990) EMBO J. 9:2095-2099).
Ligases forming carbon-carbon bonds include the carboxylases acetyl-CoA carboxylase and pyruvate carboxylase. Acetyl-CoA carboxylase catalyzes the ATP-dependent carboxylation of acetyl-CoA to malonyl-CoA, the rate-limiting step in the biogenesis of long-chain fatty acids. Two isoforms of acetyl-CoA carboxylase, types I and II, are expressed in humans in a tissue-specific manner (Ha, J. et al. (1994) Eur. J. Biochem. 219:297-306). Pyruvate carboxylase is a nuclear-encoded mitochondrial enzyme that catalyzes the conversion of pyruvate to oxaloacetate, a key intermediate in the citric acid cycle.
Ligases forming phosphoric ester bonds include the DNA ligases involved in both DNA replication and repair. DNA ligases seal phosphodiester bonds between two adjacent nucleotides in a DNA chain, using the energy from ATP hydrolysis first to activate the free 5'-phosphate of one nucleotide and then to react it with the 3'-OH group of the adjacent nucleotide. This resealing reaction is used both in DNA replication, to join the short DNA fragments called Okazaki fragments that are transiently formed during the replication of new DNA, and in DNA repair. DNA repair is the process by which accidental base changes, such as those produced by oxidative damage, hydrolytic attack, or uncontrolled methylation of DNA, are corrected before replication or transcription of the DNA can occur. Bloom's syndrome is an inherited human disease in which individuals are partially deficient in DNA ligation and consequently have an increased incidence of cancer (Alberts, B. et al. (1994) The Molecular Biology of the Cell. Garland Publishing Inc., New York NY, p. 247).
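The sealing of Okazaki fragments described above can be illustrated as merging intervals that abut exactly. The following is a minimal sketch, assuming each fragment is a non-overlapping half-open interval of positions along one strand; a nick (no missing nucleotide) is sealed, while a gap is left for a polymerase to fill. The function name `ligate` is hypothetical.

```python
def ligate(fragments: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Join fragments whose ends meet exactly (a nick); leave gaps unsealed.

    Each fragment is a half-open interval [start, end) of nucleotide
    positions along one strand, read 5' to 3'. Fragments are assumed not
    to overlap.
    """
    joined: list[tuple[int, int]] = []
    for start, end in sorted(fragments):
        if joined and joined[-1][1] == start:
            # The 3'-OH of the previous fragment meets the 5'-phosphate of
            # this one: form the phosphodiester bond by merging intervals.
            joined[-1] = (joined[-1][0], end)
        else:
            # First fragment, or a gap of missing nucleotides: no ligation.
            joined.append((start, end))
    return joined


print(ligate([(100, 250), (0, 100), (260, 400)]))  # [(0, 250), (260, 400)]
```

Sorting first reflects that lagging-strand fragments are made out of order but are joined according to their positions on the strand.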
Molecules Associated with Growth and Development
Human growth and development require the spatial and temporal regulation of cell differentiation, cell proliferation, and apoptosis. These processes coordinately control reproduction, aging, embryogenesis, morphogenesis, organogenesis, and tissue repair and maintenance. At the cellular level, growth and development are governed by the cell's decision to enter or exit the cell division cycle and by the cell's commitment to a terminally differentiated state. The cell makes these decisions in response to the extracellular signals and other environmental cues it receives. The following discussion focuses on the molecular mechanisms of cell division, reproduction, cell differentiation and proliferation, apoptosis, and aging.
Cell Division
Cell division is the fundamental process by which all living things grow and reproduce. In unicellular organisms such as yeast and bacteria, each cell division doubles the number of organisms, while in multicellular species many rounds of cell division are required to replace cells lost by wear or by programmed cell death, and for cell differentiation to produce a new tissue or organ. Details of the cell division cycle may vary, but the basic process consists of three principal events. The first event, interphase, involves preparations for cell division, replication of the DNA, and production of essential proteins. In the second event, mitosis, the nuclear material is divided and separates to opposite sides of the cell. The final event, cytokinesis, is division and fission of the cell cytoplasm. The sequence and timing of cell cycle transitions are under the control of the cell cycle regulation system, which acts through positive and negative regulatory circuits at various checkpoints.
Regulated progression of the cell cycle depends on the integration of growth control pathways with the basic cell cycle machinery. Cell cycle regulators have been identified by selecting for human and yeast cDNAs that, when overexpressed, block or activate cell cycle arrest signals in the yeast mating pheromone pathway. Known regulators that block the arrest signals include human CPR (cell cycle progression restoration) genes, such as CPR8 and CPR2, and yeast CDC (cell division control) genes, including CDC91. The CPR genes express a variety of proteins, including cyclins, tumor suppressor binding proteins, chaperones, transcription factors, translation factors, and RNA-binding proteins (Edwards, M.C. et al. (1997) Genetics 147:1063-1076).
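The three-event cycle and checkpoint control described above can be sketched as a small state machine in which each transition is gated by a checkpoint. The following is a minimal sketch, assuming a single yes/no checkpoint per phase; the names `Phase`, `NEXT`, and `advance` are hypothetical, and real checkpoints integrate many signals.

```python
from enum import Enum
from typing import Callable


class Phase(Enum):
    INTERPHASE = "interphase"    # preparation, DNA replication, protein synthesis
    MITOSIS = "mitosis"          # nuclear material divided
    CYTOKINESIS = "cytokinesis"  # cytoplasm divided


# The fixed order of events in one cell division cycle.
NEXT = {
    Phase.INTERPHASE: Phase.MITOSIS,
    Phase.MITOSIS: Phase.CYTOKINESIS,
    Phase.CYTOKINESIS: Phase.INTERPHASE,
}


def advance(phase: Phase, checkpoint_ok: Callable[[Phase], bool]) -> Phase:
    """Move to the next event if the checkpoint guarding `phase` passes;
    otherwise the cell arrests and remains in its current phase."""
    return NEXT[phase] if checkpoint_ok(phase) else phase


# A hypothetical checkpoint that blocks exit from interphase (e.g. DNA damage).
def damage_checkpoint(phase: Phase) -> bool:
    return phase is not Phase.INTERPHASE


print(advance(Phase.INTERPHASE, damage_checkpoint))  # Phase.INTERPHASE
print(advance(Phase.MITOSIS, damage_checkpoint))     # Phase.CYTOKINESIS
```

Passing the checkpoint as a function keeps the fixed order of events separate from the positive and negative signals that decide whether each transition may proceed.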
Several cell cycle transitions, including the entry and exit of a cell from mitosis, are dependent upon the activation and inhibition of cyclin-dependent kinases (Cdks). Each Cdk complex is composed of a kinase subunit, Cdk, and an activating subunit, cyclin, and is subject to many levels of regulation. There appears to be a single Cdk in Saccharomyces cerevisiae and Schizosaccharomyces pombe, whereas mammals have a variety of specialized Cdks. Cyclins act by binding to and activating cyclin-dependent protein kinases, which then phosphorylate and activate selected proteins involved in the mitotic process. The Cdk-cyclin complex is both positively and negatively regulated by phosphorylation and by targeted degradation involving molecules such as CDC4 and CDC53. In addition, Cdks are further regulated by binding to inhibitors and other proteins, such as Suc1, that modify their specificity or accessibility to regulators (Patra, D. and W.G. Dunphy (1996) Genes Dev. 10:1503-1515; and Mathias, N. et al. (1996) Mol. Cell Biol. 16:6634-6643).
Reproduction
The male and female reproductive systems are complex and involve many aspects of growth and development. The anatomy and physiology of the male and female reproductive systems are reviewed in Guyton, A.C. (1991) Textbook of Medical Physiology. W.B. Saunders Co., Philadelphia PA, pp. 899-928.
The male reproductive system includes the process of spermatogenesis, in which the sperm are formed, and male reproductive functions are regulated by various hormones and their effects on accessory sexual organs, cellular metabolism, growth, and other bodily functions.
Spermatogenesis begins at puberty as a result of stimulation by gonadotropic hormones released from the anterior pituitary. Immature sperm (spermatogonia) undergo several mitotic cell divisions before undergoing meiosis and full maturation. The testes secrete several male sex hormones, the most abundant being testosterone, which is essential for the growth and division of the immature sperm and for the masculine characteristics of the male body. Three other hormones, gonadotropin-releasing hormone (GnRH) from the hypothalamus and luteinizing hormone (LH) and follicle-stimulating hormone (FSH) from the anterior pituitary, also control male sexual function.
The uterus, ovaries, fallopian tubes, vagina, and breasts comprise the female reproductive system. The ovaries and uterus are the source of ova and the location of fetal development, respectively. The fallopian tubes and vagina are accessory organs attached to the top and bottom of the uterus, respectively. Both the uterus and ovaries have additional roles in the development and loss of reproductive capability during a female's lifetime. The primary role of the breasts is lactation.
Multiple endocrine signals from the ovaries, uterus, pituitary, hypothalamus, adrenal glands, and other tissues coordinate reproduction and lactation. These signals vary during the monthly menstrual cycle and over the female's lifetime. Similarly, the sensitivity of reproductive organs to these endocrine signals varies during the female's lifetime.
A combination of positive and negative feedback to the ovaries, pituitary gland, and hypothalamus controls physiologic changes during the monthly ovulation and endometrial cycles. The anterior pituitary secretes two major gonadotropin hormones, follicle-stimulating hormone (FSH) and luteinizing hormone (LH), whose release is regulated by negative feedback from steroids, most notably ovarian estradiol. If fertilization does not occur, estrogen and progesterone levels decrease. This sudden reduction of the ovarian hormones leads to menstruation, the desquamation of the endometrium.
Hormones further govern all the steps of pregnancy, parturition, lactation, and menopause. During pregnancy large quantities of human chorionic gonadotropin (hCG), estrogens, progesterone, and human chorionic somatomammotropin (hCS) are formed by the placenta. hCG, a glycoprotein similar to luteinizing hormone, stimulates the corpus luteum to continue producing more progesterone and estrogens, rather than to involute as occurs if the ovum is not fertilized. hCS is similar to growth hormone and is crucial for fetal nutrition. The female breast also matures during pregnancy. Large amounts of estrogen secreted by the placenta trigger growth and branching of the breast milk ductal system while lactation is initiated by the secretion of prolactin by the pituitary gland.
Parturition involves several hormonal changes that increase uterine contractility toward the end of pregnancy. The levels of estrogens increase more than those of progesterone. Oxytocin is secreted by the neurohypophysis, and uterine sensitivity to oxytocin increases concomitantly. The fetus itself secretes oxytocin, cortisol (from its adrenal glands), and prostaglandins.
Menopause occurs when most of the ovarian follicles have degenerated. The ovary then produces less estradiol, reducing the negative feedback on the pituitary gland and hypothalamus. Mean levels of circulating FSH and LH therefore increase, even while ovulatory cycles continue. Because the ovary becomes less responsive to gonadotropins, the time between menstrual cycles increases, and eventually menstrual bleeding ceases and reproductive capability ends.
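The negative-feedback reasoning above, in which less estradiol means less suppression of gonadotropin release, can be illustrated with a toy inhibitory curve. The following is a minimal sketch only: the function `gonadotropin_level` and its constants `baseline` and `k` are hypothetical, carry arbitrary units, and are not physiological values or a published model.

```python
def gonadotropin_level(estradiol: float, baseline: float = 10.0,
                       k: float = 1.0) -> float:
    """Toy negative-feedback curve: output falls as estradiol rises.

    `baseline` is the output with no feedback and `k` sets the feedback
    strength; both are arbitrary illustrative constants.
    """
    if estradiol < 0:
        raise ValueError("estradiol must be non-negative")
    return baseline / (1.0 + k * estradiol)


before = gonadotropin_level(estradiol=4.0)  # 10 / 5 = 2.0
after = gonadotropin_level(estradiol=0.5)   # 10 / 1.5, about 6.67
print(before, after)
assert after > before  # lower estradiol -> weaker feedback -> higher FSH/LH
```

Any monotonically decreasing curve would make the same qualitative point; the hyperbolic form is chosen only because it stays positive and is easy to read.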
Cell Differentiation and Proliferation
Tissue growth involves complex and ordered patterns of cell proliferation, cell differentiation, and apoptosis. Cell proliferation must be regulated to maintain both the number of cells and their spatial organization. This regulation depends upon the appropriate expression of proteins which control cell cycle progression in response to extracellular signals, such as growth factors and other mitogens, and intracellular cues, such as DNA damage or nutrient starvation. Molecules which directly or indirectly modulate cell cycle progression fall into several categories, including growth factors and their receptors, second messenger and signal transduction proteins, oncogene products, tumor-suppressor proteins, and mitosis-promoting factors.
Growth factors were originally described as serum factors required to promote cell proliferation. Most growth factors are large, secreted polypeptides that act on cells in their local environment. Growth factors bind to and activate specific cell surface receptors and initiate intracellular signal transduction cascades. Many growth factor receptors are classified as receptor tyrosine kinases, which undergo autophosphorylation upon ligand binding. Autophosphorylation enables the receptor to interact with signal transduction proteins characterized by the presence of SH2 or SH3 domains (Src homology regions 2 or 3). These proteins then modulate the activity state of small G-proteins, such as Ras, Rab, and Rho, along with GTPase activating proteins (GAPs), guanine nucleotide releasing proteins (GNRPs), and other guanine nucleotide exchange factors. Small G proteins act as molecular switches that activate other downstream events, such as mitogen-activated protein kinase (MAP kinase) cascades. MAP kinases ultimately activate transcription of mitosis-promoting genes.
In addition to growth factors, small signaling peptides and hormones also influence cell proliferation. These molecules bind primarily to another class of receptor, the G protein-coupled receptor (GPCR), found predominantly on the surface of immune, neuronal, and neuroendocrine cells. Upon ligand binding, the GPCR activates a heterotrimeric G protein, which in turn activates effectors such as phospholipase C and raises intracellular levels of second messengers such as Ca2+ and cyclic AMP. Most GPCR-mediated signaling pathways indirectly promote cell proliferation by causing the secretion or breakdown of other signaling molecules that have direct mitogenic effects. These signaling cascades often involve activation of kinases and phosphatases. Some growth factors, such as some members of the transforming growth factor beta (TGF-β) family, act on some cells to stimulate cell proliferation and on other cells to inhibit it. Growth factors may also stimulate a cell at one concentration and inhibit the same cell at another concentration. Most growth factors also have a multitude of actions besides the regulation of cell growth and division: they can control the proliferation, survival, differentiation, migration, or function of cells, depending on the circumstances. For example, the tumor necrosis factor/nerve growth factor (TNF/NGF) family can activate or inhibit cell death, as well as regulate proliferation and differentiation. The cell response depends on the type of cell, its stage of differentiation and transformation status, which surface receptors are stimulated, and the types of stimuli acting on the cell (Smith, A. et al. (1994) Cell 76:959-962; and Nocentini, G. et al. (1997) Proc. Natl. Acad. Sci. USA 94:6216-6221).
Neighboring cells in a tissue compete for growth factors and, when provided with "unlimited" quantities in a perfused system, grow to even higher cell densities before reaching density-dependent inhibition of cell division. Cells often demonstrate an anchorage dependence of cell division as well. This anchorage dependence may be associated with the formation of focal contacts linking the cytoskeleton with the extracellular matrix (ECM). The expression of ECM components can be stimulated by growth factors. For example, TGF-β stimulates fibroblasts to produce a variety of ECM proteins, including fibronectin, collagen, and tenascin (Pearson, C.A. et al. (1988) EMBO J. 7:2677-2981). In fact, for some cell types, specific ECM molecules such as laminin or fibronectin may act as growth factors. Tenascin-C and -R, expressed in developing and lesioned neural tissue, provide stimulatory/anti-adhesive or inhibitory properties, respectively, for axonal growth (Faissner, A. (1997) Cell Tissue Res. 290:331-341).
Cancers are associated with the activation of oncogenes which are derived from normal cellular genes. These oncogenes encode oncoproteins which convert normal cells into malignant cells. Some oncoproteins are mutant isoforms of the normal protein, and other oncoproteins are abnormally expressed with respect to location or amount of expression. The latter category of oncoprotein causes cancer by altering transcriptional control of cell proliferation. Five classes of oncoproteins are known to affect cell cycle controls. These classes include growth factors, growth factor receptors, intracellular signal transducers, nuclear transcription factors, and cell-cycle control proteins. Viral oncogenes are integrated into the human genome after infection of human cells by certain viruses. Examples of viral oncogenes include v-src, v-abl, and v-fps.
Many oncogenes have been identified and characterized. These include sis, erbA, erbB, her-2, mutated Gs, src, abl, ras, crk, jun, fos, myc, and mutated tumor-suppressor genes such as RB, p53, mdm2, Cip1, p16, and cyclin D. Transformation of normal genes to oncogenes may also occur by chromosomal translocation. The Philadelphia chromosome, characteristic of chronic myeloid leukemia and a subset of acute lymphoblastic leukemias, results from a reciprocal translocation between chromosomes 9 and 22 that moves a truncated portion of the proto-oncogene c-abl to the breakpoint cluster region (bcr) on chromosome 22.
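The reciprocal translocation described above can be sketched as an exchange of the chromosome segments lying beyond each breakpoint. The following is a minimal sketch under simplifying assumptions: chromosomes are lists of coarse, hypothetical segment labels, gene orientation is ignored, and the function name `reciprocal_translocation` is hypothetical.

```python
def reciprocal_translocation(chr_a: list[str], break_a: int,
                             chr_b: list[str], break_b: int
                             ) -> tuple[list[str], list[str]]:
    """Exchange the segments distal to each breakpoint.

    A breakpoint index i splits a chromosome into chr[:i] (retained) and
    chr[i:] (exchanged). Returns (derivative_a, derivative_b).
    """
    der_a = chr_a[:break_a] + chr_b[break_b:]
    der_b = chr_b[:break_b] + chr_a[break_a:]
    return der_a, der_b


# Hypothetical coarse labels; the breaks fall inside ABL and inside BCR.
chr9 = ["9p", "9q-proximal", "ABL-5'", "ABL-3'", "9q-terminal"]
chr22 = ["22p", "22q-proximal", "BCR-5'", "BCR-3'", "22q-terminal"]
der9, der22 = reciprocal_translocation(chr9, 3, chr22, 3)
print(der22)  # ['22p', '22q-proximal', "BCR-5'", "ABL-3'", '9q-terminal']
```

The derivative chromosome 22, the Philadelphia chromosome, now carries the truncated ABL segment directly adjacent to BCR, while no material is gained or lost overall, which is what makes the exchange reciprocal.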
Tumor-suppressor genes are involved in regulating cell proliferation. Mutations that reduce or abolish the function of tumor-suppressor genes result in uncontrolled cell proliferation. For example, the retinoblastoma gene product (RB), in its non-phosphorylated state, binds transcription factors required for the expression of several early-response genes and thereby suppresses their transcription, blocking cell division. Phosphorylation of RB causes it to release these factors, relieving the suppression and allowing cell division to proceed.
Apoptosis
Apoptosis is the genetically controlled process by which unneeded or defective cells undergo programmed cell death. Selective elimination of cells is as important for morphogenesis and tissue remodeling as are cell proliferation and differentiation. Lack of apoptosis may result in hyperplasia and other disorders associated with increased cell proliferation. Apoptosis is also a critical component of the immune response. Immune cells such as cytotoxic T-cells and natural killer cells prevent the spread of disease by inducing apoptosis in tumor cells and virus-infected cells. In addition, immune cells that fail to distinguish self molecules from foreign molecules must be eliminated by apoptosis to avoid an autoimmune response.
Apoptotic cells undergo distinct morphological changes. Hallmarks of apoptosis include cell shrinkage, nuclear and cytoplasmic condensation, and alterations in plasma membrane topology. Biochemically, apoptotic cells are characterized by increased intracellular calcium concentration, fragmentation of chromosomal DNA, and expression of novel cell surface components.
The molecular mechanisms of apoptosis are highly conserved, and many of the key protein regulators and effectors of apoptosis have been identified. Apoptosis generally proceeds in response to a signal which is transduced intracellularly and results in altered patterns of gene expression and protein activity. Signaling molecules such as hormones and cytokines are known both to stimulate and to inhibit apoptosis through interactions with cell surface receptors. Transcription factors also play an important role in the onset of apoptosis. A number of downstream effector molecules, particularly proteases such as the cysteine proteases called caspases, have been implicated in the degradation of cellular components and the proteolytic activation of other apoptotic effectors.
Aging and Senescence
Studies of the aging process, or senescence, have shown a number of characteristic cellular and molecular changes (Fauci et al. (1998) Harrison's Principles of Internal Medicine. McGraw-Hill, New York NY, p.37). These characteristics include increases in chromosome structural abnormalities, DNA cross-linking, and the incidence of single-stranded breaks in DNA, as well as losses in DNA methylation and degradation of telomere regions. In addition to these DNA changes, post-translational alterations of proteins increase, including deamidation, oxidation, cross-linking, and nonenzymatic glycation. Further molecular changes occur in the mitochondria of aging cells, including structural deterioration. These changes eventually contribute to decreased function in every organ of the body.
Biochemical Pathway Molecules
Biochemical pathways are responsible for regulating metabolism, growth and development, protein secretion and trafficking, environmental responses, and ecological interactions including immune response and response to parasites.
DNA Replication
Deoxyribonucleic acid (DNA), the genetic material, is found in both the nucleus and mitochondria of human cells. The bulk of human DNA is nuclear, in the form of linear chromosomes, while mitochondrial DNA is circular. DNA replication begins at specific sites called origins of replication. Bidirectional synthesis occurs from the origin via two growing forks that move in opposite directions. Replication is semi-conservative, with each daughter duplex containing one old strand and its newly synthesized complementary partner. Proteins involved in DNA replication include DNA polymerases, DNA primase, telomerase, DNA helicase, topoisomerases, DNA ligases, replication factors, and DNA-binding proteins.
DNA Recombination and Repair
Cells are constantly faced with replication errors and environmental assault (such as ultraviolet irradiation) that can produce DNA damage. Damage to DNA consists of any change that modifies the structure of the molecule. Changes to DNA can be divided into two general classes: single base changes and structural distortions. Any damage to DNA can produce a mutation, and the mutation may produce a disorder, such as cancer.
Changes in DNA are recognized by repair systems within the cell. These repair systems act to correct the damage and thus prevent any deleterious effects of a mutational event. Repair systems can be divided into three general types: direct repair, excision repair, and retrieval systems. Proteins involved in DNA repair include DNA polymerase, excision repair proteins, excision and cross-link repair proteins, recombination and repair proteins, RAD51 proteins, and the BLM and WRN proteins, which are homologs of RecQ helicase. When the repair systems are eliminated, cells become exceedingly sensitive to environmental mutagens, such as ultraviolet irradiation. Patients with disorders associated with a loss in DNA repair systems often exhibit a high sensitivity to environmental mutagens. Examples of such disorders include xeroderma pigmentosum (XP), Bloom's syndrome (BS), and Werner's syndrome (WS) (Yamagata, K. et al. (1998) Proc. Natl. Acad. Sci. USA 95:8733-8738), ataxia telangiectasia, Cockayne's syndrome, and Fanconi's anemia.
Recombination is the process whereby new DNA sequences are generated by the movements of large pieces of DNA. In homologous recombination, which occurs during meiosis and DNA repair, parent DNA duplexes align at regions of sequence similarity, and new DNA molecules form by the breakage and joining of homologous segments. Proteins involved include RAD51 recombinase. In site-specific recombination, two specific but not necessarily homologous DNA sequences are exchanged. In the immune system this process generates a diverse collection of antibody and T cell receptor genes. Proteins involved in site-specific recombination in the immune system include recombination activating genes 1 and 2 (RAG1 and RAG2). A defect in immune system site-specific recombination causes severe combined immunodeficiency disease in mice.
RNA Metabolism
Ribonucleic acid (RNA) is a linear, single-stranded polymer of four ribonucleotides, which are incorporated from the nucleoside triphosphates ATP, CTP, UTP, and GTP. In most organisms, RNA is transcribed as a copy of DNA, the genetic material of the organism. In retroviruses, RNA rather than DNA serves as the genetic material. RNA copies of the genetic material encode proteins or serve various structural, catalytic, or regulatory roles in organisms. RNA is classified according to its cellular localization and function. Messenger RNAs (mRNAs) encode polypeptides. Ribosomal RNAs (rRNAs) are assembled, along with ribosomal proteins, into ribosomes, which are cytoplasmic particles that translate mRNA into polypeptides. Transfer RNAs (tRNAs) are cytosolic adaptor molecules that function in mRNA translation by recognizing both an mRNA codon and the amino acid that matches that codon. Heterogeneous nuclear RNAs (hnRNAs) include mRNA precursors and other nuclear RNAs of various sizes. Small nuclear RNAs (snRNAs) are a part of the nuclear spliceosome complex that removes intervening, non-coding sequences (introns) and rejoins exons in pre-mRNAs.
RNA Transcription
The transcription process synthesizes an RNA copy of DNA. Proteins involved include multi-subunit RNA polymerases and transcription factors IIA, IIB, IID, IIE, IIF, IIH, and IIJ. Many transcription factors incorporate DNA-binding structural motifs, comprising either α-helices or β-sheets, that bind to the major groove of DNA. Four well-characterized structural motifs are helix-turn-helix, zinc finger, leucine zipper, and helix-loop-helix.
RNA Processing
Various proteins are necessary for processing of transcribed RNAs in the nucleus. Pre-mRNA processing steps include capping the 5' end with methylguanosine, polyadenylating the 3' end, and splicing to remove introns. The spliceosomal complex is composed of five small nuclear ribonucleoprotein particles (snRNPs) designated U1, U2, U4, U5, and U6. Each snRNP contains a single species of snRNA and about ten proteins. The RNA components of some snRNPs recognize and base-pair with intron consensus sequences. The protein components mediate spliceosome assembly and the splicing reaction. Autoantibodies to snRNP proteins are found in the blood of patients with systemic lupus erythematosus (Stryer, L. (1995) Biochemistry, W.H. Freeman and Company, New York NY, p. 863).
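The splicing step described above, in which introns are removed and exons are rejoined, can be illustrated with a minimal Python sketch. It assumes that exon boundaries are already known and are supplied as hypothetical 0-based, half-open coordinates; it does not model how the spliceosome recognizes intron consensus sequences, and the function name `splice` and the toy sequences are illustrative only.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments of a pre-mRNA, discarding the introns between them.

    `exons` holds hypothetical 0-based, half-open (start, end) coordinates that
    must be in transcript order and must not overlap.
    """
    pieces = []
    previous_end = 0
    for start, end in exons:
        # Reject empty, out-of-range, overlapping, or out-of-order exons.
        if not (previous_end <= start < end <= len(pre_mrna)):
            raise ValueError(f"invalid or out-of-order exon ({start}, {end})")
        pieces.append(pre_mrna[start:end])
        previous_end = end
    # Everything between consecutive exons (the introns) is simply skipped.
    return "".join(pieces)


# Toy transcript: exon 1 + intron + exon 2 (all sequences hypothetical).
exon1, intron, exon2 = "AUGGCU", "GUAAGUCUAACAG", "UGGUAA"
pre_mrna = exon1 + intron + exon2
mature = splice(pre_mrna, [(0, 6), (19, 25)])
print(mature)  # AUGGCUUGGUAA
```

Validating the coordinates up front keeps the sketch from silently producing a chimeric sequence when exon boundaries are supplied out of order.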
Heterogeneous nuclear ribonucleoproteins (hnRNPs) have been identified that have roles in splicing, export of the mature RNAs to the cytoplasm, and mRNA translation (Biamonti, G. et al. (1998) Clin. Exp. Rheumatol. 16:317-326). Some examples of hnRNPs include the yeast proteins Hrp1p, involved in cleavage and polyadenylation at the 3' end of the RNA; Cbp80p, involved in capping the 5' end of the RNA; and Npl3p, a homolog of mammalian hnRNP A1, involved in export of mRNA from the nucleus (Shen, E.G. et al. (1998) Genes Dev. 12:679-691). HnRNPs have been shown to be important targets of the autoimmune response in rheumatic diseases (Biamonti, supra).
Many snRNP proteins, hnRNP proteins, and alternative splicing factors are characterized by an RNA recognition motif (RRM). (Reviewed in Birney, E. et al. (1993) Nucleic Acids Res. 21:5803-5816.) The RRM is about 80 amino acids in length and forms four β-strands and two α-helices arranged in an α/β sandwich. The RRM contains a core RNP-1 octapeptide motif along with surrounding conserved sequences.
RNA Stability and Degradation
RNA helicases alter and regulate RNA conformation and secondary structure by using energy derived from ATP hydrolysis to destabilize and unwind RNA duplexes. The best-characterized and most ubiquitous family of RNA helicases is the DEAD-box family, named for the conserved Asp-Glu-Ala-Asp (D-E-A-D) sequence of the B-type ATP-binding motif that is diagnostic of proteins in this family. Over 40 DEAD-box helicases have been identified in organisms as diverse as bacteria, insects, yeast, amphibians, mammals, and plants. DEAD-box helicases function in diverse processes such as translation initiation, splicing, ribosome assembly, and RNA editing, transport, and stability. Some DEAD-box helicases play tissue- and stage-specific roles in spermatogenesis and embryogenesis. (Reviewed in Linder, P. et al. (1989) Nature 337:121-122.) Overexpression of the DEAD-box 1 protein (DDX1) may play a role in the progression of neuroblastoma (Nb) and retinoblastoma (Rb) tumors. Other DEAD-box helicases have been implicated either directly or indirectly in ultraviolet light-induced tumors, B cell lymphoma, and myeloid malignancies. (Reviewed in Godbout, R. et al. (1998) J. Biol. Chem. 273:21161-21168.)
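As a rough illustration of how the DEAD-box family got its name, the Python sketch below reports where the Asp-Glu-Ala-Asp tetrapeptide (one-letter code DEAD) occurs in a protein sequence. It is a plain string search that assumes a one-letter amino acid sequence, not a family classifier: assignment to the DEAD-box family rests on several conserved motifs, so a match alone does not identify a helicase. The function name and test sequences are hypothetical.

```python
import re

# Zero-width lookahead so that overlapping matches (e.g. "DEADEAD") are all found.
_DEAD = re.compile(r"(?=DEAD)")


def find_dead_motifs(protein: str) -> list[int]:
    """Return 0-based start positions of every Asp-Glu-Ala-Asp ("DEAD") run
    in a one-letter amino acid sequence (case-insensitive)."""
    return [match.start() for match in _DEAD.finditer(protein.upper())]


hits = find_dead_motifs("MKVLDEADRMLDEAD")  # hypothetical sequence
print(hits)  # [4, 11]
```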
Ribonucleases (RNases) catalyze the hydrolysis of phosphodiester bonds in RNA chains, thus cleaving the RNA. For example, RNase P is a ribonucleoprotein enzyme that cleaves the 5' end of pre-tRNAs as part of their maturation process. RNase H digests the RNA strand of an RNA/DNA hybrid. Such hybrids occur in cells invaded by retroviruses, and RNase H is an important enzyme in the retroviral replication cycle. RNase H domains are often found associated with reverse transcriptases. RNase activity in serum and cell extracts is elevated in a variety of cancers and infectious diseases (Schein, C.H. (1997) Nat. Biotechnol. 15:529-536). Regulation of RNase activity is being investigated as a means to control tumor angiogenesis, allergic reactions, viral infection and replication, and fungal infections.
Protein Translation
The eukaryotic ribosome is composed of a 60S (large) subunit and a 40S (small) subunit, which together form the 80S ribosome. In addition to the 18S, 28S, 5S, and 5.8S rRNAs, the ribosome contains more than fifty proteins. Each ribosomal protein name carries a prefix denoting the subunit to which it belongs, either L (large) or S (small). Three important sites are identified on the ribosome. The aminoacyl-tRNA site (A site) is where charged tRNAs (with the exception of the initiator tRNA) bind on arrival at the ribosome. The peptidyl-tRNA site (P site) is where new peptide bonds are formed, as well as where the initiator tRNA binds. The exit site (E site) is where deacylated tRNAs bind prior to their release from the ribosome. (Translation is reviewed in Stryer, L. (1995) Biochemistry, W.H. Freeman and Company, New York NY, pp. 875-908; and Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY, pp. 119-138.)
tRNA Charging
Protein biosynthesis depends on each amino acid forming a linkage with the appropriate tRNA. The aminoacyl-tRNA synthetases are responsible for the activation and correct attachment of an amino acid to its cognate tRNA. The 20 aminoacyl-tRNA synthetase enzymes can be divided into two structural classes, Class I and Class II. Autoantibodies against aminoacyl-tRNA synthetases are generated by patients with dermatomyositis and polymyositis, and correlate strongly with complicating interstitial lung disease (ILD). These antibodies appear to be generated in response to viral infection, and coxsackie virus has been used to induce experimental viral myositis in animals.
Translation Initiation
Initiation of translation can be divided into three stages. The first stage brings an initiator transfer RNA (Met-tRNAf) together with the 40S ribosomal subunit to form the 43S preinitiation complex. The second stage binds the 43S preinitiation complex to the mRNA, followed by migration of the complex to the correct AUG initiation codon. The third stage brings the 60S ribosomal subunit to the 40S subunit to generate an 80S ribosome at the initiation codon. Regulation of translation primarily involves the first and second stages of the initiation process (Pain, V.M. (1996) Eur. J. Biochem. 236:747-771). Several initiation factors, many of which contain multiple subunits, are involved in bringing an initiator tRNA and 40S ribosomal subunit together. eIF2, a guanine nucleotide binding protein, recruits the initiator tRNA to the 40S ribosomal subunit. eIF2 associates with the initiator tRNA only when it is bound to GTP. eIF2B, a guanine nucleotide exchange protein, is responsible for converting eIF2 from the GDP-bound inactive form to the GTP-bound active form. Two other factors, eIF1A and eIF3, bind and stabilize the 40S subunit by interacting with 18S ribosomal RNA and specific ribosomal structural proteins. eIF3 is also involved in association of the 40S ribosomal subunit with mRNA. The Met-tRNAf, eIF1A, eIF3, and 40S ribosomal subunit together make up the 43S preinitiation complex (Pain, supra).
Additional factors are required for binding of the 43S preinitiation complex to an mRNA molecule, and the process is regulated at several levels. eIF4F is a complex consisting of three proteins: eIF4E, eIF4A, and eIF4G. eIF4E recognizes and binds to the mRNA 5'-terminal m7GTP cap, eIF4A is a bidirectional RNA-dependent helicase, and eIF4G is a scaffolding polypeptide. eIF4G has three binding domains. The N-terminal third of eIF4G interacts with eIF4E, the central third interacts with eIF4A, and the C-terminal third interacts with eIF3 bound to the 43S preinitiation complex. Thus, eIF4G acts as a bridge between the 40S ribosomal subunit and the mRNA (Hentze, M.W. (1997) Science 275:500-501).
The ability of eIF4F to initiate binding of the 43S preinitiation complex is regulated by structural features of the mRNA. The mRNA molecule has an untranslated region (UTR) between the 5' cap and the AUG start codon. In some mRNAs this region forms secondary structures that impede binding of the 43S preinitiation complex. The helicase activity of eIF4A is thought to function in removing this secondary structure to facilitate binding of the 43S preinitiation complex (Pain, supra).
Translation Elongation
Elongation is the process whereby additional amino acids are joined to the initiator methionine to form the complete polypeptide chain. The elongation factors EF1α, EF1βγ, and EF2 are involved in elongating the polypeptide chain following initiation. EF1α is a GTP-binding protein.
In its GTP-bound form, EF1α brings an aminoacyl-tRNA to the ribosome's A site. The amino acid attached to the newly arrived aminoacyl-tRNA forms a peptide bond with the initiator methionine.
The GTP on EF1α is hydrolyzed to GDP, and EF1α-GDP dissociates from the ribosome. EF1βγ binds EF1α-GDP and induces the dissociation of GDP from EF1α, allowing EF1α to bind GTP and a new cycle to begin.
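The GTP/GDP switch shared by eIF2 and EF1α can be sketched as a small state machine in Python. The class and method names are hypothetical, and the model assumes only what the text states: the GDP-bound form is inactive, an exchange factor (eIF2B for eIF2, EF1βγ for EF1α) replaces GDP with GTP, only the GTP-bound form engages its tRNA partner, and GTP hydrolysis returns the factor to the GDP-bound form.

```python
from enum import Enum


class Bound(Enum):
    GDP = "GDP"  # inactive form
    GTP = "GTP"  # active form


class GTPaseFactor:
    """Toy model of a GTP-binding translation factor such as eIF2 or EF1-alpha."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.bound = Bound.GDP  # start in the inactive, GDP-bound state

    def exchange(self) -> None:
        """Exchange factor (eIF2B for eIF2, EF1-beta-gamma for EF1-alpha)
        releases GDP so that GTP can bind."""
        self.bound = Bound.GTP

    def can_carry_trna(self) -> bool:
        # Only the GTP-bound form associates with its tRNA partner.
        return self.bound is Bound.GTP

    def hydrolyze(self) -> None:
        """GTP hydrolysis returns the factor to its inactive GDP-bound form."""
        if self.bound is not Bound.GTP:
            raise RuntimeError(f"{self.name} holds no GTP to hydrolyze")
        self.bound = Bound.GDP


ef1a = GTPaseFactor("EF1-alpha")
print(ef1a.can_carry_trna())  # False: GDP-bound
ef1a.exchange()
print(ef1a.can_carry_trna())  # True: GTP-bound, can deliver aminoacyl-tRNA
ef1a.hydrolyze()
print(ef1a.can_carry_trna())  # False: back to GDP-bound
```

Raising an error on hydrolysis from the GDP-bound state encodes the point that each delivery cycle requires a fresh exchange step.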
As subsequent aminoacyl-tRNAs are brought to the ribosome, EF2 (the eukaryotic counterpart of bacterial EF-G), another GTP-binding protein, catalyzes the translocation of tRNAs from the A site to the P site and finally to the E site of the ribosome. This translocation allows translation to proceed processively along the mRNA.
Translation Termination
The release factor eRF carries out termination of translation. eRF recognizes stop codons in the mRNA, leading to the release of the polypeptide chain from the ribosome.
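The overall logic of initiation at an AUG codon, codon-by-codon elongation, and termination at a stop codon can be sketched in Python using the standard genetic code (one-letter amino acid codes). This is a deliberate simplification: it assumes translation starts at the first AUG in an uppercase RNA string, ignores the scanning and mRNA structure effects described above, and stops at the first in-frame stop codon (marked `*`). The function names are hypothetical.

```python
BASES = "UCAG"
# Standard genetic code: codon index = 16*first + 4*second + third, with bases
# ordered U, C, A, G. '*' marks the stop codons UAA, UAG, and UGA.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_to_amino_acid(codon: str) -> str:
    first, second, third = (BASES.index(base) for base in codon)
    return AMINO_ACIDS[16 * first + 4 * second + third]


def translate_first_orf(mrna: str) -> str:
    """Translate from the first AUG until the first in-frame stop codon.

    Simplification: real initiation depends on scanning and sequence context.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no initiation codon
    peptide = []
    # Step three bases at a time while a full codon remains.
    for pos in range(start, len(mrna) - 2, 3):
        residue = codon_to_amino_acid(mrna[pos:pos + 3])
        if residue == "*":
            break  # stop codon: the polypeptide is released (termination)
        peptide.append(residue)
    return "".join(peptide)


print(translate_first_orf("GGAUGGCUUGGUAAGC"))  # MAW
```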
Post-Translational Pathways
Proteins may be modified after translation by the addition of phosphate, sugar, prenyl, fatty acid, and other chemical groups. These modifications are often required for proper protein activity.
Enzymes involved in post-translational modification include kinases, phosphatases, glycosyltransferases, and prenyltransferases. The conformation of proteins may also be modified after translation by the introduction and rearrangement of disulfide bonds (rearrangement catalyzed by protein disulfide isomerase), the isomerization of proline sidechains by prolyl isomerase, and by interactions with molecular chaperone proteins.
Proteins may also be cleaved by proteases. Such cleavage may result in activation, inactivation, or complete degradation of the protein. Proteases include serine proteases, cysteine proteases, aspartic proteases, and metalloproteases. Signal peptidase in the endoplasmic reticulum (ER) lumen cleaves the signal peptide from membrane or secretory proteins that are imported into the ER. Ubiquitin proteases are associated with the ubiquitin conjugation system (UCS), a major pathway for the degradation of cellular proteins in eukaryotic cells and some bacteria. The UCS mediates the elimination of abnormal proteins and regulates the half-lives of important regulatory proteins that control cellular processes such as gene transcription and cell cycle progression. In the UCS pathway, proteins targeted for degradation are conjugated to ubiquitin, a small, heat-stable protein. Proteins involved in the UCS include ubiquitin-activating enzyme, ubiquitin-conjugating enzymes, ubiquitin-ligases, and ubiquitin C-terminal hydrolases. The ubiquitinated protein is then recognized and degraded by the proteasome, a large, multisubunit proteolytic enzyme complex, and ubiquitin is released for reutilization by ubiquitin protease.
Lipid Metabolism
Lipids are water-insoluble, oily or greasy substances that are soluble in nonpolar solvents such as chloroform or ether. Neutral fats (triacylglycerols) serve as major fuels and energy stores. Polar lipids, such as phospholipids, sphingolipids, glycolipids, and cholesterol, are key structural components of cell membranes.
Lipid metabolism is involved in human diseases and disorders. In the arterial disease atherosclerosis, fatty lesions form on the inside of the arterial wall. These lesions promote the loss of arterial flexibility and the formation of blood clots (Guyton, A.C. (1991) Textbook of Medical Physiology, W.B. Saunders Company, Philadelphia PA, pp. 760-763). In Tay-Sachs disease, the GM2 ganglioside (a sphingolipid) accumulates in lysosomes of the central nervous system due to a lack of the enzyme N-acetylhexosaminidase. Patients suffer nervous system degeneration leading to early death (Fauci, A.S. et al. (1998) Harrison's Principles of Internal Medicine, McGraw-Hill, New York NY, p. 2171). The Niemann-Pick diseases are caused by defects in lipid metabolism. Niemann-Pick diseases types A and B are caused by accumulation of sphingomyelin (a sphingolipid) and other lipids in the central nervous system due to a defect in the enzyme sphingomyelinase, leading to neurodegeneration and lung disease. Niemann-Pick disease type C results from a defect in cholesterol transport, leading to the accumulation of sphingomyelin and cholesterol in lysosomes and a secondary reduction in sphingomyelinase activity. Neurological symptoms such as grand mal seizures, ataxia, and loss of previously learned speech manifest 1-2 years after birth. A mutation in the NPC1 protein, which contains a putative cholesterol-sensing domain, was found in a mouse model of Niemann-Pick disease type C (Fauci, supra, p. 2175; Loftus, S.K. et al. (1997) Science 277:232-235). (Lipid metabolism is reviewed in Stryer, L. (1995) Biochemistry, W.H. Freeman and Company, New York NY; Lehninger, A. (1982) Principles of Biochemistry, Worth Publishers, Inc., New York NY; and the ExPASy "Biochemical Pathways" index of the Boehringer Mannheim World Wide Web site.)
Fatty Acid Synthesis
Fatty acids are long-chain organic acids with a single carboxyl group and a long nonpolar hydrocarbon tail. Long-chain fatty acids are essential components of glycolipids, phospholipids, and cholesterol, which are building blocks for biological membranes, and of triglycerides, which are biological fuel molecules. Long-chain fatty acids are also substrates for eicosanoid production, and are important in the functional modification of certain complex carbohydrates and proteins. Fatty acids with 16 and 18 carbons are the most common.
Fatty acid synthesis occurs in the cytoplasm. In the first step, acetyl-Coenzyme A (CoA) carboxylase (ACC) synthesizes malonyl-CoA from acetyl-CoA and bicarbonate. The enzymes that catalyze the remaining reactions are covalently linked into a single polypeptide chain, referred to as the multifunctional enzyme fatty acid synthase (FAS). FAS catalyzes the synthesis of palmitate from acetyl-CoA and malonyl-CoA. FAS contains acetyl transferase, malonyl transferase, β-ketoacyl synthase, acyl carrier protein, β-ketoacyl reductase, dehydratase, enoyl reductase, and thioesterase activities. The final product of the FAS reaction is the 16-carbon fatty acid palmitate. Further elongation and desaturation of palmitate by accessory enzymes of the ER produce the variety of long-chain fatty acids required by the individual cell. These enzymes include an NADH-cytochrome b5 reductase, cytochrome b5, and a desaturase.
Phospholipid and Triacylglycerol Synthesis
Triacylglycerols, also known as triglycerides and neutral fats, are major energy stores in animals. Triacylglycerols are esters of glycerol with three fatty acid chains. Glycerol-3-phosphate is produced from dihydroxyacetone phosphate by the enzyme glycerol phosphate dehydrogenase or from glycerol by glycerol kinase. Fatty acyl-CoAs are produced from fatty acids by fatty acyl-CoA synthetases. Glycerol-3-phosphate is acylated with two fatty acyl-CoAs by the enzyme glycerol phosphate acyltransferase to give phosphatidate. Phosphatidate phosphatase converts phosphatidate to diacylglycerol, which is subsequently acylated to a triacylglycerol by the enzyme diglyceride acyltransferase. Phosphatidate phosphatase and diglyceride acyltransferase form a triacylglycerol synthetase complex bound to the ER membrane.
A major class of phospholipids are the phosphoglycerides, which are composed of a glycerol backbone, two fatty acid chains, and a phosphorylated alcohol. Phosphoglycerides are components of cell membranes. Principal phosphoglycerides are phosphatidyl choline, phosphatidyl ethanolamine, phosphatidyl serine, phosphatidyl inositol, and diphosphatidyl glycerol. Many enzymes involved in phosphoglyceride synthesis are associated with membranes (Meyers, R.A. (1995) Molecular Biology and Biotechnology, VCH Publishers Inc., New York NY, pp. 494-501). Phosphatidate is converted to CDP-diacylglycerol by the enzyme phosphatidate cytidylyltransferase (ExPASy ENZYME EC 2.7.7.41). Transfer of the diacylglycerol group from CDP-diacylglycerol to serine to yield phosphatidyl serine, or to inositol to yield phosphatidyl inositol, is catalyzed by the enzymes CDP-diacylglycerol-serine O-phosphatidyltransferase and CDP-diacylglycerol-inositol 3-phosphatidyltransferase, respectively (ExPASy ENZYME EC 2.7.8.8; ExPASy ENZYME EC 2.7.8.11). The enzyme phosphatidyl serine decarboxylase catalyzes the conversion of phosphatidyl serine to phosphatidyl ethanolamine, using a pyruvate cofactor (Voelker, D.R. (1997) Biochim. Biophys. Acta 1348:236-244). Phosphatidyl choline is formed using diet-derived choline by the reaction of CDP-choline with 1,2-diacylglycerol, catalyzed by diacylglycerol cholinephosphotransferase (ExPASy ENZYME EC 2.7.8.2).
Sterol, Steroid, and Isoprenoid Metabolism
Cholesterol, composed of four fused hydrocarbon rings with an alcohol at one end, moderates the fluidity of membranes in which it is incorporated. In addition, cholesterol is used in the synthesis of steroid hormones such as cortisol, progesterone, estrogen, and testosterone. Bile salts derived from cholesterol facilitate the digestion of lipids. Cholesterol in the skin forms a barrier that prevents excess water evaporation from the body. Farnesyl and geranylgeranyl groups, which are derived from cholesterol biosynthesis intermediates, are post-translationally added to signal transduction proteins such as ras and protein-targeting proteins such as rab. These modifications are important for the activities of these proteins (Guyton, supra; Stryer, supra, pp. 279-280, 691-702, 934).
Mammals obtain cholesterol both from de novo biosynthesis and from the diet. The liver is the major site of cholesterol biosynthesis in mammals. Two acetyl-CoA molecules initially condense to form acetoacetyl-CoA, catalyzed by a thiolase. Acetoacetyl-CoA condenses with a third acetyl-CoA to form hydroxymethylglutaryl-CoA (HMG-CoA), catalyzed by HMG-CoA synthase. Conversion of HMG-CoA to cholesterol is accomplished via a series of enzymatic steps known as the mevalonate pathway. The rate-limiting step is the conversion of HMG-CoA to mevalonate by HMG-CoA reductase. The drug lovastatin, a potent inhibitor of HMG-CoA reductase, is given to patients to reduce their serum cholesterol levels. Other mevalonate pathway enzymes include mevalonate kinase, phosphomevalonate kinase, diphosphomevalonate decarboxylase, isopentenyldiphosphate isomerase, dimethylallyl transferase, geranyl transferase, farnesyl-diphosphate farnesyltransferase, squalene monooxygenase, lanosterol synthase, lathosterol oxidase, and 7-dehydrocholesterol reductase.
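The carbon bookkeeping from acetyl-CoA through the mevalonate pathway can be followed with a short Python sketch. It tracks only the carbon skeleton of each intermediate (the CoA and diphosphate carriers are ignored) and relies on standard textbook counts that are not stated in the text above: one carbon is lost as CO2 (the diphosphomevalonate decarboxylase step) on the way to the 5-carbon isopentenyl diphosphate, two farnesyl units are joined to form squalene, and three carbons are removed during conversion of lanosterol to cholesterol. The function name is hypothetical.

```python
ACETYL_CARBONS = 2  # two-carbon acetyl unit of acetyl-CoA


def mevalonate_carbon_counts() -> dict[str, int]:
    """Carbon atoms in the skeleton of each intermediate (carriers ignored)."""
    acetoacetyl_coa = 2 * ACETYL_CARBONS         # thiolase: 2 acetyl-CoA
    hmg_coa = acetoacetyl_coa + ACETYL_CARBONS   # HMG-CoA synthase adds a third
    mevalonate = hmg_coa                         # HMG-CoA reductase: reduction only
    ipp = mevalonate - 1                         # one carbon lost as CO2
    geranyl_pp = 2 * ipp                         # dimethylallyl-PP (IPP isomer) + IPP
    farnesyl_pp = geranyl_pp + ipp               # geranyl-PP + IPP
    squalene = 2 * farnesyl_pp                   # two farnesyl units joined
    lanosterol = squalene                        # cyclization keeps all carbons
    cholesterol = lanosterol - 3                 # three carbons removed
    return {
        "acetoacetyl-CoA": acetoacetyl_coa,
        "HMG-CoA": hmg_coa,
        "mevalonate": mevalonate,
        "isopentenyl diphosphate": ipp,
        "geranyl diphosphate": geranyl_pp,
        "farnesyl diphosphate": farnesyl_pp,
        "squalene": squalene,
        "lanosterol": lanosterol,
        "cholesterol": cholesterol,
    }


for name, carbons in mevalonate_carbon_counts().items():
    print(f"{name}: C{carbons}")
```

The 5-carbon step size between geranyl and farnesyl diphosphate is the isoprene unit that also builds the other isoprenoids derived from this pathway.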
Cholesterol is used in the synthesis of steroid hormones such as cortisol, progesterone, aldosterone, estrogen, and testosterone. First, cholesterol is converted to pregnenolone by cholesterol monooxygenases. The other steroid hormones are synthesized from pregnenolone by a series of enzyme-catalyzed reactions including oxidations, isomerizations, hydroxylations, reductions, and demethylations. Examples of these enzymes include steroid Δ-isomerase, 3β-hydroxy-Δ5-steroid dehydrogenase, steroid 21-monooxygenase, steroid 19-hydroxylase, and 3β-hydroxysteroid dehydrogenase. Cholesterol is also the precursor to vitamin D.
Numerous compounds contain 5-carbon isoprene units derived from the mevalonate pathway intermediate isopentenyl pyrophosphate. Isoprenoid groups are found in vitamin K, ubiquinone, retinal, dolichol phosphate (a carrier of oligosaccharides needed for N-linked glycosylation), and the farnesyl and geranylgeranyl groups that modify proteins. Enzymes involved include farnesyl transferase, polyprenyl transferases, dolichyl phosphatase, and dolichyl kinase.
Sphingolipid Metabolism
Sphingolipids are an important class of membrane lipids that contain sphingosine, a long-chain amino alcohol. They are composed of one long-chain fatty acid, one polar head alcohol, and sphingosine or a sphingosine derivative. The three classes of sphingolipids are sphingomyelins, cerebrosides, and gangliosides. Sphingomyelins, which contain phosphocholine or phosphoethanolamine as their head group, are abundant in the myelin sheath surrounding nerve cells. Cerebrosides contain a glucose or galactose head group; galactocerebrosides are characteristic of the brain, while other cerebrosides are found in nonneural tissues. Gangliosides, whose head groups contain multiple sugar units, are abundant in the brain but are also found in nonneural tissues.
Sphingolipids are built on a sphingosine backbone. Sphingosine is acylated to ceramide by the enzyme sphingosine N-acyltransferase. Ceramide and phosphatidyl choline are converted to sphingomyelin by the enzyme ceramide cholinephosphotransferase. Cerebrosides are synthesized by the linkage of glucose or galactose to ceramide by a transferase. Sequential addition of sugar residues to ceramide by transferase enzymes yields gangliosides.
Eicosanoid Metabolism
Eicosanoids, including prostaglandins, prostacyclin, thromboxanes, and leukotrienes, are 20-carbon molecules derived from fatty acids. Eicosanoids are signaling molecules which have roles in pain, fever, and inflammation. The precursor of all eicosanoids is arachidonate, which is generated from phospholipids by phospholipase A2 and from diacylglycerols by diacylglycerol lipase. Leukotrienes are produced from arachidonate by the action of lipoxygenases. Prostaglandin synthase, reductases, and isomerases are responsible for the synthesis of the prostaglandins. Prostaglandins have roles in inflammation, blood flow, ion transport, synaptic transmission, and sleep. Prostacyclin and the thromboxanes are derived from a precursor prostaglandin by the action of prostacyclin synthase and thromboxane synthases, respectively.
Ketone Body Metabolism
Pairs of acetyl-CoA molecules derived from fatty acid oxidation in the liver can condense to form acetoacetyl-CoA, which subsequently forms acetoacetate, D-3-hydroxybutyrate, and acetone. These three products are known as ketone bodies. Enzymes involved in ketone body metabolism include HMG-CoA synthetase, HMG-CoA cleavage enzyme, D-3-hydroxybutyrate dehydrogenase, acetoacetate decarboxylase, and 3-ketoacyl-CoA transferase. Ketone bodies are a normal fuel supply of the heart and renal cortex. Acetoacetate produced by the liver is transported to cells where the acetoacetate is converted back to acetyl-CoA and enters the citric acid cycle. In times of starvation, ketone bodies produced from stored triacylglycerols become an important fuel source, especially for the brain. Abnormally high levels of ketone bodies are observed in diabetics. Diabetic coma can result if ketone body levels become too high.
Lipid Mobilization
Within cells, fatty acids are transported by cytoplasmic fatty acid binding proteins (Online Mendelian Inheritance in Man (OMIM) *134650 Fatty Acid-Binding Protein 1, Liver; FABP1). Diazepam binding inhibitor (DBI), also known as endozepine and acyl-CoA-binding protein, is an endogenous γ-aminobutyric acid (GABA) receptor ligand which is thought to down-regulate the effects of GABA. DBI binds medium- and long-chain acyl-CoA esters with very high affinity and may function as an intracellular carrier of acyl-CoA esters (OMIM *125950 Diazepam Binding Inhibitor; DBI; PROSITE PDOC00686 Acyl-CoA-binding protein signature).
Fat stored as triglycerides in liver and adipose tissue may be released by hydrolysis and transported in the blood. Free fatty acids are transported in the blood by albumin. Triacylglycerols and cholesterol esters in the blood are transported in lipoprotein particles. The particles consist of a core of hydrophobic lipids surrounded by a shell of polar lipids and apolipoproteins. The protein components serve in the solubilization of hydrophobic lipids and also contain cell-targeting signals. Lipoproteins include chylomicrons, chylomicron remnants, very-low-density lipoproteins (VLDL), intermediate-density lipoproteins (IDL), low-density lipoproteins (LDL), and high-density lipoproteins (HDL). There is a strong inverse correlation between the levels of plasma HDL and risk of premature coronary heart disease.
Triacylglycerols in chylomicrons and VLDL are hydrolyzed by lipoprotein lipases that line blood vessels in muscle and other tissues that use fatty acids. Cell surface LDL receptors bind LDL particles, which are then internalized by endocytosis. Absence of the LDL receptor, the cause of familial hypercholesterolemia, leads to increased plasma cholesterol levels and ultimately to atherosclerosis. Plasma cholesteryl ester transfer protein mediates the transfer of cholesteryl esters from HDL to apolipoprotein B-containing lipoproteins. Cholesteryl ester transfer protein is important in the reverse cholesterol transport system and may play a role in atherosclerosis (Yamashita, S. et al. (1997) Curr. Opin. Lipidol. 8:101-110). Macrophage scavenger receptors, which bind and internalize modified lipoproteins, play a role in lipid transport and may contribute to atherosclerosis (Greaves, D.R. et al. (1998) Curr. Opin. Lipidol. 9:425-432).
Proteins involved in cholesterol uptake and biosynthesis are tightly regulated in response to cellular cholesterol levels. The sterol regulatory element binding protein (SREBP) is a sterol-responsive transcription factor. Under normal cholesterol conditions, SREBP resides in the ER membrane. When cholesterol levels are low, a regulated cleavage of SREBP occurs that releases the N-terminal transcription factor domain of the protein into the cytoplasm. This cleaved domain is then transported to the nucleus, where it activates the transcription of the LDL receptor gene, and of genes encoding enzymes of cholesterol synthesis, by binding the sterol regulatory element (SRE) upstream of the genes (Yang, J. et al. (1995) J. Biol. Chem. 270:12152-12161). Regulation of cholesterol uptake and biosynthesis also occurs via the oxysterol-binding protein (OSBP). OSBP is a high-affinity intracellular receptor for a variety of oxysterols that down-regulate cholesterol synthesis and stimulate cholesterol esterification (Lagace, T.A. et al. (1997) Biochem. J. 326:205-213).
Beta-oxidation
Mitochondrial and peroxisomal beta-oxidation enzymes degrade saturated and unsaturated fatty acids by sequential removal of two-carbon units from CoA-activated fatty acids. The main beta-oxidation pathway degrades both saturated and unsaturated fatty acids, while the auxiliary pathway performs additional steps required for the degradation of unsaturated fatty acids.
The pathways of mitochondrial and peroxisomal beta-oxidation use similar enzymes but have different substrate specificities and functions. Mitochondria oxidize short-, medium-, and long-chain fatty acids to produce energy for cells. Mitochondrial beta-oxidation is a major energy source for cardiac and skeletal muscle. In liver, it provides ketone bodies to the peripheral circulation when glucose levels are low, as in starvation, endurance exercise, and diabetes (Eaton, S. et al. (1996) Biochem. J. 320:345-357). Peroxisomes oxidize medium-, long-, and very-long-chain fatty acids, dicarboxylic fatty acids, branched fatty acids, prostaglandins, xenobiotics, and bile acid intermediates. The chief roles of peroxisomal beta-oxidation are to shorten toxic lipophilic carboxylic acids to facilitate their excretion and to shorten very-long-chain fatty acids prior to mitochondrial beta-oxidation (Mannaerts, G.P. and P.P. van Veldhoven (1993) Biochimie 75:147-158).
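The sequential removal of two-carbon units can be made concrete with a short Python sketch that counts beta-oxidation cycles and acetyl-CoA products. It assumes a saturated fatty acyl-CoA with an even number of carbons, so it does not cover the auxiliary steps needed for unsaturated chains or the handling of odd-chain acids; the function name is hypothetical.

```python
def beta_oxidation(carbons: int) -> tuple[int, int]:
    """Return (cycles, acetyl_coa) for complete beta-oxidation of a saturated
    fatty acyl-CoA with an even number of carbons (at least 4)."""
    if carbons < 4 or carbons % 2:
        raise ValueError("sketch assumes an even-numbered chain of >= 4 carbons")
    chain, cycles, acetyl_coa = carbons, 0, 0
    while chain > 2:
        # One cycle (dehydrogenation, hydration, dehydrogenation, thiolysis)
        # removes a two-carbon unit from the thioester end as acetyl-CoA.
        chain -= 2
        cycles += 1
        acetyl_coa += 1
    acetyl_coa += 1  # the final two-carbon remnant is itself acetyl-CoA
    return cycles, acetyl_coa


print(beta_oxidation(16))  # (7, 8) for palmitoyl-CoA
```

For palmitate, the 16-carbon product of fatty acid synthase, the count of 8 acetyl-CoA mirrors its synthesis from one acetyl-CoA primer plus seven two-carbon units donated by malonyl-CoA.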
Enzymes involved in beta-oxidation include acyl CoA synthetase, carnitine acyltransferase, acyl CoA dehydrogenases, enoyl CoA hydratases, L-3-hydroxyacyl CoA dehydrogenase, β-ketothiolase, 2,4-dienoyl CoA reductase, and enoyl CoA isomerase.
Lipid Cleavage and Degradation
Triglycerides are hydrolyzed to fatty acids and glycerol by lipases. Lysophospholipases (LPLs) are widely distributed enzymes that metabolize intracellular lipids and occur in numerous isoforms. Small isoforms, approximately 15-30 kD, function as hydrolases; large isoforms, those exceeding 60 kD, function as both hydrolases and transacylases. A particular substrate for LPLs, lysophosphatidylcholine, causes lysis of cell membranes when it is formed in or imported into a cell. LPLs are regulated by lipid factors including acylcarnitine, arachidonic acid, and phosphatidic acid. These lipid factors are signaling molecules important in numerous pathways, including the inflammatory response (Anderson, R. et al. (1994) Toxicol. Appl. Pharmacol. 125:176-183; Selle, H. et al. (1993) Eur. J. Biochem. 212:411-416).
The secretory phospholipase A2 (PLA2) superfamily comprises a number of heterogeneous enzymes whose common feature is hydrolysis of the sn-2 fatty acyl ester bond of phosphoglycerides. This hydrolysis of glycerophospholipids releases free fatty acids, such as arachidonic acid, and lysophospholipids. PLA2 activity thereby generates precursors for the biosynthesis of biologically active lipids such as hydroxy fatty acids and platelet-activating factor.
Carbon and Carbohydrate Metabolism
Carbohydrates, including sugars or saccharides, starch, and cellulose, are aldehyde or ketone compounds with multiple hydroxyl groups. The importance of carbohydrate metabolism is demonstrated by the sensitive regulatory system in place for maintenance of blood glucose levels. Two pancreatic hormones, insulin and glucagon, promote increased glucose uptake and storage by cells, and increased glucose release from cells, respectively. Carbohydrates have three important roles in mammalian cells. First, carbohydrates are used as energy stores, fuels, and metabolic intermediates. Carbohydrates are broken down in glycolysis to release energy and are stored as glycogen for later use. Second, the sugars deoxyribose and ribose form part of the structural support of DNA and RNA, respectively. Third, carbohydrate modifications are added to secreted and membrane proteins and lipids as they traverse the secretory pathway. Cell surface carbohydrate-containing macromolecules, including glycoproteins, glycolipids, and transmembrane proteoglycans, mediate adhesion with other cells and with components of the extracellular matrix. The extracellular matrix is composed of diverse glycoproteins, glycosaminoglycans (GAGs), and carbohydrate-binding proteins, which are secreted from the cell and assembled into an organized meshwork in close association with the cell surface. The interaction of the cell with the surrounding matrix profoundly influences cell shape, strength, flexibility, motility, and adhesion. These dynamic properties are intimately associated with signal transduction pathways controlling cell proliferation and differentiation, tissue construction, and embryonic development.
Carbohydrate metabolism is altered in several disorders including diabetes mellitus, hyperglycemia, hypoglycemia, galactosemia, galactokinase deficiency, and UDP-galactose-4-epimerase deficiency (Fauci, A.S. et al. (1998) Harrison's Principles of Internal Medicine, McGraw-Hill, New York NY, pp. 2208-2209). Altered carbohydrate metabolism is also associated with cancer. Reduced GAG and proteoglycan expression is associated with human lung carcinomas (Nackaerts, K. et al. (1997) Int. J. Cancer 74:335-345). The carbohydrate determinants sialyl Lewis A and sialyl Lewis X are frequently expressed on human cancer cells (Kannagi, R. (1997) Glycoconj. J. 14:577-584). Alterations of the N-linked carbohydrate core structure of cell surface glycoproteins are linked to colon and pancreatic cancers (Schwarz, R.E. et al. (1996) Cancer Lett. 107:285-291). Reduced expression of the Sda blood group carbohydrate structure in cell surface glycolipids and glycoproteins is observed in gastrointestinal cancer (Dohi, T. et al. (1996) Int. J. Cancer 67:626-663). (Carbon and carbohydrate metabolism is reviewed in Stryer, L. (1995) Biochemistry, W.H. Freeman and Company, New York NY; Lehninger, A.L. (1982) Principles of Biochemistry, Worth Publishers Inc., New York NY; and Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY.)
Glycolysis
Enzymes of the glycolytic pathway convert the sugar glucose to pyruvate while simultaneously producing ATP. The pathway also provides building blocks for the synthesis of cellular components such as long-chain fatty acids. After glycolysis, pyruvate is converted to acetyl coenzyme A, which, in aerobic organisms, enters the citric acid cycle. Glycolytic enzymes include hexokinase, phosphoglucose isomerase, phosphofructokinase, aldolase, triose phosphate isomerase, glyceraldehyde 3-phosphate dehydrogenase, phosphoglycerate kinase, phosphoglyceromutase, enolase, and pyruvate kinase. Of these, phosphofructokinase, hexokinase, and pyruvate kinase are important in regulating the rate of glycolysis.
Gluconeogenesis
Gluconeogenesis is the synthesis of glucose from noncarbohydrate precursors such as lactate and amino acids. The pathway, which functions mainly in times of starvation and intense exercise, occurs mostly in the liver and kidney. Responsible enzymes include pyruvate carboxylase, phosphoenolpyruvate carboxykinase, fructose 1,6-bisphosphatase, and glucose-6-phosphatase.
Pentose Phosphate Pathway
Pentose phosphate pathway enzymes generate the reducing agent NADPH while oxidizing glucose-6-phosphate to ribose-5-phosphate. Ribose-5-phosphate and its derivatives become part of important biological molecules such as ATP, coenzyme A, NAD+, FAD, RNA, and DNA. The pentose phosphate pathway has both oxidative and non-oxidative branches. The oxidative branch steps, which are catalyzed by the enzymes glucose-6-phosphate dehydrogenase, lactonase, and 6-phosphogluconate dehydrogenase, convert glucose-6-phosphate and NADP+ to ribulose-5-phosphate and NADPH. The non-oxidative branch steps, which are catalyzed by the enzymes phosphopentose isomerase, phosphopentose epimerase, transketolase, and transaldolase, allow the interconversion of three-, four-, five-, six-, and seven-carbon sugars.
Glucuronate Metabolism
Glucuronate is a monosaccharide which, in the form of D-glucuronic acid, is found in the GAGs chondroitin and dermatan. D-glucuronic acid is also important in the detoxification and excretion of foreign organic compounds such as phenol. Enzymes involved in glucuronate metabolism include UDP-glucose dehydrogenase and glucuronate reductase.
Disaccharide Metabolism
Disaccharides must be hydrolyzed to monosaccharides to be absorbed. Lactose, a disaccharide found in milk, is hydrolyzed to galactose and glucose by the enzyme lactase. Maltose is derived from plant starch and is hydrolyzed to glucose by the enzyme maltase. Sucrose is derived from plants and is hydrolyzed to glucose and fructose by the enzyme sucrase. Trehalose, a disaccharide found mainly in insects and mushrooms, is hydrolyzed to glucose by the enzyme trehalase (OMIM *275360 Trehalase; Ruf, J. et al. (1990) J. Biol. Chem. 265:15034-15039). Lactase, maltase, sucrase, and trehalase are bound to mucosal cells lining the small intestine, where they participate in the digestion of dietary disaccharides. The enzyme lactose synthetase, composed of the catalytic subunit galactosyltransferase and the modifier subunit α-lactalbumin, converts UDP-galactose and glucose to lactose in the mammary glands.
Glycogen, Starch, and Chitin Metabolism
Glycogen is the storage form of carbohydrates in mammals. Mobilization of glycogen maintains glucose levels between meals and during muscular activity. Glycogen is stored mainly in the liver and in skeletal muscle in the form of cytoplasmic granules. These granules contain enzymes that catalyze the synthesis and degradation of glycogen, as well as enzymes that regulate these processes. Enzymes that catalyze the degradation of glycogen include glycogen phosphorylase, a transferase, α-1,6-glucosidase, and phosphoglucomutase. Enzymes that catalyze the synthesis of glycogen include UDP-glucose pyrophosphorylase, glycogen synthetase, a branching enzyme, and nucleoside diphosphokinase. The enzymes of glycogen synthesis and degradation are tightly regulated by the hormones insulin, glucagon, and epinephrine. Starch, a plant-derived polysaccharide, is hydrolyzed to maltose, maltotriose, and α-dextrin by α-amylase, an enzyme secreted by the salivary glands and pancreas. Chitin is a polysaccharide found in insects and crustaceans. A chitotriosidase is secreted by macrophages and may play a role in the degradation of chitin-containing pathogens (Boot, R.G. et al. (1995) J. Biol. Chem. 270:26252-26256).
Peptidoglycans and Glycosaminoglycans
Glycosaminoglycans (GAGs) are anionic, linear, unbranched polysaccharides composed of repetitive disaccharide units. These repetitive units contain a derivative of an amino sugar, either glucosamine or galactosamine. GAGs exist free or as part of proteoglycans, large molecules composed of a core protein attached to one or more GAGs. GAGs are found on the cell surface, inside cells, and in the extracellular matrix. Changes in GAG levels are associated with several autoimmune diseases including autoimmune thyroid disease, autoimmune diabetes mellitus, and systemic lupus erythematosus (Hansen, C. et al. (1996) Clin. Exp. Rheum. 14 (Suppl. 15):S59-S67).
GAGs include chondroitin sulfate, keratan sulfate, heparin, heparan sulfate, dermatan sulfate, and hyaluronan.
The GAG hyaluronan (HA) is found in the extracellular matrix of many cells, especially in soft connective tissues, and is abundant in synovial fluid (Pitsillides, A.A. et al. (1993) Int. J. Exp. Pathol. 74:27-34). HA appears to play important roles in cell regulation, development, and differentiation (Laurent, T.C. and J.R. Fraser (1992) FASEB J. 6:2397-2404). Hyaluronidase is an enzyme that degrades HA to oligosaccharides. Hyaluronidases may function in cell adhesion, infection, angiogenesis, signal transduction, reproduction, cancer, and inflammation. Proteoglycans, also known as peptidoglycans, are found in the extracellular matrix of connective tissues such as cartilage and are essential for distributing the load in weight-bearing joints. Cell-surface-attached proteoglycans anchor cells to the extracellular matrix. Both extracellular and cell-surface proteoglycans bind growth factors, facilitating their binding to cell-surface receptors and subsequent triggering of signal transduction pathways.
Amino Acid and Nitrogen Metabolism
NH4+ is assimilated into amino acids by the actions of two enzymes, glutamate dehydrogenase and glutamine synthetase. The carbon skeletons of amino acids come from intermediates of glycolysis, the pentose phosphate pathway, or the citric acid cycle. Of the twenty standard amino acids used in proteins, humans can synthesize only eleven (the nonessential amino acids). The remaining nine must come from the diet (the essential amino acids). Enzymes involved in nonessential amino acid biosynthesis include glutamate kinase dehydrogenase, pyrroline carboxylate reductase, asparagine synthetase, phenylalanine oxygenase, methionine adenosyltransferase, adenosylhomocysteinase, cystathionine β-synthase, cystathionine γ-lyase, phosphoglycerate dehydrogenase, phosphoserine transaminase, phosphoserine phosphatase, serine hydroxymethyltransferase, and glycine synthase.
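The split between essential and nonessential amino acids can be checked with simple set arithmetic. The sketch below assumes the standard textbook list of nine essential amino acids for adult humans, written as three-letter codes. Some nonessential amino acids depend on essential precursors (tyrosine is made from phenylalanine, and cysteine draws its sulfur from methionine), so they can become conditionally essential.

```python
# Nine essential amino acids for adult humans (standard textbook list).
ESSENTIAL = {"His", "Ile", "Leu", "Lys", "Met", "Phe", "Thr", "Trp", "Val"}

# The twenty standard (proteinogenic) amino acids.
STANDARD = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}

# Nonessential amino acids are those humans can synthesize: the set difference.
NONESSENTIAL = STANDARD - ESSENTIAL

assert ESSENTIAL <= STANDARD  # every essential amino acid is a standard one
print(len(STANDARD), len(ESSENTIAL), len(NONESSENTIAL))  # 20 9 11
print(sorted(NONESSENTIAL))
```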
Metabolism of amino acids takes place mainly in the liver, where the amino group is removed by aminotransferases (transaminases), for example, alanine aminotransferase. The amino group is transferred to α-ketoglutarate to form glutamate. Glutamate dehydrogenase converts glutamate to NH4+ and α-ketoglutarate. NH4+ is converted to urea by the urea cycle, which is catalyzed by the enzymes arginase, ornithine transcarbamoylase, argininosuccinate synthetase, and argininosuccinase. Carbamoyl phosphate synthetase is also involved in urea formation. Enzymes involved in the metabolism of the carbon skeleton of amino acids include serine dehydratase, asparaginase, glutaminase, propionyl CoA carboxylase, methylmalonyl CoA mutase, branched-chain α-keto acid dehydrogenase complex, isovaleryl CoA dehydrogenase, β-methylcrotonyl CoA carboxylase, phenylalanine hydroxylase, p-hydroxyphenylpyruvate hydroxylase, and homogentisate oxidase. Polyamines, which include spermidine, putrescine, and spermine, bind tightly to nucleic acids and are abundant in rapidly proliferating cells. Enzymes involved in polyamine synthesis include ornithine decarboxylase.
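The urea cycle enzymes named above act in a fixed order, which a toy substrate-pool simulation can illustrate. This is a simplified sketch: ATP, H2O, and Pi are omitted, the split between mitochondrial and cytosolic steps is ignored, and metabolites are tracked as present or absent rather than counted. The substrate and product assignments are standard biochemistry rather than details given in this document.

```python
# One turn of the urea cycle as (enzyme, substrates consumed, products formed).
# Simplification: ATP, H2O, and Pi are omitted; metabolites are tracked as a set.
UREA_CYCLE = [
    ("carbamoyl phosphate synthetase", {"NH4+", "HCO3-"}, {"carbamoyl phosphate"}),
    ("ornithine transcarbamoylase", {"ornithine", "carbamoyl phosphate"}, {"citrulline"}),
    ("argininosuccinate synthetase", {"citrulline", "aspartate"}, {"argininosuccinate"}),
    ("argininosuccinase", {"argininosuccinate"}, {"arginine", "fumarate"}),
    ("arginase", {"arginine"}, {"urea", "ornithine"}),
]

def run_urea_cycle(pool):
    """Apply each step in order; raise if a required substrate is absent."""
    pool = set(pool)
    for enzyme, consumed, produced in UREA_CYCLE:
        missing = consumed - pool
        if missing:
            raise ValueError(f"{enzyme}: missing {sorted(missing)}")
        pool = (pool - consumed) | produced
    return pool

result = run_urea_cycle({"NH4+", "HCO3-", "ornithine", "aspartate"})
print(sorted(result))  # ['fumarate', 'ornithine', 'urea']
```

The two nitrogens of urea enter as NH4+ and as aspartate, and ornithine is regenerated at the end, which is what makes the pathway a cycle; the fumarate released links it to the citric acid cycle.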
Diseases associated with defects in amino acid and nitrogen metabolism include hyperammonemia, carbamoyl phosphate synthetase deficiency, urea cycle enzyme deficiencies, methylmalonic aciduria, maple syrup urine disease, alcaptonuria, and phenylketonuria.
Energy Metabolism
Cells derive energy from metabolism of ingested compounds that may be roughly categorized as carbohydrates, fats, or proteins. Energy is also stored as triglycerides (fats) and as the polymer glycogen (carbohydrates). Metabolism proceeds along separate reaction pathways connected by key intermediates such as acetyl coenzyme A (acetyl-CoA). Metabolic pathways feature anaerobic and aerobic degradation, coupled with energy-requiring reactions such as the phosphorylation of adenosine diphosphate (ADP) to the triphosphate (ATP) or analogous phosphorylations of guanosine (GDP/GTP), uridine (UDP/UTP), or cytidine (CDP/CTP) nucleotides. Subsequent dephosphorylation of the triphosphate drives reactions needed for cell maintenance, growth, and proliferation.
Digestive enzymes convert dietary carbohydrates to monosaccharides, mainly glucose; fructose and galactose are converted to glucose in the liver. Enzymes involved in these conversions include galactose-1-phosphate uridyl transferase and UDP-galactose-4-epimerase. In the cytoplasm, glycolysis converts glucose to pyruvate in a series of reactions coupled to ATP synthesis. Pyruvate is transported into the mitochondria and converted to acetyl-CoA for oxidation via the citric acid cycle by the pyruvate dehydrogenase complex, whose components include pyruvate dehydrogenase, dihydrolipoyl transacetylase, and dihydrolipoyl dehydrogenase. Enzymes involved in the citric acid cycle include citrate synthetase, aconitases, isocitrate dehydrogenase, the alpha-ketoglutarate dehydrogenase complex (including transsuccinylases), succinyl CoA synthetase, succinate dehydrogenase, fumarases, and malate dehydrogenase. Acetyl CoA is oxidized to CO2 with concomitant formation of NADH, FADH2, and GTP. In oxidative phosphorylation, the transfer of electrons from NADH and FADH2 to oxygen through the electron transport chain is coupled to the synthesis of ATP from ADP and Pi by the F0F1 ATPase complex in the mitochondrial inner membrane. Enzyme complexes responsible for electron transport and ATP synthesis include the F0F1 ATPase complex, ubiquinone (CoQ)-cytochrome c reductase, ubiquinone reductase, cytochrome b, cytochrome c1, FeS protein, and cytochrome c oxidase.
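Because NADH and FADH2 oxidation is coupled to ATP synthesis, the stages of glucose oxidation can be combined into a rough ATP budget. The sketch below assumes P/O ratios of about 2.5 for NADH and 1.5 for FADH2 (textbook approximations, not figures from this document) and lets the caller choose how the cytosolic NADH from glycolysis enters the mitochondrion: the malate-aspartate shuttle preserves the NADH yield, while the glycerol-3-phosphate shuttle delivers those electrons at the FADH2 level.

```python
# Assumed P/O ratios (ATP per reduced cofactor); approximate textbook values.
ATP_PER_NADH = 2.5
ATP_PER_FADH2 = 1.5

# Per glucose: (stage, ATP or GTP made directly, NADH formed, FADH2 formed)
STAGES = [
    ("glycolysis", 2, 2, 0),              # net 2 ATP; the 2 NADH are cytosolic
    ("pyruvate dehydrogenase", 0, 2, 0),  # 2 pyruvate -> 2 acetyl-CoA
    ("citric acid cycle", 2, 6, 2),       # two turns: 2 GTP, 6 NADH, 2 FADH2
]

def atp_per_glucose(cytosolic_nadh_yield: float = ATP_PER_NADH) -> float:
    """Approximate ATP per glucose; cytosolic_nadh_yield models the shuttle used."""
    total = 0.0
    for stage, direct, nadh, fadh2 in STAGES:
        per_nadh = cytosolic_nadh_yield if stage == "glycolysis" else ATP_PER_NADH
        total += direct + nadh * per_nadh + fadh2 * ATP_PER_FADH2
    return total

print(atp_per_glucose())               # malate-aspartate shuttle: 32.0
print(atp_per_glucose(ATP_PER_FADH2))  # glycerol-3-phosphate shuttle: 30.0
```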
Triglycerides are hydrolyzed to fatty acids and glycerol by lipases. Glycerol is then phosphorylated to glycerol-3-phosphate by glycerol kinase, oxidized to dihydroxyacetone phosphate by glycerol phosphate dehydrogenase, and degraded via glycolysis. Fatty acids are transported into the mitochondria as fatty acyl-carnitine esters and undergo oxidative degradation.
In addition to metabolic disorders such as diabetes and obesity, disorders of energy metabolism are associated with cancers (Dorward, A. et al. (1997) J. Bioenerg. Biomembr. 29:385-392), autism (Lombard, J. (1998) Med. Hypotheses 50:497-500), neurodegenerative disorders (Alexi, T. et al. (1998) Neuroreport 9:R57-64), and neuromuscular disorders (DiMauro, S. et al. (1998) Biochim. Biophys. Acta 1366:199-210). The myocardium is heavily dependent on oxidative metabolism, so metabolic dysfunction often leads to heart disease (DiMauro, S. and M. Hirano (1998) Curr. Opin. Cardiol. 13:190-197).
For a review of energy metabolism enzymes and intermediates, see Stryer, L. et al. (1995) Biochemistry, W.H. Freeman and Co., San Francisco CA, pp. 443-652. For a review of energy metabolism regulation, see Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY, pp. 744-770.
Cofactor Metabolism
Cofactors, including coenzymes and prosthetic groups, are low-molecular-weight inorganic or organic compounds that are required for the action of an enzyme. Many cofactors contain vitamins as a component. Cofactors include thiamine pyrophosphate, flavin adenine dinucleotide, flavin mononucleotide, nicotinamide adenine dinucleotide, pyridoxal phosphate, coenzyme A, tetrahydrofolate, lipoamide, and heme. The vitamins biotin and cobalamin are associated with enzymes as well. Heme, a prosthetic group found in myoglobin and hemoglobin, consists of a protoporphyrin group bound to iron. Porphyrin groups contain four substituted pyrroles covalently joined in a ring, often with a bound metal atom. Enzymes involved in porphyrin synthesis include δ-aminolevulinate synthase, δ-aminolevulinate dehydrase, porphobilinogen deaminase, and cosynthase. Deficiencies in heme formation cause porphyrias. Heme is broken down as part of erythrocyte turnover. Enzymes involved in heme degradation include heme oxygenase and biliverdin reductase. Iron is a required cofactor for many enzymes. Besides the heme-containing enzymes, iron is found in iron-sulfur clusters in proteins including aconitase, succinate dehydrogenase, and NADH-Q reductase. Iron is transported in the blood by the protein transferrin. Binding of transferrin to the transferrin receptor on cell surfaces allows uptake by receptor-mediated endocytosis. Cytosolic iron is stored bound to the protein ferritin.
A molybdenum-containing cofactor (molybdopterin) is found in enzymes including sulfite oxidase, xanthine dehydrogenase, and aldehyde oxidase. Molybdopterin biosynthesis is performed by two molybdenum cofactor synthesizing enzymes. Deficiencies in these enzymes cause mental retardation and lens dislocation. Other diseases caused by defects in cofactor metabolism include pernicious anemia and methylmalonic aciduria.
Secretion and Trafficking
Eukaryotic cells are bounded by a lipid bilayer membrane and subdivided into functionally distinct, membrane-bound compartments. The membranes maintain the essential differences between the cytosol, the extracellular environment, and the lumenal space of each intracellular organelle. Because lipid membranes are highly impermeable to most polar molecules, transport of essential nutrients, metabolic waste products, cell signaling molecules, macromolecules, and proteins across lipid membranes and between organelles must be mediated by a variety of transport-associated molecules.
Protein Trafficking
In eukaryotes, some proteins are synthesized on ER-bound ribosomes, co-translationally imported into the ER, delivered from the ER to the Golgi complex for post-translational processing and sorting, and transported from the Golgi to specific intracellular and extracellular destinations. All cells possess a constitutive transport process which maintains homeostasis between the cell and its environment. In many differentiated cell types, the basic machinery is modified to carry out specific transport functions. For example, in endocrine glands, hormones and other secreted proteins are packaged into secretory granules for regulated exocytosis to the cell exterior. In macrophages, foreign extracellular material is engulfed (phagocytosis) and delivered to lysosomes for degradation. In fat and muscle cells, glucose transporters are stored in vesicles which fuse with the plasma membrane only in response to insulin stimulation.
The Secretory Pathway
Synthesis of most integral membrane proteins, secreted proteins, and proteins destined for the lumen of a particular organelle occurs on ER-bound ribosomes. These proteins are co-translationally imported into the ER. The proteins leave the ER via membrane-bound vesicles, which bud off the ER at specific sites and fuse with each other (homotypic fusion) to form the ER-Golgi Intermediate Compartment (ERGIC). The ERGIC matures progressively through the cis, medial, and trans cisternal stacks of the Golgi, its enzyme composition changing through retrograde transport of specific Golgi enzymes. In this way, proteins moving through the Golgi undergo post-translational modification, such as glycosylation. The final Golgi compartment is the Trans-Golgi Network (TGN), where both membrane and lumenal proteins are sorted for their final destination. Transport vesicles destined for intracellular compartments, such as the lysosome, bud off the TGN. What remains is a secretory vesicle which contains proteins destined for the plasma membrane, such as receptors, adhesion molecules, and ion channels, and secretory proteins, such as hormones, neurotransmitters, and digestive enzymes. Secretory vesicles eventually fuse with the plasma membrane (Glick, B.S. and V. Malhotra (1998) Cell 95:883-889).
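The compartment sequence of the secretory pathway can be summarized as a simple itinerary function. This is a hypothetical sketch: the cargo categories and the function name are illustrative, the route treats the Golgi as a fixed series of stations even though the text describes the ERGIC maturing through the cisternae, and the optional regulated flag anticipates the signal-dependent secretion described next.

```python
# Stations traversed by cargo synthesized on ER-bound ribosomes.
SECRETORY_ROUTE = ["ER", "ERGIC", "cis-Golgi", "medial-Golgi", "trans-Golgi", "TGN"]

# Hypothetical cargo categories mapped to post-TGN destinations named in the text.
TGN_DESTINATIONS = {
    "lysosomal": "lysosome",
    "membrane": "plasma membrane",      # e.g. receptors, adhesion molecules, ion channels
    "secreted": "extracellular space",  # e.g. hormones, neurotransmitters, digestive enzymes
}

def itinerary(cargo_class: str, regulated: bool = False) -> list:
    """Return the ordered compartments a cargo class passes through."""
    route = SECRETORY_ROUTE + [TGN_DESTINATIONS[cargo_class]]  # new list each call
    if regulated and cargo_class != "lysosomal":
        # Regulated cargo waits in a cytoplasmic vesicle pool until signalled.
        route.insert(-1, "stored vesicle pool")
    return route

print(" -> ".join(itinerary("secreted", regulated=True)))
```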
The secretory process can be constitutive or regulated. Most cells have a constitutive pathway for secretion, whereby vesicles derived from maturation of the TGN require no specific signal to fuse with the plasma membrane. In many cells, such as endocrine cells, digestive cells, and neurons, vesicle pools derived from the TGN collect in the cytoplasm and do not fuse with the plasma membrane until a specific signal directs them to do so.
Endocytosis
Endocytosis, wherein cells internalize material from the extracellular environment, is essential for transmission of neuronal, metabolic, and proliferative signals; uptake of many essential nutrients; and defense against invading organisms. Most cells exhibit two forms of endocytosis. The first, phagocytosis, is an actin-driven process exemplified in macrophages and neutrophils. Material to be endocytosed contacts numerous cell surface receptors, which stimulate the plasma membrane to extend and surround the particle, enclosing it in a membrane-bound phagosome. In the mammalian immune system, IgG-coated particles bind Fc receptors on the surface of phagocytic leukocytes. Activation of the Fc receptors initiates a signal cascade involving src-family cytosolic kinases and the monomeric GTP-binding (G) protein Rho. The resulting actin reorganization leads to phagocytosis of the particle. This process is an important component of the humoral immune response, allowing the processing and presentation of bacterially derived peptides to antigen-specific T-lymphocytes.
The second form of endocytosis, pinocytosis, is a more generalized uptake of material from the external milieu. Like phagocytosis, pinocytosis is activated by ligand binding to cell surface receptors. Activation of individual receptors stimulates an internal response that includes coalescence of the receptor-ligand complexes and formation of clathrin-coated pits. Invagination of the plasma membrane at clathrin-coated pits produces an endocytic vesicle within the cell cytoplasm. These vesicles undergo homotypic fusion to form an early endosomal (EE) compartment. The tubulovesicular EE serves as a sorting site for incoming material. ATP-driven proton pumps in the EE membrane lower the pH of the EE lumen (pH 6.3-6.8). The acidic environment causes many ligands to dissociate from their receptors. The receptors, along with membrane lipids and other integral membrane proteins, are recycled back to the plasma membrane in recycling vesicles (RVs) that bud off the tubular extensions of the EE. This selective removal of recycled components produces a carrier vesicle containing ligand and other material from the external environment. The carrier vesicle fuses with TGN-derived vesicles which contain hydrolytic enzymes. The acidic environment of the resulting late endosome (LE) activates the hydrolytic enzymes, which degrade the ligands and other material. As digestion takes place, the LE fuses with the lysosome, where digestion is completed (Mellman, I. (1996) Annu. Rev. Cell Dev. Biol. 12:575-625). Recycling vesicles may return directly to the plasma membrane. Receptors internalized and returned directly to the plasma membrane have a turnover time of 2-3 minutes. Some RVs undergo microtubule-directed relocation to a perinuclear site, from which they then return to the plasma membrane. Receptors following this route have a turnover time of 5-10 minutes. Still other RVs are retained within the cell until an appropriate signal is received (Mellman, supra; and James, D.E. et al. (1994) Trends Cell Biol. 4:120-126).
Vesicle Formation
Several steps in the transit of material along the secretory and endocytic pathways require the formation of transport vesicles. Specifically, vesicles form at the transitional endoplasmic reticulum (tER), the rim of Golgi cisternae, the face of the Trans-Golgi Network (TGN), the plasma membrane (PM), and tubular extensions of the endosomes. The process begins with the budding of a vesicle out of the donor membrane. The membrane-bound vesicle contains proteins to be transported and is surrounded by a protective coat made up of protein subunits recruited from the cytosol. The initial budding and coating processes are controlled by a cytosolic Ras-like GTP-binding protein, ADP-ribosylation factor (Arf), and by adapter proteins (APs). Different isoforms of both Arf and AP are involved at different sites of budding. Another GTP-binding protein, dynamin, forms a ring complex around the neck of the forming vesicle and may provide the mechanochemical force to accomplish the final step of the budding process. The coated vesicle complex is then transported through the cytosol. During the transport process, Arf-bound GTP is hydrolyzed to GDP and the coat dissociates from the transport vesicle (West, M.A. et al. (1997) J. Cell Biol. 138:1239-1254). Two different classes of coat protein have also been identified. Clathrin coats form on the TGN and PM surfaces, whereas coatomer or COP coats form on the ER and Golgi. COP coats can be further distinguished as COPI, involved in retrograde traffic through the Golgi and from the Golgi to the ER, and COPII, involved in anterograde traffic from the ER to the Golgi (Mellman, supra). The COP coat consists of two major components, a G-protein (Arf or Sar) and coat protomer (coatomer). Coatomer is an equimolar complex of seven proteins, termed alpha-, beta-, beta'-, gamma-, delta-, epsilon-, and zeta-COP (Harter, C. and F.T. Wieland (1998) Proc. Natl. Acad. Sci. USA 95:11649-11654).
Membrane Fusion
Transport vesicles undergo homotypic or heterotypic fusion in the secretory and endocytic pathways. Molecules required for appropriate targeting and fusion of vesicles with their target membrane include proteins incorporated in the vesicle membrane, proteins of the target membrane, and proteins recruited from the cytosol. During budding of the vesicle from the donor compartment, an integral membrane protein, VAMP (vesicle-associated membrane protein), is incorporated into the vesicle. Soon after the vesicle uncoats, a cytosolic prenylated GTP-binding protein, Rab (a member of the Ras superfamily), is inserted into the vesicle membrane. GTP-bound Rab proteins are directed into nascent transport vesicles, where they interact with VAMP. Following vesicle transport, GTPase-activating proteins (GAPs) in the target membrane convert Rab proteins to the GDP-bound form. A cytosolic protein, guanine-nucleotide dissociation inhibitor (GDI), helps return GDP-bound Rab proteins to their membrane of origin. Several Rab isoforms have been identified and appear to associate with specific compartments within the cell. Rab proteins appear to play a role in mediating the function of Rev, a viral protein essential for replication of HIV-1, the virus responsible for AIDS (Flavell, R.A. et al. (1996) Proc. Natl. Acad. Sci. USA 93:4421-4424).
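The Rab GTP/GDP cycle behaves like a small state machine, and the sketch below encodes the three states and transitions named in the paragraph above. The state and event labels are hypothetical, and the nucleotide-exchange step that reloads Rab with GTP (carried out by factors not discussed here) is folded into a single event as an assumption.

```python
from enum import Enum

class Rab(Enum):
    CYTOSOL_GDP = "GDP-bound, held in the cytosol by GDI"
    VESICLE_GTP = "GTP-bound, on a nascent vesicle with VAMP"
    TARGET_GDP = "GDP-bound, on the target membrane"

# (current state, event) -> next state. Event names are illustrative labels.
TRANSITIONS = {
    (Rab.CYTOSOL_GDP, "exchange GDP for GTP and insert into vesicle"): Rab.VESICLE_GTP,
    (Rab.VESICLE_GTP, "GAP stimulates GTP hydrolysis"): Rab.TARGET_GDP,
    (Rab.TARGET_GDP, "GDI extracts Rab and returns it"): Rab.CYTOSOL_GDP,
}

def step(state: Rab, event: str) -> Rab:
    """Advance the Rab cycle; reject events that do not apply to the current state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} not allowed from {state.name}") from None

state = Rab.CYTOSOL_GDP
for event in [
    "exchange GDP for GTP and insert into vesicle",
    "GAP stimulates GTP hydrolysis",
    "GDI extracts Rab and returns it",
]:
    state = step(state, event)
    print(state.name)
```

Keeping the transitions in a table makes an out-of-order event, such as GAP action on a Rab that is already GDP-bound, fail loudly instead of silently changing state.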
Docking of the transport vesicle with the target membrane involves the formation of a complex between the vesicle SNAP receptor (v-SNARE), target membrane (t-) SNAREs, and certain other membrane and cytosolic proteins. Many of these other proteins have been identified, although their exact functions in the docking complex remain uncertain (Tellam, J.T. et al. (1995) J. Biol. Chem. 270:5857-5863; and Hata, Y. and T.C. Sudhof (1995) J. Biol. Chem. 270:13022-13028). N-ethylmaleimide-sensitive factor (NSF) and soluble NSF attachment proteins (α-SNAP and β-SNAP) are two such proteins that are conserved from yeast to man and function in most intracellular membrane fusion reactions. Sec1 represents a family of yeast proteins that function at many different stages in the secretory pathway, including membrane fusion. Recently, mammalian homologs of Sec1, called Munc-18 proteins, have been identified (Katagiri, H. et al. (1995) J. Biol. Chem. 270:4963-4966; Hata et al. supra).
The SNARE complex involves three SNARE molecules, one in the vesicular membrane and two in the target membrane. Synaptotagmin is an integral membrane protein in the synaptic vesicle which associates with the t-SNARE syntaxin in the docking complex. Synaptotagmin binds calcium in a complex with negatively charged phospholipids, which allows the cytosolic SNAP protein to displace synaptotagmin from syntaxin and fusion to occur. Thus, synaptotagmin is a negative regulator of fusion in the neuron (Littleton, J.T. et al. (1993) Cell 74:1125-1134). The most abundant membrane protein of synaptic vesicles appears to be the glycoprotein synaptophysin, a 38 kDa protein with four transmembrane domains.
Specificity between a vesicle and its target is derived from the v-SNARE, t-SNAREs, and associated proteins involved. Different isoforms of SNAREs and Rabs show distinct cellular and subcellular distributions. VAMP-1/synaptobrevin, membrane-anchored synaptosome-associated protein of 25 kDa (SNAP-25), syntaxin-1, Rab3A, Rab15, and Rab23 are predominantly expressed in the brain and nervous system. Different syntaxin, VAMP, and Rab proteins are associated with distinct subcellular compartments and their vesicular carriers.
Nuclear Transport
Transport of proteins and RNA between the nucleus and the cytoplasm occurs through nuclear pore complexes (NPCs). NPC-mediated transport occurs in both directions through the nuclear envelope. All nuclear proteins are imported from the cytoplasm, their site of synthesis. tRNA and mRNA are exported from the nucleus, their site of synthesis, to the cytoplasm, their site of function. Processing of small nuclear RNAs involves export into the cytoplasm, assembly with proteins and modifications such as hypermethylation to produce small nuclear ribonucleoproteins (snRNPs), and subsequent import of the snRNPs back into the nucleus. The assembly of ribosomes requires the initial import of ribosomal proteins from the cytoplasm, their incorporation with RNA into ribosomal subunits, and export back to the cytoplasm (Gorlich, D. and I.W. Mattaj (1996) Science 271:1513-1518).
The transport of proteins and mRNAs across the NPC is selective, depends on nuclear localization signals, and generally requires association with nuclear transport factors. Nuclear localization signals (NLSs) consist of short stretches of amino acids enriched in basic residues. NLSs are found on proteins that are targeted to the nucleus, such as the glucocorticoid receptor. The NLS is recognized by the NLS receptor, importin, which then interacts with the monomeric GTP-binding protein Ran. This NLS protein/receptor/Ran complex navigates the nuclear pore with the help of the homodimeric protein nuclear transport factor 2 (NTF2). NTF2 binds both the GDP-bound form of Ran and multiple proteins of the nuclear pore complex that contain FXFG repeat motifs, such as p62 (Paschal, B. et al. (1997) J. Biol. Chem. 272:21534-21539; and Wong, D.H. et al. (1997) Mol. Cell Biol. 17:3755-3767). Some proteins bound to nuclear mRNAs dissociate before the mRNAs are transported across the NPC, while others dissociate shortly after transport across the NPC and are reimported into the nucleus.
Disease Correlation
The etiology of numerous human diseases and disorders can be attributed to defects in the transport or secretion of proteins. For example, abnormal hormonal secretion is linked to disorders such as diabetes insipidus (vasopressin), hyper- and hypoglycemia (insulin, glucagon), Graves' disease and goiter (thyroid hormone), and Cushing's and Addison's diseases (adrenocorticotropic hormone, ACTH). Moreover, some cancer cells secrete excessive amounts of hormones or other biologically active peptides. Disorders related to excessive secretion of biologically active peptides by tumor cells include fasting hypoglycemia due to increased insulin secretion from insulinoma-islet cell tumors; hypertension due to increased epinephrine and norepinephrine secreted from pheochromocytomas of the adrenal medulla and sympathetic paraganglia; and carcinoid syndrome, which is characterized by abdominal cramps, diarrhea, and valvular heart disease caused by excessive amounts of vasoactive substances such as serotonin, bradykinin, histamine, prostaglandins, and polypeptide hormones secreted from intestinal tumors. Biologically active peptides that are ectopically synthesized in and secreted from tumor cells include ACTH and vasopressin (lung and pancreatic cancers); parathyroid hormone (lung and bladder cancers); calcitonin (lung and breast cancers); and thyroid-stimulating hormone (medullary thyroid carcinoma). Such peptides may be useful as diagnostic markers for tumorigenesis (Schwartz, M.Z. (1997) Semin. Pediatr. Surg. 3:141-146; and Said, S.I. and G.R. Faloona (1975) N. Engl. J. Med. 293:155-160).
Defective nuclear transport may play a role in cancer. The BRCA1 protein contains three potential NLSs that interact with importin alpha, and it is transported into the nucleus by the importin/NPC pathway. In breast cancer cells, the BRCA1 protein is aberrantly localized in the cytoplasm. This mislocalization of BRCA1 in breast cancer cells may be due to a defect in the NPC nuclear import pathway (Chen, C.F. et al. (1996) J. Biol. Chem. 271:32863-32868).
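Because NLSs are short stretches enriched in basic residues, candidate signals such as those noted in BRCA1 can be flagged by a simple sequence scan. The following is a minimal Python sketch, assuming the commonly cited consensus for a classical monopartite NLS, K-(K/R)-X-(K/R). That pattern, the function name `find_candidate_nls`, and the toy sequence are illustrative assumptions rather than a method described in this specification, and a motif match alone does not show that a protein is imported into the nucleus.

```python
import re

# ASSUMPTION: heuristic consensus for a classical monopartite NLS,
# K-(K/R)-X-(K/R). Real NLSs are more varied (e.g., bipartite signals).
# The zero-width lookahead lets overlapping candidates be reported separately.
MONOPARTITE_NLS = re.compile(r"(?=(K[KR].[KR]))")


def find_candidate_nls(protein: str) -> list[tuple[int, str]]:
    """Hypothetical helper: return (0-based start, motif) for each
    candidate monopartite NLS in a one-letter amino acid sequence."""
    seq = protein.upper()
    return [(m.start(), m.group(1)) for m in MONOPARTITE_NLS.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence containing PKKKRKV, the SV40 large T antigen NLS.
    print(find_candidate_nls("MAPKKKRKVEDP"))
```

On the toy sequence the scan reports two overlapping candidates, starting at positions 3 and 4, both inside the SV40 signal. A practical screen would also need to consider bipartite signals and the surrounding sequence context.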
It has been suggested that in some breast cancers the tumor-suppressing activity of p53 is inactivated by sequestration of the protein in the cytoplasm, away from its site of action in the cell nucleus. Cytoplasmic wild-type p53 has also been found in human cervical carcinoma cell lines (Moll, U.M. et al. (1992) Proc. Natl. Acad. Sci. USA 89:7262-7266; and Liang, X.H. et al. (1993) Oncogene 8:2645-2652).
Environmental Responses
Organisms respond to the environment through a number of pathways. Heat shock proteins, including Hsp70, Hsp60, Hsp90, and Hsp40, help organisms cope with heat damage to cellular proteins.
Aquaporins (AQPs) are channels that transport water and, in some cases, small nonionic solutes such as urea and glycerol. Water movement is important for a number of physiological processes, including renal fluid filtration, aqueous humor generation in the eye, cerebrospinal fluid production in the brain, and appropriate hydration of the lung. Aquaporins are members of the major intrinsic protein (MIP) family of membrane transporters (King, L.S. and P. Agre (1996) Annu. Rev. Physiol. 58:619-648; Ishibashi, K. et al. (1997) J. Biol. Chem. 272:20782-20786). The study of aquaporins may help explain edema formation and fluid balance in both normal physiology and disease states (King, supra). Mutations in AQP2 cause autosomal recessive nephrogenic diabetes insipidus (OMIM *107777 Aquaporin 2; AQP2). Reduced AQP4 expression in skeletal muscle may be associated with Duchenne muscular dystrophy (Frigeri, A. et al. (1998) J. Clin. Invest. 102:695-703). Mutations in AQP0 cause autosomal dominant cataracts in the mouse (OMIM *154050 Major Intrinsic Protein of Lens Fiber; MIP).
The metallothioneins (MTs) are a group of small (61 amino acid), cysteine-rich proteins that bind heavy metals such as cadmium, zinc, mercury, lead, and copper, and are thought to play a role in metal detoxification or in the metabolism and homeostasis of metals. Arsenite-resistance proteins have been identified in hamsters that are resistant to toxic levels of arsenite (Rossman, T.G. et al. (1997) Mutat. Res. 386:307-314).
Humans respond to light and odors through specific protein pathways. Proteins involved in light perception include rhodopsin, transducin, and cGMP phosphodiesterase. Proteins involved in odor perception include multiple olfactory receptors. Other proteins are important in human circadian rhythms and in responses to wounds.
Immunity and Host Defense
All vertebrates have developed sophisticated and complex immune systems that provide protection from viral, bacterial, fungal, and parasitic infections. Included in these systems are the processes of humoral immunity, the complement cascade, and the inflammatory response (Paul, W.E. (1993) Fundamental Immunology, Raven Press, Ltd., New York NY, pp. 1-20).
The cellular components of the humoral immune system include six different types of leukocytes: monocytes, lymphocytes, polymorphonuclear granulocytes (consisting of neutrophils, eosinophils, and basophils), and plasma cells. Additionally, fragments of megakaryocytes, a seventh type of white blood cell in the bone marrow, occur in large numbers in the blood as platelets.
Leukocytes are formed from two stem cell lineages in bone marrow. The myeloid stem cell line produces granulocytes and monocytes, and the lymphoid stem cell produces lymphocytes. Lymphoid cells travel to the thymus, spleen, and lymph nodes, where they mature and differentiate into lymphocytes. Leukocytes are responsible for defending the body against invading pathogens. Neutrophils and monocytes attack invading bacteria, viruses, and other pathogens and destroy them by phagocytosis. Monocytes enter tissues and differentiate into macrophages, which are extremely phagocytic. Lymphocytes and plasma cells are the part of the immune system that recognizes specific foreign molecules and organisms and inactivates them, and that signals other cells to attack the invaders.
Granulocytes and monocytes are formed and stored in the bone marrow until needed. Megakaryocytes are produced in bone marrow, where they fragment into platelets and are released into the bloodstream. The main function of platelets is to activate the blood clotting mechanism. Lymphocytes and plasma cells are produced in various lymphogenous organs, including the lymph nodes, spleen, thymus, and tonsils.
Both neutrophils and macrophages exhibit chemotaxis toward sites of inflammation. Tissue inflammation in response to pathogen invasion results in the production of chemoattractants for leukocytes, such as endotoxins or other bacterial products, prostaglandins, and products of leukocytes or platelets. Basophils participate in the release of the chemicals involved in the inflammatory process.
Basophils are so specialized for secreting these chemicals that they have been referred to as "unicellular endocrine glands." A distinctive aspect of basophil secretion is that granule contents are released directly into the extracellular environment, not into vacuoles as occurs with neutrophils, eosinophils, and monocytes. Basophils have receptors for the Fc fragment of immunoglobulin E (IgE) that are not present on other leukocytes. Cross-linking of membrane-bound IgE by anti-IgE or other ligands triggers degranulation.
Eosinophils are bi- or multi-nucleated white blood cells that contain eosinophilic granules. Their plasma membrane carries immunoglobulin receptors, particularly for IgG and IgE. Generally, eosinophils are stored in the bone marrow until recruited to a site of inflammation or invasion. They have specific functions in parasitic infections and allergic reactions, and are thought to detoxify some of the inflammatory substances released by mast cells and basophils. Additionally, they phagocytize antigen-antibody complexes and help prevent further spread of inflammation.
Macrophages are monocytes that have left the bloodstream to settle in tissue. Once monocytes have migrated into tissues, they do not re-enter the bloodstream. The mononuclear phagocyte system comprises precursor cells in the bone marrow, monocytes in circulation, and macrophages in tissues. The system is capable of very fast and extensive phagocytosis. A macrophage may phagocytize over 100 bacteria, digest them and extrude the residues, and then survive for many more months. Macrophages are also capable of ingesting large particles, including red blood cells and malarial parasites. Monocytes that enter tissues increase several-fold in size and transform into macrophages characteristic of the tissue they have entered, surviving there for several months.
Mononuclear phagocytes are essential in defending the body against invasion by foreign pathogens, particularly intracellular microorganisms such as M. tuberculosis, Listeria, Leishmania, and Toxoplasma. Macrophages can also control the growth of tumor cells, through both phagocytosis and secretion of hydrolytic enzymes. Another important function of macrophages is processing antigens and presenting them in a biochemically modified form to lymphocytes.
The immune system responds to invading microorganisms in two major ways: antibody production and cell mediated responses. Antibodies are immunoglobulin proteins produced by B-lymphocytes which bind to specific antigens and cause inactivation or promote destruction of the antigen by other cells. Cell-mediated immune responses involve T-lymphocytes (T cells) that react with foreign antigen on the surface of infected host cells. Depending on the type of T cell, the infected cell is either killed or signals are secreted which activate macrophages and other cells to destroy the infected cell (Paul, supra).
T-lymphocytes originate in the bone marrow or, in the fetus, the liver. Precursor cells migrate via the blood to the thymus, where they are processed to mature into T-lymphocytes. This processing is crucial because positive and negative selection ensures that mature T cells react with foreign antigens but not with self molecules. After processing, T cells continuously circulate in the blood and secondary lymphoid tissues, such as lymph nodes, spleen, and certain epithelium-associated tissues in the gastrointestinal tract, respiratory tract, and skin. When T-lymphocytes are presented with the complementary antigen, they are stimulated to proliferate and release large numbers of activated T cells into the lymph and blood. These activated T cells can survive and circulate for several days. At the same time, memory T cells are created, which remain in the lymphoid tissue for months or years. Upon subsequent exposure to that specific antigen, these memory cells respond more rapidly and more strongly than during the original exposure. This creates an "immunological memory" that can provide immunity for years. There are two major types of T cells: cytotoxic T cells destroy infected host cells, and helper T cells activate other white blood cells via chemical signals. One class of helper cell, TH1, activates macrophages to destroy ingested microorganisms, while another, TH2, stimulates the production of antibodies by B cells.
Cytotoxic T cells directly attack the infected target cell. In virus-infected cells, peptides derived from viral proteins are generated by the proteasome. These peptides are transported into the ER by the transporter associated with antigen processing (TAP) (Pamer, E. and P. Cresswell (1998) Annu. Rev. Immunol. 16:323-358). Once inside the ER, the peptides bind MHC class I chains, and the peptide/MHC I complex is transported to the cell surface. Receptors on the surface of T cells bind to antigen presented by cell-surface MHC molecules. Once activated by binding to antigen, T cells secrete γ-interferon, a signal molecule that induces the expression of genes necessary for presenting viral (or other) antigens to cytotoxic T cells. Cytotoxic T cells kill the infected cell by stimulating programmed cell death.
Helper T cells constitute up to 75% of the total T cell population. They regulate immune function by producing a variety of lymphokines that act on other cells of the immune system and on bone marrow. Among these lymphokines are interleukins 2, 3, 4, 5, and 6, granulocyte-macrophage colony-stimulating factor, and γ-interferon.
Helper T cells are required for most B cells to respond to antigen. When an activated helper cell contacts a B cell, its centrosome and Golgi apparatus become oriented toward the B cell, which helps direct signal molecules, such as the transmembrane protein CD40 ligand, onto the B cell surface, where they interact with the transmembrane protein CD40. Secreted signals also help B cells to proliferate and mature and, in some cases, to switch the class of antibody being produced.
B-lymphocytes (B cells) produce antibodies that react with specific antigenic proteins presented by pathogens. Once activated, B cells become filled with extensive rough endoplasmic reticulum and are known as plasma cells. As with T cells, interaction of B cells with antigen stimulates proliferation of only those B cells that produce antibody specific to that antigen. There are five classes of antibodies, known as immunoglobulins, which together comprise about 20% of total plasma protein. Each class mediates a characteristic biological response after antigen binding. Upon activation by specific antigen, B cells switch from making membrane-bound antibody to secreting that antibody. Antibodies, or immunoglobulins (Ig), are the founding members of the Ig superfamily and the central components of the humoral immune response. Antibodies are either expressed on the surface of B cells or secreted by B cells into the circulation. Antibodies bind and neutralize blood-borne foreign antigens. The prototypical antibody is a tetramer consisting of two identical heavy polypeptide chains (H-chains) and two identical light polypeptide chains (L-chains) interlinked by disulfide bonds. This arrangement confers the characteristic Y-shape on antibody molecules.
Antibodies are classified based on their H-chain composition. The five antibody classes, IgA, IgD, IgE, IgG, and IgM, are defined by the α, δ, ε, γ, and μ H-chain types. There are two types of L-chains, κ and λ, either of which may associate as a pair with any H-chain pair. IgG, the most common class of antibody found in the circulation, is tetrameric, while the other classes of antibodies are generally variants or multimers of this basic structure.
H-chains and L-chains each contain an N-terminal variable region and a C-terminal constant region. Both H-chains and L-chains contain repeated Ig domains. For example, a typical H-chain contains four Ig domains, three of which occur within the constant region and one of which occurs within the variable region and contributes to the formation of the antigen recognition site. Likewise, a typical L-chain contains two Ig domains, one of which occurs within the constant region and one of which occurs within the variable region. In addition, H chains such as μ have been shown to associate with other polypeptides during differentiation of the B cell.
Antibodies can be described in terms of their two main functional domains. Antigen recognition is mediated by the Fab (antigen binding fragment) region of the antibody, while effector functions are mediated by the Fc (crystallizable fragment) region. Binding of antibody to an antigen, such as a bacterium, triggers the destruction of the antigen by phagocytic white blood cells such as macrophages and neutrophils. These cells express surface receptors that specifically bind to the antibody Fc region and allow the phagocytic cells to engulf, ingest, and degrade the antibody-bound antigen. The Fc receptors expressed by phagocytic cells are single-pass transmembrane glycoproteins of about 300 to 400 amino acids (Sears, D.W. et al. (1990) J. Immunol. 144:371-378). The extracellular portion of the Fc receptor typically contains two or three Ig domains.
Diseases that cause an over- or under-abundance of any one type of leukocyte usually involve the entire immune defense system. A well-known immunodeficiency disease is AIDS (acquired immunodeficiency syndrome), in which the number of helper T cells is depleted, leaving the patient susceptible to infection by microorganisms and parasites. Another widespread medical condition attributable to the immune system is allergic reaction to certain antigens. Allergic reactions include hay fever, asthma, anaphylaxis, and urticaria (hives). Leukemias involve excess production of white blood cells, to the point where a major portion of the body's metabolic resources is directed solely at white blood cell proliferation, leaving other tissues to starve. Leukopenia or agranulocytosis occurs when the bone marrow stops producing white blood cells. This leaves the body unprotected against foreign microorganisms, including those that normally inhabit the skin, mucous membranes, and gastrointestinal tract. If all white blood cell production stops completely, infection will occur within two days and death may follow only 1 to 4 days later.
Impaired phagocytosis occurs in several diseases, including monocytic leukemia, systemic lupus, and granulomatous disease. In such cases, macrophages can phagocytize normally, but the engulfed organism is not killed. A defect in the plasma membrane enzyme that converts oxygen to lethally reactive forms results in abscess formation in the liver, lungs, spleen, lymph nodes, and beneath the skin. Eosinophilia is an excess of eosinophils commonly observed in patients with allergies (hay fever, asthma), allergic reactions to drugs, rheumatoid arthritis, and cancers (Hodgkin's disease, lung, and liver cancer) (Isselbacher, K.J. et al. (1994) Harrison's Principles of Internal Medicine, McGraw-Hill, Inc., New York NY).
Host defense is further augmented by the complement system. The complement system serves as an effector system and is involved in infectious agent recognition. It can function as an independent immune network or in conjunction with other humoral immune responses. The complement system is comprised of numerous plasma and membrane proteins that act in a cascade of reaction sequences whereby one component activates the next. The result is a rapid and amplified response to infection through either an inflammatory response or increased phagocytosis.
The complement system has more than 30 protein components, which can be divided into functional groupings including modified serine proteases, membrane-binding proteins, and regulators of complement activation. Activation occurs through two different pathways: the classical and the alternative. Both pathways serve to destroy infectious agents through distinct triggering mechanisms that eventually merge with the involvement of the component C3.
The classical pathway requires antibody binding to infectious agent antigens. The antibodies serve to define the target and initiate the complement system cascade, culminating in the destruction of the infectious agent. In this pathway, since the antibody guides initiation of the process, the complement can be seen as an effector arm of the humoral immune system.
The alternative pathway of the complement system does not require the presence of preexisting antibodies for targeting infectious agent destruction. Rather, this pathway, through low levels of an activated component, remains constantly primed and provides surveillance in the non-immune host to enable targeting and destruction of infectious agents. In this case foreign material triggers the cascade, thereby facilitating phagocytosis or lysis (Paul, supra, pp. 918-919).
Another important component of host defense is the process of inflammation. Inflammatory responses are divided into four categories on the basis of pathology: allergic inflammation, cytotoxic antibody mediated inflammation, immune complex mediated inflammation, and monocyte mediated inflammation. Inflammation usually manifests as a combination of these forms, with one predominating.
Allergic acute inflammation is observed in individuals in whom specific antigens stimulate IgE antibody production. Mast cells and basophils are subsequently activated by the attachment of antigen-IgE complexes, resulting in the release of cytoplasmic granule contents such as histamine. The products of activated mast cells can increase vascular permeability and constrict the smooth muscle of breathing passages, resulting in anaphylaxis or asthma. Acute inflammation is also mediated by cytotoxic antibodies and can result in the destruction of tissue through the binding of complement-fixing antibodies to cells. The responsible antibodies are of the IgG or IgM types. Resultant clinical disorders include autoimmune hemolytic anemia and thrombocytopenia as associated with systemic lupus erythematosus.
Immune complex mediated acute inflammation involves the IgG or IgM antibody types, which combine with antigen to activate the complement cascade. When such immune complexes bind to neutrophils and macrophages, they activate the respiratory burst to form protein- and vessel-damaging agents such as hydrogen peroxide, hydroxyl radical, hypochlorous acid, and chloramines. Clinical manifestations include rheumatoid arthritis and systemic lupus erythematosus.
In chronic inflammation or delayed-type hypersensitivity, macrophages are activated and process antigen for presentation to T cells, which subsequently produce lymphokines and monokines. This type of inflammatory response is likely important for defense against intracellular parasites and certain viruses. Clinical associations include granulomatous disease, tuberculosis, leprosy, and sarcoidosis (Paul, W.E., supra, pp. 1017-1018).
Extracellular Information Transmission Molecules
Intercellular communication is essential for the growth and survival of multicellular organisms, and in particular for the function of the endocrine, nervous, and immune systems. In addition, intercellular communication is critical for developmental processes such as tissue construction and organogenesis, in which cell proliferation, cell differentiation, and morphogenesis must be spatially and temporally regulated in a precise and coordinated manner. Cells communicate with one another through the secretion and uptake of diverse types of signaling molecules such as hormones, growth factors, neuropeptides, and cytokines.
Hormones
Hormones are signaling molecules that coordinately regulate basic physiological processes from embryogenesis throughout adulthood. These processes include metabolism, respiration, reproduction, excretion, fetal tissue differentiation and organogenesis, growth and development, homeostasis, and the stress response. Hormonal secretions and the nervous system are tightly integrated and interdependent. Hormones are secreted by endocrine glands, primarily the hypothalamus and pituitary, the thyroid and parathyroid, the pancreas, the adrenal glands, and the ovaries and testes.
The secretion of hormones into the circulation is tightly controlled. Hormones are often secreted in diurnal, pulsatile, and cyclic patterns. Hormone secretion is regulated by perturbations in blood biochemistry, by other upstream-acting hormones, by neural impulses, and by negative feedback loops. Blood hormone concentrations are constantly monitored and adjusted to maintain optimal, steady-state levels. Once secreted, hormones act only on those target cells that express specific receptors.
Most disorders of the endocrine system are caused by either hyposecretion or hypersecretion of hormones. Hyposecretion often occurs when a hormone's gland of origin is damaged or otherwise impaired. Hypersecretion often results from the proliferation of tumors derived from hormone-secreting cells. Inappropriate hormone levels may also be caused by defects in regulatory feedback loops or in the processing of hormone precursors. Endocrine malfunction may also occur when the target cell fails to respond to the hormone. Hormones can be classified biochemically as polypeptides, steroids, eicosanoids, or amines.
Polypeptides, which include diverse hormones such as insulin and growth hormone, vary in size and function and are often synthesized as inactive precursors that are processed intracellularly into mature, active forms. Amines, which include epinephrine and dopamine, are amino acid derivatives that function in neuroendocrine signaling. Steroids, which include the cholesterol-derived hormones estrogen and testosterone, function in sexual development and reproduction. Eicosanoids, which include prostaglandins and prostacyclins, are fatty acid derivatives that function in a variety of processes. Most polypeptides and some amines are soluble in the circulation, where they are highly susceptible to proteolytic degradation within seconds after their secretion. Steroids and lipids are insoluble and must be transported in the circulation by carrier proteins. The following discussion will focus primarily on polypeptide hormones.
Hormones secreted by the hypothalamus and pituitary gland play a critical role in endocrine function by coordinately regulating hormonal secretions from other endocrine glands in response to neural signals. Hypothalamic hormones include thyrotropin-releasing hormone, gonadotropin-releasing hormone, somatostatin, growth-hormone releasing factor, corticotropin-releasing hormone, substance P, dopamine, and prolactin-releasing hormone. These hormones directly regulate the secretion of hormones from the anterior lobe of the pituitary. Hormones secreted by the anterior pituitary include adrenocorticotropic hormone (ACTH), melanocyte-stimulating hormone, somatotropic hormones such as growth hormone and prolactin, glycoprotein hormones such as thyroid-stimulating hormone, luteinizing hormone (LH), and follicle-stimulating hormone (FSH), β-lipotropin, and β-endorphins. These hormones regulate hormonal secretions from the thyroid, pancreas, and adrenal glands, and act directly on the reproductive organs to stimulate ovulation and spermatogenesis. The posterior pituitary synthesizes and secretes antidiuretic hormone (ADH, vasopressin) and oxytocin.
Disorders of the hypothalamus and pituitary often result from lesions such as primary brain tumors, adenomas, infarction associated with pregnancy, hypophysectomy, aneurysms, vascular malformations, thrombosis, infections, immunological disorders, and complications due to head trauma. Such disorders have profound effects on the function of other endocrine glands. Disorders associated with hypopituitarism include hypogonadism, Sheehan syndrome, diabetes insipidus, Kallmann syndrome, Hand-Schuller-Christian disease, Letterer-Siwe disease, sarcoidosis, empty sella syndrome, and dwarfism. Disorders associated with hyperpituitarism include acromegaly, gigantism, and syndrome of inappropriate ADH secretion (SIADH), often caused by benign adenomas.
Hormones secreted by the thyroid and parathyroid primarily control metabolic rates and serum calcium levels, respectively. Thyroid hormones include calcitonin, somatostatin, and thyroid hormone. The parathyroid secretes parathyroid hormone. Disorders associated with hypothyroidism include goiter, myxedema, acute thyroiditis associated with bacterial infection, subacute thyroiditis associated with viral infection, autoimmune thyroiditis (Hashimoto's disease), and cretinism. Disorders associated with hyperthyroidism include thyrotoxicosis and its various forms, Graves' disease, pretibial myxedema, toxic multinodular goiter, thyroid carcinoma, and Plummer's disease. Disorders associated with hyperparathyroidism include chronic hypercalcemia, leading to bone resorption, and parathyroid hyperplasia.
Hormones secreted by the pancreas regulate blood glucose levels by modulating the rates of carbohydrate, fat, and protein metabolism. Pancreatic hormones include insulin, glucagon, amylin, γ-aminobutyric acid, gastrin, somatostatin, and pancreatic polypeptide. The principal disorder associated with pancreatic dysfunction is diabetes mellitus caused by insufficient insulin activity. Diabetes mellitus is generally classified as either Type I (insulin-dependent, juvenile diabetes) or Type II (non-insulin-dependent, adult diabetes). The treatment of both forms by insulin replacement therapy is well known. Diabetes mellitus often leads to acute complications such as hypoglycemia (insulin shock), coma, diabetic ketoacidosis, and lactic acidosis, and to chronic complications leading to disorders of the eye, kidney, skin, bone, joint, cardiovascular system, and nervous system, and to decreased resistance to infection.
The anatomy, physiology, and diseases related to hormonal function are reviewed in McCance, K.L. and S.E. Huether (1994) Pathophysiology: The Biological Basis for Disease in Adults and Children, Mosby-Year Book, Inc., St. Louis MO; and Greenspan, F.S. and J.D. Baxter (1994) Basic and Clinical Endocrinology, Appleton and Lange, East Norwalk CT.
Growth Factors
Growth factors are secreted proteins that mediate intercellular communication. Unlike hormones, which travel great distances via the circulatory system, most growth factors are primarily local mediators that act on neighboring cells. Most growth factors contain a hydrophobic N-terminal signal peptide sequence that directs the growth factor into the secretory pathway. Most growth factors also undergo post-translational modifications within the secretory pathway. These modifications can include proteolysis, glycosylation, phosphorylation, and intramolecular disulfide bond formation. Once secreted, growth factors bind to specific receptors on the surfaces of neighboring target cells, and the bound receptors trigger intracellular signal transduction pathways. These signal transduction pathways elicit specific cellular responses in the target cells. These responses can include the modulation of gene expression and the stimulation or inhibition of cell division, cell differentiation, and cell motility.
Growth factors fall into at least two broad and overlapping classes. The broadest class includes the large polypeptide growth factors, which are wide-ranging in their effects. These factors include epidermal growth factor (EGF), fibroblast growth factor (FGF), transforming growth factor-β (TGF-β), insulin-like growth factor (IGF), nerve growth factor (NGF), and platelet-derived growth factor (PDGF), each defining a family of numerous related factors. The large polypeptide growth factors, with the exception of NGF, act as mitogens on diverse cell types to stimulate wound healing, bone synthesis and remodeling, extracellular matrix synthesis, and proliferation of epithelial, epidermal, and connective tissues. Members of the TGF-β, EGF, and FGF families also function as inductive signals in the differentiation of embryonic tissue. NGF functions specifically as a neurotrophic factor, promoting neuronal growth and differentiation.
Another class of growth factors includes the hematopoietic growth factors, which are narrow in their target specificity. These factors stimulate the proliferation and differentiation of blood cells such as B-lymphocytes, T-lymphocytes, erythrocytes, platelets, eosinophils, basophils, neutrophils, macrophages, and their stem cell precursors. These factors include the colony-stimulating factors (G-CSF, M-CSF, GM-CSF, and CSF1-3), erythropoietin, and the cytokines. The cytokines are specialized hematopoietic factors secreted by cells of the immune system and are discussed in detail below.
Growth factors play critical roles in neoplastic transformation of cells in vitro and in tumor progression in vivo. Overexpression of the large polypeptide growth factors promotes the proliferation and transformation of cells in culture. Inappropriate expression of these growth factors by tumor cells in vivo may contribute to tumor vascularization and metastasis. Inappropriate activity of hematopoietic growth factors can result in anemias, leukemias, and lymphomas. Moreover, growth factors are both structurally and functionally related to oncoproteins, the potentially cancer-causing products of proto-oncogenes. Certain FGF and PDGF family members are themselves homologous to oncoproteins, whereas receptors for some members of the EGF, NGF, and FGF families are encoded by proto-oncogenes. Growth factors also affect the transcriptional regulation of both proto-oncogenes and oncosuppressor genes (Pimentel, E. (1994) Handbook of Growth Factors, CRC Press, Ann Arbor MI; McKay, I. and I. Leigh, eds. (1993) Growth Factors: A Practical Approach, Oxford University Press, New York NY; Habenicht, A., ed. (1990) Growth Factors, Differentiation Factors, and Cytokines, Springer-Verlag, New York NY).
In addition, some of the large polypeptide growth factors play crucial roles in the induction of the primordial germ layers in the developing embryo. This induction ultimately results in the formation of the embryonic mesoderm, ectoderm, and endoderm, which in turn provide the framework for the entire adult body plan. Disruption of this inductive process would be catastrophic to embryonic development.
Small Peptide Factors - Neuropeptides and Vasomediators
Neuropeptides and vasomediators (NP/VM) comprise a family of small peptide factors, typically of 20 amino acids or less. These factors generally function in neuronal excitation and inhibition, vasoconstriction and vasodilation, muscle contraction, and hormonal secretion from the brain and other endocrine tissues. Included in this family are neuropeptides and neuropeptide hormones such as bombesin, neuropeptide Y, neurotensin, neuromedin N, melanocortins, opioids, galanin, somatostatin, tachykinins, urotensin II and related peptides involved in smooth muscle stimulation, vasopressin, and vasoactive intestinal peptide, as well as circulatory system-borne signaling molecules such as angiotensin, complement, calcitonin, endothelins, formyl-methionyl peptides, glucagon, cholecystokinin, gastrin, and many of the peptide hormones discussed above. NP/VMs can transduce signals directly, modulate the activity or release of other neurotransmitters and hormones, and act as catalytic enzymes in signaling cascades. The effects of NP/VMs range from extremely brief to long-lasting. (Reviewed in Martin, C.R. et al. (1985) Endocrine Physiology, Oxford University Press, New York NY, pp. 57-62.)
Cytokines
Cytokines comprise a family of signaling molecules that modulate the immune system and the inflammatory response. Cytokines are usually secreted by leukocytes, or white blood cells, in response to injury or infection. Cytokines function as growth and differentiation factors that act primarily on cells of the immune system such as B- and T-lymphocytes, monocytes, macrophages, and granulocytes. Like other signaling molecules, cytokines bind to specific plasma membrane receptors and trigger intracellular signal transduction pathways which alter gene expression patterns. There is considerable potential for the use of cytokines in the treatment of inflammation and immune system disorders. Cytokine structure and function have been extensively characterized in vitro. Most cytokines are small polypeptides of about 30 kilodaltons or less. Over 50 cytokines have been identified from human and rodent sources. Examples of cytokine subfamilies include the interferons (IFN-α, -β, and -γ), the interleukins (IL-1 through IL-13), the tumor necrosis factors (TNF-α and -β), and the chemokines. Many cytokines have been produced using recombinant DNA techniques, and the activities of individual cytokines have been determined in vitro. These activities include regulation of leukocyte proliferation, differentiation, and motility.
The activity of an individual cytokine in vitro may not reflect the full scope of that cytokine's activity in vivo. Cytokines are not expressed individually in vivo but are instead expressed in combination with a multitude of other cytokines when the organism is challenged with a stimulus. Together, these cytokines collectively modulate the immune response in a manner appropriate for that particular stimulus. Therefore, the physiological activity of a cytokine is determined by the stimulus itself and by complex interactive networks among co-expressed cytokines, which may demonstrate both synergistic and antagonistic relationships.
Chemokines comprise a cytokine subfamily with over 30 members. (Reviewed in Wells, T.N.C. and M.C. Peitsch (1997) J. Leukoc. Biol. 61:545-550.) Chemokines were initially identified as chemotactic proteins that recruit monocytes and macrophages to sites of inflammation. Recent evidence indicates that chemokines may also play key roles in hematopoiesis and HIV-1 infection. Chemokines are small proteins which range from about 6 to 15 kilodaltons in molecular weight. Chemokines are further classified as C, CC, CXC, or CX3C based on the number and position of critical cysteine residues. The CC chemokines, for example, each contain a conserved motif consisting of two consecutive cysteines followed by two additional cysteines which occur downstream at 24- and 16-residue intervals, respectively (ExPASy PROSITE database, documents PS00472 and PDOC00434). The presence and spacing of these four cysteine residues are highly conserved, whereas the intervening residues diverge significantly. However, a conserved tyrosine located about 15 residues downstream of the cysteine doublet seems to be important for chemotactic activity. Most of the human genes encoding CC chemokines are clustered on chromosome 17, although there are a few examples of CC chemokine genes that map elsewhere. Other chemokines include lymphotactin (C chemokine); macrophage chemotactic and activating factor (MCAF/MCP-1; CC chemokine); platelet factor 4 and IL-8 (CXC chemokines); and fractalkine and neurotactin (CX3C chemokines). (Reviewed in Luster, A.D. (1998) N. Engl. J. Med. 338:436-445.)
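The subfamily rule described above (C, CC, CXC, CX3C) can be illustrated with a minimal Python sketch. It assumes the caller supplies only a short N-terminal segment containing the conserved cysteine motif, and it classifies by the number of residues between the first two cysteines. The function name `classify_chemokine` and the test strings are hypothetical; real subfamily assignment relies on full sequence alignment and motif databases such as PROSITE rather than this simplified rule.

```python
def classify_chemokine(n_terminal: str) -> str:
    """Assign a chemokine subfamily from the spacing of the first two
    cysteines in a short N-terminal segment (simplified, hypothetical rule).

    Assumption: the segment covers only the N-terminal cysteine motif, so a
    lone cysteine indicates the C subfamily (e.g., lymphotactin).
    """
    seq = n_terminal.upper()
    positions = [i for i, residue in enumerate(seq) if residue == "C"]
    if not positions:
        raise ValueError("no cysteine found in segment")
    if len(positions) == 1:
        return "C"
    # Number of residues separating the first two cysteines.
    gap = positions[1] - positions[0] - 1
    return {0: "CC", 1: "CXC", 3: "CX3C"}.get(gap, "unclassified")
```

For example, a segment whose first two cysteines are adjacent is reported as CC, while one with three intervening residues is reported as CX3C.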
Receptor Molecules
The term receptor describes proteins that specifically recognize other molecules. The category is broad and includes proteins with a variety of functions. The bulk of receptors are cell surface proteins which bind extracellular ligands and produce cellular responses in the areas of growth, differentiation, endocytosis, and immune response. Other receptors facilitate the selective transport of proteins out of the endoplasmic reticulum and localize enzymes to particular locations in the cell. The term may also be applied to proteins which act as receptors for ligands with known or unknown chemical composition and which interact with other cellular components. For example, the steroid hormone receptors bind to and regulate transcription of DNA.
Regulation of cell proliferation, differentiation, and migration is important for the formation and function of tissues. Regulatory proteins such as growth factors coordinately control these cellular processes and act as mediators in cell-cell signaling pathways. Growth factors are secreted proteins that bind to specific cell-surface receptors on target cells. The bound receptors trigger intracellular signal transduction pathways which activate various downstream effectors that regulate gene expression, cell division, cell differentiation, cell motility, and other cellular processes.
Cell surface receptors are typically integral plasma membrane proteins. These receptors recognize hormones such as catecholamines; peptide hormones; growth and differentiation factors; small peptide factors such as thyrotropin-releasing hormone, galanin, somatostatin, and tachykinins; and circulatory system-borne signaling molecules. Cell surface receptors on immune system cells recognize antigens, antibodies, and major histocompatibility complex (MHC)-bound peptides. Other cell surface receptors bind ligands to be internalized by the cell. This receptor-mediated endocytosis functions in the uptake of low density lipoproteins (LDL), transferrin, glucose- or mannose-terminal glycoproteins, galactose-terminal glycoproteins, immunoglobulins, phosphovitellogenins, fibrin, proteinase-inhibitor complexes, plasminogen activators, and thrombospondin (Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY, p. 723; Mikhailenko, I. et al. (1997) J. Biol. Chem. 272:6784-6791).
Receptor Protein Kinases
Many growth factor receptors, including receptors for epidermal growth factor, platelet-derived growth factor, and fibroblast growth factor, as well as the growth modulator α-thrombin, contain intrinsic protein kinase activities. When a growth factor binds to the receptor, it triggers the autophosphorylation of a serine, threonine, or tyrosine residue on the receptor. These phosphorylated sites are recognition sites for the binding of other cytoplasmic signaling proteins. These proteins participate in signaling pathways that eventually link the initial receptor activation at the cell surface to the activation of a specific intracellular target molecule. In the case of tyrosine residue autophosphorylation, these signaling proteins contain a common domain referred to as a Src homology (SH) domain. SH2 domains and SH3 domains are found in phospholipase C-γ, PI-3-K p85 regulatory subunit, Ras-GTPase activating protein, and pp60c-src (Lowenstein, E.J. et al. (1992) Cell 70:431-442). The cytokine family of receptors shares a different common binding domain and includes transmembrane receptors for growth hormone (GH), interleukins, erythropoietin, and prolactin. Other receptors and second messenger-binding proteins have intrinsic serine/threonine protein kinase activity. These include activin/TGF-β/BMP-superfamily receptors, calcium- and diacylglycerol-activated/phospholipid-dependent protein kinase (PK-C), and RNA-dependent protein kinase (PK-R). In addition, other serine/threonine protein kinases, including nematode Twitchin, have fibronectin-like and immunoglobulin C2-like domains.
G-Protein Coupled Receptors
G-protein coupled receptors (GPCRs) are integral membrane proteins characterized by the presence of seven hydrophobic transmembrane domains which span the plasma membrane and form a bundle of antiparallel alpha (α) helices. These proteins range in size from under 400 to over 1000 amino acids (Strosberg, A.D. (1991) Eur. J. Biochem. 196:1-10; Coughlin, S.R. (1994) Curr. Opin. Cell Biol. 6:191-197). The amino-terminus of the GPCR is extracellular, of variable length, and often glycosylated; the carboxy-terminus is cytoplasmic and generally phosphorylated. Extracellular loops of the GPCR alternate with intracellular loops and link the transmembrane domains. The most conserved domains of GPCRs are the transmembrane domains and the first two cytoplasmic loops. The transmembrane domains account for structural and functional features of the receptor. In most cases, the bundle of α helices forms a binding pocket. In addition, the extracellular N-terminal segment or one or more of the three extracellular loops may also participate in ligand binding. Ligand binding activates the receptor by inducing a conformational change in intracellular portions of the receptor. The activated receptor, in turn, interacts with an intracellular heterotrimeric guanine nucleotide binding (G) protein complex which mediates further intracellular signaling activities, generally the production of second messengers such as cyclic AMP (cAMP) and inositol triphosphate (through effector enzymes such as adenylyl cyclase and phospholipase C), or interactions with ion channel proteins (Baldwin, J.M. (1994) Curr. Opin. Cell Biol. 6:180-190).
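The alternating topology described above (extracellular N-terminus, seven transmembrane helices, alternating extracellular and intracellular loops, cytoplasmic C-terminus) follows from simple counting, sketched below in Python. The helper `membrane_topology` and the `Topology` record are hypothetical; the sketch assumes every transmembrane segment crosses the membrane exactly once, so each segment flips the side on which the polypeptide chain continues.

```python
from dataclasses import dataclass


@dataclass
class Topology:
    n_terminus: str
    c_terminus: str
    extracellular_loops: int
    intracellular_loops: int


def membrane_topology(tm_segments: int, n_terminus_outside: bool = True) -> Topology:
    """Derive terminus sides and loop counts for a multi-pass membrane protein,
    assuming each transmembrane segment crosses the membrane once."""
    if tm_segments < 1:
        raise ValueError("need at least one transmembrane segment")
    loops = tm_segments - 1
    # Loops alternate sides; odd-numbered loops lie opposite the N-terminus.
    opposite = (loops + 1) // 2
    same = loops // 2
    n_side = "extracellular" if n_terminus_outside else "cytoplasmic"
    other_side = "cytoplasmic" if n_terminus_outside else "extracellular"
    # An odd number of crossings leaves the C-terminus on the opposite side.
    c_side = other_side if tm_segments % 2 == 1 else n_side
    ecl = same if n_terminus_outside else opposite
    icl = opposite if n_terminus_outside else same
    return Topology(n_side, c_side, ecl, icl)
```

Calling `membrane_topology(7)` yields an extracellular N-terminus, a cytoplasmic C-terminus, and three loops on each side, matching the seven-transmembrane arrangement described for GPCRs.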
GPCRs include those for acetylcholine, adenosine, epinephrine and norepinephrine, bombesin, bradykinin, chemokines, dopamine, endothelin, γ-aminobutyric acid (GABA), follicle-stimulating hormone (FSH), glutamate, gonadotropin-releasing hormone (GnRH), hepatocyte growth factor, histamine, leukotrienes, melanocortins, neuropeptide Y, opioid peptides, opsins, prostanoids, serotonin, somatostatin, tachykinins, thrombin, thyrotropin-releasing hormone (TRH), the vasoactive intestinal polypeptide family, vasopressin and oxytocin, as well as orphan receptors. GPCR mutations, which may cause loss of function or constitutive activation, have been associated with numerous human diseases (Coughlin, supra). For instance, retinitis pigmentosa may arise from mutations in the rhodopsin gene. Rhodopsin is the light-sensitive receptor located within the discs of the retinal rod cell. Parma, J. et al. (1993, Nature 365:649-651) report that somatic activating mutations in the thyrotropin receptor cause hyperfunctioning thyroid adenomas and suggest that certain GPCRs susceptible to constitutive activation may behave as proto-oncogenes.
Nuclear Receptors
Nuclear receptors bind small molecules such as hormones or second messengers, which increases the receptor's binding affinity for specific chromosomal DNA elements. In addition, the affinity for other nuclear proteins may also be altered. Such DNA binding and protein-protein interactions may regulate and modulate gene expression. Examples of such receptors include the steroid hormone receptor family, the retinoic acid receptor family, and the thyroid hormone receptor family.
Ligand-Gated Receptor Ion Channels
Ligand-gated receptor ion channels fall into two categories. The first category, extracellular ligand-gated receptor ion channels (ELGs), rapidly transduce neurotransmitter-binding events into electrical signals, such as fast synaptic neurotransmission. ELG function is regulated by post-translational modification. The second category, intracellular ligand-gated receptor ion channels (ILGs), are activated by many intracellular second messengers and do not require post-translational modification(s) to effect a channel-opening response.
ELGs depolarize excitable cells to the threshold of action potential generation. In non-excitable cells, ELGs permit a limited calcium ion influx during the presence of agonist. ELGs include channels directly gated by neurotransmitters such as acetylcholine, L-glutamate, glycine, ATP, serotonin, GABA, and histamine. ELG genes encode proteins having strong structural and functional similarities. ILGs are encoded by distinct and unrelated gene families and include receptors for cAMP, cGMP, calcium ions, ATP, and metabolites of arachidonic acid.
Macrophage Scavenger Receptors
Macrophage scavenger receptors with broad ligand specificity may participate in the binding of low density lipoproteins (LDL) and foreign antigens. Scavenger receptor types I and II are trimeric membrane proteins with each subunit containing a small N-terminal intracellular domain, a transmembrane domain, a large extracellular domain, and a C-terminal cysteine-rich domain. The extracellular domain contains a short spacer domain, an α-helical coiled-coil domain, and a triple helical collagenous domain. These receptors have been shown to bind a spectrum of ligands, including chemically modified lipoproteins and albumin, polyribonucleotides, polysaccharides, phospholipids, and asbestos (Matsumoto, A. et al. (1990) Proc. Natl. Acad. Sci. USA 87:9133-9137; Elomaa, O. et al. (1995) Cell 80:603-609). The scavenger receptors are thought to play a key role in atherogenesis by mediating uptake of modified LDL in arterial walls, and in host defense by binding bacterial endotoxins, bacteria, and protozoa.
T-Cell Receptors
T cells play a dual role in the immune system as effectors and regulators, coupling antigen recognition with the transmission of signals that induce cell death in infected cells and stimulate proliferation of other immune cells. Although a population of T cells can recognize a wide range of different antigens, an individual T cell can only recognize a single antigen, and only when it is presented to the T cell receptor (TCR) as a peptide complexed with a major histocompatibility complex (MHC) molecule on the surface of an antigen presenting cell. The TCR on most T cells consists of immunoglobulin-like integral membrane glycoproteins containing two polypeptide subunits, α and β, of similar molecular weight. Both TCR subunits have an extracellular domain containing both variable and constant regions, a transmembrane domain that traverses the membrane once, and a short intracellular domain (Saito, H. et al. (1984) Nature 309:757-762). The genes for the TCR subunits are constructed through somatic rearrangement of different gene segments. Interaction of antigen in the proper MHC context with the TCR initiates signaling cascades that induce the proliferation, maturation, and function of cellular components of the immune system (Weiss, A. (1991) Annu. Rev. Genet. 25:487-510). Rearrangements in TCR genes and alterations in TCR expression have been noted in lymphomas, leukemias, autoimmune disorders, and immunodeficiency disorders (Aisenberg, A.C. et al. (1985) N. Engl. J. Med. 313:529-533; Weiss, supra).
Intracellular Signaling Molecules
Intracellular signaling is the general process by which cells respond to extracellular signals (hormones, neurotransmitters, growth and differentiation factors, etc.) through a cascade of biochemical reactions that begins with the binding of a signaling molecule to a cell membrane receptor and ends with the activation of an intracellular target molecule. Intermediate steps in the process involve the activation of various cytoplasmic proteins by phosphorylation via protein kinases, their deactivation by protein phosphatases, and the eventual translocation of some of these activated proteins to the cell nucleus, where the transcription of specific genes is triggered. The intracellular signaling process regulates all types of cell functions, including cell proliferation, cell differentiation, and gene transcription, and involves a diversity of molecules including protein kinases and phosphatases, and second messenger molecules, such as cyclic nucleotides, calcium-calmodulin, inositol, and various mitogens, that regulate protein phosphorylation.
Protein Phosphorylation
Protein kinases and phosphatases play a key role in the intracellular signaling process by controlling the phosphorylation and activation of various signaling proteins. The high energy phosphate for this reaction is generally transferred from the adenosine triphosphate molecule (ATP) to a particular protein by a protein kinase and removed from that protein by a protein phosphatase. Protein kinases are roughly divided into two groups: those that phosphorylate tyrosine residues (protein tyrosine kinases, PTK) and those that phosphorylate serine or threonine residues (serine/threonine kinases, STK). A few protein kinases have dual specificity for serine/threonine and tyrosine residues. Almost all kinases contain a conserved 250-300 amino acid catalytic domain containing specific residues and sequence motifs characteristic of the kinase family (Hardie, G. and S. Hanks (1995) The Protein Kinase FactsBook, Vol. 1:7-20, Academic Press, San Diego CA). STKs include the second messenger dependent protein kinases such as the cyclic-AMP dependent protein kinases (PKA), involved in mediating hormone-induced cellular responses; calcium-calmodulin (CaM) dependent protein kinases, involved in regulation of smooth muscle contraction, glycogen breakdown, and neurotransmission; and the mitogen-activated protein kinases (MAP), which mediate signal transduction from the cell surface to the nucleus via phosphorylation cascades. Altered PKA expression is implicated in a variety of disorders and diseases including cancer, thyroid disorders, diabetes, atherosclerosis, and cardiovascular disease (Isselbacher, K.J. et al. (1994) Harrison's Principles of Internal Medicine, McGraw-Hill, New York NY, pp. 416-431, 1887).
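The opposing kinase and phosphatase reactions described above can be captured in a toy mass-action model. This sketch assumes, hypothetically, that the kinase acts on the unphosphorylated pool and the phosphatase on the phosphorylated pool with simple first-order rate constants, so the phosphorylated fraction p obeys dp/dt = k_kinase * (1 - p) - k_phosphatase * p. The function names are illustrative, and real signaling enzymes show saturation, cooperativity, and localization effects that this model ignores.

```python
def steady_state_phospho_fraction(k_kinase: float, k_phosphatase: float) -> float:
    """Steady-state phosphorylated fraction from setting
    dp/dt = k_kinase * (1 - p) - k_phosphatase * p to zero."""
    if k_kinase < 0 or k_phosphatase < 0 or k_kinase + k_phosphatase == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return k_kinase / (k_kinase + k_phosphatase)


def simulate(k_kinase: float, k_phosphatase: float,
             p0: float = 0.0, dt: float = 0.01, steps: int = 5000) -> float:
    """Integrate the same first-order model with explicit Euler steps and
    return the phosphorylated fraction after steps * dt time units."""
    p = p0
    for _ in range(steps):
        p += dt * (k_kinase * (1.0 - p) - k_phosphatase * p)
    return p
```

In this model, raising the phosphatase rate relative to the kinase rate lowers the steady-state phosphorylated fraction, which is the balance the paragraph describes; the Euler simulation simply confirms convergence to the closed-form value.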
PTKs are divided into transmembrane, receptor PTKs and nontransmembrane, non-receptor PTKs. Transmembrane PTKs are receptors for most growth factors. Non-receptor PTKs lack transmembrane regions and, instead, form complexes with the intracellular regions of cell surface receptors. Receptors that function through non-receptor PTKs include those for cytokines and hormones (growth hormone and prolactin) and antigen-specific receptors on T and B lymphocytes. Many of these PTKs were first identified as the products of mutant oncogenes in cancer cells in which their activation was no longer subject to normal cellular controls. In fact, about one third of the known oncogenes encode PTKs, and it is well known that cellular transformation (oncogenesis) is often accompanied by increased tyrosine phosphorylation activity (Charbonneau, H. and N.K. Tonks (1992) Annu. Rev. Cell Biol. 8:463-493).
An additional family of protein kinases previously thought to exist only in prokaryotes is the histidine protein kinase family (HPK). HPKs bear little homology with mammalian STKs or PTKs but have distinctive sequence motifs of their own (Davie, J.R. et al. (1995) J. Biol. Chem. 270:19861-19867). A histidine residue in the N-terminal half of the molecule (region I) is an autophosphorylation site. Three additional motifs located in the C-terminal half of the molecule include an invariant asparagine residue in region II and two glycine-rich loops characteristic of nucleotide binding domains in regions III and IV. Recently, a branched-chain α-ketoacid dehydrogenase kinase with characteristics of HPKs has been found in rat (Davie, supra).
Protein phosphatases regulate the effects of protein kinases by removing phosphate groups from molecules previously activated by kinases. The two principal categories of protein phosphatases are the protein (serine/threonine) phosphatases (PPs) and the protein tyrosine phosphatases (PTPs). PPs dephosphorylate phosphoserine/threonine residues and are important regulators of many cAMP-mediated hormone responses (Cohen, P. (1989) Annu. Rev. Biochem. 58:453-508). PTPs reverse the effects of protein tyrosine kinases and play a significant role in cell cycle and cell signaling processes (Charbonneau, supra). As previously noted, many PTKs are encoded by oncogenes, and oncogenesis is often accompanied by increased tyrosine phosphorylation activity. It is therefore possible that PTPs may prevent or reverse cell transformation and the growth of various cancers by controlling the levels of tyrosine phosphorylation in cells. This hypothesis is supported by studies showing that overexpression of PTPs can suppress transformation in cells, and that specific inhibition of PTPs can enhance cell transformation (Charbonneau, supra).
Phospholipid and Inositol-Phosphate Signaling
Inositol phospholipids (phosphoinositides) are involved in an intracellular signaling pathway that begins with binding of a signaling molecule to a G-protein-linked receptor in the plasma membrane. This leads to the phosphorylation of phosphatidylinositol (PI) residues on the inner side of the plasma membrane to the bisphosphate state (PIP2) by inositol kinases. Simultaneously, binding to the G-protein-linked receptor stimulates a trimeric G-protein, which in turn activates a phosphoinositide-specific phospholipase C-β. Phospholipase C-β then cleaves PIP2 into two products, inositol triphosphate (IP3) and diacylglycerol, which mediate separate signaling events. IP3 diffuses through the cytosol to induce calcium release from the endoplasmic reticulum (ER), while diacylglycerol remains in the membrane and helps activate protein kinase C, an STK that phosphorylates selected proteins in the target cell. The calcium response initiated by IP3 is terminated by the dephosphorylation of IP3 by specific inositol phosphatases. Cellular responses mediated by this pathway include glycogen breakdown in the liver in response to vasopressin, smooth muscle contraction in response to acetylcholine, and thrombin-induced platelet aggregation.
Cyclic Nucleotide Signaling
Cyclic nucleotides (cAMP and cGMP) function as intracellular second messengers to transduce a variety of extracellular signals, including hormones, light, and neurotransmitters. In particular, cyclic-AMP dependent protein kinases (PKA) are thought to account for all of the effects of cAMP in most mammalian cells, including various hormone-induced cellular responses. Visual excitation and the phototransmission of light signals in the eye are controlled by cyclic-GMP regulated, Ca2+-specific channels. Because cellular levels of cyclic nucleotides mediate these various responses, their synthesis and breakdown must be tightly regulated. Thus adenylyl cyclase, which synthesizes cAMP from ATP, is activated to increase cAMP levels in muscle by binding of adrenaline to β-adrenergic receptors, while activation of guanylate cyclase and increased cGMP levels in photoreceptors lead to reopening of the Ca2+-specific channels and recovery of the dark state in the eye. In contrast, hydrolysis of cyclic nucleotides by cAMP- and cGMP-specific phosphodiesterases (PDEs) reverses these and other effects mediated by increased cyclic nucleotide levels. Given the diversity of this protein family, PDEs appear to be particularly important in the regulation of cyclic nucleotides. At least seven families of mammalian PDEs (PDE1-7) have been identified based on substrate specificity and affinity, sensitivity to cofactors, and sensitivity to inhibitory drugs (Beavo, J.A. (1995) Physiological Reviews 75:725-748). PDE inhibitors have been found to be particularly useful in treating various clinical disorders. Rolipram, a specific inhibitor of PDE4, has been used in the treatment of depression, and similar inhibitors are undergoing evaluation as anti-inflammatory agents. Theophylline is a nonspecific PDE inhibitor used in the treatment of bronchial asthma and other respiratory diseases (Banner, K.H. and C.P. Page (1995) Eur. Respir. J. 8:996-1000).
G-Protein Signaling
Guanine nucleotide binding proteins (G-proteins) are critical mediators of signal transduction between a particular class of extracellular receptors, the G-protein coupled receptors (GPCRs), and intracellular second messengers such as cAMP and Ca2+. G-proteins are linked to the cytosolic side of a GPCR such that activation of the GPCR by ligand binding stimulates the G-protein to exchange bound GDP for GTP, inducing an "active" state in the G-protein. In the active state, the G-protein acts as a signal to trigger other events in the cell, such as the increase of cAMP levels or the release of Ca2+ into the cytosol from the ER, which in turn regulate phosphorylation and activation of other intracellular proteins. Recycling of the G-protein to the inactive state involves hydrolysis of the bound GTP to GDP by a GTPase activity in the G-protein. (See Alberts, B. et al. (1994) Molecular Biology of the Cell, Garland Publishing, Inc., New York NY, pp. 734-759.) Two structurally distinct classes of G-proteins are recognized: heterotrimeric G-proteins, consisting of three different subunits, and monomeric, low molecular weight (LMW) G-proteins, consisting of a single polypeptide chain.
The three polypeptide subunits of heterotrimeric G-proteins are the α, β, and γ subunits. The α subunit binds and hydrolyzes GTP. The β and γ subunits form a tight complex that anchors the protein to the inner side of the plasma membrane. The β subunits, also known as G-β proteins or β transducins, contain seven tandem repeats of the WD-repeat sequence motif, a motif found in many proteins with regulatory functions. Mutations and variant expression of β transducin proteins are linked with various disorders (Neer, E.J. et al. (1994) Nature 371:297-300; Margottin, F. et al. (1998) Mol. Cell 1:565-574).
LMW G-proteins are GTPases which regulate cell growth, cell cycle control, protein secretion, and intracellular vesicle interaction. They consist of single polypeptides which, like the α subunit of the heterotrimeric G-proteins, are able to bind and hydrolyze GTP, thus cycling between an inactive and an active state. At least sixty members of the LMW G-protein superfamily have been identified and are currently grouped into the six subfamilies of ras, rho, arf, sar1, ran, and rab. Activated ras genes were initially found in human cancers, and subsequent studies confirmed that ras function is critical in determining whether cells continue to grow or become differentiated. Other members of the LMW G-protein superfamily have roles in signal transduction that vary with the function of the activated genes and the locations of the G-proteins.
Accessory proteins regulate the activities of LMW G-proteins by determining whether GTP or GDP is bound. GTPase-activating protein (GAP) binds to GTP-ras and induces it to hydrolyze GTP to GDP. In contrast, guanine nucleotide releasing protein (GNRP), a guanine nucleotide exchange factor, binds to GDP-ras and induces the release of GDP and the binding of GTP.
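The GAP/GNRP switch just described can be sketched as a two-state object in Python. The class and method names are hypothetical, and the model deliberately ignores reaction rates, the G-protein's intrinsic GTPase activity, and downstream binding partners; it only encodes which regulator moves ras between the GDP-bound (inactive) and GTP-bound (active) states.

```python
from enum import Enum


class Nucleotide(Enum):
    GDP = "GDP"  # inactive state
    GTP = "GTP"  # active state


class SmallGTPase:
    """Toy two-state model of an LMW G-protein such as ras."""

    def __init__(self) -> None:
        self.bound = Nucleotide.GDP  # start in the inactive, GDP-bound state

    @property
    def active(self) -> bool:
        return self.bound is Nucleotide.GTP

    def gnrp_exchange(self) -> None:
        """GNRP action: release GDP and load GTP (GDP-ras -> GTP-ras)."""
        if self.bound is Nucleotide.GDP:
            self.bound = Nucleotide.GTP

    def gap_hydrolysis(self) -> None:
        """GAP-stimulated hydrolysis of GTP to GDP (GTP-ras -> GDP-ras)."""
        if self.bound is Nucleotide.GTP:
            self.bound = Nucleotide.GDP
```

Each regulator acts only on the state it recognizes: GNRP has no effect on GTP-bound ras, and GAP has no effect on GDP-bound ras, which keeps the switch cycling in one direction.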
Other regulators of G-protein signaling (RGS) also exist that act primarily by negatively regulating the G-protein pathway by an unknown mechanism (Druey, K.M. et al. (1996) Nature 379:742-746). Some 15 members of the RGS family have been identified. RGS family members are related structurally through similarities in an approximately 120 amino acid region termed the RGS domain, and functionally by their ability to inhibit the interleukin (cytokine) induction of MAP kinase in cultured mammalian 293T cells (Druey, supra).
Calcium Signaling Molecules
Ca2+ is another second messenger molecule that is even more widely used as an intracellular mediator than cAMP. Two pathways exist by which Ca2+ can enter the cytosol in response to extracellular signals. One pathway acts primarily in nerve signal transduction, where Ca2+ enters a nerve terminal through a voltage-gated Ca2+ channel. The second is a more ubiquitous pathway in which Ca2+ is released from the ER into the cytosol in response to binding of an extracellular signaling molecule to a receptor. Ca2+ directly activates regulatory enzymes, such as protein kinase C, which trigger signal transduction pathways. Ca2+ also binds to specific Ca2+-binding proteins (CBPs) such as calmodulin (CaM), which then activate multiple target proteins in the cell, including enzymes, membrane transport pumps, and ion channels. CaM interactions are involved in a multitude of cellular processes including, but not limited to, gene regulation, DNA synthesis, cell cycle progression, mitosis, cytokinesis, cytoskeletal organization, muscle contraction, signal transduction, ion homeostasis, exocytosis, and metabolic regulation (Celio, M.R. et al. (1996) Guidebook to Calcium-binding Proteins, Oxford University Press, Oxford, UK, pp. 15-20). Some CBPs can serve as a storage depot for Ca2+ in an inactive state. Calsequestrin is one such CBP that is expressed in isoforms specific to cardiac muscle and skeletal muscle. It is suggested that calsequestrin binds Ca2+ in a rapidly exchangeable state that is released during Ca2+-signaling conditions (Celio, M.R. et al. (1996) Guidebook to Calcium-binding Proteins, Oxford University Press, New York NY, pp. 222-224).
Cyclins
Cell division is the fundamental process by which all living things grow and reproduce. In most organisms, the cell cycle consists of three principal steps: interphase, mitosis, and cytokinesis. Interphase involves preparations for cell division, replication of the DNA, and production of essential proteins. In mitosis, the nuclear material is divided and separates to opposite sides of the cell. Cytokinesis is the final division and fission of the cell cytoplasm to produce the daughter cells.
The entry and exit of a cell from mitosis is regulated by the synthesis and destruction of a family of activating proteins called cyclins. Cyclins act by binding to and activating a group of cyclin-dependent protein kinases (Cdks), which then phosphorylate and activate selected proteins involved in the mitotic process. Several types of cyclins exist (Ciechanover, A. (1994) Cell 79:13-21). The two principal types are mitotic cyclin, or cyclin B, which controls entry of the cell into mitosis, and G1 cyclin, which controls progression of the cell through G1 toward DNA replication.
Signal Complex Scaffolding Proteins
Certain proteins in intracellular signaling pathways serve to link or cluster other proteins involved in the signaling cascade. A conserved protein domain called the PDZ domain has been identified in various membrane-associated signaling proteins. This domain has been implicated in receptor and ion channel clustering and in the targeting of multiprotein signaling complexes to specialized functional regions of the cytosolic face of the plasma membrane. (For a review of PDZ domain-containing proteins, see Ponting, C.P. et al. (1997) Bioessays 19:469-479.) A large proportion of PDZ domains are found in the eukaryotic MAGUK (membrane-associated guanylate kinase) protein family, members of which bind to the intracellular domains of receptors and channels. However, PDZ domains are also found in diverse membrane-localized proteins such as protein tyrosine phosphatases, serine/threonine kinases, G-protein cofactors, and synapse-associated proteins such as syntrophins and neuronal nitric oxide synthase (nNOS). Generally, about one to three PDZ domains are found in a given protein, although up to nine PDZ domains have been identified in a single protein.
Membrane Transport Molecules
The plasma membrane acts as a barrier to most molecules. Transport between the cytoplasm and the extracellular environment, and between the cytoplasm and the lumenal spaces of cellular organelles, requires specific transport proteins. Each transport protein carries a particular class of molecule, such as ions, sugars, or amino acids, and is often specific to a certain molecular species of the class. A variety of human inherited diseases are caused by mutations in transport proteins. For example, cystinuria is an inherited disease that results from the inability to transport cystine, the disulfide-linked dimer of cysteine, from the urine into the blood. Accumulation of cystine in the urine leads to the formation of cystine stones in the kidneys.
Transport proteins are multi-pass transmembrane proteins that either actively transport molecules across the membrane or passively allow them to cross. Active transport involves directional pumping of a solute across the membrane, usually against an electrochemical gradient, and is tightly coupled to a source of metabolic energy, such as ATP hydrolysis or an electrochemically favorable ion gradient. Passive transport involves the movement of a solute down its electrochemical gradient. Transport proteins can be further classified as either carrier proteins or channel proteins. Carrier proteins, which can function in active or passive transport, bind a specific solute and undergo a conformational change that transfers the bound solute across the membrane. Channel proteins, which function only in passive transport, form hydrophilic pores across the membrane. When the pores open, specific solutes, such as inorganic ions, pass through the membrane down their electrochemical gradients.
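The distinction between passive and active transport drawn above can be made concrete with the free-energy change for moving one mole of an ion into the cell, ΔG = RT ln([X]in/[X]out) + zFVm, where Vm is the membrane potential (inside relative to outside). The Python sketch below is a minimal illustration, not a method from this document: the function names are hypothetical, and the ion concentrations and the -70 mV membrane potential in the usage lines are assumed illustrative values.

```python
import math

R = 8.314     # gas constant, J/(mol*K)
F = 96485.0   # Faraday constant, C/mol


def delta_g_inward(c_in_mM, c_out_mM, z, v_m_volts, temp_k=310.0):
    """Free-energy change (J/mol) for moving one mole of an ion from outside to inside.

    c_in_mM / c_out_mM: concentrations on each side (any consistent unit).
    z: charge number of the ion (e.g. +1 for Na+).
    v_m_volts: membrane potential, inside relative to outside.
    """
    chemical_term = R * temp_k * math.log(c_in_mM / c_out_mM)
    electrical_term = z * F * v_m_volts
    return chemical_term + electrical_term


def transport_mode(dg):
    """Negative dG: inward flow is downhill, so passive transport suffices.
    Zero or positive dG: net inward flow must be coupled to an energy source."""
    return "passive" if dg < 0 else "active"


# Assumed illustrative values: Na+ 12 mM inside / 145 mM outside,
# K+ 140 mM inside / 4 mM outside, membrane potential -70 mV.
print(transport_mode(delta_g_inward(12.0, 145.0, +1, -0.070)))   # passive (Na+ entry)
print(transport_mode(delta_g_inward(140.0, 4.0, +1, -0.070)))    # active (K+ entry)
```

With these assumed values, Na+ entry is strongly downhill while K+ entry is uphill, which is why K+ uptake requires a pump such as the Na+/K+ ATPase.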
Carrier proteins that transport a single solute from one side of the membrane to the other are called uniporters. In contrast, coupled transporters link the transfer of one solute with simultaneous or sequential transfer of a second solute, either in the same direction (symport) or in the opposite direction (antiport). For example, intestinal and kidney epithelia contain a variety of symporter systems driven by the sodium gradient that exists across the plasma membrane. Sodium moves into the cell down its electrochemical gradient and brings the solute into the cell with it. The sodium gradient that provides the driving force for solute uptake is maintained by the ubiquitous Na+/K+ ATPase. Sodium-coupled transporters include the mammalian glucose transporter (SGLT1), iodide transporter (NIS), and multivitamin transporter (SMVT). All three transporters have twelve putative transmembrane segments, extracellular glycosylation sites, and cytoplasmically oriented N- and C-termini. NIS plays a crucial role in the evaluation, diagnosis, and treatment of various thyroid pathologies because it is the molecular basis for radioiodide thyroid-imaging techniques and for specific targeting of radioisotopes to the thyroid gland (Levy, O. et al. (1997) Proc. Natl. Acad. Sci. USA 94:5568-5573). SMVT is expressed in the intestinal mucosa, kidney, and placenta, and is implicated in the transport of water-soluble vitamins such as biotin and pantothenate (Prasad, P.D. et al. (1998) J. Biol. Chem. 273:7501-7506).
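How far a sodium gradient can drive a co-transported solute uphill follows from energy balance: at equilibrium, the free energy released by the Na+ ions entering with each solute molecule equals the free energy needed to sustain the solute's concentration ratio. The sketch below assumes an uncharged solute and a hypothetical coupling stoichiometry `n_na` (Na+ ions per solute molecule); the concentrations and membrane potential are assumed illustrative values, and the function names are hypothetical.

```python
import math

R = 8.314     # gas constant, J/(mol*K)
F = 96485.0   # Faraday constant, C/mol


def na_entry_free_energy(na_in_mM, na_out_mM, v_m_volts, temp_k=310.0):
    """Free-energy change (J/mol) when one mole of Na+ enters the cell.
    Negative values mean entry releases energy that a symporter can harness."""
    return R * temp_k * math.log(na_in_mM / na_out_mM) + F * v_m_volts


def max_accumulation_ratio(n_na, na_in_mM, na_out_mM, v_m_volts, temp_k=310.0):
    """Largest [S]in/[S]out an uncharged solute S can reach when each S is
    co-transported with n_na Na+ ions (thermodynamic equilibrium):
    RT ln([S]in/[S]out) + n_na * dG_Na = 0."""
    dg_na = na_entry_free_energy(na_in_mM, na_out_mM, v_m_volts, temp_k)
    return math.exp(-n_na * dg_na / (R * temp_k))


# Assumed values: 12 mM Na+ inside, 145 mM outside, membrane potential -70 mV.
for n in (1, 2):
    print(n, round(max_accumulation_ratio(n, 12.0, 145.0, -0.070)))
```

Under these assumptions, coupling one Na+ per solute supports roughly a 170-fold accumulation, and doubling the stoichiometry squares that ratio, which illustrates why tighter coupling permits steeper uphill gradients.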
Transporters play a major role in the regulation of pH, the excretion of drugs, and the cellular K+/Na+ balance. Monocarboxylate anion transporters are proton-coupled symporters with a broad substrate specificity that includes L-lactate, pyruvate, acetate, and the ketone bodies acetoacetate and beta-hydroxybutyrate. At least seven isoforms have been identified to date. The isoforms are predicted to have twelve transmembrane (TM) helical domains with a large intracellular loop between TM6 and TM7, and they play a critical role in maintaining intracellular pH by removing the protons that are produced stoichiometrically with lactate during glycolysis. The best characterized H(+)-monocarboxylate transporter is that of the erythrocyte membrane, which transports L-lactate and a wide range of other aliphatic monocarboxylates. Other cells possess H(+)-linked monocarboxylate transporters with differing substrate and inhibitor selectivities. In particular, cardiac muscle and tumor cells have transporters that differ in their Km values for certain substrates, including stereoselectivity for L- over D-lactate, and in their sensitivity to inhibitors. Na(+)-monocarboxylate cotransporters on the luminal surface of intestinal and kidney epithelia allow the uptake of lactate, pyruvate, and ketone bodies in these tissues. In addition, there are specific and selective transporters for organic cations and organic anions in organs including the kidney, intestine, and liver. Organic anion transporters are selective for hydrophobic, charged molecules with electron-attracting side groups. Organic cation transporters, such as the ammonium transporter, mediate the secretion of a variety of drugs and endogenous metabolites, and contribute to the maintenance of intercellular pH. (Poole, R.C. and A.P. Halestrap (1993) Am. J. Physiol. 264:C761-C782; Price, N.T. et al. (1998) Biochem. J. 329:321-328; and Martinelle, K. and I. Haggstrom (1993) J. Biotechnol. 30:339-350.)
The largest and most diverse known family of transport proteins is the ATP-binding cassette (ABC) transporters. As a family, ABC transporters can transport substances that differ markedly in chemical structure and size, ranging from small molecules such as ions, sugars, amino acids, peptides, and phospholipids, to lipopeptides, large proteins, and complex hydrophobic drugs. ABC proteins consist of four modules: two nucleotide-binding domains (NBDs), which hydrolyze ATP to supply the energy required for transport, and two membrane-spanning domains (MSDs), each containing six putative transmembrane segments. These four modules may be encoded by a single gene, as is the case for the cystic fibrosis transmembrane conductance regulator (CFTR), or by separate genes. When encoded by separate genes, each gene product contains a single NBD and MSD. These "half-molecules" form homo- and heterodimers, such as TAP1 and TAP2, which together form the endoplasmic reticulum-based major histocompatibility complex (MHC) peptide transport system. Several genetic diseases are attributed to defects in ABC transporters, including the following diseases and their corresponding proteins: cystic fibrosis (CFTR, an ion channel), adrenoleukodystrophy (adrenoleukodystrophy protein, ALDP), Zellweger syndrome (peroxisomal membrane protein-70, PMP70), and hyperinsulinemic hypoglycemia (sulfonylurea receptor, SUR). Overexpression of the multidrug resistance (MDR) protein, another ABC transporter, in human cancer cells makes the cells resistant to a variety of cytotoxic drugs used in chemotherapy (Taglicht, D. and S. Michaelis (1998) Meth. Enzymol. 292:131-163).
Transport of fatty acids across the plasma membrane can occur by diffusion, a high-capacity, low-affinity process. However, under normal physiological conditions a significant fraction of fatty acid transport appears to occur via a high-affinity, low-capacity protein-mediated process. Fatty acid transport protein (FATP), an integral membrane protein with four transmembrane segments, is expressed in tissues exhibiting high levels of plasma membrane fatty acid flux, such as muscle, heart, and adipose tissue. Expression of FATP is upregulated in 3T3-L1 cells during adipose conversion, and expression in COS7 fibroblasts elevates uptake of long-chain fatty acids (Hui, T.Y. et al. (1998) J. Biol. Chem. 273:27420-27429).
Ion Channels
The electrical potential of a cell is generated and maintained by controlling the movement of ions across the plasma membrane. The movement of ions requires ion channels, which form an ion-selective pore within the membrane. There are two basic types of ion channels: ion transporters and gated ion channels. Ion transporters use the energy obtained from ATP hydrolysis to actively transport an ion against the ion's concentration gradient. Gated ion channels allow passive flow of an ion down the ion's electrochemical gradient under restricted conditions. Together, these types of ion channels generate, maintain, and use an electrochemical gradient that is needed for 1) electrical impulse conduction down the axon of a nerve cell, 2) transport of molecules into cells against concentration gradients, 3) initiation of muscle contraction, and 4) endocrine cell secretion.
Ion transporters generate and maintain the resting electrical potential of a cell. Using the energy derived from ATP hydrolysis, they transport ions against the ion's concentration gradient. These transmembrane ATPases are divided into three families. The phosphorylated (P) class ion transporters, including Na+-K+ ATPase, Ca2+-ATPase, and H+-ATPase, are activated by a phosphorylation event. P-class ion transporters are responsible for maintaining resting potential distributions such that cytosolic concentrations of Na+ and Ca2+ are low and the cytosolic concentration of K+ is high. The vacuolar (V) class of ion transporters includes H+ pumps on intracellular organelles, such as lysosomes and the Golgi apparatus. V-class ion transporters are responsible for generating the low pH within the lumen of these organelles that is required for their function. The coupling factor (F) class consists of H+ pumps in the mitochondria. F-class ion transporters use a proton gradient to generate ATP from ADP and inorganic phosphate (Pi).
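The ion distributions maintained by P-class transporters (low cytosolic Na+, high cytosolic K+) can be translated into equilibrium potentials with the Nernst equation. The worked example below is a sketch under assumed conditions that the text does not specify: 37 °C (T = 310 K), [K+] of 4 mM outside and 140 mM inside, and [Na+] of 145 mM outside and 12 mM inside.

```latex
\begin{aligned}
E_{\mathrm{ion}} &= \frac{RT}{zF}\,\ln\frac{[\mathrm{ion}]_{\mathrm{out}}}{[\mathrm{ion}]_{\mathrm{in}}} \\
\frac{RT}{F} &= \frac{(8.314)(310)}{96485}\ \mathrm{V} \approx 26.71\ \mathrm{mV} \\
E_{\mathrm{K^+}} &\approx 26.71\ \mathrm{mV}\times\ln\frac{4}{140} \approx 26.71\times(-3.555)\ \mathrm{mV} \approx -95\ \mathrm{mV} \\
E_{\mathrm{Na^+}} &\approx 26.71\ \mathrm{mV}\times\ln\frac{145}{12} \approx 26.71\times 2.492\ \mathrm{mV} \approx +67\ \mathrm{mV}
\end{aligned}
```

Under these assumptions the two equilibrium potentials lie on opposite sides of a typical resting potential, so opening Na+ channels depolarizes the membrane and opening K+ channels repolarizes it.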
The resting potential of the cell is used in many processes involving carrier proteins and gated ion channels. Carrier proteins use the resting potential to transport molecules into and out of the cell. Amino acid and glucose transport into many cells is linked to sodium ion co-transport (symport), so that the movement of Na+ down its electrochemical gradient drives transport of the other molecule up a concentration gradient. Similarly, cardiac muscle links transfer of Ca2+ out of the cell with transport of Na+ into the cell (antiport).
Ion channels share common structural and mechanistic themes. The channel consists of four or five subunits, or protein monomers, that are arranged like a barrel in the plasma membrane. Each subunit typically consists of six potential transmembrane segments (S1, S2, S3, S4, S5, and S6). The center of the barrel forms a pore lined by α-helices or β-strands. The side chains of the amino acid residues comprising the α-helices or β-strands establish the charge (cation or anion) selectivity of the channel. The degree of selectivity, that is, which specific ions are allowed to pass through the channel, depends on the diameter of the narrowest part of the pore.
Gated ion channels control ion flow by regulating the opening and closing of pores. These channels are categorized according to the manner in which gating is regulated. Mechanically gated channels open pores in response to mechanical stress, voltage-gated channels open pores in response to changes in membrane potential, and ligand-gated channels open pores in the presence of a specific ion, nucleotide, or neurotransmitter.
Voltage-gated Na+ and K+ channels are necessary for the function of electrically excitable cells, such as nerve and muscle cells. Action potentials, which lead to neurotransmitter release and muscle contraction, arise from large, transient changes in the permeability of the membrane to Na+ and K+ ions. Depolarization of the membrane beyond the threshold level opens voltage-gated Na+ channels. Sodium ions flow into the cell, further depolarizing the membrane and opening more voltage-gated Na+ channels, which propagates the depolarization down the length of the cell. Depolarization also opens voltage-gated potassium channels. Consequently, potassium ions flow outward, which leads to repolarization of the membrane. Voltage-gated channels use charged residues in the fourth transmembrane segment (S4) to sense voltage change. The open state lasts only about 1 millisecond, after which the channel spontaneously converts into an inactive state that cannot be opened irrespective of the membrane potential. Inactivation is mediated by the channel's N-terminus, which acts as a plug that closes the pore. The transition from the inactive to the closed state requires a return to the resting potential.
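The gating cycle described above (closed, then open on depolarization past threshold, then inactive after about 1 ms, then closed again only after return to the resting potential) behaves like a small state machine. The Python sketch below is a toy model, not a biophysical simulation: the class name is hypothetical, and the -55 mV threshold and -70 mV resting potential are assumed illustrative values that the text does not give.

```python
from enum import Enum


class NaChannelState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    INACTIVE = "inactive"


class VoltageGatedNaChannel:
    """Toy three-state gating model. Threshold, resting potential and
    open duration are illustrative assumptions, not measured values."""

    def __init__(self, threshold_mv=-55.0, resting_mv=-70.0, open_time_ms=1.0):
        self.threshold_mv = threshold_mv
        self.resting_mv = resting_mv
        self.open_time_ms = open_time_ms
        self.state = NaChannelState.CLOSED
        self._time_open_ms = 0.0

    def step(self, membrane_mv, dt_ms):
        """Advance the model by dt_ms at the given membrane potential."""
        if self.state is NaChannelState.CLOSED:
            # Depolarization beyond threshold opens the channel.
            if membrane_mv > self.threshold_mv:
                self.state = NaChannelState.OPEN
                self._time_open_ms = 0.0  # open time is counted from the next step
        elif self.state is NaChannelState.OPEN:
            self._time_open_ms += dt_ms
            if self._time_open_ms >= self.open_time_ms:
                # Spontaneous inactivation: the N-terminal "plug" blocks the pore.
                self.state = NaChannelState.INACTIVE
        elif self.state is NaChannelState.INACTIVE:
            # Further depolarization cannot reopen the channel; only a return
            # to the resting potential lets it recover to the closed state.
            if membrane_mv <= self.resting_mv:
                self.state = NaChannelState.CLOSED
        return self.state


channel = VoltageGatedNaChannel()
for v in (-40.0, -40.0, -40.0, 20.0, -70.0):
    print(v, channel.step(v, dt_ms=0.5).value)
# open, open, inactive, inactive, closed
```

The fourth step shows the key property from the text: a strongly depolarized membrane (+20 mV) does not reopen an inactive channel.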
Voltage-gated Na+ channels are heterotrimeric complexes composed of a 260-kDa pore-forming α subunit that associates with two smaller auxiliary subunits, β1 and β2. The β2 subunit is an integral membrane glycoprotein that contains an extracellular Ig domain. Its association with the α and β1 subunits correlates with increased functional expression of the channel, a change in its gating properties, and an increase in whole-cell capacitance due to an increase in membrane surface area (Isom, L.L. et al. (1995) Cell 83:433-442).
Voltage-gated Ca2+ channels are involved in presynaptic neurotransmitter release and in heart and skeletal muscle contraction. The voltage-gated Ca2+ channels from skeletal muscle (L-type) and brain (N-type) have been purified, and though their functions differ dramatically, they have similar subunit compositions. The channels are composed of three subunits. The α1 subunit forms the membrane pore and voltage sensor, while the α2δ and β subunits modulate the voltage dependence, gating properties, and current amplitude of the channel. These subunits are encoded by at least six α1, one α2δ, and four β genes. A fourth subunit, γ, has been identified in skeletal muscle (Walker, D. et al. (1998) J. Biol. Chem. 273:2361-2367; and Jay, S.D. et al. (1990) Science 248:490-492).
Chloride channels are necessary in endocrine secretion and in regulation of cytosolic and organelle pH. In secretory epithelial cells, Cl- enters the cell across the basolateral membrane through a Na+/K+/Cl- cotransporter, accumulating in the cell above its electrochemical equilibrium concentration. Secretion of Cl- from the apical surface, in response to hormonal stimulation, leads to flow of Na+ and water into the secretory lumen. The cystic fibrosis transmembrane conductance regulator (CFTR) is a chloride channel encoded by the gene mutated in cystic fibrosis, a common fatal genetic disorder in humans. Loss of CFTR function decreases transepithelial water secretion and, as a result, the layers of mucus that coat the respiratory tree, pancreatic ducts, and intestine are dehydrated and difficult to clear. The resulting blockage of these sites leads to pancreatic insufficiency, "meconium ileus", and devastating "chronic obstructive pulmonary disease" (Al-Awqati, Q. et al. (1992) J. Exp. Biol. 172:245-266).
Many intracellular organelles contain H+-ATPase pumps that generate transmembrane pH and electrochemical differences by moving protons from the cytosol to the organelle lumen. If the membrane of the organelle is permeable to other ions, then the electrochemical gradient can be abrogated without affecting the pH differential. In fact, removal of the electrochemical barrier allows more H+ to be pumped across the membrane, increasing the pH differential. Cl- is the sole counterion of H+ translocation in a number of organelles, including chromaffin granules, Golgi vesicles, lysosomes, and endosomes. Functions that require a low vacuolar pH include uptake of small molecules such as biogenic amines in chromaffin granules, processing of vacuolar constituents such as pro-hormones by proteolytic enzymes, and protein degradation in lysosomes (Al-Awqati, supra).
Ligand-gated channels open their pores when an extracellular or intracellular mediator binds to the channel. Neurotransmitter-gated channels open when a neurotransmitter binds to their extracellular domain. These channels exist in the postsynaptic membrane of nerve or muscle cells. There are two types of neurotransmitter-gated channels. Sodium channels open in response to excitatory neurotransmitters, such as acetylcholine, glutamate, and serotonin. This opening causes an influx of Na+ and produces the initial localized depolarization that activates the voltage-gated channels and starts the action potential. Chloride channels open in response to inhibitory neurotransmitters, such as γ-aminobutyric acid (GABA) and glycine, leading to hyperpolarization of the membrane, which inhibits the subsequent generation of an action potential.
Ligand-gated channels can be regulated by intracellular second messengers. Calcium-activated K+ channels are gated by internal calcium ions. In nerve cells, an influx of calcium during depolarization opens K+ channels to modulate the magnitude of the action potential (Ishii, T.M. et al. (1997) Proc. Natl. Acad. Sci. USA 94:11651-11656). Cyclic nucleotide-gated (CNG) channels are gated by cytosolic cyclic nucleotides. The best examples of these are the cAMP-gated Na+ channels involved in olfaction and the cGMP-gated cation channels involved in vision. Both systems involve ligand-mediated activation of a G-protein coupled receptor, which then alters the level of cyclic nucleotide within the cell.
Ion channels are expressed in a number of tissues, where they are implicated in a variety of processes. CNG channels, while abundantly expressed in photoreceptor and olfactory sensory cells, are also found in kidney, lung, pineal, retinal ganglion cells, testis, aorta, and brain. Calcium-activated K+ channels may be responsible for the vasodilatory effects of bradykinin in the kidney and for shunting excess K+ from brain capillary endothelial cells into the blood. They are also implicated in repolarizing granulocytes after agonist-stimulated depolarization (Ishii, supra).
Ion channels have been the target of many drug therapies. Neurotransmitter-gated channels have been targeted in therapies for insomnia, anxiety, depression, and schizophrenia. Voltage-gated channels have been targeted in therapies for arrhythmia, ischemic stroke, head trauma, and neurodegenerative disease (Taylor, C.P. and L.S. Narasimhan (1997) Adv. Pharmacol. 39:47-98).
Disease Correlation
The etiology of numerous human diseases and disorders can be attributed to defects in the transport of molecules across membranes. Defects in the trafficking of membrane-bound transporters and ion channels are associated with several disorders, e.g., cystic fibrosis, glucose-galactose malabsorption syndrome, hypercholesterolemia, von Gierke disease, and certain forms of diabetes mellitus. Single-gene defect diseases resulting in an inability to transport small molecules across membranes include, e.g., cystinuria, iminoglycinuria, Hartnup disease, and Fanconi disease (van't Hoff, W.G. (1996) Exp. Nephrol. 4:253-262; Talente, G.M. et al. (1994) Ann. Intern. Med. 120:218-226; and Chillon, M. et al. (1995) New Engl. J. Med. 332:1475-1480).
Protein Modification and Maintenance Molecules
The cellular processes regulating modification and maintenance of protein molecules coordinate their conformation, stabilization, and degradation. Each of these processes is mediated by key enzymes or proteins such as proteases, protease inhibitors, transferases, isomerases, and molecular chaperones.
Proteases
Proteases cleave proteins and peptides at the peptide bonds that form the backbone of the peptide and protein chain. Proteolytic processing is essential to cell growth, differentiation, remodeling, and homeostasis, as well as to inflammation and the immune response. Typical protein half-lives range from hours to a few days, so that within all living cells, precursor proteins are being cleaved to their active form, signal sequences are being proteolytically removed from targeted proteins, and aged or defective proteins are being degraded by proteolysis. Proteases also function in bacterial, parasitic, and viral invasion of and replication within a host. Four principal categories of mammalian proteases have been identified based on active site structure, mechanism of action, and overall three-dimensional structure (Beynon, R.J. and J.S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University Press, New York NY, pp. 1-5).
The serine proteases (SPs) have a serine residue, usually within a conserved sequence, in an active site composed of the serine, an aspartate, and a histidine residue. SPs include the digestive enzymes trypsin and chymotrypsin, components of the complement and blood-clotting cascades, and enzymes that control extracellular protein degradation. The main SP subfamilies are trypases, which cleave after arginine or lysine; aspartases, which cleave after aspartate; chymases, which cleave after phenylalanine or leucine; metases, which cleave after methionine; and serases, which cleave after serine. Enterokinase, the initiator of intestinal digestion, is a serine protease found in the intestinal brush border, where it cleaves the acidic propeptide from trypsinogen to yield active trypsin (Kitamoto, Y. et al. (1994) Proc. Natl. Acad. Sci. USA 91:7588-7592). Prolylcarboxypeptidase, a lysosomal serine peptidase that cleaves peptides such as angiotensin II and III and [des-Arg9] bradykinin, shares sequence homology with members of both the serine carboxypeptidase and prolylendopeptidase families (Tan, F. et al. (1993) J. Biol. Chem. 268:16631-16638).
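The subfamily specificities listed above can be turned into a simple in silico digestion: scan a one-letter amino acid sequence and cut after every residue the chosen subfamily recognizes. The Python sketch below uses only the residue sets given in the text; it deliberately ignores any influence of neighboring residues or protein structure, so it is a simplification, and the function names and example sequence are hypothetical.

```python
# Residues after which each serine-protease subfamily cleaves, as listed in the text.
SP_SPECIFICITY = {
    "trypase": {"R", "K"},     # arginine, lysine
    "aspartase": {"D"},        # aspartate
    "chymase": {"F", "L"},     # phenylalanine, leucine
    "metase": {"M"},           # methionine
    "serase": {"S"},           # serine
}


def cleavage_sites(sequence, subfamily):
    """Return 0-based indices i where the bond between sequence[i] and
    sequence[i + 1] is a predicted cleavage site (C-terminal residue excluded)."""
    targets = SP_SPECIFICITY[subfamily]
    return [i for i in range(len(sequence) - 1) if sequence[i] in targets]


def digest(sequence, subfamily):
    """Split a one-letter amino acid sequence into predicted fragments."""
    fragments, start = [], 0
    for i in cleavage_sites(sequence, subfamily):
        fragments.append(sequence[start:i + 1])
        start = i + 1
    fragments.append(sequence[start:])
    return fragments


print(digest("AGKLMRSF", "trypase"))   # ['AGK', 'LMR', 'SF']
print(digest("AGKLMRSF", "chymase"))   # ['AGKL', 'MRSF']
```

Because cleavage occurs on the C-terminal side of the recognized residue, each fragment except possibly the last ends in one of the subfamily's target residues.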
Cysteine proteases (CPs) have a cysteine as the major catalytic residue at an active site where catalysis proceeds via an intermediate thiol ester and is facilitated by adjacent histidine and aspartic acid residues. CPs are involved in diverse cellular processes ranging from the processing of precursor proteins to intracellular degradation. Mammalian CPs include the lysosomal cathepsins and the cytosolic calcium-activated proteases, the calpains. CPs are produced by monocytes, macrophages, and other cells of the immune system, which migrate to sites of inflammation and secrete molecules involved in tissue repair. Overabundance of these repair molecules plays a role in certain disorders. In autoimmune diseases such as rheumatoid arthritis, secretion of the cysteine peptidase cathepsin C degrades collagen, laminin, elastin, and other structural proteins found in the extracellular matrix of bones.
Aspartic proteases are members of the cathepsin family of lysosomal proteases and include pepsin A, gastricsin, chymosin, renin, and cathepsins D and E. Aspartic proteases have a pair of aspartic acid residues in the active site and are most active in the pH 2-3 range, in which one of the aspartate residues is ionized and the other is un-ionized. Other aspartic proteases include bacterial penicillopepsin and certain fungal proteases. Abnormal regulation and expression of cathepsins is evident in various inflammatory disease states. In cells isolated from inflamed synovia, the mRNAs for stromelysin, cytokines, TIMP-1, cathepsin, gelatinase, and other molecules are preferentially expressed. Expression of cathepsins L and D is elevated in synovial tissues from patients with rheumatoid arthritis and osteoarthritis. Cathepsin L expression may also contribute to the influx of mononuclear cells that exacerbates the destruction of the rheumatoid synovium (Keyszer, G.M. (1995) Arthritis Rheum. 38:976-984). The increased expression and differential regulation of the cathepsins are linked to the metastatic potential of a variety of cancers and as such are of therapeutic and prognostic interest (Chambers, A.F. et al. (1993) Crit. Rev. Oncog. 4:95-114).
Metalloproteases have active sites that include two glutamic acid residues and one histidine residue that serve as binding sites for zinc. Carboxypeptidases A and B are the principal mammalian metalloproteases. Both are exoproteases of similar structure and active sites. Carboxypeptidase A, like chymotrypsin, prefers C-terminal aromatic and aliphatic side chains of hydrophobic nature, whereas carboxypeptidase B is directed toward basic arginine and lysine residues. Glycoprotease (GCP), or O-sialoglycoprotein endopeptidase, is a metallopeptidase that specifically cleaves O-sialoglycoproteins such as glycophorin A. Another metallopeptidase, placental leucine aminopeptidase (P-LAP), is expressed in several tissues and degrades several peptide hormones such as oxytocin and vasopressin, suggesting a role in maintaining homeostasis during pregnancy (Rogi, T. et al. (1996) J. Biol. Chem. 271:56-61).
Ubiquitin proteases are associated with the ubiquitin conjugation system (UCS), a major pathway for the degradation of cellular proteins in eukaryotic cells and some bacteria. The UCS mediates the elimination of abnormal proteins and regulates the half-lives of important regulatory proteins that control cellular processes such as gene transcription and cell cycle progression. In the UCS pathway, proteins targeted for degradation are conjugated to ubiquitin, a small, heat-stable protein. The ubiquitinated protein is then recognized and degraded by the proteasome, a large, multisubunit proteolytic enzyme complex, and ubiquitin is released for reuse by ubiquitin protease. The UCS is implicated in the degradation of mitotic cyclic kinases, oncoproteins, tumor suppressor proteins such as p53, viral proteins, cell surface receptors associated with signal transduction, transcriptional regulators, and mutated or damaged proteins (Ciechanover, A. (1994) Cell 79:13-21). A murine proto-oncogene, Unp, encodes a nuclear ubiquitin protease whose overexpression leads to oncogenic transformation of NIH3T3 cells, and the human homolog of this gene is consistently elevated in small cell tumors and adenocarcinomas of the lung (Gray, D.A. (1995) Oncogene 10:2179-2183).
Signal Peptidases
The mechanism for the translocation process into the endoplasmic reticulum (ER) involves the recognition of an N-terminal signal peptide on the elongating protein. The signal peptide directs the protein and attached ribosome to a receptor on the ER membrane. The polypeptide chain passes through a pore in the ER membrane into the lumen while the N-terminal signal peptide remains attached at the membrane surface. The process is completed when signal peptidase located inside the ER cleaves the signal peptide from the protein and releases the protein into the lumen.
Protease Inhibitors
Protease inhibitors and other regulators of protease activity control the activity and effects of proteases. Protease inhibitors have been shown to control pathogenesis in animal models of proteolytic disorders (Murphy, G. (1991) Agents Actions Suppl. 35:69-76). Low levels of the cystatins, low molecular weight inhibitors of the cysteine proteases, correlate with malignant progression of tumors (Calkins, C. et al. (1995) Biol. Chem. Hoppe-Seyler 376:71-80). Serpins are inhibitors of mammalian plasma serine proteases. Many serpins serve to regulate the blood clotting cascade and/or the complement cascade in mammals. Sp32 is a positive regulator of the mammalian acrosomal protease, acrosin, that binds the proenzyme, proacrosin, and thereby aids in packaging the enzyme into the acrosomal matrix (Baba, T. et al. (1994) J. Biol. Chem. 269:10133-10140). The Kunitz family of serine protease inhibitors is characterized by one or more "Kunitz domains" containing a series of cysteine residues that are regularly spaced over approximately 50 amino acid residues and form three intrachain disulfide bonds. Members of this family include aprotinin, tissue factor pathway inhibitor (TFPI-1 and TFPI-2), inter-α-trypsin inhibitor, and bikunin (Marlor, C.W. et al. (1997) J. Biol. Chem. 272:12202-12208). Members of this family are potent inhibitors (in the nanomolar range) of serine proteases such as kallikrein and plasmin. Aprotinin has clinical utility in reduction of perioperative blood loss.
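The Kunitz domain description above (cysteines spread over roughly 50 residues, forming three disulfide bonds and therefore six cysteines) suggests a crude sliding-window screen for candidate regions. The Python sketch below is a hypothetical heuristic under those assumptions; it does not check the regular spacing pattern and is not a substitute for a real domain-prediction tool.

```python
def kunitz_like_windows(sequence, window=50, min_cys=6):
    """Return start positions of windows of `window` residues that contain at
    least `min_cys` cysteines (six cysteines = three possible disulfide bonds).
    A crude screen only: cysteine spacing is not examined."""
    hits = []
    for start in range(max(len(sequence) - window + 1, 0)):
        if sequence[start:start + window].count("C") >= min_cys:
            hits.append(start)
    return hits


# Hypothetical toy sequences, not real proteins.
print(kunitz_like_windows("C" * 6 + "A" * 44))   # [0]
print(kunitz_like_windows("A" * 100))            # []
```

Overlapping hits would normally be merged into one candidate region before any follow-up analysis.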
A major portion of all proteins synthesized in eukaryotic cells are synthesized on the cytosolic surface of the endoplasmic reticulum (ER). Before these immature proteins are distributed to other organelles in the cell or are secreted, they must be transported into the lumen of the ER, where post-translational modifications are performed. These modifications include protein folding, the formation of disulfide bonds, and N-linked glycosylation.
Protein Isomerases
Protein folding in the ER is aided by two principal types of protein isomerases, protein disulfide isomerase (PDI) and peptidyl-prolyl isomerase (PPI). PDI catalyzes the oxidation of free sulfhydryl groups in cysteine residues to form intramolecular disulfide bonds in proteins. PPI, an enzyme that catalyzes the isomerization of certain proline imidic bonds in oligopeptides and proteins, is considered to govern one of the rate-limiting steps in the folding of many proteins to their final functional conformation. The cyclophilins represent a major class of PPI that was originally identified as the major receptor for the immunosuppressive drug cyclosporin A (Handschumacher, R.E. et al. (1984) Science 226:544-547).
Protein Glycosylation
The glycosylation of most soluble secreted and membrane-bound proteins by oligosaccharides linked to asparagine residues is also performed in the ER. This reaction is catalyzed by a membrane-bound enzyme, oligosaccharyl transferase. Although the exact purpose of this "N-linked" glycosylation is unknown, the presence of oligosaccharides tends to make a glycoprotein resistant to protease digestion. In addition, oligosaccharides attached to cell-surface proteins called selectins are known to function in cell-cell adhesion processes (Alberts, B. et al. (1994) Molecular Biology of the Cell. Garland Publishing Co., New York NY, p. 608). "O-linked" glycosylation of proteins also occurs in the ER by the addition of N-acetylgalactosamine to the hydroxyl group of a serine or threonine residue, followed by the sequential addition of other sugar residues to the first. This process is catalyzed by a series of glycosyltransferases, each specific for a particular donor sugar nucleotide and acceptor molecule (Lodish, H. et al. (1995) Molecular Cell Biology, W.H. Freeman and Co., New York NY, pp. 700-708). In many cases, both N- and O-linked oligosaccharides appear to be required for the secretion of proteins or the movement of plasma membrane glycoproteins to the cell surface.
An additional glycosylation mechanism operates in the ER specifically to target lysosomal enzymes to lysosomes and prevent their secretion. Lysosomal enzymes in the ER receive an N-linked oligosaccharide, like plasma membrane and secreted proteins, but are then phosphorylated on one or two mannose residues. The phosphorylation of mannose residues occurs in two steps: the first is the addition of an N-acetylglucosamine phosphate residue by N-acetylglucosamine phosphotransferase, and the second is the removal of the N-acetylglucosamine group by a phosphodiesterase. The phosphorylated mannose residue then targets the lysosomal enzyme to a mannose 6-phosphate receptor, which transports it to a lysosome vesicle (Lodish, supra, pp. 708-711).
Chaperones
Molecular chaperones are proteins that aid in the proper folding of immature proteins and the refolding of improperly folded ones, the assembly of protein subunits, and the transport of unfolded proteins across membranes. Chaperones are also called heat-shock proteins (hsp) because of their tendency to be expressed in dramatically increased amounts following brief exposure of cells to elevated temperatures. This latter property most likely reflects their role in refolding proteins that have been denatured by the high temperatures. Chaperones may be divided into several classes according to their location, function, and molecular weight, and include hsp60, TCP1, hsp70, hsp40 (also called DnaJ), and hsp90. For example, hsp90 binds to steroid hormone receptors, represses transcription in the absence of the ligand, and provides proper folding of the ligand-binding domain of the receptor in the presence of the hormone (Burston, S.G. and A.R. Clarke (1995) Essays Biochem. 29:125-136). Hsp60 and hsp70 chaperones aid in the transport and folding of newly synthesized proteins. Hsp70 acts early in protein folding, binding a newly synthesized protein before it leaves the ribosome and transporting the protein to the mitochondria or ER before releasing the folded protein. Hsp60, along with hsp10, binds misfolded proteins and gives them the opportunity to refold correctly. All chaperones share an affinity for hydrophobic patches on incompletely folded proteins and the ability to hydrolyze ATP. The energy of ATP hydrolysis is used to release the hsp-bound protein in its properly folded state (Alberts, supra, pp. 214, 571-572).
Nucleic Acid Synthesis and Modification Molecules
Polymerases
DNA and RNA replication are critical processes for cell replication and function. DNA and RNA replication are mediated by the enzymes DNA and RNA polymerase, respectively, by a "templating" process in which the nucleotide sequence of a DNA or RNA strand is copied by complementary base-pairing into a complementary nucleic acid sequence of either DNA or RNA. However, there are fundamental differences between the two processes.
DNA polymerase catalyzes the stepwise addition of a deoxyribonucleotide to the 3'-OH end of a polynucleotide strand (the primer strand) that is paired to a second (template) strand. The new DNA strand therefore grows in the 5' to 3' direction (Alberts, B. et al. (1994) The Molecular Biology of the Cell. Garland Publishing Inc., New York NY, pp. 251-254). The substrates for the polymerization reaction are the corresponding deoxynucleoside triphosphates, which must base-pair with the correct nucleotide on the template strand in order to be recognized by the polymerase. Because DNA exists as a double-stranded helix, each of the two strands may serve as a template for the formation of a new complementary strand. Each of the two daughter cells of the dividing cell therefore inherits a new DNA double helix containing one old and one new strand. Thus, DNA is said to be replicated "semiconservatively" by DNA polymerase. In addition to the synthesis of new DNA, DNA polymerase is also involved in the repair of damaged DNA, as discussed below under "Ligases."
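The templating rules described above (complementary base-pairing, antiparallel strands, growth only at the 3'-OH end, and semiconservative inheritance) can be illustrated with a short Python sketch. It is a deliberately simplified model that assumes sequences are written as A/C/G/T strings and ignores primers, lagging-strand synthesis, and proofreading; the function names `reverse_complement` and `replicate` are hypothetical and are not part of the original text.

```python
# Watson-Crick pairing rules for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand_5_to_3: str) -> str:
    """Return the strand that pairs with `strand_5_to_3`, also written 5'->3'.

    Because the two strands are antiparallel and a new strand grows only at
    its 3'-OH end, the template is read 3'->5' (i.e. reversed) while each
    incoming base is chosen by complementary pairing.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand_5_to_3.upper()))


def replicate(top_5_to_3: str) -> list[tuple[str, str]]:
    """Semiconservative replication of a duplex given its top strand (5'->3').

    Returns two daughter duplexes, each as (old strand, newly synthesized strand).
    """
    top = top_5_to_3.upper()
    bottom = reverse_complement(top)
    # Each parental strand serves as the template for one new strand.
    return [(top, reverse_complement(top)), (bottom, reverse_complement(bottom))]


for old, new in replicate("ATGCGT"):
    print(f"old 5'-{old}-3'  new 5'-{new}-3'")
```

In this toy model each daughter duplex pairs one parental strand with one newly made strand, and the two daughters carry the same sequence as the parent duplex.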
In contrast to DNA polymerase, RNA polymerase uses a DNA template strand to "transcribe" DNA into RNA using ribonucleoside triphosphates as substrates. Like DNA polymerization, RNA polymerization proceeds in a 5' to 3' direction by addition of a ribonucleoside monophosphate to the 3'-OH end of a growing RNA chain. DNA transcription generates messenger RNAs (mRNA) that carry information for protein synthesis, as well as the transfer, ribosomal, and other RNAs that have structural or catalytic functions. In eukaryotes, three discrete RNA polymerases synthesize the three different types of RNA (Alberts, supra, pp. 367-368). RNA polymerase I makes the large ribosomal RNAs, RNA polymerase II makes the mRNAs that will be translated into proteins, and RNA polymerase III makes a variety of small, stable RNAs, including 5S ribosomal RNA and the transfer RNAs (tRNA). In all cases, RNA synthesis is initiated by binding of the RNA polymerase to a promoter region on the DNA, and synthesis begins at a start site within the promoter. Synthesis is completed at a broad, general stop or termination region in the DNA where both the polymerase and the completed RNA chain are released.
Ligases
DNA repair is the process by which accidental base changes, such as those produced by oxidative damage, hydrolytic attack, or uncontrolled methylation of DNA, are corrected before replication or transcription of the DNA can occur. Because of the efficiency of the DNA repair process, fewer than one in one thousand accidental base changes causes a mutation (Alberts, supra, pp. 245-249). The three steps common to most types of DNA repair are (1) excision of the damaged or altered base or nucleotide by DNA nucleases, leaving a gap; (2) insertion of the correct nucleotide in this gap by DNA polymerase using the complementary strand as the template; and (3) sealing the break left between the inserted nucleotide(s) and the existing DNA strand by DNA ligase.
In the last reaction, DNA ligase uses the energy from ATP hydrolysis to activate the 5' end of the broken phosphodiester bond before forming the new bond with the 3'-OH of the DNA strand. In Bloom's syndrome, an inherited human disease, individuals are partially deficient in DNA ligation and consequently have an increased incidence of cancer (Alberts, supra, p. 247).
Nucleases
Nucleases comprise enzymes that hydrolyze DNA (DNases) and enzymes that hydrolyze RNA (RNases), and they serve different purposes in nucleic acid metabolism. Nucleases hydrolyze the phosphodiester bonds between adjacent nucleotides either at internal positions (endonucleases) or at the terminal 3' or 5' nucleotide positions (exonucleases). A DNA exonuclease activity in DNA polymerase, for example, removes improperly paired nucleotides that the polymerase has attached to the 3'-OH end of the growing DNA strand, and thereby serves a "proofreading" function. As mentioned above, DNA endonuclease activity is involved in the excision step of the DNA repair process. RNases also serve a variety of functions. For example, RNase P is a ribonucleoprotein enzyme which cleaves the 5' end of pre-tRNAs as part of their maturation process. RNase H digests the RNA strand of an RNA/DNA hybrid. Such hybrids occur in cells invaded by retroviruses, and RNase H is an important enzyme in the retroviral replication cycle. Pancreatic RNase, secreted by the pancreas into the intestine, hydrolyzes RNA present in ingested foods. RNase activity in serum and cell extracts is elevated in a variety of cancers and infectious diseases (Schein, C.H. (1997) Nat. Biotechnol. 15:529-536). Regulation of RNase activity is being investigated as a means to control tumor angiogenesis, allergic reactions, viral infection and replication, and fungal infections.
Methylases
Methylation of specific nucleotides occurs in both DNA and RNA and serves different functions in the two macromolecules. Methylation of cytosine residues to form 5-methylcytosine in DNA occurs specifically at CG sequences, which are base-paired with one another in the DNA double helix. This pattern of methylation is passed from generation to generation during DNA replication by an enzyme called "maintenance methylase" that acts preferentially on those CG sequences that are base-paired with a CG sequence that is already methylated. Such methylation appears to distinguish active from inactive genes by preventing the binding of regulatory proteins that "turn on" the gene while permitting the binding of proteins that inactivate the gene (Alberts, supra, pp. 448-451). In RNA metabolism, "tRNA methylase" produces one of several nucleotide modifications in tRNA that affect the conformation and base-pairing of the molecule and facilitate the recognition of the appropriate mRNA codons by specific tRNAs. The primary methylation pattern is the dimethylation of guanine residues to form N,N-dimethylguanine.
Helicases and Single-Stranded Binding Proteins
Helicases are enzymes that destabilize and unwind double helix structures in both DNA and RNA. Since DNA replication occurs more or less simultaneously on both strands, the two strands must first separate to generate a replication "fork" for DNA polymerase to act on. Two types of replication proteins contribute to this process: DNA helicases and single-stranded binding proteins.
DNA helicases hydrolyze ATP and use the energy of hydrolysis to separate the DNA strands. Single-stranded binding proteins (SSBs) then bind to the exposed DNA strands without covering the bases, thereby temporarily stabilizing them for templating by the DNA polymerase (Alberts, supra, pp. 255-256). RNA helicases also alter and regulate RNA conformation and secondary structure. Like the DNA helicases, RNA helicases use energy derived from ATP hydrolysis to destabilize and unwind RNA duplexes. The best-characterized and most ubiquitous family of RNA helicases is the DEAD-box family, so named for the Asp-Glu-Ala-Asp (DEAD) sequence of the conserved B-type ATP-binding motif, which is diagnostic of proteins in this family. Over 40 DEAD-box helicases have been identified in organisms as diverse as bacteria, insects, yeast, amphibians, mammals, and plants. DEAD-box helicases function in diverse processes such as translation initiation, splicing, ribosome assembly, and RNA editing, transport, and stability. Some DEAD-box helicases play tissue- and stage-specific roles in spermatogenesis and embryogenesis. Overexpression of the DEAD-box 1 protein (DDX1) may play a role in the progression of neuroblastoma (Nb) and retinoblastoma (Rb) tumors (Godbout, R. et al. (1998) J. Biol. Chem. 273:21161-21168). These observations suggest that DDX1 may promote or enhance tumor progression by altering the normal secondary structure and expression levels of RNA in cancer cells. Other DEAD-box helicases have been implicated either directly or indirectly in tumorigenesis (discussed in Godbout, supra). For example, murine p68 is mutated in ultraviolet light-induced tumors, and human DDX6 is located at a chromosomal breakpoint associated with B-cell lymphoma. Similarly, a chimeric protein composed of DDX10 and NUP98, a nucleoporin protein, may be involved in the pathogenesis of certain myeloid malignancies.
Topoisomerases
Besides the need to separate DNA strands prior to replication, the two strands must be "unwound" from one another prior to their separation by DNA helicases. This function is performed by proteins known as DNA topoisomerases. DNA topoisomerase effectively acts as a reversible nuclease that hydrolyzes a phosphodiester bond in a DNA strand, permitting the two strands to rotate freely about one another to relieve the strain of the helix, and then reseals the original phosphodiester bond. Two types of DNA topoisomerase exist, types I and II. DNA topoisomerase I causes a single-strand break in a DNA helix to allow rotation of the two strands of the helix about the remaining phosphodiester bond in the opposite strand. DNA topoisomerase II causes a transient break in both strands of a DNA helix where two double helices cross over one another. This type of topoisomerase can efficiently separate two interlocked DNA circles (Alberts, supra, pp. 260-262). In eukaryotes, type II topoisomerases are largely confined to proliferating cells, such as cancer cells, and for this reason they are targets for anticancer drugs. Topoisomerase II has been implicated in multi-drug resistance (MDR), as it appears to aid in the repair of DNA damage inflicted by DNA binding agents such as doxorubicin and vincristine.
Recombinases
Genetic recombination is the process of rearranging DNA sequences within an organism's genome to provide genetic variation for the organism in response to changes in the environment. DNA recombination allows variation in the particular combination of genes present in an individual's genome, as well as the timing and level of expression of these genes (see Alberts, supra, pp. 263-273). Two broad classes of genetic recombination are commonly recognized: general recombination and site-specific recombination. General recombination involves genetic exchange between any homologous pair of DNA sequences, usually located on two copies of the same chromosome. The process is aided by enzymes called recombinases that "nick" one strand of a DNA duplex more or less randomly and permit exchange with the complementary strand of another duplex. The process does not normally change the arrangement of genes on a chromosome. In site-specific recombination, the recombinase recognizes specific nucleotide sequences present in one or both of the recombining molecules. Base-pairing is not involved in this form of recombination, which therefore does not require DNA homology between the recombining molecules. Unlike general recombination, this form of recombination can alter the relative positions of nucleotide sequences in chromosomes.
Splicing Factors
Various proteins are necessary for processing of transcribed RNAs in the nucleus. Pre-mRNA processing steps include capping at the 5' end with methylguanosine, polyadenylation of the 3' end, and splicing to remove introns. The primary RNA transcript from DNA is a faithful copy of the gene containing both exon and intron sequences, and the intron sequences must be cut out of the RNA transcript to produce an mRNA that codes for a protein. This "splicing" of the mRNA sequence takes place in the nucleus with the aid of a large, multicomponent ribonucleoprotein complex known as a spliceosome. The spliceosomal complex is composed of five small nuclear ribonucleoprotein particles (snRNPs), designated U1, U2, U4, U5, and U6, and a number of additional proteins. Each snRNP contains a single species of snRNA and about ten proteins. The RNA components of some snRNPs recognize and base-pair with intron consensus sequences. The protein components mediate spliceosome assembly and the splicing reaction. Autoantibodies to snRNP proteins are found in the blood of patients with systemic lupus erythematosus (Stryer, L. (1995) Biochemistry. W.H. Freeman and Company, New York NY, p. 863).
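The three processing steps named above (capping, splicing, and polyadenylation) can be sketched as string operations on a toy pre-mRNA. This Python example is illustrative only: it assumes exon boundaries are already known as half-open (start, end) coordinates, represents the 5' methylguanosine cap with a placeholder text token, and uses a poly(A) length chosen by the caller; the function name `process_pre_mrna` and the example sequence are hypothetical. It does not model how the spliceosome actually locates splice sites.

```python
CAP_TOKEN = "m7G-"  # placeholder text standing in for the 5' methylguanosine cap


def process_pre_mrna(pre_mrna: str, exons: list[tuple[int, int]], poly_a_length: int) -> str:
    """Cap, splice, and polyadenylate a toy pre-mRNA.

    `exons` holds ordered, non-overlapping half-open (start, end) coordinates;
    everything between consecutive exons is treated as an intron and removed.
    """
    previous_end = 0
    for start, end in exons:
        if not (previous_end <= start < end <= len(pre_mrna)):
            raise ValueError(f"invalid exon coordinates: {(start, end)}")
        previous_end = end
    # Splicing: join the exon sequences and discard the introns between them.
    spliced = "".join(pre_mrna[start:end] for start, end in exons)
    # Capping at the 5' end and polyadenylation at the 3' end.
    return CAP_TOKEN + spliced + "A" * poly_a_length


# Hypothetical pre-mRNA: exon 1, an intron beginning GU and ending AG, exon 2.
pre = "AUGGCC" + "GUAAGUCUAG" + "UUUGA"
print(process_pre_mrna(pre, [(0, 6), (16, 21)], poly_a_length=5))  # prints m7G-AUGGCCUUUGAAAAAA
```

Passing the exon coordinates explicitly keeps the sketch focused on what processing does to the transcript rather than on splice-site recognition, which in the cell depends on the snRNP base-pairing described above.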
Adhesion Molecules
The surface of a cell is rich in transmembrane proteoglycans, glycoproteins, glycolipids, and receptors. These macromolecules mediate adhesion with other cells and with components of the extracellular matrix (ECM). The interaction of the cell with its surroundings profoundly influences cell shape, strength, flexibility, motility, and adhesion. These dynamic properties are intimately associated with signal transduction pathways controlling cell proliferation and differentiation, tissue construction, and embryonic development.
Cadherins
Cadherins comprise a family of calcium-dependent glycoproteins that mediate cell-cell adhesion in virtually all solid tissues of multicellular organisms. These proteins share multiple repeats of a cadherin-specific motif, and the repeats form the folding units of the cadherin extracellular domain. Cadherin molecules cooperate to form focal contacts, or adhesion plaques, between adjacent epithelial cells. The cadherin family includes the classical cadherins and protocadherins. Classical cadherins include the E-cadherin, N-cadherin, and P-cadherin subfamilies. E-cadherin is present on many types of epithelial cells and is especially important for embryonic development. N-cadherin is present on nerve, muscle, and lens cells and is also critical for embryonic development. P-cadherin is present on cells of the placenta and epidermis. Recent studies report that protocadherins are involved in a variety of cell-cell interactions (Suzuki, S.T. (1996) J. Cell Sci. 109:2609-2611). The intracellular anchorage of cadherins is regulated by their dynamic association with catenins, a family of cytoplasmic signal transduction proteins associated with the actin cytoskeleton. The anchorage of cadherins to the actin cytoskeleton appears to be regulated by protein tyrosine phosphorylation, and the cadherins are the target of phosphorylation-induced junctional disassembly (Aberle, H. et al. (1996) J. Cell. Biochem. 61:514-523).
Integrins
Integrins are ubiquitous transmembrane adhesion molecules that link the ECM to the internal cytoskeleton. Integrins are composed of two noncovalently associated transmembrane glycoprotein subunits called α and β. Integrins function as receptors that play a role in signal transduction. For example, binding of integrin to its extracellular ligand may stimulate changes in intracellular calcium levels or protein kinase activity (Sjaastad, M.D. and W.J. Nelson (1997) BioEssays 19:47-55). At least ten cell surface receptors of the integrin family recognize the ECM component fibronectin, which is involved in many different biological processes including cell migration and embryogenesis (Johansson, S. et al. (1997) Front. Biosci. 2:D126-D146).
Lectins
Lectins comprise a ubiquitous family of extracellular glycoproteins which bind cell surface carbohydrates specifically and reversibly, resulting in the agglutination of cells (reviewed in Drickamer, K. and M.E. Taylor (1993) Annu. Rev. Cell Biol. 9:237-264). This function is particularly important for activation of the immune response. Lectins mediate the agglutination and mitogenic stimulation of lymphocytes at sites of inflammation (Lasky, L.A. (1991) J. Cell. Biochem. 45:139-146; Paietta, E. et al. (1989) J. Immunol. 143:2850-2857).
Lectins are further classified into subfamilies based on carbohydrate-binding specificity and other criteria. The galectin subfamily, in particular, includes lectins that bind β-galactoside carbohydrate moieties in a thiol-dependent manner (reviewed in Hadari, Y.R. et al. (1998) J. Biol. Chem. 270:3447-3453). Galectins are widely expressed and developmentally regulated. Because all galectins lack an N-terminal signal peptide, it has been suggested that galectins are externalized through an atypical secretory mechanism. Two classes of galectins have been defined based on molecular weight and oligomerization properties. Small galectins form homodimers and are about 14 to 16 kilodaltons in mass, while large galectins are monomeric and about 29 to 37 kilodaltons.
Galectins contain a characteristic carbohydrate recognition domain (CRD). The CRD is about 140 amino acids long and contains several stretches of about 1-10 amino acids which are highly conserved among all galectins. A particular 6-amino-acid motif within the CRD contains conserved tryptophan and arginine residues which are critical for carbohydrate binding. The CRD of some galectins also contains cysteine residues which may be important for disulfide bond formation. Secondary structure predictions indicate that the CRD forms several β-sheets. Galectins play a number of roles in diseases and conditions associated with cell-cell and cell-matrix interactions. For example, certain galectins associate with sites of inflammation and bind to cell surface immunoglobulin E molecules. In addition, galectins may play an important role in cancer metastasis. Galectin overexpression is correlated with the metastatic potential of cancers in humans and mice. Moreover, anti-galectin antibodies inhibit processes associated with cell transformation, such as cell aggregation and anchorage-independent growth (see, for example, Su, Z.-Z. et al. (1996) Proc. Natl. Acad. Sci. USA 93:7252-7257).
Selectins
Selectins, or LEC-CAMs, comprise a specialized lectin subfamily involved primarily in inflammation and leukocyte adhesion (reviewed in Lasky, supra). Selectins mediate the recruitment of leukocytes from the circulation to sites of acute inflammation and are expressed on the surface of vascular endothelial cells in response to cytokine signaling. Selectins bind to specific ligands on the leukocyte cell membrane and enable the leukocyte to adhere to and migrate along the endothelial surface. Binding of selectin to its ligand leads to polarized rearrangement of the actin cytoskeleton and stimulates signal transduction within the leukocyte (Brenner, B. et al. (199?) Biochem. Biophys. Res. Commun. 231:802-807; Hidari, K.I. et al. (1997) J. Biol. Chem. 272:28750-28756). Members of the selectin family possess three characteristic motifs: a lectin or carbohydrate recognition domain; an epidermal growth factor-like domain; and a variable number of short consensus repeats (SCR or "sushi" repeats), which are also present in complement regulatory proteins. The selectins include lymphocyte adhesion molecule-1 (Lam-1 or L-selectin), endothelial leukocyte adhesion molecule-1 (ELAM-1 or E-selectin), and granule membrane protein-140 (GMP-140 or P-selectin) (Johnston, G.I. et al. (1989) Cell 56:1033-1044).
Antigen Recognition Molecules
All vertebrates have developed sophisticated and complex immune systems that provide protection from viral, bacterial, fungal, and parasitic infections. A key feature of the immune system is its ability to distinguish foreign molecules, or antigens, from "self" molecules. This ability is mediated primarily by secreted and transmembrane proteins expressed by leukocytes (white blood cells) such as lymphocytes, granulocytes, and monocytes. Most of these proteins belong to the immunoglobulin (Ig) superfamily, members of which contain one or more repeats of a conserved structural domain. This Ig domain consists of antiparallel β sheets joined by a disulfide bond in an arrangement called the Ig fold. Members of the Ig superfamily include T-cell receptors, major histocompatibility complex (MHC) proteins, antibodies, and immune cell-specific surface markers such as CD4, CD8, and CD28.
MHC proteins are cell surface markers that bind to and present foreign antigens to T cells. MHC molecules are classified as either class I or class II. Class I MHC molecules (MHC I) are expressed on the surface of almost all cells and are involved in the presentation of antigen to cytotoxic T cells. For example, a cell infected with virus will degrade intracellular viral proteins and express the protein fragments bound to MHC I molecules on the cell surface. The MHC I/antigen complex is recognized by cytotoxic T-cells, which destroy the infected cell and the virus within. Class II MHC molecules are expressed primarily on specialized antigen-presenting cells of the immune system, such as B-cells and macrophages. These cells ingest foreign proteins from the extracellular fluid and express MHC II/antigen complexes on the cell surface. This complex activates helper T-cells, which then secrete cytokines and other factors that stimulate the immune response. MHC molecules also play an important role in organ rejection following transplantation. Rejection occurs when the recipient's T-cells respond to foreign MHC molecules on the transplanted organ in the same way as to self MHC molecules bound to foreign antigen. (Reviewed in Alberts, B. et al. (1994) Molecular Biology of the Cell. Garland Publishing, New York NY, pp. 1229-1246.)
Antibodies, or immunoglobulins, are either expressed on the surface of B-cells or secreted by B-cells into the circulation. Antibodies bind and neutralize foreign antigens in the blood and other extracellular fluids. The prototypical antibody is a tetramer consisting of two identical heavy polypeptide chains (H-chains) and two identical light polypeptide chains (L-chains) interlinked by disulfide bonds. This arrangement confers the characteristic Y-shape on antibody molecules. Antibodies are classified based on their H-chain composition. The five antibody classes, IgA, IgD, IgE, IgG, and IgM, are defined by the α, δ, ε, γ, and μ H-chain types. There are two types of L-chains, κ and λ, either of which may associate as a pair with any H-chain pair. IgG, the most common class of antibody found in the circulation, is tetrameric, while the other classes of antibodies are generally variants or multimers of this basic structure.
H-chains and L-chains each contain an N-terminal variable region and a C-terminal constant region. The constant region consists of about 110 amino acids in L-chains and about 330 or 440 amino acids in H-chains. The amino acid sequence of the constant region is nearly identical among H- or L-chains of a particular class. The variable region consists of about 110 amino acids in both H- and L-chains. However, the amino acid sequence of the variable region differs among H- or L-chains of a particular class. Within each H- or L-chain variable region are three hypervariable regions of extensive sequence diversity, each consisting of about 5 to 10 amino acids. In the antibody molecule, the H- and L-chain hypervariable regions come together to form the antigen recognition site. (Reviewed in Alberts, supra, pp. 1206-1213 and 1216-1217.)
Both H-chains and L-chains contain repeated Ig domains. For example, a typical H-chain contains four Ig domains, three of which occur within the constant region and one of which occurs within the variable region and contributes to the formation of the antigen recognition site. Likewise, a typical L-chain contains two Ig domains, one of which occurs within the constant region and one of which occurs within the variable region.
The immune system is capable of recognizing and responding to any foreign molecule that enters the body. Therefore, the immune system must be armed with a full repertoire of antibodies against all potential antigens. Such antibody diversity is generated by somatic rearrangement of gene segments encoding variable and constant regions. These gene segments are joined together by site-specific recombination, which occurs between highly conserved DNA sequences that flank each gene segment. Because there are hundreds of different gene segments, millions of unique genes can be generated combinatorially. In addition, imprecise joining of these segments and an unusually high rate of somatic mutation within these segments further contribute to the generation of a diverse antibody population.
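The combinatorial argument above can be made concrete with a short calculation. The segment counts below are hypothetical round numbers chosen only for illustration (they are not given in this document), and the sketch assumes, as is standard, that a heavy-chain variable region is assembled from one V, one D, and one J segment and a light-chain variable region from one V and one J segment. Junctional imprecision and somatic mutation, which multiply the diversity further, are not modeled.

```python
from math import prod

# Hypothetical numbers of gene segments per locus (illustrative values only).
HEAVY_SEGMENTS = {"V": 40, "D": 25, "J": 6}
LIGHT_SEGMENTS = {"V": 40, "J": 5}


def combinations(segments: dict[str, int]) -> int:
    """Number of distinct variable regions from choosing one segment of each type."""
    return prod(segments.values())


heavy_variants = combinations(HEAVY_SEGMENTS)  # 40 * 25 * 6
light_variants = combinations(LIGHT_SEGMENTS)  # 40 * 5
# Any H-chain can pair with any L-chain, so the two counts multiply.
antibody_variants = heavy_variants * light_variants
total_segments = sum(HEAVY_SEGMENTS.values()) + sum(LIGHT_SEGMENTS.values())

print(f"{total_segments} segments -> {heavy_variants} H x {light_variants} L "
      f"= {antibody_variants:,} combinations")
```

With these made-up counts, about a hundred segments already yield more than a million heavy/light pairings, because choices at each step multiply rather than add.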
T-cell receptors are both structurally and functionally related to antibodies. (Reviewed in Alberts, supra, pp. 1228-1229.) T-cell receptors are cell surface proteins that bind foreign antigens and mediate diverse aspects of the immune response. A typical T-cell receptor is a heterodimer composed of two disulfide-linked polypeptide chains called α and β. Each chain is about 280 amino acids in length and contains one variable region and one constant region. Each variable or constant region folds into an Ig domain. The variable regions from the α and β chains come together in the heterodimer to form the antigen recognition site. T-cell receptor diversity is generated by somatic rearrangement of gene segments encoding the α and β chains. T-cell receptors recognize small peptide antigens that are expressed on the surface of antigen-presenting cells and pathogen-infected cells. These peptide antigens are presented on the cell surface in association with major histocompatibility proteins, which provide the proper context for antigen recognition.
Secreted and Extracellular Matrix Molecules
Protein secretion is essential for cellular function. Protein secretion is mediated by a signal peptide located at the amino terminus of the protein to be secreted. The signal peptide consists of about ten to twenty hydrophobic amino acids which target the nascent protein from the ribosome to the endoplasmic reticulum (ER). Proteins targeted to the ER may either proceed through the secretory pathway or remain in any of the secretory organelles such as the ER, Golgi apparatus, or lysosomes. Proteins that transit through the secretory pathway are either secreted into the extracellular space or retained in the plasma membrane. Secreted proteins are often synthesized as inactive precursors that are activated by post-translational processing events during transit through the secretory pathway. Such events include glycosylation, proteolysis, and removal of the signal peptide by a signal peptidase. Other events that may occur during protein transport include chaperone-dependent unfolding and folding of the nascent protein and interaction of the protein with a receptor or pore complex. Examples of secreted proteins with amino-terminal signal peptides include receptors, extracellular matrix molecules, cytokines, hormones, growth and differentiation factors, neuropeptides, vasomediators, ion channels, transporters/pumps, and proteases. (Reviewed in Alberts, B. et al. (1994) Molecular Biology of The Cell. Garland Publishing, New York NY, pp. 557-560, 582-592.)
The extracellular matrix (ECM) is a complex network of glycoproteins, polysaccharides, proteoglycans, and other macromolecules that are secreted from the cell into the extracellular space. The ECM remains in close association with the cell surface and provides a supportive meshwork that profoundly influences cell shape, motility, strength, flexibility, and adhesion.
In fact, adhesion of a cell to its surrounding matrix is required for cell survival except in the case of metastatic tumor cells, which have overcome the need for cell-ECM anchorage. This phenomenon suggests that the ECM plays a critical role in the molecular mechanisms of growth control and metastasis. (Reviewed in Ruoslahti, E. (1996) Sci. Am. 275:72-77.) Furthermore, the ECM determines the structure and physical properties of connective tissue and is particularly important for morphogenesis and other processes associated with embryonic development and pattern formation.
The collagens comprise a family of ECM proteins that provide structure to bone, teeth, skin, ligaments, tendons, cartilage, blood vessels, and basement membranes. Multiple collagen proteins have been identified. Three collagen molecules fold together in a triple helix stabilized by interchain disulfide bonds. Bundles of these triple helices then associate to form fibrils. Collagen primary structure consists of hundreds of (Gly-X-Y) repeats, in which about a third of the X and Y residues are Pro. Glycine at every third position is crucial to helix formation because the bulkier side chains of other amino acids cannot fit into the triple-helical conformation. Because of these strict sequence requirements, mutations in collagen genes have severe consequences. Osteogenesis imperfecta patients have brittle bones that fracture easily; in severe cases patients die in utero or at birth. Ehlers-Danlos syndrome patients have hyperelastic skin, hypermobile joints, and susceptibility to aortic and intestinal rupture. Chondrodysplasia patients have short stature and ocular disorders. Alport syndrome patients have hematuria, sensorineural deafness, and eye lens deformation. (Isselbacher, K.J. et al. (1994) Harrison's Principles of Internal Medicine, McGraw-Hill, Inc., New York NY, pp. 2105-2117; and Creighton, T.E. (1984) Proteins, Structures and Molecular Principles, W.H. Freeman and Company, New York NY, pp. 191-197.)
Elastin and related proteins confer elasticity to tissues such as skin, blood vessels, and lungs.
Elastin is a highly hydrophobic protein of about 750 amino acids that is rich in proline and glycine residues. Elastin molecules are highly cross-linked, forming an extensive extracellular network of fibers and sheets. Elastin fibers are surrounded by a sheath of microfibrils, which are composed of a number of glycoproteins, including fibrillin. Mutations in the gene encoding fibrillin are responsible for Marfan's syndrome, a genetic disorder characterized by defects in connective tissue. In severe cases, the aortas of afflicted individuals are prone to rupture. (Reviewed in Alberts, supra, pp. 984-986.)
Fibronectin is a large ECM glycoprotein found in all vertebrates. Fibronectin exists as a dimer of two subunits, each containing about 2,500 amino acids. Each subunit folds into a rod-like structure containing multiple domains. The domains each contain multiple repeated modules, the most common of which is the type III fibronectin repeat. The type III fibronectin repeat is about 90 amino acids in length and is also found in other ECM proteins and in some plasma membrane and cytoplasmic proteins. Furthermore, some type III fibronectin repeats contain a characteristic tripeptide consisting of arginine-glycine-aspartic acid (RGD). The RGD sequence is recognized by the integrin family of cell surface receptors and is also found in other ECM proteins. Disruption of both copies of the gene encoding fibronectin causes early embryonic lethality in mice. The mutant embryos display extensive morphological defects, including defects in the formation of the notochord, somites, heart, blood vessels, neural tube, and extraembryonic structures. (Reviewed in Alberts, supra, pp. 986-987.)
Laminin is a major glycoprotein component of the basal lamina, which underlies and supports epithelial cell sheets. Laminin is one of the first ECM proteins synthesized in the developing embryo. Laminin is an 850-kilodalton protein composed of three polypeptide chains joined in the shape of a cross by disulfide bonds. Laminin is especially important for angiogenesis and, in particular, for guiding the formation of capillaries. (Reviewed in Alberts, supra, pp. 990-991.)
There are many other types of proteinaceous ECM components, most of which can be classified as proteoglycans. Proteoglycans are composed of unbranched polysaccharide chains (glycosaminoglycans) attached to protein cores. Common proteoglycans include aggrecan, betaglycan, decorin, perlecan, serglycin, and syndecan-1. Some of these molecules not only provide mechanical support but also bind to extracellular signaling molecules, such as fibroblast growth factor and transforming growth factor β, suggesting a role for proteoglycans in cell-cell communication and cell growth. (Reviewed in Alberts, supra, pp. 973-978.) Likewise, the glycoproteins tenascin-C and tenascin-R are expressed in developing and lesioned neural tissue and provide stimulatory and anti-adhesive (inhibitory) properties, respectively, for axonal growth. (Faissner, A. (1997) Cell Tissue Res. 290:331-341.)
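Two of the sequence features described in this section, the collagen (Gly-X-Y) repeat and the integrin-recognized RGD tripeptide, are simple enough to scan for in a protein sequence written in one-letter amino acid codes. The Python sketch below is a hypothetical illustration, not an established tool: `find_rgd` only reports where the tripeptide occurs (an RGD match by itself does not show that an integrin binds there), and `gly_x_y_summary` assumes the segment is already aligned so that its first residue is the Gly of the first repeat. Both function names and the example sequences are made up for illustration.

```python
import re


def find_rgd(protein: str) -> list[int]:
    """0-based start positions of every Arg-Gly-Asp (RGD) tripeptide."""
    return [match.start() for match in re.finditer("RGD", protein.upper())]


def gly_x_y_summary(segment: str) -> tuple[int, bool, float]:
    """Summarize a segment assumed to start at the Gly of a (Gly-X-Y) repeat.

    Returns the number of complete triplets, whether every triplet begins
    with Gly, and the fraction of X and Y positions occupied by Pro.
    """
    segment = segment.upper()
    triplets = [segment[i:i + 3] for i in range(0, len(segment) - 2, 3)]
    every_third_is_gly = all(triplet[0] == "G" for triplet in triplets)
    x_and_y = "".join(triplet[1:] for triplet in triplets)
    pro_fraction = x_and_y.count("P") / len(x_and_y) if x_and_y else 0.0
    return len(triplets), every_third_is_gly, pro_fraction


print(find_rgd("GRGDSPKGRGDA"))            # prints [1, 8]
print(gly_x_y_summary("GPPGAPGEKGPAGSR"))  # prints (5, True, 0.4)
```

The Gly check mirrors the strict requirement described for collagen: a single non-Gly residue at a first position would make `every_third_is_gly` false for the whole segment.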
Cytoskeletal Molecules
The cytoskeleton is a cytoplasmic network of protein fibers that mediate cell shape, structure, and movement. The cytoskeleton supports the cell membrane and forms tracks along which organelles and other elements move in the cytosol. The cytoskeleton is a dynamic structure that allows cells to adopt various shapes and to carry out directed movements. Major cytoskeletal fibers include the microtubules, the microfilaments, and the intermediate filaments. Motor proteins, including myosin, dynein, and kinesin, drive movement of or along the fibers. The motor protein dynamin drives the formation of membrane vesicles. Accessory or associated proteins modify the structure or activity of the fibers while cytoskeletal membrane anchors connect the fibers to the cell membrane.
Tubulins
Microtubules, cytoskeletal fibers with a diameter of about 24 nm, have multiple roles in the cell. Bundles of microtubules form cilia and flagella, which are whip-like extensions of the cell membrane that are necessary for sweeping materials across an epithelium and for swimming of sperm, respectively. Marginal bands of microtubules in red blood cells and platelets are important for these cells' pliability. Organelles, membrane vesicles, and proteins are transported in the cell along tracks of microtubules. For example, microtubules run through nerve cell axons, allowing bidirectional transport of materials and membrane vesicles between the cell body and the nerve terminal. Failure to supply the nerve terminal with these vesicles blocks the transmission of neural signals. Microtubules are also critical to chromosomal movement during cell division. Both stable and short-lived populations of microtubules exist in the cell.
Microtubules are polymers of GTP-binding tubulin protein subunits. Each subunit is a heterodimer of α- and β-tubulin, multiple isoforms of which exist. The hydrolysis of GTP is linked to the addition of tubulin subunits at the end of a microtubule. The subunits interact head to tail to form protofilaments; the protofilaments interact side to side to form a microtubule. A microtubule is polarized, one end ringed with α-tubulin and the other with β-tubulin, and the two ends differ in their rates of assembly. Generally, each microtubule is composed of 13 protofilaments, although microtubules with 11 or 15 protofilaments are sometimes found. Cilia and flagella contain doublet microtubules. Microtubules grow from specialized structures known as centrosomes or microtubule-organizing centers (MTOCs). MTOCs may contain one or two centrioles, which are pinwheel arrays of triplet microtubules. The basal body, the organizing center located at the base of a cilium or flagellum, contains one centriole. Gamma tubulin present in the MTOC is important for nucleating the polymerization of α- and β-tubulin heterodimers but does not polymerize into microtubules.
Microtubule-Associated Proteins
Microtubule-associated proteins (MAPs) have roles in the assembly and stabilization of microtubules. One major family of MAPs, assembly MAPs, can be identified in neurons as well as non-neuronal cells. Assembly MAPs are responsible for cross-linking microtubules in the cytosol. These MAPs are organized into two domains: a basic microtubule-binding domain and an acidic projection domain. The projection domain is the binding site for membranes, intermediate filaments, or other microtubules. Based on sequence analysis, assembly MAPs can be further grouped into two types: Type I and Type II. Type I MAPs, which include MAP1A and MAP1B, are large, filamentous molecules that co-purify with microtubules and are abundantly expressed in brain and testes. Type I MAPs contain several repeats of a positively charged amino acid sequence motif that binds and neutralizes negatively charged tubulin, leading to stabilization of microtubules. MAP1A and MAP1B are each derived from a single precursor polypeptide that is subsequently proteolytically processed to generate one heavy chain and one light chain.
Another light chain, LC3, is a 16.4 kDa molecule that binds MAP1A, MAP1B, and microtubules. It has been suggested that LC3 is synthesized from a transcript other than the MAP1A or MAP1B transcripts, and that the expression of LC3 may be important in regulating the microtubule-binding activity of MAP1A and MAP1B during cell proliferation (Mann, S.S. et al. (1994) J. Biol. Chem. 269:11492-11497).
Type II MAPs, which include MAP2a, MAP2b, MAP2c, MAP4, and Tau, are characterized by three to four copies of an 18-residue sequence in the microtubule-binding domain. MAP2a, MAP2b, and MAP2c are found only in dendrites, MAP4 is found in non-neuronal cells, and Tau is found in axons and dendrites of nerve cells. Alternative splicing of the Tau mRNA leads to the existence of multiple forms of Tau protein. Tau phosphorylation is altered in neurodegenerative disorders such as Alzheimer's disease, Pick's disease, progressive supranuclear palsy, corticobasal degeneration, and familial frontotemporal dementia and parkinsonism linked to chromosome 17. The altered Tau phosphorylation leads to a collapse of the microtubule network and the formation of intraneuronal Tau aggregates (Spillantini, M.G. and M. Goedert (1998) Trends Neurosci. 21:428-433).
The protein pericentrin is found in the MTOC and has a role in microtubule assembly.
Actins
Microfilaments, cytoskeletal filaments with a diameter of about 7-9 nm, are vital to cell locomotion, cell shape, cell adhesion, cell division, and muscle contraction. Assembly and disassembly of the microfilaments allow cells to change their morphology. Microfilaments are the polymerized form of actin, the most abundant intracellular protein in the eukaryotic cell. Human cells contain six isoforms of actin. The three α-actins are found in different kinds of muscle, nonmuscle β-actin and nonmuscle γ-actin are found in nonmuscle cells, and another γ-actin is found in intestinal smooth muscle cells. G-actin, the monomeric form of actin, polymerizes into polarized, helical F-actin filaments, accompanied by the hydrolysis of ATP to ADP. Actin filaments associate to form bundles and networks, providing a framework to support the plasma membrane and determine cell shape. These bundles and networks are connected to the cell membrane. In muscle cells, thin filaments containing actin slide past thick filaments containing the motor protein myosin during contraction. A family of actin-related proteins exists whose members are not part of the actin cytoskeleton, but rather associate with microtubules and dynein.
Actin-Associated Proteins
Actin-associated proteins have roles in cross-linking, severing, and stabilization of actin filaments and in sequestering actin monomers. Several of the actin-associated proteins have multiple functions. Bundles and networks of actin filaments are held together by actin cross-linking proteins. These proteins have two actin-binding sites, one for each filament. Short cross-linking proteins promote bundle formation while longer, more flexible cross-linking proteins promote network formation. Calmodulin-like calcium-binding domains in actin cross-linking proteins allow calcium regulation of cross-linking. Group I cross-linking proteins have unique actin-binding domains and include the 30 kD protein, EF-1α, fascin, and scruin. Group II cross-linking proteins have a 7,000-MW actin-binding domain and include villin and dematin. Group III cross-linking proteins have pairs of a 26,000-MW actin-binding domain and include fimbrin, spectrin, dystrophin, ABP 120, and filamin.
Severing proteins regulate the length of actin filaments by breaking them into short pieces or by blocking their ends. Severing proteins include gCAP39, severin (fragmin), gelsolin, and villin. Capping proteins can cap the ends of actin filaments, but cannot break filaments. Capping proteins include CapZ and tropomodulin. The proteins thymosin and profilin sequester actin monomers in the cytosol, allowing a pool of unpolymerized actin to exist. The actin-associated proteins tropomyosin, troponin, and caldesmon regulate muscle contraction in response to calcium.
Intermediate Filaments and Associated Proteins
Intermediate filaments (IFs) are cytoskeletal fibers with a diameter of about 10 nm, intermediate between that of microfilaments and microtubules. IFs serve structural roles in the cell, reinforcing cells and organizing cells into tissues. IFs are particularly abundant in epidermal cells and in neurons. IFs are extremely stable, and, in contrast to microfilaments and microtubules, do not function in cell motility.
Five types of IF proteins are known in mammals. Type I and Type II proteins are the acidic and basic keratins, respectively. Heterodimers of the acidic and basic keratins are the building blocks of keratin IFs. Keratins are abundant in soft epithelia such as skin and cornea, hard epithelia such as nails and hair, and in epithelia that line internal body cavities. Mutations in keratin genes lead to epithelial diseases including epidermolysis bullosa simplex, bullous congenital ichthyosiform erythroderma (epidermolytic hyperkeratosis), non-epidermolytic and epidermolytic palmoplantar keratoderma, ichthyosis bullosa of Siemens, pachyonychia congenita, and white sponge nevus. Some of these diseases result in severe skin blistering. (See, e.g., Wawersik, M. et al. (1997) J. Biol. Chem. 272:32557-32565; and Corden, L.D. and W.H. McLean (1996) Exp. Dermatol. 5:297-307.) Type III IF proteins include desmin, glial fibrillary acidic protein, vimentin, and peripherin.
Desmin filaments in muscle cells link myofibrils into bundles and stabilize sarcomeres in contracting muscle. Glial fibrillary acidic protein filaments are found in glial cells, such as astrocytes, that surround neurons. Vimentin filaments are found in blood vessel endothelial cells, some epithelial cells, and mesenchymal cells such as fibroblasts, and are commonly associated with microtubules. Vimentin filaments may have roles in keeping the nucleus and other organelles in place in the cell. Type IV IFs include the neurofilaments and nestin. Neurofilaments, composed of three polypeptides NF-L, NF-M, and NF-H, are frequently associated with microtubules in axons. Neurofilaments are responsible for the radial growth and diameter of an axon, and ultimately for the speed of nerve impulse transmission. Changes in phosphorylation and metabolism of neurofilaments are observed in neurodegenerative diseases including amyotrophic lateral sclerosis, Parkinson's disease, and Alzheimer's disease (Julien, J.P. and W.E. Mushynski (1998) Prog. Nucleic Acid Res. Mol. Biol. 61:1-23). Type V IFs, the lamins, are found in the nucleus where they support the nuclear membrane.
IFs have a central α-helical rod region interrupted by short nonhelical linker segments. The rod region is bracketed, in most cases, by non-helical head and tail domains. The rod regions of intermediate filament proteins associate to form a coiled-coil dimer. A highly ordered assembly process leads from the dimers to the IFs. Neither ATP nor GTP is needed for IF assembly, unlike that of microfilaments and microtubules.
IF-associated proteins (IFAPs) mediate the interactions of IFs with one another and with other cell structures. IFAPs cross-link IFs into a bundle, into a network, or to the plasma membrane, and may cross-link IFs to the microfilament and microtubule cytoskeleton. Microtubules and IFs are particularly closely associated. IFAPs include BPAG1, plakoglobin, desmoplakin I, desmoplakin II, plectin, ankyrin, filaggrin, and lamin B receptor.
Cytoskeletal-Membrane Anchors
Cytoskeletal fibers are attached to the plasma membrane by specific proteins. These attachments are important for maintaining cell shape and for muscle contraction. In erythrocytes, the spectrin-actin cytoskeleton is attached to the cell membrane by three proteins, band 4.1, ankyrin, and adducin. Defects in this attachment result in abnormally shaped cells which are more rapidly degraded by the spleen, leading to anemia. In platelets, the spectrin-actin cytoskeleton is also linked to the membrane by ankyrin; a second actin network is anchored to the membrane by filamin. In muscle cells the protein dystrophin links actin filaments to the plasma membrane; mutations in the dystrophin gene lead to Duchenne muscular dystrophy. In adherens junctions and adhesion plaques the peripheral membrane proteins α-actinin and vinculin attach actin filaments to the cell membrane. IFs are also attached to membranes by cytoskeletal-membrane anchors. The nuclear lamina is attached to the inner surface of the nuclear membrane by the lamin B receptor. Vimentin IFs are attached to the plasma membrane by ankyrin and plectin. Desmosome and hemidesmosome membrane junctions hold together epithelial cells of organs and skin. These membrane junctions allow shear forces to be distributed across the entire epithelial cell layer, thus providing strength and rigidity to the epithelium. IFs in epithelial cells are attached to the desmosome by plakoglobin and desmoplakins. The proteins that link IFs to hemidesmosomes are not known. Desmin IFs surround the sarcomere in muscle and are linked to the plasma membrane by paranemin, synemin, and ankyrin.
Myosin-related Motor Proteins
Myosins are actin-activated ATPases, found in eukaryotic cells, that couple hydrolysis of ATP with motion. Myosin provides the motor function for muscle contraction and intracellular movements such as phagocytosis and rearrangement of cell contents during mitotic cell division (cytokinesis). The contractile unit of skeletal muscle, termed the sarcomere, consists of highly ordered arrays of thin actin-containing filaments and thick myosin-containing filaments. Crossbridges form between the thick and thin filaments, and the ATP-dependent movement of myosin heads within the thick filaments pulls the thin filaments, shortening the sarcomere and thus the muscle fiber. Myosins are composed of one or two heavy chains and associated light chains. Myosin heavy chains contain an amino-terminal motor or head domain, a neck that is the site of light-chain binding, and a carboxy-terminal tail domain. The tail domains may associate to form an α-helical coiled coil. Conventional myosins, such as those found in muscle tissue, are composed of two myosin heavy-chain subunits, each associated with two light-chain subunits that bind at the neck region and play a regulatory role. Unconventional myosins, believed to function in intracellular motion, may contain either one or two heavy chains and associated light chains. There is evidence for about 25 myosin heavy chain genes in vertebrates, more than half of them unconventional.
Dynein-related Motor Proteins
Dyneins are (-) end-directed motor proteins which act on microtubules. Two classes of dyneins, cytosolic and axonemal, have been identified. Cytosolic dyneins are responsible for translocation of materials along cytoplasmic microtubules, for example, transport from the nerve terminal to the cell body and transport of endocytic vesicles to lysosomes. Cytoplasmic dyneins are also reported to play a role in mitosis. Axonemal dyneins are responsible for the beating of flagella and cilia. Dynein on one microtubule doublet walks along the adjacent microtubule doublet. This sliding force produces bending forces that cause the flagellum or cilium to beat. Dyneins have a native mass between 1000 and 2000 kDa and contain either two or three force-producing heads driven by the hydrolysis of ATP. The heads are linked via stalks to a basal domain which is composed of a highly variable number of accessory intermediate and light chains.
Kinesin-related Motor Proteins
Kinesins are (+) end-directed motor proteins which act on microtubules. The prototypical kinesin molecule is involved in the transport of membrane-bound vesicles and organelles. This function is particularly important for axonal transport in neurons. Kinesin is also important in all cell types for the transport of vesicles from the Golgi complex to the endoplasmic reticulum. This role is critical for maintaining the identity and functionality of these secretory organelles. Kinesins define a ubiquitous, conserved family of over 50 proteins that can be classified into at least 8 subfamilies based on primary amino acid sequence, domain structure, velocity of movement, and cellular function. (Reviewed in Moore, J.D. and S.A. Endow (1996) Bioessays 18:207-219; and Hoyt, A.M. (1994) Curr. Opin. Cell Biol. 6:63-68.) The prototypical kinesin molecule is a heterotetramer comprised of two heavy polypeptide chains (KHCs) and two light polypeptide chains (KLCs). The KHC subunits are typically referred to as "kinesin."
KHC is about 1000 amino acids in length, and KLC is about 550 amino acids in length. Two KHCs dimerize to form a rod-shaped molecule with three distinct regions of secondary structure. At one end of the molecule is a globular motor domain that functions in ATP hydrolysis and microtubule binding. Kinesin motor domains are highly conserved and share over 70% identity. Beyond the motor domain is an α-helical coiled-coil region which mediates dimerization. At the other end of the molecule is a fan-shaped tail that associates with molecular cargo. The tail is formed by the interaction of the KHC C-termini with the two KLCs.
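The statement that kinesin motor domains share over 70% identity refers to percent identity over an alignment. A minimal sketch of that calculation, assuming the two sequences are already aligned to equal length with '-' marking gaps, is shown below; the function name is hypothetical, and skipping gap columns is only one of several conventions in use (others divide by the full alignment length or by the shorter sequence), so values from different tools may not agree.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns in which either sequence has a gap ('-') are excluded from
    both the match count and the denominator.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Toy aligned fragments (hypothetical, not real motor-domain sequences).
print(percent_identity("ACDE", "ACDF"))  # 75.0
```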
Members of the more divergent subfamilies of kinesins are called kinesin-related proteins (KRPs), many of which function during mitosis in eukaryotes (Hoyt, supra). Some KRPs are required for assembly of the mitotic spindle. In vivo and in vitro analyses suggest that these KRPs exert force on microtubules that comprise the mitotic spindle, resulting in the separation of spindle poles. Phosphorylation of these KRPs is required for this activity. Failure to assemble the mitotic spindle results in abortive mitosis and chromosomal aneuploidy, the latter condition being characteristic of cancer cells. In addition, a unique KRP, centromere protein E, localizes to the kinetochore of human mitotic chromosomes and may play a role in their segregation to opposite spindle poles.
Dynamin-related Motor Proteins
Dynamin is a large GTPase motor protein that functions as a "molecular pinchase," generating a mechanochemical force used to sever membranes. This activity is important in forming clathrin-coated vesicles from coated pits in endocytosis and in the biogenesis of synaptic vesicles in neurons. Binding of dynamin to a membrane leads to dynamin's self-assembly into spirals that may act to constrict a flat membrane surface into a tubule. GTP hydrolysis induces a change in conformation of the dynamin polymer that pinches the membrane tubule, leading to severing of the membrane tubule and formation of a membrane vesicle. Release of GDP and inorganic phosphate leads to dynamin disassembly. Following disassembly, the dynamin may either dissociate from the membrane or remain associated with the vesicle and be transported to another region of the cell. Three homologous dynamin genes have been discovered, in addition to several dynamin-related proteins. Conserved dynamin regions are the N-terminal GTP-binding domain, a central pleckstrin homology domain that binds membranes, a central coiled-coil region that may activate dynamin's GTPase activity, and a C-terminal proline-rich domain that contains several motifs that bind SH3 domains on other proteins. Some dynamin-related proteins do not contain the pleckstrin homology domain or the proline-rich domain. (See McNiven, M.A. (1998) Cell 94:151-154; Scaife, R.M. and R.L. Margolis (1997) Cell. Signal. 9:395-401.)
The cytoskeleton is reviewed in Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY.
Ribosomal Molecules
Ribosomal RNAs (rRNAs) are assembled, along with ribosomal proteins, into ribosomes, which are cytoplasmic particles that translate messenger RNA into polypeptides. The eukaryotic ribosome is composed of a 60S (large) subunit and a 40S (small) subunit, which together form the 80S ribosome. In addition to the 18S, 28S, 5S, and 5.8S rRNAs, the ribosome also contains more than fifty proteins. The ribosomal proteins have a prefix which denotes the subunit to which they belong, either L (large) or S (small). Ribosomal protein activities include binding rRNA and organizing the conformation of the junctions between rRNA helices (Woodson, S.A. and N.B. Leontis (1998) Curr. Opin. Struct. Biol. 8:294-300; Ramakrishnan, V. and S.W. White (1998) Trends Biochem. Sci. 23:208-212.) Three important sites have been identified on the ribosome. The aminoacyl-tRNA site (A site) is where charged tRNAs (with the exception of the initiator tRNA) bind on arrival at the ribosome. The peptidyl-tRNA site (P site) is where new peptide bonds are formed, as well as where the initiator tRNA binds. The exit site (E site) is where deacylated tRNAs bind prior to their release from the ribosome. (The ribosome is reviewed in Stryer, L. (1995) Biochemistry, W.H. Freeman and Company, New York NY, pp. 888-908; and Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY, pp. 119-138.)
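The division of labor among the A, P, and E sites described above can be pictured as a small state machine in which each translocation step moves tRNAs from A to P to E. The sketch below is a toy model under loud simplifications: the class and method names (`Ribosome`, `deliver`, `translocate`) and the tRNA labels are hypothetical, and peptide-bond chemistry, elongation factors, and GTP hydrolysis are not modeled; only site occupancy is tracked.

```python
from typing import Optional


class Ribosome:
    """Toy model of tRNA occupancy in the A, P, and E sites (hypothetical).

    Peptide-bond formation, elongation factors, and proofreading are ignored.
    """

    def __init__(self, initiator: str) -> None:
        # The initiator tRNA binds directly in the P site.
        self.sites: dict[str, Optional[str]] = {"E": None, "P": initiator, "A": None}

    def deliver(self, charged_trna: str) -> None:
        # Charged (aminoacyl) tRNAs arrive at the A site, which must be empty.
        if self.sites["A"] is not None:
            raise RuntimeError("A site is occupied")
        self.sites["A"] = charged_trna

    def translocate(self) -> Optional[str]:
        """Shift tRNAs A -> P -> E and return whatever leaves the E site."""
        released = self.sites["E"]  # deacylated tRNA exits the ribosome
        self.sites["E"] = self.sites["P"]
        self.sites["P"] = self.sites["A"]
        self.sites["A"] = None
        return released


r = Ribosome("tRNA-iMet")
r.deliver("tRNA-Ala")
r.translocate()
print(r.sites)  # {'E': 'tRNA-iMet', 'P': 'tRNA-Ala', 'A': None}
```

In this simplified model the E site is vacated only when the next translocation pushes a new tRNA into it, which mirrors the text's description of the E site as a holding position before release.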
Chromatin Molecules
The nuclear DNA of eukaryotes is organized into chromatin. Two types of chromatin are observed: euchromatin, some of which may be transcribed, and heterochromatin, which is so densely packed that much of it is inaccessible to transcription. Chromatin packing thus serves to regulate gene expression in eukaryotes. Bacteria lack chromatin and therefore lack the chromatin-packing level of gene regulation.
The fundamental unit of chromatin is the nucleosome, consisting of about 200 base pairs of DNA associated with two copies each of histones H2A, H2B, H3, and H4. Adjacent nucleosomes are linked by another class of histone, H1. Low molecular weight non-histone proteins associated with chromatin, called the high mobility group (HMG) proteins, may function in the unwinding of DNA and stabilization of single-stranded DNA. Chromodomain proteins function in compaction of chromatin into its transcriptionally silent heterochromatin form. During mitosis, all DNA is compacted into heterochromatin and transcription ceases.
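The nucleosome figures above support a quick back-of-the-envelope estimate of packaging. The sketch below assumes exactly one nucleosome per 200 bp repeat and a core octamer of eight histones (two each of H2A, H2B, H3, and H4); the function name and the default repeat length are illustrative assumptions, since real nucleosome repeat lengths vary between cell types, and linker histone H1 is not counted.

```python
def nucleosome_estimate(dna_bp: int, repeat_bp: int = 200) -> tuple[int, int]:
    """Estimate (nucleosomes, core histones) for a stretch of DNA.

    Assumes one nucleosome per `repeat_bp` base pairs and eight core
    histones per nucleosome: 2 x (H2A + H2B + H3 + H4). H1 is not counted.
    """
    nucleosomes = dna_bp // repeat_bp
    core_histones = nucleosomes * 8
    return nucleosomes, core_histones


print(nucleosome_estimate(2000))  # (10, 80)
```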
Transcription in interphase begins with the activation of a region of chromatin. Active chromatin is decondensed. Decondensation appears to be accompanied by changes in the binding coefficient, phosphorylation state, and acetylation state of chromatin histones. The HMG proteins HMG14 and HMG17 selectively bind activated chromatin. Topoisomerases remove superhelical tension on DNA. The activated region decondenses, allowing gene regulatory proteins and transcription factors to assemble on the DNA.
Patterns of chromatin structure can be stably inherited, producing heritable patterns of gene expression. In mammals, one of the two X chromosomes in each female cell is inactivated by condensation to heterochromatin during zygote development. The inactive state of this chromosome is inherited, so that adult females are mosaics of clusters of paternal-X and maternal-X clonal cell groups. The condensed X chromosome is reactivated in meiosis.
Chromatin structure is implicated in disorders of gene expression such as thalassemia, a genetic anemia that can result from deletion of the locus control region (LCR) required for decondensation of the globin gene locus. For a review of chromatin structure and function, see Alberts, B. et al. (1994) Molecular Cell Biology, third edition, Garland Publishing, Inc., New York NY, pp. 351-354, 433-439.
Electron Transfer Associated Molecules
Electron carriers such as cytochromes accept electrons from NADH or FADH2 and donate them to other electron carriers. Most electron carriers other than ubiquinone are prosthetic groups, such as flavins, heme, FeS clusters, and copper, bound to inner membrane proteins. Adrenodoxin, for example, is an FeS protein that forms a complex with NADPH-adrenodoxin reductase and cytochrome P450. Cytochromes contain a heme prosthetic group, a porphyrin ring containing a tightly bound iron atom. Electron transfer reactions play a crucial role in cellular energy production.
Energy is produced by the oxidation of glucose and fatty acids. Glucose is initially converted to pyruvate in the cytoplasm. Fatty acids and pyruvate are transported to the mitochondria for complete oxidation to CO2, coupled by enzymes to the transport of electrons from NADH and FADH2 to oxygen and to the synthesis of ATP from ADP and Pi (oxidative phosphorylation). In the mitochondria, pyruvate is converted to acetyl-CoA by the pyruvate dehydrogenase complex, whose components include pyruvate dehydrogenase, dihydrolipoyl transacetylase, and dihydrolipoyl dehydrogenase, and the acetyl-CoA is then oxidized via the citric acid cycle. Enzymes involved in the citric acid cycle include citrate synthetase, aconitases, isocitrate dehydrogenase, the alpha-ketoglutarate dehydrogenase complex including transsuccinylases, succinyl CoA synthetase, succinate dehydrogenase, fumarases, and malate dehydrogenase. Acetyl CoA is oxidized to CO2 with concomitant formation of NADH, FADH2, and GTP. In oxidative phosphorylation, the transfer of electrons from NADH and FADH2 to oxygen by dehydrogenases is coupled to the synthesis of ATP from ADP and Pi by the F0F1 ATPase complex in the mitochondrial inner membrane. Enzyme complexes responsible for electron transport and ATP synthesis include the F0F1 ATPase complex, ubiquinone (CoQ)-cytochrome c reductase, ubiquinone reductase, cytochrome b, cytochrome c1, FeS protein, and cytochrome c oxidase.
ATP synthesis requires membrane transport enzymes including the phosphate transporter and the ATP-ADP antiport protein. The ATP-binding cassette (ABC) superfamily has also been suggested as belonging to the mitochondrial transport group (Hogue, D.L. et al. (1999) J. Mol. Biol. 285:379-389). Brown fat uncoupling protein dissipates oxidative energy as heat, and may be involved in the fever response to infection and trauma (Cannon, B. et al. (1998) Ann. NY Acad. Sci. 856:171-187). Mitochondria are oval-shaped organelles comprising an outer membrane, a tightly folded inner membrane, an intermembrane space between the outer and inner membranes, and a matrix inside the inner membrane. The outer membrane contains many porin molecules that allow ions and charged molecules to enter the intermembrane space, while the inner membrane contains a variety of transport proteins that transfer only selected molecules. Mitochondria are the primary sites of energy production in cells.
Mitochondria contain a small amount of DNA. Human mitochondrial DNA encodes 13 proteins, 22 tRNAs, and 2 rRNAs. Proteins encoded by mitochondrial DNA include NADH-Q reductase subunits, a cytochrome reductase subunit, cytochrome oxidase subunits, and ATP synthase subunits. Electron-transfer reactions also occur outside the mitochondria in locations such as the endoplasmic reticulum, which plays a crucial role in lipid and protein biosynthesis. Cytochrome b5 is a central electron donor for various reductive reactions occurring on the cytoplasmic surface of liver endoplasmic reticulum. Cytochrome b5 has been found in Golgi, plasma, endoplasmic reticulum (ER), and microbody membranes. For a review of mitochondrial metabolism and regulation, see Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY, pp. 745-797; and Stryer (1995) Biochemistry, W.H. Freeman and Co., San Francisco CA, pp. 529-558, 988-989.
The majority of mitochondrial proteins are encoded by nuclear genes, are synthesized on cytosolic ribosomes, and are imported into the mitochondria. Nuclear-encoded proteins which are destined for the mitochondrial matrix typically contain positively charged amino-terminal signal sequences. Import of these preproteins from the cytoplasm requires a multisubunit protein complex in the outer membrane known as the translocase of outer mitochondrial membrane (TOM; previously designated MOM; Pfanner, N. et al. (1996) Trends Biochem. Sci. 21:51-52) and at least three inner membrane proteins which comprise the translocase of inner mitochondrial membrane (TIM; previously designated MIM; Pfanner, supra). An inside-negative membrane potential across the inner mitochondrial membrane is also required for preprotein import. Preproteins are recognized by surface receptor components of the TOM complex and are translocated through a proteinaceous pore formed by other TOM components. Proteins targeted to the matrix are then recognized by the import machinery of the TIM complex. The import systems of the outer and inner membranes can function independently (Segui-Real, B. et al. (1993) EMBO J. 12:2211-2218).
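Because matrix-targeting signals are described above as positively charged amino-terminal sequences, a very crude first look at a candidate protein is the net charge of its N-terminal window. The sketch below is a rough heuristic, not a targeting predictor: the 20-residue window, the function name, and the toy sequences are hypothetical choices, histidine is treated as neutral, and real presequence recognition also depends on amphipathic helix formation and other features not captured here.

```python
def n_terminal_net_charge(protein: str, window: int = 20) -> int:
    """Crude net charge of the N-terminal window: (Lys + Arg) - (Asp + Glu).

    The window size is an illustrative assumption; histidine is counted as neutral.
    """
    segment = protein[:window].upper()
    positive = sum(segment.count(aa) for aa in "KR")
    negative = sum(segment.count(aa) for aa in "DE")
    return positive - negative


# Hypothetical toy N-termini, not real preproteins.
print(n_terminal_net_charge("MLRKRAS"))  # 3
print(n_terminal_net_charge("MDEEK"))    # -2
```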
Once precursor proteins are in the mitochondria, the leader peptide is cleaved by a signal peptidase to generate the mature protein. Most leader peptides are removed in a one step process by a protease termed mitochondrial processing peptidase (MPP) (Paces, V. et al. (1993) Proc. Natl. Acad. Sci. USA 90:5355-5358). In some cases a two-step process occurs in which MPP generates an intermediate precursor form which is cleaved by a second enzyme, mitochondrial intermediate peptidase, to generate the mature protein.
Mitochondrial dysfunction leads to impaired calcium buffering, generation of free radicals that may participate in deleterious intracellular and extracellular processes, changes in mitochondrial permeability, and oxidative damage, which is observed in several neurodegenerative diseases. Neurodegenerative diseases linked to mitochondrial dysfunction include some forms of Alzheimer's disease, Friedreich's ataxia, familial amyotrophic lateral sclerosis, and Huntington's disease (Beal, M.F. (1998) Biochim. Biophys. Acta 1366:211-213). The myocardium is heavily dependent on oxidative metabolism, so mitochondrial dysfunction often leads to heart disease (DiMauro, S. and M. Hirano (1998) Curr. Opin. Cardiol. 13:190-197). Mitochondria are implicated in disorders of cell proliferation, since they play an important role in a cell's decision to proliferate or self-destruct through apoptosis. The oncoprotein Bcl-2, for example, promotes cell proliferation by stabilizing mitochondrial membranes so that apoptosis signals are not released (Susin, S.A. (1998) Biochim. Biophys. Acta 1366:151-165).
Transcription Factor Molecules
Multicellular organisms are composed of diverse cell types that differ dramatically both in structure and function. The identity of a cell is determined by its characteristic pattern of gene expression, and different cell types express overlapping but distinctive sets of genes throughout development. Spatial and temporal regulation of gene expression is critical for the control of cell proliferation, cell differentiation, apoptosis, and other processes that contribute to organismal development. Furthermore, gene expression is regulated in response to extracellular signals that mediate cell-cell communication and coordinate the activities of different cell types. Appropriate gene regulation also ensures that cells function efficiently by expressing only those genes whose functions are required at a given time. Transcriptional regulatory proteins are essential for the control of gene expression. Some of these proteins function as transcription factors that initiate, activate, repress, or terminate gene transcription. Transcription factors generally bind to the promoter, enhancer, and upstream regulatory regions of a gene in a sequence-specific manner, although some factors bind regulatory elements within or downstream of a gene's coding region. Transcription factors may bind to a specific region of DNA singly or as a complex with other accessory factors. (Reviewed in Lewin, B. (1990) Genes IV, Oxford University Press, New York NY, and Cell Press, Cambridge MA, pp. 554-570.)
The double helix structure and repeated sequences of DNA create topological and chemical features which can be recognized by transcription factors. These features include hydrogen bond donor and acceptor groups, hydrophobic patches, major and minor grooves, and regular, repeated stretches of sequence which induce distinct bends in the helix. Typically, transcription factors recognize specific DNA sequence motifs of about 20 nucleotides in length. Multiple, adjacent transcription factor-binding motifs may be required for gene regulation.
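Since a double-stranded binding site can be read from either strand, a motif search over a DNA sequence has to check both the motif and its reverse complement. The sketch below is a minimal exact-match scanner under stated assumptions: uppercase A/C/G/T input, no degenerate bases or mismatches, and hypothetical function names. Positions are reported in forward-strand coordinates, overlapping hits are allowed, and a palindromic motif is reported once per strand at the same position.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_sites(dna: str, motif: str) -> list[tuple[int, str]]:
    """Return sorted (0-based forward-strand position, strand) pairs for exact motif hits."""
    dna, motif = dna.upper(), motif.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = dna.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = dna.find(pattern, start + 1)  # step by one to allow overlaps
    return sorted(hits)


print(find_sites("ACGTTT", "ACG"))  # [(0, '+'), (1, '-')]
```

Real transcription factor sites are usually degenerate, so practical tools score position weight matrices rather than exact strings; the exact-match version only illustrates the two-strand bookkeeping.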
Many transcription factors incorporate DNA-binding structural motifs which comprise either α helices or β sheets that bind to the major groove of DNA. Four well-characterized structural motifs are helix-turn-helix, zinc finger, leucine zipper, and helix-loop-helix. Proteins containing these motifs may act alone as monomers, or they may form homo- or heterodimers that interact with DNA. The helix-turn-helix motif consists of two α helices connected at a fixed angle by a short chain of amino acids. One of the helices binds to the major groove. Helix-turn-helix motifs are exemplified by the homeobox motif which is present in homeodomain proteins. These proteins are critical for specifying the anterior-posterior body axis during development and are conserved throughout the animal kingdom. The Antennapedia and Ultrabithorax proteins of Drosophila melanogaster are prototypical homeodomain proteins (Pabo, C.O. and R.T. Sauer (1992) Annu. Rev. Biochem. 61:1053-1095).
The zinc finger motif, which binds zinc ions, generally contains tandem repeats of about 30 amino acids consisting of periodically spaced cysteine and histidine residues. Examples of this sequence pattern, designated C2H2 and C3HC4 ("RING" finger), have been described (Lewin, supra). Zinc finger proteins each contain an α helix and an antiparallel β sheet whose proximity and conformation are maintained by the zinc ion. Contact with DNA is made by the arginine preceding the α helix and by the second, third, and sixth residues of the α helix. Variants of the zinc finger motif include poorly defined cysteine-rich motifs which bind zinc or other metal ions. These motifs may not contain histidine residues and are generally nonrepetitive.
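The periodic cysteine and histidine spacing of C2H2 zinc fingers is regular enough to flag candidate fingers in a protein sequence with a pattern. A minimal sketch, assuming a regular expression modeled on the PROSITE C2H2 consensus (PS00028; the exact pattern should be checked against the current PROSITE entry); the function name and test sequence are hypothetical. A match is only a candidate and does not show that the segment folds or binds zinc.

```python
import re

# Modeled on the PROSITE C2H2 consensus (assumed):
# C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
C2H2_PATTERN = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")


def find_c2h2_fingers(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in C2H2_PATTERN.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical sequence containing one C2H2-like segment.
    print(find_c2h2_fingers("YKCPECGKSFSQSSNLQKHQRTHTGEKP"))
```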
The leucine zipper motif comprises a stretch of amino acids rich in leucine which can form an amphipathic α helix. This structure provides the basis for dimerization of two leucine zipper proteins. The region adjacent to the leucine zipper is usually basic, and upon protein dimerization, is optimally positioned for binding to the major groove. Proteins containing such motifs are generally referred to as bZIP transcription factors.
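In leucine zippers the leucines typically recur at every seventh position; because an α helix has roughly 3.6 residues per turn, this spacing places them along one face of the helix, forming the dimerization surface. A minimal sketch, assuming a pattern modeled on the PROSITE leucine zipper signature (PS00029, L-x(6)-L-x(6)-L-x(6)-L); the function name and test sequence are hypothetical, and such a simple pattern produces many false positives outside a true coiled-coil context.

```python
import re

# Modeled on PROSITE PS00029 (assumed): L-x(6)-L-x(6)-L-x(6)-L.
# The zero-width lookahead lets overlapping candidates all be reported.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(protein: str) -> list[int]:
    """Return 0-based start positions of candidate leucine heptad runs."""
    return [m.start() for m in LEUCINE_ZIPPER.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical sequence with leucines at positions 2, 9, 16, and 23.
    print(find_leucine_zippers("MS" + "LEEKVKQ" * 3 + "LNG"))
```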
The helix-loop-helix motif (HLH) consists of a short α helix connected by a loop to a longer α helix. The loop is flexible and allows the two helices to fold back against each other and to bind to DNA. The transcription factor Myc contains a prototypical HLH motif.
Most transcription factors contain characteristic DNA binding motifs, and variations on the above motifs and new motifs have been and are currently being characterized (Faisst, S. and S. Meyer (1992) Nucleic Acids Res. 20:3-26).
Many neoplastic disorders in humans can be attributed to inappropriate gene expression. Malignant cell growth may result from either excessive expression of tumor promoting genes or insufficient expression of tumor suppressor genes (Cleary, M.L. (1992) Cancer Surv. 15:89-104).
Chromosomal translocations may also produce chimeric loci which fuse the coding sequence of one gene with the regulatory regions of a second unrelated gene. Such an arrangement likely results in inappropriate gene transcription, potentially contributing to malignancy.
In addition, the immune system responds to infection or trauma by activating a cascade of events that coordinate the progressive selection, amplification, and mobilization of cellular defense mechanisms. A complex and balanced program of gene activation and repression is involved in this process. However, hyperactivity of the immune system as a result of improper or insufficient regulation of gene expression may result in considerable tissue or organ damage. This damage is well documented in immunological responses associated with arthritis, allergens, heart attack, stroke, and infections (Isselbacher, K.J. et al. (1996) Harrison's Principles of Internal Medicine. 13/e, McGraw Hill, Inc. and Teton Data Systems Software).
Furthermore, the generation of multicellular organisms is based upon the induction and coordination of cell differentiation at the appropriate stages of development. Central to this process is differential gene expression, which confers the distinct identities of cells and tissues throughout the body. Failure to regulate gene expression during development can result in developmental disorders. Human developmental disorders caused by mutations in zinc finger-type transcriptional regulators include urogenital developmental abnormalities associated with WT1; Greig cephalopolysyndactyly, Pallister-Hall syndrome, and postaxial polydactyly type A (GLI3); and Townes-Brocks syndrome, characterized by anal, renal, limb, and ear abnormalities (SALL1) (Engelkamp, D. and V. van Heyningen (1996) Curr. Opin. Genet. Dev. 6:334-342; Kohlhase, J. et al. (1999) Am. J. Hum. Genet. 64:435-445).
Cell Membrane Molecules
Eukaryotic cells are surrounded by plasma membranes which enclose the cell and maintain an environment inside the cell that is distinct from its surroundings. In addition, eukaryotic organisms are distinct from prokaryotes in possessing many intracellular organelle and vesicle structures. Many of the metabolic reactions which distinguish eukaryotic biochemistry from prokaryotic biochemistry take place within these structures. The plasma membrane and the membranes surrounding organelles and vesicles are composed of phosphoglycerides, fatty acids, cholesterol, phospholipids, glycolipids, proteoglycans, and proteins. These components confer identity and functionality to the membranes with which they associate.
Integral Membrane Proteins
The majority of known integral membrane proteins are transmembrane proteins (TM) which are characterized by an extracellular, a transmembrane, and an intracellular domain. TM domains are typically composed of 15 to 25 hydrophobic amino acids which are predicted to adopt an α-helical conformation. TM proteins are classified as bitopic (Types I and II) and polytopic (Types III and IV) (Singer, S.J. (1990) Annu. Rev. Cell Biol. 6:247-296). Bitopic proteins span the membrane once while polytopic proteins contain multiple membrane-spanning segments. TM proteins function as cell-surface receptors, receptor-interacting proteins, transporters of ions or metabolites, ion channels, cell anchoring proteins, and cell type-specific surface antigens.
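A run of 15 to 25 hydrophobic residues of this kind is the basis of classical hydropathy analysis. The sketch below averages Kyte-Doolittle hydropathy values over a sliding window and merges windows whose average exceeds a cutoff into candidate segments. The window of 19 residues and cutoff of 1.6 follow values commonly quoted for Kyte-Doolittle transmembrane prediction and should be treated as adjustable assumptions; the function names and test sequence are hypothetical, and modern predictors use far richer models.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy per window; entry i covers residues i .. i + window - 1.

    Raises KeyError for non-standard residue letters.
    """
    values = [KD_SCALE[aa] for aa in protein.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_segments(protein: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge runs of above-threshold windows into (start, end) spans.

    Spans are 0-based and end-exclusive and cover the union of the
    qualifying windows, so they are only a coarse location estimate.
    """
    profile = hydropathy_profile(protein, window)
    segments: list[tuple[int, int]] = []
    run_start = None
    for i, score in enumerate(profile):
        if score > threshold and run_start is None:
            run_start = i
        elif score <= threshold and run_start is not None:
            segments.append((run_start, i - 1 + window))
            run_start = None
    if run_start is not None:
        segments.append((run_start, len(profile) - 1 + window))
    return segments


if __name__ == "__main__":
    # Hypothetical sequence: a 20-residue leucine stretch flanked by aspartates.
    print(candidate_tm_segments("D" * 10 + "L" * 20 + "D" * 10))
```

For this test sequence the merged span runs from residue 5 to 35 (end-exclusive), somewhat wider than the leucine stretch itself because partially hydrophobic windows at either edge also pass the cutoff.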
Many membrane proteins (MPs) contain amino acid sequence motifs that target these proteins to specific subcellular sites. Examples of these motifs include PDZ domains, KDEL, RGD, NGR, and GSL sequence motifs, von Willebrand factor A (vWFA) domains, and EGF-like domains. RGD, NGR, and GSL motif-containing peptides have been used as drug delivery agents in targeted cancer treatment of tumor vasculature (Arap, W. et al. (1998) Science 279:377-380). Furthermore, MPs may contain amino acid sequence motifs, such as the carbohydrate recognition domain (CRD), that mediate interactions with extracellular or intracellular molecules.
G-Protein Coupled Receptors
G-protein coupled receptors (GPCR) are a superfamily of integral membrane proteins which transduce extracellular signals. GPCRs include receptors for biogenic amines, lipid mediators of inflammation, peptide hormones, and sensory signal mediators. The structure of these highly conserved receptors consists of seven hydrophobic transmembrane regions, an extracellular N-terminus, and a cytoplasmic C-terminus. Three extracellular loops alternate with three intracellular loops to link the seven transmembrane regions. Cysteine disulfide bridges connect the second and third extracellular loops. The most conserved regions of GPCRs are the transmembrane regions and the first two cytoplasmic loops. A conserved, acidic-Arg-aromatic residue triplet present in the second cytoplasmic loop may interact with G proteins. A GPCR consensus pattern is characteristic of most proteins belonging to this superfamily (ExPASy PROSITE document PS00237; and Watson, S. and S. Arkinstall (1994) The G-protein Linked Receptor Facts Book. Academic Press, San Diego CA, pp. 2-6). Mutations and changes in transcriptional activation of GPCR-encoding genes have been associated with neurological disorders such as schizophrenia, Parkinson's disease, Alzheimer's disease, drug addiction, and feeding disorders.
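Short linear motifs such as the KDEL, RGD, and NGR sequences listed among the membrane-protein targeting motifs can be located by simple pattern matching. A minimal sketch, assuming exact matches to RGD and NGR anywhere in the sequence and to KDEL only at the C-terminus (its well-known position as a retention signal for soluble ER proteins); the `MOTIFS` table, function name, and test sequence are hypothetical. Tripeptides this short occur frequently by chance, so a hit is a candidate, not evidence of function.

```python
import re

# Hypothetical motif table: RGD and NGR anywhere, KDEL only at the C-terminus.
MOTIFS = {
    "RGD": re.compile(r"RGD"),
    "NGR": re.compile(r"NGR"),
    "KDEL": re.compile(r"KDEL$"),
}


def scan_motifs(protein: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each motif in the protein sequence."""
    seq = protein.upper().strip()
    return {name: [m.start() for m in pattern.finditer(seq)]
            for name, pattern in MOTIFS.items()}


if __name__ == "__main__":
    print(scan_motifs("MARGDSNGRAAKDEL"))
```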
Scavenger Receptors
Macrophage scavenger receptors with broad ligand specificity may participate in the binding of low density lipoproteins (LDL) and foreign antigens. Scavenger receptor types I and II are trimeric membrane proteins with each subunit containing a small N-terminal intracellular domain, a transmembrane domain, a large extracellular domain, and a C-terminal cysteine-rich domain. The extracellular domain contains a short spacer region, an α-helical coiled-coil region, and a triple helical collagen-like region. These receptors have been shown to bind a spectrum of ligands, including chemically modified lipoproteins and albumin, polyribonucleotides, polysaccharides, phospholipids, and asbestos (Matsumoto, A. et al. (1990) Proc. Natl. Acad. Sci. USA 87:9133-9137; and Elomaa, O. et al. (1995) Cell 80:603-609). The scavenger receptors are thought to play a key role in atherogenesis by mediating uptake of modified LDL in arterial walls, and in host defense by binding bacterial endotoxins, bacteria, and protozoa.
Tetraspan Family Proteins
The transmembrane 4 superfamily (TM4SF) or tetraspan family is a multigene family encoding type III integral membrane proteins (Wright, M.D. and M.G. Tomlinson (1994) Immunol. Today 15:588-594). The TM4SF consists of membrane proteins which traverse the cell membrane four times. Members of the TM4SF include platelet and endothelial cell membrane proteins, melanoma-associated antigens, leukocyte surface glycoproteins, colonal carcinoma antigens, tumor-associated antigens, and surface proteins of the schistosome parasites (Jankowski, S.A. (1994) Oncogene 9:1205-1211). Members of the TM4SF share about 25-30% amino acid sequence identity with one another. A number of TM4SF members have been implicated in signal transduction, control of cell adhesion, regulation of cell growth and proliferation, including development and oncogenesis, and cell motility, including tumor cell metastasis. Expression of TM4SF proteins is associated with a variety of tumors and the level of expression may be altered when cells are growing or activated.
Tumor Antigens
Tumor antigens are cell surface molecules that are differentially expressed in tumor cells relative to normal cells. Tumor antigens distinguish tumor cells immunologically from normal cells and provide diagnostic and therapeutic targets for human cancers (Takagi, S. et al. (1995) Int. J. Cancer 61:706-715; Liu, E. et al. (1992) Oncogene 7:1027-1032).
Leukocyte Antigens
Other types of cell surface antigens include those identified on leukocytic cells of the immune system. These antigens have been identified using systematic, monoclonal antibody (mAb)-based "shotgun" techniques. These techniques have resulted in the production of hundreds of mAbs directed against unknown cell surface leukocytic antigens. These antigens have been grouped into "clusters of differentiation" based on common immunocytochemical localization patterns in various differentiated and undifferentiated leukocytic cell types.
Antigens in a given cluster are presumed to identify a single cell surface protein and are assigned a "cluster of differentiation" or "CD" designation. Some of the genes encoding proteins identified by CD antigens have been cloned and verified by standard molecular biology techniques. CD antigens have been characterized as both transmembrane proteins and cell surface proteins anchored to the plasma membrane via covalent attachment to fatty acid-containing glycolipids such as glycosylphosphatidylinositol (GPI). (Reviewed in Barclay, A.N. et al. (1995) The Leucocyte Antigen Facts Book, Academic Press, San Diego CA, pp. 17-20.)
Ion Channels
Ion channels are found in the plasma membranes of virtually every cell in the body. For example, chloride channels mediate a variety of cellular functions including regulation of membrane potentials and absorption and secretion of ions across epithelial membranes. Chloride channels also regulate the pH of organelles such as the Golgi apparatus and endosomes (see, e.g., Greger, R. (1988) Annu. Rev. Physiol. 50:111-122). Electrophysiological and pharmacological properties of chloride channels, including ion conductance, current-voltage relationships, and sensitivity to modulators, suggest that different chloride channels exist in muscles, neurons, fibroblasts, epithelial cells, and lymphocytes.
Many ion channels have sites for phosphorylation by one or more protein kinases including protein kinase A, protein kinase C, tyrosine kinase, and casein kinase II, all of which regulate ion channel activity in cells. Inappropriate phosphorylation of proteins in cells has been linked to changes in cell cycle progression and cell differentiation. Changes in the cell cycle have been linked to induction of apoptosis or cancer. Changes in cell differentiation have been linked to diseases and disorders of the reproductive system, immune system, skeletal muscle, and other organ systems.
Proton Pumps
Proton ATPases comprise a large class of membrane proteins that use the energy of ATP hydrolysis to generate an electrochemical proton gradient across a membrane. The resultant gradient may be used to transport other ions across the membrane (Na+, K+, or Cl−) or to maintain organelle pH. Proton ATPases are further subdivided into the mitochondrial F-ATPases, the plasma membrane ATPases, and the vacuolar ATPases. The vacuolar ATPases establish and maintain an acidic pH within various organelles involved in the processes of endocytosis and exocytosis (Mellman, I. et al. (1986) Annu. Rev. Biochem. 55:663-700).
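The electrochemical proton gradient can be put in numbers. For one mole of H+ moved from the cytosol into an organelle lumen, the free-energy cost is a concentration term plus an electrical term, ΔG = ln(10)·RT·(pH_cytosol − pH_lumen) + F·Δψ, with Δψ = ψ_lumen − ψ_cytosol. The sketch below evaluates this expression, assuming a cytosolic pH of 7.2, a lumenal pH of 5.0 (typical of acidic vacuolar compartments), a temperature of 310 K, and, for simplicity, Δψ = 0; these inputs and the function names are illustrative assumptions, not values from the text.

```python
import math

GAS_CONSTANT = 8.314   # J / (mol K)
FARADAY = 96485.0      # C / mol


def proton_transport_free_energy(ph_cytosol: float, ph_lumen: float,
                                 delta_psi_volts: float = 0.0,
                                 temp_kelvin: float = 310.0) -> float:
    """Free energy (J/mol) to move 1 mol of H+ from cytosol into the lumen.

    dG = ln(10) * R * T * (pH_cytosol - pH_lumen) + F * delta_psi,
    where delta_psi = psi_lumen - psi_cytosol in volts. A positive value
    means the transfer is uphill and must be driven, e.g. by ATP hydrolysis.
    """
    chemical = math.log(10) * GAS_CONSTANT * temp_kelvin * (ph_cytosol - ph_lumen)
    electrical = FARADAY * delta_psi_volts
    return chemical + electrical


def proton_concentration_ratio(ph_cytosol: float, ph_lumen: float) -> float:
    """[H+]_lumen / [H+]_cytosol implied by the two pH values."""
    return 10 ** (ph_cytosol - ph_lumen)


if __name__ == "__main__":
    dg = proton_transport_free_energy(7.2, 5.0)  # assumed pH values
    print(f"{dg / 1000:.1f} kJ/mol, ratio {proton_concentration_ratio(7.2, 5.0):.0f}")
```

Under these assumptions the lumen holds roughly 160 times more free H+ than the cytosol, and each mole of protons pumped costs about 13 kJ before any membrane-potential contribution is added.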
Proton-coupled, 12 membrane-spanning domain transporters such as PEPT1 and PEPT2 are responsible for gastrointestinal absorption and for renal reabsorption of peptides using an electrochemical H+ gradient as the driving force. Another type of peptide transporter, the TAP transporter, is a heterodimer consisting of TAP1 and TAP2 and is associated with antigen processing. Peptide antigens are transported across the membrane of the endoplasmic reticulum by TAP so they can be expressed on the cell surface in association with MHC molecules. Each TAP protein consists of multiple hydrophobic membrane spanning segments and a highly conserved ATP-binding cassette (Boll, M. et al. (1996) Proc. Natl. Acad. Sci. USA 93:284-289). Pathogenic microorganisms, such as herpes simplex virus, may encode inhibitors of TAP-mediated peptide transport in order to evade immune surveillance (Marusina, K. and J.J. Monaco (1996) Curr. Opin. Hematol. 3:19-26).
ABC Transporters
The ATP-binding cassette (ABC) transporters, also called the "traffic ATPases", comprise a superfamily of membrane proteins that mediate transport and channel functions in prokaryotes and eukaryotes (Higgins, C.F. (1992) Annu. Rev. Cell Biol. 8:67-113). ABC proteins share a similar overall structure and significant sequence homology. All ABC proteins contain a conserved domain of approximately two hundred amino acid residues which includes one or more nucleotide binding domains. Mutations in ABC transporter genes are associated with various disorders, such as hyperbilirubinemia II/Dubin-Johnson syndrome, recessive Stargardt's disease, X-linked adrenoleukodystrophy, multidrug resistance, celiac disease, and cystic fibrosis.
Peripheral and Anchored Membrane Proteins
Some membrane proteins are not membrane-spanning but are attached to the plasma membrane via membrane anchors or interactions with integral membrane proteins. Membrane anchors are covalently joined to a protein post-translationally and include such moieties as prenyl, myristyl, and glycosylphosphatidylinositol groups. Membrane localization of peripheral and anchored proteins is important for their function in processes such as receptor-mediated signal transduction. For example, prenylation of Ras is required for its localization to the plasma membrane and for its normal and oncogenic functions in signal transduction.
Vesicle Coat Proteins
Intercellular communication is essential for the development and survival of multicellular organisms. Cells communicate with one another through the secretion and uptake of protein signaling molecules. The uptake of proteins into the cell is achieved by the endocytic pathway, in which the interaction of extracellular signaling molecules with plasma membrane receptors results in the formation of plasma membrane-derived vesicles that enclose and transport the molecules into the cell. These transport vesicles fuse with and mature into endosomal and lysosomal (digestive) compartments. The secretion of proteins from the cell is achieved by exocytosis, in which molecules inside of the cell proceed through the secretory pathway. In this pathway, molecules transit from the ER to the Golgi apparatus and finally to the plasma membrane, where they are secreted from the cell. Several steps in the transit of material along the secretory and endocytic pathways require the formation of transport vesicles. Specifically, vesicles form at the transitional endoplasmic reticulum (tER), the rim of Golgi cisternae, the face of the Trans-Golgi Network (TGN), the plasma membrane (PM), and tubular extensions of the endosomes. Vesicle formation occurs when a region of membrane buds off from the donor organelle.
The membrane-bound vesicle contains proteins to be transported and is surrounded by a proteinaceous coat, the components of which are recruited from the cytosol. Two different classes of coat protein have been identified. Clathrin coats form on vesicles derived from the TGN and PM, whereas coatomer (COP) coats form on vesicles derived from the ER and Golgi. COP coats can be further classified as COPI, involved in retrograde traffic through the Golgi and from the Golgi to the ER, and COPII, involved in anterograde traffic from the ER to the Golgi (Mellman, supra).
In clathrin-based vesicle formation, adapter proteins bring vesicle cargo and coat proteins together at the surface of the budding membrane. Adapter protein-1 and -2 select cargo from the TGN and plasma membrane, respectively, based on molecular information encoded on the cytoplasmic tail of integral membrane cargo proteins. Adapter proteins also recruit clathrin to the bud site. Clathrin is a protein complex consisting of three large and three small polypeptide chains arranged in a three-legged structure called a triskelion. Multiple triskelions and other coat proteins appear to self-assemble on the membrane to form a coated pit. This assembly process may serve to deform the membrane into a budding vesicle. GTP-bound ADP-ribosylation factor (Arf) is also incorporated into the coated assembly. Another GTP-binding protein, dynamin, forms a ring complex around the neck of the forming vesicle and may provide the mechanochemical force to seal the bud, thereby releasing the vesicle. The coated vesicle complex is then transported through the cytosol.
During the transport process, Arf-bound GTP is hydrolyzed to GDP, and the coat dissociates from the transport vesicle (West, M.A. et al. (1997) J. Cell Biol. 138:1239-1254).
Vesicles which bud from the ER and the Golgi are covered with a protein coat similar to the clathrin coat of endocytic and TGN vesicles. The coat protein (COP) is assembled from cytosolic precursor molecules at specific budding regions on the organelle. The COP coat consists of two major components, a G-protein (Arf or Sar) and coat protomer (coatomer). Coatomer is an equimolar complex of seven proteins, termed alpha-, beta-, beta'-, gamma-, delta-, epsilon- and zeta-COP. The coatomer complex binds to dilysine motifs contained on the cytoplasmic tails of integral membrane proteins. These include the KKXX retrieval motif of membrane proteins of the ER and dibasic/diphenylalanine motifs of members of the p24 family. The p24 family of type I membrane proteins represents the major membrane proteins of COPI vesicles (Harter, C. and F.T. Wieland (1998) Proc. Natl. Acad. Sci. USA 95:11649-11654).
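The KKXX retrieval motif is defined by its position relative to the C-terminus: lysines at the fourth and third positions from the end of the cytoplasmic tail. A minimal sketch, assuming the tail is supplied N-to-C as a one-letter string; the function name and example tails are hypothetical, and related retrieval signals such as KXKXX are deliberately not handled.

```python
def has_kkxx_motif(cytoplasmic_tail: str) -> bool:
    """True if the tail ends in K-K-X-X (lysines at positions -4 and -3)."""
    tail = cytoplasmic_tail.upper().strip()
    return len(tail) >= 4 and tail[-4] == "K" and tail[-3] == "K"


if __name__ == "__main__":
    # Hypothetical cytoplasmic tails, written N-terminus to C-terminus.
    for tail in ("RSKKTN", "KKTNRS"):
        print(tail, has_kkxx_motif(tail))
```

Only the final four residues are inspected, reflecting the position-dependent nature of the signal; the same dilysine pair placed elsewhere in the tail would not be reported.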
Organelle Associated Molecules
Eukaryotic cells are organized into various cellular organelles, which has the effect of separating specific molecules and their functions from one another and from the cytosol. Within the cell, various membrane structures surround and define these organelles while allowing them to interact with one another and the cell environment through both active and passive transport processes. Important cell organelles include the nucleus, the Golgi apparatus, the endoplasmic reticulum, mitochondria, peroxisomes, lysosomes, endosomes, and secretory vesicles.
Nucleus
The cell nucleus contains all of the genetic information of the cell in the form of DNA, and the components and machinery necessary for replication of DNA and for transcription of DNA into RNA. (See Alberts, B. et al. (1994) Molecular Biology of the Cell, Garland Publishing Inc., New York NY, pp. 335-399.) DNA is organized into compact structures in the nucleus by interactions with various DNA-binding proteins such as histones and non-histone chromosomal proteins. DNA-specific nucleases, DNAses, partially degrade these compacted structures prior to DNA replication or transcription. DNA replication takes place with the aid of DNA helicases which unwind the double-stranded DNA helix, and DNA polymerases that duplicate the separated DNA strands.
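Because helicases separate the two strands and polymerases copy each one, every daughter duplex keeps one parental strand. The sketch below models only that base-pairing logic, assuming a duplex is stored as a pair of equal-length strings with the second strand written aligned under the first; strand polarity, primers, and lagging-strand synthesis are ignored, and all function names are hypothetical.

```python
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Base-by-base Watson-Crick complement, aligned position by position."""
    return "".join(PAIRING[base] for base in strand.upper())


def is_valid_duplex(duplex: tuple[str, str]) -> bool:
    """True if the two aligned strands are fully complementary."""
    top, bottom = duplex
    return len(top) == len(bottom) and complement(top) == bottom.upper()


def replicate(duplex: tuple[str, str]) -> list[tuple[str, str]]:
    """Separate the strands and pair each one with a newly made complement."""
    top, bottom = duplex            # "helicase" step: strands are separated
    new_bottom = complement(top)    # "polymerase" copies the top strand
    new_top = complement(bottom)    # and, independently, the bottom strand
    return [(top, new_bottom), (new_top, bottom)]


if __name__ == "__main__":
    parent = ("ATGCGT", "TACGCA")
    print(replicate(parent))
```

Both daughters are identical in sequence to the parent, yet the first keeps the parental top strand and the second keeps the parental bottom strand, which is the semiconservative outcome.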
Transcriptional regulatory proteins are essential for the control of gene expression. Some of these proteins function as transcription factors that initiate, activate, repress, or terminate gene transcription. Transcription factors generally bind to the promoter, enhancer, and upstream regulatory regions of a gene in a sequence-specific manner, although some factors bind regulatory elements within or downstream of a gene's coding region. Transcription factors may bind to a specific region of DNA singly or as a complex with other accessory factors. (Reviewed in Lewin, B. (1990) Genes IV. Oxford University Press, New York NY, and Cell Press, Cambridge MA, pp. 554-570.) Many transcription factors incorporate DNA-binding structural motifs which comprise either α helices or β sheets that bind to the major groove of DNA. Four well-characterized structural motifs are helix-turn-helix, zinc finger, leucine zipper, and helix-loop-helix. Proteins containing these motifs may act alone as monomers, or they may form homo- or heterodimers that interact with DNA.
Many neoplastic disorders in humans can be attributed to inappropriate gene expression. Malignant cell growth may result from either excessive expression of tumor promoting genes or insufficient expression of tumor suppressor genes (Cleary, M.L. (1992) Cancer Surv. 15:89-104). Chromosomal translocations may also produce chimeric loci which fuse the coding sequence of one gene with the regulatory regions of a second unrelated gene. Such an arrangement likely results in inappropriate gene transcription, potentially contributing to malignancy.
In addition, the immune system responds to infection or trauma by activating a cascade of events that coordinate the progressive selection, amplification, and mobilization of cellular defense mechanisms. A complex and balanced program of gene activation and repression is involved in this process. However, hyperactivity of the immune system as a result of improper or insufficient regulation of gene expression may result in considerable tissue or organ damage. This damage is well documented in immunological responses associated with arthritis, allergens, heart attack, stroke, and infections (Isselbacher, K.J. et al. (1996) Harrison's Principles of Internal Medicine. 13/e, McGraw Hill, Inc. and Teton Data Systems Software).
Transcription of DNA into RNA also takes place in the nucleus, catalyzed by RNA polymerases. Three types of RNA polymerase exist. RNA polymerase I makes large ribosomal RNAs, while RNA polymerase III makes a variety of small, stable RNAs including 5S ribosomal RNA and the transfer RNAs (tRNA). RNA polymerase II transcribes genes that will be translated into proteins. The primary transcript of RNA polymerase II is called heterogeneous nuclear RNA (hnRNA), and must be further processed by splicing to remove non-coding sequences called introns. RNA splicing is mediated by small nuclear ribonucleoprotein complexes, or snRNPs, producing mature messenger RNA (mRNA) which is then transported out of the nucleus for translation into proteins.
Nucleolus
The nucleolus is a highly organized subcompartment in the nucleus that contains high concentrations of RNA and proteins and functions mainly in ribosomal RNA synthesis and assembly (Alberts, et al. supra, pp. 379-382). Ribosomal RNA (rRNA) is a structural RNA that is complexed with proteins to form ribonucleoprotein structures called ribosomes. Ribosomes provide the platform on which protein synthesis takes place.
Ribosomes are assembled in the nucleolus initially from a large, 45S rRNA combined with a variety of proteins imported from the cytoplasm, as well as smaller, 5S rRNAs. Later processing of the immature ribosome results in formation of smaller ribosomal subunits which are transported from the nucleolus to the cytoplasm where they are assembled into functional ribosomes.
Endoplasmic Reticulum
In eukaryotes, proteins are synthesized within the endoplasmic reticulum (ER), delivered from the ER to the Golgi apparatus for post-translational processing and sorting, and transported from the Golgi to specific intracellular and extracellular destinations. Synthesis of integral membrane proteins, secreted proteins, and proteins destined for the lumen of a particular organelle occurs on the rough ER. The rough ER is so named because of the rough appearance in electron micrographs imparted by the attached ribosomes on which protein synthesis proceeds. Synthesis of proteins destined for the ER actually begins in the cytosol with the synthesis of a specific signal peptide which directs the growing polypeptide and its attached ribosome to the ER membrane where the signal peptide is removed and protein synthesis is completed. Soluble proteins destined for the ER lumen, for secretion, or for transport to the lumen of other organelles pass completely into the ER lumen. Transmembrane proteins destined for the ER or for other cell membranes are translocated across the ER membrane but remain anchored in the lipid bilayer of the membrane by one or more membrane-spanning α-helical regions. Translocated polypeptide chains destined for other organelles or for secretion also fold and assemble in the ER lumen with the aid of certain "resident" ER proteins. Protein folding in the ER is aided by two principal types of protein isomerases, protein disulfide isomerase (PDI) and peptidyl-prolyl isomerase (PPI).
PDI catalyzes the oxidation of free sulfhydryl groups in cysteine residues to form intramolecular disulfide bonds in proteins. PPI, an enzyme that catalyzes the isomerization of certain proline imide bonds in oligopeptides and proteins, is considered to govern one of the rate-limiting steps in the folding of many proteins to their final functional conformation. The cyclophilins represent a major class of PPI that was originally identified as the major receptor for the immunosuppressive drug cyclosporin A (Handschumacher, R.E. et al. (1984) Science 226:544-547). Molecular "chaperones" such as BiP (binding protein) in the ER recognize incorrectly folded proteins as well as proteins not yet folded into their final form and bind to them, both to prevent improper aggregation between them and to promote proper folding.
The "N-linked" glycosylation of most soluble secreted and membrane-bound proteins by oligosaccharides linked to asparagine residues in proteins is also performed in the ER. This reaction is catalyzed by a membrane-bound enzyme, oligosaccharyl transferase.
Golgi Apparatus
The Golgi apparatus is a complex structure that lies adjacent to the ER in eukaryotic cells and serves primarily as a sorting and dispatching station for products of the ER (Alberts, et al. supra, pp. 600-610). Additional posttranslational processing, principally additional glycosylation, also occurs in the Golgi. Indeed, the Golgi is a major site of carbohydrate synthesis, including most of the glycosaminoglycans of the extracellular matrix. N-linked oligosaccharides, added to proteins in the ER, are also further modified in the Golgi by the addition of more sugar residues to form complex N-linked oligosaccharides. "O-linked" glycosylation of proteins also occurs in the Golgi by the addition of N-acetylgalactosamine to the hydroxyl group of a serine or threonine residue followed by the sequential addition of other sugar residues to the first. This process is catalyzed by a series of glycosyltransferases each specific for a particular donor sugar nucleotide and acceptor molecule (Lodish, H. et al. (1995) Molecular Cell Biology. W.H. Freeman and Co., New York NY, pp. 700-708). In many cases, both N- and O-linked oligosaccharides appear to be required for the secretion of proteins or the movement of plasma membrane glycoproteins to the cell surface.
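The asparagine residues that receive N-linked oligosaccharides in the ER are not chosen at random: a well-established rule, not stated in the text above, is that the acceptor asparagine lies in an N-X-S/T sequon in which X is any residue except proline. The sketch below lists candidate sites under that rule only; the function name and test sequence are hypothetical. A sequon is necessary but not sufficient for glycosylation, and O-linked sites have no comparably simple consensus.

```python
import re

# N, then any residue except proline, then serine or threonine.
# The lookahead makes overlapping sequons (e.g. "NNST") all visible.
SEQUON = re.compile(r"(?=N[^P][ST])")


def candidate_n_glycosylation_sites(protein: str) -> list[int]:
    """Return 0-based positions of asparagines that sit in an N-X-S/T sequon."""
    return [m.start() for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical sequence: N1 and N9 qualify; N5 is followed by proline.
    print(candidate_n_glycosylation_sites("MNGSANPTQNVT"))
```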
The terminal compartment of the Golgi is the Trans-Golgi Network (TGN), where both membrane and lumenal proteins are sorted for their final destination. Transport (or secretory) vesicles destined for intracellular compartments, such as lysosomes, bud off of the TGN. Other transport vesicles bud off containing proteins destined for the plasma membrane, such as receptors, adhesion molecules, and ion channels, and secretory proteins, such as hormones, neurotransmitters, and digestive enzymes.
Vacuoles
The vacuole system is a collection of membrane-bound compartments in eukaryotic cells that functions in the processes of endocytosis and exocytosis. These compartments include phagosomes, lysosomes, endosomes, and secretory vesicles. Endocytosis is the process by which cells internalize nutrients, solutes, or small particles (pinocytosis) or large particles such as internalized receptors, viruses, bacteria, or bacterial toxins (phagocytosis). Exocytosis is the process of transporting molecules to the cell surface. It facilitates placement or localization of membrane-bound receptors or other membrane proteins and secretion of hormones, neurotransmitters, digestive enzymes, wastes, etc.
A common property of all of these vacuoles is an acidic pH environment of approximately pH 4.5 to 5.0. This acidity is maintained by the presence of a proton ATPase that uses the energy of ATP hydrolysis to generate an electrochemical proton gradient across a membrane (Mellman, I. et al. (1986) Annu. Rev. Biochem. 55:663-700). Eukaryotic vacuolar proton ATPase (vp-ATPase) is a multimeric enzyme composed of 3-10 different subunits. One of these subunits is a highly hydrophobic polypeptide of approximately 16 kDa that is similar to the proteolipid component of vp-ATPases from eubacteria, fungi, and plant vacuoles (Mandel, M. et al. (1988) Proc. Natl. Acad. Sci. USA 85:5521-5524). The 16 kDa proteolipid component is the major subunit of the membrane portion of vp-ATPase and functions in the transport of protons across the membrane.
Lysosomes
Lysosomes are membranous vesicles containing various hydrolytic enzymes used for the controlled intracellular digestion of macromolecules. Lysosomes contain some 40 types of enzymes including proteases, nucleases, glycosidases, lipases, phospholipases, phosphatases, and sulfatases, all of which are acid hydrolases that function at a pH of about 5. Lysosomes are surrounded by a unique membrane containing transport proteins that allow the final products of macromolecule degradation, such as sugars, amino acids, and nucleotides, to be transported to the cytosol where they may be either excreted or reutilized by the cell. A vp-ATPase, such as that described above, maintains the acidic environment necessary for hydrolytic activity (Alberts, supra, pp. 610-611).
Endosomes
Endosomes are another type of acidic vacuole, used to transport substances from the cell surface to the interior of the cell in the process of endocytosis. Like lysosomes, endosomes have an acidic environment provided by a vp-ATPase (Alberts et al. supra, pp. 610-618). Two types of endosomes are apparent from tracer uptake studies that distinguish their time of formation and their location in the cell. Early endosomes are found near the plasma membrane and appear to function primarily in the recycling of internalized receptors back to the cell surface. Late endosomes appear later in the endocytic process, close to the Golgi apparatus and the nucleus, and appear to be associated with the delivery of endocytosed material to lysosomes or to the trans-Golgi network (TGN), where it may be recycled. Specific proteins associated with particular transport vesicles and their target compartments may provide selectivity in targeting vesicles to their proper compartments. One such protein is Rab, a cytosolic prenylated GTP-binding protein. Rabs 4, 5, and 11 are associated with the early endosome, whereas Rabs 7 and 9 associate with the late endosome.
Mitochondria
Mitochondria are oval-shaped organelles comprising an outer membrane, a tightly folded inner membrane, an intermembrane space between the outer and inner membranes, and a matrix inside the inner membrane. The outer membrane contains many porin molecules that allow ions and charged molecules to enter the intermembrane space, while the inner membrane contains a variety of transport proteins that transfer only selected molecules. Mitochondria are the primary sites of energy production in cells.
Energy is produced by the oxidation of glucose and fatty acids. Glucose is initially converted to pyruvate in the cytoplasm. Fatty acids and pyruvate are transported to the mitochondria for complete oxidation to CO2, a process coupled by enzymes to the transport of electrons from NADH and FADH2 to oxygen and to the synthesis of ATP from ADP and Pi (oxidative phosphorylation). Pyruvate is transported into the mitochondria and converted to acetyl-CoA, for oxidation via the citric acid cycle, by the pyruvate dehydrogenase complex, whose components include pyruvate dehydrogenase, dihydrolipoyl transacetylase, and dihydrolipoyl dehydrogenase. Enzymes involved in the citric acid cycle include citrate synthetase, aconitases, isocitrate dehydrogenase, the alpha-ketoglutarate dehydrogenase complex (including transsuccinylases), succinyl-CoA synthetase, succinate dehydrogenase, fumarases, and malate dehydrogenase. Acetyl-CoA is oxidized to CO2 with concomitant formation of NADH, FADH2, and GTP. In oxidative phosphorylation, the transfer of electrons from NADH and FADH2 to oxygen by dehydrogenases is coupled to the synthesis of ATP from ADP and Pi by the F0F1 ATPase complex in the mitochondrial inner membrane. Enzyme complexes responsible for electron transport and ATP synthesis include the F0F1 ATPase complex, ubiquinone(CoQ)-cytochrome c reductase, ubiquinone reductase, cytochrome b, cytochrome c1, FeS protein, and cytochrome c oxidase.
Peroxisomes
Peroxisomes, like mitochondria, are a major site of oxygen utilization. They contain one or more enzymes, such as urate oxidase, that use molecular oxygen to remove hydrogen atoms from specific organic substrates in an oxidative reaction that produces hydrogen peroxide (Alberts, supra, pp. 574-577). Catalase uses this hydrogen peroxide to oxidize a variety of substrates, including phenols, formic acid, formaldehyde, and alcohol, and is important in peroxisomes of liver and kidney cells for detoxifying various toxic molecules that enter the bloodstream. Another major function of oxidative reactions in peroxisomes is the breakdown of fatty acids in a process called β oxidation. β oxidation shortens the alkyl chain of a fatty acid two carbon atoms at a time; each two-carbon block is converted to acetyl-CoA and exported to the cytosol for reuse in biosynthetic reactions.
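The two-carbon shortening described above lends itself to a short calculation. The Python sketch below assumes an idealized, saturated, even-chain fatty acyl-CoA that is oxidized all the way to acetyl-CoA; the function names are illustrative only, and odd-chain, unsaturated, or partially oxidized substrates are outside its scope.

```python
def shortening_steps(carbons: int) -> list[int]:
    """Acyl chain length before each round of beta oxidation, ending at 2 (acetyl-CoA).

    Idealized sketch: assumes a saturated, even-chain acyl-CoA oxidized to completion.
    """
    if carbons < 4 or carbons % 2:
        raise ValueError("sketch handles even-chain acyl groups of 4 or more carbons only")
    steps = [carbons]
    while steps[-1] > 2:
        steps.append(steps[-1] - 2)  # each round removes one two-carbon unit
    return steps


def beta_oxidation_summary(carbons: int) -> dict[str, int]:
    """Number of rounds and acetyl-CoA units for complete oxidation of the chain."""
    steps = shortening_steps(carbons)
    # The last round cleaves a 4-carbon unit into two acetyl-CoA, so an
    # n-carbon chain needs n/2 - 1 rounds and yields n/2 acetyl-CoA.
    return {"rounds": len(steps) - 1, "acetyl_coa": carbons // 2}
```

For palmitoyl-CoA (16 carbons), `beta_oxidation_summary(16)` gives 7 rounds and 8 acetyl-CoA under these assumptions.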
Also like mitochondria, peroxisomes import their proteins from the cytosol, using a specific signal sequence located near the C-terminus of the protein. The importance of this import process is evident in the inherited human disease Zellweger syndrome, in which a defect in importing proteins into peroxisomes leads to a peroxisomal deficiency resulting in severe abnormalities in the brain, liver, and kidneys, and death soon after birth. One form of this disease has been shown to be due to a mutation in the gene encoding a peroxisomal integral membrane protein called peroxisome assembly factor-1.
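The C-terminal targeting signal mentioned above is commonly described in the general literature as peroxisomal targeting signal type 1 (PTS1), with Ser-Lys-Leu (SKL) as the prototype. The Python sketch below checks a protein sequence against the simplified consensus (S/A/C)-(K/R/H)-(L/M) at the extreme C-terminus. This consensus is an assumption drawn from outside this document; real PTS1 recognition tolerates further variants and depends on upstream residues, and some peroxisomal proteins use other signals, so the check is only a rough screen.

```python
import re

# Simplified PTS1 consensus (assumption): a small residue (S/A/C), a basic
# residue (K/R/H), then Leu or Met as the final three residues of the protein.
PTS1_PATTERN = re.compile(r"[SAC][KRH][LM]$")


def has_pts1(protein: str) -> bool:
    """Return True if the sequence ends in the simplified PTS1 tripeptide."""
    seq = protein.strip().upper().rstrip("*")  # drop a trailing stop symbol if present
    return PTS1_PATTERN.search(seq) is not None
```

For example, a hypothetical sequence ending in `...SKL` passes, while the same sequence with one extra residue appended does not, because the motif must sit at the very end.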
The discovery of new human molecules satisfies a need in the art by providing new compositions which are useful in the diagnosis, study, prevention, and treatment of diseases associated with, as well as effects of exogenous compounds on, the expression of human molecules.
SUMMARY OF THE INVENTION
The present invention relates to nucleic acid sequences comprising human diagnostic and therapeutic polynucleotides (dithp) as presented in the Sequence Listing. The dithp uniquely identify genes encoding human structural, functional, and regulatory molecules.
The invention provides an isolated polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d). In one alternative, the polynucleotide comprises a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188. In another alternative, the polynucleotide comprises at least 30 contiguous nucleotides of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; b) a polynucleotide comprising a naturally occurring polynucleotide comprising a polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d). In another alternative, the polynucleotide comprises at least 60 contiguous nucleotides of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; b) a polynucleotide comprising a naturally occurring polynucleotide comprising a polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d).
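The alternatives a) through e) above can be made concrete with a few string operations. The Python sketch below is illustrative only: it takes "complementary" to mean the reverse complement of a DNA strand, forms the RNA equivalent by replacing T with U, enumerates fragments of a chosen number of contiguous nucleotides (for example, 30 or 60), and computes percent identity over two sequences assumed to be already aligned without gaps. The alignment method behind the 90% threshold is not specified in this passage, so the ungapped identity function is an assumption, not the method used to define that threshold.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complementary strand, written 5' to 3'."""
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1]


def rna_equivalent(dna: str) -> str:
    """RNA equivalent of a DNA sequence: thymine (T) replaced by uracil (U)."""
    return dna.upper().replace("T", "U")


def contiguous_fragments(seq: str, length: int) -> list[str]:
    """All fragments of `length` contiguous nucleotides (empty if seq is shorter)."""
    if length <= 0:
        raise ValueError("length must be positive")
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity of two pre-aligned, equal-length sequences (assumption)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_identity(a: str, b: str, threshold: float = 90.0) -> bool:
    """True if the ungapped identity is at least `threshold` percent."""
    return percent_identity(a, b) >= threshold
```

A 100-nucleotide sequence, for instance, contains 71 distinct windows of 30 contiguous nucleotides.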
The invention further provides a composition for the detection of expression of human diagnostic and therapeutic polynucleotides comprising at least one isolated polynucleotide comprising a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; b) a polynucleotide comprising a naturally occurring polynucleotide
sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d); and a detectable label. The invention also provides a method for detecting a target polynucleotide in a sample, said target polynucleotide having a polynucleotide sequence of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence of a polynucleotide selected from the group consisting of SEQ ID NO: 1-188; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d). The method comprises a) amplifying said target polynucleotide or fragment thereof using polymerase chain reaction amplification, and b) detecting the presence or absence of said amplified target polynucleotide or fragment thereof, and, optionally, if present, the amount thereof. The invention also provides a method for detecting a target polynucleotide in a sample, said target polynucleotide comprising a polynucleotide sequence of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d).
The method comprises a) hybridizing the sample with a probe comprising at least 20 contiguous nucleotides comprising a sequence complementary to said target polynucleotide in the sample, and which probe specifically hybridizes to said target polynucleotide, under conditions whereby a hybridization complex is formed between said probe and said target polynucleotide, and b) detecting the presence or absence of said hybridization complex, and, optionally, if present, the amount thereof. In one alternative, the invention provides a composition comprising a target polynucleotide of the method, wherein said probe comprises at least 30 contiguous nucleotides. In another alternative, the invention provides a composition comprising a target polynucleotide of the method, wherein said probe comprises at least 60 contiguous nucleotides.
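The hybridization method above relies on a probe of at least 20 contiguous nucleotides that is complementary to, and specific for, the target. The Python sketch below shows one simple way such candidates could be enumerated. The GC-content window and the exact-match uniqueness test are illustrative assumptions, not criteria stated in this document; a real design would also weigh melting temperature, secondary structure, and binding to near-matching off-target sequences.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1]


def count_exact(pattern: str, text: str) -> int:
    """Count overlapping exact occurrences of pattern in text."""
    k = len(pattern)
    return sum(1 for i in range(len(text) - k + 1) if text[i:i + k] == pattern)


def is_specific(window: str, sequences: list[str]) -> bool:
    """True if the window occurs exactly once across both strands of all sequences."""
    hits = 0
    for seq in sequences:
        seq = seq.upper()
        hits += count_exact(window, seq) + count_exact(window, reverse_complement(seq))
    return hits == 1


def candidate_probes(target: str, sequences: list[str], length: int = 20,
                     gc_range: tuple[float, float] = (0.4, 0.6)) -> list[tuple[int, str]]:
    """(start, probe) pairs; each probe is the reverse complement of a target window."""
    target = target.upper()
    probes = []
    for start in range(len(target) - length + 1):
        window = target[start:start + length]
        gc = (window.count("G") + window.count("C")) / length
        if gc_range[0] <= gc <= gc_range[1] and is_specific(window, sequences):
            probes.append((start, reverse_complement(window)))
    return probes
```

Here `sequences` stands for whatever background the probe must not cross-hybridize with, and it should include the target itself; a repetitive target yields no specific probes.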
The invention further provides a recombinant polynucleotide comprising a promoter sequence operably linked to an isolated polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d). In one alternative, the invention provides a cell transformed with the recombinant polynucleotide. In another alternative, the invention provides a transgenic organism comprising the recombinant polynucleotide.
The invention also provides a method for producing a human diagnostic and therapeutic polypeptide, the method comprising a) culturing a cell under conditions suitable for expression of the human diagnostic and therapeutic polypeptide, wherein said cell is transformed with a recombinant polynucleotide, said recombinant polynucleotide comprising an isolated polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; iii) a polynucleotide complementary to the polynucleotide of i); iv) a polynucleotide complementary to the polynucleotide of ii); and v) an RNA equivalent of i) through iv), and b) recovering the human diagnostic and therapeutic polypeptide so expressed. The invention additionally provides a method wherein the polypeptide has an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377.
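Producing the polypeptide from a recombinant polynucleotide depends on the cell reading the coding sequence codon by codon. As an illustration only, the Python sketch below translates an in-frame open reading frame with the standard genetic code. It assumes the input begins at the first codon, contains only A, C, G, and T (or U in an RNA equivalent), and ends at the first stop codon; it ignores alternative start codons and post-translational processing.

```python
BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; '*' marks stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(orf: str, to_stop: bool = True) -> str:
    """Translate an in-frame coding sequence (DNA or its RNA equivalent)."""
    seq = orf.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[i:i + 3]]  # KeyError for ambiguous bases such as N
        if residue == "*" and to_stop:
            break
        protein.append(residue)
    return "".join(protein)
```

Trailing nucleotides that do not complete a codon are ignored, which matches reading the frame in steps of three.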
The invention also provides an isolated human diagnostic and therapeutic polypeptide (DITHP) encoded by at least one polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188. The invention further provides a method of screening for a test compound that specifically binds to the polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377. The method comprises a) combining the polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377 with at least one test compound under suitable conditions, and b) detecting binding of the polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377 to the test compound, thereby identifying a compound that specifically binds to the polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377.
The invention further provides a microarray wherein at least one element of the microarray is an isolated polynucleotide comprising at least 30 contiguous nucleotides of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d). The invention also provides a method for generating a transcript image of a sample which contains polynucleotides. The method comprises a) labeling the polynucleotides of the sample, b) contacting the elements of the microarray with the labeled polynucleotides of the sample under conditions suitable for the formation of a hybridization complex, and c) quantifying the expression of the polynucleotides in the sample.
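The final quantification step can be pictured as turning per-element signal intensities into relative expression values. The Python sketch below is a hypothetical illustration: the element names, the single global background value, and normalization to total signal are assumptions chosen for clarity, not the quantification procedure of this document.

```python
def transcript_image(intensities: dict[str, float], background: float = 0.0) -> dict[str, float]:
    """Background-subtract each element's signal and scale so the values sum to 1.

    Hypothetical normalization: negative values after subtraction are clipped to 0.
    """
    corrected = {element: max(signal - background, 0.0) for element, signal in intensities.items()}
    total = sum(corrected.values())
    if total == 0:
        raise ValueError("no signal above background")
    return {element: value / total for element, value in corrected.items()}
```

Expressing each element as a share of the total makes images from differently labeled or scanned samples easier to compare, although practical pipelines usually apply more robust normalization.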
Additionally, the invention provides a method for screening a compound for effectiveness in altering expression of a target polynucleotide, wherein said target polynucleotide comprises a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d). The method comprises a) exposing a sample comprising the target polynucleotide to a compound, b) detecting altered expression of the target polynucleotide, and c) comparing the expression of the target polynucleotide in the presence of varying amounts of the compound and in the absence of the compound. The invention further provides a method for assessing toxicity of a test compound, said method comprising a) treating a biological sample containing nucleic acids with the test compound; b) hybridizing the nucleic acids of the treated biological sample with a probe comprising at least 20 contiguous nucleotides of a polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; iii) a polynucleotide complementary to the polynucleotide of i); iv) a polynucleotide complementary to the polynucleotide of ii); and v) an RNA equivalent of i) through iv).
Hybridization occurs under conditions whereby a specific hybridization complex is formed between said probe and a target polynucleotide in the biological sample, said target polynucleotide comprising a polynucleotide sequence of a polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188; iii) a polynucleotide complementary to the polynucleotide of i); iv) a polynucleotide complementary to the polynucleotide of ii); and v) an RNA equivalent of i) through iv), and alternatively, the target polynucleotide comprises a polynucleotide sequence of a fragment of a polynucleotide selected from the group consisting of i-v above; c) quantifying the amount of hybridization complex; and d) comparing the amount of hybridization complex in the treated biological sample with the amount of hybridization complex in an untreated biological sample, wherein a difference in the amount of
hybridization complex in the treated biological sample is indicative of toxicity of the test compound.
The invention further provides an isolated polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377. In one alternative, the invention provides an isolated polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377.
The invention further provides an isolated polynucleotide encoding a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377. In one alternative, the polynucleotide encodes a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377. In another alternative, the polynucleotide comprises a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-188.
Additionally, the invention provides an isolated antibody which specifically binds to a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377. The invention further provides a composition comprising a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377, and a pharmaceutically acceptable excipient. In one embodiment, the composition comprises a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377. The invention additionally provides a method of treating a disease or condition associated with decreased expression of functional DITHP, comprising administering to a patient in need of such treatment the composition.
The invention also provides a method for screening a compound for effectiveness as an agonist of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377. The method comprises a) exposing a sample comprising the polypeptide to a compound, and b) detecting agonist activity in the sample. In one alternative, the invention provides a composition comprising an agonist compound identified by the method and a pharmaceutically acceptable excipient. In another alternative, the invention provides a method of treating a disease or condition associated with decreased expression of functional DITHP, comprising administering to a patient in need of such treatment the composition. Additionally, the invention provides a method for screening a compound for effectiveness as an antagonist of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377.
The method comprises a) exposing a sample comprising the polypeptide to a compound, and b) detecting antagonist activity in the sample. In one alternative, the invention provides a composition comprising an antagonist compound identified by the method and a pharmaceutically acceptable excipient. In another alternative, the invention provides a method of treating a disease or condition associated with overexpression of functional DITHP, comprising administering to a patient in need of such treatment the composition.
The invention further provides a method of screening for a compound that modulates the activity of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 189-377. The method comprises a) combining the polypeptide with at least one test compound under conditions permissive for the activity of the polypeptide, b) assessing the activity of the polypeptide in the presence of the test compound, and c) comparing the activity of the polypeptide in the presence of the test compound with the activity of the polypeptide in the absence of the test compound, wherein a change in the activity of the polypeptide in the presence of the test compound is indicative of a compound that modulates the activity of the polypeptide.
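Step c) of the screening method compares activity with and without the test compound. A minimal Python sketch of that comparison follows. The use of replicate means and the 20% cutoff for calling a compound a modulator are hypothetical choices for illustration; in practice the cutoff would be set from the measured variability of the assay.

```python
from statistics import mean


def percent_change(with_compound: list[float], without_compound: list[float]) -> float:
    """Percent change in mean activity relative to the untreated control."""
    baseline = mean(without_compound)
    if baseline == 0:
        raise ValueError("control activity must be non-zero")
    return 100.0 * (mean(with_compound) - baseline) / baseline


def classify_modulator(with_compound: list[float], without_compound: list[float],
                       cutoff_pct: float = 20.0) -> str:
    """Label the compound using a hypothetical symmetric cutoff on percent change."""
    change = percent_change(with_compound, without_compound)
    if change >= cutoff_pct:
        return "increases activity"
    if change <= -cutoff_pct:
        return "decreases activity"
    return "no change detected"
```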
DESCRIPTION OF THE TABLES
Table 1 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with the sequence identification numbers (SEQ ID NO:s) and open reading frame identification numbers (ORF IDs) corresponding to polypeptides encoded by the template ID, and the PFH (Protein Functional Hierarchy) classification of the polypeptides (PFH designation).
Table 2 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with their GenBank hits (GI Numbers), probability scores, and functional annotations corresponding to the GenBank hits.
Table 3 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with polynucleotide segments of each template sequence as defined by the indicated "start" and "stop" nucleotide positions. The reading frames of the polynucleotide segments and the Pfam hits, Pfam descriptions, and E-values corresponding to the polypeptide domains encoded by the polynucleotide segments are indicated.
Table 4 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with polynucleotide segments of each template sequence as defined by the indicated "start" and "stop" nucleotide positions. The reading frames of the polynucleotide segments are shown, and the polypeptides encoded by the polynucleotide segments constitute either signal peptide (SP) or transmembrane (TM) domains, as indicated. For TM domains, the membrane topology of the encoded polypeptide sequence is indicated as being transmembrane or on the cytosolic or non-cytosolic side of the cell membrane or organelle.
Table 5 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with the component sequence identification spans (component spans) corresponding to each template. The component sequences, which were used to assemble the template sequences, are defined by the spans indicating the nucleotide positions along each template.
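Because each template in Table 5 is assembled from component sequences whose spans are given as nucleotide positions along the template, the spans can be checked for how much of the template they cover. The Python sketch below assumes 1-based, inclusive start and stop positions and uses made-up span values; it illustrates the interval arithmetic only and is not the assembly procedure used to build the templates.

```python
def merge_spans(spans: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or abutting 1-based inclusive (start, stop) spans."""
    merged: list[list[int]] = []
    for start, stop in sorted(spans):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], stop)  # extend the current run
        else:
            merged.append([start, stop])  # start a new run
    return [(start, stop) for start, stop in merged]


def uncovered_regions(spans: list[tuple[int, int]], template_length: int) -> list[tuple[int, int]]:
    """Template positions not covered by any component span."""
    gaps, next_pos = [], 1
    for start, stop in merge_spans(spans):
        if start > next_pos:
            gaps.append((next_pos, start - 1))
        next_pos = max(next_pos, stop + 1)
    if next_pos <= template_length:
        gaps.append((next_pos, template_length))
    return gaps


def coverage_fraction(spans: list[tuple[int, int]], template_length: int) -> float:
    """Fraction of template positions covered by at least one component span."""
    covered = sum(stop - start + 1 for start, stop in merge_spans(spans))
    return covered / template_length
```

With hypothetical spans 1-300, 250-600, and 700-900 on a 1,000-nucleotide template, positions 601-699 and 901-1000 remain uncovered.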
Table 6 shows the tissue distribution profiles for the templates of the invention.
Table 7 shows the sequence identification numbers (SEQ ID NO:s) corresponding to the polypeptides of the present invention, along with the reading frames used to obtain the polypeptide segments, the lengths of the polypeptide segments, the "start" and "stop" nucleotide positions of the polynucleotide sequences used to define the encoded polypeptide segments, the GenBank hits (GI Numbers), probability scores, and functional annotations corresponding to the GenBank hits.
Table 8 summarizes the bioinformatics tools which are useful for analysis of the polynucleotides of the present invention. The first column of Table 8 lists analytical tools, programs, and algorithms, the second column provides brief descriptions thereof, the third column presents appropriate references, all of which are incorporated by reference herein in their entirety, and the fourth column presents, where applicable, the scores, probability values, and other parameters used to evaluate the strength of a match between two sequences (the higher the score, the greater the homology between two sequences).
DETAILED DESCRIPTION OF THE INVENTION
Before the nucleic acid sequences and methods are presented, it is to be understood that this invention is not limited to the particular machines, methods, and materials described. Although particular embodiments are described, machines, methods, and materials similar or equivalent to these embodiments may be used to practice the invention. The preferred machines, methods, and materials set forth are not intended to limit the scope of the invention which is limited only by the appended claims.
The singular forms "a", "an", and "the" include plural reference unless the context clearly dictates otherwise. All technical and scientific terms have the meanings commonly understood by one of ordinary skill in the art. All publications are incorporated by reference for the purpose of describing and disclosing the cell lines, vectors, and methodologies which are presented and which might be used in connection with the invention. Nothing in the specification is to be construed as an admission that the invention is not entitled to antedate such disclosure by virtue of prior invention.
Definitions
As used herein, the lower case "dithp" refers to a nucleic acid sequence, while the upper case "DITHP" refers to an amino acid sequence encoded by dithp. A "full-length" dithp refers to a nucleic acid sequence containing the entire coding region of a gene endogenously expressed in human tissue.
"Adjuvants" are materials such as Freund's adjuvant, mineral gels (aluminum hydroxide), and surface active substances (lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanin, and dinitrophenol) which may be administered to increase a host's immunological response.
"Allele" refers to an alternative form of a nucleic acid sequence. Alleles result from a "mutation," a change or an alternative reading of the genetic code. Any given gene may have none, one, or many allelic forms. Mutations which give rise to alleles include deletions, additions, or substitutions of nucleotides. Each of these changes may occur alone, or in combination with the others, one or more times in a given nucleic acid sequence. The present invention encompasses allelic dithp.
An "allelic variant" is an alternative form of the gene encoding DITHP. Allelic variants may result from at least one mutation in the nucleic acid sequence and may result in altered mRNAs or in polypeptides whose structure or function may or may not be altered. A gene may have none, one, or many allelic variants of its naturally occurring form. Common mutational changes which give rise to allelic variants are generally ascribed to natural deletions, additions, or substitutions of nucleotides. Each of these types of changes may occur alone, or in combination with the others, one or more times in a given sequence.
"Altered" nucleic acid sequences encoding DITHP include those sequences with deletions, insertions, or substitutions of different nucleotides, resulting in a polypeptide the same as DITHP or a polypeptide with at least one functional characteristic of DITHP. Included within this definition are polymorphisms which may or may not be readily detectable using a particular oligonucleotide probe of the polynucleotide encoding DITHP, and improper or unexpected hybridization to allelic variants, with a locus other than the normal chromosomal locus for the polynucleotide sequence encoding DITHP. The encoded protein may also be "altered," and may contain deletions, insertions, or substitutions of amino acid residues which produce a silent change and result in a functionally equivalent DITHP. Deliberate amino acid substitutions may be made on the basis of similarity in polarity, charge, solubility, hydrophobicity, hydrophilicity, and/or the amphipathic nature of the residues, as long as the biological or immunological activity of DITHP is retained. For example, negatively charged amino acids may include aspartic acid and glutamic acid, and positively charged amino acids may include lysine and arginine. Amino acids with uncharged polar side chains having similar hydrophilicity values may include: asparagine and glutamine; and serine and threonine. Amino acids with uncharged side chains having similar hydrophilicity values may include: leucine, isoleucine, and valine; glycine and alanine; and phenylalanine and tyrosine.
"Amino acid sequence" refers to a peptide, a polypeptide, or a protein of either natural or synthetic origin. The amino acid sequence is not limited to the complete, endogenous amino acid sequence and may be a fragment, epitope, variant, or derivative of a protein expressed by a nucleic acid sequence.
"Amplification" refers to the production of additional copies of a sequence and is carried out using polymerase chain reaction (PCR) technologies well known in the art.
"Antibody" refers to intact molecules as well as to fragments thereof, such as Fab, F(ab')2, and Fv fragments, which are capable of binding the epitopic determinant. Antibodies that bind DITHP polypeptides can be prepared using intact polypeptides or using fragments containing small peptides of interest as the immunizing antigen. The polypeptide or peptide used to immunize an animal (e.g., a mouse, a rat, or a rabbit) can be derived from the translation of RNA, or synthesized chemically, and can be conjugated to a carrier protein if desired. Commonly used carriers that are chemically coupled to peptides include bovine serum albumin, thyroglobulin, and keyhole limpet hemocyanin (KLH). The coupled peptide is then used to immunize the animal.
The term "aptamer" refers to a nucleic acid or oligonucleotide molecule that binds to a specific molecular target. Aptamers are derived from an in vitro evolutionary process (e.g., SELEX (Systematic Evolution of Ligands by Exponential Enrichment), described in U.S. Patent No. 5,270,163), which selects for target-specific aptamer sequences from large combinatorial libraries. Aptamer compositions may be double-stranded or single-stranded, and may include deoxyribonucleotides, ribonucleotides, nucleotide derivatives, or other nucleotide-like molecules. The nucleotide components of an aptamer may have modified sugar groups (e.g., the 2'-OH group of a ribonucleotide may be replaced by 2'-F or 2'-NH2), which may improve a desired property, e.g., resistance to nucleases or longer lifetime in blood. Aptamers may be conjugated to other molecules, e.g., a high molecular weight carrier to slow clearance of the aptamer from the circulatory system. Aptamers may be specifically cross-linked to their cognate ligands, e.g., by photo-activation of a cross-linker. (See, e.g., Brody, E.N. and L. Gold (2000) J. Biotechnol. 74:5-13.)
The term "intramer" refers to an aptamer which is expressed in vivo. For example, a vaccinia virus-based RNA expression system has been used to express specific RNA aptamers at high levels in the cytoplasm of leukocytes (Blind, M. et al. (1999) Proc. Natl Acad. Sci. USA 96:3606-3610).
The term "spiegelmer" refers to an aptamer which includes L-DNA, L-RNA, or other left-handed nucleotide derivatives or nucleotide-like molecules. Aptamers containing left-handed nucleotides are resistant to degradation by naturally occurring enzymes, which normally act on substrates containing right-handed nucleotides.
"Antisense sequence" refers to a sequence capable of specifically hybridizing to a target sequence. The antisense sequence may include DNA, RNA, or any nucleic acid mimic or analog such as peptide nucleic acid (PNA); oligonucleotides having modified backbone linkages such as phosphorothioates, methylphosphonates, or benzylphosphonates; oligonucleotides having modified sugar groups such as 2'-methoxyethyl sugars or 2'-methoxyethoxy sugars; or oligonucleotides having modified bases such as 5-methyl cytosine, 2'-deoxyuracil, or 7-deaza-2'-deoxyguanosine.
"Antisense technology" refers to any technology which relies on the specific hybridization of an antisense sequence to a target sequence.
A "bin" is a portion of computer memory space used by a computer program for storage of data, and bounded in such a manner that data stored in a bin may be retrieved by the program.
"Biologically active" refers to an amino acid sequence having a structural, regulatory, or biochemical function of a naturally occurring amino acid sequence.
"Clone joining" is a process for combining gene bins based upon the bins' containing sequence information from the same clone. The sequences may assemble into a primary gene transcript as well as one or more splice variants.
"Complementary" describes the relationship between two single-stranded nucleic acid sequences that anneal by base-pairing (5'-A-G-T-3' pairs with its complement 3'-T-C-A-5').
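As an illustration of the "Complementary" definition, the following minimal Python sketch computes the complement and the reverse complement of a DNA string. The function names are hypothetical, and only the four standard bases are handled; ambiguity codes such as N would raise a KeyError.

```python
# Watson-Crick pairing for the four standard DNA bases (ambiguity codes are not handled).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Return the base-by-base complement, aligned position for position.

    If seq is written 5'->3', the result reads 3'->5'.
    """
    return "".join(PAIR[base] for base in seq.upper())


def reverse_complement(seq: str) -> str:
    """Return the complementary strand written 5'->3', the usual convention."""
    return complement(seq)[::-1]


if __name__ == "__main__":
    print(complement("AGT"))          # TCA, read as 3'-T-C-A-5'
    print(reverse_complement("AGT"))  # ACT, read as 5'-A-C-T-3'
```

Writing the complementary strand 5' to 3' (the reverse complement) is the convention most sequence files and primer tools expect.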
A "component sequence" is a nucleic acid sequence selected by a computer program such as PHRED and used to assemble a consensus or template sequence from one or more component sequences.
A "consensus sequence" or "template sequence" is a nucleic acid sequence which has been assembled from overlapping sequences, using a computer program for fragment assembly such as the GELVIEW fragment assembly system (Genetics Computer Group (GCG), Madison WI) or using a relational database management system (RDMS).
"Conservative amino acid substitutions" are those substitutions that, when made, least interfere with the properties of the original protein, i.e., the structure and especially the function of the protein is conserved and not significantly changed by such substitutions. The table below shows amino acids which may be substituted for an original amino acid in a protein and which are regarded as conservative substitutions.
Original Residue    Conservative Substitution
Ala                 Gly, Ser
Arg                 His, Lys
Asn                 Asp, Gln, His
Asp                 Asn, Glu
Cys                 Ala, Ser
Gln                 Asn, Glu, His
Glu                 Asp, Gln, His
Gly                 Ala
His                 Asn, Arg, Gln, Glu
Ile                 Leu, Val
Leu                 Ile, Val
Lys                 Arg, Gln, Glu
Met                 Leu, Ile
Phe                 His, Met, Leu, Trp, Tyr
Ser                 Cys, Thr
Thr                 Ser, Val
Trp                 Phe, Tyr
Tyr                 His, Phe, Trp
Val                 Ile, Leu, Thr
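The conservative-substitution table can be encoded as a lookup for screening amino acid changes. This is a minimal sketch, assuming residues are given as three-letter codes and the two sequences are already aligned without gaps; the names `CONSERVATIVE`, `is_conservative`, and `non_conservative_positions` are hypothetical. Note that the table, as transcribed, is directional: Ser is listed as a substitute for Ala, but Ala is not listed as a substitute for Ser.

```python
from typing import List

# Directional lookup transcribed from the conservative-substitution table:
# each key is the original residue; each value holds the residues listed as conservative replacements.
CONSERVATIVE = {
    "Ala": {"Gly", "Ser"},
    "Arg": {"His", "Lys"},
    "Asn": {"Asp", "Gln", "His"},
    "Asp": {"Asn", "Glu"},
    "Cys": {"Ala", "Ser"},
    "Gln": {"Asn", "Glu", "His"},
    "Glu": {"Asp", "Gln", "His"},
    "Gly": {"Ala"},
    "His": {"Asn", "Arg", "Gln", "Glu"},
    "Ile": {"Leu", "Val"},
    "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln", "Glu"},
    "Met": {"Leu", "Ile"},
    "Phe": {"His", "Met", "Leu", "Trp", "Tyr"},
    "Ser": {"Cys", "Thr"},
    "Thr": {"Ser", "Val"},
    "Trp": {"Phe", "Tyr"},
    "Tyr": {"His", "Phe", "Trp"},
    "Val": {"Ile", "Leu", "Thr"},
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if the table lists `replacement` as a conservative substitute for `original`."""
    return replacement in CONSERVATIVE.get(original, set())


def non_conservative_positions(reference: List[str], variant: List[str]) -> List[int]:
    """0-based positions where two equal-length residue lists differ non-conservatively."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length (gaps are not handled)")
    return [
        i
        for i, (orig, new) in enumerate(zip(reference, variant))
        if orig != new and not is_conservative(orig, new)
    ]
```

For example, comparing Ala-Ile-Ser with Gly-Leu-Ala flags only the third position, because Ser to Ala is not listed in the table even though Ala to Ser is.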
Conservative substitutions generally maintain (a) the structure of the polypeptide backbone in the area of the substitution, for example, as a beta sheet or alpha helical conformation, (b) the charge or hydrophobicity of the molecule at the target site, or (c) the bulk of the side chain.
"Deletion" refers to a change in either a nucleic or amino acid sequence in which at least one nucleotide or amino acid residue, respectively, is absent.
"Derivative" refers to the chemical modification of a nucleic acid sequence, such as by replacement of hydrogen by an alkyl, acyl, amino, hydroxyl, or other group.
"Differential expression" refers to increased or upregulated; or decreased, downregulated, or absent gene or protein expression, determined by comparing at least two different samples. Such comparisons may be carried out between, for example, a treated and an untreated sample, or a diseased and a normal sample.
The terms "element" and "array element" refer to a polynucleotide, polypeptide, or other chemical compound having a unique and defined position on a microarray.
The term "modulate" refers to a change in the activity of DITHP. For example, modulation may cause an increase or a decrease in protein activity, binding characteristics, or any other biological, functional, or immunological properties of DITHP.
"E-value" refers to the statistical probability that a match between two sequences occurred by chance.
"Exon shuffling" refers to the recombination of different coding regions (exons). Since an exon may represent a structural or functional domain of the encoded protein, new proteins may be assembled through the novel reassortment of stable substructures, thus allowing acceleration of the evolution of new protein functions.
A "fragment" is a unique portion of dithp or DITHP which is identical in sequence to but shorter in length than the parent sequence. A fragment may comprise up to the entire length of the defined sequence, minus one nucleotide/amino acid residue. For example, a fragment may comprise from 10 to 1000 contiguous amino acid residues or nucleotides. A fragment used as a probe, primer, antigen, therapeutic molecule, or for other purposes, may be at least 5, 10, 15, 16, 20, 25, 30, 40, 50, 60, 75, 100, 150, 250 or at least 500 contiguous amino acid residues or nucleotides in length. Fragments may be preferentially selected from certain regions of a molecule. For example, a polypeptide fragment may comprise a certain length of contiguous amino acids selected from the first 250 or 500 amino acids (or first 25% or 50%) of a polypeptide as shown in a certain defined sequence. Clearly these lengths are exemplary, and any length that is supported by the specification, including the Sequence Listing and the figures, may be encompassed by the present embodiments.
A fragment of dithp comprises a region of unique polynucleotide sequence that specifically identifies dithp, for example, as distinct from any other sequence in the same genome. A fragment of dithp is useful, for example, in hybridization and amplification technologies and in analogous methods that distinguish dithp from related polynucleotide sequences. The precise length of a fragment of dithp and the region of dithp to which the fragment corresponds are routinely determinable by one of ordinary skill in the art based on the intended purpose for the fragment.
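One simple way to look for such a uniquely identifying region is to keep only the k-mers (length-k substrings) of the target that occur on neither strand of any other sequence considered. The sketch below assumes the background genome is supplied as a list of DNA strings containing only A, C, G, and T, and that k is chosen by the user; the function names are hypothetical. Exact k-mer uniqueness is only a first filter, since hybridization can tolerate mismatches.

```python
from typing import Iterable, List, Set, Tuple

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Complementary strand written 5'->3' (only A, C, G, T are handled)."""
    return "".join(PAIR[base] for base in reversed(seq.upper()))


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_fragments(target: str, background: Iterable[str], k: int) -> List[Tuple[int, str]]:
    """(position, k-mer) pairs of target absent from both strands of every background sequence."""
    seen: Set[str] = set()
    for other in background:
        # A probe could hybridize to either strand, so index both.
        seen |= kmers(other, k)
        seen |= kmers(reverse_complement(other), k)
    target = target.upper()
    return [
        (i, target[i:i + k])
        for i in range(len(target) - k + 1)
        if target[i:i + k] not in seen
    ]
```

For instance, with target ACGTTGCA, background ACGTAAAA, and k = 4, the shared 4-mer ACGT at position 0 is excluded and the remaining four 4-mers are reported as candidate unique regions.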
A fragment of DITHP is encoded by a fragment of dithp. A fragment of DITHP comprises a region of unique amino acid sequence that specifically identifies DITHP. For example, a fragment of DITHP is useful as an immunogenic peptide for the development of antibodies that specifically recognize DITHP. The precise length of a fragment of DITHP and the region of DITHP to which the fragment corresponds are routinely determinable by one of ordinary skill in the art based on the intended purpose for the fragment.
A "full length" nucleotide sequence is one containing at least a start site for translation to a protein sequence, followed by an open reading frame and a stop site, and encoding a "full length" polypeptide.
"Hit" refers to a sequence whose annotation will be used to describe a given template. Criteria for selecting the top hit are as follows: if the template has one or more exact nucleic acid matches, the top hit is the exact match with highest percent identity. If the template has no exact matches but has significant protein hits, the top hit is the protein hit with the lowest E-value. If the template has no significant protein hits, but does have significant non-exact nucleotide hits, the top hit is the nucleotide hit with the lowest E-value.
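The three-tier rule for selecting the top hit can be written as a short function. This is a sketch under stated assumptions: the `Hit` record and `top_hit` function are hypothetical, and because the text does not define "significant," the E-value cutoff `max_e_value` is an assumed parameter rather than a value from the source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    accession: str            # identifier of the matching database sequence (illustrative)
    is_protein: bool          # True for a protein hit, False for a nucleotide hit
    exact: bool               # True for an exact nucleic acid match
    percent_identity: float
    e_value: float


def top_hit(hits: List[Hit], max_e_value: float = 1e-8) -> Optional[Hit]:
    """Apply the three-tier top-hit rule; max_e_value is an assumed significance cutoff."""
    exact = [h for h in hits if not h.is_protein and h.exact]
    if exact:
        # Tier 1: exact nucleic acid match with the highest percent identity.
        return max(exact, key=lambda h: h.percent_identity)
    protein = [h for h in hits if h.is_protein and h.e_value <= max_e_value]
    if protein:
        # Tier 2: significant protein hit with the lowest E-value.
        return min(protein, key=lambda h: h.e_value)
    nucleotide = [
        h for h in hits if not h.is_protein and not h.exact and h.e_value <= max_e_value
    ]
    if nucleotide:
        # Tier 3: significant non-exact nucleotide hit with the lowest E-value.
        return min(nucleotide, key=lambda h: h.e_value)
    return None  # no hit qualifies under any tier
```

Because each tier is consulted only when the previous one is empty, an exact nucleotide match always outranks a protein hit, however low that protein hit's E-value is.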
"Homology" refers to sequence similarity either between a reference nucleic acid sequence and at least a fragment of a dithp or between a reference amino acid sequence and a fragment of a DITHP.
"Hybridization" refers to the process by which a strand of nucleotides anneals with a complementary strand through base pairing. Specific hybridization is an indication that two nucleic acid sequences share a high degree of identity. Specific hybridization complexes form under defined annealing conditions, and remain hybridized after the "washing" step. The defined hybridization conditions include the annealing conditions and the washing step(s), the latter of which is particularly important in determining the stringency of the hybridization process, with more stringent conditions allowing less non-specific binding, i.e., binding between pairs of nucleic acid probes that are not perfectly matched. Permissive conditions for annealing of nucleic acid sequences are routinely determinable and may be consistent among hybridization experiments, whereas wash conditions may be varied among experiments to achieve the desired stringency.
Generally, stringency of hybridization is expressed with reference to the temperature under which the wash step is carried out. Such wash temperatures are typically selected to be about 5°C to 20°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Equations for calculating Tm and conditions for nucleic acid hybridization are well known and can be found in Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold Spring Harbor Press, Plainview NY; specifically see volume 2, chapter 9. High stringency conditions for hybridization between polynucleotides of the present invention include wash conditions of 68°C in the presence of about 0.2 x SSC and about 0.1% SDS, for 1 hour. Alternatively, temperatures of about 65°C, 60°C, or 55°C may be used. SSC concentration may be varied from about 0.2 to 2 x SSC, with SDS being present at about 0.1%. Typically, blocking reagents are used to block non-specific hybridization. Such blocking reagents include, for instance, denatured salmon sperm DNA at about 100-200 μg/ml. Useful variations on these conditions will be readily apparent to those skilled in the art. Hybridization, particularly under high stringency conditions, may be suggestive of evolutionary similarity between the nucleotides. Such similarity is strongly indicative of a similar role for the nucleotides and their resultant proteins. Other parameters, such as temperature, salt concentration, and detergent concentration, may be varied to achieve the desired stringency. Denaturants, such as formamide at a concentration of about 35-50% v/v, may also be used under particular circumstances, such as RNA:DNA hybridizations. Appropriate hybridization conditions are routinely determinable by one of ordinary skill in the art.
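The relationship between Tm, salt concentration, and wash temperature can be sketched numerically. The example below uses one commonly cited empirical approximation for DNA:DNA duplexes of roughly 14 to 70 nucleotides, Tm ≈ 81.5 + 16.6·log10[Na+] + 0.41·(%G+C) - 600/N, where N is the probe length; other references use different constants and corrections (for example, for formamide or mismatches), so the numbers are illustrative only. The SSC-to-sodium conversion assumes 1 x SSC is 0.15 M NaCl plus 0.015 M trisodium citrate. All function names are hypothetical.

```python
import math
from typing import Tuple


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in seq."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def sodium_molar_from_ssc(ssc_x: float) -> float:
    """Approximate Na+ molarity of an SSC buffer.

    Assumes 1x SSC = 0.15 M NaCl + 0.015 M trisodium citrate, i.e. about 0.195 M Na+.
    """
    return 0.195 * ssc_x


def tm_salt_adjusted(seq: str, na_molar: float) -> float:
    """Empirical DNA:DNA Tm estimate in deg C (assumed formula; no formamide or mismatch terms)."""
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent(seq) - 600.0 / len(seq)


def wash_window(tm: float, below_min: float = 5.0, below_max: float = 20.0) -> Tuple[float, float]:
    """Wash temperatures about 5-20 deg C below Tm, returned as (lowest, highest)."""
    return (tm - below_max, tm - below_min)


if __name__ == "__main__":
    probe = "ATGCATGCATGCATGCATGC"  # hypothetical 20-nt probe, 50% G+C
    for ssc in (0.2, 2.0):
        tm = tm_salt_adjusted(probe, sodium_molar_from_ssc(ssc))
        low, high = wash_window(tm)
        print(f"{ssc} x SSC: Tm ~ {tm:.1f} C, wash window ~ {low:.1f}-{high:.1f} C")
```

The sketch shows the qualitative behavior described above: lowering the SSC concentration lowers the estimated Tm, so a wash at a given temperature becomes more stringent.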
"Immunologically active" or "immunogenic" describes the potential for a natural, recombinant, or synthetic peptide, epitope, polypeptide, or protein to induce antibody production in appropriate animals, cells, or cell lines.
"Immune response" can refer to conditions associated with inflammation, trauma, immune disorders, or infectious or genetic disease, etc. These conditions can be characterized by expression of various factors, e.g., cytokines, chemokines, and other signaling molecules, which may affect cellular and systemic defense systems.
An "immunogenic fragment" is a polypeptide or oligopeptide fragment of DITHP which is capable of eliciting an immune response when introduced into a living organism, for example, a mammal. The term "immunogenic fragment" also includes any polypeptide or oligopeptide fragment of DITHP which can be useful in any of the antibody production methods disclosed herein or known in the art.
"Insertion" or "addition" refers to a change in either a nucleic or amino acid sequence in which at least one nucleotide or residue, respectively, is added to the sequence.
"Labeling" refers to the covalent or noncovalent joining of a polynucleotide, polypeptide, or antibody with a reporter molecule capable of producing a detectable or measurable signal.
"Microarray" is any arrangement of nucleic acids, amino acids, antibodies, etc., on a substrate. The substrate may be a solid support such as beads, glass, paper, nitrocellulose, nylon, or an appropriate membrane.
"Linkers" are short stretches of nucleotide sequence which may be added to a vector or a dithp to create restriction endonuclease sites to facilitate cloning. "Polylinkers" are engineered to incorporate multiple restriction enzyme sites and to provide for the use of enzymes which leave 5' or 3' overhangs (e.g., BamHI, EcoRI, and HindIII) and those which provide blunt ends (e.g., EcoRV, SnaBI, and StuI).
"Naturally occurring" refers to an endogenous polynucleotide or polypeptide that may be isolated from viruses or prokaryotic or eukaryotic cells.
"Nucleic acid sequence" refers to the specific order of nucleotides joined by phosphodiester bonds in a linear, polymeric arrangement. Depending on the number of nucleotides, the nucleic acid sequence can be considered an oligomer, oligonucleotide, or polynucleotide. The nucleic acid can be DNA, RNA, or any nucleic acid analog, such as PNA, may be of genomic or synthetic origin, may be either double-stranded or single-stranded, and can represent either the sense or antisense (complementary) strand.
"Oligomer" refers to a nucleic acid sequence of at least about 6 nucleotides and as many as about 60 nucleotides, preferably about 15 to 40 nucleotides, and most preferably between about 20 and 30 nucleotides, that may be used in hybridization or amplification technologies. Oligomers may be used as, e.g., primers for PCR, and are usually chemically synthesized.
"Operably linked" refers to the situation in which a first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence. For instance, a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence. Generally, operably linked DNA sequences may be in close proximity or contiguous and, where necessary to join two protein coding regions, in the same reading frame.
"Peptide nucleic acid" (PNA) refers to a DNA mimic in which nucleotide bases are attached to a pseudopeptide backbone to increase stability. PNAs, also designated antigene agents, can prevent gene expression by targeting complementary messenger RNA.
The phrases "percent identity" and "% identity", as applied to polynucleotide sequences, refer to the percentage of residue matches between at least two polynucleotide sequences aligned using a standardized algorithm. Such an algorithm may insert, in a standardized and reproducible way, gaps in the sequences being compared in order to optimize alignment between two sequences, and therefore achieve a more meaningful comparison of the two sequences.
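The idea of inserting gaps in a standardized, reproducible way to optimize an alignment, and then scoring the percentage of matching residues, can be illustrated with the textbook Needleman-Wunsch global alignment. This is a swapped-in technique for illustration, not CLUSTAL V or BLAST, and the scoring values (match +1, mismatch -1, linear gap -2) are assumptions. The denominator used for percent identity also varies among programs; here it is taken to be the full alignment length, including gap columns.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> Tuple[str, str]:
    """Global alignment with a linear gap penalty; returns the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right cell, preferring the diagonal move on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical, non-gap columns divided by alignment length (gap columns included), times 100."""
    matches = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * matches / len(aln_a)
```

For example, aligning ACGT with AGT yields ACGT over A-GT, three identical positions out of four, or 75% identity under this convention.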
Percent identity between polynucleotide sequences may be determined using the default parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e sequence alignment program. This program is part of the LASERGENE software package, a suite of molecular biological analysis programs (DNASTAR, Madison WI). CLUSTAL V is described in Higgins, D.G. and Sharp, P.M. (1989) CABIOS 5:151-153 and in Higgins, D.G. et al. (1992) CABIOS 8:189-191. For pairwise alignments of polynucleotide sequences, the default parameters are set as follows: Ktuple=2, gap penalty=5, window=4, and "diagonals saved"=4. The "weighted" residue weight table is selected as the default. Percent identity is reported by CLUSTAL V as the "percent similarity" between aligned polynucleotide sequence pairs.
Alternatively, a suite of commonly used and freely available sequence comparison algorithms is provided by the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST) (Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410), which is available from several sources, including the NCBI, Bethesda, MD, and on the Internet at http://www.ncbi.nlm.nih.gov/BLAST/. The BLAST software suite includes various sequence analysis programs, including "BLASTN," which is used to determine alignment between a known polynucleotide sequence and other sequences on a variety of databases. Also available is a tool called "BLAST 2 Sequences" that is used for direct pairwise comparison of two nucleotide sequences.
"BLAST 2 Sequences" can be accessed and used interactively at http://www.ncbi.nlm.nih.gov/gorf/bl2/. The "BLAST 2 Sequences" tool can be used for both BLASTN and BLASTP (discussed below). BLAST programs are commonly used with gap and other parameters set to default settings. For example, to compare two nucleotide sequences, one may use BLASTN with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) set at default parameters. Such default parameters may be, for example:
Matrix: BLOSUM62
Reward for match: 1
Penalty for mismatch: -2
Open Gap: 5 and Extension Gap: 2 penalties
Gap x drop-off: 50
Expect: 10
Word Size: 11
Filter: on
Percent identity may be measured over the length of an entire defined sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined sequence, for instance, a fragment of at least 20, at least 30, at least 40, at least 50, at least 70, at least 100, or at least 200 contiguous nucleotides. Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in figures or Sequence Listings, may be used to describe a length over which percentage identity may be measured.
Nucleic acid sequences that do not show a high degree of identity may nevertheless encode similar amino acid sequences due to the degeneracy of the genetic code. It is understood that changes in nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid sequences that all encode substantially the same protein.
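The effect of codon degeneracy can be shown directly: two coding sequences that share only two-thirds of their nucleotides can still translate to the same peptide. The sketch assumes the standard genetic code and an in-frame DNA sequence containing only A, C, G, and T; `translate` and `ungapped_identity` are hypothetical helper names.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for the first, second, and third codon positions.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate the complete codons of an in-frame DNA sequence ('*' marks a stop codon)."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def ungapped_identity(a: str, b: str) -> float:
    """Percent identical positions between two equal-length, ungapped sequences."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    s1, s2 = "ATGCTTGGA", "ATGTTAGGC"
    print(translate(s1), translate(s2), round(ungapped_identity(s1, s2), 1))
```

Here ATGCTTGGA and ATGTTAGGC are about 67% identical at the nucleotide level, yet both translate to Met-Leu-Gly (MLG), because CTT and TTA both encode leucine and GGA and GGC both encode glycine.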
The phrases "percent identity" and "% identity", as applied to polypeptide sequences, refer to the percentage of residue matches between at least two polypeptide sequences aligned using a standardized algorithm. Methods of polypeptide sequence alignment are well-known. Some alignment methods take into account conservative amino acid substitutions. Such conservative substitutions, explained in more detail above, generally preserve the hydrophobicity and acidity of the substituted residue, thus preserving the structure (and therefore function) of the folded polypeptide.
Percent identity between polypeptide sequences may be determined using the default parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e sequence alignment program (described and referenced above). For pairwise alignments of polypeptide sequences using CLUSTAL V, the default parameters are set as follows: Ktuple=1, gap penalty=3, window=5, and "diagonals saved"=5. The PAM250 matrix is selected as the default residue weight table. As with polynucleotide alignments, the percent identity is reported by CLUSTAL V as the "percent similarity" between aligned polypeptide sequence pairs.
Alternatively the NCBI BLAST software suite may be used. For example, for a pairwise comparison of two polypeptide sequences, one may use the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) with BLASTP set at default parameters. Such default parameters may be, for example:
Matrix: BLOSUM62
Open Gap: 11 and Extension Gap: 1 penalties
Gap x drop-off: 50
Expect: 10
Word Size: 3
Filter: on
Percent identity may be measured over the length of an entire defined polypeptide sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined polypeptide sequence, for instance, a fragment of at least 15, at least 20, at least 30, at least 40, at least 50, at least 70 or at least 150 contiguous residues. Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in figures or Sequence Listings, may be used to describe a length over which percentage identity may be measured.
"Post-translational modification" of a DITHP may involve lipidation, glycosylation, phosphorylation, acetylation, racemization, proteolytic cleavage, and other modifications known in the art. These processes may occur synthetically or biochemically. Biochemical modifications will vary by cell type depending on the enzymatic milieu and the DITHP.
"Probe" refers to dithp or fragments thereof, which are used to detect identical, allelic or related nucleic acid sequences. Probes are isolated oligonucleotides or polynucleotides attached to a detectable label or reporter molecule. Typical labels include radioactive isotopes, ligands, chemiluminescent agents, and enzymes.
"Primers" are short nucleic acids, usually DNA oligonucleotides, which may be annealed to a target polynucleotide by complementary base-pairing. The primer may then be extended along the target DNA strand by a DNA polymerase enzyme.
Primer pairs can be used for amplification (and identification) of a nucleic acid sequence, e.g., by the polymerase chain reaction (PCR).
Probes and primers as used in the present invention typically comprise at least 15 contiguous nucleotides of a known sequence. In order to enhance specificity, longer probes and primers may also be employed, such as probes and primers that comprise at least 20, 30, 40, 50, 60, 70, 80, 90, 100, or at least 150 consecutive nucleotides of the disclosed nucleic acid sequences. Probes and primers may be considerably longer than these examples, and it is understood that any length supported by the specification, including the figures and Sequence Listing, may be used.
Methods for preparing and using probes and primers are described in the references, for example Sambrook, J. et al., (1989, Molecular Cloning: A Laboratory Manual. 2nd ed., vol. 1-3, Cold Spring Harbor Press, Plainview NY); Ausubel, F.M. et al., (1999, Short Protocols in Molecular Biology, 4th ed. Greene Publ. John Wiley & Sons Assoc. & Wiley-Intersciences, New York NY); and Innis, M. et al., (1990; PCR Protocols, A Guide to Methods and Applications. Academic Press, San Diego CA). PCR primer pairs can be derived from a known sequence, for example, by using computer programs intended for that purpose such as Primer (Version 0.5, 1991, Whitehead Institute for Biomedical Research, Cambridge MA).
Oligonucleotides for use as primers are selected using software known in the art for such purpose. For example, OLIGO 4.06 software is useful for the selection of PCR primer pairs of up to 100 nucleotides each, and for the analysis of oligonucleotides and larger polynucleotides of up to 5,000 nucleotides from an input polynucleotide sequence of up to 32 kilobases. Similar primer selection programs have incorporated additional features for expanded capabilities. For example, the PrimOU primer selection program (available to the public from the Genome Center at University of Texas South West Medical Center, Dallas TX) is capable of choosing specific primers from megabase sequences and is thus useful for designing primers on a genome-wide scope. The Primer3 primer selection program (available to the public from the Whitehead Institute/MIT Center for Genome Research, Cambridge MA) allows the user to input a "mispriming library," in which sequences to avoid as primer binding sites are user-specified. Primer3 is useful, in particular, for the selection of oligonucleotides for microarrays. (The source code for the latter two primer selection programs may also be obtained from their respective sources and modified to meet the user's specific needs.) The PrimeGen program (available to the public from the UK Human Genome Mapping Project Resource Centre, Cambridge UK) designs primers based on multiple sequence alignments, thereby allowing selection of primers that hybridize to either the most conserved or least conserved regions of aligned nucleic acid sequences. Hence, this program is useful for identification of both unique and conserved oligonucleotides and polynucleotide fragments. 
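The kind of filtering such primer selection programs perform can be sketched in a few lines. The example below is not Primer3, PrimOU, or PrimeGen; it is a simplified, hypothetical screen that slides a fixed-length window along a template, keeps windows within a G+C content range and a Wallace-rule Tm range (2°C per A or T plus 4°C per G or C, a rough estimate intended for short oligonucleotides), and rejects any candidate that occurs exactly on either strand of a user-supplied mispriming library. All thresholds are assumed defaults, the function names are hypothetical, and the template is assumed to contain only A, C, G, and T.

```python
from typing import Iterable, List, Tuple

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Complementary strand written 5'->3' (only A, C, G, T are handled)."""
    return "".join(PAIR[base] for base in reversed(seq))


def wallace_tm(primer: str) -> int:
    """Wallace rule: 2*(A+T) + 4*(G+C) deg C, a rough estimate for short oligos."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def candidate_primers(
    template: str,
    length: int = 20,
    gc_range: Tuple[float, float] = (40.0, 60.0),
    tm_range: Tuple[int, int] = (56, 64),
    mispriming_library: Iterable[str] = (),
) -> List[Tuple[int, str]]:
    """Return (start, primer) windows passing the GC, Tm, and mispriming filters."""
    template = template.upper()
    library = [s.upper() for s in mispriming_library]
    passing = []
    for i in range(len(template) - length + 1):
        primer = template[i:i + length]
        gc = 100.0 * (primer.count("G") + primer.count("C")) / length
        if not gc_range[0] <= gc <= gc_range[1]:
            continue
        if not tm_range[0] <= wallace_tm(primer) <= tm_range[1]:
            continue
        # Reject primers that would bind either strand of a library sequence exactly.
        rc = reverse_complement(primer)
        if any(primer in s or rc in s for s in library):
            continue
        passing.append((i, primer))
    return passing
```

A real tool would also score partial matches to the mispriming library, self-complementarity, and primer-pair compatibility; the exact-match check here only conveys the idea of user-specified sequences to avoid.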
The oligonucleotides and polynucleotide fragments identified by any of the above selection methods are useful in hybridization technologies, for example, as PCR or sequencing primers, microarray elements, or specific probes to identify fully or partially complementary polynucleotides in a sample of nucleic acids. Methods of oligonucleotide selection are not limited to those described above.
"Purified" refers to molecules, either polynucleotides or polypeptides, that are isolated or separated from their natural environment and are at least 60% free, preferably at least 75% free, and most preferably at least 90% free from other compounds with which they are naturally associated.
A "recombinant nucleic acid" is a sequence that is not naturally occurring or has a sequence that is made by an artificial combination of two or more otherwise separated segments of sequence. This artificial combination is often accomplished by chemical synthesis or, more commonly, by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques such as those described in Sambrook, supra. The term recombinant includes nucleic acids that have been altered solely by addition, substitution, or deletion of a portion of the nucleic acid. Frequently, a recombinant nucleic acid may include a nucleic acid sequence operably linked to a promoter sequence. Such a recombinant nucleic acid may be part of a vector that is used, for example, to transform a cell.
Alternatively, such recombinant nucleic acids may be part of a viral vector, e.g., based on a vaccinia virus, that could be used to vaccinate a mammal wherein the recombinant nucleic acid is expressed, inducing a protective immunological response in the mammal.
"Regulatory element" refers to a nucleic acid sequence from nontranslated regions of a gene, and includes enhancers, promoters, introns, and 3' untranslated regions, which interact with host proteins to carry out or regulate transcription or translation.
"Reporter" molecules are chemical or biochemical moieties used for labeling a nucleic acid, an amino acid, or an antibody. They include radionuclides; enzymes; fluorescent, chemiluminescent, or chromogenic agents; substrates; cofactors; inhibitors; magnetic particles; and other moieties known in the art.
An "RNA equivalent," in reference to a DNA sequence, is composed of the same linear sequence of nucleotides as the reference DNA sequence with the exception that all occurrences of the nitrogenous base thymine are replaced with uracil, and the sugar backbone is composed of ribose instead of deoxyribose.
"Sample" is used in its broadest sense. Samples may contain nucleic or amino acids, antibodies, or other materials, and may be derived from any source (e.g., bodily fluids including, but not limited to, saliva, blood, and urine; chromosome(s), organelles, or membranes isolated from a cell; genomic DNA, RNA, or cDNA in solution or bound to a substrate; and cleared cells or tissues or blots or imprints from such cells or tissues).
"Specific binding" or "specifically binding" refers to the interaction between a protein or peptide and its agonist, antibody, antagonist, or other binding partner. The interaction is dependent upon the presence of a particular structure of the protein, e.g., the antigenic determinant or epitope, recognized by the binding molecule. For example, if an antibody is specific for epitope "A," the presence of a polypeptide containing epitope A, or the presence of free unlabeled A, in a reaction containing free labeled A and the antibody will reduce the amount of labeled A that binds to the antibody.
"Substitution" refers to the replacement of at least one nucleotide or amino acid by a different nucleotide or amino acid.
"Substrate" refers to any suitable rigid or semi-rigid support including, e.g., membranes, filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, tubing, plates, polymers, microparticles or capillaries. The substrate can have a variety of surface forms, such as wells, trenches, pins, channels and pores, to which polynucleotides or polypeptides are bound.
A "transcript image" or "expression profile" refers to the collective pattern of gene expression by a particular cell type or tissue under given conditions at a given time.
"Transformation" refers to a process by which exogenous DNA enters a recipient cell. Transformation may occur under natural or artificial conditions using various methods well known in the art. Transformation may rely on any known method for the insertion of foreign nucleic acid sequences into a prokaryotic or eukaryotic host cell. The method is selected based on the host cell being transformed.
"Transformants" include stably transformed cells in which the inserted DNA is capable of replication either as an autonomously replicating plasmid or as part of the host chromosome, as well as cells which transiently express inserted DNA or RNA.
A "transgenic organism," as used herein, is any organism, including but not limited to animals and plants, in which one or more of the cells of the organism contains heterologous nucleic acid introduced by way of human intervention, such as by transgenic techniques well known in the art. The nucleic acid is introduced into the cell, directly or indirectly by introduction into a precursor of 5 the cell, by way of deliberate genetic manipulation, such as by microinjection or by infection with a recombinant virus. The term genetic manipulation does not include classical cross-breeding, or in vitro fertilization, but rather is directed to the introduction of a recombinant DNA molecule. The transgenic organisms contemplated in accordance with the present invention include bacteria, cyanobacteria, fungi, and plants and animals. The isolated DNA of the present invention can be o introduced into the host by methods known in the art, for example infection, transfection, transformation or transconjugation. Techniques for transferring the DNA of the present invention into such organisms are widely known and provided in references such as Sambrook et al. (1989), supra.
A "variant" of a particular nucleic acid sequence is defined as a nucleic acid sequence having at least 25% sequence identity to the particular nucleic acid sequence over a certain length of one of the nucleic acid sequences using BLASTN with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) set at default parameters. Such a pair of nucleic acids may show, for example, at least 30%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater sequence identity over a certain defined length. The variant may result in "conservative" amino acid changes which do not affect structural and/or chemical properties. A variant may be described as, for example, an "allelic" (as defined above), "splice," "species," or "polymorphic" variant. A splice variant may have significant identity to a reference molecule, but will generally have a greater or lesser number of nucleotides due to alternative splicing of exons during mRNA processing. The corresponding polypeptide may possess additional functional domains or lack domains that are present in the reference molecule. Species variants are polynucleotide sequences that vary from one species to another. The resulting polypeptides generally will have significant amino acid identity relative to each other. A polymorphic variant is a variation in the polynucleotide sequence of a particular gene between individuals of a given species. Polymorphic variants also may encompass "single nucleotide polymorphisms" (SNPs) in which the polynucleotide sequence varies by one base. The presence of SNPs may be indicative of, for example, a certain population, a disease state, or a propensity for a disease state.
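The single-base variation that defines a SNP can be illustrated with a minimal Python sketch that compares two sequences and reports the positions at which they differ by one base. It assumes the two sequences have already been aligned without gaps and have equal length. The function name `find_snps` is hypothetical, and positions containing an unidentified base (N) are skipped rather than called. A difference between two individual sequences is only a candidate SNP. Establishing a polymorphism requires population data.

```python
def find_snps(reference: str, sample: str) -> list:
    """Return (position, reference_base, sample_base) for each single-base difference.

    Assumes both sequences are pre-aligned, gap-free, and of equal length.
    Positions are 1-based; positions where either base is 'N' are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    pairs = zip(reference.upper(), sample.upper())
    for pos, (ref_base, alt_base) in enumerate(pairs, start=1):
        if "N" in (ref_base, alt_base):
            continue  # unidentified base: no call is possible here
        if ref_base != alt_base:
            snps.append((pos, ref_base, alt_base))
    return snps


print(find_snps("ACGTACGTAC", "ACGTTCGTAN"))  # [(5, 'A', 'T')]
```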
In an alternative, variants of the polynucleotides of the present invention may be generated through recombinant methods. One possible method is a DNA shuffling technique such as MOLECULARBREEDING (Maxygen Inc., Santa Clara CA; described in U.S. Patent Number 5,837,458; Chang, C-C. et al. (1999) Nat. Biotechnol. 17:793-797; Christians, F.C. et al. (1999) Nat. Biotechnol. 17:259-264; and Crameri, A. et al. (1996) Nat. Biotechnol. 14:315-319) to alter or improve the biological properties of DITHP, such as its biological or enzymatic activity or its ability to bind to other molecules or compounds. DNA shuffling is a process by which a library of gene variants is produced using PCR-mediated recombination of gene fragments. The library is then subjected to selection or screening procedures that identify those gene variants with the desired properties. These preferred variants may then be pooled and further subjected to recursive rounds of DNA shuffling and selection/screening. Thus, genetic diversity is created through "artificial" breeding and rapid molecular evolution. For example, fragments of a single gene containing random point mutations may be recombined, screened, and then reshuffled until the desired properties are optimized. Alternatively, fragments of a given gene may be recombined with fragments of homologous genes in the same gene family, either from the same or different species, thereby maximizing the genetic diversity of multiple naturally occurring genes in a directed and controllable manner.
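The shuffle-and-screen cycle can be made concrete with a toy in-silico Python sketch. It does not model MOLECULARBREEDING or any laboratory protocol. The parents are assumed to be aligned sequences of equal length, and recombination is modeled as splicing together fragments taken from randomly chosen parents at random crossover points. Point mutation is omitted. The scoring function `matches_target` is a hypothetical stand-in for a screening assay, and all function names are illustrative.

```python
import random


def shuffle_once(parents, n_fragments, rng):
    """Build one chimera from consecutive fragments of randomly chosen parents.

    Assumes the parents are aligned and of equal length (e.g., homologous genes).
    """
    length = len(parents[0])
    # Distinct, sorted crossover points split the sequence into n_fragments pieces.
    cuts = sorted(rng.sample(range(1, length), n_fragments - 1))
    bounds = [0, *cuts, length]
    return "".join(rng.choice(parents)[start:end] for start, end in zip(bounds, bounds[1:]))


def directed_evolution(parents, score, rounds=5, library_size=50, keep=4,
                       n_fragments=4, seed=0):
    """Recursive rounds of in-silico shuffling and screening; returns the best variant."""
    rng = random.Random(seed)
    pool = list(parents)
    for _ in range(rounds):
        library = [shuffle_once(pool, n_fragments, rng) for _ in range(library_size)]
        # Screening: keep the top-scoring unique variants. The previous pool stays
        # in the running, so the best score never decreases between rounds.
        candidates = list(dict.fromkeys(library + pool))
        pool = sorted(candidates, key=score, reverse=True)[:keep]
    return pool[0]


target = "ACGTACGTACGTACGT"  # hypothetical "desired" sequence standing in for an assay


def matches_target(seq):
    return sum(a == b for a, b in zip(seq, target))


parents = ["ACGTACGTTTTTTTTT", "TTTTTTTTACGTACGT"]
best = directed_evolution(parents, matches_target)
print(best, matches_target(best))
```

Keeping the previous pool alongside each new library (elitism) mirrors the idea of pooling preferred variants before the next round.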
A "variant" of a particular polypeptide sequence is defined as a polypeptide sequence having at least 40% sequence identity to the particular polypeptide sequence over a certain length of one of the polypeptide sequences using BLASTP with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) set at default parameters. Such a pair of polypeptides may show, for example, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater identity over a certain defined length of one of the polypeptides.
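The two identity thresholds can be checked against an existing pairwise alignment with a short Python sketch. It assumes the alignment has already been produced, with gaps written as "-". It computes identity as the fraction of aligned columns with identical residues. This is a simple proxy and is not the statistic reported by the "BLAST 2 Sequences" tool. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns with identical residues; '-' marks a gap.

    Columns where both sequences have a gap are ignored. This is a simple
    proxy, not the identity statistic reported by BLAST 2 Sequences.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * identical / len(columns)


# Minimum identities from the nucleic acid and polypeptide "variant" definitions.
MIN_IDENTITY = {"nucleotide": 25.0, "polypeptide": 40.0}


def meets_variant_threshold(aligned_a: str, aligned_b: str, kind: str) -> bool:
    return percent_identity(aligned_a, aligned_b) >= MIN_IDENTITY[kind]


print(round(percent_identity("MKT-LV", "MKTALI"), 1))  # 66.7
print(meets_variant_threshold("MKT-LV", "MKTALI", "polypeptide"))  # True
```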
THE INVENTION
In a particular embodiment, cDNA sequences derived from human tissues and cell lines were aligned based on nucleotide sequence identity and assembled into "consensus" or "template" sequences, which are designated by the template identification numbers (template IDs) in column 2 of Table 2. The sequence identification numbers (SEQ ID NOs) corresponding to the template IDs are shown in column 1. The template sequences have similarity to GenBank sequences, or "hits," as designated by the GI Numbers in column 3. The statistical probability of each GenBank hit is indicated by a probability score in column 4, and the functional annotation corresponding to each GenBank hit is listed in column 5.
The invention incorporates the nucleic acid sequences of these templates as disclosed in the Sequence Listing and the use of these sequences in the diagnosis and treatment of disease states characterized by defects in human molecules. The invention further utilizes these sequences in hybridization and amplification technologies, and in particular, in technologies which assess gene expression patterns correlated with specific cells or tissues and their responses in vivo or in vitro to pharmaceutical agents, toxins, and other treatments. In this manner, the sequences of the present invention are used to develop a transcript image for a particular cell or tissue.
Derivation of Nucleic Acid Sequences
cDNA was isolated from libraries constructed using RNA derived from normal and diseased human tissues and cell lines. The human tissues and cell lines used for cDNA library construction were selected from a broad range of sources to provide a diverse population of cDNAs representative of gene transcription throughout the human body. Descriptions of the human tissues and cell lines used for cDNA library construction are provided in the LIFESEQ database (Incyte Genomics, Inc. (Incyte), Palo Alto CA). Human tissues were broadly selected from, for example, cardiovascular, dermatologic, endocrine, gastrointestinal, hematopoietic/immune system, musculoskeletal, neural, reproductive, and urologic sources.
Cell lines used for cDNA library construction were derived from, for example, leukemic cells, teratocarcinomas, neuroepitheliomas, cervical carcinoma, lung fibroblasts, and endothelial cells. Such cell lines include, for example, THP-1, Jurkat, HUVEC, hNT2, WI38, HeLa, and other cell lines commonly used and available from public depositories (American Type Culture Collection, Manassas VA). Prior to mRNA isolation, cell lines were untreated, treated with a pharmaceutical agent such as 5'-aza-2'-deoxycytidine, treated with an activating agent such as lipopolysaccharide in the case of leukocytic cell lines, or, in the case of endothelial cell lines, subjected to shear stress.
Sequencing of the cDNAs
Methods for DNA sequencing are well known in the art. Conventional enzymatic methods employ the Klenow fragment of DNA polymerase I, SEQUENASE DNA polymerase (U.S. Biochemical Corporation, Cleveland OH), Taq polymerase (Applied Biosystems, Foster City CA), thermostable T7 polymerase (Amersham Pharmacia Biotech, Inc. (Amersham Pharmacia Biotech), Piscataway NJ), or combinations of polymerases and proofreading exonucleases such as those found in the ELONGASE amplification system (Life Technologies Inc. (Life Technologies), Gaithersburg MD) to extend the nucleic acid sequence from an oligonucleotide primer annealed to the DNA template of interest. Methods have been developed for the use of both single-stranded and double-stranded templates. Chain termination reaction products may be electrophoresed on urea-polyacrylamide gels and detected either by autoradiography (for radioisotope-labeled nucleotides) or by fluorescence (for fluorophore-labeled nucleotides). Automated methods for mechanized reaction preparation, sequencing, and analysis using fluorescence detection methods have been developed. Machines used to prepare cDNAs for sequencing can include the MICROLAB 2200 liquid transfer system (Hamilton Company (Hamilton), Reno NV), the Peltier thermal cycler (PTC200; MJ Research, Inc. (MJ Research), Watertown MA), and the ABI CATALYST 800 thermal cycler (Applied Biosystems). Sequencing can be carried out using, for example, the ABI 373 or 377 (Applied Biosystems) or MEGABACE 1000 (Molecular Dynamics, Inc. (Molecular Dynamics), Sunnyvale CA) DNA sequencing systems, or other automated and manual sequencing systems well known in the art.
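The chain-termination readout can be sketched in a few lines of Python. The model is idealized. It assumes exactly one labeled product ends at every position, and it ignores primer length, band intensity, and dye mobility effects. Because the primer is extended 5' to 3' along the template, the sequence read from the size-ordered products is the reverse complement of the template strand. The function names are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def synthesized_strand(template: str) -> str:
    """Strand made by extending a primer annealed at the template's 3' end (written 5'->3')."""
    return template.upper().translate(COMPLEMENT)[::-1]


def termination_products(strand: str) -> list:
    """Idealized chain-termination products as (length, terminating dideoxy base).

    Assumes one labeled product ends at every position; primer length,
    band intensities, and mobility shifts are ignored.
    """
    return [(length, strand[length - 1]) for length in range(1, len(strand) + 1)]


def read_ladder(products) -> str:
    """Order products by size, as electrophoresis does, and read the terminal labels."""
    return "".join(base for _, base in sorted(products))


products = termination_products(synthesized_strand("GATTACA"))
random.Random(1).shuffle(products)  # a reaction mixture is unordered
print(read_ladder(products))  # TGTAATC
```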
The nucleotide sequences of the Sequence Listing have been prepared by current, state-of-the-art, automated methods and, as such, may contain occasional sequencing errors or unidentified nucleotides. Such unidentified nucleotides are designated by an N. These infrequent unidentified bases do not represent a hindrance to practicing the invention for those skilled in the art. Several methods employing standard recombinant techniques may be used to correct errors and complete the missing sequence information. (See, e.g., those described in Ausubel, F.M. et al. (1997) Short Protocols in Molecular Biology, John Wiley & Sons, New York NY; and Sambrook, J. et al. (1989) Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, Plainview NY.)
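The per-base likelihood of such sequencing errors is commonly expressed as a PHRED quality score, Q = -10 log10(P), where P is the estimated probability that a base call is wrong. For example, Q = 30 corresponds to a 1-in-1,000 chance of error. The following minimal Python sketch converts between the two scales. Its function names are hypothetical. It also assumes one common way of summarizing a read, which is to sum the per-base error probabilities to obtain the expected number of miscalled bases.

```python
import math


def error_probability(q: float) -> float:
    """Base-call error probability for a PHRED quality score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def phred_quality(p_error: float) -> float:
    """PHRED quality score for a base-call error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p_error)


def expected_errors(qualities) -> float:
    """Expected number of miscalled bases in a read: the sum of per-base error probabilities."""
    return sum(error_probability(q) for q in qualities)


print(round(phred_quality(0.001), 6))  # 30.0
print(round(expected_errors([10, 20, 30, 40]), 4))  # 0.1111
```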
Assembly of cDNA Sequences
Human polynucleotide sequences may be assembled using programs or algorithms well known in the art. Sequences to be assembled are related, wholly or in part, and may be derived from a single transcript or from many different transcripts. Assembly of the sequences can be performed using such programs as PHRAP (Phil's Revised Assembly Program) and the GELVIEW fragment assembly system (GCG), or other methods known in the art.
Alternatively, cDNA sequences are used as "component" sequences that are assembled into "template" or "consensus" sequences as follows. Sequence chromatograms are processed, verified, and quality scores are obtained using PHRED. Raw sequences are edited using an editing pathway known as Block 1 (see, e.g., the LIFESEQ Assembled User Guide, Incyte Genomics, Palo Alto, CA).
A series of BLAST comparisons is performed, and low-information segments and repetitive elements (e.g., dinucleotide repeats, Alu repeats, etc.) are replaced by "n's," or masked, to prevent spurious matches. Mitochondrial and ribosomal RNA sequences are also removed. The processed sequences are then loaded into a relational database management system (RDMS), which assigns edited sequences to existing templates, if available. When additional sequences are added into the RDMS, a process is initiated which modifies existing templates or creates new templates from works in progress (i.e., nonfinal assembled sequences) containing queued sequences or the sequences themselves. After the new sequences have been assigned to templates, the templates can be merged into bins. If multiple templates exist in one bin, the bin can be split and the templates reannotated. Once gene bins have been generated based upon sequence alignments, bins are "clone joined" based upon clone information. Clone joining occurs when the 5' sequence of one clone is present in one bin and the 3' sequence from the same clone is present in a different bin, indicating that the two bins should be merged into a single bin. Only bins which share at least two different clones are merged. A resultant template sequence may contain either a partial or a full length open reading frame, or all or part of a genetic regulatory element. This variation is due in part to the fact that the full length cDNAs of many genes are several hundred, and sometimes several thousand, bases in length. With current technology, cDNAs comprising the coding regions of large genes cannot be cloned because of vector limitations, incomplete reverse transcription of the mRNA, or incomplete "second strand" synthesis. Template sequences may be extended to include additional contiguous sequences derived from the parent RNA transcript using a variety of methods known to those of skill in the art.
Extension may thus be used to achieve the full length coding sequence of a gene.
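The clone-joining rule described above can be expressed as a small union-find over bins. This Python sketch illustrates the rule under stated assumptions and is not the Incyte pipeline. Each read is assumed to be labeled with its clone ID and its end ("5'" or "3'"). The function name `clone_join` is hypothetical. The two-clone minimum is exposed as the `min_shared_clones` parameter.

```python
from collections import defaultdict


def clone_join(read_bins: dict, min_shared_clones: int = 2) -> dict:
    """Map each bin ID to the ID of the merged bin it belongs to.

    read_bins maps (clone_id, end) -> bin_id, where end is "5'" or "3'".
    Two bins are merged only if at least `min_shared_clones` distinct clones
    have their 5' read in one bin and their 3' read in the other.
    """
    parent = {bin_id: bin_id for bin_id in read_bins.values()}

    def find(bin_id):
        # Union-find lookup with path halving.
        while parent[bin_id] != bin_id:
            parent[bin_id] = parent[parent[bin_id]]
            bin_id = parent[bin_id]
        return bin_id

    # Collect the distinct clones that link each unordered pair of bins.
    links = defaultdict(set)
    for clone in {clone_id for clone_id, _ in read_bins}:
        five = read_bins.get((clone, "5'"))
        three = read_bins.get((clone, "3'"))
        if five is not None and three is not None and five != three:
            links[frozenset((five, three))].add(clone)

    for pair, clones in links.items():
        if len(clones) >= min_shared_clones:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    return {bin_id: find(bin_id) for bin_id in parent}


reads = {
    ("c1", "5'"): "binA", ("c1", "3'"): "binB",
    ("c2", "5'"): "binA", ("c2", "3'"): "binB",
    ("c3", "5'"): "binB", ("c3", "3'"): "binC",  # only one clone links binB and binC
}
merged = clone_join(reads)
print(merged["binA"] == merged["binB"], merged["binB"] == merged["binC"])  # True False
```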
Analysis of the cDNA Sequences
The cDNA sequences are analyzed using a variety of programs and algorithms which are well known in the art. (See, e.g., Ausubel, 1997, supra, Chapter 7.7; Meyers, R.A. (Ed.) (1995) Molecular Biology and Biotechnology, Wiley VCH, New York NY, pp. 853-856; and Table 8.) These analyses comprise reading frame determinations, e.g., based on triplet codon periodicity for particular organisms (Fickett, J.W. (1982) Nucleic Acids Res. 10:5303-5318); analyses of potential start and stop codons; and homology searches.
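Start- and stop-codon analysis can be illustrated with a Python sketch that scans the three forward reading frames for the longest open reading frame beginning with ATG and ending with a stop codon. This is a deliberately minimal stand-in and not Fickett's periodicity-based method. It assumes the standard genetic code, ignores the reverse strand, and uses a hypothetical function name.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Optional[Tuple[int, int, int]]:
    """Longest ATG-initiated, stop-terminated ORF on the forward strand.

    Returns (frame, start, end) with a 0-based start and an exclusive end that
    includes the stop codon, or None if no complete ORF exists.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon since the last stop
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[2] - best[1]):
                    best = (frame, start, i + 3)
                start = None
    return best


print(longest_orf("CCATGAAATTTTAGCC"))  # (2, 2, 14)
```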
Computer programs known to those of skill in the art for performing computer-assisted searches for amino acid and nucleic acid sequence similarity include, for example, the Basic Local Alignment Search Tool (BLAST; Altschul, S.F. (1993) J. Mol. Evol. 36:290-300; Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410). BLAST is especially useful in determining exact matches and in comparing two sequence fragments of arbitrary but equal lengths whose alignment is locally maximal and for which the alignment score meets or exceeds a threshold or cutoff score set by the user (Karlin, S. et al. (1988) Proc. Natl. Acad. Sci. USA 85:841-845). Using an appropriate search tool (e.g., BLAST or HMM), GenBank, SwissProt, BLOCKS, PFAM, and other databases may be searched for sequences containing regions of homology to a query dithp or DITHP of the present invention.
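The idea of a locally maximal pair of equal-length segments, scored against a user-set cutoff, can be illustrated with an exhaustive Python sketch. This is not BLAST. BLAST finds candidate segments by word seeding and extension and assesses significance with Karlin-Altschul statistics, whereas this sketch simply scans every diagonal with a maximum-subarray pass. The +1/-1 match/mismatch scores and the function names are assumptions made for illustration.

```python
def best_ungapped_segment(a: str, b: str, match: int = 1, mismatch: int = -1):
    """Highest-scoring ungapped local alignment (maximal segment pair) of a and b.

    Returns (score, start_a, start_b, length); both fragments have the same length.
    Each diagonal is scanned with a Kadane-style maximum-subarray pass.
    """
    best = (0, 0, 0, 0)
    for offset in range(-(len(b) - 1), len(a)):
        # On this diagonal, a[i] is aligned with b[i - offset].
        i_start, i_end = max(offset, 0), min(len(a), len(b) + offset)
        run_score, run_start = 0, i_start
        for i in range(i_start, i_end):
            s = match if a[i] == b[i - offset] else mismatch
            if run_score <= 0:
                run_score, run_start = s, i  # start a fresh segment at i
            else:
                run_score += s
            if run_score > best[0]:
                best = (run_score, run_start, run_start - offset, i - run_start + 1)
    return best


def hits_above_cutoff(query: str, database: dict, cutoff: int) -> dict:
    """Database entries whose best ungapped segment score meets or exceeds the cutoff."""
    hits = {}
    for name, subject in database.items():
        segment = best_ungapped_segment(query, subject)
        if segment[0] >= cutoff:
            hits[name] = segment
    return hits


print(hits_above_cutoff("TTTACGTACGTT", {"x": "GGACGTACGGG", "y": "CCCCCCCC"}, cutoff=5))
# {'x': (7, 3, 2, 7)}
```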
Other approaches to the identification, assembly, storage, and display of nucleotide and polypeptide sequences are provided in "Relational Database for Storing Biomolecule Information," U.S.S.N. 08/947,845, filed October 9, 1997; "Project-Based Full-Length Biomolecular Sequence Database," U.S. Patent Number 5,953,727; and "Relational Database and System for Storing Information Relating to Biomolecular Sequences," U.S.S.N. 09/034,807, filed March 4, 1998, all of which are incorporated by reference herein in their entirety.
Protein hierarchies can be assigned to the putative encoded polypeptide based on, e.g., motif, BLAST, or biological analysis. Methods for assigning these hierarchies are described, for example, in "Database System Employing Protein Function Hierarchies for Viewing Biomolecular Sequence Data," U.S. Patent Number 6,023,659, incorporated herein by reference.
Identification of Human Diagnostic and Therapeutic Molecules Encoded by dithp
The identities of the DITHP encoded by the dithp of the present invention were obtained by analysis of the assembled cDNA and template sequences. Human molecules encoding DITHP are classified by their GenBank annotation into a hierarchical classification system. Table 1, column 5 indicates the identities of DITHP which correspond to the following Protein Functional Hierarchy (PFH) classification:
Sequences of Human Diagnostic and Therapeutic Molecules
The dithp of the present invention may be used for a variety of diagnostic and therapeutic purposes. For example, a dithp may be used to diagnose a particular condition, disease, or disorder associated with human molecules. Such conditions, diseases, and disorders include, but are not limited to, a cell proliferative disorder, such as actinic keratosis, arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease (MCTD), myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis, primary thrombocythemia, and cancers including adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and, in particular, a cancer of the adrenal gland, bladder, bone, bone marrow, brain, breast, cervix, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary, pancreas, parathyroid, penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid, and uterus; an autoimmune/inflammatory disorder, such as inflammation, actinic keratosis, acquired immunodeficiency syndrome (AIDS), Addison's disease, adult respiratory distress syndrome, allergies, ankylosing spondylitis, amyloidosis, anemia, arteriosclerosis, asthma, atherosclerosis, autoimmune hemolytic anemia, autoimmune thyroiditis, bronchitis, bursitis, cholecystitis, cirrhosis, contact dermatitis, Crohn's disease, atopic dermatitis, dermatomyositis, diabetes mellitus, emphysema, erythroblastosis fetalis, erythema nodosum, atrophic gastritis, glomerulonephritis, Goodpasture's syndrome, gout, Graves' disease, Hashimoto's thyroiditis, paroxysmal nocturnal hemoglobinuria, hepatitis, hypereosinophilia, irritable bowel syndrome, episodic lymphopenia with lymphocytotoxins, mixed connective tissue disease (MCTD), multiple sclerosis, myasthenia gravis, myocardial or pericardial inflammation, myelofibrosis, osteoarthritis, osteoporosis, pancreatitis, polycythemia vera, polymyositis, psoriasis, Reiter's syndrome,
rheumatoid arthritis, scleroderma, Sjogren's syndrome, systemic anaphylaxis, systemic lupus erythematosus, systemic sclerosis, primary thrombocythemia, thrombocytopenic purpura, ulcerative colitis, uveitis, Werner syndrome, complications of cancer, hemodialysis, and extracorporeal circulation, trauma, and hematopoietic
cancer including lymphoma, leukemia, and myeloma; an infection caused by a viral agent classified as adenovirus, arenavirus, bunyavirus, calicivirus, coronavirus, filovirus, hepadnavirus, herpesvirus, flavivirus, orthomyxovirus, parvovirus, papovavirus, paramyxovirus, picornavirus, poxvirus, reovirus, retrovirus, rhabdovirus, or togavirus; an infection caused by a bacterial agent classified as pneumococcus, staphylococcus, streptococcus, bacillus, corynebacterium, clostridium, meningococcus, gonococcus, listeria, moraxella, kingella, haemophilus, legionella, bordetella, gram-negative enterobacterium including shigella, salmonella, or campylobacter, pseudomonas, vibrio, brucella, francisella, yersinia, bartonella, nocardium, actinomyces, mycobacterium, spirochaetale, rickettsia, chlamydia, or mycoplasma; an infection caused by a fungal agent classified as aspergillus, blastomyces, dermatophytes, cryptococcus, coccidioides, malassezia, histoplasma, or other mycosis-causing fungal agent; and an infection caused by a parasite classified as plasmodium or malaria-causing, parasitic entamoeba, leishmania, trypanosoma, toxoplasma, pneumocystis carinii, intestinal protozoa such as giardia, trichomonas, tissue nematode such as trichinella, intestinal nematode such as ascaris, lymphatic filarial nematode, trematode such as schistosoma, and cestode such as tapeworm; a developmental disorder such as renal tubular acidosis, anemia, Cushing's syndrome, achondroplastic dwarfism, Duchenne and Becker muscular dystrophy, epilepsy, gonadal dysgenesis, WAGR syndrome (Wilms' tumor, aniridia, genitourinary abnormalities, and mental retardation), Smith-Magenis syndrome, myelodysplastic syndrome, hereditary mucoepithelial dysplasia, hereditary keratodermas, hereditary neuropathies such as Charcot-Marie-Tooth disease and neurofibromatosis, hypothyroidism, hydrocephalus, seizure disorders such as Sydenham's chorea and cerebral palsy, spina bifida, anencephaly, craniorachischisis,
congenital glaucoma, cataract, and sensorineural hearing loss; an endocrine disorder such as a disorder of the hypothalamus and/or pituitary resulting from lesions such as a primary brain tumor, adenoma, infarction associated with pregnancy, hypophysectomy, aneurysm, vascular malformation, thrombosis, infection, immunological disorder, and complication due to head trauma; a disorder associated with hypopituitarism including hypogonadism, Sheehan syndrome, diabetes insipidus, Kallman's disease, Hand-Schüller-Christian disease, Letterer-Siwe disease, sarcoidosis, empty sella syndrome, and dwarfism; a disorder associated with hyperpituitarism including acromegaly, giantism, and syndrome of inappropriate antidiuretic hormone (ADH) secretion (SIADH) often caused by benign adenoma; a disorder associated with hypothyroidism including goiter, myxedema, acute thyroiditis associated with bacterial infection, subacute thyroiditis associated with viral infection, autoimmune thyroiditis (Hashimoto's disease), and cretinism; a disorder associated with hyperthyroidism including thyrotoxicosis and its various forms, Graves' disease, pretibial myxedema, toxic multinodular goiter, thyroid carcinoma, and Plummer's disease; a disorder associated with hyperparathyroidism including Conn disease (chronic hypercalcemia); a pancreatic disorder such as Type I or Type II diabetes
mellitus and associated complications; a disorder associated with the adrenals such as hyperplasia, carcinoma, or adenoma of the adrenal cortex, hypertension associated with alkalosis, amyloidosis, hypokalemia, Cushing's disease, Liddle's syndrome, and Arnold-Healy-Gordon syndrome, pheochromocytoma tumors, and Addison's disease; a disorder associated with gonadal steroid hormones such as: in women, abnormal prolactin production, infertility, endometriosis, perturbation of the menstrual cycle, polycystic ovarian disease, hyperprolactinemia, isolated gonadotropin deficiency, amenorrhea, galactorrhea, hermaphroditism, hirsutism and virilization, breast cancer, and, in post-menopausal women, osteoporosis; and, in men, Leydig cell deficiency, male climacteric phase, and germinal cell aplasia, a hypergonadal disorder associated with Leydig cell tumors, androgen resistance associated with absence of androgen receptors, syndrome of 5α-reductase, and gynecomastia; a metabolic disorder such as Addison's disease, cerebrotendinous xanthomatosis, congenital adrenal hyperplasia, coumarin resistance, cystic fibrosis, diabetes, fatty hepatocirrhosis, fructose-1,6-diphosphatase deficiency, galactosemia, goiter, glucagonoma, glycogen storage diseases, hereditary fructose intolerance, hyperadrenalism, hypoadrenalism, hyperparathyroidism, hypoparathyroidism, hypercholesterolemia, hyperthyroidism, hypoglycemia, hypothyroidism, hyperlipidemia, hyperlipemia, lipid myopathies, lipodystrophies, lysosomal storage diseases, mannosidosis, neuraminidase deficiency, obesity, pentosuria, phenylketonuria, pseudovitamin D-deficiency rickets; disorders of carbohydrate metabolism such as congenital type II dyserythropoietic anemia, diabetes, insulin-dependent diabetes mellitus, non-insulin-dependent diabetes mellitus, fructose-1,6-diphosphatase deficiency, galactosemia, glucagonoma, hereditary fructose intolerance, hypoglycemia, mannosidosis, neuraminidase deficiency, obesity, galactose epimerase
deficiency, glycogen storage diseases, lysosomal storage diseases, fructosuria, pentosuria, and inherited abnormalities of pyruvate metabolism; disorders of lipid metabolism such as fatty liver, cholestasis, primary biliary cirrhosis, carnitine deficiency, carnitine palmitoyltransferase deficiency, myoadenylate deaminase deficiency, hypertriglyceridemia, lipid storage disorders such as Fabry's disease, Gaucher's disease, Niemann-Pick disease, metachromatic leukodystrophy, adrenoleukodystrophy, GM2 gangliosidosis, and ceroid lipofuscinosis, abetalipoproteinemia, Tangier disease, hyperlipoproteinemia, diabetes mellitus, lipodystrophy, lipomatoses, acute panniculitis, disseminated fat necrosis, adiposis dolorosa, lipoid adrenal hyperplasia, minimal change disease, lipomas, atherosclerosis, hypercholesterolemia, hypercholesterolemia with hypertriglyceridemia, primary hypoalphalipoproteinemia, hypothyroidism, renal disease, liver disease, lecithin:cholesterol acyltransferase deficiency, cerebrotendinous xanthomatosis, sitosterolemia, hypocholesterolemia, Tay-Sachs disease, Sandhoff's disease, hyperlipidemia, hyperlipemia, lipid myopathies, and obesity; and disorders of copper metabolism such as Menkes disease, Wilson's disease, and Ehlers-Danlos syndrome type IX; a neurological disorder such as epilepsy, ischemic cerebrovascular disease, stroke,
cerebral neoplasms, Alzheimer's disease, Pick's disease, Huntington's disease, dementia, Parkinson's disease and other extrapyramidal disorders, amyotrophic lateral sclerosis and other motor neuron disorders, progressive neural muscular atrophy, retinitis pigmentosa, hereditary ataxias, multiple sclerosis and other demyelinating diseases, bacterial and viral meningitis, brain abscess, subdural empyema, epidural abscess, suppurative intracranial thrombophlebitis, myelitis and radiculitis, viral central nervous system disease, prion diseases including kuru, Creutzfeldt-Jakob disease, and Gerstmann-Straussler-Scheinker syndrome, fatal familial insomnia, nutritional and metabolic diseases of the nervous system, neurofibromatosis, tuberous sclerosis, cerebelloretinal hemangioblastomatosis, encephalotrigeminal syndrome, mental retardation and other developmental disorders of the central nervous system, cerebral palsy, a neuroskeletal disorder, an autonomic nervous system disorder, a cranial nerve disorder, a spinal cord disease, muscular dystrophy and other neuromuscular disorders, a peripheral nervous system disorder, dermatomyositis and polymyositis, inherited, metabolic, endocrine, and toxic myopathy, myasthenia gravis, periodic paralysis, a mental disorder including mood, anxiety, and schizophrenic disorders, seasonal affective disorder (SAD), akathisia, amnesia, catatonia, diabetic neuropathy, tardive dyskinesia, dystonias, paranoid psychoses, postherpetic neuralgia, and Tourette's disorder; a gastrointestinal disorder including ulcerative colitis, gastric and duodenal ulcers, cystinuria, dibasicaminoaciduria, hypercystinuria, lysinuria, Hartnup disease, tryptophan malabsorption, methionine malabsorption, histidinuria, iminoglycinuria, dicarboxylicaminoaciduria, cystinosis, renal glycosuria, hypouricemia, familial hypophosphatemic rickets, congenital chloridorrhea, distal renal tubular acidosis, Menkes disease, Wilson's disease, lethal diarrhea, juvenile pernicious
anemia, folate malabsorption, adrenoleukodystrophy, hereditary myoglobinuria, and Zellweger syndrome; a transport disorder such as akinesia, amyotrophic lateral sclerosis, ataxia telangiectasia, cystic fibrosis, Becker's muscular dystrophy, Bell's palsy, Charcot-Marie-Tooth disease, diabetes mellitus, diabetes insipidus, diabetic neuropathy, Duchenne muscular dystrophy, hyperkalemic periodic paralysis, normokalemic periodic paralysis, Parkinson's disease, malignant hyperthermia, multidrug resistance, myasthenia gravis, myotonic dystrophy, catatonia, tardive dyskinesia, dystonias, peripheral neuropathy, cerebral neoplasms, prostate cancer, cardiac disorders associated with transport, e.g., angina, bradyarrhythmia, tachyarrhythmia, hypertension, Long QT syndrome, myocarditis, cardiomyopathy, nemaline myopathy, centronuclear myopathy, lipid myopathy, mitochondrial myopathy, thyrotoxic myopathy, ethanol myopathy, dermatomyositis, inclusion body myositis, infectious myositis, and polymyositis, neurological disorders associated with transport, e.g., Alzheimer's disease, amnesia, bipolar disorder, dementia, depression, epilepsy, Tourette's disorder, paranoid psychoses, and schizophrenia, and other disorders associated with transport, e.g., neurofibromatosis, postherpetic neuralgia, trigeminal neuropathy, sarcoidosis, sickle cell anemia, cataracts, infertility, pulmonary artery stenosis, sensorineural autosomal deafness,
hyperglycemia, hypoglycemia, Graves' disease, goiter, glucose-galactose malabsorption syndrome, hypercholesterolemia, Cushing's disease, and Addison's disease; and a connective tissue disorder such as osteogenesis imperfecta, Ehlers-Danlos syndrome, chondrodysplasias, Marfan syndrome, Alport syndrome, familial aortic aneurysm, achondroplasia, mucopolysaccharidoses, osteoporosis, osteopetrosis, Paget's disease, rickets, osteomalacia, hyperparathyroidism, renal osteodystrophy, osteonecrosis, osteomyelitis, osteoma, osteoid osteoma, osteoblastoma, osteosarcoma, osteochondroma, chondroma, chondroblastoma, chondromyxoid fibroma, chondrosarcoma, fibrous cortical defect, nonossifying fibroma, fibrous dysplasia, fibrosarcoma, malignant fibrous histiocytoma, Ewing's sarcoma, primitive neuroectodermal tumor, giant cell tumor, osteoarthritis, rheumatoid arthritis, ankylosing spondyloarthritis, Reiter's syndrome, psoriatic arthritis, enteropathic arthritis, infectious arthritis, gout, gouty arthritis, calcium pyrophosphate crystal deposition disease, ganglion, synovial cyst, villonodular synovitis, systemic sclerosis, Dupuytren's contracture, hepatic fibrosis, lupus erythematosus, mixed connective tissue disease, epidermolysis bullosa simplex, bullous congenital ichthyosiform erythroderma (epidermolytic hyperkeratosis), non-epidermolytic and epidermolytic palmoplantar keratoderma, ichthyosis bullosa of Siemens, pachyonychia congenita, and white sponge nevus. The dithp can be used to detect the presence of, or to quantify the amount of, a dithp-related polynucleotide in a sample. This information is then compared to information obtained from appropriate reference samples, and a diagnosis can be established. Alternatively, a polynucleotide complementary to a given dithp may be used to inhibit or inactivate a therapeutically relevant gene related to the dithp.
Analysis of dithp Expression Patterns
The expression of dithp may be routinely assessed by hybridization-based methods to determine, for example, the tissue-specificity, disease-specificity, or developmental stage-specificity of dithp expression. For example, the level of expression of dithp may be compared among different cell types or tissues, among diseased and normal cell types or tissues, among cell types or tissues at different developmental stages, or among cell types or tissues undergoing various treatments. This type of analysis is useful, for example, to assess the relative levels of dithp expression in fully or partially differentiated cells or tissues, to determine if changes in dithp expression levels are correlated with the development or progression of specific disease states, and to assess the response of a cell or tissue to a specific therapy, for example, in pharmacological or toxicological studies. Methods for the analysis of dithp expression are based on hybridization and amplification technologies and include membrane-based procedures such as northern blot analysis, high-throughput procedures that utilize, for example, microarrays, and PCR-based procedures.
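Such comparisons are often summarized as a log2 fold change between groups of samples. The following Python sketch assumes expression levels that have already been normalized, for example array signal intensities. The sample values, the pseudocount, and the two-fold calling threshold are illustrative choices and are not values given in this document. The function names are hypothetical.

```python
import math
from statistics import mean


def log2_fold_change(test_levels, reference_levels, pseudocount=1.0):
    """log2 ratio of mean expression in test samples vs. reference samples.

    The pseudocount avoids division by zero when a transcript is undetected.
    """
    return math.log2((mean(test_levels) + pseudocount) /
                     (mean(reference_levels) + pseudocount))


def call_change(lfc, threshold=1.0):
    """Label a change as up, down, or unchanged; threshold=1.0 means two-fold."""
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical normalized expression levels of one dithp in two groups of samples.
diseased = [14, 16]
normal = [2, 4]
lfc = log2_fold_change(diseased, normal)
print(lfc, call_change(lfc))  # 2.0 up
```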
Hybridization and Genetic Analysis
The dithp, their fragments, or complementary sequences may be used to identify the presence of and/or to determine the degree of similarity between two (or more) nucleic acid sequences. The dithp may be hybridized to naturally occurring or recombinant nucleic acid sequences under appropriately selected temperatures and salt concentrations. Hybridization with a probe based on the nucleic acid sequence of at least one of the dithp allows for the detection of nucleic acid sequences, including genomic sequences, which are identical or related to the dithp of the Sequence Listing. Probes may be selected from non-conserved or unique regions of at least one of the polynucleotides of SEQ ID NO:1-188 and tested for their ability to identify or amplify the target nucleic acid sequence using standard protocols.
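One simple way to shortlist candidate probes from unique regions is to keep only those windows of a target sequence that do not occur, on either strand, in the other sequences under consideration. This Python sketch illustrates that idea. The window length and the function names are hypothetical. Exact-match uniqueness is only a rough proxy for specificity, so a practical design would also allow for mismatched binding and check properties such as GC content.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def unique_probe_windows(target: str, others: list, length: int) -> list:
    """(start, window) pairs for windows of `target` absent from every other sequence.

    Both strands of the other sequences are checked. Exact-match uniqueness is only
    a crude proxy for specificity, because probes can bind partially matched sites.
    """
    target = target.upper()
    background = [s.upper() for s in others] + [reverse_complement(s) for s in others]
    windows = []
    for start in range(len(target) - length + 1):
        window = target[start:start + length]
        if "N" in window:
            continue  # skip windows containing unidentified bases
        if not any(window in seq for seq in background):
            windows.append((start, window))
    return windows


print(unique_probe_windows("AAAACCCCGGGG", ["TTAAAACC"], 4))
```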
Polynucleotide sequences that are capable of hybridizing, in particular, to those shown in SEQ ID NO: 1-188 and fragments thereof, can be identified using various conditions of stringency. (See, e.g., Wahl, G.M. and S.L. Berger (1987) Methods Enzymol. 152:399-407; Kimmel, A.R. (1987) Methods Enzymol. 152:507-511.) Hybridization conditions are discussed in "Definitions." A probe for use in Southern or northern hybridization may be derived from a fragment of a dithp sequence, or its complement, that is up to several hundred nucleotides in length and is either single-stranded or double-stranded. Such probes may be hybridized in solution to biological materials such as plasmids, bacterial, yeast, or human artificial chromosomes, cleared or sectioned tissues, or to artificial substrates containing dithp. Microarrays are particularly suitable for identifying the presence of and detecting the level of expression for multiple genes of interest by examining gene expression correlated with, e.g., various stages of development, treatment with a drug or compound, or disease progression. An array analogous to a dot or slot blot may be used to arrange and link polynucleotides to the surface of a substrate using one or more of the following: mechanical (vacuum), chemical, thermal, or UV bonding procedures. Such an array may contain any number of dithp and may be produced by hand or by using available devices, materials, and machines.
Microarrays may be prepared, used, and analyzed using methods known in the art. (See, e.g., Brennan, T.M. et al. (1995) U.S. Patent No. 5,474,796; Schena, M. et al. (1996) Proc. Natl. Acad. Sci. USA 93:10614-10619; Baldeschweiler et al. (1995) PCT application WO95/251116; Shalon, D. et al. (1995) PCT application WO95/35505; Heller, R.A. et al. (1997) Proc. Natl. Acad. Sci. USA 94:2150-2155; and Heller, M.J. et al. (1997) U.S. Patent No. 5,605,662.)
Probes may be labeled by either PCR or enzymatic techniques using a variety of commercially available reporter molecules. For example, commercial kits are available for radioactive and chemiluminescent labeling (Amersham Pharmacia Biotech) and for alkaline phosphatase labeling (Life Technologies). Alternatively, dithp may be cloned into commercially available vectors for the production of RNA probes. Such probes may be transcribed in the presence of at least one labeled nucleotide (e.g., ³²P-ATP, Amersham Pharmacia Biotech).
Additionally, the polynucleotides of SEQ ID NO: 1-188 or suitable fragments thereof can be used to isolate full-length cDNA sequences using hybridization and/or amplification procedures well known in the art, e.g., cDNA library screening, PCR amplification, etc. The molecular cloning of such full-length cDNA sequences may employ the method of cDNA library screening with probes using the hybridization, stringency, washing, and probing strategies described above and in Ausubel, supra, Chapters 3, 5, and 6. These procedures may also be employed with genomic libraries to isolate genomic sequences of dithp in order to analyze, e.g., regulatory elements.
Genetic Mapping
Gene identification and mapping are important in the investigation and treatment of almost all conditions, diseases, and disorders. Cancer, cardiovascular disease, Alzheimer's disease, arthritis, diabetes, and mental illnesses are of particular interest. Each of these conditions is more complex than the single gene defects of sickle cell anemia or cystic fibrosis, with select groups of genes being predictive of predisposition for a particular condition, disease, or disorder. For example, cardiovascular disease may result from malfunctioning receptor molecules that fail to clear cholesterol from the bloodstream, and diabetes may result when a particular individual's immune system is activated by an infection and attacks the insulin-producing cells of the pancreas. In some studies, Alzheimer's disease has been linked to a gene on chromosome 21; other studies predict a different gene and location. Mapping of disease genes is a complex and reiterative process and generally proceeds from genetic linkage analysis to physical mapping. As a condition is noted among members of a family, a genetic linkage map traces parts of chromosomes that are inherited in the same pattern as the condition. Statistics link the inheritance of particular conditions to particular regions of chromosomes, as defined by RFLP or other markers. (See, for example, Lander, E.S. and Botstein, D. (1986) Proc. Natl. Acad. Sci. USA 83:7353-7357.) Occasionally, genetic markers and their locations are known from previous studies. More often, however, the markers are simply stretches of DNA that differ among individuals. Examples of genetic linkage maps can be found in various scientific journals or at the Online Mendelian Inheritance in Man (OMIM) World Wide Web site.
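The statistical link between a condition and a chromosomal region is conventionally expressed as a LOD score, the log10 ratio of the likelihood of the family data under linkage at recombination fraction θ to its likelihood under no linkage (θ = 0.5); a LOD of 3 or more is the customary threshold for declaring linkage. The sketch below assumes the simplest case, phase-known and fully informative meioses, which is classic two-point analysis rather than the interval mapping method of the Lander and Botstein reference; the function names and the family counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD for phase-known, fully informative meioses:
    log10[ theta^R * (1 - theta)^(N - R) / 0.5^N ]."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses")
    log_linked = (recombinants * math.log10(theta)
                  + (meioses - recombinants) * math.log10(1 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


def max_lod(recombinants: int, meioses: int, steps: int = 50) -> tuple[float, float]:
    """Grid-search theta over (0, 0.5]; return (best_theta, best_lod)."""
    grid = [i * 0.5 / steps for i in range(1, steps + 1)]
    best = max(grid, key=lambda t: lod_score(recombinants, meioses, t))
    return best, lod_score(recombinants, meioses, best)


# Hypothetical family data: 2 recombinants among 10 informative meioses.
theta, lod = max_lod(2, 10)
print(f"theta={theta:.2f}  LOD={lod:.2f}")
```

Here the maximum LOD falls at θ = 0.2 and stays well below 3, so ten meioses of this kind would at most suggest linkage; pooling additional families adds their LOD scores.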
In another embodiment of the invention, dithp sequences may be used to generate hybridization probes useful in chromosomal mapping of naturally occurring genomic sequences. Either coding or noncoding sequences of dithp may be used, and in some instances, noncoding sequences may be preferable over coding sequences. For example, conservation of a dithp coding sequence among members of a multi-gene family may potentially cause undesired cross-hybridization during chromosomal mapping. The sequences may be mapped to a particular chromosome, to a specific region of a chromosome, or to artificial chromosome constructions, e.g., human artificial chromosomes (HACs), yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs), bacterial P1 constructions, or single chromosome cDNA libraries. (See, e.g., Harrington, J.J. et al. (1997) Nat. Genet. 15:345-355; Price, C.M. (1993) Blood Rev. 7:127-134; and Trask, B.J. (1991) Trends Genet. 7:149-154.)
Fluorescent in situ hybridization (FISH) may be correlated with other physical chromosome mapping techniques and genetic map data. (See, e.g., Meyers, supra, pp. 965-968.) Correlation between the location of dithp on a physical chromosomal map and a specific disorder, or a predisposition to a specific disorder, may help define the region of DNA associated with that disorder. The dithp sequences may also be used to detect polymorphisms that are genetically linked to the inheritance of a particular condition, disease, or disorder. In situ hybridization of chromosomal preparations and genetic mapping techniques, such as linkage analysis using established chromosomal markers, may be used for extending existing genetic maps. Often the placement of a gene on the chromosome of another mammalian species, such as mouse, may reveal associated markers even if the number or arm of the corresponding human chromosome is not known. These new marker sequences can be mapped to human chromosomes and may provide valuable information to investigators searching for disease genes using positional cloning or other gene discovery techniques. Once a disease or syndrome has been crudely correlated by genetic linkage with a particular genomic region, e.g., ataxia-telangiectasia to 11q22-23, any sequences mapping to that area may represent associated or regulatory genes for further investigation. (See, e.g., Gatti, R.A. et al. (1988) Nature 336:577-580.) The nucleotide sequences of the subject invention may also be used to detect differences in chromosomal architecture due to translocation, inversion, etc., among normal, carrier, or affected individuals.
Once a disease-associated gene is mapped to a chromosomal region, the gene must be cloned in order to identify mutations or other alterations (e.g., translocations or inversions) that may be correlated with disease. This process requires a physical map of the chromosomal region containing the disease gene of interest along with associated markers. A physical map is necessary for determining the nucleotide sequence of, and the order of marker genes on, a particular chromosomal region. Physical mapping techniques are well known in the art and require the generation of overlapping sets of cloned DNA fragments from a particular organelle, chromosome, or genome. These clones are analyzed to reconstruct and catalog their order. Once the position of a marker is determined, the DNA from that region is obtained by consulting the catalog and selecting clones from that region. The gene of interest is located through positional cloning techniques using hybridization or similar methods.
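The "consult the catalog and select clones" step can be pictured as an interval lookup over an ordered set of overlapping clones. The sketch below assumes that each clone's left and right ends have already been placed on a common map coordinate; real physical maps derive that order from shared markers or restriction fingerprints, which this code does not attempt. The `Clone` type, the clone names, and the coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    """A mapped clone; start/end are hypothetical map coordinates (e.g. kb)."""
    name: str
    start: int
    end: int


def build_catalog(clones: list[Clone]) -> list[Clone]:
    """Catalog clones in map order (by left end, then right end)."""
    return sorted(clones, key=lambda c: (c.start, c.end))


def clones_covering(catalog: list[Clone], position: int) -> list[str]:
    """Names of cataloged clones that span a marker located at `position`."""
    return [c.name for c in catalog if c.start <= position <= c.end]


def coverage_gaps(catalog: list[Clone]) -> list[tuple[int, int]]:
    """Intervals between the first and last clone that no clone covers."""
    gaps: list[tuple[int, int]] = []
    if not catalog:
        return gaps
    reach = catalog[0].end
    for clone in catalog[1:]:
        if clone.start > reach:
            gaps.append((reach, clone.start))
        reach = max(reach, clone.end)
    return gaps


catalog = build_catalog([
    Clone("BAC-B", 80, 200),
    Clone("BAC-A", 0, 100),
    Clone("BAC-C", 250, 300),
])
print(clones_covering(catalog, 90))  # clones to pick for a marker at 90
print(coverage_gaps(catalog))        # regions still lacking overlapping clones
```

A gap in coverage marks where the set of overlapping clones is not yet contiguous, which is why physical maps aim for a tiling path with no gaps across the region of interest.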
Diagnostic Uses
The dithp of the present invention may be used to design probes useful in diagnostic assays.
Such assays, well known to those skilled in the art, may be used to detect or confirm conditions, disorders, or diseases associated with abnormal levels of dithp expression. Labeled probes developed from dithp sequences are added to a sample under hybridizing conditions of desired stringency. In some instances, dithp, or fragments or oligonucleotides derived from dithp, may be used as primers in amplification steps prior to hybridization. The amount of hybridization complex formed is quantified and compared with standards for that cell or tissue. If dithp expression varies significantly from the standard, the assay indicates the presence of the condition, disorder, or disease. Qualitative or quantitative diagnostic methods may include northern, dot blot, or other membrane or dip-stick based technologies or multiple-sample format technologies such as PCR, enzyme-linked immunosorbent assay (ELISA)-like, pin, or chip-based assays.
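One simple way to decide whether a sample's hybridization signal "varies significantly from the standard" is to express it as a standard score (z-score) against signals measured in reference samples of the same cell or tissue type. The sketch below assumes the signals are already normalized and that a cut-off of |z| > 3 is acceptable; both the threshold and the function names are hypothetical, and a real assay would validate its own cut-off.

```python
import statistics


def expression_z(sample_signal: float, reference_signals: list[float]) -> float:
    """Standard score of a sample's hybridization signal against signals
    measured in reference (standard) samples of the same cell or tissue type."""
    mean = statistics.mean(reference_signals)
    sd = statistics.stdev(reference_signals)  # requires at least two reference values
    if sd == 0:
        raise ValueError("reference signals show no variation")
    return (sample_signal - mean) / sd


def deviates_from_standard(sample_signal: float, reference_signals: list[float],
                           threshold: float = 3.0) -> bool:
    """Flag the sample when |z| exceeds a hypothetical cut-off."""
    return abs(expression_z(sample_signal, reference_signals)) > threshold


# Hypothetical normalized signals from normal tissue.
reference = [10.0, 12.0, 11.0, 9.0, 13.0]
print(deviates_from_standard(20.0, reference))
print(deviates_from_standard(12.0, reference))
```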
The probes described above may also be used to monitor the progress of conditions, disorders, or diseases associated with abnormal levels of dithp expression, or to evaluate the efficacy of a particular therapeutic treatment. The candidate probe may be identified from the dithp that are specific to a given human tissue and have not been observed in GenBank or other genome databases. Such a probe may be used in animal studies, preclinical tests, clinical trials, or in monitoring the treatment of an individual patient. In a typical process, standard expression is established by methods well known in the art for use as a basis of comparison; samples from patients affected by the disorder or disease are combined with the probe to evaluate any deviation from the standard profile; and a therapeutic agent is administered and its effects are monitored to generate a treatment profile. Efficacy is evaluated by determining whether the expression progresses toward or returns to the standard normal pattern. Treatment profiles may be generated over a period of several days or several months. Statistical methods well known to those skilled in the art may be used to determine the significance of such therapeutic agents.
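Whether expression "progresses toward or returns to the standard normal pattern" can be made concrete by measuring how far each time point's profile lies from the standard profile. The sketch below uses Euclidean distance and a simple non-increasing check; the gene order, values, and function names are hypothetical, and the monotonic check is a deliberately simple stand-in for the statistical methods the text refers to.

```python
import math


def distances_to_standard(profiles: list[list[float]], standard: list[float]) -> list[float]:
    """Euclidean distance of each time point's expression profile from the
    standard (normal) profile; every profile must list genes in the same order."""
    return [math.dist(p, standard) for p in profiles]


def returning_toward_standard(profiles: list[list[float]], standard: list[float]) -> bool:
    """True if each successive profile is at least as close to the standard
    as the one before it."""
    d = distances_to_standard(profiles, standard)
    return all(later <= earlier for earlier, later in zip(d, d[1:]))


standard = [1.0, 1.0, 1.0]        # hypothetical normal pattern
treatment = [[3.0, 0.2, 1.0],     # before treatment
             [2.0, 0.6, 1.0],     # day 7
             [1.2, 0.9, 1.0]]     # day 30
print(distances_to_standard(treatment, standard))
print(returning_toward_standard(treatment, standard))
```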
The polynucleotides are also useful for identifying individuals from minute biological samples, for example, by matching the RFLP pattern of a sample's DNA to that of an individual's DNA. The polynucleotides of the present invention can also be used to determine the actual base-by-base DNA sequence of selected portions of an individual's genome. These sequences can be used to prepare PCR primers for amplifying and isolating such selected DNA, which can then be sequenced. Using this technique, an individual can be identified through a unique set of DNA sequences. Once a unique ID database is established for an individual, positive identification of that individual can be made from extremely small tissue samples.
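The RFLP comparison above can be illustrated in silico: cutting two alleles with the same restriction enzyme yields different fragment sizes when a sequence variant creates or destroys a recognition site. The sketch below defaults to EcoRI, which recognizes GAATTC and cuts between G and A. Because the site is palindromic, scanning one strand is enough. The alleles and function names are hypothetical, and actual RFLP typing reads band sizes from a gel or blot rather than from sequence.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Fragment lengths (5'->3' order) after cutting `seq` at every occurrence
    of a palindromic recognition site. Defaults model EcoRI (G^AATTC)."""
    seq = seq.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def rflp_pattern(seq: str, **enzyme) -> list[int]:
    """Band sizes in the order they would be read off a gel: largest first."""
    return sorted(digest(seq, **enzyme), reverse=True)


# Hypothetical alleles: a single base change destroys the EcoRI site in allele 2.
allele_1 = "AAAGAATTCAAAA"
allele_2 = "AAAGAATACAAAA"
print(rflp_pattern(allele_1), rflp_pattern(allele_2))
```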
In a particular aspect, oligonucleotide primers derived from the dithp of the invention may be used to detect single nucleotide polymorphisms (SNPs). SNPs are substitutions, insertions and deletions that are a frequent cause of inherited or acquired genetic disease in humans. Methods of SNP detection include, but are not limited to, single-stranded conformation polymorphism (SSCP) and fluorescent SSCP (fSSCP) methods. In SSCP, oligonucleotide primers derived from dithp are used to amplify DNA using the polymerase chain reaction (PCR). The DNA may be derived, for example, from diseased or normal tissue, biopsy samples, bodily fluids, and the like. SNPs in the DNA cause differences in the secondary and tertiary structures of PCR products in single-stranded form, and these differences are detectable using gel electrophoresis in non-denaturing gels. In fSSCP, the oligonucleotide primers are fluorescently labeled, which allows detection of the amplimers in high-throughput equipment such as DNA sequencing machines. Additionally, sequence database analysis methods, termed in silico SNP (isSNP), are capable of identifying polymorphisms by comparing the sequences of individual overlapping DNA fragments which assemble into a common consensus sequence. These computer-based methods filter out sequence variations due to laboratory preparation of DNA and sequencing errors using statistical models and automated analyses of DNA sequence chromatograms. In the alternative, SNPs may be detected and characterized by mass spectrometry using, for example, the high-throughput MASSARRAY system (Sequenom, Inc., San Diego CA). DNA-based identification techniques are critical in forensic technology. DNA sequences taken from very small biological samples such as tissues, e.g., hair or skin, or body fluids, e.g., blood, saliva, semen, etc., can be amplified using, e.g., PCR, to identify individuals. (See, e.g., Erlich, H. (1992) PCR Technology, Freeman and Co., New York NY.) Similarly, polynucleotides of the present invention can be used as polymorphic markers. There is also a need for reagents capable of identifying the source of a particular tissue.
Appropriate reagents can comprise, for example, DNA probes or primers prepared from the sequences of the present invention that are specific for particular tissues. Panels of such reagents can identify tissue by species and/or by organ type. In a similar fashion, these reagents can be used to screen tissue cultures for contamination. The polynucleotides of the present invention can also be used as molecular weight markers on nucleic acid gels or Southern blots, as diagnostic probes for the presence of a specific mRNA in a particular cell type, in the creation of subtracted cDNA libraries which aid in the discovery of novel polynucleotides, in selection and synthesis of oligomers for attachment to an array or other support, and as an antigen to elicit an immune response.
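The in silico SNP approach described earlier, comparing overlapping fragments that assemble into one consensus, can be sketched as a column-by-column tally of bases. The sketch below assumes gap-free reads already placed on consensus coordinates, with '-' where a read gives no coverage. It reports a polymorphism only when the second most frequent base is seen in at least two reads. That count threshold is a hypothetical stand-in for the statistical models and chromatogram analyses the text mentions; no base-quality information is used.

```python
from collections import Counter


def call_snps(aligned_reads: list[str],
              min_minor_count: int = 2) -> list[tuple[int, str, str, int]]:
    """Report (position, major, minor, minor_count) wherever a second base is
    seen in at least `min_minor_count` reads; variants seen only once are
    treated as likely preparation or sequencing errors."""
    if not aligned_reads:
        return []
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be padded to the consensus length")
    snps = []
    for pos in range(length):
        counts = Counter(read[pos] for read in aligned_reads if read[pos] != "-")
        if len(counts) < 2:
            continue
        (major, _), (minor, minor_n) = counts.most_common(2)
        if minor_n >= min_minor_count:
            snps.append((pos, major, minor, minor_n))
    return snps


# Hypothetical overlapping reads assembled into one consensus.
reads = ["ACGTA",
         "ACGTA",
         "ACTTA",
         "ACTTA",
         "ACGTC",   # lone C at position 4: filtered as a probable error
         "--GTA"]
print(call_snps(reads))
```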
Disease Model Systems Using dithp
The dithp of the invention or their mammalian homologs may be "knocked out" in an animal model system using homologous recombination in embryonic stem (ES) cells. Such techniques are well known in the art and are useful for the generation of animal models of human disease. (See, e.g., U.S. Patent Number 5,175,383 and U.S. Patent Number 5,767,337.) For example, mouse ES cells,
such as the mouse 129/SvJ cell line, are derived from the early mouse embryo and grown in culture. The ES cells are transformed with a vector containing the gene of interest disrupted by a marker gene, e.g., the neomycin phosphotransferase gene (neo; Capecchi, M.R. (1989) Science 244:1288-1292). The vector integrates into the corresponding region of the host genome by homologous recombination. Alternatively, homologous recombination takes place using the Cre-loxP system to knock out a gene of interest in a tissue- or developmental stage-specific manner (Marth, J.D. (1996) J. Clin. Invest. 97:1999-2002; Wagner, K.U. et al. (1997) Nucleic Acids Res. 25:4323-4330). Transformed ES cells are identified and microinjected into mouse cell blastocysts such as those from the C57BL/6 mouse strain. The blastocysts are surgically transferred to pseudopregnant dams, and the resulting chimeric progeny are genotyped and bred to produce heterozygous or homozygous strains. Transgenic animals thus generated may be tested with potential therapeutic or toxic agents. The dithp of the invention may also be manipulated in vitro in ES cells derived from human blastocysts. Human ES cells have the potential to differentiate into at least eight separate cell lineages including endoderm, mesoderm, and ectodermal cell types. These cell lineages differentiate into, for example, neural cells, hematopoietic lineages, and cardiomyocytes (Thomson, J.A. et al. (1998) Science 282:1145-1147).
The dithp of the invention can also be used to create "knockin" humanized animals (pigs) or transgenic animals (mice or rats) to model human disease. With knockin technology, a region of dithp is injected into animal ES cells, and the injected sequence integrates into the animal cell genome. Transformed cells are injected into blastulae, and the blastulae are implanted as described above. Transgenic progeny or inbred lines are studied and treated with potential pharmaceutical agents to obtain information on treatment of a human disease. Alternatively, a mammal inbred to overexpress dithp, resulting, e.g., in the secretion of DITHP in its milk, may also serve as a convenient source of that protein (Janne, J. et al. (1998) Biotechnol. Annu. Rev. 4:55-74).
Screening Assays
DITHP encoded by polynucleotides of the present invention may be used to screen for molecules that bind to or are bound by the encoded polypeptides. The binding of the polypeptide and the molecule may activate (agonist), increase, inhibit (antagonist), or decrease activity of the polypeptide or the bound molecule. Examples of such molecules include antibodies, oligonucleotides, proteins (e.g., receptors), or small molecules.
Preferably, the molecule is closely related to the natural ligand of the polypeptide, e.g., a ligand or fragment thereof, a natural substrate, or a structural or functional mimetic. (See, Coligan et al., (1991) Current Protocols in Immunology 1(2): Chapter 5.) Similarly, the molecule can be closely related to the natural receptor to which the polypeptide binds, or to at least a fragment of the receptor,
e.g., the active site. In either case, the molecule can be rationally designed using known techniques.
Preferably, the screening for these molecules involves producing appropriate cells which express the polypeptide, either as a secreted protein or on the cell membrane. Preferred cells include cells from mammals, yeast, Drosophila, or E. coli. Cells expressing the polypeptide or cell membrane fractions which contain the expressed polypeptide are then contacted with a test compound and binding, stimulation, or inhibition of activity of either the polypeptide or the molecule is analyzed.
An assay may simply test binding of a candidate compound to the polypeptide, wherein binding is detected by a fluorophore, radioisotope, enzyme conjugate, or other detectable label.
Alternatively, the assay may assess binding in the presence of a labeled competitor. Additionally, the assay can be carried out using cell-free preparations, polypeptide/molecule affixed to a solid support, chemical libraries, or natural product mixtures. The assay may also simply comprise the steps of mixing a candidate compound with a solution containing a polypeptide, measuring polypeptide/molecule activity or binding, and comparing the polypeptide/molecule activity or binding to a standard. Preferably, an ELISA assay using, e.g., a monoclonal or polyclonal antibody, can measure polypeptide level in a sample. The antibody can measure polypeptide level by either binding, directly or indirectly, to the polypeptide or by competing with the polypeptide for a substrate.
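When binding is measured in the presence of a labeled competitor, a candidate compound's potency is commonly summarized as the concentration that reduces specific binding of the labeled molecule to 50% (IC50). The sketch below first converts raw signals to percent of control binding and then interpolates the 50% point linearly in log10(concentration). Fitting a four-parameter logistic curve is the more usual practice; this interpolation is a simplification. The function names and the dose-response values are hypothetical.

```python
import math
from typing import Optional


def percent_bound(signal: float, total: float, background: float) -> float:
    """Labeled-competitor binding as a percentage of the no-compound control,
    after subtracting nonspecific background."""
    return 100.0 * (signal - background) / (total - background)


def ic50(concentrations: list[float], bound: list[float]) -> Optional[float]:
    """Concentration giving 50% binding, by linear interpolation in
    log10(concentration) between the two bracketing points. Assumes ascending,
    positive concentrations; returns None if 50% is not bracketed."""
    points = list(zip(concentrations, bound))
    for (c1, y1), (c2, y2) in zip(points, points[1:]):
        if y1 >= 50.0 >= y2 and y1 != y2:
            fraction = (y1 - 50.0) / (y1 - y2)
            log_c = math.log10(c1) + fraction * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    return None


# Hypothetical dose-response for one candidate compound (concentrations in nM).
conc = [1.0, 10.0, 100.0, 1000.0]
bound = [95.0, 70.0, 30.0, 5.0]
print(ic50(conc, bound))
```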
All of the above assays can be used in a diagnostic or prognostic context. The molecules discovered using these assays can be used to treat disease or to bring about a particular result in a patient (e.g., blood vessel growth) by activating or inhibiting the polypeptide/molecule. Moreover, the assays can discover agents which may inhibit or enhance the production of the polypeptide from suitably manipulated cells or tissues.
Transcript Imaging and Toxicological Testing
Another embodiment relates to the use of dithp to develop a transcript image of a tissue or cell type. A transcript image represents the global pattern of gene expression by a particular tissue or cell type. Global gene expression patterns are analyzed by quantifying the number of expressed genes and their relative abundance under given conditions and at a given time. (See Seilhamer et al., "Comparative Gene Transcript Analysis," U.S. Patent Number 5,840,484, expressly incorporated by reference herein.) Thus a transcript image may be generated by hybridizing the polynucleotides of the present invention or their complements to the totality of transcripts or reverse transcripts of a particular tissue or cell type. In one embodiment, the hybridization takes place in high-throughput format, wherein the polynucleotides of the present invention or their complements comprise a subset of a plurality of elements on a microarray. The resultant transcript image would provide a profile of gene activity pertaining to human molecules for diagnostics and therapeutics.
Transcript images which profile dithp expression may be generated using transcripts isolated from tissues, cell lines, biopsies, or other biological samples. The transcript image may thus reflect dithp expression in vivo, as in the case of a tissue or biopsy sample, or in vitro, as in the case of a cell line. Transcript images which profile dithp expression may also be used in conjunction with in vitro model systems and preclinical evaluation of pharmaceuticals, as well as toxicological testing of industrial and naturally occurring environmental compounds. All compounds induce characteristic gene expression patterns, frequently termed molecular fingerprints or toxicant signatures, which are indicative of mechanisms of action and toxicity (Nuwaysir, E.F. et al. (1999) Mol. Carcinog. 24:153-159; Steiner, S. and Anderson, N.L. (2000) Toxicol. Lett. 112-113:467-471, expressly incorporated by reference herein). If a test compound has a signature similar to that of a compound with known toxicity, it is likely to share those toxic properties. These fingerprints or signatures are most useful and refined when they contain expression information from a large number of genes and gene families. Ideally, a genome-wide measurement of expression provides the highest quality signature. Even genes whose expression is not altered by any tested compound are important, as the levels of expression of these genes are used to normalize the rest of the expression data. The normalization procedure is useful for comparison of expression data after treatment with different compounds. While the assignment of gene function to elements of a toxicant signature aids in interpretation of toxicity mechanisms, knowledge of gene function is not necessary for the statistical matching of signatures which leads to prediction of toxicity.
(See, for example, Press Release 00-02 from the National Institute of Environmental Health Sciences, released February 29, 2000, available at http://www.niehs.nih.gov/oc/news/toxchip.htm.) Therefore, it is important and desirable in toxicological screening using toxicant signatures to include all expressed gene sequences.
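The two ideas above, normalizing with genes whose expression is unaltered and statistically matching a test compound's signature against signatures of compounds with known toxicity, can be sketched as follows. The code scales each profile so the invariant genes average to 1. It then takes log2(treated/control) ratios for the remaining genes and ranks known signatures by Pearson correlation. Pearson correlation is one reasonable matching statistic, not necessarily the one used in any particular toxicogenomics study. Gene names, values, and compound names are hypothetical, and known signatures are assumed to cover every gene in the test signature.

```python
import math


def normalize(profile: dict[str, float], invariant_genes: list[str]) -> dict[str, float]:
    """Scale a {gene: level} profile so the invariant genes average to 1.0."""
    scale = sum(profile[g] for g in invariant_genes) / len(invariant_genes)
    return {gene: level / scale for gene, level in profile.items()}


def signature(treated: dict[str, float], control: dict[str, float],
              invariant_genes: list[str]) -> dict[str, float]:
    """log2(treated/control) for every non-invariant gene, after normalization."""
    t = normalize(treated, invariant_genes)
    c = normalize(control, invariant_genes)
    return {g: math.log2(t[g] / c[g]) for g in t if g not in invariant_genes}


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_match(test_sig: dict[str, float], known_sigs: dict[str, dict[str, float]]):
    """Return (name of best-correlated known signature, all scores)."""
    genes = sorted(test_sig)
    scores = {name: pearson([test_sig[g] for g in genes], [sig[g] for g in genes])
              for name, sig in known_sigs.items()}
    return max(scores, key=scores.get), scores


# Hypothetical data: the treated sample was loaded at twice the amount, which
# normalization on the invariant genes h1/h2 removes.
control = {"h1": 100.0, "h2": 200.0, "g1": 50.0, "g2": 50.0, "g3": 50.0}
treated = {"h1": 200.0, "h2": 400.0, "g1": 200.0, "g2": 25.0, "g3": 100.0}
test_sig = signature(treated, control, ["h1", "h2"])
known = {"toxicant_A": {"g1": 0.9, "g2": -1.8, "g3": 0.1},
         "toxicant_B": {"g1": -1.0, "g2": 2.0, "g3": 0.0}}
name, scores = best_match(test_sig, known)
print(test_sig, name, scores)
```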
In one embodiment, the toxicity of a test compound is assessed by treating a biological sample containing nucleic acids with the test compound. Nucleic acids that are expressed in the treated biological sample are hybridized with one or more probes specific to the polynucleotides of the present invention, so that transcript levels corresponding to the polynucleotides of the present invention may be quantified. The transcript levels in the treated biological sample are compared with levels in an untreated biological sample. Differences in the transcript levels between the two samples are indicative of a toxic response caused by the test compound in the treated sample.
Another particular embodiment relates to the use of DITHP encoded by polynucleotides of the present invention to analyze the proteome of a tissue or cell type. The term proteome refers to the global pattern of protein expression in a particular tissue or cell type. Each protein component of a proteome can be subjected individually to further analysis. Proteome expression patterns, or profiles, are analyzed by quantifying the number of expressed proteins and their relative abundance under
given conditions and at a given time. A profile of a cell's proteome may thus be generated by separating and analyzing the polypeptides of a particular tissue or cell type. In one embodiment, the separation is achieved using two-dimensional gel electrophoresis, in which proteins from a sample are separated by isoelectric focusing in the first dimension, and then according to molecular weight by sodium dodecyl sulfate slab gel electrophoresis in the second dimension (Steiner and Anderson, supra). The proteins are visualized in the gel as discrete and uniquely positioned spots, typically by staining the gel with an agent such as Coomassie Blue or silver or fluorescent stains. The optical density of each protein spot is generally proportional to the level of the protein in the sample. The optical densities of equivalently positioned protein spots from different samples, for example, from biological samples either treated or untreated with a test compound or therapeutic agent, are compared to identify any changes in protein spot density related to the treatment. The proteins in the spots are partially sequenced using, for example, standard methods employing chemical or enzymatic cleavage followed by mass spectrometry. The identity of the protein in a spot may be determined by comparing its partial sequence, preferably of at least 5 contiguous amino acid residues, to the polypeptide sequences of the present invention. In some cases, further sequence data may be obtained for definitive protein identification.
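The final identification step, comparing partial sequences of at least 5 contiguous residues with the polypeptide sequences, can be sketched as an exact substring search that requires every peptide from a spot to match the same polypeptide. Because leucine and isoleucine have identical masses and are not distinguished by mass spectrometry alone, the sketch treats I and L as equivalent. The database entries, identifiers, and function names are hypothetical placeholders for the Sequence Listing.

```python
def _mass_equivalent(seq: str) -> str:
    """Map isoleucine to leucine, since the two are isobaric."""
    return seq.upper().replace("I", "L")


def identify_protein(peptides: list[str], database: dict[str, str],
                     min_length: int = 5) -> list[str]:
    """Return IDs of database polypeptides that contain every partial sequence
    (each at least `min_length` contiguous residues)."""
    if not peptides:
        raise ValueError("no peptides supplied")
    for p in peptides:
        if len(p) < min_length:
            raise ValueError(f"peptide {p!r} is shorter than {min_length} residues")
    wanted = [_mass_equivalent(p) for p in peptides]
    return [pid for pid, seq in database.items()
            if all(w in _mass_equivalent(seq) for w in wanted)]


# Hypothetical polypeptide sequences standing in for the Sequence Listing.
database = {"SEQ-A": "MKTAYIAKQRQISFVKSHFSRQ",
            "SEQ-B": "MSTNPKPQRKTKRNTNRRPQDVKFPG"}
print(identify_protein(["TAYLA"], database))  # I/L ambiguity tolerated
```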
A proteomic profile may also be generated using antibodies specific for DITHP to quantify the levels of DITHP expression. In one embodiment, the antibodies are used as elements on a microarray, and protein expression levels are quantified by exposing the microarray to the sample and detecting the levels of protein bound to each array element (Lueking, A. et al. (1999) Anal. Biochem. 270:103-111; Mendoza, L.G. et al. (1999) Biotechniques 27:778-788). Detection may be performed by a variety of methods known in the art, for example, by reacting the proteins in the sample with a thiol- or amino-reactive fluorescent compound and detecting the amount of fluorescence bound at each array element.
Toxicant signatures at the proteome level are also useful for toxicological screening, and should be analyzed in parallel with toxicant signatures at the transcript level. There is a poor correlation between transcript and protein abundances for some proteins in some tissues (Anderson, N.L. and Seilhamer, J. (1997) Electrophoresis 18:533-537), so proteome toxicant signatures may be useful in the analysis of compounds which do not significantly affect the transcript image, but which alter the proteomic profile. In addition, the analysis of transcripts in body fluids is difficult, due to rapid degradation of mRNA, so proteomic profiling may be more reliable and informative in such cases.
In another embodiment, the toxicity of a test compound is assessed by treating a biological sample containing proteins with the test compound. Proteins that are expressed in the treated biological sample are separated so that the amount of each protein can be quantified. The amount of each protein is compared to the amount of the corresponding protein in an untreated biological sample. A difference in the amount of protein between the two samples is indicative of a toxic response to the test compound in the treated sample. Individual proteins are identified by sequencing the amino acid residues of the individual proteins and comparing these partial sequences to the DITHP encoded by polynucleotides of the present invention.
In another embodiment, the toxicity of a test compound is assessed by treating a biological sample containing proteins with the test compound. Proteins from the biological sample are incubated with antibodies specific to the DITHP encoded by polynucleotides of the present invention. The amount of protein recognized by the antibodies is quantified. The amount of protein in the treated biological sample is compared with the amount in an untreated biological sample. A difference in the amount of protein between the two samples is indicative of a toxic response to the test compound in the treated sample.
Transcript images may be used to profile dithp expression in distinct tissue types. This process can be used to determine human molecule activity in a particular tissue type relative to this activity in a different tissue type. Transcript images may be used to generate a profile of dithp expression characteristic of diseased tissue. Transcript images of tissues before and after treatment may be used for diagnostic purposes, to monitor the progression of disease, and to monitor the efficacy of drug treatments for diseases which affect the activity of human molecules.
Transcript images of cell lines can be used to assess human molecule activity and/or to identify cell lines that lack or misregulate this activity. Such cell lines may then be treated with pharmaceutical agents, and a transcript image following treatment may indicate the efficacy of these agents in restoring desired levels of this activity. A similar approach may be used to assess the toxicity of pharmaceutical agents as reflected by undesirable changes in human molecule activity. Candidate pharmaceutical agents may be evaluated by comparing their associated transcript images with those of pharmaceutical agents of known effectiveness.
Antisense Molecules
The polynucleotides of the present invention are useful in antisense technology. Antisense technology or therapy relies on the modulation of expression of a target protein through the specific binding of an antisense sequence to a target sequence encoding the target protein or directing its expression. (See, e.g., Agrawal, S., ed. (1996) Antisense Therapeutics, Humana Press Inc., Totowa NJ; Alama, A. et al. (1997) Pharmacol. Res. 36(3):171-178; Crooke, S.T. (1997) Adv. Pharmacol. 40:1-49; Sharma, H.W. and R. Narayanan (1995) Bioessays 17(12):1055-1063; and Lavrovsky, Y. et al. (1997) Biochem. Mol. Med. 62(1):11-22.) An antisense sequence is a polynucleotide sequence capable of specifically hybridizing to at least a portion of the target sequence. Antisense sequences bind to cellular mRNA and/or genomic DNA, affecting translation and/or transcription. Antisense sequences can be DNA, RNA, or nucleic acid mimics and analogs. (See, e.g., Rossi, J.J. et al. (1991) Antisense Res. Dev. 1(3):285-288; Lee, R. et al. (1998) Biochemistry 37(3):900-1010; Pardridge, W.M. et al. (1995) Proc. Natl. Acad. Sci. USA 92(12):5592-5596; and Nielsen, P.E. and Haaima, G. (1997) Chem. Soc. Rev. 96:73-78.) Typically, the binding which results in modulation of expression occurs through hybridization or binding of complementary base pairs. Antisense sequences can also bind to DNA duplexes through specific interactions in the major groove of the double helix.
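Because hybridization is antiparallel, an antisense sequence written 5'->3' is the reverse complement of the target segment written 5'->3'. The sketch below derives a DNA or RNA antisense sequence from a target mRNA segment and checks Watson-Crick complementarity. The target segment and function names are hypothetical, and the code deliberately ignores target-site accessibility, off-target matches, and backbone chemistry (e.g., modified or mimic backbones), all of which matter for real antisense design.

```python
DNA_PAIR = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}
RNA_PAIR = {"A": "U", "U": "A", "T": "A", "G": "C", "C": "G"}


def antisense(target: str, rna: bool = False) -> str:
    """Antisense sequence (written 5'->3') that pairs antiparallel with
    `target` (also written 5'->3'): complement each base, then reverse."""
    pairs = RNA_PAIR if rna else DNA_PAIR
    return "".join(pairs[base] for base in reversed(target.upper()))


def pairs_with(antisense_seq: str, target: str) -> bool:
    """True if `antisense_seq` (DNA or RNA) is fully complementary to `target`."""
    return antisense(target) == antisense_seq.upper().replace("U", "T")


# Hypothetical mRNA segment around a start codon, written 5'->3'.
target = "AUGGCU"
print(antisense(target), antisense(target, rna=True))
```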
The polynucleotides of the present invention and fragments thereof can be used as antisense sequences to modify the expression of the polypeptide encoded by dithp. The antisense sequences can be produced ex vivo, such as by using any of the ABI nucleic acid synthesizer series (Applied Biosystems) or other automated systems known in the art. Antisense sequences can also be produced biologically, such as by transforming an appropriate host cell with an expression vector containing the sequence of interest. (See, e.g., Agrawal, supra.)
In therapeutic use, any gene delivery system suitable for introduction of the antisense sequences into appropriate target cells can be used. Antisense sequences can be delivered intracellularly in the form of an expression plasmid which, upon transcription, produces a sequence complementary to at least a portion of the cellular sequence encoding the target protein. (See, e.g., Slater, J.E. et al. (1998) J. Allergy Clin. Immunol. 102(3):469-475; and Scanlon, K.J. et al. (1995) 9(13):1288-1296.) Antisense sequences can also be introduced intracellularly through the use of viral vectors, such as retrovirus and adeno-associated virus vectors. (See, e.g., Miller, A.D. (1990) Blood 76:271; Ausubel, F.M. et al. (1995) Current Protocols in Molecular Biology, John Wiley & Sons, New York NY; Uckert, W. and W. Walther (1994) Pharmacol. Ther. 63(3):323-347.) Other gene delivery mechanisms include liposome-derived systems, artificial viral envelopes, and other systems known in the art. (See, e.g., Rossi, J.J. (1995) Br. Med. Bull. 51(1):217-225; Boado, R.J. et al. (1998) J. Pharm. Sci. 87(11):1308-1315; and Morris, M.C. et al. (1997) Nucleic Acids Res. 25(14):2730-2736.)
Expression
In order to express a biologically active DITHP, the nucleotide sequences encoding DITHP or fragments thereof may be inserted into an appropriate expression vector, i.e., a vector which contains the necessary elements for transcriptional and translational control of the inserted coding sequence in a suitable host. Methods which are well known to those skilled in the art may be used to construct expression vectors containing sequences encoding DITHP and appropriate transcriptional and translational control elements. These methods include in vitro recombinant DNA techniques, synthetic techniques, and in vivo genetic recombination. (See, e.g., Sambrook, supra, Chapters 4, 8, 16, and 17; and Ausubel, supra, Chapters 9, 10, 13, and 16.)
A variety of expression vector/host systems may be utilized to contain and express sequences encoding DITHP. These include, but are not limited to, microorganisms such as bacteria transformed with recombinant bacteriophage, plasmid, or cosmid DNA expression vectors; yeast transformed with yeast expression vectors; insect cell systems infected with viral expression vectors (e.g., baculovirus); plant cell systems transformed with viral expression vectors (e.g., cauliflower mosaic virus, CaMV, or tobacco mosaic virus, TMV) or with bacterial expression vectors (e.g., Ti or pBR322 plasmids); or animal (mammalian) cell systems. (See, e.g., Sambrook, supra; Ausubel, 1995, supra; Van Heeke, G. and S.M. Schuster (1989) J. Biol. Chem. 264:5503-5509; Bitter, G.A. et al. (1987) Methods Enzymol. 153:516-544; Scorer, C.A. et al. (1994) Bio/Technology 12:181-184; Engelhard, E.K. et al. (1994) Proc. Natl. Acad. Sci. USA 91:3224-3227; Sandig, V. et al. (1996) Hum. Gene Ther. 7:1937-1945; Takamatsu, N. (1987) EMBO J. 6:307-311; Coruzzi, G. et al. (1984) EMBO J. 3:1671-1680; Broglie, R. et al. (1984) Science 224:838-843; Winter, J. et al. (1991) Results Probl. Cell Differ. 17:85-105; The McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill, New York NY, pp. 191-196; Logan, J. and T. Shenk (1984) Proc. Natl. Acad. Sci. USA 81:3655-3659; and Harrington, J.J. et al. (1997) Nat. Genet. 15:345-355.) Expression vectors derived from retroviruses, adenoviruses, or herpes or vaccinia viruses, or from various bacterial plasmids, may be used for delivery of nucleotide sequences to the targeted organ, tissue, or cell population. (See, e.g., Di Nicola, M. et al. (1998) Cancer Gene Ther. 5(6):350-356; Yu, M. et al. (1993) Proc. Natl. Acad. Sci. USA 90(13):6340-6344; Buller, R.M. et al. (1985) Nature 317(6040):813-815; McGregor, D.P. et al. (1994) Mol. Immunol. 31(3):219-226; and Verma, I.M. and N. Somia (1997) Nature 389:239-242.) The invention is not limited by the host cell employed.
For long-term production of recombinant proteins in mammalian systems, stable expression of DITHP in cell lines is preferred. For example, sequences encoding DITHP can be transformed into cell lines using expression vectors which may contain viral origins of replication and/or endogenous expression elements and a selectable marker gene on the same or on a separate vector. Any number of selection systems may be used to recover transformed cell lines. (See, e.g., Wigler, M. et al. (1977) Cell 11:223-232; Lowy, I. et al. (1980) Cell 22:817-823; Wigler, M. et al. (1980) Proc. Natl. Acad. Sci. USA 77:3567-3570; Colbere-Garapin, F. et al. (1981) J. Mol. Biol. 150:1-14; Hartman, S.C. and R.C. Mulligan (1988) Proc. Natl. Acad. Sci. USA 85:8047-8051; Rhodes, C.A. (1995) Methods Mol. Biol. 55:121-131.)
Therapeutic Uses of dithp
The dithp of the invention may be used for somatic or germline gene therapy. Gene therapy may be performed to (i) correct a genetic deficiency (e.g., in the cases of severe combined immunodeficiency (SCID)-X1 disease characterized by X-linked inheritance (Cavazzana-Calvo, M. et al. (2000) Science 288:669-672), severe combined immunodeficiency syndrome associated with an inherited adenosine deaminase (ADA) deficiency (Blaese, R.M. et al. (1995) Science 270:475-480; Bordignon, C. et al. (1995) Science 270:470-475), cystic fibrosis (Zabner, J. et al. (1993) Cell 75:207-216; Crystal, R.G. et al. (1995) Hum. Gene Therapy 6:643-666; Crystal, R.G. et al. (1995) Hum. Gene Therapy 6:667-703), thalassemias, familial hypercholesterolemia, and hemophilia resulting from Factor VIII or Factor IX deficiencies (Crystal, R.G. (1995) Science 270:404-410; Verma, I.M. and Somia, N. (1997) Nature 389:239-242)), (ii) express a conditionally lethal gene product (e.g., in the case of cancers which result from unregulated cell proliferation), or (iii) express a protein which affords protection against intracellular parasites (e.g., against human retroviruses, such as human immunodeficiency virus (HIV) (Baltimore, D. (1988) Nature 335:395-396; Poeschla, E. et al. (1996) Proc. Natl. Acad. Sci. USA 93:11395-11399), hepatitis B or C virus (HBV, HCV); fungal parasites, such as Candida albicans and Paracoccidioides brasiliensis; and protozoan parasites such as Plasmodium falciparum and Trypanosoma cruzi). In the case where a genetic deficiency in dithp expression or regulation causes disease, the expression of dithp from an appropriate population of transduced cells may alleviate the clinical manifestations caused by the genetic deficiency.
In a further embodiment of the invention, diseases or disorders caused by deficiencies in dithp are treated by constructing mammalian expression vectors comprising dithp and introducing these vectors by mechanical means into dithp-deficient cells. Mechanical transfer technologies for use with cells in vivo or ex vivo include (i) direct DNA microinjection into individual cells, (ii) ballistic gold particle delivery, (iii) liposome-mediated transfection, (iv) receptor-mediated gene transfer, and (v) the use of DNA transposons (Morgan, R.A. and Anderson, W.F. (1993) Annu. Rev. Biochem. 62:191-217; Ivics, Z. (1997) Cell 91:501-510; Boulay, J-L. and Recipon, H. (1998) Curr. Opin. Biotechnol. 9:445-450). Expression vectors that may be effective for the expression of dithp include, but are not limited to, the PCDNA 3.1, EPITAG, PRCCMV2, PREP, and PVAX vectors (Invitrogen, Carlsbad CA); PCMV-SCRIPT, PCMV-TAG, and PEGSH/PERV (Stratagene, La Jolla CA); and PTET-OFF, PTET-ON, PTRE2, PTRE2-LUC, and PTK-HYG (Clontech, Palo Alto CA). The dithp of the invention may be expressed using (i) a constitutively active promoter (e.g., from cytomegalovirus (CMV), Rous sarcoma virus (RSV), SV40 virus, thymidine kinase (TK), or β-actin genes), (ii) an inducible promoter (e.g., the tetracycline-regulated promoter (Gossen, M. and Bujard, H. (1992) Proc. Natl. Acad. Sci. U.S.A. 89:5547-5551; Gossen, M. et al. (1995) Science 268:1766-1769; Rossi, F.M.V. and Blau, H.M. (1998) Curr. Opin. Biotechnol. 9:451-456), commercially available in the T-REX plasmid (Invitrogen); the ecdysone-inducible promoter (available in the plasmids PVGRXR and PIND; Invitrogen); the FK506/rapamycin inducible promoter; or the RU486/mifepristone inducible promoter (Rossi, F.M.V. and Blau, H.M. supra)), or (iii) a tissue-specific promoter or the native promoter of the endogenous gene encoding DITHP from a normal individual.
Commercially available liposome transformation kits (e.g., the PERFECT LIPID TRANSFECTION KIT, available from Invitrogen) allow one with ordinary skill in the art to deliver polynucleotides to target cells in culture and require minimal effort to optimize experimental parameters. In the alternative, transformation is performed using the calcium phosphate method (Graham, F.L. and Eb, A.J. (1973) Virology 52:456-467), or by electroporation (Neumann, E. et al. (1982) EMBO J. 1:841-845). The introduction of DNA to primary cells requires modification of these standardized mammalian transfection protocols.
In another embodiment of the invention, diseases or disorders caused by genetic defects with respect to dithp expression are treated by constructing a retrovirus vector consisting of (i) dithp under the control of an independent promoter or the retrovirus long terminal repeat (LTR) promoter, (ii) appropriate RNA packaging signals, and (iii) a Rev-responsive element (RRE) along with additional retrovirus cis-acting RNA sequences and coding sequences required for efficient vector propagation. Retrovirus vectors (e.g., PFB and PFBNEO) are commercially available (Stratagene) and are based on published data (Riviere, I. et al. (1995) Proc. Natl. Acad. Sci. U.S.A. 92:6733-6737), incorporated by reference herein. The vector is propagated in an appropriate vector producing cell line (VPCL) that expresses an envelope gene with a tropism for receptors on the target cells or a promiscuous envelope protein such as VSVg (Armentano, D. et al. (1987) J. Virol. 61:1647-1650; Bender, M.A. et al. (1987) J. Virol. 61:1639-1646; Adam, M.A. and Miller, A.D. (1988) J. Virol. 62:3802-3806; Dull, T. et al. (1998) J. Virol. 72:8463-8471; Zufferey, R. et al. (1998) J. Virol. 72:9873-9880). U.S. Patent Number 5,910,434 to Rigg ("Method for obtaining retrovirus packaging cell lines producing high transducing efficiency retroviral supernatant") discloses a method for obtaining retrovirus packaging cell lines and is hereby incorporated by reference. Propagation of retrovirus vectors, transduction of a population of cells (e.g., CD4+ T-cells), and the return of transduced cells to a patient are procedures well known to persons skilled in the art of gene therapy and have been well documented (Ranga, U. et al. (1997) J. Virol. 71:7020-7029; Bauer, G. et al. (1997) Blood 89:2259-2267; Bonyhadi, M.L. (1997) J. Virol. 71:4707-4716; Ranga, U. et al. (1998) Proc. Natl. Acad. Sci. U.S.A. 95:1201-1206; Su, L. (1997) Blood 89:2283-2290).
In the alternative, an adenovirus-based gene therapy delivery system is used to deliver dithp to cells which have one or more genetic abnormalities with respect to the expression of dithp. The construction and packaging of adenovirus-based vectors are well known to those with ordinary skill in the art. Replication-defective adenovirus vectors have proven to be versatile for importing genes encoding immunoregulatory proteins into intact islets in the pancreas (Csete, M.E. et al. (1995) Transplantation 27:263-268). Potentially useful adenoviral vectors are described in U.S. Patent Number 5,707,618 to Armentano ("Adenovirus vectors for gene therapy"), hereby incorporated by reference. For adenoviral vectors, see also Antinozzi, P.A. et al. (1999) Annu. Rev. Nutr. 19:511-544 and Verma, I.M. and Somia, N. (1997) Nature 389:239-242, both incorporated by reference herein.
In another alternative, a herpes-based gene therapy delivery system is used to deliver dithp to target cells which have one or more genetic abnormalities with respect to the expression of dithp. The use of herpes simplex virus (HSV)-based vectors may be especially valuable for introducing dithp to cells of the central nervous system, for which HSV has a tropism. The construction and packaging of herpes-based vectors are well known to those with ordinary skill in the art. A replication-competent herpes simplex virus (HSV) type 1-based vector has been used to deliver a reporter gene to the eyes of primates (Liu, X. et al. (1999) Exp. Eye Res. 169:385-395). The construction of an HSV-1 virus vector has also been disclosed in detail in U.S. Patent Number 5,804,413 to DeLuca ("Herpes simplex virus strains for gene transfer"), which is hereby incorporated by reference. U.S. Patent Number 5,804,413 teaches the use of recombinant HSV d92, which consists of a genome containing at least one exogenous gene to be transferred to a cell under the control of the appropriate promoter for purposes including human gene therapy. Also taught by this patent are the construction and use of recombinant HSV strains deleted for ICP4, ICP27, and ICP22. For HSV vectors, see also Goins, W.F. et al. (1999) J. Virol. 73:519-532 and Xu, H. et al. (1994) Dev. Biol. 163:152-161, hereby incorporated by reference.
The manipulation of cloned herpesvirus sequences, the generation of recombinant virus following the transfection of multiple plasmids containing different segments of the large herpesvirus genomes, the growth and propagation of herpesvirus, and the infection of cells with herpesvirus are techniques well known to those of ordinary skill in the art.
In another alternative, an alphavirus (positive, single-stranded RNA virus) vector is used to deliver dithp to target cells. The biology of the prototypic alphavirus, Semliki Forest Virus (SFV), has been studied extensively and gene transfer vectors have been based on the SFV genome (Garoff, H. and Li, K-J. (1998) Curr. Opin. Biotech. 9:464-469). During alphavirus RNA replication, a subgenomic RNA is generated that normally encodes the viral capsid proteins. This subgenomic RNA replicates to higher levels than the full-length genomic RNA, resulting in the overproduction of capsid proteins relative to the viral proteins with enzymatic activity (e.g., protease and polymerase). Similarly, inserting dithp into the alphavirus genome in place of the capsid-coding region results in the production of a large number of dithp RNAs and the synthesis of high levels of DITHP in vector-transduced cells. While alphavirus infection is typically associated with cell lysis within a few days, the ability to establish a persistent infection in baby hamster kidney cells (BHK-21) with a variant of Sindbis virus (SIN) indicates that the lytic replication of alphaviruses can be altered to suit the needs of the gene therapy application (Dryga, S.A. et al. (1997) Virology 228:74-83). The wide host range of alphaviruses will allow the introduction of dithp into a variety of cell types. The specific transduction of a subset of cells in a population may require the sorting of cells prior to transduction. The methods of manipulating infectious cDNA clones of alphaviruses, performing alphavirus cDNA and RNA transfections, and performing alphavirus infections are well known to those with ordinary skill in the art.
Antibodies
Anti-DITHP antibodies may be used to analyze protein expression levels. Such antibodies include, but are not limited to, polyclonal, monoclonal, chimeric, and single-chain antibodies, and Fab fragments. For descriptions of, and protocols for, antibody technologies, see, e.g., Pound, J.D. (1998) Immunochemical Protocols, Humana Press, Totowa, NJ.
The amino acid sequence encoded by the dithp of the Sequence Listing may be analyzed by appropriate software (e.g., LASERGENE NAVIGATOR software, DNASTAR) to determine regions of high immunogenicity. The optimal sequences for immunization are selected from the C-terminus, the N-terminus, and those intervening hydrophilic regions of the polypeptide which are likely to be exposed to the external environment when the polypeptide is in its natural conformation. Analysis used to select appropriate epitopes is also described by Ausubel (1997, supra, Chapter 11.7). Peptides used for antibody induction do not need to have biological activity; however, they must be antigenic. Peptides used to induce specific antibodies may have an amino acid sequence consisting of at least five amino acids, preferably at least 10 amino acids, and most preferably at least 15 amino acids. A peptide which mimics an antigenic fragment of the natural polypeptide may be fused with another protein such as keyhole limpet hemocyanin (KLH; Sigma, St. Louis MO) for antibody production. A peptide encompassing an antigenic region may be expressed from a dithp, synthesized as described above, or purified from human cells.
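The search for intervening hydrophilic regions can be illustrated with a sliding-window hydropathy scan. The text names LASERGENE NAVIGATOR for this analysis; the sketch below instead substitutes the published Kyte-Doolittle hydropathy scale as a stand-in and is not the method of that software. The 15-residue window mirrors the "most preferably at least 15 amino acids" peptide length above; the function names and the demo polypeptide are hypothetical, and the sketch does not weigh terminal location or surface exposure.

```python
from __future__ import annotations

# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(protein: str, window: int = 15) -> list[tuple[int, float]]:
    """Mean hydropathy of every window of `window` residues; starts are 1-based."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        (i + 1, sum(scores[i:i + window]) / window)
        for i in range(len(scores) - window + 1)
    ]


def most_hydrophilic(protein: str, window: int = 15, top: int = 3) -> list[tuple[int, str, float]]:
    """Return the `top` windows with the lowest (most hydrophilic) mean hydropathy."""
    ranked = sorted(window_hydropathy(protein, window), key=lambda item: item[1])
    return [
        (start, protein[start - 1:start - 1 + window], round(mean, 2))
        for start, mean in ranked[:top]
    ]


if __name__ == "__main__":
    # Hypothetical polypeptide: hydrophobic flanks around a charged stretch.
    demo = "L" * 20 + "DEKRDEKRDEKRDEK" + "V" * 20
    print(most_hydrophilic(demo, window=15, top=1))
```

Candidate windows found this way would still need to be checked against the other criteria in the text, such as antigenicity and likely exposure in the native conformation.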
Procedures well known in the art may be used for the production of antibodies. Various hosts, including mice, goats, and rabbits, may be immunized by injection with a peptide. Depending on the host species, various adjuvants may be used to increase the immunological response.
In one procedure, peptides about 15 residues in length may be synthesized using an ABI 431A peptide synthesizer (Applied Biosystems) using Fmoc chemistry and coupled to KLH (Sigma) by reaction with m-maleimidobenzoyl-N-hydroxysuccinimide ester (Ausubel, 1995, supra). Rabbits are immunized with the peptide-KLH complex in complete Freund's adjuvant. The resulting antisera are tested for antipeptide activity by binding the peptide to plastic, blocking with 1% bovine serum albumin (BSA), reacting with rabbit antisera, washing, and reacting with radioiodinated goat anti-rabbit IgG. Antisera with antipeptide activity are tested for anti-DITHP activity using protocols well known in the art, including ELISA, radioimmunoassay (RIA), and immunoblotting.
In another procedure, isolated and purified peptide may be used to immunize mice (about 100 μg of peptide) or rabbits (about 1 mg of peptide). Subsequently, the peptide is radioiodinated and used to screen the immunized animals' B-lymphocytes for production of antipeptide antibodies. Positive cells are then used to produce hybridomas using standard techniques. About 20 mg of peptide is sufficient for labeling and screening several thousand clones. Hybridomas of interest are detected by screening with radioiodinated peptide to identify those fusions producing peptide-specific monoclonal antibody. In a typical protocol, wells of a multi-well plate (FAST, Becton-Dickinson, Palo Alto, CA) are coated with affinity-purified, specific rabbit anti-mouse (or suitable anti-species IgG) antibodies at 10 mg/ml. The coated wells are blocked with 1% BSA, washed, and exposed to supernatants from hybridomas. After incubation, the wells are exposed to radiolabeled peptide at 1 mg/ml.
Clones producing antibodies bind a quantity of labeled peptide that is detectable above background. Such clones are expanded and subjected to two cycles of cloning. Cloned hybridomas are injected into pristane-treated mice to produce ascites, and monoclonal antibody is purified from the ascitic fluid by affinity chromatography on protein A (Amersham Pharmacia Biotech). Several procedures for the production of monoclonal antibodies, including in vitro production, are described in Pound (supra). Monoclonal antibodies with antipeptide activity are tested for anti-DITHP activity using protocols well known in the art, including ELISA, RIA, and immunoblotting.
Antibody fragments containing specific binding sites for an epitope may also be generated. For example, such fragments include, but are not limited to, the F(ab')2 fragments produced by pepsin digestion of the antibody molecule, and the Fab fragments generated by reducing the disulfide bridges of the F(ab')2 fragments. Alternatively, construction of Fab expression libraries in filamentous bacteriophage allows rapid and easy identification of monoclonal fragments with the desired specificity (Pound, supra, Chaps. 45-47). Antibodies generated against the polypeptide encoded by dithp can be used to purify and characterize full-length DITHP protein and its activity, binding partners, etc.
Assays Using Antibodies
Anti-DITHP antibodies may be used in assays to quantify the amount of DITHP found in a particular human cell. Such assays include methods utilizing the antibody and a label to detect expression level under normal or disease conditions. The peptides and antibodies of the invention may be used with or without modification, or may be labeled by joining them, either covalently or noncovalently, with a reporter molecule.
Protocols for detecting and measuring protein expression using either polyclonal or monoclonal antibodies are well known in the art. Examples include ELISA, RIA, and fluorescence-activated cell sorting (FACS). Such immunoassays typically involve the formation of complexes between DITHP and its specific antibody and the measurement of such complexes. These and other assays are described in Pound (supra).
Without further elaboration, it is believed that one skilled in the art can, using the preceding description, utilize the present invention to its fullest extent. The following preferred specific embodiments are, therefore, to be construed as merely illustrative, and not limitative of the remainder of the disclosure in any way whatsoever.
The disclosures of all patents, applications, and publications mentioned above and below, including U.S. Ser. No. 60/349,384, U.S. Ser. No. 60/349,946, and U.S. Ser. No. 60/349,413, are hereby expressly incorporated by reference.
EXAMPLES
I. Construction of cDNA Libraries
RNA was purchased from CLONTECH Laboratories, Inc. (Palo Alto CA) or isolated from various tissues. Some tissues were homogenized and lysed in guanidinium isothiocyanate, while others were homogenized and lysed in phenol or in a suitable mixture of denaturants, such as TRIZOL (Life Technologies), a monophasic solution of phenol and guanidine isothiocyanate. The resulting lysates were centrifuged over CsCl cushions or extracted with chloroform. RNA was precipitated with either isopropanol or sodium acetate and ethanol, or by other routine methods. Phenol extraction and precipitation of RNA were repeated as necessary to increase RNA purity. In most cases, RNA was treated with DNase. For most libraries, poly(A+) RNA was isolated using oligo d(T)-coupled paramagnetic particles (Promega Corporation (Promega), Madison WI), OLIGOTEX latex particles (QIAGEN, Inc. (QIAGEN), Valencia CA), or an OLIGOTEX mRNA purification kit (QIAGEN). Alternatively, RNA was isolated directly from tissue lysates using other RNA isolation kits, e.g., the POLY(A)PURE mRNA purification kit (Ambion, Inc., Austin TX).
In some cases, Stratagene was provided with RNA and constructed the corresponding cDNA libraries. Otherwise, cDNA was synthesized and cDNA libraries were constructed with the UNIZAP vector system (Stratagene Cloning Systems, Inc. (Stratagene), La Jolla CA) or SUPERSCRIPT plasmid system (Life Technologies), using the recommended procedures or similar methods known in the art. (See, e.g., Ausubel, 1997, supra, Chapters 5.1 through 6.6.) Reverse transcription was initiated using oligo d(T) or random primers. Synthetic oligonucleotide adapters were ligated to double-stranded cDNA, and the cDNA was digested with the appropriate restriction enzyme or enzymes. For most libraries, the cDNA was size-selected (300-1000 bp) using SEPHACRYL S1000, SEPHAROSE CL2B, or SEPHAROSE CL4B column chromatography (Amersham Pharmacia Biotech) or preparative agarose gel electrophoresis. cDNAs were ligated into compatible restriction enzyme sites of the polylinker of a suitable plasmid, e.g., PBLUESCRIPT plasmid (Stratagene), PSPORT1 plasmid (Life Technologies), PCDNA2.1 plasmid (Invitrogen, Carlsbad CA), PBK-CMV plasmid (Stratagene), PCR2-TOPOTA plasmid (Invitrogen), PCMV-ICIS plasmid (Stratagene), pIGEN (Incyte Genomics, Palo Alto CA), pRARE (Incyte Genomics), or pINCY (Incyte Genomics), or derivatives thereof. Recombinant plasmids were transformed into competent E. coli cells including XL1-Blue, XL1-BlueMRF, or SOLR from Stratagene or DH5α, DH10B, or ElectroMAX DH10B from Life Technologies.
II. Isolation of cDNA Clones
Plasmids were recovered from host cells by in vivo excision using the UNIZAP vector system (Stratagene) or by cell lysis. Plasmids were purified using at least one of the following: the Magic or WIZARD Minipreps DNA purification system (Promega); the AGTC Miniprep purification kit (Edge BioSystems, Gaithersburg MD); and the QIAWELL 8, QIAWELL 8 Plus, and QIAWELL 8 Ultra plasmid purification systems or the R.E.A.L. PREP 96 plasmid purification kit (QIAGEN). Following precipitation, plasmids were resuspended in 0.1 ml of distilled water and stored, with or without lyophilization, at 4°C.
Alternatively, plasmid DNA was amplified from host cell lysates using direct link PCR in a high-throughput format. (Rao, V.B. (1994) Anal. Biochem. 216:1-14.) Host cell lysis and thermal cycling steps were carried out in a single reaction mixture. Samples were processed and stored in 384-well plates, and the concentration of amplified plasmid DNA was quantified fluorometrically using PICOGREEN dye (Molecular Probes, Inc. (Molecular Probes), Eugene OR) and a FLUOROSKAN II fluorescence scanner (Labsystems Oy, Helsinki, Finland).
III. Sequencing and Analysis
cDNA sequencing reactions were processed using standard methods or high-throughput instrumentation such as the ABI CATALYST 800 thermal cycler (Applied Biosystems) or the PTC-200 thermal cycler (MJ Research) in conjunction with the HYDRA microdispenser (Robbins Scientific Corp., Sunnyvale CA) or the MICROLAB 2200 liquid transfer system (Hamilton). cDNA sequencing reactions were prepared using reagents provided by Amersham Pharmacia Biotech or supplied in ABI sequencing kits such as the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (Applied Biosystems). Electrophoretic separation of cDNA sequencing reactions and detection of labeled polynucleotides were carried out using the MEGABACE 1000 DNA sequencing system (Molecular Dynamics); the ABI PRISM 373 or 377 sequencing system (Applied Biosystems) in conjunction with standard ABI protocols and base calling software; or other sequence analysis systems known in the art. Reading frames within the cDNA sequences were identified using standard methods (reviewed in Ausubel, 1997, supra, Chapter 7.7). Some of the cDNA sequences were selected for extension using the techniques disclosed in Example VID.
IV. Assembly and Analysis of Sequences
Component sequences from chromatograms were subject to PHRED analysis and assigned a quality score. The sequences having at least a required quality score were subject to various preprocessing editing pathways to eliminate, e.g., low quality 3' ends, vector and linker sequences, polyA tails, Alu repeats, mitochondrial and ribosomal sequences, bacterial contamination sequences, and sequences smaller than 50 base pairs. In particular, low-information sequences and repetitive elements (e.g., dinucleotide repeats, Alu repeats, etc.) were replaced by "n's", or masked, to prevent spurious matches.
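Part of the preprocessing described above (masking dinucleotide repeats with "n" characters, removing poly(A) tails, and discarding sequences shorter than 50 base pairs) can be sketched as follows. This is a minimal illustration under stated assumptions: the 6-copy repeat threshold and the 8-base minimum tail are hypothetical choices, and PHRED quality trimming, vector/linker removal, Alu and mitochondrial screening, and contamination filtering are not attempted.

```python
import re
from typing import Optional

MIN_LENGTH = 50  # the text discards sequences smaller than 50 base pairs


def trim_poly_a(seq: str, min_run: int = 8) -> str:
    """Strip a 3' poly(A) tail of at least `min_run` bases (threshold is hypothetical)."""
    return re.sub(r"A{%d,}$" % min_run, "", seq, flags=re.IGNORECASE)


def mask_dinucleotide_repeats(seq: str, min_copies: int = 6) -> str:
    """Replace tandem dinucleotide repeats (>= min_copies units) with 'n' characters."""
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_copies - 1), re.IGNORECASE)
    return pattern.sub(lambda m: "n" * len(m.group(0)), seq)


def preprocess(seq: str) -> Optional[str]:
    """Trim, mask, and length-filter one read; return None if it is discarded."""
    cleaned = mask_dinucleotide_repeats(trim_poly_a(seq))
    return cleaned if len(cleaned) >= MIN_LENGTH else None


if __name__ == "__main__":
    read = "GATTGGCCTTAGGCTTGACC" + "TG" * 7 + "AACCGGTTAACCGGTTAACCGGTT" + "A" * 12
    print(preprocess(read))
```

Masking replaces bases rather than deleting them, so positions along the read are preserved while the repeat can no longer seed spurious matches.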
Processed sequences were then subject to assembly procedures in which the sequences were assigned to gene bins (bins). Each sequence could only belong to one bin. Sequences in each gene bin were assembled to produce consensus sequences (templates). Subsequent new sequences were added to existing bins using BLASTN (v.1.4 WashU) and CROSSMATCH. Candidate pairs were identified as all BLAST hits having a quality score greater than or equal to 150. Alignments of at least 82% local identity were accepted into the bin. The component sequences from each bin were assembled using a version of PHRAP. Bins with several overlapping component sequences were assembled using DEEP PHRAP. The orientation (sense or antisense) of each assembled template was determined based on the number and orientation of its component sequences. Template sequences as disclosed in the sequence listing correspond to sense strand sequences (the "forward" reading frames), to the best determination. The complementary (antisense) strands are inherently disclosed herein. The component sequences which were used to assemble each template consensus sequence are listed in Table 5 by their positions along the template nucleotide sequences.
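The bin-assignment rule above (candidate hits need a score of at least 150, and alignments need at least 82% local identity, with each sequence joining only one bin) can be expressed as a small filter. In this sketch the `BinHit` record and `choose_bin` function are hypothetical, and resolving several qualifying bins by taking the highest-scoring hit is an assumption; the text does not say how such ties were broken.

```python
from __future__ import annotations

from dataclasses import dataclass

MIN_BLAST_SCORE = 150       # candidate pairs: BLAST hits scoring >= 150
MIN_LOCAL_IDENTITY = 82.0   # alignments of >= 82% local identity join the bin


@dataclass
class BinHit:
    """One alignment of a new sequence against an existing bin template (hypothetical record)."""
    bin_id: str
    blast_score: int
    local_identity: float  # percent


def choose_bin(hits: list[BinHit]) -> str | None:
    """Return the single bin a new sequence joins, or None if no hit qualifies.

    Each sequence may belong to only one bin; picking the highest-scoring
    qualifying hit is an assumption, not a rule stated in the text.
    """
    qualifying = [
        h for h in hits
        if h.blast_score >= MIN_BLAST_SCORE and h.local_identity >= MIN_LOCAL_IDENTITY
    ]
    if not qualifying:
        return None
    return max(qualifying, key=lambda h: (h.blast_score, h.local_identity)).bin_id


if __name__ == "__main__":
    hits = [BinHit("binA", 140, 99.0), BinHit("binC", 200, 85.0), BinHit("binD", 250, 90.0)]
    print(choose_bin(hits))  # binD
```

A sequence for which `choose_bin` returns None would start a new bin of its own.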
Bins were compared against each other and those having local similarity of at least 82% were combined and reassembled. Reassembled bins having templates of insufficient overlap (less than 95% local identity) were re-split. Assembled templates were also subject to analysis by STITCHER/EXON MAPPER algorithms which analyze the probabilities of the presence of splice variants, alternatively spliced exons, splice junctions, differential expression of alternative spliced genes across tissue types or disease states, etc. These resulting bins were subject to several rounds of the above assembly procedures.
Once gene bins were generated based upon sequence alignments, bins were clone joined based upon clone information. If the 5' sequence of one clone was present in one bin and the 3' sequence from the same clone was present in a different bin, it was likely that the two bins actually belonged together in a single bin. The resulting combined bins underwent assembly procedures to regenerate the consensus sequences.
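The clone-joining step (merging two bins when the 5' read of a clone lies in one and the 3' read of the same clone lies in the other) is naturally modeled as a union-find over bin IDs, because merges are transitive. This is a minimal sketch: the read-ID convention "<clone>.5p" / "<clone>.3p" and the function name are hypothetical, and reassembly of the merged bins is not shown.

```python
from __future__ import annotations

from collections import defaultdict


def join_bins_by_clone(read_bins: dict[str, str]) -> dict[str, str]:
    """Merge bins that hold the 5' and 3' reads of the same clone.

    `read_bins` maps read IDs to bin IDs. Read IDs are assumed, for
    illustration only, to look like "<clone>.5p" or "<clone>.3p".
    Returns a mapping from every bin ID to the ID of its merged bin.
    """
    parent: dict[str, str] = {}

    def find(b: str) -> str:
        parent.setdefault(b, b)
        while parent[b] != b:
            parent[b] = parent[parent[b]]  # path halving
            b = parent[b]
        return b

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller ID as the representative.
            parent[max(ra, rb)] = min(ra, rb)

    ends: dict[str, dict[str, str]] = defaultdict(dict)
    for read_id, bin_id in read_bins.items():
        find(bin_id)  # register every bin, even singletons
        clone, end = read_id.rsplit(".", 1)
        ends[clone][end] = bin_id

    for clone_ends in ends.values():
        if "5p" in clone_ends and "3p" in clone_ends:
            union(clone_ends["5p"], clone_ends["3p"])

    return {b: find(b) for b in list(parent)}
```

Clones c1 (bin1/bin2) and c2 (bin2/bin3) would chain bin1, bin2, and bin3 into a single bin, which is why a transitive structure is used rather than pairwise merges.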
The final assembled templates were subsequently annotated using the following procedure. Template sequences were analyzed using BLASTN (v2.0, NCBI) versus gbpri (GenBank version 135). "Hits" were defined as an exact match having from 95% local identity over 200 base pairs through 100% local identity over 100 base pairs, or a homolog match having an E-value, i.e., a probability score, of ≤ 1 x 10^-8. The hits were subject to frameshift FASTx versus GENPEPT (GenBank version 135). (See Table 8.) In this analysis, a homolog match was defined as having an E-value of ≤ 1 x 10^-8. The assembly method used above was described in "System and Methods for Analyzing Biomolecular Sequences," U.S.S.N. 09/276,534, filed March 25, 1999, and in the LIFESEQ Gold user manual (Incyte), both incorporated by reference herein.
Following assembly, template sequences were subjected to motif, BLAST, and functional analyses, and categorized in protein hierarchies using methods described in, e.g., "Database System Employing Protein Function Hierarchies for Viewing Biomolecular Sequence Data," U.S. Patent Number 6,023,659; "Relational Database for Storing Biomolecule Information," U.S.S.N. 08/947,845, filed October 9, 1997; "Project-Based Full-Length Biomolecular Sequence Database," U.S. Patent Number 5,953,727; and "Relational Database and System for Storing Information Relating to Biomolecular Sequences," U.S.S.N. 09/034,807, filed March 4, 1998, all of which are incorporated by reference herein.
The template sequences were further analyzed by translating each template in all three forward reading frames and searching each translation against the Pfam database of hidden Markov model-based protein families and domains using the HMMER software package (available to the public from Washington University School of Medicine, St. Louis MO). Regions of templates which, when translated, contain similarity to Pfam consensus sequences are reported in Table 3, along with descriptions of Pfam protein domains and families. Only those Pfam hits with an E-value of ≤ 1 x 10^-3 are reported. (See also World Wide Web site http://pfam.wustl.edu/ for detailed descriptions of Pfam protein domains and families.)
Additionally, the template sequences were translated in all three forward reading frames, and each translation was searched against hidden Markov models for signal peptides using the HMMER software package. Construction of hidden Markov models and their usage in sequence analysis has been described. (See, for example, Eddy, S.R. (1996) Curr. Opin. Struct. Biol. 6:361-365.) Only those signal peptide hits with a cutoff score of 11 bits or greater are reported. A cutoff score of 11 bits or greater corresponds to at least about 91-94% true-positives in signal peptide prediction. Template sequences were also translated in all three forward reading frames, and each translation was searched against TMHMMER, a program that uses a hidden Markov model (HMM) to delineate transmembrane segments on protein sequences and determine orientation (Sonnhammer, E.L. et al. (1998) Proc. Sixth Intl. Conf. On Intelligent Systems for Mol. Biol., Glasgow et al., eds., The Am. Assoc. for Artificial Intelligence (AAAI) Press, Menlo Park, CA, and MIT Press, Cambridge, MA, pp. 175-182.) Regions of templates which, when translated, contain similarity to signal peptide or transmembrane consensus sequences are reported in Table 4.
The results of HMMER analysis as reported in Tables 3 and 4 may support the results of BLAST analysis as reported in Table 2 or may suggest alternative or additional properties of template-encoded polypeptides not previously uncovered by BLAST or other analyses. Template sequences are further analyzed using the bioinformatics tools listed in Table 8, or using sequence analysis software known in the art such as MACDNASIS PRO software (Hitachi Software Engineering, South San Francisco CA) and LASERGENE software (DNASTAR). Template sequences may be further queried against public databases such as the GenBank rodent, mammalian, vertebrate, prokaryote, and eukaryote databases. The template sequences were translated to derive the corresponding longest open reading frame, as presented by the polypeptide sequences reported in Table 7. Alternatively, a polypeptide of the invention may begin at any of the methionine residues within the full-length translated polypeptide. Polypeptide sequences were subsequently analyzed by querying against the GenBank protein database (GENPEPT, GenBank version 135). Full-length polynucleotide sequences are also analyzed using MACDNASIS PRO software (Hitachi Software Engineering, South San Francisco CA) and LASERGENE software (DNASTAR). Polynucleotide and polypeptide sequence alignments are generated using default parameters specified by the CLUSTAL algorithm as incorporated into the MEGALIGN multisequence alignment program (DNASTAR), which also calculates the percent identity between aligned sequences.
Table 7 shows sequences with homology to the polypeptides of the invention as identified by BLAST analysis against the GenBank protein (GENPEPT) database. Column 1 shows the polypeptide sequence identification number (SEQ ID NO:) for the polypeptide segments of the invention. Column 2 shows the reading frame used in the translation of the polynucleotide sequences encoding the polypeptide segments. Column 3 shows the length of the translated polypeptide segments. Columns 4 and 5 show the start and stop nucleotide positions of the polynucleotide sequences encoding the polypeptide segments. Column 6 shows the GenBank identification number (GI Number) of the nearest GenBank homolog. Column 7 shows the probability score for the match between each polypeptide and its GenBank homolog. Column 8 shows the annotation of the GenBank homolog.
V. Analysis of Polynucleotide Expression
Northern analysis is a laboratory technique used to detect the presence of a transcript of a gene and involves the hybridization of a labeled nucleotide sequence to a membrane on which RNAs from a particular cell type or tissue have been bound. (See, e.g., Sambrook, supra, ch. 7; Ausubel, 1995, supra, ch. 4 and 16.) Analogous computer techniques applying BLAST were used to search for identical or related molecules in cDNA databases such as GenBank or LIFESEQ (Incyte Genomics). This analysis is much faster than multiple membrane-based hybridizations. In addition, the sensitivity of the computer search can be modified to determine whether any particular match is categorized as exact or similar. The basis of the search is the product score, which is defined as:
$$\text{product score} = \frac{\text{BLAST score} \times \text{percent identity}}{5 \times \min\{\text{length(Seq. 1)},\ \text{length(Seq. 2)}\}}$$
The product score takes into account both the degree of similarity between two sequences and the length of the sequence match. The product score is a normalized value between 0 and 100, and is calculated as follows: the BLAST score is multiplied by the percent nucleotide identity and the product is divided by (5 times the length of the shorter of the two sequences). The BLAST score is calculated by assigning a score of +5 for every base that matches in a high-scoring segment pair (HSP), and -4 for every mismatch. Two sequences may share more than one HSP (separated by gaps). If there is more than one HSP, then the HSP with the highest BLAST score is used to calculate the product score. The product score represents a balance between fractional overlap and quality in a BLAST alignment. For example, a product score of 100 is produced only for 100% identity over the entire length of the shorter of the two sequences being compared. A product score of 70 is produced either by 100% identity and 70% overlap at one end, or by approximately 88% identity and 100% overlap. A product score of 50 is produced either by 100% identity and 50% overlap at one end, or by approximately 79% identity and 100% overlap.
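The product score arithmetic can be checked with a short Python sketch. It assumes percent identity is computed over the single best-scoring HSP with gaps ignored; the function and argument names are hypothetical.

```python
def blast_score(matches: int, mismatches: int) -> int:
    """HSP score as described: +5 per matching base, -4 per mismatch."""
    return 5 * matches - 4 * mismatches


def product_score(matches: int, mismatches: int, len1: int, len2: int) -> float:
    """Product score for the best HSP between two sequences of lengths len1 and len2."""
    hsp_length = matches + mismatches
    percent_identity = 100.0 * matches / hsp_length
    return blast_score(matches, mismatches) * percent_identity / (5 * min(len1, len2))


print(product_score(100, 0, 100, 150))  # 100.0: full-length perfect match
print(product_score(70, 0, 100, 100))   # 70.0: perfect match over 70% of the length
print(product_score(88, 12, 100, 100))  # 68.992: 88% identity over the full length
print(product_score(79, 21, 100, 100))  # 49.138: 79% identity over the full length
```

The last two cases show why the 88% and 79% identity figures quoted above yield product scores of roughly 70 and 50 rather than exactly those values.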
VI. Tissue Distribution Profiling
A tissue distribution profile is determined for each template by compiling the cDNA library tissue classifications of its component cDNA sequences. Each component sequence is derived from a cDNA library constructed from a human tissue. Each human tissue is classified into one of the following categories: cardiovascular system; connective tissue; digestive system; embryonic structures; endocrine system; exocrine glands; genitalia, female; genitalia, male; germ cells; hemic and immune system; liver; musculoskeletal system; nervous system; pancreas; respiratory system; sense organs; skin; stomatognathic system; unclassified mixed; or urinary tract. Template sequences, component sequences, and cDNA library/tissue information are found in the LIFESEQ GOLD database (Incyte Genomics, Palo Alto CA).
Table 6 shows the tissue distribution profile for the templates of the invention. For each template, the three most frequently observed tissue categories are shown in column 3, along with the percentage of component sequences belonging to each category. Only tissue categories with
percentage values of ≥ 10% are shown. A tissue distribution of "widely distributed" in column 3 indicates percentage values of <10% in all tissue categories.
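The Table 6 summarization rule (top three categories, each shown only if at least 10%, otherwise "widely distributed") can be sketched as follows. The input format, one category label per component sequence, and the function name are assumptions for illustration.

```python
from collections import Counter


def tissue_profile(categories, top_n=3, min_percent=10.0):
    """Summarize the tissue categories of a template's component cDNAs.

    categories: one tissue-category label per component sequence.
    Returns up to `top_n` (category, percent) pairs with percent >= min_percent,
    or the string "widely distributed" if no category reaches min_percent.
    """
    counts = Counter(categories)
    total = sum(counts.values())
    shown = []
    for category, n in counts.most_common(top_n):
        percent = 100.0 * n / total
        if percent >= min_percent:
            shown.append((category, percent))
    return shown if shown else "widely distributed"


components = ["nervous system"] * 10 + ["liver"] * 6 + ["skin"] * 3 + ["pancreas"]
print(tissue_profile(components))
# [('nervous system', 50.0), ('liver', 30.0), ('skin', 15.0)]
```

Because the most frequent category is checked first, a template whose top category falls below 10% necessarily has every category below 10%, which is exactly the "widely distributed" case.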
VII. Transcript Image Analysis
Transcript images are generated as described in Seilhamer et al., "Comparative Gene Transcript Analysis," U.S. Patent Number 5,840,484, incorporated herein by reference.
VIII. Extension of Polynucleotide Sequences and Isolation of a Full-length cDNA
Oligonucleotide primers designed using a dithp of the Sequence Listing are used to extend the nucleic acid sequence. One primer is synthesized to initiate 5' extension of the template, and the other primer, to initiate 3' extension of the template. The initial primers may be designed using OLIGO 4.06 software (National Biosciences, Inc. (National Biosciences), Plymouth MN), or another appropriate program, to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to anneal to the target sequence at temperatures of about 68 °C to about 72 °C. Any stretch of nucleotides which would result in hairpin structures and primer-primer dimerizations is avoided. Selected human cDNA libraries are used to extend the sequence. If more than one extension is necessary or desired, additional or nested sets of primers are designed.
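The length and GC-content rules above can be screened before a candidate primer is taken further. The sketch below is a hypothetical helper, not OLIGO's algorithm: it uses a crude test for self-complementary k-mers as a stand-in for proper hairpin and dimer analysis, and it does not estimate the 68-72 °C annealing temperature.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def has_self_complementary_kmer(seq, k=6):
    """Crude hairpin/self-dimer screen: does any k-mer's reverse complement also occur?"""
    s = seq.upper()
    kmers = {s[i:i + k] for i in range(len(s) - k + 1)}
    return any(reverse_complement(kmer) in kmers for kmer in kmers)


def check_primer(seq, min_len=22, max_len=30, min_gc=0.50, k=6):
    """Return a list of rule violations (empty if the primer passes)."""
    problems = []
    if not min_len <= len(seq) <= max_len:
        problems.append(f"length {len(seq)} nt is outside {min_len}-{max_len} nt")
    if gc_fraction(seq) < min_gc:
        problems.append(f"GC content {gc_fraction(seq):.0%} is below {min_gc:.0%}")
    if has_self_complementary_kmer(seq, k):
        problems.append(f"contains a self-complementary {k}-mer (hairpin/dimer risk)")
    return problems


print(check_primer("GACGAATTCGTCAGGCTGAGCGTC"))  # flags the GAATTC palindrome
```

The k-mer size of 6 is an arbitrary choice; a smaller k is stricter and rejects more candidates.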
High fidelity amplification is obtained by PCR using methods well known in the art. PCR is performed in 96-well plates using the PTC-200 thermal cycler (MJ Research). The reaction mix contains DNA template, 200 nmol of each primer, reaction buffer containing Mg2+, (NH4)2SO4, and β-mercaptoethanol, Taq DNA polymerase (Amersham Pharmacia Biotech), ELONGASE enzyme (Life Technologies), and Pfu DNA polymerase (Stratagene), with the following parameters for primer pair PCI A and PCI B: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C. In the alternative, the parameters for primer pair T7 and SK+ are as follows: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 57°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C.
The concentration of DNA in each well is determined by dispensing 100 μl PICOGREEN quantitation reagent (0.25% (v/v); Molecular Probes) dissolved in 1X Tris-EDTA (TE) and 0.5 μl of undiluted PCR product into each well of an opaque fluorimeter plate (Corning Incorporated (Corning), Corning NY), allowing the DNA to bind to the reagent. The plate is scanned in a FLUOROSKAN II (Labsystems Oy) to measure the fluorescence of the sample and to quantify the concentration of DNA. A 5 μl to 10 μl aliquot of the reaction mixture is analyzed by electrophoresis on a 1% agarose mini-gel to determine which reactions are successful in extending the sequence.
The extended nucleotides are desalted and concentrated, transferred to 384-well plates, digested with CviJI Chlorella virus endonuclease (Molecular Biology Research, Madison WI), and sonicated or sheared prior to religation into pUC 18 vector (Amersham Pharmacia Biotech). For shotgun sequencing, the digested nucleotides are separated on low concentration (0.6 to 0.8%) agarose gels, fragments are excised, and agar is digested with AGARACE (Promega). Extended clones are religated using T4 ligase (New England Biolabs, Inc., Beverly MA) into pUC 18 vector (Amersham Pharmacia Biotech), treated with Pfu DNA polymerase (Stratagene) to fill in restriction site overhangs, and transfected into competent E. coli cells. Transformed cells are selected on antibiotic-containing media, and individual colonies are picked and cultured overnight at 37 °C in 384-well plates in LB/2x carbenicillin liquid media.
The cells are lysed, and DNA is amplified by PCR using Taq DNA polymerase (Amersham Pharmacia Biotech) and Pfu DNA polymerase (Stratagene) with the following parameters: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 72°C, 2 min; Step 5: steps 2, 3, and 4 repeated 29 times; Step 6: 72°C, 5 min; Step 7: storage at 4°C. DNA is quantified by PICOGREEN reagent (Molecular Probes) as described above. Samples with low DNA recoveries are reamplified using the same conditions as described above. Samples are diluted with 20% dimethyl sulfoxide (1:2, v/v), and sequenced using DYENAMIC energy transfer sequencing primers and the DYENAMIC DIRECT kit (Amersham Pharmacia Biotech) or the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (Applied Biosystems).
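The PICOGREEN quantitation described above turns a fluorescence reading into a DNA concentration, but the text does not say how. A common approach, assumed here, is a linear standard curve built from DNA standards of known concentration. Because 0.5 μl of product is read in 100 μl of reagent, the in-well value is roughly 201-fold lower than in the PCR product. The standard values and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dna_concentration(fluorescence, std_conc, std_fluorescence, dilution_factor=1.0):
    """Read a sample off the standard curve and scale back to the undiluted product."""
    slope, intercept = fit_line(std_conc, std_fluorescence)
    return (fluorescence - intercept) / slope * dilution_factor


# 0.5 ul of PCR product in 100 ul of reagent, per the protocol above
WELL_DILUTION = (100.0 + 0.5) / 0.5  # about 201-fold

# Hypothetical standards in ng/ml and their fluorescence readings
standards = [0, 10, 20, 40]
readings = [5, 105, 205, 405]
print(dna_concentration(255, standards, readings, WELL_DILUTION))  # ng/ml in the PCR product
```

The result is expressed in the same units as the standards.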
In like manner, the dithp is used to obtain regulatory sequences (promoters, introns, and enhancers) using the procedure above, oligonucleotides designed for such extension, and an appropriate genomic library.
IX. Labeling of Probes and Southern Hybridization Analyses
Hybridization probes derived from the dithp of the Sequence Listing are employed for screening cDNAs, mRNAs, or genomic DNA. The labeling of probe nucleotides between 100 and 1000 nucleotides in length is specifically described, but essentially the same procedure may be used with larger cDNA fragments. Probe sequences are labeled at room temperature for 30 minutes using a T4 polynucleotide kinase, γ32P-ATP, and 0.5X One-Phor-All Plus (Amersham Pharmacia Biotech) buffer and purified using a ProbeQuant G-50 Microcolumn (Amersham Pharmacia Biotech). The probe mixture is diluted to 10^7 dpm/μg/ml hybridization buffer and used in a typical membrane-based hybridization analysis.
The DNA is digested with a restriction endonuclease such as Eco RV and is electrophoresed through a 0.7% agarose gel. The DNA fragments are transferred from the agarose to nylon membrane (NYTRAN Plus, Schleicher & Schuell, Inc., Keene NH) using procedures specified by the manufacturer of the membrane. Prehybridization is carried out for three or more hours at 68 °C, and hybridization is carried out overnight at 68 °C. To remove non-specific signals, blots are sequentially washed at room temperature under increasingly stringent conditions, up to 0.1x saline sodium citrate (SSC) and 0.5% sodium dodecyl sulfate. After the blots are placed in a PHOSPHORIMAGER cassette (Molecular Dynamics) or are exposed to autoradiography film, hybridization patterns of standard and experimental lanes are compared. Essentially the same procedure is employed when screening RNA.
X. Chromosome Mapping of dithp
The cDNA sequences which were used to assemble SEQ ID NO:1-188 are compared with sequences from the Incyte LIFESEQ database and public domain databases using BLAST and other implementations of the Smith-Waterman algorithm. Sequences from these databases that match SEQ ID NO:1-188 are assembled into clusters of contiguous and overlapping sequences using assembly algorithms such as PHRAP (Table 8). Radiation hybrid and genetic mapping data available from public resources such as the Stanford Human Genome Center (SHGC), Whitehead Institute for Genome Research (WIGR), and Genethon are used to determine if any of the clustered sequences have been previously mapped. Inclusion of a mapped sequence in a cluster will result in the assignment of all sequences of that cluster, including its particular SEQ ID NO:, to that map location. The genetic map locations of SEQ ID NO:1-188 are described as ranges, or intervals, of human chromosomes. The map position of an interval, in centiMorgans, is measured relative to the terminus of the chromosome's p-arm. (The centiMorgan (cM) is a unit of measurement based on recombination frequencies between chromosomal markers. On average, 1 cM is roughly equivalent to 1 megabase (Mb) of DNA in humans, although this can vary widely due to hot and cold spots of recombination.) The cM distances are based on genetic markers mapped by Genethon which provide boundaries for radiation hybrid markers whose sequences were included in each of the clusters.
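The cluster-based assignment rule (one mapped member places the whole cluster) can be sketched as follows. The data structures and names are hypothetical. Two further choices are assumptions not stated in the text: when several members are mapped to the same chromosome, the cluster interval is their combined span, and when they disagree on the chromosome, the cluster is left unassigned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapInterval:
    chromosome: str
    start_cm: float  # distance from the p-arm terminus, in centiMorgans
    end_cm: float


def assign_map_locations(clusters, mapped):
    """Propagate known map positions to every member of a cluster.

    clusters: dict of cluster id -> list of sequence ids
    mapped:   dict of sequence id -> MapInterval for previously mapped sequences
    Clusters whose mapped members disagree on the chromosome are left unassigned.
    """
    locations = {}
    for members in clusters.values():
        hits = [mapped[s] for s in members if s in mapped]
        if not hits or len({h.chromosome for h in hits}) != 1:
            continue
        interval = MapInterval(
            hits[0].chromosome,
            min(h.start_cm for h in hits),
            max(h.end_cm for h in hits),
        )
        for s in members:
            locations[s] = interval
    return locations


def approx_length_mb(interval):
    """Rough physical size using the human average of about 1 cM per Mb."""
    return interval.end_cm - interval.start_cm
```

Given the caveat about recombination hot and cold spots noted above, `approx_length_mb` is only an order-of-magnitude estimate.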
XI. Microarray Analysis
Probe Preparation from Tissue or Cell Samples
Total RNA is isolated from tissue samples using the guanidinium thiocyanate method and polyA+ RNA is purified using the oligo (dT) cellulose method. Each polyA+ RNA sample is reverse transcribed using MMLV reverse-transcriptase, 0.05 pg/μl oligo-dT primer (21mer), 1X first strand buffer, 0.03 units/μl RNase inhibitor, 500 μM dATP, 500 μM dGTP, 500 μM dTTP, 40 μM dCTP, 40 μM dCTP-Cy3 (BDS) or dCTP-Cy5 (Amersham Pharmacia Biotech). The reverse transcription reaction is performed in a 25 ml volume containing 200 ng polyA+ RNA with GEMBRIGHT kits (Incyte). Specific control polyA+ RNAs are synthesized by in vitro transcription from non-coding yeast genomic DNA (W. Lei, unpublished). As quantitative controls, the control mRNAs at 0.002 ng, 0.02 ng, 0.2 ng, and 2 ng are diluted into the reverse transcription reaction at ratios of 1:100,000, 1:10,000, 1:1000, and 1:100 (w/w) to sample mRNA, respectively. The control mRNAs are also diluted into the reverse transcription reaction at ratios of 1:3, 3:1, 1:10, 10:1, 1:25, and 25:1 (w/w) to sample mRNA to mimic differential expression patterns. After incubation at 37 °C for 2 hr, each reaction sample (one with Cy3 and another with Cy5 labeling) is treated with 2.5 ml of 0.5 M sodium hydroxide and incubated for 20 minutes at 85 °C to stop the reaction and degrade the RNA. Probes are purified using two successive CHROMA SPIN 30 gel filtration spin columns (CLONTECH Laboratories, Inc. (CLONTECH), Palo Alto CA) and, after combining, both reaction samples are ethanol precipitated using 1 ml of glycogen (1 mg/ml), 60 ml sodium acetate, and 300 ml of 100% ethanol. The probe is then dried to completion using a SpeedVAC (Savant Instruments Inc., Holbrook NY) and resuspended in 14 μl 5X SSC/0.2% SDS.
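The quantitative control ratios follow directly from the 200 ng of polyA+ RNA in each reverse transcription reaction. A short sketch with a hypothetical helper name confirms the arithmetic:

```python
SAMPLE_MRNA_NG = 200.0  # polyA+ RNA per reverse-transcription reaction, per the protocol above


def ratio_denominator(control_ng, sample_ng=SAMPLE_MRNA_NG):
    """Return N such that control:sample = 1:N (w/w)."""
    return sample_ng / control_ng


for control_ng in (0.002, 0.02, 0.2, 2.0):
    print(f"{control_ng} ng control -> 1:{ratio_denominator(control_ng):,.0f}")
# 1:100,000, 1:10,000, 1:1,000 and 1:100, matching the stated ratios
```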
Microarray Preparation
Sequences of the present invention are used to generate array elements. Each array element is amplified from bacterial cells containing vectors with cloned cDNA inserts. PCR amplification uses primers complementary to the vector sequences flanking the cDNA insert. Array elements are amplified in thirty cycles of PCR from an initial quantity of 1-2 ng to a final quantity greater than 5 μg. Amplified array elements are then purified using SEPHACRYL-400 (Amersham Pharmacia Biotech).
Purified array elements are immobilized on polymer-coated glass slides. Glass microscope slides (Corning) are cleaned by ultrasound in 0.1% SDS and acetone, with extensive distilled water washes between and after treatments. Glass slides are etched in 4% hydrofluoric acid (VWR Scientific Products Corporation (VWR), West Chester, PA), washed extensively in distilled water, and coated with 0.05% aminopropyl silane (Sigma) in 95% ethanol. Coated slides are cured in a 110°C oven.
Array elements are applied to the coated glass substrate using a procedure described in US Patent No. 5,807,522, incorporated herein by reference. 1 μl of the array element DNA, at an average concentration of 100 ng/μl, is loaded into the open capillary printing element by a high-speed robotic apparatus. The apparatus then deposits about 5 nl of array element sample per slide. Microarrays are UV-crosslinked using a STRATALINKER UV-crosslinker (Stratagene).
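The printing quantities above imply about 0.5 ng of DNA per spot and up to roughly 200 slides from a single 1 μl load. The second figure assumes that evaporation and the volume retained in the capillary are ignored. A minimal check, with hypothetical names:

```python
def spotting_budget(load_ul=1.0, conc_ng_per_ul=100.0, deposit_nl=5.0):
    """Return (ng loaded, ng deposited per spot, spots per load), ignoring dead volume."""
    loaded_ng = load_ul * conc_ng_per_ul
    ng_per_spot = (deposit_nl / 1000.0) * conc_ng_per_ul  # nl -> ul
    spots_per_load = (load_ul * 1000.0) / deposit_nl      # ul -> nl
    return loaded_ng, ng_per_spot, spots_per_load


print(spotting_budget())  # 100 ng loaded, about 0.5 ng per spot, 200 spots
```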
Microarrays are washed at room temperature once in 0.2% SDS and three times in distilled water. Non-specific binding sites are blocked by incubation of microarrays in 0.2% casein in phosphate buffered saline (PBS) (Tropix, Inc., Bedford, MA) for 30 minutes at 60° C followed by washes in 0.2% SDS and distilled water as before.
Hybridization
Hybridization reactions contain 9 μl of probe mixture consisting of 0.2 μg each of Cy3 and Cy5 labeled cDNA synthesis products in 5X SSC, 0.2% SDS hybridization buffer. The probe mixture is heated to 65° C for 5 minutes, aliquoted onto the microarray surface, and covered with a 1.8 cm² coverslip. The arrays are transferred to a waterproof chamber having a cavity just slightly larger than a microscope slide. The chamber is kept at 100% humidity internally by the addition of 140 μl of 5x SSC in a corner of the chamber. The chamber containing the arrays is incubated for about 6.5 hours at 60° C. The arrays are washed for 10 min at 45° C in a first wash buffer (1X SSC, 0.1% SDS), three times for 10 minutes each at 45° C in a second wash buffer (0.1X SSC), and dried.
Detection
Reporter-labeled hybridization complexes are detected with a microscope equipped with an Innova 70 mixed gas 10 W laser (Coherent, Inc., Santa Clara CA) capable of generating spectral lines at 488 nm for excitation of Cy3 and at 632 nm for excitation of Cy5. The excitation laser light is focused on the array using a 20X microscope objective (Nikon, Inc., Melville NY). The slide containing the array is placed on a computer-controlled X-Y stage on the microscope and raster-scanned past the objective. The 1.8 cm x 1.8 cm array used in the present example is scanned with a resolution of 20 micrometers.
In two separate scans, a mixed gas multiline laser excites the two fluorophores sequentially. Emitted light is split, based on wavelength, into two photomultiplier tube detectors (PMT R1477, Hamamatsu Photonics Systems, Bridgewater NJ) corresponding to the two fluorophores. Appropriate filters positioned between the array and the photomultiplier tubes are used to filter the signals. The emission maxima of the fluorophores used are 565 nm for Cy3 and 650 nm for Cy5. Each array is typically scanned twice, one scan per fluorophore using the appropriate filters at the laser source, although the apparatus is capable of recording the spectra from both fluorophores simultaneously. The sensitivity of the scans is typically calibrated using the signal intensity generated by a cDNA control species added to the probe mix at a known concentration. A specific location on the array contains a complementary DNA sequence, allowing the intensity of the signal at that location to be correlated with a weight ratio of hybridizing species of 1:100,000. When two probes from different sources (e.g., representing test and control cells), each labeled with a different fluorophore, are hybridized to a single array for the purpose of identifying genes that are differentially expressed, the calibration is done by labeling samples of the calibrating cDNA with the two fluorophores and adding identical amounts of each to the hybridization mixture.
The output of the photomultiplier tube is digitized using a 12-bit RTI-835H analog-to-digital (A/D) conversion board (Analog Devices, Inc., Norwood, MA) installed in an IBM-compatible PC computer. The digitized data are displayed as an image where the signal intensity is mapped using a linear 20-color transformation to a pseudocolor scale ranging from blue (low signal) to red (high signal). The data are also analyzed quantitatively. Where two different fluorophores are excited and measured simultaneously, the data are first corrected for optical crosstalk (due to overlapping emission spectra) between the fluorophores using each fluorophore's emission spectrum.
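The crosstalk correction and the 20-color display mapping can be sketched as follows. The sketch treats crosstalk as linear spectral unmixing of two channels with coefficients measured from single-dye controls. That model, the example coefficients, and the function names are assumptions; the text says only that each fluorophore's emission spectrum is used.

```python
def unmix(ch1, ch2, crosstalk):
    """Linear crosstalk correction for two dyes read in two channels.

    crosstalk[i][j] is the fraction of dye j's emission recorded in channel i,
    measured from single-dye controls (diagonal normally 1.0). Solves
    [ch1, ch2] = crosstalk @ [dye1, dye2] for the dye signals.
    """
    (a, b), (c, d) = crosstalk
    det = a * d - b * c
    if det == 0:
        raise ValueError("crosstalk matrix is singular")
    dye1 = (d * ch1 - b * ch2) / det
    dye2 = (a * ch2 - c * ch1) / det
    return dye1, dye2


def pseudocolor_level(value, bits=12, n_colors=20):
    """Map a digitized intensity linearly onto 0 (blue) .. n_colors - 1 (red)."""
    full_scale = 1 << bits  # 4096 levels for a 12-bit converter
    value = min(max(int(value), 0), full_scale - 1)
    return value * n_colors // full_scale


# Hypothetical coefficients: 10% of Cy5 leaks into the Cy3 channel, 5% the other way
print(unmix(1050.0, 550.0, ((1.0, 0.10), (0.05, 1.0))))  # approximately (1000, 500)
```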
A grid is superimposed over the fluorescence signal image such that the signal from each spot is centered in each element of the grid. The fluorescence signal within each element is then integrated to obtain a numerical value corresponding to the average intensity of the signal. The software used for signal analysis is the GEMTOOLS gene expression analysis program (Incyte Genomics). Array elements that exhibit at least about a two-fold change in expression, a signal-to-background ratio of at least about 2.5, and an element spot size of at least about 40% are considered to be differentially expressed.
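The three differential-expression thresholds can be applied per array element as sketched below. This is not GEMTOOLS logic. The following are all assumptions: fold change is computed on background-subtracted signals, the signal-to-background test is applied to the brighter channel, and spot size is a fraction of the nominal element area.

```python
def is_differentially_expressed(sig1, bg1, sig2, bg2, spot_fraction,
                                min_fold=2.0, min_sbr=2.5, min_spot=0.40):
    """Apply the three thresholds described above to one array element.

    sig1/sig2: integrated signal in the Cy3 and Cy5 channels
    bg1/bg2:   local background in each channel (assumed > 0)
    spot_fraction: detected spot area as a fraction of the nominal element area
    """
    net1 = max(sig1 - bg1, 1.0)  # floor avoids division by zero
    net2 = max(sig2 - bg2, 1.0)
    fold_change = max(net1 / net2, net2 / net1)
    signal_to_background = max(sig1 / bg1, sig2 / bg2)
    return (fold_change >= min_fold
            and signal_to_background >= min_sbr
            and spot_fraction >= min_spot)


print(is_differentially_expressed(1000, 100, 400, 100, 0.6))  # 3-fold change -> True
```

Using the brighter channel for the signal-to-background test keeps elements whose expression drops to near background in one sample.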
XII. Complementary Nucleic Acids
Sequences complementary to the dithp are used to detect, decrease, or inhibit expression of the naturally occurring nucleotide. The use of oligonucleotides comprising from about 15 to 30 base pairs is typical in the art. However, smaller or larger sequence fragments can also be used. Appropriate oligonucleotides are designed from the dithp using OLIGO 4.06 software (National Biosciences) or other appropriate programs and are synthesized using methods standard in the art or ordered from a commercial supplier. To inhibit transcription, a complementary oligonucleotide is designed from the most unique 5' sequence and used to prevent transcription factor binding to the promoter sequence. To inhibit translation, a complementary oligonucleotide is designed to prevent ribosomal binding and processing of the transcript.
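Once a target window has been chosen, deriving the complementary oligonucleotide is a reverse-complement operation. The sketch below does not choose the "most unique" 5' region or the ribosome-binding region; the window start, the default length, and the 15-30 nt bounds (taken from the typical range above and kept adjustable) are caller-supplied assumptions. The example transcript is made up.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def antisense_oligo(transcript, start=0, length=20, min_len=15, max_len=30):
    """Return an oligo complementary (antiparallel) to transcript[start:start+length]."""
    if not min_len <= length <= max_len:
        raise ValueError(f"length {length} outside {min_len}-{max_len} nt")
    target = transcript[start:start + length]
    if len(target) != length:
        raise ValueError("window extends past the end of the transcript")
    return reverse_complement(target)


print(antisense_oligo("ATGGCCAAGTTTGACCTGAA", start=0, length=15))  # GTCAAACTTGGCCAT
```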
XIII. Expression of DITHP
Expression and purification of DITHP is accomplished using bacterial or virus-based expression systems. For expression of DITHP in bacteria, cDNA is subcloned into an appropriate vector containing an antibiotic resistance gene and an inducible promoter that directs high levels of cDNA transcription. Examples of such promoters include, but are not limited to, the trp-lac (tac) hybrid promoter and the T5 or T7 bacteriophage promoter in conjunction with the lac operator regulatory element. Recombinant vectors are transformed into suitable bacterial hosts, e.g., BL21(DE3). Antibiotic resistant bacteria express DITHP upon induction with isopropyl beta-D-thiogalactopyranoside (IPTG). Expression of DITHP in eukaryotic cells is achieved by infecting insect or mammalian cell lines with recombinant Autographa californica nuclear polyhedrosis virus (AcMNPV), commonly known as baculovirus. The nonessential polyhedrin gene of baculovirus is replaced with cDNA encoding DITHP by either homologous recombination or bacterial-mediated transposition involving transfer plasmid intermediates. Viral infectivity is maintained and the strong polyhedrin promoter drives high levels of cDNA transcription. Recombinant baculovirus is used to infect Spodoptera frugiperda (Sf9) insect cells in most cases, or human hepatocytes, in some cases. Infection of the latter requires additional genetic modifications to baculovirus. (See, e.g., Engelhard, supra; and Sandig, supra.)
In most expression systems, DITHP is synthesized as a fusion protein with, e.g., glutathione S-transferase (GST) or a peptide epitope tag, such as FLAG or 6-His, permitting rapid, single-step, affinity-based purification of recombinant fusion protein from crude cell lysates. GST, a 26-kilodalton enzyme from Schistosoma japonicum, enables the purification of fusion proteins on immobilized glutathione under conditions that maintain protein activity and antigenicity (Amersham Pharmacia Biotech). Following purification, the GST moiety can be proteolytically cleaved from DITHP at specifically engineered sites. FLAG, an 8-amino acid peptide, enables immunoaffinity purification using commercially available monoclonal and polyclonal anti-FLAG antibodies (Eastman Kodak Company, Rochester NY). 6-His, a stretch of six consecutive histidine residues, enables purification on metal-chelate resins (QIAGEN). Methods for protein expression and purification are discussed in Ausubel (1995, supra, ch. 10 and 16). Purified DITHP obtained by these methods can be used directly in the activity assays described below.
XIV. Demonstration of DITHP Activity
DITHP activity is demonstrated through a variety of specific assays, some of which are outlined below.
Oxidoreductase activity of DITHP is measured by the increase in absorbance of the NAD(P)H coenzyme at 340 nm for the measurement of oxidation activity, or the decrease in absorbance of the NAD(P)H coenzyme at 340 nm for the measurement of reduction activity (Dalziel, K. (1963) J. Biol. Chem. 238:2850-2858). One of three substrates may be used: Asn-βGal, biocytidine, or ubiquinone-10. The respective subunits of the enzyme reaction, for example, cytochrome c1b oxidoreductase and cytochrome c, are reconstituted. The reaction mixture contains a) 1-2 mg/ml DITHP; and b) 15 mM substrate, 2.4 mM NAD(P)+ in 0.1 M phosphate buffer, pH 7.1 (oxidation reaction), or 2.0 mM NAD(P)H in 0.1 M Na2HPO4 buffer, pH 7.4 (reduction reaction); in a total volume of 0.1 ml. Changes in absorbance at 340 nm (A340) are measured at 23.5 °C using a recording spectrophotometer (Shimadzu Scientific Instruments, Inc., Pleasanton CA). The amount of NAD(P)H is stoichiometrically equivalent to the amount of substrate initially present, and the change in A340 is a direct measure of the amount of NAD(P)H produced: ΔA340 = 6220 × [NAD(P)H], where [NAD(P)H] is the molar concentration and 6220 M^-1 cm^-1 is the molar absorptivity of NAD(P)H at 340 nm for a 1-cm light path. The oxidoreductase activity of DITHP is proportional to the amount of NAD(P)H present in the assay.
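Converting a measured ΔA340 into an amount of NAD(P)H is a Beer-Lambert calculation. This sketch uses the widely tabulated NAD(P)H molar absorptivity at 340 nm, 6,220 M^-1 cm^-1. The path length is a parameter because a 0.1 ml assay may be read in a micro-cuvette shorter than 1 cm; the 1 cm default is an assumption. The function names are hypothetical.

```python
NADH_EPSILON_340 = 6220.0  # molar absorptivity of NAD(P)H at 340 nm, M^-1 cm^-1


def nadh_molar(delta_a340, path_cm=1.0):
    """NAD(P)H concentration (M) from an absorbance change, by Beer-Lambert."""
    return delta_a340 / (NADH_EPSILON_340 * path_cm)


def nadh_nmol(delta_a340, volume_ml=0.1, path_cm=1.0):
    """Amount of NAD(P)H (nmol) in the assay volume."""
    return nadh_molar(delta_a340, path_cm) * (volume_ml / 1000.0) * 1e9


print(nadh_molar(0.311))  # about 5.0e-05 M, i.e. 50 uM
print(nadh_nmol(0.311))   # about 5.0 nmol in 0.1 ml
```

Dividing by the elapsed time gives a rate in nmol/min, which can then be normalized to the amount of DITHP in the reaction.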
Transferase activity of DITHP is measured through assays such as a methyl transferase assay in which the transfer of radiolabeled methyl groups between a donor substrate and an acceptor substrate is measured (Bokar, J.A. et al. (1994) J. Biol. Chem. 269:17697-17704). Reaction mixtures (50 μl final volume) contain 15 mM HEPES, pH 7.9, 1.5 mM MgCl2, 10 mM dithiothreitol, 3% polyvinylalcohol, 1.5 μCi [methyl-3H]AdoMet (0.375 μM AdoMet) (DuPont-NEN), 0.6 μg DITHP, and acceptor substrate (0.4 μg [35S]RNA or 6-mercaptopurine (6-MP) to 1 mM final concentration). Reaction mixtures are incubated at 30 °C for 30 minutes, then 65 °C for 5 minutes. The products are separated by chromatography or electrophoresis, and the level of methyl transferase activity is determined by quantification of methyl-3H recovery.
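Quantifying methyl-3H recovery requires the specific activity of the AdoMet pool. The reaction above contains 0.375 μM × 50 μl = 18.75 pmol AdoMet carrying 1.5 μCi, or 80 Ci/mmol. The sketch below assumes that 0.375 μM is the total (labeled plus unlabeled) AdoMet concentration and that product counts have already been corrected to dpm for counting efficiency. The function names are hypothetical.

```python
DPM_PER_MICROCURIE = 2.22e6  # 1 uCi = 2.22 x 10^6 disintegrations per minute


def specific_activity_ci_per_mmol(microcuries, conc_um, volume_ul):
    """Specific activity of the AdoMet pool; uM x ul = pmol."""
    pmol = conc_um * volume_ul
    return microcuries / pmol * 1000.0  # uCi/pmol -> Ci/mmol


def methyl_groups_pmol(dpm, microcuries, conc_um, volume_ul):
    """Convert product dpm into pmol of methyl groups transferred."""
    total_dpm = microcuries * DPM_PER_MICROCURIE
    total_pmol = conc_um * volume_ul
    return dpm / total_dpm * total_pmol


print(specific_activity_ci_per_mmol(1.5, 0.375, 50))  # about 80 Ci/mmol
print(methyl_groups_pmol(177600, 1.5, 0.375, 50))     # about 1.0 pmol
```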
DITHP hydrolase activity is measured by the hydrolysis of appropriate synthetic peptide substrates conjugated with various chromogenic molecules, in which the degree of hydrolysis is quantified by spectrophotometric absorbance (or fluorescence) of the released chromophore. (Beynon, R.J. and J.S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University Press, New York NY, pp. 25-55.) Peptide substrates are designed according to the category of protease activity as endopeptidase (serine, cysteine, aspartic proteases), aminopeptidase (leucine aminopeptidase), or carboxypeptidase (carboxypeptidase A and B, procollagen C-proteinase).
DITHP isomerase activity, such as peptidyl prolyl cis/trans isomerase activity, can be assayed by an enzyme assay described by Rahfeld, J.U. et al. (1994) FEBS Lett. 352:180-184. The assay is performed at 10 °C in 35 mM HEPES buffer, pH 7.8, containing chymotrypsin (0.5 mg/ml) and DITHP at a variety of concentrations. Under these assay conditions, the substrate, Suc-Ala-Xaa-Pro-Phe-4-NA, is in equilibrium with respect to the prolyl bond, with 80-95% in trans and 5-20% in cis conformation. An aliquot (2 μl) of the substrate dissolved in dimethyl sulfoxide (10 mg/ml) is added to the reaction mixture described above. Only the trans isomer of the substrate is cleaved by chymotrypsin: the trans fraction is cleaved almost immediately, and the remaining cis fraction is cleaved only as it is converted to trans. Thus, as the substrate is isomerized by DITHP, it is cleaved by chymotrypsin to produce 4-nitroanilide, which is detected by its absorbance at 390 nm. 4-Nitroanilide appears in a time-dependent and a DITHP concentration-dependent manner.
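The time-dependent appearance of 4-nitroanilide after the initial burst is usually treated as a first-order process, A(t) = A_final - (A_final - A0) × exp(-k t). The sketch below estimates k by a least-squares line through the origin on the log-transformed data. Under the common assumption that substrate is well below K_M, k measured at several DITHP concentrations (minus the uncatalyzed rate) would rise linearly with the slope k_cat/K_M. The function name and synthetic data are illustrative only.

```python
import math


def first_order_rate(times, absorbances, a_final):
    """Estimate k (per time unit) from A(t) = A_final - (A_final - A0) * exp(-k t).

    Fits ln((A_final - A) / (A_final - A0)) = -k (t - t0) by least squares
    through the origin. All readings must be below a_final.
    """
    t0, a0 = times[0], absorbances[0]
    xs = [t - t0 for t in times]
    ys = [math.log((a_final - a) / (a_final - a0)) for a in absorbances]
    return -sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Synthetic trace with k = 0.05 per second, from A0 = 0.2 toward A_final = 1.0
ts = [0, 10, 20, 30, 40]
trace = [1.0 - 0.8 * math.exp(-0.05 * t) for t in ts]
print(first_order_rate(ts, trace, a_final=1.0))  # about 0.05
```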
An assay for DITHP activity associated with growth and development measures cell proliferation as the amount of newly initiated DNA synthesis in Swiss mouse 3T3 cells. A plasmid containing polynucleotides encoding DITHP is transfected into quiescent 3T3 cultured cells using methods well known in the art. The transiently transfected cells are then incubated in the presence of [3H]thymidine, a radioactive DNA precursor. Where applicable, varying amounts of DITHP ligand are added to the transfected cells. Incorporation of [3H]thymidine into acid-precipitable DNA is measured over an appropriate time interval, and the amount incorporated is directly proportional to the amount of newly synthesized DNA.
Growth factor activity of DITHP is measured by the stimulation of DNA synthesis in Swiss mouse 3T3 cells (McKay, I. and I. Leigh, eds. (1993) Growth Factors: A Practical Approach, Oxford University Press, New York NY). Initiation of DNA synthesis indicates the cells' entry into the mitotic cycle and their commitment to undergo later division. 3T3 cells are competent to respond to most growth factors, not only those that are mitogenic, but also those that are involved in embryonic induction. This competence is possible because the in vivo specificity demonstrated by some growth factors is not necessarily inherent but is determined by the responding tissue. In this assay, varying amounts of DITHP are added to quiescent 3T3 cultured cells in the presence of [3H]thymidine, a radioactive DNA precursor. DITHP for this assay can be obtained by recombinant means or from biochemical preparations. Incorporation of [3H]thymidine into acid-precipitable DNA is measured over an appropriate time interval, and the amount incorporated is directly proportional to the amount of newly synthesized DNA. A linear dose-response curve over at least a hundred-fold DITHP concentration range is indicative of growth factor activity. One unit of activity per milliliter is defined as the concentration of DITHP producing a 50% response level, where 100% represents maximal incorporation of [3H]thymidine into acid-precipitable DNA.
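Locating the 50% response level that defines one unit of activity per milliliter can be sketched by interpolating the dose-response data on a log-concentration axis. The sketch makes two assumptions: the responses are background-corrected [3H]thymidine counts, and 100% is taken as the largest observed response. The function name and data are hypothetical.

```python
import math


def half_max_concentration(concentrations, responses):
    """Concentration giving 50% of the maximal response (log-linear interpolation).

    responses are background-corrected [3H]thymidine counts; 100% is taken
    to be the largest observed response (an assumption).
    """
    top = max(responses)
    points = sorted(zip(concentrations, (100.0 * r / top for r in responses)))
    for (c1, p1), (c2, p2) in zip(points, points[1:]):
        if p1 <= 50.0 <= p2 and p2 > p1:
            frac = (50.0 - p1) / (p2 - p1)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    raise ValueError("the 50% response level is not bracketed by the data")


# Hypothetical DITHP concentrations (ng/ml) and responses
print(half_max_concentration([1, 10, 100, 1000], [10, 30, 70, 100]))  # about 31.6 ng/ml
```

The same calculation applies to the leukocyte proliferation assay described next, which uses the same definition of a unit.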
Alternatively, an assay for cytokine activity of DITHP measures the proliferation of leukocytes. In this assay, the amount of tritiated thymidine incorporated into newly synthesized DNA is used to estimate proliferative activity. Varying amounts of DITHP are added to cultured leukocytes, such as granulocytes, monocytes, or lymphocytes, in the presence of [3H]thymidine, a radioactive DNA precursor. DITHP for this assay can be obtained by recombinant means or from biochemical preparations. Incorporation of [3H]thymidine into acid-precipitable DNA is measured over an appropriate time interval, and the amount incorporated is directly proportional to the amount of newly synthesized DNA. A linear dose-response curve over at least a hundred-fold DITHP concentration range is indicative of DITHP activity. One unit of activity per milliliter is conventionally defined as the concentration of DITHP producing a 50% response level, where 100% represents maximal incorporation of [3H]thymidine into acid-precipitable DNA.
An alternative assay for DITHP cytokine activity utilizes a Boyden micro chamber (Neuroprobe, Cabin John MD) to measure leukocyte chemotaxis (Vicari, supra). In this assay, about 10^5 migratory cells such as macrophages or monocytes are placed in cell culture media in the upper compartment of the chamber. Varying dilutions of DITHP are placed in the lower compartment. The two compartments are separated by a 5 or 8 micron pore polycarbonate filter (Nucleopore, Pleasanton CA). After incubation at 37 °C for 80 to 120 minutes, the filters are fixed in methanol and stained with appropriate labeling agents. Cells which migrate to the other side of the filter are counted using standard microscopy. The chemotactic index is calculated by dividing the number of migratory cells counted when DITHP is present in the lower compartment by the number of migratory cells counted when only media is present in the lower compartment. The chemotactic index is proportional to the activity of DITHP.
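The chemotactic index reduces to a ratio of migrated-cell counts. In this sketch, counts from replicate wells or microscope fields are averaged before taking the ratio, which is an assumed convention; the counts are made up.

```python
from statistics import mean


def chemotactic_index(counts_with_dithp, counts_medium_only):
    """Mean migrated cells with DITHP in the lower well / mean with medium only."""
    baseline = mean(counts_medium_only)
    if baseline == 0:
        raise ValueError("no migration in medium-only controls; index undefined")
    return mean(counts_with_dithp) / baseline


print(chemotactic_index([30, 34, 32], [8, 8, 8]))  # 4.0
```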
Alternatively, cell lines or tissues transformed with a vector containing dithp can be assayed for DITHP activity by immunoblotting. Cells are denatured in SDS in the presence of β-mercaptoethanol, nucleic acids are removed by ethanol precipitation, and proteins are purified by acetone precipitation. Pellets are resuspended in 20 mM Tris buffer at pH 7.5 and incubated with Protein G-Sepharose pre-coated with an antibody specific for DITHP. After washing, the Sepharose beads are boiled in electrophoresis sample buffer, and the eluted proteins are subjected to SDS-PAGE. The proteins separated by SDS-PAGE are transferred to a nitrocellulose membrane for immunoblotting, and the DITHP activity is assessed by visualizing and quantifying bands on the blot using the antibody specific for DITHP as the primary antibody and 125I-labeled IgG specific for the primary antibody as the secondary antibody.
DITHP kinase activity is measured by phosphorylation of a protein substrate using γ-labeled [32P]-ATP and quantitation of the incorporated radioactivity using a radioisotope counter. DITHP is incubated with the protein substrate, [32P]-ATP, and an appropriate kinase buffer. The [32P] incorporated into the product is separated from free [32P]-ATP by electrophoresis and the incorporated [32P] is counted. The amount of [32P] recovered is proportional to the kinase activity of DITHP in the assay. A determination of the specific amino acid residue phosphorylated is made by phosphoamino acid analysis of the hydrolyzed protein.
In the alternative, DITHP activity is measured by the increase in cell proliferation resulting from transformation of a mammalian cell line such as COS7, HeLa or CHO with a eukaryotic expression vector encoding DITHP. Eukaryotic expression vectors are commercially available, and the techniques to introduce them into cells are well known to those skilled in the art. The cells are incubated for 48-72 hours after transformation under conditions appropriate for the cell line to allow expression of DITHP. Phase microscopy is then used to compare the mitotic index of transformed versus control cells. An increase in the mitotic index indicates DITHP activity.
In a further alternative, an assay for DITHP signaling activity is based upon the ability of GPCR family proteins to modulate G protein-activated second messenger signal transduction pathways (e.g., cAMP; Gaudin, P. et al. (1998) J. Biol. Chem. 273:4990-4996). A plasmid encoding full-length DITHP is transfected into a mammalian cell line (e.g., Chinese hamster ovary (CHO) or human embryonic kidney (HEK-293) cell lines) using methods well known in the art. Transfected cells are grown in 12-well trays in culture medium for 48 hours; the culture medium is then discarded, and the attached cells are gently washed with PBS. The cells are then incubated in culture medium with or without ligand for 30 minutes, after which the medium is removed and the cells are lysed by treatment with 1 M perchloric acid. The cAMP levels in the lysate are measured by radioimmunoassay using methods well known in the art. Changes in the levels of cAMP in the lysate from cells exposed to ligand, compared to those without ligand, are proportional to the amount of DITHP present in the transfected cells.
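For the proliferation assay, the mitotic index is the fraction of cells in mitosis. The Python sketch below pools counts across microscope fields for each condition, which is an assumed counting scheme, and reports the DITHP-to-control ratio. The function names and counts are hypothetical.

```python
def pooled_mitotic_index(fields):
    """fields: (mitotic_cells, total_cells) counted in each microscope field."""
    mitotic = sum(m for m, _ in fields)
    total = sum(t for _, t in fields)
    return mitotic / total


def mitotic_index_ratio(dithp_fields, control_fields):
    """Mitotic index of DITHP-expressing cells relative to control cells."""
    return pooled_mitotic_index(dithp_fields) / pooled_mitotic_index(control_fields)


# Hypothetical counts from two fields per condition.
ratio = mitotic_index_ratio([(12, 200), (18, 200)], [(6, 200), (9, 200)])
```

A ratio above 1 suggests increased proliferation. With small counts, a contingency-table test on the pooled mitotic and non-mitotic totals would indicate whether the difference is meaningful.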
Alternatively, an assay for DITHP protein phosphatase activity measures the hydrolysis of p-nitrophenyl phosphate (PNPP). DITHP is incubated together with PNPP in HEPES buffer, pH 7.5, in the presence of 0.1% β-mercaptoethanol at 37 °C for 60 min. The reaction is stopped by the addition of 6 ml of 10 N NaOH, and the increase in light absorbance of the reaction mixture at 410 nm resulting from the hydrolysis of PNPP is measured using a spectrophotometer. The increase in light absorbance is proportional to the phosphatase activity of DITHP in the assay (Diamond, R.H. et al. (1994) Mol. Cell. Biol. 14:3752-3762).
An alternative assay measures DITHP-mediated G-protein signaling activity by monitoring the mobilization of Ca++ as an indicator of stimulation of the signal transduction pathway. (See, e.g., Grynkiewicz, G. et al. (1985) J. Biol. Chem. 260:3440; McColl, S. et al. (1993) J. Immunol. 150:4550-4555; and Aussel, C. et al. (1988) J. Immunol. 140:215-220.) The assay requires preloading neutrophils or T cells with a fluorescent dye such as FURA-2 or BCECF (Universal Imaging Corp., Westchester PA) whose fluorescence characteristics are altered by Ca++ binding. When the cells are exposed to one or more activating stimuli, either artificially (e.g., anti-CD3 antibody ligation of the T cell receptor) or physiologically (e.g., by allogeneic stimulation), Ca++ flux takes place. This flux can be observed and quantified by assaying the cells in a fluorometer or fluorescence-activated cell sorter. Measurements of Ca++ flux are compared between cells in their normal state and those transfected with DITHP. Increased Ca++ mobilization attributable to increased DITHP concentration is proportional to DITHP activity.
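With fura-2, the 340/380 nm excitation ratio can be converted to free Ca++ concentration using the calibration equation from Grynkiewicz et al. (1985), cited above. The Python sketch below applies only to ratiometric fura-2, not to BCECF. It uses the fura-2 Kd of 224 nM reported in that paper. The calibration constants in the example are hypothetical and must be measured for each setup.

```python
def fura2_free_calcium_nM(f340, f380, r_min, r_max, sf2_over_sb2, kd_nM=224.0):
    """Free [Ca2+] from fura-2 fluorescence excited at 340 and 380 nm.

    Grynkiewicz et al. (1985): [Ca2+] = Kd * (R - Rmin) / (Rmax - R) * (Sf2 / Sb2),
    where R = F340 / F380, Rmin and Rmax are the ratios of Ca2+-free and
    Ca2+-saturated dye, and Sf2 / Sb2 is the 380 nm signal of free over bound dye.
    """
    r = f340 / f380
    if not r_min < r < r_max:
        raise ValueError("ratio lies outside the calibrated Rmin..Rmax range")
    return kd_nM * (r - r_min) / (r_max - r) * sf2_over_sb2


# Hypothetical calibration constants; determine them for each instrument and cell type.
calibration = dict(r_min=0.3, r_max=6.0, sf2_over_sb2=5.0)
resting = fura2_free_calcium_nM(1000.0, 1000.0, **calibration)
stimulated = fura2_free_calcium_nM(2000.0, 1000.0, **calibration)
```

Comparing the peak stimulated value in DITHP-transfected cells with the value in untransfected cells gives the Ca++ mobilization difference described above.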
DITHP transport activity is assayed by measuring uptake of labeled substrates into Xenopus laevis oocytes. Oocytes at stages V and VI are injected with DITHP mRNA (10 ng per oocyte) and incubated for 3 days at 18 °C in OR2 medium (82.5 mM NaCl, 2.5 mM KCl, 1 mM CaCl2, 1 mM MgCl2, 1 mM Na2HPO4, 5 mM HEPES, 3.8 mM NaOH, 50 μg/ml gentamicin, pH 7.8) to allow expression of DITHP protein. Oocytes are then transferred to standard uptake medium (100 mM NaCl, 2 mM KCl, 1 mM CaCl2, 1 mM MgCl2, 10 mM HEPES/Tris, pH 7.5). Uptake of various substrates (e.g., amino acids, sugars, drugs, ions, and neurotransmitters) is initiated by adding labeled substrate (e.g., radiolabeled with 3H or fluorescently labeled with rhodamine) to the oocytes. After incubation for 30 minutes, uptake is terminated by washing the oocytes three times in Na+-free medium; the incorporated label is then measured and compared with controls. DITHP transport activity is proportional to the level of internalized labeled substrate.
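The comparison with controls can be expressed as a net DITHP-dependent uptake and a fold increase over control. The Python sketch below assumes that the controls are water-injected or uninjected oocytes from the same batch; this is an assumption, because the text does not specify the control. It also assumes that uptake has already been converted to a per-oocyte amount. Names and values are hypothetical.

```python
from statistics import mean


def dithp_dependent_uptake(dithp_oocytes, control_oocytes):
    """Return (net uptake, fold over control) from per-oocyte uptake values.

    Values are in whatever unit the label is read in (e.g. pmol per oocyte
    from scintillation counts); control oocytes are assumed to be
    water-injected or uninjected oocytes from the same batch.
    """
    dithp_mean = mean(dithp_oocytes)
    control_mean = mean(control_oocytes)
    return dithp_mean - control_mean, dithp_mean / control_mean


net, fold = dithp_dependent_uptake([5.2, 4.8, 5.0], [1.1, 0.9, 1.0])
```

Subtracting the control mean removes endogenous oocyte transport, which can be substantial for some amino acids and sugars.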
DITHP transferase activity is demonstrated by a test for galactosyltransferase activity. This activity can be determined by measuring the transfer of radiolabeled galactose from UDP-galactose to a GlcNAc-terminated oligosaccharide chain (Kolbinger, F. et al. (1998) J. Biol. Chem. 273:58-65).
The sample is incubated with 14 μl of assay stock solution (180 mM sodium cacodylate, pH 6.5, 1 mg/ml bovine serum albumin, 0.26 mM UDP-galactose, 2 μl of UDP-[3H]galactose), 1 μl of MnCl2 (500 mM), and 2.5 μl of GlcNAcβO-(CH2)8-CO2Me (37 mg/ml in dimethyl sulfoxide) for 60 minutes at 37 °C. The reaction is quenched by the addition of 1 ml of water and loaded on a C18 Sep-Pak cartridge (Waters), and the column is washed twice with 5 ml of water to remove unreacted UDP-[3H]galactose. The [3H]galactosylated GlcNAcβO-(CH2)8-CO2Me remains bound to the column during the water washes and is eluted with 5 ml of methanol. Radioactivity in the eluted material is measured by liquid scintillation counting and is proportional to galactosyltransferase activity in the starting sample.
In the alternative, DITHP induction by heat or toxins may be demonstrated using primary cultures of human fibroblasts or human cell lines such as CCL-13, HEK293, or HEP G2 (ATCC). To heat-induce DITHP expression, aliquots of cells are incubated at 42 °C for 15, 30, or 60 minutes. Control aliquots are incubated at 37 °C for the same time periods. To induce DITHP expression by toxins, aliquots of cells are treated with 100 μM arsenite or 20 mM azetidine-2-carboxylic acid for 0, 3, 6, or 12 hours. After exposure to heat, arsenite, or the amino acid analogue, samples of the treated cells are harvested, and cell lysates are prepared for western blot analysis. Cells are lysed in lysis buffer containing 1% Nonidet P-40, 0.15 M NaCl, 50 mM Tris-HCl, 5 mM EDTA, 2 mM N-ethylmaleimide, 2 mM phenylmethylsulfonyl fluoride, 1 mg/ml leupeptin, and 1 mg/ml pepstatin. Twenty micrograms of the cell lysate are separated on an 8% SDS-PAGE gel and transferred to a membrane.
After blocking with 5% nonfat dry milk/phosphate-buffered saline for 1 h, the membrane is incubated overnight at 4 °C, or at room temperature for 2-4 hours, with a 1:1000 dilution of anti-DITHP serum in 2% nonfat dry milk/phosphate-buffered saline. The membrane is then washed and incubated with a 1:1000 dilution of horseradish peroxidase-conjugated goat anti-rabbit IgG in 2% dry milk/phosphate-buffered saline. After washing with 0.1% Tween 20 in phosphate-buffered saline, the DITHP protein is detected by chemiluminescence and compared to controls.
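The comparison to controls can be expressed as fold induction at each time point. The Python sketch below assumes that band intensities come from densitometry of a single blot and that each treated sample is paired with its 37 °C control for the same period. Both assumptions, the function name, and the values are illustrative.

```python
def fold_induction(treated, control):
    """Band intensity of treated cells divided by the matched control, per time point.

    treated and control map a time point (e.g. minutes at 42 °C) to a
    densitometry value in arbitrary units from the same blot.
    """
    missing = set(treated) - set(control)
    if missing:
        raise KeyError(f"no control for time points {sorted(missing)}")
    return {t: treated[t] / control[t] for t in sorted(treated)}


# Hypothetical densitometry values for the heat-shock time course.
heat = {15: 1500.0, 30: 3000.0, 60: 4500.0}
control_37 = {15: 1000.0, 30: 1000.0, 60: 1500.0}
induction = fold_induction(heat, control_37)
```

These ratios are only meaningful if the chemiluminescent signal is within the linear range of the detector for every lane being compared.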
Alternatively, DITHP protease activity is measured by the hydrolysis of appropriate synthetic peptide substrates conjugated with various chromogenic molecules. The degree of hydrolysis is quantified by the spectrophotometric absorbance (or fluorescence) of the released chromophore (Beynon, R.J. and J.S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University Press, New York, NY, pp. 25-55). Peptide substrates are designed according to the category of protease activity: endopeptidase (serine, cysteine, or aspartic proteases, or metalloproteases), aminopeptidase (leucine aminopeptidase), or carboxypeptidase (carboxypeptidases A and B, procollagen C-proteinase). Commonly used chromogens are 2-naphthylamine, 4-nitroaniline, and furylacrylic acid. Assays are performed at ambient temperature and contain an aliquot of the enzyme and the appropriate substrate in a suitable buffer. Reactions are carried out in an optical cuvette, and the increase or decrease in absorbance of the chromogen released during hydrolysis of the peptide substrate is measured. The change in absorbance is proportional to the DITHP protease activity in the assay.
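The protease rate is usually taken from the initial, linear portion of the absorbance trace and converted to product concentration with the Beer-Lambert law. The Python sketch below fits a least-squares slope. The molar absorptivity used for 4-nitroaniline is an illustrative value that should be checked for the actual wavelength and buffer, and the readings are hypothetical.

```python
def initial_rate_uM_per_min(times_min, absorbances, epsilon_per_M_cm, path_cm=1.0):
    """Least-squares slope of absorbance versus time, converted to µM product/min."""
    n = len(times_min)
    if n < 2 or n != len(absorbances):
        raise ValueError("need at least two paired time/absorbance readings")
    t_mean = sum(times_min) / n
    a_mean = sum(absorbances) / n
    sxy = sum((t - t_mean) * (a - a_mean) for t, a in zip(times_min, absorbances))
    sxx = sum((t - t_mean) ** 2 for t in times_min)
    slope = sxy / sxx  # absorbance units per minute
    # Beer-Lambert: concentration (M) = A / (epsilon * path length)
    return slope / (epsilon_per_M_cm * path_cm) * 1e6


# Illustrative molar absorptivity for released 4-nitroaniline near 405 nm;
# verify the value for the actual wavelength and buffer.
EPSILON_4NA = 9960.0
rate = initial_rate_uM_per_min([0, 1, 2, 3], [0.100, 0.150, 0.200, 0.250], EPSILON_4NA)
```

Only time points from the linear phase should be passed in. Once substrate depletion bends the curve, the slope underestimates the initial rate.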
In the alternative, an assay for DITHP protease activity takes advantage of fluorescence resonance energy transfer (FRET), which occurs when a donor and an acceptor fluorophore with appropriate spectral overlap are in close proximity. A flexible peptide linker containing a cleavage site specific for DITHP is fused between a red-shifted variant (RSGFP4) and a blue variant (BFP5) of Green Fluorescent Protein. This fusion protein has spectral properties suggesting that energy transfer occurs from BFP5 to RSGFP4. When the fusion protein is incubated with DITHP, the substrate is cleaved, and the two fluorescent proteins dissociate. This is accompanied by a marked decrease in energy transfer, which is quantified by comparing the emission spectra before and after the addition of DITHP (Mitra, R.D. et al. (1996) Gene 173:13-17). This assay can also be performed in living cells. In this case, the fluorescent substrate protein is expressed constitutively in cells, and DITHP is introduced on an inducible vector so that FRET can be monitored in the presence and absence of DITHP (Sagot, I. et al. (1999) FEBS Lett. 447:53-57).
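One way to turn the before-and-after spectra into an extent of cleavage is to use the acceptor (RSGFP4) emission measured under donor (BFP5) excitation. This signal falls as FRET is lost. The Python sketch below assumes that the total amount of fusion protein is constant and that a fully cleaved reference, such as an exhaustive digest, is available. Under those assumptions, the signal is a linear mix of intact and cleaved contributions. The reference approach and names are hypothetical.

```python
def fraction_cleaved(acceptor_before, acceptor_after, acceptor_fully_cleaved):
    """Fraction of fusion substrate cleaved, from acceptor (RSGFP4) emission
    recorded under donor (BFP5) excitation.

    Assumes the total amount of fusion protein in the cuvette is constant, so the
    signal is a linear mix of intact and cleaved contributions:
        I = (1 - f) * I_intact + f * I_cleaved
    """
    span = acceptor_before - acceptor_fully_cleaved
    if span <= 0:
        raise ValueError("no FRET window: cleaved signal is not below intact signal")
    f = (acceptor_before - acceptor_after) / span
    return min(max(f, 0.0), 1.0)  # clamp readings that fall outside the reference range


f = fraction_cleaved(acceptor_before=2000.0, acceptor_after=1250.0, acceptor_fully_cleaved=500.0)
```

Tracking this fraction over time gives a cleavage rate that can be compared between DITHP preparations.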
A method to determine the nucleic acid binding activity of DITHP involves a polyacrylamide gel mobility-shift assay. In preparation for this assay, DITHP is expressed by transforming a mammalian cell line such as COS7, HeLa or CHO with a eukaryotic expression vector containing DITHP cDNA. The cells are incubated for 48-72 hours after transformation under conditions appropriate for the cell line to allow expression and accumulation of DITHP. Extracts containing solubilized proteins can be prepared from cells expressing DITHP by methods well known in the art. Portions of the extract containing DITHP are added to [32P]-labeled RNA or DNA. Radioactive nucleic acid can be synthesized in vitro by techniques well known in the art. The mixtures are incubated at 25 °C in the presence of RNase and DNase inhibitors under buffered conditions for 5-10 minutes. After incubation, the samples are analyzed by polyacrylamide gel electrophoresis followed by autoradiography. The presence of a band of retarded mobility on the autoradiogram indicates the formation of a complex between DITHP and the radioactive transcript. A band of similar mobility will not be present in samples prepared using control extracts from untransformed cells.
In the alternative, a method to determine the methylase activity of DITHP measures transfer of radiolabeled methyl groups between a donor substrate and an acceptor substrate. Reaction mixtures (50 μl final volume) contain 15 mM HEPES, pH 7.9, 1.5 mM MgCl2, 10 mM dithiothreitol, 3% polyvinylalcohol, 1.5 μCi [methyl-3H]AdoMet (0.375 μM AdoMet) (DuPont-NEN), 0.6 μg DITHP, and acceptor substrate (e.g., 0.4 μg [35S]RNA, or 6-mercaptopurine (6-MP) to 1 mM final concentration). Reaction mixtures are incubated at 30 °C for 30 minutes, then at 65 °C for 5 minutes. Analysis of [methyl-3H]RNA is as follows: 1) 50 μl of 2x loading buffer (20 mM Tris-HCl, pH 7.6, 1 M LiCl, 1 mM EDTA, 1% sodium dodecyl sulphate (SDS)) and 50 μl of oligo d(T)-cellulose (10 mg/ml in 1x loading buffer) are added to the reaction mixture and incubated at ambient temperature with shaking for 30 minutes; 2) reaction mixtures are transferred to a 96-well filtration plate attached to a vacuum apparatus; 3) each sample is washed sequentially with three 2.4 ml aliquots of 1x oligo d(T) loading buffer containing 0.5% SDS, 0.1% SDS, or no SDS; and 4) RNA is eluted with 300 μl of water into a 96-well collection plate and transferred to scintillation vials containing liquid scintillant, and radioactivity is determined. Analysis of [methyl-3H]6-MP is as follows: 1) 500 μl of 0.5 M borate buffer, pH 10.0, and then 2.5 ml of 20% (v/v) isoamyl alcohol in toluene are added to the reaction mixtures; 2) the samples are mixed by vigorous vortexing for ten seconds; 3) after centrifugation at 700 × g for 10 minutes, 1.5 ml of the organic phase is transferred to scintillation vials containing 0.5 ml absolute ethanol and liquid scintillant, and radioactivity is determined; and 4) results are corrected for the extraction of 6-MP into the organic phase (approximately 41%).
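The reaction conditions above fix the specific radioactivity of the methyl donor: 1.5 μCi in 50 μl of 0.375 μM AdoMet, or 18.75 pmol. The Python sketch below uses this value to convert counts into pmol of methyl groups transferred and applies the ~41% extraction correction for 6-MP. It assumes counts are reported as dpm (already corrected for counting efficiency) and that a no-enzyme blank is used. The text does not state whether the 41% figure already accounts for counting only 1.5 ml of the organic phase, so check this before relying on absolute values. Function names and sample counts are hypothetical.

```python
DPM_PER_MICROCURIE = 2.22e6  # disintegrations per minute in 1 µCi


def adomet_specific_activity(microcuries, adomet_uM, volume_ul):
    """dpm per pmol of [methyl-3H]AdoMet in the reaction (µM x µl = pmol)."""
    return microcuries * DPM_PER_MICROCURIE / (adomet_uM * volume_ul)


def methyl_groups_transferred_pmol(sample_dpm, blank_dpm, specific_activity,
                                   recovery=0.41):
    """Background-subtracted dpm, corrected for recovery, converted to pmol.

    recovery=0.41 reflects the ~41% extraction of 6-MP into the organic phase;
    pass recovery=1.0 where no correction is applied.
    """
    net_dpm = max(sample_dpm - blank_dpm, 0.0)
    return net_dpm / recovery / specific_activity


# Reaction conditions from the text: 1.5 µCi at 0.375 µM AdoMet in 50 µl.
sa = adomet_specific_activity(1.5, 0.375, 50)  # 177,600 dpm per pmol
# Hypothetical counts for a 6-MP reaction and a no-enzyme blank.
pmol_6mp = methyl_groups_transferred_pmol(8282.0, 1000.0, sa)
```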
An assay for adhesion activity of DITHP measures the disruption of cytoskeletal filament networks upon overexpression of DITHP in cultured cell lines (Rezniczek, G.A. et al. (1998) J. Cell Biol. 141:209-225). cDNA encoding DITHP is subcloned into a mammalian expression vector that drives high levels of cDNA expression. This construct is transfected into cultured cells, such as rat kangaroo PtK2 or rat bladder carcinoma 804G cells. Actin filaments and intermediate filaments, such as keratin and vimentin, are visualized by immunofluorescence microscopy using antibodies and techniques well known in the art. The configuration and abundance of cytoskeletal filaments can be assessed and quantified using confocal imaging techniques. In particular, the bundling and collapse of cytoskeletal filament networks is indicative of DITHP adhesion activity.
Alternatively, an assay for DITHP activity measures the expression of DITHP on the cell surface. cDNA encoding DITHP is transfected into a non-leukocytic cell line. Cell surface proteins are labeled with biotin (de la Fuente, M.A. et al. (1997) Blood 90:2398-2405). Immunoprecipitations are performed using DITHP-specific antibodies, and immunoprecipitated samples are analyzed using SDS-PAGE and immunoblotting techniques. The ratio of labeled immunoprecipitant to unlabeled immunoprecipitant is proportional to the amount of DITHP expressed on the cell surface.
Alternatively, an assay for DITHP activity measures the amount of cell aggregation induced by overexpression of DITHP. In this assay, cultured cells such as NIH 3T3 are transfected with cDNA encoding DITHP contained within a suitable mammalian expression vector under control of a strong promoter. Cotransfection with cDNA encoding a fluorescent marker protein, such as Green Fluorescent Protein (CLONTECH), is useful for identifying stable transfectants. The amount of cell agglutination, or clumping, associated with transfected cells is compared with that associated with untransfected cells. The amount of cell agglutination is a direct measure of DITHP activity.
DITHP may recognize and precipitate antigen from serum. This activity can be measured by the quantitative precipitin reaction (Golub, E.S. et al. (1987) Immunology: A Synthesis, Sinauer Associates, Sunderland MA, pages 113-115). DITHP is isotopically labeled using methods known in the art. Various serum concentrations are added to constant amounts of labeled DITHP. DITHP-antigen complexes precipitate out of solution and are collected by centrifugation. The amount of precipitable DITHP-antigen complex is proportional to the amount of radioisotope detected in the precipitate, and is plotted against the serum concentration. Over a range of serum concentrations, a characteristic precipitation curve is obtained, in which the amount of precipitable DITHP-antigen complex initially increases proportionately with increasing serum concentration, peaks at the equivalence point, and then decreases proportionately with further increases in serum concentration. Thus, the amount of precipitable DITHP-antigen complex is a measure of DITHP activity, which is characterized by sensitivity to both limiting and excess quantities of antigen.
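Locating the equivalence point on the precipitation curve can be as simple as finding the serum concentration that gives the most precipitated radiolabel. The Python sketch below does this on a hypothetical, coarsely sampled curve. A finer serum series around the peak, or a fitted curve, would locate the equivalence point more precisely.

```python
def equivalence_point(serum_concentrations, precipitated_counts):
    """Serum concentration giving the most precipitated radiolabel (zone of equivalence)."""
    if len(serum_concentrations) != len(precipitated_counts) or not serum_concentrations:
        raise ValueError("need matching, non-empty concentration and count lists")
    peak_conc, _ = max(zip(serum_concentrations, precipitated_counts), key=lambda p: p[1])
    return peak_conc


# Hypothetical curve: antigen limiting at low serum, peak, then antigen excess.
serum = [1, 2, 4, 8, 16]
cpm = [200, 450, 900, 600, 250]
peak = equivalence_point(serum, cpm)
```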
A microtubule motility assay for DITHP measures motor protein activity. In this assay, recombinant DITHP is immobilized onto a glass slide or similar substrate. Taxol-stabilized bovine brain microtubules (commercially available) in a solution containing ATP and cytosolic extract are perfused onto the slide. Movement of microtubules driven by DITHP motor activity can be visualized and quantified using video-enhanced light microscopy and image analysis techniques. DITHP motor protein activity is directly proportional to the frequency and velocity of microtubule movement.
Alternatively, an assay for DITHP measures the formation of protein filaments in vitro. A solution of DITHP at a concentration greater than the "critical concentration" for polymer assembly is applied to carbon-coated grids. Appropriate nucleation sites may be supplied in the solution. The grids are negatively stained with 0.7% (w/v) aqueous uranyl acetate and examined by electron microscopy. The appearance of filaments of approximately 25 nm (microtubules), 8 nm (actin), or 10 nm (intermediate filaments) demonstrates protein activity.
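For the microtubule motility assay, velocity is commonly estimated by tracking a microtubule end from frame to frame in the recorded video. The Python sketch below assumes that tip positions have already been extracted in micrometres at a fixed frame interval. The tracking step itself is not shown, and the track is hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math


def gliding_velocity_um_per_s(track, frame_interval_s):
    """Mean speed of one microtubule from its tracked tip positions.

    track holds (x, y) positions in micrometres, one per video frame.
    """
    if len(track) < 2:
        raise ValueError("need positions from at least two frames")
    path_um = sum(math.dist(a, b) for a, b in zip(track, track[1:]))
    return path_um / (frame_interval_s * (len(track) - 1))


# Hypothetical track: the tip advances 0.5 µm per 1 s frame.
v = gliding_velocity_um_per_s([(0.0, 0.0), (0.3, 0.4), (0.6, 0.8), (0.9, 1.2)], 1.0)
```

Averaging over many microtubules, and counting the fraction that move at all, gives the velocity and frequency measures named in the assay.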
DITHP electron transfer activity is demonstrated by the oxidation or reduction of NAD(P). Substrates such as Asn-βGal, biocytidine, or ubiquinone-10 may be used. The reaction mixture contains 1-2 mg/ml DITHP, 15 mM substrate, and either 2.4 mM NAD(P)+ in 0.1 M phosphate buffer, pH 7.1 (oxidation reaction), or 2.0 mM NAD(P)H in 0.1 M buffer, pH 7.4 (reduction reaction), in a total volume of 0.1 ml. FAD may be included with NAD, according to methods well known in the art. Changes in absorbance are measured using a recording spectrophotometer. The amount of NAD(P)H is stoichiometrically equivalent to the amount of substrate initially present, and the change in absorbance at 340 nm is a direct measure of the amount of NAD(P)H produced: $\Delta A_{340} = 6220\,[\mathrm{NADH}]$, where 6220 M⁻¹ cm⁻¹ is the molar absorptivity of NADH at 340 nm and a 1 cm path length is assumed. DITHP activity is proportional to the amount of NAD(P)H present in the assay. An increase in absorbance at 340 nm due to NAD(P)H formation is a measure of oxidation activity, and a decrease in absorbance at 340 nm due to NAD(P)H consumption is a measure of reduction activity (Dalziel, K. (1963) J. Biol. Chem. 238:2850-2858).
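The Beer-Lambert conversion from absorbance change to amount of NAD(P)H can be sketched in Python as shown below. The sketch uses the widely cited molar absorptivity of NAD(P)H at 340 nm, 6220 M⁻¹ cm⁻¹, and the 0.1 ml reaction volume given in the text. The 1 cm path length is an assumption; small-volume microcuvettes may differ. The function names and ΔA values are hypothetical.

```python
NADH_EPSILON_340 = 6220.0  # M^-1 cm^-1, molar absorptivity of NAD(P)H at 340 nm


def nadph_change_uM(delta_a340, path_cm=1.0):
    """Change in NAD(P)H concentration (µM) from the change in A340."""
    return abs(delta_a340) / (NADH_EPSILON_340 * path_cm) * 1e6


def nadph_change_nmol(delta_a340, volume_ml=0.1, path_cm=1.0):
    """Amount of NAD(P)H formed or consumed in the reaction (µM x ml = nmol)."""
    return nadph_change_uM(delta_a340, path_cm) * volume_ml


formed = nadph_change_nmol(0.311)  # oxidation reaction: A340 rises
consumed = nadph_change_nmol(-0.311)  # reduction reaction: A340 falls
```

The sign of the raw ΔA340 indicates whether the reaction is an oxidation or a reduction. The magnitude gives the amount of cofactor converted.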
DITHP transcription factor activity is measured by its ability to stimulate transcription of a reporter gene (Liu, H.Y. et al. (1997) EMBO J. 16:5289-5298). The assay uses a well-characterized reporter gene construct, LexAop-LacZ, which consists of LexA DNA transcriptional control elements (LexAop) fused to sequences encoding the E. coli LacZ enzyme. The methods for constructing and expressing fusion genes, introducing them into cells, and measuring LacZ enzyme activity are well known to those skilled in the art. Sequences encoding DITHP are cloned into a plasmid that directs the synthesis of a fusion protein, LexA-DITHP, consisting of DITHP and a DNA binding domain derived from the LexA transcription factor. The resulting plasmid, encoding a LexA-DITHP fusion protein, is introduced into yeast cells along with a plasmid containing the LexAop-LacZ reporter gene. The amount of LacZ enzyme activity associated with LexA-DITHP-transformed cells, relative to control cells, is proportional to the amount of transcription stimulated by DITHP.
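LacZ reporter activity is often expressed in Miller units, 1000 × A420 / (t × V × OD600). Here t is the reaction time in minutes, V is the volume of culture assayed in ml, and OD600 is the culture density. This convention is not specified in the text. The Python sketch below applies it and compares LexA-DITHP cells with a control, which is assumed here to express the LexA DNA-binding domain alone. All readings are hypothetical.

```python
def miller_units(a420, minutes, culture_ml, od600):
    """Beta-galactosidase activity in Miller units: 1000 * A420 / (t * V * OD600)."""
    return 1000.0 * a420 / (minutes * culture_ml * od600)


def fold_activation(lexa_dithp_units, control_units):
    """Reporter activity of LexA-DITHP cells relative to control cells."""
    return lexa_dithp_units / control_units


# Hypothetical readings: 10 min o-nitrophenyl-beta-D-galactoside reaction on 0.1 ml of culture.
lexa_dithp = miller_units(a420=0.50, minutes=10, culture_ml=0.1, od600=1.0)
lexa_only = miller_units(a420=0.05, minutes=10, culture_ml=0.1, od600=1.0)
activation = fold_activation(lexa_dithp, lexa_only)
```

Normalizing to OD600 corrects for differences in cell number between cultures before the fold activation is computed.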
Chromatin activity of DITHP is demonstrated by measuring sensitivity to DNase I (Dawson, B.A. et al. (1989) J. Biol. Chem. 264:12830-12837). Samples are treated with DNase I, followed by insertion of a cleavable biotinylated nucleotide analog, 5-[(N-biotinamido)hexanoamido-ethyl-1,3-thiopropionyl-3-aminoallyl]-2'-deoxyuridine 5'-triphosphate, using nick-repair techniques well known to those skilled in the art. Following purification and digestion with EcoRI restriction endonuclease, biotinylated sequences are affinity-isolated by sequential binding to streptavidin and biotin-cellulose.
Another specific assay demonstrates the ion conductance capacity of DITHP using an electrophysiological assay. DITHP is expressed by transforming a mammalian cell line such as COS7, HeLa or CHO with a eukaryotic expression vector encoding DITHP. Eukaryotic expression vectors are commercially available, and the techniques to introduce them into cells are well known to those skilled in the art. A small amount of a second plasmid, which expresses any one of a number of marker genes such as β-galactosidase, is co-transformed into the cells in order to allow rapid identification of those cells which have taken up and expressed the foreign DNA. The cells are incubated for 48-72 hours after transformation under conditions appropriate for the cell line to allow expression and accumulation of DITHP and β-galactosidase. Transformed cells expressing β-galactosidase are stained blue when a suitable colorimetric substrate is added to the culture medium under conditions that are well known in the art. Stained cells are tested for differences in membrane conductance due to various ions by electrophysiological techniques that are well known in the art. Untransformed cells, and/or cells transformed with either vector sequences alone or β-galactosidase sequences alone, are used as controls and tested in parallel.
The contribution of DITHP to cation or anion conductance can be shown by incubating the cells with antibodies specific for DITHP. The antibodies bind to the extracellular side of DITHP, thereby blocking the pore in the ion channel and the associated conductance.
XV. Functional Assays
DITHP function is assessed by expressing dithp at physiologically elevated levels in mammalian cell culture systems. cDNA is subcloned into a mammalian expression vector containing a strong promoter that drives high levels of cDNA expression. Vectors of choice include pCMV SPORT (Life Technologies) and pCR3.1 (Invitrogen Corporation, Carlsbad CA), both of which contain the cytomegalovirus promoter. 5-10 μg of recombinant vector are transiently transfected into a human cell line, preferably of endothelial or hematopoietic origin, using either liposome formulations or electroporation. 1-2 μg of an additional plasmid containing sequences encoding a marker protein are co-transfected.
Expression of a marker protein provides a means to distinguish transfected cells from nontransfected cells and is a reliable predictor of cDNA expression from the recombinant vector. Marker proteins of choice include, e.g., Green Fluorescent Protein (GFP; CLONTECH), CD64, or a CD64-GFP fusion protein. Flow cytometry (FCM), an automated laser optics-based technique, is used to identify transfected cells expressing GFP or CD64-GFP and to evaluate the apoptotic state of the cells and other cellular properties.
FCM detects and quantifies the uptake of fluorescent molecules that diagnose events preceding or coincident with cell death. These events include changes in nuclear DNA content as measured by staining of DNA with propidium iodide; changes in cell size and granularity as measured by forward light scatter and 90 degree side light scatter; down-regulation of DNA synthesis as measured by decrease in bromodeoxyuridine uptake; alterations in expression of cell surface and intracellular proteins as measured by reactivity with specific antibodies; and alterations in plasma membrane composition as measured by the binding of fluorescein-conjugated Annexin V protein to the cell surface. Methods in flow cytometry are discussed in Ormerod, M.G. (1994) Flow Cytometry, Oxford, New York NY.
The influence of DITHP on gene expression can be assessed using highly purified populations of cells transfected with sequences encoding DITHP and either CD64 or CD64-GFP. CD64 and CD64-GFP are expressed on the surface of transfected cells and bind to conserved regions of human immunoglobulin G (IgG). Transfected cells are efficiently separated from nontransfected cells using magnetic beads coated with either human IgG or antibody against CD64 (DYNAL, Inc., Lake Success NY). mRNA can be purified from the cells using methods well known by those of skill in the art. Expression of mRNA encoding DITHP and other genes of interest can be analyzed by northern analysis or microarray techniques.
XVI. Production of Antibodies
DITHP substantially purified using polyacrylamide gel electrophoresis (PAGE; see, e.g., Harrington, M.G. (1990) Methods Enzymol. 182:488-495), or other purification techniques, is used to immunize rabbits and to produce antibodies using standard protocols. Alternatively, the DITHP amino acid sequence is analyzed using LASERGENE software (DNASTAR) to determine regions of high immunogenicity, and a corresponding peptide is synthesized and used to raise antibodies by means known to those of skill in the art. Methods for selecting appropriate epitopes, such as those near the C-terminus or in hydrophilic regions, are well described in the art. (See, e.g., Ausubel, 1995, supra, Chapter 11.) Typically, peptides 15 residues in length are synthesized using an ABI 431A peptide synthesizer (Applied Biosystems) using Fmoc chemistry and coupled to KLH (Sigma) by reaction with N-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS) to increase immunogenicity. (See, e.g., Ausubel, supra.) Rabbits are immunized with the peptide-KLH complex in complete Freund's adjuvant. Resulting antisera are tested for antipeptide activity by, for example, binding the peptide to plastic, blocking with 1% BSA, reacting with rabbit antisera, washing, and reacting with radioiodinated goat anti-rabbit IgG. Antisera with antipeptide activity are tested for anti-DITHP activity using protocols well known in the art, including ELISA, RIA, and immunoblotting.
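As a stand-in for the commercial LASERGENE analysis, the Python sketch below scans a sequence with a 15-residue sliding window and picks the window with the lowest mean Kyte-Doolittle hydropathy, that is, the most hydrophilic stretch. This swaps in a public hydropathy scale. It is not the method the software uses, and it ignores other epitope criteria such as surface exposure and terminal position. The sequence is hypothetical, and positions are zero-based.

```python
# Kyte-Doolittle hydropathy values; negative values are hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophilic_peptide(sequence, length=15):
    """Return (start, peptide, mean hydropathy) of the most hydrophilic window."""
    seq = sequence.upper()
    if len(seq) < length:
        raise ValueError("sequence is shorter than the peptide length")
    best = None
    for start in range(len(seq) - length + 1):
        window = seq[start:start + length]
        score = sum(KYTE_DOOLITTLE[aa] for aa in window) / length
        if best is None or score < best[2]:
            best = (start, window, score)
    return best


# Hypothetical sequence with a charged stretch flanked by leucines.
start, peptide, score = most_hydrophilic_peptide("L" * 10 + "DEKRDEKRDEKRDEK" + "L" * 5)
```

Because MBS couples through a cysteine thiol, a terminal cysteine is often appended to the chosen peptide if it lacks one.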
XVII. Purification of Naturally Occurring DITHP Using Specific Antibodies
Naturally occurring or recombinant DITHP is substantially purified by immunoaffinity chromatography using antibodies specific for DITHP. An immunoaffinity column is constructed by covalently coupling anti-DITHP antibody to an activated chromatographic resin, such as CNBr-activated SEPHAROSE (Amersham Pharmacia Biotech). After the coupling, the resin is blocked and washed according to the manufacturer's instructions. Media containing DITHP are passed over the immunoaffinity column, and the column is washed under conditions that allow the preferential adsorption of DITHP (e.g., high ionic strength buffers in the presence of detergent). The column is eluted under conditions that disrupt antibody/DITHP binding (e.g., a buffer of pH 2 to pH 3, or a high concentration of a chaotrope, such as urea or thiocyanate ion), and DITHP is collected.
XVIII. Identification of Molecules Which Interact with DITHP
DITHP, or biologically active fragments thereof, are labeled with 125I Bolton-Hunter reagent. (See, e.g., Bolton, A.E. and W.M. Hunter (1973) Biochem. J. 133:529-539.) Candidate molecules previously arrayed in the wells of a multi-well plate are incubated with the labeled DITHP and washed, and any wells with labeled DITHP complex are assayed. Data obtained using different concentrations of DITHP are used to calculate values for the number, affinity, and association of DITHP with the candidate molecules.
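One classical way to extract the number of sites and affinity from such data is a Scatchard plot. It plots bound/free against bound, and for a single class of sites it gives a line with slope −1/Kd and x-intercept Bmax. The Python sketch below fits that line by least squares on simulated single-site data. It is illustrative only, and nonlinear fitting of the saturation curve is generally preferred because the Scatchard transform distorts measurement error. Function names and values are hypothetical.

```python
def scatchard_fit(bound, free):
    """Estimate (Kd, Bmax) for one class of sites from a Scatchard plot.

    Plots bound/free (y) against bound (x); for single-site binding
    y = (Bmax - B) / Kd, so slope = -1/Kd and intercept = Bmax/Kd.
    """
    if len(bound) != len(free) or len(bound) < 2:
        raise ValueError("need at least two paired bound/free measurements")
    y = [b / f for b, f in zip(bound, free)]
    n = len(bound)
    x_mean = sum(bound) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(bound, y))
    sxx = sum((xi - x_mean) ** 2 for xi in bound)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    kd = -1.0 / slope
    bmax = intercept * kd
    return kd, bmax


# Simulated single-site data with Kd = 2 and Bmax = 10 (arbitrary units).
free = [1.0, 2.0, 4.0, 8.0]
bound = [10.0 * f / (2.0 + f) for f in free]
kd, bmax = scatchard_fit(bound, free)
```

A curved Scatchard plot suggests more than one class of binding sites, which a single-line fit like this cannot represent.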
Alternatively, molecules interacting with DITHP are analyzed using the yeast two-hybrid system as described in Fields, S. and O. Song (1989) Nature 340:245-246, or using commercially available kits based on the two-hybrid system, such as the MATCHMAKER system (CLONTECH).
DITHP may also be used in the PATHCALLING process (CuraGen Corp., New Haven CT), which employs the yeast two-hybrid system in a high-throughput manner to determine all interactions between the proteins encoded by two large libraries of genes (Nandabalan, K. et al. (2000) U.S. Patent No. 6,057,101).
All publications and patents mentioned in the above specification are herein incorporated by reference. Various modifications and variations of the described method and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the above-described modes for carrying out the invention which are obvious to those skilled in the field of molecular biology or related fields are intended to be within the scope of the following claims.
TABLE 1
SEQ ID NO:  Template ID              SEQ ID NO:  ORF ID                        PFH Designation
1           LG:1100267.1:2002JAN18   189         LG:1100267.1.orf1:2002JAN18   HEM
2           LG:1376818.27:2002JAN18  190         LG:1376818.27.orf1:2002JAN18  HEM
3           LG:990561.44:2002JAN18   191         LG:990561.44.orf2:2002JAN18   HEM
4           LG:990855.9:2002JAN18    192         LG:990855.9.orf1:2002JAN18    HEM
5           LG:898483.1:2002JAN18    193         LG:898483.1.orf2:2002JAN18    HEM
6           LG:150971.1:2002JAN18    194         LG:150971.1.orf2:2002JAN18    HEM
7           LG:7771532.20:2002JAN18  195         LG:7771532.20.orf2:2002JAN18  HEM
8           LG:1501261.5:2002JAN18   196         LG:1501261.5.orf2:2002JAN18   HEM
9           LG:1454772.2:2002JAN18   197         LG:1454772.2.orf2:2002JAN18   HEM
10          LG:203951.12:2002JAN18   198         LG:203951.12.orf2:2002JAN18   HEM
11          LG:142131.16:2002JAN18   199         LG:142131.16.orf3:2002JAN18   EITM
12          LG:333034.3:2002JAN18    200         LG:333034.3.orf3:2002JAN18    EITM
13          LG:1291525.9:2002JAN18   201         LG:1291525.9.orf3:2002JAN18   EITM
14          LG:7771246.13:2002JAN18  202         LG:7771246.13.orf1:2002JAN18  EITM
15          LG:1125820.1:2002JAN18   203         LG:1125820.1.orf1:2002JAN18   RM
16          LG:299789.1:2002JAN18    204         LG:299789.1.orf1:2002JAN18    RM
17          LG:044888.1:2002JAN18    205         LG:044888.1.orf1:2002JAN18    RM
18          LG:410020.3:2002JAN18    206         LG:410020.3.orf2:2002JAN18    RM
19          LG:7684165.8:2002JAN18   207         LG:7684165.8.orf2:2002JAN18   RM
20          LG:358050.2:2002JAN18    208         LG:358050.2.orf1:2002JAN18    RM
21          LG:230905.7:2002JAN18    209         LG:230905.7.orf2:2002JAN18    RM
22          LG:7688735.3:2002JAN18   210         LG:7688735.3.orf1:2002JAN18   RM
23          LG:445084.2:2002JAN18    211         LG:445084.2.orf2:2002JAN18    RM
24          LG:7696681.1:2002JAN18   212         LG:7696681.1.orf1:2002JAN18   RM
25          LG:1446403.4:2002JAN18   213         LG:1446403.4.orf1:2002JAN18   RM
26          LG:1042935.2:2002JAN18   214         LG:1042935.2.orf1:2002JAN18   RM
27          LG:7691854.1:2002JAN18   215         LG:7691854.1.orf2:2002JAN18   RM
28          LG:979580.1:2002JAN18    216         LG:979580.1.orf2:2002JAN18    ISM
29          LG:185136.4:2002JAN18    217         LG:185136.4.orf2:2002JAN18    ISM
30          LG:1398319.1:2002JAN18   218         LG:1398319.1.orf2:2002JAN18   ISM
31          LG:375724.10:2002JAN18   219         LG:375724.10.orf1:2002JAN18   ISM
32          LG:220407.7:2002JAN18    220         LG:220407.7.orf2:2002JAN18    ISM
33          LG:259850.1:2002JAN18    221         LG:259850.1.orf3:2002JAN18    ISM
34          LG:435726.8:2002JAN18    222         LG:435726.8.orf2:2002JAN18    ISM
35          LG:271394.44:2002JAN18   223         LG:271394.44.orf3:2002JAN18   ISM
36          LG:7761755.9:2002JAN18   224         LG:7761755.9.orf1:2002JAN18   ISM
37          LG:7762920.1:2002JAN18   225         LG:7762920.1.orf1:2002JAN18   ISM
38          LG:7763326.6:2002JAN18   226         LG:7763326.6.orf3:2002JAN18   ISM
39          LG:242234.14:2002JAN18   227         LG:242234.14.orf1:2002JAN18   ISM
40          LG:291526.1:2002JAN18    228         LG:291526.1.orf2:2002JAN18    ISM
41          LG:243209.10:2002JAN18   229         LG:243209.10.orf1:2002JAN18   ISM
42          LG:378592.15:2002JAN18   230         LG:378592.15.orf3:2002JAN18   ISM
43          LG:357276.11:2002JAN18   231         LG:357276.11.orf2:2002JAN18   ISM
44          LG:1507027.3:2002JAN18   232         LG:1507027.3.orf3:2002JAN18   ISM
45          LG:201342.4:2002JAN18    233         LG:201342.4.orf3:2002JAN18    ISM
46          LG:327504.9:2002JAN18    234         LG:327504.9.orf1:2002JAN18    ISM
47          LG:346506.19:2002JAN18   235         LG:346506.19.orf1:2002JAN18   ISM
48          LG:7771048.3:2002JAN18   236         LG:7771048.3.orf1:2002JAN18   ISM
49          LG:395081.7:2002JAN18    237         LG:395081.7.orf3:2002JAN18    ISM
50 LG:1452709.28:2002JAN18 238 LG:1452709.28.orf1:2002JAN18 ISM
51 LG:991162.52:2002JAN18 239 LG:991162.52.orf3:2002JAN18 ISM
52 LG:346677.11:2002JAN18 240 LG:346677.11.orf2:2002JAN18 ISM
53 LG:1400284.13:2002JAN18 241 LG:1400284.13.orf2:2002JAN18 ISM
54 LG:7698465.26:2002JAN18 242 LG:7698465.26.orf3:2002JAN18 TFM
55 LG:7698696.18:2002JAN18 243 LG:7698696.18.orf3:2002JAN18 TFM
56 LG:350410.3:2002JAN18 244 LG:350410.3.orf3:2002JAN18 TFM
57 LG:7770751.8:2002JAN18 245 LG:7770751.8.orf1:2002JAN18 TFM
58 LG:052513.3:2002JAN18 246 LG:052513.3.orf3:2002JAN18 TFM
59 LG:7692334.1:2002JAN18 247 LG:7692334.1.orf1:2002JAN18 TFM
60 LG:199284.11:2002JAN18 248 LG:199284.11.orf2:2002JAN18 TFM
61 LG:7683993.13:2002JAN18 249 LG:7683993.13.orf2:2002JAN18 TFM
62 LG:1079823.1:2002JAN18 250 LG:1079823.1.orf1:2002JAN18 ZFR
63 LG:1082263.10:2002JAN18 251 LG:1082263.10.orf3:2002JAN18 ZFTR
64 LG:1076162.1:2002JAN18 252 LG:1076162.1.orf2:2002JAN18 ZFTR
65 LG:404157.1:2002JAN18 253 LG:404157.1.orf3:2002JAN18 ZFR
66 LG:474725.1:2002JAN18 254 LG:474725.1.orf1:2002JAN18 ZFR
67 LG:1080918.1:2002JAN18 255 LG:1080918.1.orf1:2002JAN18 ZFTR
68 LG:1092343.1:2002JAN18 256 LG:1092343.1.orf2:2002JAN18 ZFTR
69 LG:7684505.1:2002JAN18 257 LG:7684505.1.orf1:2002JAN18 ZFTR
70 LG:7689627.1:2002JAN18 258 LG:7689627.1.orf3:2002JAN18 ZFTR
71 LG:122863.1:2002JAN18 259 LG:122863.1.orf3:2002JAN18 ZFR
72 LG:7690093.1:2002JAN18 260 LG:7690093.1.orf2:2002JAN18 ZFTR
73 LG:1449021.1:2002JAN18 261 LG:1449021.1.orf2:2002JAN18 ZFTR
74 LG:958155.1:2002JAN18 262 LG:958155.1.orf3:2002JAN18 ZFTR
75 LG:7684559.1:2002JAN18 263 LG:7684559.1.orf2:2002JAN18 ZFTR
76 LG:080328.2:2002JAN18 264 LG:080328.2.orf3:2002JAN18 ZFR
77 LG:7687730.5:2002JAN18 265 LG:7687730.5.orf2:2002JAN18 ZFR
78 LG:7691462.5:2002JAN18 266 LG:7691462.5.orf2:2002JAN18 ZFR
79 LG:7690229.9:2002JAN18 267 LG:7690229.9.orf3:2002JAN18 ZFR
80 LG:7691117.5:2002JAN18 268 LG:7691117.5.orf3:2002JAN18 ZFTR
81 LG:413642.1:2002JAN18 269 LG:413642.1.orf3:2002JAN18 ZFTR
82 LG:7771639.1:2002JAN18 270 LG:7771639.1.orf2:2002JAN18 ZFTR
83 LG:7684553.3:2002JAN18 271 LG:7684553.3.orf1:2002JAN18 ZFR
84 LG:7690374.7:2002JAN18 272 LG:7690374.7.orf2:2002JAN18 ZFR
85 LG:7690065.3:2002JAN18 273 LG:7690065.3.orf1:2002JAN18 ZFR
86 LG:7690583.5:2002JAN18 274 LG:7690583.5.orf3:2002JAN18 ZFTR
87 LG:7771893.1:2002JAN18 275 LG:7771893.1.orf1:2002JAN18 ZFTR
88 LG:7691582.2:2002JAN18 276 LG:7691582.2.orf2:2002JAN18 ZFTR
89 LG:7687809.2:2002JAN18 277 LG:7687809.2.orf3:2002JAN18 ZFR
90 LG:7691200.3:2002JAN18 278 LG:7691200.3.orf2:2002JAN18 ZFTR
91 LG:405709.4:2002JAN18 279 LG:405709.4.orf1:2002JAN18 ZFR
92 LG:982979.1:2002JAN18 280 LG:982979.1.orf3:2002JAN18 MTM
93 LG:7669310.1:2002JAN18 281 LG:7669310.1.orf1:2002JAN18 MTM
94 LG:231546.6:2002JAN18 282 LG:231546.6.orf3:2002JAN18 MTM
95 LG:7693668.4:2002JAN18 283 LG:7693668.4.orf3:2002JAN18 MTM
96 LG:7771057.9:2002JAN18 284 LG:7771057.9.orf1:2002JAN18 MTM
97 LG:114448.25:2002JAN18 285 LG:114448.25.orf1:2002JAN18 MTM
98 LG:180803.3:2002JAN18 286 LG:180803.3.orf3:2002JAN18 PMMM
99 LG:1094595.3:2002JAN18 287 LG:1094595.3.orf3:2002JAN18 PMMM
100 LG:150288.12:2002JAN18 288 LG:150288.12.orf3:2002JAN18 PMMM
101 LG:7761700.28:2002JAN18 289 LG:7761700.28.orf1:2002JAN18 PMMM
102 LG:1093982.42:2002JAN18 290 LG:1093982.42.orf3:2002JAN18 PMMM
103 LG:7762752.1:2002JAN18 291 LG:7762752.1.orf3:2002JAN18 PMMM
104 LG:013006.11:2002JAN18 292 LG:013006.11.orf1:2002JAN18 PMMM
105 LG:054509.10:2002JAN18 293 LG:054509.10.orf3:2002JAN18 PMMM
106 LG:345276.3:2002JAN18 294 LG:345276.3.orf2:2002JAN18 PMMM
107 LG:247354.20:2002JAN18 295 LG:247354.20.orf2:2002JAN18 PMMM
108 LG:1454791.33:2002JAN18 296 LG:1454791.33.orf3:2002JAN18 PMMM
109 LG:7690539.5:2002JAN18 297 LG:7690539.5.orf3:2002JAN18 PMMM
110 LG:984007.4:2002JAN18 298 LG:984007.4.orf3:2002JAN18 NSMM
111 LG:1093386.25:2002JAN18 299 LG:1093386.25.orf2:2002JAN18 NSMM
112 LG:7693871.6:2002JAN18 300 LG:7693871.6.orf3:2002JAN18 NSMM
113 LG:7693934.1:2002JAN18 301 LG:7693934.1.orf2:2002JAN18 NSMM
114 LG:7697553.34:2002JAN18 302 LG:7697553.34.orf1:2002JAN18 NSMM
115 LG:337345.5:2002JAN18 303 LG:337345.5.orf2:2002JAN18 AM
116 LG:410680.7:2002JAN18 304 LG:410680.7.orf1:2002JAN18 AM
117 LG:7771583.2:2002JAN18 305 LG:7771583.2.orf1:2002JAN18 AM
118 LG:074994.14:2002JAN18 306 LG:074994.14.orf3:2002JAN18 AM
119 LG:7691131.1:2002JAN18 307 LG:7691131.1.orf3:2002JAN18 AM
120 LG:983975.1:2002JAN18 308 LG:983975.1.orf3:2002JAN18 ARM
121 LG:1383194.7:2002JAN18 309 LG:1383194.7.orf1:2002JAN18 ARM
122 LG:1328573.4:2002JAN18 310 LG:1328573.4.orf1:2002JAN18 ARM
123 LG:7692963.1:2002JAN18 311 LG:7692963.1.orf1:2002JAN18 ARM
124 LG:7696423.1:2002JAN18 312 LG:7696423.1.orf1:2002JAN18 ARM
125 LG:7696234.1:2002JAN18 313 LG:7696234.1.orf3:2002JAN18 ARM
126 LG:1388299.1:2002JAN18 314 LG:1388299.1.orf1:2002JAN18 ARM
127 LG:978521.5:2002JAN18 315 LG:978521.5.orf1:2002JAN18 ARM
128 LG:7692599.9:2002JAN18 316 LG:7692599.9.orf2:2002JAN18 ARM
129 LG:1452678.13:2002JAN18 317 LG:1452678.13.orf2:2002JAN18 ETAM
130 LG:332947.1:2002JAN18 318 LG:332947.1.orf1:2002JAN18 SEMM
131 LG:1292520.13:2002JAN18 319 LG:1292520.13.orf2:2002JAN18 SEMM
132 LG:7750009.1:2002JAN18 320 LG:7750009.1.orf2:2002JAN18 SEMM
133 LG:238322.4:2002JAN18 321 LG:238322.4.orf1:2002JAN18 SEMM
134 LG:7694382.4:2002JAN18 322 LG:7694382.4.orf2:2002JAN18 SEMM
135 LG:1329198.3:2002JAN18 323 LG:1329198.3.orf1:2002JAN18 SEMM
136 LG:345314.33:2002JAN18 324 LG:345314.33.orf3:2002JAN18 SEMM
137 LG:215030.7:2002JAN18 325 LG:215030.7.orf2:2002JAN18 SEMM
138 LG:383884.26:2002JAN18 326 LG:383884.26.orf1:2002JAN18 CM
139 LG:413518.62:2002JAN18 327 LG:413518.62.orf3:2002JAN18 CM
140 LG:903138.45:2002JAN18 328 LG:903138.45.orf3:2002JAN18 CM
141 LG:1377804.32:2002JAN18 329 LG:1377804.32.orf2:2002JAN18 CM
142 LG:1390822.13:2002JAN18 330 LG:1390822.13.orf1:2002JAN18 CM
143 LG:7698830.22:2002JAN18 331 LG:7698830.22.orf2:2002JAN18 CM
144 LG:7762105.20:2002JAN18 332 LG:7762105.20.orf3:2002JAN18 CM
145 LG:1382907.104:2002JAN18 333 LG:1382907.104.orf3:2002JAN18 CM
146 LG:294464.12:2002JAN18 334 LG:294464.12.orf2:2002JAN18 CM
147 LG:003736.32:2002JAN18 335 LG:003736.32.orf2:2002JAN18 CM
148 LG:1502253.2:2002JAN18 336 LG:1502253.2.orf1:2002JAN18 CM
149 LG:216797.51:2002JAN18 337 LG:216797.51.orf1:2002JAN18 CM
150 LG:7685287.118:2002JAN18 338 LG:7685287.118.orf2:2002JAN18 CM
151 LG:405272.4:2002JAN18 339 LG:405272.4.orf1:2002JAN18 HCMM
152 LG:247382.7:2002JAN18 340 LG:247382.7.orf2:2002JAN18 HCMM
153 LG:7763403.34:2002JAN18 341 LG:7763403.34.orf3:2002JAN18 HCMM
154 LG:258352.1:2002JAN18 342 LG:258352.1.orf1:2002JAN18 HCMM
155 LG:1096711.3:2002JAN18 343 LG:1096711.3.orf1:2002JAN18 HCMM
156 LG:7761740.1:2002JAN18 344 LG:7761740.1.orf3:2002JAN18 RBM
157 LG:1382987.89:2002JAN18 345 LG:1382987.89.orf1:2002JAN18 RBM
158 LG:444673.50:2002JAN18 346 LG:444673.50.orf1:2002JAN18 RBM
159 LG:7767853.1:2002JAN18 347 LG:7767853.1.orf3:2002JAN18 RBM
160 LG:1375802.70:2002JAN18 348 LG:1375802.70.orf2:2002JAN18 RBM
161 LG:414732.1:2002JAN18 349 LG:414732.1.orf1:2002JAN18 OAM
162 LG:1328394.25:2002JAN18 350 LG:1328394.25.orf3:2002JAN18 OAM
163 LG:336953.5:2002JAN18 351 LG:336953.5.orf1:2002JAN18 OAM
164 LG:7697931.25:2002JAN18 352 LG:7697931.25.orf3:2002JAN18 OAM
165 LG:300147.58:2002JAN18 353 LG:300147.58.orf2:2002JAN18 OAM
166 LG:7763115.9:2002JAN18 354 LG:7763115.9.orf2:2002JAN18 OAM
167 LG:7693875.4:2002JAN18 355 LG:7693875.4.orf2:2002JAN18 OAM
168 LG:089516.22:2002JAN18 356 LG:089516.22.orf3:2002JAN18 OAM
169 LG:336671.1:2002JAN18 357 LG:336671.1.orf2:2002JAN18 BPM
170 LG:234504.11:2002JAN18 358 LG:234504.11.orf1:2002JAN18 BPM
171 LG:1018931.3:2002JAN18 359 LG:1018931.3.orf1:2002JAN18 BPM
172 LG:1377369.45:2002JAN18 360 LG:1377369.45.orf2:2002JAN18 BPM
173 LG:1135404.113:2002JAN18 361 LG:1135404.113.orf1:2002JAN18 BPM
174 LG:1452606.33:2002JAN18 362 LG:1452606.33.orf3:2002JAN18 BPM
175 LG:018099.22:2002JAN18 363 LG:018099.22.orf3:2002JAN18 BPM
176 LG:7771625.8:2002JAN18 364 LG:7771625.8.orf2:2002JAN18 BPM
177 LG:1513012.6:2002JAN18 365 LG:1513012.6.orf1:2002JAN18 BPM
178 LG:903956.34:2002JAN18 366 LG:903956.34.orf1:2002JAN18 MAGD
179 LG:331171.22:2002JAN18 367 LG:331171.22.orf2:2002JAN18 MAGD
180 LG:380305.28:2002JAN18 368 LG:380305.28.orf1:2002JAN18 MAGD
181 LG:227928.19:2002JAN18 369 LG:227928.19.orf1:2002JAN18 MAGD
182 LG:1099593.39:2002JAN18 370 LG:1099593.39.orf3:2002JAN18 MAGD
183 LG:1501223.87:2002JAN18 371 LG:1501223.87.orf1:2002JAN18 MAGD
184 LG:7690039.1:2002JAN18 372 LG:7690039.1.orf2:2002JAN18 MAGD
185 LG:332701.3:2002JAN18 373 LG:332701.3.orf3:2002JAN18 MAGD
186 LG:237963.28:2002JAN18 374 LG:237963.28.orf3:2002JAN18 MAGD
187 LG:245267.1:2002JAN18 375 LG:245267.1.orf2:2002JAN18 MAGD
188 LG:7761954.21:2002JAN18 376 LG:7761954.21.orf1:2002JAN18 MAGD
188 LG:7761954.21:2002JAN18 377 LG:7761954.21.orf3:2002JAN18 MAGD
TABLE 2
SEQ ID NO: Template ID GI Number Probability Score Annotation
1 LG:1100267.1:2002JAN18 g1000283 0.0 Human selenium donor protein (selD) mRNA, complete cds.
2 LG:1376818.27:2002JAN18 g6467211 0.0 Homo sapiens DRP-2 gene for dihydropyrimidinase related protein 2, exon 14 and complete cds.
3 LG:990561.44:2002JAN18 g2183037 0.0 Human diacylglycerol kinase zeta mRNA, alternatively spliced, complete cds.
4 LG:990855.9:2002JAN18 g8132761 0.0 Homo sapiens glutathione transferase omega (GSTO1) mRNA, complete cds.
5 LG:898483.1:2002JAN18 g1051280 0.0 Human aldehyde dehydrogenase (ALDH8) mRNA, complete cds.
6 LG:150971.1:2002JAN18 g12653688 0.0 Homo sapiens, Similar to aspartyl-tRNA synthetase, clone MGC:1562 IMAGE:3344322, mRNA, complete cds.
7 LG:7771532.20:2002JAN18 g1015320 2.0E-33 Human mRNA for alanyl-tRNA synthetase, complete cds.
8 LG:1501261.5:2002JAN18 g515631 0.0 Human glutathione S-transferase (GSTM5) mRNA, complete cds.
9 LG:1454772.2:2002JAN18 g7542574 2.0E-28 Human asparagine synthetase gene, promoter and exon 1.
10 LG:203951.12:2002JAN18 g10434467 0.0 Homo sapiens cDNA FLJ12778 fis, clone NT2RP2001740, weakly similar to UBIQUITIN CARBOXYL-TERMINAL HYDROLASE DUB-1 (EC 3.1.2.15).
11 LG:142131.16:2002JAN18 g404012 5.0E-67 Human pre-B cell enhancing factor (PBEF) mRNA, complete cds.
12 LG:333034.3:2002JAN18 g9622332 0.0 Homo sapiens fibroblast growth factor 5 short variant (FGF5) mRNA, complete cds; alternatively spliced.
13 LG:1291525.9:2002JAN18 g598955 0.0 Human mRNA for hepatoma-derived growth factor, complete cds.
14 LG:7771246.13:2002JAN18 g186751 0.0 Human kininogen gene, exon 10, encoding bradykinin and exon 11.
15 LG:1125820.1:2002JAN18 g14028616 7.0E-66 Homo sapiens low density lipoprotein receptor-related protein 5 (LRP5) gene, exons 1 through 9.
16 LG:299789.1:2002JAN18 g53549 1.0E-25 IL-6 receptor (mutated)
17 LG:044888.1:2002JAN18 g14028617 1.0E-32 Homo sapiens low density lipoprotein receptor-related protein 5 (LRP5) gene, exons 10 through 23 and complete cds.
18 LG:410020.3:2002JAN18 g1718390 0.0 Human glycine receptor alpha 1 subunit gene, partial cds.
19 LG:7684165.8:2002JAN18 g4249765 0.0 Homo sapiens retinoic X receptor beta (RXRB) gene, complete cds.
20 LG:358050.2:2002JAN18 g1669360 1.0E-38 Kupffer cell receptor
21 LG:230905.7:2002JAN18 g1552494 1.0E-138 Human germline T-cell receptor beta chain Dopamine-beta-hydroxylase-like, TRY1, TRY2, TRY3, TCRBV27S1P, TCRBV22S1A2N1T, TCRBV9S1A1T, TCRBV7S1A1N2T, TCRBV5S1A1T, TCRBV13S3, TCRBV6S7P, TCRBV7S3A2T, TCRBV13S2A1T, TCRBV9S2A2PT, TCRBV7S2A1N4T, TCRBV13S9/13S2A1T, TCRBV6S5A1N1, TCRBV30S1P, TCRBV31S1, TCRBV13S5, TCRBV6S1A1N1, TCRBV32S1P, TCRBV5S5P, TCRBV1S1A1N1, TCRBV12S2A1T, TCRBV21S1, TCRBV8S4P, TCRBV12S3, TCRBV21S3A2N2T, TCRBV8S5P, TCRBV13S1 genes from bases 1 to 267156 (section 1 of 3).
22 LG:7688735.3:2002JAN18 g3861482 0.0 Human chromosome 3, olfactory receptor pseudogene cluster 1, complete sequence, and myosin light chain kinase (MLCK) pseudogene, partial sequence.
23 LG:445084.2:2002JAN18 g13365552 2.0E-22 Homo sapiens SRCL mRNA for scavenger receptor with C-type lectin type II, complete cds.
24 LG:7696681.1:2002JAN18 g871359 3.0E-47 Human mRNA for olfactory receptor expressed pseudogene, poly A site.
25 LG:1446403.4:2002JAN18 g3861482 0.0 Homo sapiens chromosome 3, olfactory receptor pseudogene cluster 1, complete sequence, and myosin light chain kinase (MLCK) pseudogene, partial sequence.
26 LG:1042935.2:2002JAN18 g1552494 1.0E-119 Human germline T-cell receptor beta chain Dopamine-beta-hydroxylase-like, TRY1, TRY2, TRY3, TCRBV27S1P, TCRBV22S1A2N1T, TCRBV9S1A1T, TCRBV7S1A1N2T, TCRBV5S1A1T, TCRBV13S3, TCRBV6S7P, TCRBV7S3A2T, TCRBV13S2A1T, TCRBV9S2A2PT, TCRBV7S2A1N4T, TCRBV13S9/13S2A1T, TCRBV6S5A1N1, TCRBV30S1P, TCRBV31S1, TCRBV13S5, TCRBV6S1A1N1, TCRBV32S1P, TCRBV5S5P, TCRBV1S1A1N1, TCRBV12S2A1T, TCRBV21S1, TCRBV8S4P, TCRBV12S3, TCRBV21S3A2N2T, TCRBV8S5P, TCRBV13S1 genes from bases 1 to 267156 (section 1 of 3).
27 LG:7691854.1:2002JAN18 g2358042 1.0E-176 Homo sapiens T-cell receptor alpha delta locus from bases 501613 to 752736 (section 3 of 5) of the Complete Nucleotide Sequence.
28 LG:979580.1:2002JAN18 g11140021 2.0E-52 Homo sapiens WEE1 gene for protein kinase and partial ZNF143 gene for zinc finger transcription factor.
29 LG:185136.4:2002JAN18 g12744923 1.0E-20 Ras association domain family 3 protein
30 LG:1398319.1:2002JAN18 g38521 0.0 Human EF-1 delta gene encoding Human elongation factor-1-delta.
31 LG:375724.10:2002JAN18 g14042694 0.0 Homo sapiens cDNA FLJ14864 fis, clone PLACE1001845, weakly similar to Mus musculus cyclin ania-6a mRNA.
32 LG:220407.7:2002JAN18 g10436772 0.0 Homo sapiens cDNA FLJ14332 fis, clone PLACE4000344.
33 LG:259850.1:2002JAN18 g1657753 7.0E-44 Human elastin (ELN) gene, partial cds, and LIM-kinase (LIMK1) gene, complete cds.
34 LG:435726.8:2002JAN18 g4153873 3.0E-23 similar to wee1-like protein kinase; similar to P30291 (PID:g1351419)
35 LG:271394.44:2002JAN18 g7239697 0.0 Homo sapiens myosin light chain kinase isoform 2 (MLCK) mRNA, complete cds.
36 LG:7761755.9:2002JAN18 g10439740 0.0 Homo sapiens cDNA: FLJ23148 fis, clone LNG09313, highly similar to AB018001 Homo sapiens mRNA for Death-associated protein kinase 2.
37 LG:7762920.1:2002JAN18 g181267 0.0 Human c-yes-1 mRNA.
38 LG:7763326.6:2002JAN18 g178325 2.0E-69 Human protein-serine/threonine (AKT2) mRNA, complete cds.
39 LG:242234.14:2002JAN18 g10437870 0.0 Homo sapiens cDNA: FLJ21715 fis, clone COL10287, highly similar to AF071569 Homo sapiens multifunctional calcium/calmodulin-dependent protein kinase II delta2 isoform mRNA.
40 LG:291526.1:2002JAN18 g11140021 1.0E-47 Homo sapiens WEE1 gene for protein kinase and partial ZNF143 gene for zinc finger transcription factor.
41 LG:243209.10:2002JAN18 g11140021 1.0E-23 Homo sapiens WEE1 gene for protein kinase and partial ZNF143 gene for zinc finger transcription factor.
42 LG:378592.15:2002JAN18 g8979826 0.0 dJ776F14.2 (a novel protein member of the PTPNS (protein tyrosine phosphatase, non-receptor type substrate) family)
43 LG:357276.11:2002JAN18 g2232030 0.0 Homo sapiens inositol polyphosphate 4-phosphatase type I-beta mRNA, complete cds.
44 LG:1507027.3:2002JAN18 g38521 0.0 Human EF-1 delta gene encoding Human elongation factor-1-delta.
45 LG:201342.4:2002JAN18 g10437572 0.0 Homo sapiens cDNA: FLJ21466 fis, clone COL04842, highly similar to AF127481 Homo sapiens non-oncogenic Rho GTPase-specific GTP exchange factor (proto-LBC) mRNA.
46 LG:327504.9:2002JAN18 g4007565 6.0E-43 Human mRNA for protein kinase.
47 LG:346506.19:2002JAN18 g2077990 0.0 H.sapiens mRNA for AMP-activated protein kinase alpha-1, partial.
48 LG:7771048.3:2002JAN18 g3646067 8.0E-28 MEKK5 (MAP/ERK kinase kinase 5 (ASK1, MAPKKK5, Mitogen Activated Protein kinase kinase kinase 5))
49 LG:395081.7:2002JAN18 g8250238 2.0E-30 Human mRNA for protein phosphatase 4 regulatory subunit 2 (PPP4R2 gene).
50 LG:1452709.28:2002JAN18 g852056 0.0 Homo sapiens casein kinase I epsilon mRNA, complete cds.
51 LG:991162.52:2002JAN18 g15030036 4.0E-34 Human, Similar to mitogen-activated protein kinase kinase kinase 11, clone MGC:17114 IMAGE:4215281, mRNA, complete cds.
52 LG:346677.11:2002JAN18 g3213216 7.0E-33 Human IkB kinase beta subunit mRNA, complete cds.
53 LG:1400284.13:2002JAN18 g14276190 0.0 Homo sapiens rho GTPase activating protein 8 isoform 1 (ARHGAP8) mRNA, complete cds.
54 LG:7698465.26:2002JAN18 g9714271 0.0 Homo sapiens partial mRNA for double stranded RNA binding nuclear protein ILF3, alternative transcript.
55 LG:7698696.18:2002JAN18 g475788 0.0 Homo sapiens DNA-binding protein (APRF) mRNA, complete cds.
56 LG:350410.3:2002JAN18 g33968 0.0 H.sapiens irlB mRNA.
57 LG:7770751.8:2002JAN18 g11138930 2.0E-67 Human homeobox B7 (HOXB7) gene, partial cds; and homeobox B6 (HOXB6), homeobox B5 (HOXB5), homeobox B4 (HOXB4), and homeobox B3 (HOXB3) genes, complete cds.
58 LG:052513.3:2002JAN18 g1151169 0.0 Human signal transducer and activator of transcription Stat5A mRNA, complete cds.
59 LG:7692334.1:2002JAN18 g6689881 0.0 Homo sapiens iroquois homeobox protein 4 (IRX4) mRNA, complete cds.
60 LG:199284.11:2002JAN18 g4519621 2.0E-09 OASIS protein
61 LG:7683993.13:2002JAN18 g4454551 0.0 Homo sapiens silencing mediator of retinoic acid and thyroid hormone receptor alpha mRNA, complete cds.
62 LG:1079823.1:2002JAN18 g8099348 9.0E-92 zinc finger protein
63 LG:1082263.10:2002JAN18 g10437541 3.0E-44 Homo sapiens cDNA: FLJ21441 fis, clone COL04422.
64 LG:1076162.1:2002JAN18 g5730196 6.0E-43 Kruppel-type zinc finger
65 LG:404157.1:2002JAN18 g13938351 0.0 Similar to zinc finger protein 268
66 LG:474725.1:2002JAN18 g13543419 0.0 Similar to zinc finger protein 304
67 LG:1080918.1:2002JAN18 g12314165 0.0 bA526D8.4 (novel KRAB box containing C2H2 type zinc finger protein)
68 LG:1092343.1:2002JAN18 g488555 2.0E-70 zinc finger protein ZNF135
69 LG:7684505.1:2002JAN18 g13938350 0.0 Homo sapiens, Similar to zinc finger protein 268, clone IMAGE:3352268, mRNA, partial cds.
70 LG:7689627.1:2002JAN18 g1049300 1.0E-29 Human KRAB zinc finger protein (ZNF177) mRNA, complete cds.
71 LG:122863.1:2002JAN18 g12584159 0.0 zinc finger protein 268
72 LG:7690093.1:2002JAN18 g1017721 0.0 Human repressor transcriptional factor (ZNF85) mRNA, complete cds.
73 LG:1449021.1:2002JAN18 g14042550 0.0 unnamed protein product
74 LG:958155.1:2002JAN18 g7012690 4.0E-34 KRAB-zinc finger protein KID3
75 LG:7684559.1:2002JAN18 g38031 0.0 Human ZNF43 mRNA.
76 LG:080328.2:2002JAN18 g13623606 5.0E-40 Human, zinc finger protein 136 (clone pHZ-20), clone MGC:12711 IMAGE:4040430, mRNA, complete cds.
77 LG:7687730.5:2002JAN18 g4164082 0.0 Human zinc finger protein EZNF (EZNF) mRNA, complete cds.
78 LG:7691462.5:2002JAN18 g7023215 1.0E-108 Homo sapiens cDNA FLJ10891 fis, clone NT2RP4002078, weakly similar to ZINC FINGER PROTEIN 91.
79 LG:7690229.9:2002JAN18 g10434855 1.0E-55 Human cDNA FLJ13032 fis, clone NT2RP3001120, moderately similar to ZINC FINGER PROTEIN 136.
80 LG:7691117.5:2002JAN18 g288424 0.0 H. sapiens ZNF37A mRNA for zinc finger protein.
81 LG:413642.1:2002JAN18 g12483904 5.0E-59 zinc finger protein H1T-39
82 LG:7771639.1:2002JAN18 g14042351 1.0E-124 Homo sapiens cDNA FLJ14673 fis, clone NT2RP2003714, moderately similar to ZINC FINGER PROTEIN 91.
83 LG:7684553.3:2002JAN18 g10435737 0.0 Human cDNA FLJ13659 fis, clone PLACE1011576, moderately similar to Human Kruppel related zinc finger protein (HTF10) mRNA.
84 LG:7690374.7:2002JAN18 g10435737 0.0 Human cDNA FLJ13659 fis, clone PLACE1011576, moderately similar to Human Kruppel related zinc finger protein (HTF10) mRNA.
85 LG:7690065.3:2002JAN18 g7023215 2.0E-85 Human cDNA FLJ10891 fis, clone NT2RP4002078, weakly similar to ZINC FINGER PROTEIN 91.
86 LG:7690583.5:2002JAN18 g12655165 2.0E-34 zinc finger protein 256
87 LG:7771893.1:2002JAN18 g1017721 0.0 Human repressor transcriptional factor (ZNF85) mRNA, complete cds.
88 LG:7691582.2:2002JAN18 g38031 0.0 Human ZNF43 mRNA.
89 LG:7687809.2:2002JAN18 g12314165 0.0 bA526D8.4 (novel KRAB box containing C2H2 type zinc finger protein)
90 LG:7691200.3:2002JAN18 g13752754 0.0 zinc finger 1 1 1 ! (Homo sapiens)
91 LG:405709.4:2002JAN18 g6650686 3.0E-16 Homo sapiens Y-linked zinc finger protein (ZFY) gene, complete cds.
92 LG:982979.1:2002JAN18 g2088550 0.0 Human hereditary haemochromatosis region, histone 2A-like protein gene, hereditary haemochromatosis (HLA-H) gene, RoRet gene, and sodium phosphate transporter (NPT3) gene, complete cds.
93 LG:7669310.1:2002JAN18 g13562151 2.0E-41 Homo sapiens small-conductance calcium-activated potassium channel (KCNN3) gene, complete cds.
94 LG:231546.6:2002JAN18 g14571904 4.0E-08 lysosomal amino acid transporter 1
95 LG:7693668.4:2002JAN18 g14571904 2.0E-37 lysosomal amino acid transporter 1
96 LG:7771057.9:2002JAN18 g6996441 1.0E-122 Homo sapiens CTL1 gene.
97 LG:114448.25:2002JAN18 g12804460 0.0 Homo sapiens, ATP synthase, H+ transporting, mitochondrial F1 complex, alpha subunit, isoform 1, cardiac muscle, clone MGC:3174 IMAGE:3355758, mRNA, complete cds.
98 LG:180803.3:2002JAN18 g14326412 1.0E-12 short heat shock protein 60 Hsp60s2
99 LG:1094595.3:2002JAN18 g14042793 0.0 Homo sapiens cDNA FLJ14922 fis, clone PLACE1007729, weakly similar to RETROVIRUS-RELATED PROTEASE (EC 3.4.23.-).
100 LG:150288.12:2002JAN18 g11071726 0.0 Homo sapiens mRNA for putative sialoglycoprotease type 2.
101 LG:7761700.28:2002JAN18 g190282 0.0 Human protective protein mRNA, complete cds.
102 LG:1093982.42:2002JAN18 g13543665 0.0 Homo sapiens, peptidylprolyl isomerase A (cyclophilin A), clone MGC:14681 IMAGE:4109260, mRNA, complete cds.
103 LG:7762752.1:2002JAN18 g6013463 6.0E-86 carboxypeptidase homolog
104 LG:013006.11:2002JAN18 g11493589 3.0E-16 zinc metalloendopeptidase
105 LG:054509.10:2002JAN18 g12002206 0.0 Homo sapiens chymotrypsin-like protein mRNA, complete cds.
106 LG:345276.3:2002JAN18 g6002693 0.0 Homo sapiens histone acetyltransferase MORF alpha mRNA, alternative splice product, complete cds.
107 LG:247354.20:2002JAN18 g9280814 8.0E-27 calpain
108 LG:1454791.33:2002JAN18 g517064 2.0E-49 Human chaperonin protein (Tcp20) gene, complete cds.
109 LG:7690539.5:2002JAN18 g182612 9.0E-94 Human coagulation factor IX gene, complete cds.
110 LG:984007.4:2002JAN18 g2228749 1.0E-180 Human RNA polymerase III subunit (RPC32) mRNA, complete cds.
111 LG:1093386.25:2002JAN18 g2231374 1.0E-124 Human EST clone 25267 mariner transposon Hsmar1 sequence.
112 LG:7693871.6:2002JAN18 g7332056 7.0E-29 contains similarity to Pfam family PF00078 (Reverse transcriptase (RNA-dependent)), score=79.6, E=6.3e-20, N=1
113 LG:7693934.1:2002JAN18 g2104909 4.0E-99 Human endogenous retrovirus H D1 leader region/integrase-derived ORF1, ORF2, and putative envelope protein mRNA, complete cds.
114 LG:7697553.34:2002JAN18 g10437416 0.0 Homo sapiens cDNA: FLJ21332 fis, clone COL02523, highly similar to HSU59321 Human DEAD-box protein p72 (P72) mRNA.
115 LG:337345.5:2002JAN18 g13161065 0.0 Homo sapiens chromosome X protocadherin 11 (PCDH11) mRNA, complete cds, alternatively spliced.
116 LG:410680.7:2002JAN18 g198697 0.0 laminin A-chain
117 LG:7771583.2:2002JAN18 g5456921 0.0 Homo sapiens protocadherin alpha 9 (PCDH-alpha9) mRNA, complete cds.
118 LG:074994.14:2002JAN18 g13794594 3.0E-18 Homo sapiens genomic DNA, chromosome 8q23, clone:KB1672E10.
119 LG:7691131.1:2002JAN18 g14009458 0.0 Homo sapiens protocadherin-beta11 (PCDHB11) mRNA, complete cds.
120 LG:983975.1:2002JAN18 g5926692 3.0E-47 Homo sapiens genomic DNA, chromosome 6p21.3, HLA Class I region, section 4/20.
121 LG:1383194.7:2002JAN18 g9971113 0.0 Homo sapiens mRNA for MHC class I antigen, HLA-E*0101 allele (HLA-E gene).
122 LG:1328573.4:2002JAN18 g2114242 0.0 Homo sapiens immunoglobulin lambda gene locus DNA, clone:2H8.
123 LG:7692963.1:2002JAN18 g5926696 0.0 Human genomic DNA, chromosome 6p21.3, HLA Class I region, section 8/20.
124 LG:7696423.1:2002JAN18 g2765422 0.0 Homo sapiens mRNA for immunoglobulin kappa light chain.
125 LG:7696234.1:2002JAN18 g10437403 0.0 Homo sapiens cDNA: FLJ21321 fis, clone COL02335, highly similar to HSA010442 Homo sapiens mRNA for immunoglobulin kappa light chain.
126 LG:1388299.1:2002JAN18 g5926698 2.0E-12 Human genomic DNA, chromosome 6p21.3, HLA Class I region, section 10/20.
127 LG:978521.5:2002JAN18 g5926696 7.0E-15 Human genomic DNA, chromosome 6p21.3, HLA Class I region, section 8/20.
128 LG:7692599.9:2002JAN18 g37881 0.0 Human chromosome 1 immunoglobulin V(K)I gene, part, with 5' breakpoint between orphon and neighbouring non-amplified region.
129 LG:1452678.13:2002JAN18 g184538 0.0 Human isovaleryl-coA dehydrogenase (IVD) mRNA, complete cds.
130 LG:332947.1:2002JAN18 g1668744 9.0E-67 HHa5 hair keratin type I intermediate filament
131 LG:1292520.13:2002JAN18 g3127925 0.0 H. sapiens RNA for type VI collagen alpha3 chain.
132 LG:7750009.1:2002JAN18 g1183937 1.0E-93 gamma-fibrinogen
133 LG:238322.4:2002JAN18 g12081908 0.0 Homo sapiens mRNA for semaphorin Y, complete cds.
134 LG:7694382.4:2002JAN18 g202412 1.0E-40 Wnt-7b
135 LG:1329198.3:2002JAN18 g187229 3.0E-92 Human pancreatic lipase related protein 1 (PLRP1) mRNA, complete cds.
136 LG:345314.33:2002JAN18 g292043 5.0E-90 Human mucin mRNA, partial cds.
137 LG:215030.7:2002JAN18 g1709300 0.0 Homo sapiens amyloid precursor-like protein 1 (APLP1) mRNA, complete cds.
138 LG:383884.26:2002JAN18 g6572247 4.0E-93 dJ466N1.4 (novel protein similar to ANK3 (ankyrin 3, node of Ranvier (ankyrin G)))
139 LG:413518.62:2002JAN18 g4006920 7.0E-78 actin interacting protein
140 LG:903138.45:2002JAN18 g7022112 0.0 Homo sapiens cDNA FLJ10210 fis, clone HEMBA1006344, weakly similar to RADIXIN.
141 LG:1377804.32:2002JAN18 g1531593 0.0 Human BRCA2 region, mRNA sequence CG037.
142 LG:1390822.13:2002JAN18 g2665835 0.0 Homo sapiens dynein light intermediate chain 2 (LIC2) mRNA, complete cds.
143 LG:7698830.22:2002JAN18 g14424676 0.0 Homo sapiens, transgelin 2, clone MGC:15279 IMAGE:4301018, mRNA, complete cds.
144 LG:7762105.20:2002JAN18 g10439441 0.0 Homo sapiens cDNA: FLJ22906 fis, clone KAT05659, highly similar to HSTROPCR Human 2.5 kb mRNA for cytoskeletal tropomyosin TM30(nm).
145 LG:1382907.104:2002JAN18 g2352946 0.0 Homo sapiens smooth muscle myosin heavy chain SM1 mRNA, alternatively spliced, partial cds.
146 LG:294464.12:2002JAN18 g623408 0.0 Homo sapiens keratin 10 type I intermediate filament (KRT10) mRNA, complete cds.
147 LG:003736.32:2002JAN18 g1160964 2.0E-55 Human phosphoglucomutase-related protein (PGMRP) gene, complete cds.
148 LG:1502253.2:2002JAN18 g7339829 5.0E-56 Human keratin 18 (KRT18) gene, complete cds.
149 LG:216797.51:2002JAN18 g2952330 0.0 Homo sapiens Arg/Abl-interacting protein ArgBP2a (ArgBP2a) mRNA, complete cds.
150 LG:7685287.118:2002JAN18 g14424510 0.0 Homo sapiens, actin, beta, clone MGC:10644 IMAGE:3960255, mRNA, complete cds.
151 LG:405272.4:2002JAN18 g184785 0.0 Human Ig superfamily cytotoxic T-lymphocyte-associated protein (CTLA-4) gene, last exon.
152 LG:247382.7:2002JAN18 g4151807 0.0 membrane-associated guanylate kinase-interacting protein 2 Maguin-2
153 LG:7763403.34:2002JAN18 g1071680 0.0 H. sapiens mRNA for rat translocon-associated protein delta homolog.
154 LG:258352.1:2002JAN18 g507743 0.0 Human vesicular acetylcholine transporter mRNA, complete cds.
155 LG:1096711.3:2002JAN18 g1335863 2.0E-29 Human patched homolog (PTC) mRNA, complete cds.
156 LG:7761740.1:2002JAN18 g12653460 0.0 Human, ribosomal protein L17, clone MGC:8457 IMAGE:2821464, mRNA, complete cds.
157 LG:1382987.89:2002JAN18 g401844 5.0E-14 Human ribosomal protein L18a mRNA, complete cds.
158 LG:444673.50:2002JAN18 g36139 8.0E-48 Human mRNA for ribosomal protein L7.
159 LG:7767853.1:2002JAN18 g13509321 3.0E-86 Homo sapiens ST5 gene for suppression of tumorigenicity 5, L27a gene for ribosomal protein L27a and KIAA0298 gene.
160 LG:1375802.70:2002JAN18 g1399085 0.0 Human ribosomal protein L23a mRNA, complete cds.
161 LG:414732.1:2002JAN18 g183232 0.0 Human beta-glucuronidase mRNA, complete cds.
162 LG:1328394.25:2002JAN18 g7022370 0.0 Homo sapiens cDNA FLJ10377 fis, clone NT2RM2001989, weakly similar to NUCLEOLAR PROTEIN NOP4.
163 LG:336953.5:2002JAN18 g1813423 0.0 Homo sapiens mRNA for HCS, complete cds.
164 LG:7697931.25:2002JAN18 g14249958 0.0 Homo sapiens, heterogeneous nuclear ribonucleoprotein C (C1/C2), clone MGC:3150 IMAGE:3354131, mRNA, complete cds.
165 LG:300147.58:2002JAN18 g241477 0.0 heterogeneous nuclear ribonucleoprotein complex K (human, mRNA, 2302 nt).
166 LG:7763115.9:2002JAN18 g8671585 0.0 Homo sapiens ataxin 2-binding protein (A2BP) mRNA, complete cds.
167 LG:7693875.4:2002JAN18 g431309 0.0 Homo sapiens galactocerebrosidase (GALC) mRNA, complete cds.
168 LG:089516.22:2002JAN18 g13540300 1.0E-50 nucleolar protein C7B
169 LG:336671.1:2002JAN18 g9968295 0.0 Homo sapiens mRNA for inducible T-cell co-stimulator (ICOS gene).
170 LG:234504.11:2002JAN18 g4240459 0.0 Homo sapiens VAMP-associated protein C (VAP-C) mRNA, complete cds.
171 LG:1018931.3:2002JAN18 g1932801 2.0E-45 synaptotagmin X
172 LG:1377369.45:2002JAN18 g12803366 0.0 Homo sapiens, spermidine/spermine N1-acetyltransferase, clone MGC:16251 IMAGE:3051095, mRNA, complete cds.
173 LG:1135404.113:2002JAN18 g288099 0.0 H.sapiens initiation factor 4B cDNA.
174 LG:1452606.33:2002JAN18 g476261 2.0E-73 Human clone CCA11 locus D20S101 mRNA containing CCA trinucleotide repeat.
175 LG:018099.22:2002JAN18 g10438243 0.0 Homo sapiens cDNA: FLJ22003 fis, clone HEP06764.
176 LG:7771625.8:2002JAN18 g8570521 0.0 Homo sapiens genomic DNA, chromosome 1q22-q23, CD1 region, section 114.
177 LG:1513012.6:2002JAN18 g11065992 0.0 Homo sapiens neuronal calcium binding protein NECAB1 mRNA, partial cds.
178 LG:903956.34:2002JAN18 g5852975 0.0 Homo sapiens NUMB isoform 4 (NUMB) mRNA, complete cds.
179 LG:331171.22:2002JAN18 g13171105 1.0E-06 pecanex
180 LG:380305.28:2002JAN18 g2618587 2.0E-40 Human mRNA for Hrs, complete cds.
181 LG:227928.19:2002JAN18 g6539605 2.0E-27 Human metastasis suppressor protein mRNA, complete cds.
182 LG:1099593.39:2002JAN18 g1702923 1.0E-08 Human mRNA for p0071 protein.
183 LG:1501223.87:2002JAN18 g13277547 0.0 Homo sapiens, cell division cycle 42 (GTP-binding protein, 25kD), clone MGC:5044 IMAGE:3457085, mRNA, complete cds.
184 LG:7690039.1:2002JAN18 g325464 0.0 Human endogenous retrovirus type C oncovirus sequence.
185 LG:332701.3:2002JAN18 g9367839 0.0 Homo sapiens mRNA full length insert cDNA clone EUROIMAGE 2005417.
186 LG:237963.28:2002JAN18 g6708478 0.0 formin-like protein
187 LG:245267.1:2002JAN18 g15021904 5.0E-08 meiotic meta-phase Expressing Gene in spermatogenesis
188 LG:7761954.21:2002JAN18 g13129532 1.0E-120 Homo sapiens spindlin 1 mRNA, complete cds.
TABLE 3
SEQ ID NO: Template ID Start Stop Frame Pfam Hit Pfam Description E-value
2 LG:1376818.27:2002JAN18 88 1167 forward Dihydroorotase Dihydroorotase-like 2.7E-71
2 LG:1376818.27:2002JAN18 33 806 forward Dihydroorotase Dihydroorotase-like 1.8E-10
3 LG:990561.44:2002JAN18 353 799 forward DAGKa Diacylglycerol kinase accessory 2.7E-05
2 domain (presumed)
6 LG:150971.1:2002JAN18 98 787 forward tRNA-synt_2 tRNA synthetases class II (D, K and N) 7.4E-09
10 LG:203951.12:2002JAN18 1367 1552 forward UCH-2 Ubiquitin carboxyl-terminal hydrolase 7.5E-18
2 family 2
10 LG:203951.12:2002JAN18 650 745 forward UCH-1 Ubiquitin carboxyl-terminal hydrolases 2.9E-13
2 family 2
12 LG:333034.3:2002JAN18 64 351 forward FGF Fibroblast growth factor 3.3E-05
13 LG:1291525.9:2002JAN18 71 289 forward PWWP PWWP domain 1.5E-33
23 LG:445084.2:2002JAN18 551 952 forward DUF101 Protein of unknown function DUF101 2.9E-62
30 LG:1398319.1:2002JAN18 661 918 forward EF1BD EF-1 guanine nucleotide exchange 1.1E-42
1 domain
35 LG:271394.44:2002JAN18 66 632 forward pkinase Protein kinase domain 6.9E-13
35 LG:271394.44:2002JAN18 942 1127 forward ig Immunoglobulin domain 2.9E-05
37 LG:7762920.1:2002JAN18 16 558 forward pkinase Protein kinase domain 8.5E-07
42 LG:378592.15:2002JAN18 201 431 forward ig Immunoglobulin domain 1.1E-15
44 LG:1507027.3:2002JAN18 1443 1700 forward EF1BD EF-1 guanine nucleotide exchange 1.8E-42
3 domain
54 LG:7698465.26:2002JAN18 1413 1604 forward dsrm Double-stranded RNA binding motif 2.8E-43
58 LG:052513.3:2002JAN18 3 536 forward STAT_bind STAT protein, DNA binding domain 1.1E-54
59 LG:7692334.1:2002JAN18 577 747 forward homeobox Homeobox domain 1.2E-04
62 LG:1079823.1:2002JAN18 58 126 forward zf-C2H2 Zinc finger, C2H2 type 3.7E-61
63 LG:1082263.10:2002JAN18 3 59 forward zf-C2H2 Zinc finger, C2H2 type 8.2E-49
64 LG:1076162.1:2002JAN18 61 129 forward zf-C2H2 Zinc finger, C2H2 type 1.1E-27
65 LG:404157.1:2002JAN18 210 278 forward zf-C2H2 Zinc finger, C2H2 type 3.4E-79
66 LG:474725.1:2002JAN18 700 768 forward zf-C2H2 Zinc finger, C2H2 type 8.5E-67
66 LG:474725.1:2002JAN18 64 186 forward KRAB KRAB box 1.4E-21
67 LG:1080918.1:2002JAN18 331 399 forward zf-C2H2 Zinc finger, C2H2 type 2.8E-100
68 LG:1092343.1:2002JAN18 26 97 forward zf-C2H2 Zinc finger, C2H2 type 1.2E-44
69 LG:7684505.1:2002JAN18 323 391 forward zf-C2H2 Zinc finger, C2H2 type 3.2E-09
70 LG:7689627.1:2002JAN18 12 80 forward zf-C2H2 Zinc finger, C2H2 type 2.0E-23
71 LG:122863.1:2002JAN18 759 827 forward zf-C2H2 Zinc finger, C2H2 type 7.9E-86
71 LG:122863.1:2002JAN18 294 416 forward KRAB KRAB box 1.1E-25
72 LG:7690093.1:2002JAN18 710 778 forward zf-C2H2 Zinc finger, C2H2 type 3.4E-148
72 LG:7690093.1:2002JAN18 80 202 forward KRAB KRAB box 1.2E-24
73 LG:1449021.1:2002JAN18 629 697 forward zf-C2H2 Zinc finger, C2H2 type 8.0E-37
73 LG:1449021.1:2002JAN18 266 388 forward KRAB KRAB box 5.9E-22
74 LG:958155.1:2002JAN18 471 593 forward KRAB KRAB box 8.7E-25
75 LG:7684559.1:2002JAN18 296 364 forward zf-C2H2 Zinc finger, C2H2 type 2.6E-34
77 LG:7687730.5:2002JAN18 866 934 forward zf-C2H2 Zinc finger, C2H2 type 8.7E-99
77 LG:7687730.5:2002JAN18 266 388 forward KRAB KRAB box 1.5E-26
78 LG:7691462.5:2002JAN18 317 439 forward KRAB KRAB box 3.8E-28
79 LG:7690229.9:2002JAN18 270 392 forward KRAB KRAB box 6.8E-26
80 LG:7691117.5:2002JAN18 2229 2297 forward zf-C2H2 Zinc finger, C2H2 type 3.6E-74
80 LG:7691117.5:2002JAN18 1440 1562 forward KRAB KRAB box 3.3E-24
81 LG:413642.1:2002JAN18 384 446 forward zf-C2H2 Zinc finger, C2H2 type 9.4E-06
82 LG:7771639.1:2002JAN18 296 418 forward KRAB KRAB box 2.8E-23
83 LG:7684553.3:2002JAN18 256 378 forward KRAB KRAB box 4.2E-24
84 LG:7690374.7:2002JAN18 330 452 forward KRAB KRAB box 2.3E-22
85 LG:7690065.3:2002JAN18 88 210 forward KRAB KRAB box 1.5E-21
86 LG:7690583.5:2002JAN18 69 137 forward zf-C2H2 Zinc finger, C2H2 type 2.4E-19
87 LG:7771893.1:2002JAN18 361 429 forward zf-C2H2 Zinc finger, C2H2 type 1.4E-85
88 LG:7691582.2:2002JAN18 440 508 forward zf-C2H2 Zinc finger, C2H2 type 5.3E-91
89 LG:7687809.2:2002JAN18 465 533 forward zf-C2H2 Zinc finger, C2H2 type 4.7E-79
90 LG:7691200.3:2002JAN18 71 139 forward zf-C2H2 Zinc finger, C2H2 type 8.4E-109
90 LG:7691200.3:2002JAN18 1164 1232 forward zf-C2H2 Zinc finger, C2H2 type 1.4E-04
97 LG:114448.25:2002JAN18 6439 7362 forward HECT HECT-domain (ubiquitin-transferase) 5.3E-196
99 LG:1094595.3:2002JAN18 2139 2543 forward DUF232 Putative transcriptional regulator 1.9E-53
101 LG:7761700.28:2002JAN18 55 1575 forward serine_carbpept Serine carboxypeptidase 1.4E-140
102 LG:1093982.42:2002JAN18 206 670 forward 2 pro_isomerase Cyclophilin type peptidyl-prolyl cis-trans isomerase 3.2E-60
102 LG:1093982.42:2002JAN18 297 809 forward 3 pro_isomerase Cyclophilin type peptidyl-prolyl cis-trans isomerase 5.4E-04
103 LG:7762752.1:2002JAN18 219 1067 forward Zn_carbOpept Zinc carboxypeptidase 2.0E-114
106 LG:345276.3:2002JAN18 1318 1428 forward PHD PHD-finger 9.2E-06
106 LG:345276.3:2002JAN18 1907 2470 forward MOZ_SAS MOZ/SAS family 1.2E-126
106 LG:345276.3:2002JAN18 mo 1283 forward PHD PHD-finger 2.8E-06
116 LG:410680.7:2002JAN18 106 492 forward laminin_G Laminin G domain 2.3E-94
117 LG:7771583.2:2002JAN18 1447 1734 forward cadherin Cadherin domain 2.9E-77
119 LG:7691131.1:2002JAN18 1516 1812 forward cadherin Cadherin domain 2.1E-12
119 LG:7691131.1:2002JAN18 567 854 forward cadherin Cadherin domain 1.2E-49
121 LG:1383194.7:2002JAN18 88 624 forward 1 MHC_I Class I Histocompatibility antigen, domains alpha 1 and 2 1.1E-139
121 LG:1383194.7:2002JAN18 673 870 forward ig Immunoglobulin domain 1.4E-07
124 LG:7696423.1:2002JAN18 457 666 forward ig Immunoglobulin domain 8.5E-07
125 LG:7696234.1:2002JAN18 129 353 forward ig Immunoglobulin domain 1.6E-20
129 LG:1452678.13:2002JAN18 824 1273 forward 2 Acyl-CoA_dh Acyl-CoA dehydrogenase, C-terminal domain 1.0E-55
129 LG:1452678.13:2002JAN18 536 814 forward Acyl-CoA_dh_M Acyl-CoA dehydrogenase, middle 1.2E-38
129 LG:1452678.13:2002JAN18 141 500 forward 3 Acyl-CoA_dh_N Acyl-CoA dehydrogenase, N-terminal domain 3.1E-63
129 LG:1452678.13:2002JAN18 3168 3584 forward 3 Acyl-CoA_dh Acyl-CoA dehydrogenase, C-terminal domain 3.4E-37
129 LG:1452678.13:2002JAN18 504 662 forward 3 Acyl-CoA_dh_M Acyl-CoA dehydrogenase, middle domain 4.1E-04
130 LG:332947.1:2002JAN18 4 636 forward filament Intermediate filament protein 2.0E-14
131 LG:1292520.13:2002JAN18 68 247 forward Collagen Collagen triple helix repeat (20 copies) 1.7E-11
133 LG:238322.4:2002JAN18 859 2118 forward Sema Sema domain 2.0E-131
134 LG:7694382.4:2002JAN18 35 643 forward wnt wnt family 2.5E-14
137 LG:215030.7:2002JAN18 248 745 forward A4_EXTRA Amyloid A4 extracellular domain 7.0E-121
138 LG:383884.26:2002JAN18 244 342 forward ank Ankyrin repeat 4.0E-23
139 LG:413518.62:2002JAN18 519 1004 forward FAD_binding_4 FAD binding domain 2.5E-60
143 LG:7698830.22:2002JAN18 139 468 forward CH Calponin homology (CH) domain 1.2E-25
143 LG:7698830.22:2002JAN18 921 998 forward calponin Calponin family repeat 4.5E-12
144 LG:7762105.20:2002JAN18 2209 2925 forward Tropomyosin Tropomyosin 5.2E-07
144 LG:7762105.20:2002JAN18 93 794 forward Tropomyosin Tropomyosin 2.6E-73
146 LG:294464.12:2002JAN18 449 1393 forward filament Intermediate filament protein 1.6E-169
149 LG:216797.51:2002JAN18 899 1069 forward SH3 SH3 domain 3.9E-12
150 LG:7685287.118:2002JAN18 1621 2751 forward actin Actin 4.8E-07
150 LG:7685287.118:2002JAN18 1493 2677 forward actin Actin 1.5E-64
150 LG:7685287.118:2002JAN18 1521 2627 forward actin Actin 2.7E-20
152 LG:247382.7:2002JAN18 569 766 forward SAM SAM domain (Sterile alpha motif) 1.2E-11
152 LG:247382.7:2002JAN18 1187 1435 forward 2 PDZ PDZ domain (Also known as DHR or GLGF) 2.0E-07
156 LG:7761740.1:2002JAN18 171 485 forward Ribosomal_L22 Ribosomal protein L22p/L17e 1.8E-05
160 LG:1375802.70:2002JAN18 224 445 forward Ribosomal_L23 Ribosomal protein L23 7.4E-17
161 LG:414732.1:2002JAN18 76 537 forward 1 Glyco_hydro_2_N Glycosyl hydrolases family 2, sugar binding domain 6.5E-12
162 LG:1328394.25:2002JAN18 1215 1448 forward 3 rrm RNA recognition motif, (a.k.a. RRM, RBD, or RNP domain) 6.2E-16
163 LG:336953.5:2002JAN18 1753 2184 forward BPL_LipA_LipB Biotin/lipoate A/B protein ligase family 1.3E-36
163 LG:336953.5:2002JAN18 2359 2505 forward BPL_C Biotin protein ligase C terminal domain 5.5E-11
165 LG:300147.58:2002JAN18 910 1056 forward KH-domain KH domain 2.7E-20
166 LG:7763115.9:2002JAN18 704 913 forward 2 rrm RNA recognition motif, (a.k.a. RRM, RBD, or RNP domain) 1.0E-19
167 LG:7693875.4:2002JAN18 479 2485 forward Glyco_hydro_59 Glycosyl hydrolase family 59 0.0
171 LG:1018931.3:2002JAN18 55 312 forward C2 C2 domain 6.2E-21
172 LG:1377369.45:2002JAN18 1760 2011 forward Acetyltransf Acetyltransferase (GNAT) family 9.4E-16
177 LG:1513012.6:2002JAN18 154 240 forward efhand EF hand 4.2E-08
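Each Table 3 row pairs a template region with a Pfam domain hit, so the rows can be parsed mechanically. The following is a minimal Python sketch, not part of the original disclosure, assuming that each row sits on one line with whitespace-separated columns, that the Pfam Hit is a single token, that an optional frame digit (1, 2 or 3) may follow "forward", and that the E-value is the last token. It further assumes that Start and Stop are inclusive nucleotide positions on the template, which the table does not state explicitly; under that assumption the 69-position zf-C2H2 hit for SEQ ID NO:62 would span 23 codons. The names `PfamHit` and `parse_table3_row` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed row layout: SEQ ID NO, template ID, start, stop, "forward",
# optional frame digit, Pfam hit (one token), description, E-value.
ROW_PATTERN = re.compile(
    r"^(?P<seq>\d+)\s+(?P<template>LG:\S+)\s+(?P<start>\d+)\s+(?P<stop>\d+)\s+"
    r"forward(?:\s+(?P<frame>[123])(?=\s))?\s+(?P<hit>\S+)\s+"
    r"(?P<desc>.+)\s+(?P<evalue>\S+)$"
)


@dataclass
class PfamHit:
    seq_id: int
    template_id: str
    start: int
    stop: int
    frame: Optional[int]
    pfam_hit: str
    description: str
    e_value: float

    @property
    def length_nt(self) -> int:
        # Inclusive coordinates: a hit from 58 to 126 covers 69 positions.
        return self.stop - self.start + 1

    @property
    def residues(self) -> int:
        # Assumes nucleotide coordinates read in a single frame.
        return self.length_nt // 3


def parse_table3_row(line: str) -> PfamHit:
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized Table 3 row: {line!r}")
    frame = match.group("frame")
    return PfamHit(
        seq_id=int(match.group("seq")),
        template_id=match.group("template"),
        start=int(match.group("start")),
        stop=int(match.group("stop")),
        frame=int(frame) if frame else None,
        pfam_hit=match.group("hit"),
        description=match.group("desc"),
        e_value=float(match.group("evalue")),
    )


hit = parse_table3_row(
    "62 LG:1079823.1:2002JAN18 58 126 forward zf-C2H2 Zinc finger, C2H2 type 3.7E-61"
)
print(hit.pfam_hit, hit.length_nt, hit.residues, hit.e_value)
```

Rows that do not fit the assumed layout, such as one with an illegible Start value, raise `ValueError` instead of being silently guessed.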
TABLE 4
SEQ ID NO: Template ID Start Stop Frame
105 LG:054509.10:2002JAN18 715 737
105 LG:054509.10:2002JAN18 738 751
105 LG:054509.10:2002JAN18 752 771
105 LG:054509.10:2002JAN18 772 900
105 LG:054509.10:2002JAN18 901 918
105 LG:054509.10:2002JAN18 919 985
105 LG:054509.10:2002JAN18 986 1008
105 LG:054509.10:2002JAN18 1009 1061
105 LG:054509.10:2002JAN18 1062 1081
105 LG:054509.10:2002JAN18 1082 1090
105 LG:054509.10:2002JAN18 1091 1113
105 LG:054509.10:2002JAN18 1114 1194
105 LG:054509.10:2002JAN18 1195 1217
105 LG:054509.10:2002JAN18 1218 1252
105 LG:054509.10:2002JAN18 1253 1275
105 LG:054509.10:2002JAN18 1276 1314
105 LG:054509.10:2002JAN18 1315 1337
105 LG:054509.10:2002JAN18 1338 1382
105 LG:054509.10:2002JAN18 669 722 forward 3
106 LG:345276.3:2002JAN18 1 1996
106 LG:345276.3:2002JAN18 1997 2019
106 LG:345276.3:2002JAN18 2020 2080
106 LG:345276.3:2002JAN18 2081 2103
106 LG:345276.3:2002JAN18 2104 2225
106 LG:345276.3:2002JAN18 2226 2248
106 LG:345276.3:2002JAN18 2249 2280
106 LG:345276.3:2002JAN18 2281 2298
106 LG:345276.3:2002JAN18 2299 2312
106 LG:345276.3:2002JAN18 2313 2330
106 LG:345276.3:2002JAN18 2331 2387
106 LG:345276.3:2002JAN18 2388 2410
106 LG:345276.3:2002JAN18 2411 2419
106 LG:345276.3:2002JAN18 2420 2442
106 LG:345276.3:2002JAN18 2443 2599
106 LG:345276.3:2002JAN18 1 1993
106 LG:345276.3:2002JAN18 1994 2016
106 LG:345276.3:2002JAN18 2017 2022
106 LG:345276.3:2002JAN18 2023 2045
106 LG:345276.3:2002JAN18 2046 2059
106 LG:345276.3:2002JAN18 2060 2082
106 LG:345276.3:2002JAN18 2083 2098
106 LG:345276.3:2002JAN18 2099 2121
106 LG:345276.3:2002JAN18 2122 2235
106 LG:345276.3:2002JAN18 2236 2258
106 LG:345276.3:2002JAN18 2259 2278
106 LG:345276.3:2002JAN18 2279 2298
106 LG:345276.3:2002JAN18 2299 2312
106 LG:345276.3:2002JAN18 2313 2330
106 LG:345276.3:2002JAN18 2331 2385
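Table 5 lists, for each SEQ ID NO:, a series of start-stop ranges separated by semicolons. Assuming each range is an inclusive pair of positions on the same template coordinate axis (an interpretation, not something the table states), the ranges can be merged to see how much of the template they span and whether any gaps remain. The Python sketch below illustrates this; `parse_fragments`, `merge_intervals` and `covered_length` are hypothetical helper names, and malformed tokens raise `ValueError` from `int()` or the tuple unpacking rather than being repaired.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def parse_fragments(text: str) -> List[Interval]:
    """Parse '1-162; 14-206' into [(1, 162), (14, 206)]."""
    intervals: List[Interval] = []
    for token in text.split(";"):
        token = token.strip()
        if not token:
            continue  # tolerate a trailing separator
        start_text, stop_text = token.split("-")
        intervals.append((int(start_text), int(stop_text)))
    return intervals


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent inclusive ranges."""
    merged: List[Interval] = []
    for start, stop in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            last_start, last_stop = merged[-1]
            merged[-1] = (last_start, max(last_stop, stop))
        else:
            merged.append((start, stop))
    return merged


def covered_length(intervals: List[Interval]) -> int:
    """Count template positions covered by at least one range."""
    return sum(stop - start + 1 for start, stop in merge_intervals(intervals))


row = "1-206; 1-547; 413-883"  # fragments listed for SEQ ID NO:25
print(merge_intervals(parse_fragments(row)), covered_length(parse_fragments(row)))
```

For SEQ ID NO:25 the three ranges overlap into a single span, 1-883, so no gap remains under this reading.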
TABLE 5
SEQ ID NO:/Component ID Fragments
1/LG:1100267.1:2002JAN18 1-647; 26-633; 26-288; 206-475; 411-827; 674-826
2/LG:1376818.27:2002JAN18 1-518; 7-524; 102-686; 492-632; 503-1056; 561-1092; 814-962; 814-953; 1003-1376; 1021-1592; 1028-1329; 1104-1358; 1261-1480; 1268-1510; 1287-1809; 1349-1592; 1406-1592; 1638-1805; 1685-2162; 1695-2240; 1753-2330; 1774-2287; 1811-2321; 1803-2180; 1834-2325; 1828-1924; 1824-2329; 1822-2284; 1885-2299; 1937-1999; 2030-2167; 2020-2329; 2020-2162; 2086-2648; 2153-2357; 2282-2569; 2292-2667; 2318-3031; 2353-3061; 2374-2738; 2374-2897; 2463-3004; 2494-3119; 2474-2769; 2525-2758; 2547-2790; 2547-2784; 2572-3125; 2575-2827; 2566-2815; 2568-3262; 2599-2801; 2611-2806; 2592-3271; 2722-3435; 2727-3022; 2753-2920; 2898-3126; 2888-3247; 2916-3100; 2932-3185; 2943-3652; 2971-3210; 2984-3235; 2987-3218; 3009-3265; 3012-3290; 3018-3186; 3035-3427; 3036-3297; 3182-3762; 3235-3882; 3281-3867; 3692-3912; 3693-3933
3/LG:990561.44:2002JAN18 1-570; 532-902; 815-1384; 842-957; 965-1211; 987-1533; 1048-1439; 1139-1816; 1167-1295; 1176-1746; 1226-1496; 1283-1362; 1295-1679; 1409-1608; 1421-1749; 1438-2043; 1438-1948; 1457-1721; 1457-1683; 1504-1659; 1504-1842; 1503-1894; 1512-1691; 1530-1644; 1553-1736; 1555-1799; 1559-1745; 1571-1780; 1576-1681; 1610-1850; 1666-1945; 1763-2210; 1770-2242; 1825-2219; 1832-2094; 1839-2101; 1844-2063; 1844-2062; 1844-2191; 1844-2226; 1851-2238; 1860-2197; 1860-2219; 1860-2226; 1860-2236; 1862-1981; 1888-2049; 1915-2240; 1907-2225; 1908-2226; 1930-2219; 1936-2217; 1952-2229; 1955-2219; 1969-2090; 1969-2219; 1988-2219; 2017-2217; 2067-2660; 2073-2165; 2091-2226; 2094-2226
4/LG:990855.9:2002JAN18 1-246; 1-255; 11-310; 39-523; 265-528; 304-768; 305-840; 345-770; 346-541; 381-714; 403-560; 582-746; 619-852
5/LG:898483.1:2002JAN18 1-322; 1-334; 1-243; 1-227; 2-275; 124-693; 129-278
6/LG:150971.1:2002JAN18 1-786; 3-226; 48-744; 323-664; 422-754; 477-1060; 489-808; 786-1016
7/LG:7771532.20:2002JAN18 1-162; 14-206
8/LG:1501261.5:2002JAN18 1-592; 42-749; 150-457; 292-439; 675-978; 757-1222; 766-1266; 766-1010; 831-1210; 942-1215; 1004-1210
9/LG:1454772.2:2002JAN18 1-550; 12-228; 336-597; 374-819; 400-797; 394-957; 400-575; 400-569; 400-939; 400-1097; 400-918; 400-561; 400-524; 425-671; 443-677; 471-712; 476-714; 486-1032; 495-822; 524-1074; 524-742; 524-731; 526-685; 533-793; 568-1354; 579-762; 583-1347; 581-807; 598-766; 616-871; 617-1347; 623-861; 628-870; 631-912; 633-1046; 633-966; 633-1019; 637-1362; 645-812; 652-1192; 670-1346; 680-888; 710-921; 708-955; 722-1040; 722-1000; 728-1343; 741-1183; 747-1282; 747-995; 748-1344; 753-1352; 753-1006; 781-1060; 812-1079; 832-1005; 851-1302; 863-1056; 863-1075; 863-1083; 888-1157; 888-1171; 890-1189; 894-1122; 899-1365; 907-1346; 905-1368; 906-1365; 910-1367; 914-1282; 920-1365; 921-1365; 929-1362; 928-1363; 929-1368; 934-1365; 936-1364; 937-1363; 938-1364; 940-1365; 950-1258; 956-1364; 960-1304; 960-1363; 964-1365; 968-1339; 969-1365; 973-1364; 983-1329; 993-1365; 992-1367; 1002-1366; 1005-1365; 1007-1365; 1009-1370; 1009-1363; 1013-1321; 1013-1363; 1015-1365; 1020-1246; 1021-1131; 1025-1306; 1046-1297; 1056-1367; 1063-1363; 1088-1339; 1088-1264; 1095-1362; 1121-1284; 1110-1370; 1145-1365; 1147-1365; 1149-1365; 1165-1370; 1176-1338; 1187-1365; 1198-1366; 1241-1363; 1243-1363; 1286-1365; 1318-1375
10/LG:203951.12:2002JAN18 1-224; 53-602; 53-599; 67-319; 69-5948; 72-369; 89-757; 90-861; 91-792; 162-738; 164-704; 201-453; 196-629; 409-932; 409-963; 421-726; 574-851; 577-973; 579-1161; 591-1246; 648-874; 660-924; 695-986; 744-1221; 755-1199; 834-1089; 918-1720; 924-1599; 970-1620; 970-1505; 970-1208; 1015-1557; 1051-1608; 1053-1344; 1122-1395; 1178-1677; 1205-1749; 1219-1826; 1245-1509; 1247-1474; 1282-1783; 1298-1864; 1301-1861; 1316-1838; 1317-1920; 1385-1653; 1452-1882; 1459-1842; 1465-1974; 1495-1737; 1497-2273; 1518-1767; 1548-2091; 1648-2050; 1697-2323; 1727-1907; 1729-2004; 1747-2274; 1747-1915; 1751-1983; 1811-2287; 1829-2082; 1832-1928; 1848-2464; 1861-2080; 1868-2464; 1894-2132; 1895-2398; 1898-2150; 1900-2155; 1998-2449; 1973-2291; 2077-2337; 2136-2409; 2161-2219; 2161-2402; 2164-2424; 2199-2451; 2262-2528; 2331-2609; 2346-2516; 2346-2409; 2454-2514; 2469-3039; 2525-3048; 2552-3000; 2592-3035; 2639-3100; 2633-2810; 2636-2812; 2639-3157; 2681-3040; 2687-3148; 2700-3277; 2722-3148; 2724-3149; 2747-3104; 2837-3148; 2843-3194; 2858-3149; 2870-3153; 2884-3114; 3022-3093; 3037-3679; 3047-3155; 3056-3148; 3088-3414; 3111-3665; 3125-3333; 3135-3562; 3144-3383; 3171-3527; 3183-3493; 3202-3449; 3205-3561; 3213-3559; 3234-3560; 3246-3493; 3265-3501; 3297-3560; 3340-3508; 3353-3868; 3355-3868; 3390-3624; 3650-3770; 3788-4323
10, cont. 4018-4270; 4059-4248; 4089-4328; 4111-4530; 4111-4333; 4244-4769; 4309-4638; 4348-4894; 4369-4598; 4435-4807; 4488-4757; 4494-4770; 4535-5005; 4535-4766; 4591-4816; 4590-4703; 4594-5249; 4602-4972; 4612-4911; 4613-5172; 4692-4977; 4710-5004; 4712-5183; 4717-5184; 4722-5187; 4731-5183; 4733-5187; 4734-5184; 4747-5192; 4756-4982; 4763-5175; 4764-4981; 4772-5184; 4779-5184; 4791-5049; 4790-5050; 4791-5067; 4806-5166; 4806-5033; 4817-5184; 4831-5027; 4843-5407; 4855-5184; 4865-5540; 4883-5184; 4908-5185; 4953-5392; 4941-5539; 4955-5248; 4984-5241; 4986-5647; 4991-5241; 5001-5529; 5006-5518; 5007-5240; 5009-5173; 5024-5140; 5082-5521; 5082-5354; 5096-5341; 5112-5362; 5112-5322; 5112-5313; 5119-5405; 5141-5328; 5165-5415; 5165-5708; 5165-5409; 5175-5361; 5177-5471; 5177-5748; 5185-5427; 5186-5439; 5189-5426; 5193-5716; 5192-5470; 5197-5478; 5232-5428; 5234-5484; 5539-5766; 5561-5949; 5689-5948
11/LG:142131.16:2002JAN18 1-308; 263-489; 278-560; 374-612; 424-645; 460-606; 476-739; 492-606; 541-762; 642-1019; 643-845; 675-1182; 675-830; 686-1028; 686-946; 686-916; 706-969; 732-978; 755-874; 832-1051; 960-1157; 995-1222; 999-1192; 1002-1207; 1005-1424; 1005-1277; 1060-1333; 1080-1318; 1098-1319; 1109-1235; 1113-1383; 1130-1681; 1140-1394; 1169-1401; 1196-1434; 1243-1513; 1281-1360; 1284-1763; 1284-1510; 1293-1435; 1314-1922; 1313-1620; 1321-1518; 1325-1555; 1327-1544; 1331-1567; 1341-1563; 1358-1599; 1364-1902; 1394-1604; 1436-1910; 1461-1907; 1463-1787; 1473-1693; 1510-1674; 1517-2025; 1550-1804; 1551-1819; 1560-1907; 1576-1778; 1662-1763; 1671-1877; 1675-1885; 1688-2120; 1690-1888; 1718-2249; 1716-1973; 1723-2013; 1723-1969; 1792-2261; 1805-1942; 1807-2131; 1859-2025; 1866-1949; 1868-2091; 1975-2289; 1975-2123; 1991-2249; 1991-2245; 1991-2142; 2025-2300; 2028-2403; 2031-2253; 2154-2739; 2148-2239; 2164-2753; 2167-2266; 2168-2388; 2186-2372; 2189-2376; 2191-2255; 2201-2420; 2212-2612; 2213-2484; 2221-2467; 2220-2405; 2228-2409; 2238-2452
11, cont. 2272-2437; 2317-2437; 2344-2603; 2355-2465; 2359-2567; 2408-2638; 2410-2877; 2438-2902; 2439-2541; 2464-2878; 2508-2951; 2508-2744; 2513-2748; 2513-2706; 2551-2706; 2551-2724; 2551-2720; 2551-2689; 2551-2663; 2551-2615; 2551-2669; 2614-2992; 2676-2841; 2679-2806; 2685-2912; 2694-2902; 2707-2989; 2759-2976; 2772-3016; 2804-3034; 2826-3091; 2832-3097; 2856-2941; 2877-2976; 2884-3087; 2899-3100; 2901-3164; 2906-2996; 2909-3462; 2918-3152; 2932-3552; 2943-3235; 2961-3593; 2974-3608; 2985-3200; 2987-3171; 3003-3250; 3006-3187; 3015-3243; 3054-3603; 3062-3546; 3062-3324; 3065-3617; 3075-3617; 3083-3617; 3104-3232; 3108-3620; 3110-3320; 3127-3582; 3127-3655; 3148-3404; 3150-3609; 3155-3600; 3155-3393; 3153-3806; 3156-3336; 3162-3601; 3181-3280; 3183-3470; 3205-3488; 3209-3339; 3211-3495; 3226-3531; 3228-3658; 3232-3375; 3234-3382; 3248-3493; 3262-3476; 3314-3655; 3340-3477; 3461-3655; 3489-3655; 3489-3619; 3497-3804; 3517-3655; 3517-3612
12/LG:333034.3:2002JAN18 1-4378; 1-386; 77-371; 145-859; 173-845; 175-446; 303-817; 510-679; 538-1166; 732-1091; 989-1166; 1101-1643; 1602-1829; 1633-2195; 1931-2290; 1931-2392; 1931-2233; 1934-2219; 2559-3235; 2593-2825; 3136-3348; 3250-3426; 3369-3539; 4196-4377; 4264-4378
13/LG:1291525.9:2002JAN18 1-241; 51-717; 226-750; 484-750; 612-1292; 612-894; 773-1034; 779-1187; 888-1492; 936-1194; 990-1496; 989-1478; 1080-1757; 1093-1328; 1142-1642; 1174-1349; 1175-1285; 1178-1310; 1191-1299; 1381-1965; 1394-1674; 1413-1671; 1461-1716; 1483-1705; 1495-1706; 1500-1707; 1499-1922; 1500-1767; 1509-1871; 1515-1914; 1543-1735; 1548-1947; 1557-1756; 1585-1680; 1598-1910; 1658-1803; 1658-1897; 1693-1962; 1907-2135; 1930-2081; 1367-1606; 1386-1635; 1299-1580
14/LG:7771246.13:2002JAN18 1-740; 10-712; 123-751; 127-740; 177-494; 226-547; 235-545; 244-581; 417-616; 424-592; 444-786; 511-823; 550-821; 612-823; 618-821; 632-821; 641-820; 674-821; 728-1315; 754-818; 897-1694; 933-1567; 933-1552; 933-1352; 933-1666; 976-1358; 970-1312; 987-1591; 1020-1432; 1022-1282; 1052-1516; 1104-1376; 1098-1320; 1131-1381; 1137-1679; 1181-1430; 1192-1735; 1200-1448; 1251-1665; 1252-1616; 1269-1673; 1338-1665; 1408-1664; 1408-1665; 1409-1615; 1411-1665; 1474-2002; 1488-1672; 1518-1881; 1518-1801; 1529-1667; 1537-1990; 1631-1997; 1701-2001; 1773-1894; 1778-2001; 1810-2003
15/LG:1125820.1:2002JAN18 1-139; 5-753
16/LG:299789.1:2002JAN18 173-782; 1-349
17/LG:044888.1:2002JAN18 1-585; 1-627; 198-807; 353-995; 399-1027; 432-1003; 451-997; 608-1319; 661-997; 667-1050; 782-1319; 929-992; 1150-1603; 1151-1669; 1151-1715
18/LG:410020.3:2002JAN18 1-284; 1-186
19/LG:7684165.8:2002JAN18 1-505; 182-674; 202-694; 450-1036; 529-882; 716-926; 758-1191; 850-1200; 868-1214; 881-1202; 938-1196; 973-1196; 818-1198; 941-1210; 844-1210
20/LG:358050.2:2002JAN18 87-714; 45-688; 1-277; 1-269; 45-216
21/LG:230905.7:2002JAN18 1-251; 115-245; 148-282; 151-413; 152-242; 153-242; 161-242; 165-242; 232-400; 230-867; 233-400; 234-400; 239-400; 240-400; 242-400; 243-400; 244-400; 248-352; 250-400; 252-400; 253-883; 262-400; 261-399; 261-400; 263-419; 266-400; 293-766; 295-400; 297-400; 312-400; 339-1094; 416-1223; 666-871; 674-920; 683-1178; 690-982; 691-1207; 691-914; 694-886; 698-970; 698-919; 698-1234; 702-1115; 702-1047; 702-886; 702-882; 702-811; 702-773; 702-1174; 705-1112; 705-1216; 705-945; 705-780; 706-942; 711-1025; 711-933; 711-973; 714-963; 723-1216; 723-1000; 730-1006; 730-978; 730-967; 731-928; 732-1031; 732-924; 735-1005; 741-1028; 743-987; 748-1221; 749-1071; 755-1181; 758-1216; 768-1217; 769-947; 772-1219; 781-1020; 782-1215; 784-1216; 784-1085; 784-1028; 787-1216; 792-1180; 797-1215; 803-1220; 805-1216; 816-1216; 823-1179; 836-1217; 844-1200; 846-1078; 848-1080; 851-1223; 860-1221; 861-1216; 873-1119; 878-1216; 884-1080; 892-1199; 924-1181; 1026-1222
22/LG:7688735.3:2002JAN18 1-593; 1-395; 148-277; 554-954; 554-975; 728-937; 786-959
23/LG:445084.2:2002JAN18 1-143; 1-239; 1-383; 51-268; 58-234; 73-666; 161-395; 430-655; 431-665; 431-641; 444-881; 446-643; 447-678; 446-708; 447-671; 449-686; 454-701; 454-713; 455-644; 456-678; 456-924; 457-697; 457-664; 457-982; 460-982; 460-800; 460-713; 460-776; 461-547; 464-639; 465-751; 466-722; 467-715; 468-759; 467-717; 468-737; 468-982; 477-1106; 490-949; 502-988; 526-988; 539-807; 548-979; 570-986; 598-807; 602-983; 606-946; 606-986; 665-982; 665-985; 666-989; 693-890; 717-1220; 732-985; 834-985; 859-1141; 868-985; 868-982; 874-982; 874-936; 908-1161; 917-1192; 1155-1670; 1391-1788; 1397-1758; 1397-1919; 1399-1507; 1399-1542; 1542-1801; 1537-1965; 1649-1976; 1775-1974
24/LG:7696681.1:2002JAN18 1-541; 10-108; 50-457; 358-603; 468-674; 559-722; 570-982; 573-985; 32-340
25/LG:1446403.4:2002JAN18 1-206; 1-547; 413-883
26/LG:1042935.2:2002JAN18 1-833; 521-1242; 586-691; 1108-1520; 1108-1621; 1108-1632; 1108-1633; 1399-1566; 1437-1920; 1493-1819; 1519-1923; 1616-1922; 1634-1923; 1837-2408
27/LG:7691854.1:2002JAN18 755-956; 756-956; 602-896; 620-896; 646-896; 756-832; 529-755; 1-606; 244-472; 1-111
28/LG:979580.1:2002JAN18 1-329; 61-526; 62-562; 256-526; 256-498; 393-986; 395-533; 450-1009; 669-1115; 788-1064; 804-1112; 804-1113; 804-985; 804-928; 823-1095; 823-1287; 929-1114; 947-1069; 954-1015
29/LG:185136.4:2002JAN18 1-708; 23-630; 134-705; 138-452; 175-690; 187-693; 193-707; 245-756; 261-723; 279-728; 278-668; 290-707; 301-707; 303-672; 343-586; 405-699; 464-813; 468-576; 477-659; 590-1020; 652-1195; 675-1248; 712-926; 725-854; 850-940; 861-1167; 861-1128; 938-1253; 1044-1248; 1146-1251
30/LG:1398319.1:2002JAN18 1-488; 1-530; 3-261; 394-880; 841-952
31/LG:375724.10:2002JAN18 1-591; 1-595; 1-566; 134-647; 191-805; 201-629; 199-687; 199-837; 200-618; 201-813; 202-393; 204-706; 204-790; 204-803; 204-828; 209-787; 210-851; 212-844; 213-807; 241-828; 274-926; 284-926; 285-542; 284-805; 287-932; 284-915; 298-915; 306-935; 307-913; 309-1062; 336-558; 337-914; 339-888; 342-907; 348-712; 351-675; 359-928; 376-908; 373-895; 390-930; 417-934; 484-909; 520-805; 531-926; 540-688; 551-683; 651-915; 647-922; 860-1408; 1360-1997; 1808-2093; 1958-2372; 2030-2408; 221-659; 221-661; 221-645; 221-675; 221-665; 221-703; 221-760; 249-778; 221-754; 374-882; 221-768; 221-824; 294-885; 221-797
32/LG:220407.7:2002JAN18 1-379; 237-841; 451-765; 452-544; 659-959; 781-1311; 798-1350; 935-1477; 1003-1679; 1072-1351; 1096-1697; 1299-1844; 1461-2025; 1568-2033; 1662-2056; 1704-2425; 1896-2454; 1901-2166; 1926-2118; 1930-2434; 2194-2451; 2227-2483; 2373-2857; 2373-2604; 2376-2563; 2547-2727; 2547-2800; 2571-3109; 2592-3168; 2660-3088; 2660-2918; 2763-3234; 2763-3024; 2806-3148; 2809-3491; 2819-2880; 2950-3547; 2968-3205; 3062-3732; 3070-3332; 3125-3527; 3128-3315; 3181-3779; 3241-3843; 3256-3604; 3296-3858; 3406-3699; 3444-3694; 3493-3700; 3512-3858; 3524-3963; 3553-3669; 3580-3675; 3583-3980; 3587-3868; 3606-4229; 3608-3862; 3636-3873; 3662-3800; 3689-3950; 3703-3841; 3863-4317; 3863-4110; 3865-4306; 3892-4152; 3924-4130; 3931-4255; 3972-4432; 3977-4042; 3981-4360; 3981-4224; 3988-4251; 4017-4409; 4046-4303; 4054-4568; 4096-4325; 4053-4280; 4065-4224; 4091-4293; 4109-4332; 4109-4325; 411?-4372; 4138-4406; 4153-4412; 4153-4391; 4163-4454; 4167-4815; 4174-4453; 4185-4466; 4187-4348; 4205-4337; 4419-4981; 4435-4692; 4451-4828; 4456-4699; 4481-4827; 4510-4586; 4570-5111; 4616-4736; 4892-5254; 4893-5259; 4893-5183; 4899-5361; 4899-5120; 4900-5332; 4899-5284; 4902-5195; 4903-5203; 4904-5089; 4905-5323; 4916-5320; 4916-5193; 4918-5292; 491?-5187; 4919-5166; 4920-5305; 4920-5148; 4920-5091; 4920-5339; 4921-5312; 4920-5022; 4932-5169; 4936-5210; 4938-5363; 4941-5246; 4942-5120; 4948-5372; 4949-5429; 4969-5268; 4983-5627; 4990-5457; 4996-5203; 5001-5351; 5019-5266; 5031-5262; 5060-5344; 5115-5579; 5143-5392; 5168-5423; 5173-5780; 5218-5459; 5233-5520; 5245-5328; 5297-5641; 5320-6047; 5342-5474; 5375-5704; 5420-5620; 5483-5771; 5504-5874
33/LG:259850.1:2002JAN18 1-649; 1-116; 137-579
34/LG:435726.8:2002JAN18 1-232; 16-353; 56-618; 56-649; 56-869; 56-747
35/LG:271394.44:2002JAN18 1-490; 253-470; 266-383; 274-439; 291-899; 295-542; 299-534; 311-566; 318-909; 399-871; 437-642; 443-1217; 461-1056; 474-1065; 475-1006; 489-856; 515-1180; 530-1124; 543-664; 599-871; 601-882; 608-1017; 615-1039; 616-868; 619-850; 639-884; 639-997; 643-849; 650-1060; 655-1239; 662-870; 663-1272; 675-1142; 677-930; 678-930; 683-900; 709-1187; 713-951; 714-1018; 714-960; 718-1239; 721-954; 721-949; 729-950; 733-1230; 740-1179; 734-862; 741-1242; 744-1280; 752-1235; 748-1235; 825-1181; 825-1231; 859-1285; 948-1266; 1056-1397; 1063-1347; 1075-1672; 1174-1426; 1132-1285; 1148-1482; 1196-1617; 1278-1433; 1284-1583; 1287-1499; 1292-1837; 1292-1554; 1292-1536; 1293-1536; 1297-1756; 1296-1458; 1297-1564; 1297-1390; 1297-1535; 1299-1746; 1300-1576; 1301-1829; 1305-1928; 1308-1612; 1334-1629; 1427-1916; 1455-1701; 1484-1817; 1528-1809; 1590-1953; 1618-1847; 1689-1838; 1755-2012; 1764-1965; 1372-1630; 1471-1731; 822-1057; 1553-1837; 1495-1782; 982-1274; 1590-1872; 929-1269; 903-1182; 900-1273; 1524-1765; 1457-1681; 1751-1940; 1484-1642
36/LG:7761755.9:2002JAN18 1-167; 1-584; 1-256; 1-169; 60-394; 61-319; 112-720; 223-439; 312-2146; 322-558; 684-862; 700-1195; 778-1193; 826-1110; 897-1000; 1062-1168
37/LG:7762920.1:2002JAN18 1-559
38/LG:7763326.6:2002JAN18 344-853; 458-853; 610-853; 527-853; 304-846; 345-846; 360-844; 360-794; 163-651; 1-633; 477-580; 152-393; 1-337
39/LG:242234.14:2002JAN18 1-714; 10-443; 14-222; 136-609; 214-614; 242-393; 406-654; 424-840; 428-664; 428-687; 430-665; 600-929; 679-1193; 681-1173; 768-1061; 793-1032; 791-1076; 793-1085; 792-1068; 794-1091; 792-1084; 804-1108; 807-975; 813-1138; 819-1048; 833-1126; 835-1058; 838-1028; 844-1077; 935-1159; 935-1215; 961-1048; 981-1238; 1020-1189; 1104-1343; 1104-1614; 1122-1315; 1148-1387; 1158-1418; 1159-1423; 1162-1483; 1186-1779; 1191-1762; 1193-1684; 1193-1411; 1220-1861; 1219-1926; 1224-1498; 1224-1609; 1224-1504; 1224-1633; 1223-1357; 1224-1451; 1244-1469; 1245-1478; 1261-1455; 1276-1526; 1281-2038; 1287-1800; 1291-1543; 1295-1541; 1312-1397; 1322-1456; 1322-1521; 1322-1509; 1322-1531; 1325-1416; 1342-1609; 1346-1612; 1347-1507; 1359-1426; 1360-1633; 1362-1644; 1360-1708
39, cont. 1362-1636; 1369-1888; 1374-1599; 1374-1730; 1374-1579; 1393-2050; 1393-1678; 1393-1676; 1393-1581; 1393-1577; 1393-1580; 1411-1709; 1410-1697; 1415-1508; 1417-1887; 1416-1872; 1421-1668; 1423-1694; 1427-1702; 1441-1684; 1443-1685; 1449-1893; 1451-1888; 1452-1888; 1463-1941; 1465-1714; 1466-1889; 1476-1884; 1504-1790; 1510-1735; 1513-1887; 1520-1766; 1519-1888; 1523-2212; 1528-2192; 1546-1888; 1551-1864; 1569-1795; 1586-2191; 1594-1896; 1610-1889; 1615-1884; 1616-1887; 1622-1878; 1629-1886; 1646-1893; 1651-1892; 1651-1893; 1671-1878; 1673-1945; 1675-2233; 1679-1883; 1677-1844; 1679-1887; 1683-1834; 1690-1981; 1689-2003; 1689-1873; 1691-1967; 1693-1876; 1707-1891; 1727-1970; 1727-1971; 1756-2468; 1755-1891; 1759-2411; 1773-1964; 1778-2054; 1783-1895; 1792-2359; 1795-2053; 1800-2192; 1818-2052; 1820-2375; 1822-2118; 1850-2101; 1864-2195; 1867-2189; 1869-2040; 1871-2128; 1869-2191; 1881-2132; 1881-2025; 1888-2423; 1895-2098; 1911-2421; 1919-2347; 1923-2424; 1926-2196; 1931-2210; 1932-2404; 1934-2210; 1936-2233; 1942-2073; 1944-2418; 1947-2195; 1957-2422; 1962-2195; 1968-2195; 1984-2254; 1994-2230; 2000-2247; 1995-2266; 1999-2197; 2000-2390; 2002-2411; 2009-2437; 2013-2226; 2014-2420; 2016-2460; 2036-2454; 2039-2464; 2037-2463; 2045-2423; 2045-2464; 2050-2336; 2053-2460; 2054-2352; 2064-2191; 2068-2465; 2069-2461; 2069-2411; 2071-2461; 2076-2418; 2074-2468; 2076-2438; 2082-2461; 2083-2387; 2089-2463; 2078-2257; 2092-2222; 2093-2465; 2098-2351; 2112-2465; 2109-2393; 2134-2461; 2136-2373; 2140-2461; 2141-2461; 2147-2461; 2153-2456; 2153-2404; 2168-2461; 2175-2461; 2189-2459; 2204-2452; 2205-2461; 2209-2461; 2212-2465; 2229-2463; 2240-2462; 2244-2438; 2245-2456; 2252-2461; 2263-2461; 2267-2461; 2287-2432; 2293-2461; 2298-2461; 2309-2456; 2315-2467; 2315-2766; 2328-2461; 2330-2463; 2339-2461; 2340-2598; 2347-2467; 2422-2636; 2486-2696; 2586-2736
40/LG:291526.1:2002JAN18 1-446; 1-434; 1-458; 1-431; 56-419; 143-434; 165-434; 278-444; 282-434; 328-918; 690-1164; 739-1121; 748-1112; 770-1146; 773-1097; 940-1233; 940-1363; 940-1171; 1093-1824; 1258-1824; 1261-1536; 1440-1878; 1472-1712; 1473-1712; 1518-1722
41/LG:243209.10:2002JAN18 1-242; 1-565; 16-455; 26-591; 63-473; 76-713; 246-582; 360-623; 382-585
42/LG:378592.15:2002JAN18 1-261; 1-549; 1-662; 90-482; 290-941; 290-856; 290-870; 506-861; 506-737; 713-1406; 860-1405; 1142-1341; 1201-1422
43/LG:357276.11:2002JAN18 1-522; 45-649; 385-3309; 651-878; 698-1120; 713-1099; 1063-1758; 1139-1569; 1192-1700; 1575-2040; 1760-2345; 2013-2158; 2013-2199; 2073-2310; 2242-2752; 2252-2485; 2249-2480; 2273-2537; 2273-2657; 2318-2991; 2457-2882; 2471-3079; 2581-2721; 2831-3400; 2909-3133; 2882-3296; 2932-3547; 2941-3448; 2977-3600; 3001-3381; 2995-3441; 3001-3324; 3001-3233; 3001-3506; 3004-3248; 3067-3366; 3067-3296; 3215-3389; 3221-3649; 3228-3817; 3242-3847; 3271-3431; 3280-3776; 3334-3742; 3409-4139; 3474-3877; 3476-3760; 3493-3963; 3536-4022; 3612-4182; 3623-4026; 3625-3897; 3657-3801; 3734-4277; 3754-4475; 3786-4333; 3801-4088; 3857-4118; 3871-4120; 3873-4121; 3873-3951; 3876-4147; 3876-3985; 3878-4297; 3904-4318; 3927-4266; 3927-4275; 3936-4314; 3949-4207; 3949-4149; 3949-4137; 3965-4658; 3980-4194; 3988-4037; 4034-4194; 4045-4266; 4073-4563; 4075-4194; 4101-4676; 4212-4320; 4212-4578; 4212-4460; 4212-4445; 4229-4715; 4246-4509; 4269-4486; 4269-4451; 4314-4644; 4316-4578; 4326-4428; 4336-4593; 4347-4873; 4468-4703; 4468-4705; 4500-4716; 4501-4572; 4523-4923; 4550-5009; 4598-4859; 4600-4848; 4600-4850; 4619-4968; 4644-4808; 4650-4895; 4678-4883; 4678-5151; 4688-5137; 4692-5195; 4727-4975; 4733-5012; 4735-4976; 4744-5260; 4774-5013; 4792-5368; 4792-5379; 4796-5350; 4817-5067; 4828-4953; 4836-5073; 4836-5387; 4841-5102; 4843-4896; 4845-5120; 4849-5370; 4852-5027; 4869-5083; 4877-5419; 4877-5090; 4911-5321; 4913-5223; 4938-5497; 4943-5191; 4946-5252; 4966-5169; 4972-5123; 4972-5564; 4972-5400; 4972-5190; 4989-5236; 5013-5567; 5013-5284; 5021-5271; 5039-5319; 5045-5230; 5070-5432; 5073-5359; 5086-5406; 5093-5363; 5094-5299; 5101-5306; 5101-5170; 5115-5500; 5124-5349; 5124-5259; 5154-5658; 5153-5770; 5150-5315; 5151-5708; 5155-5897; 5155-5396; 5155-5426; 5159-5408; 5168-5422; 5173-5734; 5172-5814; 5179-5746; 5185-5366; 5188-5466; 5191-5449; 5189-5819; 5191-5607; 5202-5476; 5219-5474; 5223-5853; 5224-5454; 5240-5468; 5241-5473; 5249-5527; 5249-5744; 5254-5834; 5282-5747; 5281-5460; 5287-5810; 5310-5806; 5311-5797; 5315-5808; 5330-5839; 5370-5848; 5373-5598; 5377-5641; 5382-5853; 5383-5825; 5383-5851; 5383-5816; 5383-5853; 5383-5605; 5393-5531
43, cont. 5395-5850; 5404-5825; 5406-5699; 5408-5853; 5432-5817; 5441-5772; 5448-5850; 5450-5850;
5451-5850; 5463-5684; 5471-5802; 5475-5845; 5475-5850; 5483-5745; 5487-5707; 5487-5850;
5506-5845; 5497-5754; 5504-5853; 5508-5791; 5513-6042; 5513-5854; 5515-5850; 5530-5719;
5532-5853; 5536-5850; 5548-5852; 5542-5843; 5552-5813; 5552-6029; 5552-5782; 5555-5807;
5556-5809; 5556-5817; 5558-5758; 5565-5850; 5569-5850; 5574-5826; 5582-5914; 5595-5852;
5627-5923; 5629-5843; 5627-5843; 5637-5853; 5641-5842; 5671-5853; 5678-5855; 5680-5936;
5696-5861; 5706-5850; 5717-5848; 5725-5879; 5726-5829; 5727-5850; 5729-5855; 5750-5893;
5775-6299; 5785-5996; 5788-6038; 5825-6051; 5839-6051; 5844-6384; 5842-6053; 5890-6432;
5890-6038; 5890-6320; 5905-6446; 5947-6535; 5959-6583; 6175-6694; 6217-6657; 6217-6485;
6217-6320; 6248-6481; 6254-6486; 6268-6368; 6268-6530; 6269-6542; 6274-6487; 6307-6486;
6315-6734; 6327-6786; 6345-6923; 6509-6730; 6581-6849; 6581-7031; 6581-6993; 6906-7034
44/LG:1507027.3:2002JAN18 1448-1781; 1450-1764; 1451-1763; 1458-1763; 1231-1761; 1289-1737; 1196-1737; 1295-1714; 1506-1690; 1408-1658; 1314-1650; 1101-1518; 1102-1357; 795-1236; 893-1148; 681-1108; 523-1054; 706-961; 714-957; 841-943; 536-743; 444-684; 476-684; 375-684; 122-611; 96-606; 416-589; 5-587; 43-588; 3-587; 268-551; 66-484; 1-411; 1-310; 9-310; 1-307; 27-300; 6-248; 6-245; 1-222; 4-141; 1-68
45/LG:201342.4:2002JAN18 1-277; 32-277; 56-783; 228-1000; 319-814; 319-812; 328-790; 320-756; 331-576; 337-752; 337-713; 353-661; 357-812; 418-809; 576-924; 579-776; 631-859; 655-1320; 680-929; 1110-1520; 1113-1380; 1113-1364; 1203-1710; 1203-1712; 1286-1367; 1336-1495; 1359-1569; 1497-2106; 1559-2249; 1643-1888; 1643-1889; 1782-1888; 1782-2053; 1809-1902; 1909-2203; 1913-2329; 1915-2251; 1917-2117; 1922-3846; 1946-2538; 1956-2399; 1956-2143; 1957-2147; 2010-2299; 2010-2520; 2017-2149; 2027-2446; 2030-2314; 2095-2318; 2126-2342; 2139-2417; 2165-2571; 2208-2581; 2247-2617; 2247-2518; 2293-2537; 2299-2906; 2322-2650; 2393-2946; 2428-2685; 2479-2748; 2479-2905; 2559-2783; 2601-2771;
45, cont. 2626-2902; 2657-2771; 2663-3156; 2713-2976; 2775-3050; 2805-3096; 2831-3023; 2831-3250; 2831-3042; 2849-3140; 2849-3511; 2855-3296; 2890-3120; 2905-3466; 2906-3147; 2924-3190; 2927-3187; 2960-3097; 2980-31 6; 3010-3273; 3022-3232; 3023-3139; 3023-3396; 3037-3277; 3047-3511; 3098-3671; 3124-3690; 3233-3809; 3240-3811; 3260-3706; 3263-3416; 3267-3812; 3267-3541; 3274-3814; 3281-3519; 3282-3504; 3282-3512; 3281-3518; 3289-3485; 3293-3480; 3314-3764; 3325-3758; 3329-3816; 3332-3624; 3342-3783; 3370-3845; 3375-3808; 3383-3813; 3410-3641; 3438-3849; 3438-3850; 3438-3855; 3442-3845; 3447-3852; 3450-3850; 3460-3852; 3469-3845; 3474-3855; 3475-3853; 3491-3850; 3499-3842; 3503-3850; 3562-3798; 3570-3825; 3610-3846; 3637-3846; 3652-3846; 3679-3845; 3705-3846; 3752-3845; 3754-3846; 3768-3852
46/LG:327504.9:2002JAN18 1-336; 90-499; 96-776
47/LG:346506.19:2002JAN18 1-548; 1-487; 153-784; 194-784; 594-1136; 654-1227; 678-1132; 724-1181; 728-1176; 742-1130; 749-980; 749-1168; 755-1173; 771-960; 767-1178; 784-1275; 797-1055; 840-1245; 896-1284; 910-1283; 934-1202; 934-1180; 936-1172; 1033-1284; 1056-1650; 1117-1822; 1147-1788; 1150-1283; 1192-1286; 1230-1279; 1232-1955; 1426-2039; 1472-2062; 1550-2095; 1555-1801; 1562-1640; 1575-1985; 1575-2113; 1575-1812; 1575-1801; 1586-1839; 1609-1849; 1675-1848; 1741-2235; 1755-2221; 1741-1974; 1797-2066; 1858-2396; 1858-2062; 1872-2078; 1937-2035; 1955-2209; 2001-2455; 2098-2376; 2098-2301; 2147-2377; 2155-2521; 2155-2444; 2155-2293; 2155-2274; 2155-2273; 2165-2418; 2170-2419; 2198-2471; 2210-2840; 2226-2513; 2266-2950; 2273-2702; 2273-2558; 2314-2864; 2314-2670; 2336-2589; 2342-2537; 2343-2604; 2345-2590; 2411-2658; 2421-2849; 2429-2719; 2432-2809; 2468-2638; 2480-2771;
2486-2766; 2497-2972; 2503-2702; 2530-2755; 2547-2750; 2552-2771; 2587-2806; 2588-3192; 2588-2844; 2602-2858; 2622-2876; 2631-2838; 2632-2888; 2635-2774; 2635-2904; 2645-3153; 2651-3134; 2651-2893; 2654-2910; 2661-2957; 2666-2867; 2683-3215; 2683-2944; 2699-2872; 2701-3110; 2702-2967; 2701-2956; 2713-2968; 2722-2838; 2732-3298; 2734-3297; 2760-3044; 2797-2930; 2818-3136; 2825-3121; 2833-3301; 2861-3121; 2863-3126; 2873-3129; 2874-3127; 2878-3119; 2880-3146; 2880-3065; 2882-3142; 2881-3063; 2913-3100; 2915-3129; 2944-3164; 2949-3212; 2959-3260; 2975-3544; 2984-3283; 2984-3229; 3005-3281; 3015-3268; 3015-3480; 3015-3452; 3031-3272; 3036-3154; 3037-3299; 3058-3213; 3064-3332; 3065-3344; 3066-3689; 3073-3579; 3073-3327; 3074-3344; 3075-3742; 3101-3281; 3106-3277; 3109-3603; 3115-3548; 3122-3393; 3128-3601; 3131-3434; 3139-3337; 3140-3373; 3141-3601; 3158-3443; 3159-3238; 3160-3422; 3179-3652; 3181-3448; 3192-3489; 3207-3463;
47, cont. 3222-3704; 3246-3702; 3251-3645; 3265-3652; 3271-3741; 3276-3524; 3287-3744; 3307-3720;
3309-3742; 3309-3454; 3323-3545; 3315-3742; 3316-3797; 3316-3703; 3318-3533; 3325-3583;
3333-3740; 3334-3746; 3334-3738; 3335-3560; 3342-3906; 3342-3576; 3347-3602; 3348-3560;
3354-3742; 3370-3661; 3374-3746; 3374-3709; 3374-3618; 3381-3648; 3385-3645; 3396-3742;
3404-3471; 3405-3742; 3407-3673; 3407-3523; 3409-3642; 3410-3697; 3410-3646; 3413-3672;
3418-3742; 3429-3635; 3432-3641; 3433-3645; 3433-3643; 3434-4073; 3438-3691; 3440-3592;
3445-3633; 3465-3725; 3480-3766; 3488-3748; 3503-3938; 3509-3645; 3523-3832; 3527-3837;
3527-3800; 3534-3680; 3540-3745; 3542-3794; 3545-3742; 3547-3748; 3555-4008; 3560-3817;
3568-3840; 3616-3742; 3618-3704; 3625-3742; 3625-3745; 3637-3867; 3654-3742; 3657-3926;
3671-3742; 3731-3982; 3752-4175; 3753-4179; 3752-4158; 3759-4176; 3760-4126; 3760-4029;
3782-4177; 3821-4186; 3828-4178; 3845-4159; 3860-4176; 3863-4173; 3920-4176; 3924-4138;
3950-4176
48/LG:7771048.3:2002JAN18 1-287; 132-662; 132-755
49/LG:395081.7:2002JAN18 1-236
50/LG:1452709.28:2002JAN18 1-578; 536-771; 536-744; 657-924; 670-736; 697-952; 735-1011; 740-1204; 769-1198; 820-895; 826-1019; 856-1038; 874-1210; 914-1019; 962-1073; 993-1206; 992-1073; 1018-1073; 1023-1204; 1095-1208; 1095-1204; 1103-1194; 1103-1198; 1095-1185; 1095-1210; 1095-1207
51/LG:991162.52:2002JAN18 1-512; 3-259; 153-719; 153-682; 160-505; 170-543; 205-438; 236-552; 282-547; 344-698; 380-578; 431-648; 473-782; 481-832; 551-812; 558-820; 558-876; 574-1092; 574-1096; 574-993; 614-1080; 649-897; 661-823; 694-1235; 745-1346; 789-1336; 824-1022; 824-1058; 854-1195; 856-1092; 893-1235; 893-1094; 894-1170; 893-1232; 908-1206; 938-1369; 938-1194; 948-1486; 946-1177; 958-1384; 958-1208; 978-1519; 988-1331; 988-1270; 1008-1274; 1007-1255; 1044-1393; 1045-1404; 1100-1377; 1101-1321; 1114-1630; 1116-1368; 1117-1366; 1117-1344; 1117-1348; 1117-1333; 1117-1228; 1127-1371; 1133-1643; 1171-1423; 1196-1725; 1246-1535; 1250-1792; 1266-1520; 1268-1536; 1270-1498; 1324-1573; 1335-1600; 1368-1628; 1389-1515; 1400-1650; 1407-1690; 1444-1555; 1449-1897; 1492-1663; 1500-1691; 1529-1812; 1551-2146; 1552-1814;
1555-1801; 1583-1818; 1884-2145; 1902-2144
52/LG:346677.11:2002JAN18 1-256; 95-683; 95-629; 95-536
53/LG:1400284.13:2002JAN18 1-543; 22-1 9; 22-243; 22-210; 22-160; 22-161; 22-467; 22-124; 22-562; 22-476; 65-255; 85-520; 140-405; 165-454; 371-587; 371-546; 371-1041; 471-1002; 471-562; 863-1128; 864-1458; 869-1148; 887-1222; 900-1028; 962-1282; 962-1250; 992-1351; 1056-1546; 1072-1559; 1072-1586; 1072-1314; 1094-1503; 1156-1542; 1157-1607; 1198-1591; 1213-1573; 1271-1593; 1295-1707; 1347-1608
54/LG:7698465.26:2002JAN18 1-2043; 560-1203; 710-1093; 791-1376; 852-1111; 895-1147; 918-1237; 931-1193; 974-1449;
1117-1481; 1119-5982; 1122-1685; 1034-1361; 1140-1812; 1081-1356; 1176-1355; 1244-1529;
1250-1548; 1285-1555; 1294-1806; 1294-1870; 1364-1931; 1393-1659; 1397-1939; 1406-2028;
1419-1805; 1483-1717; 1517-2092; 1521-1936; 1521-1717; 1557-1896; 1557-1891; 1584-1852;
1592-2041; 1597-1849; 1607-2025; 1610-2138; 1622-2249; 1626-1883; 1629-2244; 1630-2226;
1667-1934; 1670-2211; 1686-2280; 1718-2237; 1758-2068; 1762-1992; 1762-2127; 1776-2042;
1819-2096; 1822-2003; 1863-2148; 1865-2143; 1883-2426; 1901-2230; 1913-2344; 1918-2143;
1926-2480; 1928-2152; 1936-2154; 1959-2444; 1995-2251; 2066-2372; 2133-2623; 2191-2359;
2235-2789; 2262-2838; 2284-2470; 2289-2867; 2306-2899; 2268-2329; 2340-2746; 2325-2604;
2338-2767; 2341-2632; 2343-2632; 2363-2914; 2365-2660;
2372-2811; 2391-2819; 2391-2663; 2397-2942; 2401-2601; 2446-2932; 2475-2674; 2476-3046; 2476-2935; 2478-2787; 2486-3010; 2485-2772; 2487-2777; 2500-2971; 2502-2596; 2519-2771; 2531-2795; 2536-2819; 2536-2763; 2579-2865; 2579-2875; 2603-2889; 2603-2878; 2606-2860; 2617-3225; 2617-2805; 2622-2872; 2628-2917; 2628-2903; 2628-2899; 2629-2913; 2630-2926; 2630-2880; 2637-3019; 2637-2716; 2638-2864; 2655-3186; 2655-2870; 2646-2927; 2650-2897; 2655-3045; 2662-2903; 2670-2961; 2675-2834; 2678-2870; 2676-3304; 2682-3007; 2689-2876; 2690-2876; 2724-3330; 2707-2978; 2709-3237; 2709-2842; 2709-3157; 2709-3355; 2719-3152; 2720-3236; 2724-3279; 2751-2957; 2758-3339; 2760-3009; 2761-3024; 2764-3305; 2765-3349; 2770-3323; 2776-3034; 2783-3347; 2784-3088; 2789-2978; 2797-3327; 2803-3325; 2815-3329; 2817-3327; 2815-3058; 2821-3111; 2821-3065; 2851-3283; 2854-3390; 2855-3094; 2858-3154; 2861-3132; 2861-3108; 2873-3331; 2895-3350; 2895-3288; 2910-3268; 2907-3279; 2911-3171; 2931-3389; 2941-3416; 2942-3161; 2944-3191; 2951-3197; 2955-3377;
54, cont. 2959-3430; 2960-3253; 2960-3250; 2960-3228; 2961-3192; 2962-3273; 2964-3105; 2970-3430;
2966-3201; 2968-3435; 2969-3434; 2969-3433; 2974-3161; 2976-3214; 2978-3432; 2980-3433;
2978-3503; 2980-3431; 2980-3429; 2982-3429; 2981-3367; 2983-3429; 2984-3429; 2984-3255;
2986-3431; 2987-3433; 2987-3572; 2987-3429; 2987-3288; 2988-3252; 2990-3422; 2993-3434;
2995-3256; 3004-3435; 3007-3248; 3008-3254; 3013-3475; 3010-3205; 3010-3375; 3011-3375;
3013-3586; 3013-3375; 3013-3102; 3014-3375; 3026-3283; 3029-3310; 3031-3430; 3035-3416;
3036-3430; 3036-3375; 3037-3429; 3040-3284; 3058-3429; 3063-3430; 3068-3431; 3069-3429;
3076-3429; 3077-3391; 3104-3436; 3086-3343; 3092-3429; 3093-3364; 3100-3429; 3100-3431;
3112-3404; 3112-3326; 3116-3391; 3132-3429; 3134-3431; 3141-3306; 3143-3354; 3153-3662;
3154-3416; 3202-3429; 3208-3433; 3212-3436; 3216-3396; 3223-3431; 3227-3791; 3233-3406;
3261-3401; 3257-3539; 3257-3548; 3295-3551; 3297-3820; 3311-3431; 3312-3570; 3344-3429;
3346-3521; 3328-3599; 3347-3631; 3362-3781; 3399-3630;
3417-4053; 3437-3686; 3446-4014; 3446-3995; 3449-4015; 3465-4033; 3504-3671; 3515-4064;
3517-3660; 3520-3673; 3524-3893; 3523-4124; 3525-4160; 3526-3754; 3559-3800; 3580-3862; 3609-3898; 3638-4113; 3672-4250; 3673-4145; 3673-3926; 3699-3975; 3720-4185; 3723-4264;
3728-4012; 3728-3918; 3736-3978; 3745-3986; 3750-4304; 3755-4295; 3763-4367; 3778-3946;
3825-4118; 3829-4389; 3842-4177; 3844-4388; 3851-4256; 3863-4067; 3884-4066; 3909-4176;
3978-4331; 3978-4180; 3979-4196; 4025-4282; 4062-4316; 4062-4306; 4067-4159; 4091-4751;
4099-4365; 4153-4474; 4163-4755; 4169-4392; 4171-4761; 4180-4383; 4195-4732; 4209-4402;
4212-4437; 4232-4761; 4257-4511; 4263-4821; 4274-4495; 4291-4818; 4315-4556; 4316-4864;
4328-4848; 4330-4905; 4338-4986; 4381-4642; 4389-4574; 4420-4711; 4423-4681; 4427-4704;
4429-4975; 4436-4659; 4450-4700; 4451-4663; 4477-4743; 4488-4718; 4488-4638; 4492-4752;
4509-5039; 4520-4745; 4533-4823; 4533-4982; 4545-5025; 4561-4775; 4573-4845; 4583-4918;
4585-4794; 4590-4931; 4595-5186; 4598-4852; 4610-4866; 4638-4874;
54, cont. 4639-4893; 4655-5157; 4655-4900; 4667-4852; 4673-4866; 4677-5219; 4685-4933; 4695-4978; 4719-5185; 4719-4796; 4721-4955; 4754-5217; 4761-5012; 4761-4901; 4768-5028; 4785-4968; 4805-5025; 4805-4970; 4806-5185; 4827-5442; 4820-5103; 4825-5411; 4825-5024; 4825-4953; 4827-5085; 4839-5105; 4840-5107; 4841-5125; 4845-4963; 4848-4959; 4851-4926; 4866-5346; 4858-5056; 4862-5170; 4862-5112; 4866-4978; 4874-5071; 4878-5139; 4907-5241; 4912-5481; 4916-4965; 4917-5562; 4930-5126; 4952-5070; 4966-5498; 4955-5200; 4961-5181; 4989-5269; 4989-5264; 4981-5214; 4982-5411; 4995-5279; 5003-5543; 5005-5212; 5019-5263; 5029-5219; 5029-5244; 5030-5274; 5033-5416; 5034-5263; 5044-5463; 5044-5256; 5057-5159; 5057-5298; 5058-5197; 5065-5311; 5072-5711; 5076-5477; 5076-5311; 5081-5328; 5083-5333; 5083-5299; 5085-5389; 5101-5343; 5101-5259; 5108-5217; 5116-5363; 5120-5629; 5137-5362; 5170-5301; 5178-5426; 5179-5388; 5180-5472; 5185-5395; 5192-5438; 5193-5457; 5194-5400; 5208-5437; 5231-5802; 5241-5753; 5241-5713; 5238-5448; 5241-5757;
5241-5569; 5243-5504; 5243-5479; 5243-5466; 5245-5466; 5299-5497; 5262-5841; 5317-5530; 5267-5527; 5270-5741; 5270-5463; 5282-5478; 5295-5545; 5348-5606; 5293-6051; 5312-5528; 5387-5926; 5300-5774; 5279-5839; 5359-5563; 5360-5669; 5359-5619; 5324-5571; 5324-5504;
5306-5826; 5366-5714; 5366-5524; 5366-5504; 5256-5370; 5367-5892; 5367-5571; 5367-5873; 5369-5500; 5331-5591; 5331-5745; 5325-5617; 5326-5618; 5380-5565; 5382-5647; 5382-5620; 5382-5579; 5337-5934; 5387-5790; 5350-5638; 5346-5781; 5363-5466; 5405-5488; 5357-5920; 5372-5668; 5372-5653; 5372-5651; 5372-5635; 5373-5649; 5375-5660; 5375-5644; 5358-5884; 5359-5783; 5362-5967; 5380-5999; 5388-5630; 5404-5876; 5435-5561; 5410-5943; 5415-5956; 5420-5684; 5420-5671; 5425-5710; 5428-5720; 5420-5978; 5433-5685; 5456-5899; 5448-5939; 5453-6058; 5498-5900; 5474-5874; 5557-5809; 5488-5945; 5488-6087; 5506-5892; 5497-5881; 5504-5979; 5532-5709; 5528-5982; 5542-5790; 5547-5984; 5551-5987; 5555-5947; 5553-5793; 5567-5721; 5572-5983; 5590-5937; 5589-5978; 5592-5995; 5617-5983;
54, cont. 5618-5910; 5625-5982; 5628-5981; 5649-5793; 5664-5982; 5690-5744; 5695-5981; 5731-5996; 5771-5938; 5780-5941; 5791-5978; 5791-5972; 5478-5724; 2523-2772; 5731-5978; 5744-5978; 5380-5638; 5743-5978; 5726-5978; 2753-3021; 5705-5973; 5678-5951; 5552-5828; 5703-5978; 5502-5777; 5373-5651; 5522-5802; 5625-5904; 5694-5969; 2993-3279; 2476-2766; 2898-3193; 3123-3416; 5540-5839; 5373-5681; 5668-5957; 2853-3158; 2458-2767; 5659-5978; 5644-5978; 3035-3342; 5649-5978; 5628-5970; 2936-3279; 3071-3429; 5614-5942; 5569-5978; 5534-5978; 5556-5978; 5542-5978; 5538-5976; 3000-3429; 5520-5978; 5529-5978; 5501-5978; 3013-3546; 5502-5745; 5486-5730; 5683-5912; 3189-3416; 5501-5729; 5752-5978; 5757-5978; 2530-2749; 2798-3016; 5560-5777; 2648-2863; 5363-5574; 5475-5685; 5549-5758; 5508-5715; 5480-5684; 5544-5745; 5522-5721; 5510-5708; 5538-5733; 5804-5978; 5817-5978; 5522-5709; 2757-2943; 3251-3429; 5562-5748; 5591-5764; 3193-3332; 5800-5927; 5859-5974; 3322-3429; 5886-5978; 3249-3429; 5884-5978; 5919-5978; 5928-5978
55/LG:7698696.18:2002JAN18 1-151; 1-79; 16-269; 17-268; 169-358; 291-560; 294-575; 369-640; 401-1009; 487-1064; 685-1051; 703-1275; 711-1143; 1021-1403; 1269-1399; 1309-1555; 1388-1640; 1388-1629; 1494-1632
56/LG:350410.3:2002JAN18 1-505; 329-490; 452-960; 535-889; 724-1497; 771-1009; 953-1616; 965-1702; 1264-1926; 1340-1952; 1365-2084; 1446-2198; 1577-2148; 1577-2078; 1577-1962; 1577-1954; 1619-2086; 1702-2385; 1717-2206; 1850-2308; 1853-2051; 1911-2449; 2025-2563; 2194-2591; 2194-2407; 2200-2472; 2212-2580; 2212-2451; 2212-2437; 2291-2388; 2439-3063; 2446-2879; 2446-2705; 2446-2678; 2589-2874; 2600-3150; 2599-3175; 2680-2948; 2682-2961; 2735-3302; 2763-3075; 2813-3044; 2904-3125; 2904-3016; 2953-3733; 3021-3301; 3037-3268; 3159-3803; 3172-3396; 3193-3444; 3207-3309; 3259-3909; 3270-3847; 3284-3826; 3289-3770; 3297-3764; 3297-3511; 3317-3580; 3329-3596; 3343-3810; 3342-3763; 3376-3718; 3444-3622; 3449-4000; 3456-3648; 3486-4170; 3498-4146; 3492-4146; 3506-4169; 3506-4170; 3506-3808; 3506-3805; 3506-3789; 3506-3750; 3506-3732; 3507-3820; 3507-3803; 3507-3791;
56, cont. 3507-3784; 3507-3796; 3507-3798; 3507-3754; 3508-3784; 3515-3785; 3515-3721; 3524-4155; 3533-3776; 3550-4177; 3551-3799; 3553-4147; 3583-3823; 3590-4170; 3598-3698; 3607-4001; 3611-3803; 3645-4146; 3663-3974; 3664-4142; 3682-3943; 3802-4035; 3806-4243; 3808-4246; 3820-4313; 3825-4335; 3843-4000; 3852-4267; 3832-4067; 3818-4251; 3898-4313; 3946-4243; 3948-4313; 4015-4243; 4039-4243; 4103-4243; 4108-4386; 4130-4682; 4173-4429; 4173-4434; 4182-4604; 4273-4921; 4339-4582; 4379-4626; 4516-5029; 4572-4961; 4582-5037; 4707-5280; 4738-5022; 4783-5123; 4876-5039; 4937-5164; 5046-5287; 5076-5325; 5134-5415; 5202-5687; 5212-5474; 5216-5311; 5242-5680; 5242-5512; 5242-5488; 5325-5591; 5359-5604; 5362-5610; 5435-5605; 5455-5708; 5455-5688; 5500-5743; 5503-5645; 5538-5839; 5539-5755; 5574-5845; 5623-5857; 5638-5938; 5638-5942; 5677-5938; 5752-5937; 5850-5938; 5858-5938; 5895-5962
57/LG:7770751.8:2002JAN18 1-267
58/LG:052513.3:2002JAN18 1-643; 594-782; 594-1266; 718-996; 718-1003; 718-1153; 722-968; 806-1093; 822-1066; 826-1072; 826-1124; 827-987; 827-1106; 892-1108; 1030-1268
59/LG7092334.1:2002JAN18 1-237; 1-450; 32-578; 38-711; 378-714; 384-520; 666-863; 742-1368; 808-1368; 811-1083; 815-1190; 1094-1362; 1100-1684; 1100-1530; 1200-1507; 1439-1857; 1446-1915; 1453-1920; 1512-1920; 1532-1916; 1574-1924
60/LG:199284.11:2002JAN18 1-574
61/LG:7683993.13:2002JAN18 1732-2336; 1727-2291; 1720-2139; 1644-1984; 1644-1983; 1409-1727; 1420-1727; 1228-1716; 1080-1679; 1538-1606; 1297-1516; 1044-1475; 460-1121; 460-1008; 460-722; 1-541; 1307-1727
62/LG:1079823.1:2002JAN18 436-892; 360-892; 1-547; 59-541
63/LG:1082263.10:2002JAN18 1-442; 7-553; 34-604; 63-435; 82-731; 197-581; 207-750; 227-865; 249-772; 257-912; 315-717;
316-621; 338-849; 340-616; 366-764; 434-936; 434-696; 452-865; 566-751; 594-876; 647-876;
652-922; 683-877; 729-840; 729-1228; 731-877; 767-905; 784-903; 791-1172; 1040-1163; 1042-1172
64/LG:1076162.1:2002JAN18 1-269; 200-780; 213-360; 366-907; 453-651; 453-1122; 830-1203; 857-1114
65/LG:404157.1:2002JAN18 1-772; 183-769; 301-969; 454-877; 457-752; 498-788; 581-838; 581-1071; 867-1050; 931-1319; 931-1526; 942-1084; 1052-1514; 1091-1619; 1201-1663; 1247-1666; 1259-1660; 1265-1669; 1354-1516; 1369-1663; 1468-1821; 1601-2102; 1608-1664; 1635-1917; 1675-1928; 1795-2069; 1808-2269; 1811-2124; 1835-2177; 2186-2787; 2250-2745; 2335-2787; 2357-2786; 2366-2588; 2408-2656; 2424-2795; 2451-2601; 2477-2787; 2514-2787; 2527-2942; 2597-2788
66/LG:474725.1:2002JAN18 1-534; 1-612; 93-534; 200-728; 449-1077; 514-1077; 588-921; 588-842; 632-879; 866-1075; 866-1360; 936-1177; 938-1531; 1101-1401; 1136-1379; 1368-1979; 1418-1965; 1443-2091; 1446-1765; 1451-1965; 1478-1918; 1579-1999; 1614-2006; 1624-2003; 1633-1998; 1663-1895; 1668-2004; 1690-2179; 1736-2001; 1875-2004; 1902-2003; 1902-2071
67/LG:1080918.1:2002JAN18 2118-2459; 2050-2459; 2163-2456; 1901-2262; 1899-2227; 1712-2227; 1317-1890; 1050-1715; 1051-1715; 1066-1702; 1399-1604; 1317-1510; 834-1368; 1002-1298; 525-1094; 834-1077; 649-1048; 529-923; 269-815; 218-815; 205-802; 1-639
68/LG:1092343.1:2002JAN18 1-444; 4-305; 41-247; 259-707; 259-490; 338-811; 339-602; 351-757; 386-936; 403-601; 556-1093; 556-777; 675-840; 742-984; 751-1065; 804-1223; 817-1265; 855-1265; 902-1261; 915-1265; 931-1265; 939-1322; 950-1110; 950-1092; 973-1265; 991-1265; 1130-1268
69/LG:7684505.1:2002JAN18 347-926; 486-873; 152-815; 469-627; 243-618; 382-618; 1-618; 207-618; 440-606; 440-604; 340-599
70/LG:7689627.1:2002JAN18 1-401; 50-293
71/LG:122863.1:2002JAN18 1-444; 42-326; 240-957; 348-931; 435-1067; 441-703; 442-727; 746-1505; 976-1229; 991-1510; 1057-1807; 1099-1609; 1179-1811; 1348-1809; 1370-1632; 1406-1730; 1661-2054; 1794-2480
72/LG:7690093.1:2002JAN18 1-684; 26-587; 26-260; 59-694; 270-833; 300-833; 675-1225; 675-938; 698-1391; 1272-1548; 1275-1665; 1328-1648; 1345-1875; 1351-1772; 1355-1778; 1525-1794; 1712-1987; 1888-2062;
2018-2298
73/LG:1449021.1:2002JAN18 663-1251; 663-1220; 500-1119; 206-807; 314-725; 316-721; 330-704; 233-690; 393-690; 257-690;
232-679; 87-664; 120-551; 79-310; 1-217
74/LG:958155.1:2002JAN18 1-771; 123-844; 377-772
75/LG:7684559.1:2002JAN18 1-316; 161-664; 161-400; 244-578; 359-957; 361-705; 542-816; 542-1065; 544-990; 880-1236; 941-1506; 944-1200; 962-1251; 1030-1328; 1035-1270; 1040-1276; 1048-1318; 1048-1317; 1122-1269; 1136-1647; 1140-1772; 1168-1595; 1168-1445; 1200-1604; 1202-1460; 1246-1783; 1246-1809; 1246-1814; 1355-1455; 1355-1984; 1502-1874; 1502-1720; 1593-2041; 1588-2182; 1589-2111; 1609-1785; 1609-2209; 1615-1956; 1629-2144; 1633-1886; 1668-2141; 1722-1983; 1735-2010; 1773-2100; 1818-2088; 1818-2240; 1842-2211; 1842-2005; 1934-2140; 1982-2487; 2004-2280; 2004-2487; 2020-2182; 2027-2490; 2032-2490; 2036-2490; 2044-2166; 2044-2490; 2065-2490; 2199-2487; 2286-2694; 2351-2490; 2357-2490; 2370-2487; 2395-2490; 2430-2490
76/LG:080328.2:2002JAN18 1-195; 1-674; 170-1011; 176-750; 798-1149
77/LG:7687730.5:2002JAN18 1-436; 142-379; 148-447; 171-402; 192-789; 195-462; 420-830; 682-1174; 683-1238; 690-1335; 757-1303; 757-1020; 772-1135; 774-1040; 986-1179; 1172-1721; 1291-1873; 1298-1850; 1298-1655; 1308-1761; 1308-1788; 1309-1820; 1379-2000; 1475-1797; 1488-1999; 1531-1705; 1546-2196; 1660-2093; 1683-2154; 1709-2325; 1732-2116; 1746-2010; 1759-2049; 1762-2347; 1763-2311; 1796-2198; 1864-2356; 1872-2208; 1914-2200; 1967-2246; 1975-2202; 1992-2422; 2008-2200; 2036-2307; 2036-2423; 2062-2267; 2062-2254
78/LG:7691462.5:2002JAN18 1-464
79/LG:7690229.9:2002JAN18 1-502; 5-504
80/LG:7691117.5:2002JAN18 1-510; 1-622; 78-264; 88-264; 88-704; 108-454; 202-320; 354-442; 612-1168; 768-1268; 768-986; 819-1065; 821-1049; 1002-1593; 1002-1645; 1009-1505; 1378-1853; 1413-1849; 1775-2324;
2247-2383; 2297-2872; 2297-2767; 2319-2848; 2374-3383; 2395-2662; 2397-2564; 2528-2765; 2594-2869; 2645-3142; 2714-3251; 2757-3050; 2873-3055; 2881-3434; 2894-3631; 2934-3173; 2974-3247; 3192-3941; 3359-3941; 3548-3802; 3548-3881; 3548-3828; 3583-3958; 3655-4054; 3766-4040; 3769-4054; 3827-4330; 3860-4440; 3914-4123; 4071-4400; 4129-4399; 4129-4416; 4199-4432; 4246-4400; 4246-4333; 4282-4400; 4347-4400
81/LG:413642.1:2002JAN18 1-715
82/LG:7771639.1:2002JAN18 1-251; 1-247; 2-251; 7-251; 52-456; 72-597; 74-430; 75-511; 74-508; 75-601; 75-611; 76-508; 78-336; 128-342; 133-324; 99-378; 158-342; 128-408; 140-248; 153-400; 186-580; 157-233; 235-581; 235-367; 235-463; 365-907; 380-564; 380-836; 380-853; 544-756; 560-624; 668-831
83/LG:7684553.3:2002JAN18 1-515; 1-588
84/LG:7690374.7:2002JAN18 316-961; 360-961; 359-961; 354-620; 1-523
85/LG:7690065.3:2002JAN18 1-477; 1-599; 1-479
86/LG:7690583.5:2002JAN18 1-327
87/LG:7771893.1:2002JAN18 1-392; 3-262; 126-737; 145-537; 186-781; 359-456; 541-687; 576-1197; 639-882; 717-1323; 719-967; 776-1029; 942-1581; 969-1240; 978-1475; 1075-1577
88/LG:7691582.2:2002JAN18 1-519; 1-532; 19-621; 25-435; 115-461; 170-647; 170-813; 521-946; 771-1118; 771-1030; 840-1468; 869-1063; 964-1198; 993-1505; 1027-1468; 1107-1686; 1110-1585; 1110-1935; 1115-1407; 1128-1696; 1271-1929; 1271-1868; 1274-1561; 1337-1793; 1358-1556; 1450-1758; 1452-1698; 1459-1698; 1505-1948; 1508-1760; 1522-1791; 1612-1895; 1628-1688; 1634-1860; 1666-1772; 1734-2020; 1765-2044; 1787-2467; 1795-2043; 1835-2429; 1879-2214; 1905-2084; 1916-2140; 1916-2170; 1926-2239; 1926-2166; 1933-2190; 1916-2538; 1928-2468; 1933-2376; 1944-2200; 1978-2466; 2000-2227; 2028-2256; 2044-2311; 2081-2667; 2161-2708; 2157-2757; 2188-2727; 2213-2444; 2249-2668; 2238-2565; 2274-2666; 2271-2531; 2290-2528; 2297-2733; 2362-2772; 2364-2770; 2365-2761; 2372-2658; 2374-2613; 2375-2773; 2408-2772; 2452-2770; 2470-2771; 2471-2667; 2490-2729; 2490-2770; 2504-2770; 2521-2622; 2524-2770; 2524-2775; 2554-2770;
2584-2770; 2713-2777
89/LG:7687809.2:2002JAN18 2-387; 1-268; 69-628; 592-1129; 594-878; 853-1089; 853-1055; 859-1114; 901-1392; 904-1176; 1006-1530; 1029-1481; 1029-1573; 1045-1608; 1050-1294; 1066-1490; 1119-1704; 1126-1578; 1165-1726; 1186-1472; 1278-1480; 1299-1763; 1369-1782; 1395-1747; 1430-1763; 1430-1668;
1456-1730
90/LG:7691200.3:2002JAN18 1-391; 182-694; 409-1014; 413-999; 562-1170; 568-829; 598-909; 620-890; 719-1303; 721-1015; 803-1024; 888-1090; 967-1498; 992-1251; 1181-1595; 1303-1606; 1366-1603; 1471-1608; 1471-1592
91/LG:405709.4:2002JAN18 1-339; 273-557; 274-614; 274-423; 279-416; 298-567; 303-636; 412-614; 431-614; 442-633; 482-614; 492-614
92/LG:982979.1:2002JAN18 1-357; 1-60; 18-70; 187-377; 189-379; 190-376; 190-360; 210-737; 292-348; 666-1227; 687-1099; 737-1251; 739-1046; 746-1105; 806-1273; 825-1036; 825-973; 829-1289; 878-1261; 900-1287; 915-1187; 937-1309; 1038-1300; 1053-1288
93/LG:7669310.1:2002JAN18 1-608
94/LG:231546.6:2002JAN18 1-723; 1-705; 1-558; 1-539
95/LG:7693668.4:2002JAN18 1-458; 58-442
96/LG:7771057.9:2002JAN18 1-274; 1-238; 1-319; 94-222; 96-768; 97-813; 97-344; 298-606; 392-778; 507-880; 535-882; 535-804; 576-882; 633-839
97/LG:114448.25:2002JAN18 1-681; 2-263; 12-538; 15-601; 136-226; 159-619; 166-331; 171-614; 173-516; 192-624; 202-807; 202-808; 210-725; 266-857; 272-617; 291-615; 442-1138; 507-1145; 507-725; 546-1185; 561-1117; 652-1008; 735-1160; 765-1365; 798-1143; 890-1470; 948-1526; 1055-1445; 1061-1626; 1150-1445; 1146-1389; 1149-1775; 1160-1774; 1248-1721; 1243-1697; 1248-1737; 1295-1625; 1315-1502; 1322-1772; 1322-1864; 1327-1748; 1335-1495; 1466-1944; 1485-2062; 1562-2138; 1582-2240; 1594-7883; 1647-7485; 1720-2236; 1720-2239; 1720-2014; 1721-1967; 1734-2200; 1772-1923; 1877-2287; 1895-2491; 1892-2180; 1916-2379; 1921-2379; 1928-2366; 1952-2368; 2027-2273; 2068-2790; 2127-2380; 2129-2720; 2131-2659; 2232-2781; 2233-2480; 2267-2539; 2322-2587; 2379-2552; 2439-2780; 2484-2691; 2529-2774;
2566-3053; 2563-3123; 2563-3104; 2586-3231; 2644-3229; 2792-3024; 2985-3409; 2994-3624; 3028-3614; 3037-3308; 3063-3225; 3089-3304; 3091-3596; 3139-3293; 3213-3778; 3226-3699; 3249-3614; 3255-3369; 3260-3596; 3279-3857; 3312-3583; 3385-4015; 3429-3764; 3446-3603; 3476-4090; 3477-3767; 3599-3914; 3601-3920; 3606-4211; 3625-3869; 3636-4094; 3648-3939; 3692-4223; 3718-4272; 3723-4180; 3748-4307; 3774-4170; 3898-4460; 3936-4242; 3961-4514; 3983-4576; 3996-4485; 4021-4590; 4025-4549; 4035-4276; 4052-4312; 4080-4625; 4109-4626; 4135-4661; 4156-4563; 4157-4437;
4175-4535; 4185-4532; 4193-4702; 4190-4518; 4207-4518; 4280-4540; 4300-4653; 4300-4496; 4326-4910; 4350-5074; 4357-4538; 4361-4575; 4362-5088; 4370-5095; 4390-4656; 4538-5193; 4542-5192; 4622-5111; 4628-5302; 4628-5310; 4657-5265; 4705-5229; 4722-4984; 4733-5242; 4741-5361; 4750-4830; 4762-5376; 4773-5267; 4773-5010; 4789-5395; 4812-5331; 4855-5475; 4889-5159; 4934-5304; 4940-5343; 4968-5563; 5006-5629; 5069-5533; 5110-5511; 5124-5371; 5128-5655; 5130-5333; 5217-5806; 5248-5880; 5264-5560; 5267-5942; 5279-5524; 5298-5813; 5304-5560; 5310-5533; 5326-5537; 5347-5866; 5349-5616; 5351-5613; 5362-5562; 5421-6014; 5411-5950; 5431-5638; 5435-5889; 5449-5866; 5501-5866; 5505-6056; 5518-6005; 5521-5734; 5557-5978; 5559-5813; 5567-5980; 5567-5832; 5576-5990; 5580-6004; 5615-5704; 5628-5989; 5630-6226; 5630-6077; 5636-5844; 5640-6158; 5658-5824; 5663-5910; 5671-5835; 5671-5934; 5678-6301; 5680-6162; 5680-5910; 5692-5906; 5695-5783; 5727-5981; 5726-6424; 5731-5961; 5734-6022; 5752-6175; 5764-6272; 5777-6169; 5761-6173; 5781-6413;
97, cont. 5781-6060; 5802-6192; 5802-6048; 5807-6079; 5824-6076; 5836-6104; 5849-6427; 5850-6260; 5862-6108; 5878-6330; 5887-6136; 5902-6291; 5930-6385; 5929-6315; 5931-6034; 5935-6534; 5932-6275; 6005-6179; 6012-6112; 6012-6452; 6024-6203; 6037-6576; 6034-6387; 6037-6284; 6038-6118; 6041-6543; 6057-6542; 6068-6540; 6073-6410; 6093-6328; 6109-6491; 6117-6390; 6117-6356; 6120-6391; 6124-6443; 6124-6391; 6125-6489; 6125-6403; 6126-6507; 6132-6484; 6141-6396; 6151-6370; 6154-6377; 6156-6391; 6159-6358; 6159-6238; 6164-6396; 6168-6903; 6192-6752; 6193-6506; 6213-6498; 6226-6556; 6226-6491; 6251-6520; 6253-6817; 6278-6574; 6280-6861; 6288-6556; 6295-6561; 6305-6582; 6306-6822; 6309-6640; 6323-6573; 6324-7485; 6327-6546; 6340-6840; 6340-6606; 6342-6597; 6342-6571; 6346-6618; 6349-6630; 6351-6624; 6368-6640; 6371-6791; 6371-6629; 6379-6840; 6384-6561; 6384-6979; 6385-6660; 6384-6598; 6384-6759; 6383-6783; 6387-6635; 6389-6672; 6397-6617; 6398-6652; 6406-6669; 6405-6575;
6406-6889; 6415-6803; 6424-6651; 6428-6750; 6442-6590;
6460-6714; 6466-6757; 6477-6681; 6494-6969; 6500-6920; 6508-6781; 6520-7059; 6530-6785; 6534-6819; 6534-7183; 6557-7237; 6541-6799; 6553-6798; 6554-6978; 6564-7154; 6567-7067; 6567-6942; 6603-6911; 6613-7140; 6618-6895; 6661-7106; 6661-7162; 6661-6934; 6661-6914; 6671-6913; 6676-7105; 6686-7239; 6709-6984; 6712-6940; 6715-6920; 6716-6994; 6716-6955; 6722-7358; 6721-6974; 6723-6997; 6723-6991; 6740-6972; 6743-7001; 6740-7166; 6748-7311; 6754-7002; 6759-6846; 6764-7415; 6762-7018; 6769-6983; 6771-6969; 6778-7091; 6780-7034; 6786-7014; 6790-7043; 6793-7034; 6792-7043; 6795-7092; 6796-7401; 6796-6975; 6773-7500; 6808-7462; 6820-7073; 6832-7092; 6849-7095; 6839-7115; 6840-7279; 6841-7073; 6842-7140; 6846-7036; 6859-7263; 6848-7581; 6849-7414; 6852-7081; 6854-7478; 6857-7105; 6858-7115; 6869-7554; 6871-7548; 6856-7109; 6890-7031; 6881-7332; 6882-7051; 6888-7124; 6893-7140; 6893-7143; 6893-7394; 6894-7176; 6897-7440; 6903-7622; 6910-7497; 6911-7157; 6914-7169; 6914-7154; 6918-7436; 6922-7477; 6928-7282; 6930-7198; 6931-7048;
97, cont. 6938-7440; 6940-7143; 6945-7193; 6947-7067; 6949-7245; 6950-7214; 6952-7228; 6963-7213; 6963-7225; 6963-7242; 6974-7237; 6977-7297; 6977-7245; 6978-7253; 6982-7578; 6986-7309; 6989-7139; 6997-7238; 7004-7146; 7006-7483; 7008-7550; 7016-7241; 7019-7485; 7018-7260; 7020-7484; 7002-7478; 7022-7289; 7005-7479; 7024-7235; 7026-7288; 7026-7506; 7028-7478; 7030-7485; 7031-7490; 7031-7412; 7031-7280; 7035-7478; 7041-7525; 7040-7295; 7026-7483; 7043-7320; 7047-7234; 7055-7478; 7056-7483; 7040-7484; 7064-7480; 7052-7484; 7075-7363; 7059-7487; 7059-7483; 7077-7478; 7077-7287; 7077-7238; 7079-7338; 7080-7327; 7083-7495; 7068-7443; 7068-7478; 7075-7446; 7095-7306; 7096-7877; 7110-7357; 7111-7627; 7110-7349; 7118-7487; 7104-7478; 7124-7394; 7125-7284; 7118-7477; 7126-7478; 7113-7428; 7134-7556; 7121-7428; 7119-7404; 7121-7566; 7121-7480; 7123-7428; 7130-7486; 7140-7489; 7142-7477; 7142-7489; 7142-7409; 7166-7303; 7148-7476; 7170-7468; 7159-7483; 7165-7483; 7168-7440; 7168-7424; 7186-7411; 7170-7478; 7169-7483; 7171-7475;
7195-7883; 7197-7883; 7179-7453; 7204-7738; 7203-7576; 7210-7868; 7198-7443; 7206-7456; 7214-7428; 7217-7483; 7217-7426; 7221-7428; 7225-7481; 7226-7479; 7229-7439; 7233-7556; 7251-7473; 7254-7460; 7279-7481; 7304-7889; 7303-7600; 7284-7475; 7307-7638; 7308-7563;
7291-7490; 7311-7882; 7315-7838; 7299-7482; 7299-7481; 7318-7839; 7302-7464; 7300-7409; 7332-7589; 7337-7845; 7336-7611; 7318-7483; 7318-7477; 7350-7616; 7357-7839; 7361-7689; 7363-7878; 7362-7639; 7346-7483; 7375-7670; 7376-7691; 7376-7687; 7376-7668; 7376-7639; 7378-7840; 7378-7627; 7359-7591; 7385-7743; 7384-7843; 7385-7663; 7374-7526; 7398-7465; 7399-7851; 7402-7848; 7402-7878; 7404-7845; 7404-7468; 7412-7883; 7412-7877; 7413-7618; 7414-7858; 7414-7621; 7416-7875; 7416-7515; 7419-7686; 7414-7878; 7429-7627; 7430-7677; 7436-7879; 7439-7882; 7440-7844; 7444-7878; 7445-7878; 7445-7689; 7446-7717; 7449-7608; 7449-7709; 7451-7679; 7452-7667; 7456-7703; 7455-7687; 7456-7738; 7458-7883; 7442-7802; 7461-7715; 7464-7878; 7464-7884; 7470-7878; 7473-7626; 7474-7883;
97, cont. 7475-7738; 7476-7874; 7481-7707; 7481-7687; 7488-7878; 7495-7736; 7495-7738; 7497-7885; 7497-7738; 7498-7736; 7506-7738; 7508-7881; 7513-7889; 7514-7878; 7519-7883; 7528-7738; 7533-7875; 7538-7878; 7540-7885; 7538-7738; 7541-7738; 7549-7738; 7552-7860; 7552-7878; 7562-7878; 7569-7873; 7566-7738; 7553-7738; 7567-7738; 7576-7879; 7577-7706; 7580-7879; 7581-7866; 7585-7738; 7587-7860; 7597-7921; 7587-7738; 7593-7856; 7601-7882; 7597-7875; 7617-7878; 7620-7856; 7620-7738; 7624-7883; 7625-7882; 7630-7883; 7638-7903; 7638-7688; 7647-7883; 7650-7881; 7652-7883; 7669-7876; 7687-7882; 7686-7738; 7719-7883; 7759-7843; 7762-7975; 7762-7979; 7762-7883; 7762-7876; 7767-7883; 7774-7883; 7779-7965; 7794-7883; 7797-7883; 7799-7878; 7827-7885; 7829-8064; 7830-8100; 7833-8092; 7838-8001; 7845-8114; 7930-8252; 7930-8314; 7930-8142; 7931-8601; 7947-8151; 8095-8352; 8115-8315; 8115-8562; 8230-8559; 8230-8524; 8313-8517; 8454-8559; 8459-8559; 8462-8559; 4028-4514; 6037-7478; 2748-3529; 6587-7327; 5619-6317; 6037-6745; 5619-6310;
6602-7311; 5422-6141; 5660-6345; 6225-6901; 5552-6228; 6584-7242; 4000-4674; 4726-5395; 6657-7324; 4741-5373; 414-1041; 4878-5479; 8255-8338; 6579-6684; 7624-7738; 7766-7880; 7760-7864; 8413-8559; 3228-3381; 4478-4635; 7053-7232; 6709-6896; 818-1009; 6769-6976; 5719-5930; 7122-7326; 5665-5889; 384-601; 390-614; 7762-7879; 5773-6007; 6756-6999; 2832-3065; 5697-5929; 6805-7056; 5689-5926; 8230-8462; 1887-2157; 6094-6363; 6877-7160; 2268-2570; 8249-8528; 8291-8559; 3333-3662; 3257-3589; 1201-1537; 8213-8559; 8214-8557; 3811-4155; 3950-4293; 844-1189; 6755-7113; 5504-5862; 5629-5990; 6785-7150; 5616-5977; 5616-5983; 5595-5964; 5578-5960; 3237-3620; 5462-5847; 5618-6007; 3198-3589; 212-601; 6491-6892; 5462-5867; 5565-5977; 159-573; 6067-6467; 5462-5868; 5462-5873; 6214-6634; 5462-5880; 278-703; 5579-6007; 389-853; 2565-3027; 4745-5212; 817-1283; 389-864; 587-1107; 1739-
98/LG:180803.3:2002JAN18 512-847; 229-748; 1-556; 1-540; 1-419
99/LG:1094595.3:2002JAN18 1-335; 1-614; 13-271; 90-692; 315-898; 317-613; 354-613; 426-831; 443-739; 474-755; 539-896; 634-939; 649-811; 887-1460; 887-1066; 887-1338; 894-1112; 894-1427; 897-1130; 897-1082; 1341-1782; 1656-2690; 2071-2728; 2086-2514; 2085-2513; 2084-2591; 2083-2631; 2083-2604; 2086-2684; 2097-2579; 2098-2531; 2098-2605; 2103-2319; 2105-2585; 2103-2617; 2103-2418; 2103-2354; 2103-2307; 2103-2683; 2103-2306; 2103-2331; 2104-2490; 2103-2338; 2103-2351; 2103-2315; 2107-2584; 2103-2340; 2103-2431; 2103-2357; 2103-2426; 2103-2317; 2103-2289; 2103-2390; 2103-2476; 2103-2369; 2103-2370; 2103-2371; 2103-2375; 2104-2397; 2105-2288; 2105-2338; 2105-2330; 2104-2331; 2105-2539; 2104-2362; 2105-2386; 2105-2364; 2105-2616; 2104-2402; 2105-2640; 2106-2512; 2104-2310; 2105-2685; 2105-2354; 2105-2409; 2105-2366; 2106-2355; 2106-2331; 2106-2460; 2105-2341; 2105-2421; 2106-2318; 2106-2452; 2106-2603; 2105-2342; 2106-2633; 2105-2646; 2105-2316; 2106-2306; 2105-2352; 2105-2370; 2106-2372; 2105-2299; 2106-2336; 2106-2316; 2105-2682; 2105-2339; 2106-2379;
2105-2607; 2106-2417; 2107-2475; 2106-2682; 2106-2681; 2106-2702; 2107-2366; 2106-2369; 2106-2329; 2106-2341; 2107-2348; 2106-2353; 2106-2363; 2106-2361; 2106-2456; 2107-2355; 2107-2340; 2106-2545; 2106-2548; 2108-2383; 2108-2382; 2109-2388; 2108-2365; 2111-2473; 2108-2349; 2108-2363; 2109-2344; 2109-2355; 2109-2410; 2109-2320; 2108-2376; 2109-2377; 2109-2367; 2109-2360; 2110-2373; 2108-2685; 2110-2345; 2108-2300; 2110-2402; 2110-2300; 2113-2378; 2111-2608; 2111-2382; 2111-2689; 2113-2682; 2111-2409; 2113-2439; 2112-2367; 2113-2605; 2113-2420; 2113-2374; 2113-2341; 2113-2407; 2113-2311; 2114-2309; 2113-2683; 2112-2656; 2114-2340; 2115-2400; 2116-2580; 2114-2353; 2115-2354; 2115-2378; 2115-2413; 2115-2370; 2115-2369; 2116-2375; 2117-2349; 2116-2371; 2116-2377; 2116-2434; 2117-2398; 2117-2408; 2120-2615; 2118-2681; 2118-2676; 2118-2388; 2118-2286; 2120-2386; 2120-2344; 2120-2449; 2120-2412; 2120-2358; 2120-2387; 2120-2398;
99, cont. 2120-2327; 2121-2589; 2120-2393; 2121-2438; 2120-2369; 2120-2353; 2122-2393; 2121-2353;
2121-2281; 2121-2377; 2121-2379; 2121-2369; 2122-2395; 2123-2372; 2122-2325; 2122-2406;
2123-2312; 2122-2365; 2122-2379; 2120-2379; 2122-2404; 2122-2405; 2122-2612; 2122-2394;
2124-2455; 2124-2683; 2123-2353; 2123-2415; 2124-2390; 2123-2302; 2123-2408; 2123-2409;
2123-2339; 2124-2328; 2124-2412; 2124-2373; 2125-2406; 2127-2280; 2128-2522; 2125-2461;
2127-2413; 2128-2415; 2128-2363; 2128-2306; 2130-2221; 2130-2367; 2130-2372; 2129-2434;
2130-2378; 2130-2385; 2130-2315; 2130-2370; 2130-2357; 2130-2392; 2130-2382; 2132-2369;
2132-2377; 2130-2424; 2133-2408; 2133-2385; 2132-2320; 2132-2419; 2132-2372; 2135-2426;
2134-2494; 2134-2383; 2135-2679; 2135-2392; 2136-2406; 2136-2685; 2136-2355; 2140-2502;
2138-2377; 2138-2376; 2139-2393; 2137-2425; 2138-2398; 2138-2416; 2142-2295; 2140-2370;
2140-2488; 2142-2681; 2141-2512; 2141-2440; 2140-2413; 2141-2335; 2142-2392; 2129-2353;
2143-2515; 2143-2414; 2144-2380; 2144-2402; 2146-2408; 2146-2383;
2147-2396; 2147-2316; 2147-2395; 2148-2273; 2153-2376; 2160-2345; 2163-2420; 2163-2250;
2163-2538; 2163-2387; 2164-2440; 2162-2348; 2180-2388; 2180-2443; 2182-2482; 2187-2576;
2187-2683; 2193-2598; 2198-2480; 2198-2447; 2198-2460; 2200-2695; 2203-2486; 2203-2469;
2207-2461; 2212-2456; 2218-2684; 2231-2496; 2234-2448; 2234-2522; 2237-2485; 2238-2481;
2241-2425; 2242-2683; 2255-2685; 2249-2504; 2253-2462; 2259-2482; 2262-2530; 2265-2515;
2264-2513; 2268-2527; 2271-2514; 2272-2515; 2274-2512; 2281-2685; 2292-2683; 2298-2543;
2302-2549; 2315-2554; 2328-2685; 2330-2683; 2317-2683; 2317-2561; 2338-2556; 2336-2462;
2339-2685; 2343-2590; 2361-2607; 2381-2593; 2390-2594; 2397-2683; 2417-2514; 2414-2632;
2416-2683; 2420-2554; 2422-2545; 2429-2685; 2517-2683; 2586-2683; 2596-2683; 2596-2684;
2584-2683; 2111-2668; 2139-2397; 2112-2330; 2120-2339; 2328-2548; 2112-2332; 2130-2353;
2122-2345; 2129-2354; 2459-2686; 2141-2368; 2110-2340; 2121-2352; 2297-2530; 2106-2340;
2108-2342; 2120-2356; 2131-2369; 2115-2352; 2117-2356;
99, cont. 2105-2344; 2193-2433; 2106-2344; 2118-2358; 2106-2346; 2127-2369; 2155-2396; 2129-2370; 2110-2352; 2145-2386; 2107-2351; 2120-2366; 2203-2449; 2127-2373; 2155-2402; 2107-2354; 2124-2332; 2477-2685; 2114-2321; 2120-2325; 2110-2311; 2131-2331; 2132-2329; 2109-2303; 2114-2299; 2498-2681; 2112-2295; 2223-2407; 2184-2355; 2227-2379; 2554-2685; 2256-2378; 2580-2688; 2105-2353; 2256-2504; 2183-2432; 2122-2371; 2105-2355; 2189-2439; 2138-2389; 2108-2360; 2123-2377; 2119-2372; 2136-2390; 2134-2389; 2116-2370; 2123-2378; 2126-2382; 2118-2375; 2105-2362; 2114-2370; 2106-2364; 2137-2395; 2115-2376; 2110-2371; 2106-2367; 2120-2381; 2190-2446; 2113-2376; 2106-2368; 2178-2441; 2128-2390; 2121-2385; 2152-2415; 2122-2387; 2109-2372; 2126-2390; 2106-2370; 2105-2369; 2208-2473; 2108-2375; 2107-2374; 2417-2683; 2106-2374; 2121-2388; 2110-2378; 2109-2378; 2141-2410; 2125-2396; 2132-2402; 2106-2377; 2110-2379; 2108-2378; 2187-2457; 2113-2583; 2170-2607; 2332-2683; 2363-2686; 2112-2429; 2129-2444; 2106-2419; 2106-2396; 2110-2397; 2121-2403;
2123-2399; 2141-2420; 2122-2409; 2120-2463; 2153-2438; 2128-2411; 2263-2545; 2118-2397; 2277-2557; 2138-2412; 2139-2417; 2156-2432; 2409-2685; 2120-2399; 2106-2385; 2113-2392; 2120-2400; 2126-2398; 2130-2412; 2108-2392; 2106-2390; 2259-2548; 2123-2413; 2110-2399; 2145-2439; 2108-2406; 2146-2448; 2130-2438; 2377-2685; 2123-2433; 2148-2442; 2256-2571; 2130-2450; 2161-2480; 2259-2582; 2335-2683; 2267-2618; 2327-2685; 2317-2685; 2314-2683; 2253-2614; 2291-2683; 2145-2550; 2136-2610; 2126-2606; 2113-2604; 2123-2638; 2123-2646; 2106-2646; 2121-2564; 2113-2666; 2132-2686; 2126-2683; 2249-2681; 2139-2609; 2129-2681; 2591-2683; 2336-2441; 2117-2244; 2310-2471; 2118-2326; 2118-2329; 2194-2404; 2212-2432; 2191-2412; 2147-2370; 2113-2339; 2114-2338; 2111-2337; 2147-2377; 2146-2374; 2120-2355; 2123-2364; 2444-2684; 2141-2384; 2109-2351; 2111-2354; 2132-2378; 2105-2351; 2120-2367; 2132-2380; 2115-2365; 2124-2374; 2123-2369; 2135-2387; 2115-2367; 2139-2390; 2142-2396; 2136-2389; 2113-2365; 2144-2400; 2424-2683; 2111-2370;
99, cont. 2105-2365; 2124-2378; 2177-2438; 2112-2375; 2111-2376; 2110-2374; 2360-2624; 2111-2378;
2121-2387; 2144-2412; 2134-2402; 2117-2385; 2110-2377; 2120-2390; 2410-2683; 2122-2396;
2110-2383; 2237-2514; 2128-2404; 2150-2432; 2160-2443; 2138-2422; 2123-2417; 2146-2426;
2114-2407; 2281-2586; 2138-2443; 2270-2602; 2292-2628; 2243-2601; 2294-2685; 2272-2683;
2175-2580; 2239-2683; 2120-2606; 2214-2688; 2130-2598; 2113-2570; 2113-2607; 2114-2687;
2120-2684; 2123-2684; 2439-2685; 2352-2601; 2123-2381; 2151-2409; 2425-2685; 2108-2369;
2136-2399; 2122-2384; 2133-2398; 2260-2531; 2124-2398; 2134-2408; 2121-2397; 2110-2386;
2125-2402; 2114-2396; 2132-2414; 2111-2392; 2120-2405; 2121-2405; 2114-2403; 2112-2407;
2106-2401; 2119-2413; 2122-2430; 2380-2685; 2146-2453; 2373-2685; 2138-2462; 2326-2685;
2327-2684; 2330-2685; 2123-2605; 2129-2684; 2448-2684; 2162-2402; 2124-2366; 2405-2646;
2109-2352; 2120-2363; 2120-2351; 2114-2330; 2338-2547; 2481-2685; 2235-2436; 2117-2286;
2248-2392; 2255-2390; 2626-2683
100/LG:150288.12:2002JAN18 1-443; 1-237; 170-391; 320-558; 331-732; 370-838; 370-796; 373-678; 386-841; 583-834; 588-838; 738-1074; 741-1045; 759-1162; 824-1245; 824-1247; 837-1247; 846-1225; 854-1168; 856-1277; 859-1247; 863-1251; 863-1266; 864-1247; 864-1242; 880-1247; 879-1000; 879-1193; 879-1187; 881-1042; 879-1133; 879-1116; 893-1175; 901-1132; 927-1158; 928-1248; 930-1218; 924-1168; 928-1505; 934-1247; 956-1132; 970-1247; 975-1237; 975-1263; 981-1591; 982-1247; 994-1247; 996-1248; 998-1562; 1008-1581; 1098-1246; 1121-1197; 1142-1247; 1142-1214; 1157-1679; 1186-1247; 1370-1915; 1370-1441; 1486-1809; 1564-1800; 1564-1801; 1588-1860; 1589-1794; 1670-2002; 1806-2076; 1966-2621; 2230-2733; 2240-2728; 2281-2580; 2374-2580; 2660-3222; 3060-3358
101/LG:7761700.28:2002JAN18 1-244; 1-139; 1-157; 1-488; 1-151; 1-186; 1-196; 1-141; 1-159; 1-60; 1-183; 1-194; 1-502; 1-309; 1-238; 1-192; 126-761; 462-1059; 460-672; 539-1142; 665-1170; 665-942; 690-952; 805-1319; 889-1277; 916-1167; 920-1187; 924-1139; 924-1137; 924-1060; 927-1155; 927-1131; 953-1051; 1036-1531; 1102-1535; 1113-1534; 1132-1561; 1155-1535; 1219-1753; 1220-1345; 1388-1535; 1609-1732; 1704-1988
102/LG:1093982.42:2002JAN18 1-750; 6-272; 8-282; 7-254; 11-280; 12-306; 12-241; 12-239; 12-226; 13-267; 14-291; 13-264; 13-365; 15-318; 15-270; 16-226; 18-468; 19-256; 22-307; 23-457; 23-267; 26-319; 27-276; 30-325; 48-214; 50-359; 81-630; 85-378; 112-254; 181-413; 209-452; 272-820; 281-506; 287-542; 308-554; 386-488; 411-1056; 424-986; 525-1074; 534-1100; 603-1026; 608-1017; 632-1016; 635-1016; 647-1022; 646-1016; 649-1034; 652-878; 653-1023; 658-916; 660-858; 671-996; 679-1003; 684-940; 686-1016; 695-961; 695-1190; 698-915; 704-1025; 704-982; 706-976; 709-1016; 714-1016; 719-986; 720-1025; 722-995; 732-1016; 738-1003; 740-994; 743-943; 744-943; 746-1023; 762-1023; 773-1016; 776-1020; 780-1003; 788-991; 790-1016; 791-989; 793-1016; 803-1019; 803-1016; 804-1008; 804-1023; 806-1019; 824-1023; 838-1009; 842-1016; 842-1003; 846-1016; 848-1023; 855-1016; 861-1016; 872-1008; 881-1012; 883-1016; 746-992; 750-997; 745-998; 763-1016; 755-1019; 754-1023; 740-1013; 736-1009; 741-1016; 736-1023; 181-466; 731-1016; 706-1003; 715-1023; 705-1009; 718-1023; 704-1013; 681-1016; 15-555; 33-555; 762-998; 731-973; 725-967; 275-508; 786-1019; 792-1022; 788-1013; 702-928; 318-540; 802-1023; 810-1016; 783-991; 814-1013; 852-1016; 853-1016; 854-1016; 383-545; 271-425; 383-528; 883-1020; 888-1023; 785-901; 916-1017; 925-1023
103/LG:7762752.1:2002JAN18 1-265; 6-295; 7-596; 8-162; 8-246; 71-354; 154-335; 312-581; 350-639; 373-781; 455-1052; 544-1184; 544-791; 550-1056; 602-1214; 615-1210; 726-1192; 912-1030
104/LG:013006.11:2002JAN18 1-176; 137-734; 226-750; 315-778; 364-779; 692-1490; 693-1326; 737-1035; 842-1057; 936-1531; 1182-1885; 1273-1952; 1323-1883
105/LG:054509.10:2002JAN18 1-570; 3-251; 4-187; 387-944; 585-925; 586-928; 664-928; 755-1181; 1141-1500; 1141-1510; 1195-1405; 1223-1486; 1252-2955; 1362-1902; 1483-2311; 1621-1798; 1631-2006; 1640-1906; 1666-1919; 1776-2005; 1805-1994; 1889-2094; 1889-2134; 1889-2145; 1909-2338; 2102-2460; 2116-2315; 2135-2352; 2135-2423; 2402-2947; 2406-2892; 2421-2677; 2423-2772; 2436-2902; 2440-2788; 2444-2621; 2447-2668; 2447-2914; 2469-2913; 2475-2825; 2477-2722; 2478-2912; 2487-2912; 2515-2849; 2521-2951; 2552-2792; 2554-2951; 2564-2955; 2566-3055; 2568-2842; 2569-3052; 2588-2825; 2597-2946; 2609-2934; 2617-2910; 2617-2951; 2617-2952; 2623-3332; 2636-2846;
105, cont. 2653-2874; 2653-2905; 2653-2951; 2664-2912; 2666-2871; 2669-2949; 2681-3335; 2694-2948; 2706-2973; 2709-2980; 2720-2934; 2725-2886; 2725-2909; 2742-2952; 2745-2951; 2776-3051; 2776-3171; 2796-2951; 2801-2951; 2806-2943; 2813-2904; 2818-2949; 2833-3075; 2887-2951; 2894-3337; 2906-3144; 2942-3174; 2942-3201; 2942-3401; 2942-3472; 3043-3622; 3066-3302; 3066-3312; 3067-3352; 3068-3342; 3087-3362; 3090-3354; 3190-3400; 3217-3357; 3266-3512; 3272-3893; 3345-3833; 3347-3899; 3364-3646; 3368-3672; 3371-3734; 3383-3919; 3389-3929; 3430-3921; 3433-3909; 3480-3747; 351-3845; 3559-3959; 3580-3970; 3645-3960; 3760-3946; 3801-3966; 3900-4148
106/LG:345276.3:2002JAN18 1-328; 33-622; 195-718; 195-714; 210-783; 239-688; 258-521; 271-896; 314-528; 316-599; 376-670; 419-691; 454-785; 454-719; 533-1063; 570-1031; 756-1002; 756-1006; 923-1503; 959-1424; 987-1396; 1250-7061; 1311-1497; 1519-2112; 1526-2112; 1565-2221; 1607-2173; 1680-2160; 1734-1887; 1743-1837; 1743-1974; 1843-2177; 2015-2280; 2090-2355; 2153-2719; 2243-2596; 2371-2825; 2383-2815; 2466-3127; 2488-2785; 2490-2797; 2543-2794; 2543-3098; 2557-2852; 2561-2797; 2660-2797; 2750-2800; 2803-3427; 2817-3268; 2903-3129; 2905-3411; 2905-3090; 2929-3036; 2962-3420; 3147-3659; 3162-3898; 3334-3480; 3391-3641; 3407-3817; 3417-3999; 3419-4042; 3427-3821; 3442-3669; 3478-3896; 3476-3950; 3481-3641; 3482-3822; 3482-3814; 3489-3784; 3505-3980; 3505-4209; 3528-4107; 3552-3747; 3552-3641; 3573-3842; 3576-3912; 3579-4025; 3698-4099; 3725-3912; 3730-3990; 3730-3996;
3738-3987; 3753-4445; 3815-4377; 3831-4437; 3831-4432; 3834-4422; 3839-4031; 3842-3997; 3870-4110; 3894-4373; 3894-4136; 3902-4291; 3903-4482; 3903-4116; 3910-4471; 3923-4550; 3937-4227; 3949-4170; 3961-4216; 3963-4571; 4030-4459; 4101-4712; 4109-4643; 4127-4663; 4151-4350; 4159-4372; 4160-4438; 4267-4703; 4323-4854; 4331-4542; 4447-4764; 4529-5054; 4588-5030; 4588-4832; 4595-5108; 4591-4818; 4597-4848; 4617-4883; 4617-4841; 4632-4877; 4635-4913; 4640-4962; 4640-5223; 4640-5120; 4646-5045; 4647-5225; 4650-4931; 4719-4971; 4733-5215; 4736-5082; 4777-5012; 4791-5179; 4805-5074; 4827-5318; 4829-4985; 4843-5509; 4876-5101; 4886-5496; 4943-5433; 4971-5380;
106, cont. 5002-5624; 5016-5203; 5090-5368; 5096-5711; 5104-5382; 5107-5386; 5155-5412; 5174-5402; 5190-5398; 5200-5682; 5236-5324; 5265-5838; 5265-5577; 5265-5467; 5289-5592; 5298-5877; 5301-5457; 5319-5593; 5331-5775; 5378-5906; 5374-5471; 5376-5503; 5392-5709; 5413-5659; 5433-5877; 5499-5948; 5526-6115; 5542-5945; 5543-5945; 5559-5938; 5560-5945; 5583-5786; 5590-5945; 5591-5807; 5620-5945; 5633-5943; 5664-5945; 5683-6111; 5703-5945; 5721-5931; 5764-5938; 5790-6371; 5790-6031; 5817-6078; 5817-5938; 5820-6078; 5820-6286; 5840-6078; 5863-5945; 5928-6196; 5960-6206; 5960-6251; 5974-6664; 6004-6634; 6024-6527; 6047-6324; 6049-6602; 6047-6197; 6072-6272; 6077-6362; 6118-6305; 6122-6450; 6166-6667; 6166-6356; 6167-6357; 6173-6342; 6199-6485; 6217-6483; 6223-6278; 6258-6356; 6258-6483; 6267-6355; 6286-6546; 6288-6802; 6286-6552; 6296-6588; 6334-6687; 6340-7039; 6355-6999; 6365-6537; 6367-6589; 6410-6991; 6410-6986; 6433-6964; 6420-6682; 6420-6678; 6428-6691; 6430-6691; 6433-6692; 6433-6672; 6434-6991; 6443-6705; 6445-6648;
6472-6994; 6489-6724; 6497-7039; 6516-6710; 6520-6965; 6546-6859; 6575-6952; 6571-7102;
6571-6821; 6581-6870; 6574-6767; 6577-6847; 6581-6860; 6587-6994; 6587-6991; 6587-6880;
6589-6991; 6589-6817; 6590-6853; 6593-7060; 6595-6829; 6596-6994; 6601-6990; 6601-6994;
6604-6802; 6605-6749; 6608-6994; 6610-6994; 6611-6827; 6626-6994; 6626-6889; 6632-7254;
6644-6994; 6645-6994; 6049-6994; 6650-7243; 6651-6994; 6652-6994; 6666-6906; 6669-6906;
6669-6825; 6678-6999; 6672-6994; 6695-6990; 6695-6982; 6696-6994; 6696-6990; 6700-6994;
6724-7189; 6721-6994; 6722-6994; 6739-6994; 6742-6994; 6758-7384; 6757-6989; 6765-6994;
6768-6994; 6774-6949; 6796-6990; 6802-7380; 6825-6990; 6829-7061; 6833-7024; 6831-7172;
6835-6991; 6840-7298; 6851-7339; 6848-7060; 6850-7172; 6863-6994; 6865-7055; 6869-7180;
6897-7187; 6912-7381; 6918-7396; 6939-7392; 6949-7399; 6966-7299; 7018-7394; 7029-7392;
7099-7400; 7099-7330; 7100-7381; 7100-7355; 7104-7299; 7116-7351; 7154-7393; 7285-7392;
7305-7717; 7305-7589; 7314-7589; 7495-7798
107/LG:247354.20:2002JAN18 231-914; 1-460
108/LG:1454791.33:2002JAN18 618-1034; 608-855; 398-706; 104-671; 104-536; 15-523; 15-511; 103-431; 142-428; 123-394; 11-
110/LG:984007.4:2002JAN18 1-426; 3-426; 156-601; 425-970; 425-948; 635-1248; 635-1178; 635-1111; 1149-1684; 1205-1685; 1344-1940; 1362-1725; 1384-2063; 1445-1729; 1463-2026; 1463-1746; 1486-1746; 1609-2052; 1653-2116; 1671-1908; 1689-1982; 1724-2073; 1724-1804; 1851-2044; 1881-2111; 1916-2569; 1997-2116; 2003-2128; 2163-2552; 221 2369; 2212-2405; 2368-2833; 2368-2638; 2409-2684; 2442-2670; 2444-2998; 2460-2734; 2586-3051; 2593-2838; 2593-2833; 2619-3090; 2665-2905; 2813-3085; 2820-3517; 2826-3089; 2858-3448; 2918-3402; 2918-3291; 2918-3431; 2952-3324; 2947-3390; 2962-3206; 2975-3054; 2983-3421; 3018-3432; 3034-3342; 3045-3389; 3045-3384; 3052-3429; 3059-3290; 3061-3269; 3066-3384; 3070-3430; 3154-3424; 3177-3384
111/LG:1093386.25:2002JAN18 1-424; 1-422; 17-425; 320-605; 320-631; 327-605; 428-648; 435-605; 447-641; 453-605; 459-525;
464-626; 466-605; 483-605; 483-654; 483-627; 483-622; 485-601; 486-605; 495-635; 501-1036; 502-1036; 506-627; 515-605; 516-644; 516-630; 516-627; 516-605; 516-740; 516-657; 516-626; 516-814; 518-650; 518-640; 518-627; 518-617; 518-605; 518-629; 518-604; 519-625; 542-605; 550-1003; 575-624; 673-1083; 634-832; 634-892; 636-917; 640-1002; 640-865; 641-878; 643-892; 644-894; 647-903; 638-888; 651-1377; 654-747; 656-958; 656-941; 669-1066; 658-902; 664-1038; 654-946; 655-1184; 668-908; 668-986; 658-948; 668-957; 675-1038; 675-1005; 675-835; 676-1119; 676-1157; 676-1052; 675-807; 665-934; 675-784;
676-804; 680-1299; 680-1303; 684-947; 685-982; 690-972; 675-774; 687-947; 686-773; 690-1095; 690-962; 689-912; 690-1094; 690-1014; 690-966; 690-834; 690-809; 690-803; 690-769; 695-1234; 695-1052; 695-949; 698-986; 701-1056; 690-993; 703-960; 695-988; 710-926; 710-947; 725-1413; 718-957; 718-1152; 721-1049; 711-1184; 723-986; 741-1027; 754-1042; 767-1002; 773-1215; 785-1163; 791-1220; 793-1056; 795-1045; 805-1002; 802-1331; 811-1060; 815-1582; 843-1094; 835-1078; 852-1507; 853-1328; 856-906; 862-1403; 862-1530; 869-1108; 880-1022; 875-1585; 875-1596; 900-1001; 904-1038; 909-1298;
111, cont. 915-1599; 923-1436; 928-1160; 935-1534; 923-1086; 939-1204; 943-1295; 951-1571; 952-1533; 959-1438; 963-1194; 977-1512; 976-1238; 989-1182; 974-1131; 993-1427; 1008-1319; 1010-1571; 1014-1512; 1001-1239; 1015-1249; 1016-1273; 1031-1306; 1032-1425; 1039-1290; 1039-1264; 1043-1302; 1046-1289; 1051-1288; 1053-1774; 1055-1339; 1055-1274; 1056-1268; 1058-1280; 1058-1110; 1061-1346; 1066-1440; 1053-1442; 1069-1310; 1055-1320; 1067-1252; 1089-1664; 1109-1868; 1109-1882; 1117-1492; 1122-1407; 1140-1386; 1130-1853; 1146-1202; 1138-1354; 1153-1453; 1143-1396; 1166-1434; 1169-1579; 1174-1759; 1219-1594; 1220-1755; 1220-1644; 1222-1541; 1222-1755; 1233-1739; 1238-1510; 1251-1818; 1286-1944; 1377-1907; 1430-1986; 1467-2080; 1489-1986; 1494-2078; 1501-1615; 1522-1871; 1534-1705; 1531-2048; 1533-1978; 1533-1950; 1550-2046; 1557-2008; 1566-1772; 1574-2109; 1578-1848; 1598-1974; 1597-1862; 1608-2026; 1623-1736; 1631-1943; 1633-2153; 1636-1913; 1638-1914; 1671-1951; 1681-1983; 1682-1916; 1693-1916; 1706-2295; 1692-2184; 1696-1968; 1704-2249;
1729-2004; 1735-2406; 1766-2356; 1747-1965; 1770-2324; 1776-2366; 1788-2521; 1792-2421;
1808-2521; 1806-2414; 1814-2351; 1830-2281; 1842-2455; 1871-2035; 1874-2432; 1874-2505;
1871-2456; 1876-2419; 1877-2445; 1886-2247; 1900-2207; 1905-2033; 1965-2418; 1976-2263;
1972-2363; 1972-2270; 1974-2421; 1976-2456; 1985-2457; 1981-2447; 1987-2456; 1989-2458;
1990-2456; 1990-2459; 1990-2461; 1991-2456; 1992-2456; 1998-2458; 1999-2418; 1998-2241;
1998-2229; 2000-2457; 2003-2250; 2007-2461; 2008-2457; 2007-2184; 2020-2463; 2025-2456;
2020-2457; 2031-2235; 2031-2263; 2034-2463; 2035-2456; 2035-2285; 2040-2457; 2038-2454;
2040-2458; 2040-2456; 2041-2440; 2042-2402; 2041-2284; 2043-2456; 2043-2283; 2045-2462;
2047-2457; 2051-2456; 2053-2463; 2055-2456; 2054-2314; 2055-2461; 2063-2462; 2068-2454;
2069-2454; 2070-2464; 2072-2456; 2074-2461; 2075-2454; 2077-2328; 2078-2450; 2078-2164;
2079-2454; 2083-2459; 2083-2456; 2083-2360; 2084-2456; 2090-2339; 2092-2456; 2093-2456;
2095-2331; 2096-2475; 2098-2333; 2100-2328; 2101-2456;
111, cont. 2108-2460; 2105-2331; 2107-2457; 2107-2456; 2113-2440; 2121-2456; 2132-2454; 2129-2459; 2130-2454; 2130-2386; 2131-2471; 2131-2454; 2145-2360; 2134-2463; 2133-2456; 2139-2455; 2138-2459; 2141-2456; 2143-2456; 2146-2456; 2145-2276; 2147-2461; 2149-2454; 2159-2412; 2158-2456; 2160-2327; 2168-2454; 2168-2456; 2168-2440; 2171-2393; 2172-2442; 2180-2456; 2181-2416; 2180-2460; 2194-2441; 2205-2335; 2211-2410; 2211-2381; 2213-2437; 2215-2457; 2216-2456; 2218-2454; 2220-2455; 2221-2457; 2207-2456; 2231-2436; 2238-2456; 2239-2455; 2243-2456; 2245-2461; 2245-2401; 2246-2456; 2247-2465; 2252-2434; 2253-2456; 2255-2456; 2256-2456; 2260-2446; 2261-2458; 2272-2456; 2289-2454; 2299-2411; 2301-2456; 2324-2457; 2325-2437; 2330-2456; 2331-2459; 2331-2456; 2363-2456; 2365-2456; 2365-2453; 2365-2420; 2368-2431; 2380-2456; 2390-2463; 2391-2460; 902-1640; 1295-1484; 1347-1575; 695-1015; 2106-2454; 1468-1866; 2012-2454; 686-1126; 1822-2410; 1130-1678; 1892-2443; 1286-1831
112/LG:7693871.6:2002JAN18 1-628; 1-497; 1-589; 85-580; 99-749; 118-770; 213-770; 418-576; 525-1066; 763-945; 763-979; 826-1066; 859-1050; 922-1048; 1013-1409; 1024-1498; 1024-1412; 1138-1393; 1224-1695; 1233-1482; 1401-1535; 1498-2112; 1750-2332; 1799-2170; 1823-2332; 2129-2408
113/LG:7693934.1:2002JAN18 1-391; 41-631; 211-675; 236-725; 235-725; 392-614; 392-869; 406-725; 422-982; 750-1225; 762-1225; 916-1430
114/LG:7697553.34:2002JAN18 1-668; 202-811; 223-656; 235-472; 236-524; 240-483; 262-468; 270-549; 271-581; 278-487; 272-465; 278-564; 287-586; 298-912; 305-878; 312-643; 324-902; 328-604; 338-807; 361-652; 378-655; 392-584; 411-578; 417-668; 417-690; 426-494; 545-696; 614-1218; 756-1230; 1156-1237; 233-646
115/LG:337345.5:2002JAN18 1-5567; 655-1008; 566-634; 617-1182; 666-952; 667-954; 691-944; 678-954; 711-1441; 765-1302; 1226-1859; 1314-1775; 1440-2088; 1463-1888; 1502-1747; 1630-2228; 1674-1734; 1769-2239; 1837-2280; 2111-2263; 2837-3408; 2839-3478; 2837-3400; 2847-3291; 2895-3258; 2896-3399; 3065-3592; 3089-3465; 3617-3861; 3853-4107; 4034-4303; 4661-5448; 4759-4964; 4862-5215;
4871-5114; 4988-5477; 5014-5466; 5113-5561; 5319-5570
116/LG:410680.7:2002JAN18 1076-1437; 989-1437; 977-1429; 962-1413; 1230-1338; 876-1315; 1054-1307; 930-1300; 1213-1299; 854-1297; 1011-1298; 980-1298; 985-1298; 887-1294; 893-1296; 826-1294; 834-1294; 849-1294; 854-1294; 730-1294; 842-1294; 893-1294; 930-1294; 931-1294; 974-1294; 1004-1294; 1079-1294; 1124-1294; 1134-1294; 1230-1291; 670-1251; 887-1225; 556-1218; 876-1201; 603-1195; 986-1173; 959-1159; 876-1132; 735-1105; 580-1093; 319-978; 320-944; 319-934; 228-912; 320-878; 399-867; 319-849; 1-781; 410-735; 474-687; 403-681; 146-669; 141-376; 1-231
117/LG:7771583.2:2002JAN18 1-4020; 775-1165; 777-1194; 778-1199; 1036-6295; 1036-3564; 1644-2087; 1645-2092; 1645-2070; 1698-1999; 2712-3224; 3126-3396; 3126-3319; 4014-4256; 4216-4978; 4401-5031; 4420-5129; 4452-5183; 4522-4767; 4535-4758; 4578-4830; 4595-4840; 4624-5031; 4696-5364; 4759-5003; 4775-4984; 4775-5185; 4822-5195; 4811-5346; 4811-5031; 4825-5031; 4830-5415; 4845-5125; 4932-5329; 4932-5175; 4935-5304; 4962-5491; 5059-5288; 5059-5286; 5059-5220; 5062-5462; 5063-5525; 5095-5692; 5116-5392; 5116-5391; 5283-5536; 5321-5857; 5365-5530; 5383-5900; 5415-5650; 5474-5931; 5605-6189; 5603-6194; 5606-6216; 5624-6338; 5658-6109; 5661-5935; 5689-6253; 5703-6338; 5720-6256; 5743-5962; 5772-6258; 5780-6257; 5803-6044; 5845-6252; 5866-6295; 5866-6126; 5873-6185; 5872-6292; 5873-6260; 5873-6101; 5874-6295; 5880-6295; 5887-6295; 5890-6279; 5894-6278; 5941-6295; 5937-6295; 5947-6288; 5948-6240; 5959-6292; 5971-6295; 5984-6296; 6047-6292
118/LG:074994.14:2002JAN18 1-261; 1-288; 1-609; 273-822; 328-1067; 340-595; 340-596; 374-949; 377-818; 427-611; 771-822; 871-1516; 946-1319; 976-1302; 979-1177; 987-1072; 987-1085; 987-1389; 988-1389; 1022-1389; 1027-1474; 1068-1589; 1153-1288; 1157-1340; 1170-1385; 1228-1519; 1306-1521; 1317-2015; 1322-1549; 1483-1731; 1484-1808; 1520-1738; 1529-1610; 1663-2070; 1689-2099; 1704-2145; 1752-2185; 1798-2099; 1864-2185; 2031-2182; 2086-2365
119/LG:7691131.1:2002JAN18 1-826; 24-455; 35-557; 35-580; 65-288; 156-2520; 683-758; 693-887; 905-1502; 907-1502; 1146-1581; 1358-1591; 1359-1656; 1427-1689; 1427-1698; 1427-1743; 1482-1766; 1827-2100; 1937-2416; 1943-2617; 2295-2718; 2313-2589; 2326-2716; 2327-2757; 2328-2578; 2330-2655; 2366-2919; 2406-2762; 2411-2757; 2438-2810; 2457-3091; 2463-2757; 2594-2759; 2611-2807; 2639-3197; 2645-2928; 2669-2810; 2684-2774; 2696-2934; 2882-3117; 2979-3312; 2979-3330; 2979-3237; 3129-3933; 3148-3939; 3215-3544; 3482-3943; 3484-3944; 3516-3777; 3517-3943; 3525-3941; 3623-3943; 3627-4038; 3736-3946
120/LG:983975.1:2002JAN18 1-505; 2-114; 434-728; 440-893; 445-705; 465-633; 466-545; 489-702; 693-894; 693-893; 693-1136; 1059-1208; 1063-1187; 1069-1233; 1071-1235
121/LG:1383194.7:2002JAN18 30-386; 12-97; 43-469; 26-238; 1-88; 46-426; 1-231; 10-231; 1-201; 1-268; 1-448; 1-566; 10-266; 11-556; 12-483; 11-284; 13-550; 11-599; 14-276; 14-275; 15-272; 15-600; 26-533; 15-384; 15-335; 15-316; 15-307; 15-288; 15-286; 15-279; 15-276; 15-278; 15-274; 16-276; 20-245; 16-594; 17-432; 16-321; 16-287; 16-277; 16-221; 15-447; 17-315; 16-314; 17-284; 18-608; 20-559; 19-304; 19-303; 19-281; 20-602; 20-584; 20-291; 20-96; 21-533; 21-305; 21-302; 27-571; 24-301; 27-429; 27-308; 27-315; 27-293; 27-319; 27-142; 27-555; 27-310; 28-294; 28-310; 27-125; 28-495; 29-308; 28-288; 28-286; 30-330; 31-305; 29-543; 31-326; 31-317; 31-309; 34-289; 30-293; 31-316; 32-607; 32-497; 32-332; 32-314; 32-308; 32-306; 34-299; 34-281; 34-553; 34-476; 41-315; 57-419; 58-271; 59-246; 66-310; 113-545; 123-700; 53-588; 120-242; 128-378; 113-302; 171-420; 795-908; 209-469; 178-766; 183-412; 191-715; 223-889; 240-504;
274-473; 282-832; 282-920; 281-482; 300-559; 309-885; 364-441; 367-442; 380-432; 332-831; 433-624; 355-553; 333-850; 381-894; 437-619; 356-785; 360-812; 359-790; 449-975; 414-521; 366-847; 343-829; 366-849; 366-810; 366-805; 366-850; 366-794; 365-791; 366-845; 366-770; 366-788; 367-789; 366-623; 406-583; 367-806; 406-781; 390-813; 390-817; 390-812; 388-785; 389-785; 389-783; 390-819; 390-629; 390-728; 389-741; 472-619; 390-782; 474-738; 472-615; 344-750; 395-685; 452-765; 398-688; 402-887; 401-884; 407-751; 407-759; 411-887; 415-882; 419-887; 422-893; 423-887; 424-894; 423-883; 427-650; 427-888; 429-889; 429-646; 430-887; 513-610; 433-713; 433-816; 437-887; 437-812; 439-888; 439-887; 438-889; 443-887; 442-889; 442-887; 444-895; 442-884; 444-888; 446-890; 448-891; 447-870; 452-887; 451-742; 458-747; 499-1075; 437-810; 469-890; 457-878; 471-888; 471-885; 514-781; 504-570; 472-893;
470-882; 549-954; 474-627; 475-891; 476-890; 477-893; 478-891; 476-887; 547-960; 567-980; 567-1047; 521-812; 568-987; 478-884; 483-808; 483-887; 502-839; 484-890; 503-925; 488-893; 491-891; 490-861; 493-650; 585-974; 498-745; 499-849; 560-1081; 502-690; 485-734; 486-765; 486-734; 540-799; 504-689; 487-791; 505-804; 507-613; 523-816; 596-974; 521-902; 512-760; 513-762; 512-759; 501-743; 516-803; 519-887; 618-880; 586-863; 586-825; 538-744; 587-696; 524-867; 630-1001; 528-797; 529-785; 521-809;
121, cont. 531-752; 533-890; 533-895; 538-892; 575-923; 578-914; 602-949; 596-834; 603-712; 603-708;
595-684; 595-683; 595-679; 595-676; 595-669; 595-667; 595-652; 603-742; 603-738; 603-915; 595-663; 551-752; 551-694; 553-854; 515-595; 554-770; 573-1200; 554-882; 524-747; 635-961; 558-855; 621-923; 560-889; 616-680; 628-711; 628-726; 606-699; 628-867; 628-838; 632-822; 577-891; 638-838; 555-796; 555-774; 556-787; 643-838; 621-768; 560-863; 621-723; 653-966; 587-898; 587-887; 650-838; 651-1318; 673-973; 567-815; 569-813; 656-1326; 656-844; 677-967; 573-811; 573-803; 574-857; 579-782; 579-781; 666-838;
668-838; 584-837; 673-838; 680-923; 588-857; 653-858; 714-987; 683-835; 715-866; 720-985; 614-856; 619-850; 621-870; 622-857; 646-868; 629-857; 640-875; 659-886; 631-752; 749-987; 639-752; 641-834; 667-881; 643-752; 646-752; 1348-1482; 673-883; 676-881; 677-869; 1355-1520; 685-883; 1361-1486; 780-939; 687-881; 663-779; 678-863; 678-841; 678-815; 678-795; 678-786; 678-775; 719-803; 696-881; 678-807; 679-754; 703-893; 675-780; 678-763; 678-755; 708-887; 708-886; 1382-1524; 711-887; 715-887; 719-795; 709-864; 729-838; 1401-1522; 818-985; 731-887; 731-893; 1404-1521; 734-887; 1423-1549; 736-887; 736-883; 777-981; 821-978; 831-974; 831-924; 786-1173; 832-974; 834-974; 836-974; 837-976; 838-980; 838-947; 843-929; 840-974; 843-974; 843-975; 844-974; 845-959; 835-890; 851-974; 855-974; 855-983; 855-973; 855-971; 855-965; 855-964; 855-953; 855-952; 855-947; 855-944; 855-954; 855-923; 845-910; 845-909; 845-897; 855-1003; 855-1206; 845-896; 1401-1453; 1459-1541; 868-974; 883-1419; 895-980; 806-860; 913-1047; 1614-2017; 1050-1531; 1050-1176; 1095-1670; 1675-1954; 1104-1521;
1103-1173; 1115-1521; 1116-1176; 1149-1426; 1768-2023; 1192-1415; 1275-1414; 1293-1534; 1295-1403; 1276-1521; 1301-1522; 1279-1866; 1277-1535; 1280-1521; 1309-1521; 1312-1521; 1287-1494; 1294-1547; 1294-1410; 1295-1521; 1294-1507; 1294-1494; 1296-1521; 1296-1510; 1296-1428; 1298-1521; 1300-1434; 1328-1467; 1301-1697; 1301-1396; 1333-1729; 1326-1522; 1302-1528; 1302-1521; 1328-1903; 1329-1406; 1303-1521; 1303-1400; 1303-1416; 1330-1535; 1304-1521; 1333-1550; 1308-1521; 1333-1781; 1334-1404; 1308-1434; 1334-1810; 1308-1911; 1310-1521; 1310-1535; 1313-1521; 1313-1749; 1341-1521; 1344-1409; 1316-1548; 1316-1521; 1319-1551; 131 1461; 1344-1418; 1344-1475; 1318-1494; 1319-1521; 1323-1521; 1323-1437;
1323-1879; 1324-1789; 1324-1521; 1326-1559; 1324-1850; 1325-1535; 1324-1849; 1326-1521; 1326-1495; 1327-1521; 1327-1400; 1328-1425; 1328-1415; 1328-1395; 1328-1388; 1328-1391; 1329-1548; 1329-1525; 1329-1521; 1333-1522; 1329-1541; 1355-1532; 1331-1684; 1331-1521; 1333-1823; 1332-1521; 1332-1523; 1332-1401; 1333-1761;
121, cont. 1335-1784; 1333-1538; 1333-1536; 1333-1530; 1333-1528; 1333-1527; 1333-1521; 1333-1509; 1333-1498; 1333-1497; 1333-1494; 1333-1493; 1333-1488; 1333-1486; 1333-1504; 1333-1466; 1333-1465; 1333-1454; 1333-1453; 1333-1452; 1333-1429; 1333-1934; 1334-1920; 1333-1871; 1333-1790; 1334-1750; 1333-1793; 1334-1541; 1334-1538; 1334-1530; 1334-1521; 1334-1480; 1334-1479; 1334-1465; 1334-1455; 1334-1459; 1334-1439; 1334-1432; 1334-1429; 1334-1427; 1334-1423; 1334-1418; 1334-1417; 1334-1410; 1334-1408; 1334-1842; 1334-1781; 1334-1760; 1334-1833; 1333-1501; 1333-1474; 1335-1545; 1335-1525; 1335-1521; 1335-1914; 1335-1754; 1336-1537; 1336-1526; 1336-1521; 1335-1872; 1337-1521; 1363-1651; 1338-1521; 1339-1803; 1340-1536; 1340-1521; 1959-2179; 1342-1549; 1342-1947; 1344-1537; 1344-1521; 1344-1795; 1344-1765; 1363-1730; 1347-1521; 1347-2066; 1348-1522; 1349-1533; 1349-1521; 1382-1796; 1355-1522; 1357-1745; 1364-1648; 1361-1880; 1365-1521; 1363-1644; 1363-1585; 1388-1852; 1364-1661; 1376-1796; 1366-1521; 1366-1827; 1368-1521; 1369-1521;
1370-1521; 1398-1522; 1370-1541; 1374-1750; 1373-1521; 1375-1904; 1377-2022; 1377-1521;
1381-1950; 1381-1521; 1383-1662; 1410-1667; 1387-1818; 1387-1521; 1388-2014; 1387-1852;
1390-1591; 1394-1514; 1397-1529; 1430-1708; 1409-1605; 1433-1707; 1409-1949; 1409-1924;
1409-1521; 141 1521; 1413-1521; 1414-1521; 1415-2048; 1415-1521; 1416-1927; 1417-1521;
1418-1808; 1430-1934; 1423-1934; 1423-1696; 1423-1790; 1425-1521; 1428-1809; 1428-1806;
1428-1739; 1428-1521; 1436-1950; 1433-1927; 1432-1832; 1433-1756; 1434-1744; 1417-1967;
1456-1883; 1457-1809; 1441-1840; 1443-2016; 1445-1966; 1473-1522; 1478-1670; 1455-2011;
1457-1791; 1464-1836; 1467-1895; 1476-1790; 1476-1531; 1478-1549; 1516-1800; 1522-2111;
1518-2034; 1499-1611; 151 1894; 1564-2019; 1572-2098; 1584-1885; 1607-1840; 1601-1736;
1618-1670; 1625-1747; 1650-2045; 1647-1881; 1652-1957; 1652-1918; 1653-1922; 1653-1737;
1657-1832; 1658-1909; 1676-1944; 1660-1886; 1664-1891; 1666-1926; 1666-1943; 1666-1939;
1666-1937; 1666-1933; 1666-1927; 1666-1919; 1666-1901
121, cont. 1666-1886; 1666-1820; 1667-1935; 1674-1882; 1672-1928; 1673-1927; 1673-1917; 1681-1968; 1679-1923; 1679-1874; 1679-1865; 1674-1893; 1681-2013; 1681-1953; 1681-1889; 1681-1877; 1681-1875; 1681-1872; 1681-1868; 1681-1896; 1681-1863; 1681-1861; 1681-1862; 1681-1859; 1681-1856; 1681-1858; 1681-1855; 1681-1850; 1681-1801; 1681-1799; 1681-1795; 1681-1794; 1666-1878; 1683-1974; 1677-1927; 1677-1931; 1678-1957; 1678-1935; 1684-1943; 1684-1851; 1676-2090; 1681-2035; 1679-1914; 1680-1944; 1681-1961; 1680-1924; 1680-1921; 1681-1930; 1680-1901; 1680-1897; 1681-1736; 1681-1918; 1681-1904; 1681-1905; 1681-1899; 1681-1894; 1681-1907; 1681-1878; 1688-1914; 1690-1935; 1681-1841; 1679-1852; 1697-1901; 1698-1918; 1701-1958; 1702-1964; 1681-2200; 1681-1844; 1681-1843; 1681-1842; 1681-1840; 1681-1839; 1681-1836; 1681-1834; 1681-1835; 1681-1831; 1681-1824; 1681-1823; 1681-1820; 1681-1819; 1681-1816; 1681-1815; 1681-1810; 1681-1806; 1681-1805; 1681-1822; 1681-1828; 1703-1957; 1705-2110; 1703-1941; 1702-1940; 1698-1754; 1708-2072; 1705-1984; 1705-1960; 1705-1941; 1705-1852; 1705-2019; 1706-1969; 1705-1988; 1705-1910; 1708-1968; 1708-1917; 1709-1931; 1711-1963; 1712-1979; 1713-1945; 1713-1939; 1714-1941; 1715-1781; 1718-1944; 1702-2100; 1699-2111; 1703-1930; 1726-2039; 1704-1927; 1705-2099; 1705-1808; 1728-2052; 1727-1974; 1727-1975; 1705-2198; 1706-2038; 1729-1966; 1728-1971; 1711-1931; 1713-1941; 1736-1961; 1735-1877; 1716-1929; 1746-1984; 1726-2207; 1763-2101; 1780-1991; 1743-1952; 1743-1941; 1767-2066; 1746-1940; 1749-1945; 1767-2029; 1777-1939; 1760-1956; 1771-2101; 1797-2213; 1780-1926; 1780-1889; 1786-2206; 1806-2068; 1818-2041; 1795-2193; 1795-2087; 1799-2210; 1807-2098; 1804-1889; 1826-2103; 1811-2202; 1783-1838; 1818-1889; 1822-1922; 1826-1889; 1828-1889; 1848-2100; 1852-2205; 1858-1925; 1866-2062; 1869-2069; 1890-1944; 1922-2071; 1916-2206; 1919-2213; 1943-2202; 1924-2227; 1881-1950; 1920-1969; 1994-2073; 1972-2154; 2006-2196; 2004-2199; 2017-2087; 2030-2200; 2035-2199; 2051-2202; 2083-2202; 2086-2202; 2087-2202; 2088-2202; 2093-2202;
121, cont. 2099-2206; 2097-2199; 2098-2202; 2100-2202; 2102-2198; 2106-2199; 2107-2202; 2115-2202; 2146-2203; 2130-2202; 2132-2202; 2138-2199; 2142-2202; 2153-2202; 1955-2202; 26-273; 406-640; 430-675; 263-512; 171-419; 1928-2173; 441-690; 250-498; 23-271; 28-276; 27-274; 27-270; 565-812; 21-268; 22-269; 22-270; 135-385; 282-531; 575-824; 530-783; 39-290; 35-286; 553-804; 409-660; 30-280; 27-280; 28-281; 27-282; 22-277; 715-969; 956-1176; 1925-2179; 605-859; 28-284; 722-978; 1922-2177; 1923-2182; 38-293; 32-287; 93-343; 646-904; 27-285; 21-278; 29-286; 30-286; 29-287; 587-848; 30-290; 1942-2202; 130-389; 1561-1822; 27-286; 1364-1615; 463-725; 28-290; 234-497; 27-290; 16-278; 23-284; 22-283; 489-751; 27-288; 671-932; 28-289; 1941-2202; 22-282; 1927-2188; 1462-1727; 216-478; 1948-2209; 617-879; 21-283; 20-282; 1920-2181; 28-291; 500-765; 1900-2163; 19-285; 410-678; 1930-2199; 23-290; 408-674; 174-440; 567-833; 405-665; 23-288; 320-589; 263-531; 1934-2202; 550-814; 28-296; 174-442; 587-854; 1933-2198; 590-860; 1926-2198; 1930-2202; 1929-2202; 550-822; 1923-2198; 32-303; 578-852;
669-943; 98-371; 1929-2078; 450-726; 1923-2199; 582-857; 563-838; 1928-2185; 1924-2202; 1921-2191; 20-298; 168-447; 330-606; 525-803; 1926-2203; 1917-2199; 1922-2202; 1921-2202; 1897-2177; 23-302; 495-776; 264-544; 1925-2198; 396-680; 1919-2202; 1885-2162; 1925-2203; 642-924; 1916-2202; 212-498; 17-296; 1917-2202; 1903-2188; 1918-2202; 1924-2203; 1915- 2203; 530-820; 1920-2203; 615-903; 1919-2203; 1915-2202; 17-304; 333-620; 610-898; 1906- 2202; 32-324; 179-471; 1911-2202; 320-612; 35-325; 1914-2203; 462-753; 23-313; 1912-2203; 645-942; 1905-2203; 1907-2203; 27-323; 17-312; 610-904; 17-311; 112-406; 597-891; 590-883; 598-898; 399-700; 1903-2202; 1899-2199; 394-693; 31-328; 473-783; 1880-2188; 1901-2203; 100- 407; 401-706; 479-783; 1430-1737; 1904-2203; 29-332; 1903-2201; 716-1022; 1896-2202; 1891- 2202; 441-768; 1878-2203; 1909-2198; 112-437; 1876-2199; 1434-1757; 725-1031; 399-723; 1885-2198; 1860-2200; 1885-2202; 1863-2203; 1891-2198; 112-450; 1885-2184; 1869-2203; 1866-2202; 1503-1836; 1867-2199; 1885-2203; 1457-1788; 1850-2181; 1871-2203; 1872-2202; 1870-2199; 1849-2202; 1851-2199; 1849-2201; 1831-2182; 1853-2203; 1854-2202; 30-385; 1852- 2202; 1852-2199; 1853-2202; 1855-2202; 1503-1846; 1858-2201; 1862-2201; 358-724; 1838-
2203; 1838-2202; 1476-1838; 1841-2202; 1399-1759; 490-844; 27-385; 1846-2202; 1839-2198; 28-382; 1785-2142; 1854-2188; 1462-1844; 1822-2202; 28-410; 136-513; 1825-2203; 1824-2201; 1806-2179; 2010-2202; 1519-1890; 1790-2188; 1800-2202; 1801-2195; 1809-2203; 1810-2203; 1812-2191; 136-520; 64-453; 410-738; 1791-2202; 1789-2201; 1789-2199; 1793-2202; 1759- 2168; 1791-2200; 1791-2199; 1796-2198; 21-427;
121, cont. 1799-2203; 1798-2203; 1797-2202; 1799-2202; 1786-2188; 1485-1892; 1798-2202; 359-758;
1802-2203; 48-449; 32-433; 1680-2070; 1800-2199; 177-602; 1780-2203; 1764-2189; 1781-2202;
1785-2202; 1782-2201; 1784-2202; 1423-1840; 1786-2202; 1789-2200; 450-864; 1785-2199;
1787-2201; 1789-2202; 1775-2188; 1787-2203; 1792-2200; 571-1003; 58-498; 1761-2202; 1766- 2203; 312-751; 1762-2199; 1767-2202; 1765-2199; 1763-2198; 555-990; 1776-2202; 1769-2196; 27-501; 1-475; 28-491; 30-482; 396-857; 27-488;
471-928; 225-672; 32-475; 738-925; 1-515; 1-516; 1-505; 287-782; 1-503; 22-526; 407-909; 360- 863; 25-523; 291-783; 32-524; 406-902; 462-957; 1-494; 399-892; 498-986; 27-517; 383-871; 22- 502; 1-545; 21-561; 406-950; 7-536; 1-538; 1-537; 560-1031; 27-562; 1363-1895; 27-560; 84-607; 1-523; 1-584; 1-585; 26-610; 28-607; 26-605; 28-608; 28-606; 27-604; 270-843; 1-564; 28-602; 1- 572; 1-570; 1-569; 58-625; 27-585; 24-586; 27-586; 136-695; 565-1031; 321-877; 256-804; 31- 586; 1-553; 20-643; 27-639; 162-779; 125-742; 173-784; 1-611; 1-606; 7-607; 534-1031; 1-601; 17- 617; 1-595; 1-599; 178-775; 1-578; 42-734; 31-700; 171-844; 15-665; 27-681; 21-677; 1-658; 543- 1031; 25-658; 15-635; 39-277; 31-765; 356-591; 37-268; 795-1030; 1410-1643; 1415-1630; 27- 264; 645-882; 28-265; 1702-1939; 22-260; 445-685; 20-260; 1681-1921; 16-256; 279-854; 609- 850; 28-269; 533-775; 530-773; 25-267; 27-269; 550-786; 1517-1762; 324-568; 17-261; 354-599;
704-938; 1391-1605; 295-528; 645-878; 565-798; 1334-1551; 29-261; 1975-2203; 780-1012; 24- 255; 422-654; 243-474; 1977-2203; 27-257; 17-246; 688-917; 557-786;
1974-2202; 31-260; 28-257; 30-257; 1978-2203; 27-254; 32-259; 1975-2202; 796-1023; 1398- 1623; 1976-2202; 174-400; 17-243; 1585-1811; 27-252; 27-251; 478-702; 171-392; 27-248; 1376- 1597; 1681-1902; 1697-1918; 27-246; 31-250; 27-247; 27-245; 31-247; 229-445; 1992-2202; 21- 236; 1988-2202; 457-671; 1705-1918; 21-233; 1991-2202; 58-269; 1991-2203; 52-262; 566-767; 615-824; 40-245; 500-709; 15-223; 28-236; 187-394; 566-773; 324-498; 1363-1567; 792-999; 1976-2182; 1998-2202; 561-764; 21-224; 1717-1919; 16-217; 2002-2189; 218-419; 1974-2174; 2000-2198; 498-698; 2005-2185; 2004-2202; 2014-2194;
755-954; 20-216; 690-884; 2010-2203; 22-215; 2010-2195; 40-230; 1430-1615; 2014-2203; 321-
510; 1705-1893; 286-470; 580-767; 128-316; 2015-2201; 1705-1888; 1967-2150; 1344-1526; 560-
743; 1333-1515; 1339-1521; 300-481; 829-1011; 2021-2202; 1344-1525; 1333-1513; 1342-1521;
2024-2203; 1335-1514; 1343-1521; 2006-2182; 2023-2196; 1362-1536; 1334-1508; 1333-1507;
1430-1604; 1335-1507; 503-676; 1333-1505; 1350-1521; 27-198; 1681-1852; 1351-1521; 2034-
2203; 20-190; 1974-2141; 1336-1504;
121, cont. 1334-1501; 140-306; 2036-2202; 50-215; 2037-2202; 1339-1496; 27-191; 1358-1521; 15-179; 23-
187; 2010-2173; 1334-1494; 1361-1521; 1362-1521; 29-188; 21-179; 28-187; 2044-2202; 1363-
1521; 2045-2203; 1340-1496; 29-184; 2048-2203; 1334-1487; 1334-1486; 664-814; 1372-1521;
33-183; 2112-2203; 500-647; 1462-1608; 720-861; 2048-2188; 1334-1475; 1334-1474; 31-172;
698-837; 2042-2181; 1333-1471; 1334-1471; 1716-1852; 1386-1521; 1705-1840; 1705-1839; 321- 455; 1334-1466; 2067-2199; 1390-1521; 33-163; 21-148; 1334-1464; 1393-1521; 854-979; 259- 386; 1394-1521; 1333-1459; 1334-1460; 914-1041; 1334-1461; 2074-2199; 2077-2202; 1728- 1852; 1334-1458; 834-956; 2083-2203; 28-148; 1681-1802; 1743-1864; 2021-2142; 1334-1454; 2084-2203; 578-698; 1377-1496; 1733-1852; 2085-2203; 2083-2200; 1334-1452; 2085-2201; 1705-1822; 1681-1797; 1333-1449; 1773-1889; 1705-1820; 2083-2198; 2083-2196; 22-134; 27- 139; 1334-1446; 274-386; 1705-1817; 1771-1882; 1334-1445; 1741-1852; 16-126; 2092-2202; 1783-1894; 31-144; 1334-1442; 1432-1535; 1743-1852; 1334-1443; 24-130; 2098-2203;
1705-1811; 2010-2114; 2097-2202; 1418-1521; 1705-1809; 1705-1810; 1334-1436; 1706-1807; 1334-1434; 1422-1521; 28-116; 2104-2202; 1705-1802; 1705-1800; 1426-1521; 2123-2202; 28- 124; 2106-2203; 1334-1430; 29-116; 1334-1428; 22-109; 353-446; 22-111; 1705-1799; 1705- 1797; 22-113; 1760-1852; 1334-1424; 1344-1435; 2105-2195; 24-115; 1705-1793; 1334-1422;
2083-2172; 27-115; 1433-1521; 1766-1852; 1705-1792; 20-108; 1767-1852; 1369-1454; 1434- 1519; 1334-1420; 1436-1521; 1438-1521; 1806-1889; 1705-1788; 1705-1786; 36-115; 1441-1515; 1742-1822; 1334-1414; 28-111; 1771-1852; 1439-1521; 1760-1842; 1761-1841; 1336-1415; 1726- 1805; 1705-1785; 2125-2202; 1334-1411; 1334-1412; 31-109; 1777-1852; 1775-1852; 1344-1417; 1336-1407; 1705-1775; 1705-1774; 39-108; 1334-1403; 1783-1852; 1334-1401; 1455-1521; 1334- 1400; 1705-1770; 1334-1389; 1705-1769; 1705-1767; 1460-1521; 1825-1889; 1788-1852; 2138- 2203; 1334-1397; 1791-1852; 1334-1395; 1794-1852; 1698-1753; 1822-1882; 1461-1521; 1705- 1758; 1799-1852; 1698-1750; 1801-1852; 1334-138
122/LG:1328573.4:2002JAN18 1-461; 344-574; 447-1011; 501-707; 509-719; 631-853; 631-1205; 631-865; 631-852; 631-1162;
631-1146; 631-1344; 785-956; 847-1379; 861-1207; 905-1540; 925-1373; 1053-1376; 1077-1289;
1077-1366; 1139-1368; 1301-1611; 1348-1538; 1348-1722; 1424-1912; 1425-1542; 1457-1741; 1541-1985; 1592-2098; 1652-1933; 1657-1927; 1711-1997; 1723-2207; 1740-2170; 1742-1980; 1777-2351; 1882-2411; 2014-2454; 2016-2418; 2023-2418; 2037-2314; 2050-2182; 2058-2418;
2127-2448; 2141-2460; 2154-2418; 2159-2418; 2193-2411; 2202-2343; 2235-2363; 2257-2320; 2257-2313; 2257-2372; 2262-2320; 2303-2463; 2307-2418
123/LG:7692963.1:2002JAN18 2121-2403; 1805-2230; 1614-2183; 1770-2122; 1710-2007; 1639-2007; 1614-2006; 1524-2005; 1209-1724; 1403-1705; 1195-1469; 953-1372; 832-1173; 845-1168; 510-1006; 446-1015; 636-940; 486-930; 562-931; 407-822; 455-826; 181-776; 241-713; 494-713; 218-464; 227-464; 71-448; 63-425; 82-365; 65-352; 97-335; 67-272; 105-272; 1-264; 67-136
124/LG:7696423.1:2002JAN18 639-890; 346-855; 1-498; 1-62; 628-779; 613-761; 708-840; 664-820; 633-813
125/LG:7696234.1:2002JAN18 1-579; 1-254; 5-209; 18-247; 18-251; 290-816; 348-615; 498-943; 534-739; 562-807; 310-552; 5-233; 260-478; 327-588; 294-516; 278-843; 867-921; 679-769; 633-731; 647-843; 655-842; 33-240; 403-611; 390-620; 5-245; 388-631; 415-661; 386-651; 384-677; 374-604; 5-534; 418-808
126/LG:1388299.1:2002JAN18 1-510; 179-815; 179-704; 179-687; 179-686; 179-663; 179-814; 180-815; 272-794; 286-811; 441-1056
127/LG:978521.5:2002JAN18 1-555; 265-798; 425-750; 425-677; 425-659; 436-780; 439-827; 559-643
128/LG:7692599.9:2002JAN18 1-161; 12-293; 146-683; 157-698; 178-614; 291-528; 325-683; 426-647; 450-697; 487-662; 563-812; 592-812; 606-891; 744-1269; 786-972; 786-956; 825-1447; 858-1483; 1020-1284; 1101-1496; 1115-1482; 1166-1581; 1166-1399; 1199-1797; 1215-1314; 1271-1493; 1351-1575; 1394-1616; 1394-1641; 1422-1620; 1452-2024; 1452-1701; 1478-1789; 1512-1696; 1516-1975; 1526-1712; 1539-1978; 1543-2013; 1561-1978; 1561-2019; 1566-2013; 1570-2023; 1572-2023; 1586-2017; 1591-2016; 1595-1978; 1609-2017; 1611-2017; 1617-2017; 1620-2023; 1631-2017; 1635-2015; 1645-2016; 1664-2018; 1661-1926; 1668-1956; 1678-1980; 1688-2005; 1690-2015; 1692-2017; 1695-2014; 1698-2014; 1706-2024; 1708-2020; 1714-2016; 1715-2016; 1723-2041; 1755-1889; 1761-2020; 1823-2017; 1826-2100; 1827-2045; 1851-2017; 1893-2017; 1899-2018; 1990-2267; 2038-2487; 2090-2460; 2090-2488; 2094-2428; 2224-2411;
128, cont. 2287-2898; 2287-2481; 2287-2863; 2385-2503; 2385-2488; 2386-2521; 2389-2537; 2393-2488; 2397-2507; 2393-2516; 2393-2966; 2401-2981; 2434-2516; 2625-2972; 2631-3023; 2639-3085; 2643-2781; 2644-2895; 2645-2918; 2645-3100; 2649-3091; 2650-2887; 2657-2901; 2660-2960; 2667-2835; 2668-2832; 2668-2884; 2668-2852; 2668-2761; 2668-2794; 2668-3085; 2669-2798; 2669-2931; 2671-2937; 2678-3085; 2668-2727; 2668-2908; 2668-3242; 2668-3178; 2669-2984; 2669-2803; 2668-2840; 2668-3116; 2668-2904; 2669-2764; 2688-3085; 2692-2931; 2689-3085; 2705-3085; 2692-2948; 2694-2922; 2712-3085; 2730-2973; 2725-2963; 2727-3072; 2744-2995; 2735-3085; 2758-3085; 2795-3185; 2796-3085; 2790-3085; 2806-3076; 2806-3062; 2814-3116; 2806-3073; 2855-3085; 2884-3116; 2929-3219; 2946-3155; 2951-3203; 2959-3238; 2976-3262; 3000-3155; 3004-3256; 3042-3313; 3043-3109; 3045-3221; 3062-3348; 3076-3256; 3088-3331; 3108-3190; 3123-3405; 3119-3324; 3135-3190; 3138-3190; 3145-3421; 3156-3416; 3165-3297; 3182-3417; 3188-3430; 3210-3522; 3214-3368; 3211-3487; 3233-3531; 3248-3496; 3248-3513; 3248-3535; 3309-3585; 3302-3499; 3307-3743; 3316-3522; 3319-3584; 3329-3574; 3360-3443; 3360-3627; 3386-3961; 3377-3466; 3378-3443; 3380-4114; 3398-3675; 3434-3675; 3469-3756; 3460-3702; 3462-3756; 3495-3610; 3531-3787; 3537-3690; 3537-3726; 3538-3726; 3563-3787; 3599-3696; 3588-4149; 3556-3851; 3591-3732; 3594-4104; 3599-3665; 3599-3752; 3618-3892; 3622-3893; 3658-4113; 3658-4103; 3649-4115; 3658-4155; 3667-3919; 3740-4113; 3780-3943; 3786-4246; 3791-4077; 3786-4063; 3805-4061; 3797-4005; 3834-4199; 3826-4153; 3843-4265; 3833-4247; 3835-4251; 3851-4303; 3843-4127; 3848-4305; 3857-4067; 3872-4052; 3896-4022; 3903-4158; 3905-4128; 3908-4155; 3914-4218; 3908-4138; 3911-4253; 3911-4155; 3913-4158; 3917-3982; 3922-4118; 3930-4186; 3949-4122; 3938-4297; 3965-4168; 3965-4238; 3965-4216; 3954-4151; 3982-4174; 3983-4247; 4086-4253; 4086-4215; 4096-4215; 4109-4251; 4185-4251; 4211-4483; 4275-4584; 4324-4897; 4347-4897; 4691-4897
129/LG:1452678.13:2002JAN18 1-421; 1-408; 3-1882; 26-382; 80-647; 93-366; 94-296; 110-647; 607-1226; 787-1326; 800-1028; 814-1351; 821-1376; 839-945; 839-975; 861-1181; 1138-1638; 1232-1943; 1249-1594; 1248-1769; 1307-1885; 1308-1399; 1380-1904; 1427-2028; 1443-1888; 1464-1882; 1475-1884; 1525-1848; 1532-1882; 1549-1890; 1560-1882; 1628-1885; 1634-1882; 1641-1857; 1651-1853; 1657-1882; 1663-1857; 1715-1864; 1744-1857; 1750-1857; 1768-1859; 1768-1857; 1788-1857; 1826-1882; 1964-2188; 1965-2039; 2072-2330; 2072-2574; 2101-2629; 2109-2367; 2131-2277; 2131-2789; 2565-3143; 2650-2925; 2668-3144; 2684-3196; 2690-3196; 2699-3253; 2763-3418; 2931-3202; 2951-3301; 2951-4168; 2953-3573; 2951-3179; 3046-3293; 3217-3775; 3259-3591; 3315-3681; 3317-3544; 3359-3811; 3359-3567; 3383-3607; 3396-3659; 3443-3629; 3446-3948; 3450-3965; 3457-4089; 3462-3686; 3488-3701; 3495-3744; 3521-3742; 3525-3772; 3545-4032; 3585-4093; 3580-3844; 3625-4080; 3642-4154; 3685-4161; 3713-4080; 3741-4080; 3743-4138; 3827-4089; 3832-3983; 3847-4193; 3875-4168; 3998-4080
130/LG:332947.1:2002JAN18 1-530; 1-596; 1-622; 7-569; 6-620; 449-1112; 553-986; 572-1108; 614-986; 614-946; 614-827; 614-819
131/LG:1292520.13:2002JAN18 1-614; 12-220; 21-529; 25-186; 47-274; 47-270; 46-516; 207-651; 333-864; 403-662; 453-676; 482-897; 574-1165; 698-1186; 974-1415; 1091-1728; 1196-1705; 1196-1520; 1196-1432; 1196-1307; 1413-1764; 1547-2128; 1664-2061; 1703-2123; 1753-2286; 1754-2019; 1812-2325; 1812-2061
132/LG:7750009.1:2002JAN18 1-528
133/LG:238322.4:2002JAN18 1-605; 1-236; 236-722; 290-895; 292-552; 302-410; 390-1018; 438-1069; 454-990; 459-3865; 486- 1139; 522-705; 570-733; 729-1375; 798-1107; 819-1452; 1024-1487; 1056-1446; 1115-1552; 1317-1572; 1336-1797; 1366-1563; 1397-1875; 1423-1856; 1481-2108; 1516-2115; 1618-1816; 1628-2113; 1682-1903; 1721-2225; 1802-2334; 1860-2249; 2023-2598; 1929-2419; 2030-2613; 2034-2624; 2096-3865; 2131-2411; 2131-2388; 2131-2318; 2132-2327; 2133-2384; 2133-2373; 2134-2393; 2135-2384; 2151-2791; 2221-2548; 2224-2648; 2245-2672; 2301-2908; 2250-2618; 2234-2626; 2234-2605; 2235-2621; 2235-2603; 2235-2634; 2235-2606; 2234-2561; 2235-2528; 2235-2522; 2237-2555; 2237-2545; 2237-2543;
133, cont. 2237-2530; 2289-2583; 2237-2522; 2237-2514; 2302-2523; 2302-2495; 2238-2551; 2238-2547; 2238-2523; 2238-2522; 2238-2515; 2238-2512; 2303-2411; 2240-2515; 2260-2534; 2319-2829; 2273-2504; 2423-2696; 2467-2676; 2491-2732; 2545-2803; 2495-2749; 2653-2996; 2688-2904; 2884-3385; 3058-3348; 3057-3781; 3059-3348; 3125-3474; 3150-3362; 3156-3253; 3178-3432; 3192-3758; 3225-3510; 3227-3607; 3226-3491; 3284-3535; 3323-3959; 3335-3762; 3349-3793; 3371-3629; 3383-3761; 3383-3744; 3413-3865; 3414-3891; 3414-3665; 3432-3639; 3449-3682; 3457-3648; 3473-3807; 3493-4230; 3516-3869; 3547-3831; 3550-3679; 3561-3700; 3583-3865; 3649-3835; 3667-3858
134/LG:7694382.4:2002JAN18 1-603; 1-650
135/LG:1329198.3:2002JAN18 1-531; 1-526; 34-206; 38-731; 148-732; 339-833; 436-967; 444-967; 468-732; 701-1222; 836-1346; 846-1358; 855-1411; 859-1099; 1072-1697; 1413-1663; 1427-1925; 1490-1922; 1672-2264; 1832-2344; 1832-2204; 1832-2509; 1832-2457; 1991-2622; 2057-2619; 2070-2622; 2278-2539
136/LG:345314.33:2002JAN18 1-249; 11-249
137/LG:215030.7:2002JAN18 1-632; 3-640; 8-437; 7-624; 17-564; 24-171; 23-159; 21-454; 25-605; 25-607; 27-590; 30-620; 27-653; 30-460; 32-622; 34-590; 36-404; 34-491; 35-660; 48-541; 45-489; 50-172; 53-539; 57-603; 59-486; 62-528; 63-618; 64-584; 70-718; 68-2408; 78-648; 78-694; 82-635; 82-653; 91-493; 86-654; 88-579; 101-591; 99-712; 99-661; 99-655; 99-578; 99-583; 98-376; 99-641; 99-707; 99-522; 103-466; 123-551; 202-592; 202-390; 205-402; 224-712; 261-492; 282-784; 293-890; 303-787; 304-430; 305-552; 305-507; 319-791; 319-556; 320-853; 320-732; 320-538; 326-618; 359-582; 380-545; 409-665; 409-612; 418-665; 418-610; 423-667; 440-727; 445-962; 483-761; 488-717; 510-618; 541-976; 543-901; 558-927; 579-824; 583-1206; 618-877; 626-901; 638-763; 645-896; 665-832; 660-1084; 671-1446; 661-1149; 711-914; 713-962; 681-922; 737-929; 690-1113; 727-978; 691-1119; 691-957; 692-1048; 694-1085; 694-1171; 694-921; 694-1088; 694-1048; 695-1008; 694-986; 694-856; 695-1083; 695-1079; 695-997; 697-1151; 698-1011; 698-1012; 697-852; 698-1289; 698-1319; 699-989; 701-852;
137, cont. 701-1059; 739-1005; 700-1003; 714-1238; 717-933; 708-1210; 710-937; 758-883; 712-876; 723- 917; 723-1305; 721-990; 731-991; 732-1204; 734-1139; 736-1120; 727-1028; 737-1038; 737-939; 742-1307; 741-1015; 779-1036; 742-889; 782-1172; 826-1100; 843-1128; 850-1118; 789-1274; 803-1072; 847-1368; 816-1249; 840-1463; 841-1457; 804-1206; 806-1251; 841-939; 769-1254; 862-1400; 812-1041; 827-1315; 856-1393; 827-1229; 884-1154; 839-1261; 889-1186; 891-1105; 894-1151; 894-1090; 898-1162; 849-1332; 900-1168; 848-1246; 903-1076; 894-1484; 905-1097; 859-1349; 898-1437; 940-1205; 912-1209; 929-1227; 932-1109; 933-1233; 934-1138; 939-1319; 943-1241; 928-1183; 930-1387; 953-1071; 989-1272; 949-1518; 1000-1258; 978-1138; 1009- 1263; 966-1504; 960-1512; 973-1487; 1017-1265; 978-1518; 969-1550; 1027-1275; 951-1219; 981-1583; 957-1222; 997-1765; 964-1238; 1036-1478; 996-1150; 1005-1471; 974-1205; 1061- 1375; 1061-1328; 1016-1392; 1014-1620; 986-1206; 987-1233; 1073-1356; 994-1303; 1032-1489; 1046-1664; 965-1109; 1095-1416;
1045-1243; 1055-1607; 1077-1211; 1027-1276; 1030-1297; 1064-1290; 1123-1440; 1068-1310; 1069-1318; 1041-1338; 996-1147; 1045-1330; 1045-1327; 1000-1114; 1047-1330; 1049-1328; 1134-1294; 1057-1490; 1097-1543; 1098-1396; 1099-1332; 1152-1426; 1099-1586; 1100-1268; 1100-1360; 1074-1347; 1078-1345; 1117-1650; 1082-1289; 1109-1344; 1083-1330; 1082-1286; 1121-1417; 1112-1611; 1114-1363; 1113-1341; 1087-1341; 1125-1405; 1087-1314; 1116-1305; 1116-1268; 1126-1407; 1150-1409; 1121-1354; 1095-1329; 1180-1473; 1134-1424; 1182-1396; 1182-1318; 1099-1327; 1129-1338; 1138-1394; 1102-1287; 1134-1443; 1107-1341; 1135-1545; 1145-1643; 1109-1373; 1148-1406; 1151-1584; 1115-1339; 1145-1724; 1153-1268; 1147-1370; 1159-1393; 1122-1338; 1150-1268; 1163-1425; 1125-1317; 1137-1361; 1154-1708; 1167-1433; 1169-1541; 1172-1435; 1145-1739; 1173-1456; 1173-1399; 1174-1397; 1182-1919; 1138-1337; 1178-1399; 1140-1330; 1179-1411; 1183-1436; 1183-1518; 1196-1690; 1194-1761; 1204-1402; 1206-1494; 1240-1647; 1213-1488; 1215-1436; 1180-1269; 1209-1327
137, cont. 1210-1467; 1222-1483; 1186-1477; 1223-1494; 1224-1486; 1229-1809; 1230-1495; 1232-1484; 1233-1441; 1234-1558; 1236-1486; 1273-1590; 1242-1486; 1235-1519; 1248-1456; 1197-1436; 1254-1476; 1286-1747; 1255-1394; 1199-1432; 1260-1505; 1261-1510; 1254-1495; 1265-1527; 1269-1565; 1304-1554; 1273-1520; 1289-1582; 1295-1575; 1307-1590; 1312-1441; 1312-1400; 1302-1621; 1353-1598; 1360-1614; 1372-1639; 1375-1738; 1451-1590; 1420-1603; 1423-1679; 1425-1692; 1425-1619; 1429-1733; 1429-1713; 1429-1679; 1408-2185; 1475-1718; 1536-1799; 1542-2074; 1542-1803; 1543-1840; 1543-1798; 1531-1683; 1565-1836; 1548-1662; 1578-1841; 1579-1769; 1592-1873; 1611-1879; 1605-1851; 1631-1735; 1616-1784; 1612-1751; 1632-1865; 1651-2040; 1659-1750; 1722-2349; 1759-2272; 1754-2310; 1768-1870; 1813-2381; 1752-2032; 1860-2368; 1864-2093; 1867-2388; 1869-2385; 1901-2137; 1969-2129; 1897-2127; 1992-2563; 1973-2214; 2061-2147; 2079-2133; 2056-2413; 2145-2252; 1-514; 1-354; 1-258
138/LG:383884.26:2002JAN18 951-1553; 1206-1552; 1271-1552; 1112-1550; 1101-1548; 1058-1547; 1068-1547; 1085-1547; 1087-1547; 1094-1547; 1103-1547; 1262-1547; 1163-1547; 1238-1547; 1280-1547; 1281-1547; 1301-1547; 1302-1547; 1072-1542; 1302-1542; 1302-1545; 1319-1540; 1261-1541; 1263-1541; 1076-1538; 931-1532; 976-1523; 912-1523; 1064-1518; 1093-1510; 1104-1510; 1095-1510; 1105-1510; 1107-1510; 979-1509; 1240-1510; 1236-1510; 1257-1510; 1282-1510; 1302-1510; 1301-1510; 1316-1510; 1320-1510; 988-1508; 1076-1508; 1280-1510; 1029-1506; 998-1505; 1283-1500; 1302-1476; 912-1473; 772-1474; 796-1468; 903-1468; 999-1466; 912-1463; 1302-1420; 1319-1385; 1079-1379; 917-1348; 925-1194; 1064-1157; 918-1150; 912-1122; 364-1104; 364-1071; 364-956; 364-905; 364-883; 1-669
139/LG:413518.62:2002JAN18 1-582; 59-904; 96-711; 109-803; 108-323; 205-367; 356-900; 438-828; 623-1171; 634-1171; 635-
885; 707-892; 878-1153; 899-1464; 981-1468; 1115-1641; 1458-2028; 1541-2028
140/LG:903138.45:2002JAN18 1-538; 253-516; 261-401; 324-612; 375-784; 429-745; 447-687; 448-639; 450-784; 450-591; 466- 1012; 513-784; 544-784; 669-748; 800-1070; 858-1442; 857-1222; 1003-1612; 1053-1605; 1089- 1539; 1539-1962; 1539-2088; 1805-2317; 1805-2218; 1867-2484; 1911-2297; 1982-2260; 1989- 2422; 1989-2168; 1991-2253; 2049-2276; 2157-2399; 2189-2506; 2236-2470; 2241-2896; 2262- 2490; 2262-2391; 2296-2656; 2297-2598; 2297-2589; 2297-2592; 2305-2873; 2330-2629; 2330-
2405; 2346-2597; 2365-2550; 2437-2670; 1065-1746
141/LG:1377804.32:2002JAN18 1-505; 1-566; 3-294; 84-356; 89-401; 105-631; 142-553; 142-405; 267-540; 299-758; 299-872; 299-918; 413-865; 433-672; 585-1062; 613-1296; 673-1067; 696-894; 833-1079; 839-1382; 845-1087; 868-1491; 927-1346; 927-1107; 941-1509; 974-1505; 1047-1232; 1054-1592; 1085-1306; 1090-1351; 1098-1591; 1170-1522; 1177-1501; 1184-1596; 1184-1481; 1204-1502; 1213-1462; 1208-1543; 1209-1592; 1209-1466; 1210-1577; 1211-1439; 1217-1480; 1226-1478; 1245-1477; 1264-1519; 1305-1555; 1407-1810; 1618-1826
142/LG:1390822.13:2002JAN18 1-625; 281-881; 302-547; 340-860; 343-574; 425-1041; 446-987; 526-780; 555-941; 792-973; 925-1223; 934-1176; 935-1237; 1022-1519; 1029-1579; 1029-1637; 1037-1150; 1047-1328; 1052-1312; 1057-1556; 1112-1219; 1150-1712; 1211-1574; 1211-1452; 1243-1535; 1252-1741; 1252-1454; 1266-1806; 1265-1769; 1265-1591; 1273-1688; 1426-1943; 1461-1707; 1524-2102; 1589-2144; 1606-2203; 1628-1796; 1676-2147; 1676-1904; 1681-2290; 1681-2208; 1679-1945; 1709-1954; 1747-2311; 1779-1895; 1800-2316; 1920-2398; 1923-2196; 1924-2202; 1994-2286; 2004-2182; 2066-2342; 2086-2348; 2206-2479; 2210-2788; 2243-2415; 2250-2524; 2270-2510; 2298-2542; 2333-2570; 2336-2431; 2491-3008; 2508-2777; 2704-3144; 2780-3363; 2779-3321; 2805-3368; 2867-3131; 2872-3123; 2875-3369; 2882-3131;
2917-3196; 2955-3176; 3015-3471; 3018-3182; 3012-3402; 3018-3307; 3022-3320; 3024-3538; 3024-3222; 3027-3548; 3028-3241; 3031-3321; 3033-3423; 3044-3269; 3051-3591; 3048-3283; 3051-3584; 3051-3286; 3051-3624; 3053-3288; 3053-3278; 3055-3245; 3095-3339; 3103-3185; 3155-3760; 3168-3287; 3170-3419; 3174-3471; 3174-3438; 3182-3254; 3190-3632; 3206-3447; 3206-3750; 3211-3434; 3213-3672; 3233-3504; 3249-3567; 3255-3567; 3283-3543; 3287-3850; 3289-3749; 3291-3774; 3307-3543;
142, cont. 3317-3497; 3337-3573; 3353-3600; 3374-3838; 3387-3625; 3395-3834; 3399-3680; 3402-3649; 3403-3564; 3411-3736; 3407-3616; 3412-3966; 3424-3815; 3432-3864; 3433-3680; 3440-3736; 3486-3855; 3500-3680; 3512-3777; 3532-3997; 3537-4062; 3554-3687; 3578-3920; 3581-3881; 3584-3878; 3583-3824; 3601-3914; 3601-3904; 3634-3896; 3641-3868; 3644-3915; 3647-3888; 3662-4024; 3662-3937; 3662-3900; 3667-3943; 3681-3924; 3693-4241; 3695-3908; 3696-3892; 3698-3963; 3697-3950; 3699-3966; 3699-3897; 3700-3943; 3700-3940; 3699-3911; 3701-3924; 3701-3813; 3702-3954; 3704-3994; 3704-3963; 3704-3787; 3705-3947; 3709-3997; 3710-4232; 3710-3924; 3712-4118; 3713-3972; 3713-3956; 3714-4260; 3714-3814; 3716-3909; 3721-4115; 3719-3994; 3733-4026; 3741-4015; 3741-3939; 3741-3890; 3750-3994; 3769-4164; 3773-4252; 3776-3836; 3780-4041; 3814-4088; 381 4040; 3829-4281; 3835-4046; 3844-4056; 3845-4128; 3854-4385; 3862-4419; 3861-4154; 3861-4103; 3864-4412; 3864-4118; 3893-4272; 3905-4327; 3913-4214; 3934-4381; 3939-4200; 3943-4057; 3959-4117; 3961-4310; 3968-4523; 3990-4254; 4000-4249; 4002-4242; 4002-4214; 4005-4269; 4005-4230; 4010-4125; 4015-4545; 4014-4302; 4015-4221; 4010-4421; 4034-4282; 4046-4577; 4057-4294; 4058-4309; 4052-4550; 4089-4721; 4096-4650; 4099-4336; 4099-4360; 4103-4582; 4115-4582; 4126-4721; 4146-4424; 4162-4615; 4167-4667; 4178-4385; 4175-4747; 4177-4451; 4176-4441; 4178-4441; 4181-4582; 4182-4435; 4187-4721; 4190-4721; 4200-4480; 4208-4489; 4213-4747; 4213-4429; 4216-4535; 4228-4481; 4230-4582; 4231-4315; 4234-4504; 4234-4500; 4234-4647; 4246-4582; 4249-4414; 4257-4408; 4258-4518; 4260-4559; 4262-4537; 4270-4518; 4272-4503; 4275-4565; 4278-4563; 4279-4513; 4281-4538; 4286-4762; 4291-4515; 4293-4698; 4311-4563; 4321-4700; 4324-4432; 4324-4579; 4326-4924; 4326-4529; 4328-4563; 4330-4924; 4337-4912; 4351-4563; 4356-4585; 4357-4561; 4366-4776; 4362-4579; 4369-4579; 4372-4579; 4374-4579; 4388-4579; 4395-4579; 4402-4985; 4407-4648; 4410-4677; 4410-4673; 4428-4698; 4448-4716; 4457-4695; 4467-4719; 4479-4743; 4485-4754; 4503-5049; 4518-4579;
142, cont. 4519-4772; 4527-5084; 4534-4902; 4548-5151; 4587-4832; 4587-4821; 4589-4823; 4597-4819; 4597-4800; 4597-4795; 4597-4790; 4597-4749; 4603-4854; 4606-4747; 4615-4900; 4615-4804; 4616-4872; 4617-4698; 4629-4899; 4632-4952; 4639-4890; 4672-4944; 4672-4881; 4674-4944; 4674-4858; 4683-4824; 4689-4939; 4689-4935; 4701-4987; 4703-5061; 4704-4949; 4708-4976; 4710-4914; 4714-5244; 4714-5047; 4711-4992; 4714-4909; 4714-4971; 4714-4960; 4714-4776; 4714-4774; 4717-4979; 4728-4868; 4734-4833; 4740-5035; 4741-5020; 4747-5030; 4758-4983; 4766-5297; 4774-5173; 4780-5287; 4780-5003; 4784-5265; 4790-5264; 4798-5016; 4800-5065; 4803-5264; 4803-5078; 4808-5259; 4816-5301; 4818-4938; 4821-5258; 4823-5028; 4826-5358; 4826-5248; 4837-5325; 4844-5126; 4845-5265; 4845-5002; 4847-5115; 4848-5065; 4851-5266; 4851-5100; 4855-5107; 4861-5160; 4861-5125; 4861-5100; 4862-5112; 4862-5108; 4865-5094; 4870-5069; 4870-5182; 4872-5119; 4878-4999; 4879-5181; 4880-5131; 4880-5128; 4884-5057; 4888-5143; 4890-5123; 4891-5085; 4899-5127; 4903-5138;
4903-5157; 4905-5285; 4906-5228; 4909-5025; 4914-5187; 4914-5180; 4914-5150; 4918-5172; 4922-5145; 4924-5150; 4925-5174; 4925-5156; 4933-5195; 4936-5424; 4938-5205; 4942-5264; 4944-5126; 4965-5349; 4966-5216; 4966-5173; 4967-5234; 4968-5233; 4969-5094; 4972-5225; 4973-5225; 4974-5241; 4974-5219; 4984-5240; 4990-5224; 5016-5532; 4998-5244; 5000-5265, 5002-5230; 5002-5232; 5002-5208; 5027-5113; 5007-5224; 5007-5217; 5026-5659; 5048-5598 5031-5304; 5051-5756; 5050-5564; 5034-5260; 5036-5442; 5062-5376; 5069-5558; 5051-5173, 5053-5324; 5073-5349; 5077-5641; 5070-5288; 5074-5392; 5074-5364; 5074-5328; 5073-5259; 5076-5180; 5076-5259; 5098-5651; 5080-5368; 5081-5259; 5082-5259; 5141-5711; 5090-5259, 5091-5255; 5098-5320; 5098-5259; 5098-5363; 5098-5151; 5126-5395; 5111-5398; 5111-5389; 5135-5711; 5118-5212; 5119-5360; 5142-5778; 5124-5259; 5130-5397; 5135-5380; 5156-5711 5155-5608; 5136-5259; 5164-5774; 5139-5397; 5139-5362; 5165-5467; 5147-5412; 5150-5472; 5178-5385; 5186-5713; 5188-5459; 5168-5457; 5178-5426; 5178-5412;
142, cont. 5198-5473, 5198-5382; 5202-5699; 5179-5419; 5200-5502; 5205-5514; 5206-5503; 5187-5451
5194-5445, 5198-5475. 5198-5461; 5198-5429, 5200-5461; 5247-5711; 5204-5450, 5204-5447
5225-5673 5205-5436; 5206-5258; 5208-5409, 5230-5711; 5237-5714; 5237-5711 5243-5712,
5244-5714 5246-5675, 5247-5677; 5230-5299, 5252-5715; 5258-5713; 5258-5711 5262-5711
5263-5543, 5268-5565; 5270-5549; 5272-5714; 5273-5534; 5279-5539; 5279-5518, 5281-5713,
5281-5541; 5285-5483, 5286-5534; 5286-5518, 5286-5516; 5286-5515; 5286-5512, 5286-5493.
5267-5481, 5290-5713 5294-5528; 5300-5713 5297-5622; 5301-5713; 5302-5712 5302-5547,
5304-5503, 5286-5564; 5305-5711; 5305-5697 5286-5458; 5286-5407; 5286-5416; 5306-5713,
5308-5414, 5311-5582 5293-5387; 5315-5716, 5315-5506; 5316-5710; 5321-5709; 5325-5717
5324-5711; 5325-5709; 53+B19529-5717; 5329-5710; 5330-5713; 5330-5712; 5333-5708; 5333-5627; 5346-5575; 5327-5708; 5347-5711; 5349-5579; 5358-5712; 5361-5717; 5360-5708; 5345-5468; 5367-5575; 5370-5711; 5380-5716; 5384-5603; 5387-5603;
5388-5663; 5388-5712; 5389-5717; 5390-5713; 5390-5696; 5391-5713; 5392-5713; 5392-5711; 5393-5714; 5393-5713; 5394-5713; 5396-5710; 5395-5597; 5396-5713; 5400-5712; 5403-5715; 5404-5694; 5405-5707; 5405-5710; 5407-5711; 5408-5711; 5410-5705; 5414-5714; 5416-5670; 5415-5711; 5421-5674; 5421-5710; 5425-5713; 5429-5713; 5435-5658; 5434-5711; 5434-5713; 5435-5709; 5436-5709; 5436-5710; 5437-5712; 5438-5710; 5440-5710; 5441-5715; 5447-5711; 5450-5658; 5454-5706; 5452-5713; 5461-5665; 5461-5713; 5464-5709; 5465-5696; 5466-5621; 5467-5719; 5467-5649; 5469-5709; 5478-5711; 5477-5682; 5479-5711; 5480-5696; 5486-5710; 5491-5711; 5497-5581; 5498-5636; 5511-5709; 5527-5690; 5530-5708; 5533-5713; 5535-5718; 5535-5713; 5536-5713; 5541-5768; 5543-5712; 5547-5711; 5533-5714; 5552-5711; 5535-5711; 5553-5710; 5553-5705; 5552-5672; 5554-5711; 5557-5687; 5576-5709; 5592-5711; 5613-5711; 5623-5716; 5625-5711; 5627-5706; 5636-5715; 5643-5711; 5650-5709; 5660-5713; 4864-5112; 5158-5382; 4597-4735; 5015-5264; 4677-4929; 4621-4798;
142, cont. 4972-5228; 4902-5156; 4465-4557; 5286-5430; 4970-5231; 4645-4906; 5219-5480; 4751-5017; 4741-5009; 4996-5264; 4689-4957; 4597-4757; 4842-5114; 4658-4796; 4939-5210; 4597-4778;
4615-4890; 4614-4888; 5122-5259; 4863-5130; 4844-4968; 4741-5021; 4714-4994; 4963-5246; 4862-5147; 4611-4802; 4408-4557; 4848-5144; 4851-5017; 4851-5104; 4668-4966; 4851-5074; 4928-5233; 4929-5241; 4862-5171; 4852-5192; 4787-5109; 4862-5206; 4913-5264; 4911-5260; 4914-5258; 4714-4947; 4922-5261; 4907-5264; 4911-5265; 4886-5265; 4884-5260; 4889-5264; 4890-5265; 2242-2640; 4875-5264; 4875-5263; 5291-5712; 5286-5698; 5276-5710; 5270-5712; 5279-5718; 5273-5712; 5276-5712; 5276-5717; 85-647; 5025-5255; 4970-5210; 4639-4798; 4885-5129; 4851-5083; 4908-5138; 5029-5258; 4597-4719; 5040-5264; 4864-5085; 5047-5265; 4844-5059; 4714-4927; 5052-5259; 4850-5056; 4907-5104; 4848-5044; 5155-5341; 4872-5051; 5178-5349; 4407-4557; 4851-4954; 5151-5312; 5286-5443; 5173-5321; 5210-5341; 5276-5367; 4903-5020; 4904-5020; 4965-5063; 5285-5365
143/LG:7698830.22:2002JAN18 1-760; 513-1114; 943-1240; 974-1274; 983-1205; 983-1232; 1065-1563; 1065-1421; 1065-1288; 1065-1206; 1065-1199; 1065-1169; 1065-1139; 1075-1292; 1076-1215; 1079-1304; 1080-1204; 1102-1192; 1249-1408; 1462-2162; 1529-1983; 1531-2045; 1560-1805; 1574-1786; 1575-1858;
1576-1883; 1582-2039; 1584-1815; 1591-1835; 1592-1990; 1593-1837; 1600-1844; 1601-1823; 1602-1847; 1601-1831; 1602-1804; 1604-1821; 1605-1878; 1605-1822; 1605-1848; 1606-1845; 1607-1895; 1609-1879; 1611-1865; 1612-1889; 1615-1864; 1616-1880; 1616-1835; 1622-1838; 1623-1926; 1623-1886; 1624-1843;
1625-2075; 1627-1908; 1628-1867; 1631-1847; 1636-2094; 1639-2082; 1641-2094; 1645-1930;
1645-1907; 1645-1857; 1650-1992; 1651-1884; 1652-1882; 1652-1886; 1653-1798; 1654-1925;
1654-1909; 1654-1907; 1656-1939; 1661-2090; 1665-2090; 1664-1992; 1667-2088; 1666-1904;
1668-2091; 1669-2091; 1670-2091; 1680-2132; 1669-1870; 1671-2094; 1673-2093; 1673-2092;
1673-2090; 1674-2093; 1674-2090; 1675-2090; 1676-2095; 1677-2093; 1677-2090; 1676-1947;
1676-1929; 1678-2091; 1678-2089; 1679-2091; 1679-1952; 1678-1943; 1678-1911; 1680-1956;
1680-2092; 1680-1929; 1682-2090; 1682-2091; 1682-2088; 1683-2090; 1683-2089; 1683-2091;
1684-2089; 1684-2088; 1684-2091; 1684-1864; 1690-1955; 1687-2092; 1688-2091; 1688-2090;
1688-1957; 1688-1945; 1688-1922; 1692-2094;
143, cont. 1694-2094; 1695-2092; 1695-2089; 1696-2089; 1700-2095; 1700-1939; 1700-2090; 1701-2093; 1701-2013; 1700-1925; 1700-1907; 1702-2098; 1702-2090; 1704-2090; 1705-2102; 1705-1933; 1705-1929; 1704-1922; 1706-1942; 1709-1987; 1709-2050; 1711-2090; 1711-2088; 1722-1930; 1724-2004; 1725-1990; 1728-1990; 1730-2067; 1735-2018; 1740-2090; 1741-2090; 1745-2090; 1742-2067; 1742-1978; 1750-2090; 1746-1885; 1746-1986; 1748-2088; 1748-1958; 1749-2091; 1749-2089; 1749-2005; 1750-2092; 1751-2089; 1752-1880; 1754-2090; 1760-2009; 1762-2066; 1764-2004; 1766-1995; 1770-1919; 1772-2104; 1783-2094; 1789-2090; 1788-2090; 1796-2094; 1796-2071; 1810-2053; 1817-2092; 1823-2086; 1829-2090; 1839-2090; 1840-2089; 1841-1957; 1845-2094; 1853-2089; 1854-2090; 1855-2066; 1858-2092; 1864-2089; 1870-2072; 1870-2070; 1871-2088; 1873-2085; 1872-2090; 1869-1988; 1874-2090; 1876-2090; 1876-2082; 1879-2093; 1886-2090; 1888-2088; 1889-2090; 1896-2090; 1914-2093; 1934-2095; 1968-2090; 1974-2091; 1994-2089; 69-315; 22-265; 29-277; 1683-1931; 34-282; 22-267; 24-271;
1843-2090; 1848-2092; 48-295; 54-304; 56-306; 22-269; 182-431; 1846-2092; 1223-1310; 29-280; 67-316; 33-284; 1841-2090; 1838-2088; 22-270; 1903-2092; 45-298; 22-272; 34-289; 22-274; 22-271; 1672-1929; 29-283; 1833-2089; 1201-1310; 1808-2064; 1835-2091; 46-304; 27-286; 22-277;
24-282; 22-276; 22-273; 23-284; 29-289; 26-285; 24-283; 47-310; 22-281; 22-282; 1831-2092; 43-305; 1829-2091; 33-294; 23-279; 28-289; 1811-2071; 86-349; 29-292; 22-284; 41-305; 35-302; 22-285; 22-290; 22-287; 23-290; 43-310; 27-297; 22-289; 1825-2088; 211-480; 28-297; 1241-1310; 145-414; 1804-2069; 1714-1988; 22-293; 22-297; 30-306; 1814-2088; 30-308; 29-305; 1205-1310; 29-304; 42-319; 30-311; 44-324; 22-299; 1207-1310; 34-313; 28-307; 1807-2090; 1809-2092; 22-298; 115-401; 155-440; 1805-2090; 1225-1310; 22-304; 22-301; 22-306; 1821-2086; 1805-2089; 1807-2091; 30-319; 33-321; 1805-2092; 1804-2090; 1809-2090; 22-305; 1775-2067; 22-311; 119-410; 28-318; 22-317; 29-325; 48-348; 1797-2092; 26-333; 1783-2090; 37-343; 51-354; 41-344; 1787-2093; 54-370; 88-403; 1778-2091; 22-335; 1801-2088; 30-298;
143, cont. 33-345; 1782-2092; 1785-2092; 1780-2090; 54-363; 1203-1310; 1758-2090; 1745-2092; 1746-2092; 1707-2059; 1747-2096; 1742-2090; 1747-2094; 1744-2091; 1745-2089; 1747-2091; 1747-2090; 1753-2092; 1748-2089; 1723-2090; 1724-2090; 1742-2092; 1740-2092; 1713-2095; 1708-2090; 1710-2089; 1711-2089; 1720-2092; 1713-2090; 1697-2091; 1698-2088; 1704-2089; 1681-2090; 1686-2093; 1685-2090; 1668-2088; 1779-2088; 1672-2092; 1683-2092; 1256-1310; 1257-1310; 76-528; 30-490; 1733-1968; 1852-2088; 1684-1920; 29-265; 1754-1990; 1851-2088; 46-283; 22-262; 69-308; 1852-2090; 31-269; 35-275; 67-307; 1733-1974; 30-271; 29-264; 22-263; 1848-2090; 48-290; 29-272; 1851-2092; 1847-2090; 34-277; 25-268; 1849-2089; 1847-2091; 29-276; 32-275; 1850-2092; 29-263; 1862-2092; 32-265; 30-263; 29-262; 1698-1931; 59-290; 1864-2090; 45-275; 61-289; 43-271; 1853-2062; 1870-2092; 1742-1968; 1865-2089; 40-264; 1867-2090; 1717-1934; 170-391; 42-263; 1869-2090; 25-246; 1870-2090; 41-262; 1789-2008; 1710-1928; 1870-2089; 1871-2089; 1784-2002; 1806-2024; 1870-2087; 1717-1933; 1875-2090;
1733-1949; 1872-2087; 1742-1957; 1804-2019; 1883-2092; 1878-2090; 1884-2092; 1878-2087; 55-264; 1857-2066; 1883-2090; 55-261; 29-234; 1887-2091; 1888-2090; 1881-2083; 1897-2092; 1900-2090; 1850-2049; 55-253; 1870-2069; 1868-2066; 1893-2090; 1894-2090; 1206-1310; 43-231; 1068-1260; 1744-1936; 1799-1990; 1899-2089; 1902-2090; 158-344; 79-313; 1887-2093; 1885-2067; 81-261; 1922-2096; 1871-2047; 1919-2089; 1923-2092; 1784-1952; 316-480; 32-195; 350-488; 1905-2066; 1934-2094; 1936-2090; 1742-1886; 1940-2089; 366-511; 1952-2096; 63-206; 1768-1911; 1881-2022; 1948-2089; 1951-2086; 1749-1884; 1959-2089; 65-185; 160-279; 1099-1213; 1065-1176; 241-351; 1957-2067; 1997-2090; 2004-2091; 1993-2095; 2000-2092; 1996-2088; 2001-2083; 2004-2090; 879-967; 2006-2090; 2018-2088; 38-195; 52-218; 45-218; 23-201; 22-199; 22-198; 69-247; 47-229; 32-219; 31-224; 23-216; 28-222; 35-227; 22-217; 28-223; 32-227; 34-236; 22-223; 32-207; 34-241; 22-232; 45-259; 22-236; 34-251; 22-237; 22-244; 29-254; 23-249; 22-245; 29-256; 34-260; 29-260; 23-253; 24-256; 23-255; 22-253; 24-260; 22-260; 22-227
144/LG:7762105.20:2002JAN18 1-580; 3-668; 3-511; 4-149; 5-88; 25-146; 43-162; 252-915; 292-880; 551-864; 621-1279; 681-1214; 1008-1382; 1008-1162; 1009-1205; 1056-1466; 1056-1301; 1134-1586; 1226-1749; 1598-2172; 1616-2188; 1791-2213; 1859-2326; 1866-2330; 1869-2154; 1972-2254; 2084-2330; 2085-2575; 2219-2505; 2386-2647; 2544-2850; 2568-3080; 2568-3125; 2900-3132; 3060-3817; 3165-3567; 3299-3977; 3324-3860; 3345-3960; 3353-3965; 3355-3704; 3372-3614; 3388-3959; 3392-4087; 3392-4002; 3401-4111; 3408-3968; 3394-3620; 3460-3733; 3498-3848; 3498-3693; 3498-3704; 3502-4091; 3514-4080; 3516-3768; 3527-3821; 3532-3821; 3538-4163; 3543-3838; 3548-3821; 3567-3781; 3567-3777; 3570-3866; 3570-3843; 3570-3839; 3570-3831; 3576-3844; 3583-3704; 3594-3865; 3594-3845; 3594-3822; 3598-3689; 3601-3687; 3604-3704; 3607-3827; 3615-3850; 3617-3796; 3630-3704; 3636-4133; 3639-3941;
3643-3884; 3651-3842; 3673-4177; 3720-3919; 3723-3986; 3724-4093; 3724-3904; 3724-4167; 3730-3985; 3740-3990; 3749-4195; 3755-4197; 3754-4168; 3756-4192; 3759-4063; 3761-4192; 3761-4193; 3761-4058; 3767-4192; 3766-4192; 3769-4193; 3773-4190; 3777-4198; 3793-4197; 3796-4193; 3805-4194; 3828-4125; 3837-4112; 3879-4192; 3877-4192; 3892-4192; 3915-4099; 3915-4102; 3927-4183; 3986-4192; 4033-4192; 4065-4171; 9-260; 3838-4097; 31-294; 3924-4192; 2495-2763; 1232-1406; 4-282; 3806-4084; 3906-4192; 1232-1417; 106-393; 61-352; 3-294; 3802-4099; 3894-4192; 16-318; 3876-4192; 3881-4192; 3884-4192; 3865-4192; 3866-4192; 3871-4192; 3895-4189; 3873-4192; 3873-4190; 3855-4192; 3811-4194; 3807-4192; 3768-4192; 2572-2991; 3770-4191; 3776-4192; 3780-4192; 3751-4043; 3750-4195; 3744-4192; 3761-4194; 3759-4191; 3764-4193; 3771-4190; 1951-2543; 11-246; 3856-4091; 3-238; 19-258; 1307-1437; 1-237; 8-239; 5-235; 9-237; 3866-4089; 3970-4192; 41-219; 10-214; 3808-4017; 3925-4133; 155-362; 3872-4075; 3988-4186; 4008-4192; 3-186; 23-205; 3917-4077; 4071-4190; 3719-3791
145/LG:1382907.104:2002JAN18 1-449; 1-699; 260-532; 273-525; 295-438; 343-622; 396-626; 448-1095; 431-483; 600-1154; 620-1145; 635-1184; 640-1235; 640-1274; 640-1271; 641-1262; 641-1238; 663-1252; 663-1209; 675-1078; 680-954; 685-1226; 685-1231; 686-1250; 687-1243; 697-921; 714-1435; 718-1250; 726-1258; 757-979; 781-1231; 783-1075; 787-1242; 792-1241; 800-1335; 815-1223; 819-1242; 827-1230; 856-1248; 864-1133; 864-989; 898-1249; 902-1171; 909-1177; 916-1175; 920-1153; 943-1005; 958-1177; 962-1250; 1006-1274; 1108-1193; 1130-1366; 1183-1716; 1196-1805; 1203-1819; 1226-1392; 1262-1806; 1269-1815; 1269-1416; 1269-1374; 1261-1836; 1261-1812; 1274-1365; 1264-1837; 1269-1945; 1269-1807; 1269-1958; 1257-1391; 1262-1379; 1277-1940; 1290-1361; 1297-1838; 1287-1963; 1300-1977; 1293-1841; 1301-1959; 1322-1903; 1323-1940; 1326-1982; 1342-1975; 1380-1969; 1395-1955; 1416-2018; 1464-1705; 1484-1947;
145, cont. 1491-1735; 1506-2037; 1533-1976; 1583-1957; 1579-1866; 1578-2055; 1585-1805; 1600-1837;
1606-1875; 1607-1778; 1610-1928; 1597-2057; 1613-1824; 1598-2055; 1599-2048; 1600-1893;
1600-1878; 1617-1837; 1617-1834; 1617-1833; 1617-1823; 1617-1824; 1617-1819; 1617-1910;
1617-1771; 1617-1765; 1617-1756; 1617-1763; 1617-1810; 1617-1716; 1604-1841; 1605-1781;
1606-2057; 1606-2020; 1606-1943; 1607-2055; 1608-2057; 1608-2055; 1608-1980; 1611-2012;
1611-1965; 1612-1984; 1613-1867; 1615-1670; 1614-2058; 1615-2053; 1615-2055; 1615-2032;
1615-2018; 1617-2047; 1615-2017; 1615-2054; 1615-2005; 1615-1956; 1615-1984; 1617-1902;
1615-1852; 1615-1888; 1617-1964; 1617-1944; 161 •1885; 1616-1895; 1617-2063; 1617-2054;
1617-2055; 1617-2058; 1617-2036; 1617-2045; 1617-2018; 1617-2017; 1617-2014; 1617-2012;
1617-1990; 1617-1991; 1617-1975; 1617-1958; 1617-1953; 1617-1952; 1617-1942; 1617-1873;
1617-1862; 1617-1856; 1617-1851; 1617-1847; 1617-1843; 1617-1807; 1617-1804; 1617-1800;
1617-1798; 1617-1797; 1617-1794; 1617-1791; 161 •1790;
1617-1788; 1617-1792; 1617-1787; 1617-1785; 1617-1736; 1617-1735; 1617-1737; 1617-1731;
1617-1733; 1617-1722; 1617-2066; 1617-2051; 1617-2007; 1620-2050; 1636-1856; 1623-2050;
1623-1881; 1626-2063; 1625-2049; 1627-2054; 1627-1813; 1628-2050; 1628-1874; 1632-2058;
1632-2059; 1631-2050; 1635-2057; 1635-1879; 1635-1845; 1636-1885; 1636-2055; 1636-1879;
1633-2055; 1639-1904; 1639-1861; 1641-1927; 1643-2059; 1643-2057; 1658-1927; 1658-1924;
1658-1913; 1659-1937; 1645-1901; 1648-1961; 1661-1844; 1661-1820; 1648-2059; 1663-1838;
1649-2055; 1664-1853; 1650-1865; 1651-2057; 1652-2055; 1651-1903; 1652-1902; 1654-2048;
1656-2056; 1656-1899; 1656-1870; 1656-1869; 1656-1863; 1654-2083; 1659-2055; 1661-2050;
1661-1972; 1661-2048; 1662-2057; 1661-1931; 1661-1889; 1661-1885; 1661-1860; 1662-1969;
1662-1923; 1664-1955; 1664-1949; 1666-2055; 1666-1977; 1682-2057; 1669-1914; 1672-1938;
1677-2015; 1674-1916; 1674-1890; 1675-2028; 1676-2057;
145, cont. 1677-1946; 1678-1948; 1693-1921; 1679-1890; 1681-2056; 1684-1744; 1696-2020; 1683-2050; 1684-2057; 1685-1918; 1686-2054; 1687-2055; 1687-2028; 1691-1978; 1691-1962; 1692-1924; 1692-1894; 1693-1933; 1708-2048; 1694-1952; 1696-2055; 1696-1984; 1696-2007; 1697-1952; 1697-1929; 1702-1951; 1699-2061; 1700-2055; 1701-1983; 1703-2007; 1705-2001; 1708-2055; 1707-2048; 1708-1904; 1708-1891; 1709-1998; 1709-1944; 1714-1990; 1714-2048; 1715-1934; 1716-2055; 1716-1984; 1716-1973; 1717-1971; 1718-2058; 1719-1998; 1720-2022; 1721-2002; 1721-1969; 1721-1918; 1722-2021; 1723-1994; 1725-2060; 1725-2018; 1725-2005; 1725-1956; 1725-1970; 1726-2055; 1728-2059; 1731-2002; 1732-2012; 1746-1966; 1732-2008; 1733-2013; 1733-2007; 1735-2016; 1734-1962; 1734-2055; 1735-2014; 1736-2003; 1736-2071; 1737-2058; 1737-2057; 1737-2050; 1734-2048; 1741-2047; 1742-2060; 1745-1947; 1743-1969; 1744-2053; 1747-1968; 1748-2048; 1748-1988; 1747-2054; 1749-2006; 1755-2078; 1751-2007; 1751-2055; 1752-2001; 1752-1957; 1753-2055; 1754-2017; 1756-2036; 1756-2058;
1759-2063; 1759-2050; 1760-2054; 1760-2056; 1761-1990; 1767-2057; 1763-2039; 1769-2059; 1770-2006; 1775-2055; 1777-2017; 1778-2025; 1779-2055; 1780-1997; 1787-2049; 1787-2038; 1788-1987; 1789-2055; 1795-2048; 1796-2055; 1796-2049; 1804-2058; 1808-2056; 1809-2057; 1810-2023; 1810-2016; 1814-2055; 1815-2058; 1815-2054; 1816-2057; 1819-2067; 1821-2063; 1822-2059; 1823-2031; 1823-2055; 1827-2057; 1827-2054; 1828-2055; 1829-2049; 1831-2057; 1834-2048; 1844-2064; 1845-2055; 1856-2055; 1856-2021; 1860-2055; 1864-2057; 1882-2079; 1883-2057; 1888-2076; 1893-2048; 1894-2016; 1898-2057; 1932-2055; 1932-2050; 1972-2057;
1982-2057; 1983-2055; 1992-2056; 1999-2055; 2000-2057; 1811-2057; 749-1002; 992-1249; 882-1147; 1109-1238; 1772-2056; 1599-1908; 1698-2056; 761-966; 855-1067; 1026-1238; 1269-1391; 1157-1222; 393-593; 1056-1249; 1874-2056; 1872-2055; 1867-2048; 1613-1788; 1065-1234; 339-508; 1911-2063; 757-908; 1599-1731; 1909-2055; 1919-2056; 1299-1391; 1922-2055; 1962-2055; 1144-1235; 1599-1662; 1615-1684; 1994-2055
146/LG:294464.12:2002JAN18 1-349; 1-366; 1-363; 1-380; 1-481; 1-394; 1-480; 1-64; 6-406; 2-482; 1-395; 6-299; 2-54; 4-367; 1-378; 1-490; 6-474; 3-297; 10-379; 9-469; 11-458; 9-248; 10-378; 12-482; 3-458; 6-277; 3-203; 6-536; 8-467; 6-646; 6-444; 6-87; 8-485; 8-482; 8-360; 9-474; 9-520; 9-444; 6-64; 115-320; 130-188; 7-378; 103-373; 103-374; 118-350; 118-283; 130-185; 134-390; 88-322; 119-331; 62-378; 117-355; 118-416; 108-278; 108-160; 109-537; 108-517; 108-392; 110-401; 108-355; 117-400; 136-372; 139-649; 159-612; 167-649; 218-296; 191-740; 195-846; 242-377; 264-826; 264-594; 255-495; 255-434; 267-749; 275-475; 287-752; 318-826; 309-521; 308-581; 309-577; 337-906; 330-989; 348-606; 359-846; 366-986; 360-592; 365-651; 400-920; 413-776; 403-2169; 423-666; 424-986; 446-629; 466-992; 485-695; 491-734; 506-774; 509-805; 510-936; 519-748; 519-740; 529-983; 533-1052; 535-1059; 536-778; 536-1090; 528-624; 542-777; 540-747; 566-830; 567-630; 572-685; 588-822; 591-1160; 597-885; 605-2119; 605-868; 661-777; 672-1255; 684-1235;
689-1258; 696-938; 705-935; 713-1284; 719-1241; 721-1214; 744-1283; 745-1348; 746-1147; 762-1324; 764-1025; 768-992; 769-925; 772-1274; 790-983; 802-987; 813-1017; 813-1077; 816-1383;
828-1013; 837-1251; 840-1422; 843-1073; 844-1409; 848-1116; 865-959; 869-1431; 858-1415; 877-1125; 884-1082; 889-1254; 889-1174; 889-1142; 891-1279; 891-1439; 894-1153; 897-1107;
915-1389; 915-1386; 920-1134; 920-1088; 920-1024; 922-1224; 923-1047; 934-1125; 942-1106; 943-1058; 938-1466; 956-1396; 956-1251; 965-1246; 986-1253; 995-1216; 995-1148; 1000-1436;
1001-1238; 1007-1183; 1012-1548; 1018-1616; 1030-1264; 1034-1193; 1041-1260; 1051-1275; 1052-1435; 1054-1293; 1061-1349; 1074-1326; 1074-1294; 1074-1134; 1085-1195; 1086-1235; 1086-1211; 1094-1218; 1096-1217; 1122-1345; 1125-1271; 1126-1238; 1154-1315; 1172-1291; 1162-1387; 1163-1387; 1182-1281; 1173-1596; 1197-1617; 1210-1396; 1215-1329; 1205-1396; 1212-1395; 1213-1311; 1225-1601; 1240-1301; 1242-1333; 1244-1327; 1266-1333; 1266-1342; 1285-1342; 1289-1432; 1332-1459; 1474-2071; 1458-1795;
146, cont. 1507-2035; 507-2043; 1488-1767; 1513-2114; 1518-1931; 1533-2108; 1538-1607; 1533-2009;
1539-1610; 1544-2008; 1549-1715; 1549-1799; 1551-1609; 1539-1615; 1521-2113; 1606-211;
1607-2090; 1610-2004; 1616-1859; 1614-1863; 1632-2079; 1634-1896; 1635-1942; 1637-1920;
1641-2119; 1641-2120; 1641-1857; 1641-1856; 1641-1752; 1641-1869; 1642-1882; 1642-1860;
1644-1896; 1586-1898; 1629-1864; 1595-1753; 1648-1916; 1597-1886; 1636-1897; 1656-2120;
1607-1931; 1657-2120; 1659-1937; 1660-2067; 1663-1901; 1663-1918; 1664-1884; 1667-1946;
1649-1734; 1667-1942; 1666-2104; 1668-1907; 1653-1916; 1656-1860; 1675-2122; 1626-2107;
1681-2011; 1682-2114; 1681-1970; 1685-2114; 1687-2113; 1688-1918; 1694-1951; 1698-2117;
1709-1993; 1702-1804; 1702-2123; 1704-2114; 1702-1941; 1706-2120; 1711-2114; 1713-2114;
1712-1976; 1717-1905; 1713-2119; 1715-2121; 1666-2019; 1716-2122; 1666-2114; 1666-1888;
1728-2114; 1730-2114; 1730-2134; 1730-2006; 1731-1924; 1732-1988; 1687-1922; 1734-1970;
1735-1819; 1739-2122; 1738-1982; 1740-2120; 1743-2114; 1745-2114;
1748-1987; 1750-1932; 1753-2005; 1755-1926; 1709-2013; 1762-2119; 1761-1990; 1763-1964; 1763-1963; 1762-2009; 1764-2120; 1774-2004; 1776-2122; 1778-1981; 1779-2119; 1780-2112; 1781-2012; 1779-1980; 1790-2119; 1791-2121; 1793-2114; 1797-2118; 1798-2110; 1799-1987;
1804-2119; 1804-2035; 1804-2033; 1804-2031; 1808-2052; 1812-2025; 1818-1990; 1767-2121; 1824-2120; 1827-2055; 1829-2118; 1834-2107; 1838-2099; 1838-2057; 1843-2090; 1844-2110; 1847-2073; 1850-2119; 1850-2115; 1856-2169; 1804-2113; 1856-2101; 1858-2101; 1804-2117; 1857-2102; 1859-2065; 1808-2114; 1813-2117; 1870-2118; 1874-2088; 1875-2114; 1876-2119; 1879-2067; 1881-2086; 1887-2113; 1905-2114; 1908-2119; 1920-2122; 1921-2119; 1922-2119; 1926-2114; 1932-2119; 1933-2114; 1936-2119; 1939-2077; 1944-2120; 1947-2114; 1960-2114; 1961-2114; 1964-2119; 1986-2119; 1987-2114; 1987-2117; 2000-2114; 1968-2119; 1975-2119; 2034-2114; 2063-2119; 2108-2651; 2129-2394; 2139-2551; 2141-2506; 2188-2627; 2199-2404; 2279-2444; 2414-2513; 2530-2708; 1641-2016
147/LG:003736.32:2002JAN18 1-490; 1-547; 1-547; 1-625; 5-327; 120-738; 125-747; 137-748; 262-665
148/LG:1502253.2:2002JAN18 1-251; 1-260
149/LG:216797.51:2002JAN18 1-260; 18-611; 67-706; 81-506; 81-372; 302-569; 346-493; 346-815; 543-952; 546-827; 766-1334; 769-1320; 802-1275; 816-1339; 825-1332; 839-1163; 861-1332; 873-1330; 888-1304; 898-1377; 898-1376; 898-1378; 915-1306; 928-1380; 931-1377; 945-1303; 936-1377; 957-1379; 962-1296; 967-1379; 968-1377; 967-1204; 970-1377; 979-1379; 979-1377; 979-1259; 983-1217; 993-1385; 997-1380; 1005-1365; 1041-1357; 1030-1223; 1031-1294; 1037-1383; 1043-1377; 1052-1314; 1060-1377; 1064-1377; 1070-1377; 1076-1356; 1113-1364; 1128-1375; 1150-1310; 1201-1375; 1216-1339; 1223-1708; 1223-1479; 1249-1534; 1269-1582; 1271-1377; 1282-1373; 1288-1539; 1303-1383; 1307-1526; 1327-1383; 1392-1610;
1395-1660; 1396-1685; 1395-1453; 1395-1592; 1399-1630; 1405-1696; 1407-1664; 1408-1697; 1408-1670; 1408-1653; 1408-1648; 1409-1596; 1409-1534; 1411-1673; 1413-1710; 1414-1646; 1423-1660; 1426-1982; 1434-1682; 1450-1710; 1480-2012; 1488-1740; 1516-1800; 1542-1733; 1567-1804; 1570-1734; 1590-1828; 1594-1863; 1596-1938; 1596-1824; 1624-1852; 1639-2291; 1651-1800; 1651-1896; 1651-2117; 1663-1792; 1669-1894; 1671-1891; 1671-2231; 1671-2010; 1695-1896; 1695-1987; 1707-2265; 1707-1891; 1720-1934; 1742-1996; 1747-2048; 1750-1921; 1755-2016; 1764-2041; 1765-2048; 1767-2049; 1775-2039; 1779-1966; 1785-2084; 1791-2034;
1796-2068; 1809-2003; 1810-2121; 1833-2114; 1834-2126; 1851-2118; 1872-2265; 1872-2174; 1876-2146; 1887-2342; 1887-2071; 1887-2164; 1901-1975; 1897-2172; 1902-2160; 1908-2155; 1912-2161; 1933-2246; 1937-2142; 1954-2255; 1966-2294; 1970-2292; 1994-2218; 2007-2206; 2018-2292; 2022-2310; 2021-2218; 2026-2271; 2040-2317; 2039-2181; 2040-2651; 2065-2283; 2088-2505; 2089-2652; 2091-2354; 2099-2594; 2106-2309;
2118-2350; 2121-2377; 2130-2379; 2159-2444; 2180-2452; 2180-2433; 2182-2563; 2182-2393; 2186-2442; 2201-2508; 2216-2469; 2222-2394; 2227-2465; 2229-2502; 2232-2704; 2232-2551; 2232-2424; 2236-2480; 2223-2472; 2239-2357; 2233-2770; 2233-2450; 2233-2436; 2248-2423; 2262-2495; 2273-2551; 2277-2485; 2278-2490; 2276-2923; 2275-2802; 2283-2536; 2294-2598; 2310-2559; 2314-2554; 2316-2396; 2313-2799; 2321-2587; 2339-2885; 2339-2602; 2341-2689; 2342-2619; 2342-2631; 2357-2678; 2365-2737; 2373-2531; 2364-2630; 2385-2693; 2400-2645; 2401-2672; 2404-2687; 2405-2585; 2405-2575; 2405-2558; 2407-2680; 2413-2656; 2415-2976; 2443-2582; 2466-2572; 2471-2713; 2480-2713; 2482-2663; 2484-2765; 2492-2554; 2497-2687; 2504-2792; 2509-2726; 2509-2801; 2509-2763; 2518-2928; 2518-2778; 2508-2983; 2508-2661; 2529-2755; 2534-3031; 2523-2784; 2548-2763; 2578-3155; 2580-2743; 2580-2994; 2561-3137; 2584-2840; 2616-2862; 2618-2887; 2619-2860; 2624-2892; 2629-2928; 2633-2853; 2625-3137; 2635-2778; 2656-2961; 2656-2907; 2671-2928; 2668-3043; 2668-3068;
149, cont. 2671-3043; 2672-3054; 2687-3017; 2690-3131; 2692-3043; 2695-3135; 2694-3043; 2709-2762; 2704-3256; 2712-2925; 2722-3126; 2713-2914; 2720-3179; 2725-3175; 2729-3177; 2732-3179; 2734-3043; 2736-3181; 2742-3171; 2742-3031; 2742-3000; 2746-3182; 2753-3179; 2753-2954; 2754-3181; 2754-3177; 2756-3177; 2756-3155; 2758-3174; 2760-3175; 2776-3136; 2772-3063; 2778-3178; 2778-3234; 2783-3070; 2790-3177; 2804-3092; 2804-3064; 2814-3080; 2823-3177; 2845-3177; 2846-3100; 2846-3097; 2856-3170; 2856-3182; 2855-3126; 2859-3061; 2864-3177; 2867-3259; 2868-3137; 2868-3111; 2873-3259; 2881-3177; 2905-3183; 2909-3174; 2916-3139; 2940-3170; 2940-3140; 2960-3256; 2991-3166; 2993-3169; 2994-3177; 2998-3177; 3018-3175; 3024-3177; 3061-3171; 3065-3175; 3066-3177; 3096-3171; 3097-3177; 3089-3256; 2819-3029; 2694-2923; 2819-3050; 1472-1734; 2231-2507; 2692-2970; 2730-3022; 2692-3010; 2723-3043; 2711-3045; 2706-3043; 2723-3050; 2694-3025; 2686-3043; 2584-3060; 1956-2524
150/LG:7685287.118:2002JAN18 1-511; 97-760; 98-510; 156-682; 187-510; 214-514; 235-714; 241-494; 242-511; 237-531; 266-756; 267-512; 267-501; 291-511; 299-886; 315-510; 332-910; 352-909; 363-937; 382-900; 377-510; 394-611; 385-510; 405-970; 425-625; 416-677; 448-967; 452-755; 452-1002; 543-832; 543-927; 557-661; 715-962; 729-1529; 768-1302; 744-973; 778-1350; 801-1328; 801-1408; 801-1424;
802-1340; 801-1121; 802-1351; 804-1242; 829-1422; 854-1424; 868-1479; 873-1009; 936-1237; 981-1711; 964-1215; 1011-1251; 1053-1128; 1058-1244; 1042-1662; 1068-1313; 1071-1296; 1071-1130; 1072-1337; 1071-1280; 1084-1280; 1054-1435; 1090-1267; 1089-1199; 1101-1313; 1078-1576; 1077-1358; 1096-1471; 1131-1486; 1139-1435; 1137-1375; 1138-1343; 1140-1390; 1142-1435; 1206-1570; 1181-1388; 1225-1470; 1206-1480;
1210-1476; 1239-1475; 1249-1409; 1228-1479; 1230-1476; 1238-1421; 1247-1465; 1286-1661;
1316-1598; 1319-1671; 1306-1542; 1310-1546; 1346-1621; 1356-1571; 1371-1607; 1380-1690;
1378-1624; 1416-1646; 1464-1784; 1545-1790; 1550-1784; 1562-1813; 1563-1832; 1606-1859;
1619-1834; 1600-1846; 1601-1980; 1603-1854; 1612-1839; 1607-1837; 1619-1845; 1620-1842;
1624-1837; 1624-1844; 1625-1845; 1626-1844; 1634-1842; 1605-1803; 1674-1866; 1725-1884;
1731-2399; 1788-1851; 1768-1991; 2175-2723; 2332-2637; 2335-2558; 2344-241; 2354-2566;
2381-2562; 2381-2637; 2400-2584; 2449-2739; 2461-2758; 2474-2733; 2474-2727; 2484-2842;
2510-2978; 2516-2980; 2533-2766; 2533-2822; 2541-2862; 2542-2949; 2588-2711; 2598-2842;
2598-2841; 2598-2839; 2601-2859; 2605-2848; 2611-2857; 2611-2858; 2618-2853; 2625-2916;
2629-2873; 2650-2935; 2666-3129; 2674-2849; 2668-2954; 2674-2937; 2678-2916; 2676-2996;
2689-3054; 2695-2958; 2702-2837; 2706-3030; 2706-2866; 2722-2992; 2727-3010; 2733-2984;
2727-3036; 2755-3066; 2893-3309; 2997-3415; 3034-3347; 3049-3310;
150, cont. 3050-3356; 3065-3385; 3128-3347; 3128-3234; 3130-3347; 3131-3352; 3133-3352; 3134-3266;
3135-3347; 3135-3249; 3136-3347; 3137-3354; 3144-3353; 3147-3350; 3152-3347; 3154-3354; 3156-3239; 3163-3352; 3163-3354; 3170-3350; 3188-3354; 3198-3356; 3205-3347; 3207-3354; 3214-3347; 3219-3352; 3219-3354; 3220-3347; 3236-3462; 3238-3354; 3239-3354; 3240-3354; 3245-3364; 3249-3347; 3255-3354; 3259-3354; 3260-3347; 3261-3347; 3263-3355; 3286-3347; 3295-3347; 1900-2165; 47-192; 2900-3053; 2893-3053; 1690-1936; 2250-2496; 2880-2989; 2880-3053; 1775-2021; 2183-2428; 1709-1957; 2195-2444; 2880-3015; 2166-2407; 2241-2489; 1899-2146; 238-352; 3198-3256; 2247-2497; 1865-2114; 2196-2445; 2244-2493; 1882-2130; 549-709; 3243-3332; 1979-2227; 2169-2418; 2713-2813; 1816-2066; 3198-3312; 47-147; 2200-2451; 3243-3338; 2238-2491; 2961-3053; 2929-3053; 2887-3053; 2206-2456; 2041-2290; 1839-2090; 1665-1917; 1801-2052; 1919-2165; 2699-2848; 1734-1986; 1647-1878; 1871-2122; 2904-3053; 1683-1937; 2890-3053; 2719-2848; 2684-2840; 2880-3029; 3243-3336;
1739-1993; 52-147; 3243-3337; 1666-1923; 1684-1941; 2894-3053; 1688-1944; 214-359; 2080-2335; 2124-2379; 3243-3347; 2892-3053; 1885-2138; 1897-2154; 2158-2410; 3243-3348; 2196-2454; 1763-2021; 2144-2402; 3243-3312; 2166-2418; 2973-3053; 2154-2411; 1897-2153; 2207-2456; 2888-3053; 2121-2377; 2238-2495; 1667-1924; 2075-2335; 1863-2122; 1807-2066; 1948-2205; 3198-3284; 50-147; 1806-2064; 2880-3052; 1703-1962; 1815-2073; 2169-2428; 1753-2012; 1802-2060; 2102-2364; 1807-2065; 2041-2296; 2692-2842; 1819-2080; 1896-2156; 1771-2032; 2200-2460; 2722-2848; 2700-2842; 2180-2441; 2179-2441; 214-434; 2140-2402; 1704-1967; 2952-3053; 1762-2025; 1828-2090; 2945-3053; 3198-3350; 3243-3315; 2916-3053; 1752-2017; 2196-2460; 1647-1905; 2963-3053; 1751-2015; 2880-3048; 3360-3453; 1800-2063; 2048-2311; 1949-2211; 2993-3053; 2897-3053; 2930-3053; 2224-2491; 2727-2848; 1851-2116; 2915-3053; 1806-2073; 1699-1965; 2880-3002; 3360-3454; 3161-3258; 1996-2259; 1939-2207; 1914-2182; 1877-2144; 1940-2206; 3239-3350; 3243-3311; 2024-2290; 3003-3053;
2934-3053; 2917-3053; 1743-1954; 1680-1947; 2201-2468; 1828-2098; 2975-3053; 2145-2414; 2889-3053; 2880-3019; 1983-2251; 1688-1958; 2997-3053; 2021-2289; 3243-3302; 238-424; 2079-2346; 2895-3053; 2212-2484; 1647-1912; 1764-2036; 2169-2440; 3243-3345; 1781-2051; 2041-2306; 2049-2319; 2923-3053; 50-145; 2060-2325; 3230-3321; 841-1038; 1786-2061; 3243-3329; 2169-2442; 2953-3053; 1759-2032; 2939-3053; 2202-2475; 1747-2020; 1816-2089; 1850-2064; 2948-3053; 214-319; 2171-2445;
150, cont. 2200-2475; 2651-2848; 2734-2842; 3004-3053; 2674-2848; 238-382; 2135-2412; 2932-3053;
3243-3308; 2910-3053; 2653-2848; 2643-2838; 1925-2202; 1657-1937; 238-327; 2773-2842;
2896-3053; 1709-1989; 2902-3053; 1767-2041; 1852-2130; 1664-1944; 1727-2009; 1952-2233;
2176-2459; 1828-2105; 1904-2185; 1764-2041; 2202-2485; 1858-2140; 1730-2012; 2203-2485;
1991-2271; 2121-2406; 1765-2050; 2250-2459; 214-329; 2132-2416; 47-159; 3243-3322; 1690-1975; 1727-2012; 214-327; 1889-2173; 1828-2111; 2198-2482; 2215-2498; 2695-2834; 1678-1967; 214-332; 241-332; 1978-2265; 214-330; 2201-2489;
1689-1976; 238-510; 238-328; 2724-2842; 2971-3053; 2169-2456; 1647-1928; 1989-2280; 1689-1979; 2664-2848; 1816-2102; 1777-2052; 238-330; 1650-1946; 813-1107; 1705-2000; 2124-2419; 2198-2493; 1788-2079; 1809-2102; 214-337; 214-336; 1875-2165; 1802-2094; 3297-3459; 2183-2476; 2022-2321; 1909-2208; 214-344; 1734-2036; 175-383; 2918-3053; 1704-2004; 1769-2067; 2627-2847; 2081-2378; 1764-2061; 3243-3333; 214-350; 214-347; 214-341; 173-345; 1708-2009; 214-360; 807-1124; 238-363; 214-390; 1671-1984; 214-355; 817-1119; 1256-1565; 2013-2273; 214-356; 1974-2285; 1684-1994; 2631-2848; 214-353; 214-345; 3243-3321; 1865-2188; 214-365; 1911-2232; 238-356; 1873-2193; 2725-2842; 214-382; 1685-2024; 2041-2366; 3230-3329; 214-370; 1739-2067; 1918-2231; 214-393; 3360-3444; 175-510; 238-390; 214-388; 2113-2443; 214-391; 214-389; 238-397; 214-384; 3359-3459; 239-510; 214-510; 214-386; 175-509; 214-378; 214-437; 214-399; 2882-3053; 1998-2358; 1049-1405; 238-399; 238-461; 214-426; 1942-2322; 214-423; 175-444; 214-418; 1859-2230; 1999-2365; 214-439; 250-450; 214-441;
214-438; 214-436; 3243-3346; 238-435; 214-428; 214-451; 214-430; 2968-3053; 214-454; 214-450; 214-453; 214-448; 214-449; 214-444; 175-447; 238-451; 802-1207; 214-447; 214-446; 1647-2041; 214-445; 238-495; 214-443; 239-444; 214-463; 815-1238; 239-467; 214-462; 238-459; 2697-2842; 214-459; 214-458; 214-456; 214-455; 2745-2842; 214-491; 3243-3344; 214-482; 214-472; 1916-2389; 1851-2320; 214-507; 214-503; 214-501; 214-499; 214-497; 214-500; 238-506; 818-1269; 1714-2165; 245-507; 214-492;
3161-3328; 813-1322; 1809-2315; 1889-2398; 422-510; 1927-2421; 804-1291; 2701-2842; 1709-2192; 2642-2745; 45-147; 1731-2273; 1800-2346; 1683-2227; 1650-2191; 717-1057; 1774-2308;
1898-2415; 1762-2278; 1735-2256; 801-1319; 243-510; 1797-2368; 813-1382; 801-1374; 1748-2316; 1647-2189; 814-1361; 1703-2257; 1729-2278; 1688-2237; 811-1366; 1715-2338; 804-1420;
1752-2343; 811-1406; 935-1528; 898-1489; 1717-2375; 817-1458; 1729-2343; 1647-2414; 801-1512; 3228-3342; 1850-2093; 1962-2195; 1690-1924; 2612-2846; 2166-2397; 2909-3053; 1851-2048;
1737-1973; 2629-2838; 2166-2395; 2168-2397;
150, cont. 238-365; 1656-1892; 2612-2848; 2154-2389; 1933-2165; 1653-1889; 1764-2000; 2880-2979; 3198-3257; 2880-3021; 1779-2015; 3243-3314; 1952-2187; 1899-2135; 1802-2040; 2899-3053; 2224-2462; 2610-2848; 1717-1955; 2967-3053; 1811-2041; 2167-2405; 2125-2363; 2247-2486; 1777-2016; 1732-1971; 1935-2165; 2715-2848; 2925-3053; 2935-3053; 2015-2253; 1962-2200; 2234-2474; 1688-1928; 1727-1968; 1681-1922; 1870-2110; 2175-2417; 3198-3271; 2102-2342; 2168-2410; 459-510; 2166-2400; 2972-3053; 2880-3031; 1693-1935; 1870-2111; 2092-2334; 2230-2473; 1661-1904; 1807-2051; 2080-2323; 2873-3032; 2733-2848; 2226-2470; 2128-2372; 1647-1892; 2721-2848; 2946-3053; 2224-2459; 2718-2848; 3247-3341; 1647-1874; 1695-1928; 1891-2122; 3161-3215; 1687-1920; 1985-2214; 1647-1880; 2731-2848; 1749-1981; 2266-2498; 3243-3318; 1709-1941; 3243-3310; 2891-3053; 2098-2328; 2722-2842; 2936-3053; 2176-2408; 2125-2354; 2201-2431; 2898-3053; 2195-2424; 3161-3255; 2245-2471; 2620-2848; 2880-2995;
2725-2848; 1795-2023; 1767-1992; 2062-2289; 1766-1968; 1750-1977;
2620-2847; 2175-2402; 3243-3326; 2729-2842; 2880-2991; 2882-2995; 2933-3053; 2145-2370; 238-370; 1862-2087; 2248-2474; 1850-2019; 2647-2838; 1921-2098; 1781-2005; 1793-2017; 2154-2377; 2880-2992; 1771-1991; 3000-3053; 2730-2848; 1727-1950; 2056-2277; 2633-2842;
1715-1906; 2066-2287; 2062-2240; 2096-2316; 2155-2375; 1657-1878; 3001-3053; 2247-2467; 2628-2848; 2610-2829; 2134-2352; 53-147; 3198-3342; 2642-2839; 2756-2865; 2249-2465; 2641-2848;
1762-1978; 2111-2325; 2726-2842; 2754-2842; 2613-2828; 1647-1856; 3243-3324; 1710-1925; 1665-1880; 2912-3053; 3243-3351; 1688-1902; 2617-2830; 2248-2461; 1663-1876; 2273-2486; 2880-3012; 2224-2429; 2241-2453; 1704-1916; 2637-2847; 1924-2134; 1865-2076; 1762-1973; 2092-2302; 2130-2338; 2174-2383; 2281-2491; 1851-2058; 3198-3267; 1786-1993; 2166-2369; 2283-2490; 1902-2108; 1709-1915; 1744-1950; 1674-1873; 2183-2389; 2117-2320; 1990-2193; 2634-2833; 1788-1993; 2210-2414; 2238-2442; 1803-2008; 1875-2078; 2922-3053; 1786-1992;
2061-2264; 1729-1932; 1810-2013; 2645-2848; 2610-2812;
150, cont. 1710-1913; 2237-2439; 2941-3053; 2647-2848; 1781-1982; 2117-2317; 2624-2825; 47-158; 2183- 2384; 1737-1937; 50-152; 2065-2263; 2237-2437; 2674-2834; 2650-2842; 2654-2848; 2651-2849,
52-133; 2650-2848; 238-380; 1812-2010; 2651-2842; 2661-2842; 1771-1970; 2650-2847; 214- 363; 2192-2388; 1907-2102; 214-331; 238-383; 1727-1916; 238-333; 2255-2447; 2006--2195; 2184-2375; 2658-2848; 1776-1965; 2615-2804; 1857-2041; 1684-1872; 2882-2979; 2183-2370; 2875-2989; 238-369; 63-147; 1659-1846; 2168-2386; 50-101; 2666-2848; 2615-2797; 2255-2436; 2669-2848; 1795-1974; 2095-2272; 1912-2089; 238-368; 1811-1988; 2613-2786; 2647-2822; 2142-2316; 2262-2435; 2949-3053; 2675-2848; 2301-2474; 1979-2151; 3360-3442; 2071-2241; 3238-3348; 2316-2487; 1659-1824; 3198-3321; 2671-2841; 2684-2846; 2760-2848; 1861-2022; 2688-2848; 3188-3347; 2690-2842; 47-167; 1664-1822; 2657-2814; 1828-1985; 3243-3352; 2202- 2358; 2966-3053; 1789-1946; 2693-2848; 2880-3032; 47-196; 2174-2327; 2697-2848; 2124-2273; 3198-3347; 3292-3351; 214-322; 3243-3343; 3223-3350; 1667-1814; 47-152; 1907-2041;
1659-1803; 2705-2848; 3243-3350; 2902-3045; 2714-2842; 3199-3342; 2636-2776; 1863-1951; 2293-2432; 3226-3347; 47-109; 2713-2842; 2889-3025; 2297-2432; 3218-3348; 55-168; 2166-2299; 50-169; 50-170; 2344-2449; 2720-2848; 3219-3347; 50-171; 3240-3341; 52-152; 3222-3347; 2166-2281; 1769-1892; 47-165; 2712-2833; 3225-3347; 55-163; 2321-2441; 1900-2019; 47-160; 3229-3347; 2409-2481; 194-308; 47-156; 1881-1994; 3236-3347; 3238-3347; 1672-1781; 2326-2435; 2742-2844; 3240-3347; 1783-1891; 3241-3347; 3241-3343; 2375-2480; 3248-3347; 3247-3347; 3234-3333; 2743-2842; 2742-2839; 52-139; 2756-2842; 2281-2371; 2384-2480; 47-139; 3255-3347; 47-133; 3257-3347; 2901-2989; 1794-1882; 2077-2161; 3260-3346; 3265-3347; 2766-2848; 47-119; 3271-3347; 3273-3347; 3276-3347; 1955-2020; 3280-3347; 1818-1885; 1822-1889; 2906-2973; 784-834; 3282-3347; 3289-3347; 3284-3342; 3287-3347; 50-103; 50-104; 2787-2839; 2790-2848; 2888-2957; 2880-2946; 2787-2842; 2880-2966; 2064-2143;
150, cont. 2372-2462; 2880-2974; 2766-2834; 291 3014; 2747-2838; 2713-2841; 2889-3020; 2609-2766; 2650-2803; 3198-3291; 2687-2848; 2084-2244; 1799-1960; 1987-2150; 3198-3343; 1976-2161; 2309-2496; 1856-2041; 2199-2389; 2738-2848; 1669-1821; 2631-2832; 1865-2067; 2623-2833; 2202-2419; 1763-1986; 2156-2379; 1824-2041; 3238-3342; 2186-2415; 2198-2427; 2179-2413; 1781-2016; 2063-2297; 2124-2358; 2255-2490; 1742-1978; 2098-2334; 1767-2006; 2126-2366; 1734-1974; 1882-2122; 1964-2203; 1939-2180; 1736-1978; 1809-2050; 2084-2326; 1727-1967; 2252-2496; 1804-2050; 2125-2373; 1727-1977; 1762-2013; 1688-1939; 1885-2135; 1664-1916; 1817-2019; 1688-1941; 1678-1932; 1857-2110; 1647-1894; 1654-1914; 2236-2497; 1850-2113; 1707-1972; 1844-2108; 1793-2060; 2655-2848; 1737-2014; 2175-2446; 2652-2848; 2199-2483; 1801-2085; 2054-2341; 2166-2460; 2882-2986; 2195-2492; 2181-2478; 1765-2142; 1647-2035; 1714-2240; 1657-2140; 1752-2363; 1765-2353; 1750-2317; 1647-2202; 900-1466; 1647-2186; 1703-2238; 1709-2086
151/LG:405272.4:2002JAN18 1-320; 1-314; 1-305; 49-186; 71-784; 203-293; 240-629; 240-401; 281-512; 292-508; 308-633; 476-656; 504-656; 531-966; 589-1194; 599-796; 578-656; 705-1144; 800-1271; 799-1270; 822-1280;
855-1076; 862-1004; 867-1076; 895-1052; 898-1108; 911-1173; 1072-1276; 1112-1265; 1112-1267; 1112-1279; 1125-1273; 1125-1206; 1143-1273
152/LG:247382.7:2002JAN18 1-210; 9-635; 35-384; 71-448; 83-580; 98-597; 137-308; 169-261; 394-1010; 409-755; 409-604; 444-1065; 448-727; 528-973; 577-1169; 597-835; 615-846; 615-882; 615-898; 617-878; 632-913; 632-919; 687-1034; 752-1474; 752-1360; 752-1010; 876-1524; 951-1496; 1011-1238; 1057-1729; 1190-1453; 1222-1438; 1222-1706; 1236-1762; 1261-1905; 1281-1545; 1439-2004; 1689-2135; 1689-2082; 1689-1951; 1698-1933; 1710-2039; 1734-2340; 1751-2303; 1755-1996; 1770-2380; 1828-2318; 1828-2074; 1830-2076; 1855-2099; 1869-2042; 1904-2173; 1904-2164; 1914-2100; 1953-2203; 2003-2498; 2003-2255; 2003-2263; 2019-2229; 2020-2155; 2043-2771; 2046-2121; 2061-2364; 2072-2293; 2092-2364; 2141-2358; 2172-2715; 2196-2498; 2228-2485; 2238-2535; 2320-2482; 2349-2454; 2373-2886; 2384-2699; 2392-2715; 2418-2679; 2464-2732; 2474-3225; 2492-2780; 2519-2795; 2645-3175; 2647-2937; 2682-2983; 2683-2900; 2910-2987; 2945-3160;
153/LG:7763403.34:2002JAN18 1-274; 106-239; 107-404; 179-439; 253-400; 262-420; 264-477; 274-484; 278-843; 293-514; 540-962; 731-949; 735-943; 736-958; 741-954; 743-954; 747-948; 754-943; 775-948; 790-974; 828-903; 836-948; 863-948; 879-943; 606-852; 601-850; 695-949; 693-948; 684-948; 679-944; 680-948; 683-949; 679-948; 616-887; 608-890; 663-948; 662-944; 661-941; 550-843; 656-948; 635-929; 553-855; 647-947; 567-873; 633-948; 569-883; 634-948; 632-948; 611-947; 640-948; 601-943; 607-948; 582-944; 582-947; 586-944; 577-926; 587-943; 578-948; 575-948; 746-931; 550-948; 555-943; 574-949; 548-948; 715-888; 713-948; 707-948; 715-948; 612-844; 718-948; 726-949; 693-912; 736-952; 740-948; 682-886; 741-943; 758-943; 679-862; 753-928; 772-948; 776-949; 778-948; 775-943; 776-943; 784-948; 779-941; 787-944; 266-422; 784-938; 687-842; 795-949; 783-929; 801-944; 788-923; 825-949; 827-949; 828-949; 822-948; 824-948; 833-948; 841-945
154/LG:258352.1:2002JAN18 1-3931; 1434-3854; 1479-1632; 1499-3854; 1507-2107; 2621-3330; 2764-3198; 2776-3426; 3005-3442; 3373-3851; 3448-3858; 3533-3851
155/LG:1096711.3:2002JAN18 1-387; 262-526
156/LG:7761740.1:2002JAN18 1-245; 16-235; 193-463; 336-759; 405-759; 618-696
157/LG:1382987.89:2002JAN18 1-82; 1-603; 58-591; 66-467; 75-715; 96-635; 115-734; 160-590; 163-796; 192-613; 206-608; 206-605; 206-484; 214-696; 249-488; 249-646; 286-912; 286-980; 288-556; 333-869; 424-1038; 565-1090; 626-1068; 626-1189; 629-1038; 637-1240; 732-1339; 732-1329; 801-1292; 817-1259; 916-1373; 1076-1646; 1238-1646
158/LG:444673.50:2002JAN18 1-520; 1-248; 4-78; 359-414; 361-585; 363-599; 362-562; 375-541; 375-632; 377-728; 401-647;
392-618; 408-649; 408-660; 394-651; 411-675; 412-685; 413-1001; 414-623; 415-656; 415-737; 389-964; 403-676; 404-655; 419-628; 420-474; 406-669; 406-587; 411-891; 406-501; 421-644; 422-645; 408-644; 410-704; 410-679; 410-656; 410-645; 425-711; 425-602; 412-694; 427-647; 413-695; 414-695; 414-637; 429-681; 400-1038; 430-709; 430-641; 432-675; 418-622; 419-695; 419-678; 420-718; 420-712; 420-688; 422-618; 422-718; 422-705; 422-805; 422-682; 422-676; 422-674; 422-665; 422-600; 422-597; 410-887; 424-730; 439-1031; 413-976; 427-652; 428-617; 415-957; 415-1008; 429-714; 429-706; 429-705; 429-703; 429-693;
158, cont. 429-687; 429-677; 429-629; 429-582; 444-719; 444-691; 430-733; 430-685; 431-703; 433-710; 433-701; 433-729; 433-593; 434-692; 434-688; 434-676; 422-1036; 420-497; 422-1035; 436-708; 435-665; 437-732; 424-698; 438-715; 438-680; 438-670; 438-664; 425-974; 439-727; 438-583; 426-792; 440-713; 441-717; 429-834; 429-754; 429-1031; 429-966; 429-994; 430-853; 430-797; 444-636; 430-1037; 429-996; 442-893; 444-1023; 432-741; 445-664; 432-1030; 447-707; 446-619; 446-616; 461-665; 447-697; 448-713; 448-708; 448-740; 449-719; 438-894; 464-710; 450-691; 451-695; 450-638; 436-996; 451-694; 451-697; 451-672; 452-640; 452-591; 438-979; 452-689; 453-617; 453-689; 454-726; 454-698; 455-700; 455-684; 456-696; 455-659; 456-727; 444-542; 470-999; 461-782; 447-738; 447-737; 460-620; 461-723; 461-755; 462-666; 461-787; 447-1037;
449-973; 462-731; 462-697; 462-672; 449-1037; 465-689; 466-704;
453-912; 467-730; 467-710; 467-678; 467-848; 468-668; 455-963; 469-732; 469-738; 469-656; 469-613; 457-1003; 471-727; 472-741; 472-733; 472-598; 460-887; 471-1027; 473-734; 476-740; 477-661; 482-1037; 479-658; 467-1037; 467-615; 481-607; 469-983; 472-758; 460-989; 491-669; 470-732; 493-705; 482-735; 508-946; 497-673; 513-893; 490-739; 493-720; 508-661; 485-1019; 507-804; 547-1037; 547-997; 550-836; 552-799; 561-1035; 555-805; 589-825; 565-1050; 611-1037; 588-1042; 628-1041; 641-1042; 642-1037; 644-908; 645-1040; 647-1044; 647-1037; 649-1037; 651-1040; 652-1035; 653-782; 654-1037; 654-894; 654-1024; 656-1035; 666-1038; 666-997; 682-1035; 669-899; 669-876; 673-1034; 701-1002; 677-1042; 680-904; 683-1037; 685-1042; 689-1050; 691-1037; 697-1037; 701-890; 706-999; 706-964; 717-904; 722-974; 723-906; 726-1030; 732-1013; 743-1000; 743-1040; 747-1043; 750-1037; 758-1036; 768-1026; 778-1041; 779-1037; 784-1037; 784-1028; 785-1037; 815-934; 796-1038; 806-1039; 811-907; 812-1037; 814-1039; 839-1125; 986-1039
159/LG:7767853.1:2002JAN18 1-632; 1-607
160/LG:1375802.70:2002JAN18 1-326; 11-133; 32-283; 52-288; 134-645; 234-740; 251-476; 257-810; 289-454; 336-400; 356-707;
437-1001; 394-656; 360-903; 420-578; 425-570; 425-615; 412-525; 412-578; 426-711; 418-557; 425-510; 425-513; 425-516; 425-524; 425-531; 425-533; 425-784; 425-548; 425-501; 425-693; 425-508; 425-485; 425-560; 425-558; 425-530; 425-511; 425-617; 425-521; 425-563; 425-502; 439-721; 440-721; 436-673; 455-647; 445-714; 466-706; 481-832; 488-656; 489-763; 490-705; 499-761; 499-760; 500-759; 517-915; 501-765; 501-730; 502-771; 505-744; 505-756; 509-757; 510-756; 510-744; 525-720; 509-714; 514-754; 515-757; 517-781; 518-749; 520-908; 519-768; 519-669; 519-738; 520-748; 529-1069; 520-744; 523-756; 523-680;
160, cont. 525-788; 525-767; 525-910; 527-748; 527-781; 533-744; 531-711; 553-707; 537-972; 533-926; 543-787; 489-704; 547-759; 547-787; 549-749; 628-913; 499-726; 555-897; 556-794; 558-833; 576-762; 510-667; 563-913; 561-848; 568-764; 576-843; 653-749; 622-890; 658-929; 616-909; 582-640; 624-973; 538-657; 539-738; 539-787; 539-819; 539-667; 541-805; 540-825; 541-822; 540-788; 542-833; 541-783; 543-884; 587-872; 588-863; 544-859; 630-744; 544-821; 678-914; 631-898; 546-800; 546-798; 546-782; 546-806; 546-815; 632-861; 546-821; 546-805; 546-813; 547-725; 547-742; 547-653; 548-802; 548-827; 596-829; 549-785; 549-801; 549-783; 549-798; 595-802; 549-819; 594-765; 551-819; 552-653; 550-803; 551-775;
550-805; 550-633; 550-817; 550-784; 556-710; 551-660; 551-788; 603-709; 552-794; 552-666; 552-681; 558-698; 558-708; 552-729; 552-741; 552-765; 552-805; 552-725; 552-728; 552-804; 552-764; 552-766; 552-770; 552-771; 552-776; 552-782; 552-784; 552-792; 552-803; 552-786; 552-791; 552-797; 552-809; 552-788; 552-789; 554-833; 552-814; 553-817; 552-818; 552-816; 552-806; 552-820; 552-801; 552-821; 552-822; 552-824; 552-825; 552-958; 553-816; 554-802;
554-813; 554-899; 554-812; 553-837; 554-796;
554-815; 554-829; 8-78; 553-809; 554-837; 555-997; 554-842; 554-717; 554-784; 610-705; 554-774; 554-807; 554-823; 553-724; 554-805; 554-873; 554-770; 559-708; 553-786; 553-671; 553-793; 559-791; 555-829; 554-867; 554-810; 555-813; 555-817; 554-801; 554-800; 555-799; 554-811; 555-761; 641-929; 555-811; 555-802; 555-804; 640-921; 555-825; 687-915; 555-796; 555-807; 555-820; 555-810; 561-833; 555-786; 554-816; 554-817; 554-809; 555-777; 555-765; 554-814; 554-777; 599-817; 560-1070; 555-805; 555-795; 554-804; 554-794; 554-789; 555-793; 555-788; 555-815; 599-786; 599-843; 555-784; 553-804;
554-786; 555-854; 554-785; 555-801; 554-728; 556-852; 556-889; 556-798; 556-828; 555-814; 556-821; 556-802; 555-809; 556-831; 555-780; 600-840; 555-812; 555-826; 556-847; 10-80; 556-819; 556-717; 555-740; 556-724; 556-785; 600-815; 556-805; 555-798; 556-793; 555-762; 556-873; 556-767; 555-791; 556-792; 555-766; 555-822; 555-823; 556-822; 555-821; 555-792; 556-795; 556-773; 556-774; 556-789; 600-851; 556-890; 555-667; 556-833; 555-806; 556-826; 556-817; 561-703; 555-778; 556-787; 555-785; 556-809; 600-770; 556-776; 557-787; 562-705; 557-832; 556-803; 558-921; 557-823; 557-825; 556-766;
160, cont. 556-804; 562-784; 601-878; 602-873; 631-744; 562-695; 557-813; 644-825; 556-739; 557-811; 557-798; 557-831; 556-816; 557-835; 556-800; 556-811; 642-806; 606-826; 556-812; 557-716; 556-751; 556-780; 557-820; 556-820; 556-786; 556-788; 556-813; 558-813; 557-812; 558-815; 558-814; 557-786; 557-817; 558-892; 558-820; 557-766; 558-828; 602-878; 557-745; 558-826; 558-801; 557-659; 558-788; 558-796; 557-729; 557-790; 558-854; 558-819; 558-794; 558-824; 563-706; 558-718; 557-794; 558-803; 558-804;
558-858; 558-806; 558-808; 558-791; 558-818; 558-799; 557-815; 602-859; 603-866; 558-793;
558-777; 558-807; 603-860; 559-826; 558-653; 630-723; 638-695; 559-985; 559-729; 604-862;
559-841; 559-786; 559-817; 559-838; 558-764; 559-820; 559-831; 559-800; 559-792; 559-802;
559-666; 559-768; 559-821; 559-822; 559-824; 561-761; 560-802; 607-898; 560-810; 611-784;
561-804; 560-836; 562-802; 561-793; 561-821; 561-803; 562-783; 561-666; 562-823; 561-812;
562-821; 562-793; 607-1049; 562-826; 562-789; 562-819; 559-833; 562-808; 563-824; 563-1024;
562-804; 562-878; 563-832; 562-726; 562-718; 564-837;
562-825; 564-736; 564-1067; 608-858; 608-864; 563-828; 563-827; 563-882; 562-801; 563-823; 568-1115; 563-819; 563-817; 563-816; 564-829; 569-737; 564-904; 565-820; 565-831; 565-840;
565-839; 565-821; 611-886; 565-877; 565-834; 565-824; 610-887; 567-871; 566-822; 566-816; 567-820; 568-910; 567-832; 567-851; 613-895; 566-879; 567-825; 567-824; 568-838; 568-832; 568-835; 569-830; 568-819; 569-835; 570-823; 570-835; 570-834; 570-831; 570-826; 570-815; 570-822; 571-833; 571-825; 571-830; 572-821;
571-815; 573-820; 573-838; 574-807; 617-871; 573-826; 573-827; 574-822; 619-876; 573-825; 574-818; 577-824; 635-1089; 579-942; 582-732; 633-915; 585-821; 588-875; 602-741; 675-929; 590-843; 592-842; 594-847; 649-1093; 643-913; 654-886; 669-949; 603-825; 651-903; 604-823; 604-821; 604-825; 604-831; 655-891; 661-903; 666-895; 666-898; 681-924; 678-873; 680-960; 698-816; 682-928; 687-914; 687-959; 704-800; 688-947; 688-943; 693-915; 693-929; 693-967; 694-962; 697-950; 699-876; 703-928; 703-934; 706-972; 705-941; 707-950; 711-911; 707-938; 712-919; 709-924; 714-913; 712-866; 751-978; 716-967; 716-927; 716-912; 720-962; 722-959; 726-914; 724-950; 728-909; 724-963; 728-911; 730-915; 733-915; 734-911; 750-918; 773-910;

807-1059; 786-924; 789-920; 762-1000; 796-918; 811-915; 823-1521; 824-918; 826-924; 837-922; 840-919; 849-941; 977-1522; 979-1119; 1002-1123; 1003-1124; 254-399; 425-671; 253-399; 10-115; 11-257; 143-392; 278-399; 135-387; 128-389; 117-377; 119-382; 124-392; 123-397; 30-308; 103-390; 426-676; 21-325; 67-379; 558-711; 158-398; 265-376; 7-242; 144-380; 103-340;
160, cont. 125-278; 1-243; 426-628; 425-635; 9-242; 8-241; 11-242; 10-241; 9-240; 12-242; 8-238; 11-240; 19-152; 11-241; 1-231; 21-241; 15-241; 9-237; 19-241; 13-240; 10-237; 8-235; 11-238; 24-237; 10-236; 426-611; 7-233; 26-241; 426-621; 18-244; 11-236; 21-240; 16-241; 10-234; 21-166; 16-230; 13-237; 12-235; 15-238; 21-234; 10-225; 18-241; 11-233; 10-231; 19-240; 9-230; 16-237; 21-242; 15-235; 18-238; 280-399; 11-231; 22-241; 10-229; 434-615; 18-237; 425-644; 117-336; 17-235; 10-228; 17-233; 16-233; 13-230; 425-642; 5-222; 19-235; 14-230; 12-228; 23-238; 26-240; 7-220; 25-239; 22-235; 1-212; 11-222; 9-220; 10-221; 25-236; 16-226; 8-218; 23-233; 11-220; 27-236; 21-141; 10-218; 12-214; 22-226; 38-235; 425-627; 10-211; 11-212; 23-222; 21-222; 10-210; 10-209; 25-224; 10-207; 425-623; 302-399; 458-635; 23-220; 11-207; 19-215; 17-213; 17-135; 425-620; 305-399; 13-207; 12-206; 22-215; 9-179; 29-223; 21-214; 425-606; 11-203; 10-202; 44-235; 12-203; 17-207; 425-586; 8-198; 10-199; 11-201; 11-200; 425-613; 10-198; 8-194; 26-142; 11-197; 21-207; 425-610; 55-240; 11-196; 24-208; 29-213; 20-204; 13-197;
481-611; 12-194; 11-194; 11-193; 18-200; 10-191; 16-198; 23-199; 19-200; 11-192; 319-399; 12-193; 10-190; 17-197; 13-192; 13-191; 19-195; 20-195; 10-184; 21-187; 195-369; 22-187; 8-180; 11-183; 8-181; 13-184; 8-179; 23-194; 12-184; 480-631; 26-195; 19-187; 13-181; 15-182; 206-372; 21-186; 21-185; 425-589; 16-179; 10-173; 22-183; 15-175; 12-168; 10-166; 21-175; 25-99; 22-177; 13-165; 78-230; 13-163; 23-173; 21-162; 21-165; 13-161; 21-157; 68-214; 25-169; 111-252; 21-160; 12-151; 17-151; 16-155; 73-203; 9-147; 47-183; 37-160; 167-302; 10-143; 22-156; 12-145; 73-205; 296-396; 25-157; 425-554; 11-140; 1-123; 21-137; 14-137; 18-139; 30-145; 130-245; 12-127; 21-136; 25-137; 11-122; 1-108; 13-123; 22-108; 13-115; 13-121; 15-123; 15-122; 8-114; 19-123; 26-134; 15-121; 10-114; 17-122; 23-116; 4-108; 10-113; 10-112; 21-122; 22-112; 25-125; 25-122; 37-132; 11-105; 14-108; 24-110; 18-110; 12-104; 25-114; 26-115; 14-88; 23-108; 505-586; 425-504; 37-112; 43-121; 22-91; 425-490; 15-64; 18-71; 17-82; 25-104; 11-97; 19-108
161/LG:414732.1:2002JAN18 1-714; 2-240; 61-610; 232-520; 250-674; 529-891; 653-995
162/LG:1328394.25:2002JAN18 1-716; 343-959; 503-728; 508-1058; 768-874; 771-1051; 775-1003; 786-1126; 806-1056; 825-1295; 825-1013; 825-994; 876-1105; 1204-1848; 1315-1808; 1340-2120; 1399-1940; 1413-2032; 1412-1659; 1416-1705; 1422-2062; 1434-2116; 1434-2026; 1459-1630; 1548-2017; 1616-2294; 1646-1859; 1649-1924; 1652-2118; 1652-1888; 1677-1917; 1694-2313; 1718-2117; 1776-2040; 1783-1875; 1788-2306; 1789-2321; 1823-1987; 1835-2503; 1877-2043; 1889-1962; 1901-2112; 1916-2421; 1916-2169; 1924-2102; 1951-2228; 1986-2233; 1988-2105; 2008-2667; 2094-2643; 2094-2173; 2118-2880; 2132-2721; 2124-2255; 2131-2421; 2149-2792; 2150-2689; 2176-2412; 2184-2552; 2185-2430; 2206-2582; 2206-2462; 2206-2420; 2246-2415; 2265-2505; 2266-2734; 2270-2741; 2272-2793; 2277-2896; 2297-2886; 2296-2541; 2296-2500; 2300-2813; 2301-2882; 2301-2540; 2303-2624; 2306-2781; 2307-2575; 2310-2780; 2319-2566; 2338-2728; 2324-2854; 2324-2576; 2324-2574; 2332-2796; 2331-2573; 2363-2757; 2361-2639; 2371-2780; 2366-3069; 2372-2462; 2377-2668; 2391-2796; 2395-2803; 2412-2896; 2421-2854; 2446-2904; 2451-2682; 2456-2899; 2463-2895; 2466-2803; 2467-2750; 2473-2855; 2475-2899; 2492-2899; 2501-2815; 2508-2884; 2510-2896; 2524-2863; 2547-2898; 2556-2763; 2560-2860; 2577-2900; 2579-2901; 2581-2776; 2639-2896; 2657-2895; 2691-2852; 2734-2805; 2781-2881; 2781-3150; 2797-2911
163/LG:336953.5:2002JAN18 1-187; 25-2577; 47-160; 69-2850; 241-906; 305-659; 386-641; 500-1042; 696-855; 904-1169; 923-1128; 1019-1394; 1156-1510; 1240-1660; 1243-1435; 1299-1752; 1330-1597; 1460-1743; 1574-1979; 1634-2231; 1647-1977; 1679-1868; 2087-2576; 2155-2738; 2194-2424; 2259-2630; 2304-2531; 2327-2586; 2327-2577; 2332-2649; 2333-2558; 2400-2930; 2400-2603; 2448-3015; 2458-2616; 2568-3021; 2594-3056; 2597-3013; 2637-2973; 2676-3013; 2702-3322; 2707-3161; 2707-2939; 2716-3018; 2723-3447; 2841-3056; 2848-3390; 2892-3446; 3039-3498; 3058-3498; 3100-3359; 3131-3340; 3161-3509; 3198-3405; 3240-3628; 3247-3488; 3250-3508; 3250-3500; 3250-3498; 3250-3468; 3260-3485; 3273-3514; 3278-3729; 3296-3678; 3393-3939; 3397-4098; 3424-3677; 3523-3988; 3531-3619; 3566-3942; 3569-3758; 3611-4074; 3621-3835; 3691-4132; 3693-3958; 3695-4450; 3712-4132; 3711-4077; 3726-4077; 3741-4086; 3740-4011; 3745-4117; 3743-4031; 3761-4089; 3803-4079; 3806-3950; 3812-4048; 3813-4132; 3821-4106; 3842-3973; 3851-4103; 3857-4045; 3923-4117; 3933-4104; 3965-4116; 4026-4116; 4228-4806
164/LG:7697931.25:2002JAN18 1-829; 1-93; 1-686; 2-662; 146-674; 182-926; 204-877; 291-818; 278-717; 279-419; 307-774; 311-930; 344-561; 363-550; 363-536; 380-667; 407-561; 408-561; 402-774; 432-561; 435-560; 427-675; 430-677; 432-684; 433-680; 441-708; 460-700; 494-752; 494-763; 508-774; 589-766; 592-725; 603-741; 608-768; 592-663; 619-729; 800-1029; 805-1055; 954-1052; 764-828
165/LG:300147.58:2002JAN18 1-516; 1-155; 226-658; 291-441; 380-973; 418-689; 441-1002; 445-696; 548-812; 555-792; 692-951; 734-987; 780-1042; 788-1299; 788-1011; 872-1137; 919-1163; 948-1156; 949-1183; 958-1242; 968-1158; 1029-1132; 1094-1590; 1107-1343; 1159-1409; 1161-1555; 1176-1511; 1178-1631; 1196-1632; 1208-1408; 1219-1632; 1243-1632; 1245-1327; 1258-1632; 1264-1632; 1267-1632; 1351-1639; 1352-1632; 1353-1632; 1355-1632; 1426-1632; 1437-1632; 1530-1772; 1651-1896; 1663-1931; 1678-1918; 1683-1926; 1867-2138; 1930-2254; 2126-2365; 1235-1484; 1328-1578; 1426-1634; 1235-1335; 1362-1632; 1268-1573; 1313-1634; 1159-1443; 1290-1632; 1248-1633; 1232-1632; 1233-1632; 1235-1632; 1159-1594; 1377-1618; 1514-1640; 1399-1619; 1266-1470; 1377-1585; 1427-1634; 1413-1610; 1441-1633; 1418-1611; 1459-1627; 1055-1173; 1002-1110
166/LG:7763115.9:2002JAN18 1-1642; 1-256; 1-494; 1-527; 51-406; 69-526; 83-607; 80-648; 85-352; 103-532; 99-767; 107-385; 107-387; 107-367; 107-356; 107-320; 132-243; 134-537; 136-354; 156-738; 151-412; 232-659; 280-843; 290-360; 365-903; 380-904; 412-1015; 459-1040; 473-931; 502-838; 509-1040; 567-1034; 577-796; 574-1064; 583-1041; 534-1175; 540-995; 562-1013; 565-968; 591-881; 631-898; 648-1017; 671-1017; 684-923; 779-977; 877-1147; 907-1079; 865-1137; 990-1428; 1015-1067; 993-1402; 1009-1428; 1742-2133; 1113-1885; 1139-1618; 1139-1615; 1148-1400; 1148-1528; 1157-1521; 1157-1612; 1157-1605; 1186-1636; 1217-1583; 1218-1480; 1246-1642; 1253-1495; 1842-2035; 1264-1464; 1379-1642; 1388-1642; 1388-1621;
1404-1946; 1415-1642; 1430-1651; 1446-1649; 1514-1984; 1591-2155; 1593-2026; 1649-2026; 1671-2063; 1663-2022; 1670-1896; 1670-1982; 1725-1946; 1729-1894; 1734-1986; 1767-1895; 1769-1983; 1784-2248; 1804-2020; 1822-2022; 1841-1962; 1848-2373; 1861-2022; 1898-2141; 1999-2447; 2013-2348; 2023-2274; 2026-2454; 2026-2246; 2065-2447; 2064-2374; 2064-2350; 2090-2716; 2092-2182; 2092-2150; 2157-2809; 2228-2824; 2241-2785; 2246-2823; 2267-2743; 2294-2788; 2298-2782; 2297-2582; 2315-2950; 2329-2648; 2329-2623; 2342-2825; 2363-2501; 2379-2644; 2380-2639; 2380-2627; 2385-2828; 2408-2651; 2412-2678; 2414-2660; 2446-2993; 2446-2759; 2448-2617; 2453-2699; 2459-2825; 2476-3015; 2540-2809; 2544-2813; 2569-3275; 2592-2843; 2583-2882; 2584-2823; 2644-2826;
166, cont. 2650-3166; 2658-2960; 2685-3039; 2686-2844; 2726-3213; 2761-3360; 2785-3492; 2788-3257; 2793-2942; 2796-3537; 2798-3620; 2803-3064; 2873-3481; 2966-3293; 2973-3195; 3018-3295; 3054-3223; 3068-3303; 3126-3615; 3126-3382; 3170-3281; 3190-3622; 3190-3423; 3204-3501; 3216-3402; 3220-3482; 3222-3410; 3252-3507; 3263-3471; 3312-3525; 3367-3604; 3387-3660; 3396-3631; 3401-3503; 3440-3798; 3440-3693; 3448-3998; 3449-4211; 3452-4035; 3484-4102; 3489-3654; 3511-4069; 3528-4024; 3572-3661; 3580-3821; 3588-3924; 3598-4102; 3616-3891; 3628-3836; 3629-3873; 3629-3858; 3632-3859; 3632-4099; 3636-3924; 3664-4033; 3671-4138; 3673-3897; 3693-3943; 3713-4143; 3758-4027; 3854-4091; 3875-3982; 3879-4101; 3880-4105; 3900-4073; 3980-4131
167/LG:7693875.4:2002JAN18 1-424; 15-492; 432-3748; 441-3748; 441-1659; 456-1319; 482-1144; 485-988; 485-929; 485-931; 491-1007; 494-1164; 513-998; 532-688; 602-1009; 634-1390; 745-1186; 953-1209; 964-1381; 967-1207; 967-1084; 1061-1334; 1202-1459; 1218-1480; 1253-1542; 1256-1799; 1383-1940; 1486-2052; 1536-1791; 1588-2051; 1677-2233; 1683-1936; 1690-1983; 1796-2085; 1796-2000; 1907-2057; 2013-2270; 2014-2257; 2046-2608; 2068-2402; 2092-2554; 2092-2345; 2157-2392; 2260-2521; 2277-2553; 2283-2553; 2311-2380; 2331-2624; 2358-2673; 2410-2658; 2414-2682; 2423-2663; 2524-2680; 2524-2715; 2552-2790; 2554-2859; 2566-2673; 2603-3099; 2603-3106; 2620-2905; 2641-3105; 2656-2934; 2667-3005; 2676-3442; 2676-2898; 2678-2897; 2679-2899; 2681-2995; 2684-2966; 2681-2919; 2695-2958; 2706-2973; 2709-2821; 2716-3077; 2710-3198; 2714-2971; 2719-3206; 2718-2923; 2730-2874; 2786-3559; 2848-3090; 2903-3133; 2979-3383; 3045-3291; 3084-3247; 3091-3257; 3094-3351; 3234-3476; 3278-3569; 3334-3951; 3381-3748; 3379-3748; 3411-3656; 3438-3748; 3465-3724; 3477-3754; 3484-3748; 3499-3753; 3502-3703; 3521-3748; 3536-3748; 3549-3748; 3593-3748; 3607-3748; 3627-3748; 3632-3748; 3669-3748; 3701-3993; 3743-4013; 3931-4008
168/LG:089516.22:2002JAN18 1-674; 170-1011; 176-750; 798-1149
169/LG:336671.1:2002JAN18 1-2638; 34-532; 34-267; 43-2638; 39-781; 417-940; 881-1342; 907-1378; 995-1388; 999-1238; 1013-1516; 1027-1634; 1178-1416; 1495-1664; 1539-1780; 1671-1854; 1683-1996; 1754-2524; 1774-2537; 1832-2045; 1940-2483; 1945-2261; 1945-2204; 2021-2532; 2013-2535; 2018-2597; 2019-2267; 2056-2311; 2129-2633; 2181-2638; 2288-2638; 2412-2639; 2507-2760
170/LG:234504.11:2002JAN18 3-391; 3-87; 1-272; 3-1843; 187-765; 407-866; 432-618; 478-868; 518-766; 527-1172; 527-1208; 548-1839; 550-668; 576-1328; 598-845; 603-1123; 603-825; 609-778; 612-1155; 617-1327; 649-1140; 684-933; 698-934; 700-938; 700-929; 700-908; 700-869; 700-846; 700-843; 700-839; 720-969; 801-1392; 832-1533; 865-1384; 867-1354; 948-1797; 1001-1522; 1008-1419; 1023-1310; 1040-1794; 1062-1557; 1063-1553; 1058-1623; 1066-1333; 1069-1627; 1087-1812; 1078-1623; 1077-1620; 1087-1375; 1090-1572; 1093-1426; 1105-1605; 1119-3177; 1120-1323; 1134-1596; 1134-1383; 1136-1464; 1146-1810; 1155-1467; 1191-1803; 1201-1483; 1201-1481; 1203-1771; 1230-1833; 1235-1591; 1235-1505; 1242-1545; 1264-1764; 1262-1580; 1267-1667; 1269-1528; 1269-1842; 1272-1739; 1274-1523; 1275-1543; 1275-1541; 1276-1497; 1281-1573; 1289-1842; 1302-1882; 1300-1797; 1300-1770; 1303-1482; 1306-1836; 1339-1627; 1343-1599; 1343-1592; 1344-1498; 1347-1610; 1350-1839; 1370-1831; 1384-1842; 1385-1762; 1393-1842; 1395-1703; 1397-1844; 1397-1842; 1402-1835; 1404-1838; 1407-1842; 1407-1839; 1416-1775; 1419-1844; 1418-1798; 1421-1742; 1421-1838; 1423-1775; 1425-1844; 1424-1846; 1426-1838; 1428-1709; 1428-1654; 1430-1842; 1431-1836; 1435-1842; 1437-1833; 1439-1833; 1438-1659; 1444-1841; 1441-1839; 1442-1842; 1443-1803; 1464-1830; 1446-1838; 1451-1791; 1459-1729; 1461-1839; 1460-1833; 1462-1840; 1466-1842; 1467-1855; 1473-1843; 1479-1842; 1480-1808; 1484-1751; 1491-1842; 1492-1891; 1493-1807; 1496-1842; 1497-1843; 1499-1839; 1500-1842; 1500-1838; 1501-1843; 1502-1847; 1503-1832; 1505-1743; 1505-1842; 1507-1845; 1514-1846; 1533-1774; 1534-1839; 1560-1833; 1561-1839; 1570-1847; 1600-1844; 1602-1767; 1605-1839; 1605-1837; 1606-1798; 1607-1839; 1610-1839; 1641-1836; 1641-1845; 1651-1792; 1651-1842; 1678-1821; 1686-1849; 1695-1833; 1860-2180; 2014-2459; 2027-2498; 2030-2287; 2129-2525; 2291-2525; 2376-2624; 2386-2788; 2517-3009; 2519-3077; 2523-3185; 2542-2983; 2564-3162; 2711-3129; 2719-3177; 2719-3167; 2722-3177; 2750-3181; 2795-3181; 2797-3177; 2826-3182; 2830-3180; 2873-3145
171/LG:1018931.3:2002JAN18 1-373; 1-439; 10-628; 456-1145; 716-1144
172/LG:1377369.45:2002JAN18 1-531; 4-531; 5-482; 6-529; 14-500; 14-710; 14-518; 15-263; 18-576; 18-528; 21-121; 20-598; 20-528; 20-548; 20-421; 21-439; 21-573; 21-533; 20-529; 21-465; 22-535; 23-562; 22-313; 23-528; 24-249; 24-548; 24-182; 25-576; 32-576; 25-517; 28-420; 30-586; 30-550; 32-454; 32-586; 34-639; 34-560; 35-614; 42-631; 39-487; 40-439; 48-230; 44-496; 45-322; 64-529; 73-517; 74-669; 82-533; 85-817; 95-352; 101-872; 104-737; 104-531; 105-532; 106-537; 110-531; 112-673; 116-579; 121-646;
121-532; 126-345; 129-402; 128-520; 132-537; 135-337; 136-531; 139-561; 144-534; 163-531;
173-670; 167-389; 175-537; 177-579; 178-421; 190-531; 222-532; 223-533; 244-535; 286-532;
288-600; 296-429; 302-536; 324-566; 324-532; 332-816; 334-532; 380-532; 381-573; 402-532; 402-495; 424-935; 471-532; 531-944; 532-1071; 537-747;
559-734; 563-1081; 583-1078; 584-844; 601-826; 613-1071; 617-874; 620-931; 619-888; 619-877;
622-900; 619-739; 627-878; 634-883; 634-851; 641-864; 661-966; 667-905; 669-1312; 669-1017;
674-939; 676-958; 688-942; 709-961; 711-955; 722-990; 725-1003; 728-921; 730-1071; 734-977; 744-901; 763-953; 777-967; 778-1045; 780-1050; 797-1080; 797-1066; 798-1056; 803-1058; 805-1064; 812-1071; 820-1033; 821-1062; 864-1073; 867-1071; 880-1071; 888-1185; 898-1161; 898-1077; 898-1071; 900-1163; 900-1071; 907-987; 911-1071; 917-1164; 924-1157; 924-1014; 928-1169; 939-1185; 940-1069; 945-1082; 945-1080; 947-1029; 954-1074; 986-1252; 1013-1237; 1016-1071; 1090-1404; 1090-1434; 1090-1342;
1090-1164; 1090-1155; 1090-1162; 1336-1456; 1408-2047; 1603-1875; 1629-2207; 1711-1964;
1711-1903; 1714-2261; 1718-2152; 1720-1816; 1744-2335; 1762-1839; 1806-2158; 1858-2277;
1930-2177; 2030-2441; 2105-2342; 2141-2441; 2331-2441; 2335-2441; 2363-2431; 20-268; 1862-2109; 1753-2001; 20-271; 24-282; 27-285; 30-291; 24-287; 1863-2127; 1305-1567; 20-285; 30-294; 1324-1583; 1980-2247; 1790-2060; 27-301; 1724-1997; 79-358; 1999-2269; 21-303; 20-301; 23-308; 21-310; 20-317; 30-325; 1878-2171; 1803-2102; 17-322; 1748-2094; 1755-2102; 1785-2137; 1931-2166; 1376-1612; 673-906; 1323-1566; 115-344; 1823-2050; 2038-2265; 1717-1928; 1973-2176; 2022-2228; 1305-1509; 1821-2020; 1955-2151; 1805-2012; 1996-2179; 28-200; 1002-1071; 2048-2217; 945-1071; 1762-1879; 1381-1483; 2062-2153; 1389-1454; 1442-1510; 14-73
173/LG:1135404.113:2002JAN18 1-366; 1-475; 10-129; 21-478; 47-305; 57-275; 59-281; 69-286; 90-387; 98-294; 131-653; 177-387; 223-461; 250-382; 254-841; 316-625; 364-1055; 380-579; 380-577; 565-1197; 569-812; 593-891; 594-846; 606-1174; 603-868; 608-785; 649-1224; 651-854; 655-1264; 666-1178; 669-1328; 673-806; 676-869; 691-982; 695-965; 696-922; 697-955; 697-978; 705-1138; 710-884; 718-1321; 714-892; 729-1310; 738-1008; 738-1172; 742-947; 745-1012; 753-1351; 756-1195; 751-1288; 763-1154; 767-1035; 769-1001; 773-1008; 774-1024; 766-999; 790-1096; 793-1098; 792-1057; 792-1044; 796-1147; 811-891; 812-1354; 805-1299; 820-1513; 822-1254; 825-1021; 822-1504; 827-1040; 840-992; 861-1094; 890-1118; 897-1156; 898-1097;
902-1045; 904-1098; 904-1155; 904-1074; 904-1128; 904-1084; 904-1123; 909-1219; 908-1179; 913-1132; 913-1221; 902-1463; 905-1467; 921-1183; 932-1164; 932-1601; 940-1022; 936-1548; 950-1225; 959-1168; 973-1499; 977-1483; 985-1253; 1015-1260; 1015-1399; 1021-1434; 1048-1493; 1050-1357; 1055-1473; 1056-1286; 1067-1194; 1072-1269; 1076-1333; 1076-1245; 1085-1769; 1097-1549; 1088-1323; 1088-1760; 1094-1321; 1098-1713; 1098-1323; 1099-1323; 1101-1327; 1121-1327; 1125-1309; 1131-1323; 1136-1323; 1137-1301; 1139-1843; 1169-1435; 1179-1412; 1182-1584; 1200-1416; 1206-1510; 1208-1515; 1209-1474; 1221-1323; 1252-1540; 1233-1500; 1243-1498; 1248-1523; 1259-1479; 1259-1469; 1264-1556; 1276-1414; 1289-1524; 1312-1829; 1315-1540; 1344-1559; 1347-1546; 1350-1534;
1350-1489; 1350-1543; 1350-1518; 1372-1487; 1397-2074; 1421-1659; 1434-1614; 1439-1516; 1449-1651; 1489-1688; 1489-1651; 1555-2067; 1579-2297; 1581-2276; 1600-2267; 1624-2111; 1627-2253; 1638-2264; 1641-1913; 1650-2392; 1665-2135; 1682-2328; 1682-1940; 1684-2437; 1691-2311; 1689-1924; 1691-1931; 1694-1944; 1695-2115; 1706-1991; 1708-2093; 1707-1973; 1707-1974; 1711-2169; 1718-2005; 1724-1951; 1727-1971; 1728-1984; 1731-1953; 1732-2322; 1738-1980; 1743-2040; 1745-1941; 1748-2006; 1753-1992; 1752-2164; 1761-2009; 1765-2304; 1768-2123; 1767-2002; 1773-2026; 1768-1966; 1774-2008; 1776-2458; 1783-2039; 1789-2331; 1793-2043; 1793-2039; 1795-2039; 1796-2036; 1799-2044; 1799-2036; 1800-2217; 1805-2036; 1808-2036; 1812-2389; 1816-2036; 1827-2036; 1824-2338; 1828-2364; 1828-2060; 1828-2036; 1830-2036; 1837-2130; 1840-2315; 1839-2221; 1843-2046; 1843-2268; 1844-2036; 1844-2120; 1844-2145; 1844-2007; 1846-2036; 1845-2036; 1847-2036; 1849-1917; 1849-2036; 1852-2036; 1854-2036; 1857-2156; 1862-2036; 1863-2117; 1865-2036;
173, cont. 1868-2265; 1870-2324; 1869-2039; 1871-2145; 1876-2036; 1877-2265; 1880-2153; 1881-2148; 1882-2159; 1881-2113; 1884-2123; 1887-2009; 1887-2036; 1891-2321; 1891-2148; 1891-2154; 1891-2036; 1893-2014; 1900-2172; 1911-2120; 1914-2157; 1916-2186; 1916-2052; 1916-2036; 1924-2220; 1924-2073; 1928-2036; 1933-2189; 1936-2200; 1942-2171; 1946-2211; 1950-2541; 1954-2220; 1954-2223; 1960-2203; 1966-2183; 1966-2161; 1971-2204; 1984-2221; 2060-2221; 2060-2247; 2060-2283; 2061-2327; 2070-2306; 2077-2163; 2174-2649; 2207-2754; 2207-2753; 2210-2756; 2223-2631; 2336-2593; 2356-2891; 2365-2633; 2451-2643; 2451-2965; 2532-2980; 2558-2636; 2581-2889; 2676-2961; 2692-2752; 2742-2980; 2800-3069; 3000-3330; 3000-3333; 3098-3333; 1391-1635; 1746-1990; 1637-1885; 1531-1780; 1393-1635; 2196-2421; 1539-1792; 1621-1873; 2161-2415; 2060-2219; 1828-2084; 1497-1753; 2451-2604; 1421-1679; 1904-2036; 1416-1676; 1439-1698; 1390-1649; 1911-2036; 1642-1903; 1657-1907; 2451-2631; 1643-1908; 1472-1736; 1688-1952; 1832-2096; 2451-2639; 1872-2036; 1394-1660;
1395-1661; 1350-1511; 2071-2341; 1843-2113; 1635-1904; 2061-2324; 1670-1941; 2060-2335; 1426-1699; 1676-1946; 2158-2425; 2060-2197; 2060-2236; 76-376; 1490-1788; 1541-1848; 1824-2129; 2060-2225; 1753-2050; 2278-2421; 1419-1732; 2080-2410; 1389-1711; 1691-2036; 2351-2421; 2453-2693; 1855-2036; 1348-1708; 1647-1998; 1707-2038; 1350-1658; 1429-1843; 1348-1794; 1483-1909; 1338-1812; 1343-1797; 2138-2433; 1651-2161; 1487-2033; 1603-2134; 1745-2047; 1834-2405; 1730-2286; 1424-2036; 1350-1876; 1422-2024; 1667-2340; 2126-2362; 1644-1881; 2061-2274; 1724-1946; 2160-2402; 2160-2403; 1711-1951; 1644-1889; 2180-2414; 654-884; 1990-2218; 1678-1906; 1754-1982; 1927-2034; 1835-2063; 1479-1707; 2177-2404; 1595-1819; 1733-1955; 2080-2302; 1901-2036; 2152-2361; 1542-1751; 1679-1887; 2166-2372; 2060-2170; 2451-2547; 1924-2124; 1854-2050; 1682-1877; 2163-2357; 2060-2252; 1800-1985; 1350-1443; 1493-1661; 971-1101; 1889-2032; 2452-2594; 2081-2217; 1724-1857; 104-236; 1630-1730; 2691-2752; 2679-2741; 2694-2752; 2159-2221
174/LG:1452606.33:2002JAN18 1-595; 61-274; 64-275; 101-651; 116-700; 306-521; 372-738; 406-696; 421-523; 421-809; 422-667; 435-648; 473-685; 487-716; 487-904; 500-1091; 536-1191; 536-696; 537-747; 538-696; 539-696; 540-696; 550-1091; 558-696; 564-934; 819-918; 820-918; 840-1101; 849-1106; 946-1114; 947-1432; 948-1164; 992-1432; 999-1458; 1001-1432; 1004-1432; 1014-1432; 1015-1458; 1015-1322; 1022-1469; 1023-1355; 1038-1432; 1051-1449; 1052-1459; 1076-1457; 1078-1388; 1099-1408; 1098-1449; 1109-1459; 1115-1449; 1159-1335; 1194-1440; 1279-1464; 1378-1449
TABLE 5
SEQ ID NO:/Component ID Fragments
175/LG:018099.22:2002JAN18 1-502; 30-801; 48-425; 53-514; 58-598; 67-354; 87-386; 94-358; 135-514; 139-514; 144-518; 182-518; 242-424; 253-423; 254-514; 260-502; 283-518; 314-514; 336-514; 384-514; 386-923; 529-1070; 611-1037; 621-1181; 621-1240; 680-920; 684-838; 711-893; 731-1148; 741-1197; 747-926; 894-1421; 896-1433; 1032-1376; 1032-1806; 1096-1751; 1124-1401; 1168-1581; 1168-1643; 1173-1463; 1218-1658; 1255-1580; 1275-2094; 1393-2161; 1402-1675; 1419-1673; 1419-2003; 1435-1792; 1468-1719; 1507-2058; 1530-2203; 1552-1705; 1564-2241; 1581-1838; 1581-2094; 1584-2218; 1598-1841; 1615-1816; 1641-1862; 1643-1768; 1657-1805; 1661-2204; 1661-2234; 1667-2248; 1682-2249; 1684-2246; 1692-1921; 1692-1937; 1710-1977; 1726-2283; 1739-2174; 1748-2355; 1776-2405; 1798-2236; 1837-2063; 1837-2402; 1860-2085; 1879-2057; 1879-2524; 1896-2149; 1899-2144; 1923-2256; 1932-2095; 1948-2579; 1974-2504; 2000-2761; 2027-2283; 2038-2650; 2045-2348; 2056-2268; 2061-2271; 2062-2226; 2092-2411; 2092-2442; 2113-2393; 2191-2478; 2211-2821; 2230-2821; 2246-2837; 2286-2558; 2317-2512; 2336-2517; 2342-2642; 2342-2649; 2342-2734; 2345-2744; 2346-2748; 2349-2616; 2349-3954; 2361-2613; 2370-2626; 2370-2953; 2371-2711; 2377-2647; 2377-2775; 2380-2659; 2392-2584; 2395-2676; 2429-2677; 2436-2706; 2452-2665; 2463-3954; 2471-2977; 2481-2768; 2487-2689; 2491-2910; 2492-3252; 2511-2800; 2529-2790; 2544-3188; 2583-3062; 2591-3028; 2625-2895; 2629-2795; 2637-3323; 2643-2911; 2643-3011; 2693-3297; 2696-2957; 2700-3057; 2700-3058; 2722-2959; 2734-3052; 2736-3254; 2738-3015; 2744-2956; 2750-3041; 2780-3540; 2787-3270; 2792-3218; 2802-3225; 2804-3354; 2810-3196; 2817-3218; 2828-3182; 2870-3124; 2870-3307; 2890-3112; 2903-3148; 2945-3237; 2951-3305; 2951-3384; 2970-3429; 2983-3532; 2997-3611; 2999-3260; 3000-3148; 3006-3540; 3029-3306; 3030-3209; 3032-3237; 3032-3243; 3045-3687; 3048-3254; 3048-3489; 3059-3326; 3060-3313; 3088-3546; 3089-3364; 3105-3607; 3132-3436; 3137-3747; 3172-3937; 3190-3423; 3195-3376; 3196-3408; 3208-3486; 3220-3467; 3221-3476; 3228-3755; 3244-3474; 3254-3550; 3259-3490; 3263-3551; 3263-3842; 3287-3535; 3291-3432; 3304-3791; 3304-3848; 3306-3566; 3307-3505; 3307-3896; 3312-3540; 3312-3839; 3314-3946; 3315-3643; 3340-3594; 3341-3530; 3341-3593; 3341-3619; 3343-3511; 3344-3409; 3344-3556; 3344-3727; 3344-3916; 3353-3844; 3354-3563; 3359-3478; 3400-3666; 3400-3936; 3427-3536; 3439-3664; 3454-3914; 3464-3706; 3464-3909; 3465-3954; 3483-3955; 3495-3629; 3504-3759; 3512-3958; 3516-3646; 3518-3956; 3518-3958; 3520-3832; 3521-3749; 3521-3959; 3523-3837; 3523-3950; 3524-3957; 3525-3964; 3526-3954; 3527-3813; 3527-3954; 3531-3954; 3535-3954; 3535-3954; 3535-3958; 3536-3957; 3537-3794; 3540-3954;
175, cont. 3542-3948; 3542-3948; 3544-3954; 3547-3950; 3547-3954; 3547-3954; 3549-3954; 3550-3808; 3551-3950; 3551-3958; 3557-3955; 3560-3954; 3562-3746; 3567-3886; 3567-3950; 3572-3811; 3577-3758; 3577-3799; 3577-3808; 3583-3958; 3588-3831; 3591-3954; 3593-3916; 3594-3954; 3606-3955; 3610-3950; 3611-3950; 3612-3917; 3614-3950; 3614-3957; 3615-3954; 3615-3954; 3615-3957; 3615-3958; 3616-3954; 3616-3954; 3617-3954; 3626-3948; 3648-3954; 3652-3795; 3661-3957; 3663-3950; 3672-3903; 3672-3950; 3679-3956; 3080-3954; 3689-3954; 3089-3954; 3689-3959; 3691-3957; 3695-3950; 3706-3957; 3707-3950; 3713-3834; 3724-3944; 3724-3950; 3724-3954; 3724-3955; 3724-3959; 3727-3954; 3728-3958; 3730-3954; 3732-3947; 3738-3944; 3738-3954; 3740-3954; 3740-3954; 3740-3954; 3740-3960; 3741-3884; 3754-3857; 3755-3914; 3762-3950; 3774-3951; 3774-3955; 3778-3954; 3796-3954; 3807-3948; 3811-3948; 3848-3943; 3876-3954; 3883-3954; 3886-3954
176/LG:7771625.8:2002JAN18 1-548
177/LG:1513012.6:2002JAN18 1-311; 1-556; 3-371; 6-464; 14-544; 3-665; 3-550; 3-545; 28-652; 34-638; 56-480; 118-756; 118-669; 123-757; 262-789; 286-756; 353-784; 607-850; 743-1186; 755-925; 786-1349; 1193-1840; 1193-1421; 1229-1600; 1231-1450; 1396-1914; 1572-1830; 1615-2083; 1617-1852; 1621-2154; 1691-1913; 1753-2222; 1853-2321; 1867-2016; 1940-2203; 1984-2141; 2007-2354; 2025-2264; 2130-2392; 2141-2399; 2178-2478; 2363-2815; 2364-2651; 2405-2659; 2486-2706; 2446-2554; 2476-2873; 2593-2905; 2593-2790; 2593-2835; 2593-2703; 2595-2850; 2602-3088; 2602-2807; 2608-2820; 2625-2725; 2641-3109; 2641-3118; 2644-3096; 2644-2888; 2644-3037; 2651-3097; 2680-3264; 2684-3037; 2696-2887; 2711-3237; 2736-3280; 2747-3001; 2801-3004; 2801-2980; 2799-3191; 2809-3037; 2813-3018; 2814-3036; 2814-3079; 2853-3098; 2861-3100; 2862-2956; 2884-3215; 2893-3212; 2906-3060; 2909-3232; 2927-3072; 2955-3280; 2974-3234; 3014-3266; 3029-3276; 3036-3280; 3039-3280; 3060-3247; 3091-3547; 3088-3560; 3093-3272; 3119-3278; 3119-3360; 3121-3576; 3143-3380; 3146-3280; 3149-3435; 3158-3445; 3161-3381; 3191-3411; 3191-3459; 3191-3280; 3204-3356; 3219-3410; 3236-3354; 3312-3422; 3314-3584; 3322-3795; 3333-3576; 3336-3576; 3339-3582; 3367-3613; 3370-3827; 3384-3692; 3401-3601; 3478-3689; 3492-3543; 3509-3928; 3603-4138; 3605-4138; 3614-3816; 3711-4032; 3713-4168; 3713-3858; 3713-4147; 3713-4247; 3712-4089; 3714-4146; 3714-3786; 3716-4137; 3750-4297; 3752-4096; 3754-4379; 3754-4114; 3755-4133; 3756-3864; 3758-4029; 3816-4426; 3812-3981; 3813-4064; 3828-4088;
177, cont. 3831-4038; 3830-4093; 3843-4063; 3855-4090; 3859-4067; 3861-4064; 3881-4164; 3913-4299; 3910-4167; 3929-4357; 3929-4467; 3916-4766; 3943-4418; 3951-4228; 3953-4219; 3962-4196; 3979-4224; 4059-4299; 4069-4805; 4075-4321; 4078-4523; 4081-4296; 4081-4319; 4106-4688; 4107-4369; 4087-4400; 4112-4380; 4138-4437; 4117-4345; 4142-4367; 4120-4567; 4125-4620; 4148-4393; 4136-4375; 4149-4398; 4161-4649; 4187-4430; 4165-4917; 4204-4467; 4188-4721; 4190-4556; 4193-4755; 4196-4556; 4208-4573; 4219-4556; 4246-4686; 4243-4317; 4230-4570; 4230-4581; 4230-4569; 4230-4556; 4230-4572; 4230-4567; 4230-4535; 4230-4566; 4251-4486; 4283-4552; 4258-4789; 4263-4436; 4311-4385; 4332-4603; 4337-4433; 4337-4603; 4347-4578; 4382-4901; 4407-4806; 4409-4667; 4478-4707; 4482-4730; 4505-4798; 4536-4764; 4543-4778; 4562-4795; 4587-4985; 4588-4833; 4611-5012; 4661-4917; 4675-4886; 4710-5020; 4706-4936; 4722-5004; 4783-5019; 4857-5245; 4893-4984; 4965-5019; 4978-5245; 5093-5702; 5288-5702; 5394-5700
178/LG:903956.34:2002JAN18 3726-4068; 3468-4023; 3581-4023; 3822-4023; 3619-4019; 3639-3997; 3821-3997; 3886-3997; 3488-3997; 3589-3997; 3814-3997; 3840-3997; 3843-3997; 3858-3997; 3877-3997; 3899-3997; 3906-3997; 3926-3997; 3933-3997; 3934-3997; 3937-3997; 3938-3997; 3946-3997; 3904-3997; 3818-3995; 3819-3995; 3448-3995; 3710-3986; 3724-3986; 3810-3981; 3745-3978; 3762-3959; 3538-3905; 3665-3905; 3593-3886; 3569-3834; 3568-3834; 3207-3810; 3504-3807; 3526-3801; 3352-3796; 3666-3761; 3537-3758; 3532-3746; 3487-3743; 3479-3709; 3314-3706; 3270-3705; 3417-3694; 3361-3676; 3401-3671; 3452-3663; 3247-3656; 3531-3656; 3394-3655; 3069-3648; 3362-3633; 3075-3630; 3339-3627; 3342-3615; 3348-3615; 3359-3591; 3167-3591; 3362-3577; 3073-3570; 2951-3561; 3050-3558; 3006-3532; 3211-3500; 3205-3463; 2967-3461; 3147-3441; 2968-3440; 3182-3430; 3213-3430; 2967-3422; 2997-3421; 2962-3418; 2965-3415; 3152-3415; 2965-3414; 2969-3411; 2967-3409; 2962-3405; 2963-3399; 2967-3397; 2974-3397; 2971-3395; 2960-3383; 3085-3383; 3121-3380; 2965-3376; 2964-3375; 2963-3375; 3133-3371; 2967-3365; 2964-3359; 2967-3346; 2967-3344; 2968-3340; 2954-3331; 2967-3331; 3109-3329; 2966-3315; 2966-3305; 2967-3300; 2967-3288; 2788-3272; 2967-3258; 2967-3254; 2967-3247; 2945-3243; 2787-3243; 2946-3240; 2980-3226; 2965-3204; 2987-3204; 2954-3200; 2965-3152;
178, cont. 2967-3144; 2967-3143; 2914-3130; 2672-3130; 2967-3094; 2995-3094; 2974-3093; 2960-3031; 2727-2957; 2607-2897; 2395-2884; 2610-2884; 2318-2854; 2408-2849; 2749-2845; 2238-2579; 2231-2579; 2226-2516; 2230-2493; 2134-2440; 2234-2436; 1823-2385; 2087-2385; 2265-2384; 2000-2381; 2040-2381; 2002-2380; 1911-2378; 1785-2375; 1962-2375; 2007-2375; 2109-2375; 2176-2375; 1987-2374; 2003-2374; 2185-2375; 1899-2373; 1958-2373; 2050-2373; 2199-2373; 2007-2372; 2060-2372; 2002-2371; 2074-2371; 2106-2371; 63-2369; 1932-2370; 1940-2370; 1952-2370; 2005-2370; 2023-2370; 2128-2369; 2148-2370; 2222-2370; 1730-2371; 1915-2369; 1986-2369; 1968-2369; 2009-2369; 2050-2369; 2007-2369; 2064-2369; 2068-2369; 2073-2369; 2097-2369; 2095-2369; 2134-2369; 2087-2369; 2156-2369; 2176-2369; 2185-2369; 2195-2369; 2216-2369; 2235-2369; 2319-2369; 1986-2368; 2119-2368; 2192-2369; 2218-2367; 2071-2367; 2077-2367; 2146-2367; 2192-2367; 2224-2367; 2179-2366; 2189-2366; 2133-2366; 2146-2332; 2217-2330; 1819-2330; 1867-2330; 1768-2327; 1950-2336; 1769-2322; 2036-2318; 1817-2308; 1904-2286; 1969-2254; 1647-2249; 1841-2235; 1984-2234; 2025-2220; 1925-2189; 1984-2175; 1921-2172; 1934-2172; 1936-2169; 1437-2168; 1715-2137; 1889-2130; 1619-2111; 1853-2110; 1853-2108; 1889-2101; 1804-2098; 1904-2094; 1846-2083; 1847-2077; 1778-2063; 1835-2060; 1804-2055; 1788-2046; 1763-2030; 1761-2007; 1877-2006; 1627-1998; 1619-1993; 1743-1940; 1682-1926; 1682-1914; 1759-1911; 1649-1910; 1649-1900; 1690-1898; 1679-1897; 1668-1872; 1294-1864; 1636-1803; 1715-1796; 1581-1732; 1496-1688; 1361-1650; 1497-1631; 1365-1614; 1103-1587; 1197-1336; 881-1107; 890-1099; 820-1071; 685-900; 673-881; 342-594; 369-561; 74-543; 408-540; 233-478; 220-470; 66-411; 112-381; 67-338; 66-324; 1-261; 40-262; 40-260; 1967-2324; 1830-2252; 1929-2371; 3419-3851; 3205-3629; 2967-3439; 2967-3423; 2967-3418; 2986-3497; 3419-3997; 2808-3477; 655-1329; 3170-3808; 2985-3747; 1597-2299; 3502-3680; 2398-2561
179/LG:331171.22:2002JAN18 1-614; 217-727; 220-475; 222-512; 237-672; 245-958; 254-552; 255-723; 258-523; 277-671; 273-550; 280-551; 309-882; 310-537; 336-413; 349-609; 360-604; 373-633; 379-647; 381-608; 383-645; 383-476; 388-796; 432-686; 444-654; 449-695; 455-738; 459-717; 466-730; 470-716; 470-702; 481-726; 482-1126; 512-744; 518-757; 519-767; 559-830; 562-796; 578-993; 583-1098; 586-839; 590-878; 619-833; 624-858; 631-839; 633-838; 665-938; 671-826; 692-915; 705-1038; 704-979; 704-944; 706-917; 715-931; 719-966; 753-1114; 766-1114; 780-1114; 780-1006; 782-1017; 791-1059; 791-1080; 791-1193; 792-1209; 791-1071; 807-1080; 819-1026; 827-1114; 852-1372; 856-1128; 861-1114; 870-1141; 888-1114; 935-1114; 954-1114; 961-1114; 961-1112; 976-1114; 982-1119; 987-1127; 993-1114; 1043-1113; 1044-1114; 1056-1114; 1072-1127
180/LG:380305.28:2002JAN18 1-256; 1-608; 98-453; 107-342; 253-830; 267-830; 491-793; 675-830; 734-1146; 739-965; 755-919; 811-1088; 896-1114
181/LG:227928.19:2002JAN18 1-430; 127-666
182/LG:1099593.39:2002JAN18 1-495; 324-930; 460-930; 617-1208; 634-1134; 652-1271; 764-1184; 839-1100; 839-1268
183/LG:1501223.87:2002JAN18 1-306; 1-132; 8-139; 9-133; 8-132; 9-293; 16-631; 24-294; 150-399; 1-188; 1-291; 213-632; 534-1139; 542-1089; 550-807; 578-933; 584-787; 609-883; 616-881; 615-924; 620-902; 620-834; 624-981; 642-832; 662-883; 665-908; 666-840; 674-923; 680-799; 691-932; 695-780; 697-928; 698-788; 708-952; 711-1142; 713-970; 717-993; 719-939; 719-933; 742-857; 742-967; 744-935; 784-885; 1007-1140; 8-115; 1-124; 604-740; 903-1105; 8-108
184/LG:7690039.1:2002JAN18 1-424; 179-647; 350-707; 579-980; 591-740; 668-881
185/LG:332701.3:2002JAN18 1-368; 184-566; 260-572; 277-741; 277-532; 282-740; 290-477; 290-391; 290-698; 290-351; 290-716; 290-702; 290-395; 298-567; 309-865; 335-565; 349-700; 354-572; 602-1197; 730-964; 730-1073; 747-833; 787-1054; 787-984; 797-1059; 921-1369; 936-1180; 950-1208; 958-1216; 1016-1519; 1018-1270; 1043-1277; 1062-1325; 1062-1310; 1101-1490; 1120-1330; 1134-1492; 1135-1480; 1135-1471; 1134-1477; 1134-1352; 1135-1431; 1134-1415; 1135-1378; 1135-1425; 1134-1527; 1144-1674; 1176-1611; 1219-1388; 1233-1942; 1241-1485; 1294-1612; 1297-1571; 1333-1760; 1378-1961; 1400-1982; 1434-1628; 1497-2105; 1497-2024; 1512-2109; 1530-1838; 1550-1787; 1575-2103; 1575-1808; 1594-2300; 1682-1852; 1741-2138; 1755-2019; 1755-2366; 1800-2041; 1807-2095; 1858-2246; 1859-2056; 1931-2709; 1946-2201; 1971-2109; 1988-2485; 1988-2214; 2045-2619; 2073-2310; 2086-2713; 2092-2603; 2101-2370; 2101-2550; 2110-2649; 2110-2726; 2155-2431; 2159-2592; 2160-2353; 2204-2818; 2213-2709; 2213-2439; 2277-2568; 2287-2811; 2275-2504; 2292-2918; 2304-2583; 2308-2762; 2319-2932; 2324-2527; 2312-2510; 2337-2601; 2337-2872; 2363-2806; 2364-2608; 2370-2894; 2370-2881; 2376-2920; 2378-2843; 2387-2639; 2437-2916; 2440-2663; 2445-2643; 2444-2697; 2444-2909; 2446-2642; 2450-2919; 2451-2625; 2453-2641; 2457-2908; 2458-2921; 2460-2916; 2464-2916; 2468-2652; 2468-2918; 2472-2935; 2473-2919; 2474-2638; 2474-2932; 2476-2922; 2480-2885; 2484-2935; 2485-2911; 2488-2915; 2486-2910; 2490-2893; 2504-2911; 2512-2944; 2515-2920; 2525-2936; 2539-2945; 2536-2838
185, cont. 2554-2911; 2594-2915; 2598-2920; 2595-2911; 2600-2922; 2601-2936; 2605-2918; 2608-2932; 2602-2769; 2612-2959; 2612-2894; 2612-2834; 2615-2918; 2621-2906; 2619-2910; 2621-2882; 2621-2924; 2621-2879; 2629-2918; 2631-2911; 2640-2911; 2650-2911; 2653-2911; 2663-2918; 2649-2816; 2678-2920; 2686-2906; 2692-2873; 2721-2906; 2731-2876; 2741-2940; 2754-2911; 2758-2936; 2759-2926; 2761-2920; 2767-3396; 2767-3254; 2777-2906; 2793-3212; 2794-3176; 2796-2918; 2839-3217; 2853-2921; 2936-3396; 2977-3208; 2977-3209; 2979-3597; 3067-3412; 3138-3538; 3246-3583; 3283-6415; 3284-3471; 3287-6656; 3287-3650; 3287-4626; 3300-3782; 3304-3999; 3306-6524; 3323-3821; 3336-3415; 3352-3542; 3355-3583; 3365-3962; 3415-3821; 3424-4202; 3424-3583; 3529-3821; 3560-3837; 3605-3973; 3620-3802; 3653-4291; 3962-4220; 4139-4508; 4605-5129; 4625-5231; 4861-5127; 4861-5327; 4880-5287; 5144-5770; 5150-5382; 5295-5545; 5295-5736; 5514-5784; 5543-5964; 5755-6014; 5889-6656; 5892-6591; 5974-6502; 6079-6656; 6175-6582; 6238-6502; 6256-6484
186/LG:237963.28:2002JAN18 1-594; 1-593; 468-1122; 499-1055; 564-769; 632-889; 652-914; 768-864; 742-1362; 753-1137; 754-962; 757-993; 781-1351; 935-1409; 952-1152; 956-1293; 961-1276; 974-1371; 1038-1344; 1109-1720; 1122-1783; 1217-1893; 1219-1779; 1221-1417; 1221-1501; 1259-1548; 1418-1625; 1554-2012
187/LG:245267.1:2002JAN18 1-212; 1-479; 2-448; 6-300; 17-508; 18-454; 49-440; 64-427; 64-280; 108-639; 107-318; 108-620; 108-634
188/LG:7761954.21:2002JAN18 1-477; 1-484; 1-435; 4-488; 177-555; 220-321; 237-774; 358-562; 359-564
TABLE 6
SEQ ID NO: Template ID Tissue Distribution
1 LG:1100267.1:2002JAN18 Connective Tissue - 47%, Musculoskeletal System - 40%, Nervous System - 13%
2 LG:1376818.27:2002JAN18 Nervous System - 24%, Connective Tissue - 13%
3 LG:990561.44:2002JAN18 Germ Cells - 14%, Pancreas - 11%
4 LG:990855.9:2002JAN18 Skin - 39%, Connective Tissue - 22%, Embryonic Structures - 12%, Cardiovascular System - 12%
5 LG:898483.1:2002JAN18 Connective Tissue - 48%, Skin - 44%
6 LG:150971.1:2002JAN18 Unclassified/Mixed - 33%, Cardiovascular System - 27%, Respiratory System - 13%, Digestive System - 13%
7 LG:7771532.20:2002JAN18 Male Genitalia - 50%, Digestive System - 50%
8 LG:1501261.5:2002JAN18 Unclassified/Mixed - 65%, Nervous System - 17%
9 LG:1454772.2:2002JAN18 Connective Tissue - 11%
10 LG:203951.12:2002JAN18 Germ Cells - 22%
11 LG:142131.16:2002JAN18 Hemic and Immune System - 51%, Connective Tissue - 11%
12 LG:333034.3:2002JAN18 Connective Tissue - 69%, Skin - 14%
13 LG:1291525.9:2002JAN18 Pancreas - 20%, Sense Organs - 19%, Skin - 13%
14 LG:7771246.13:2002JAN18 Liver - 62%, Urinary Tract - 26%, Hemic and Immune System - 12%
15 LG:1125820.1:2002JAN18 Unclassified/Mixed - 63%, Exocrine Glands - 38%
16 LG:299789.1:2002JAN18 Cardiovascular System - 80%, Hemic and Immune System - 20%
17 LG:044888.1:2002JAN18 Germ Cells - 53%, Endocrine System - 20%, Digestive System - 18%
18 LG:410020.3:2002JAN18 Hemic and Immune System - 100%
19 LG:7684165.8:2002JAN18 Connective Tissue - 19%, Cardiovascular System - 19%, Unclassified/Mixed - 14%
20 LG:358050.2:2002JAN18 Sense Organs - 94%
21 LG:2309057:2002JAN18 Skin - 15%, Cardiovascular System - 12%, Connective Tissue - 11%
22 LG:7688735.3:2002JAN18 Cardiovascular System - 70%, Male Genitalia - 20%, Nervous System - 10%
23 LG:445084.2:2002JAN18 Germ Cells - 11%, Connective Tissue - 10%
24 LG:7696681.1:2002JAN18 Male Genitalia - 53%, Unclassified/Mixed - 26%, Respiratory System - 11%, Female Genitalia - 11%
25 LG:1446403.4:2002JAN18 Respiratory System - 50%, Digestive System - 50%
26 LG:1042935.2:2002JAN18 Hemic and Immune System - 53%, Respiratory System - 41%
27 LG:7691854.1:2002JAN18 Sense Organs - 52%, Unclassified/Mixed - 16%, Liver - 11%
28 LG:979580.1:2002JAN18 Embryonic Structures - 25%, Cardiovascular System - 25%, Female Genitalia - 14%, Nervous System - 14%
29 LG:185136.4:2002JAN18 Unclassified/Mixed - 57%
30 LG:1398319.1:2002JAN18 Endocrine System - 67%, Respiratory System - 33%
31 LG:375724.10:2002JAN18 Male Genitalia - 49%, Unclassified/Mixed - 18%
32 LG:220407.7:2002JAN18 widely distributed
33 LG:259850.1:2002JAN18 Liver - 88%, Nervous System - 13%
34 LG:435726.8:2002JAN18 Unclassified/Mixed - 50%, Endocrine System - 40%, Nervous System - 10%
35 LG:271394.44:2002JAN18 Male Genitalia - 19%, Female Genitalia - 13%, Digestive System - 13%
36 LG:7761755.9:2002JAN18 Digestive System - 35%, Hemic and Immune System - 20%, Nervous System - 15%
37 LG:7762920.1:2002JAN18 Nervous System - 100%
38 LG:7763326.6:2002JAN18 Embryonic Structures - 29%, Digestive System - 29%, Hemic and Immune System - 21%
39 LG:242234.14:2002JAN18 Sense Organs - 10%
40 LG:291526.1:2002JAN18 Endocrine System - 20%, Embryonic Structures - 17%, Digestive System - 17%
41 LG:243209.10:2002JAN18 Unclassified/Mixed - 56%, Connective Tissue - 26%, Male Genitalia - 15%
42 LG:378592.15:2002JAN18 Embryonic Structures - 39%, Respiratory System - 28%, Hemic and Immune System - 28%
43 LG:357276.11:2002JAN18 Skin - 10%
44 LG:1507027.3:2002JAN18 Nervous System - 19%, Unclassified/Mixed - 13%, Pancreas - 12%
45 LG:201342.4:2002JAN18 Embryonic Structures - 20%, Unclassified/Mixed - 19%
46 LG:327504.9:2002JAN18 Respiratory System - 67%, Nervous System - 33%
47 LG:346506.19:2002JAN18 Germ Cells - 15%
48 LG:7771048.3:2002JAN18 Endocrine System - 100%
49 LG:395081.7:2002JAN18 Exocrine Glands - 100%
50 LG:1452709.28:2002JAN18 Stomatognathic System - 40%, Sense Organs - 19%, Embryonic Structures - 13%
51 LG:991162.52:2002JAN18 Nervous System - 70%
52 LG:346677.11:2002JAN18 Respiratory System - 50%, Digestive System - 50%
53 LG:1400284.13:2002JAN18 Sense Organs - 26%, Germ Cells - 17%, Exocrine Glands - 10%
54 LG:7698465.26:2002JAN18 widely distributed
55 LG:7698696.18:2002JAN18 Pancreas - 44%, Liver - 12%
56 LG:350410.3:2002JAN18 Male Genitalia - 16%, Sense Organs - 15%
57 LG:7770751.8:2002JAN18 Respiratory System - 100%
58 LG:052513.3:2002JAN18 Connective Tissue - 19%, Exocrine Glands - 19%, Hemic and Immune System - 19%
59 LG:7692334.1:2002JAN18 Endocrine System - 31%, Unclassified/Mixed - 29%, Exocrine Glands - 14%
60 LG:199284.11:2002JAN18 NO DATA
61 LG:7683993.13:2002JAN18 Sense Organs - 83%
62 LG:1079823.1:2002JAN18 Urinary Tract - 38%, Exocrine Glands - 38%, Female Genitalia - 25%
63 LG:1082263.10:2002JAN18 Respiratory System - 14%, Endocrine System - 12%, Embryonic Structures - 11%, Cardiovascular System - 11%, Liver - 11%
64 LG:1076162.1:2002JAN18 Male Genitalia - 57%, Digestive System - 29%, Nervous System - 14%
65 LG:404157.1:2002JAN18 Unclassified/Mixed - 27%, Skin - 16%, Respiratory System - 12%
66 LG:474725.1:2002JAN18 Unclassified/Mixed - 29%, Endocrine System - 17%, Exocrine Glands - 14%
67 LG:1080918.1:2002JAN18 Connective Tissue - 36%, Respiratory System - 14%, Nervous System - 11%, Endocrine System - 11%
68 LG:1092343.1:2002JAN18 Germ Cells - 51%
69 LG:7684505.1:2002JAN18 Hemic and Immune System - 47%, Urinary Tract - 20%, Exocrine Glands - 20%
70 LG:7689627.1:2002JAN18 Female Genitalia - 67%, Hemic and Immune System - 33%
71 LG:122863.1:2002JAN18 Urinary Tract - 22%, Respiratory System - 19%, Nervous System - 19%, Unclassified/Mixed - 19%
72 LG:7690093.1:2002JAN18 Stomatognathic System - 74%
73 LG:1449021.1:2002JAN18 Germ Cells - 69%, Unclassified/Mixed - 11%
74 LG:958155.1:2002JAN18 Exocrine Glands - 78%, Digestive System - 22%
75 LG:7684559.1:2002JAN18 Sense Organs - 25%, Liver - 11%
76 LG:080328.2:2002JAN18 Liver - 100%
77 LG:7687730.5:2002JAN18 Digestive System - 17%, Urinary Tract - 16%, Hemic and Immune System - 12%, Nervous System - 12%
78 LG:7691462.5:2002JAN18 Digestive System - 100%
79 LG:7690229.9:2002JAN18 Unclassified/Mixed - 100%
80 LG:7691117.5:2002JAN18 Digestive System - 18%, Unclassified/Mixed - 17%, Female Genitalia - 11%, Exocrine Glands - 11%
81 LG:413642.1:2002JAN18 Hemic and Immune System - 100%
82 LG:7771639.1:2002JAN18 Embryonic Structures - 26%, Female Genitalia - 18%, Pancreas - 16%
83 LG:7684553.3:2002JAN18 Female Genitalia - 100%
84 LG:7690374.7:2002JAN18 Nervous System - 100%
85 LG:7690065.3:2002JAN18 Female Genitalia - 67%, Nervous System - 33%
86 LG:7690583.5:2002JAN18 Nervous System - 100%
87 LG:7771893.1:2002JAN18 Female Genitalia - 25%, Endocrine System - 17%, Cardiovascular System - 17%
88 LG:7691582.2:2002JAN18 Sense Organs - 19%, Cardiovascular System - 13%, Exocrine Glands - 12%
89 LG:7687809.2:2002JAN18 Unclassified/Mixed - 19%, Male Genitalia - 15%, Embryonic Structures - 13%
90 LG:7691200.3:2002JAN18 Pancreas - 23%, Connective Tissue - 18%, Hemic and Immune System - 13%
91 LG:405709.4:2002JAN18 Endocrine System - 30%, Connective Tissue - 26%, Exocrine Glands - 26%
92 LG:982979.1:2002JAN18 Endocrine System - 19%, Connective Tissue - 16%, Digestive System - 16%
93 LG:7669310.1:2002JAN18 Digestive System - 100%
94 LG:231546.6:2002JAN18 Nervous System - 100%
95 LG:7693668.4:2002JAN18 Urinary Tract - 100%
96 LG:7771057.9:2002JAN18 Nervous System - 36%, Unclassified/Mixed - 36%, Cardiovascular System - 29%
97 LG:114448.25:2002JAN18 Embryonic Structures - 10%
98 LG:180803.3:2002JAN18 Male Genitalia - 100%
99 LG:1094595.3:2002JAN18 Stomatognathic System - 18%
100 LG:150288.12:2002JAN18 Embryonic Structures - 13%, Female Genitalia - 13%, Unclassified/Mixed - 12%
101 LG:7761700.28:2002JAN18 Cardiovascular System - 16%, Female Genitalia - 11%, Endocrine System - 11%, Skin - 11%
102 LG:1093982.42:2002JAN18 Stomatognathic System - 14%
103 LG:7762752.1:2002JAN18 Digestive System - 71%, Exocrine Glands - 13%
104 LG:013006.11:2002JAN18 Stomatognathic System - 82%
105 LG:054509.10:2002JAN18 Nervous System - 11%
106 LG:345276.3:2002JAN18 Germ Cells - 13%, Sense Organs - 10%
107 LG:247354.20:2002JAN18 Urinary Tract - 50%, Exocrine Glands - 50%
108 LG:1454791.33:2002JAN18 Urinary Tract - 40%, Nervous System - 27%, Male Genitalia - 13%, Female Genitalia - 13%
109 LG:7690539.5:2002JAN18 Endocrine System - 69%, Embryonic Structures - 24%
110 LG:984007.4:2002JAN18 Pancreas - 17%, Nervous System - 14%, Urinary Tract - 12%, Digestive System - 12%
111 LG:1093386.25:2002JAN18 Stomatognathic System - 15%, Embryonic Structures - 11%
112 LG:7693871.6:2002JAN18 Nervous System - 21%, Liver - 21%, Musculoskeletal System - 18%
113 LG:7693934.1:2002JAN18 Female Genitalia - 29%, Cardiovascular System - 29%, Urinary Tract - 21%
114 LG:7697553.34:2002JAN18 Hemic and Immune System - 22%, Respiratory System - 20%, Female Genitalia - 15%
115 LG:337345.5:2002JAN18 Sense Organs - 28%, Embryonic Structures - 19%, Germ Cells - 18%
116 LG:410680.7:2002JAN18 Germ Cells - 67%
117 LG:7771583.2:2002JAN18 Germ Cells - 39%, Nervous System - 14%, Endocrine System - 11%
118 LG:074994.14:2002JAN18 Germ Cells - 17%, Pancreas - 13%, Embryonic Structures - 12%
119 LG:7691131.1:2002JAN18 Liver - 20%, Pancreas - 19%, Unclassified/Mixed - 14%
120 LG:983975.1:2002JAN18 Pancreas - 39%, Embryonic Structures - 34%
121 LG:1383194.7:2002JAN18 widely distributed
122 LG:1328573.4:2002JAN18 Nervous System - 22%, Male Genitalia - 13%, Unclassified/Mixed - 13%, Hemic and Immune System - 13%
123 LG:7692963.1:2002JAN18 Female Genitalia - 34%, Urinary Tract - 16%, Skin - 16%
124 LG:7696423.1:2002JAN18 Musculoskeletal System - 33%, Urinary Tract - 17%, Digestive System - 17%, Hemic and Immune System - 17%, Exocrine Glands - 17%
125 LG:7696234.1:2002JAN18 Digestive System - 41%, Musculoskeletal System - 16%, Pancreas - 13%
126 LG:1388299.1:2002JAN18 Nervous System - 60%, Respiratory System - 40%
127 LG:978521.5:2002JAN18 Female Genitalia - 50%, Digestive System - 38%, Hemic and Immune System - 13%
128 LG:7692599.9:2002JAN18 Sense Organs - 20%, Germ Cells - 15%, Embryonic Structures - 12%
129 LG:1452678.13:2002JAN18 Liver - 18%, Nervous System - 11%
130 LG:332947.1:2002JAN18 Digestive System - 63%, Cardiovascular System - 25%, Respiratory System - 13%
131 LG:1292520.13:2002JAN18 Musculoskeletal System - 18%, Female Genitalia - 17%, Respiratory System - 15%, Digestive System - 15%
132 LG:7750009.1:2002JAN18 Nervous System - 100%
133 LG:238322.4:2002JAN18 Sense Organs - 24%, Male Genitalia - 20%, Musculoskeletal System - 19%
134 LG:7694382.4:2002JAN18 Nervous System - 100%
135 LG:1329198.3:2002JAN18 Pancreas - 97%
136 LG:345314.33:2002JAN18 Hemic and Immune System - 100%
137 LG:2150307:2002JAN18 Nervous System - 57%, Endocrine System - 14%
138 LG:383884.26:2002JAN18 Germ Cells - 66%
139 LG:413518.62:2002JAN18 Pancreas - 17%, Endocrine System - 15%, Liver - 13%, Embryonic Structures - 13%
140 LG:903138.45:2002JAN18 Sense Organs - 50%, Skin - 12%
141 LG:1377804.32:2002JAN18 Skin - 27%, Embryonic Structures - 11%
142 LG:1390822.13:2002JAN18 widely distributed
143 LG:7698830.22:2002JAN18 Pancreas - 12%
144 LG:7762105.20:2002JAN18 Germ Cells - 14%, Stomatognathic System - 11%
145 LG:1382907.104:2002JAN18 Digestive System - 21%, Male Genitalia - 21%, Sense Organs - 13%
146 LG:294464.12:2002JAN18 Skin - 82%
147 LG:003736.32:2002JAN18 Liver - 93%
148 LG:1502253.2:2002JAN18 Embryonic Structures - 64%, Cardiovascular System - 36%
149 LG:216797.51:2002JAN18 Cardiovascular System - 22%, Endocrine System - 13%, Exocrine Glands - 10%
150 LG:7685287.118:2002JAN18 Hemic and Immune System - 11%
151 LG:405272.4:2002JAN18 Hemic and Immune System - 32%, Male Genitalia - 18%, Respiratory System - 13%, Liver - 13%
152 LG:247382.7:2002JAN18 Urinary Tract - 17%, Exocrine Glands - 14%, Connective Tissue - 10%
153 LG:7763403.34:2002JAN18 Germ Cells - 25%, Sense Organs - 13%, Embryonic Structures - 12%
154 LG:258352.1:2002JAN18 Endocrine System - 40%, Liver - 35%, Nervous System - 20%
155 LG:1096711.3:2002JAN18 Digestive System - 67%, Nervous System - 33%
156 LG:7761740.1:2002JAN18 Cardiovascular System - 44%, Unclassified/Mixed - 31%, Digestive System - 13%, Nervous System - 13%
157 LG:1382987.89:2002JAN18 Embryonic Structures - 33%, Respiratory System - 21%, Endocrine System - 12%
158 LG:444673.50:2002JAN18 Connective Tissue - 10%, Cardiovascular System - 10%
159 LG:7767853.1:2002JAN18 Cardiovascular System - 100%
160 LG:1375802.70:2002JAN18 widely distributed
161 LG:414732.1:2002JAN18 Endocrine System - 67%, Digestive System - 17%, Nervous System - 17%
162 LG:1328394.25:2002JAN18 Sense Organs - 26%, Urinary Tract - 11%, Female Genitalia - 10%
163 LG:336953.5:2002JAN18 Connective Tissue - 19%, Sense Organs - 10%
164 LG:7697931.25:2002JAN18 Skin - 23%, Embryonic Structures - 14%
165 LG:300147.58:2002JAN18 Female Genitalia - 10%, Skin - 10%
166 LG:7763115.9:2002JAN18 Germ Cells - 29%, Nervous System - 20%, Embryonic Structures - 13%
167 LG:7693875.4:2002JAN18 Embryonic Structures - 13%, Endocrine System - 12%
168 LG:089516.22:2002JAN18 Urinary Tract - 56%, Liver - 44%
169 LG:336671.1:2002JAN18 Germ Cells - 30%, Hemic and Immune System - 23%, Connective Tissue - 10%, Respiratory System - 10%
170 LG:234504.11:2002JAN18 Unclassified/Mixed - 13%, Embryonic Structures - 11%
171 LG:1018931.3:2002JAN18 Digestive System - 67%, Nervous System - 33%
172 LG:1377369.45:2002JAN18 Embryonic Structures - 13%, Liver - 10%
173 LG:1135404.113:2002JAN18 widely distributed
174 LG:1452606.33:2002JAN18 Connective Tissue - 44%
175 LG:018099.22:2002JAN18 Sense Organs - 15%
176 LG:7771625.8:2002JAN18 Hemic and Immune System - 100%
177 LG:1513012.6:2002JAN18 Nervous System - 30%, Unclassified/Mixed - 13%
178 LG:903956.34:2002JAN18 Germ Cells - 24%
179 LG:331171.22:2002JAN18 Germ Cells - 14%, Endocrine System - 12%, Sense Organs - 11%
180 LG:380305.28:2002JAN18 Embryonic Structures - 54%, Digestive System - 25%, Urinary Tract - 11%
181 LG:227928.19:2002JAN18 Embryonic Structures - 64%, Endocrine System - 36%
182 LG:1099593.39:2002JAN18 Musculoskeletal System - 46%, Female Genitalia - 31%, Male Genitalia - 15%
183 LG:1501223.87:2002JAN18 Liver - 14%, Embryonic Structures - 10%
184 LG:7690039.1:2002JAN18 Endocrine System - 57%, Respiratory System - 14%, Female Genitalia - 14%, Nervous System - 14%
185 LG:332701.3:2002JAN18 Unclassified/Mixed - 16%
186 LG:237963.28:2002JAN18 Skin - 24%, Hemic and Immune System - 16%, Respiratory System - 14%
187 LG:245267.1:2002JAN18 Embryonic Structures - 91%
188 LG:7761954.21:2002JAN18 Exocrine Glands - 58%, Nervous System - 42%
TABLE 7
SEQ ID NO: Frame Length Start Stop GI Number Probability Score Annotation
209 2 56 347 514 g6690227 1.0E-13 PRO0478
209 2 56 347 514 g186774 4.0E-10 zinc finger protein
209 2 56 347 514 g21336190 6.0E-10 unnamed protein product
211 2 186 392 949 g18650590 4.0E-91 archease
211 2 186 392 949 g12841926 4.0E-91 data source:SPTR, source key:Q9VD92, evidence:ISS~putative~related to CG6353 PROTEIN
211 2 186 392 949 g12840887 4.0E-91 data source:SPTR, source key:Q9VD92, evidence:ISS~putative~related to CG6353 PROTEIN
212 214 79 720 g59849 3.0E-06 IE175 (ICP4) transcriptional activator (AA 1-1298)
212 214 79 720 g59571 3.0E-06 RS1
212 214 79 720 g59558 3.0E-06 RS1
213 128 388 771 g24270820 8.0E-41 chromosome 21 open reading frame 63
213 128 388 771 g17529691 8.0E-41 C21orf63 isoform A protein
213 128 388 771 g12855175 3.0E-31 data source:SPTR, source key:Q9IB51, evidence:ISS~putative~related to RHAMNOSE BINDING LECTIN STL3
215 2 107 569 889 g5726235 2.0E-19 unknown protein U5/2
216 2 129 485 871 g14189960 1.0E-25 PRO0764
216 2 129 485 871 g21104464 3.0E-24 OK/SW-CL.41
216 2 129 485 871 g11493463 2.0E-21 PRO2852
218 2 177 2 532 g18204982 3.0E-72 Similar to hypothetical protein FLJ20897
218 2 177 2 532 g14043783 2.0E-37 Unknown (protein for MGC:14256)
218 2 177 2 532 g10436857 2.0E-37 unnamed protein product
219 1 135 76 480 g21740177 3.0E-56 hypothetical protein
219 1 135 76 480 g18676572 3.0E-56 FLJ00183 protein
219 1 135 76 480 g16740957 3.0E-56 Unknown (protein for MGC:23944)
221 3 96 129 416 g9929935 1.0E-16 hypothetical protein
221 3 96 129 416 g7959267 3.0E-16 KIAA1503 protein
221 3 96 129 416 g6841126 7.0E-15 HSPC093
223 3 352 162 1217 g7239698 0.0 myosin light chain kinase isoform 2
223 3 352 162 1217 g7239696 0.0 myosin light chain kinase
223 3 352 162 1217 g12597190 0.0 myosin light chain kinase
225 1 150 79 528 g181268 2.0E-66 cellular yes-1 protein
225 1 150 79 528 g12539401 2.0E-66 protein tyrosine kinase c-Yes
225 1 150 79 528 g50624 2.0E-66 c-yes
227 1 107 1 321 g4426595 1.0E-60 multifunctional calcium/calmodulin-dependent protein kinase II delta2 isoform
227 1 107 1 321 g3088551 1.0E-60 calcium/calmodulin-dependent protein kinase II delta
227 1 107 1 321 g21619226 1.0E-60 Unknown (protein for MGC:44911)
230 3 356 3 1070 g8979826 1.0E-138 dJ776F14.2 (a novel protein member of the PTPNS (protein tyrosine phosphatase, non-receptor type substrate) family)
230 3 356 3 1070 g23620474 1.0E-21 Similar to protein tyrosine phosphatase, non-receptor type substrate 1
230 3 356 3 1070 g2052056 1.0E-21 SIRP-alpha1
231 2 306 1049 1966 g20306649 1.0E-154 Unknown (protein for MGC:26388)
231 2 306 1049 1966 g944911 1.0E-152 inositol polyphosphate 4-phosphatase
231 2 306 1049 1966 g2232031 1.0E-152 inositol polyphosphate 4-phosphatase type I-beta
232 3 252 774 1529 g14043783 1.0E-115 Unknown (protein for MGC:14256)
232 3 252 774 1529 g10436857 1.0E-115 unnamed protein product
232 3 252 774 1529 g15215451 1.0E-112 eukaryotic translation elongation factor 1 delta (guanine nucleotide exchange protein)
233 3 402 99 1304 g22713443 0.0 Unknown (protein for MGC:33365)
233 3 402 99 1304 g7291385 4.0E-06 CG11170-PA
233 3 402 99 1304 g21626538 4.0E-06 CG11170-PB
235 1 262 1 786 g5410312 1.0E-110 AMP-activated kinase alpha 1 subunit
235 1 262 1 786 g4115829 1.0E-110 AMP-activated protein kinase alpha-1
235 1 262 1 786 g1155267 1.0E-108 5'-AMP-activated protein kinase alpha-1 catalytic subunit
237 3 78 3 236 g8250239 9.0E-39 protein phosphatase 4 regulatory subunit 2
241 2 195 893 1477 g7711011 2.0E-76 hypothetical protein
241 2 195 893 1477 g7022480 2.0E-76 unnamed protein product
241 2 195 893 1477 g6572185 2.0E-76 dJ127B20.1 (Rho GTPase activating protein 8)
242 3 906 174 2891 g5762315 0.0 nuclear factor associated with dsRNA NFAR-2
242 3 906 174 2891 g12746296 0.0 110 kDa NFAR protein
242 3 906 174 2891 g6911241 0.0 interleukin enhancer binding factor 3
243 3 136 810 1217 g6781344 8.0E-25 unnamed protein product
243 3 136 810 1217 g6781325 8.0E-25 unnamed protein product
243 3 136 810 1217 g3850050 8.0E-25 transcription factor
244 3 912 993 3728 g21740339 1.0E-116 hypothetical protein
244 3 912 993 3728 g33969 1.0E-112 IRLB
244 3 912 993 3728 g7022072 1.0E-105 unnamed protein product
246 3 231 3 695 g758634 2.0E-80 stat5a
246 3 231 3 695 g747972 2.0E-80 mammary gland factor
246 3 231 3 695 g18087727 2.0E-80 transcription factor Stat5a
247 1 473 49 1467 g10443241 1.0E-154 iroquois homeobox protein 6
247 1 473 49 1467 g6689882 1.0E-135 iroquois homeobox protein 4
247 1 473 49 1467 g6689880 1.0E-129 iroquois homeobox protein 4
248 2 150 122 571 g4519621 2.0E-10 OASIS protein
248 2 150 122 571 g16741210 2.0E-10 old astrocyte specifically induced substance
248 2 150 122 571 g21668502 7.0E-10 OASIS
250 1 245 1 735 g21753199 1.0E-152 unnamed protein product
250 1 245 1 735 g21439322 1.0E-93 unnamed protein product
250 1 245 1 735 g21753218 8.0E-93 unnamed protein product
251 3 221 3 665 g5441615 7.0E-84 zinc finger protein
251 3 221 3 665 g16549180 2.0E-83 unnamed protein product
251 3 221 3 665 g21753345 1.0E-82 unnamed protein product
252 2 129 182 568 g2810991 4.0E-10 KRAB-zinc finger protein KZF-1
252 2 129 182 568 g13938639 9.0E-10 Unknown (protein for MGC:6654)
252 2 129 182 568 g21754017 1.0E-08 unnamed protein product
253 3 375 54 1178 g488551 1.0E-118 zinc finger protein ZNF132
253 3 375 54 1178 g13543419 1.0E-118 Similar to zinc finger protein 304
253 3 375 54 1178 g1199604 1.0E-117 zinc finger protein C2H2-25
254 1 482 1 1446 g2689442 0.0 R28830_1
254 1 482 1 1446 g1572600 0.0 Zik1
254 1 482 1 1446 g12652759 1.0E-175 hypothetical protein FLJ20557
255 1 500 1 1500 g16549907 0.0 unnamed protein product
255 1 500 1 1500 g21740366 1.0E-163 hypothetical protein
255 1 500 1 1500 g3289985 1.0E-160 KIAA0412
256 2 166 23 520 g21751975 9.0E-67 unnamed protein product
256 2 166 23 520 g2970038 1.0E-66 HKL1
256 2 166 23 520 g4164083 2.0E-66 zinc finger protein EZNF
257 1 81 580 822 g7021902 6.0E-08 unnamed protein product
257 1 81 580 822 g14250004 6.0E-08 Similar to cAMP responsive element binding protein-like 1
257 1 81 580 822 g7023820 2.0E-07 unnamed protein product
258 3 88 3 266 g21754891 2.0E-30 unnamed protein product
258 3 88 3 266 g21750657 2.0E-29 unnamed protein product
258 3 88 3 266 g12652727 2.0E-29 Unknown (protein for IMAGE:3352566)
259 3 642 210 2135 g21750777 0.0 unnamed protein product
259 3 642 210 2135 g2689441 0.0 F18547_1
259 3 642 210 2135 g18916783 0.0 KIAA1956 protein
260 2 758 23 2296 g13752754 0.0 zinc finger 1111
260 2 758 23 2296 g10047297 0.0 KIAA1611 protein
260 2 758 23 2296 g10440398 0.0 FLJ00032 protein
261 2 390 80 1249 g21751975 1.0E-139 unnamed protein product
261 2 390 80 1249 g21753345 6.0E-99 unnamed protein product
261 2 390 80 1249 g21439194 2.0E-98 unnamed protein product
262 3 197 249 839 g21635626 1.0E-51 zinc finger protein KID3
262 3 197 249 839 g7012690 4.0E-35 KRAB-zinc finger protein KID3
262 3 197 249 839 g11935160 1.0E-32 AJ18 protein
263 2 91 584 856 g23272677 2.0E-36 Similar to zinc finger protein 208
263 2 91 584 856 g21740125 4.0E-36 hypothetical protein
263 2 91 584 856 g16551398 4.0E-36 unnamed protein product
265 2 680 23 2062 g16553661 0.0 unnamed protein product
265 2 680 23 2062 g4164083 0.0 zinc finger protein EZNF
265 2 680 23 2062 g2970038 0.0 HKL1
266 2 76 236 463 g7023216 3.0E-24 unnamed protein product
266 2 76 236 463 g21757193 6.0E-23 unnamed protein product
266 2 76 236 463 g16878329 5.0E-22 Unknown (protein for MGC:29628)
267 3 101 201 503 g21758286 9.0E-43 unnamed protein product
267 3 101 201 503 g5262560 5.0E-29 hypothetical protein
267 3 101 201 503 g22477161 5.0E-29 Unknown (protein for IMAGE:4846514)
268 3 305 2187 3101 g21693025 0.0 zinc finger protein 37A
268 3 305 2187 3101 g16198398 0.0 Unknown (protein for MGC:27353)
268 3 305 2187 3101 g9801232 1.0E-152 bA508N22.2 (zinc finger protein 37a (KOX 21))
269 3 226 24 701 g22760639 1.0E-82 unnamed protein product
269 3 226 24 701 g15559282 1.0E-82 Unknown (protein for MGC:20208)
269 3 226 24 701 g12483904 9.0E-74 zinc finger protein HIT-39
270 2 150 155 604 g21759819 3.0E-57 similar to hypothetical protein FLJ13659
270 2 150 155 604 g10435738 5.0E-33 unnamed protein product
270 2 150 155 604 g13623587 5.0E-32 Similar to zinc finger protein 254
271 1 114 97 438 g18916833 6.0E-38 KIAA1969 protein
271 1 114 97 438 g23272677 1.0E-37 Similar to zinc finger protein 208
271 1 114 97 438 g21759819 4.0E-30 similar to hypothetical protein FLJ13659
272 2 140 542 961 g21740125 7.0E-82 hypothetical protein
272 2 140 542 961 g16550881 7.0E-82 unnamed protein product
272 2 140 542 961 g18916833 2.0E-70 KIAA1969 protein
273 1 195 13 597 g21757193 2.0E-88 unnamed protein product
273 1 195 13 597 g21336362 3.0E-67 unnamed protein product
273 1 195 13 597 g7023216 2.0E-58 unnamed protein product
274 3 108 3 326 g21314977 5.0E-51 Similar to zinc finger protein 17 (HPF3, KOX 10)
274 3 108 3 326 g16553223 5.0E-51 unnamed protein product
274 3 108 3 326 g21758142 7.0E-36 unnamed protein product
275 1 433 46 1344 g15559577 0.0 Unknown (protein for MGC:20707)
289 1 199 772 1368 g13929457 2.0E-96 dJ337O18.2.2 (Lysosomal Protective Protein precursor (EC 3.4.16.5, Cathepsin A, Carboxypeptidase C) (isoform 2))
290 3 114 465 806 g30309 5.0E-20 cyclophilin (AA 1-165)
290 3 114 465 806 g30168 5.0E-20 peptidylprolyl isomerase
290 3 114 465 806 g2565303 5.0E-20 cyclophilin A
291 3 397 3 1193 g6013463 8.0E-87 carboxypeptidase homolog
291 3 397 3 1193 g2304669 2.0E-85 unnamed protein product
291 3 397 3 1193 g9558448 3.0E-85 carboxypeptidase R
292 1 213 1 639 g19171180 1.0E-100 metalloprotease disintegrin 17, with thrombospondin domains
292 1 213 1 639 g19171152 1.0E-68 ADAMTS-19
292 1 213 1 639 g13927863 1.0E-17 unnamed protein product
293 3 168 1488 1991 g12002207 7.0E-98 chymotrypsin-like protein
293 3 168 1488 1991 g6581056 5.0E-95 CHORD containing protein-1
293 3 168 1488 1991 g17390873 7.0E-92 RIKEN cDNA 1110001009 gene
294 2 705 1529 3643 g6002686 0.0 histone acetyltransferase MORF
294 2 705 1529 3643 g20521021 0.0 KIAA0383
294 2 705 1529 3643 g17025966 0.0 histone acetyltransferase MORF
296 3 67 147 347 g517065 5.0E-08 chaperonin-like protein
296 3 67 147 347 g184462 5.0E-08 chaperonin-like protein
296 3 67 147 347 g14517632 5.0E-08 acute morphine dependence related protein 2
297 3 89 303 569 g21754422 1.0E-15 unnamed protein product
300 3 784 57 2408 g20521960 0.0 KIAA1662 protein
300 3 784 57 2408 g22945215 1.0E-36 CG3047-PA
300 3 784 57 2408 g23096125 7.0E-32 CG6004-PB
302 1 271 346 1158 g2832596 1.0E-142 dJ434P1.3 (DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 17 (72kD))
302 1 271 346 1158 g1592565 1.0E-142 DEAD-box protein p72
302 1 271 346 1158 g12653635 1.0E-142 DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 17 (72kD)
303 2 424 2 1273 g7243033 0.0 KIAA1326 protein
303 2 424 2 1273 g15054519 0.0 protocadherin-S
303 2 424 2 1273 g13161063 0.0 protocadherin 11
304 1 367 1 1101 g818014 1.0E-162 laminin C-terminal fragment
304 1 367 1 1101 g309420 1.0E-162 laminin A chain
304 1 367 1 1101 g198697 1.0E-162 laminin A-chain
305 1 953 1021 3879 g5456922 0.0 protocadherin alpha 9
305 1 953 1021 3879 g12274866 0.0 protocadherin alpha 9
305 1 953 1021 3879 g5456918 0.0 protocadherin alpha 7
307 3 522 108 1673 g5457037 0.0 protocadherin beta 15
307 3 522 108 1673 g14009449 0.0 protocadherin-beta15
307 3 522 108 1673 g24416520 0.0 protocadherin beta 15
309 1 362 1 1086 g9971112 0.0 MHC class I antigen
309 1 362 1 1086 g21686607 0.0 MHC class Ib antigen
309 1 362 1 1086 g21686603 0.0 MHC class Ib antigen
312 1 171 208 720 g21669375 1.0E-64 immunoglobulin kappa light chain VLJ region
312 1 171 208 720 g21669383 3.0E-64 immunoglobulin kappa light chain VLJ region
312 1 171 208 720 g21669381 3.0E-64 immunoglobulin kappa light chain VLJ region
313 3 241 3 725 g2765423 1.0E-120 immunoglobulin kappa light chain
313 3 241 3 725 g21707884 1.0E-119 similar to anti TNF-alpha antibody light-chain Fab fragment
313 3 241 3 725 g17645750 1.0E-118 unnamed protein product
316 2 268 11 814 g22761128 4.0E-06 unnamed protein product
316 2 268 11 814 g22761091 4.0E-06 unnamed protein product
316 2 268 11 814 g22760919 4.0E-06 unnamed protein product
317 2 251 533 1285 g6636436 1.0E-134 isovaleryl dehydrogenase
317 2 251 533 1285 g306897 1.0E-134 isovaleryl-coA dehydrogenase (IVD)
317 2 251 533 1285 g16877964 1.0E-134 isovaleryl Coenzyme A dehydrogenase
318 1 244 1 732 g4103158 1.0E-74 hair keratin acidic 5; Ha5 keratin
318 1 244 1 732 g3724107 7.0E-74 keratin, type I
318 1 244 1 732 g1668744 7.0E-74 HHa5 hair keratin type I intermediate filament
320 2 175 2 526 g1183937 2.0E-94 gamma-fibrinogen
320 2 175 2 526 g18044708 3.0E-80 Similar to Fibrinogen, gamma polypeptide
320 2 175 2 526 g18043449 3.0E-80 Similar to Fibrinogen, gamma polypeptide
321 1 860 856 3435 g14017955 0.0 KIAA1869 protein
321 1 860 856 3435 g12081909 0.0 semaphorin Y
321 1 860 856 3435 g18462030 0.0 semaphorin Y short isoform 1
322 2 119 80 436 g23451052 7.0E-50 WNT7B
322 2 119 80 436 g22028435 7.0E-50 wingless-type MMTV integration site family, member 7B
322 2 119 80 436 g202412 7.0E-50 Wnt-7b
325 2 619 206 2062 g1905916 0.0 amyloid precursor-like protein 1
325 2 619 206 2062 g15277602 0.0 Similar to amyloid beta (A4) precursor-like protein 1
325 2 619 206 2062 g1709301 0.0 amyloid precursor-like protein 1
326 1 242 166 891 g6572247 7.0E-91 dJ466N1.4 (novel protein similar to ANK3 (ankyrin 3, node of Ranvier (ankyrin G)))
326 1 242 166 891 g15779153 7.0E-91 Unknown (protein for MGC:15484)
326 1 242 166 891 g15928539 1.0E-89 Unknown (protein for MGC:25673)
327 3 365 3 1097 g22477764 0.0 Unknown (protein for IMAGE:5274813)
327 3 365 3 1097 g23271107 1.0E-124 Unknown (protein for MGC:30834)
327 3 365 3 1097 g7270588 5.0E-79 actin interacting protein
328 3 546 831 2468 g7242943 0.0 KIAA1294 protein
328 3 546 831 2468 g11022657 2.0E-27 Golgi-associated band 4.1-like protein
328 3 546 831 2468 g21619448 7.0E-27 Similar to GRP1 binding protein GRSP1
329 2 136 983 1390 g2282036 2.0E-64 p34-Arc
329 2 136 983 1390 g1531594 2.0E-64 unknown
329 2 136 983 1390 g12653625 2.0E-64 actin related protein 2/3 complex, subunit 2 (34 kD)
330 1 198 928 1521 g21740094 1.0E-118 hypothetical protein
330 1 198 928 1521 g21758365 1.0E-110 unnamed protein product
332 3 255 891 1655 g9653293 5.0E-07 tropomyosin 5; TM-5
332 3 255 891 1655 g9508585 5.0E-07 tropomyosin isoform
332 3 255 891 1655 g553799 5.0E-07 cytoskeletal tropomyosin isoform
333 3 370 93 1202 g2352947 1.0E-135 smooth muscle myosin heavy chain SM1
333 3 370 93 1202 g2104553 1.0E-135 Myosin heavy chain (MHY11) (5'partial)
333 3 370 93 1202 g36507 1.0E-134 smooth muscle mysosin heavy chain
334 2 617 8 1858 g21961605 0.0 keratin 10 (epidermolytic hyperkeratosis; keratosis palmaris et plantaris)
334 2 617 8 1858 g28317 0.0 unnamed protein product
334 2 617 8 1858 g623409 0.0 keratin 10
335 2 82 407 652 g21619978 2.0E-16 Similar to phosphoglucomutase 5
335 2 82 407 652 g1160965 2.0E-16 phosphoglucomutase-related protein
335 2 82 407 652 g841250 7.0E-10 phosphoglucomutase
338 2 239 1352 2068 g309090 3.0E-74 A-X actin
338 2 239 1352 2068 g10442727 6.0E-74 beta-actin
338 2 239 1352 2068 g9049272 8.0E-74 beta actin
340 2 740 2 2221 g16550758 0.0 unnamed protein product
340 2 740 2 2221 g4240293 1.0E-165 KIAA0902 protein
340 2 740 2 2221 g4151807 1.0E-165 membrane-associated guanylate kinase-interacting protein 2
341 3 251 84 836 g2398657 2.0E-26 translocon-associated protein delta subunit precursor
341 3 251 84 836 g1673433 2.0E-26 translocon-associated protein delta subunit precursor
341 3 251 84 836 g15929882 2.0E-26 signal sequence receptor, delta (translocon-associated protein delta)
342 1 775 1147 3471 g769848 0.0 vesicular acetylcholine transporter
342 1 775 1147 3471 g507744 0.0 vesicular acetylcholine transporter
342 1 775 1147 3471 g14043571 0.0 solute carrier family 18 (vesicular acetylcholine), member 3
344 3 91 114 386 g34199 3.0E-40 putative ribosomal protein (AA 1-184)
344 3 91 114 386 g21336122 3.0E-40 unnamed protein product
344 3 91 114 386 g17932942 3.0E-40 ribosomal protein L17
345 1 548 1 1644 g16945892 0.0 chromosome 17 open reading frame 27
347 3 66 138 335 g21751592 4.0E-07 unnamed protein product
347 3 66 138 335 g6690229 1.0E-06 PRO0483
347 3 66 138 335 g16549456 2.0E-06 unnamed protein product
348 2 151 2 454 g57690 2.0E-71 ribosomal protein L23a
348 2 151 2 454 g2739452 2.0E-71 ribosomal protein L23A
348 2 151 2 454 g20987630 2.0E-71 ribosomal protein L23a
349 1 153 199 657 g19684082 4.0E-83 similar to glucuronidase, beta
349 1 153 199 657 g18028950 2.0E-36 beta glucuronidase
349 1 153 199 657 g183233 2.0E-34 beta-glucuronidase precursor (EC 3.2.1.31)
350 3 428 1197 2480 g15530220 0.0 Unknown (protein for MGC:11192)
350 3 428 1197 2480 g7022371 0.0 unnamed protein product
350 3 428 1197 2480 g17390870 0.0 Unknown (protein for MGC:12075)
351 1 751 280 2532 g577625 0.0 holocarboxylase synthetase
351 1 751 280 2532 g15823777 0.0 holocarboxylase synthetase
351 1 751 280 2532 g1813424 0.0 HCS
352 3 90 531 800 g14249959 5.0E-34 heterogeneous nuclear ribonucleoprotein C (C1/C2)
352 3 90 531 800 g14250048 7.0E-34 heterogeneous nuclear ribonucleoprotein C (C1/C2)
352 3 90 531 800 g13937888 7.0E-34 Similar to heterogeneous nuclear ribonucleoprotein C
354 2 320 581 1540 g8671586 1.0E-166 ataxin 2-binding protein
354 2 320 581 1540 g7022046 1.0E-166 unnamed protein product
354 2 320 581 1540 g7670456 1.0E-165 unnamed protein product
355 2 753 227 2485 g710535 0.0 galactocerebrosidase
355 2 753 227 2485 g22328079 0.0 galactosylceramidase (Krabbe disease)
355 2 753 227 2485 g457444 0.0 galactocerebrosidase
356 3 210 183 812 g21740179 1.0E-82 hypothetical protein
356 3 210 183 812 g21707394 1.0E-82 Unknown (protein for IMAGE:4997115)
356 3 210 183 812 g20531145 1.0E-82 140 kDa estrogen receptor associated protein
357 2 187 53 613 g9968296 1.0E-91 inducible T-cell co-stimulator
357 2 187 53 613 g7963650 1.0E-91 inducible costimulator precursor
357 2 187 53 613 g5360719 1.0E-91 activation-inducible lymphocyte immunomediatory molecule AILIM
359 1 113 85 423 g1932801 5.0E-47 synaptotagmin X
359 1 113 85 423 g6136792 7.0E-47 synaptotagmin X
359 1 113 85 423 g14210268 7.0E-47 synaptotagmin 10
360 2 154 83 544 g12005726 3.0E-53 DC21
360 2 154 83 544 g338392 2.0E-34 spermidine/spermine N1-acetyltransferase
360 2 154 83 544 g338336 2.0E-34 spermidine/spermine N1-acetyltransferase
361 1 253 1 759 g18146614 1.0E-136 eukaryotic initiation factor 4B
361 1 253 1 759 g288100 1.0E-135 initation factor 4B
361 1 253 1 759 g13938112 1.0E-120 Unknown (protein for MGC:7530)
365 1 317 1 951 g21754808 1.0E-157 unnamed protein product
365 1 317 1 951 g11065991 1.0E-151 neuronal calcium binding protein NECAB1
365 1 317 1 951 g11065993 8.0E-99 neuronal calcium binding protein NECAB1
366 1 189 256 822 g5852976 3.0E-68 NUMB isoform 4
366 1 189 256 822 g5852974 3.0E-68 NUMB isoform 3
366 1 189 256 822 g5852972 3.0E-68 NUMB isoform 2
367 2 246 2 739 g18157547 1.0E-128 pecanex-like 3
367 2 246 2 739 g6650377 1.0E-18 pecanex 1
367 2 246 2 739 g12852202 1.0E-18 data source:MGD, source key:MGI:1891924, evidence:ISS~pecanex homolog (Drosophila)-putative
372 2 138 2 415 g7959193 3.0E-27 KIAA1466 protein
372 2 138 2 415 g332612 1.0E-25 pol polyprotein
372 2 138 2 415 g3033416 3.0E-24 pol polyprotein
374 3 552 357 2012 g6708478 0.0 formin-like protein
374 3 552 357 2012 g4101720 0.0 lymphocyte specific formin related protein
374 3 552 357 2012 g19851921 0.0 CLL-associated antigen KW-13
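In the Table 7 rows that survived extraction intact, the coordinate columns agree with two relationships: Stop = Start + 3 × Length - 1, and Frame = ((Start - 1) mod 3) + 1. This fits a reading in which Start and Stop are nucleotide positions of the open reading frame and Length is its size in amino acids, but that reading is inferred from the tabulated values, not stated in the table. A minimal Python sketch of such a row check follows; the `Hit` record and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One Table 7 row, coordinate columns only (hypothetical record)."""
    seq_id: int
    frame: int
    length: int  # assumed: ORF length in amino acids (codons)
    start: int   # assumed: first nucleotide of the ORF
    stop: int    # assumed: last nucleotide of the ORF


def expected_stop(start: int, length: int) -> int:
    # `length` codons beginning at `start` end 3 * length - 1 nucleotides later.
    return start + 3 * length - 1


def expected_frame(start: int) -> int:
    # Forward reading frame 1, 2 or 3 implied by the start position.
    return (start - 1) % 3 + 1


def check(hit: Hit) -> list[str]:
    """Return the relationships a row violates (an empty list if none)."""
    problems = []
    if hit.stop != expected_stop(hit.start, hit.length):
        problems.append("stop")
    if hit.frame != expected_frame(hit.start):
        problems.append("frame")
    return problems


# Coordinates copied from three SEQ ID NOs in Table 7.
rows = [
    Hit(209, 2, 56, 347, 514),
    Hit(223, 3, 352, 162, 1217),
    Hit(294, 2, 705, 1529, 3643),
]
for row in rows:
    print(row.seq_id, check(row) or "consistent")  # each prints "<id> consistent"
```

A row that fails either check is a candidate for a transcription error; the check says nothing about the GI Number, Probability Score, or Annotation columns.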
TABLE 8
Program | Description | Reference | Parameter Threshold
ABI FACTURA | A program that removes vector sequences and masks ambiguous bases in nucleic acid sequences. | Applied Biosystems, Foster City, CA. |
ABI/PARACEL FDF | A Fast Data Finder useful in comparing and annotating amino acid or nucleic acid sequences. | Applied Biosystems, Foster City, CA; Paracel Inc., Pasadena, CA. | Mismatch <50%
ABI AutoAssembler | A program that assembles nucleic acid sequences. | Applied Biosystems, Foster City, CA. |
BLAST | A Basic Local Alignment Search Tool useful in sequence similarity search for amino acid and nucleic acid sequences. BLAST includes five functions: blastp, blastn, blastx, tblastn, and tblastx. | Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410; Altschul, S.F. et al. (1997) Nucleic Acids Res. 25:3389-3402. | ESTs: Probability value = 1.0E-8 or less; Full Length sequences: Probability value = 1.0E-10 or less
FASTA | A Pearson and Lipman algorithm that searches for similarity between a query sequence and a group of sequences of the same type. FASTA comprises at least five functions: fasta, tfasta, fastx, tfastx, and ssearch. | Pearson, W.R. and D.J. Lipman (1988) Proc. Natl. Acad. Sci. USA 85:2444-2448; Pearson, W.R. (1990) Methods Enzymol. 183:63-98; and Smith, T.F. and M.S. Waterman (1981) Adv. Appl. Math. 2:482-489. | ESTs: fasta E value = 1.06E-6; Assembled ESTs: fasta Identity = 95% or greater and Match length = 200 bases or greater; fastx E value = 1.0E-8 or less; Full Length sequences: fastx score = 100 or greater
BLIMPS | A BLocks IMProved Searcher that matches a sequence against those in BLOCKS, PRINTS, DOMO, PRODOM, and PFAM databases to search for gene families, sequence homology, and structural fingerprint regions. | Henikoff, S. and J.G. Henikoff (1991) Nucleic Acids Res. 19:6565-6572; Henikoff, J.G. and S. Henikoff (1996) Methods Enzymol. 266:88-105; and Attwood, T.K. et al. (1997) J. Chem. Inf. Comput. Sci. 37:417-424. | Probability value = 1.0E-3 or less
HMMER | An algorithm for searching a query sequence against hidden Markov model (HMM)-based databases of protein family consensus sequences, such as PFAM. | Krogh, A. et al. (1994) J. Mol. Biol. 235:1501-1531; Sonnhammer, E.L.L. et al. (1998) Nucleic Acids Res. 26:320-322; Durbin, R. et al. (1998) Our World View, in a Nutshell, Cambridge Univ. Press, pp. 1-350. | PFAM hits: Probability value = 1.0E-3 or less; Signal peptide hits: Score = 0 or greater
ProfileScan | An algorithm that searches for structural and sequence motifs in protein sequences that match sequence patterns defined in Prosite. | Gribskov, M. et al. (1988) CABIOS 4:61-66; Gribskov, M. et al. (1989) Methods Enzymol. 183:146-159; Bairoch, A. et al. (1997) Nucleic Acids Res. 25:217-221. | Normalized quality score ≥ GCG-specified "HIGH" value for that particular Prosite motif. Generally, score = 1.4-2.1.
Phred | A base-calling algorithm that examines automated sequencer traces with high sensitivity and probability. | Ewing, B. et al. (1998) Genome Res. 8:175-185; Ewing, B. and P. Green (1998) Genome Res. 8:186-194. |
Phrap | A Phils Revised Assembly Program including SWAT and CrossMatch, programs based on efficient implementation of the Smith-Waterman algorithm, useful in searching sequence homology and assembling DNA sequences. | Smith, T.F. and M.S. Waterman (1981) Adv. Appl. Math. 2:482-489; Smith, T.F. and M.S. Waterman (1981) J. Mol. Biol. 147:195-197; and Green, P., University of Washington, Seattle, WA. | Score = 120 or greater; Match length = 56 or greater
Consed | A graphical tool for viewing and editing Phrap assemblies. | Gordon, D. et al. (1998) Genome Res. 8:195-202. |
SPScan | A weight matrix analysis program that scans protein sequences for the presence of secretory signal peptides. | Nielsen, H. et al. (1997) Protein Engineering 10:1-6; Claverie, J.M. and S. Audic (1996) CABIOS 12:431-439. | Score = 3.5 or greater
TMAP | A program that uses weight matrices to delineate transmembrane segments on protein sequences and determine orientation. | Persson, B. and P. Argos (1994) J. Mol. Biol. 237:182-192; Persson, B. and P. Argos (1996) Protein Sci. 5:363-371. |
TMHMMER | A program that uses a hidden Markov model (HMM) to delineate transmembrane segments on protein sequences and determine orientation. | Sonnhammer, E.L. et al. (1998) Proc. Sixth Intl. Conf. on Intelligent Systems for Mol. Biol., Glasgow et al., eds., The Am. Assoc. for Artificial Intelligence (AAAI) Press, Menlo Park, CA, and MIT Press, Cambridge, MA, pp. 175-182. |
Motifs | A program that searches amino acid sequences for patterns that match those defined in Prosite. | Bairoch, A. et al. (1997) Nucleic Acids Res. 25:217-221; Wisconsin Package Program Manual, version 9, page M51-59, Genetics Computer Group, Madison, WI. |
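Table 8 describes Phrap's SWAT and CrossMatch components as efficient implementations of the Smith-Waterman algorithm, and FASTA also cites Smith and Waterman (1981). A minimal Python sketch of the Smith-Waterman local-alignment recurrence follows, returning only the best score. The scoring values (match +2, mismatch -1, linear gap -2) are assumptions chosen for illustration; the table does not give the scoring schemes these programs use, so scores from this sketch are not comparable to the Phrap threshold of 120.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score of a and b with a linear gap penalty.

    The default scores are illustrative assumptions, not Phrap's or FASTA's.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned against a gap
            left = curr[j - 1] + gap  # b[j-1] aligned against a gap
            # Flooring at 0 lets an alignment start anywhere; this makes it local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman("TTACGTTT", "GGACGTGG"))  # 8: the shared ACGT, 4 matches x 2
```

Keeping only two matrix rows gives the score in O(len(b)) memory; recovering the aligned residues themselves would require keeping the full matrix, or traceback pointers, and walking back from the best cell.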