EP1373306A2 - Cytoskeleton-associated proteins

Cytoskeleton-associated proteins

Info

Publication number
EP1373306A2
Authority
EP
European Patent Office
Legal status
Withdrawn
Application number
EP02757814A
Other languages
German (de)
French (fr)
Other versions
EP1373306A4 (en)
Inventor
April J. A. Hafalia
Tom Y. Tang
Henry Yue
Farrah A. Khan
Craig H. Ison
Mariah R. Baughn
Bridget A. Warren
Brendan M. Duggan
Kavitha Thangavelu
Cynthia D. Honchell
Yalda Azimzai
Vicki S. Elliott
Neil Burford
Li Ding
Huibin Yue
Shanya Becha
Brooke M. Emerling
Thomas W. Richardson
Soo Yeun Lee
Olga Bandman
Preeti G. Lal
Sally Lee
Kimberly J. Gietzen
Narinder K. Chawla
Jennifer A. Griffin
Ernestine A. Lee
Anita Swarnakar
Huijun Z. Ring
Karen Anne Jones
Current Assignee
Incyte Corp
Original Assignee
Incyte Genomics Inc
Application filed by Incyte Genomics Inc
Publication of EP1373306A2
Publication of EP1373306A4
Withdrawn (current legal status)

Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/46 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates
    • C07K14/47 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P1/00 Drugs for disorders of the alimentary tract or the digestive system
    • A61P1/16 Drugs for disorders of the alimentary tract or the digestive system for liver or gallbladder disorders, e.g. hepatoprotective agents, cholagogues, litholytics
    • A61P17/00 Drugs for dermatological disorders
    • A61P17/06 Antipsoriatics
    • A61P19/00 Drugs for skeletal disorders
    • A61P19/04 Drugs for skeletal disorders for non-specific disorders of the connective tissue
    • A61P21/00 Drugs for disorders of the muscular or neuromuscular system
    • A61P21/04 Drugs for disorders of the muscular or neuromuscular system for myasthenia gravis
    • A61P25/00 Drugs for disorders of the nervous system
    • A61P25/02 Drugs for disorders of the nervous system for peripheral neuropathies
    • A61P25/08 Antiepileptics; Anticonvulsants
    • A61P25/14 Drugs for disorders of the nervous system for treating abnormal movements, e.g. chorea, dyskinesia
    • A61P25/16 Anti-Parkinson drugs
    • A61P25/28 Drugs for disorders of the nervous system for treating neurodegenerative disorders of the central nervous system, e.g. nootropic agents, cognition enhancers, drugs for treating Alzheimer's disease or other forms of dementia
    • A61P27/00 Drugs for disorders of the senses
    • A61P27/02 Ophthalmic agents
    • A61P31/00 Antiinfectives, i.e. antibiotics, antiseptics, chemotherapeutics
    • A61P31/12 Antivirals
    • A61P35/00 Antineoplastic agents
    • A61P43/00 Drugs for specific purposes, not provided for in groups A61P1/00-A61P41/00
    • A61P7/00 Drugs for disorders of the blood or the extracellular fluid
    • A61P9/00 Drugs for disorders of the cardiovascular system
    • A61P9/10 Drugs for disorders of the cardiovascular system for treating ischaemic or atherosclerotic diseases, e.g. antianginal drugs, coronary vasodilators, drugs for myocardial infarction, retinopathy, cerebrovascular insufficiency, renal arteriosclerosis
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K38/00 Medicinal preparations containing peptides

Definitions

  • This invention relates to nucleic acid and amino acid sequences of cytoskeleton-associated proteins and to the use of these sequences in the diagnosis, treatment, and prevention of cell proliferative disorders, viral infections, and neurological disorders, and in the assessment of the effects of exogenous compounds on the expression of nucleic acid and amino acid sequences of cytoskeleton-associated proteins.
  • Translocation of components within the cell is critical for maintaining cell structure and function.
  • Cellular components such as proteins and membrane-bound organelles are transported along well-defined routes to specific subcellular compartments.
  • Intracellular transport mechanisms use microtubules, which are filamentous polymers that serve as tracks for directing the movement of molecules.
  • Molecular transport is driven by the microtubule-based motor proteins, kinesin and dynein. These proteins use the energy derived from ATP hydrolysis to power their movement unidirectionally along microtubules and to transport molecular cargo to specific destinations.
  • The cytoskeleton is a cytoplasmic network of protein fibers that mediate cell shape, structure, and movement.
  • The cytoskeleton supports the cell membrane and forms tracks along which organelles and other elements move in the cytosol.
  • The cytoskeleton is a dynamic structure that allows cells to adopt various shapes and to carry out directed movements.
  • Major cytoskeletal fibers include the microtubules, the microfilaments, and the intermediate filaments.
  • Motor proteins, including myosin, dynein, and kinesin, drive movement of or along the fibers.
  • The motor protein dynamin drives the formation of membrane vesicles. Accessory or associated proteins modify the structure or activity of the fibers, while cytoskeletal membrane anchors connect the fibers to the cell membrane.
  • Microtubules, cytoskeletal fibers with a diameter of about 24 nm, have multiple roles in the cell. Bundles of microtubules form cilia and flagella, which are whip-like extensions of the cell membrane that are necessary for sweeping materials across an epithelium and for the swimming of sperm, respectively. Marginal bands of microtubules in red blood cells and platelets are important for these cells' pliability. Organelles, membrane vesicles, and proteins are transported in the cell along tracks of microtubules. For example, microtubules run through nerve cell axons, allowing bidirectional transport of materials and membrane vesicles between the cell body and the nerve terminal.
  • Microtubules are also critical to chromosomal movement during cell division. Both stable and short-lived populations of microtubules exist in the cell. Microtubules are polymers of GTP-binding tubulin protein subunits. Each subunit is a heterodimer of α- and β-tubulin, multiple isoforms of which exist. The hydrolysis of GTP is linked to the addition of tubulin subunits at the end of a microtubule. The subunits interact head to tail to form protofilaments; the protofilaments interact side to side to form a microtubule.
  • A microtubule is polarized, with one end ringed with α-tubulin and the other with β-tubulin, and the two ends differ in their rates of assembly.
  • Each microtubule is typically composed of 13 protofilaments, although microtubules with 11 or 15 protofilaments are sometimes found.
  • Cilia and flagella contain doublet microtubules.
  • Microtubules grow from specialized structures known as centrosomes or microtubule-organizing centers (MTOCs). MTOCs may contain one or two centrioles, which are pinwheel arrays of triplet microtubules.
  • The basal body, the organizing center located at the base of a cilium or flagellum, contains one centriole.
  • Gamma-tubulin, present in the MTOC, is important for nucleating the polymerization of α- and β-tubulin heterodimers but does not polymerize into microtubules.
  • The protein pericentrin is found in the MTOC and has a role in microtubule assembly.
  • Microtubule-associated proteins have roles in the assembly and stabilization of microtubules.
  • Assembly MAPs can be identified in neurons as well as in non-neuronal cells. Assembly MAPs are responsible for cross-linking microtubules in the cytosol. These MAPs are organized into two domains: a basic microtubule-binding domain and an acidic projection domain. The projection domain is the binding site for membranes, intermediate filaments, or other microtubules. Based on sequence analysis, assembly MAPs can be further grouped into two types: Type I and Type II.
  • Type I MAPs, which include MAP1A and MAP1B, are large, filamentous molecules that co-purify with microtubules and are abundantly expressed in brain and testes.
  • Type I MAPs contain several repeats of a positively charged amino acid sequence motif that binds and neutralizes negatively charged tubulin, leading to stabilization of microtubules.
  • MAP1A and MAP1B are each derived from a single precursor polypeptide that is subsequently proteolytically processed to generate one heavy chain and one light chain.
  • Another light chain, LC3, is a 16.4-kDa molecule that binds MAP1A, MAP1B, and microtubules. It has been suggested that LC3 is synthesized from a source other than the MAP1A or MAP1B transcripts, and that the expression of LC3 may be important in regulating the microtubule-binding activity of MAP1A and MAP1B during cell proliferation (Mann, S.S. et al. (1994) J. Biol. Chem. 269:11492-11497).
  • Type II MAPs, which include MAP2a, MAP2b, MAP2c, MAP4, and Tau, are characterized by three or four copies of an 18-residue sequence in the microtubule-binding domain.
  • MAP2a, MAP2b, and MAP2c are found only in dendrites.
  • MAP4 is found in non-neuronal cells.
  • Tau is found in axons and dendrites of nerve cells.
  • Alternative splicing of the Tau mRNA leads to the existence of multiple forms of Tau protein.
  • Tau phosphorylation is altered in neurodegenerative disorders such as Alzheimer's disease, Pick's disease, progressive supranuclear palsy, corticobasal degeneration, and familial frontotemporal dementia and Parkinsonism linked to chromosome 17.
  • The altered Tau phosphorylation leads to a collapse of the microtubule network and the formation of intraneuronal Tau aggregates (Spillantini, M.G. and M. Goedert (1998) Trends Neurosci. 21:428-433).
  • The cytoplasmic linker protein CLIP-170 links endocytic vesicles to microtubules. CLIP-170 may also link microtubule ends to actin cables, thus playing a role in directional cell movement (Goode, B.L. et al. (2000) Curr. Opin. Cell Biol. 12:63-71). CLIP-170 proteins contain two copies of the CAP-Gly domain, a conserved, glycine-rich domain of about 42 residues found in several cytoskeleton-associated proteins (Prosite PDOC00660 CAP-Gly domain signature).
  • STOP: stable tubule only polypeptide
  • STOP proteins function to stabilize the microtubular network. STOP proteins are associated with axonal microtubules and are also abundant in neurons (Guillaud, L. et al. (1998) J. Cell Biol. 142:167-179).
  • STOP proteins are necessary for normal neurite formation and have been observed to stabilize microtubules in vitro against cold-, calcium-, or drug-induced disassembly (Margolis, R.L. et al. (1990) EMBO 9:4095-502).
  • Microfilaments are vital to cell locomotion, cell shape, cell adhesion, cell division, and muscle contraction. Assembly and disassembly of the microfilaments allow cells to change their morphology. Microfilaments are the polymerized form of actin, the most abundant intracellular protein in the eukaryotic cell. Human cells contain six isoforms of actin. The three α-actins are found in different kinds of muscle; nonmuscle β-actin and nonmuscle γ-actin are found in nonmuscle cells; and another γ-actin is found in intestinal smooth muscle cells.
  • G-actin, the monomeric form of actin, polymerizes into polarized, helical F-actin filaments, accompanied by the hydrolysis of ATP to ADP.
  • Actin filaments associate to form bundles and networks, providing a framework to support the plasma membrane and determine cell shape. These bundles and networks are connected to the cell membrane.
  • In muscle, thin filaments containing actin slide past thick filaments containing the motor protein myosin during contraction.
  • There is also a family of actin-related proteins that are not part of the actin cytoskeleton but instead associate with microtubules and dynein.
  • Actin-associated proteins have roles in cross-linking, severing, and stabilization of actin filaments and in sequestering actin monomers.
  • Actin-associated proteins have multiple functions. Bundles and networks of actin filaments are held together by actin cross-linking proteins. These proteins have two actin-binding sites, one for each filament. Short cross-linking proteins promote bundle formation, while longer, more flexible cross-linking proteins promote network formation.
  • Actin-interacting proteins (AIPs) participate in the regulation of actin filament organization.
  • Other actin-associated proteins, such as TARA, a novel F-actin-binding protein, function in a similar capacity by regulating actin cytoskeletal organization.
  • Calmodulin-like calcium-binding domains in actin cross-linking proteins allow calcium regulation of cross-linking.
  • Group I cross-linking proteins have unique actin-binding domains and include the 30-kDa protein, EF-1α, fascin, and scruin.
  • Group II cross-linking proteins have a 7,000-MW actin-binding domain and include villin and dematin.
  • Group III cross-linking proteins have pairs of a 26,000-MW actin-binding domain and include fimbrin, spectrin, dystrophin, ABP 120, and filamin.
  • Severing proteins regulate the length of actin filaments by breaking them into short pieces or by blocking their ends.
  • Severing proteins include gCAP39, severin (fragmin), gelsolin, and villin.
  • Capping proteins can cap the ends of actin filaments, but cannot break filaments.
  • Capping proteins include CapZ and tropomodulin.
  • The proteins thymosin and profilin sequester actin monomers in the cytosol, allowing a pool of unpolymerized actin to exist.
  • Microtubule and actin filament networks cooperate in processes such as vesicle and organelle transport, cleavage furrow placement, directed cell migration, spindle rotation, and nuclear migration.
  • Microtubules and actin may coordinate to transport vesicles, organelles, and cell fate determinants, or transport may involve targeting and capture of microtubule ends at cortical actin sites.
  • These cytoskeletal systems may be bridged by myosin-kinesin complexes, myosin-CLIP170 complexes, formin-homology (FH) proteins, dynein, the dynactin complex, Kar9p, coronin, ERM proteins, and kelch repeat-containing proteins (for a review, see Goode, B.L.
  • The kelch repeat is a motif originally observed in the kelch protein, which is involved in the formation of cytoplasmic bridges called ring canals. A variety of mammalian and other kelch family proteins have been identified. The kelch repeat domain is believed to mediate interaction with actin (Robinson, D.N. and L. Cooley (1997) J. Cell Biol. 138:799-810).
  • ADF/cofilins are a family of conserved 15-18 kDa actin-binding proteins that play a role in cytokinesis, endocytosis, and the development of embryonic tissues, as well as in tissue regeneration and in pathologies such as ischemia and oxidative or osmotic stress.
  • LIM kinase 1 downregulates ADF (Carlier, M.F. et al. (1999) J. Biol. Chem. 274:33827-33830).
  • Coronins are actin-binding proteins whose structure contains five WD (Trp-Asp) repeats and resembles the β subunits of heterotrimeric G proteins. Dictyostelium mutants lacking coronin are impaired in all actin-mediated processes, including cell locomotion, cytokinesis, phagocytosis, and macropinocytosis. In human neutrophils, coronin 1 accumulates with F-actin around endocytic vesicles, suggesting an evolutionarily conserved role for coronin in endocytosis. Other coronin proteins have specific activities such as promotion of actin polymerization, actin cross-linking, and binding to microtubules.
  • LIM is an acronym of three transcription factors, Lin-11, Isl-1, and Mec-3, in which the motif was first identified.
  • The LIM domain is a double zinc-finger motif that mediates the protein-protein interactions of transcription factors, signaling, and cytoskeleton-associated proteins (Roof, D.J. et al.
  • The Frabin protein is another example of an actin-filament-binding protein (Obaishi, H. et al.
  • Frabin: FGD1-related F-actin-binding protein
  • FAB: actin-filament binding
  • DH: Dbl homology
  • PH: pleckstrin homology
  • Frabin shows GDP/GTP exchange activity for the small G protein Cdc42 and indirectly induces activation of the small G protein Rac in intact cells.
  • The Rho family of small GTP-binding proteins comprises important regulators of actin-dependent cell functions, including cell shape change, adhesion, and motility.
  • The Rho family consists of three major subfamilies: Cdc42, Rac, and Rho. Rho family members cycle between GDP-bound inactive and GTP-bound active forms by means of a GDP/GTP exchange factor (GEF) (Umikawa, M. et al.
  • GEF: GDP/GTP exchange factor
  • The Rho GEF family is crucial for microfilament organization.
  • Intermediate filaments are cytoskeletal fibers with a diameter of about 10 nm, intermediate between that of microfilaments and microtubules.
  • IFs serve structural roles in the cell, reinforcing cells and organizing cells into tissues.
  • IFs are particularly abundant in epidermal cells and in neurons.
  • IFs are extremely stable, and, in contrast to microfilaments and microtubules, do not function in cell motility.
  • Type I and Type II proteins are the acidic and basic keratins, respectively. Heterodimers of the acidic and basic keratins are the building blocks of keratin IFs. Keratins are abundant in soft epithelia such as skin and cornea, hard epithelia such as nails and hair, and in epithelia that line internal body cavities.
  • Type III IF proteins include desmin, glial fibrillary acidic protein, vimentin, and peripherin.
  • Desmin filaments in muscle cells link myofibrils into bundles and stabilize sarcomeres in contracting muscle.
  • Glial fibrillary acidic protein filaments are found in glial cells, such as astrocytes, that surround neurons.
  • Vimentin filaments are found in blood vessel endothelial cells, some epithelial cells, and mesenchymal cells such as fibroblasts, and are commonly associated with microtubules. Vimentin filaments may have roles in keeping the nucleus and other organelles in place in the cell.
  • Type IV IFs include the neurofilaments and nestin.
  • Neurofilaments, composed of three polypeptides (NF-L, NF-M, and NF-H), are frequently associated with microtubules in axons. Neurofilaments are responsible for the radial growth and diameter of an axon, and ultimately for the speed of nerve impulse transmission. Changes in phosphorylation and metabolism of neurofilaments are observed in neurodegenerative diseases including amyotrophic lateral sclerosis, Parkinson's disease, and Alzheimer's disease (Julien, J.P. and Mushynski, W.E. (1998) Prog. Nucleic Acid Res. Mol. Biol. 61:1-23). Type V IFs, the lamins, are found in the nucleus, where they support the nuclear membrane.
  • IFs have a central α-helical rod region interrupted by short nonhelical linker segments.
  • The rod region is bracketed, in most cases, by nonhelical head and tail domains.
  • The rod regions of intermediate filament proteins associate to form a coiled-coil dimer.
  • A highly ordered assembly process leads from the dimers to the IFs.
  • Neither ATP nor GTP is needed for IF assembly, unlike that of microfilaments and microtubules.
  • IFAPs: IF-associated proteins
  • IFAPs cross-link IFs into a bundle, into a network, or to the plasma membrane, and may cross-link IFs to the microfilament and microtubule cytoskeleton.
  • IFAPs include BPAG1, plakoglobin, desmoplakin I, desmoplakin II, plectin, ankyrin, filaggrin, and lamin B receptor.
  • Cytoskeletal fibers are attached to the plasma membrane by specific proteins. These attachments are important for maintaining cell shape and for muscle contraction.
  • In red blood cells, the spectrin-actin cytoskeleton is attached to the cell membrane by three proteins: band 4.1, ankyrin, and adducin. Defects in this attachment result in abnormally shaped cells, which are more rapidly degraded by the spleen, leading to anemia.
  • The spectrin-actin cytoskeleton is also linked to the membrane by ankyrin; a second actin network is anchored to the membrane by filamin.
  • The protein dystrophin links actin filaments to the plasma membrane; mutations in the dystrophin gene lead to Duchenne muscular dystrophy.
  • Focal adhesions are specialized structures in the plasma membrane involved in the adhesion of a cell to a substrate, such as the extracellular matrix (ECM). Focal adhesions form the connection between an extracellular substrate and the cytoskeleton, and affect such functions as cell shape, cell motility and cell proliferation.
  • Transmembrane integrin molecules form the basis of focal adhesions. Upon ligand binding, integrins cluster in the plane of the plasma membrane. Cytoskeletal linker proteins such as the actin-binding proteins α-actinin, talin, tensin, vinculin, paxillin, and filamin are recruited to the clustering site.
  • Integrins mediate aggregation of protein complexes on both the cytosolic and extracellular faces of the plasma membrane, leading to the assembly of the focal adhesion.
  • Many signal transduction responses are mediated via various adhesion complex proteins, including Src, FAK, paxillin, and tensin.
  • IFs are also attached to membranes by cytoskeletal-membrane anchors.
  • The nuclear lamina is attached to the inner surface of the nuclear membrane by the lamin B receptor.
  • Vimentin IFs are attached to the plasma membrane by ankyrin and plectin.
  • Desmosome and hemidesmosome membrane junctions hold together epithelial cells of organs and skin. These membrane junctions allow shear forces to be distributed across the entire epithelial cell layer, thus providing strength and rigidity to the epithelium.
  • IFs in epithelial cells are attached to the desmosome by plakoglobin and desmoplakins. The proteins that link IFs to hemidesmosomes are not known.
  • Desmin IFs surround the sarcomere in muscle and are linked to the plasma membrane by paranemin, synemin, and ankyrin.
  • Associations between the cytoskeleton and the lipid membranes bounding intracellular compartments involve spectrin, ankyrin, and integral membrane proteins. Spectrin is a major component of the cytoskeleton and acts as a scaffolding protein. Similarly, ankyrin acts to tether the actin-spectrin moiety to membranes and to regulate the interaction between the cytoskeleton and membranous compartments. Different ankyrin isoforms are specific to different organelles and provide specificity for this interaction. Ankyrin also contains a regulatory domain that can respond to cellular signals, allowing remodeling of the cytoskeleton during the cell cycle and differentiation (Lambert, S. and Bennett, V. (1993) Eur. J. Biochem. 211:1-6).
  • Ankyrins have three basic structural components.
  • The N-terminal portion of ankyrin consists of a repeated 33-amino-acid motif, the ankyrin repeat, which is involved in specific protein-protein interactions. Variable regions within the motif are responsible for specific protein binding, such that different ankyrin repeats are involved in binding to tubulin, anion exchange protein, voltage-gated sodium channel, Na+/K+-ATPase, and neurofascin.
  • The ankyrin motif is also found in transcription factors, such as NF-κB, and in the yeast cell cycle proteins CDC10, SWI4, and SWI6. Proteins involved in tissue differentiation, such as Drosophila Notch and C. elegans LIN-12 and GLP-1, also contain ankyrin-like repeats.
  • Lux et al. (1990; Nature 344:36-42) suggest that ankyrin-like repeats function as 'built-in' ankyrins and form binding sites for integral membrane proteins, tubulin, and other proteins.
  • The central domain of ankyrin is required for binding spectrin.
  • This domain consists of an acidic region, primarily responsible for binding spectrin, and a basic region. Phosphorylation within the central domain may regulate spectrin binding.
  • The C-terminal domain regulates ankyrin function.
  • The C-terminally deleted ankyrin protein 2.2 behaves as a constitutively active ankyrin, displaying increased membrane and spectrin binding.
  • The C-terminal domain is divergent among ankyrin family members, and tissue-specific alternative splicing generates modified C-termini with acidic or basic characteristics (Lambert, supra).
  • Three ankyrin proteins, ANK1, ANK2, and ANK3, have been described, which differ in their tissue-specific and subcellular localization patterns.
  • ANK1, erythrocyte protein 2.1, is involved in protecting red cells from circulatory shear stresses and in helping maintain the erythrocyte's unique biconcave shape.
  • An ANK1 deficiency has been linked to hereditary hemolytic anemias, such as hereditary spherocytosis (HS), and to a neurodegenerative disorder involving loss of Purkinje cells (Lambert, supra).
  • ANK2 is the major nervous tissue ankyrin. Two alternative splice variants are generated from the ANK2 gene.
  • Brain ankyrin 1 (brank1), which is expressed in adults, is similar to ANK1 in the N-terminal and central domains but has an entirely dissimilar regulatory domain.
  • An early neuronal form, brank2, includes an additional motif between the spectrin-binding and regulatory domains.
  • An ankyrin homolog in C. elegans, unc-44, produces alternative splice variants similar to those of ANK2. Mutations in the unc-44 gene affect the direction of axonal outgrowth (Otsuka, A.J. et al. (1995) J. Cell Biol. 129:1081-1092).
  • ANK3 consists of four ankyrin isoforms (G100, G119, G120, and G195), which localize to intracellular compartments and are implicated in vesicular transport.
  • Ank G119 is associated with the Golgi, has a truncated N-terminal domain, and lacks a C-terminal regulatory domain.
  • Ank G120 and Ank G100 associate with late endolysosomes in macrophages; they lack N-terminal ankyrin repeats but contain both the spectrin-binding and regulatory domains characteristic of ANK1 and ANK2.
  • Ank G195 is associated with the trans-Golgi network (TGN).
  • SAATS: spectrin-ankyrin-adapter protein trafficking system
  • Myosins are actin-activated ATPases, found in eukaryotic cells, that couple hydrolysis of ATP with motion. Myosin provides the motor function for muscle contraction and intracellular movements such as phagocytosis and rearrangement of cell contents during mitotic cell division (cytokinesis).
  • The contractile unit of skeletal muscle, termed the sarcomere, consists of highly ordered arrays of thin actin-containing filaments and thick myosin-containing filaments. Crossbridges form between the thick and thin filaments, and the ATP-dependent movement of myosin heads within the thick filaments pulls the thin filaments, shortening the sarcomere and thus the muscle fiber.
  • Myosins are composed of one or two heavy chains and associated light chains.
  • Myosin heavy chains contain an amino-terminal motor or head domain, a neck that is the site of light-chain binding, and a carboxy-terminal tail domain.
  • The tail domains may associate to form an α-helical coiled coil.
  • Conventional myosins, such as those found in muscle tissue, are composed of two myosin heavy-chain subunits, each associated with two light-chain subunits that bind at the neck region and play a regulatory role.
  • Unconventional myosins, believed to function in intracellular motion, may contain either one or two heavy chains and associated light chains. There is evidence for about 25 myosin heavy chain genes in vertebrates, more than half of them unconventional.
  • Dvnein-related Motor Proteins Dyneins are (-) end-directed motor proteins which act on microtubules. Two classes of dyneins, cytosolic and axonemal, have been identified. Cytosolic dyneins are responsible for translocation of materials along cytoplasmic microtubules, for example, transport from the nerve terminal to the cell body and transport of endocytic vesicles to lysosomes. As well, viruses often take advantage of cytoplasmic dyneins to be transported to the nucleus and establish a successful infection (Sodeik, B. et al. (1997) J. Cell Biol. 136: 1007-1021).
  • Virion proteins of herpes simplex virus 1 interact with the cytoplasmic dynein intermediate chain (Ye, G.J. et al. (2000) J. Virol. 74:1355-1363). Cytoplasmic dyneins are also reported to play a role in mitosis. Axonemal dyneins are responsible for the beating of flagella and cilia. Dynein on one microtubule doublet walks along the adjacent microtubule doublet. This sliding force produces bending that causes the flagellum or cilium to beat. Dyneins have a native mass between 1000 and 2000 kDa and contain either two or three force-producing heads driven by the hydrolysis of ATP.
  • The heads are linked via stalks to a basal domain, which is composed of a highly variable number of accessory intermediate and light chains.
  • Cytoplasmic dynein is the largest and most complex of the motor proteins.
  • Kinesin-related Motor Proteins: Kinesins are (+) end-directed motor proteins which act on microtubules. The prototypical kinesin molecule is involved in the transport of membrane-bound vesicles and organelles. This function is particularly important for axonal transport in neurons. Kinesin is also important in all cell types for the transport of vesicles from the Golgi complex to the endoplasmic reticulum. This role is critical for maintaining the identity and functionality of these secretory organelles.
  • Kinesins define a ubiquitous, conserved family of over 50 proteins that can be classified into at least 8 subfamilies based on primary amino acid sequence, domain structure, velocity of movement, and cellular function. (Reviewed in Moore, J.D. and S.A. Endow (1996) Bioessays 18:207-219; and Hoyt, A.M. (1994) Curr. Opin. Cell Biol. 6:63-68.)
  • The prototypical kinesin molecule is a heterotetramer composed of two heavy polypeptide chains (KHCs) and two light polypeptide chains (KLCs).
  • KHC subunits are typically referred to as “kinesin.” KHC is about 1000 amino acids in length, and KLC is about 550 amino acids in length.
  • Two KHCs dimerize to form a rod-shaped molecule with three distinct regions of secondary structure.
  • The first is a globular motor domain that functions in ATP hydrolysis and microtubule binding.
  • Kinesin motor domains are highly conserved and share over 70% identity.
  • The second is an α-helical coiled-coil region, which mediates dimerization.
  • The third is a fan-shaped tail that associates with molecular cargo. The tail is formed by the interaction of the KHC C-termini with the two KLCs.
  • KRPs: kinesin-related proteins
  • Dynamin is a large GTPase motor protein that functions as a “molecular pinchase,” generating a mechanochemical force used to sever membranes. This activity is important in forming clathrin-coated vesicles from coated pits in endocytosis and in the biogenesis of synaptic vesicles in neurons. Binding of dynamin to a membrane leads to dynamin's self-assembly into spirals that may act to constrict a flat membrane surface into a tubule. GTP hydrolysis induces a change in conformation of the dynamin polymer that pinches the membrane tubule, leading to severing of the membrane tubule and formation of a membrane vesicle.
  • The cycle ends with dynamin disassembly. Following disassembly, dynamin may either dissociate from the membrane or remain associated with the vesicle and be transported to another region of the cell.
  • Three homologous dynamin genes have been discovered, in addition to several dynamin-related proteins. Conserved dynamin regions are the N-terminal GTP-binding domain, a central pleckstrin homology domain that binds membranes, a central coiled-coil region that may activate dynamin's GTPase activity, and a C-terminal proline-rich domain that contains several motifs that bind SH3 domains on other proteins.
  • Some dynamin-related proteins do not contain the pleckstrin homology domain or the proline-rich domain. (See McNiven, M.A. (1998) Cell 94:151-154; Scaife, R.M. and R.L. Margolis (1997) Cell. Signal. 9:395-401.)
  • Array technology can provide a simple way to explore the expression of a single polymorphic gene or the expression profile of a large number of related or unrelated genes.
  • Arrays are employed to detect the expression of a specific gene or its variants.
  • Arrays provide a platform for identifying genes that are tissue specific, are affected by a substance being tested in a toxicology assay, are part of a signaling cascade, carry out housekeeping functions, or are specifically related to a particular genetic predisposition, condition, disease, or disorder.
  • Lung cancer is the leading cause of cancer death for men and the second leading cause of cancer death for women in the U.S.
  • NSCLCs: non-small cell lung cancers
  • SCLC: small cell lung cancer
  • The discovery of new cytoskeleton-associated proteins satisfies a need in the art by providing new compositions which are useful in the diagnosis, prevention, and treatment of cell proliferative disorders, viral infections, and neurological disorders, and in the assessment of the effects of exogenous compounds on the expression of nucleic acid and amino acid sequences of cytoskeleton-associated proteins.
  • The invention features purified polypeptides, cytoskeleton-associated proteins, referred to collectively as “CSAP” and individually as “CSAP-1,” “CSAP-2,” “CSAP-3,” “CSAP-4,” “CSAP-5,” “CSAP-6,” “CSAP-7,” “CSAP-8,” “CSAP-9,” “CSAP-10,” “CSAP-11,” “CSAP-12,” “CSAP-13,” “CSAP-14,” “CSAP-15,” “CSAP-16,” “CSAP-17,” “CSAP-18,” “CSAP-19,” “CSAP-20,” “CSAP-21,” “CSAP-22,” “CSAP-23,” “CSAP-24,” “CSAP-25,” “CSAP-26,” “CSAP-27,” and “CSAP-28.”
  • The invention provides an isolated polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical
  • The invention provides an isolated polypeptide comprising the amino acid sequence of SEQ ID NO: 1-28.
  • The invention further provides an isolated polynucleotide encoding a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28.
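The “at least 90% identical” criterion recited above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned to equal length without gaps and divides the number of identical positions by that length. The function names and example peptides are hypothetical, and the alignment tools listed in Table 7 may treat gaps and denominators differently.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions at which two sequences carry the same residue.

    Assumes both sequences are pre-aligned to the same length with no gaps.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


def meets_identity_threshold(seq_a: str, seq_b: str, threshold: float = 90.0) -> bool:
    """True if the aligned sequences are at least `threshold` percent identical."""
    return percent_identity(seq_a, seq_b) >= threshold


# Hypothetical 20-residue peptides that differ at 2 of 20 positions.
reference = "MKTLLVAGSERLQAEIKDLE"
variant = "MKTLIVAGSERLQAEVKDLE"
print(percent_identity(reference, variant))          # 90.0
print(meets_identity_threshold(reference, variant))  # True
```

Real comparisons would first run an alignment program. This sketch only covers the final counting step.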
  • The polynucleotide encodes a polypeptide selected from the group consisting of SEQ ID NO: 1-28.
  • The polynucleotide is selected from the group consisting of SEQ ID NO
  • The invention provides a recombinant polynucleotide comprising a promoter sequence operably linked to a polynucleotide encoding a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28.
  • The invention provides a cell transformed with the recombinant polynucleotide.
  • The invention provides a transgenic organism comprising the recombinant polynucleotide.
  • The invention also provides a method for producing a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28.
  • The method comprises a) culturing a cell under conditions suitable for expression of the polypeptide, wherein said cell is transformed with a recombinant polynucleotide comprising a promoter sequence operably linked to a polynucleotide encoding the polypeptide, and b) recovering the polypeptide so expressed.
  • The invention provides an isolated antibody which specifically binds to a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28.
  • The invention further provides an isolated polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, c) a polynucleotide complementary to the polynucleotide of a), d) a polynucleotide complementary to the polynucleotide of b), and e) an RNA equivalent of a)-d).
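The complementary polynucleotides and RNA equivalents recited in items c) through e) can be sketched in Python as follows. The sketch assumes plain DNA strings over A, C, G, T (and N) and reports a complement as the reverse complement read 5' to 3'. The example fragment is hypothetical and is not one of SEQ ID NO:29-56.

```python
# Watson-Crick pairing for DNA; N (any base) pairs with N.
_DNA_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, read 5' to 3' (the reverse complement)."""
    return dna.translate(_DNA_COMPLEMENT)[::-1]


def rna_equivalent(dna: str) -> str:
    """Return the RNA equivalent of a DNA sequence by replacing T with U."""
    return dna.replace("T", "U").replace("t", "u")


sense = "ATGGCCTTAC"  # hypothetical fragment
print(reverse_complement(sense))                  # GTAAGGCCAT
print(rna_equivalent(sense))                      # AUGGCCUUAC
print(rna_equivalent(reverse_complement(sense)))  # GUAAGGCCAU
```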
  • The polynucleotide comprises at least 60 contiguous nucleotides. Additionally, the invention provides a method for detecting a target polynucleotide in a sample, said target polynucleotide having a sequence of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, c) a polynucleotide complementary to the polynucleotide of a), d) a polynucleotide complementary to the polynucleotide of b), and e) an RNA equivalent of a)-d).
  • The method comprises a) hybridizing the sample with a probe comprising at least 20 contiguous nucleotides comprising a sequence complementary to said target polynucleotide in the sample, and which probe specifically hybridizes to said target polynucleotide, under conditions whereby a hybridization complex is formed between said probe and said target polynucleotide or fragments thereof, and b) detecting the presence or absence of said hybridization complex, and optionally, if present, the amount thereof.
  • The probe comprises at least 60 contiguous nucleotides.
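The probe-target relationship described above can be sketched in Python. A probe of at least 20 contiguous nucleotides, written 5' to 3', pairs antiparallel with a target that contains the probe's reverse complement. This sketch finds only perfect, exact-match sites, whereas real hybridization depends on stringency and can tolerate some mismatches. The function names and the 30-nt target are hypothetical.

```python
from typing import List

MIN_PROBE_LENGTH = 20  # from the text: at least 20 contiguous nucleotides
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complementary strand of an uppercase DNA sequence, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_binding_sites(probe: str, target: str) -> List[int]:
    """0-based start positions in `target` where `probe` is perfectly complementary."""
    probe, target = probe.upper(), target.upper()
    if len(probe) < MIN_PROBE_LENGTH:
        raise ValueError(f"probe must be at least {MIN_PROBE_LENGTH} nucleotides")
    site = reverse_complement(probe)  # what the target must contain
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        start = target.find(site, start + 1)  # also report overlapping sites
    return positions


# Hypothetical 30-nt target; the probe is designed against positions 5-24.
target = "GATTACAGGCTTACGGATCCATGCAAGTCT"
probe = reverse_complement(target[5:25])
print(probe_binding_sites(probe, target))  # [5]
```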
  • The invention further provides a method for detecting a target polynucleotide in a sample, said target polynucleotide having a sequence of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, c) a polynucleotide complementary to the polynucleotide of a), d) a polynucleotide complementary to the polynucleotide of b), and e) an RNA equivalent of a)-d).
  • The method comprises a) amplifying said target polynucleotide or fragment thereof using polymerase chain reaction amplification, and b) detecting the presence or absence of said amplified target polynucleotide or fragment thereof, and, optionally, if present, the amount thereof.
  • The invention further provides a composition comprising an effective amount of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, and a pharmaceutically acceptable excipient.
  • The composition comprises an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28.
  • The invention additionally provides a method of treating a disease or condition associated with decreased expression of functional CSAP, comprising administering to a patient in need of such treatment the composition.
  • The invention also provides a method for screening a compound for effectiveness as an agonist of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28.
  • The method comprises a) exposing a sample comprising the polypeptide to a compound, and b) detecting agonist activity in the sample.
  • The invention provides a composition comprising an agonist compound identified by the method and a pharmaceutically acceptable excipient.
  • The invention provides a method of treating a disease or condition associated with decreased expression of functional CSAP, comprising administering to a patient in need of such treatment the composition.
  • The invention provides a method for screening a compound for effectiveness as an antagonist of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28.
  • The method comprises a) exposing a sample comprising the polypeptide to a compound, and b) detecting antagonist activity in the sample.
  • The invention provides a composition comprising an antagonist compound identified by the method and a pharmaceutically acceptable excipient.
  • The invention provides a method of treating a disease or condition associated with overexpression of functional CSAP, comprising administering to a patient in need of such treatment the composition.
  • The invention further provides a method of screening for a compound that specifically binds to a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28.
  • The method comprises a) combining the polypeptide with at least one test compound under suitable conditions, and b) detecting binding of the polypeptide to the test compound, thereby identifying a compound that specifically binds to the polypeptide.
  • The invention further provides a method of screening for a compound that modulates the activity of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28.
  • The method comprises a) combining the polypeptide with at least one test compound under conditions permissive for the activity of the polypeptide, b) assessing the activity of the polypeptide in the presence of the test compound, and c) comparing the activity of the polypeptide in the presence of the test compound with the activity of the polypeptide in the absence of the test compound, wherein a change in the activity of the polypeptide in the presence of the test compound is indicative of a compound that modulates the activity of the polypeptide.
  • The invention further provides a method for screening a compound for effectiveness in altering expression of a target polynucleotide, wherein said target polynucleotide comprises a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, the method comprising a) exposing a sample comprising the target polynucleotide to a compound, b) detecting altered expression of the target polynucleotide, and c) comparing the expression of the target polynucleotide in the presence of varying amounts of the compound and in the absence of the compound.
  • The invention further provides a method for assessing toxicity of a test compound, said method comprising a) treating a biological sample containing nucleic acids with the test compound; b) hybridizing the nucleic acids of the treated biological sample with a probe comprising at least 20 contiguous nucleotides of a polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, iii) a polynucleotide having a sequence complementary to i), iv) a polynucleotide complementary to the polynucleotide of ii), and v) an RNA equivalent of i)-iv).
  • Hybridization occurs under conditions whereby a specific hybridization complex is formed between said probe and a target polynucleotide in the biological sample, said target polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, iii) a polynucleotide complementary to the polynucleotide of i), iv) a polynucleotide complementary to the polynucleotide of ii), and v) an RNA equivalent of i)-iv).
  • The target polynucleotide comprises a fragment of a polynucleotide sequence selected from the group consisting of i)-v) above; c) quantifying the amount of hybridization complex; and d) comparing the amount of hybridization complex in the treated biological sample with the amount of hybridization complex in an untreated biological sample, wherein a difference in the amount of hybridization complex in the treated biological sample is indicative of toxicity of the test compound.
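Steps c) and d) above, quantifying hybridization complexes and comparing treated with untreated samples, can be sketched in Python as a per-probe ratio comparison. The text only requires a difference in the amount of hybridization complex. The log2 ratio and the twofold cutoff (min_log2_change=1.0) used here are hypothetical choices, as are the probe names and signal values.

```python
import math


def flag_expression_changes(treated, untreated, min_log2_change=1.0):
    """Compare quantified hybridization signals per probe between two samples.

    `treated` and `untreated` map a probe identifier to a signal amount.
    Returns {probe: log2(treated / untreated)} for probes whose absolute
    log2 ratio is at least `min_log2_change` (1.0 corresponds to twofold).
    Probes lacking a positive signal in both samples are skipped, so this
    sketch does not report transcripts that are absent from one sample.
    """
    flagged = {}
    for probe_id, untreated_signal in untreated.items():
        treated_signal = treated.get(probe_id)
        if treated_signal is None or treated_signal <= 0 or untreated_signal <= 0:
            continue
        log2_ratio = math.log2(treated_signal / untreated_signal)
        if abs(log2_ratio) >= min_log2_change:
            flagged[probe_id] = log2_ratio
    return flagged


# Hypothetical signals from a treated and an untreated biological sample.
untreated = {"probe_A": 1000.0, "probe_B": 500.0, "probe_C": 800.0}
treated = {"probe_A": 4000.0, "probe_B": 450.0, "probe_C": 200.0}
print(flag_expression_changes(treated, untreated))
# {'probe_A': 2.0, 'probe_C': -2.0}
```

A log ratio treats fourfold increases and fourfold decreases symmetrically (+2 and -2). This symmetry is why a log ratio is a common choice for comparing expression.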
  • Table 1 summarizes the nomenclature for the full length polynucleotide and polypeptide sequences of the present invention.
  • Table 2 shows the GenBank identification number and annotation of the nearest GenBank homolog for polypeptides of the invention. The probability scores for the matches between each polypeptide and its homolog(s) are also shown.
  • Table 3 shows structural features of polypeptide sequences of the invention, including predicted motifs and domains, along with the methods, algorithms, and searchable databases used for analysis of the polypeptides.
  • Table 4 lists the cDNA and/or genomic DNA fragments which were used to assemble polynucleotide sequences of the invention, along with selected fragments of the polynucleotide sequences.
  • Table 5 shows the representative cDNA library for polynucleotides of the invention.
  • Table 6 provides an appendix which describes the tissues and vectors used for construction of the cDNA libraries shown in Table 5.
  • Table 7 shows the tools, programs, and algorithms used to analyze the polynucleotides and polypeptides of the invention, along with applicable descriptions, references, and threshold parameters.
  • “CSAP” refers to the amino acid sequences of substantially purified CSAP obtained from any species, particularly a mammalian species, including bovine, ovine, porcine, murine, equine, and human, and from any source, whether natural, synthetic, semi-synthetic, or recombinant.
  • “Agonist” refers to a molecule which intensifies or mimics the biological activity of CSAP.
  • Agonists may include proteins, nucleic acids, carbohydrates, small molecules, or any other compound or composition which modulates the activity of CSAP either by directly interacting with CSAP or by acting on components of the biological pathway in which CSAP participates.
  • An “allelic variant” is an alternative form of the gene encoding CSAP.
  • Allelic variants may result from at least one mutation in the nucleic acid sequence and may result in altered mRNAs or in polypeptides whose structure or function may or may not be altered.
  • A gene may have none, one, or many allelic variants of its naturally occurring form. Common mutational changes which give rise to allelic variants are generally ascribed to natural deletions, additions, or substitutions of nucleotides. Each of these types of changes may occur alone, or in combination with the others, one or more times in a given sequence.
  • “Altered” nucleic acid sequences encoding CSAP include those sequences with deletions, insertions, or substitutions of different nucleotides, resulting in a polypeptide the same as CSAP or a polypeptide with at least one functional characteristic of CSAP. Included within this definition are polymorphisms which may or may not be readily detectable using a particular oligonucleotide probe of the polynucleotide encoding CSAP, and improper or unexpected hybridization to allelic variants, with a locus other than the normal chromosomal locus for the polynucleotide sequence encoding CSAP.
  • The encoded protein may also be “altered,” and may contain deletions, insertions, or substitutions of amino acid residues which produce a silent change and result in a functionally equivalent CSAP.
  • Deliberate amino acid substitutions may be made on the basis of similarity in polarity, charge, solubility, hydrophobicity, hydrophilicity, and/or the amphipathic nature of the residues, as long as the biological or immunological activity of CSAP is retained.
  • Negatively charged amino acids may include aspartic acid and glutamic acid.
  • Positively charged amino acids may include lysine and arginine.
  • Amino acids with uncharged polar side chains having similar hydrophilicity values may include: asparagine and glutamine; and serine and threonine.
  • Amino acids with uncharged side chains having similar hydrophilicity values may include: leucine, isoleucine, and valine; glycine and alanine; and phenylalanine and tyrosine.
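The residue groupings listed above can be encoded as a small lookup that checks whether a substitution stays within a group. This Python sketch is an illustration only. The text says these groups “may include” the listed residues, so the table is not exhaustive, and it is not a substitution scoring matrix. The function names and example sequences are hypothetical.

```python
from typing import List, Tuple

# Residue groups named in the text (one-letter codes). A swap within a
# group is treated here as a "similar" substitution.
SIMILARITY_GROUPS = [
    frozenset("DE"),   # negatively charged: aspartic acid, glutamic acid
    frozenset("KR"),   # positively charged: lysine, arginine
    frozenset("NQ"),   # uncharged polar: asparagine, glutamine
    frozenset("ST"),   # uncharged polar: serine, threonine
    frozenset("LIV"),  # uncharged: leucine, isoleucine, valine
    frozenset("GA"),   # uncharged: glycine, alanine
    frozenset("FY"),   # uncharged: phenylalanine, tyrosine
]


def is_similar_substitution(original: str, replacement: str) -> bool:
    """True if both residues are identical or fall in the same group."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return True
    return any(original in group and replacement in group for group in SIMILARITY_GROUPS)


def classify_substitutions(reference: str, variant: str) -> List[Tuple[int, str, str, bool]]:
    """List (1-based position, original, replacement, similar?) for each difference.

    Assumes the two sequences are aligned to equal length without gaps.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to equal length")
    return [
        (pos, a, b, is_similar_substitution(a, b))
        for pos, (a, b) in enumerate(zip(reference.upper(), variant.upper()), start=1)
        if a != b
    ]


print(classify_substitutions("MKDLSG", "MRELTP"))
# [(2, 'K', 'R', True), (3, 'D', 'E', True), (5, 'S', 'T', True), (6, 'G', 'P', False)]
```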
  • “Amino acid” and “amino acid sequence” refer to an oligopeptide, peptide, polypeptide, or protein sequence, or a fragment of any of these, and to naturally occurring or synthetic molecules. Where “amino acid sequence” is recited to refer to a sequence of a naturally occurring protein molecule, “amino acid sequence” and like terms are not meant to limit the amino acid sequence to the complete native amino acid sequence associated with the recited protein molecule.
  • Amplification relates to the production of additional copies of a nucleic acid sequence. Amplification is generally carried out using polymerase chain reaction (PCR) technologies well known in the art.
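The following idealized model (an assumption here, since real reactions deviate from it and eventually plateau) illustrates why PCR yields so many copies. Each cycle multiplies the number of template molecules $N$ by $(1 + E)$, where $E$ is the per-cycle amplification efficiency and $n$ is the number of cycles:

```latex
N_n = N_0 \,(1 + E)^{n}, \qquad 0 \le E \le 1
% Perfect efficiency (E = 1) over n = 30 cycles:
N_{30} = N_0 \cdot 2^{30} \approx 1.07 \times 10^{9}\, N_0
```

Under this model, 30 cycles at perfect efficiency amplify the starting template roughly a billionfold.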
  • Antagonist refers to a molecule which inhibits or attenuates the biological activity of CSAP.
  • Antagonists may include proteins such as antibodies, nucleic acids, carbohydrates, small molecules, or any other compound or composition which modulates the activity of CSAP either by directly interacting with CSAP or by acting on components of the biological pathway in which CSAP participates.
  • “Antibody” refers to intact immunoglobulin molecules as well as to fragments thereof, such as Fab, F(ab')2, and Fv fragments, which are capable of binding an epitopic determinant.
  • Antibodies that bind CSAP polypeptides can be prepared using intact polypeptides or using fragments containing small peptides of interest as the immunizing antigen.
  • The polypeptide or oligopeptide used to immunize an animal (e.g., a mouse, a rat, or a rabbit)
  • “Antigenic determinant” refers to that region of a molecule (i.e., an epitope) that makes contact with a particular antibody.
  • An antigenic determinant may compete with the intact antigen (i.e., the immunogen used to elicit the immune response) for binding to an antibody.
  • “Aptamer” refers to a nucleic acid or oligonucleotide molecule that binds to a specific molecular target.
  • Aptamers are derived from an in vitro evolutionary process (e.g., SELEX (Systematic Evolution of Ligands by Exponential Enrichment), described in U.S. Patent No. 5,270,163), which selects for target-specific aptamer sequences from large combinatorial libraries.
  • Aptamer compositions may be double-stranded or single-stranded, and may include deoxyribonucleotides, ribonucleotides, nucleotide derivatives, or other nucleotide-like molecules.
  • The nucleotide components of an aptamer may have modified sugar groups (e.g., the 2'-OH group of a ribonucleotide may be replaced by 2'-F or 2'-NH2), which may improve a desired property, e.g., resistance to nucleases or longer lifetime in blood.
  • Aptamers may be conjugated to other molecules, e.g., a high molecular weight carrier to slow clearance of the aptamer from the circulatory system.
  • Aptamers may be specifically cross-linked to their cognate ligands, e.g., by photo-activation of a cross-linker. (See, e.g., Brody, E.N. and L. Gold (2000) J. Biotechnol. 74:5-13.)
  • “Intramer” refers to an aptamer which is expressed in vivo.
  • A vaccinia virus-based RNA expression system has been used to express specific RNA aptamers at high levels in the cytoplasm of leukocytes (Blind, M. et al. (1999) Proc. Natl. Acad. Sci. USA 96:3606-3610).
  • “Spiegelmer” refers to an aptamer which includes L-DNA, L-RNA, or other left-handed nucleotide derivatives or nucleotide-like molecules. Aptamers containing left-handed nucleotides are resistant to degradation by naturally occurring enzymes, which normally act on substrates containing right-handed nucleotides.
  • “Antisense” refers to any composition capable of base-pairing with the “sense” (coding) strand of a specific nucleic acid sequence.
  • Antisense compositions may include DNA; RNA; peptide nucleic acid (PNA); oligonucleotides having modified backbone linkages such as phosphorothioates, methylphosphonates, or benzylphosphonates; oligonucleotides having modified sugar groups such as 2'-methoxyethyl sugars or 2'-methoxyethoxy sugars; or oligonucleotides having modified bases such as 5-methyl cytosine, 2'-deoxyuracil, or 7-deaza-2'-deoxyguanosine.
  • Antisense molecules may be produced by any method including chemical synthesis or transcription. Once introduced into a cell, the complementary antisense molecule base-pairs with a naturally occurring nucleic acid sequence produced by the cell to form duplexes which block either transcription or translation.
  • The designation “negative” or “minus” can refer to the antisense strand, and the designation “positive” or “plus” can refer to the sense strand of a reference DNA molecule.
  • “Biologically active” refers to a protein having structural, regulatory, or biochemical functions of a naturally occurring molecule.
  • “Immunologically active” or “immunogenic” refers to the capability of the natural, recombinant, or synthetic CSAP, or of any oligopeptide thereof, to induce a specific immune response in appropriate animals or cells and to bind with specific antibodies.
  • “Compositions comprising a given polynucleotide sequence” and “compositions comprising a given amino acid sequence” refer broadly to any composition containing the given polynucleotide or amino acid sequence.
  • The composition may comprise a dry formulation or an aqueous solution.
  • Compositions comprising polynucleotide sequences encoding CSAP or fragments of CSAP may be employed as hybridization probes. The probes may be stored in freeze-dried form and may be associated with a stabilizing agent such as a carbohydrate.
  • The probe may be deployed in an aqueous solution containing salts (e.g., NaCl), detergents (e.g., sodium dodecyl sulfate; SDS), and other components (e.g., Denhardt's solution, dry milk, salmon sperm DNA, etc.).
  • Consensus sequence refers to a nucleic acid sequence which has been subjected to repeated DNA sequence analysis to resolve uncalled bases, extended using the XL-PCR kit (Applied Biosystems, Foster City CA) in the 5' and/or the 3' direction, and resequenced, or which has been assembled from one or more overlapping cDNA, EST, or genomic DNA fragments using a computer program for fragment assembly, such as the GELVIEW fragment assembly system (GCG, Madison WI) or Phrap (University of Washington, Seattle WA). Some sequences have been both extended and assembled to produce the consensus sequence.
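Assembling a consensus sequence from overlapping fragments can be illustrated with a toy greedy overlap-merge in Python. This is not the algorithm used by GELVIEW or Phrap, which also model base quality and sequencing errors. The sketch assumes error-free reads from a single strand and merges only exact suffix-prefix overlaps of at least a hypothetical `min_overlap` of 5 bases. The reads are made up.

```python
from typing import List


def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Longest suffix of `left` equal to a prefix of `right`; 0 if below min_overlap."""
    for length in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(fragments: List[str], min_overlap: int = 5) -> List[str]:
    """Repeatedly merge the pair of fragments with the longest exact overlap.

    Simplifications: exact matches only, one strand, and no handling of
    fragments contained inside other fragments or of sequencing errors.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    contigs = list(fragments)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    olen = overlap_length(left, right, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no remaining pair overlaps by at least min_overlap bases
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical 24-nt sequence split into three overlapping reads.
original = "ATGGCGTACGTTAGCCTAGGATCC"
reads = [original[0:14], original[8:20], original[14:24]]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGCCTAGGATCC']
```

Greedy merging is simple but can mis-assemble repeats. This limitation is one reason production assemblers use more elaborate overlap and consensus models.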
  • “Conservative amino acid substitutions” are those substitutions that are predicted to least interfere with the properties of the original protein, i.e., the structure and especially the function of the protein is conserved and not significantly changed by such substitutions.
  • The table below shows amino acids which may be substituted for an original amino acid in a protein and which are regarded as conservative amino acid substitutions.
  • Conservative amino acid substitutions generally maintain (a) the structure of the polypeptide backbone in the area of the substitution, for example, as a beta sheet or alpha helical conformation, (b) the charge or hydrophobicity of the molecule at the site of the substitution, and/or (c) the bulk of the side chain.
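As a rough illustration of the grouping idea above, the following Python sketch checks whether a single substitution keeps a residue within the same physicochemical class. The class assignments are an assumption chosen for illustration (a common textbook grouping by charge, hydrophobicity, and side-chain type); they are not the substitution table referred to above, and the names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical.

```python
# Hypothetical physicochemical grouping, assumed for illustration only;
# it is not the substitution table referred to in the text.
CONSERVATIVE_GROUPS = [
    set("AVLIM"),  # aliphatic / hydrophobic side chains
    set("FWY"),    # aromatic side chains
    set("KRH"),    # positively charged side chains
    set("DE"),     # negatively charged side chains
    set("NQ"),     # amide side chains
    set("ST"),     # hydroxyl side chains
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if two different residues share one of the assumed groups."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(original in group and replacement in group
               for group in CONSERVATIVE_GROUPS)


print(is_conservative("L", "I"))  # True  (both aliphatic)
print(is_conservative("D", "K"))  # False (opposite charges)
```

Residues such as G, P, and C are left ungrouped in this sketch, so any substitution involving them is reported as non-conservative; a graded scheme such as a substitution matrix would score these cases instead of answering yes or no.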
  • a “deletion” refers to a change in the amino acid or nucleotide sequence that results in the absence of one or more amino acid residues or nucleotides.
  • derivative refers to a chemically modified polynucleotide or polypeptide. Chemical modifications of a polynucleotide can include, for example, replacement of hydrogen by an alkyl, acyl, hydroxyl, or amino group.
  • a derivative polynucleotide encodes a polypeptide which retains at least one biological or immunological function of the natural molecule.
  • a derivative polypeptide is one modified by glycosylation, pegylation, or any similar process that retains at least one biological or immunological function of the polypeptide from which it was derived.
  • a “detectable label” refers to a reporter molecule or enzyme that is capable of generating a measurable signal and is covalently or noncovalently joined to a polynucleotide or polypeptide.
  • “Differential expression” refers to increased or upregulated; or decreased, downregulated, or absent gene or protein expression, determined by comparing at least two different samples. Such comparisons may be carried out between, for example, a treated and an untreated sample, or a diseased and a normal sample.
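A minimal sketch of one common way to quantify such a comparison: a log2 ratio of expression values between a treated and an untreated sample. The pseudocount (which keeps absent expression from causing a division by zero) and the twofold cutoff are hypothetical choices for illustration, not values taken from this document, and the function names are likewise hypothetical.

```python
import math


def log2_fold_change(treated: float, untreated: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of treated to untreated expression; the pseudocount avoids log(0)."""
    return math.log2((treated + pseudocount) / (untreated + pseudocount))


def classify_expression(treated: float, untreated: float, threshold: float = 1.0) -> str:
    """Label a gene as up, down, or unchanged using a hypothetical |log2 FC| cutoff."""
    lfc = log2_fold_change(treated, untreated)
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"


print(classify_expression(7.0, 1.0))  # up   (log2(8/2) = 2)
print(classify_expression(1.0, 7.0))  # down (log2(2/8) = -2)
print(classify_expression(3.0, 3.0))  # unchanged
```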
  • Exon shuffling refers to the recombination of different coding regions (exons). Since an exon may represent a structural or functional domain of the encoded protein, new proteins may be assembled through the novel reassortment of stable substructures, thus allowing acceleration of the evolution of new protein functions.
  • a “fragment” is a unique portion of CSAP or the polynucleotide encoding CSAP which is identical in sequence to but shorter in length than the parent sequence.
  • a fragment may comprise up to the entire length of the defined sequence, minus one nucleotide/amino acid residue.
  • a fragment may comprise from 5 to 1000 contiguous nucleotides or amino acid residues.
  • a fragment used as a probe, primer, antigen, therapeutic molecule, or for other purposes may be at least 5, 10, 15, 16, 20, 25, 30, 40, 50, 60, 75, 100, 150, 250 or at least 500 contiguous nucleotides or amino acid residues in length. Fragments may be preferentially selected from certain regions of a molecule.
  • a polypeptide fragment may comprise a certain length of contiguous amino acids selected from the first 250 or 500 amino acids (or first 25% or 50%) of a polypeptide as shown in a certain defined sequence.
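The length rules above amount to taking contiguous windows of the parent sequence. This Python sketch enumerates every contiguous fragment of a chosen length, optionally restricted to fragments lying entirely within the first N residues (as in the "first 250 or 500 amino acids" example). The function name and its `region_end` parameter are hypothetical.

```python
from typing import Iterator, Optional


def contiguous_fragments(seq: str, length: int,
                         region_end: Optional[int] = None) -> Iterator[str]:
    """Yield each contiguous fragment of `length` residues.

    A fragment must be shorter than its parent, so `length` may be at most
    len(seq) - 1. If `region_end` is given, only fragments lying entirely
    within the first `region_end` residues are produced.
    """
    if not 1 <= length < len(seq):
        raise ValueError("fragment length must be between 1 and len(seq) - 1")
    limit = len(seq) if region_end is None else min(region_end, len(seq))
    for start in range(limit - length + 1):
        yield seq[start:start + length]


print(list(contiguous_fragments("MKTAYIAK", 5)))
# ['MKTAY', 'KTAYI', 'TAYIA', 'AYIAK']
print(list(contiguous_fragments("MKTAYIAK", 5, region_end=6)))
# ['MKTAY', 'KTAYI']
```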
  • a fragment of SEQ ID NO:29-56 comprises a region of unique polynucleotide sequence that specifically identifies SEQ ID NO:29-56, for example, as distinct from any other sequence in the genome from which the fragment was obtained.
  • a fragment of SEQ ID NO:29-56 is useful, for example, in hybridization and amplification technologies and in analogous methods that distinguish SEQ ID NO:29-56 from related polynucleotide sequences.
  • the precise length of a fragment of SEQ ID NO:29-56 and the region of SEQ ID NO:29-56 to which the fragment corresponds are routinely determinable by one of ordinary skill in the art based on the intended purpose for the fragment.
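To make "specifically identifies" concrete, the sketch below lists k-mers of a target sequence that occur in none of a set of other sequences, which is one simple, exact-match reading of a unique fragment. It is an assumption-laden simplification: the `others` list stands in for "any other sequence in the genome", and it ignores the reverse-complement strand and near matches, both of which matter for real hybridization probes. The function name `unique_kmers` is hypothetical.

```python
def unique_kmers(target: str, others: list[str], k: int) -> list[str]:
    """Return k-mers of `target`, in order of first appearance, absent from all `others`."""
    seen_elsewhere = set()
    for seq in others:
        for i in range(len(seq) - k + 1):
            seen_elsewhere.add(seq[i:i + k])
    unique: list[str] = []
    for i in range(len(target) - k + 1):
        kmer = target[i:i + k]
        if kmer not in seen_elsewhere and kmer not in unique:
            unique.append(kmer)
    return unique


print(unique_kmers("ACGTACGGA", ["ACGTAC", "TTTT"], k=4))
# ['TACG', 'ACGG', 'CGGA']
```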
  • a fragment of SEQ ID NO: 1-28 is encoded by a fragment of SEQ ID NO:29-56.
  • a fragment of SEQ ID NO: 1-28 comprises a region of unique amino acid sequence that specifically identifies SEQ ID NO: 1-28.
  • a fragment of SEQ ID NO: 1-28 is useful as an immunogenic peptide for the development of antibodies that specifically recognize SEQ ID NO: 1-28.
  • the precise length of a fragment of SEQ ID NO: 1-28 and the region of SEQ ID NO: 1-28 to which the fragment corresponds are routinely determinable by one of ordinary skill in the art based on the intended purpose for the fragment.
  • a “full length” polynucleotide sequence is one containing at least a translation initiation codon (e.g., methionine) followed by an open reading frame and a translation termination codon.
  • a “full length” polynucleotide sequence encodes a "full length” polypeptide sequence.
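A minimal Python sketch of the "full length" criterion: it scans the three forward reading frames for an ATG start codon followed in frame by a stop codon (TAA, TAG, or TGA) and returns the longest such open reading frame. It assumes the standard genetic code and examines only the given strand; the function name `longest_orf` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_orf(dna: str) -> str:
    """Longest ATG...stop open reading frame on the forward strand ('' if none)."""
    dna = dna.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # initiation codon found in this frame
            elif start is not None and codon in STOP_CODONS:
                orf = dna[start:i + 3]  # include the termination codon
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


print(longest_orf("CCATGAAATTTTAGCC"))  # ATGAAATTTTAG
print(repr(longest_orf("ATGAAA")))       # '' (no in-frame termination codon)
```

A sequence lacking an in-frame termination codon yields an empty result, i.e., it would not qualify as full length under the definition above.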
  • “Homology” refers to sequence similarity or, interchangeably, sequence identity, between two or more polynucleotide sequences or two or more polypeptide sequences.
  • the terms “percent identity” and “% identity,” as applied to polynucleotide sequences, refer to the percentage of residue matches between at least two polynucleotide sequences aligned using a standardized algorithm. Such an algorithm may insert, in a standardized and reproducible way, gaps in the sequences being compared in order to optimize alignment between two sequences, and therefore achieve a more meaningful comparison of the two sequences. Percent identity between polynucleotide sequences may be determined using the default parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e sequence alignment program.
  • NCBI National Center for Biotechnology Information
  • BLAST Basic Local Alignment Search Tool
  • the BLAST software suite includes various sequence analysis programs including "blastn,” that is used to align a known polynucleotide sequence with other polynucleotide sequences from a variety of databases.
  • the "BLAST 2 Sequences" tool is commonly used with gap and other parameters set to default settings. For example, to compare two nucleotide sequences, one may use blastn with the "BLAST 2 Sequences" tool Version 2.0.12 (April-21-2000) set at default parameters. Such default parameters may be, for example:
  • Percent identity may be measured over the length of an entire defined sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined sequence, for instance, a fragment of at least 20, at least 30, at least 40, at least 50, at least 70, at least 100, or at least 200 contiguous nucleotides.
  • Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in the tables, figures, or Sequence Listing, may be used to describe a length over which percentage identity may be measured.
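The sketch below shows the arithmetic behind a percent identity value once an alignment is in hand. It does not perform the alignment itself (CLUSTAL V or BLAST would), and it assumes one particular convention: identical, ungapped columns divided by all alignment columns, including columns with a gap in one sequence. Other programs normalize differently (for example, by the length of the shorter sequence), so the numbers need not match their reports. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns holding identical, non-gap residues.

    Assumed convention: columns where both sequences have a gap are dropped;
    every other column (including single-gap columns) counts in the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment is empty")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


print(round(percent_identity("ACGT-ACGT", "ACGTTACCT"), 1))  # 77.8 (7 of 9 columns)
```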
  • nucleic acid sequences that do not show a high degree of identity may nevertheless encode similar amino acid sequences due to the degeneracy of the genetic code. It is understood that changes in a nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid sequences that all encode substantially the same protein.
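To illustrate degeneracy, this Python sketch translates DNA with the standard genetic code and shows two sequences that differ at three of nine nucleotides yet encode the same tripeptide. The compact table construction (codons enumerated in T, C, A, G order) is a common idiom; `translate` is a hypothetical helper name.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids for codons enumerated in TCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


a, b = "CTTAAAGGT", "CTGAAGGGC"
print(sum(x != y for x, y in zip(a, b)))  # 3 nucleotide differences
print(translate(a), translate(b))          # LKG LKG
```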
  • percent identity and % identity refer to the percentage of residue matches between at least two polypeptide sequences aligned using a standardized algorithm.
  • Methods of polypeptide sequence alignment are well-known. Some alignment methods take into account conservative amino acid substitutions. Such conservative substitutions, explained in more detail above, generally preserve the charge and hydrophobicity at the site of substitution, thus preserving the structure (and therefore function) of the polypeptide.
  • Percent identity may be measured over the length of an entire defined polypeptide sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined polypeptide sequence, for instance, a fragment of at least 15, at least 20, at least 30, at least 40, at least 50, at least 70 or at least 150 contiguous residues.
  • Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in the tables, figures or Sequence Listing, may be used to describe a length over which percentage identity may be measured.
  • HACs Human artificial chromosomes
  • HACs are linear microchromosomes which may contain DNA sequences of about 6 kb to 10 Mb in size and which contain all of the elements required for chromosome replication, segregation and maintenance.
  • humanized antibody refers to an antibody molecule in which the amino acid sequence in the non-antigen binding regions has been altered so that the antibody more closely resembles a human antibody, and still retains its original binding ability.
  • Hybridization refers to the process by which a polynucleotide strand anneals with a complementary strand through base pairing under defined hybridization conditions. Specific hybridization is an indication that two nucleic acid sequences share a high degree of complementarity. Specific hybridization complexes form under permissive annealing conditions and remain hybridized after the "washing" step(s). The washing step(s) is particularly important in determining the stringency of the hybridization process, with more stringent conditions allowing less non-specific binding, i.e., binding between pairs of nucleic acid strands that are not perfectly matched.
  • Permissive conditions for annealing of nucleic acid sequences are routinely determinable by one of ordinary skill in the art and may be consistent among hybridization experiments, whereas wash conditions may be varied among experiments to achieve the desired stringency, and therefore hybridization specificity. Permissive annealing conditions occur, for example, at 68°C in the presence of about 6 x SSC, about 1% (w/v) SDS, and about 100 μg/ml sheared, denatured salmon sperm DNA.
  • wash temperatures are typically selected to be about 5°C to 20°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH.
  • Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe.
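A common back-of-the-envelope estimate relates Tm to ionic strength, GC content, and probe length. The sketch below uses one widely quoted empirical form for DNA:DNA hybrids, Tm ≈ 81.5 + 16.6·log10[Na+] + 0.41·(%GC) − 600/N, where N is the probe length, and then applies the 5°C to 20°C wash window described above. The formula and its constants are an assumption (variants with different length terms exist, and it is least reliable for short oligonucleotides); it is not a method stated in this document, and the function names are hypothetical.

```python
import math


def estimate_tm(probe: str, na_molar: float) -> float:
    """Approximate Tm (deg C) of a DNA:DNA duplex using an assumed empirical formula."""
    probe = probe.upper()
    n = len(probe)
    gc_percent = 100.0 * (probe.count("G") + probe.count("C")) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 600.0 / n


def wash_window(tm: float) -> tuple[float, float]:
    """Wash temperatures about 5 to 20 deg C below Tm, per the text above."""
    return tm - 20.0, tm - 5.0


tm = estimate_tm("ATGC" * 5, na_molar=0.1)           # 20 nt, 50% GC
print(round(tm, 1))                                  # 55.4
print(tuple(round(t, 1) for t in wash_window(tm)))   # (35.4, 50.4)
```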
  • High stringency conditions for hybridization between polynucleotides of the present invention include wash conditions of 68°C in the presence of about 0.2 x SSC and about 0.1% SDS, for 1 hour. Alternatively, temperatures of about 65°C, 60°C, 55°C, or 42°C may be used. SSC concentration may be varied from about 0.1 to 2 x SSC, with SDS being present at about 0.1%.
  • blocking reagents are used to block non-specific hybridization. Such blocking reagents include, for instance, sheared and denatured salmon sperm DNA at about 100-200 μg/ml.
  • An organic solvent, such as formamide at a concentration of about 35-50% v/v, may also be used under particular circumstances, such as for RNA:DNA hybridizations.
  • Useful variations on these wash conditions will be readily apparent to those of ordinary skill in the art.
  • Hybridization particularly under high stringency conditions, may be suggestive of evolutionary similarity between the nucleotides. Such similarity is strongly indicative of a similar role for the nucleotides and their encoded polypeptides.
  • hybridization complex refers to a complex formed between two nucleic acid sequences by virtue of the formation of hydrogen bonds between complementary bases.
  • a hybridization complex may be formed in solution (e.g., C0t or R0t analysis) or formed between one nucleic acid sequence present in solution and another nucleic acid sequence immobilized on a solid support (e.g., paper, membranes, filters, chips, pins or glass slides, or any other appropriate substrate to which cells or their nucleic acids have been fixed).
  • insertion and “addition” refer to changes in an amino acid or nucleotide sequence resulting in the addition of one or more amino acid residues or nucleotides, respectively.
  • “Immune response” can refer to conditions associated with inflammation, trauma, immune disorders, or infectious or genetic disease, etc. These conditions can be characterized by expression of various factors, e.g., cytokines, chemokines, and other signaling molecules, which may affect cellular and systemic defense systems.
  • an “immunogenic fragment” is a polypeptide or oligopeptide fragment of CSAP which is capable of eliciting an immune response when introduced into a living organism, for example, a mammal.
  • the term “immunogenic fragment” also includes any polypeptide or oligopeptide fragment of CSAP which is useful in any of the antibody production methods disclosed herein or known in the art.
  • microarray refers to an arrangement of a plurality of polynucleotides, polypeptides, or other chemical compounds on a substrate.
  • array element refers to a polynucleotide, polypeptide, or other chemical compound having a unique and defined position on a microarray.
  • modulate refers to a change in the activity of CSAP.
  • modulation may cause an increase or a decrease in protein activity, binding characteristics, or any other biological, functional, or immunological properties of CSAP.
  • nucleic acid and nucleic acid sequence refer to a nucleotide, oligonucleotide, polynucleotide, or any fragment thereof. These phrases also refer to DNA or RNA of genomic or synthetic origin which may be single-stranded or double-stranded and may represent the sense or the antisense strand, to peptide nucleic acid (PNA), or to any DNA-like or RNA-like material.
  • “Operably linked” refers to the situation in which a first nucleic acid sequence is placed in a functional relationship with a second nucleic acid sequence. For instance, a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence.
  • PNA peptide nucleic acid
  • PNA refers to an antisense molecule or anti-gene agent which comprises an oligonucleotide of at least about 5 nucleotides in length linked to a peptide backbone of amino acid residues ending in lysine. The terminal lysine confers solubility to the composition.
  • PNAs preferentially bind complementary single stranded DNA or RNA and stop transcript elongation, and may be pegylated to extend their lifespan in the cell.
  • Post-translational modification of a CSAP may involve lipidation, glycosylation, phosphorylation, acetylation, racemization, proteolytic cleavage, and other modifications known in the art. These processes may occur synthetically or biochemically. Biochemical modifications will vary by cell type depending on the enzymatic milieu of CSAP.
  • Probe refers to nucleic acid sequences encoding CSAP, their complements, or fragments thereof, which are used to detect identical, allelic or related nucleic acid sequences.
  • Probes are isolated oligonucleotides or polynucleotides attached to a detectable label or reporter molecule. Typical labels include radioactive isotopes, ligands, chemiluminescent agents, and enzymes.
  • Primers are short nucleic acids, usually DNA oligonucleotides, which may be annealed to a target polynucleotide by complementary base-pairing. The primer may then be extended along the target DNA strand by a DNA polymerase enzyme. Primer pairs can be used for amplification (and identification) of a nucleic acid sequence, e.g., by the polymerase chain reaction (PCR).
  • PCR polymerase chain reaction
  • Probes and primers as used in the present invention typically comprise at least 15 contiguous nucleotides of a known sequence. In order to enhance specificity, longer probes and primers may also be employed, such as probes and primers that comprise at least 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, or at least 150 consecutive nucleotides of the disclosed nucleic acid sequences. Probes and primers may be considerably longer than these examples, and it is understood that any length supported by the specification, including the tables, figures, and Sequence Listing, may be used.
  • PCR primer pairs can be derived from a known sequence, for example, by using computer programs intended for that purpose such as Primer (Version 0.5, 1991, Whitehead Institute for Biomedical Research, Cambridge MA).
  • Oligonucleotides for use as primers are selected using software known in the art for such purpose. For example, OLIGO 4.06 software is useful for the selection of PCR primer pairs of up to 100 nucleotides each, and for the analysis of oligonucleotides and larger polynucleotides of up to 5,000 nucleotides from an input polynucleotide sequence of up to 32 kilobases. Similar primer selection programs have incorporated additional features for expanded capabilities. For example, the PrimOU primer selection program (available to the public from the Genome Center at University of Texas South West Medical Center, Dallas TX) is capable of choosing specific primers from megabase sequences and is thus useful for designing primers on a genome-wide scope.
  • the Primer3 primer selection program (available to the public from the Whitehead Institute/MIT Center for Genome Research, Cambridge MA) allows the user to input a "mispriming library," in which sequences to avoid as primer binding sites are user-specified. Primer3 is useful, in particular, for the selection of oligonucleotides for microarrays. (The source code for the latter two primer selection programs may also be obtained from their respective sources and modified to meet the user's specific needs.)
  • the PrimeGen program (available to the public from the UK Human Genome Mapping Project Resource Centre, Cambridge UK) designs primers based on multiple sequence alignments, thereby allowing selection of primers that hybridize to either the most conserved or least conserved regions of aligned nucleic acid sequences.
  • this program is useful for identification of both unique and conserved oligonucleotides and polynucleotide fragments.
  • the oligonucleotides and polynucleotide fragments identified by any of the above selection methods are useful in hybridization technologies, for example, as PCR or sequencing primers, microarray elements, or specific probes to identify fully or partially complementary polynucleotides in a sample of nucleic acids. Methods of oligonucleotide selection are not limited to those described above.
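As a small illustration of deriving a primer pair from a known sequence, this Python sketch takes the forward primer directly from the 5' end of a chosen amplicon and the reverse primer as the reverse complement of its 3' end, enforcing the 15-nucleotide minimum mentioned above. It is not a substitute for the selection programs listed (it performs no Tm, secondary-structure, or mispriming checks); `primer_pair`, `reverse_complement`, and `gc_fraction` are hypothetical names.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A<->T, C<->G, then reversed)."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases, a quantity primer-design tools commonly check."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def primer_pair(template: str, start: int, end: int,
                primer_len: int = 20) -> tuple[str, str]:
    """Forward and reverse primers flanking template[start:end] (0-based, end exclusive)."""
    if primer_len < 15:
        raise ValueError("primers shorter than 15 nt are outside the range described")
    if not 0 <= start < end <= len(template) or end - start < 2 * primer_len:
        raise ValueError("amplicon must fit in the template and hold both primers")
    forward = template[start:start + primer_len]
    reverse = reverse_complement(template[end - primer_len:end])
    return forward, reverse


left, right = "ACGTTGCAAGGCTTAACCGG", "GGCCAATTCCGGAATTCCAA"
template = left + "T" * 10 + right
fwd, rev = primer_pair(template, 0, len(template))
print(fwd)  # ACGTTGCAAGGCTTAACCGG
print(rev)  # TTGGAATTCCGGAATTGGCC
```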
  • a "recombinant nucleic acid” is a sequence that is not naturally occurring or has a sequence that is made by an artificial combination of two or more otherwise separated segments of sequence.
  • recombinant includes nucleic acids that have been altered solely by addition, substitution, or deletion of a portion of the nucleic acid.
  • a recombinant nucleic acid may include a nucleic acid sequence operably linked to a promoter sequence.
  • Such a recombinant nucleic acid may be part of a vector that is used, for example, to transform a cell.
  • such recombinant nucleic acids may be part of a viral vector, e.g., based on a vaccinia virus, that could be used to vaccinate a mammal wherein the recombinant nucleic acid is expressed, inducing a protective immunological response in the mammal.
  • a “regulatory element” refers to a nucleic acid sequence usually derived from untranslated regions of a gene and includes enhancers, promoters, introns, and 5' and 3' untranslated regions (UTRs). Regulatory elements interact with host or viral proteins which control transcription, translation, or RNA stability.
  • Reporter molecules are chemical or biochemical moieties used for labeling a nucleic acid, amino acid, or antibody. Reporter molecules include radionuclides; enzymes; fluorescent, chemiluminescent, or chromogenic agents; substrates; cofactors; inhibitors; magnetic particles; and other moieties known in the art.
  • An "RNA equivalent,” in reference to a DNA sequence, is composed of the same linear sequence of nucleotides as the reference DNA sequence with the exception that all occurrences of the nitrogenous base thymine are replaced with uracil, and the sugar backbone is composed of ribose instead of deoxyribose.
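At the level of a sequence string, the definition above is a one-line transformation. This Python sketch (hypothetical helper name) converts a DNA sequence to its RNA equivalent; the change of sugar from deoxyribose to ribose has no representation in a plain sequence string, so only the thymine-to-uracil substitution is visible.

```python
def rna_equivalent(dna: str) -> str:
    """Return the RNA equivalent of a DNA sequence string (T -> U)."""
    return dna.upper().replace("T", "U")


print(rna_equivalent("ATGCTTAG"))  # AUGCUUAG
```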
  • sample is used in its broadest sense.
  • a sample suspected of containing CSAP, nucleic acids encoding CSAP, or fragments thereof may comprise a bodily fluid; an extract from a cell, chromosome, organelle, or membrane isolated from a cell; a cell; genomic DNA, RNA, or cDNA, in solution or bound to a substrate; a tissue; a tissue print; etc.
  • binding and “specifically binding” refer to that interaction between a protein or peptide and an agonist, an antibody, an antagonist, a small molecule, or any natural or synthetic binding composition. The interaction is dependent upon the presence of a particular structure of the protein, e.g., the antigenic determinant or epitope, recognized by the binding molecule. For example, if an antibody is specific for epitope "A,” the presence of a polypeptide comprising the epitope A, or the presence of free unlabeled A, in a reaction containing free labeled A and the antibody will reduce the amount of labeled A that binds to the antibody.
  • substantially purified refers to nucleic acid or amino acid sequences that are removed from their natural environment and are isolated or separated, and are at least 60% free, preferably at least 75% free, and most preferably at least 90% free from other components with which they are naturally associated.
  • substitution refers to the replacement of one or more amino acid residues or nucleotides by different amino acid residues or nucleotides, respectively.
  • Substrate refers to any suitable rigid or semi-rigid support including membranes, filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, tubing, plates, polymers, microparticles and capillaries.
  • the substrate can have a variety of surface forms, such as wells, trenches, pins, channels and pores, to which polynucleotides or polypeptides are bound.
  • a “transcript image” or “expression profile” refers to the collective pattern of gene expression by a particular cell type or tissue under given conditions at a given time.
  • Transformation describes a process by which exogenous DNA is introduced into a recipient cell. Transformation may occur under natural or artificial conditions according to various methods well known in the art, and may rely on any known method for the insertion of foreign nucleic acid sequences into a prokaryotic or eukaryotic host cell. The method for transformation is selected based on the type of host cell being transformed and may include, but is not limited to, bacteriophage or viral infection, electroporation, heat shock, lipofection, and particle bombardment.
  • transformed cells includes stably transformed cells in which the inserted DNA is capable of replication either as an autonomously replicating plasmid or as part of the host chromosome, as well as transiently transformed cells which express the inserted DNA or RNA for limited periods of time.
  • a "transgenic organism,” as used herein, is any organism, including but not limited to animals and plants, in which one or more of the cells of the organism contains heterologous nucleic acid introduced by way of human intervention, such as by transgenic techniques well known in the art.
  • the nucleic acid is introduced into the cell, directly or indirectly by introduction into a precursor of the cell, by way of deliberate genetic manipulation, such as by microinjection or by infection with a recombinant virus.
  • the nucleic acid can be introduced by infection with a recombinant viral vector, such as a lentiviral vector (Lois, C. et al. (2002) Science 295:868-872).
  • the term genetic manipulation does not include classical cross-breeding, or in vitro fertilization, but rather is directed to the introduction of a recombinant DNA molecule.
  • the transgenic organisms contemplated in accordance with the present invention include bacteria, cyanobacteria, fungi, plants and animals.
  • the isolated DNA of the present invention can be introduced into the host by methods known in the art, for example infection, transfection, transformation or transconjugation. Techniques for transferring the DNA of the present invention into such organisms are widely known and provided in references such as Sambrook et al. (1989), supra.
  • a "variant" of a particular nucleic acid sequence is defined as a nucleic acid sequence having at least 40% sequence identity to the particular nucleic acid sequence over a certain length of one of the nucleic acid sequences using blastn with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) set at default parameters.
  • Such a pair of nucleic acids may show, for example, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater sequence identity over a certain defined length.
  • a variant may be described as, for example, an "allelic” (as defined above), “splice,” “species,” or “polymorphic” variant.
  • a splice variant may have significant identity to a reference molecule, but will generally have a greater or lesser number of nucleotides due to alternative splicing of exons during mRNA processing.
  • the corresponding polypeptide may possess additional functional domains or lack domains that are present in the reference molecule.
  • Species variants are polynucleotide sequences that vary from one species to another. The resulting polypeptides will generally have significant amino acid identity relative to each other.
  • a polymorphic variant is a variation in the polynucleotide sequence of a particular gene between individuals of a given species.
  • Polymorphic variants also may encompass "single nucleotide polymorphisms" (SNPs) in which the polynucleotide sequence varies by one nucleotide base.
  • SNPs single nucleotide polymorphisms
  • the presence of SNPs may be indicative of, for example, a certain population, a disease state, or a propensity for a disease state.
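For two already aligned, equal-length alleles, single-nucleotide differences can be listed position by position, as in this Python sketch. It assumes gap-free input and reports 1-based positions; the name `snp_positions` is hypothetical, and real variant calling must additionally handle alignment, sequencing error, and insertions or deletions.

```python
def snp_positions(reference: str, variant: str) -> list[tuple[int, str, str]]:
    """(1-based position, reference base, variant base) for each mismatched base."""
    if len(reference) != len(variant):
        raise ValueError("alleles must be aligned and of equal length")
    return [(i + 1, r, v)
            for i, (r, v) in enumerate(zip(reference.upper(), variant.upper()))
            if r != v]


print(snp_positions("ACGTACGT", "ACGAACGT"))  # [(4, 'T', 'A')]
```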
  • a "variant" of a particular polypeptide sequence is defined as a polypeptide sequence having at least 40% sequence identity to the particular polypeptide sequence over a certain length of one of the polypeptide sequences using blastp with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) set at default parameters.
  • Such a pair of polypeptides may show, for example, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater sequence identity over a certain defined length of one of the polypeptides.
  • the invention is based on the discovery of new human cytoskeleton-associated proteins (CSAP), the polynucleotides encoding CSAP, and the use of these compositions for the diagnosis, treatment, or prevention of cell proliferative disorders, viral infections, and neurological disorders.
  • CSAP cytoskeleton-associated proteins
  • Table 1 summarizes the nomenclature for the full length polynucleotide and polypeptide sequences of the invention. Each polynucleotide and its corresponding polypeptide are correlated to a single Incyte project identification number (Incyte Project ID). Each polypeptide sequence is denoted by both a polypeptide sequence identification number (Polypeptide SEQ ID NO:) and an Incyte polypeptide sequence number (Incyte Polypeptide ID) as shown.
  • Each polynucleotide sequence is denoted by both a polynucleotide sequence identification number (Polynucleotide SEQ ID NO:) and an Incyte polynucleotide consensus sequence number (Incyte Polynucleotide ID) as shown.
  • Column 6 shows the Incyte ID numbers of physical, full length clones corresponding to the polypeptide and polynucleotide sequences of the invention. The full length clones encode polypeptides which have at least 95% sequence identity to the polypeptide sequences shown in column 3.
  • Table 2 shows sequences with homology to the polypeptides of the invention as identified by BLAST analysis against the GenBank protein (genpept) database.
  • Columns 1 and 2 show the polypeptide sequence identification number (Polypeptide SEQ ID NO:) and the corresponding Incyte polypeptide sequence number (Incyte Polypeptide ID) for polypeptides of the invention.
  • Column 3 shows the GenBank identification number (GenBank ID NO:) of the nearest GenBank homolog.
  • Column 4 shows the probability scores for the matches between each polypeptide and its homolog(s).
  • Column 5 shows the annotation of the GenBank homologs along with relevant citations where applicable, all of which are expressly incorporated by reference herein.
  • Table 3 shows various structural features of the polypeptides of the invention.
  • Columns 1 and 2 show the polypeptide sequence identification number (SEQ ID NO:) and the corresponding Incyte polypeptide sequence number (Incyte Polypeptide ID) for each polypeptide of the invention.
  • Column 3 shows the number of amino acid residues in each polypeptide.
  • Column 4 shows potential phosphorylation sites, and column 5 shows potential glycosylation sites, as determined by the MOTIFS program of the GCG sequence analysis software package (Genetics Computer Group, Madison WI).
  • Column 6 shows amino acid residues comprising signature sequences, domains, and motifs.
  • Column 7 shows analytical methods for protein structure/function analysis and in some cases, searchable databases to which the analytical methods were applied.
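GCG's MOTIFS program is generally described as searching proteins for PROSITE patterns. The Python sketch below translates two well-known PROSITE patterns into regular expressions: the N-glycosylation site N-{P}-[ST]-{P} (PS00001) and the protein kinase C phosphorylation site [ST]-x-[RK] (PS00005). It is an illustrative re-implementation under those assumptions, not the GCG program itself, and `scan_motifs` is a hypothetical name. A zero-width lookahead is used so that overlapping sites are all reported.

```python
import re

# PROSITE patterns rewritten as regular expressions (assumed translations).
PATTERNS = {
    "N-glycosylation": r"N[^P][ST][^P]",   # PS00001: N-{P}-[ST]-{P}
    "PKC phosphorylation": r"[ST].[RK]",  # PS00005: [ST]-x-[RK]
}


def scan_motifs(protein: str) -> list[tuple[str, int, str]]:
    """(motif name, 1-based start, matched residues) for every site, overlaps included."""
    hits = []
    for name, regex in PATTERNS.items():
        for match in re.finditer(f"(?=({regex}))", protein.upper()):
            hits.append((name, match.start() + 1, match.group(1)))
    return hits


print(scan_motifs("MNGSAKSLR"))
# [('N-glycosylation', 2, 'NGSA'), ('PKC phosphorylation', 4, 'SAK'),
#  ('PKC phosphorylation', 7, 'SLR')]
```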
  • SEQ ID NO: 1 is 86% identical, from residue M1 to residue S459, to mouse c29 protein (GenBank ID g3868802) as determined by the Basic Local Alignment Search Tool (BLAST). (See Table 2.)
  • the BLAST probability score is 1.4e-207, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance.
  • SEQ ID NO: 1 also contains an intermediate filament protein domain as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains.
  • HMM hidden Markov model
  • SEQ ID NO: 1 is an intermediate filament protein.
  • SEQ ID NO:3 is 93% identical from residue M1 to residue D1107 and 42% identical from residue E470 to residue N1614 (that is, 74% identical over the length of the sequence) to Mus musculus Kif21a (GenBank ID g6561827) as determined by the Basic Local Alignment Search Tool (BLAST).
  • the BLAST probability score over the length of the sequence is 2.3e-199, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance.
  • SEQ ID NO:3 also contains a kinesin motor domain as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains. (See Table 3.) Data from BLIMPS, MOTIFS, and PROFILESCAN analyses provide further corroborative evidence that SEQ ID NO:3 is a kinesin.
  • SEQ ID NO:7 is 95% identical, from residue I125 to residue T1050, to rat ankyrin binding cell adhesion molecule neurofascin (GenBank ID g1842427) as determined by the Basic Local Alignment Search Tool (BLAST).
  • SEQ ID NO:7 also contains a fibronectin type III domain and an immunoglobulin domain as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains.
  • SEQ ID NO: 9 is 95% identical, from residue M1 to residue D471, to rat coronin relative protein (GenBank ID g15430628) as determined by the Basic Local Alignment Search Tool (BLAST).
  • The BLAST probability score is 0.0, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance.
  • SEQ ID NO:9 also contains WD domains as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains. (See Table 3.) Data from BLIMPS and MOTIFS analyses provide further corroborative evidence that SEQ ID NO:9 is a coronin.
  • SEQ ID NO:14 is 99% identical, from residue M1 to residue R523, to human keratin 6 irs (GenBank ID g6961277) as determined by the Basic Local Alignment Search Tool (BLAST).
  • The BLAST probability score is 0.0, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance.
  • SEQ ID NO:14 also contains intermediate filament protein domains as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains.
  • SEQ ID NO:14 is an intermediate filament protein, which is a specific subtype of cytoskeletal protein.
  • SEQ ID NO:18 is 2039 residues in length and is 94% identical, from residue M1 to residue A2039, to mouse myosin containing PDZ domain (GenBank ID g7416032) as determined by the Basic Local Alignment Search Tool (BLAST).
  • The BLAST probability score is 0.0, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance.
  • SEQ ID NO: 18 also contains an IQ calmodulin-binding motif, a PDZ domain (also known as DHR or GLGF), and a myosin head (motor domain) as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains.
  • SEQ ID NO:26 is 92% identical, from residue M1 to residue L1715, to rat ankyrin repeat-rich membrane-spanning protein (GenBank ID g11321435) as determined by the Basic Local Alignment Search Tool (BLAST). (See Table 2.) The BLAST probability score is 0.0, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance.
  • SEQ ID NO:26 also contains eleven ankyrin repeat domains as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains.
  • SEQ ID NO:26 is an ankyrin repeat-rich protein. Many ankyrin repeats have been shown to mediate protein-protein interactions, for example, in cytoskeletal proteins. SEQ ID NO:2, SEQ ID NO:4-6, SEQ ID NO:8, SEQ ID NO:10-13, SEQ ID NO:15-17, SEQ ID NO:19-25, and SEQ ID NO:27-28 were analyzed and annotated in a similar manner. The algorithms and parameters for the analysis of SEQ ID NO:1-28 are described in Table 7.
  • The full length polynucleotide sequences of the present invention were assembled using cDNA sequences or coding (exon) sequences derived from genomic DNA, or any combination of these two types of sequences.
  • Column 1 lists the polynucleotide sequence identification number (Polynucleotide SEQ ID NO:), the corresponding Incyte polynucleotide consensus sequence number (Incyte ID) for each polynucleotide of the invention, and the length of each polynucleotide sequence in base pairs.
  • Column 2 shows the nucleotide start (5') and stop (3') positions of the cDNA and/or genomic sequences used to assemble the full length polynucleotide sequences of the invention, and of fragments of the polynucleotide sequences which are useful, for example, in hybridization or amplification technologies that identify SEQ ID NO:29-56 or that distinguish between SEQ ID NO:29-56 and related polynucleotide sequences.
  • The polynucleotide fragments described in Column 2 of Table 4 may refer specifically, for example, to Incyte cDNAs derived from tissue-specific cDNA libraries or from pooled cDNA libraries.
  • The polynucleotide fragments described in Column 2 may refer to GenBank cDNAs or ESTs which contributed to the assembly of the full length polynucleotide sequences.
  • The polynucleotide fragments described in Column 2 may identify sequences derived from the ENSEMBL (The Sanger Centre, Cambridge, UK) database (i.e., those sequences including the designation "ENST").
  • The polynucleotide fragments described in Column 2 may be derived from the NCBI RefSeq Nucleotide Sequence Records Database (i.e., those sequences including the designation "NM" or "NT") or the NCBI RefSeq Protein Sequence Records (i.e., those sequences including the designation "NP").
  • The polynucleotide fragments described in Column 2 may refer to assemblages of both cDNA and Genscan-predicted exons brought together by an "exon stitching" algorithm.
  • A polynucleotide sequence identified as FL_XXXXX_N1_N2_YYYYY_N3_N4 represents a "stitched" sequence in which XXXXX is the identification number of the cluster of sequences to which the algorithm was applied, YYYYY is the number of the prediction generated by the algorithm, and N1,2,3..., if present, represent specific exons that may have been manually edited during analysis (see Example V).
  • The polynucleotide fragments in Column 2 may refer to assemblages of exons brought together by an "exon-stretching" algorithm.
  • A polynucleotide sequence identified as FLXXXXX_gAAAAA_gBBBBB_1_N is a "stretched" sequence, with XXXXX being the Incyte project identification number, gAAAAA being the GenBank identification number of the human genomic sequence to which the "exon-stretching" algorithm was applied, gBBBBB being the GenBank identification number or NCBI RefSeq identification number of the nearest GenBank protein homolog, and N referring to specific exons (see Example V).
  • A RefSeq identifier (denoted by "NM," "NP," or "NT") may be used in place of the GenBank identifier (i.e., gBBBBB).
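The stretched-sequence naming template above can be read mechanically. The regular expression below is a hypothetical sketch. It assumes that the project number is numeric, that the field after the homolog is the literal "1" shown in the template, and that exon numbers are underscore-separated integers. None of this is confirmed beyond the template itself, and the example identifiers are invented. Stitched identifiers are not parsed, because the template does not say how to tell the cluster, prediction, and exon fields apart.

```python
import re
from typing import Dict, List, Union

# Hypothetical reading of the template FLXXXXX_gAAAAA_gBBBBB_1_N described above.
STRETCHED_ID = re.compile(
    r"^FL_?(?P<project>\d+)"                     # Incyte project number (XXXXX)
    r"_g(?P<genomic>\d+)"                        # genomic GenBank ID (gAAAAA)
    r"_(?P<homolog>g\d+|N[MPT]_\d+(?:\.\d+)?)"   # homolog: GenBank gBBBBB or RefSeq NM/NP/NT
    r"_1_(?P<exons>\d+(?:_\d+)*)$"               # literal 1 (assumed), then exon number(s) (N)
)

def parse_stretched_id(identifier: str) -> Dict[str, Union[str, List[int]]]:
    """Split a stretched-sequence identifier into its named fields."""
    match = STRETCHED_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a stretched-sequence identifier: {identifier!r}")
    return {
        "project": match["project"],
        "genomic": "g" + match["genomic"],
        "homolog": match["homolog"],
        "exons": [int(n) for n in match["exons"].split("_")],
    }

print(parse_stretched_id("FL123456_g7654321_g1234567_1_2_3"))
print(parse_stretched_id("FL123456_g7654321_NM_001234_1_4"))
```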
  • A prefix identifies component sequences that were hand-edited, predicted from genomic DNA sequences, or derived from a combination of sequence analysis methods. The following Table lists examples of component sequence prefixes and corresponding sequence analysis methods associated with the prefixes (see Example IV and Example V).
  • Incyte cDNA coverage redundant with the sequence coverage shown in Table 4 was obtained to confirm the final consensus polynucleotide sequence, but the relevant Incyte cDNA identification numbers are not shown.
  • Table 5 shows the representative cDNA libraries for those full length polynucleotide sequences which were assembled using Incyte cDNA sequences.
  • The representative cDNA library is the Incyte cDNA library which is most frequently represented by the Incyte cDNA sequences which were used to assemble and confirm the above polynucleotide sequences.
  • The tissues and vectors which were used to construct the cDNA libraries shown in Table 5 are described in Table 6.
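Choosing the representative library amounts to a frequency count over the libraries of the contributing cDNAs. Below is a minimal sketch with hypothetical clone and library names. Ties go to the library encountered first; this tie rule is an assumption, since the text does not specify one.

```python
from collections import Counter
from typing import Iterable, Tuple

def representative_library(contributing_cdnas: Iterable[Tuple[str, str]]) -> str:
    """Return the library most often represented among (cDNA ID, library) pairs.

    Ties go to the library seen first (Counter.most_common keeps first-seen order).
    """
    counts = Counter(library for _cdna_id, library in contributing_cdnas)
    if not counts:
        raise ValueError("no contributing cDNAs supplied")
    return counts.most_common(1)[0][0]

# Hypothetical clone IDs and library names.
cdnas = [("1001", "LIBRARY_A"), ("1002", "LIBRARY_B"), ("1003", "LIBRARY_A")]
print(representative_library(cdnas))
```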
  • The invention also encompasses CSAP variants.
  • A preferred CSAP variant is one which has at least about 80%, or alternatively at least about 90%, or even at least about 95% amino acid sequence identity to the CSAP amino acid sequence, and which contains at least one functional or structural characteristic of CSAP.
  • The invention also encompasses polynucleotides which encode CSAP.
  • The invention encompasses a polynucleotide sequence comprising a sequence selected from the group consisting of SEQ ID NO:29-56, which encodes CSAP.
  • The polynucleotide sequences of SEQ ID NO:29-56, as presented in the Sequence Listing, embrace the equivalent RNA sequences, wherein occurrences of the nitrogenous base thymine are replaced with uracil, and the sugar backbone is composed of ribose instead of deoxyribose.
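At the level of the sequence string, the DNA-to-RNA equivalence above is a T-to-U substitution. The ribose-for-deoxyribose change is chemical and has no counterpart in the one-letter string. Below is a minimal sketch, assuming input restricted to the letters A, C, G, T, and N:

```python
DNA_LETTERS = set("ACGTN")

def dna_to_rna(dna: str) -> str:
    """Return the RNA equivalent of a DNA sequence by replacing thymine (T) with uracil (U)."""
    seq = dna.upper()
    unexpected = set(seq) - DNA_LETTERS
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq.replace("T", "U")

print(dna_to_rna("atggct"))
```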
  • The invention also encompasses a variant of a polynucleotide sequence encoding CSAP.
  • A variant polynucleotide sequence will have at least about 70%, or alternatively at least about 85%, or even at least about 95% polynucleotide sequence identity to the polynucleotide sequence encoding CSAP.
  • A particular aspect of the invention encompasses a variant of a polynucleotide sequence comprising a sequence selected from the group consisting of SEQ ID NO:29-56 which has at least about 70%, or alternatively at least about 85%, or even at least about 95% polynucleotide sequence identity to a nucleic acid sequence selected from the group consisting of SEQ ID NO:29-56.
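The tiered identity thresholds for polypeptide and polynucleotide variants can be applied as shown below. This sketch makes two simplifying assumptions that the text does not state: it treats "at least about" as a hard cutoff, and it computes identity on sequences that are already aligned and ungapped.

```python
from typing import Optional, Sequence

POLYPEPTIDE_TIERS = (80.0, 90.0, 95.0)     # CSAP polypeptide variant thresholds
POLYNUCLEOTIDE_TIERS = (70.0, 85.0, 95.0)  # polynucleotide variant thresholds

def ungapped_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length, already-aligned, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

def highest_tier_met(percent: float, tiers: Sequence[float]) -> Optional[float]:
    """Return the highest threshold the identity reaches, or None if it reaches none."""
    met = [t for t in tiers if percent >= t]
    return max(met) if met else None

pct = ungapped_identity("ACGTACGTAC", "ACGTACGTTT")
print(pct, highest_tier_met(pct, POLYNUCLEOTIDE_TIERS))
```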
  • Any one of the polynucleotide variants described above can encode an amino acid sequence which contains at least one functional or structural characteristic of CSAP.
  • A polynucleotide variant of the invention may be a splice variant of a polynucleotide sequence encoding CSAP.
  • A splice variant may have portions which have significant sequence identity to the polynucleotide sequence encoding CSAP, but will generally have a greater or lesser number of nucleotides due to additions or deletions of blocks of sequence arising from alternate splicing of exons during mRNA processing.
  • A splice variant may have less than about 70%, or alternatively less than about 60%, or alternatively less than about 50% polynucleotide sequence identity to the polynucleotide sequence encoding CSAP over its entire length; however, portions of the splice variant will have at least about 70%, or alternatively at least about 85%, or alternatively at least about 95%, or alternatively 100% polynucleotide sequence identity to portions of the polynucleotide sequence encoding CSAP.
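The point about splice variants (identical in portions, yet well below 70% identity over the full length) can be made concrete with a hypothetical four-exon transcript in which one exon is skipped. The sketch writes the variant against the reference, with the skipped exon as a run of gaps, and counts identity over the reference length. That scoring convention is an assumption chosen for illustration.

```python
from typing import List, Set

def splice(exons: List[str], skipped: Set[int]) -> str:
    """Join exons in order, leaving out the skipped ones (0-based indices)."""
    return "".join(e for i, e in enumerate(exons) if i not in skipped)

def align_to_reference(exons: List[str], skipped: Set[int]) -> str:
    """Variant written against the full-length reference: skipped exons become gap runs."""
    return "".join("-" * len(e) if i in skipped else e for i, e in enumerate(exons))

def identity_over_reference(reference: str, aligned_variant: str) -> float:
    """Identical columns / reference length * 100; gap columns never match."""
    matches = sum(1 for r, v in zip(reference, aligned_variant) if r == v)
    return 100.0 * matches / len(reference)

# Hypothetical four-exon gene; the 12-nt second exon is skipped in the variant.
EXONS = ["ATGGCC", "GATTACAGATTA", "CTGAAG", "TGGTAA"]
reference = "".join(EXONS)
variant = splice(EXONS, {1})
print(identity_over_reference(reference, align_to_reference(EXONS, {1})))  # 60.0
```

Every retained exon is 100% identical to its counterpart, yet identity over the full length is 60%, below the 70% figure in the text.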
  • A polynucleotide comprising a sequence of SEQ ID NO:31 is a splice variant of a polynucleotide comprising a sequence of SEQ ID NO:33.
  • A polynucleotide comprising a sequence of SEQ ID NO:34 is a splice variant of a polynucleotide comprising a sequence of SEQ ID NO:35.
  • Any one of the splice variants described above can encode an amino acid sequence which contains at least one functional or structural characteristic of CSAP.
  • Although nucleotide sequences which encode CSAP and its variants are generally capable of hybridizing to the nucleotide sequence of the naturally occurring CSAP under appropriately selected conditions of stringency, it may be advantageous to produce nucleotide sequences encoding CSAP or its derivatives possessing a substantially different codon usage, e.g., inclusion of non-naturally occurring codons. Codons may be selected to increase the rate at which expression of the peptide occurs in a particular prokaryotic or eukaryotic host in accordance with the frequency with which particular codons are utilized by the host.
  • Other reasons for altering the nucleotide sequence encoding CSAP without altering the encoded amino acid sequence include the production of RNA transcripts having more desirable properties, such as a greater half-life, than transcripts produced from the naturally occurring sequence.
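One simple way to apply host codon preference is to back-translate each residue to the host's most frequently used synonymous codon. The usage fractions below are made up for illustration and cover only four symbols. A real design would use a measured codon-usage table for the chosen host, and would usually also weigh factors such as GC content and repeated sequence, which this sketch ignores.

```python
from typing import Dict

# Made-up codon-usage fractions for a hypothetical host (illustration only, not real data).
HOST_USAGE: Dict[str, Dict[str, float]] = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.70, "AAG": 0.30},
    "L": {"CTG": 0.50, "TTA": 0.10, "TTG": 0.10, "CTT": 0.10, "CTC": 0.10, "CTA": 0.10},
    "*": {"TAA": 0.60, "TGA": 0.30, "TAG": 0.10},
}

def back_translate_preferred(protein: str, usage: Dict[str, Dict[str, float]]) -> str:
    """Encode each residue with the host's most frequently used synonymous codon."""
    codons = []
    for residue in protein:
        options = usage.get(residue)
        if not options:
            raise KeyError(f"no codon-usage data for residue {residue!r}")
        codons.append(max(options, key=options.get))
    return "".join(codons)

print(back_translate_preferred("MKL*", HOST_USAGE))
```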
  • The invention also encompasses production of DNA sequences which encode CSAP and CSAP derivatives, or fragments thereof, entirely by synthetic chemistry.
  • The synthetic sequence may be inserted into any of the many available expression vectors and cell systems using reagents well known in the art.
  • Synthetic chemistry may also be used to introduce mutations into a sequence encoding CSAP or any fragment thereof.
  • The invention also encompasses polynucleotide sequences that are capable of hybridizing to the claimed polynucleotide sequences, and, in particular, to those shown in SEQ ID NO:29-56 and fragments thereof, under various conditions of stringency.
  • Hybridization conditions, including annealing and wash conditions, are described in "Definitions."
  • Methods for DNA sequencing are well known in the art and may be used to practice any of the embodiments of the invention.
  • The methods may employ such enzymes as the Klenow fragment of DNA polymerase I, SEQUENASE (US Biochemical, Cleveland OH), Taq polymerase (Applied Biosystems), thermostable T7 polymerase (Amersham Pharmacia Biotech, Piscataway NJ), or combinations of polymerases and proofreading exonucleases such as those found in the ELONGASE amplification system (Life Technologies, Gaithersburg MD).
  • sequence preparation is automated with machines such as the MICROLAB 2200 liquid transfer system (Hamilton, Reno NV), PTC200 thermal cycler (MJ Research, Watertown MA) and ABI CATALYST 800 thermal cycler (Applied Biosystems). Sequencing is then carried out using either the ABI 373 or 377 DNA sequencing system (Applied Biosystems), the MEGABACE 1000 DNA sequencing system (Molecular Dynamics, Sunnyvale CA), or other systems known in the art. The resulting sequences are analyzed using a variety of algorithms which are well known in the art. (See, e.g., Ausubel, F.M. (1997) Short Protocols in Molecular Biology. John Wiley & Sons, New York NY, unit 7.7; Meyers, R.A. (1995) Molecular Biology and Biotechnology. Wiley VCH, New York NY, pp. 856-853.)
  • The nucleic acid sequences encoding CSAP may be extended utilizing a partial nucleotide sequence and employing various PCR-based methods known in the art to detect upstream sequences, such as promoters and regulatory elements.
  • One method, restriction-site PCR, uses universal and nested primers to amplify unknown sequence from genomic DNA within a cloning vector. (See, e.g., Sarkar, G. (1993) PCR Methods Applic. 2:318-322.)
  • Another method, inverse PCR, uses primers that extend in divergent directions to amplify unknown sequence from a circularized template.
  • The template is derived from restriction fragments comprising a known genomic locus and surrounding sequences.
  • A third method, capture PCR, involves PCR amplification of DNA fragments adjacent to known sequences in human and yeast artificial chromosome DNA.
  • In capture PCR, multiple restriction enzyme digestions and ligations may be used to insert an engineered double-stranded sequence into a region of unknown sequence before performing PCR.
  • Other methods which may be used to retrieve unknown sequences are known in the art. (See, e.g., Parker, J.D. et al. (1991) Nucleic Acids Res.
  • Primers may be designed using commercially available software, such as OLIGO 4.06 primer analysis software (National Biosciences, Plymouth MN) or another appropriate program, to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to anneal to the template at temperatures of about 68°C to 72°C.
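The primer guidelines above (22 to 30 nt, GC content of about 50% or more, annealing at about 68°C to 72°C) can be screened with a short check. The melting temperature here uses the basic formula Tm = 64.9 + 41(nGC - 16.4)/N. That formula ignores salt and oligo concentration and is only a rough stand-in for the nearest-neighbor calculation performed by programs such as OLIGO. The function names are hypothetical.

```python
from typing import Tuple

def gc_content(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)

def basic_tm(primer: str) -> float:
    """Rough Tm (deg C) for oligos longer than ~13 nt: 64.9 + 41 * (nGC - 16.4) / N.
    Not a nearest-neighbor calculation; salt and primer concentration are ignored."""
    p = primer.upper()
    n_gc = p.count("G") + p.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(p)

def meets_guidelines(primer: str,
                     length_range: Tuple[int, int] = (22, 30),
                     min_gc: float = 50.0,
                     tm_range: Tuple[float, float] = (68.0, 72.0)) -> bool:
    """Check a primer against the length, GC, and annealing-temperature targets."""
    if not length_range[0] <= len(primer) <= length_range[1]:
        return False
    if gc_content(primer) < min_gc:
        return False
    return tm_range[0] <= basic_tm(primer) <= tm_range[1]
```

Under this formula, a 30-mer needs about 19 to 21 G or C bases (roughly 63% to 70% GC) to fall within 68°C to 72°C. The temperature window, not the 50% GC floor, is therefore usually the binding constraint.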
  • Genomic libraries may be useful for extension of sequence into 5' non-transcribed regulatory regions.
  • Capillary electrophoresis systems which are commercially available may be used to analyze the size or confirm the nucleotide sequence of sequencing or PCR products.
  • Capillary sequencing may employ flowable polymers for electrophoretic separation, four different nucleotide-specific, laser-stimulated fluorescent dyes, and a charge coupled device camera for detection of the emitted wavelengths.
  • Output/light intensity may be converted to electrical signal using appropriate software (e.g., GENOTYPER and SEQUENCE NAVIGATOR, Applied Biosystems), and the entire process from loading of samples to computer analysis and electronic data display may be computer controlled. Capillary electrophoresis is especially preferable for sequencing small DNA fragments which may be present in limited amounts in a particular sample.
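The four-dye detection scheme above implies a simple base-calling rule: at each electrophoretic peak, the brightest dye channel names the base. The sketch below is a deliberately simplified illustration. It assumes that peaks have already been located and that the channels are ordered A, C, G, T. Production base callers also model peak spacing, dye mobility shifts, and spectral cross-talk.

```python
from typing import Sequence, Tuple

BASES = "ACGT"  # assumed channel order

def call_bases(peaks: Sequence[Tuple[float, float, float, float]],
               min_ratio: float = 2.0) -> str:
    """Call one base per peak from four dye-channel intensities.

    The brightest channel wins; 'N' is emitted when no channel is lit or when the
    brightest is less than `min_ratio` times the runner-up (an ambiguous peak).
    """
    calls = []
    for intensities in peaks:
        order = sorted(range(4), key=lambda i: intensities[i], reverse=True)
        top, second = intensities[order[0]], intensities[order[1]]
        if top <= 0 or (second > 0 and top / second < min_ratio):
            calls.append("N")
        else:
            calls.append(BASES[order[0]])
    return "".join(calls)

print(call_bases([(900, 10, 20, 30), (5, 800, 40, 10), (100, 90, 5, 5), (0, 0, 0, 0)]))
```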
  • Polynucleotide sequences or fragments thereof which encode CSAP may be cloned in recombinant DNA molecules that direct expression of CSAP, or fragments or functional equivalents thereof, in appropriate host cells. Due to the inherent degeneracy of the genetic code, other DNA sequences which encode substantially the same or a functionally equivalent amino acid sequence may be produced and used to express CSAP.
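Because of the degeneracy of the genetic code, the number of DNA sequences encoding a given polypeptide is the product of the codon counts of its residues. Below is a minimal sketch using the standard genetic code, with the stop codon excluded:

```python
from math import prod

# Number of sense codons per amino acid in the standard genetic code (61 in total).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}

def count_encoding_sequences(protein: str) -> int:
    """How many distinct coding sequences (stop codon excluded) translate to `protein`."""
    return prod(CODONS_PER_AA[aa] for aa in protein.upper())

print(count_encoding_sequences("MKL"))
```

Even a short peptide has many synonymous encodings, and the count grows multiplicatively with length. This is what leaves room for the codon-usage and engineering choices described in the surrounding text.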
  • The nucleotide sequences of the present invention can be engineered using methods generally known in the art in order to alter CSAP-encoding sequences for a variety of purposes including, but not limited to, modification of the cloning, processing, and/or expression of the gene product.
  • DNA shuffling by random fragmentation and PCR reassembly of gene fragments and synthetic oligonucleotides may be used to engineer the nucleotide sequences.
  • Oligonucleotide-mediated site-directed mutagenesis may be used to introduce mutations that create new restriction sites, alter glycosylation patterns, change codon preference, produce splice variants, and so forth.
  • The nucleotides of the present invention may be subjected to DNA shuffling techniques such as MOLECULARBREEDING (Maxygen Inc., Santa Clara CA; described in U.S. Patent No. 5,837,458; Chang, C.-C. et al. (1999) Nat. Biotechnol. 17:793-797; Christians, F.C. et al. (1999) Nat. Biotechnol. 17:259-264; and Crameri, A. et al. (1996) Nat. Biotechnol. 14:315-319) to alter or improve the biological properties of CSAP, such as its biological or enzymatic activity or its ability to bind to other molecules or compounds.
  • DNA shuffling is a process by which a library of gene variants is produced using PCR-mediated recombination of gene fragments. The library is then subjected to selection or screening procedures that identify those gene variants with the desired properties. These preferred variants may then be pooled and further subjected to recursive rounds of DNA shuffling and selection/screening.
  • In this way, genetic diversity is created through "artificial" breeding and rapid molecular evolution. For example, fragments of a single gene containing random point mutations may be recombined, screened, and then reshuffled until the desired properties are optimized. Alternatively, fragments of a given gene may be recombined with fragments of homologous genes in the same gene family, either from the same or different species, thereby maximizing the genetic diversity of multiple naturally occurring genes in a directed and controllable manner.
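The shuffle, screen, and reshuffle cycle described above can be mimicked in silico. In this toy sketch, three substitutions stand in for the laboratory steps. Recombination of equal-length homologous parents at random crossover points stands in for fragmentation and PCR reassembly. A low point-mutation rate stands in for polymerase errors. A hypothetical scoring function stands in for the screening assay. The best variants from each round, together with the previous survivors, seed the next round. The sketch illustrates the search strategy only, not any laboratory protocol or the MOLECULARBREEDING process.

```python
import random
from typing import Callable, List

def recombine(parents: List[str], n_fragments: int, rng: random.Random) -> str:
    """Cut equal-length homologous parents at shared random positions and reassemble,
    taking each fragment from a randomly chosen parent (stand-in for PCR reassembly)."""
    length = len(parents[0])
    cuts = sorted(rng.sample(range(1, length), n_fragments - 1))
    bounds = [0] + cuts + [length]
    return "".join(rng.choice(parents)[a:b] for a, b in zip(bounds, bounds[1:]))

def point_mutate(seq: str, rate: float, rng: random.Random, alphabet: str = "ACGT") -> str:
    """Replace each position with a random base at the given rate (polymerase-error stand-in)."""
    return "".join(rng.choice(alphabet) if rng.random() < rate else base for base in seq)

def shuffle_and_select(parents: List[str], fitness: Callable[[str], float], rounds: int,
                       library_size: int, keep: int, rng: random.Random) -> str:
    """Recursive rounds of shuffling and screening; the best `keep` sequences
    (previous survivors included) seed the next round."""
    pool = list(parents)
    for _ in range(rounds):
        library = [point_mutate(recombine(pool, 4, rng), 0.01, rng)
                   for _ in range(library_size)]
        pool = sorted(library + pool, key=fitness, reverse=True)[:keep]
    return pool[0]

# Hypothetical target sequence standing in for a screening assay.
TARGET = "ACGTACGTACGTACGTACGT"

def matches_target(seq: str) -> float:
    return sum(a == b for a, b in zip(seq, TARGET))

PARENTS = ["ACGTACGTACTTTTTTTTTT", "TTTTTTTTTTGTACGTACGT"]
best = shuffle_and_select(PARENTS, matches_target, rounds=5, library_size=50,
                          keep=5, rng=random.Random(0))
print(best, matches_target(best))
```

Carrying survivors forward means the best score can never decrease between rounds. The specific sequences and parameters are illustrative only.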
  • Sequences encoding CSAP may be synthesized, in whole or in part, using chemical methods well known in the art. (See, e.g., Caruthers, M.H. et al. (1980) Nucleic Acids Symp. Ser. 7:215-223; and Horn, T. et al. (1980) Nucleic Acids Symp. Ser. 7:225-232.)
  • CSAP itself or a fragment thereof may be synthesized using chemical methods.
  • Peptide synthesis can be performed using various solution-phase or solid-phase techniques. (See, e.g., Creighton, T. (1984) Proteins, Structures and Molecular Properties, WH Freeman, New York NY, pp. 55-60; and Roberge, J.Y. et al. (1995) Science 269:202-204.) Automated synthesis may be achieved using the ABI 431A peptide synthesizer (Applied Biosystems).
  • The amino acid sequence of CSAP may be altered during direct synthesis and/or combined with sequences from other proteins, or any part thereof, to produce a variant polypeptide or a polypeptide having a sequence of a naturally occurring polypeptide.
  • The peptide may be substantially purified by preparative high performance liquid chromatography. (See, e.g., Chiez, R.M. and F.Z. Regnier (1990) Methods Enzymol. 182:392-421.)
  • The composition of the synthetic peptides may be confirmed by amino acid analysis or by sequencing. (See, e.g., Creighton, supra, pp. 28-53.)
  • The nucleotide sequences encoding CSAP or derivatives thereof may be inserted into an appropriate expression vector, i.e., a vector which contains the necessary elements for transcriptional and translational control of the inserted coding sequence in a suitable host.
  • These elements include regulatory sequences, such as enhancers, constitutive and inducible promoters, and 5' and 3' untranslated regions in the vector and in polynucleotide sequences encoding CSAP. Such elements may vary in their strength and specificity.
  • Specific initiation signals may also be used to achieve more efficient translation of sequences encoding CSAP. Such signals include the ATG initiation codon and adjacent sequences, e.g. the Kozak sequence.
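Whether an ATG sits in a favorable Kozak context is commonly judged from two positions: a purine at -3 and a G at +4, counting the A of ATG as +1. The sketch below grades every ATG in a sequence on that basis. The three labels follow a common informal convention rather than a definition given in this text.

```python
from typing import List, Tuple

def kozak_contexts(mrna: str) -> List[Tuple[int, str]]:
    """Grade every ATG/AUG by its -3 and +4 nucleotides (A of ATG = +1).

    Returns (0-based index of the A, 'strong' | 'adequate' | 'weak'), where 'strong'
    means a purine at -3 and G at +4, 'adequate' means one of the two, 'weak' neither.
    """
    seq = mrna.upper().replace("U", "T")
    grades = {2: "strong", 1: "adequate", 0: "weak"}
    results = []
    start = seq.find("ATG")
    while start != -1:
        minus3 = seq[start - 3] if start >= 3 else ""
        plus4 = seq[start + 3] if start + 3 < len(seq) else ""
        score = int(minus3 in ("A", "G")) + int(plus4 == "G")
        results.append((start, grades[score]))
        start = seq.find("ATG", start + 1)
    return results

print(kozak_contexts("GCCACCATGG"))
```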
  • A variety of expression vector/host systems may be utilized to contain and express sequences encoding CSAP. These include, but are not limited to, microorganisms such as bacteria transformed with recombinant bacteriophage, plasmid, or cosmid DNA expression vectors; yeast transformed with yeast expression vectors; insect cell systems infected with viral expression vectors (e.g., baculovirus); plant cell systems transformed with viral expression vectors (e.g., cauliflower mosaic virus, CaMV, or tobacco mosaic virus, TMV) or with bacterial expression vectors (e.g., Ti or pBR322 plasmids); or animal cell systems.
  • Expression vectors derived from retroviruses, adenoviruses, or herpes or vaccinia viruses, or from various bacterial plasmids, may be used for delivery of nucleotide sequences to the targeted organ, tissue, or cell population.
  • The invention is not limited by the host cell employed.
  • A number of cloning and expression vectors may be selected depending upon the use intended for polynucleotide sequences encoding CSAP.
  • Routine cloning, subcloning, and propagation of polynucleotide sequences encoding CSAP can be achieved using a multifunctional E. coli vector such as PBLUESCRIPT (Stratagene, La Jolla CA) or PSPORT1 plasmid (Life Technologies).
  • These vectors may be useful for in vitro transcription, dideoxy sequencing, single strand rescue with helper phage, and creation of nested deletions in the cloned sequence.
  • Vectors which direct high level expression of CSAP may be used.
  • For example, vectors containing the strong, inducible SP6 or T7 bacteriophage promoter may be used.
  • Yeast expression systems may be used for production of CSAP.
  • A number of vectors containing constitutive or inducible promoters, such as alpha factor, alcohol oxidase, and PGH promoters, may be used in the yeast Saccharomyces cerevisiae or Pichia pastoris.
  • Such vectors direct either the secretion or intracellular retention of expressed proteins and enable integration of foreign sequences into the host genome for stable propagation.
  • Plant systems may also be used for expression of CSAP. Transcription of sequences encoding CSAP may be driven by viral promoters, e.g., the 35S and 19S promoters of CaMV used alone or in combination with the omega leader sequence from TMV (Takamatsu, N. (1987) EMBO J.
  • Plant promoters such as the small subunit of RUBISCO or heat shock promoters may also be used. (See, e.g., Coruzzi, G. et al. (1984) EMBO J. 3:1671-1680; Broglie, R. et al.
  • A number of viral-based expression systems may be utilized. In cases where an adenovirus is used as an expression vector, sequences encoding CSAP may be ligated into an adenovirus transcription/translation complex consisting of the late promoter and tripartite leader sequence. Insertion in a non-essential E1 or E3 region of the viral genome may be used to obtain infective virus which expresses CSAP in host cells.
  • Transcription enhancers, such as the Rous sarcoma virus (RSV) enhancer, may be used to increase expression in mammalian host cells.
  • SV40- or EBV-based vectors may also be used for high-level protein expression.
  • Human artificial chromosomes (HACs) may also be employed to deliver larger fragments of DNA than can be contained in and expressed from a plasmid.
  • HACs of about 6 kb to 10 Mb are constructed and delivered via conventional delivery methods (liposomes, polycationic amino polymers, or vesicles) for therapeutic purposes. (See, e.g., Harrington, J.J. et al. (1997) Nat. Genet. 15:345-355.)
  • Sequences encoding CSAP can be transformed into cell lines using expression vectors which may contain viral origins of replication and/or endogenous expression elements and a selectable marker gene on the same or on a separate vector. Following the introduction of the vector, cells may be allowed to grow for about 1 to 2 days in enriched media before being switched to selective media.
  • The purpose of the selectable marker is to confer resistance to a selective agent, and its presence allows growth and recovery of cells which successfully express the introduced sequences.
  • Resistant clones of stably transformed cells may be propagated using tissue culture techniques appropriate to the cell type.
  • A number of selection systems may be used to recover transformed cell lines. These include, but are not limited to, the herpes simplex virus thymidine kinase and adenine phosphoribosyltransferase genes, for use in tk⁻ and apr⁻ cells, respectively. (See, e.g., Wigler, M. et al. (1977) Cell 11:223-232; Lowy, I. et al. (1980) Cell 22:817-823.) Also, antimetabolite, antibiotic, or herbicide resistance can be used as the basis for selection.
  • dhfr confers resistance to methotrexate
  • neo confers resistance to the aminoglycosides neomycin and G-418
  • als and pat confer resistance to chlorsulfuron and phosphinothricin, respectively; pat encodes phosphinothricin acetyltransferase.
  • Additional selectable genes have been described, e.g., trpB and hisD, which alter cellular requirements for metabolites.
  • Visible markers, e.g., anthocyanins, green fluorescent proteins (GFP; Clontech), β-glucuronidase and its substrate β-glucuronide, or luciferase and its substrate luciferin, may be used. These markers can be used not only to identify transformants, but also to quantify the amount of transient or stable protein expression attributable to a specific vector system. (See, e.g., Rhodes, C.A. (1995) Methods Mol. Biol. 55:121-131.)
  • Although marker gene expression suggests that the gene of interest is also present, the presence and expression of the gene may need to be confirmed.
  • For example, if the sequence encoding CSAP is inserted within a marker gene sequence, transformed cells containing sequences encoding CSAP can be identified by the absence of marker gene function.
  • Alternatively, a marker gene can be placed in tandem with a sequence encoding CSAP under the control of a single promoter. Expression of the marker gene in response to induction or selection usually indicates expression of the tandem gene as well.
  • Host cells that contain the nucleic acid sequence encoding CSAP and that express CSAP may be identified by a variety of procedures known to those of skill in the art. These procedures include, but are not limited to, DNA-DNA or DNA-RNA hybridizations, PCR amplification, and protein bioassay or immunoassay techniques which include membrane, solution, or chip based technologies for the detection and/or quantification of nucleic acid or protein sequences. Immunological methods for detecting and measuring the expression of CSAP using either specific polyclonal or monoclonal antibodies are known in the art. Examples of such techniques include enzyme-linked immunosorbent assays (ELISAs), radioimmunoassays (RIAs), and fluorescence activated cell sorting (FACS).
  • A two-site, monoclonal-based immunoassay utilizing monoclonal antibodies reactive to two non-interfering epitopes on CSAP is preferred, but a competitive binding assay may be employed.
  • These and other assays are well known in the art. (See, e.g., Hampton, R. et al. (1990) Serological Methods, a Laboratory Manual, APS Press, St. Paul MN, Sect. IV; Coligan, J.E. et al. (1997) Current Protocols in Immunology, Greene Pub. Associates and Wiley-Interscience, New York NY; and Pound, J.D. (1998) Immunochemical Protocols, Humana Press, Totowa NJ.)
  • Means for producing labeled hybridization or PCR probes for detecting sequences related to polynucleotides encoding CSAP include oligolabeling, nick translation, end-labeling, or PCR amplification using a labeled nucleotide.
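  • Alternatively, the sequences encoding CSAP, or any fragments thereof, may be cloned into a vector for the production of an mRNA probe.
  • Such vectors may be used to synthesize RNA probes in vitro by addition of an appropriate RNA polymerase, such as T7, T3, or SP6, and labeled nucleotides.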
  • Reporter molecules or labels which may be used for ease of detection include radionuclides, enzymes, fluorescent, chemiluminescent, or chromogenic agents, as well as substrates, cofactors, inhibitors, magnetic particles, and the like.
  • Host cells transformed with nucleotide sequences encoding CSAP may be cultured under conditions suitable for the expression and recovery of the protein from cell culture.
  • The protein produced by a transformed cell may be secreted or retained intracellularly depending on the sequence and/or the vector used.
  • Expression vectors containing polynucleotides which encode CSAP may be designed to contain signal sequences which direct secretion of CSAP through a prokaryotic or eukaryotic cell membrane.
  • A host cell strain may be chosen for its ability to modulate expression of the inserted sequences or to process the expressed protein in the desired fashion.
  • Such modifications of the polypeptide include, but are not limited to, acetylation, carboxylation, glycosylation, phosphorylation, lipidation, and acylation.
  • Post-translational processing which cleaves a "prepro” or “pro” form of the protein may also be used to specify protein targeting, folding, and/or activity.
  • Different host cells which have specific cellular machinery and characteristic mechanisms for post-translational activities (e.g., CHO, HeLa, MDCK, HEK293, and WI38) are available from the American Type Culture Collection (ATCC, Manassas VA) and may be chosen to ensure the correct modification and processing of the foreign protein.
  • Natural, modified, or recombinant nucleic acid sequences encoding CSAP may be ligated to a heterologous sequence resulting in translation of a fusion protein in any of the aforementioned host systems.
  • A chimeric CSAP protein containing a heterologous moiety that can be recognized by a commercially available antibody may facilitate the screening of peptide libraries for inhibitors of CSAP activity.
  • Heterologous protein and peptide moieties may also facilitate purification of fusion proteins using commercially available affinity matrices.
  • Such moieties include, but are not limited to, glutathione S-transferase (GST), maltose binding protein (MBP), thioredoxin (Trx), calmodulin binding peptide (CBP), 6-His, FLAG, c-myc, and hemagglutinin (HA).
  • GST, MBP, Trx, CBP, and 6-His enable purification of their cognate fusion proteins on immobilized glutathione, maltose, phenylarsine oxide, calmodulin, and metal-chelate resins, respectively.
  • FLAG, c-myc, and hemagglutinin (HA) enable immunoaffinity purification of fusion proteins using commercially available monoclonal and polyclonal antibodies that specifically recognize these epitope tags.
  • A fusion protein may also be engineered to contain a proteolytic cleavage site located between the CSAP-encoding sequence and the heterologous protein sequence, so that CSAP may be cleaved away from the heterologous moiety following purification. Methods for fusion protein expression and purification are discussed in Ausubel (1995, supra, ch. 10). A variety of commercially available kits may also be used to facilitate expression and purification of fusion proteins.
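The text leaves the protease unspecified. Assuming, for illustration only, a tobacco etch virus (TEV) protease site (ENLYFQ, cut before a following G or S), releasing the target from a hypothetical His-tagged fusion can be sketched like this:

```python
from typing import Tuple

TEV_SITE = "ENLYFQ"  # assumption: TEV protease recognition sequence (cuts after the Q)

def tev_cleave(fusion: str) -> Tuple[str, str]:
    """Split a fusion protein (one-letter code) at its single TEV site.

    Returns (tag-side fragment ending in ...ENLYFQ, released protein starting with G or S).
    """
    n_sites = fusion.count(TEV_SITE)
    if n_sites != 1:
        # A second site (e.g., inside the target protein) would also be cut.
        raise ValueError(f"expected exactly one TEV site, found {n_sites}")
    cut = fusion.index(TEV_SITE) + len(TEV_SITE)
    if cut >= len(fusion) or fusion[cut] not in ("G", "S"):
        raise ValueError("TEV site must be followed by G or S")
    return fusion[:cut], fusion[cut:]

# Hypothetical fusion: His-tag, TEV site, then the start of a target protein.
tag_side, released = tev_cleave("MHHHHHH" + TEV_SITE + "GMSTK")
print(tag_side, released)
```

The single-site check matters in practice. If the recognition sequence also occurs inside CSAP, a different protease or cleavage site would be needed.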
  • Synthesis of radiolabeled CSAP may be achieved in vitro using the TNT rabbit reticulocyte lysate or wheat germ extract system (Promega). These systems couple transcription and translation of protein-coding sequences operably associated with the T7, T3, or SP6 promoters. Translation takes place in the presence of a radiolabeled amino acid precursor, for example, 35S-methionine.
  • CSAP of the present invention or fragments thereof may be used to screen for compounds that specifically bind to CSAP.
  • At least one and up to a plurality of test compounds may be screened for specific binding to CSAP.
  • Examples of test compounds include antibodies, oligonucleotides, proteins (e.g., receptors), and small molecules.
  • The compound thus identified may be closely related to the natural ligand of CSAP, e.g., a ligand or fragment thereof, a natural substrate, a structural or functional mimetic, or a natural binding partner.
  • Alternatively, the compound can be closely related to the natural receptor to which CSAP binds, or to at least a fragment of the receptor, e.g., the ligand binding site. In either case, the compound can be rationally designed using known techniques.
  • Screening for these compounds may involve producing appropriate cells which express CSAP, either as a secreted protein or on the cell membrane.
  • Preferred cells include cells from mammals, yeast, Drosophila, or E. coli. Cells expressing CSAP or cell membrane fractions which contain CSAP are then contacted with a test compound, and binding, stimulation, or inhibition of activity of either CSAP or the compound is analyzed.
  • An assay may simply test binding of a test compound to the polypeptide, wherein binding is detected by a fluorophore, radioisotope, enzyme conjugate, or other detectable label.
  • The assay may comprise the steps of combining at least one test compound with CSAP, either in solution or affixed to a solid support, and detecting the binding of CSAP to the compound.
  • Alternatively, the assay may detect or measure binding of a test compound in the presence of a labeled competitor.
  • The assay may be carried out using cell-free preparations, chemical libraries, or natural product mixtures, and the test compound(s) may be free in solution or affixed to a solid support.
  • CSAP of the present invention or fragments thereof may be used to screen for compounds that modulate the activity of CSAP.
  • Such compounds may include agonists, antagonists, or partial or inverse agonists.
  • An assay may be performed under conditions permissive for CSAP activity, wherein CSAP is combined with at least one test compound, and the activity of CSAP in the presence of the test compound is compared with the activity of CSAP in its absence. A change in the activity of CSAP in the presence of the test compound indicates a compound that modulates the activity of CSAP.
  • Alternatively, a test compound is combined with an in vitro or cell-free system comprising CSAP under conditions suitable for CSAP activity, and the assay is performed. In either of these assays, a test compound which modulates the activity of CSAP may do so indirectly and need not come in direct contact with CSAP. At least one and up to a plurality of test compounds may be screened.
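The comparison described above, CSAP activity with versus without a test compound, reduces to simple arithmetic once replicate readings are in hand. The following Python sketch is illustrative only: the function names, the use of replicate means, and the 20% cut-off are assumptions, not part of the described method.

```python
from statistics import mean


def percent_change(activity_with, activity_without):
    """Percent change in CSAP activity caused by a test compound.

    Both arguments are lists of replicate activity readings (arbitrary units)
    from otherwise identical reactions, with and without the test compound.
    """
    baseline = mean(activity_without)
    if baseline == 0:
        raise ValueError("baseline activity must be non-zero")
    return 100.0 * (mean(activity_with) - baseline) / baseline


def classify_modulator(activity_with, activity_without, threshold_pct=20.0):
    """Label a compound by the direction of the change it causes.

    threshold_pct is a hypothetical cut-off chosen for illustration only.
    """
    change = percent_change(activity_with, activity_without)
    if change >= threshold_pct:
        return "activator"
    if change <= -threshold_pct:
        return "inhibitor"
    return "no clear effect"
```

For example, replicate readings of 40, 42, and 38 with the compound against 100, 98, and 102 without it give a change of -60%, so `classify_modulator` labels the compound an inhibitor. Because the compound may act indirectly, this label describes only the observed change in activity, not a binding mechanism.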
  • Polynucleotides encoding CSAP or their mammalian homologs may be "knocked out" in an animal model system using homologous recombination in embryonic stem (ES) cells.
  • Such techniques are well known in the art and are useful for the generation of animal models of human disease.
  • For example, mouse ES cells, such as the mouse 129/SvJ cell line, are derived from the early mouse embryo and grown in culture.
  • The ES cells are transformed with a vector containing the gene of interest disrupted by a marker gene, e.g., the neomycin phosphotransferase gene (neo; Capecchi, M.R. (1989) Science 244:1288-1292).
  • The vector integrates into the corresponding region of the host genome by homologous recombination.
  • Alternatively, the Cre-loxP site-specific recombination system may be used to knock out a gene of interest in a tissue- or developmental stage-specific manner (Marth, J.D. (1996) J. Clin. Invest. 97:1999-2002; Wagner, K.U. et al. (1997) Nucleic Acids Res. 25:4323-4330).
  • Transformed ES cells are identified and microinjected into mouse blastocysts, such as those from the C57BL/6 mouse strain.
  • The blastocysts are surgically transferred to pseudopregnant dams, and the resulting chimeric progeny are genotyped and bred to produce heterozygous or homozygous strains.
  • Transgenic animals thus generated may be tested with potential therapeutic or toxic agents.
  • Polynucleotides encoding CSAP may also be manipulated in vitro in ES cells derived from human blastocysts.
  • Human ES cells have the potential to differentiate into at least eight separate cell lineages, including endodermal, mesodermal, and ectodermal cell types. These cell lineages differentiate into, for example, neural cells, hematopoietic lineages, and cardiomyocytes (Thomson, J.A. et al. (1998) Science 282:1145-1147).
  • Polynucleotides encoding CSAP can also be used to create "knockin" humanized animals (pigs) or transgenic animals (mice or rats) to model human disease.
  • In knockin technology, a region of a polynucleotide encoding CSAP is injected into animal ES cells, and the injected sequence integrates into the animal cell genome.
  • Transformed cells are injected into blastulae, and the blastulae are implanted as described above.
  • Transgenic progeny or inbred lines are studied and treated with potential pharmaceutical agents to obtain information on treatment of a human disease.
  • A mammal inbred to overexpress CSAP, e.g., by secreting CSAP in its milk, may also serve as a convenient source of that protein (Janne, J. et al. (1998) Biotechnol. Annu. Rev. 4:55-74).

THERAPEUTICS
  • Chemical and structural similarity, e.g., in the context of sequences and motifs, exists between regions of CSAP and cytoskeleton-associated proteins.
  • Examples of tissues expressing CSAP are normal and cancerous lung tissue and normal and cancerous breast tissue; further examples can be found in Table 6. Therefore, CSAP appears to play a role in cell proliferative disorders, viral infections, and neurological disorders.
  • In the treatment of disorders associated with increased CSAP expression or activity, it is desirable to decrease the expression or activity of CSAP.
  • In the treatment of disorders associated with decreased CSAP expression or activity, it is desirable to increase the expression or activity of CSAP.
  • CSAP or a fragment or derivative thereof may be administered to a subject to treat or prevent a disorder associated with decreased expression or activity of CSAP.
  • Examples of such disorders include, but are not limited to, a cell proliferative disorder such as actinic keratosis, arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease (MCTD), myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis, primary thrombocythemia, and a cancer including adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and, in particular, a cancer of the adrenal gland, bladder, bone, bone marrow, brain, breast, cervix, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver
  • A composition comprising a substantially purified CSAP in conjunction with a suitable pharmaceutical carrier may be administered to a subject to treat or prevent a disorder associated with decreased expression or activity of CSAP including, but not limited to, those provided above.
  • An agonist which modulates the activity of CSAP may be administered to a subject to treat or prevent a disorder associated with decreased expression or activity of CSAP including, but not limited to, those listed above.
  • An antagonist of CSAP may be administered to a subject to treat or prevent a disorder associated with increased expression or activity of CSAP. Examples of such disorders include, but are not limited to, the cell proliferative disorders, viral infections, and neurological disorders described above.
  • An antibody which specifically binds CSAP may be used directly as an antagonist or indirectly as a targeting or delivery mechanism for bringing a pharmaceutical agent to cells or tissues which express CSAP.
  • A vector expressing the complement of the polynucleotide encoding CSAP may be administered to a subject to treat or prevent a disorder associated with increased expression or activity of CSAP including, but not limited to, those described above.
  • Any of the proteins, antagonists, antibodies, agonists, complementary sequences, or vectors of the invention may be administered in combination with other appropriate therapeutic agents. Selection of the appropriate agents for use in combination therapy may be made by one of ordinary skill in the art, according to conventional pharmaceutical principles.
  • The combination of therapeutic agents may act synergistically to effect the treatment or prevention of the various disorders described above. Using this approach, one may be able to achieve therapeutic efficacy with lower dosages of each agent, thus reducing the potential for adverse side effects.
  • An antagonist of CSAP may be produced using methods which are generally known in the art.
  • For example, purified CSAP may be used to produce antibodies or to screen libraries of pharmaceutical agents to identify those which specifically bind CSAP.
  • Antibodies to CSAP may also be generated using methods that are well known in the art. Such antibodies may include, but are not limited to, polyclonal, monoclonal, chimeric, and single chain antibodies, Fab fragments, and fragments produced by a Fab expression library. Neutralizing antibodies (i.e., those which inhibit dimer formation) are generally preferred for therapeutic use.
  • Single chain antibodies may be potent enzyme inhibitors and may have advantages in the design of peptide mimetics, and in the development of immuno-adsorbents and biosensors (Muyldermans, S. (2001) J. Biotechnol. 74:277-302).
  • Various hosts, including goats, rabbits, rats, mice, camels, dromedaries, llamas, humans, and others, may be immunized by injection with CSAP or with any fragment or oligopeptide thereof which has immunogenic properties.
  • Various adjuvants may be used to increase the immunological response.
  • Such adjuvants include, but are not limited to, Freund's, mineral gels such as aluminum hydroxide, and surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanin (KLH), and dinitrophenol.
  • BCG (bacilli Calmette-Guerin) and Corynebacterium parvum are especially preferable adjuvants.
  • It is preferred that the oligopeptides, peptides, or fragments used to induce antibodies to CSAP have an amino acid sequence consisting of at least about 5 amino acids, and generally of at least about 10 amino acids. It is also preferable that these oligopeptides, peptides, or fragments are identical to a portion of the amino acid sequence of the natural protein. Short stretches of CSAP amino acids may be fused with those of another protein, such as KLH, and antibodies to the chimeric molecule may be produced.
  • Monoclonal antibodies to CSAP may be prepared using any technique which provides for the production of antibody molecules by continuous cell lines in culture. These include, but are not limited to, the hybridoma technique, the human B-cell hybridoma technique, and the EBV-hybridoma technique.
  • For the hybridoma, human B-cell hybridoma, and EBV-hybridoma techniques, see, e.g., Kohler, G. et al. (1975) Nature 256:495-497; Kozbor, D. et al. (1985) J. Immunol. Methods 81:31-42; Cote, R.J. et al. (1983) Proc. Natl. Acad. Sci. USA 80:2026-2030; and Cole, S.P. et al. (1984) Mol. Cell. Biochem. 62:109-120.
  • Techniques developed for the production of chimeric antibodies, such as the splicing of mouse antibody genes to human antibody genes to obtain a molecule with appropriate antigen specificity and biological activity, may also be used.
  • Alternatively, techniques developed for the production of single chain antibodies may be adapted, using methods known in the art, to produce CSAP-specific single chain antibodies.
  • Antibodies with related specificity, but of distinct idiotypic composition may be generated by chain shuffling from random combinatorial immunoglobulin libraries. (See, e.g., Burton, D.R. (1991) Proc. Natl. Acad. Sci. USA 88:10134-10137.)
  • Antibodies may also be produced by inducing in vivo production in the lymphocyte population or by screening immunoglobulin libraries or panels of highly specific binding reagents as disclosed in the literature. (See, e.g., Orlandi, R. et al. (1989) Proc. Natl. Acad. Sci. USA 86:3833-3837; Winter, G. et al. (1991) Nature 349:293-299.)
  • Antibody fragments which contain specific binding sites for CSAP may also be generated.
  • Such fragments include, but are not limited to, F(ab')2 fragments produced by pepsin digestion of the antibody molecule and Fab' fragments generated by reducing the disulfide bridges of the F(ab')2 fragments.
  • Fab expression libraries may be constructed to allow rapid and easy identification of monoclonal Fab fragments with the desired specificity. (See, e.g., Huse, W.D. et al. (1989) Science 246:1275-1281.)
  • Various immunoassays may be used for screening to identify antibodies having the desired specificity.
  • Numerous protocols for competitive binding or immunoradiometric assays using either polyclonal or monoclonal antibodies with established specificities are well known in the art.
  • Such immunoassays typically involve the measurement of complex formation between CSAP and its specific antibody.
  • A two-site, monoclonal-based immunoassay utilizing monoclonal antibodies reactive to two non-interfering CSAP epitopes is generally used, but a competitive binding assay may also be employed (Pound, supra).
  • The association constant, Ka, is defined as the molar concentration of CSAP-antibody complex divided by the molar concentrations of free antigen and free antibody under equilibrium conditions, i.e., Ka = [CSAP-antibody complex]/([free CSAP][free antibody]).
  • The Ka determined for a preparation of monoclonal antibodies, which are monospecific for a particular CSAP epitope, represents a true measure of affinity.
  • High-affinity antibody preparations with Ka ranging from about 10^9 to 10^12 L/mol are preferred for use in immunoassays in which the CSAP-antibody complex must withstand rigorous manipulations.
  • Low-affinity antibody preparations with Ka ranging from about 10^6 to 10^7 L/mol are preferred for use in immunopurification and similar procedures which ultimately require dissociation of CSAP, preferably in active form, from the antibody (Catty, D. (1988) Antibodies, Volume I: A Practical Approach, IRL Press, Washington DC; Liddell, J.E. and A. Cryer (1991) A Practical Guide to Monoclonal Antibodies, John Wiley & Sons, New York NY).
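As a rough illustration of the Ka definition and the two affinity ranges above, the following Python sketch computes Ka from equilibrium concentrations and reports which application range, if any, the value falls in. The function names and the hard range boundaries are assumptions; the text gives the ranges only approximately ("about").

```python
def association_constant(complex_m, free_antigen_m, free_antibody_m):
    """Ka in L/mol = [CSAP-antibody complex] / ([free CSAP] * [free antibody]).

    All concentrations are molar (mol/L) and must be measured at equilibrium.
    """
    if free_antigen_m <= 0 or free_antibody_m <= 0:
        raise ValueError("free concentrations must be positive")
    return complex_m / (free_antigen_m * free_antibody_m)


def suggested_use(ka):
    """Map Ka onto the ranges stated in the text; boundaries are treated as hard."""
    if 1e9 <= ka <= 1e12:
        return "immunoassay"          # high affinity: complex survives manipulation
    if 1e6 <= ka <= 1e7:
        return "immunopurification"   # low affinity: CSAP can be eluted intact
    return "outside the stated ranges"
```

For instance, 10 nM complex in equilibrium with 1 nM each of free CSAP and free antibody gives Ka = 1e-8 / (1e-9 * 1e-9) = 10^10 L/mol, inside the range preferred for immunoassays.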
  • Polyclonal antibody preparations may be further evaluated to determine the quality and suitability of such preparations for certain downstream applications.
  • For example, a polyclonal antibody preparation containing at least 1-2 mg specific antibody/ml, preferably 5-10 mg specific antibody/ml, is generally employed in procedures requiring precipitation of CSAP-antibody complexes.
  • Procedures for evaluating antibody specificity, titer, and avidity, and guidelines for antibody quality and usage in various applications, are generally available. (See, e.g., Catty, supra, and Coligan et al. supra.)
  • The polynucleotides encoding CSAP may be used for therapeutic purposes.
  • Modifications of gene expression can be achieved by designing complementary sequences or antisense molecules (DNA, RNA, PNA, or modified oligonucleotides) to the coding or regulatory regions of the gene encoding CSAP.
  • Antisense oligonucleotides or larger fragments can be designed from various locations along the coding or control regions of sequences encoding CSAP. (See, e.g., Agrawal, S., ed. (1996) Antisense Therapeutics.)
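A minimal sketch of the design step, assuming the target region is available as a DNA string: an antisense oligonucleotide is the reverse complement of the sense sequence, and candidates can be tiled along the coding or control region. The window length, step size, and function names are illustrative assumptions, and the sequences in the example are placeholders, not CSAP sequences.

```python
# Watson-Crick pairing for DNA; any other character raises KeyError.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def antisense(sense_dna):
    """Return the antisense (reverse-complement) strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sense_dna.upper()))


def tile_antisense(sense_dna, length=20, step=10):
    """Yield (start, oligo) pairs: antisense oligos for overlapping windows.

    start is the 0-based position of the window on the sense strand.
    """
    seq = sense_dna.upper()
    for start in range(0, len(seq) - length + 1, step):
        yield start, antisense(seq[start:start + length])
```

For example, `list(tile_antisense("AAAACCCCGGGGTTTT", length=8, step=4))` returns `[(0, "GGGGTTTT"), (4, "CCCCGGGG"), (8, "AAAACCCC")]`. Overlapping windows are used here only so that no region of the target is skipped; which candidates actually work must still be determined experimentally.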
  • Antisense sequences can be delivered intracellularly in the form of an expression plasmid which, upon transcription, produces a sequence complementary to at least a portion of the cellular sequence encoding the target protein.
  • Antisense sequences can also be introduced intracellularly through the use of viral vectors, such as retrovirus and adeno-associated virus vectors.
  • Other gene delivery mechanisms include liposome-derived systems, artificial viral envelopes, and other systems known in the art.
  • Polynucleotides encoding CSAP may be used for somatic or germline gene therapy.
  • Gene therapy may be performed to (i) correct a genetic deficiency (e.g., in the cases of severe combined immunodeficiency (SCID)-X1 disease characterized by X-linked inheritance (Cavazzana-Calvo, M. et al. (2000) Science 288:669-672), severe combined immunodeficiency syndrome associated with an inherited adenosine deaminase (ADA) deficiency (Blaese, R.M. et al. (1995) Science 270:475-480; Bordignon, C. et al.
  • Diseases or disorders caused by deficiencies in CSAP may be treated by constructing mammalian expression vectors encoding CSAP and introducing these vectors by mechanical means into CSAP-deficient cells.
  • Mechanical transfer technologies for use with cells in vivo or ex vivo include (i) direct DNA microinjection into individual cells, (ii) ballistic gold particle delivery, (iii) liposome-mediated transfection, (iv) receptor-mediated gene transfer, and (v) the use of DNA transposons (Morgan, R.A. and W.F. Anderson (1993) Annu. Rev. Biochem. 62:191-217; Ivics, Z. (1997) Cell 91:501-510; Boulay, J-L. and H. Recipon (1998) Curr.
  • Expression vectors that may be effective for the expression of CSAP include, but are not limited to, the PCDNA 3.1, EPITAG, PRCCMV2, PREP, PVAX, and PCR2-TOPOTA vectors (Invitrogen, Carlsbad CA), PCMV-SCRIPT, PCMV-TAG, and PEGSH/PERV (Stratagene, La Jolla CA), and PTET-OFF, PTET-ON, PTRE2, PTRE2-LUC, and PTK-HYG (Clontech, Palo Alto CA).
  • CSAP may be expressed using (i) a constitutively active promoter (e.g., from cytomegalovirus (CMV), Rous sarcoma virus (RSV), SV40 virus, thymidine kinase (TK), or β-actin genes), (ii) an inducible promoter (e.g., the tetracycline-regulated promoter (Gossen, M. and H. Bujard (1992) Proc. Natl. Acad. Sci. USA 89:5547-5551; Gossen, M. et al. (1995) Science 268:1766-1769; Rossi, F.M.V. and H.M. Blau (1998) Curr. Opin. Biotechnol.
  • Liposome transformation kits (e.g., the PERFECT LIPID TRANSFECTION KIT, available from Invitrogen) may be used to deliver these vectors to target cells in culture.
  • Transformation may also be performed using the calcium phosphate method (Graham, F.L. and A.J. van der Eb (1973) Virology 52:456-467), or by electroporation (Neumann, E. et al. (1982) EMBO J. 1:841-845).
  • The introduction of DNA to primary cells requires modification of these standardized mammalian transfection protocols.
  • Diseases or disorders caused by genetic defects with respect to CSAP expression may be treated by constructing a retrovirus vector consisting of (i) the polynucleotide encoding CSAP under the control of an independent promoter or the retrovirus long terminal repeat (LTR) promoter, (ii) appropriate RNA packaging signals, and (iii) a Rev-responsive element (RRE) along with additional retrovirus cis-acting RNA sequences and coding sequences required for efficient vector propagation.
  • Retrovirus vectors (e.g., PFB and PFBNEO) are commercially available (Stratagene) and are based on published data (Riviere, I. et al. (1995) Proc. Natl. Acad. Sci.
  • The vector is propagated in an appropriate vector producing cell line (VPCL) that expresses an envelope gene with a tropism for receptors on the target cells or a promiscuous envelope protein such as VSVg (Armentano, D. et al. (1987) J. Virol. 61:1647-1650; Bender, M.A. et al. (1987) J. Virol. 61:1639-1646; Adam, M.A. and A.D. Miller (1988) J. Virol. 62:3802-3806; Dull, T. et al. (1998) J. Virol. 72:8463-8471; Zufferey, R. et al. (1998) J.
  • U.S. Patent No. 5,910,434 to Rigg discloses a method for obtaining retrovirus packaging cell lines and is hereby incorporated by reference. Propagation of retrovirus vectors, transduction of a population of cells (e.g., CD4+ T cells), and the return of transduced cells to a patient are procedures well known to persons skilled in the art of gene therapy and have been well documented (Ranga, U. et al. (1997) J. Virol. 71:7020-7029; Bauer, G. et al. (1997) Blood 89:2259-2267; Bonyhadi, M.L.
  • Alternatively, an adenovirus-based gene therapy delivery system may be used to deliver polynucleotides encoding CSAP to cells which have one or more genetic abnormalities with respect to the expression of CSAP.
  • The construction and packaging of adenovirus-based vectors are well known to those with ordinary skill in the art. Replication-defective adenovirus vectors have proven to be versatile for importing genes encoding immunoregulatory proteins into intact islets in the pancreas (Csete, M.E. et al. (1995) Transplantation 27:263-268). Potentially useful adenoviral vectors are described in U.S. Patent No. 5,707,618 to Armentano ("Adenovirus vectors for gene therapy"), hereby incorporated by reference.
  • A herpes-based gene therapy delivery system may also be used to deliver polynucleotides encoding CSAP to target cells which have one or more genetic abnormalities with respect to the expression of CSAP.
  • The use of herpes simplex virus (HSV)-based vectors may be especially valuable for introducing CSAP to cells of the central nervous system, for which HSV has a tropism.
  • The construction and packaging of herpes-based vectors are well known to those with ordinary skill in the art.
  • For example, a replication-competent HSV type 1-based vector has been used to deliver a reporter gene to the eyes of primates (Liu, X. et al. (1999) Exp. Eye Res. 169:385-395).
  • The construction of an HSV-1 virus vector has also been disclosed in detail in U.S. Patent No. 5,804,413 to DeLuca ("Herpes simplex virus strains for gene transfer"), which is hereby incorporated by reference.
  • U.S. Patent No. 5,804,413 teaches the use of recombinant HSV d92, which consists of a genome containing at least one exogenous gene to be transferred to a cell under the control of the appropriate promoter for purposes including human gene therapy. Also taught by this patent are the construction and use of recombinant HSV strains deleted for ICP4, ICP27, and ICP22.
  • For HSV vectors, see also Goins, W.F. et al. (1999) J. Virol.
  • An alphavirus (positive, single-stranded RNA virus) vector may also be used to deliver polynucleotides encoding CSAP to target cells.
  • The specific transduction of a subset of cells in a population may require the sorting of cells prior to transduction.
  • The methods of manipulating infectious cDNA clones of alphaviruses, performing alphavirus cDNA and RNA transfections, and performing alphavirus infections are well known to those with ordinary skill in the art.
  • Oligonucleotides derived from the transcription initiation site may also be employed to inhibit gene expression. Similarly, inhibition can be achieved using triple helix base-pairing methodology. Triple helix pairing is useful because it inhibits the ability of the double helix to open sufficiently for the binding of polymerases, transcription factors, or regulatory molecules. Recent therapeutic advances using triplex DNA have been described in the literature. (See, e.g., Gee, J.E. et al. (1994) in Huber, B.E. and B.I. Carr, Molecular and Immunologic Approaches, Futura Publishing, Mt. Kisco NY, pp. 163-177.) A complementary sequence or antisense molecule may also be designed to block translation of mRNA by preventing the transcript from binding to ribosomes.
  • Ribozymes, enzymatic RNA molecules, may also be used to catalyze the specific cleavage of RNA.
  • The mechanism of ribozyme action involves sequence-specific hybridization of the ribozyme molecule to complementary target RNA, followed by endonucleolytic cleavage.
  • For example, engineered hammerhead motif ribozyme molecules may specifically and efficiently catalyze endonucleolytic cleavage of sequences encoding CSAP.
  • Specific ribozyme cleavage sites within any potential RNA target are initially identified by scanning the target molecule for ribozyme cleavage sites, including the sequences GUA, GUU, and GUC. Once identified, short RNA sequences of between 15 and 20 ribonucleotides, corresponding to the region of the target gene containing the cleavage site, may be evaluated for secondary structural features which may render the oligonucleotide inoperable. The suitability of candidate targets may also be evaluated by testing accessibility to hybridization with complementary oligonucleotides using ribonuclease protection assays.
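The first, purely sequence-based step of the procedure above can be sketched in Python: scan the target RNA for the GUA, GUU, and GUC triplets named in the text and cut a 15-20 nucleotide window around each. These are hypothetical helpers, not a published tool; centering the window on the middle base of the triplet is an assumption, and the later checks (secondary structure, ribonuclease protection assays) are folding or laboratory steps the sketch does not model.

```python
# Triplets named in the text as ribozyme cleavage sites.
CLEAVAGE_TRIPLETS = ("GUA", "GUU", "GUC")


def _as_rna(seq):
    """Upper-case the sequence and convert DNA T to RNA U."""
    return seq.upper().replace("T", "U")


def find_cleavage_sites(target):
    """Return 0-based start positions of every GUA, GUU, or GUC triplet."""
    rna = _as_rna(target)
    return [i for i in range(len(rna) - 2) if rna[i:i + 3] in CLEAVAGE_TRIPLETS]


def candidate_windows(target, window=17):
    """Return (site, window_sequence) pairs around each cleavage triplet.

    window must be 15-20 nt, the range given in the text. Windows that would
    run off either end of the transcript are skipped.
    """
    if not 15 <= window <= 20:
        raise ValueError("window must be between 15 and 20 nt")
    rna = _as_rna(target)
    results = []
    for site in find_cleavage_sites(rna):
        start = site + 1 - window // 2  # center on the triplet's middle base
        end = start + window
        if start >= 0 and end <= len(rna):
            results.append((site, rna[start:end]))
    return results
```

Scanning every position, rather than splitting the sequence into codons, is deliberate: cleavage sites can occur in any reading frame and may overlap, as in `find_cleavage_sites("AGUAGUUCGUC")`, which reports sites at positions 1, 4, and 8.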
  • RNA molecules and ribozymes of the invention may be prepared by any method known in the art for the synthesis of nucleic acid molecules. These include techniques for chemically synthesizing oligonucleotides such as solid phase phosphoramidite chemical synthesis.
  • RNA molecules may be generated by in vitro and in vivo transcription of DNA sequences encoding CSAP. Such DNA sequences may be incorporated into a wide variety of vectors with suitable RNA polymerase promoters such as T7 or SP6.
  • These cDNA constructs, which synthesize complementary RNA constitutively or inducibly, can be introduced into cell lines, cells, or tissues.
  • RNA molecules may be modified to increase intracellular stability and half-life. Possible modifications include, but are not limited to, the addition of flanking sequences at the 5' and/or 3' ends of the molecule, or the use of phosphorothioate or 2'-O-methyl rather than phosphodiester linkages within the backbone of the molecule.
  • An additional embodiment of the invention encompasses a method for screening for a compound which is effective in altering expression of a polynucleotide encoding CSAP.
  • Compounds which may be effective in altering expression of a specific polynucleotide may include, but are not limited to, oligonucleotides, antisense oligonucleotides, triple helix-forming oligonucleotides, transcription factors and other polypeptide transcriptional regulators, and non-macromolecular chemical entities which are capable of interacting with specific polynucleotide sequences. Effective compounds may alter polynucleotide expression by acting as either inhibitors or promoters of polynucleotide expression.
  • In the treatment of disorders associated with increased CSAP expression or activity, a compound which specifically inhibits expression of the polynucleotide encoding CSAP may be therapeutically useful, and in the treatment of disorders associated with decreased CSAP expression or activity, a compound which specifically promotes expression of the polynucleotide encoding CSAP may be therapeutically useful.
  • Test compounds may be screened for effectiveness in altering expression of a specific polynucleotide.
  • A test compound may be obtained by any method commonly known in the art, including chemical modification of a compound known to be effective in altering polynucleotide expression; selection from an existing, commercially available or proprietary library of naturally occurring or non-natural chemical compounds; rational design of a compound based on chemical and/or structural properties of the target polynucleotide; and selection from a library of chemical compounds created combinatorially or randomly.
  • A sample comprising a polynucleotide encoding CSAP is exposed to at least one test compound thus obtained.
  • The sample may comprise, for example, an intact or permeabilized cell, or an in vitro cell-free or reconstituted biochemical system.
  • Alterations in the expression of a polynucleotide encoding CSAP are assayed by any method commonly known in the art.
  • For example, the expression of a specific polynucleotide is detected by hybridization with a probe having a nucleotide sequence complementary to the sequence of the polynucleotide encoding CSAP.
  • The amount of hybridization may be quantified, thus forming the basis for a comparison of the expression of the polynucleotide both with and without exposure to one or more test compounds.
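Once hybridization signals are quantified, the with/without comparison is commonly summarized as a log2 ratio. The sketch below assumes background-subtracted, normalized signals from replicate samples; the function names and the two-fold (log2 = 1) cut-off are illustrative assumptions, not part of the described screen.

```python
import math
from statistics import mean


def log2_fold_change(treated, untreated):
    """log2 of mean hybridization signal with the test compound over without it.

    Signals are assumed to be background-subtracted and normalized (for example,
    to a control transcript), so they are positive and directly comparable.
    """
    t, u = mean(treated), mean(untreated)
    if t <= 0 or u <= 0:
        raise ValueError("normalized signals must be positive")
    return math.log2(t / u)


def effect_on_expression(lfc, cutoff=1.0):
    """Classify the compound; cutoff=1.0 (a two-fold change) is hypothetical."""
    if lfc >= cutoff:
        return "promotes expression"
    if lfc <= -cutoff:
        return "inhibits expression"
    return "no clear effect"
```

A log scale makes increases and decreases symmetric: a four-fold rise gives +2 and a halving gives -1, so one cut-off serves for both inhibitors and promoters of expression.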
  • A screen for a compound effective in altering expression of a specific polynucleotide can be carried out, for example, using a Schizosaccharomyces pombe gene expression system (Atkins, D. et al. (1999) U.S. Patent No. 5,932,435; Arndt, G.M. et al. (2000) Nucleic Acids Res. 28:E15) or a human cell line such as HeLa cells (Clarke, M.L. et al. (2000) Biochem. Biophys. Res. Commun.
  • A particular embodiment of the present invention involves screening a combinatorial library of oligonucleotides (such as deoxyribonucleotides, ribonucleotides, peptide nucleic acids, and modified oligonucleotides) for antisense activity against a specific polynucleotide sequence (Bruice, T.W. et al. (1997) U.S. Patent No. 5,686,242; Bruice, T.W. et al. (2000) U.S. Patent No. 6,022,691).
  • Vectors may be introduced into stem cells taken from the patient and clonally propagated for autologous transplant back into that same patient. Delivery by transfection, by liposome injections, or by polycationic amino polymers may be achieved using methods which are well known in the art. (See, e.g., Goldman, C.K. et al. (1997) Nat. Biotechnol. 15:462-466.)
  • Any of the therapeutic methods described above may be applied to any subject in need of such therapy, including, for example, mammals such as humans, dogs, cats, cows, horses, rabbits, and monkeys.
  • An additional embodiment of the invention relates to the administration of a composition which generally comprises an active ingredient formulated with a pharmaceutically acceptable excipient.
  • Excipients may include, for example, sugars, starches, celluloses, gums, and proteins.
  • Various formulations are commonly known and are thoroughly discussed in the latest edition of Remington's Pharmaceutical Sciences (Mack Publishing, Easton PA).
  • Such compositions may consist of CSAP, antibodies to CSAP, and mimetics, agonists, antagonists, or inhibitors of CSAP.
  • The compositions utilized in this invention may be administered by any number of routes including, but not limited to, oral, intravenous, intramuscular, intra-arterial, intramedullary, intrathecal, intraventricular, pulmonary, transdermal, subcutaneous, intraperitoneal, intranasal, enteral, topical, sublingual, or rectal means.
  • Compositions for pulmonary administration may be prepared in liquid or dry powder form. These compositions are generally aerosolized immediately prior to inhalation by the patient.
  • For small molecules (e.g., traditional low molecular weight organic drugs), aerosol delivery of fast-acting formulations is well known in the art.
  • Macromolecules (e.g., larger peptides and proteins) may also be delivered by the pulmonary route.
  • Pulmonary delivery has the advantage of administration without needle injection, and obviates the need for potentially toxic penetration enhancers.
  • compositions suitable for use in the invention include compositions wherein the active ingredients are contained in an effective amount to achieve the intended purpose.
  • the determination of an effective dose is well within the capability of those skilled in the art.
  • compositions may be prepared for direct intracellular delivery of macromolecules comprising CSAP or fragments thereof.
  • liposome preparations containing a cell-impermeable macromolecule may promote cell fusion and intracellular delivery of the macromolecule.
  • CSAP or a fragment thereof may be joined to a short cationic N-terminal portion from the HIV Tat-1 protein. Fusion proteins thus generated have been found to transduce into the cells of all tissues, including the brain, in a mouse model system (Schwarze, S.R. et al. (1999) Science 285:1569-1572).
  • the therapeutically effective dose can be estimated initially either in cell culture assays, e.g., of neoplastic cells, or in animal models such as mice, rats, rabbits, dogs, monkeys, or pigs.
  • An animal model may also be used to determine the appropriate concentration range and route of administration. Such information can then be used to determine useful doses and routes for administration in humans.
  • A therapeutically effective dose refers to that amount of active ingredient, for example CSAP or fragments thereof, antibodies to CSAP, and agonists, antagonists, or inhibitors of CSAP, which ameliorates the symptoms or condition.
  • Therapeutic efficacy and toxicity may be determined by standard pharmaceutical procedures in cell cultures or with experimental animals, such as by calculating the ED50 (the dose therapeutically effective in 50% of the population) or LD50 (the dose lethal to 50% of the population) statistics.
  • The dose ratio of toxic to therapeutic effects is the therapeutic index, which can be expressed as the LD50/ED50 ratio.
  • Compositions which exhibit large therapeutic indices are preferred.
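  • As a worked illustration of the therapeutic index, the sketch below estimates ED50 and LD50 by linear interpolation on log(dose) between the two tested doses that bracket a 50% response, then takes the LD50/ED50 ratio. The dose-response data and function names are hypothetical, and interpolation is a simplification of the probit or logistic fits commonly used for such estimates.

```python
import math

def interpolate_dose50(doses, fractions):
    """Estimate the dose giving a 50% response by linear interpolation on log10(dose).

    doses: increasing doses; fractions: fraction of subjects responding (0..1),
    assumed to increase monotonically with dose.
    """
    points = list(zip(doses, fractions))
    for (d_lo, f_lo), (d_hi, f_hi) in zip(points, points[1:]):
        if f_lo <= 0.5 <= f_hi and f_hi > f_lo:
            # relative position of the 50% response between the bracketing doses
            t = (0.5 - f_lo) / (f_hi - f_lo)
            log_d = math.log10(d_lo) + t * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_d
    raise ValueError("50% response is not bracketed by the data")

def therapeutic_index(ld50, ed50):
    """Therapeutic index as the LD50/ED50 ratio; larger values mean a wider safety margin."""
    return ld50 / ed50

# Hypothetical dose-response data (dose in mg/kg, fraction of animals responding)
doses = [1, 10, 100, 1000]
effective = [0.1, 0.5, 0.9, 1.0]
lethal = [0.0, 0.0, 0.2, 0.8]

ed50 = interpolate_dose50(doses, effective)
ld50 = interpolate_dose50(doses, lethal)
print(round(ed50, 3), round(ld50, 1), round(therapeutic_index(ld50, ed50), 1))  # 10.0 316.2 31.6
```

With these made-up data the therapeutic index is roughly 32; a compound with the same ED50 but a lower LD50 would score worse.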
  • the data obtained from cell culture assays and animal studies are used to formulate a range of dosage for human use.
  • The dosage contained in such compositions is preferably within a range of circulating concentrations that includes the ED50 with little or no toxicity. The dosage varies within this range depending upon the dosage form employed, the sensitivity of the patient, and the route of administration.
  • Dosage and administration are adjusted to provide sufficient levels of the active moiety or to maintain the desired effect. Factors which may be taken into account include the severity of the disease state, the general health of the subject, the age, weight, and gender of the subject, time and frequency of administration, drug combination(s), reaction sensitivities, and response to therapy. Long-acting compositions may be administered every 3 to 4 days, every week, or biweekly depending on the half-life and clearance rate of the particular formulation.
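  • The dependence of the dosing interval on half-life can be illustrated with a one-compartment model with first-order elimination, an assumption made here only for illustration (long-acting formulations often deviate from it). With elimination rate constant k = ln 2 / t½, the fraction of a dose remaining after an interval τ is exp(-kτ), and repeated equal doses every τ raise the steady-state peak above a single-dose peak by the accumulation factor 1 / (1 - exp(-kτ)). The 7-day half-life below is hypothetical.

```python
import math

def fraction_remaining(half_life_days, interval_days):
    """Fraction of a dose left after interval_days under first-order elimination."""
    k = math.log(2) / half_life_days  # elimination rate constant (1/day)
    return math.exp(-k * interval_days)

def accumulation_factor(half_life_days, interval_days):
    """Steady-state peak divided by single-dose peak for equal doses every interval_days."""
    return 1.0 / (1.0 - fraction_remaining(half_life_days, interval_days))

# Hypothetical 7-day half-life, compared across dosing intervals like those named in the text
for interval in (3.5, 7, 14):
    print(interval,
          round(fraction_remaining(7, interval), 3),
          round(accumulation_factor(7, interval), 2))
```

Under these assumptions, weekly dosing leaves half of each dose at the next administration and doubles the steady-state peak, whereas dosing every 3.5 days raises it about 3.4-fold.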
  • Normal dosage amounts may vary from about 0.1 µg to 100,000 µg, up to a total dose of about 1 gram, depending upon the route of administration.
  • Guidance as to particular dosages and methods of delivery is provided in the literature and generally available to practitioners in the art. Those skilled in the art will employ different formulations for nucleotides than for proteins or their inhibitors. Similarly, delivery of polynucleotides or polypeptides will be specific to particular cells, conditions, locations, etc.
DIAGNOSTICS
  • antibodies which specifically bind CSAP may be used for the diagnosis of disorders characterized by expression of CSAP, or in assays to monitor patients being treated with CSAP or agonists, antagonists, or inhibitors of CSAP.
  • Antibodies useful for diagnostic purposes may be prepared in the same manner as described above for therapeutics. Diagnostic assays for CSAP include methods which utilize the antibody and a label to detect CSAP in human body fluids or in extracts of cells or tissues.
  • the antibodies may be used with or without modification, and may be labeled by covalent or non-covalent attachment of a reporter molecule.
  • a wide variety of reporter molecules, several of which are described above, are known in the art and may be used.
  • a variety of protocols for measuring CSAP are known in the art and provide a basis for diagnosing altered or abnormal levels of CSAP expression.
  • Normal or standard values for CSAP expression are established by combining body fluids or cell extracts taken from normal mammalian subjects, for example, human subjects, with antibodies to CSAP under conditions suitable for complex formation. The amount of standard complex formation may be quantitated by various methods, such as photometric means. Quantities of CSAP expressed in subject, control, and disease samples from biopsied tissues are compared with the standard values. Deviation between standard and subject values establishes the parameters for diagnosing disease.
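  • One common way to turn normal values into a diagnostic threshold is a reference interval, for example the mean ± 2 standard deviations of the normal-subject measurements. The text does not specify a statistic, so the 2-SD cutoff, the function names, and the measurements below are assumptions for illustration; a subject value outside the interval is flagged as deviating from the standard.

```python
from statistics import mean, stdev

def reference_interval(normal_values, k=2.0):
    """Return (low, high) = mean ± k sample standard deviations of normal-subject values."""
    m = mean(normal_values)
    s = stdev(normal_values)
    return m - k * s, m + k * s

def classify(subject_value, normal_values, k=2.0):
    """Label a subject measurement relative to the reference interval."""
    low, high = reference_interval(normal_values, k)
    if subject_value < low:
        return "decreased"
    if subject_value > high:
        return "increased"
    return "within normal range"

# Hypothetical complex-formation signals (arbitrary photometric units) from normal subjects
normals = [9.0, 10.0, 11.0, 10.0, 10.0]
print(reference_interval(normals))
print(classify(14.2, normals), "/", classify(10.3, normals))
```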
  • the polynucleotides encoding CSAP may be used for diagnostic purposes.
  • the polynucleotides which may be used include oligonucleotide sequences, complementary RNA and DNA molecules, and PNAs.
  • the polynucleotides may be used to detect and quantify gene expression in biopsied tissues in which expression of CSAP may be correlated with disease.
  • the diagnostic assay may be used to determine absence, presence, and excess expression of CSAP, and to monitor regulation of CSAP levels during therapeutic intervention.
  • hybridization with PCR probes which are capable of detecting polynucleotide sequences, including genomic sequences, encoding CSAP or closely related molecules may be used to identify nucleic acid sequences which encode CSAP.
  • The specificity of the probe, whether it is made from a highly specific region (e.g., the 5' regulatory region) or from a less specific region (e.g., a conserved motif), and the stringency of the hybridization or amplification will determine whether the probe identifies only naturally occurring sequences encoding CSAP, allelic variants, or related sequences.
  • Probes may also be used for the detection of related sequences, and may have at least 50% sequence identity to any of the CSAP encoding sequences.
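  • The 50% identity criterion can be made concrete with a percent-identity calculation over a pairwise alignment. The sketch assumes the two sequences are already aligned to equal length with '-' for gaps and counts identical positions over columns in which neither sequence has a gap; this is only one of several conventions (alignment programs may normalize differently), and the sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are skipped under this convention
        compared += 1
        identical += x == y
    return 100.0 * identical / compared if compared else 0.0

# Hypothetical aligned probe and target fragments
probe = "ATGCCGTA-GCA"
target = "ATGACGTATGCT"
pid = percent_identity(probe, target)
print(round(pid, 1), pid >= 50.0)  # 81.8 True
```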
  • The hybridization probes of the subject invention may be DNA or RNA and may be derived from the sequence of SEQ ID NO:29-56 or from genomic sequences including promoters, enhancers, and introns of the CSAP gene.
  • Means for producing specific hybridization probes for DNAs encoding CSAP include the cloning of polynucleotide sequences encoding CSAP or CSAP derivatives into vectors for the production of mRNA probes.
  • Such vectors are known in the art, are commercially available, and may be used to synthesize RNA probes in vitro by means of the addition of the appropriate RNA polymerases and the appropriate labeled nucleotides.
  • Hybridization probes may be labeled by a variety of reporter groups, for example, by radionuclides such as ³²P or ³⁵S, or by enzymatic labels, such as alkaline phosphatase coupled to the probe via avidin/biotin coupling systems, and the like.
  • Polynucleotide sequences encoding CSAP may be used for the diagnosis of disorders associated with expression of CSAP.
  • disorders include, but are not limited to, a cell proliferative disorder such as actinic keratosis, arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease (MCTD), myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis, primary thrombocythemia, and a cancer including adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and, in particular, a cancer of the adrenal gland, bladder, bone, bone marrow, brain, breast, cervix, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary, pan
  • the polynucleotide sequences encoding CSAP may be used in Southern or northern analysis, dot blot, or other membrane-based technologies; in PCR technologies; in dipstick, pin, and multiformat ELISA-like assays; and in microarrays utilizing fluids or tissues from patients to detect altered CSAP expression. Such qualitative or quantitative methods are well known in the art.
  • the nucleotide sequences encoding CSAP may be useful in assays that detect the presence of associated disorders, particularly those mentioned above.
  • the nucleotide sequences encoding CSAP may be labeled by standard methods and added to a fluid or tissue sample from a patient under conditions suitable for the formation of hybridization complexes. After a suitable incubation period, the sample is washed and the signal is quantified and compared with a standard value. If the amount of signal in the patient sample is significantly altered in comparison to a control sample then the presence of altered levels of nucleotide sequences encoding CSAP in the sample indicates the presence of the associated disorder.
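  • The comparison step above can be sketched as a log2 ratio of patient signal to control signal, with a fold-change cutoff standing in for "significantly altered". The two-fold threshold, function names, and signal values are assumptions; in practice significance would be judged against replicate variability rather than a single fixed cutoff.

```python
import math

def log2_ratio(patient_signal, control_signal):
    """log2(patient / control); positive means higher signal in the patient sample."""
    if patient_signal <= 0 or control_signal <= 0:
        raise ValueError("signals must be positive")
    return math.log2(patient_signal / control_signal)

def is_altered(patient_signal, control_signal, min_fold=2.0):
    """True if the signal differs from control by at least min_fold in either direction."""
    return abs(log2_ratio(patient_signal, control_signal)) >= math.log2(min_fold)

# Hypothetical background-subtracted hybridization signals
print(log2_ratio(800.0, 200.0), is_altered(800.0, 200.0))  # 2.0 True
print(is_altered(250.0, 200.0), is_altered(90.0, 200.0))   # False True
```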
  • Such assays may also be used to evaluate the efficacy of a particular therapeutic treatment regimen in animal studies, in clinical trials, or to monitor the treatment of an individual patient.
  • a normal or standard profile for expression is established. This may be accomplished by combining body fluids or cell extracts taken from normal subjects, either animal or human, with a sequence, or a fragment thereof, encoding CSAP, under conditions suitable for hybridization or amplification. Standard hybridization may be quantified by comparing the values obtained from normal subjects with values from an experiment in which a known amount of a substantially purified polynucleotide is used. Standard values obtained in this manner may be compared with values obtained from samples from patients who are symptomatic for a disorder. Deviation from standard values is used to establish the presence of a disorder.
  • hybridization assays may be repeated on a regular basis to determine if the level of expression in the patient begins to approximate that which is observed in the normal subject.
  • the results obtained from successive assays may be used to show the efficacy of treatment over a period ranging from several days to months.
  • the presence of an abnormal amount of transcript (either under- or overexpressed) in biopsied tissue from an individual may indicate a predisposition for the development of the disease, or may provide a means for detecting the disease prior to the appearance of actual clinical symptoms.
  • a more definitive diagnosis of this type may allow health professionals to employ preventative measures or aggressive treatment earlier thereby preventing the development or further progression of the cancer.
  • Additional diagnostic uses for oligonucleotides designed from the sequences encoding CSAP may involve the use of PCR. These oligomers may be chemically synthesized, generated enzymatically, or produced in vitro. Oligomers will preferably contain a fragment of a polynucleotide encoding CSAP, or a fragment of a polynucleotide complementary to the polynucleotide encoding CSAP, and will be employed under optimized conditions for identification of a specific gene or condition. Oligomers may also be employed under less stringent conditions for detection or quantification of closely related DNA or RNA sequences.
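  • Choosing an oligomer from a CSAP-encoding sequence or its complement involves routine sequence arithmetic. The sketch below takes fragments of a hypothetical template, forms the reverse complement of the 3' fragment (the strand a reverse primer would match), and estimates melting temperature with the Wallace rule, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), a rough approximation meant for short oligonucleotides; dedicated primer-design software typically uses nearest-neighbor thermodynamics instead.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def wallace_tm(seq):
    """Wallace-rule melting temperature (degrees C) for a short oligonucleotide."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

# Hypothetical template: forward primer from the 5' end, reverse primer from the 3' end's complement
template = "ATGGCTAGCAAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAG"
forward = template[:20]
reverse = reverse_complement(template[-20:])
for name, primer in (("forward", forward), ("reverse", reverse)):
    print(name, primer, round(gc_fraction(primer), 2), wallace_tm(primer))
```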
  • oligonucleotide primers derived from the polynucleotide sequences encoding CSAP may be used to detect single nucleotide polymorphisms (SNPs).
  • SNPs are substitutions, insertions and deletions that are a frequent cause of inherited or acquired genetic disease in humans.
  • Methods of SNP detection include, but are not limited to, single-stranded conformation polymorphism (SSCP) and fluorescent SSCP (fSSCP) methods.
  • oligonucleotide primers derived from the polynucleotide sequences encoding CSAP are used to amplify DNA using the polymerase chain reaction (PCR).
  • the DNA may be derived, for example, from diseased or normal tissue, biopsy samples, bodily fluids, and the like.
  • SNPs in the DNA cause differences in the secondary and tertiary structures of PCR products in single-stranded form, and these differences are detectable using gel electrophoresis in non-denaturing gels.
  • In fSSCP, the oligonucleotide primers are fluorescently labeled, which allows detection of the amplimers in high-throughput equipment such as DNA sequencing machines.
  • sequence database analysis methods termed in silico SNP (isSNP) are capable of identifying polymorphisms by comparing the sequence of individual overlapping DNA fragments which assemble into a common consensus sequence.
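  • The in silico SNP idea can be illustrated by stacking overlapping fragments that have already been placed in a common coordinate frame and reporting consensus positions where a second base is seen in enough fragments. Pre-aligned input, '.' padding for uncovered positions, and the minimum-support threshold are simplifying assumptions; a practical isSNP pipeline would also weigh base-call quality to separate sequencing error from true variation.

```python
from collections import Counter

def call_candidate_snps(aligned_reads, min_alt_reads=2):
    """Find columns where a non-consensus base is supported by >= min_alt_reads reads.

    aligned_reads: equal-length strings in a shared coordinate frame; '.' marks no coverage.
    Returns a list of (position, consensus_base, alternate_base, alt_count).
    """
    width = len(aligned_reads[0])
    calls = []
    for pos in range(width):
        counts = Counter(read[pos] for read in aligned_reads if read[pos] != ".")
        if len(counts) < 2:
            continue  # uncovered or monomorphic column
        (cons, _), (alt, alt_count) = counts.most_common(2)
        if alt_count >= min_alt_reads:
            calls.append((pos, cons, alt, alt_count))
    return calls

# Hypothetical overlapping fragments; 0-based position 6 carries a G/A difference
reads = [
    "ACGTACGTAC....",
    "..GTACATACGT..",
    "....ACATACGTTA",
    "ACGTACGTACGTTA",
    "ACGTACGT......",
]
print(call_candidate_snps(reads))  # [(6, 'G', 'A', 2)]
```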
  • SNPs may be detected and characterized by mass spectrometry using, for example, the high throughput MASSARRAY system (Sequenom, Inc., San Diego CA). SNPs may be used to study the genetic basis of human disease. For example, at least 16 common SNPs have been associated with non-insulin-dependent diabetes mellitus. SNPs are also useful for examining differences in disease outcomes in monogenic disorders, such as cystic fibrosis, sickle cell anemia, or chronic granulomatous disease.
  • Variants in the mannose-binding lectin gene, MBL2, have been shown to be correlated with deleterious pulmonary outcomes in cystic fibrosis.
  • SNPs also have utility in pharmacogenomics, the identification of genetic variants that influence a patient's response to a drug, such as life-threatening toxicity.
  • For example, a variation in N-acetyltransferase is associated with a high incidence of peripheral neuropathy in response to the anti-tuberculosis drug isoniazid, while a variation in the core promoter of the ALOX5 gene results in diminished clinical response to treatment with an anti-asthma drug that targets the 5-lipoxygenase pathway.
  • Methods which may also be used to quantify the expression of CSAP include radiolabeling or biotinylating nucleotides, coamplification of a control nucleic acid, and interpolating results from standard curves.
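  • Interpolating from a standard curve can be sketched as an ordinary least-squares line through signals measured for known amounts of a purified standard, inverted to convert an unknown sample's signal into an amount. A linear response over the calibrated range is an assumption (many assays are linear only over a limited range or on a log scale), and the function names and numbers below are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def amount_from_signal(signal, slope, intercept, x_range):
    """Invert the standard curve; refuse to extrapolate beyond the calibrated amounts."""
    amount = (signal - intercept) / slope
    if not (min(x_range) <= amount <= max(x_range)):
        raise ValueError("signal falls outside the standard curve")
    return amount

# Hypothetical standards: known amounts (pg) of purified polynucleotide and measured signal
known_pg = [0, 10, 20, 40, 80]
signal = [5, 25, 45, 85, 165]
slope, intercept = fit_line(known_pg, signal)
print(slope, intercept, amount_from_signal(65, slope, intercept, known_pg))  # 2.0 5.0 30.0
```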
  • oligonucleotides or longer fragments derived from any of the polynucleotide sequences described herein may be used as elements on a microarray.
  • the microarray can be used in transcript imaging techniques which monitor the relative expression levels of large numbers of genes simultaneously as described below.
  • the microarray may also be used to identify genetic variants, mutations, and polymorphisms.
  • This information may be used to determine gene function, to understand the genetic basis of a disorder, to diagnose a disorder, to monitor progression/regression of disease as a function of gene expression, and to develop and monitor the activities of therapeutic agents in the treatment of disease.
  • this information may be used to develop a pharmacogenomic profile of a patient in order to select the most appropriate and effective treatment regimen for that patient. For example, therapeutic agents which are highly effective and display the fewest side effects may be selected for a patient based on his/her pharmacogenomic profile.
  • CSAP, fragments of CSAP, or antibodies specific for CSAP may be used as elements on a microarray.
  • the microarray may be used to monitor or measure protein-protein interactions, drug-target interactions, and gene expression profiles, as described above.
  • a particular embodiment relates to the use of the polynucleotides of the present invention to generate a transcript image of a tissue or cell type.
  • a transcript image represents the global pattern of gene expression by a particular tissue or cell type. Global gene expression patterns are analyzed by quantifying the number of expressed genes and their relative abundance under given conditions and at a given time. (See Seilhamer et al., "Comparative Gene Transcript Analysis," U.S. Patent No.
  • A transcript image may be generated by hybridizing the polynucleotides of the present invention or their complements to the totality of transcripts or reverse transcripts of a particular tissue or cell type. In one embodiment, the hybridization takes place in high-throughput format, wherein the polynucleotides of the present invention or their complements comprise a subset of a plurality of elements on a microarray. The resultant transcript image would provide a profile of gene activity.
  • Transcript images may be generated using transcripts isolated from tissues, cell lines, biopsies, or other biological samples.
  • the transcript image may thus reflect gene expression in vivo, as in the case of a tissue or biopsy sample, or in vitro, as in the case of a cell line.
  • Transcript images which profile the expression of the polynucleotides of the present invention may also be used in conjunction with in vitro model systems and preclinical evaluation of pharmaceuticals, as well as toxicological testing of industrial and naturally-occurring environmental compounds. All compounds induce characteristic gene expression patterns, frequently termed molecular fingerprints or toxicant signatures, which are indicative of mechanisms of action and toxicity (Nuwaysir, E.F. et al. (1999) Mol. Carcinog. 24:153-159; Steiner, S.
  • If a test compound has a signature similar to that of a compound with known toxicity, it is likely to share those toxic properties.
  • Fingerprints or signatures are most useful and refined when they contain expression information from a large number of genes and gene families. Ideally, a genome-wide measurement of expression provides the highest-quality signature. Even genes whose expression is not altered by any tested compound are important, because their expression levels are used to normalize the rest of the expression data. The normalization procedure is useful for comparing expression data obtained after treatment with different compounds.
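  • The normalization and signature-matching steps can be sketched as follows. Each profile is scaled so that a set of reference genes, assumed unchanged by any tested compound, has a mean level of 1; log2 ratios to an untreated profile form the compound's signature; and similarity to a known toxicant's signature is scored by Pearson correlation. The gene names, reference-gene choice, and values are hypothetical, and correlation is only one of several possible similarity measures.

```python
import math
from statistics import mean

def normalize(profile, reference_genes):
    """Scale a {gene: level} profile so the mean level of the reference genes is 1."""
    scale = mean(profile[g] for g in reference_genes)
    return {g: v / scale for g, v in profile.items()}

def signature(treated, untreated, reference_genes):
    """Per-gene log2(treated / untreated) after reference-gene normalization."""
    t = normalize(treated, reference_genes)
    u = normalize(untreated, reference_genes)
    return {g: math.log2(t[g] / u[g]) for g in t if g not in reference_genes}

def pearson(a, b):
    """Pearson correlation between two signatures over the genes they share."""
    genes = sorted(set(a) & set(b))
    x = [a[g] for g in genes]
    y = [b[g] for g in genes]
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

# Hypothetical transcript levels; ref1 and ref2 are assumed to be unaffected by treatment
reference = ["ref1", "ref2"]
untreated = {"ref1": 100, "ref2": 200, "g1": 50, "g2": 80, "g3": 40}
treated = {"ref1": 200, "ref2": 400, "g1": 400, "g2": 80, "g3": 80}
known_toxicant = {"g1": 1.8, "g2": -0.9, "g3": 0.2}

test_signature = signature(treated, untreated, reference)
print(test_signature)
print(round(pearson(test_signature, known_toxicant), 3))
```

Normalizing first matters here: the treated sample's overall signal is twice the untreated one, so raw ratios would make every gene look induced.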
  • the toxicity of a test compound is assessed by treating a biological sample containing nucleic acids with the test compound. Nucleic acids that are expressed in the treated biological sample are hybridized with one or more probes specific to the polynucleotides of the present invention, so that transcript levels corresponding to the polynucleotides of the present invention may be quantified. The transcript levels in the treated biological sample are compared with levels in an untreated biological sample. Differences in the transcript levels between the two samples are indicative of a toxic response caused by the test compound in the treated sample.
  • proteome refers to the global pattern of protein expression in a particular tissue or cell type.
  • Proteome expression patterns, or profiles, are analyzed by quantifying the number of expressed proteins and their relative abundance under given conditions and at a given time.
  • a profile of a cell's proteome may thus be generated by separating and analyzing the polypeptides of a particular tissue or cell type.
  • the separation is achieved using two-dimensional gel electrophoresis, in which proteins from a sample are separated by isoelectric focusing in the first dimension, and then according to molecular weight by sodium dodecyl sulfate slab gel electrophoresis in the second dimension (Steiner and Anderson, supra).
  • the proteins are visualized in the gel as discrete and uniquely positioned spots, typically by staining the gel with an agent such as Coomassie Blue or silver or fluorescent stains.
  • the optical density of each protein spot is generally proportional to the level of the protein in the sample.
  • The optical densities of equivalently positioned protein spots from different samples (for example, from biological samples either treated or untreated with a test compound or therapeutic agent) are compared to identify any changes in protein spot density related to the treatment.
  • the proteins in the spots are partially sequenced using, for example, standard methods employing chemical or enzymatic cleavage followed by mass spectrometry.
  • the identity of the protein in a spot may be determined by comparing its partial sequence, preferably of at least 5 contiguous amino acid residues, to the polypeptide sequences of the present invention. In some cases, further sequence data may be obtained for definitive protein identification.
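  • Matching a partial sequence from a gel spot against reference polypeptides can be illustrated with an exact substring search that requires at least 5 contiguous residues, the minimum the text mentions. The reference sequences and peptides below are hypothetical placeholders, not entries from the sequence listing, and exact matching ignores sequencing errors and residues that mass spectrometry may not distinguish (such as Leu/Ile).

```python
def match_peptide(peptide, references, min_len=5):
    """Return IDs of reference polypeptides that contain the peptide as a contiguous run.

    references: {identifier: amino-acid sequence}; peptides shorter than min_len are rejected.
    """
    peptide = peptide.upper()
    if len(peptide) < min_len:
        raise ValueError(f"need at least {min_len} contiguous residues")
    return [ref_id for ref_id, seq in references.items() if peptide in seq.upper()]

# Hypothetical reference polypeptides (placeholders, not from the sequence listing)
references = {
    "protein_A": "MSTEKLAQRGLEELSKAVR",
    "protein_B": "MADWKLSGEEAIRKLSPQ",
}
print(match_peptide("GLEEL", references))  # ['protein_A']
```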
  • A proteomic profile may also be generated using antibodies specific for CSAP to quantify the levels of CSAP expression. In one embodiment, the antibodies are used as elements on a microarray, and protein expression levels are quantified by exposing the microarray to the sample and detecting the levels of protein bound to each array element (Lueking, A. et al. (1999) Anal. Biochem.
  • Detection may be performed by a variety of methods known in the art, for example, by reacting the proteins in the sample with a thiol- or amino-reactive fluorescent compound and detecting the amount of fluorescence bound at each array element.
  • Toxicant signatures at the proteome level are also useful for toxicological screening, and should be analyzed in parallel with toxicant signatures at the transcript level.
  • There is a poor correlation between transcript and protein abundances for some proteins in some tissues (Anderson, N.L. and J. Seilhamer (1997) Electrophoresis 18:533-537), so proteome toxicant signatures may be useful in the analysis of compounds which do not significantly affect the transcript image, but which alter the proteomic profile.
  • the analysis of transcripts in body fluids is difficult, due to rapid degradation of mRNA, so proteomic profiling may be more reliable and informative in such cases.
  • the toxicity of a test compound is assessed by treating a biological sample containing proteins with the test compound. Proteins that are expressed in the treated biological sample are separated so that the amount of each protein can be quantified. The amount of each protein is compared to the amount of the corresponding protein in an untreated biological sample. A difference in the amount of protein between the two samples is indicative of a toxic response to the test compound in the treated sample. Individual proteins are identified by sequencing the amino acid residues of the individual proteins and comparing these partial sequences to the polypeptides of the present invention.
  • the toxicity of a test compound is assessed by treating a biological sample containing proteins with the test compound. Proteins from the biological sample are incubated with antibodies specific to the polypeptides of the present invention. The amount of protein recognized by the antibodies is quantified. The amount of protein in the treated biological sample is compared with the amount in an untreated biological sample. A difference in the amount of protein between the two samples is indicative of a toxic response to the test compound in the treated sample.
  • Microarrays may be prepared, used, and analyzed using methods known in the art.
  • See, e.g., Brennan, T.M. et al. (1995) U.S. Patent No. 5,474,796; Schena, M. et al. (1996) Proc. Natl. Acad. Sci. USA 93:10614-10619; Baldeschweiler et al. (1995) PCT application WO95/251116; Shalon, D. et al. (1995) PCT application WO95/35505; Heller, R.A. et al. (1997) Proc. Natl. Acad. Sci. USA 94:2150-2155; and Heller, M.J. et al.
  • nucleic acid sequences encoding CSAP may be used to generate hybridization probes useful in mapping the naturally occurring genomic sequence. Either coding or noncoding sequences may be used, and in some instances, noncoding sequences may be preferable over coding sequences. For example, conservation of a coding sequence among members of a multi-gene family may potentially cause undesired cross hybridization during chromosomal mapping.
  • Sequences may be mapped to a particular chromosome, to a specific region of a chromosome, or to artificial chromosome constructions, e.g., human artificial chromosomes (HACs), yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs), bacterial P1 constructions, or single chromosome cDNA libraries.
  • the nucleic acid sequences of the invention may be used to develop genetic linkage maps, for example, which correlate the inheritance of a disease state with the inheritance of a particular chromosome region or restriction fragment length polymorphism (RFLP).
  • Examples of genetic map data can be found in various scientific journals or at the Online Mendelian Inheritance in Man (OMIM) World Wide Web site. Correlation between the location of the gene encoding CSAP on a physical map and a specific disorder, or a predisposition to a specific disorder, may help define the region of DNA associated with that disorder and thus may further positional cloning efforts.
  • In situ hybridization of chromosomal preparations and physical mapping techniques may be used for extending genetic maps. Often the placement of a gene on the chromosome of another mammalian species, such as mouse, may reveal associated markers even if the exact chromosomal locus is not known. This information is valuable to investigators searching for disease genes using positional cloning or other gene discovery techniques. Once the gene or genes responsible for a disease or syndrome have been crudely localized by genetic linkage to a particular genomic region, e.g., ataxia-telangiectasia to 11q22-23, any sequences mapping to that area may represent associated or regulatory genes for further investigation.
  • nucleotide sequence of the instant invention may also be used to detect differences in the chromosomal location due to translocation, inversion, etc., among normal, carrier, or affected individuals.
  • CSAP, its catalytic or immunogenic fragments, or oligopeptides thereof can be used for screening libraries of compounds in any of a variety of drug screening techniques.
  • the fragment employed in such screening may be free in solution, affixed to a solid support, borne on a cell surface, or located intracellularly. The formation of binding complexes between CSAP and the agent being tested may be measured.
  • Another technique for drug screening provides for high throughput screening of compounds having suitable binding affinity to the protein of interest.
  • In this method, large numbers of different small test compounds are synthesized on a solid substrate. The test compounds are reacted with CSAP, or fragments thereof, and washed. Bound CSAP is then detected by methods well known in the art. Purified CSAP can also be coated directly onto plates for use in the aforementioned drug screening techniques. Alternatively, non-neutralizing antibodies can be used to capture the peptide and immobilize it on a solid support.
  • nucleotide sequences which encode CSAP may be used in any molecular biology techniques that have yet to be developed, provided the new techniques rely on properties of nucleotide sequences that are currently known, including, but not limited to, such properties as the triplet genetic code and specific base pair interactions.
  • Incyte cDNAs were derived from cDNA libraries described in the LIFESEQ GOLD database (Incyte Genomics, Palo Alto CA). Some tissues were homogenized and lysed in guanidinium isothiocyanate, while others were homogenized and lysed in phenol or in a suitable mixture of denaturants, such as TRIZOL (Life Technologies), a monophasic solution of phenol and guanidine isothiocyanate. The resulting lysates were centrifuged over CsCl cushions or extracted with chloroform. RNA was precipitated from the lysates with either isopropanol or sodium acetate and ethanol, or by other routine methods.
  • poly(A)+ RNA was isolated using oligo d(T)-coupled paramagnetic particles (Promega), OLIGOTEX latex particles (QIAGEN, Chatsworth CA), or an OLIGOTEX mRNA purification kit (QIAGEN).
  • In some cases, Stratagene was provided with RNA and constructed the corresponding cDNA libraries. Otherwise, cDNA was synthesized and cDNA libraries were constructed with the UNIZAP vector system (Stratagene) or SUPERSCRIPT plasmid system (Life Technologies), using the recommended procedures or similar methods known in the art. (See, e.g., Ausubel, 1997, supra, units 5.1-6.6.) Reverse transcription was initiated using oligo d(T) or random primers. Synthetic oligonucleotide adapters were ligated to double stranded cDNA, and the cDNA was digested with the appropriate restriction enzyme or enzymes.
  • The cDNA was size-selected (300-1000 bp) using SEPHACRYL S1000, SEPHAROSE CL2B, or SEPHAROSE CL4B column chromatography (Amersham Pharmacia Biotech) or preparative agarose gel electrophoresis.
  • cDNAs were ligated into compatible restriction enzyme sites of the polylinker of a suitable plasmid, e.g., PBLUESCRIPT plasmid (Stratagene), PSPORT1 plasmid (Life Technologies), PCDNA2.1 plasmid (Invitrogen, Carlsbad CA), PBK-CMV plasmid (Stratagene), PCR2-TOPOTA plasmid (Invitrogen), PCMV-ICIS plasmid (Stratagene), pIGEN (Incyte Genomics, Palo Alto CA), pRARE (Incyte Genomics), or pINCY (Incyte Genomics), or derivatives thereof.
  • Recombinant plasmids were transformed into competent E. coli cells including XL1-Blue, XL1-BlueMRF, or SOLR from Stratagene or DH5α, DH10B, or ElectroMAX DH10B from Life Technologies.
  • Plasmids obtained as described in Example I were recovered from host cells by in vivo excision using the UNIZAP vector system (Stratagene) or by cell lysis. Plasmids were purified using at least one of the following: a Magic or WIZARD Minipreps DNA purification system (Promega); an AGTC Miniprep purification kit (Edge Biosystems, Gaithersburg MD); and QIAWELL 8 Plasmid, QIAWELL 8 Plus Plasmid, QIAWELL 8 Ultra Plasmid purification systems or the R.E.A.L. PREP 96 plasmid purification kit from QIAGEN. Following precipitation, plasmids were resuspended in 0.1 ml of distilled water and stored, with or without lyophilization, at 4°C.
  • plasmid DNA was amplified from host cell lysates using direct link PCR in a high-throughput format (Rao, V.B. (1994) Anal. Biochem. 216:1-14). Host cell lysis and thermal cycling steps were carried out in a single reaction mixture. Samples were processed and stored in 384-well plates, and the concentration of amplified plasmid DNA was quantified fluorometrically using PICOGREEN dye (Molecular Probes, Eugene OR) and a FLUOROSKAN II fluorescence scanner (Labsystems Oy, Helsinki, Finland).
  • Incyte cDNAs recovered in plasmids as described in Example II were sequenced as follows. Sequencing reactions were processed using standard methods or high-throughput instrumentation such as the ABI CATALYST 800 (Applied Biosystems) thermal cycler or the PTC-200 thermal cycler (MJ Research) in conjunction with the HYDRA microdispenser (Robbins Scientific) or the MICROLAB 2200 (Hamilton) liquid transfer system. cDNA sequencing reactions were prepared using reagents provided by Amersham Pharmacia Biotech or supplied in ABI sequencing kits such as the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (Applied Biosystems).
  • Electrophoretic separation of cDNA sequencing reactions and detection of labeled polynucleotides were carried out using the MEGABACE 1000 DNA sequencing system (Molecular Dynamics); the ABI PRISM 373 or 377 sequencing system (Applied Biosystems) in conjunction with standard ABI protocols and base calling software; or other sequence analysis systems known in the art. Reading frames within the cDNA sequences were identified using standard methods (reviewed in Ausubel,
  • the polynucleotide sequences derived from Incyte cDNAs were validated by removing vector, linker, and poly(A) sequences and by masking ambiguous bases, using algorithms and programs based on BLAST, dynamic programming, and dinucleotide nearest neighbor analysis.
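  • The clean-up steps named above can be illustrated with a much simpler stand-in than the BLAST-based, dynamic-programming, and nearest-neighbor tools the text refers to: an exact-match search for a known vector or linker sequence near the 5' end, removal of a 3' poly(A) run, and masking of any non-ACGT base call as N. The adapter sequence, search window, minimum poly(A) length, and read are all hypothetical.

```python
import re

def clean_read(read, vector_seq, max_offset=30, min_polya=8):
    """Trim a 5' vector/linker match and a 3' poly(A) tail, then mask ambiguous bases as N."""
    read = read.upper()
    vector_seq = vector_seq.upper()
    # look for the vector only near the 5' end (starting within max_offset bases)
    idx = read.find(vector_seq, 0, max_offset + len(vector_seq))
    if idx != -1:
        read = read[idx + len(vector_seq):]
    read = re.sub(r"A{%d,}$" % min_polya, "", read)  # strip a trailing poly(A) run
    return re.sub(r"[^ACGT]", "N", read)  # mask IUPAC ambiguity codes and other non-ACGT calls

# Hypothetical read: 2 leading bases, a hypothetical adapter, insert with ambiguous calls, poly(A) tail
adapter = "GAATTCGGCACGAG"
raw = "TT" + adapter + "ATGCCRTTGAKCC" + "A" * 12
print(clean_read(raw, adapter))  # ATGCCNTTGANCC
```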
  • Incyte cDNA sequences or translations thereof were then queried against a selection of public databases such as the GenBank primate, rodent, mammalian, vertebrate, and eukaryote databases, and BLOCKS, PRINTS, DOMO, PRODOM; PROTEOME databases with sequences from Homo sapiens, Rattus norvegicus, Mus musculus, Caenorhabditis elegans, Saccharomyces cerevisiae,
  • HMM-based protein domain databases such as SMART (Schultz et al. (1998) Proc. Natl. Acad. Sci. USA 95:5857-5864; Letunic, I. et al. (2002) Nucleic Acids Res. 30:242-244).
  • The hidden Markov model (HMM) is a probabilistic approach which analyzes consensus primary structures of gene families.
  • Incyte cDNA sequences were assembled to produce full length polynucleotide sequences.
  • GenBank cDNAs, GenBank ESTs, stitched sequences, stretched sequences, or Genscan-predicted coding sequences were used to extend Incyte cDNA assemblages to full length.
  • A polypeptide of the invention may begin at any of the methionine residues of the full length translated polypeptide.
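Because a polypeptide may begin at any methionine of the full length translation, the candidate N-terminal variants can be enumerated mechanically. The following is a minimal illustrative sketch; the function name and the toy sequence are hypothetical and are not CSAP sequences.

```python
def methionine_start_variants(protein: str) -> list[str]:
    """Return every suffix of a translated polypeptide that begins with Met (M).

    Each suffix is the polypeptide that would result if translation started
    at that methionine rather than at the first one.
    """
    protein = protein.upper().rstrip("*")  # drop a trailing stop symbol, if present
    return [protein[i:] for i, residue in enumerate(protein) if residue == "M"]


if __name__ == "__main__":
    # Hypothetical toy sequence used only for illustration.
    for variant in methionine_start_variants("MKTLMEVRMQ*"):
        print(variant)
```

For the toy input this yields three candidates, starting at the first, second, and third methionine respectively.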
  • Full length polypeptide sequences were subsequently analyzed by querying against databases such as the GenBank protein databases (genpept), SwissProt, the PROTEOME databases, BLOCKS, PRINTS, DOMO, PRODOM, Prosite, hidden Markov model (HMM)-based protein family databases such as PFAM, INCY, and TIGRFAM; and HMM-based protein domain databases such as SMART.
  • Full length polynucleotide sequences are also analyzed using MACDNASIS PRO software (Hitachi Software Engineering, South San Francisco CA) and LASERGENE software (DNASTAR).
  • Polynucleotide and polypeptide sequence alignments are generated using default parameters specified by the CLUSTAL algorithm as incorporated into the MEGALIGN multisequence alignment program (DNASTAR), which also calculates the percent identity between aligned sequences.
  • Table 7 summarizes the tools, programs, and algorithms used for the analysis and assembly of Incyte cDNA and full length sequences and provides applicable descriptions, references, and threshold parameters.
  • The first column of Table 7 shows the tools, programs, and algorithms used, the second column provides brief descriptions thereof, the third column presents appropriate references, all of which are incorporated by reference herein in their entirety, and the fourth column presents, where applicable, the scores, probability values, and other parameters used to evaluate the strength of a match between two sequences (the higher the score or the lower the probability value, the greater the identity between two sequences).
  • Putative cytoskeleton-associated proteins were initially identified by running the Genscan gene identification program against public genomic sequence databases (e.g., gbpri and gbhtg).
  • Genscan is a general-purpose gene identification program which analyzes genomic DNA sequences from a variety of organisms (See Burge, C. and S. Karlin (1997) J. Mol. Biol. 268:78-94, and Burge, C. and S. Karlin (1998) Curr. Opin. Struct. Biol. 8:346-354).
  • The program concatenates predicted exons to form an assembled cDNA sequence extending from a methionine to a stop codon.
  • The output of Genscan is a FASTA database of polynucleotide and polypeptide sequences. The maximum range of sequence for Genscan to analyze at once was set to 30 kb.
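The assembly step described above (concatenating predicted exons into a sequence that runs from a methionine codon to a stop codon) can be sketched as follows. This is not Genscan itself; it assumes plus-strand exons given as 0-based, half-open coordinates, and the toy genomic sequence is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def assemble_cds(genomic: str, exons: list[tuple[int, int]]) -> str:
    """Concatenate exon intervals (0-based, half-open, plus strand) in genomic order."""
    return "".join(genomic[start:end] for start, end in sorted(exons)).upper()


def is_complete_orf(cds: str) -> bool:
    """True if cds starts with ATG, ends with an in-frame stop, and has no internal stop."""
    cds = cds.upper()
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return (codons[0] == "ATG"
            and codons[-1] in STOP_CODONS
            and not any(codon in STOP_CODONS for codon in codons[:-1]))


# Hypothetical example: two exons (upper case) separated by an intron (lower case).
genomic = "cc" + "ATGAAA" + "gtaagtttttttag" + "GGCTAA" + "cc"
cds = assemble_cds(genomic, [(22, 28), (2, 8)])
print(cds, is_complete_orf(cds))
```

A real gene finder also handles the minus strand, exon phase, and partial genes; the sketch only shows the concatenate-and-check idea.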
  • To determine whether Genscan-predicted cDNA sequences encode cytoskeleton-associated proteins, the encoded polypeptides were analyzed by querying against PFAM models for cytoskeleton-associated proteins. Potential cytoskeleton-associated proteins were also identified by homology to Incyte cDNA sequences that had been annotated as cytoskeleton-associated proteins.
  • Genscan-predicted sequences were then compared by BLAST analysis to the genpept and gbpri public databases. Where necessary, the Genscan-predicted sequences were then edited by comparison to the top BLAST hit from genpept to correct errors in the sequence predicted by Genscan, such as extra or omitted exons.
  • BLAST analysis was also used to find any Incyte cDNA or public cDNA coverage of the Genscan-predicted sequences, thus providing evidence for transcription. When Incyte cDNA coverage was available, this information was used to correct or confirm the Genscan-predicted sequence.
  • Full length polynucleotide sequences were obtained by assembling Genscan-predicted coding sequences with Incyte cDNA sequences and/or public cDNA sequences using the assembly process described in Example III. Alternatively, full length polynucleotide sequences were derived entirely from edited or unedited Genscan-predicted coding sequences.
  • Partial cDNA sequences were extended with exons predicted by the Genscan gene identification program described in Example IV. Partial cDNAs assembled as described in Example III were mapped to genomic DNA and parsed into clusters containing related cDNAs and Genscan exon predictions from one or more genomic sequences. Each cluster was analyzed using an algorithm based on graph theory and dynamic programming to integrate cDNA and genomic information, generating possible splice variants that were subsequently confirmed, edited, or extended to create a full length sequence. Sequence intervals in which the entire length of the interval was present on more than one sequence in the cluster were identified, and intervals thus identified were considered to be equivalent by transitivity.
  • Partial DNA sequences were extended to full length with an algorithm based on BLAST analysis.
  • First, partial cDNAs assembled as described in Example III were queried against public databases such as the GenBank primate, rodent, mammalian, vertebrate, and eukaryote databases using the BLAST program.
  • The nearest GenBank protein homolog was then compared by BLAST analysis to either Incyte cDNA sequences or Genscan-predicted exon sequences described in Example IV.
  • A chimeric protein was generated by using the resultant high-scoring segment pairs (HSPs) to map the translated sequences onto the GenBank protein homolog. Insertions or deletions may occur in the chimeric protein with respect to the original GenBank protein homolog.
  • The GenBank protein homolog, the chimeric protein, or both were used as probes to search for homologous genomic sequences from the public human genome databases. Partial DNA sequences were therefore "stretched" or extended by the addition of homologous genomic sequences. The resultant stretched sequences were examined to determine whether they contained a complete gene.
VI. Chromosomal Mapping of CSAP Encoding Polynucleotides
  • The sequences which were used to assemble SEQ ID NO:29-56 were compared with sequences from the Incyte LIFESEQ database and public domain databases using BLAST and other implementations of the Smith-Waterman algorithm. Sequences from these databases that matched SEQ ID NO:29-56 were assembled into clusters of contiguous and overlapping sequences using assembly algorithms such as Phrap (Table 7). Radiation hybrid and genetic mapping data available from public resources such as the Stanford Human Genome Center (SHGC), Whitehead Institute for Genome Research (WIGR), and Genethon were used to determine if any of the clustered sequences had been previously mapped. Inclusion of a mapped sequence in a cluster resulted in the assignment of all sequences of that cluster, including its particular SEQ ID NO:, to that map location.
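The rule that one mapped member places its whole cluster at that map location amounts to clustering by transitive overlap and then propagating labels. The sketch below uses a disjoint-set (union-find) structure for the clustering step; it assumes the pairwise overlaps have already been decided (for example by an assembler such as Phrap), and all identifiers and the location label are hypothetical.

```python
from collections import defaultdict


class DisjointSet:
    """Union-find over sequence identifiers."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def assign_map_locations(overlaps, known_locations):
    """Give every sequence the map location(s) of the mapped members of its cluster.

    overlaps: iterable of (seq_a, seq_b) pairs judged to overlap.
    known_locations: dict of seq_id -> map location for previously mapped sequences.
    Returns dict of seq_id -> set of inherited locations (empty if none is known).
    """
    clusters = DisjointSet()
    for a, b in overlaps:
        clusters.union(a, b)
    cluster_locations = defaultdict(set)
    for seq, location in known_locations.items():
        cluster_locations[clusters.find(seq)].add(location)
    return {seq: set(cluster_locations[clusters.find(seq)]) for seq in list(clusters.parent)}


result = assign_map_locations(
    [("SEQ_A", "EST_1"), ("EST_1", "EST_2"), ("SEQ_B", "EST_3")],
    {"EST_2": "LOC_1"},
)
print(result)
```

Returning a set leaves room for the case, not addressed in the text, where two mapped members of one cluster disagree.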
  • Map locations are represented by ranges, or intervals, of human chromosomes.
  • The map position of an interval, in centiMorgans, is measured relative to the terminus of the chromosome's p-arm.
  • The centiMorgan (cM) is a unit of measurement based on recombination frequencies between chromosomal markers. On average, 1 cM is roughly equivalent to 1 megabase (Mb) of DNA in humans, although this can vary widely due to hot and cold spots of recombination.
  • The cM distances are based on genetic markers mapped by Genethon which provide boundaries for radiation hybrid markers whose sequences were included in each of the clusters.
  • Northern analysis is a laboratory technique used to detect the presence of a transcript of a gene and involves the hybridization of a labeled nucleotide sequence to a membrane on which RNAs from a particular cell type or tissue have been bound. (See, e.g., Sambrook, supra, ch. 7; Ausubel (1995) supra, ch. 4 and 16.)
  • The product score takes into account both the degree of similarity between two sequences and the length of the sequence match.
  • The product score is a normalized value between 0 and 100, and is calculated as follows: the BLAST score is multiplied by the percent nucleotide identity, and the product is divided by (5 times the length of the shorter of the two sequences).
  • The BLAST score is calculated by assigning a score of +5 for every base that matches in a high-scoring segment pair (HSP), and -4 for every mismatch. Two sequences may share more than one HSP (separated by gaps). If there is more than one HSP, then the pair with the highest BLAST score is used to calculate the product score.
  • The product score represents a balance between fractional overlap and quality in a BLAST alignment.
  • A product score of 100 is produced only for 100% identity over the entire length of the shorter of the two sequences being compared.
  • A product score of 70 is produced either by 100% identity and 70% overlap at one end, or by 88% identity and 100% overlap at the other.
  • A product score of 50 is produced either by 100% identity and 50% overlap at one end, or by 79% identity and 100% overlap.
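The product score definition above can be checked numerically. This is a minimal sketch that assumes each HSP is summarized as (matches, mismatches) with gaps ignored, and that percent identity is taken over the aligned bases of the best-scoring HSP; those simplifications are ours, not the source's.

```python
def blast_score(matches: int, mismatches: int) -> int:
    """BLAST score as defined above: +5 per matching base, -4 per mismatch in an HSP."""
    return 5 * matches - 4 * mismatches


def product_score(hsps: list[tuple[int, int]], shorter_length: int) -> float:
    """Product score from HSPs given as (matches, mismatches).

    The HSP with the highest BLAST score is used; percent identity is computed
    over that HSP's aligned bases (gaps are ignored in this sketch).
    """
    matches, mismatches = max(hsps, key=lambda hsp: blast_score(*hsp))
    score = blast_score(matches, mismatches)
    percent_identity = 100.0 * matches / (matches + mismatches)
    return score * percent_identity / (5 * shorter_length)


# Shorter sequence of 100 bases in every case.
print(product_score([(100, 0)], 100))  # full-length identity
print(product_score([(70, 0)], 100))   # 100% identity, 70% overlap
print(product_score([(88, 12)], 100))  # 88% identity, full overlap
print(product_score([(79, 21)], 100))  # 79% identity, full overlap
```

Under these definitions the four cases give 100, 70, about 69.0, and about 49.1, which agrees with the rounded values of 70 and 50 quoted in the text.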
  • Polynucleotide sequences encoding CSAP are analyzed with respect to the tissue sources from which they were derived. For example, some full length sequences are assembled, at least in part, with overlapping Incyte cDNA sequences (see Example III). Each cDNA sequence is derived from a cDNA library constructed from a human tissue.
  • Each human tissue is classified into one of the following organ/tissue categories: cardiovascular system; connective tissue; digestive system; embryonic structures; endocrine system; exocrine glands; genitalia, female; genitalia, male; germ cells; hemic and immune system; liver; musculoskeletal system; nervous system; pancreas; respiratory system; sense organs; skin; stomatognathic system; unclassified/mixed; or urinary tract.
  • The number of libraries in each category is counted and divided by the total number of libraries across all categories.
  • Similarly, each human tissue is classified into one of the following disease/condition categories: cancer, cell line, developmental, inflammation, neurological, trauma, cardiovascular, pooled, and other, and the number of libraries in each category is counted and divided by the total number of libraries across all categories. The resulting percentages reflect the tissue- and disease-specific expression of cDNA encoding CSAP.
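The counting step is a simple normalized tally. A minimal sketch, assuming one category label per cDNA library; the example labels are hypothetical.

```python
from collections import Counter


def category_percentages(library_categories: list[str]) -> dict[str, float]:
    """Percentage of cDNA libraries falling into each organ/tissue or disease category."""
    counts = Counter(library_categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical libraries contributing to one assembled sequence.
print(category_percentages(
    ["nervous system", "nervous system", "cardiovascular system", "liver"]
))
```

The same function serves both classifications; only the labels passed in differ.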
  • cDNA sequences and cDNA library/tissue information are found in the LIFESEQ GOLD database (Incyte Genomics, Palo Alto CA).
VIII. Extension of CSAP Encoding Polynucleotides
  • Full length polynucleotide sequences were also produced by extension of an appropriate fragment of the full length molecule using oligonucleotide primers designed from this fragment.
  • One primer was synthesized to initiate 5' extension of the known fragment, and the other primer was synthesized to initiate 3' extension of the known fragment.
  • The initial primers were designed using OLIGO 4.06 software (National Biosciences), or another appropriate program, to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to anneal to the target sequence at temperatures of about 68°C to about 72°C. Any stretch of nucleotides that would result in hairpin structures or primer-primer dimerization was avoided.
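The length and GC criteria above are easy to screen for automatically. The sketch below is not OLIGO 4.06: it checks length and GC content, and flags self-complementarity with a crude heuristic (any 6-mer whose reverse complement also occurs in the primer). It does not estimate annealing temperature, because the text does not state which melting-temperature model was used. The primer sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def self_complementary_kmers(seq: str, k: int = 6) -> set[str]:
    """k-mers whose reverse complement also occurs in the primer (crude hairpin/dimer flag)."""
    seq = seq.upper()
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return {kmer for kmer in kmers if reverse_complement(kmer) in kmers}


def check_primer(seq: str, min_len: int = 22, max_len: int = 30,
                 min_gc: float = 0.50, k: int = 6) -> list[str]:
    """Return a list of problems; an empty list means the primer passes these checks."""
    problems = []
    if not min_len <= len(seq) <= max_len:
        problems.append(f"length {len(seq)} outside {min_len}-{max_len} nt")
    if gc_fraction(seq) < min_gc:
        problems.append(f"GC content {gc_fraction(seq):.0%} below {min_gc:.0%}")
    if self_complementary_kmers(seq, k):
        problems.append("self-complementary stretch (possible hairpin or primer dimer)")
    return problems


print(check_primer("GCAGGACCAGAGCCAGGAGCCAGA"))  # hypothetical 24-mer
print(check_primer("AGCGAATTCGCAGGACCAGAGCCA"))  # contains the palindrome GAATTC
```

A dedicated design tool scores hairpins and dimers thermodynamically; the k-mer test only catches exact complementary runs.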
  • Selected human cDNA libraries were used to extend the sequence. If more than one extension was necessary or desired, additional or nested sets of primers were designed.
  • PCR was performed in 96-well plates using the PTC-200 thermal cycler (MJ Research, Inc.).
  • The reaction mix contained DNA template, 200 nmol of each primer, reaction buffer containing Mg2+, (NH4)2SO4, and 2-mercaptoethanol, Taq DNA polymerase (Amersham Pharmacia Biotech), ELONGASE enzyme (Life Technologies), and Pfu DNA polymerase (Stratagene), with the following parameters for primer pair PCI A and PCI B: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C.
  • The parameters for primer pair T7 and SK+ were as follows: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 57°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C.
  • The concentration of DNA in each well was determined by dispensing 100 µl PICOGREEN quantitation reagent (0.25% (v/v) PICOGREEN; Molecular Probes, Eugene OR) dissolved in 1X TE and 0.5 µl of undiluted PCR product into each well of an opaque fluorimeter plate (Corning Costar, Acton MA), allowing the DNA to bind to the reagent.
  • The plate was scanned in a Fluoroskan II (Labsystems Oy, Helsinki, Finland) to measure the fluorescence of the sample and to quantify the concentration of DNA.
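The text does not say how fluorescence readings were converted to DNA concentrations. One common approach, assumed here, is a linear standard curve fitted to wells of known DNA concentration and then inverted for the unknowns. The standard values below are hypothetical and chosen to make the arithmetic easy to follow.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_fluorescence(signal: float,
                                    standards_conc: list[float],
                                    standards_signal: list[float]) -> float:
    """Interpolate a DNA concentration from a linear fluorescence standard curve."""
    slope, intercept = fit_line(standards_conc, standards_signal)
    return (signal - intercept) / slope


# Hypothetical standards (concentration in arbitrary units, fluorescence in counts).
standards_conc = [0.0, 10.0, 20.0, 40.0]
standards_signal = [5.0, 105.0, 205.0, 405.0]
print(concentration_from_fluorescence(255.0, standards_conc, standards_signal))
```

Such a curve is only valid within the range covered by the standards, since dye fluorescence saturates at high DNA concentrations.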
  • A 5 µl to 10 µl aliquot of the reaction mixture was analyzed by electrophoresis on a 1% agarose gel to determine which reactions were successful in extending the sequence.
  • The extended nucleotides were desalted and concentrated, transferred to 384-well plates, digested with CviJI Chlorella virus endonuclease (Molecular Biology Research, Madison WI), and sonicated or sheared prior to religation into pUC 18 vector (Amersham Pharmacia Biotech).
  • The digested nucleotides were separated on low concentration (0.6 to 0.8%) agarose gels, fragments were excised, and the agar was digested with Agar ACE (Promega).
  • Extended clones were religated using T4 ligase (New England Biolabs, Beverly MA) into pUC 18 vector (Amersham Pharmacia Biotech), treated with Pfu DNA polymerase (Stratagene) to fill in restriction site overhangs, and transfected into competent E. coli cells. Transformed cells were selected on antibiotic-containing media, and individual colonies were picked and cultured overnight at 37°C in 384-well plates in LB/2x carb liquid media. The cells were lysed, and DNA was amplified by PCR using Taq DNA polymerase
  • Step 1 94°C, 3 min
  • Step 2 94°C, 15 sec
  • Step 3 60°C, 1 min
  • Step 4 72°C, 2 min
  • Step 5 steps 2, 3, and 4 repeated 29 times
  • Step 6 72°C, 5 min
  • Step 7 storage at 4°C.
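The three cycling programs in this example differ only in annealing temperature, extension temperature, and cycle count, so they can be recorded as data and compared. A minimal sketch follows. It assumes that "Steps 2, 3, and 4 repeated N times" means N cycles in total (if the first pass is counted separately, add one cycle), counts only programmed hold times (ramp times are ignored), and uses hypothetical program labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    name: str
    initial: tuple[float, int]            # Step 1: (temperature in deg C, hold in seconds)
    cycle: tuple[tuple[float, int], ...]  # Steps 2-4: denature, anneal, extend
    cycles: int                           # Step 5, read as total cycles (assumption)
    final: tuple[float, int]              # Step 6; Step 7 is an indefinite 4 deg C hold

    def hold_minutes(self) -> float:
        """Total programmed hold time in minutes, ignoring ramp times."""
        per_cycle = sum(seconds for _, seconds in self.cycle)
        return (self.initial[1] + self.cycles * per_cycle + self.final[1]) / 60


PROGRAMS = [
    CyclingProgram("PCI A / PCI B", (94, 180), ((94, 15), (60, 60), (68, 120)), 20, (68, 300)),
    CyclingProgram("T7 / SK+", (94, 180), ((94, 15), (57, 60), (68, 120)), 20, (68, 300)),
    CyclingProgram("clone amplification", (94, 180), ((94, 15), (60, 60), (72, 120)), 29, (72, 300)),
]

for program in PROGRAMS:
    anneal_temp = program.cycle[1][0]
    print(f"{program.name}: anneal {anneal_temp} C, {program.cycles} cycles, "
          f"{program.hold_minutes():.2f} min of holds")
```

Under these assumptions the two extension programs each spend 73 minutes in holds and the clone amplification program spends 102.25 minutes.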
  • DNA was quantified by PICOGREEN reagent (Molecular Probes) as described above. Samples with low DNA recoveries were reamplified using the same conditions as described above.
  • Certain single nucleotide polymorphisms (SNPs) were selected for further characterization by mass spectrometry using the high throughput MASSARRAY system (Sequenom, Inc.) to analyze allele frequencies at the SNP sites in four different human populations.
  • The Caucasian population comprised 92 individuals (46 male, 46 female), including 83 from Utah, four French, three Venezuelan, and two Amish individuals.
  • The African population comprised 194 individuals (97 male, 97 female), all African Americans.
  • The Hispanic population comprised 324 individuals (162 male, 162 female), all Mexican Hispanic.
  • The Asian population comprised 126 individuals (64 male, 62 female) with a reported parental breakdown of 43% Chinese, 31% Japanese, 13% Korean, 5% Vietnamese, and 8% other Asian. Allele frequencies were first analyzed in the Caucasian population; in some cases those SNPs which showed no allelic variance in this population were not further tested in the other three populations.
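Allele frequencies at a SNP follow directly from genotype counts, and the "no allelic variance" screen is a check for a single observed allele. A minimal sketch, assuming diploid genotype calls written as two-letter strings; the counts are hypothetical (they merely sum to the 92-person panel size) and do not model the MASSARRAY output format.

```python
def allele_frequencies(genotype_counts: dict[str, int]) -> dict[str, float]:
    """Allele frequencies from diploid genotype counts, e.g. {"AA": 40, "AG": 42, "GG": 10}."""
    allele_counts: dict[str, int] = {}
    for genotype, n in genotype_counts.items():
        for allele in genotype:  # each character is one allele of a diploid call
            allele_counts[allele] = allele_counts.get(allele, 0) + n
    total = sum(allele_counts.values())
    return {allele: count / total for allele, count in allele_counts.items()}


def is_polymorphic(frequencies: dict[str, float]) -> bool:
    """True if at least two alleles were observed at the site."""
    return sum(1 for f in frequencies.values() if f > 0) >= 2


freqs = allele_frequencies({"AA": 40, "AG": 42, "GG": 10})  # hypothetical counts
print(freqs, is_polymorphic(freqs))
```

Here 92 individuals carry 184 alleles, of which 122 are A and 62 are G.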
  • Hybridization probes derived from SEQ ID NO:29-56 are employed to screen cDNAs, genomic DNAs, or mRNAs. Although the labeling of oligonucleotides consisting of about 20 base pairs is specifically described, essentially the same procedure is used with larger nucleotide fragments. Oligonucleotides are designed using state-of-the-art software such as OLIGO 4.06 software (National Biosciences) and labeled by combining 50 pmol of each oligomer, 250 µCi of [γ-32P] adenosine triphosphate (Amersham Pharmacia Biotech), and T4 polynucleotide kinase (DuPont NEN, Boston MA).
  • The labeled oligonucleotides are substantially purified using a SEPHADEX G-25 superfine size exclusion dextran bead column (Amersham Pharmacia Biotech). An aliquot containing 10^7 counts per minute of the labeled probe is used in a typical membrane-based hybridization analysis of human genomic DNA digested with one of the following endonucleases: Ase I, Bgl II, Eco RI, Pst I, Xba I, or Pvu II (DuPont NEN).
  • The DNA from each digest is fractionated on a 0.7% agarose gel and transferred to nylon membranes (Nytran Plus, Schleicher & Schuell, Durham NH). Hybridization is carried out for 16 hours at 40°C. To remove nonspecific signals, blots are sequentially washed at room temperature under conditions of up to, for example, 0.1 x saline sodium citrate and 0.5% sodium dodecyl sulfate. Hybridization patterns are visualized using autoradiography or an alternative imaging means and compared.
XI. Microarrays
  • The linkage or synthesis of array elements upon a microarray can be achieved utilizing photolithography, piezoelectric printing (ink-jet printing; see, e.g., Baldeschweiler, supra), mechanical microspotting technologies, and derivatives thereof.
  • The substrate in each of the aforementioned technologies should be uniform and solid with a non-porous surface (Schena (1999), supra). Suggested substrates include silicon, silica, glass slides, glass chips, and silicon wafers. Alternatively, a procedure analogous to a dot or slot blot may also be used to arrange and link elements to the surface of a substrate using thermal, UV, chemical, or mechanical bonding procedures.
  • A typical array may be produced using available methods and machines well known to those of ordinary skill in the art and may contain any appropriate number of elements. (See, e.g., Schena, M. et al. (1995) Science 270:467-470; Shalon, D. et al. (1996) Genome Res. 6:639-645; Marshall, A. and J. Hodgson (1998) Nat. Biotechnol. 16:27-31.)
  • Full length cDNAs, Expressed Sequence Tags (ESTs), or fragments or oligomers thereof may comprise the elements of the microarray. Fragments or oligomers suitable for hybridization can be selected using software well known in the art such as LASERGENE software (DNASTAR).
  • The array elements are hybridized with polynucleotides in a biological sample.
  • The polynucleotides in the biological sample are conjugated to a fluorescent label or other molecular tag for ease of detection.
  • A fluorescence scanner is used to detect hybridization at each array element.
  • Alternatively, laser desorption and mass spectrometry may be used for detection of hybridization.
  • The degree of complementarity and the relative abundance of each polynucleotide which hybridizes to an element on the microarray may be assessed. In one embodiment, microarray preparation and usage is described in detail below.
  • Total RNA is isolated from tissue samples using the guanidinium thiocyanate method and poly(A) + RNA is purified using the oligo-(dT) cellulose method.
  • Each poly(A)+ RNA sample is reverse transcribed using MMLV reverse transcriptase, 0.05 pg/µl oligo-(dT) primer (21mer), 1X first strand buffer, 0.03 units/µl RNase inhibitor, 500 µM dATP, 500 µM dGTP, 500 µM dTTP, 40 µM dCTP, and 40 µM dCTP-Cy3 (BDS) or dCTP-Cy5 (Amersham Pharmacia Biotech).
  • The reverse transcription reaction is performed in a 25 µl volume containing 200 ng poly(A)+ RNA with GEMBRIGHT kits (Incyte).
  • Specific control poly(A)+ RNAs are synthesized by in vitro transcription from non-coding yeast genomic DNA. After incubation at 37°C for 2 hr, each reaction sample (one with Cy3 and another with Cy5 labeling) is treated with 2.5 µl of 0.5 M sodium hydroxide and incubated for 20 minutes at 85°C to stop the reaction and degrade the RNA. Samples are purified using two successive CHROMA SPIN 30 gel filtration spin columns (CLONTECH Laboratories, Inc.
  • The reaction samples are ethanol precipitated using 1 µl of glycogen (1 mg/ml), 60 µl sodium acetate, and 300 µl of 100% ethanol.
  • The sample is then dried to completion using a SpeedVAC (Savant Instruments Inc., Holbrook NY) and resuspended in 14 µl 5X SSC/0.2% SDS.
  • Nonmalignant primary mammary epithelial cells and breast carcinoma cell lines are grown to 70-80% confluence prior to harvest.
  • Gene expression profiles of nonmalignant primary mammary epithelial cells are compared to those of breast carcinoma cell lines at different stages of tumor progression.
  • Sequences of the present invention are used to generate array elements.
  • Each array element is amplified from bacterial cells containing vectors with cloned cDNA inserts.
  • PCR amplification uses primers complementary to the vector sequences flanking the cDNA insert.
  • Array elements are amplified in thirty cycles of PCR from an initial quantity of 1-2 ng to a final quantity greater than 5 µg. Amplified array elements are then purified using SEPHACRYL-400 (Amersham Pharmacia Biotech).
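The quantities quoted above imply a minimum average per-cycle amplification. As a worked check, assume geometric growth at a constant average efficiency $\bar{E}$ (real reactions plateau, so this is only an average), starting mass $m_0$, and final mass $m_{30} > 5\ \mu\text{g} = 5000\ \text{ng}$:

$$
F = \frac{m_{30}}{m_0} = (1 + \bar{E})^{30}
\quad\Longrightarrow\quad
\bar{E} = F^{1/30} - 1
$$

$$
m_0 = 2\ \text{ng}:\quad F > \frac{5000\ \text{ng}}{2\ \text{ng}} = 2500,\qquad \bar{E} > 2500^{1/30} - 1 \approx 0.30
$$

$$
m_0 = 1\ \text{ng}:\quad F > 5000,\qquad \bar{E} > 5000^{1/30} - 1 \approx 0.33
$$

Both figures are far below the ideal doubling bound of $2^{30} \approx 1.07 \times 10^{9}$-fold, so the stated yield does not require near-perfect efficiency in every cycle.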
  • Purified array elements are immobilized on polymer-coated glass slides.
  • Glass microscope slides (Corning) are cleaned by ultrasound in 0.1% SDS and acetone, with extensive distilled water washes between and after treatments.
  • Glass slides are etched in 4% hydrofluoric acid (VWR Scientific Products Corporation (VWR), West Chester PA), washed extensively in distilled water, and coated with 0.05% aminopropyl silane (Sigma) in 95% ethanol. Coated slides are cured in a 110°C oven.
  • Array elements are applied to the coated glass substrate using a procedure described in U.S. Patent No. 5,807,522, incorporated herein by reference.
  • 1 ⁇ l of the array element DNA, at an average concentration of 100 ng/ ⁇ l, is loaded into the open capillary printing element by a high-speed robotic apparatus.
  • the apparatus then deposits about 5 nl of array element sample per slide.
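The loading and deposition volumes above fix how many slides one capillary load can serve and how much DNA each spot receives. A minimal arithmetic sketch, assuming no dead volume or evaporation in the capillary (real yields will be lower); the function name is hypothetical.

```python
def spotting_budget(loaded_ul: float, conc_ng_per_ul: float,
                    per_slide_nl: float) -> tuple[int, float]:
    """Return (deposits per capillary load, ng of DNA per deposit), ignoring dead volume."""
    loaded_nl = loaded_ul * 1000
    deposits = int(loaded_nl // per_slide_nl)
    ng_per_deposit = conc_ng_per_ul * per_slide_nl / 1000
    return deposits, ng_per_deposit


print(spotting_budget(1, 100, 5))  # 1 µl at 100 ng/µl, about 5 nl per slide
```

With the values in the text, one load supports up to about 200 slides at roughly 0.5 ng of array element DNA per spot.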
  • Microarrays are UV-crosslinked using a STRATALINKER UV-crosslinker (Stratagene). Microarrays are washed at room temperature once in 0.2% SDS and three times in distilled water. Non-specific binding sites are blocked by incubation of microarrays in 0.2% casein in phosphate buffered saline (PBS) (Tropix, Inc., Bedford MA) for 30 minutes at 60°C followed by washes in 0.2% SDS and distilled water as before.
Hybridization
  • Hybridization reactions contain 9 µl of sample mixture consisting of 0.2 µg each of Cy3 and
  • An Innova 70 mixed gas 10 W laser (Coherent, Inc., Santa Clara CA), capable of generating spectral lines at 488 nm for excitation of Cy3 and at 632 nm for excitation of Cy5, is used for detection.
  • The excitation laser light is focused on the array using a 20X microscope objective (Nikon, Inc., Melville NY).
  • The slide containing the array is placed on a computer-controlled X-Y stage on the microscope and raster-scanned past the objective.
  • The 1.8 cm x 1.8 cm array used in the present example is scanned with a resolution of 20 micrometers.
  • A mixed gas multiline laser excites the two fluorophores sequentially. Emitted light is split, based on wavelength, into two photomultiplier tube detectors (PMT R1477, Hamamatsu Photonics Systems, Bridgewater NJ) corresponding to the two fluorophores. Appropriate filters positioned between the array and the photomultiplier tubes are used to filter the signals.
  • The emission maxima of the fluorophores used are 565 nm for Cy3 and 650 nm for Cy5.
  • Each array is typically scanned twice, one scan per fluorophore using the appropriate filters at the laser source, although the apparatus is capable of recording the spectra from both fluorophores simultaneously.
  • The sensitivity of the scans is typically calibrated using the signal intensity generated by a cDNA control species added to the sample mixture at a known concentration.
  • A specific location on the array contains a complementary DNA sequence, allowing the intensity of the signal at that location to be correlated with a weight ratio of hybridizing species of 1:100,000.
  • The calibration is done by labeling samples of the calibrating cDNA with the two fluorophores and adding identical amounts of each to the hybridization mixture.
  • The output of the photomultiplier tube is digitized using a 12-bit RTI-835H analog-to-digital (A/D) conversion board (Analog Devices, Inc., Norwood MA) installed in an IBM-compatible PC computer.
  • The digitized data are displayed as an image in which the signal intensity is mapped using a linear 20-color transformation to a pseudocolor scale ranging from blue (low signal) to red (high signal).
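A linear 20-color transformation of 12-bit data can be expressed as equal-width binning of the 0 to 4095 count range. This is a sketch under that equal-width assumption; the actual color table is not given in the text, so the function returns only a bin index (0 at the blue end, 19 at the red end).

```python
ADC_BITS = 12
N_COLORS = 20


def pseudocolor_index(count: int, bits: int = ADC_BITS, n_colors: int = N_COLORS) -> int:
    """Map a digitized intensity (0 .. 2**bits - 1) linearly onto bins 0 .. n_colors - 1.

    Bin 0 is the low-signal (blue) end of the scale; bin n_colors - 1 is the
    high-signal (red) end.
    """
    max_count = 2 ** bits - 1
    if not 0 <= count <= max_count:
        raise ValueError(f"count must be in 0..{max_count}")
    return count * n_colors // (max_count + 1)


print([pseudocolor_index(c) for c in (0, 204, 205, 2048, 4095)])
```

Each bin spans 204.8 counts, so a 12-bit reading of 204 still falls in the lowest bin while 205 moves to the next.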
  • The data are also analyzed quantitatively. Where two different fluorophores are excited and measured simultaneously, the data are first corrected for optical crosstalk (due to overlapping emission spectra) between the fluorophores using each fluorophore's emission spectrum.
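Crosstalk correction of this kind is usually a linear unmixing problem. The sketch below assumes a simple two-channel model in which each detector records its own dye at unit efficiency plus a fixed fraction of the other dye's emission; in practice those fractions would be derived from the emission spectra and filters, and the values used here are hypothetical.

```python
def correct_crosstalk(measured_cy3: float, measured_cy5: float,
                      spill_cy5_into_cy3: float,
                      spill_cy3_into_cy5: float) -> tuple[float, float]:
    """Undo linear spectral crosstalk between two channels.

    Assumed model:
        measured_cy3 = true_cy3 + spill_cy5_into_cy3 * true_cy5
        measured_cy5 = spill_cy3_into_cy5 * true_cy3 + true_cy5
    solved for (true_cy3, true_cy5) by Cramer's rule.
    """
    det = 1.0 - spill_cy5_into_cy3 * spill_cy3_into_cy5
    if abs(det) < 1e-12:
        raise ValueError("crosstalk matrix is singular")
    true_cy3 = (measured_cy3 - spill_cy5_into_cy3 * measured_cy5) / det
    true_cy5 = (measured_cy5 - spill_cy3_into_cy5 * measured_cy3) / det
    return true_cy3, true_cy5


# Hypothetical spot: true signals 1000 (Cy3) and 400 (Cy5), 5% and 10% spillover.
print(correct_crosstalk(1020.0, 500.0, 0.05, 0.10))
```

With larger spillover coefficients the determinant shrinks and noise in the measured channels is amplified, which is one reason to keep filter overlap small.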
  • A grid is superimposed over the fluorescence signal image such that the signal from each spot is centered in each element of the grid.
  • The fluorescence signal within each element is then integrated to obtain a numerical value corresponding to the average intensity of the signal.
  • The software used for signal analysis is the GEMTOOLS gene expression analysis program (Incyte).
  • Component 5504134JHGG3 of SEQ ID NO:31 and component 5504134JHGG3 of SEQ ID NO:33 showed differential expression in nonmalignant primary mammary epithelial cells versus breast carcinoma cell lines at different stages of tumor progression, as determined by microarray analysis.
  • The expression of component 5504134JHGG3 was altered by at least a factor of 2 in breast carcinoma cell lines. Therefore, SEQ ID NO:31 and SEQ ID NO:33 are useful in diagnostic assays for cell proliferative disorders.
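"Altered by at least a factor of 2" corresponds to an absolute log2 ratio of at least 1 between the two channels. A minimal sketch, assuming background-corrected, calibrated intensities and assuming (hypothetically) that Cy3 carries the reference sample; the text does not say which dye labeled which sample.

```python
import math


def differentially_expressed(cy3: float, cy5: float,
                             min_fold: float = 2.0) -> tuple[float, bool]:
    """Return (log2(cy5 / cy3), whether the change is at least min_fold in either direction)."""
    if cy3 <= 0 or cy5 <= 0:
        raise ValueError("intensities must be positive")
    log_ratio = math.log2(cy5 / cy3)
    return log_ratio, abs(log_ratio) >= math.log2(min_fold)


print(differentially_expressed(1000, 2500))  # up 2.5-fold
print(differentially_expressed(1000, 1500))  # up 1.5-fold, below threshold
print(differentially_expressed(1000, 400))   # down 2.5-fold
```

Working in log2 space treats increases and decreases symmetrically, so a 2-fold drop is flagged as readily as a 2-fold rise.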
  • SEQ ID NO:50 showed differential expression in human lung adenocarcinoma and squamous cell carcinoma versus normal lung tissue as determined by microarray analysis.
  • SEQ ID NO:50 is useful in diagnostic assays for lung adenocarcinoma and squamous cell carcinoma.
  • Sequences complementary to the CSAP-encoding sequences, or any parts thereof, are used to detect, decrease, or inhibit expression of naturally occurring CSAP.
  • Although the use of oligonucleotides comprising from about 15 to 30 base pairs is described, essentially the same procedure is used with smaller or with larger sequence fragments.
  • Appropriate oligonucleotides are designed using OLIGO 4.06 software (National Biosciences) and the coding sequence of CSAP.
  • To inhibit transcription, a complementary oligonucleotide is designed from the most unique 5' sequence and used to prevent promoter binding to the coding sequence.
  • To inhibit translation, a complementary oligonucleotide is designed to prevent ribosomal binding to the CSAP-encoding transcript.
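Whichever window is chosen (a unique 5' region, or the region around the start codon to block ribosome binding), the antisense oligonucleotide is the reverse complement of that window. In this minimal sketch, the choice of window is left to the caller because the uniqueness screening described above is not modeled, and the example sense sequence is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense_oligo(sense: str, start: int, length: int = 20) -> str:
    """Antisense DNA oligonucleotide (5'->3') complementary to sense[start:start + length].

    The sense sequence is given 5'->3' as on the coding strand (T rather than U).
    """
    if not 15 <= length <= 30:
        raise ValueError("length outside the ~15-30 nt range described above")
    target = sense[start:start + length]
    if len(target) < length:
        raise ValueError("target window runs past the end of the sequence")
    return target.translate(COMPLEMENT)[::-1]


sense = "ATGGCTAGCAAAGGAGAAGAACTTTTC"  # hypothetical coding-strand sequence
print(antisense_oligo(sense, 0, 15))    # window covering the ATG start codon
```

Reading the result 3' to 5' and complementing it recovers the targeted sense window, which is a quick way to check the orientation.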
  • Expression and purification of CSAP are achieved using bacterial or virus-based expression systems.
  • cDNA is subcloned into an appropriate vector containing an antibiotic resistance gene and an inducible promoter that directs high levels of cDNA transcription.
  • Suitable promoters include, but are not limited to, the trp-lac (tac) hybrid promoter and the T5 or T7 bacteriophage promoter in conjunction with the lac operator regulatory element.
  • Recombinant vectors are transformed into suitable bacterial hosts, e.g., BL21(DE3).
  • Antibiotic resistant bacteria express CSAP upon induction with isopropyl beta-D- thiogalactopyranoside (IPTG).
  • Expression of CSAP in eukaryotic cells is achieved by infecting insect or mammalian cell lines with recombinant Autographa californica nuclear polyhedrosis virus (AcMNPV), commonly known as baculovirus.
  • The nonessential polyhedrin gene of baculovirus is replaced with cDNA encoding CSAP by either homologous recombination or bacterial-mediated transposition involving transfer plasmid intermediates. Viral infectivity is maintained and the strong polyhedrin promoter drives high levels of cDNA transcription.
  • Recombinant baculovirus is used to infect Spodoptera frugiperda (Sf9) insect cells in most cases, or human hepatocytes, in some cases.
  • CSAP is synthesized as a fusion protein with, e.g., glutathione S-transferase (GST) or a peptide epitope tag, such as FLAG or 6-His, permitting rapid, single-step, affinity-based purification of recombinant fusion protein from crude cell lysates.
  • FLAG is an 8-amino acid peptide.
  • 6-His, a stretch of six consecutive histidine residues, enables purification on metal-chelate resins (QIAGEN). Methods for protein expression and purification are discussed in Ausubel (1995, supra, ch. 10 and 16). Purified CSAP obtained by these methods can be used directly in the assays shown in Examples XVII and XVIII, where applicable.
XIV. Functional Assays
  • CSAP function is assessed by expressing the sequences encoding CSAP at physiologically elevated levels in mammalian cell culture systems.
  • cDNA is subcloned into a mammalian expression vector containing a strong promoter that drives high levels of cDNA expression.
  • Vectors of choice include pCMV SPORT (Life Technologies) and pCR3.1 (Invitrogen, Carlsbad CA), both of which contain the cytomegalovirus promoter. 5-10 µg of recombinant vector are transiently transfected into a human cell line, for example, an endothelial or hematopoietic cell line, using either liposome formulations or electroporation.
  • 1-2 µg of an additional plasmid containing sequences encoding a marker protein are co-transfected.
  • Expression of a marker protein provides a means to distinguish transfected cells from nontransfected cells and is a reliable predictor of cDNA expression from the recombinant vector.
  • Marker proteins of choice include, e.g., Green Fluorescent Protein (GFP; Clontech), CD64, or a CD64-GFP fusion protein.
  • Flow cytometry (FCM) detects and quantifies the uptake of fluorescent molecules that diagnose events preceding or coincident with cell death. These events include changes in nuclear DNA content as measured by staining of DNA with propidium iodide; changes in cell size and granularity as measured by forward light scatter and 90 degree side light scatter; down-regulation of DNA synthesis as measured by decrease in bromodeoxyuridine uptake; alterations in expression of cell surface and intracellular proteins as measured by reactivity with specific antibodies; and alterations in plasma membrane composition as measured by the binding of fluorescein-conjugated Annexin V protein to the cell surface. Methods in flow cytometry are discussed in Ormerod, M.G. (1994) Flow Cytometry, Oxford, New York NY.
  • CSAP The influence of CSAP on gene expression can be assessed using highly purified populations of cells transfected with sequences encoding CSAP and either CD64 or CD64-GFP.
  • CD64 and CD64-GFP are expressed on the surface of transfected cells and bind to conserved regions of human immunoglobulin G (IgG). Transfected cells are efficiently separated from nontransfected cells using magnetic beads coated with either human IgG or antibody against CD64 (DYNAL, Lake Success NY). mRNA can be purified from the cells using methods well known to those of skill in the art. Expression of mRNA encoding CSAP and other genes of interest can be analyzed by Northern analysis or microarray techniques.
  • CSAP substantially purified using polyacrylamide gel electrophoresis (PAGE; see, e.g., Harrington, M.G. (1990) Methods Enzymol. 182:488-495), or other purification techniques, is used to immunize animals (e.g., rabbits, mice, etc.) and to produce antibodies using standard protocols.
  • The CSAP amino acid sequence is analyzed using LASERGENE software (DNASTAR) to determine regions of high immunogenicity, and a corresponding oligopeptide is synthesized and used to raise antibodies by means known to those of skill in the art. Methods for selection of appropriate epitopes, such as those near the C-terminus or in hydrophilic regions, are well described in the art. (See, e.g., Ausubel, 1995, supra, ch. 11.)
  • Oligopeptides of about 15 residues in length are synthesized on an ABI 431A peptide synthesizer (Applied Biosystems) using FMOC chemistry and coupled to KLH (Sigma-Aldrich, St. Louis MO) by reaction with N-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS) to increase immunogenicity.
  • Rabbits are immunized with the oligopeptide-KLH complex in complete Freund's adjuvant.
  • Resulting antisera are tested for antipeptide and anti-CSAP activity by, for example, binding the peptide or CSAP to a substrate, blocking with 1% BSA, reacting with rabbit antisera, washing, and reacting with radio-iodinated goat anti-rabbit IgG.
  • Naturally occurring or recombinant CSAP is substantially purified by immunoaffinity chromatography using antibodies specific for CSAP.
  • An immunoaffinity column is constructed by covalently coupling anti-CSAP antibody to an activated chromatographic resin, such as CNBr-activated SEPHAROSE (Amersham Pharmacia Biotech). After the coupling, the resin is blocked and washed according to the manufacturer's instructions.
  • Media containing CSAP are passed over the immunoaffinity column, and the column is washed under conditions that allow the preferential adsorption of CSAP (e.g., high ionic strength buffers in the presence of detergent).
  • The column is eluted under conditions that disrupt antibody/CSAP binding (e.g., a buffer of pH 2 to pH 3, or a high concentration of a chaotrope, such as urea or thiocyanate ion), and CSAP is collected.
  • CSAP, or biologically active fragments thereof, are labeled with 125I Bolton-Hunter reagent. (See, e.g., Bolton, A.E. and W.M. Hunter (1973) Biochem. J. 133:529-539.)
  • Candidate molecules previously arrayed in the wells of a multi-well plate are incubated with the labeled CSAP, washed, and any wells with labeled CSAP complex are assayed. Data obtained using different concentrations of CSAP are used to calculate values for the number, affinity, and association of CSAP with the candidate molecules.
  • Molecules interacting with CSAP are analyzed using the yeast two-hybrid system as described in Fields, S. and O. Song (1989) Nature 340:245-246, or using commercially available kits based on the two-hybrid system, such as the MATCHMAKER system (Clontech).
  • CSAP may also be used in the PATHCALLING process (CuraGen Corp., New Haven CT) which employs the yeast two-hybrid system in a high-throughput manner to determine all interactions between the proteins encoded by two large libraries of genes (Nandabalan, K. et al. (2000) U.S. Patent No. 6,057,101).
  • A microtubule motility assay for CSAP measures motor protein activity.
  • Recombinant CSAP is immobilized onto a glass slide or similar substrate.
  • Taxol-stabilized bovine brain microtubules (commercially available) in a solution containing ATP and cytosolic extract are perfused onto the slide. Movement of microtubules as driven by CSAP motor activity can be visualized and quantified using video-enhanced light microscopy and image analysis techniques.
  • CSAP activity is directly proportional to the frequency and velocity of microtubule movement.
  • An assay for CSAP measures the formation of protein filaments in vitro.
  • A solution of CSAP at a concentration greater than the "critical concentration" for polymer assembly is applied to carbon-coated grids. Appropriate nucleation sites may be supplied in the solution.
  • The grids are negatively stained with 0.7% (w/v) aqueous uranyl acetate and examined by electron microscopy. The appearance of filaments of approximately 25 nm (microtubules), 8 nm (actin), or 10 nm (intermediate filaments) demonstrates CSAP activity.
  • CSAP activity is measured by the binding of CSAP to protein filaments.
  • A 35S-Met-labeled CSAP sample is incubated with the appropriate filament protein (actin, tubulin, or intermediate filament protein), and complexed protein is collected by immunoprecipitation using an antibody against the filament protein. The immunoprecipitate is then resolved by SDS-PAGE, and the amount of CSAP bound is measured by autoradiography.
  • 2688 660-1328, 706-1328, 720-910, 722-905, 756-1328, 904-1475, 905-1582, 954-1491, 1022-1570, 1068-1248, 1129-1625, 1129-1769, 1161-1391, 1203-1716, 1270-1985, 1276-1826, 1514-1725, 1565-2127, 1661-2133, 1667-2243, 1671-2269, 1719-1825, 1723-2407, 1729-1977, 1741-2129,
  • ABI FACTURA: A program that removes vector sequences and masks ambiguous bases in nucleic acid sequences. (Applied Biosystems, Foster City, CA.)
  • ABI/PARACEL FDF: A Fast Data Finder useful in comparing and annotating amino acid or nucleic acid sequences. (Applied Biosystems, Foster City, CA; Paracel Inc., Pasadena, CA.) Parameter threshold: mismatch <50%.
  • ABI AutoAssembler: A program that assembles nucleic acid sequences. (Applied Biosystems, Foster City, CA.)
  • fastx score 100 or greater
  • Phred: A base-calling algorithm that examines automated sequencer traces with high sensitivity and probability. (Ewing, B. et al. (1998) Genome Res. 8:175-185; Ewing, B. and P. Green (1998) Genome Res. 8:186-194.)
  • TMHMMER: A program that uses a hidden Markov model (HMM) to delineate transmembrane segments on protein sequences and determine orientation. (Sonnhammer, E.L. et al. (1998) Proc. Sixth Intl. Conf. on Intelligent Systems for Mol. Biol., Glasgow et al., eds., The Am. Assoc. for Artificial Intelligence Press, Menlo Park, CA, pp. 175-182.)
  • HMM hidden Markov model
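The epitope-selection steps above rely on LASERGENE (DNASTAR), and its scoring method is not given here. As an assumed stand-in, the sketch below uses the Kyte-Doolittle hydropathy scale to scan a sequence for the most hydrophilic window of about 15 residues, which matches the oligopeptide length cited above. The helper name `most_hydrophilic_window` and the example sequence are hypothetical. Hydrophilicity is also only one of several criteria usually weighed when choosing a peptide antigen.

```python
# Assumption: the Kyte-Doolittle hydropathy scale stands in for the
# (unspecified) LASERGENE immunogenicity analysis. Negative = hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophilic_window(seq: str, width: int = 15) -> tuple[int, str, float]:
    """Return (0-based start, peptide, mean hydropathy) of the lowest-scoring window.

    Hypothetical helper for illustration only.
    """
    seq = seq.upper()
    if len(seq) < width:
        raise ValueError("sequence shorter than window")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"unrecognized residues: {sorted(unknown)}")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    # Slide the window one residue at a time, updating the sum incrementally.
    window_sum = sum(scores[:width])
    best_start, best_sum = 0, window_sum
    for start in range(1, len(seq) - width + 1):
        window_sum += scores[start + width - 1] - scores[start - 1]
        if window_sum < best_sum:  # strict: ties keep the earliest window
            best_start, best_sum = start, window_sum
    return best_start, seq[best_start:best_start + width], best_sum / width
```

For example, `most_hydrophilic_window("L" * 20 + "DEKRN" * 3 + "L" * 20)` selects the charged DEKRN stretch starting at index 20. A fuller treatment would also weigh surface exposure and proximity to the C-terminus, as the text notes.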
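The binding step above states that data at several CSAP concentrations are used to calculate the number and affinity of binding sites, but it does not say how. One classical option is a Scatchard plot for a single class of independent sites. This option is an assumption, not something the source specifies. Bound/free = (Bmax - bound)/Kd, so a straight-line fit of B/F against B has slope -1/Kd and intercept Bmax/Kd. The sketch below uses ordinary least squares. The name `scatchard_fit` is hypothetical, and nonlinear fitting of the saturation curve is generally preferred for noisy data.

```python
def scatchard_fit(bound: list[float], free: list[float]) -> tuple[float, float]:
    """Estimate (Kd, Bmax) for one class of sites via a Scatchard plot.

    Fits B/F = Bmax/Kd - B/Kd by ordinary least squares (hypothetical helper).
    """
    if len(bound) != len(free) or len(bound) < 2:
        raise ValueError("need at least two paired (bound, free) points")
    x = bound
    y = [b / f for b, f in zip(bound, free)]  # bound/free ratio
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("non-negative slope: data not consistent with saturable binding")
    kd = -1.0 / slope          # slope = -1/Kd
    bmax = intercept * kd      # intercept = Bmax/Kd
    return kd, bmax


if __name__ == "__main__":
    # Synthetic, noise-free data generated from Kd = 2.0 and Bmax = 100.0 (arbitrary units).
    free = [0.5, 1.0, 2.0, 4.0, 8.0]
    bound = [100.0 * f / (2.0 + f) for f in free]
    print(scatchard_fit(bound, free))
```

With noise-free one-site data, the fit recovers the generating Kd and Bmax. Real data with multiple site classes give a curved Scatchard plot, which this linear sketch does not handle.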
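The motility assay above ties CSAP activity to the frequency and velocity of microtubule movement, but it leaves the image-analysis step unspecified. The sketch below covers only that last step and assumes each microtubule has already been tracked frame by frame. Speed is computed as path length divided by elapsed time. "Frequency" is approximated as the fraction of microtubules moving faster than a cutoff. The function names, the data layout, and the cutoff are all hypothetical.

```python
import math

# Hypothetical data layout: one (x, y) position in micrometres per video frame.
Track = list[tuple[float, float]]


def gliding_velocity(track: Track, frame_interval_s: float) -> float:
    """Mean speed of one microtubule in µm/s: total path length / elapsed time."""
    if len(track) < 2:
        raise ValueError("need at least two positions")
    if frame_interval_s <= 0:
        raise ValueError("frame interval must be positive")
    path_um = sum(math.dist(p, q) for p, q in zip(track, track[1:]))
    return path_um / (frame_interval_s * (len(track) - 1))


def motile_fraction(tracks: list[Track], frame_interval_s: float,
                    min_speed_um_s: float = 0.05) -> float:
    """Fraction of tracked microtubules moving faster than a cutoff.

    The 0.05 µm/s default is an arbitrary, hypothetical threshold separating
    gliding microtubules from stationary ones.
    """
    if not tracks:
        raise ValueError("no tracks supplied")
    moving = sum(1 for t in tracks if gliding_velocity(t, frame_interval_s) > min_speed_um_s)
    return moving / len(tracks)
```

Summing frame-to-frame steps measures path length rather than net displacement, so a microtubule that wanders back and forth still registers its full gliding speed.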
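The Phred entry above describes base calling only in general terms. Phred reports each base call's quality as Q = -10·log10(P), where P is the estimated probability that the call is wrong (Ewing and Green, 1998, cited above). So Q = 20 corresponds to a 1-in-100 error chance and Q = 30 to a 1-in-1,000 chance. The helpers below only convert between the two scales and sum per-base error probabilities into an expected error count. Their names are hypothetical, and they do not reproduce Phred's trace analysis.

```python
import math


def phred_quality(error_prob: float) -> float:
    """Phred quality Q = -10 * log10(P) for a base-call error probability P."""
    if not 0.0 < error_prob <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(error_prob)


def error_probability(quality: float) -> float:
    """Inverse mapping: P = 10 ** (-Q / 10)."""
    return 10.0 ** (-quality / 10.0)


def expected_errors(qualities: list[float]) -> float:
    """Expected number of miscalled bases in a read (sum of per-base error probabilities)."""
    return sum(error_probability(q) for q in qualities)
```

Summing probabilities is valid by linearity of expectation, whether or not per-base errors are independent.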

Landscapes

  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Organic Chemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • General Chemical & Material Sciences (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Animal Behavior & Ethology (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Neurology (AREA)
  • Biomedical Technology (AREA)
  • Neurosurgery (AREA)
  • Physical Education & Sports Medicine (AREA)
  • Psychology (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Orthopedic Medicine & Surgery (AREA)
  • Dermatology (AREA)
  • Virology (AREA)
  • Biochemistry (AREA)
  • Communicable Diseases (AREA)
  • Diabetes (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Pain & Pain Management (AREA)
  • Ophthalmology & Optometry (AREA)
  • Cardiology (AREA)
  • Vascular Medicine (AREA)
  • Urology & Nephrology (AREA)
  • Toxicology (AREA)
  • Zoology (AREA)
  • Hematology (AREA)
  • Oncology (AREA)
  • Biophysics (AREA)

Abstract

The invention provides human cytoskeleton-associated proteins (CSAP) and polynucleotides which identify and encode CSAP. The invention also provides expression vectors, host cells, antibodies, agonists, and antagonists. The invention also provides methods for diagnosing, treating, or preventing disorders associated with aberrant expression of CSAP.

Description

CYTOSKELETON-ASSOCIATED PROTEINS
TECHNICAL FIELD
This invention relates to nucleic acid and amino acid sequences of cytoskeleton-associated proteins and to the use of these sequences in the diagnosis, treatment, and prevention of cell proliferative disorders, viral infections, and neurological disorders, and in the assessment of the effects of exogenous compounds on the expression of nucleic acid and amino acid sequences of cytoskeleton-associated proteins.
BACKGROUND OF THE INVENTION
Translocation of components within the cell is critical for maintaining cell structure and function. Cellular components such as proteins and membrane-bound organelles are transported along well-defined routes to specific subcellular compartments. Intracellular transport mechanisms use microtubules, which are filamentous polymers that serve as tracks for directing the movement of molecules. Molecular transport is driven by the microtubule-based motor proteins kinesin and dynein. These proteins use the energy derived from ATP hydrolysis to move unidirectionally along microtubules and to transport molecular cargo to specific destinations.
The cytoskeleton is a cytoplasmic network of protein fibers that mediate cell shape, structure, and movement. The cytoskeleton supports the cell membrane and forms tracks along which organelles and other elements move in the cytosol. The cytoskeleton is a dynamic structure that allows cells to adopt various shapes and to carry out directed movements. Major cytoskeletal fibers include the microtubules, the microfilaments, and the intermediate filaments. Motor proteins, including myosin, dynein, and kinesin, drive movement of or along the fibers. The motor protein dynamin drives the formation of membrane vesicles. Accessory or associated proteins modify the structure or activity of the fibers, while cytoskeletal membrane anchors connect the fibers to the cell membrane.
Microtubules and Associated Proteins
Tubulins
Microtubules, cytoskeletal fibers with a diameter of about 24 nm, have multiple roles in the cell. Bundles of microtubules form cilia and flagella, which are whip-like extensions of the cell membrane that are necessary for sweeping materials across an epithelium and for the swimming of sperm, respectively. Marginal bands of microtubules in red blood cells and platelets are important for the pliability of these cells. Organelles, membrane vesicles, and proteins are transported in the cell along tracks of microtubules. For example, microtubules run through nerve cell axons, allowing bidirectional transport of materials and membrane vesicles between the cell body and the nerve terminal. Failure to supply the nerve terminal with these vesicles blocks the transmission of neural signals. Microtubules are also critical to chromosomal movement during cell division. Both stable and short-lived populations of microtubules exist in the cell. Microtubules are polymers of GTP-binding tubulin protein subunits. Each subunit is a heterodimer of α- and β-tubulin, multiple isoforms of which exist. The hydrolysis of GTP is linked to the addition of tubulin subunits at the end of a microtubule. The subunits interact head to tail to form protofilaments; the protofilaments interact side to side to form a microtubule. A microtubule is polarized, one end ringed with α-tubulin and the other with β-tubulin, and the two ends differ in their rates of assembly. Generally, each microtubule is composed of 13 protofilaments, although microtubules with 11 or 15 protofilaments are sometimes found. Cilia and flagella contain doublet microtubules. Microtubules grow from specialized structures known as centrosomes or microtubule-organizing centers (MTOCs). MTOCs may contain one or two centrioles, which are pinwheel arrays of triplet microtubules. The basal body, the organizing center located at the base of a cilium or flagellum, contains one centriole.
Gamma tubulin present in the MTOC is important for nucleating the polymerization of α- and β-tubulin heterodimers but does not polymerize into microtubules. The protein pericentrin is found in the MTOC and has a role in microtubule assembly.
Microtubule-Associated Proteins
Microtubule-associated proteins (MAPs) have roles in the assembly and stabilization of microtubules. One major family of MAPs, assembly MAPs, can be identified in neurons as well as non-neuronal cells. Assembly MAPs are responsible for cross-linking microtubules in the cytosol. These MAPs are organized into two domains: a basic microtubule-binding domain and an acidic projection domain. The projection domain is the binding site for membranes, intermediate filaments, or other microtubules. Based on sequence analysis, assembly MAPs can be further grouped into two types: Type I and Type II. Type I MAPs, which include MAP1A and MAP1B, are large, filamentous molecules that co-purify with microtubules and are abundantly expressed in brain and testes. Type I MAPs contain several repeats of a positively charged amino acid sequence motif that binds and neutralizes negatively charged tubulin, leading to stabilization of microtubules. MAP1A and MAP1B are each derived from a single precursor polypeptide that is subsequently proteolytically processed to generate one heavy chain and one light chain.
Another light chain, LC3, is a 16.4-kDa molecule that binds MAP1A, MAP1B, and microtubules. It has been suggested that LC3 is synthesized from a source other than the MAP1A or MAP1B transcripts, and that the expression of LC3 may be important in regulating the microtubule-binding activity of MAP1A and MAP1B during cell proliferation (Mann, S.S. et al. (1994) J. Biol. Chem. 269:11492-11497).
Type II MAPs, which include MAP2a, MAP2b, MAP2c, MAP4, and Tau, are characterized by three to four copies of an 18-residue sequence in the microtubule-binding domain. MAP2a, MAP2b, and MAP2c are found only in dendrites, MAP4 is found in non-neuronal cells, and Tau is found in axons and dendrites of nerve cells. Alternative splicing of the Tau mRNA leads to the existence of multiple forms of Tau protein. Tau phosphorylation is altered in neurodegenerative disorders such as Alzheimer's disease, Pick's disease, progressive supranuclear palsy, corticobasal degeneration, and familial frontotemporal dementia and Parkinsonism linked to chromosome 17. The altered Tau phosphorylation leads to a collapse of the microtubule network and the formation of intraneuronal Tau aggregates (Spillantini, M.G. and M. Goedert (1998) Trends Neurosci. 21:428-433).
The cytoplasmic linker protein (CLIP-170) links endocytic vesicles to microtubules. CLIP-170 may also link microtubule ends to actin cables, thus playing a role in directional cell movement (Goode, B.L. et al. (2000) Curr. Opin. Cell Biol. 12:63-71). CLIP-170 proteins contain two copies of the CAP-Gly domain, a conserved, glycine-rich domain of about 42 residues found in several cytoskeleton-associated proteins (Prosite PDOC00660 CAP-Gly domain signature).
Another microtubule-associated protein, STOP (stable tubule only polypeptide), is a calmodulin-regulated protein that regulates microtubule stability (Denarier, E. et al. (1998) Biochem. Biophys. Res. Commun. 24:791-796). To maintain conductive connections over great distances, neurons rely on axodendritic extensions, which in turn are supported by microtubules. STOP proteins function to stabilize the microtubule network. STOP proteins are associated with axonal microtubules and are abundant in neurons (Guillaud, L. et al. (1998) J. Cell Biol. 142:167-179). STOP proteins are necessary for normal neurite formation and have been observed to stabilize microtubules in vitro against cold-, calcium-, or drug-induced disassembly (Margolis, R.L. et al. (1990) EMBO 9:4095-502).
Microfilaments and Associated Proteins
Actins
Microfilaments, cytoskeletal filaments with a diameter of about 7-9 nm, are vital to cell locomotion, cell shape, cell adhesion, cell division, and muscle contraction. Assembly and disassembly of the microfilaments allow cells to change their morphology. Microfilaments are the polymerized form of actin, the most abundant intracellular protein in the eukaryotic cell. Human cells contain six isoforms of actin. The three α-actins are found in different kinds of muscle, nonmuscle β-actin and nonmuscle γ-actin are found in nonmuscle cells, and another γ-actin is found in intestinal smooth muscle cells. G-actin, the monomeric form of actin, polymerizes into polarized, helical F-actin filaments, accompanied by the hydrolysis of ATP to ADP. Actin filaments associate to form bundles and networks, providing a framework to support the plasma membrane and determine cell shape. These bundles and networks are connected to the cell membrane. In muscle cells, thin filaments containing actin slide past thick filaments containing the motor protein myosin during contraction. A family of actin-related proteins exists whose members are not part of the actin cytoskeleton but rather associate with microtubules and dynein.
Actin-Associated Proteins
Actin-associated proteins have roles in cross-linking, severing, and stabilization of actin filaments and in sequestering actin monomers. Several of the actin-associated proteins have multiple functions. Bundles and networks of actin filaments are held together by actin cross-linking proteins. These proteins have two actin-binding sites, one for each filament. Short cross-linking proteins promote bundle formation, while longer, more flexible cross-linking proteins promote network formation. Actin-interacting proteins (AIPs) participate in the regulation of actin filament organization. Other actin-associated proteins, such as TARA, a novel F-actin-binding protein, function in a similar capacity by regulating actin cytoskeletal organization. Calmodulin-like calcium-binding domains in actin cross-linking proteins allow calcium regulation of cross-linking. Group I cross-linking proteins have unique actin-binding domains and include the 30-kDa protein, EF-1α, fascin, and scruin. Group II cross-linking proteins have a 7,000-MW actin-binding domain and include villin and dematin. Group III cross-linking proteins have pairs of a 26,000-MW actin-binding domain and include fimbrin, spectrin, dystrophin, ABP 120, and filamin.
Severing proteins regulate the length of actin filaments by breaking them into short pieces or by blocking their ends. Severing proteins include gCAP39, severin (fragmin), gelsolin, and villin. Capping proteins can cap the ends of actin filaments, but cannot break filaments. Capping proteins include CapZ and tropomodulin. The proteins thymosin and profilin sequester actin monomers in the cytosol, allowing a pool of unpolymerized actin to exist. The actin-associated proteins tropomyosin, troponin, and caldesmon regulate muscle contraction in response to calcium.
Microtubule and actin filament networks cooperate in processes such as vesicle and organelle transport, cleavage furrow placement, directed cell migration, spindle rotation, and nuclear migration. Microtubules and actin may coordinate to transport vesicles, organelles, and cell fate determinants, or transport may involve targeting and capture of microtubule ends at cortical actin sites. These cytoskeletal systems may be bridged by myosin-kinesin complexes, myosin-CLIP170 complexes, formin-homology (FH) proteins, dynein, the dynactin complex, Kar9p, coronin, ERM proteins, and kelch repeat-containing proteins (for a review, see Goode, B.L. et al. (2000) Curr. Opin. Cell Biol. 12:63-71). The kelch repeat is a motif originally observed in the kelch protein, which is involved in formation of cytoplasmic bridges called ring canals. A variety of mammalian and other kelch family proteins have been identified. The kelch repeat domain is believed to mediate interaction with actin (Robinson, D.N. and L. Cooley (1997) J. Cell Biol. 138:799-810).
ADF/cofilins are a family of conserved 15-18-kDa actin-binding proteins that play a role in cytokinesis, endocytosis, and the development of embryonic tissues, as well as in tissue regeneration and in pathologies such as ischemia and oxidative or osmotic stress. LIM kinase 1 downregulates ADF (Carlier, M.F. et al. (1999) J. Biol. Chem. 274:33827-33830).
The coronins are actin-binding proteins having a structure that contains five WD (Trp-Asp) repeats and is similar to the sequence of the β subunits of heterotrimeric G proteins. Dictyostelium mutants lacking coronin are impaired in all actin-mediated processes, including cell locomotion, cytokinesis, phagocytosis, and macropinocytosis. In human neutrophils, coronin 1 accumulates with F-actin around endocytic vesicles, suggesting an evolutionarily conserved role for coronin in endocytosis. Other coronin proteins have specific activities such as promotion of actin polymerization, actin cross-linking, and binding to microtubules.
LIM is an acronym of three transcription factors, Lin-11, Isl-1, and Mec-3, in which the motif was first identified. The LIM domain is a double zinc-finger motif that mediates the protein-protein interactions of transcription factors, signaling proteins, and cytoskeleton-associated proteins (Roof, D.J. et al. (1997) J. Cell Biol. 138:575-588). These proteins are distributed in the nucleus, cytoplasm, or both (Brown, S. et al. (1999) J. Biol. Chem. 274:27083-27091). Recently, ALP (actinin-associated LIM protein) has been shown to bind alpha-actinin-2 (Bouju, S. et al. (1999) Neuromuscul. Disord. 9:3-10).
The Frabin protein is another example of an actin-filament-binding protein (Obaishi, H. et al. (1998) J. Biol. Chem. 273:18697-18700). Frabin (FGD1-related F-actin-binding protein) possesses one actin-filament-binding (FAB) domain, one Dbl homology (DH) domain, two pleckstrin homology (PH) domains, and a single cysteine-rich FYVE (Fab1p, YOTB, Vac1p, and EEA1 (early endosomal antigen 1)) domain. Frabin has shown GDP/GTP exchange activity for the small G protein Cdc42 and indirectly induces activation of the small G protein Rac in intact cells. Through the activation of Cdc42 and Rac, Frabin is able to induce formation of both filopodia- and lamellipodia-like processes (Ono, Y. et al. (2000) Oncogene 19:3050-3058). Members of the Rho family of small GTP-binding proteins are important regulators of actin-dependent cell functions, including cell shape change, adhesion, and motility. The Rho family consists of three major subfamilies: Cdc42, Rac, and Rho. Rho family members cycle between GDP-bound inactive and GTP-bound active forms, and the exchange of GDP for GTP is catalyzed by a GDP/GTP exchange factor (GEF) (Umikawa, M. et al. (1999) J. Biol. Chem. 274:25197-25200). The Rho GEF family is crucial for microfilament organization.
Intermediate Filaments and Associated Proteins
Intermediate filaments (IFs) are cytoskeletal fibers with a diameter of about 10 nm, intermediate between those of microfilaments and microtubules. IFs serve structural roles in the cell, reinforcing cells and organizing cells into tissues. IFs are particularly abundant in epidermal cells and in neurons. IFs are extremely stable and, in contrast to microfilaments and microtubules, do not function in cell motility.
Five types of IF proteins are known in mammals. Type I and Type II proteins are the acidic and basic keratins, respectively. Heterodimers of the acidic and basic keratins are the building blocks of keratin IFs. Keratins are abundant in soft epithelia such as skin and cornea, hard epithelia such as nails and hair, and in epithelia that line internal body cavities. Mutations in keratin genes lead to epithelial diseases including epidermolysis bullosa simplex, bullous congenital ichthyosiform erythroderma (epidermolytic hyperkeratosis), non-epidermolytic and epidermolytic palmoplantar keratoderma, ichthyosis bullosa of Siemens, pachyonychia congenita, and white sponge nevus. Some of these diseases result in severe skin blistering. (See, e.g., Wawersik, M. et al. (1997) J. Biol. Chem. 272:32557-32565; and Corden, L.D. and W.H. McLean (1996) Exp. Dermatol. 5:297-307.)
Type III IF proteins include desmin, glial fibrillary acidic protein, vimentin, and peripherin. Desmin filaments in muscle cells link myofibrils into bundles and stabilize sarcomeres in contracting muscle. Glial fibrillary acidic protein filaments are found in astrocytes and other glial cells that surround neurons. Vimentin filaments are found in blood vessel endothelial cells, some epithelial cells, and mesenchymal cells such as fibroblasts, and are commonly associated with microtubules. Vimentin filaments may have roles in keeping the nucleus and other organelles in place in the cell. Type IV IFs include the neurofilaments and nestin. Neurofilaments, composed of three polypeptides, NF-L, NF-M, and NF-H, are frequently associated with microtubules in axons. Neurofilaments are responsible for the radial growth and diameter of an axon, and ultimately for the speed of nerve impulse transmission. Changes in phosphorylation and metabolism of neurofilaments are observed in neurodegenerative diseases including amyotrophic lateral sclerosis, Parkinson's disease, and Alzheimer's disease (Julien, J.P. and W.E. Mushynski (1998) Prog. Nucleic Acid Res. Mol. Biol. 61:1-23). Type V IFs, the lamins, are found in the nucleus, where they support the nuclear membrane. IFs have a central α-helical rod region interrupted by short nonhelical linker segments. The rod region is bracketed, in most cases, by nonhelical head and tail domains. The rod regions of intermediate filament proteins associate to form a coiled-coil dimer. A highly ordered assembly process leads from the dimers to the IFs. Unlike the assembly of microfilaments and microtubules, IF assembly requires neither ATP nor GTP. IF-associated proteins (IFAPs) mediate the interactions of IFs with one another and with other cell structures. IFAPs cross-link IFs into a bundle, into a network, or to the plasma membrane, and may cross-link IFs to the microfilament and microtubule cytoskeleton.
Microtubules and IFs are particularly closely associated. IFAPs include BPAG1, plakoglobin, desmoplakin I, desmoplakin II, plectin, ankyrin, filaggrin, and lamin B receptor.
Cytoskeletal-Membrane Anchors
Cytoskeletal fibers are attached to the plasma membrane by specific proteins. These attachments are important for maintaining cell shape and for muscle contraction. In erythrocytes, the spectrin-actin cytoskeleton is attached to the cell membrane by three proteins, band 4.1, ankyrin, and adducin. Defects in this attachment result in abnormally shaped cells, which are more rapidly degraded by the spleen, leading to anemia. In platelets, the spectrin-actin cytoskeleton is also linked to the membrane by ankyrin; a second actin network is anchored to the membrane by filamin. In muscle cells, the protein dystrophin links actin filaments to the plasma membrane; mutations in the dystrophin gene lead to Duchenne muscular dystrophy.
Focal Adhesions
Focal adhesions are specialized structures in the plasma membrane involved in the adhesion of a cell to a substrate, such as the extracellular matrix (ECM). Focal adhesions form the connection between an extracellular substrate and the cytoskeleton, and affect such functions as cell shape, cell motility, and cell proliferation. Transmembrane integrin molecules form the basis of focal adhesions. Upon ligand binding, integrins cluster in the plane of the plasma membrane. Cytoskeletal linker proteins such as the actin-binding proteins α-actinin, talin, tensin, vinculin, paxillin, and filamin are recruited to the clustering site. Key regulatory proteins, such as Rho and Ras family proteins, focal adhesion kinase, and Src family members, are also recruited. These events lead to the reorganization of actin filaments and the formation of stress fibers. These intracellular rearrangements promote further integrin-ECM interactions and integrin clustering. Thus, integrins mediate aggregation of protein complexes on both the cytosolic and extracellular faces of the plasma membrane, leading to the assembly of the focal adhesion. Many signal transduction responses are mediated via various adhesion complex proteins, including Src, FAK, paxillin, and tensin. (For a review, see Yamada, K.M. and B. Geiger (1997) Curr. Opin. Cell Biol. 9:76-85.)
IFs are also attached to membranes by cytoskeletal-membrane anchors. The nuclear lamina is attached to the inner surface of the nuclear membrane by the lamin B receptor. Vimentin IFs are attached to the plasma membrane by ankyrin and plectin. Desmosome and hemidesmosome membrane junctions hold together epithelial cells of organs and skin. These membrane junctions allow shear forces to be distributed across the entire epithelial cell layer, thus providing strength and rigidity to the epithelium. IFs in epithelial cells are attached to the desmosome by plakoglobin and desmoplakins. The proteins that link IFs to hemidesmosomes are not known.
Desmin IFs surround the sarcomere in muscle and are linked to the plasma membrane by paranemin, synemin, and ankyrin.
Ankyrin
Associations between the cytoskeleton and the lipid membranes bounding intracellular compartments involve spectrin, ankyrin, and integral membrane proteins. Spectrin is a major component of the cytoskeleton and acts as a scaffolding protein. Ankyrin, in turn, tethers the actin-spectrin moiety to membranes and regulates the interaction between the cytoskeleton and membranous compartments. Different ankyrin isoforms are specific to different organelles and provide specificity for this interaction. Ankyrin also contains a regulatory domain that can respond to cellular signals, allowing remodeling of the cytoskeleton during the cell cycle and differentiation (Lambert, S. and Bennett, V. (1993) Eur. J. Biochem. 211:1-6).
Ankyrins have three basic structural components. The N-terminal portion of ankyrin consists of a repeated 33-amino acid motif, the ankyrin repeat, which is involved in specific protein-protein interactions. Variable regions within the motif are responsible for specific protein binding, such that different ankyrin repeats are involved in binding to tubulin, anion exchange protein, voltage-gated sodium channel, Na+/K+-ATPase, and neurofascin. The ankyrin motif is also found in transcription factors, such as NF-κB, and in the yeast cell cycle proteins CDC10, SWI4, and SWI6. Proteins involved in tissue differentiation, such as Drosophila Notch and C. elegans LIN-12 and GLP-1, also contain ankyrin-like repeats. Lux et al. (1990; Nature 344:36-42) suggest that ankyrin-like repeats function as 'built-in' ankyrins and form binding sites for integral membrane proteins, tubulin, and other proteins.
The central domain of ankyrin is required for binding spectrin. This domain consists of an acidic region, primarily responsible for binding spectrin, and a basic region. Phosphorylation within the central domain may regulate spectrin binding. The C-terminal domain regulates ankyrin function. The C-terminally deleted ankyrin, protein 2.2, behaves as a constitutively active ankyrin, displaying increased membrane and spectrin binding. The C-terminal domain is divergent among ankyrin family members, and tissue-specific alternative splicing generates modified C-termini with acidic or basic characteristics (Lambert, supra). Three ankyrin proteins, ANK1, ANK2, and ANK3, have been described, which differ in their tissue-specific and subcellular localization patterns. ANK1, erythrocyte protein 2.1, is involved in protecting red cells from circulatory shear stresses and in helping maintain the erythrocyte's unique biconcave shape. ANK1 deficiency has been linked to hereditary hemolytic anemias, such as hereditary spherocytosis (HS), and to a neurodegenerative disorder involving loss of Purkinje cells (Lambert, supra). ANK2 is the major nervous tissue ankyrin. Two alternative splice variants are generated from the ANK2 gene. Brain ankyrin 1 (brank1), which is expressed in adults, is similar to ANK1 in the N-terminal and central domains but has an entirely dissimilar regulatory domain. An early neuronal form, brank2, includes an additional motif between the spectrin-binding and regulatory domains. An ankyrin homolog in C. elegans, unc-44, produces alternative splice variants similar to those of ANK2. Mutations in the unc-44 gene affect the direction of axonal outgrowth (Otsuka, A.J. et al. (1995) J. Cell Biol. 129:1081-1092).
ANK3 consists of four ankyrin isoforms (G100, G119, G120, and G195), which localize to intracellular compartments and are implicated in vesicular transport. AnkG119 is associated with the Golgi, has a truncated N-terminal domain, and lacks a C-terminal regulatory domain. AnkG120 and AnkG100 associate with late endolysosomes in macrophages, lack N-terminal ankyrin repeats, but contain both the spectrin-binding and regulatory domains characteristic of ANK1 and ANK2. AnkG195 is associated with the trans-Golgi network (TGN). These ankyrin isoforms are part of a spectrin complex which may mediate transport of proteins through the Golgi complex. A spectrin-ankyrin-adapter protein trafficking system (SAATS) has been proposed for the selective sequestration of membrane proteins into vesicles destined for transport from the ER to the Golgi and beyond. In this model, intra-Golgi, TGN, and plasma membrane transport would involve exchange of SAATS protein components, including ankyrin isoforms, to specify and distinguish the final destination for vesicular cargo (DeMatteis, M.A. and Morrow, J.S. (1998) Curr. Opin. Cell Biol. 10:542-549).
Motor Proteins
Myosin-related Motor Proteins
Myosins are actin-activated ATPases, found in eukaryotic cells, that couple hydrolysis of ATP with motion. Myosin provides the motor function for muscle contraction and intracellular movements such as phagocytosis and rearrangement of cell contents during mitotic cell division (cytokinesis). The contractile unit of skeletal muscle, termed the sarcomere, consists of highly ordered arrays of thin actin-containing filaments and thick myosin-containing filaments. Crossbridges form between the thick and thin filaments, and the ATP-dependent movement of myosin heads within the thick filaments pulls the thin filaments, shortening the sarcomere and thus the muscle fiber. Myosins are composed of one or two heavy chains and associated light chains. Myosin heavy chains contain an amino-terminal motor or head domain, a neck that is the site of light-chain binding, and a carboxy-terminal tail domain. The tail domains may associate to form an α-helical coiled coil. Conventional myosins, such as those found in muscle tissue, are composed of two myosin heavy-chain subunits, each associated with two light-chain subunits that bind at the neck region and play a regulatory role. Unconventional myosins, believed to function in intracellular motion, may contain either one or two heavy chains and associated light chains. There is evidence for about 25 myosin heavy chain genes in vertebrates, more than half of them unconventional.
Dynein-related Motor Proteins
Dyneins are (-) end-directed motor proteins which act on microtubules. Two classes of dyneins, cytosolic and axonemal, have been identified. Cytosolic dyneins are responsible for translocation of materials along cytoplasmic microtubules, for example, transport from the nerve terminal to the cell body and transport of endocytic vesicles to lysosomes. In addition, viruses often take advantage of cytoplasmic dyneins to be transported to the nucleus and establish a successful infection (Sodeik, B. et al. (1997) J. Cell Biol. 136:1007-1021). Virion proteins of herpes simplex virus 1, for example, interact with the cytoplasmic dynein intermediate chain (Ye, G.J. et al. (2000) J. Virol. 74:1355-1363). Cytoplasmic dyneins are also reported to play a role in mitosis. Axonemal dyneins are responsible for the beating of flagella and cilia. Dynein on one microtubule doublet walks along the adjacent microtubule doublet. This sliding force produces bending that causes the flagellum or cilium to beat. Dyneins have a native mass between 1000 and 2000 kDa and contain either two or three force-producing heads driven by the hydrolysis of ATP. The heads are linked via stalks to a basal domain which is composed of a highly variable number of accessory intermediate and light chains. Cytoplasmic dynein is the largest and most complex of the motor proteins.
Kinesin-related Motor Proteins
Kinesins are (+) end-directed motor proteins which act on microtubules. The prototypical kinesin molecule is involved in the transport of membrane-bound vesicles and organelles. This function is particularly important for axonal transport in neurons. Kinesin is also important in all cell types for the transport of vesicles from the Golgi complex to the endoplasmic reticulum. This role is critical for maintaining the identity and functionality of these secretory organelles. Kinesins define a ubiquitous, conserved family of over 50 proteins that can be classified into at least 8 subfamilies based on primary amino acid sequence, domain structure, velocity of movement, and cellular function. (Reviewed in Moore, J.D. and S.A. Endow (1996) Bioessays 18:207-219; and Hoyt, A.M. (1994) Curr. Opin. Cell Biol. 6:63-68.) The prototypical kinesin molecule is a heterotetramer composed of two heavy polypeptide chains (KHCs) and two light polypeptide chains (KLCs). The KHC subunits are typically referred to as "kinesin." KHC is about 1000 amino acids in length, and KLC is about 550 amino acids in length.
Two KHCs dimerize to form a rod-shaped molecule with three distinct regions of secondary structure. At one end of the molecule is a globular motor domain that functions in ATP hydrolysis and microtubule binding. Kinesin motor domains are highly conserved and share over 70% identity. Beyond the motor domain is an α-helical coiled-coil region which mediates dimerization. At the other end of the molecule is a fan-shaped tail that associates with molecular cargo. The tail is formed by the interaction of the KHC C-termini with the two KLCs.
Members of the more divergent subfamilies of kinesins are called kinesin-related proteins (KRPs), many of which function during mitosis in eukaryotes (Hoyt, supra). Some KRPs are required for assembly of the mitotic spindle. In vivo and in vitro analyses suggest that these KRPs exert force on microtubules that comprise the mitotic spindle, resulting in the separation of spindle poles. Phosphorylation of these KRPs is required for this activity. Failure to assemble the mitotic spindle results in abortive mitosis and chromosomal aneuploidy, the latter condition being characteristic of cancer cells. In addition, a unique KRP, centromere protein E, localizes to the kinetochore of human mitotic chromosomes and may play a role in their segregation to opposite spindle poles.
Dynamin-related Motor Proteins
Dynamin is a large GTPase motor protein that functions as a "molecular pinchase," generating a mechanochemical force used to sever membranes. This activity is important in forming clathrin-coated vesicles from coated pits in endocytosis and in the biogenesis of synaptic vesicles in neurons. Binding of dynamin to a membrane leads to dynamin's self-assembly into spirals that may act to constrict a flat membrane surface into a tubule. GTP hydrolysis induces a change in conformation of the dynamin polymer that pinches the membrane tubule, leading to severing of the membrane tubule and formation of a membrane vesicle. Release of GDP and inorganic phosphate leads to dynamin disassembly. Following disassembly, dynamin may either dissociate from the membrane or remain associated with the vesicle and be transported to another region of the cell. Three homologous dynamin genes have been discovered, in addition to several dynamin-related proteins. Conserved dynamin regions are the N-terminal GTP-binding domain, a central pleckstrin homology domain that binds membranes, a central coiled-coil region that may activate dynamin's GTPase activity, and a C-terminal proline-rich domain that contains several motifs that bind SH3 domains on other proteins. Some dynamin-related proteins do not contain the pleckstrin homology domain or the proline-rich domain. (See McNiven, M.A. (1998) Cell 94:151-154; Scaife, R.M. and R.L. Margolis (1997) Cell. Signal. 9:395-401.)
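The order of events in the dynamin cycle described above can be summarized as a simple state machine. The following Python sketch is a simplified, hypothetical model: the state and event names are illustrative labels chosen here, not standard nomenclature, and the sketch captures only the sequence of steps, not kinetics or mechanics.

```python
from enum import Enum, auto


class DynaminState(Enum):
    SOLUBLE = auto()             # not bound to a membrane
    MEMBRANE_BOUND = auto()      # bound to a membrane
    SPIRAL = auto()              # self-assembled spiral constricting a membrane tubule
    TUBULE_SEVERED = auto()      # GTP hydrolyzed; conformational change pinched off a vesicle
    DISASSEMBLED = auto()        # GDP and inorganic phosphate released; polymer disassembled
    VESICLE_ASSOCIATED = auto()  # dynamin that remains on the newly formed vesicle


# (current state, event) -> next state. Event names are hypothetical labels.
TRANSITIONS = {
    (DynaminState.SOLUBLE, "bind_membrane"): DynaminState.MEMBRANE_BOUND,
    (DynaminState.MEMBRANE_BOUND, "self_assemble"): DynaminState.SPIRAL,
    (DynaminState.SPIRAL, "hydrolyze_gtp"): DynaminState.TUBULE_SEVERED,
    (DynaminState.TUBULE_SEVERED, "release_gdp_pi"): DynaminState.DISASSEMBLED,
    # After disassembly, dynamin either dissociates or stays with the vesicle.
    (DynaminState.DISASSEMBLED, "dissociate"): DynaminState.SOLUBLE,
    (DynaminState.DISASSEMBLED, "stay_on_vesicle"): DynaminState.VESICLE_ASSOCIATED,
}


def run_cycle(events, start=DynaminState.SOLUBLE):
    """Apply events in order and return every state visited; reject out-of-order steps."""
    state = start
    visited = [state]
    for event in events:
        try:
            state = TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"{event!r} is not allowed from {state.name}") from None
        visited.append(state)
    return visited


if __name__ == "__main__":
    path = run_cycle(["bind_membrane", "self_assemble", "hydrolyze_gtp",
                      "release_gdp_pi", "stay_on_vesicle"])
    print(" -> ".join(s.name for s in path))
```

Encoding the cycle as an explicit transition table makes the ordering constraint visible: for example, GTP hydrolysis is only accepted once a spiral has assembled on a membrane.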
The cytoskeleton is reviewed in Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York, NY.
Expression profiling
Array technology can provide a simple way to explore the expression of a single polymorphic gene or the expression profile of a large number of related or unrelated genes. When the expression of a single gene is examined, arrays are employed to detect the expression of a specific gene or its variants. When an expression profile is examined, arrays provide a platform for identifying genes that are tissue specific, are affected by a substance being tested in a toxicology assay, are part of a signaling cascade, carry out housekeeping functions, or are specifically related to a particular genetic predisposition, condition, disease, or disorder.
Lung cancer is the leading cause of cancer death for men and the second leading cause of cancer death for women in the U.S. The vast majority of lung cancer cases are attributed to smoking tobacco, and increased use of tobacco products in developing countries is projected to lead to an epidemic of lung cancer in these countries. Exposure of the bronchial epithelium to tobacco smoke appears to result in changes in tissue morphology, which are thought to be precursors of cancer. Lung cancers are divided into four histopathologically distinct groups. Three groups (squamous cell carcinoma, adenocarcinoma, and large cell carcinoma) are classified as non-small cell lung cancers (NSCLCs). The fourth group is referred to as small cell lung cancer (SCLC). Collectively, NSCLCs account for ~70% of cases, while SCLCs account for ~18% of cases. The molecular and cellular biology underlying the development and progression of lung cancer is incompletely understood. Analysis of gene expression patterns associated with the development and progression of the disease is expected to provide insight into the underlying biology and may lead to the development of improved diagnostics and therapeutics.
The discovery of new cytoskeleton-associated proteins, and the polynucleotides encoding them, satisfies a need in the art by providing new compositions which are useful in the diagnosis, prevention, and treatment of cell proliferative disorders, viral infections, and neurological disorders, and in the assessment of the effects of exogenous compounds on the expression of nucleic acid and amino acid sequences of cytoskeleton-associated proteins.
SUMMARY OF THE INVENTION
The invention features purified polypeptides, cytoskeleton-associated proteins, referred to collectively as "CSAP" and individually as "CSAP-1," "CSAP-2," "CSAP-3," "CSAP-4," "CSAP-5," "CSAP-6," "CSAP-7," "CSAP-8," "CSAP-9," "CSAP-10," "CSAP-11," "CSAP-12," "CSAP-13," "CSAP-14," "CSAP-15," "CSAP-16," "CSAP-17," "CSAP-18," "CSAP-19," "CSAP-20," "CSAP-21," "CSAP-22," "CSAP-23," "CSAP-24," "CSAP-25," "CSAP-26," "CSAP-27," and "CSAP-28." In one aspect, the invention provides an isolated polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28. In one alternative, the invention provides an isolated polypeptide comprising the amino acid sequence of SEQ ID NO: 1-28. The invention further provides an isolated polynucleotide encoding a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28.
In one alternative, the polynucleotide encodes a polypeptide selected from the group consisting of SEQ ID NO: 1-28. In another alternative, the polynucleotide is selected from the group consisting of SEQ ID NO:29-56.
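Several of the aspects above turn on a sequence being "at least 90% identical" to a listed sequence. As a minimal sketch, assuming the two sequences have already been aligned (the alignment tools and parameters actually used are those listed in Table 7, and they may treat gaps differently), percent identity can be computed as the fraction of identical aligned positions. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Assumed convention: identical columns divided by columns in which at least
    one sequence has a residue; columns that are gaps in both are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # skip all-gap columns
        columns += 1
        if x == y:  # all-gap columns were skipped, so this is a residue match
            matches += 1
    if columns == 0:
        raise ValueError("alignment contains no residues")
    return 100.0 * matches / columns


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Arbitrary toy peptides, not sequences from the listing: 9 of 10 positions match.
    print(percent_identity("MKTAYIAKQR", "MKTAYIAKQK"))
```

Because identity is a ratio over the alignment, the same pair of sequences can score differently under different alignment or gap-counting conventions; the threshold test is only as meaningful as the convention fixed for it.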
Additionally, the invention provides a recombinant polynucleotide comprising a promoter sequence operably linked to a polynucleotide encoding a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28. In one alternative, the invention provides a cell transformed with the recombinant polynucleotide. In another alternative, the invention provides a transgenic organism comprising the recombinant polynucleotide.
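The relationship between a polynucleotide and the polypeptide it encodes follows the genetic code. The sketch below translates a coding sequence using the standard genetic code; it is illustrative only and assumes a single reading frame starting at the first base, with no introns, alternative genetic codes, or post-translational processing. The example sequences are arbitrary and are not taken from SEQ ID NO:29-56.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str, stop_at_stop: bool = True) -> str:
    """Translate a DNA or RNA coding sequence in frame 1 using the standard code."""
    seq = cds.upper().replace("U", "T")  # accept RNA as well as DNA
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):  # ignore a trailing partial codon
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # "X" for codons with ambiguous bases
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCAAGTGA"))  # arbitrary toy coding sequence
```

Building the 64-entry table from the compact 64-letter string relies on `itertools.product` varying the last position fastest, which matches the conventional TCAG ordering of the codon table.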
The invention also provides a method for producing a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28. The method comprises a) culturing a cell under conditions suitable for expression of the polypeptide, wherein said cell is transformed with a recombinant polynucleotide comprising a promoter sequence operably linked to a polynucleotide encoding the polypeptide, and b) recovering the polypeptide so expressed.
Additionally, the invention provides an isolated antibody which specifically binds to a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28.
The invention further provides an isolated polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, c) a polynucleotide complementary to the polynucleotide of a), d) a polynucleotide complementary to the polynucleotide of b), and e) an RNA equivalent of a)-d). In one alternative, the polynucleotide comprises at least 60 contiguous nucleotides. Additionally, the invention provides a method for detecting a target polynucleotide in a sample, said target polynucleotide having a sequence of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, c) a polynucleotide complementary to the polynucleotide of a), d) a polynucleotide complementary to the polynucleotide of b), and e) an RNA equivalent of a)-d). The method comprises a) hybridizing the sample with a probe comprising at least 20 contiguous nucleotides comprising a sequence complementary to said target polynucleotide in the sample, and which probe specifically hybridizes to said target polynucleotide, under conditions whereby a hybridization complex is formed between said probe and said target polynucleotide or fragments thereof, and b) detecting the presence or absence of said hybridization complex, and optionally, if present, the amount thereof. In one alternative, the probe comprises at least 60 contiguous nucleotides.
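The probe-based detection described above relies on a probe whose sequence is complementary to the target. The following sketch, under simplifying assumptions, tiles 20-nucleotide probes complementary to a target, forms the RNA equivalent of a DNA sequence, and uses exact complementary string matching as a crude stand-in for specific hybridization. Actual hybridization depends on stringency, temperature, salt concentration, and tolerated mismatches, none of which is modeled; the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G, read 5' to 3')."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def rna_equivalent(dna: str) -> str:
    """RNA equivalent of a DNA sequence: thymine (T) replaced by uracil (U)."""
    return dna.upper().replace("T", "U")


def tile_probes(target: str, length: int = 20):
    """Yield (start, probe) pairs; each probe is complementary to target[start:start + length]."""
    target = target.upper()
    for start in range(len(target) - length + 1):
        yield start, reverse_complement(target[start:start + length])


def would_hybridize(probe: str, sample: str) -> bool:
    """Crude proxy for specific hybridization: the probe's exact complement occurs in the sample."""
    return reverse_complement(probe) in sample.upper()


if __name__ == "__main__":
    toy_target = "ATGCGTACGTTAGCCGATCGATCGGCTA"  # arbitrary sequence, not from the listing
    for start, probe in list(tile_probes(toy_target))[:3]:
        print(start, probe, would_hybridize(probe, toy_target))
```

Exact matching is deliberately strict: it accepts only perfectly complementary sites, whereas hybridization under lower-stringency conditions can also form complexes with partially mismatched sequences.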
The invention further provides a method for detecting a target polynucleotide in a sample, said target polynucleotide having a sequence of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, c) a polynucleotide complementary to the polynucleotide of a), d) a polynucleotide complementary to the polynucleotide of b), and e) an RNA equivalent of a)-d). The method comprises a) amplifying said target polynucleotide or fragment thereof using polymerase chain reaction amplification, and b) detecting the presence or absence of said amplified target polynucleotide or fragment thereof, and, optionally, if present, the amount thereof. The invention further provides a composition comprising an effective amount of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, and a pharmaceutically acceptable excipient. In one embodiment, the composition comprises an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28. The invention additionally provides a method of treating a disease or condition associated with decreased expression of functional CSAP, comprising administering to a patient in need of such treatment the composition.
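The amplification step of the PCR-based detection method can be illustrated with the textbook copy-number relationship: under idealized conditions, each cycle multiplies the number of target copies by (1 + E), where E is the per-cycle efficiency (E = 1 for perfect doubling), so that after n cycles the copy number is N0 × (1 + E)^n. This relationship is not stated in the source; the sketch below is a minimal illustration that ignores the plateau real reactions reach as reagents are depleted, and the function names are hypothetical.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` rounds: N0 * (1 + E) ** n."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(initial_copies: float, target_copies: float, efficiency: float = 1.0) -> int:
    """Smallest number of idealized cycles for the copy number to reach target_copies."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 0 and at most 1")
    if initial_copies <= 0:
        raise ValueError("initial_copies must be positive")
    cycles, copies = 0, float(initial_copies)
    # Iterate instead of taking logarithms, avoiding rounding errors at exact powers.
    while copies < target_copies:
        copies *= 1.0 + efficiency
        cycles += 1
    return cycles


if __name__ == "__main__":
    print(pcr_copies(100, 30))                      # ideal doubling for 30 cycles
    print(cycles_needed(100, 1e9, efficiency=0.9))  # cycles at 90% efficiency
```

The exponential form explains why small differences in starting template or efficiency become large differences in product after many cycles, which is also why quantitative use of PCR requires careful controls.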
The invention also provides a method for screening a compound for effectiveness as an agonist of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28. The method comprises a) exposing a sample comprising the polypeptide to a compound, and b) detecting agonist activity in the sample. In one alternative, the invention provides a composition comprising an agonist compound identified by the method and a pharmaceutically acceptable excipient. In another alternative, the invention provides a method of treating a disease or condition associated with decreased expression of functional CSAP, comprising administering to a patient in need of such treatment the composition. Additionally, the invention provides a method for screening a compound for effectiveness as an antagonist of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28.
The method comprises a) exposing a sample comprising the polypeptide to a compound, and b) detecting antagonist activity in the sample. In one alternative, the invention provides a composition comprising an antagonist compound identified by the method and a pharmaceutically acceptable excipient. In another alternative, the invention provides a method of treating a disease or condition associated with overexpression of functional CSAP, comprising administering to a patient in need of such treatment the composition.
The invention further provides a method of screening for a compound that specifically binds to a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28. The method comprises a) combining the polypeptide with at least one test compound under suitable conditions, and b) detecting binding of the polypeptide to the test compound, thereby identifying a compound that specifically binds to the polypeptide.
The invention further provides a method of screening for a compound that modulates the activity of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28. The method comprises a) combining the polypeptide with at least one test compound under conditions permissive for the activity of the polypeptide, b) assessing the activity of the polypeptide in the presence of the test compound, and c) comparing the activity of the polypeptide in the presence of the test compound with the activity of the polypeptide in the absence of the test compound, wherein a change in the activity of the polypeptide in the presence of the test compound is indicative of a compound that modulates the activity of the polypeptide.
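The comparison step of the modulator screen, activity in the presence versus the absence of the test compound, can be sketched as a percent change relative to the no-compound control. The 20% cutoff, the labels, and the function names are hypothetical; a real screen would also account for replicate variability, assay controls, and statistical significance.

```python
from statistics import mean


def percent_change(with_compound, without_compound) -> float:
    """Percent change in mean activity relative to the no-compound control."""
    baseline = mean(without_compound)
    if baseline == 0:
        raise ValueError("control activity must be non-zero")
    return 100.0 * (mean(with_compound) - baseline) / baseline


def call_modulation(with_compound, without_compound, min_percent: float = 20.0) -> str:
    """Label the direction of change; min_percent is a hypothetical cutoff."""
    change = percent_change(with_compound, without_compound)
    if change >= min_percent:
        return "increases activity"
    if change <= -min_percent:
        return "decreases activity"
    return "no change above cutoff"


if __name__ == "__main__":
    # Made-up replicate readings for illustration only.
    print(call_modulation([12.1, 11.8, 12.4], [8.0, 8.3, 7.9]))
```

Reporting the signed percent change, rather than only a yes/no call, keeps the direction of the effect, which is what distinguishes candidate agonists from candidate antagonists.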
The invention further provides a method for screening a compound for effectiveness in altering expression of a target polynucleotide, wherein said target polynucleotide comprises a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, the method comprising a) exposing a sample comprising the target polynucleotide to a compound, b) detecting altered expression of the target polynucleotide, and c) comparing the expression of the target polynucleotide in the presence of varying amounts of the compound and in the absence of the compound. The invention further provides a method for assessing toxicity of a test compound, said method comprising a) treating a biological sample containing nucleic acids with the test compound; b) hybridizing the nucleic acids of the treated biological sample with a probe comprising at least 20 contiguous nucleotides of a polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, iii) a polynucleotide having a sequence complementary to i), iv) a polynucleotide complementary to the polynucleotide of ii), and v) an RNA equivalent of i)-iv).
Hybridization occurs under conditions whereby a specific hybridization complex is formed between said probe and a target polynucleotide in the biological sample, said target polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, iii) a polynucleotide complementary to the polynucleotide of i), iv) a polynucleotide complementary to the polynucleotide of ii), and v) an RNA equivalent of i)-iv). Alternatively, the target polynucleotide comprises a fragment of a polynucleotide sequence selected from the group consisting of i)-v) above; c) quantifying the amount of hybridization complex; and d) comparing the amount of hybridization complex in the treated biological sample with the amount of hybridization complex in an untreated biological sample, wherein a difference in the amount of hybridization complex in the treated biological sample is indicative of toxicity of the test compound.
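The final comparison step of the toxicity assay can be sketched as a per-probe ratio of hybridization signal in treated versus untreated samples. This is a minimal sketch under assumptions not stated in the source: signals are already quantified and normalized, a pseudocount of 1 is added to avoid division by zero, and a twofold change (|log2 ratio| ≥ 1) is used as a hypothetical cutoff for "a difference." Probe identifiers and function names are hypothetical.

```python
import math


def log2_ratio(treated: float, untreated: float, pseudocount: float = 1.0) -> float:
    """log2 of treated/untreated hybridization signal; pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (untreated + pseudocount))


def flag_probes(treated_signals: dict, untreated_signals: dict, min_abs_log2: float = 1.0) -> dict:
    """Return {probe_id: log2 ratio} for probes whose signal changed at least 2**min_abs_log2-fold."""
    flagged = {}
    for probe_id, treated in treated_signals.items():
        ratio = log2_ratio(treated, untreated_signals[probe_id])
        if abs(ratio) >= min_abs_log2:
            flagged[probe_id] = ratio
    return flagged


if __name__ == "__main__":
    # Made-up normalized signals for three hypothetical probes.
    treated = {"probe_a": 820.0, "probe_b": 150.0, "probe_c": 40.0}
    untreated = {"probe_a": 200.0, "probe_b": 160.0, "probe_c": 300.0}
    print(flag_probes(treated, untreated))
```

Using a log ratio makes increases and decreases symmetric (a twofold rise gives +1, a twofold fall gives -1), so a single absolute cutoff treats both directions of change alike.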
BRIEF DESCRIPTION OF THE TABLES
Table 1 summarizes the nomenclature for the full-length polynucleotide and polypeptide sequences of the present invention. Table 2 shows the GenBank identification number and annotation of the nearest GenBank homolog for polypeptides of the invention. The probability scores for the matches between each polypeptide and its homolog(s) are also shown.
Table 3 shows structural features of polypeptide sequences of the invention, including predicted motifs and domains, along with the methods, algorithms, and searchable databases used for analysis of the polypeptides.
Table 4 lists the cDNA and/or genomic DNA fragments which were used to assemble polynucleotide sequences of the invention, along with selected fragments of the polynucleotide sequences. Table 5 shows the representative cDNA library for polynucleotides of the invention.
Table 6 provides an appendix which describes the tissues and vectors used for construction of the cDNA libraries shown in Table 5.
Table 7 shows the tools, programs, and algorithms used to analyze the polynucleotides and polypeptides of the invention, along with applicable descriptions, references, and threshold parameters.
DESCRIPTION OF THE INVENTION
Before the present proteins, nucleotide sequences, and methods are described, it is understood that this invention is not limited to the particular machines, materials and methods described, as these may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention, which will be limited only by the appended claims.
It must be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural reference unless the context clearly dictates otherwise. Thus, for example, a reference to "a host cell" includes a plurality of such host cells, and a reference to "an antibody" is a reference to one or more antibodies and equivalents thereof known to those skilled in the art, and so forth.
Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any machines, materials, and methods similar or equivalent to those described herein can be used to practice or test the present invention, the preferred machines, materials and methods are now described. All publications mentioned herein are cited for the purpose of describing and disclosing the cell lines, protocols, reagents and vectors which are reported in the publications and which might be used in connection with the invention. Nothing herein is to be construed as an admission that the invention is not entitled to antedate such disclosure by virtue of prior invention.
DEFINITIONS
"CSAP" refers to the amino acid sequences of substantially purified CSAP obtained from any species, particularly a mammalian species, including bovine, ovine, porcine, murine, equine, and human, and from any source, whether natural, synthetic, semi-synthetic, or recombinant.
The term "agonist" refers to a molecule which intensifies or mimics the biological activity of CSAP. Agonists may include proteins, nucleic acids, carbohydrates, small molecules, or any other compound or composition which modulates the activity of CSAP either by directly interacting with CSAP or by acting on components of the biological pathway in which CSAP participates.
An "allelic variant" is an alternative form of the gene encoding CSAP. Allelic variants may result from at least one mutation in the nucleic acid sequence and may result in altered mRNAs or in polypeptides whose structure or function may or may not be altered. A gene may have none, one, or many allelic variants of its naturally occurring form. Common mutational changes which give rise to allelic variants are generally ascribed to natural deletions, additions, or substitutions of nucleotides. Each of these types of changes may occur alone, or in combination with the others, one or more times in a given sequence.
"Altered" nucleic acid sequences encoding CSAP include those sequences with deletions, insertions, or substitutions of different nucleotides, resulting in a polypeptide the same as CSAP or a polypeptide with at least one functional characteristic of CSAP. Included within this definition are polymorphisms which may or may not be readily detectable using a particular oligonucleotide probe of the polynucleotide encoding CSAP, and improper or unexpected hybridization to allelic variants, with a locus other than the normal chromosomal locus for the polynucleotide sequence encoding CSAP. The encoded protein may also be "altered," and may contain deletions, insertions, or substitutions of amino acid residues which produce a silent change and result in a functionally equivalent CSAP. Deliberate amino acid substitutions may be made on the basis of similarity in polarity, charge, solubility, hydrophobicity, hydrophilicity, and/or the amphipathic nature of the residues, as long as the biological or immunological activity of CSAP is retained. For example, negatively charged amino acids may include aspartic acid and glutamic acid, and positively charged amino acids may include lysine and arginine. Amino acids with uncharged polar side chains having similar hydrophilicity values may include: asparagine and glutamine; and serine and threonine.
Amino acids with uncharged side chains having similar hydrophilicity values may include: leucine, isoleucine, and valine; glycine and alanine; and phenylalanine and tyrosine.
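The substitution groups named in this definition can be encoded as a lookup table. The sketch below checks whether each difference between two equal-length, already aligned sequences falls within one of the listed groups. Because the definition says these groups "may include" such residues, the table is illustrative rather than exhaustive, and membership in a group does not by itself show that biological or immunological activity is retained. The function names are hypothetical.

```python
# Groups listed in the definition above (one-letter amino acid codes).
SIMILAR_GROUPS = (
    frozenset("DE"),   # negatively charged: aspartic acid, glutamic acid
    frozenset("KR"),   # positively charged: lysine, arginine
    frozenset("NQ"),   # uncharged polar: asparagine, glutamine
    frozenset("ST"),   # uncharged polar: serine, threonine
    frozenset("LIV"),  # uncharged: leucine, isoleucine, valine
    frozenset("GA"),   # uncharged: glycine, alanine
    frozenset("FY"),   # uncharged: phenylalanine, tyrosine
)


def is_listed_substitution(original: str, replacement: str) -> bool:
    """True if the residues are identical or fall in the same listed group."""
    a, b = original.upper(), replacement.upper()
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def unlisted_differences(seq_a: str, seq_b: str) -> list:
    """1-based positions where aligned, equal-length sequences differ outside the listed groups."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return [i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b))
            if not is_listed_substitution(a, b)]


if __name__ == "__main__":
    print(unlisted_differences("MKDLST", "MREIGT"))  # arbitrary toy sequences
```

Listing only the positions that fall outside the groups points attention at the substitutions least likely to be silent, which are the ones that would most need experimental confirmation.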
The terms "amino acid" and "amino acid sequence" refer to an oligopeptide, peptide, polypeptide, or protein sequence, or a fragment of any of these, and to naturally occurring or synthetic molecules. Where "amino acid sequence" is recited to refer to a sequence of a naturally occurring protein molecule, "amino acid sequence" and like terms are not meant to limit the amino acid sequence to the complete native amino acid sequence associated with the recited protein molecule.
"Amplification" relates to the production of additional copies of a nucleic acid sequence. Amplification is generally carried out using polymerase chain reaction (PCR) technologies well known in the art.
The term "antagonist" refers to a molecule which inhibits or attenuates the biological activity of CSAP. Antagonists may include proteins such as antibodies, nucleic acids, carbohydrates, small molecules, or any other compound or composition which modulates the activity of CSAP either by directly interacting with CSAP or by acting on components of the biological pathway in which CSAP participates.
The term "antibody" refers to intact immunoglobulin molecules as well as to fragments thereof, such as Fab, F(ab')2, and Fv fragments, which are capable of binding an epitopic determinant. Antibodies that bind CSAP polypeptides can be prepared using intact polypeptides or using fragments containing small peptides of interest as the immunizing antigen. The polypeptide or oligopeptide used to immunize an animal (e.g., a mouse, a rat, or a rabbit) can be derived from the translation of RNA, or synthesized chemically, and can be conjugated to a carrier protein if desired. Commonly used carriers that are chemically coupled to peptides include bovine serum albumin, thyroglobulin, and keyhole limpet hemocyanin (KLH). The coupled peptide is then used to immunize the animal.
The term "antigenic determinant" refers to that region of a molecule (i.e., an epitope) that makes contact with a particular antibody. When a protein or a fragment of a protein is used to immunize a host animal, numerous regions of the protein may induce the production of antibodies which bind specifically to antigenic determinants (particular regions or three-dimensional structures on the protein). An antigenic determinant may compete with the intact antigen (i.e., the immunogen used to elicit the immune response) for binding to an antibody.
The term "aptamer" refers to a nucleic acid or oligonucleotide molecule that binds to a specific molecular target. Aptamers are derived from an in vitro evolutionary process (e.g., SELEX (Systematic Evolution of Ligands by Exponential Enrichment), described in U.S. Patent No. 5,270,163), which selects for target-specific aptamer sequences from large combinatorial libraries. Aptamer compositions may be double-stranded or single-stranded, and may include deoxyribonucleotides, ribonucleotides, nucleotide derivatives, or other nucleotide-like molecules. The nucleotide components of an aptamer may have modified sugar groups (e.g., the 2'-OH group of a ribonucleotide may be replaced by 2'-F or 2'-NH2), which may improve a desired property, e.g., resistance to nucleases or longer lifetime in blood. Aptamers may be conjugated to other molecules, e.g., a high molecular weight carrier to slow clearance of the aptamer from the circulatory system. Aptamers may be specifically cross-linked to their cognate ligands, e.g., by photo-activation of a cross-linker. (See, e.g., Brody, E.N. and L. Gold (2000) J. Biotechnol. 74:5-13.)
The term "intramer" refers to an aptamer which is expressed in vivo. For example, a vaccinia virus-based RNA expression system has been used to express specific RNA aptamers at high levels in the cytoplasm of leukocytes (Blind, M. et al. (1999) Proc. Natl. Acad. Sci. USA 96:3606-3610).
The term "spiegelmer" refers to an aptamer which includes L-DNA, L-RNA, or other left-handed nucleotide derivatives or nucleotide-like molecules. Aptamers containing left-handed nucleotides are resistant to degradation by naturally occurring enzymes, which normally act on substrates containing right-handed nucleotides.
The term "antisense" refers to any composition capable of base-pairing with the "sense" (coding) strand of a specific nucleic acid sequence. Antisense compositions may include DNA; RNA; peptide nucleic acid (PNA); oligonucleotides having modified backbone linkages such as phosphorothioates, methylphosphonates, or benzylphosphonates; oligonucleotides having modified sugar groups such as 2'-methoxy ethyl sugars or 2'-methoxy ethoxy sugars; or oligonucleotides having modified bases such as 5-methyl cytosine, 2'-deoxyuracil, or 7-deaza-2'-deoxyguanosine. Antisense molecules may be produced by any method including chemical synthesis or transcription. Once introduced into a cell, the complementary antisense molecule base-pairs with a naturally occurring nucleic acid sequence produced by the cell to form duplexes which block either transcription or translation. The designation "negative" or "minus" can refer to the antisense strand, and the designation "positive" or "plus" can refer to the sense strand of a reference DNA molecule.
The term "biologically active" refers to a protein having structural, regulatory, or biochemical functions of a naturally occurring molecule. Likewise, "immunologically active" or "immunogenic" refers to the capability of the natural, recombinant, or synthetic CSAP, or of any oligopeptide thereof, to induce a specific immune response in appropriate animals or cells and to bind with specific antibodies.
"Complementary" describes the relationship between two single-stranded nucleic acid sequences that anneal by base-pairing. For example, 5'-AGT-3' pairs with its complement, 3'-TCA-5'.
A "composition comprising a given polynucleotide sequence" and a "composition comprising a given amino acid sequence" refer broadly to any composition containing the given polynucleotide or amino acid sequence. The composition may comprise a dry formulation or an aqueous solution. Compositions comprising polynucleotide sequences encoding CSAP or fragments of CSAP may be employed as hybridization probes. The probes may be stored in freeze-dried form and may be associated with a stabilizing agent such as a carbohydrate. In hybridizations, the probe may be deployed in an aqueous solution containing salts (e.g., NaCl), detergents (e.g., sodium dodecyl sulfate; SDS), and other components (e.g., Denhardt's solution, dry milk, salmon sperm DNA, etc.).
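The base-pairing rule in the "Complementary" definition can be illustrated with a minimal Python sketch. It assumes plain A/C/G/T DNA strings (no IUPAC ambiguity codes, no RNA) and shows that 5'-AGT-3' pairs with 3'-TCA-5', which reads ACT when the complementary strand is written in the conventional 5'-to-3' direction. The function names are illustrative.

```python
# Watson-Crick pairing for DNA bases (A-T, G-C); ambiguity codes are not handled.
_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Return the base-by-base complement, aligned under `seq` (read 3'->5')."""
    return "".join(_PAIR[base] for base in seq.upper())


def reverse_complement(seq: str) -> str:
    """Return the complementary strand written 5'->3' (the complement reversed)."""
    return complement(seq)[::-1]


print(complement("AGT"))          # TCA (3'->5', aligned under 5'-AGT-3')
print(reverse_complement("AGT"))  # ACT (the same strand written 5'->3')
```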
"Consensus sequence" refers to a nucleic acid sequence which has been subjected to repeated DNA sequence analysis to resolve uncalled bases, extended using the XL-PCR kit (Applied Biosystems, Foster City CA) in the 5' and/or the 3' direction, and resequenced, or which has been assembled from one or more overlapping cDNA, EST, or genomic DNA fragments using a computer program for fragment assembly, such as the GELVIEW fragment assembly system (GCG, Madison WI) or Phrap (University of Washington, Seattle WA). Some sequences have been both extended and assembled to produce the consensus sequence.
"Conservative amino acid substitutions" are those substitutions that are predicted to least interfere with the properties of the original protein, i.e., the structure and especially the function of the protein is conserved and not significantly changed by such substitutions. The table below shows amino acids which may be substituted for an original amino acid in a protein and which are regarded as conservative amino acid substitutions.
Original Residue    Conservative Substitution
Ala                 Gly, Ser
Arg                 His, Lys
Asn                 Asp, Gln, His
Asp                 Asn, Glu
Cys                 Ala, Ser
Gln                 Asn, Glu, His
Glu                 Asp, Gln, His
Gly                 Ala
His                 Asn, Arg, Gln, Glu
Ile                 Leu, Val
Leu                 Ile, Val
Lys                 Arg, Gln, Glu
Met                 Leu, Ile
Phe                 His, Met, Leu, Trp, Tyr
Ser                 Cys, Thr
Thr                 Ser, Val
Trp                 Phe, Tyr
Tyr                 His, Phe, Trp
Val                 Ile, Leu, Thr
Conservative amino acid substitutions generally maintain (a) the structure of the polypeptide backbone in the area of the substitution, for example, as a beta sheet or alpha helical conformation, (b) the charge or hydrophobicity of the molecule at the site of the substitution, and/or (c) the bulk of the side chain.
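For illustration, the substitution table can be transcribed into a lookup structure. This is a convenience sketch, not part of the specification, and it rests on stated assumptions: it uses three-letter codes, treats each row as one-way (Cys to Ala is listed, Ala to Cys is not), and reports proline, which the table omits, as having no listed conservative substitutes. `CONSERVATIVE` and `is_conservative` are hypothetical names.

```python
# Conservative substitutions transcribed from the table (three-letter codes).
CONSERVATIVE = {
    "Ala": {"Gly", "Ser"},
    "Arg": {"His", "Lys"},
    "Asn": {"Asp", "Gln", "His"},
    "Asp": {"Asn", "Glu"},
    "Cys": {"Ala", "Ser"},
    "Gln": {"Asn", "Glu", "His"},
    "Glu": {"Asp", "Gln", "His"},
    "Gly": {"Ala"},
    "His": {"Asn", "Arg", "Gln", "Glu"},
    "Ile": {"Leu", "Val"},
    "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln", "Glu"},
    "Met": {"Leu", "Ile"},
    "Phe": {"His", "Met", "Leu", "Trp", "Tyr"},
    "Ser": {"Cys", "Thr"},
    "Thr": {"Ser", "Val"},
    "Trp": {"Phe", "Tyr"},
    "Tyr": {"His", "Phe", "Trp"},
    "Val": {"Ile", "Leu", "Thr"},
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if `replacement` is listed for `original` in the table.

    The lookup is one-way because the table is directional; residues absent
    from the table (e.g., Pro) have no listed conservative substitutes.
    """
    return replacement in CONSERVATIVE.get(original, set())


print(is_conservative("Ile", "Leu"))  # True
print(is_conservative("Ala", "Cys"))  # False (only Cys -> Ala is listed)
```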
A "deletion" refers to a change in the amino acid or nucleotide sequence that results in the absence of one or more amino acid residues or nucleotides.
The term "derivative" refers to a chemically modified polynucleotide or polypeptide. Chemical modifications of a polynucleotide can include, for example, replacement of hydrogen by an alkyl, acyl, hydroxyl, or amino group. A derivative polynucleotide encodes a polypeptide which retains at least one biological or immunological function of the natural molecule. A derivative polypeptide is one modified by glycosylation, pegylation, or any similar process that retains at least one biological or immunological function of the polypeptide from which it was derived.
A "detectable label" refers to a reporter molecule or enzyme that is capable of generating a measurable signal and is covalently or noncovalently joined to a polynucleotide or polypeptide.
"Differential expression" refers to increased or upregulated; or decreased, downregulated, or absent gene or protein expression, determined by comparing at least two different samples. Such comparisons may be carried out between, for example, a treated and an untreated sample, or a diseased and a normal sample.
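The two-sample comparison underlying "differential expression" can be sketched as a fold-change test, for example a treated sample against an untreated reference. The 2-fold cutoff, the use of log2 ratios, and the category labels are hypothetical choices for this example; the definition itself fixes no threshold or method.

```python
import math


def classify_expression(sample: float, reference: float, min_fold: float = 2.0) -> str:
    """Compare one gene's expression level in `sample` against `reference`.

    `min_fold` is a hypothetical cutoff; returns "up", "down", "absent"
    (detected in the reference but not in the sample) or "unchanged".
    """
    if sample < 0 or reference < 0:
        raise ValueError("expression levels must be non-negative")
    if sample == 0 and reference > 0:
        return "absent"
    if reference == 0:
        # Newly detected in the sample, or undetected in both.
        return "up" if sample > 0 else "unchanged"
    log2_fc = math.log2(sample / reference)
    threshold = math.log2(min_fold)
    if log2_fc >= threshold:
        return "up"
    if log2_fc <= -threshold:
        return "down"
    return "unchanged"


print(classify_expression(12.0, 3.0))  # up (4-fold increase)
```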
"Exon shuffling" refers to the recombination of different coding regions (exons). Since an exon may represent a structural or functional domain of the encoded protein, new proteins may be assembled through the novel reassortment of stable substructures, thus allowing acceleration of the evolution of new protein functions.
A "fragment" is a unique portion of CSAP or the polynucleotide encoding CSAP which is identical in sequence to but shorter in length than the parent sequence. A fragment may comprise up to the entire length of the defined sequence, minus one nucleotide/amino acid residue. For example, a fragment may comprise from 5 to 1000 contiguous nucleotides or amino acid residues. A fragment used as a probe, primer, antigen, therapeutic molecule, or for other purposes, may be at least 5, 10, 15, 16, 20, 25, 30, 40, 50, 60, 75, 100, 150, 250 or at least 500 contiguous nucleotides or amino acid residues in length. Fragments may be preferentially selected from certain regions of a molecule. For example, a polypeptide fragment may comprise a certain length of contiguous amino acids selected from the first 250 or 500 amino acids (or first 25% or 50%) of a polypeptide as shown in a certain defined sequence. These lengths are exemplary, and any length that is supported by the specification, including the Sequence Listing, tables, and figures, may be encompassed by the present embodiments. A fragment of SEQ ID NO:29-56 comprises a region of unique polynucleotide sequence that specifically identifies SEQ ID NO:29-56, for example, as distinct from any other sequence in the genome from which the fragment was obtained. A fragment of SEQ ID NO:29-56 is useful, for example, in hybridization and amplification technologies and in analogous methods that distinguish SEQ ID NO:29-56 from related polynucleotide sequences. The precise length of a fragment of SEQ ID NO:29-56 and the region of SEQ ID NO:29-56 to which the fragment corresponds are routinely determinable by one of ordinary skill in the art based on the intended purpose for the fragment.
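The requirement that a fragment "specifically identifies" its parent sequence can be illustrated with a brute-force search for substrings of a chosen length that occur in a target sequence but in none of a set of comparison sequences. This is a minimal sketch under simplifying assumptions: exact matching only, a single strand (a hybridization-oriented check would also scan reverse complements), and an in-memory list standing in for "any other sequence in the genome". The function name is hypothetical.

```python
def unique_fragments(target: str, others: list[str], length: int) -> list[str]:
    """Return each distinct `length`-long substring of `target` that occurs in
    none of `others`, in order of first appearance in `target`."""
    if length <= 0 or length > len(target):
        return []
    seen: set[str] = set()
    result: list[str] = []
    for start in range(len(target) - length + 1):
        frag = target[start:start + length]
        if frag in seen:
            continue  # already evaluated this exact fragment
        seen.add(frag)
        if not any(frag in other for other in others):
            result.append(frag)
    return result


print(unique_fragments("ACGTAC", ["CGTT"], 3))  # ['ACG', 'GTA', 'TAC']
```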
A fragment of SEQ ID NO:1-28 is encoded by a fragment of SEQ ID NO:29-56. A fragment of SEQ ID NO:1-28 comprises a region of unique amino acid sequence that specifically identifies SEQ ID NO:1-28. For example, a fragment of SEQ ID NO:1-28 is useful as an immunogenic peptide for the development of antibodies that specifically recognize SEQ ID NO:1-28. The precise length of a fragment of SEQ ID NO:1-28 and the region of SEQ ID NO:1-28 to which the fragment corresponds are routinely determinable by one of ordinary skill in the art based on the intended purpose for the fragment.
A "full length" polynucleotide sequence is one containing at least a translation initiation codon (e.g., an ATG codon encoding methionine) followed by an open reading frame and a translation termination codon. A "full length" polynucleotide sequence encodes a "full length" polypeptide sequence.
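The "full length" definition (initiation codon, open reading frame, termination codon) maps onto a simple reading-frame scan. The sketch below assumes the standard genetic code's stop codons (TAA, TAG, TGA), scans only the three forward frames, and reports one ORF per stop codon, starting at the first in-frame ATG after the previous stop. The function name and the `min_codons` parameter are illustrative, not taken from the specification.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def full_length_orfs(seq: str, min_codons: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) pairs, end exclusive and including the stop codon,
    for ATG...stop stretches on the forward strand.

    `min_codons` counts codons from the ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG since the last stop in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


dna = "CCATGAAATTTTAGCC"
for s, e in full_length_orfs(dna):
    print(s, e, dna[s:e])  # 2 14 ATGAAATTTTAG
```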
"Homology" refers to sequence similarity or, interchangeably, sequence identity, between two or more polynucleotide sequences or two or more polypeptide sequences. The terms "percent identity" and "% identity," as applied to polynucleotide sequences, refer to the percentage of residue matches between at least two polynucleotide sequences aligned using a standardized algorithm. Such an algorithm may insert, in a standardized and reproducible way, gaps in the sequences being compared in order to optimize alignment between two sequences, and therefore achieve a more meaningful comparison of the two sequences. Percent identity between polynucleotide sequences may be determined using the default parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e sequence alignment program. This program is part of the LASERGENE software package, a suite of molecular biological analysis programs (DNASTAR, Madison WI). CLUSTAL V is described in Higgins, D.G. and P.M. Sharp (1989) CABIOS 5:151-153 and in Higgins, D.G. et al. (1992) CABIOS 8:189-191. For pairwise alignments of polynucleotide sequences, the default parameters are set as follows: Ktuple=2, gap penalty=5, window=4, and "diagonals saved"=4. The "weighted" residue weight table is selected as the default. Percent identity is reported by CLUSTAL V as the "percent similarity" between aligned polynucleotide sequences.
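Once two sequences have been aligned (by CLUSTAL V, BLAST, or another program), percent identity reduces to counting matching columns. This sketch does not perform the alignment and does not reproduce CLUSTAL V's "percent similarity" calculation. It assumes gap-padded strings of equal length and, as one of several conventions in use, excludes columns containing a gap from the denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Denominator (an assumed convention): columns where both sequences carry a
    residue; gap-containing columns are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


print(percent_identity("ACGT-A", "ACCTTA"))  # 80.0 (4 matches in 5 gap-free columns)
```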
Alternatively, a suite of commonly used and freely available sequence comparison algorithms is provided by the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST) (Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410), which is available from several sources, including the NCBI, Bethesda, MD, and on the Internet at http://www.ncbi.nlm.nih.gov/BLAST/. The BLAST software suite includes various sequence analysis programs including "blastn," which is used to align a known polynucleotide sequence with other polynucleotide sequences from a variety of databases. Also available is a tool called "BLAST 2 Sequences" that is used for direct pairwise comparison of two nucleotide sequences. "BLAST 2 Sequences" can be accessed and used interactively at http://www.ncbi.nlm.nih.gov/gorf/bl2.html. The "BLAST 2 Sequences" tool can be used for both blastn and blastp (discussed below). BLAST programs are commonly used with gap and other parameters set to default settings. For example, to compare two nucleotide sequences, one may use blastn with the "BLAST 2 Sequences" tool Version 2.0.12 (April-21-2000) set at default parameters. Such default parameters may be, for example:
Matrix: BLOSUM62
Reward for match: 1
Penalty for mismatch: -2
Open Gap: 5 and Extension Gap: 2 penalties
Gap x drop-off: 50
Expect: 10
Word Size: 11
Filter: on
Percent identity may be measured over the length of an entire defined sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined sequence, for instance, a fragment of at least 20, at least 30, at least 40, at least 50, at least 70, at least 100, or at least 200 contiguous nucleotides. Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in the tables, figures, or Sequence Listing, may be used to describe a length over which percentage identity may be measured.
Nucleic acid sequences that do not show a high degree of identity may nevertheless encode similar amino acid sequences due to the degeneracy of the genetic code. It is understood that changes in a nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid sequences that all encode substantially the same protein.
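The extent of this degeneracy can be quantified by multiplying, residue by residue, the number of codons that specify each amino acid. A minimal sketch, assuming the standard genetic code (61 sense codons) and one-letter amino acid codes; stop codons are not counted, and the names are illustrative.

```python
from math import prod

# Number of sense codons per amino acid in the standard genetic code.
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_coding_sequences(protein: str) -> int:
    """Count distinct codon sequences (no stop codon) that encode `protein`."""
    return prod(CODONS_PER_AA[aa] for aa in protein.upper())


print(count_coding_sequences("MKW"))  # 2 (Met: 1, Lys: 2, Trp: 1)
print(count_coding_sequences("LSR"))  # 216 (6 * 6 * 6)
```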
The phrases "percent identity" and "% identity," as applied to polypeptide sequences, refer to the percentage of residue matches between at least two polypeptide sequences aligned using a standardized algorithm. Methods of polypeptide sequence alignment are well-known. Some alignment methods take into account conservative amino acid substitutions. Such conservative substitutions, explained in more detail above, generally preserve the charge and hydrophobicity at the site of substitution, thus preserving the structure (and therefore function) of the polypeptide.
Percent identity between polypeptide sequences may be determined using the default parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e sequence alignment program (described and referenced above). For pairwise alignments of polypeptide sequences using CLUSTAL V, the default parameters are set as follows: Ktuple=1, gap penalty=3, window=5, and "diagonals saved"=5. The PAM250 matrix is selected as the default residue weight table. As with polynucleotide alignments, the percent identity is reported by CLUSTAL V as the "percent similarity" between aligned polypeptide sequence pairs. Alternatively, the NCBI BLAST software suite may be used. For example, for a pairwise comparison of two polypeptide sequences, one may use the "BLAST 2 Sequences" tool Version 2.0.12 (April-21-2000) with blastp set at default parameters. Such default parameters may be, for example:
Matrix: BLOSUM62
Open Gap: 11 and Extension Gap: 1 penalties
Gap x drop-off: 50
Expect: 10
Word Size: 3
Filter: on
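For readers using the current NCBI BLAST+ command-line programs rather than the "BLAST 2 Sequences" tool Version 2.0.12, the blastn and blastp defaults listed above can be approximated as command-line arguments. This mapping is an assumption, not an equivalence: the blastn "Matrix" line is dropped because blastn scores nucleotides with reward/penalty values rather than a substitution matrix, "Gap x drop-off" is omitted because it has no one-to-one BLAST+ flag, and BLAST+ accepts only certain reward, penalty, and gap-cost combinations, so results may differ from the legacy tool. `PRESETS`, `blast2seq_args`, and the file names are hypothetical.

```python
import shlex

# Hypothetical mapping of the listed defaults onto BLAST+ flags (see lead-in).
PRESETS = {
    "blastn": ["-task", "blastn", "-reward", "1", "-penalty", "-2",
               "-gapopen", "5", "-gapextend", "2", "-evalue", "10",
               "-word_size", "11", "-dust", "yes"],  # DUST filter "on"
    "blastp": ["-matrix", "BLOSUM62", "-gapopen", "11", "-gapextend", "1",
               "-evalue", "10", "-word_size", "3", "-seg", "yes"],  # SEG filter "on"
}


def blast2seq_args(program: str, query: str, subject: str) -> list[str]:
    """Build an argument list for a pairwise BLAST+ comparison (-subject mode)."""
    if program not in PRESETS:
        raise ValueError(f"unsupported program: {program}")
    return [program, "-query", query, "-subject", subject, *PRESETS[program]]


print(shlex.join(blast2seq_args("blastp", "protein_a.fasta", "protein_b.fasta")))
```

With BLAST+ installed, the returned list could be passed to `subprocess.run(args, check=True)`; keeping the list separate from execution makes the parameter choices easy to inspect and adjust.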
Percent identity may be measured over the length of an entire defined polypeptide sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined polypeptide sequence, for instance, a fragment of at least 15, at least 20, at least 30, at least 40, at least 50, at least 70 or at least 150 contiguous residues. Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in the tables, figures or Sequence Listing, may be used to describe a length over which percentage identity may be measured.
"Human artificial chromosomes" (HACs) are linear microchromosomes which may contain DNA sequences of about 6 kb to 10 Mb in size and which contain all of the elements required for chromosome replication, segregation and maintenance.
The term "humanized antibody" refers to an antibody molecule in which the amino acid sequence in the non-antigen binding regions has been altered so that the antibody more closely resembles a human antibody, and still retains its original binding ability.
"Hybridization" refers to the process by which a polynucleotide strand anneals with a complementary strand through base pairing under defined hybridization conditions. Specific hybridization is an indication that two nucleic acid sequences share a high degree of complementarity. Specific hybridization complexes form under permissive annealing conditions and remain hybridized after the "washing" step(s). The washing step(s) is particularly important in determining the stringency of the hybridization process, with more stringent conditions allowing less non-specific binding, i.e., binding between pairs of nucleic acid strands that are not perfectly matched. Permissive conditions for annealing of nucleic acid sequences are routinely determinable by one of ordinary skill in the art and may be consistent among hybridization experiments, whereas wash conditions may be varied among experiments to achieve the desired stringency, and therefore hybridization specificity. Permissive annealing conditions occur, for example, at 68°C in the presence of about 6 x SSC, about 1% (w/v) SDS, and about 100 μg/ml sheared, denatured salmon sperm DNA.
Generally, stringency of hybridization is expressed, in part, with reference to the temperature under which the wash step is carried out. Such wash temperatures are typically selected to be about 5°C to 20°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. An equation for calculating Tm and conditions for nucleic acid hybridization are well known and can be found in Sambrook, J. et al. (1989) Molecular Cloning: A Laboratory Manual 2nd ed., vol. 1-3, Cold Spring Harbor Press, Plainview NY; specifically see volume 2, chapter 9. High stringency conditions for hybridization between polynucleotides of the present invention include wash conditions of 68°C in the presence of about 0.2 x SSC and about 0.1% SDS, for 1 hour. Alternatively, temperatures of about 65°C, 60°C, 55°C, or 42°C may be used. SSC concentration may be varied from about 0.1 to 2 x SSC, with SDS being present at about 0.1%. Typically, blocking reagents are used to block non-specific hybridization. Such blocking reagents include, for instance, sheared and denatured salmon sperm DNA at about 100-200 μg/ml. Organic solvent, such as formamide at a concentration of about 35-50% v/v, may also be used under particular circumstances, such as for RNA:DNA hybridizations. Useful variations on these wash conditions will be readily apparent to those of ordinary skill in the art. Hybridization, particularly under high stringency conditions, may be suggestive of evolutionary similarity between the nucleotides. Such similarity is strongly indicative of a similar role for the nucleotides and their encoded polypeptides.
The term "hybridization complex" refers to a complex formed between two nucleic acid sequences by virtue of the formation of hydrogen bonds between complementary bases. A hybridization complex may be formed in solution (e.g., C₀t or R₀t analysis) or formed between one nucleic acid sequence present in solution and another nucleic acid sequence immobilized on a solid support (e.g., paper, membranes, filters, chips, pins or glass slides, or any other appropriate substrate to which cells or their nucleic acids have been fixed).
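The wash-temperature rule described above (about 5°C to 20°C below Tm) can be sketched numerically. The Tm formula used here, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) − 600/N, where N is the duplex length in base pairs, is one widely cited empirical approximation for longer DNA duplexes; it is an assumption for illustration and is not necessarily the equation given in the Sambrook reference. It ignores formamide, mismatches, and other buffer components, and the function names are hypothetical.

```python
import math


def tm_salt_adjusted(gc_percent: float, length: int, na_molar: float) -> float:
    """Approximate Tm (deg C) of a DNA duplex using the assumed empirical
    formula 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 600/length."""
    if length <= 0 or na_molar <= 0:
        raise ValueError("length and [Na+] must be positive")
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 600.0 / length


def wash_window(tm: float) -> tuple[float, float]:
    """Wash temperatures about 5-20 deg C below Tm, as described above."""
    return tm - 20.0, tm - 5.0


tm = tm_salt_adjusted(gc_percent=50.0, length=100, na_molar=0.1)
low, high = wash_window(tm)
print(f"Tm ~ {tm:.1f} C; wash between {low:.1f} C and {high:.1f} C")
```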
The words "insertion" and "addition" refer to changes in an amino acid or nucleotide sequence resulting in the addition of one or more amino acid residues or nucleotides, respectively.
"Immune response" can refer to conditions associated with inflammation, trauma, immune disorders, or infectious or genetic disease, etc. These conditions can be characterized by expression of various factors, e.g., cytokines, chemokines, and other signaling molecules, which may affect cellular and systemic defense systems.
An "immunogenic fragment" is a polypeptide or oligopeptide fragment of CSAP which is capable of eliciting an immune response when introduced into a living organism, for example, a mammal. The term "immunogenic fragment" also includes any polypeptide or oligopeptide fragment of CSAP which is useful in any of the antibody production methods disclosed herein or known in the art.
The term "microarray" refers to an arrangement of a plurality of polynucleotides, polypeptides, or other chemical compounds on a substrate.
The terms "element" and "array element" refer to a polynucleotide, polypeptide, or other chemical compound having a unique and defined position on a microarray.
The term "modulate" refers to a change in the activity of CSAP. For example, modulation may cause an increase or a decrease in protein activity, binding characteristics, or any other biological, functional, or immunological properties of CSAP.
The phrases "nucleic acid" and "nucleic acid sequence" refer to a nucleotide, oligonucleotide, polynucleotide, or any fragment thereof. These phrases also refer to DNA or RNA of genomic or synthetic origin which may be single-stranded or double-stranded and may represent the sense or the antisense strand, to peptide nucleic acid (PNA), or to any DNA-like or RNA-like material.
"Operably linked" refers to the situation in which a first nucleic acid sequence is placed in a functional relationship with a second nucleic acid sequence. For instance, a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence. Operably linked DNA sequences may be in close proximity or contiguous and, where necessary to join two protein coding regions, in the same reading frame.
"Peptide nucleic acid" (PNA) refers to an antisense molecule or anti-gene agent which comprises an oligonucleotide of at least about 5 nucleotides in length linked to a peptide backbone of amino acid residues ending in lysine. The terminal lysine confers solubility to the composition. PNAs preferentially bind complementary single stranded DNA or RNA and stop transcript elongation, and may be pegylated to extend their lifespan in the cell.
"Post-translational modification" of CSAP may involve lipidation, glycosylation, phosphorylation, acetylation, racemization, proteolytic cleavage, and other modifications known in the art. These processes may occur synthetically or biochemically. Biochemical modifications will vary by cell type depending on the enzymatic milieu of CSAP.
"Probe" refers to nucleic acid sequences encoding CSAP, their complements, or fragments thereof, which are used to detect identical, allelic or related nucleic acid sequences. Probes are isolated oligonucleotides or polynucleotides attached to a detectable label or reporter molecule. Typical labels include radioactive isotopes, ligands, chemiluminescent agents, and enzymes.
"Primers" are short nucleic acids, usually DNA oligonucleotides, which may be annealed to a target polynucleotide by complementary base-pairing. The primer may then be extended along the target DNA strand by a DNA polymerase enzyme. Primer pairs can be used for amplification (and identification) of a nucleic acid sequence, e.g., by the polymerase chain reaction (PCR).
Probes and primers as used in the present invention typically comprise at least 15 contiguous nucleotides of a known sequence. In order to enhance specificity, longer probes and primers may also be employed, such as probes and primers that comprise at least 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, or at least 150 consecutive nucleotides of the disclosed nucleic acid sequences. Probes and primers may be considerably longer than these examples, and it is understood that any length supported by the specification, including the tables, figures, and Sequence Listing, may be used.
Methods for preparing and using probes and primers are described in the references, for example Sambrook, J. et al. (1989) Molecular Cloning: A Laboratory Manual. 2nd ed., vol. 1-3, Cold Spring Harbor Press, Plainview NY; Ausubel, F.M. et al. (1987) Current Protocols in Molecular Biology. Greene Publ. Assoc. & Wiley-Intersciences, New York NY; Innis, M. et al. (1990) PCR Protocols, A Guide to Methods and Applications, Academic Press, San Diego CA. PCR primer pairs can be derived from a known sequence, for example, by using computer programs intended for that purpose such as Primer (Version 0.5, 1991, Whitehead Institute for Biomedical Research, Cambridge MA).
Oligonucleotides for use as primers are selected using software known in the art for such purpose. For example, OLIGO 4.06 software is useful for the selection of PCR primer pairs of up to 100 nucleotides each, and for the analysis of oligonucleotides and larger polynucleotides of up to 5,000 nucleotides from an input polynucleotide sequence of up to 32 kilobases. Similar primer selection programs have incorporated additional features for expanded capabilities. For example, the PrimOU primer selection program (available to the public from the Genome Center at University of Texas South West Medical Center, Dallas TX) is capable of choosing specific primers from megabase sequences and is thus useful for designing primers on a genome-wide scope. The Primer3 primer selection program (available to the public from the Whitehead Institute/MIT Center for Genome Research, Cambridge MA) allows the user to input a "mispriming library," in which sequences to avoid as primer binding sites are user-specified. Primer3 is useful, in particular, for the selection of oligonucleotides for microarrays. (The source code for the latter two primer selection programs may also be obtained from their respective sources and modified to meet the user's specific needs.) The PrimeGen program (available to the public from the UK Human Genome Mapping Project Resource Centre, Cambridge UK) designs primers based on multiple sequence alignments, thereby allowing selection of primers that hybridize to either the most conserved or least conserved regions of aligned nucleic acid sequences. Hence, this program is useful for identification of both unique and conserved oligonucleotides and polynucleotide fragments. 
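As a rough illustration of one kind of screening that primer-selection software performs, the sketch below slides a window along a template and keeps oligonucleotides whose GC fraction and estimated Tm fall within chosen ranges. Everything here is hypothetical: the 20-nucleotide default length (consistent with the at-least-15-nucleotide guideline above), the GC and Tm ranges, and the use of the Wallace rule (about 2°C per A or T plus 4°C per G or C, a rough estimate suited to short oligonucleotides). It does not reproduce OLIGO, PrimOU, Primer3, or PrimeGen, and it omits checks such as self-complementarity, primer-dimer formation, and mispriming.

```python
def wallace_tm(oligo: str) -> int:
    """Wallace-rule Tm estimate (deg C): 2 per A/T plus 4 per G/C."""
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def candidate_primers(template: str, length: int = 20,
                      gc_range: tuple[float, float] = (0.4, 0.6),
                      tm_range: tuple[int, int] = (55, 65)) -> list[tuple[int, str]]:
    """Return (offset, oligo) for each window whose GC fraction and Wallace Tm
    fall inside the (hypothetical) ranges."""
    template = template.upper()
    hits = []
    for i in range(len(template) - length + 1):
        oligo = template[i:i + length]
        gc = (oligo.count("G") + oligo.count("C")) / length
        tm = wallace_tm(oligo)
        if gc_range[0] <= gc <= gc_range[1] and tm_range[0] <= tm <= tm_range[1]:
            hits.append((i, oligo))
    return hits


print(candidate_primers("ATGC" * 5))  # [(0, 'ATGCATGCATGCATGCATGC')]
```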
The oligonucleotides and polynucleotide fragments identified by any of the above selection methods are useful in hybridization technologies, for example, as PCR or sequencing primers, microarray elements, or specific probes to identify fully or partially complementary polynucleotides in a sample of nucleic acids. Methods of oligonucleotide selection are not limited to those described above.
A "recombinant nucleic acid" is a sequence that is not naturally occurring or has a sequence that is made by an artificial combination of two or more otherwise separated segments of sequence. This artificial combination is often accomplished by chemical synthesis or, more commonly, by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques such as those described in Sambrook, supra. The term recombinant includes nucleic acids that have been altered solely by addition, substitution, or deletion of a portion of the nucleic acid. Frequently, a recombinant nucleic acid may include a nucleic acid sequence operably linked to a promoter sequence. Such a recombinant nucleic acid may be part of a vector that is used, for example, to transform a cell.
Alternatively, such recombinant nucleic acids may be part of a viral vector, e.g., based on a vaccinia virus, that could be used to vaccinate a mammal wherein the recombinant nucleic acid is expressed, inducing a protective immunological response in the mammal.
A "regulatory element" refers to a nucleic acid sequence usually derived from untranslated regions of a gene and includes enhancers, promoters, introns, and 5' and 3' untranslated regions (UTRs). Regulatory elements interact with host or viral proteins which control transcription, translation, or RNA stability.
"Reporter molecules" are chemical or biochemical moieties used for labeling a nucleic acid, amino acid, or antibody. Reporter molecules include radionuclides; enzymes; fluorescent, chemiluminescent, or chromogenic agents; substrates; cofactors; inhibitors; magnetic particles; and other moieties known in the art.
An "RNA equivalent," in reference to a DNA sequence, is composed of the same linear sequence of nucleotides as the reference DNA sequence with the exception that all occurrences of the nitrogenous base thymine are replaced with uracil, and the sugar backbone is composed of ribose instead of deoxyribose.
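At the level of a written sequence, the "RNA equivalent" definition amounts to replacing every T with U while keeping the base order. A minimal sketch; the ribose-for-deoxyribose backbone change is chemical and has no counterpart in a one-letter string, and the function name is illustrative.

```python
def rna_equivalent(dna: str) -> str:
    """Return the RNA sequence with the same base order: thymine (T) -> uracil (U)."""
    return dna.upper().replace("T", "U")


print(rna_equivalent("ATGGCTTAA"))  # AUGGCUUAA
```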
The term "sample" is used in its broadest sense. A sample suspected of containing CSAP, nucleic acids encoding CSAP, or fragments thereof may comprise a bodily fluid; an extract from a cell, chromosome, organelle, or membrane isolated from a cell; a cell; genomic DNA, RNA, or cDNA, in solution or bound to a substrate; a tissue; a tissue print; etc.
The terms "specific binding" and "specifically binding" refer to that interaction between a protein or peptide and an agonist, an antibody, an antagonist, a small molecule, or any natural or synthetic binding composition. The interaction is dependent upon the presence of a particular structure of the protein, e.g., the antigenic determinant or epitope, recognized by the binding molecule. For example, if an antibody is specific for epitope "A," the presence of a polypeptide comprising the epitope A, or the presence of free unlabeled A, in a reaction containing free labeled A and the antibody will reduce the amount of labeled A that binds to the antibody.
The term "substantially purified" refers to nucleic acid or amino acid sequences that are removed from their natural environment and are isolated or separated, and are at least 60% free, preferably at least 75% free, and most preferably at least 90% free from other components with which they are naturally associated.
A "substitution" refers to the replacement of one or more amino acid residues or nucleotides by different amino acid residues or nucleotides, respectively.
"Substrate" refers to any suitable rigid or semi-rigid support including membranes, filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, tubing, plates, polymers, microparticles and capillaries. The substrate can have a variety of surface forms, such as wells, trenches, pins, channels and pores, to which polynucleotides or polypeptides are bound.
A "transcript image" or "expression profile" refers to the collective pattern of gene expression by a particular cell type or tissue under given conditions at a given time.
"Transformation" describes a process by which exogenous DNA is introduced into a recipient cell. Transformation may occur under natural or artificial conditions according to various methods well known in the art, and may rely on any known method for the insertion of foreign nucleic acid sequences into a prokaryotic or eukaryotic host cell. The method for transformation is selected based on the type of host cell being transformed and may include, but is not limited to, bacteriophage or viral infection, electroporation, heat shock, hpofection, and particle bombardment. The term "transformed cells" includes stably transformed cells in which the inserted DNA is capable of replication either as an autonomously replicating plasmid or as part of the host chromosome, as well as transiently transformed cells which express the inserted DNA or RNA for limited periods of time.
A "transgenic organism," as used herein, is any organism, including but not limited to animals and plants, in which one or more of the cells of the organism contains heterologous nucleic acid introduced by way of human intervention, such as by transgenic techniques well known in the art. The nucleic acid is introduced into the cell, directly or indirectly by introduction into a precursor of the cell, by way of deliberate genetic manipulation, such as by microinjection or by infection with a recombinant virus. In one alternative, the nucleic acid can be introduced by infection with a recombinant viral vector, such as a lentiviral vector (Lois, C. et al. (2002) Science 295:868-872). The term genetic manipulation does not include classical cross-breeding, or in vitro fertilization, but rather is directed to the introduction of a recombinant DNA molecule. The transgenic organisms contemplated in accordance with the present invention include bacteria, cyanobacteria, fungi, plants and animals. The isolated DNA of the present invention can be introduced into the host by methods known in the art, for example infection, transfection, transformation or transconjugation. Techniques for transferring the DNA of the present invention into such organisms are widely known and provided in references such as Sambrook et al. (1989), supra.
A "variant" of a particular nucleic acid sequence is defined as a nucleic acid sequence having at least 40% sequence identity to the particular nucleic acid sequence over a certain length of one of the nucleic acid sequences using blastn with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) set at default parameters. Such a pair of nucleic acids may show, for example, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater sequence identity over a certain defined length. A variant may be described as, for example, an "allelic" (as defined above), "splice," "species," or "polymorphic" variant. A splice variant may have significant identity to a reference molecule, but will generally have a greater or lesser number of polynucleotides due to alternate splicing of exons during mRNA processing. The corresponding polypeptide may possess additional functional domains or lack domains that are present in the reference molecule. Species variants are polynucleotide sequences that vary from one species to another. The resulting polypeptides will generally have significant amino acid identity relative to each other. A polymorphic variant is a variation in the polynucleotide sequence of a particular gene between individuals of a given species. Polymorphic variants also may encompass "single nucleotide polymorphisms" (SNPs) in which the polynucleotide sequence varies by one nucleotide base. The presence of SNPs may be indicative of, for example, a certain population, a disease state, or a propensity for a disease state.
A "variant" of a particular polypeptide sequence is defined as a polypeptide sequence having at least 40% sequence identity to the particular polypeptide sequence over a certain length of one of the polypeptide sequences using blastp with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) set at default parameters. Such a pair of polypeptides may show, for example, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater sequence identity over a certain defined length of one of the polypeptides.
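The identity thresholds in these definitions are computed by the BLAST 2 Sequences tool, which performs its own local alignment. The following Python sketch does not reproduce BLAST. It assumes two sequences that are already aligned (with `-` marking gaps) and shows one common way to turn such an alignment into a percent identity. The function names and the convention that gap columns count as non-identical are assumptions for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumed convention: identical, non-gap columns divided by the total
    number of alignment columns, so gap columns count against identity.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 40.0) -> bool:
    """True if the pair reaches the given identity threshold (default 40%)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
```

Other conventions (for example, dividing by the length of the shorter ungapped sequence) give different numbers for the same alignment, which is why the definitions above pin down a specific tool and version.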
THE INVENTION
The invention is based on the discovery of new human cytoskeleton-associated proteins (CSAP), the polynucleotides encoding CSAP, and the use of these compositions for the diagnosis, treatment, or prevention of cell proliferative disorders, viral infections, and neurological disorders.
Table 1 summarizes the nomenclature for the full length polynucleotide and polypeptide sequences of the invention. Each polynucleotide and its corresponding polypeptide are correlated to a single Incyte project identification number (Incyte Project ID). Each polypeptide sequence is denoted by both a polypeptide sequence identification number (Polypeptide SEQ ID NO:) and an Incyte polypeptide sequence number (Incyte Polypeptide ID) as shown. Each polynucleotide sequence is denoted by both a polynucleotide sequence identification number (Polynucleotide SEQ ID NO:) and an Incyte polynucleotide consensus sequence number (Incyte Polynucleotide ID) as shown. Column 6 shows the Incyte ID numbers of physical, full length clones corresponding to the polypeptide and polynucleotide sequences of the invention. The full length clones encode polypeptides which have at least 95% sequence identity to the polypeptide sequences shown in column 3.
Table 2 shows sequences with homology to the polypeptides of the invention as identified by BLAST analysis against the GenBank protein (genpept) database. Columns 1 and 2 show the polypeptide sequence identification number (Polypeptide SEQ ID NO:) and the corresponding Incyte polypeptide sequence number (Incyte Polypeptide ID) for polypeptides of the invention. Column 3 shows the GenBank identification number (GenBank ID NO:) of the nearest GenBank homolog. Column 4 shows the probability scores for the matches between each polypeptide and its homolog(s). Column 5 shows the annotation of the GenBank homologs along with relevant citations where applicable, all of which are expressly incorporated by reference herein.
Table 3 shows various structural features of the polypeptides of the invention. Columns 1 and 2 show the polypeptide sequence identification number (SEQ ID NO:) and the corresponding Incyte polypeptide sequence number (Incyte Polypeptide ID) for each polypeptide of the invention. Column 3 shows the number of amino acid residues in each polypeptide. Column 4 shows potential phosphorylation sites, and column 5 shows potential glycosylation sites, as determined by the MOTIFS program of the GCG sequence analysis software package (Genetics Computer Group, Madison WI). Column 6 shows amino acid residues comprising signature sequences, domains, and motifs. Column 7 shows analytical methods for protein structure/function analysis and in some cases, searchable databases to which the analytical methods were applied. Together, Tables 2 and 3 summarize the properties of polypeptides of the invention, and these properties establish that the claimed polypeptides are cytoskeleton-associated proteins. For example, SEQ ID NO:1 is 86% identical, from residue M1 to residue S459, to mouse c29 protein (GenBank ID g3868802) as determined by the Basic Local Alignment Search Tool (BLAST). (See Table 2.) The BLAST probability score is 1.4e-207, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance. SEQ ID NO:1 also contains an intermediate filament protein domain as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains. (See Table 3.) Data from BLIMPS and PROFILESCAN analyses provide further corroborative evidence that SEQ ID NO:1 is an intermediate filament protein.
In an alternative example, SEQ ID NO:3 is 93% identical from residue M1 to residue D1107 and 42% identical from residue E470 to residue N1614 (that is, 74% identical over the length of the sequence) to Mus musculus Kif21a (GenBank ID g6561827) as determined by the Basic Local Alignment Search Tool (BLAST). (See Table 2.) The BLAST probability score over the length of the sequence is 2.3e-199, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance. SEQ ID NO:3 also contains a kinesin motor domain as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains. (See Table 3.) Data from BLIMPS, MOTIFS, and PROFILESCAN analyses provide further corroborative evidence that SEQ ID NO:3 is a kinesin. In an alternative example, SEQ ID NO:7 is 95% identical, from residue I125 to residue T1050, to rat ankyrin binding cell adhesion molecule neurofascin (GenBank ID g1842427) as determined by the Basic Local Alignment Search Tool (BLAST). (See Table 2.) The BLAST probability score is 0, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance. SEQ ID NO:7 also contains a fibronectin type III domain and an immunoglobulin domain as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains. (See Table 3.) Data from BLIMPS, MOTIFS, and PROFILESCAN analyses provide further corroborative evidence that SEQ ID NO:7 is a cytoskeleton-associated protein. In an alternative example, SEQ ID NO:9 is 95% identical, from residue M1 to residue D471, to rat coronin relative protein (GenBank ID g15430628) as determined by the Basic Local Alignment Search Tool (BLAST). (See Table 2.) The BLAST probability score is 0.0, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance.
SEQ ID NO:9 also contains WD domains as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains. (See Table 3.) Data from BLIMPS and MOTIFS analyses provide further corroborative evidence that SEQ ID NO:9 is a coronin. In an alternative example, SEQ ID NO:14 is 99% identical, from residue M1 to residue R523, to human keratin 6 irs (GenBank ID g6961277) as determined by the Basic Local Alignment Search Tool (BLAST). The BLAST probability score is 0.0, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance. SEQ ID NO:14 also contains intermediate filament protein domains as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains. (See Table 3.) Data from BLIMPS, MOTIFS, and PROFILESCAN analyses provide further corroborative evidence that SEQ ID NO:14 is an intermediate filament protein, which is a specific subtype of cytoskeletal protein. In an alternative example, SEQ ID NO:18 is 2039 residues in length and is 94% identical, from residue M1 to residue A2039, to mouse myosin containing PDZ domain (GenBank ID g7416032) as determined by the Basic Local Alignment Search Tool (BLAST). (See Table 2.) The BLAST probability score is 0.0, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance. SEQ ID NO:18 also contains an IQ calmodulin-binding motif, a PDZ domain (also known as DHR or GLGF), and a myosin head (motor domain) as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains. (See Table 3.) Data from BLIMPS, MOTIFS, and additional BLAST analyses provide further corroborative evidence that SEQ ID NO:18 is a cytoskeleton-associated protein.
In an alternative example, SEQ ID NO:26 is 92% identical, from residue M1 to residue L1715, to rat ankyrin repeat-rich membrane-spanning protein (GenBank ID g11321435) as determined by the Basic Local Alignment Search Tool (BLAST). (See Table 2.) The BLAST probability score is 0.0, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance. SEQ ID NO:26 also contains eleven ankyrin repeat domains as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains. (See Table 3.) Data from BLIMPS, MOTIFS, and PROFILESCAN analyses provide further corroborative evidence that SEQ ID NO:26 is an ankyrin repeat-rich protein. Many ankyrin repeats have been shown to mediate protein-protein interactions, for example, in cytoskeletal proteins. SEQ ID NO:2, SEQ ID NO:4-6, SEQ ID NO:8, SEQ ID NO:10-13, SEQ ID NO:15-17, SEQ ID NO:19-25, and SEQ ID NO:27-28 were analyzed and annotated in a similar manner. The algorithms and parameters for the analysis of SEQ ID NO:1-28 are described in Table 7.
As shown in Table 4, the full length polynucleotide sequences of the present invention were assembled using cDNA sequences or coding (exon) sequences derived from genomic DNA, or any combination of these two types of sequences. Column 1 lists the polynucleotide sequence identification number (Polynucleotide SEQ ID NO:), the corresponding Incyte polynucleotide consensus sequence number (Incyte ID) for each polynucleotide of the invention, and the length of each polynucleotide sequence in basepairs.
Column 2 shows the nucleotide start (5') and stop (3') positions of the cDNA and/or genomic sequences used to assemble the full length polynucleotide sequences of the invention, and of fragments of the polynucleotide sequences which are useful, for example, in hybridization or amplification technologies that identify SEQ ID NO:29-56 or that distinguish between SEQ ID NO:29-56 and related polynucleotide sequences. The polynucleotide fragments described in Column 2 of Table 4 may refer specifically, for example, to Incyte cDNAs derived from tissue-specific cDNA libraries or from pooled cDNA libraries. Alternatively, the polynucleotide fragments described in column 2 may refer to GenBank cDNAs or ESTs which contributed to the assembly of the full length polynucleotide sequences. In addition, the polynucleotide fragments described in column 2 may identify sequences derived from the ENSEMBL (The Sanger Centre, Cambridge, UK) database (i.e., those sequences including the designation "ENST"). Alternatively, the polynucleotide fragments described in column 2 may be derived from the NCBI RefSeq Nucleotide Sequence Records Database (i.e., those sequences including the designation "NM" or "NT") or the NCBI RefSeq Protein Sequence Records (i.e., those sequences including the designation "NP"). Alternatively, the polynucleotide fragments described in column 2 may refer to assemblages of both cDNA and Genscan-predicted exons brought together by an "exon stitching" algorithm. For example, a polynucleotide sequence identified as FL_XXXXXX_N1_N2_YYYYY_N3_N4 represents a "stitched" sequence in which XXXXXX is the identification number of the cluster of sequences to which the algorithm was applied, and YYYYY is the number of the prediction generated by the algorithm, and N1,2,3..., if present, represent specific exons that may have been manually edited during analysis (See Example V).
Alternatively, the polynucleotide fragments in column 2 may refer to assemblages of exons brought together by an "exon-stretching" algorithm. For example, a polynucleotide sequence identified as FLXXXXXX_gAAAAA_gBBBBB_1_N is a "stretched" sequence, with XXXXXX being the Incyte project identification number, gAAAAA being the GenBank identification number of the human genomic sequence to which the "exon-stretching" algorithm was applied, gBBBBB being the GenBank identification number or NCBI RefSeq identification number of the nearest GenBank protein homolog, and N referring to specific exons (See Example V). In instances where a RefSeq sequence was used as a protein homolog for the "exon-stretching" algorithm, a RefSeq identifier (denoted by "NM," "NP," or "NT") may be used in place of the GenBank identifier (i.e., gBBBBB). Alternatively, a prefix identifies component sequences that were hand-edited, predicted from genomic DNA sequences, or derived from a combination of sequence analysis methods. The following Table lists examples of component sequence prefixes and corresponding sequence analysis methods associated with the prefixes (see Example IV and Example V).
FL Stitched or stretched genomic sequences (see Example V).
INCY Full length transcript and exon prediction from mapping of EST sequences to the genome. Genomic location and EST composition data are combined to predict the exons and resulting transcript.
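The stretched-sequence naming convention described above can be parsed mechanically. The regular expression below is a hypothetical reading of that prose description, not a published Incyte format specification: it extracts the project number, the genomic GenBank identifier, and the homolog identifier (a GenBank "g" number or a RefSeq NM/NP/NT accession), and leaves the trailing fields, such as exon designations, unparsed. The example identifiers are invented.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern inferred from the prose description; not an official spec.
STRETCHED_RE = re.compile(
    r"^FL(?P<project>\d+)"                         # Incyte project identification number
    r"_g(?P<genomic>\d+)"                          # GenBank ID of the human genomic sequence
    r"_(?P<homolog>g\d+|N[MPT]_?\d+(?:\.\d+)?)"    # GenBank gi or RefSeq accession of the homolog
    r"_(?P<suffix>.+)$"                            # remaining fields (e.g. exons), left unparsed
)


class StretchedId(NamedTuple):
    project: str
    genomic: str
    homolog: str
    suffix: str


def parse_stretched_id(identifier: str) -> Optional[StretchedId]:
    """Split a 'stretched' identifier into its fields, or return None."""
    match = STRETCHED_RE.match(identifier)
    if match is None:
        return None
    return StretchedId(**match.groupdict())


# Invented example identifier, for illustration only.
print(parse_stretched_id("FL1234567_g7512345_NM_012345_1_3"))
```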
In some cases, Incyte cDNA coverage redundant with the sequence coverage shown in Table 4 was obtained to confirm the final consensus polynucleotide sequence, but the relevant Incyte cDNA identification numbers are not shown.
Table 5 shows the representative cDNA libraries for those full length polynucleotide sequences which were assembled using Incyte cDNA sequences. The representative cDNA library is the Incyte cDNA library which is most frequently represented by the Incyte cDNA sequences which were used to assemble and confirm the above polynucleotide sequences. The tissues and vectors which were used to construct the cDNA libraries shown in Table 5 are described in Table 6.
The invention also encompasses CSAP variants. A preferred CSAP variant is one which has at least about 80%, or alternatively at least about 90%, or even at least about 95% amino acid sequence identity to the CSAP amino acid sequence, and which contains at least one functional or structural characteristic of CSAP.
The invention also encompasses polynucleotides which encode CSAP. In a particular embodiment, the invention encompasses a polynucleotide sequence comprising a sequence selected from the group consisting of SEQ ID NO:29-56, which encodes CSAP. The polynucleotide sequences of SEQ ID NO:29-56, as presented in the Sequence Listing, embrace the equivalent RNA sequences, wherein occurrences of the nitrogenous base thymine are replaced with uracil, and the sugar backbone is composed of ribose instead of deoxyribose.
The invention also encompasses a variant of a polynucleotide sequence encoding CSAP. In particular, such a variant polynucleotide sequence will have at least about 70%, or alternatively at least about 85%, or even at least about 95% polynucleotide sequence identity to the polynucleotide sequence encoding CSAP. A particular aspect of the invention encompasses a variant of a polynucleotide sequence comprising a sequence selected from the group consisting of SEQ ID NO:29-56 which has at least about 70%, or alternatively at least about 85%, or even at least about 95% polynucleotide sequence identity to a nucleic acid sequence selected from the group consisting of SEQ ID NO:29-56. Any one of the polynucleotide variants described above can encode an amino acid sequence which contains at least one functional or structural characteristic of CSAP.
In addition, or in the alternative, a polynucleotide variant of the invention is a splice variant of a polynucleotide sequence encoding CSAP. A splice variant may have portions which have significant sequence identity to the polynucleotide sequence encoding CSAP, but will generally have a greater or lesser number of polynucleotides due to additions or deletions of blocks of sequence arising from alternate splicing of exons during mRNA processing. A splice variant may have less than about 70%, or alternatively less than about 60%, or alternatively less than about 50% polynucleotide sequence identity to the polynucleotide sequence encoding CSAP over its entire length; however, portions of the splice variant will have at least about 70%, or alternatively at least about 85%, or alternatively at least about 95%, or alternatively 100% polynucleotide sequence identity to portions of the polynucleotide sequence encoding CSAP. For example, a polynucleotide comprising a sequence of SEQ ID NO:31 is a splice variant of a polynucleotide comprising a sequence of SEQ ID NO:33. In an alternative example, a polynucleotide comprising a sequence of SEQ ID NO:34 is a splice variant of a polynucleotide comprising a sequence of SEQ ID NO:35. Any one of the splice variants described above can encode an amino acid sequence which contains at least one functional or structural characteristic of CSAP.
It will be appreciated by those skilled in the art that as a result of the degeneracy of the genetic code, a multitude of polynucleotide sequences encoding CSAP, some bearing minimal similarity to the polynucleotide sequences of any known and naturally occurring gene, may be produced. Thus, the invention contemplates each and every possible variation of polynucleotide sequence that could be made by selecting combinations based on possible codon choices. These combinations are made in accordance with the standard triplet genetic code as applied to the polynucleotide sequence of naturally occurring CSAP, and all such variations are to be considered as being specifically disclosed.
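The scale of the variation contemplated by the degeneracy of the genetic code can be illustrated by back-translation. The Python sketch below, with hypothetical function names, builds the standard genetic code (NCBI translation table 1) and either counts or enumerates every DNA sequence encoding a given peptide. The count is the product, over residues, of the number of synonymous codons for each residue.

```python
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (and '*' for stop) to its list of synonymous codons.
CODONS_FOR = {}
for codon_bases, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append("".join(codon_bases))


def count_encodings(peptide: str) -> int:
    """Number of distinct DNA coding sequences for a peptide (no stop codon added)."""
    return prod(len(CODONS_FOR[aa]) for aa in peptide.upper())


def enumerate_encodings(peptide: str):
    """Yield every DNA sequence that encodes the peptide under the standard code."""
    for codons in product(*(CODONS_FOR[aa] for aa in peptide.upper())):
        yield "".join(codons)


print(count_encodings("MKR"))  # 1 * 2 * 6 = 12
```

Because the count multiplies across residues, it grows exponentially with peptide length, so enumeration is only practical for very short peptides; counting remains cheap for any length.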
Although nucleotide sequences which encode CSAP and its variants are generally capable of hybridizing to the nucleotide sequence of the naturally occurring CSAP under appropriately selected conditions of stringency, it may be advantageous to produce nucleotide sequences encoding CSAP or its derivatives possessing a substantially different codon usage, e.g., inclusion of non-naturally occurring codons. Codons may be selected to increase the rate at which expression of the peptide occurs in a particular prokaryotic or eukaryotic host in accordance with the frequency with which particular codons are utilized by the host. Other reasons for substantially altering the nucleotide sequence encoding CSAP and its derivatives without altering the encoded amino acid sequences include the production of RNA transcripts having more desirable properties, such as a greater half-life, than transcripts produced from the naturally occurring sequence.
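Codon selection by host preference can be sketched as choosing the most frequently used synonymous codon for each residue. In the Python example below, the function name is hypothetical and the usage table `TOY_USAGE` is made up for illustration; it does not represent measured codon frequencies for any host. A real application would substitute a codon usage table for the intended expression host.

```python
# Illustrative, made-up codon preferences; NOT measured usage data for any organism.
TOY_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.70, "AAG": 0.30},
    "L": {"CTG": 0.45, "TTA": 0.15, "TTG": 0.12, "CTT": 0.10, "CTC": 0.10, "CTA": 0.08},
    "*": {"TAA": 0.60, "TGA": 0.30, "TAG": 0.10},
}


def optimize_codons(peptide: str, usage: dict) -> str:
    """Back-translate a peptide, choosing the most frequent codon for each residue."""
    out = []
    for aa in peptide.upper():
        if aa not in usage:
            raise KeyError(f"no codon usage entry for residue {aa!r}")
        codons = usage[aa]
        out.append(max(codons, key=codons.get))  # highest-frequency synonymous codon
    return "".join(out)


print(optimize_codons("MKL*", TOY_USAGE))  # ATGAAACTGTAA
```

Choosing the single most frequent codon is the simplest strategy; practical codon-optimization tools usually also weigh factors such as GC content and avoidance of unwanted sequence motifs.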
The invention also encompasses production of DNA sequences which encode CSAP and CSAP derivatives, or fragments thereof, entirely by synthetic chemistry. After production, the synthetic sequence may be inserted into any of the many available expression vectors and cell systems using reagents well known in the art. Moreover, synthetic chemistry may be used to introduce mutations into a sequence encoding CSAP or any fragment thereof.
Also encompassed by the invention are polynucleotide sequences that are capable of hybridizing to the claimed polynucleotide sequences, and, in particular, to those shown in SEQ ID NO:29-56 and fragments thereof under various conditions of stringency. (See, e.g., Wahl, G.M. and S.L. Berger (1987) Methods Enzymol. 152:399-407; Kimmel, A.R. (1987) Methods Enzymol. 152:507-511.) Hybridization conditions, including annealing and wash conditions, are described in "Definitions."
Methods for DNA sequencing are well known in the art and may be used to practice any of the embodiments of the invention. The methods may employ such enzymes as the Klenow fragment of DNA polymerase I, SEQUENASE (US Biochemical, Cleveland OH), Taq polymerase (Applied Biosystems), thermostable T7 polymerase (Amersham Pharmacia Biotech, Piscataway NJ), or combinations of polymerases and proofreading exonucleases such as those found in the ELONGASE amplification system (Life Technologies, Gaithersburg MD). Preferably, sequence preparation is automated with machines such as the MICROLAB 2200 liquid transfer system (Hamilton, Reno NV), PTC200 thermal cycler (MJ Research, Watertown MA) and ABI CATALYST 800 thermal cycler (Applied Biosystems). Sequencing is then carried out using either the ABI 373 or 377 DNA sequencing system (Applied Biosystems), the MEGABACE 1000 DNA sequencing system (Molecular Dynamics, Sunnyvale CA), or other systems known in the art. The resulting sequences are analyzed using a variety of algorithms which are well known in the art. (See, e.g., Ausubel, F.M. (1997) Short Protocols in Molecular Biology. John Wiley & Sons, New York NY, unit 7.7; Meyers, R.A. (1995) Molecular Biology and Biotechnology. Wiley VCH, New York NY, pp. 853-856.)
The nucleic acid sequences encoding CSAP may be extended utilizing a partial nucleotide sequence and employing various PCR-based methods known in the art to detect upstream sequences, such as promoters and regulatory elements. For example, one method which may be employed, restriction-site PCR, uses universal and nested primers to amplify unknown sequence from genomic DNA within a cloning vector. (See, e.g., Sarkar, G. (1993) PCR Methods Applic. 2:318-322.) Another method, inverse PCR, uses primers that extend in divergent directions to amplify unknown sequence from a circularized template. The template is derived from restriction fragments comprising a known genomic locus and surrounding sequences. (See, e.g., Triglia, T. et al. (1988) Nucleic Acids Res. 16:8186.) A third method, capture PCR, involves PCR amplification of DNA fragments adjacent to known sequences in human and yeast artificial chromosome DNA. (See, e.g., Lagerstrom, M. et al. (1991) PCR Methods Applic. 1:111-119.) In this method, multiple restriction enzyme digestions and ligations may be used to insert an engineered double-stranded sequence into a region of unknown sequence before performing PCR. Other methods which may be used to retrieve unknown sequences are known in the art. (See, e.g., Parker, J.D. et al. (1991) Nucleic Acids Res. 19:3055-3060.) Additionally, one may use PCR, nested primers, and PROMOTERFINDER libraries (Clontech, Palo Alto CA) to walk genomic DNA. This procedure avoids the need to screen libraries and is useful in finding intron/exon junctions. For all PCR-based methods, primers may be designed using commercially available software, such as OLIGO 4.06 primer analysis software (National Biosciences, Plymouth MN) or another appropriate program, to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to anneal to the template at temperatures of about 68°C to 72°C.
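The primer guidelines above (about 22 to 30 nucleotides, GC content of about 50% or more) translate directly into a simple screening check. The Python sketch below uses hypothetical function names and covers only length and GC content; it does not estimate annealing temperature, which is better left to primer analysis software such as OLIGO.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def check_primer(primer: str, min_len: int = 22, max_len: int = 30,
                 min_gc: float = 50.0) -> list:
    """Return reasons the primer fails the length/GC guidelines (empty list if it passes)."""
    problems = []
    if not min_len <= len(primer) <= max_len:
        problems.append(f"length {len(primer)} outside {min_len}-{max_len} nt")
    gc = gc_content(primer)
    if gc < min_gc:
        problems.append(f"GC content {gc:.1f}% below {min_gc:.0f}%")
    return problems


print(check_primer("GCGCATAT" * 3))  # [] (24 nt, 50% GC)
print(check_primer("ATATATAT"))      # two problems: too short, GC too low
```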
When screening for full length cDNAs, it is preferable to use libraries that have been size-selected to include larger cDNAs. In addition, random-primed libraries, which often include sequences containing the 5' regions of genes, are preferable for situations in which an oligo d(T) library does not yield a full-length cDNA. Genomic libraries may be useful for extension of sequence into 5' non-transcribed regulatory regions. Capillary electrophoresis systems which are commercially available may be used to analyze the size or confirm the nucleotide sequence of sequencing or PCR products. In particular, capillary sequencing may employ flowable polymers for electrophoretic separation, four different nucleotide-specific, laser-stimulated fluorescent dyes, and a charge coupled device camera for detection of the emitted wavelengths. Output/light intensity may be converted to electrical signal using appropriate software (e.g., GENOTYPER and SEQUENCE NAVIGATOR, Applied Biosystems), and the entire process from loading of samples to computer analysis and electronic data display may be computer controlled. Capillary electrophoresis is especially preferable for sequencing small DNA fragments which may be present in limited amounts in a particular sample.
In another embodiment of the invention, polynucleotide sequences or fragments thereof which encode CSAP may be cloned in recombinant DNA molecules that direct expression of CSAP, or fragments or functional equivalents thereof, in appropriate host cells. Due to the inherent degeneracy of the genetic code, other DNA sequences which encode substantially the same or a functionally equivalent amino acid sequence may be produced and used to express CSAP.
The nucleotide sequences of the present invention can be engineered using methods generally known in the art in order to alter CSAP-encoding sequences for a variety of purposes including, but not limited to, modification of the cloning, processing, and/or expression of the gene product. DNA shuffling by random fragmentation and PCR reassembly of gene fragments and synthetic oligonucleotides may be used to engineer the nucleotide sequences. For example, oligonucleotide-mediated site-directed mutagenesis may be used to introduce mutations that create new restriction sites, alter glycosylation patterns, change codon preference, produce splice variants, and so forth.
The nucleotides of the present invention may be subjected to DNA shuffling techniques such as MOLECULARBREEDING (Maxygen Inc., Santa Clara CA; described in U.S. Patent No. 5,837,458; Chang, C.-C. et al. (1999) Nat. Biotechnol. 17:793-797; Christians, F.C. et al. (1999) Nat. Biotechnol. 17:259-264; and Crameri, A. et al. (1996) Nat. Biotechnol. 14:315-319) to alter or improve the biological properties of CSAP, such as its biological or enzymatic activity or its ability to bind to other molecules or compounds. DNA shuffling is a process by which a library of gene variants is produced using PCR-mediated recombination of gene fragments. The library is then subjected to selection or screening procedures that identify those gene variants with the desired properties. These preferred variants may then be pooled and further subjected to recursive rounds of DNA shuffling and selection/screening. Thus, genetic diversity is created through "artificial" breeding and rapid molecular evolution. For example, fragments of a single gene containing random point mutations may be recombined, screened, and then reshuffled until the desired properties are optimized. Alternatively, fragments of a given gene may be recombined with fragments of homologous genes in the same gene family, either from the same or different species, thereby maximizing the genetic diversity of multiple naturally occurring genes in a directed and controllable manner.
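The shuffle-and-screen cycle described above can be mimicked in silico as a toy model. The Python sketch below is an illustration only, not a simulation of the PCR chemistry: it assumes pre-aligned parental sequences of equal length, models recombination as random crossovers between them, and uses a caller-supplied `score` function as a stand-in for the selection or screening assay. All names and parameter defaults are hypothetical.

```python
import random
from typing import Callable, List


def recombine(parents: List[str], n_crossovers: int, rng: random.Random) -> str:
    """Build one chimera: cut at random crossover points and take each
    segment from a randomly chosen (pre-aligned, equal-length) parent."""
    length = len(parents[0])
    cuts = sorted(rng.sample(range(1, length), n_crossovers))
    bounds = [0] + cuts + [length]
    return "".join(rng.choice(parents)[start:end]
                   for start, end in zip(bounds, bounds[1:]))


def shuffle_and_screen(parents: List[str], score: Callable[[str], float],
                       rounds: int = 5, library_size: int = 200, keep: int = 10,
                       n_crossovers: int = 3, seed: int = 0) -> str:
    """Recursive rounds of in silico recombination followed by screening."""
    if len({len(p) for p in parents}) != 1:
        raise ValueError("parents must be aligned to a common length")
    if not 0 < n_crossovers < len(parents[0]):
        raise ValueError("n_crossovers must be between 1 and length - 1")
    rng = random.Random(seed)
    pool = list(parents)
    for _ in range(rounds):
        library = [recombine(pool, n_crossovers, rng) for _ in range(library_size)]
        # Screening: keep the top-scoring variants (including earlier winners) as the
        # parents of the next round; ties are broken by sequence for reproducibility.
        candidates = set(library) | set(pool)
        pool = sorted(candidates, key=lambda s: (score(s), s), reverse=True)[:keep]
    return pool[0]


# Hypothetical screen: reward agreement with an arbitrary "desired" sequence.
target = "AAAAACCCCC"
best = shuffle_and_screen(["A" * 10, "C" * 10],
                          lambda s: sum(a == b for a, b in zip(s, target)))
print(best)
```

Carrying the previous round's winners into each screen means the best score can never decrease from one round to the next, loosely mirroring the pooling of preferred variants before further rounds of shuffling.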
In another embodiment, sequences encoding CSAP may be synthesized, in whole or in part, using chemical methods well known in the art. (See, e.g., Caruthers, M.H. et al. (1980) Nucleic Acids Symp. Ser. 7:215-223; and Horn, T. et al. (1980) Nucleic Acids Symp. Ser. 7:225-232.)
Alternatively, CSAP itself or a fragment thereof may be synthesized using chemical methods. For example, peptide synthesis can be performed using various solution-phase or solid-phase techniques. (See, e.g., Creighton, T. (1984) Proteins, Structures and Molecular Properties, WH Freeman, New York NY, pp. 55-60; and Roberge, J.Y. et al. (1995) Science 269:202-204.) Automated synthesis may be achieved using the ABI 431A peptide synthesizer (Applied Biosystems). Additionally, the amino acid sequence of CSAP, or any part thereof, may be altered during direct synthesis and/or combined with sequences from other proteins, or any part thereof, to produce a variant polypeptide or a polypeptide having a sequence of a naturally occurring polypeptide.
The peptide may be substantially purified by preparative high performance liquid chromatography. (See, e.g., Chiez, R.M. and F.Z. Regnier (1990) Methods Enzymol. 182:392-421.) The composition of the synthetic peptides may be confirmed by amino acid analysis or by sequencing. (See, e.g., Creighton, supra, pp. 28-53.)
In order to express a biologically active CSAP, the nucleotide sequences encoding CSAP or derivatives thereof may be inserted into an appropriate expression vector, i.e., a vector which contains the necessary elements for transcriptional and translational control of the inserted coding sequence in a suitable host. These elements include regulatory sequences, such as enhancers, constitutive and inducible promoters, and 5' and 3' untranslated regions in the vector and in polynucleotide sequences encoding CSAP. Such elements may vary in their strength and specificity. Specific initiation signals may also be used to achieve more efficient translation of sequences encoding CSAP. Such signals include the ATG initiation codon and adjacent sequences, e.g., the Kozak sequence. In cases where sequences encoding CSAP, together with its initiation codon and upstream regulatory sequences, are inserted into the appropriate expression vector, no additional transcriptional or translational control signals may be needed. However, in cases where only the coding sequence, or a fragment thereof, is inserted, exogenous translational control signals including an in-frame ATG initiation codon should be provided by the vector. Exogenous translational elements and initiation codons may be of various origins, both natural and synthetic. The efficiency of expression may be enhanced by the inclusion of enhancers appropriate for the particular host cell system used. (See, e.g., Scharf, D. et al. (1994) Results Probl. Cell Differ. 20:125-162.) Methods which are well known to those skilled in the art may be used to construct expression vectors containing sequences encoding CSAP and appropriate transcriptional and translational control elements. These methods include in vitro recombinant DNA techniques, synthetic techniques, and in vivo genetic recombination. (See, e.g., Sambrook, J. et al. (1989) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, Plainview NY, ch. 4, 8, and 16-17; Ausubel, F.M. et al. (1995) Current Protocols in Molecular Biology, John Wiley & Sons, New York NY, ch. 9, 13, and 16.)
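The two sequence checks mentioned above can be sketched in a few lines: the quality of the context around an ATG, and whether an inserted coding sequence is in frame with the vector-supplied ATG. This is a simplification under stated assumptions. Of the Kozak consensus gccRccATGG, only the two positions most often cited as decisive are examined: a purine at -3 and a G at +4, with the A of ATG numbered +1. The "strong/adequate/weak" labels are a common convention and are not taken from the text. All function names are hypothetical.

```python
import re

def atg_positions(seq):
    """0-based start positions of every ATG (overlapping matches allowed)."""
    return [m.start() for m in re.finditer(r"(?=ATG)", seq.upper())]

def kozak_context(seq, i):
    """Classify the context of the ATG starting at index i.

    Numbering the A of ATG as +1, checks a purine at -3 and a G at +4.
    Both present -> "strong", one -> "adequate", neither -> "weak".
    """
    seq = seq.upper()
    if seq[i:i + 3] != "ATG":
        raise ValueError(f"no ATG at position {i}")
    purine_at_minus3 = i >= 3 and seq[i - 3] in "AG"
    g_at_plus4 = i + 3 < len(seq) and seq[i + 3] == "G"
    return ("weak", "adequate", "strong")[purine_at_minus3 + g_at_plus4]

def in_frame(start_codon_index, insert_index):
    """True if a coding sequence beginning at insert_index is in frame with the ATG."""
    return (insert_index - start_codon_index) % 3 == 0
```

For instance, `kozak_context("GCCACCATGG", 6)` returns `"strong"`. A real vector design would also have to check for upstream ATGs and for stop codons in the junction, which this sketch ignores.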
A variety of expression vector/host systems may be utilized to contain and express sequences encoding CSAP. These include, but are not limited to, microorganisms such as bacteria transformed with recombinant bacteriophage, plasmid, or cosmid DNA expression vectors; yeast transformed with yeast expression vectors; insect cell systems infected with viral expression vectors (e.g., baculovirus); plant cell systems transformed with viral expression vectors (e.g., cauliflower mosaic virus, CaMV, or tobacco mosaic virus, TMV) or with bacterial expression vectors (e.g., Ti or pBR322 plasmids); or animal cell systems. (See, e.g., Sambrook, supra; Ausubel, supra; Van Heeke, G. and S.M. Schuster (1989) J. Biol. Chem. 264:5503-5509; Engelhard, E.K. et al. (1994) Proc. Natl. Acad. Sci. USA 91:3224-3227; Sandig, V. et al. (1996) Hum. Gene Ther. 7:1937-1945; Takamatsu, N. (1987) EMBO J. 6:307-311; The McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill, New York NY, pp. 191-196; Logan, J. and T. Shenk (1984) Proc. Natl. Acad. Sci. USA 81:3655-3659; and Harrington, J.J. et al. (1997) Nat. Genet. 15:345-355.) Expression vectors derived from retroviruses, adenoviruses, or herpes or vaccinia viruses, or from various bacterial plasmids, may be used for delivery of nucleotide sequences to the targeted organ, tissue, or cell population. (See, e.g., Di Nicola, M. et al. (1998) Cancer Gene Ther. 5(6):350-356; Yu, M. et al. (1993) Proc. Natl. Acad. Sci. USA 90(13):6340-6344; Buller, R.M. et al. (1985) Nature 317(6040):813-815; McGregor, D.P. et al. (1994) Mol. Immunol. 31(3):219-226; and Verma, I.M. and N. Somia (1997) Nature 389:239-242.)
The invention is not limited by the host cell employed. In bacterial systems, a number of cloning and expression vectors may be selected depending upon the use intended for polynucleotide sequences encoding CSAP. For example, routine cloning, subcloning, and propagation of polynucleotide sequences encoding CSAP can be achieved using a multifunctional E. coli vector such as PBLUESCRIPT (Stratagene, La Jolla CA) or PSPORT1 plasmid (Life Technologies). Ligation of sequences encoding CSAP into the vector's multiple cloning site disrupts the lacZ gene, allowing a colorimetric screening procedure for identification of transformed bacteria containing recombinant molecules. In addition, these vectors may be useful for in vitro transcription, dideoxy sequencing, single strand rescue with helper phage, and creation of nested deletions in the cloned sequence. (See, e.g., Van Heeke, G. and S.M. Schuster (1989) J. Biol. Chem. 264:5503-5509.) When large quantities of CSAP are needed, e.g., for the production of antibodies, vectors which direct high level expression of CSAP may be used. For example, vectors containing the strong, inducible SP6 or T7 bacteriophage promoter may be used.
Yeast expression systems may be used for production of CSAP. A number of vectors containing constitutive or inducible promoters, such as alpha factor, alcohol oxidase, and PGH promoters, may be used in the yeast Saccharomyces cerevisiae or Pichia pastoris. In addition, such vectors direct either the secretion or intracellular retention of expressed proteins and enable integration of foreign sequences into the host genome for stable propagation. (See, e.g., Ausubel, 1995, supra; Bitter, G.A. et al. (1987) Methods Enzymol. 153:516-544; and Scorer, C.A. et al. (1994) Bio/Technology 12:181-184.)
Plant systems may also be used for expression of CSAP. Transcription of sequences encoding CSAP may be driven by viral promoters, e.g., the 35S and 19S promoters of CaMV used alone or in combination with the omega leader sequence from TMV (Takamatsu, N. (1987) EMBO J. 6:307-311). Alternatively, plant promoters such as the small subunit of RUBISCO or heat shock promoters may be used. (See, e.g., Coruzzi, G. et al. (1984) EMBO J. 3:1671-1680; Broglie, R. et al. (1984) Science 224:838-843; and Winter, J. et al. (1991) Results Probl. Cell Differ. 17:85-105.) These constructs can be introduced into plant cells by direct DNA transformation or pathogen-mediated transfection. (See, e.g., The McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill, New York NY, pp. 191-196.)
In mammalian cells, a number of viral-based expression systems may be utilized. In cases where an adenovirus is used as an expression vector, sequences encoding CSAP may be ligated into an adenovirus transcription/translation complex consisting of the late promoter and tripartite leader sequence. Insertion in a non-essential E1 or E3 region of the viral genome may be used to obtain infective virus which expresses CSAP in host cells. (See, e.g., Logan, J. and T. Shenk (1984) Proc. Natl. Acad. Sci. USA 81:3655-3659.) In addition, transcription enhancers, such as the Rous sarcoma virus (RSV) enhancer, may be used to increase expression in mammalian host cells. SV40- or EBV-based vectors may also be used for high-level protein expression.
Human artificial chromosomes (HACs) may also be employed to deliver larger fragments of DNA than can be contained in and expressed from a plasmid. HACs of about 6 kb to 10 Mb are constructed and delivered via conventional delivery methods (liposomes, polycationic amino polymers, or vesicles) for therapeutic purposes. (See, e.g., Harrington, J.J. et al. (1997) Nat. Genet. 15:345-355.)
For long term production of recombinant proteins in mammalian systems, stable expression of CSAP in cell lines is preferred. For example, sequences encoding CSAP can be transformed into cell lines using expression vectors which may contain viral origins of replication and/or endogenous expression elements and a selectable marker gene on the same or on a separate vector. Following the introduction of the vector, cells may be allowed to grow for about 1 to 2 days in enriched media before being switched to selective media. The purpose of the selectable marker is to confer resistance to a selective agent, and its presence allows growth and recovery of cells which successfully express the introduced sequences. Resistant clones of stably transformed cells may be propagated using tissue culture techniques appropriate to the cell type.
Any number of selection systems may be used to recover transformed cell lines. These include, but are not limited to, the herpes simplex virus thymidine kinase and adenine phosphoribosyltransferase genes, for use in tk⁻ and apr⁻ cells, respectively. (See, e.g., Wigler, M. et al. (1977) Cell 11:223-232; Lowy, I. et al. (1980) Cell 22:817-823.) Also, antimetabolite, antibiotic, or herbicide resistance can be used as the basis for selection. For example, dhfr confers resistance to methotrexate; neo confers resistance to the aminoglycosides neomycin and G-418; and als and pat confer resistance to chlorsulfuron and phosphinothricin, respectively (pat encodes phosphinothricin acetyltransferase). (See, e.g., Wigler, M. et al. (1980) Proc. Natl. Acad. Sci. USA 77:3567-3570; Colbere-Garapin, F. et al. (1981) J. Mol. Biol. 150:1-14.) Additional selectable genes have been described, e.g., trpB and hisD, which alter cellular requirements for metabolites. (See, e.g., Hartman, S.C. and R.C. Mulligan (1988) Proc. Natl. Acad. Sci. USA 85:8047-8051.) Visible markers, e.g., anthocyanins, green fluorescent proteins (GFP; Clontech), β-glucuronidase and its substrate β-glucuronide, or luciferase and its substrate luciferin, may be used. These markers can be used not only to identify transformants, but also to quantify the amount of transient or stable protein expression attributable to a specific vector system. (See, e.g., Rhodes, C.A. (1995) Methods Mol. Biol. 55:121-131.)
Although the presence or absence of marker gene expression suggests that the gene of interest is also present, the presence and expression of the gene may need to be confirmed. For example, if the sequence encoding CSAP is inserted within a marker gene sequence, transformed cells containing sequences encoding CSAP can be identified by the absence of marker gene function. Alternatively, a marker gene can be placed in tandem with a sequence encoding CSAP under the control of a single promoter. Expression of the marker gene in response to induction or selection usually indicates expression of the tandem gene as well.
In general, host cells that contain the nucleic acid sequence encoding CSAP and that express CSAP may be identified by a variety of procedures known to those of skill in the art. These procedures include, but are not limited to, DNA-DNA or DNA-RNA hybridizations, PCR amplification, and protein bioassay or immunoassay techniques which include membrane, solution, or chip based technologies for the detection and/or quantification of nucleic acid or protein sequences. Immunological methods for detecting and measuring the expression of CSAP using either specific polyclonal or monoclonal antibodies are known in the art. Examples of such techniques include enzyme-linked immunosorbent assays (ELISAs), radioimmunoassays (RIAs), and fluorescence activated cell sorting (FACS). A two-site, monoclonal-based immunoassay utilizing monoclonal antibodies reactive to two non-interfering epitopes on CSAP is preferred, but a competitive binding assay may be employed. These and other assays are well known in the art. (See, e.g., Hampton, R. et al. (1990) Serological Methods, a Laboratory Manual, APS Press, St. Paul MN, Sect. IV; Coligan, J.E. et al. (1997) Current Protocols in Immunology, Greene Pub. Associates and Wiley-Interscience, New York NY; and Pound, J.D. (1998) Immunochemical Protocols, Humana Press, Totowa NJ.)
A wide variety of labels and conjugation techniques are known by those skilled in the art and may be used in various nucleic acid and amino acid assays. Means for producing labeled hybridization or PCR probes for detecting sequences related to polynucleotides encoding CSAP include oligolabeling, nick translation, end-labeling, or PCR amplification using a labeled nucleotide. Alternatively, the sequences encoding CSAP, or any fragments thereof, may be cloned into a vector for the production of an mRNA probe. Such vectors are known in the art, are commercially available, and may be used to synthesize RNA probes in vitro by addition of an appropriate RNA polymerase such as T7, T3, or SP6 and labeled nucleotides. These procedures may be conducted using a variety of commercially available kits, such as those provided by Amersham Pharmacia Biotech, Promega (Madison WI), and US Biochemical. Suitable reporter molecules or labels which may be used for ease of detection include radionuclides, enzymes, fluorescent, chemiluminescent, or chromogenic agents, as well as substrates, cofactors, inhibitors, magnetic particles, and the like.
Host cells transformed with nucleotide sequences encoding CSAP may be cultured under conditions suitable for the expression and recovery of the protein from cell culture. The protein produced by a transformed cell may be secreted or retained intracellularly depending on the sequence and/or the vector used. As will be understood by those of skill in the art, expression vectors containing polynucleotides which encode CSAP may be designed to contain signal sequences which direct secretion of CSAP through a prokaryotic or eukaryotic cell membrane.
In addition, a host cell strain may be chosen for its ability to modulate expression of the inserted sequences or to process the expressed protein in the desired fashion. Such modifications of the polypeptide include, but are not limited to, acetylation, carboxylation, glycosylation, phosphorylation, lipidation, and acylation. Post-translational processing which cleaves a "prepro" or "pro" form of the protein may also be used to specify protein targeting, folding, and/or activity. Different host cells which have specific cellular machinery and characteristic mechanisms for post-translational activities (e.g., CHO, HeLa, MDCK, HEK293, and WI38) are available from the American Type Culture Collection (ATCC, Manassas VA) and may be chosen to ensure the correct modification and processing of the foreign protein.
In another embodiment of the invention, natural, modified, or recombinant nucleic acid sequences encoding CSAP may be ligated to a heterologous sequence resulting in translation of a fusion protein in any of the aforementioned host systems. For example, a chimeric CSAP protein containing a heterologous moiety that can be recognized by a commercially available antibody may facilitate the screening of peptide libraries for inhibitors of CSAP activity. Heterologous protein and peptide moieties may also facilitate purification of fusion proteins using commercially available affinity matrices. Such moieties include, but are not limited to, glutathione S-transferase (GST), maltose binding protein (MBP), thioredoxin (Trx), calmodulin binding peptide (CBP), 6-His, FLAG, c-myc, and hemagglutinin (HA). GST, MBP, Trx, CBP, and 6-His enable purification of their cognate fusion proteins on immobilized glutathione, maltose, phenylarsine oxide, calmodulin, and metal-chelate resins, respectively. FLAG, c-myc, and hemagglutinin (HA) enable immunoaffinity purification of fusion proteins using commercially available monoclonal and polyclonal antibodies that specifically recognize these epitope tags. A fusion protein may also be engineered to contain a proteolytic cleavage site located between the CSAP encoding sequence and the heterologous protein sequence, so that CSAP may be cleaved away from the heterologous moiety following purification. Methods for fusion protein expression and purification are discussed in Ausubel (1995, supra, ch. 10). A variety of commercially available kits may also be used to facilitate expression and purification of fusion proteins. In a further embodiment of the invention, synthesis of radiolabeled CSAP may be achieved in vitro using the TNT rabbit reticulocyte lysate or wheat germ extract system (Promega). These systems couple transcription and translation of protein-coding sequences operably associated with the T7, T3, or SP6 promoters. 
Translation takes place in the presence of a radiolabeled amino acid precursor, for example, ³⁵S-methionine.
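The fusion-protein design described above can be sketched as plain string operations on amino acid sequences: an affinity tag, then a protease cleavage site, then CSAP, with the tag removed after purification. This is a minimal sketch under stated assumptions. The tag-to-matrix pairings come from the text. The choice of a 6-His tag, its N-terminal placement, and the use of a TEV protease site (ENLYFQ/G, cut between Q and G) are illustrative assumptions, and the CSAP sequence shown is a placeholder.

```python
# Tag -> affinity matrix pairings, as listed in the text.
AFFINITY_MATRIX = {
    "GST": "immobilized glutathione",
    "MBP": "maltose",
    "Trx": "phenylarsine oxide",
    "CBP": "calmodulin",
    "6-His": "metal-chelate resin",
}

HIS6 = "HHHHHH"        # 6-His tag (assumed N-terminal here)
TEV_SITE = "ENLYFQG"   # assumed protease site; TEV protease cuts between Q and G
TEV_CUT_OFFSET = 6     # cut after the 6th residue of the site (ENLYFQ|G)

def build_fusion(tag, site, protein):
    """Tag, then cleavage site, then the protein of interest."""
    return tag + site + protein

def cleave(fusion, site, cut_offset):
    """Return (tag_fragment, released_protein) after cutting at the first site found."""
    i = fusion.find(site)
    if i < 0:
        raise ValueError("cleavage site not found")
    cut = i + cut_offset
    return fusion[:cut], fusion[cut:]
```

With a placeholder protein `"MSKGEELFT"`, `cleave(build_fusion(HIS6, TEV_SITE, protein), TEV_SITE, TEV_CUT_OFFSET)` releases `"GMSKGEELFT"`. With this particular site, one residual glycine stays on the released protein. Because `find` takes the first match, the approach also assumes the site does not occur inside CSAP itself.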
CSAP of the present invention or fragments thereof may be used to screen for compounds that specifically bind to CSAP. At least one and up to a plurality of test compounds may be screened for specific binding to CSAP. Examples of test compounds include antibodies, oligonucleotides, proteins (e.g., receptors), or small molecules. In one embodiment, the compound thus identified is closely related to the natural ligand of CSAP, e.g., a ligand or fragment thereof, a natural substrate, a structural or functional mimetic, or a natural binding partner. (See, e.g., Coligan, J.E. et al. (1991) Current Protocols in Immunology 1(2): Chapter 5.) Similarly, the compound can be closely related to the natural receptor to which CSAP binds, or to at least a fragment of the receptor, e.g., the ligand binding site. In either case, the compound can be rationally designed using known techniques. In one embodiment, screening for these compounds involves producing appropriate cells which express CSAP, either as a secreted protein or on the cell membrane. Preferred cells include cells from mammals, yeast, Drosophila, or E. coli. Cells expressing CSAP or cell membrane fractions which contain CSAP are then contacted with a test compound, and binding, stimulation, or inhibition of activity of either CSAP or the compound is analyzed.
An assay may simply test binding of a test compound to the polypeptide, wherein binding is detected by a fluorophore, radioisotope, enzyme conjugate, or other detectable label. For example, the assay may comprise the steps of combining at least one test compound with CSAP, either in solution or affixed to a solid support, and detecting the binding of CSAP to the compound. Alternatively, the assay may detect or measure binding of a test compound in the presence of a labeled competitor. Additionally, the assay may be carried out using cell-free preparations, chemical libraries, or natural product mixtures, and the test compound(s) may be free in solution or affixed to a solid support.
CSAP of the present invention or fragments thereof may be used to screen for compounds that modulate the activity of CSAP. Such compounds may include agonists, antagonists, or partial or inverse agonists. In one embodiment, an assay is performed under conditions permissive for CSAP activity, wherein CSAP is combined with at least one test compound, and the activity of CSAP in the presence of a test compound is compared with the activity of CSAP in the absence of the test compound. A change in the activity of CSAP in the presence of the test compound is indicative of a compound that modulates the activity of CSAP. Alternatively, a test compound is combined with an in vitro or cell-free system comprising CSAP under conditions suitable for CSAP activity, and the assay is performed. In either of these assays, a test compound which modulates the activity of CSAP may do so indirectly and need not come in direct contact with CSAP. At least one and up to a plurality of test compounds may be screened.
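The comparison described above, between CSAP activity with and without a test compound, can be sketched as a simple call on replicate measurements. This is a minimal sketch under stated assumptions. The text does not say what size of change counts. A cutoff of three standard deviations of the untreated replicates is assumed here because it is a common screening convention. The function names and the replicate values in the usage note are hypothetical.

```python
from statistics import mean, stdev

def percent_of_control(treated, control):
    """Mean activity with the test compound as a percentage of the untreated mean."""
    return 100.0 * mean(treated) / mean(control)

def call_modulator(treated, control, n_sd=3.0):
    """Classify a test compound by comparing CSAP activity with and without it.

    A shift in mean activity larger than n_sd standard deviations of the
    untreated replicates is treated as a real change (assumed 3-SD cutoff).
    """
    mu, sd = mean(control), stdev(control)
    delta = mean(treated) - mu
    if abs(delta) <= n_sd * sd:
        return "no change"
    return "increases activity" if delta > 0 else "decreases activity"
```

For example, with untreated replicates `[100, 102, 98, 101, 99]` and treated replicates `[40, 42, 38]`, the compound retains 40% of control activity and is called as decreasing activity. As the text notes, such a call says nothing about whether the compound acts on CSAP directly or indirectly.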
In another embodiment, polynucleotides encoding CSAP or their mammalian homologs may be "knocked out" in an animal model system using homologous recombination in embryonic stem (ES) cells. Such techniques are well known in the art and are useful for the generation of animal models of human disease. (See, e.g., U.S. Patent No. 5,175,383 and U.S. Patent No. 5,767,337.) For example, mouse ES cells, such as the mouse 129/SvJ cell line, are derived from the early mouse embryo and grown in culture. The ES cells are transformed with a vector containing the gene of interest disrupted by a marker gene, e.g., the neomycin phosphotransferase gene (neo; Capecchi, M.R. (1989) Science 244:1288-1292). The vector integrates into the corresponding region of the host genome by homologous recombination. Alternatively, the Cre-loxP site-specific recombination system may be used to knock out a gene of interest in a tissue- or developmental stage-specific manner (Marth, J.D. (1996) J. Clin. Invest. 97:1999-2002; Wagner, K.U. et al. (1997) Nucleic Acids Res. 25:4323-4330). Transformed ES cells are identified and microinjected into mouse blastocysts such as those from the C57BL/6 mouse strain. The blastocysts are surgically transferred to pseudopregnant dams, and the resulting chimeric progeny are genotyped and bred to produce heterozygous or homozygous strains. Transgenic animals thus generated may be tested with potential therapeutic or toxic agents.
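The breeding step above, from heterozygous animals to homozygous knockout strains, follows ordinary Mendelian segregation at the targeted locus. A short enumeration shows the expected offspring proportions. This is a minimal sketch that assumes a single autosomal locus and equal viability of all genotypes. If loss of the gene is embryonic lethal, fewer homozygous knockouts than predicted would be recovered. The "+"/"-" allele notation and the function name are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def cross(parent1, parent2):
    """Expected offspring genotype frequencies at one autosomal locus.

    Each genotype is a two-character string of alleles: "+" for wild type and
    "-" for the targeted allele. Every combination of one allele from each
    parent is taken as equally likely.
    """
    offspring = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(offspring.values())
    return {genotype: Fraction(n, total) for genotype, n in offspring.items()}
```

`cross("+-", "+-")` gives the familiar 1:2:1 ratio: one quarter wild type, one half heterozygous, and one quarter homozygous knockout. `cross("+-", "++")` gives half heterozygous offspring.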
Polynucleotides encoding CSAP may also be manipulated in vitro in ES cells derived from human blastocysts. Human ES cells have the potential to differentiate into at least eight separate cell lineages including endoderm, mesoderm, and ectodermal cell types. These cell lineages differentiate into, for example, neural cells, hematopoietic lineages, and cardiomyocytes (Thomson, J.A. et al. (1998) Science 282:1145-1147).
Polynucleotides encoding CSAP can also be used to create "knockin" humanized animals (pigs) or transgenic animals (mice or rats) to model human disease. With knockin technology, a region of a polynucleotide encoding CSAP is injected into animal ES cells, and the injected sequence integrates into the animal cell genome. Transformed cells are injected into blastulae, and the blastulae are implanted as described above. Transgenic progeny or inbred lines are studied and treated with potential pharmaceutical agents to obtain information on treatment of a human disease. Alternatively, a mammal inbred to overexpress CSAP, e.g., by secreting CSAP in its milk, may also serve as a convenient source of that protein (Janne, J. et al. (1998) Biotechnol. Annu. Rev. 4:55-74).
THERAPEUTICS
Chemical and structural similarity, e.g., in the context of sequences and motifs, exists between regions of CSAP and cytoskeleton-associated proteins. In addition, examples of tissues expressing CSAP include normal and cancerous lung tissues and normal and cancerous breast tissues; further examples can be found in Table 6. Therefore, CSAP appears to play a role in cell proliferative disorders, viral infections, and neurological disorders. In the treatment of disorders associated with increased CSAP expression or activity, it is desirable to decrease the expression or activity of CSAP. In the treatment of disorders associated with decreased CSAP expression or activity, it is desirable to increase the expression or activity of CSAP. Therefore, in one embodiment, CSAP or a fragment or derivative thereof may be administered to a subject to treat or prevent a disorder associated with decreased expression or activity of CSAP. Examples of such disorders include, but are not limited to, a cell proliferative disorder such as actinic keratosis, arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease (MCTD), myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis, primary thrombocythemia, and a cancer including adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and, in particular, a cancer of the adrenal gland, bladder, bone, bone marrow, brain, breast, cervix, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary, pancreas, parathyroid, penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid, and uterus; a viral infection such as those caused by adenoviruses (acute respiratory disease, pneumonia), arenaviruses (lymphocytic choriomeningitis), bunyaviruses (Hantavirus), coronaviruses (pneumonia, chronic bronchitis), hepadnaviruses (hepatitis), herpesviruses (herpes simplex virus, varicella-zoster virus, Epstein-Barr virus, cytomegalovirus), flaviviruses (yellow fever), orthomyxoviruses (influenza), papillomaviruses (cancer), paramyxoviruses (measles, mumps), picornaviruses (rhinovirus, poliovirus, coxsackievirus), polyomaviruses (BK virus, JC virus), poxviruses (smallpox), reovirus (Colorado tick fever), retroviruses (human immunodeficiency virus, human T lymphotropic virus), rhabdoviruses (rabies), rotaviruses (gastroenteritis), and togaviruses (encephalitis, rubella); and a neurological disorder such as epilepsy, ischemic cerebrovascular disease, stroke, cerebral neoplasms, Alzheimer's disease, Pick's disease, Huntington's disease, dementia, Parkinson's disease and other extrapyramidal disorders, amyotrophic lateral sclerosis and other motor neuron disorders, progressive neural muscular atrophy, retinitis pigmentosa, hereditary ataxias, multiple sclerosis and other demyelinating diseases, bacterial and viral meningitis, brain abscess, subdural empyema, epidural abscess, suppurative intracranial thrombophlebitis, myelitis and radiculitis, viral central nervous system disease, a prion disease including kuru, Creutzfeldt-Jakob disease, and Gerstmann-Straussler-Scheinker syndrome, fatal familial insomnia, nutritional and metabolic diseases of the nervous system, neurofibromatosis, tuberous sclerosis, cerebelloretinal hemangioblastomatosis, encephalotrigeminal syndrome, mental retardation and other developmental disorders of the central nervous system, cerebral palsy, neuroskeletal disorders, autonomic nervous system disorders, cranial nerve disorders, spinal cord diseases, muscular dystrophy and other neuromuscular disorders, peripheral nervous system disorders, dermatomyositis and polymyositis, inherited, metabolic, endocrine, and toxic myopathies, myasthenia gravis, periodic paralysis, mental disorders including mood, anxiety, and schizophrenic disorders, seasonal affective disorder (SAD), akathisia, amnesia, catatonia, diabetic neuropathy, tardive dyskinesia, dystonias, paranoid psychoses, postherpetic neuralgia, and Tourette's disorder. In another embodiment, a vector capable of expressing CSAP or a fragment or derivative thereof may be administered to a subject to treat or prevent a disorder associated with decreased expression or activity of CSAP including, but not limited to, those described above.
In a further embodiment, a composition comprising a substantially purified CSAP in conjunction with a suitable pharmaceutical carrier may be administered to a subject to treat or prevent a disorder associated with decreased expression or activity of CSAP including, but not limited to, those provided above.
In still another embodiment, an agonist which modulates the activity of CSAP may be administered to a subject to treat or prevent a disorder associated with decreased expression or activity of CSAP including, but not limited to, those listed above.
In a further embodiment, an antagonist of CSAP may be administered to a subject to treat or prevent a disorder associated with increased expression or activity of CSAP. Examples of such disorders include, but are not limited to, those cell proliferative disorders, viral infections, and neurological disorders described above. In one aspect, an antibody which specifically binds CSAP may be used directly as an antagonist or indirectly as a targeting or delivery mechanism for bringing a pharmaceutical agent to cells or tissues which express CSAP.
In an additional embodiment, a vector expressing the complement of the polynucleotide encoding CSAP may be administered to a subject to treat or prevent a disorder associated with increased expression or activity of CSAP including, but not limited to, those described above.
In other embodiments, any of the proteins, antagonists, antibodies, agonists, complementary sequences, or vectors of the invention may be administered in combination with other appropriate therapeutic agents. Selection of the appropriate agents for use in combination therapy may be made by one of ordinary skill in the art, according to conventional pharmaceutical principles. The combination of therapeutic agents may act synergistically to effect the treatment or prevention of the various disorders described above. Using this approach, one may be able to achieve therapeutic efficacy with lower dosages of each agent, thus reducing the potential for adverse side effects.
An antagonist of CSAP may be produced using methods which are generally known in the art. In particular, purified CSAP may be used to produce antibodies or to screen libraries of pharmaceutical agents to identify those which specifically bind CSAP. Antibodies to CSAP may also be generated using methods that are well known in the art. Such antibodies may include, but are not limited to, polyclonal, monoclonal, chimeric, and single chain antibodies, Fab fragments, and fragments produced by a Fab expression library. Neutralizing antibodies (i.e., those which inhibit dimer formation) are generally preferred for therapeutic use. Single chain antibodies (e.g., from camels or llamas) may be potent enzyme inhibitors and may have advantages in the design of peptide mimetics, and in the development of immuno-adsorbents and biosensors (Muyldermans, S. (2001) J. Biotechnol. 74:277-302).
For the production of antibodies, various hosts including goats, rabbits, rats, mice, camels, dromedaries, llamas, humans, and others may be immunized by injection with CSAP or with any fragment or oligopeptide thereof which has immunogenic properties. Depending on the host species, various adjuvants may be used to increase immunological response. Such adjuvants include, but are not limited to, Freund's, mineral gels such as aluminum hydroxide, and surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, KLH, and dinitrophenol. Among adjuvants used in humans, BCG (bacilli Calmette-Guerin) and Corynebacterium parvum are especially preferable.
It is preferred that the oligopeptides, peptides, or fragments used to induce antibodies to CSAP have an amino acid sequence consisting of at least about 5 amino acids, and generally of at least about 10 amino acids. It is also preferable that these oligopeptides, peptides, or fragments are identical to a portion of the amino acid sequence of the natural protein. Short stretches of CSAP amino acids may be fused with those of another protein, such as KLH, and antibodies to the chimeric molecule may be produced.
Monoclonal antibodies to CSAP may be prepared using any technique which provides for the production of antibody molecules by continuous cell lines in culture. These include, but are not limited to, the hybridoma technique, the human B-cell hybridoma technique, and the EBV-hybridoma technique. (See, e.g., Kohler, G. et al. (1975) Nature 256:495-497; Kozbor, D. et al. (1985) J. Immunol. Methods 81:31-42; Cote, R.J. et al. (1983) Proc. Natl. Acad. Sci. USA 80:2026-2030; and Cole, S.P. et al. (1984) Mol. Cell Biochem. 62:109-120.)
In addition, techniques developed for the production of "chimeric antibodies," such as the splicing of mouse antibody genes to human antibody genes to obtain a molecule with appropriate antigen specificity and biological activity, can be used. (See, e.g., Morrison, S.L. et al. (1984) Proc. Natl. Acad. Sci. USA 81:6851-6855; Neuberger, M.S. et al. (1984) Nature 312:604-608; and Takeda, S. et al. (1985) Nature 314:452-454.) Alternatively, techniques described for the production of single chain antibodies may be adapted, using methods known in the art, to produce CSAP-specific single chain antibodies. Antibodies with related specificity, but of distinct idiotypic composition, may be generated by chain shuffling from random combinatorial immunoglobulin libraries. (See, e.g., Burton, D.R. (1991) Proc. Natl. Acad. Sci. USA 88:10134-10137.)
Antibodies may also be produced by inducing in vivo production in the lymphocyte population or by screening immunoglobulin libraries or panels of highly specific binding reagents as disclosed in the literature. (See, e.g., Orlandi, R. et al. (1989) Proc. Natl. Acad. Sci. USA 86:3833-3837; Winter, G. et al. (1991) Nature 349:293-299.)
Antibody fragments which contain specific binding sites for CSAP may also be generated. For example, such fragments include, but are not limited to, F(ab')2 fragments produced by pepsin digestion of the antibody molecule and Fab fragments generated by reducing the disulfide bridges of the F(ab')2 fragments. Alternatively, Fab expression libraries may be constructed to allow rapid and easy identification of monoclonal Fab fragments with the desired specificity. (See, e.g., Huse, W.D. et al. (1989) Science 246:1275-1281.)
Various immunoassays may be used for screening to identify antibodies having the desired specificity. Numerous protocols for competitive binding or immunoradiometric assays using either polyclonal or monoclonal antibodies with established specificities are well known in the art. Such immunoassays typically involve the measurement of complex formation between CSAP and its specific antibody. A two-site, monoclonal-based immunoassay utilizing monoclonal antibodies reactive to two non-interfering CSAP epitopes is generally used, but a competitive binding assay may also be employed (Pound, supra).
Various methods such as Scatchard analysis in conjunction with radioimmunoassay techniques may be used to assess the affinity of antibodies for CSAP. Affinity is expressed as an association constant, Ka, which is defined as the molar concentration of CSAP-antibody complex divided by the product of the molar concentrations of free antigen and free antibody under equilibrium conditions. The Ka determined for a preparation of polyclonal antibodies, which are heterogeneous in their affinities for multiple CSAP epitopes, represents the average affinity, or avidity, of the antibodies for CSAP. The Ka determined for a preparation of monoclonal antibodies, which are monospecific for a particular CSAP epitope, represents a true measure of affinity. High-affinity antibody preparations with Ka ranging from about 10⁹ to 10¹² L/mole are preferred for use in immunoassays in which the CSAP-antibody complex must withstand rigorous manipulations. Low-affinity antibody preparations with Ka ranging from about 10⁶ to 10⁷ L/mole are preferred for use in immunopurification and similar procedures which ultimately require dissociation of CSAP, preferably in active form, from the antibody (Catty, D. (1988) Antibodies, Volume I: A Practical Approach, IRL Press, Washington DC; Liddell, J.E. and A. Cryer (1991) A Practical Guide to Monoclonal Antibodies, John Wiley & Sons, New York NY). The titer and avidity of polyclonal antibody preparations may be further evaluated to determine the quality and suitability of such preparations for certain downstream applications. For example, a polyclonal antibody preparation containing at least 1-2 mg specific antibody/ml, preferably 5-10 mg specific antibody/ml, is generally employed in procedures requiring precipitation of CSAP-antibody complexes. Procedures for evaluating antibody specificity, titer, and avidity, and guidelines for antibody quality and usage in various applications, are generally available. (See, e.g., Catty, supra, and Coligan et al., supra.)
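The affinity definitions above can be illustrated with a short sketch. It assumes a single class of binding sites, concentrations in mol/L, and noise-free synthetic data; the function names are hypothetical. For such data, a plot of bound/free against bound is a straight line whose slope is minus Ka, which is the basis of Scatchard analysis.

```python
# Sketch only: single class of binding sites, concentrations in mol/L,
# noise-free synthetic data. Function names are hypothetical.


def association_constant(complex_m: float, free_antigen_m: float,
                         free_antibody_m: float) -> float:
    """Ka (L/mol) = [complex] / ([free antigen] * [free antibody]) at equilibrium."""
    return complex_m / (free_antigen_m * free_antibody_m)


def scatchard_ka(bound: list[float], free: list[float]) -> float:
    """Estimate Ka as minus the least-squares slope of bound/free plotted against bound."""
    ratios = [b / f for b, f in zip(bound, free)]
    n = len(bound)
    mean_x = sum(bound) / n
    mean_y = sum(ratios) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bound, ratios))
    sxx = sum((x - mean_x) ** 2 for x in bound)
    return -sxy / sxx


def affinity_class(ka: float) -> str:
    """Place Ka on the ranges given in the text."""
    if 1e9 <= ka <= 1e12:
        return "high"  # suited to immunoassays with rigorous manipulation
    if 1e6 <= ka <= 1e7:
        return "low"  # suited to immunopurification, where CSAP must be released
    return "outside stated ranges"


# Synthetic saturation data for Ka = 1e9 L/mol and 2e-9 mol/L of binding sites.
KA_TRUE = 1e9
SITES = 2e-9
free = [1e-10, 3e-10, 1e-9, 3e-9]
bound = [SITES * KA_TRUE * f / (1 + KA_TRUE * f) for f in free]
print(f"Scatchard Ka estimate: {scatchard_ka(bound, free):.3e} L/mol")
```

Real binding data are noisy and, for polyclonal preparations, give curved Scatchard plots; the linear fit here is meaningful only for the idealized single-site case.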
In another embodiment of the invention, the polynucleotides encoding CSAP, or any fragment or complement thereof, may be used for therapeutic purposes. In one aspect, modifications of gene expression can be achieved by designing complementary sequences or antisense molecules (DNA, RNA, PNA, or modified oligonucleotides) to the coding or regulatory regions of the gene encoding CSAP. Such technology is well known in the art, and antisense oligonucleotides or larger fragments can be designed from various locations along the coding or control regions of sequences encoding CSAP. (See, e.g., Agrawal, S., ed. (1996) Antisense Therapeutics, Humana Press Inc., Totowa NJ.) In therapeutic use, any gene delivery system suitable for introduction of the antisense sequences into appropriate target cells can be used. Antisense sequences can be delivered intracellularly in the form of an expression plasmid which, upon transcription, produces a sequence complementary to at least a portion of the cellular sequence encoding the target protein. (See, e.g., Slater, J.E. et al. (1998) J. Allergy Clin. Immunol. 102(3):469-475; and Scanlon, K.J. et al. (1995) 9(13):1288-1296.) Antisense sequences can also be introduced intracellularly through the use of viral vectors, such as retrovirus and adeno-associated virus vectors. (See, e.g., Miller, A.D. (1990) Blood 76:271; Ausubel, supra; Uckert, W. and W. Walther (1994) Pharmacol. Ther. 63(3):323-347.) Other gene delivery mechanisms include liposome-derived systems, artificial viral envelopes, and other systems known in the art. (See, e.g., Rossi, J.J. (1995) Br. Med. Bull. 51(1):217-225; Boado, R.J. et al. (1998) J. Pharm. Sci. 87(11):1308-1315; and Morris, M.C. et al. (1997) Nucleic Acids Res. 25(14):2730-2736.)
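Designing antisense oligonucleotides "from various locations along the coding or control regions" amounts to taking reverse complements of windows of the target sequence. The sketch below tiles a target in that way; the target is a made-up placeholder rather than a CSAP sequence, and the oligo length and step size are illustrative assumptions, not values from the text.

```python
# Sketch only: the target below is a made-up placeholder, not a CSAP sequence.
# Oligo length and step size are illustrative choices.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def antisense(sense: str) -> str:
    """Return the antisense strand, written 5'->3', of a DNA sense sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(sense.upper()))


def tile_antisense(target: str, length: int = 20, step: int = 5) -> list[tuple[int, str]]:
    """Antisense oligos covering `target` windows of `length` nt every `step` nt.

    Each tuple is (0-based start of the target window, antisense oligo 5'->3').
    """
    return [(start, antisense(target[start:start + length]))
            for start in range(0, len(target) - length + 1, step)]


placeholder_target = "ATGGCTTCAGAGCTGAAGCTCGAGGAACTGAGCAAG"  # hypothetical 36-nt stretch
for start, oligo in tile_antisense(placeholder_target):
    print(start, oligo)
```

Candidates generated this way would still need to be screened for accessibility, specificity, and chemistry (for example, modified backbones) before use.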
In another embodiment of the invention, polynucleotides encoding CSAP may be used for somatic or germline gene therapy. Gene therapy may be performed to (i) correct a genetic deficiency (e.g., in the cases of severe combined immunodeficiency (SCID)-X1 disease characterized by X-linked inheritance (Cavazzana-Calvo, M. et al. (2000) Science 288:669-672), severe combined immunodeficiency syndrome associated with an inherited adenosine deaminase (ADA) deficiency (Blaese, R.M. et al. (1995) Science 270:475-480; Bordignon, C. et al. (1995) Science 270:470-475), cystic fibrosis (Zabner, J. et al. (1993) Cell 75:207-216; Crystal, R.G. et al. (1995) Hum. Gene Therapy 6:643-666; Crystal, R.G. et al. (1995) Hum. Gene Therapy 6:667-703), thalassemias, familial hypercholesterolemia, and hemophilia resulting from Factor VIII or Factor IX deficiencies (Crystal, R.G. (1995) Science 270:404-410; Verma, I.M. and N. Somia (1997) Nature 389:239-242)), (ii) express a conditionally lethal gene product (e.g., in the case of cancers which result from unregulated cell proliferation), or (iii) express a protein which affords protection against intracellular parasites (e.g., against human retroviruses, such as human immunodeficiency virus (HIV) (Baltimore, D. (1988) Nature 335:395-396; Poeschla, E. et al. (1996) Proc. Natl. Acad. Sci. USA 93:11395-11399), hepatitis B or C virus (HBV, HCV); fungal parasites, such as Candida albicans and Paracoccidioides brasiliensis; and protozoan parasites such as Plasmodium falciparum and Trypanosoma cruzi). In the case where a genetic deficiency in CSAP expression or regulation causes disease, the expression of CSAP from an appropriate population of transduced cells may alleviate the clinical manifestations caused by the genetic deficiency.
In a further embodiment of the invention, diseases or disorders caused by deficiencies in CSAP are treated by constructing mammalian expression vectors encoding CSAP and introducing these vectors by mechanical means into CSAP-deficient cells. Mechanical transfer technologies for use with cells in vivo or ex vivo include (i) direct DNA microinjection into individual cells, (ii) ballistic gold particle delivery, (iii) liposome-mediated transfection, (iv) receptor-mediated gene transfer, and (v) the use of DNA transposons (Morgan, R.A. and W.F. Anderson (1993) Annu. Rev. Biochem. 62:191-217; Ivics, Z. (1997) Cell 91:501-510; Boulay, J-L. and H. Recipon (1998) Curr. Opin. Biotechnol. 9:445-450). Expression vectors that may be effective for the expression of CSAP include, but are not limited to, the PCDNA 3.1, EPITAG, PRCCMV2, PREP, PVAX, PCR2-TOPOTA vectors (Invitrogen, Carlsbad CA), PCMV-SCRIPT, PCMV-TAG, PEGSH/PERV (Stratagene, La Jolla CA), and PTET-OFF, PTET-ON, PTRE2, PTRE2-LUC, PTK-HYG (Clontech, Palo Alto CA). CSAP may be expressed using (i) a constitutively active promoter (e.g., from cytomegalovirus (CMV), Rous sarcoma virus (RSV), SV40 virus, thymidine kinase (TK), or β-actin genes), (ii) an inducible promoter (e.g., the tetracycline-regulated promoter (Gossen, M. and H. Bujard (1992) Proc. Natl. Acad. Sci. USA 89:5547-5551; Gossen, M. et al. (1995) Science 268:1766-1769; Rossi, F.M.V. and H.M. Blau (1998) Curr. Opin. Biotechnol. 9:451-456), commercially available in the T-REX plasmid (Invitrogen)); the ecdysone-inducible promoter (available in the plasmids PVGRXR and PIND; Invitrogen); the FK506/rapamycin inducible promoter; or the RU486/mifepristone inducible promoter (Rossi, F.M.V. and H.M. Blau, supra)), or (iii) a tissue-specific promoter or the native promoter of the endogenous gene encoding CSAP from a normal individual.
Commercially available liposome transformation kits (e.g., the PERFECT LIPID TRANSFECTION KIT, available from Invitrogen) allow one with ordinary skill in the art to deliver polynucleotides to target cells in culture and require minimal effort to optimize experimental parameters. In the alternative, transformation is performed using the calcium phosphate method (Graham, F.L. and A.J. Eb (1973) Virology 52:456-467), or by electroporation (Neumann, E. et al. (1982) EMBO J. 1:841-845). The introduction of DNA to primary cells requires modification of these standardized mammalian transfection protocols.
In another embodiment of the invention, diseases or disorders caused by genetic defects with respect to CSAP expression are treated by constructing a retrovirus vector consisting of (i) the polynucleotide encoding CSAP under the control of an independent promoter or the retrovirus long terminal repeat (LTR) promoter, (ii) appropriate RNA packaging signals, and (iii) a Rev-responsive element (RRE) along with additional retrovirus cis-acting RNA sequences and coding sequences required for efficient vector propagation. Retrovirus vectors (e.g., PFB and PFBNEO) are commercially available (Stratagene) and are based on published data (Riviere, I. et al. (1995) Proc. Natl. Acad. Sci. USA 92:6733-6737), incorporated by reference herein. The vector is propagated in an appropriate vector producing cell line (VPCL) that expresses an envelope gene with a tropism for receptors on the target cells or a promiscuous envelope protein such as VSVg (Armentano, D. et al. (1987) J. Virol. 61:1647-1650; Bender, M.A. et al. (1987) J. Virol. 61:1639-1646; Adam, M.A. and A.D. Miller (1988) J. Virol. 62:3802-3806; Dull, T. et al. (1998) J. Virol. 72:8463-8471; Zufferey, R. et al. (1998) J. Virol. 72:9873-9880). U.S. Patent No. 5,910,434 to Rigg ("Method for obtaining retrovirus packaging cell lines producing high transducing efficiency retroviral supernatant") discloses a method for obtaining retrovirus packaging cell lines and is hereby incorporated by reference. Propagation of retrovirus vectors, transduction of a population of cells (e.g., CD4+ T cells), and the return of transduced cells to a patient are procedures well known to persons skilled in the art of gene therapy and have been well documented (Ranga, U. et al. (1997) J. Virol. 71:7020-7029; Bauer, G. et al. (1997) Blood 89:2259-2267; Bonyhadi, M.L. (1997) J. Virol. 71:4707-4716; Ranga, U. et al. (1998) Proc. Natl. Acad. Sci. USA 95:1201-1206; Su, L. (1997) Blood 89:2283-2290).
In the alternative, an adenovirus-based gene therapy delivery system is used to deliver polynucleotides encoding CSAP to cells which have one or more genetic abnormalities with respect to the expression of CSAP. The construction and packaging of adenovirus-based vectors are well known to those with ordinary skill in the art. Replication-defective adenovirus vectors have proven to be versatile for importing genes encoding immunoregulatory proteins into intact islets in the pancreas (Csete, M.E. et al. (1995) Transplantation 27:263-268). Potentially useful adenoviral vectors are described in U.S. Patent No. 5,707,618 to Armentano ("Adenovirus vectors for gene therapy"), hereby incorporated by reference. For adenoviral vectors, see also Antinozzi, P.A. et al. (1999) Annu. Rev. Nutr. 19:511-544 and Verma, I.M. and N. Somia (1997) Nature 389:239-242, both incorporated by reference herein.
In another alternative, a herpes-based gene therapy delivery system is used to deliver polynucleotides encoding CSAP to target cells which have one or more genetic abnormalities with respect to the expression of CSAP. The use of herpes simplex virus (HSV)-based vectors may be especially valuable for introducing CSAP to cells of the central nervous system, for which HSV has a tropism. The construction and packaging of herpes-based vectors are well known to those with ordinary skill in the art. A replication-competent herpes simplex virus (HSV) type 1-based vector has been used to deliver a reporter gene to the eyes of primates (Liu, X. et al. (1999) Exp. Eye Res. 169:385-395). The construction of an HSV-1 virus vector has also been disclosed in detail in U.S. Patent No. 5,804,413 to DeLuca ("Herpes simplex virus strains for gene transfer"), which is hereby incorporated by reference. U.S. Patent No. 5,804,413 teaches the use of recombinant HSV d92, which consists of a genome containing at least one exogenous gene to be transferred to a cell under the control of the appropriate promoter for purposes including human gene therapy. Also taught by this patent are the construction and use of recombinant HSV strains deleted for ICP4, ICP27, and ICP22. For HSV vectors, see also Goins, W.F. et al. (1999) J. Virol. 73:519-532 and Xu, H. et al. (1994) Dev. Biol. 163:152-161, hereby incorporated by reference. The manipulation of cloned herpesvirus sequences, the generation of recombinant virus following the transfection of multiple plasmids containing different segments of the large herpesvirus genomes, the growth and propagation of herpesvirus, and the infection of cells with herpesvirus are techniques well known to those of ordinary skill in the art.

In another alternative, an alphavirus (positive, single-stranded RNA virus) vector is used to deliver polynucleotides encoding CSAP to target cells.
The biology of the prototypic alphavirus, Semliki Forest Virus (SFV), has been studied extensively, and gene transfer vectors have been based on the SFV genome (Garoff, H. and K.-J. Li (1998) Curr. Opin. Biotechnol. 9:464-469). During alphavirus RNA replication, a subgenomic RNA is generated that normally encodes the viral capsid proteins. This subgenomic RNA replicates to higher levels than the full-length genomic RNA, resulting in the overproduction of capsid proteins relative to the viral proteins with enzymatic activity (e.g., protease and polymerase). Similarly, inserting the coding sequence for CSAP into the alphavirus genome in place of the capsid-coding region results in the production of a large number of CSAP-coding RNAs and the synthesis of high levels of CSAP in vector-transduced cells. While alphavirus infection is typically associated with cell lysis within a few days, the ability to establish a persistent infection in baby hamster kidney cells (BHK-21) with a variant of Sindbis virus (SIN) indicates that the lytic replication of alphaviruses can be altered to suit the needs of the gene therapy application (Dryga, S.A. et al. (1997) Virology 228:74-83). The wide host range of alphaviruses will allow the introduction of CSAP into a variety of cell types. The specific transduction of a subset of cells in a population may require the sorting of cells prior to transduction. The methods of manipulating infectious cDNA clones of alphaviruses, performing alphavirus cDNA and RNA transfections, and performing alphavirus infections are well known to those with ordinary skill in the art.
Oligonucleotides derived from the transcription initiation site, e.g., between about positions -10 and +10 from the start site, may also be employed to inhibit gene expression. Similarly, inhibition can be achieved using triple helix base-pairing methodology. Triple helix pairing is useful because it inhibits the ability of the double helix to open sufficiently for the binding of polymerases, transcription factors, or regulatory molecules. Recent therapeutic advances using triplex DNA have been described in the literature. (See, e.g., Gee, J.E. et al. (1994) in Huber, B.E. and B.I. Carr, Molecular and Immunologic Approaches, Futura Publishing, Mt. Kisco NY, pp. 163-177.) A complementary sequence or antisense molecule may also be designed to block translation of mRNA by preventing the transcript from binding to ribosomes.
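Picking out the "-10 to +10" region requires care with position numbering. The sketch below assumes the usual promoter convention in which +1 is the first transcribed nucleotide and there is no position 0, so the default window spans 20 nt rather than 21; the helper names are hypothetical.

```python
# Sketch only: uses the usual promoter numbering in which +1 is the first
# transcribed nucleotide and there is no position 0. Names are hypothetical.


def position_to_index(position: int, tss_index: int) -> int:
    """Map a position relative to the start site onto a 0-based string index.

    `tss_index` is the 0-based index of position +1 in the sense sequence.
    """
    if position == 0:
        raise ValueError("position 0 does not exist in this numbering")
    return tss_index + position - 1 if position > 0 else tss_index + position


def tss_window(sense: str, tss_index: int, upstream: int = 10, downstream: int = 10) -> str:
    """Return positions -upstream..+downstream (20 nt with the defaults)."""
    start = position_to_index(-upstream, tss_index)
    end = position_to_index(downstream, tss_index) + 1  # slice end is exclusive
    if start < 0 or end > len(sense):
        raise ValueError("window extends past the ends of the sequence")
    return sense[start:end]


example = "C" * 10 + "A" * 10 + "G" * 5  # hypothetical; +1 is at index 10
print(tss_window(example, tss_index=10))
```

An antisense oligonucleotide directed at this region would be the reverse complement of the returned sense-strand window.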
Ribozymes, enzymatic RNA molecules, may also be used to catalyze the specific cleavage of RNA. The mechanism of ribozyme action involves sequence-specific hybridization of the ribozyme molecule to complementary target RNA, followed by endonucleolytic cleavage. For example, engineered hammerhead motif ribozyme molecules may specifically and efficiently catalyze endonucleolytic cleavage of sequences encoding CSAP.
Specific ribozyme cleavage sites within any potential RNA target are initially identified by scanning the target molecule for ribozyme cleavage sites, including the following sequences: GUA, GUU, and GUC. Once identified, short RNA sequences of between 15 and 20 ribonucleotides, corresponding to the region of the target gene containing the cleavage site, may be evaluated for secondary structural features which may render the oligonucleotide inoperable. The suitability of candidate targets may also be evaluated by testing accessibility to hybridization with complementary oligonucleotides using ribonuclease protection assays.
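The first screening step described above, scanning for GUA, GUU, and GUC and extracting a 15- to 20-nucleotide region around each hit, can be sketched as follows. The helper names and the choice to centre each region on the triplet are assumptions; the sketch does not perform the secondary-structure or accessibility evaluations that the text also calls for.

```python
# Sketch only: scans for the triplets named in the text; it does not model
# secondary structure or target accessibility. Names are hypothetical.

CLEAVAGE_TRIPLETS = {"GUA", "GUU", "GUC"}


def find_cleavage_sites(rna: str) -> list[int]:
    """0-based start positions of GUA, GUU, or GUC triplets in an RNA sequence."""
    rna = rna.upper()
    return [i for i in range(len(rna) - 2) if rna[i:i + 3] in CLEAVAGE_TRIPLETS]


def candidate_regions(rna: str, length: int = 15) -> list[tuple[int, str]]:
    """Return (site, region) pairs, each region `length` nt long and containing the site.

    Regions are roughly centred on the triplet and shifted inward at the sequence ends.
    """
    if not 15 <= length <= 20:
        raise ValueError("the text evaluates regions of 15 to 20 nucleotides")
    rna = rna.upper()
    if len(rna) < length:
        return []
    regions = []
    for site in find_cleavage_sites(rna):
        start = site + 1 - length // 2  # centre of the triplet is site + 1
        start = max(0, min(start, len(rna) - length))
        regions.append((site, rna[start:start + length]))
    return regions


target = "A" * 10 + "GUC" + "A" * 17  # hypothetical 30-nt RNA
print(candidate_regions(target))
```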
Complementary ribonucleic acid molecules and ribozymes of the invention may be prepared by any method known in the art for the synthesis of nucleic acid molecules. These include techniques for chemically synthesizing oligonucleotides such as solid phase phosphoramidite chemical synthesis. Alternatively, RNA molecules may be generated by in vitro and in vivo transcription of DNA sequences encoding CSAP. Such DNA sequences may be incorporated into a wide variety of vectors with suitable RNA polymerase promoters such as T7 or SP6. Alternatively, these cDNA constructs that synthesize complementary RNA, constitutively or inducibly, can be introduced into cell lines, cells, or tissues.
RNA molecules may be modified to increase intracellular stability and half-life. Possible modifications include, but are not limited to, the addition of flanking sequences at the 5' and/or 3' ends of the molecule, or the use of phosphorothioate or 2' O-methyl rather than phosphodiester linkages within the backbone of the molecule. This concept is inherent in the production of PNAs and can be extended in all of these molecules by the inclusion of nontraditional bases such as inosine, queosine, and wybutosine, as well as acetyl-, methyl-, thio-, and similarly modified forms of adenine, cytidine, guanine, thymine, and uridine which are not as easily recognized by endogenous endonucleases.

An additional embodiment of the invention encompasses a method for screening for a compound which is effective in altering expression of a polynucleotide encoding CSAP. Compounds which may be effective in altering expression of a specific polynucleotide may include, but are not limited to, oligonucleotides, antisense oligonucleotides, triple helix-forming oligonucleotides, transcription factors and other polypeptide transcriptional regulators, and non-macromolecular chemical entities which are capable of interacting with specific polynucleotide sequences. Effective compounds may alter polynucleotide expression by acting as either inhibitors or promoters of polynucleotide expression. Thus, in the treatment of disorders associated with increased CSAP expression or activity, a compound which specifically inhibits expression of the polynucleotide encoding CSAP may be therapeutically useful, and in the treatment of disorders associated with decreased CSAP expression or activity, a compound which specifically promotes expression of the polynucleotide encoding CSAP may be therapeutically useful.
At least one, and up to a plurality, of test compounds may be screened for effectiveness in altering expression of a specific polynucleotide. A test compound may be obtained by any method commonly known in the art, including chemical modification of a compound known to be effective in altering polynucleotide expression; selection from an existing, commercially available or proprietary library of naturally occurring or non-natural chemical compounds; rational design of a compound based on chemical and/or structural properties of the target polynucleotide; and selection from a library of chemical compounds created combinatorially or randomly. A sample comprising a polynucleotide encoding CSAP is exposed to at least one test compound thus obtained. The sample may comprise, for example, an intact or permeabilized cell, or an in vitro cell-free or reconstituted biochemical system. Alterations in the expression of a polynucleotide encoding CSAP are assayed by any method commonly known in the art. Typically, the expression of a specific polynucleotide is detected by hybridization with a probe having a nucleotide sequence complementary to the sequence of the polynucleotide encoding CSAP. The amount of hybridization may be quantified, thus forming the basis for a comparison of the expression of the polynucleotide both with and without exposure to one or more test compounds. Detection of a change in the expression of a polynucleotide exposed to a test compound indicates that the test compound is effective in altering the expression of the polynucleotide. A screen for a compound effective in altering expression of a specific polynucleotide can be carried out, for example, using a Schizosaccharomyces pombe gene expression system (Atkins, D. et al. (1999) U.S. Patent No. 5,932,435; Arndt, G.M. et al. (2000) Nucleic Acids Res. 28:E15) or a human cell line such as HeLa cells (Clarke, M.L. et al. (2000) Biochem. Biophys. Res. Commun. 268:8-13).
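The comparison "with and without exposure" to a test compound can be expressed as a log2 fold change in hybridization signal, with the sign indicating an inhibitor or a promoter of expression. In the sketch below, the compound names, signal values, and the two-fold cut-off are hypothetical; a real screen would add replicates and a statistical test.

```python
import math

# Sketch only: compound names, signal values, and the two-fold cut-off are
# hypothetical; real screens would also use replicates and statistics.


def log2_fold_change(treated: float, untreated: float) -> float:
    """log2 ratio of hybridization signal with versus without the test compound."""
    return math.log2(treated / untreated)


def classify_compounds(signals: dict[str, tuple[float, float]],
                       threshold: float = 1.0) -> dict[str, str]:
    """Call each compound an inhibitor or promoter of expression, or neither.

    `signals` maps compound -> (treated signal, untreated signal);
    threshold 1.0 on the log2 scale corresponds to a two-fold change.
    """
    calls = {}
    for name, (treated, untreated) in signals.items():
        change = log2_fold_change(treated, untreated)
        if change <= -threshold:
            calls[name] = "inhibitor"
        elif change >= threshold:
            calls[name] = "promoter"
        else:
            calls[name] = "no change"
    return calls


example = {"cpd_A": (50.0, 200.0), "cpd_B": (410.0, 200.0), "cpd_C": (210.0, 200.0)}
print(classify_compounds(example))
```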
A particular embodiment of the present invention involves screening a combinatorial library of oligonucleotides (such as deoxyribonucleotides, ribonucleotides, peptide nucleic acids, and modified oligonucleotides) for antisense activity against a specific polynucleotide sequence (Bruice, T.W. et al. (1997) U.S. Patent No. 5,686,242; Bruice, T.W. et al. (2000) U.S. Patent No. 6,022,691).
Many methods for introducing vectors into cells or tissues are available and equally suitable for use in vivo, in vitro, and ex vivo. For ex vivo therapy, vectors may be introduced into stem cells taken from the patient and clonally propagated for autologous transplant back into that same patient. Delivery by transfection, by liposome injections, or by polycationic amino polymers may be achieved using methods which are well known in the art. (See, e.g., Goldman, C.K. et al. (1997) Nat. Biotechnol. 15:462-466.)
Any of the therapeutic methods described above may be applied to any subject in need of such therapy, including, for example, mammals such as humans, dogs, cats, cows, horses, rabbits, and monkeys.
An additional embodiment of the invention relates to the administration of a composition which generally comprises an active ingredient formulated with a pharmaceutically acceptable excipient. Excipients may include, for example, sugars, starches, celluloses, gums, and proteins. Various formulations are commonly known and are thoroughly discussed in the latest edition of Remington's Pharmaceutical Sciences (Mack Publishing, Easton PA). Such compositions may consist of CSAP, antibodies to CSAP, and mimetics, agonists, antagonists, or inhibitors of CSAP. The compositions utilized in this invention may be administered by any number of routes including, but not limited to, oral, intravenous, intramuscular, intra-arterial, intramedullary, intrathecal, intraventricular, pulmonary, transdermal, subcutaneous, intraperitoneal, intranasal, enteral, topical, sublingual, or rectal means.
Compositions for pulmonary administration may be prepared in liquid or dry powder form. These compositions are generally aerosolized immediately prior to inhalation by the patient. In the case of small molecules (e.g., traditional low molecular weight organic drugs), aerosol delivery of fast-acting formulations is well known in the art. In the case of macromolecules (e.g., larger peptides and proteins), recent developments in the field of pulmonary delivery via the alveolar region of the lung have enabled the practical delivery of drugs such as insulin to blood circulation (see, e.g., Patton, J.S. et al., U.S. Patent No. 5,997,848). Pulmonary delivery has the advantage of administration without needle injection, and obviates the need for potentially toxic penetration enhancers.
Compositions suitable for use in the invention include compositions wherein the active ingredients are contained in an effective amount to achieve the intended purpose. The determination of an effective dose is well within the capability of those skilled in the art.
Specialized forms of compositions may be prepared for direct intracellular delivery of macromolecules comprising CSAP or fragments thereof. For example, liposome preparations containing a cell-impermeable macromolecule may promote cell fusion and intracellular delivery of the macromolecule. Alternatively, CSAP or a fragment thereof may be joined to a short cationic N-terminal portion from the HIV Tat-1 protein. Fusion proteins thus generated have been found to transduce into the cells of all tissues, including the brain, in a mouse model system (Schwarze, S.R. et al. (1999) Science 285:1569-1572).
For any compound, the therapeutically effective dose can be estimated initially either in cell culture assays, e.g., of neoplastic cells, or in animal models such as mice, rats, rabbits, dogs, monkeys, or pigs. An animal model may also be used to determine the appropriate concentration range and route of administration. Such information can then be used to determine useful doses and routes for administration in humans.
A therapeutically effective dose refers to that amount of active ingredient, for example CSAP or fragments thereof, antibodies to CSAP, and agonists, antagonists, or inhibitors of CSAP, which ameliorates the symptoms or condition. Therapeutic efficacy and toxicity may be determined by standard pharmaceutical procedures in cell cultures or with experimental animals, such as by calculating the ED50 (the dose therapeutically effective in 50% of the population) or LD50 (the dose lethal to 50% of the population) statistics. The dose ratio of toxic to therapeutic effects is the therapeutic index, which can be expressed as the LD50/ED50 ratio. Compositions which exhibit large therapeutic indices are preferred. The data obtained from cell culture assays and animal studies are used to formulate a range of dosage for human use. The dosage contained in such compositions is preferably within a range of circulating concentrations that includes the ED50 with little or no toxicity. The dosage varies within this range depending upon the dosage form employed, the sensitivity of the patient, and the route of administration.
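The ED50, LD50, and therapeutic index defined above can be computed from dose-response data. This sketch uses simple interpolation on log dose as a stand-in for the probit or logistic fits usually applied to such data; the dose and response values are hypothetical.

```python
import math

# Sketch only: log-linear interpolation stands in for the probit or logistic
# fits usually used for dose-response data; the numbers are hypothetical.


def dose_at_fraction(doses: list[float], fractions: list[float], target: float = 0.5) -> float:
    """Interpolate on log10(dose) to find the dose at which `target` of subjects respond.

    `doses` must be ascending and `fractions` non-decreasing.
    """
    points = list(zip(doses, fractions))
    for (d_lo, f_lo), (d_hi, f_hi) in zip(points, points[1:]):
        if f_lo <= target <= f_hi and f_hi > f_lo:
            t = (target - f_lo) / (f_hi - f_lo)
            log_dose = math.log10(d_lo) + t * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_dose
    raise ValueError("target fraction is not bracketed by the data")


def therapeutic_index(ld50: float, ed50: float) -> float:
    """Therapeutic index as the LD50/ED50 ratio; larger values are preferred."""
    return ld50 / ed50


ed50 = dose_at_fraction([1.0, 10.0, 100.0], [0.2, 0.5, 0.9])      # efficacy data
ld50 = dose_at_fraction([10.0, 100.0, 1000.0], [0.0, 0.5, 1.0])   # lethality data
print(f"ED50={ed50:.3g}, LD50={ld50:.3g}, TI={therapeutic_index(ld50, ed50):.3g}")
```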
The exact dosage will be determined by the practitioner, in light of factors related to the subject requiring treatment. Dosage and administration are adjusted to provide sufficient levels of the active moiety or to maintain the desired effect. Factors which may be taken into account include the severity of the disease state, the general health of the subject, the age, weight, and gender of the subject, time and frequency of administration, drug combination(s), reaction sensitivities, and response to therapy. Long-acting compositions may be administered every 3 to 4 days, every week, or biweekly depending on the half-life and clearance rate of the particular formulation.
Normal dosage amounts may vary from about 0.1 μg to 100,000 μg, up to a total dose of about 1 gram, depending upon the route of administration. Guidance as to particular dosages and methods of delivery is provided in the literature and generally available to practitioners in the art. Those skilled in the art will employ different formulations for nucleotides than for proteins or their inhibitors. Similarly, delivery of polynucleotides or polypeptides will be specific to particular cells, conditions, locations, etc.

DIAGNOSTICS
In another embodiment, antibodies which specifically bind CSAP may be used for the diagnosis of disorders characterized by expression of CSAP, or in assays to monitor patients being treated with CSAP or agonists, antagonists, or inhibitors of CSAP. Antibodies useful for diagnostic purposes may be prepared in the same manner as described above for therapeutics. Diagnostic assays for CSAP include methods which utilize the antibody and a label to detect CSAP in human body fluids or in extracts of cells or tissues. The antibodies may be used with or without modification, and may be labeled by covalent or non-covalent attachment of a reporter molecule. A wide variety of reporter molecules, several of which are described above, are known in the art and may be used.
A variety of protocols for measuring CSAP, including ELISAs, RIAs, and FACS, are known in the art and provide a basis for diagnosing altered or abnormal levels of CSAP expression. Normal or standard values for CSAP expression are established by combining body fluids or cell extracts taken from normal mammalian subjects, for example, human subjects, with antibodies to CSAP under conditions suitable for complex formation. The amount of standard complex formation may be quantitated by various methods, such as photometric means. Quantities of CSAP expressed in subject, control, and disease samples from biopsied tissues are compared with the standard values. Deviation between standard and subject values establishes the parameters for diagnosing disease.
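One way to turn "normal or standard values" into a comparison with subject samples is to compute a reference interval from the normal-subject measurements and report whether a subject value falls outside it. The mean ± 2 standard deviations rule used here is an assumption for illustration, not a criterion stated in the text, and the readings are hypothetical.

```python
import statistics

# Sketch only: mean ± k standard deviations (k = 2 here) is one common,
# assumed way to turn normal-subject values into a standard range.


def reference_interval(normal_values: list[float], k: float = 2.0) -> tuple[float, float]:
    """Standard range from CSAP measurements in normal subjects."""
    mean = statistics.mean(normal_values)
    sd = statistics.stdev(normal_values)
    return mean - k * sd, mean + k * sd


def compare_to_standard(value: float, interval: tuple[float, float]) -> str:
    """Report whether a subject value deviates from the standard range."""
    low, high = interval
    if value < low:
        return "below standard"
    if value > high:
        return "above standard"
    return "within standard"


normal = [10.0, 12.0, 14.0]  # hypothetical complex-formation readings
interval = reference_interval(normal)
print(interval, compare_to_standard(20.0, interval))
```

With real assays, a larger normal cohort and a check on the distribution of values would be needed before relying on such a range.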
In another embodiment of the invention, the polynucleotides encoding CSAP may be used for diagnostic purposes. The polynucleotides which may be used include oligonucleotide sequences, complementary RNA and DNA molecules, and PNAs. The polynucleotides may be used to detect and quantify gene expression in biopsied tissues in which expression of CSAP may be correlated with disease. The diagnostic assay may be used to determine absence, presence, and excess expression of CSAP, and to monitor regulation of CSAP levels during therapeutic intervention.
In one aspect, hybridization with PCR probes which are capable of detecting polynucleotide sequences, including genomic sequences, encoding CSAP or closely related molecules may be used to identify nucleic acid sequences which encode CSAP. The specificity of the probe, whether it is made from a highly specific region, e.g., the 5' regulatory region, or from a less specific region, e.g., a conserved motif, and the stringency of the hybridization or amplification will determine whether the probe identifies only naturally occurring sequences encoding CSAP, allelic variants, or related sequences.
Probes may also be used for the detection of related sequences, and may have at least 50% sequence identity to any of the CSAP encoding sequences. The hybridization probes of the subject invention may be DNA or RNA and may be derived from the sequence of SEQ ID NO:29-56 or from genomic sequences including promoters, enhancers, and introns of the CSAP gene.
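The "at least 50% sequence identity" criterion depends on how identity is counted. The sketch below assumes the two sequences have already been aligned (gaps written as "-"), for example by an alignment program, and counts identical non-gap columns over all alignment columns; this is one of several conventions in use, and the function names are hypothetical.

```python
# Sketch only: the inputs are assumed to be already aligned (gaps as "-"), e.g. by
# an alignment program; the identity convention used here is one of several.


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of all alignment columns."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("expected two non-empty aligned sequences of equal length")
    matches = sum(1 for a, b in zip(aligned_a.upper(), aligned_b.upper())
                  if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str, minimum: float = 50.0) -> bool:
    """True if the pair reaches the stated minimum identity (50% by default)."""
    return percent_identity(aligned_a, aligned_b) >= minimum


print(percent_identity("ACGT-ACGTA", "ACCTGACGTT"))
```

Other conventions divide by the shorter sequence length or exclude gap columns, which can change whether a borderline probe passes the 50% threshold.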
Means for producing specific hybridization probes for DNAs encoding CSAP include the cloning of polynucleotide sequences encoding CSAP or CSAP derivatives into vectors for the production of mRNA probes. Such vectors are known in the art, are commercially available, and may be used to synthesize RNA probes in vitro by means of the addition of the appropriate RNA polymerases and the appropriate labeled nucleotides. Hybridization probes may be labeled by a variety of reporter groups, for example, by radionuclides such as ³²P or ³⁵S, or by enzymatic labels, such as alkaline phosphatase coupled to the probe via avidin/biotin coupling systems, and the like.

Polynucleotide sequences encoding CSAP may be used for the diagnosis of disorders associated with expression of CSAP. Examples of such disorders include, but are not limited to, a cell proliferative disorder such as actinic keratosis, arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease (MCTD), myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis, primary thrombocythemia, and a cancer including adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and, in particular, a cancer of the adrenal gland, bladder, bone, bone marrow, brain, breast, cervix, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary, pancreas, parathyroid, penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid, and uterus; a viral infection such as those caused by adenoviruses (acute respiratory disease, pneumonia), arenaviruses (lymphocytic choriomeningitis), bunyaviruses (Hantavirus), coronaviruses (pneumonia, chronic bronchitis), hepadnaviruses (hepatitis), herpesviruses (herpes simplex virus, varicella-zoster virus, Epstein-Barr virus, cytomegalovirus), flaviviruses (yellow fever), orthomyxoviruses (influenza), papillomaviruses (cancer), paramyxoviruses (measles, mumps), picornaviruses (rhinovirus, poliovirus, coxsackievirus), polyomaviruses (BK virus, JC virus), poxviruses (smallpox), reovirus (Colorado tick fever), retroviruses (human immunodeficiency virus, human T lymphotropic virus), rhabdoviruses (rabies), rotaviruses (gastroenteritis), and togaviruses (encephalitis, rubella); and a neurological disorder such as epilepsy, ischemic cerebrovascular disease, stroke, cerebral neoplasms, Alzheimer's disease, Pick's disease, Huntington's disease, dementia, Parkinson's disease and other extrapyramidal disorders, amyotrophic lateral sclerosis and other motor neuron disorders, progressive neural muscular atrophy, retinitis pigmentosa, hereditary ataxias, multiple sclerosis and other demyelinating diseases, bacterial and viral meningitis, brain abscess, subdural empyema, epidural abscess, suppurative intracranial thrombophlebitis, myelitis and radiculitis, viral central nervous system disease, a prion disease including kuru, Creutzfeldt-Jakob disease, and Gerstmann-Sträussler-Scheinker syndrome, fatal familial insomnia, nutritional and metabolic diseases of the nervous system, neurofibromatosis, tuberous sclerosis, cerebelloretinal hemangioblastomatosis, encephalotrigeminal syndrome, mental retardation and other developmental disorders of the central nervous system, cerebral palsy, neuroskeletal disorders, autonomic nervous system disorders, cranial nerve disorders, spinal cord diseases, muscular dystrophy and other neuromuscular disorders, peripheral nervous system disorders, dermatomyositis and polymyositis, inherited, metabolic, endocrine, and toxic myopathies, myasthenia gravis, periodic paralysis, mental disorders including mood, anxiety, and schizophrenic disorders, seasonal affective disorder (SAD), akathisia, amnesia, catatonia, diabetic neuropathy, tardive dyskinesia, dystonias, paranoid psychoses, postherpetic neuralgia, and Tourette's disorder.
The polynucleotide sequences encoding CSAP may be used in Southern or northern analysis, dot blot, or other membrane-based technologies; in PCR technologies; in dipstick, pin, and multiformat ELISA-like assays; and in microarrays utilizing fluids or tissues from patients to detect altered CSAP expression. Such qualitative or quantitative methods are well known in the art.
In a particular aspect, the nucleotide sequences encoding CSAP may be useful in assays that detect the presence of associated disorders, particularly those mentioned above. The nucleotide sequences encoding CSAP may be labeled by standard methods and added to a fluid or tissue sample from a patient under conditions suitable for the formation of hybridization complexes. After a suitable incubation period, the sample is washed, and the signal is quantified and compared with a standard value. If the signal in the patient sample is significantly altered in comparison with a control sample, the altered level of nucleotide sequences encoding CSAP in the sample indicates the presence of the associated disorder. Such assays may also be used to evaluate the efficacy of a particular therapeutic treatment regimen in animal studies or clinical trials, or to monitor the treatment of an individual patient.
In order to provide a basis for the diagnosis of a disorder associated with expression of CSAP, a normal or standard profile for expression is established. This may be accomplished by combining body fluids or cell extracts taken from normal subjects, either animal or human, with a sequence, or a fragment thereof, encoding CSAP, under conditions suitable for hybridization or amplification. Standard hybridization may be quantified by comparing the values obtained from normal subjects with values from an experiment in which a known amount of a substantially purified polynucleotide is used. Standard values obtained in this manner may be compared with values obtained from samples from patients who are symptomatic for a disorder. Deviation from standard values is used to establish the presence of a disorder.
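As a minimal illustration of comparing a patient value with a normal standard, the sketch below assumes that hybridization signals from normal subjects are roughly normally distributed and uses a mean ± 2 SD reference range as the deviation criterion. The function names and the 2 SD cutoff are hypothetical choices for illustration, not part of the described method.

```python
from statistics import mean, stdev

def reference_range(normal_signals, k=2.0):
    """Return (low, high) = mean +/- k * SD of signals from normal subjects."""
    mu = mean(normal_signals)
    sd = stdev(normal_signals)  # sample standard deviation
    return mu - k * sd, mu + k * sd

def deviates(patient_signal, normal_signals, k=2.0):
    """True if the patient signal lies outside the normal reference range."""
    low, high = reference_range(normal_signals, k)
    return patient_signal < low or patient_signal > high

# Hypothetical signals from five normal subjects
normals = [100.0, 104.0, 96.0, 102.0, 98.0]
print(deviates(150.0, normals))  # True
print(deviates(101.0, normals))  # False
```

In practice the cutoff would be chosen from the assay's measured variability rather than fixed at two standard deviations.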
Once the presence of a disorder is established and a treatment protocol is initiated, hybridization assays may be repeated on a regular basis to determine if the level of expression in the patient begins to approximate that which is observed in the normal subject. The results obtained from successive assays may be used to show the efficacy of treatment over a period ranging from several days to months.
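One hypothetical way to summarize successive assays is the fraction of the initial gap between the patient's expression level and the normal level that has since been closed. The metric, the function name, and the values below are assumptions made for illustration.

```python
def fraction_normalized(baseline, current, normal):
    """Fraction of the initial gap to the normal level closed by `current`.

    0.0 means no change from baseline; 1.0 means the normal level is reached.
    """
    gap = baseline - normal
    if gap == 0:
        return 1.0  # patient already at the normal level
    return (baseline - current) / gap

# Hypothetical series of assays: baseline 300, normal level 100
series = [300.0, 250.0, 180.0, 120.0]
progress = [fraction_normalized(series[0], x, 100.0) for x in series]
print(progress)  # [0.0, 0.25, 0.6, 0.9]
```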
With respect to cancer, the presence of an abnormal amount of transcript (either under- or overexpressed) in biopsied tissue from an individual may indicate a predisposition for the development of the disease, or may provide a means for detecting the disease prior to the appearance of actual clinical symptoms. A more definitive diagnosis of this type may allow health professionals to employ preventative measures or aggressive treatment earlier, thereby preventing the development or further progression of the cancer.
Additional diagnostic uses for oligonucleotides designed from the sequences encoding CSAP may involve the use of PCR. These oligomers may be chemically synthesized, generated enzymatically, or produced in vitro. Oligomers will preferably contain a fragment of a polynucleotide encoding CSAP, or a fragment of a polynucleotide complementary to the polynucleotide encoding CSAP, and will be employed under optimized conditions for identification of a specific gene or condition. Oligomers may also be employed under less stringent conditions for detection or quantification of closely related DNA or RNA sequences. In a particular aspect, oligonucleotide primers derived from the polynucleotide sequences encoding CSAP may be used to detect single nucleotide polymorphisms (SNPs). SNPs are substitutions, insertions, and deletions that are a frequent cause of inherited or acquired genetic disease in humans. Methods of SNP detection include, but are not limited to, single-stranded conformation polymorphism (SSCP) and fluorescent SSCP (fSSCP) methods. In SSCP, oligonucleotide primers derived from the polynucleotide sequences encoding CSAP are used to amplify DNA using the polymerase chain reaction (PCR). The DNA may be derived, for example, from diseased or normal tissue, biopsy samples, bodily fluids, and the like. SNPs in the DNA cause differences in the secondary and tertiary structures of PCR products in single-stranded form, and these differences are detectable using gel electrophoresis in non-denaturing gels. In fSSCP, the oligonucleotide primers are fluorescently labeled, which allows detection of the amplimers in high-throughput equipment such as DNA sequencing machines. Additionally, sequence database analysis methods, termed in silico SNP (isSNP), are capable of identifying polymorphisms by comparing the sequence of individual overlapping DNA fragments which assemble into a common consensus sequence.
These computer-based methods filter out sequence variations due to laboratory preparation of DNA and sequencing errors using statistical models and automated analyses of DNA sequence chromatograms. In the alternative, SNPs may be detected and characterized by mass spectrometry using, for example, the high throughput MASSARRAY system (Sequenom, Inc., San Diego CA). SNPs may be used to study the genetic basis of human disease. For example, at least 16 common SNPs have been associated with non-insulin-dependent diabetes mellitus. SNPs are also useful for examining differences in disease outcomes in monogenic disorders, such as cystic fibrosis, sickle cell anemia, or chronic granulomatous disease. For example, variants in the mannose-binding lectin, MBL2, have been shown to be correlated with deleterious pulmonary outcomes in cystic fibrosis. SNPs also have utility in pharmacogenomics, the identification of genetic variants that influence a patient's response to a drug, such as life-threatening toxicity. For example, a variation in N-acetyl transferase is associated with a high incidence of peripheral neuropathy in response to the anti-tuberculosis drug isoniazid, while a variation in the core promoter of the ALOX5 gene results in diminished clinical response to treatment with an anti-asthma drug that targets the 5-lipoxygenase pathway. Analysis of the distribution of SNPs in different populations is useful for investigating genetic drift, mutation, recombination, and selection, as well as for tracing the origins of populations and their migrations. (Taylor, J.G. et al. (2001) Trends Mol. Med. 7:507-512; Kwok, P.-Y. and Z. Gu (1999) Mol. Med. Today 5:538-543; Nowotny, P. et al. (2001) Curr. Opin. Neurobiol. 11:637-641.)
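The in silico SNP idea described above, comparing overlapping fragments that assemble into a common consensus, can be sketched as a column-by-column tally over pre-aligned reads. This is a simplified, hypothetical illustration: it assumes the reads are already aligned to equal length (with "-" where a read gives no coverage), and it replaces the statistical models and chromatogram analysis mentioned above with a crude rule that a second base must be seen in at least `min_reads` reads before a position is reported.

```python
from collections import Counter

def candidate_snps(aligned_reads, min_reads=2):
    """Return {position: (major_base, minor_base)} for candidate polymorphic columns.

    A column is reported when its second most common base occurs in at least
    `min_reads` reads; rarer discordant bases are treated as sequencing noise.
    """
    length = len(aligned_reads[0])
    snps = {}
    for pos in range(length):
        column = [read[pos] for read in aligned_reads if read[pos] in "ACGT"]
        counts = Counter(column).most_common()
        if len(counts) >= 2 and counts[1][1] >= min_reads:
            snps[pos] = (counts[0][0], counts[1][0])
    return snps

reads = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGAACGTAC",
    "ACGTACGTTC",  # single discordant base at position 8: treated as noise
]
print(candidate_snps(reads))  # {3: ('T', 'A')}
```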
Methods which may also be used to quantify the expression of CSAP include radiolabeling or biotinylating nucleotides, coamplification of a control nucleic acid, and interpolating results from standard curves. (See, e.g., Melby, P.C. et al. (1993) J. Immunol. Methods 159:235-244; Duplaa, C. et al. (1993) Anal. Biochem. 212:229-236.) The speed of quantitation of multiple samples may be accelerated by running the assay in a high-throughput format where the oligomer or polynucleotide of interest is presented in various dilutions and a spectrophotometric or colorimetric response gives rapid quantitation. In further embodiments, oligonucleotides or longer fragments derived from any of the polynucleotide sequences described herein may be used as elements on a microarray. The microarray can be used in transcript imaging techniques which monitor the relative expression levels of large numbers of genes simultaneously as described below. The microarray may also be used to identify genetic variants, mutations, and polymorphisms. This information may be used to determine gene function, to understand the genetic basis of a disorder, to diagnose a disorder, to monitor progression/regression of disease as a function of gene expression, and to develop and monitor the activities of therapeutic agents in the treatment of disease. In particular, this information may be used to develop a pharmacogenomic profile of a patient in order to select the most appropriate and effective treatment regimen for that patient. For example, therapeutic agents which are highly effective and display the fewest side effects may be selected for a patient based on his/her pharmacogenomic profile.
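Interpolating from a standard curve, one of the quantification approaches listed above, can be illustrated with a least-squares line relating signal to the logarithm of the amount of a control nucleic acid. The log-linear model, the function names, and the example values are assumptions made for this sketch; a real assay may require a different curve shape.

```python
import math

def fit_standard_curve(amounts, signals):
    """Least-squares fit of signal = slope * log10(amount) + intercept."""
    xs = [math.log10(a) for a in amounts]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(signals) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, signals))
    slope = sxy / sxx
    return slope, my - slope * mx

def interpolate_amount(signal, slope, intercept):
    """Invert the fitted curve to estimate the amount that gives `signal`."""
    return 10 ** ((signal - intercept) / slope)

# Hypothetical standards: known amounts of a purified control polynucleotide
amounts = [1, 10, 100, 1000]
signals = [5.0, 15.0, 25.0, 35.0]
slope, intercept = fit_standard_curve(amounts, signals)
print(round(interpolate_amount(20.0, slope, intercept), 2))  # 31.62
```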
In another embodiment, CSAP, fragments of CSAP, or antibodies specific for CSAP may be used as elements on a microarray. The microarray may be used to monitor or measure protein-protein interactions, drug-target interactions, and gene expression profiles, as described above.
A particular embodiment relates to the use of the polynucleotides of the present invention to generate a transcript image of a tissue or cell type. A transcript image represents the global pattern of gene expression by a particular tissue or cell type. Global gene expression patterns are analyzed by quantifying the number of expressed genes and their relative abundance under given conditions and at a given time. (See Seilhamer et al., "Comparative Gene Transcript Analysis," U.S. Patent No. 5,840,484, expressly incorporated by reference herein.) Thus a transcript image may be generated by hybridizing the polynucleotides of the present invention or their complements to the totality of transcripts or reverse transcripts of a particular tissue or cell type. In one embodiment, the hybridization takes place in high-throughput format, wherein the polynucleotides of the present invention or their complements comprise a subset of a plurality of elements on a microarray. The resultant transcript image would provide a profile of gene activity.
Transcript images may be generated using transcripts isolated from tissues, cell lines, biopsies, or other biological samples. The transcript image may thus reflect gene expression in vivo, as in the case of a tissue or biopsy sample, or in vitro, as in the case of a cell line. Transcript images which profile the expression of the polynucleotides of the present invention may also be used in conjunction with in vitro model systems and preclinical evaluation of pharmaceuticals, as well as toxicological testing of industrial and naturally-occurring environmental compounds. All compounds induce characteristic gene expression patterns, frequently termed molecular fingerprints or toxicant signatures, which are indicative of mechanisms of action and toxicity (Nuwaysir, E.F. et al. (1999) Mol. Carcinog. 24:153-159; Steiner, S. and N.L. Anderson (2000) Toxicol. Lett. 112-113:467-471, expressly incorporated by reference herein). If a test compound has a signature similar to that of a compound with known toxicity, it is likely to share those toxic properties. These fingerprints or signatures are most useful and refined when they contain expression information from a large number of genes and gene families. Ideally, a genome-wide measurement of expression provides the highest-quality signature. Even genes whose expression is not altered by any of the tested compounds are important, because their expression levels are used to normalize the rest of the expression data. The normalization procedure is useful for comparison of expression data after treatment with different compounds. While the assignment of gene function to elements of a toxicant signature aids in interpretation of toxicity mechanisms, knowledge of gene function is not necessary for the statistical matching of signatures which leads to prediction of toxicity.
(See, for example, Press Release 00-02 from the National Institute of Environmental Health Sciences, released February 29, 2000, available at http://www.niehs.nih.gov/oc/news/toxchip.htm.) Therefore, it is important and desirable in toxicological screening using toxicant signatures to include all expressed gene sequences.
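The normalization and signature-matching ideas above can be sketched as follows. This hypothetical example assumes that expression levels are positive numbers, normalizes each profile to the mean of genes assumed to be unaffected by treatment, expresses each signature as log2 ratios, and matches a test signature to the reference with the highest Pearson correlation. The reference names and values are invented for illustration, and correlation is only one of many possible matching statistics.

```python
import math

def normalize(profile, invariant_genes):
    """Divide every level by the mean level of the invariant (unaltered) genes."""
    scale = sum(profile[g] for g in invariant_genes) / len(invariant_genes)
    return {g: v / scale for g, v in profile.items()}

def signature(treated, untreated, invariant_genes):
    """log2 ratio of normalized treated vs. untreated levels for every gene."""
    t = normalize(treated, invariant_genes)
    u = normalize(untreated, invariant_genes)
    return {g: math.log2(t[g] / u[g]) for g in t}

def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes nonzero variance)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def closest_reference(test_sig, references):
    """Name of the reference signature most correlated with the test signature."""
    genes = sorted(test_sig)
    x = [test_sig[g] for g in genes]
    return max(references,
               key=lambda name: pearson(x, [references[name][g] for g in genes]))

# Hypothetical data: "hk" stands in for a gene unaffected by any compound.
untreated = {"g1": 100.0, "g2": 100.0, "hk": 50.0}
treated = {"g1": 400.0, "g2": 50.0, "hk": 100.0}
sig = signature(treated, untreated, ["hk"])  # {'g1': 1.0, 'g2': -2.0, 'hk': 0.0}
references = {
    "known_toxicant": {"g1": 1.2, "g2": -1.8, "hk": 0.0},
    "inert_control": {"g1": 0.1, "g2": 0.0, "hk": 0.0},
}
print(closest_reference(sig, references))  # known_toxicant
```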
In one embodiment, the toxicity of a test compound is assessed by treating a biological sample containing nucleic acids with the test compound. Nucleic acids that are expressed in the treated biological sample are hybridized with one or more probes specific to the polynucleotides of the present invention, so that transcript levels corresponding to the polynucleotides of the present invention may be quantified. The transcript levels in the treated biological sample are compared with levels in an untreated biological sample. Differences in the transcript levels between the two samples are indicative of a toxic response caused by the test compound in the treated sample.
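The treated-versus-untreated comparison described above can be sketched as a per-probe log2 ratio with a fixed cutoff. The twofold cutoff (|log2 ratio| ≥ 1), the probe names, and the values are illustrative assumptions; a practical analysis would also use replicate samples and a statistical test.

```python
import math

def differential_transcripts(treated, untreated, min_log2_change=1.0):
    """Return {probe: log2(treated / untreated)} for probes changed by >= the cutoff."""
    changed = {}
    for probe, level in treated.items():
        ratio = math.log2(level / untreated[probe])
        if abs(ratio) >= min_log2_change:
            changed[probe] = ratio
    return changed

# Hypothetical transcript levels for three probes
treated = {"probe_A": 800.0, "probe_B": 210.0, "probe_C": 40.0}
untreated = {"probe_A": 200.0, "probe_B": 200.0, "probe_C": 160.0}
print(differential_transcripts(treated, untreated))  # {'probe_A': 2.0, 'probe_C': -2.0}
```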
Another particular embodiment relates to the use of the polypeptide sequences of the present invention to analyze the proteome of a tissue or cell type. The term proteome refers to the global pattern of protein expression in a particular tissue or cell type. Each protein component of a proteome can be subjected individually to further analysis. Proteome expression patterns, or profiles, are analyzed by quantifying the number of expressed proteins and their relative abundance under given conditions and at a given time. A profile of a cell's proteome may thus be generated by separating and analyzing the polypeptides of a particular tissue or cell type. In one embodiment, the separation is achieved using two-dimensional gel electrophoresis, in which proteins from a sample are separated by isoelectric focusing in the first dimension, and then according to molecular weight by sodium dodecyl sulfate slab gel electrophoresis in the second dimension (Steiner and Anderson, supra). The proteins are visualized in the gel as discrete and uniquely positioned spots, typically by staining the gel with an agent such as Coomassie Blue or silver or fluorescent stains. The optical density of each protein spot is generally proportional to the level of the protein in the sample. The optical densities of equivalently positioned protein spots from different samples, for example, from biological samples either treated or untreated with a test compound or therapeutic agent, are compared to identify any changes in protein spot density related to the treatment. The proteins in the spots are partially sequenced using, for example, standard methods employing chemical or enzymatic cleavage followed by mass spectrometry. The identity of the protein in a spot may be determined by comparing its partial sequence, preferably of at least 5 contiguous amino acid residues, to the polypeptide sequences of the present invention. 
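Matching a partial sequence from a gel spot against the polypeptide sequences of the invention can be illustrated as an exact substring search, with the minimum of 5 contiguous residues mentioned above used as a guard. The sequence identifiers and residues below are hypothetical, and exact matching is a simplification; sequencing errors or variants would call for tolerant matching.

```python
def identify_spot(partial_sequence, polypeptides, min_length=5):
    """Return IDs of polypeptides containing the partial sequence as a contiguous run."""
    if len(partial_sequence) < min_length:
        raise ValueError("partial sequence shorter than min_length residues")
    return [pid for pid, seq in polypeptides.items() if partial_sequence in seq]

# Hypothetical polypeptide sequences (one-letter amino acid codes)
polypeptides = {"seq_1": "MSDKLEELAKRQ", "seq_2": "MAVLGTRELQKK"}
print(identify_spot("LEELA", polypeptides))  # ['seq_1']
```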
In some cases, further sequence data may be obtained for definitive protein identification. A proteomic profile may also be generated using antibodies specific for CSAP to quantify the levels of CSAP expression. In one embodiment, the antibodies are used as elements on a microarray, and protein expression levels are quantified by exposing the microarray to the sample and detecting the levels of protein bound to each array element (Lueking, A. et al. (1999) Anal. Biochem. 270:103-111; Mendoza, L.G. et al. (1999) Biotechniques 27:778-788). Detection may be performed by a variety of methods known in the art, for example, by reacting the proteins in the sample with a thiol- or amino-reactive fluorescent compound and detecting the amount of fluorescence bound at each array element.
Toxicant signatures at the proteome level are also useful for toxicological screening, and should be analyzed in parallel with toxicant signatures at the transcript level. There is a poor correlation between transcript and protein abundances for some proteins in some tissues (Anderson, N.L. and J. Seilhamer (1997) Electrophoresis 18:533-537), so proteome toxicant signatures may be useful in the analysis of compounds which do not significantly affect the transcript image, but which alter the proteomic profile. In addition, the analysis of transcripts in body fluids is difficult, due to rapid degradation of mRNA, so proteomic profiling may be more reliable and informative in such cases.
In another embodiment, the toxicity of a test compound is assessed by treating a biological sample containing proteins with the test compound. Proteins that are expressed in the treated biological sample are separated so that the amount of each protein can be quantified. The amount of each protein is compared to the amount of the corresponding protein in an untreated biological sample. A difference in the amount of protein between the two samples is indicative of a toxic response to the test compound in the treated sample. Individual proteins are identified by sequencing the amino acid residues of the individual proteins and comparing these partial sequences to the polypeptides of the present invention.
In another embodiment, the toxicity of a test compound is assessed by treating a biological sample containing proteins with the test compound. Proteins from the biological sample are incubated with antibodies specific to the polypeptides of the present invention. The amount of protein recognized by the antibodies is quantified. The amount of protein in the treated biological sample is compared with the amount in an untreated biological sample. A difference in the amount of protein between the two samples is indicative of a toxic response to the test compound in the treated sample.
Microarrays may be prepared, used, and analyzed using methods known in the art. (See, e.g., Brennan, T.M. et al. (1995) U.S. Patent No. 5,474,796; Schena, M. et al. (1996) Proc. Natl. Acad. Sci. USA 93:10614-10619; Baldeschweiler et al. (1995) PCT application WO95/251116; Shalon, D. et al. (1995) PCT application WO95/35505; Heller, R.A. et al. (1997) Proc. Natl. Acad. Sci. USA 94:2150-2155; and Heller, M.J. et al. (1997) U.S. Patent No. 5,605,662.) Various types of microarrays are well known and thoroughly described in DNA Microarrays: A Practical Approach, M. Schena, ed. (1999) Oxford University Press, London, hereby expressly incorporated by reference.
In another embodiment of the invention, nucleic acid sequences encoding CSAP may be used to generate hybridization probes useful in mapping the naturally occurring genomic sequence. Either coding or noncoding sequences may be used, and in some instances, noncoding sequences may be preferable over coding sequences. For example, conservation of a coding sequence among members of a multi-gene family may potentially cause undesired cross hybridization during chromosomal mapping. The sequences may be mapped to a particular chromosome, to a specific region of a chromosome, or to artificial chromosome constructions, e.g., human artificial chromosomes (HACs), yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs), bacterial P1 constructions, or single chromosome cDNA libraries. (See, e.g., Harrington, J.J. et al. (1997) Nat. Genet. 15:345-355; Price, C.M. (1993) Blood Rev. 7:127-134; and Trask, B.J. (1991) Trends Genet. 7:149-154.) Once mapped, the nucleic acid sequences of the invention may be used to develop genetic linkage maps, for example, which correlate the inheritance of a disease state with the inheritance of a particular chromosome region or restriction fragment length polymorphism (RFLP). (See, for example, Lander, E.S. and D. Botstein (1986) Proc. Natl. Acad. Sci. USA 83:7353-7357.) Fluorescent in situ hybridization (FISH) may be correlated with other physical and genetic map data. (See, e.g., Heinz-Ulrich, et al. (1995) in Meyers, supra, pp. 965-968.) Examples of genetic map data can be found in various scientific journals or at the Online Mendelian Inheritance in Man (OMIM) World Wide Web site. Correlation between the location of the gene encoding CSAP on a physical map and a specific disorder, or a predisposition to a specific disorder, may help define the region of DNA associated with that disorder and thus may further positional cloning efforts.
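Linkage maps of the kind described above are conventionally scored with a LOD (logarithm of the odds) score. The sketch below computes the classical two-point LOD score for phase-known meioses and maximizes it over a grid of recombination fractions. It is a textbook illustration, not the interval-mapping method of Lander and Botstein, and the recombinant counts in the example are invented.

```python
import math

def lod_score(recombinants, nonrecombinants, theta):
    """Two-point LOD score: log10 of L(theta) / L(0.5) for phase-known meioses."""
    n = recombinants + nonrecombinants
    likelihood = theta ** recombinants * (1 - theta) ** nonrecombinants
    return math.log10(likelihood / 0.5 ** n)

def max_lod(recombinants, nonrecombinants, steps=49):
    """Return (best_lod, best_theta) over the grid theta = 0.01 .. 0.49."""
    grid = [i / 100 for i in range(1, steps + 1)]
    return max((lod_score(recombinants, nonrecombinants, t), t) for t in grid)

# Hypothetical family data: 2 recombinants among 20 informative meioses
lod, theta = max_lod(2, 18)
print(round(lod, 2), theta)  # 3.2 0.1
```

A maximum LOD above 3 (odds of 1000:1 in favor of linkage) is the threshold traditionally used to declare linkage.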
In situ hybridization of chromosomal preparations and physical mapping techniques, such as linkage analysis using established chromosomal markers, may be used for extending genetic maps. Often the placement of a gene on the chromosome of another mammalian species, such as mouse, may reveal associated markers even if the exact chromosomal locus is not known. This information is valuable to investigators searching for disease genes using positional cloning or other gene discovery techniques. Once the gene or genes responsible for a disease or syndrome have been crudely localized by genetic linkage to a particular genomic region, e.g., ataxia-telangiectasia to 11q22-23, any sequences mapping to that area may represent associated or regulatory genes for further investigation. (See, e.g., Gatti, R.A. et al. (1988) Nature 336:577-580.) The nucleotide sequence of the instant invention may also be used to detect differences in the chromosomal location due to translocation, inversion, etc., among normal, carrier, or affected individuals.
In another embodiment of the invention, CSAP, its catalytic or immunogenic fragments, or oligopeptides thereof can be used for screening libraries of compounds in any of a variety of drug screening techniques. The fragment employed in such screening may be free in solution, affixed to a solid support, borne on a cell surface, or located intracellularly. The formation of binding complexes between CSAP and the agent being tested may be measured.
Another technique for drug screening provides for high throughput screening of compounds having suitable binding affinity to the protein of interest. (See, e.g., Geysen, et al. (1984) PCT application WO84/03564.) In this method, large numbers of different small test compounds are synthesized on a solid substrate. The test compounds are reacted with CSAP, or fragments thereof, and washed. Bound CSAP is then detected by methods well known in the art. Purified CSAP can also be coated directly onto plates for use in the aforementioned drug screening techniques. Alternatively, non-neutralizing antibodies can be used to capture the peptide and immobilize it on a solid support.
In another embodiment, one may use competitive drug screening assays in which neutralizing antibodies capable of binding CSAP specifically compete with a test compound for binding CSAP. In this manner, antibodies can be used to detect the presence of any peptide which shares one or more antigenic determinants with CSAP.
In additional embodiments, the nucleotide sequences which encode CSAP may be used in any molecular biology techniques that have yet to be developed, provided the new techniques rely on properties of nucleotide sequences that are currently known, including, but not limited to, such properties as the triplet genetic code and specific base pair interactions.
Without further elaboration, it is believed that one skilled in the art can, using the preceding description, utilize the present invention to its fullest extent. The following embodiments are, therefore, to be construed as merely illustrative, and not limitative of the remainder of the disclosure in any way whatsoever.
The disclosures of all patents, applications, and publications mentioned above and below, including U.S. Ser. No. 60/280,508, U.S. Ser. No. 60/281,323, U.S. Ser. No. 60/283,769, U.S. Ser. No. 60/288,609, U.S. Ser. No. 60/290,518, U.S. Ser. No. 60/291,870, and U.S. Ser. No. 60/294,451, are hereby expressly incorporated by reference.
EXAMPLES
I. Construction of cDNA Libraries
Incyte cDNAs were derived from cDNA libraries described in the LIFESEQ GOLD database (Incyte Genomics, Palo Alto CA). Some tissues were homogenized and lysed in guanidinium isothiocyanate, while others were homogenized and lysed in phenol or in a suitable mixture of denaturants, such as TRIZOL (Life Technologies), a monophasic solution of phenol and guanidine isothiocyanate. The resulting lysates were centrifuged over CsCl cushions or extracted with chloroform. RNA was precipitated from the lysates with either isopropanol or sodium acetate and ethanol, or by other routine methods.
Phenol extraction and precipitation of RNA were repeated as necessary to increase RNA purity. In some cases, RNA was treated with DNase. For most libraries, poly(A)+ RNA was isolated using oligo d(T)-coupled paramagnetic particles (Promega), OLIGOTEX latex particles (QIAGEN, Chatsworth CA), or an OLIGOTEX mRNA purification kit (QIAGEN). Alternatively, RNA was isolated directly from tissue lysates using other RNA isolation kits, e.g., the POLY(A)PURE mRNA purification kit (Ambion, Austin TX).
In some cases, Stratagene was provided with RNA and constructed the corresponding cDNA libraries. Otherwise, cDNA was synthesized and cDNA libraries were constructed with the UNIZAP vector system (Stratagene) or SUPERSCRIPT plasmid system (Life Technologies), using the recommended procedures or similar methods known in the art. (See, e.g., Ausubel, 1997, supra, units 5.1-6.6.) Reverse transcription was initiated using oligo d(T) or random primers. Synthetic oligonucleotide adapters were ligated to double stranded cDNA, and the cDNA was digested with the appropriate restriction enzyme or enzymes. For most libraries, the cDNA was size-selected (300-1000 bp) using SEPHACRYL S1000, SEPHAROSE CL2B, or SEPHAROSE CL4B column chromatography (Amersham Pharmacia Biotech) or preparative agarose gel electrophoresis. cDNAs were ligated into compatible restriction enzyme sites of the polylinker of a suitable plasmid, e.g., PBLUESCRIPT plasmid (Stratagene), PSPORT1 plasmid (Life Technologies), PCDNA2.1 plasmid (Invitrogen, Carlsbad CA), PBK-CMV plasmid (Stratagene), PCR2-TOPOTA plasmid (Invitrogen), PCMV-ICIS plasmid (Stratagene), pIGEN (Incyte Genomics, Palo Alto CA), pRARE (Incyte Genomics), or pINCY (Incyte Genomics), or derivatives thereof. Recombinant plasmids were transformed into competent E. coli cells including XL1-Blue, XL1-BlueMRF, or SOLR from Stratagene or DH5α, DH10B, or ElectroMAX DH10B from Life Technologies.
II. Isolation of cDNA Clones
Plasmids obtained as described in Example I were recovered from host cells by in vivo excision using the UNIZAP vector system (Stratagene) or by cell lysis. Plasmids were purified using at least one of the following: a Magic or WIZARD Minipreps DNA purification system (Promega); an AGTC Miniprep purification kit (Edge Biosystems, Gaithersburg MD); and QIAWELL 8 Plasmid, QIAWELL 8 Plus Plasmid, QIAWELL 8 Ultra Plasmid purification systems or the R.E.A.L. PREP 96 plasmid purification kit from QIAGEN. Following precipitation, plasmids were resuspended in 0.1 ml of distilled water and stored, with or without lyophilization, at 4°C.
Alternatively, plasmid DNA was amplified from host cell lysates using direct link PCR in a high-throughput format (Rao, V.B. (1994) Anal. Biochem. 216:1-14). Host cell lysis and thermal cycling steps were carried out in a single reaction mixture. Samples were processed and stored in 384-well plates, and the concentration of amplified plasmid DNA was quantified fluorometrically using PICOGREEN dye (Molecular Probes, Eugene OR) and a FLUOROSKAN II fluorescence scanner (Labsystems Oy, Helsinki, Finland).
III. Sequencing and Analysis
Incyte cDNAs recovered in plasmids as described in Example II were sequenced as follows. Sequencing reactions were processed using standard methods or high-throughput instrumentation such as the ABI CATALYST 800 (Applied Biosystems) thermal cycler or the PTC-200 thermal cycler (MJ Research) in conjunction with the HYDRA microdispenser (Robbins Scientific) or the MICROLAB 2200 (Hamilton) liquid transfer system. cDNA sequencing reactions were prepared using reagents provided by Amersham Pharmacia Biotech or supplied in ABI sequencing kits such as the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (Applied Biosystems). Electrophoretic separation of cDNA sequencing reactions and detection of labeled polynucleotides were carried out using the MEGABACE 1000 DNA sequencing system (Molecular Dynamics); the ABI PRISM 373 or 377 sequencing system (Applied Biosystems) in conjunction with standard ABI protocols and base calling software; or other sequence analysis systems known in the art. Reading frames within the cDNA sequences were identified using standard methods (reviewed in Ausubel, 1997, supra, unit 1.1). Some of the cDNA sequences were selected for extension using the techniques disclosed in Example VIII.
The polynucleotide sequences derived from Incyte cDNAs were validated by removing vector, linker, and poly(A) sequences and by masking ambiguous bases, using algorithms and programs based on BLAST, dynamic programming, and dinucleotide nearest neighbor analysis. The Incyte cDNA sequences or translations thereof were then queried against a selection of public databases such as the GenBank primate, rodent, mammalian, vertebrate, and eukaryote databases, and BLOCKS, PRINTS, DOMO, PRODOM; PROTEOME databases with sequences from Homo sapiens, Rattus norvegicus, Mus musculus, Caenorhabditis elegans, Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Candida albicans (Incyte Genomics, Palo Alto CA); hidden Markov model (HMM)-based protein family databases such as PFAM, INCY, and TIGRFAM (Haft, D.H. et al. (2001) Nucleic Acids Res. 29:41-43); and HMM-based protein domain databases such as SMART (Schultz et al. (1998) Proc. Natl. Acad. Sci. USA 95:5857-5864; Letunic, I. et al. (2002) Nucleic Acids Res. 30:242-244). (HMM is a probabilistic approach which analyzes consensus primary structures of gene families. See, for example, Eddy, S.R. (1996) Curr. Opin. Struct. Biol. 6:361-365.) The queries were performed using programs based on BLAST, FASTA, BLIMPS, and HMMER. The Incyte cDNA sequences were assembled to produce full length polynucleotide sequences. Alternatively, GenBank cDNAs, GenBank ESTs, stitched sequences, stretched sequences, or Genscan-predicted coding sequences (see Examples IV and V) were used to extend Incyte cDNA assemblages to full length. Assembly was performed using programs based on Phred, Phrap, and Consed, and cDNA assemblages were screened for open reading frames using programs based on GeneMark, BLAST, and FASTA. The full length polynucleotide sequences were translated to derive the corresponding full length polypeptide sequences. Alternatively, a polypeptide of the invention may begin at any of the methionine residues of the full length translated polypeptide. Full length polypeptide sequences were subsequently analyzed by querying against databases such as the GenBank protein databases (genpept), SwissProt, the PROTEOME databases, BLOCKS, PRINTS, DOMO, PRODOM, Prosite, hidden Markov model (HMM)-based protein family databases such as PFAM, INCY, and TIGRFAM; and HMM-based protein domain databases such as SMART. Full length polynucleotide sequences are also analyzed using MACDNASIS PRO software (Hitachi Software Engineering, South San Francisco CA) and LASERGENE software (DNASTAR).
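Two of the steps above, sequence validation and screening assemblages for open reading frames, can be sketched in simplified form. This hypothetical example assumes the vector or linker sequence appears as an exact 5' prefix, treats a 3' run of at least eight A residues as a poly(A) tail, masks any non-ACGT character (e.g., an IUPAC ambiguity code) as N, and reports the longest ATG-to-stop frame on the forward strand. The BLAST-based and dynamic-programming methods named in the text are considerably more sophisticated.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}

def clean_read(seq, vector_linker, min_polya=8):
    """Strip an exact vector/linker prefix and a 3' poly(A) tail; mask ambiguous bases."""
    seq = seq.upper()
    if seq.startswith(vector_linker):
        seq = seq[len(vector_linker):]
    seq = re.sub(r"A{%d,}$" % min_polya, "", seq)  # trailing poly(A) run
    return re.sub(r"[^ACGT]", "N", seq)             # ambiguous bases -> N

def longest_orf(seq):
    """Longest ATG...stop open reading frame in the three forward frames."""
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None
    return best

read = "GGTACCttATGAAACCCTGAGRCT" + "A" * 10  # hypothetical read; GGTACC = hypothetical vector
cleaned = clean_read(read, "GGTACC")
print(cleaned)               # TTATGAAACCCTGAGNCT
print(longest_orf(cleaned))  # ATGAAACCCTGA
```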
Polynucleotide and polypeptide sequence alignments are generated using default parameters specified by the CLUSTAL algorithm as incorporated into the MEGALIGN multisequence alignment program (DNASTAR), which also calculates the percent identity between aligned sequences. Table 7 summarizes the tools, programs, and algorithms used for the analysis and assembly of
Incyte cDNA and full length sequences and provides applicable descriptions, references, and threshold parameters. The first column of Table 7 shows the tools, programs, and algorithms used, the second column provides brief descriptions thereof, the third column presents appropriate references, all of which are incorporated by reference herein in their entirety, and the fourth column presents, where applicable, the scores, probability values, and other parameters used to evaluate the strength of a match between two sequences (the higher the score or the lower the probability value, the greater the identity between two sequences).
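MEGALIGN's exact percent-identity convention is not given here. As an assumption for illustration only, the sketch below counts identical residues over alignment columns in which neither sequence has a gap; other conventions divide by the full alignment length or by the length of the shorter sequence and will give different numbers for gapped alignments. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Convention assumed here: gapped columns are excluded from the denominator."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns containing a gap
        compared += 1
        identical += x == y  # bool counts as 0 or 1
    return 100.0 * identical / compared if compared else 0.0

print(percent_identity("ACGT-A", "ACCTTA"))  # 80.0 (4 of 5 ungapped columns identical)
```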
The programs described above for the assembly and analysis of full length polynucleotide and polypeptide sequences were also used to identify polynucleotide sequence fragments from SEQ ID NO:29-56. Fragments from about 20 to about 4000 nucleotides which are useful in hybridization and amplification technologies are described in Table 4, column 2.
IV. Identification and Editing of Coding Sequences from Genomic DNA
Putative cytoskeleton-associated proteins were initially identified by running the Genscan gene identification program against public genomic sequence databases (e.g., gbpri and gbhtg).
Genscan is a general-purpose gene identification program which analyzes genomic DNA sequences from a variety of organisms (See Burge, C. and S. Karlin (1997) J. Mol. Biol. 268:78-94, and Burge, C. and S. Karlin (1998) Curr. Opin. Struct. Biol. 8:346-354). The program concatenates predicted exons to form an assembled cDNA sequence extending from a methionine to a stop codon. The output of Genscan is a FASTA database of polynucleotide and polypeptide sequences. The maximum range of sequence for Genscan to analyze at once was set to 30 kb. To determine which of these Genscan-predicted cDNA sequences encode cytoskeleton-associated proteins, the encoded polypeptides were analyzed by querying against PFAM models for cytoskeleton-associated proteins. Potential cytoskeleton-associated proteins were also identified by homology to Incyte cDNA sequences that had been annotated as cytoskeleton-associated proteins. These selected Genscan-predicted sequences were then compared by BLAST analysis to the genpept and gbpri public databases. Where necessary, the Genscan-predicted sequences were then edited by comparison to the top BLAST hit from genpept to correct errors in the sequence predicted by Genscan, such as extra or omitted exons. BLAST analysis was also used to find any Incyte cDNA or public cDNA coverage of the Genscan-predicted sequences, thus providing evidence for transcription. When Incyte cDNA coverage was available, this information was used to correct or confirm the Genscan-predicted sequence. Full length polynucleotide sequences were obtained by assembling Genscan-predicted coding sequences with Incyte cDNA sequences and/or public cDNA sequences using the assembly process described in Example III. Alternatively, full length polynucleotide sequences were derived entirely from edited or unedited Genscan-predicted coding sequences.
V. Assembly of Genomic Sequence Data with cDNA Sequence Data
"Stitched" Sequences
Partial cDNA sequences were extended with exons predicted by the Genscan gene identification program described in Example IV. Partial cDNAs assembled as described in Example III were mapped to genomic DNA and parsed into clusters containing related cDNAs and Genscan exon predictions from one or more genomic sequences. Each cluster was analyzed using an algorithm based on graph theory and dynamic programming to integrate cDNA and genomic information, generating possible splice variants that were subsequently confirmed, edited, or extended to create a full length sequence. Sequence intervals in which the entire length of the interval was present on more than one sequence in the cluster were identified, and intervals thus identified were considered to be equivalent by transitivity. For example, if an interval was present on a cDNA and two genomic sequences, then all three intervals were considered to be equivalent. This process allows unrelated but consecutive genomic sequences to be brought together, bridged by cDNA sequence. Intervals thus identified were then "stitched" together by the stitching algorithm in the order that they appear along their parent sequences to generate the longest possible sequence, as well as sequence variants. Linkages between intervals which proceed along one type of parent sequence (cDNA to cDNA or genomic sequence to genomic sequence) were given preference over linkages which change parent type (cDNA to genomic sequence). The resultant stitched sequences were translated and compared by BLAST analysis to the genpept and gbpri public databases. Incorrect exons predicted by Genscan were corrected by comparison to the top BLAST hit from genpept. Sequences were further extended with additional cDNA sequences, or by inspection of genomic DNA, when necessary.
"Stretched" Sequences
Partial DNA sequences were extended to full length with an algorithm based on BLAST analysis. First, partial cDNAs assembled as described in Example III were queried against public databases such as the GenBank primate, rodent, mammalian, vertebrate, and eukaryote databases using the BLAST program. The nearest GenBank protein homolog was then compared by BLAST analysis to either Incyte cDNA sequences or Genscan exon predicted sequences described in Example IV. A chimeric protein was generated by using the resultant high-scoring segment pairs (HSPs) to map the translated sequences onto the GenBank protein homolog. Insertions or deletions may occur in the chimeric protein with respect to the original GenBank protein homolog. The GenBank protein homolog, the chimeric protein, or both were used as probes to search for homologous genomic sequences from the public human genome databases. Partial DNA sequences were therefore "stretched" or extended by the addition of homologous genomic sequences. The resultant stretched sequences were examined to determine whether they contained a complete gene.
VI. Chromosomal Mapping of CSAP Encoding Polynucleotides
The sequences which were used to assemble SEQ ID NO:29-56 were compared with sequences from the Incyte LIFESEQ database and public domain databases using BLAST and other implementations of the Smith-Waterman algorithm. Sequences from these databases that matched SEQ ID NO:29-56 were assembled into clusters of contiguous and overlapping sequences using assembly algorithms such as Phrap (Table 7). Radiation hybrid and genetic mapping data available from public resources such as the Stanford Human Genome Center (SHGC), Whitehead Institute for Genome Research (WIGR), and Genethon were used to determine if any of the clustered sequences had been previously mapped. Inclusion of a mapped sequence in a cluster resulted in the assignment of all sequences of that cluster, including its particular SEQ ID NO:, to that map location.
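The cluster-based map assignment can be sketched with a union-find structure: sequences judged to overlap are merged into one cluster, and any map location known for one member is propagated to every member of that cluster. This is an illustrative sketch, not the procedure actually used; the real clusters came from Phrap-style assembly rather than a precomputed list of overlapping pairs, and the identifiers and location string in the example are hypothetical.

```python
from collections import defaultdict

class UnionFind:
    """Disjoint-set forest with path halving."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def assign_map_locations(overlaps, mapped):
    """overlaps: (seq_id, seq_id) pairs judged to overlap.
    mapped: seq_id -> map location for previously mapped sequences.
    Returns seq_id -> set of locations inherited from its cluster."""
    uf = UnionFind()
    for a, b in overlaps:
        uf.union(a, b)
    locations_by_root = defaultdict(set)
    for seq_id, location in mapped.items():
        locations_by_root[uf.find(seq_id)].add(location)
    return {s: set(locations_by_root[uf.find(s)]) for s in list(uf.parent)}

# Hypothetical identifiers: SEQ_A inherits the location mapped for EST_2.
pairs = [("SEQ_A", "EST_1"), ("EST_1", "EST_2"), ("SEQ_B", "EST_9")]
print(assign_map_locations(pairs, {"EST_2": "chr7 interval (hypothetical)"})["SEQ_A"])
```

Returning a set rather than a single value leaves room for a cluster that contains markers mapped to conflicting locations, which a real pipeline would need to flag.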
Map locations are represented by ranges, or intervals, of human chromosomes. The map position of an interval, in centiMorgans, is measured relative to the terminus of the chromosome's p-arm. (The centiMorgan (cM) is a unit of measurement based on recombination frequencies between chromosomal markers. On average, 1 cM is roughly equivalent to 1 megabase (Mb) of DNA in humans, although this can vary widely due to hot and cold spots of recombination.) The cM distances are based on genetic markers mapped by Genethon which provide boundaries for radiation hybrid markers whose sequences were included in each of the clusters. Human genome maps and other resources available to the public, such as the NCBI "GeneMap'99" World Wide Web site (http://www.ncbi.nlm.nih.gov/genemap/), can be employed to determine if previously identified disease genes map within or in proximity to the intervals indicated above.
VII. Analysis of Polynucleotide Expression
Northern analysis is a laboratory technique used to detect the presence of a transcript of a gene and involves the hybridization of a labeled nucleotide sequence to a membrane on which RNAs from a particular cell type or tissue have been bound. (See, e.g., Sambrook, supra, ch. 7; Ausubel (1995) supra, ch. 4 and 16.)
Analogous computer techniques applying BLAST were used to search for identical or related molecules in cDNA databases such as GenBank or LIFESEQ (Incyte Genomics). This analysis is much faster than multiple membrane-based hybridizations. In addition, the sensitivity of the computer search can be modified to determine whether any particular match is categorized as exact or similar. The basis of the search is the product score, which is defined as:
$$\text{Product score} = \frac{\text{BLAST score} \times \text{Percent identity}}{5 \times \min\{\text{length(Seq. 1)},\ \text{length(Seq. 2)}\}}$$
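As a check on the arithmetic, the product score can be computed directly from the +5 per match and -4 per mismatch rule described in the next paragraph. The sketch assumes a single ungapped HSP and a hypothetical pair in which the shorter sequence is 100 nt. It reproduces the scores of 100, 70, and 50 for full identity at 100%, 70%, and 50% overlap exactly, and gives about 69 and 49 for 88% and 79% identity at full overlap, close to the values of 70 and 50 quoted in the text.

```python
def blast_score(matches: int, mismatches: int) -> int:
    """Score of one HSP under the +5 match / -4 mismatch rule."""
    return 5 * matches - 4 * mismatches

def product_score(hsp_score: float, percent_identity: float,
                  len_seq1: int, len_seq2: int) -> float:
    """(BLAST score x percent identity) / (5 x length of the shorter sequence)."""
    return hsp_score * percent_identity / (5 * min(len_seq1, len_seq2))

SHORT, LONG = 100, 250  # hypothetical sequence lengths
print(product_score(blast_score(100, 0), 100.0, SHORT, LONG))           # 100.0
print(product_score(blast_score(70, 0), 100.0, SHORT, LONG))            # 70.0
print(product_score(blast_score(50, 0), 100.0, SHORT, LONG))            # 50.0
print(round(product_score(blast_score(88, 12), 88.0, SHORT, LONG), 1))  # 69.0
print(round(product_score(blast_score(79, 21), 79.0, SHORT, LONG), 1))  # 49.1
```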
The product score takes into account both the degree of similarity between two sequences and the length of the sequence match. The product score is a normalized value between 0 and 100, and is calculated as follows: the BLAST score is multiplied by the percent nucleotide identity and the product is divided by (5 times the length of the shorter of the two sequences). The BLAST score is calculated by assigning a score of +5 for every base that matches in a high-scoring segment pair (HSP), and -4 for every mismatch. Two sequences may share more than one HSP (separated by gaps). If there is more than one HSP, then the pair with the highest BLAST score is used to calculate the product score. The product score represents a balance between fractional overlap and quality in a BLAST alignment. For example, a product score of 100 is produced only for 100% identity over the entire length of the shorter of the two sequences being compared. A product score of 70 is produced either by 100% identity and 70% overlap at one end, or by 88% identity and 100% overlap at the other. A product score of 50 is produced either by 100% identity and 50% overlap at one end, or 79% identity and 100% overlap.
Alternatively, polynucleotide sequences encoding CSAP are analyzed with respect to the tissue sources from which they were derived. For example, some full length sequences are assembled, at least in part, with overlapping Incyte cDNA sequences (see Example III). Each cDNA sequence is derived from a cDNA library constructed from a human tissue. Each human tissue is classified into one of the following organ/tissue categories: cardiovascular system; connective tissue; digestive system; embryonic structures; endocrine system; exocrine glands; genitalia, female; genitalia, male; germ cells; hemic and immune system; liver; musculoskeletal system; nervous system; pancreas; respiratory system; sense organs; skin; stomatognathic system; unclassified/mixed; or urinary tract.
The number of libraries in each category is counted and divided by the total number of libraries across all categories. Similarly, each human tissue is classified into one of the following disease/condition categories: cancer, cell line, developmental, inflammation, neurological, trauma, cardiovascular, pooled, and other, and the number of libraries in each category is counted and divided by the total number of libraries across all categories. The resulting percentages reflect the tissue- and disease-specific expression of cDNA encoding CSAP. cDNA sequences and cDNA library/tissue information are found in the LIFESEQ GOLD database (Incyte Genomics, Palo Alto CA).
VIII. Extension of CSAP Encoding Polynucleotides
Full length polynucleotide sequences were also produced by extension of an appropriate fragment of the full length molecule using oligonucleotide primers designed from this fragment. One primer was synthesized to initiate 5' extension of the known fragment, and the other primer was synthesized to initiate 3' extension of the known fragment. The initial primers were designed using OLIGO 4.06 software (National Biosciences), or another appropriate program, to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to anneal to the target sequence at temperatures of about 68 °C to about 72 °C. Any stretch of nucleotides which would result in hairpin structures and primer-primer dimerizations was avoided.
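The internals of OLIGO 4.06 are not described here, so the following is only a rough sketch of the stated constraints: a length of 22 to 30 nt and a GC content of at least 50%, plus a crude 3'-end self-complementarity screen standing in for the hairpin and primer-dimer checks. The annealing-temperature criterion (about 68 °C to 72 °C) is not evaluated, because a credible estimate needs a nearest-neighbor thermodynamic model. The function names and the 4-base window are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def has_3prime_self_complement(primer: str, k: int = 4) -> bool:
    """Crude screen: can the 3'-terminal k bases base-pair with some stretch of
    the same primer (a possible hairpin or self-dimer seed)?"""
    return reverse_complement(primer[-k:]) in primer.upper()

def passes_basic_checks(primer: str, min_len: int = 22, max_len: int = 30,
                        min_gc: float = 0.50) -> bool:
    """Length and GC limits from the text plus the crude 3' screen above."""
    return (min_len <= len(primer) <= max_len
            and gc_fraction(primer) >= min_gc
            and not has_3prime_self_complement(primer))

print(passes_basic_checks("GGCAGCAGCAGCAGCAGCAGCAGA"))  # True
print(passes_basic_checks("GGCAGCAGCAGCAGCAGCAGGCGC"))  # False: palindromic 3' end
```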
Selected human cDNA libraries were used to extend the sequence. If more than one extension was necessary or desired, additional or nested sets of primers were designed.
High fidelity amplification was obtained by PCR using methods well known in the art. PCR was performed in 96-well plates using the PTC-200 thermal cycler (MJ Research, Inc.). The reaction mix contained DNA template, 200 nmol of each primer, reaction buffer containing Mg2+, (NH4)2SO4, and 2-mercaptoethanol, Taq DNA polymerase (Amersham Pharmacia Biotech), ELONGASE enzyme (Life Technologies), and Pfu DNA polymerase (Stratagene), with the following parameters for primer pair PCI A and PCI B: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C. In the alternative, the parameters for primer pair T7 and SK+ were as follows: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 57°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C.
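The cycling programs above can be written down as data, which makes the total hold time easy to check. Two assumptions are made: "Steps 2, 3, and 4 repeated 20 times" is read as 20 passes in total, and ramp times and the open-ended 4°C storage step are excluded. Under those assumptions both programs hold for 4380 s (73 min); the T7/SK+ program differs only in its 57°C annealing step. The names `Hold` and `total_hold_seconds` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hold:
    """One temperature hold in a thermal-cycling program."""
    temp_c: float
    seconds: int

INITIAL = [Hold(94, 180)]                                   # Step 1
PCI_CYCLE = [Hold(94, 15), Hold(60, 60), Hold(68, 120)]     # Steps 2-4, PCI A / PCI B
T7_SK_CYCLE = [Hold(94, 15), Hold(57, 60), Hold(68, 120)]   # Steps 2-4, T7 / SK+
FINAL = [Hold(68, 300)]                                     # Step 6 (Step 7 storage not timed)
CYCLES = 20  # ASSUMPTION: 20 total passes through Steps 2-4

def total_hold_seconds(initial, cycle, cycles, final):
    """Sum of hold times, ignoring ramp times and the open-ended storage step."""
    return (sum(h.seconds for h in initial)
            + cycles * sum(h.seconds for h in cycle)
            + sum(h.seconds for h in final))

print(total_hold_seconds(INITIAL, PCI_CYCLE, CYCLES, FINAL))    # 4380
print(total_hold_seconds(INITIAL, T7_SK_CYCLE, CYCLES, FINAL))  # 4380
```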
The concentration of DNA in each well was determined by dispensing 100 μl PICOGREEN quantitation reagent (0.25% (v/v) PICOGREEN; Molecular Probes, Eugene OR) dissolved in 1X TE and 0.5 μl of undiluted PCR product into each well of an opaque fluorimeter plate (Corning Costar, Acton MA), allowing the DNA to bind to the reagent. The plate was scanned in a Fluoroskan II (Labsystems Oy, Helsinki, Finland) to measure the fluorescence of the sample and to quantify the concentration of DNA. A 5 μl to 10 μl aliquot of the reaction mixture was analyzed by electrophoresis on a 1% agarose gel to determine which reactions were successful in extending the sequence. The extended nucleotides were desalted and concentrated, transferred to 384-well plates, digested with CviJI Chlorella virus endonuclease (Molecular Biology Research, Madison WI), and sonicated or sheared prior to religation into pUC 18 vector (Amersham Pharmacia Biotech). For shotgun sequencing, the digested nucleotides were separated on low concentration (0.6 to 0.8%) agarose gels, fragments were excised, and agar digested with Agar ACE (Promega). Extended clones were religated using T4 ligase (New England Biolabs, Beverly MA) into pUC 18 vector (Amersham Pharmacia Biotech), treated with Pfu DNA polymerase (Stratagene) to fill-in restriction site overhangs, and transformed into competent E. coli cells. Transformed cells were selected on antibiotic-containing media, and individual colonies were picked and cultured overnight at 37°C in 384-well plates in LB/2x carb liquid media. The cells were lysed, and DNA was amplified by PCR using Taq DNA polymerase
(Amersham Pharmacia Biotech) and Pfu DNA polymerase (Stratagene) with the following parameters: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 72°C, 2 min; Step 5: steps 2, 3, and 4 repeated 29 times; Step 6: 72°C, 5 min; Step 7: storage at 4°C. DNA was quantified by PICOGREEN reagent (Molecular Probes) as described above. Samples with low DNA recoveries were reamplified using the same conditions as described above. Samples were diluted with 20% dimethylsulfoxide (1:2, v/v), and sequenced using DYENAMIC energy transfer sequencing primers and the DYENAMIC DIRECT kit (Amersham Pharmacia Biotech) or the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (Applied Biosystems). In like manner, full length polynucleotide sequences are verified using the above procedure or are used to obtain 5' regulatory sequences using the above procedure along with oligonucleotides designed for such extension, and an appropriate genomic library.
IX. Identification of Single Nucleotide Polymorphisms in CSAP Encoding Polynucleotides
Common DNA sequence variants known as single nucleotide polymorphisms (SNPs) were identified in SEQ ID NO:29-56 using the LIFESEQ database (Incyte Genomics). Sequences from the same gene were clustered together and assembled as described in Example III, allowing the identification of all sequence variants in the gene. An algorithm consisting of a series of filters was used to distinguish SNPs from other sequence variants. Preliminary filters removed the majority of basecall errors by requiring a minimum Phred quality score of 15, and removed sequence alignment errors and errors resulting from improper trimming of vector sequences, chimeras, and splice variants. An automated procedure of advanced chromosome analysis analyzed the original chromatogram files in the vicinity of the putative SNP. Clone error filters used statistically generated algorithms to identify errors introduced during laboratory processing, such as those caused by reverse transcriptase, polymerase, or somatic mutation. Clustering error filters used statistically generated algorithms to identify errors resulting from clustering of close homologs or pseudogenes, or due to contamination by non-human sequences. A final set of filters removed duplicates and SNPs found in immunoglobulins or T-cell receptors.
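Only some of the filters above are specified concretely enough to code: the minimum Phred quality of 15 and the final removal of duplicates and of variants in immunoglobulin or T-cell receptor genes. The sketch below implements just those; the chromatogram, clone-error, and clustering-error filters are statistical procedures whose details are not given. The `Candidate` record and its fields are hypothetical. For reference, a Phred score Q corresponds to a base-call error probability of 10^(-Q/10), so Q15 means roughly a 3.2% chance that the call is wrong.

```python
from dataclasses import dataclass

MIN_PHRED = 15  # minimum base-call quality stated in the text
EXCLUDED_FAMILIES = {"immunoglobulin", "T-cell receptor"}

def phred_error_probability(q: float) -> float:
    """Phred quality Q corresponds to an error probability of 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one putative variant call."""
    gene: str
    position: int
    ref: str
    alt: str
    phred: int
    gene_family: str = ""

def filter_candidates(candidates):
    """Apply only the Phred, gene-family, and duplicate filters."""
    seen = set()
    kept = []
    for c in candidates:
        if c.phred < MIN_PHRED:
            continue  # likely base-call error
        if c.gene_family in EXCLUDED_FAMILIES:
            continue  # immunoglobulin / T-cell receptor variants are dropped
        key = (c.gene, c.position, c.ref, c.alt)
        if key in seen:
            continue  # duplicate of a variant already kept
        seen.add(key)
        kept.append(c)
    return kept

calls = [
    Candidate("GENE_X", 120, "A", "G", 30),
    Candidate("GENE_X", 120, "A", "G", 25),  # duplicate
    Candidate("GENE_X", 300, "C", "T", 10),  # below Q15
    Candidate("IGH_X", 50, "G", "A", 40, "immunoglobulin"),
]
print([(c.gene, c.position) for c in filter_candidates(calls)])  # [('GENE_X', 120)]
print(round(phred_error_probability(MIN_PHRED), 4))              # 0.0316
```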
Certain SNPs were selected for further characterization by mass spectrometry using the high throughput MASSARRAY system (Sequenom, Inc.) to analyze allele frequencies at the SNP sites in four different human populations. The Caucasian population comprised 92 individuals (46 male, 46 female), including 83 from Utah, four French, three Venezuelan, and two Amish individuals. The African population comprised 194 individuals (97 male, 97 female), all African Americans. The Hispanic population comprised 324 individuals (162 male, 162 female), all Mexican Hispanic. The Asian population comprised 126 individuals (64 male, 62 female) with a reported parental breakdown of 43% Chinese, 31% Japanese, 13% Korean, 5% Vietnamese, and 8% other Asian. Allele frequencies were first analyzed in the Caucasian population; in some cases those SNPs which showed no allelic variance in this population were not further tested in the other three populations.
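The text does not say how the MASSARRAY results were summarized. Assuming individual diploid genotype calls for a biallelic SNP, allele frequencies follow from simple counting, and a SNP showing "no allelic variance" in a population is one where only a single allele was observed. The genotype counts in the example are hypothetical, chosen only so that they total the 92 Caucasian individuals.

```python
def allele_frequencies(n_hom_ref: int, n_het: int, n_hom_alt: int):
    """Return (reference, alternate) allele frequencies from diploid genotype counts."""
    n_individuals = n_hom_ref + n_het + n_hom_alt
    if n_individuals == 0:
        raise ValueError("no genotyped individuals")
    # Each individual carries two alleles; heterozygotes contribute one of each.
    ref = (2 * n_hom_ref + n_het) / (2 * n_individuals)
    return ref, 1.0 - ref

def is_monomorphic(n_hom_ref: int, n_het: int, n_hom_alt: int) -> bool:
    """True when only one allele was observed in the population."""
    return n_het == 0 and (n_hom_ref == 0 or n_hom_alt == 0)

ref, alt = allele_frequencies(60, 28, 4)  # hypothetical counts, 92 individuals
print(round(ref, 3), round(alt, 3))       # 0.804 0.196
print(is_monomorphic(92, 0, 0))           # True
```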
X. Labeling and Use of Individual Hybridization Probes
Hybridization probes derived from SEQ ID NO:29-56 are employed to screen cDNAs, genomic DNAs, or mRNAs. Although the labeling of oligonucleotides, consisting of about 20 base pairs, is specifically described, essentially the same procedure is used with larger nucleotide fragments. Oligonucleotides are designed using state-of-the-art software such as OLIGO 4.06 software (National Biosciences) and labeled by combining 50 pmol of each oligomer, 250 μCi of [γ-32P] adenosine triphosphate (Amersham Pharmacia Biotech), and T4 polynucleotide kinase (DuPont NEN, Boston MA). The labeled oligonucleotides are substantially purified using a SEPHADEX G-25 superfine size exclusion dextran bead column (Amersham Pharmacia Biotech). An aliquot containing 10^7 counts per minute of the labeled probe is used in a typical membrane-based hybridization analysis of human genomic DNA digested with one of the following endonucleases: Ase I, Bgl II, Eco RI, Pst I, Xba I, or Pvu II (DuPont NEN).
The DNA from each digest is fractionated on a 0.7% agarose gel and transferred to nylon membranes (Nytran Plus, Schleicher & Schuell, Durham NH). Hybridization is carried out for 16 hours at 40°C. To remove nonspecific signals, blots are sequentially washed at room temperature under conditions of up to, for example, 0.1 x saline sodium citrate and 0.5% sodium dodecyl sulfate. Hybridization patterns are visualized using autoradiography or an alternative imaging means and compared.
XI. Microarrays
The linkage or synthesis of array elements upon a microarray can be achieved utilizing photolithography, piezoelectric printing (ink-jet printing; see, e.g., Baldeschweiler, supra), mechanical microspotting technologies, and derivatives thereof. The substrate in each of the aforementioned technologies should be uniform and solid with a non-porous surface (Schena (1999), supra). Suggested substrates include silicon, silica, glass slides, glass chips, and silicon wafers. Alternatively, a procedure analogous to a dot or slot blot may also be used to arrange and link elements to the surface of a substrate using thermal, UV, chemical, or mechanical bonding procedures. A typical array may be produced using available methods and machines well known to those of ordinary skill in the art and may contain any appropriate number of elements. (See, e.g., Schena, M. et al. (1995) Science 270:467-470; Shalon, D. et al. (1996) Genome Res. 6:639-645; Marshall, A. and J. Hodgson (1998) Nat. Biotechnol. 16:27-31.)
Full length cDNAs, Expressed Sequence Tags (ESTs), or fragments or oligomers thereof may comprise the elements of the microarray. Fragments or oligomers suitable for hybridization can be selected using software well known in the art such as LASERGENE software (DNASTAR). The array elements are hybridized with polynucleotides in a biological sample. The polynucleotides in the biological sample are conjugated to a fluorescent label or other molecular tag for ease of detection. After hybridization, nonhybridized nucleotides from the biological sample are removed, and a fluorescence scanner is used to detect hybridization at each array element. Alternatively, laser desorption and mass spectrometry may be used for detection of hybridization. The degree of complementarity and the relative abundance of each polynucleotide which hybridizes to an element on the microarray may be assessed. In one embodiment, microarray preparation and usage is described in detail below.
Tissue or Cell Sample Preparation
Total RNA is isolated from tissue samples using the guanidinium thiocyanate method and poly(A)+ RNA is purified using the oligo-(dT) cellulose method. Each poly(A)+ RNA sample is reverse transcribed using MMLV reverse-transcriptase, 0.05 pg/μl oligo-(dT) primer (21mer), 1X first strand buffer, 0.03 units/μl RNase inhibitor, 500 μM dATP, 500 μM dGTP, 500 μM dTTP, 40 μM dCTP, 40 μM dCTP-Cy3 (BDS) or dCTP-Cy5 (Amersham Pharmacia Biotech). The reverse transcription reaction is performed in a 25 μl volume containing 200 ng poly(A)+ RNA with GEMBRIGHT kits (Incyte). Specific control poly(A)+ RNAs are synthesized by in vitro transcription from non-coding yeast genomic DNA. After incubation at 37° C for 2 hr, each reaction sample (one with Cy3 and another with Cy5 labeling) is treated with 2.5 μl of 0.5M sodium hydroxide and incubated for 20 minutes at 85° C to stop the reaction and degrade the RNA. Samples are purified using two successive CHROMA SPIN 30 gel filtration spin columns (CLONTECH Laboratories, Inc. (CLONTECH), Palo Alto CA) and, after being combined, both reaction samples are ethanol precipitated using 1 μl of glycogen (1 mg/ml), 60 μl sodium acetate, and 300 μl of 100% ethanol. The sample is then dried to completion using a SpeedVAC (Savant Instruments Inc., Holbrook NY) and resuspended in 14 μl 5X SSC/0.2% SDS.
For example, nonmalignant primary mammary epithelial cells and breast carcinoma cell lines are grown to 70-80% confluence prior to harvest. Gene expression profiles of nonmalignant primary mammary epithelial cells are compared to those of breast carcinoma cell lines at different stages of tumor progression.
Microarray Preparation
Sequences of the present invention are used to generate array elements. Each array element is amplified from bacterial cells containing vectors with cloned cDNA inserts. PCR amplification uses primers complementary to the vector sequences flanking the cDNA insert. Array elements are amplified in thirty cycles of PCR from an initial quantity of 1-2 ng to a final quantity greater than 5 μg. Amplified array elements are then purified using SEPHACRYL-400 (Amersham Pharmacia Biotech).
Purified array elements are immobilized on polymer-coated glass slides. Glass microscope slides (Corning) are cleaned by ultrasound in 0.1% SDS and acetone, with extensive distilled water washes between and after treatments. Glass slides are etched in 4% hydrofluoric acid (VWR Scientific Products Corporation (VWR), West Chester PA), washed extensively in distilled water, and coated with 0.05% aminopropyl silane (Sigma) in 95% ethanol. Coated slides are cured in a 110°C oven. Array elements are applied to the coated glass substrate using a procedure described in U.S.
Patent No. 5,807,522, incorporated herein by reference. 1 μl of the array element DNA, at an average concentration of 100 ng/μl, is loaded into the open capillary printing element by a high-speed robotic apparatus. The apparatus then deposits about 5 nl of array element sample per slide.
Microarrays are UV-crosslinked using a STRATALINKER UV-crosslinker (Stratagene). Microarrays are washed at room temperature once in 0.2% SDS and three times in distilled water. Non-specific binding sites are blocked by incubation of microarrays in 0.2% casein in phosphate buffered saline (PBS) (Tropix, Inc., Bedford MA) for 30 minutes at 60° C followed by washes in 0.2% SDS and distilled water as before.
Hybridization
Hybridization reactions contain 9 μl of sample mixture consisting of 0.2 μg each of Cy3- and Cy5-labeled cDNA synthesis products in 5X SSC, 0.2% SDS hybridization buffer. The sample mixture is heated to 65° C for 5 minutes and is aliquoted onto the microarray surface and covered with a 1.8 cm² coverslip. The arrays are transferred to a waterproof chamber having a cavity just slightly larger than a microscope slide. The chamber is kept at 100% humidity internally by the addition of 140 μl of 5X SSC in a corner of the chamber. The chamber containing the arrays is incubated for about 6.5 hours at 60° C. The arrays are washed for 10 min at 45° C in a first wash buffer (1X SSC, 0.1% SDS), three times for 10 minutes each at 45°C in a second wash buffer (0.1X SSC), and dried.
Detection
Reporter-labeled hybridization complexes are detected with a microscope equipped with an Innova 70 mixed gas 10 W laser (Coherent, Inc., Santa Clara CA) capable of generating spectral lines at 488 nm for excitation of Cy3 and at 632 nm for excitation of Cy5. The excitation laser light is focused on the array using a 20X microscope objective (Nikon, Inc., Melville NY). The slide containing the array is placed on a computer-controlled X-Y stage on the microscope and raster-scanned past the objective. The 1.8 cm x 1.8 cm array used in the present example is scanned with a resolution of 20 micrometers.
In two separate scans, a mixed gas multiline laser excites the two fluorophores sequentially. Emitted light is split, based on wavelength, into two photomultiplier tube detectors (PMT R1477, Hamamatsu Photonics Systems, Bridgewater NJ) corresponding to the two fluorophores. Appropriate filters positioned between the array and the photomultiplier tubes are used to filter the signals. The emission maxima of the fluorophores used are 565 nm for Cy3 and 650 nm for Cy5. Each array is typically scanned twice, one scan per fluorophore using the appropriate filters at the laser source, although the apparatus is capable of recording the spectra from both fluorophores simultaneously. The sensitivity of the scans is typically calibrated using the signal intensity generated by a cDNA control species added to the sample mixture at a known concentration. A specific location on the array contains a complementary DNA sequence, allowing the intensity of the signal at that location to be correlated with a weight ratio of hybridizing species of 1:100,000. When two samples from different sources (e.g., representing test and control cells), each labeled with a different fluorophore, are hybridized to a single array for the purpose of identifying genes that are differentially expressed, the calibration is done by labeling samples of the calibrating cDNA with the two fluorophores and adding identical amounts of each to the hybridization mixture.
The output of the photomultiplier tube is digitized using a 12-bit RTI-835H analog-to-digital (A/D) conversion board (Analog Devices, Inc., Norwood MA) installed in an IBM-compatible PC computer. The digitized data are displayed as an image where the signal intensity is mapped using a linear 20-color transformation to a pseudocolor scale ranging from blue (low signal) to red (high signal). The data are also analyzed quantitatively. Where two different fluorophores are excited and measured simultaneously, the data are first corrected for optical crosstalk (due to overlapping emission spectra) between the fluorophores using each fluorophore's emission spectrum. A grid is superimposed over the fluorescence signal image such that the signal from each spot is centered in each element of the grid. The fluorescence signal within each element is then integrated to obtain a numerical value corresponding to the average intensity of the signal. The software used for signal analysis is the GEMTOOLS gene expression analysis program (Incyte).
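Three of the processing steps above are simple enough to sketch: crosstalk correction, per-spot averaging over a grid, and the linear 20-level pseudocolor mapping of 12-bit values. GEMTOOLS is not reproduced here. The text does not give the correction formula, so the sketch assumes a linear two-channel model in which a known fraction of each dye's emission leaks into the other channel; in practice those spillover fractions would be derived from each fluorophore's emission spectrum, and the values used below are hypothetical. The grid is assumed to be already aligned so that each cell holds one spot, and images are plain lists of rows.

```python
def unmix_two_channels(m_cy3, m_cy5, spill_cy5_into_cy3, spill_cy3_into_cy5):
    """Invert an assumed linear crosstalk model:
        m_cy3 = t_cy3 + spill_cy5_into_cy3 * t_cy5
        m_cy5 = spill_cy3_into_cy5 * t_cy3 + t_cy5
    Returns the corrected (t_cy3, t_cy5)."""
    det = 1.0 - spill_cy5_into_cy3 * spill_cy3_into_cy5
    if det == 0:
        raise ValueError("spillover matrix is singular")
    t_cy3 = (m_cy3 - spill_cy5_into_cy3 * m_cy5) / det
    t_cy5 = (m_cy5 - spill_cy3_into_cy5 * m_cy3) / det
    return t_cy3, t_cy5

def grid_cell_means(image, cell_h, cell_w):
    """Average pixel intensity in each grid cell, keyed by (grid_row, grid_col).
    Partial cells at the right and bottom edges are skipped."""
    means = {}
    for r0 in range(0, len(image) - cell_h + 1, cell_h):
        for c0 in range(0, len(image[0]) - cell_w + 1, cell_w):
            pixels = [image[r][c]
                      for r in range(r0, r0 + cell_h)
                      for c in range(c0, c0 + cell_w)]
            means[(r0 // cell_h, c0 // cell_w)] = sum(pixels) / len(pixels)
    return means

def pseudocolor_index(value, bits=12, levels=20):
    """Map a raw A/D value (0 .. 2**bits - 1) linearly onto 0 (blue) .. levels - 1 (red)."""
    full_scale = 2 ** bits
    if not 0 <= value < full_scale:
        raise ValueError("value outside A/D range")
    return value * levels // full_scale

# Hypothetical: true signals 1000 (Cy3) and 500 (Cy5) with 10% and 5% spillover.
print(unmix_two_channels(1050, 550, 0.10, 0.05))  # approximately (1000.0, 500.0)
print(pseudocolor_index(0), pseudocolor_index(4095))  # 0 19
```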
For example, component 5504134JHGG3 of SEQ ID NO:31 and component 5504134JHGG3 of SEQ ID NO:33 showed differential expression in nonmalignant primary mammary epithelial cells versus breast carcinoma cell lines at different stages of tumor progression, as determined by microarray analysis. The expression of component 5504134JHGG3 was altered by at least a factor of 2 in breast carcinoma cell lines. Therefore, SEQ ID NO:31 and SEQ ID NO:33 are useful in diagnostic assays for cell proliferative disorders. In another example, SEQ ID NO:50 showed differential expression in human lung adenocarcinoma and squamous cell carcinoma versus normal lung tissue as determined by microarray analysis. Matched normal and tumorigenic lung tissue samples were provided by the Roy Castle Lung Cancer Foundation, Liverpool, UK. The expression of SEQ ID NO:50 was decreased at least two-fold in lung tumor tissue relative to normal lung tissue from the same donor. Therefore, SEQ ID NO:50 is useful in diagnostic assays for lung adenocarcinoma and squamous cell carcinoma.
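The twofold criterion used in these examples can be expressed as a test on the log2 ratio of test to control signal, which treats increases and decreases symmetrically. This sketch assumes the two intensities are already background-corrected and calibrated as described above; the function names and example values are hypothetical.

```python
import math

def log2_ratio(test_signal: float, control_signal: float) -> float:
    """log2 of test/control; positive means higher expression in the test sample."""
    if test_signal <= 0 or control_signal <= 0:
        raise ValueError("signals must be positive")
    return math.log2(test_signal / control_signal)

def is_differentially_expressed(test_signal, control_signal, min_fold=2.0):
    """True if expression changed by at least min_fold in either direction."""
    return abs(log2_ratio(test_signal, control_signal)) >= math.log2(min_fold)

print(is_differentially_expressed(300.0, 900.0))  # True: threefold decrease
print(is_differentially_expressed(150.0, 100.0))  # False: 1.5-fold increase
```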
XII. Complementary Polynucleotides
Sequences complementary to the CSAP-encoding sequences, or any parts thereof, are used to detect, decrease, or inhibit expression of naturally occurring CSAP. Although use of oligonucleotides comprising from about 15 to 30 base pairs is described, essentially the same procedure is used with smaller or with larger sequence fragments. Appropriate oligonucleotides are designed using OLIGO 4.06 software (National Biosciences) and the coding sequence of CSAP. To inhibit transcription, a complementary oligonucleotide is designed from the most unique 5' sequence and used to prevent promoter binding to the coding sequence. To inhibit translation, a complementary oligonucleotide is designed to prevent ribosomal binding to the CSAP-encoding transcript.
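Designing the complementary oligonucleotide comes down to taking the reverse complement of a chosen stretch of the CSAP-encoding sequence, so that the oligonucleotide pairs antiparallel with the target. The sketch below does only that; choosing the "most unique" 5' region, which the text assigns to OLIGO 4.06 working from the CSAP coding sequence, is not attempted. The target sequence, window position, and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def antisense_oligo(target: str, start: int, length: int = 20) -> str:
    """Oligonucleotide complementary to target[start:start + length] (15-30 nt)."""
    if not 15 <= length <= 30:
        raise ValueError("length outside the 15-30 nt range described")
    window = target[start:start + length]
    if len(window) != length:
        raise ValueError("window runs past the end of the target")
    return reverse_complement(window)

TARGET = "ATGAAACCCGGGTTTAAACCCGGGTTT"  # hypothetical target sequence
print(antisense_oligo(TARGET, 0, 20))  # GGTTTAAACCCGGGTTTCAT
```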
XIII. Expression of CSAP
Expression and purification of CSAP are achieved using bacterial or virus-based expression systems. For expression of CSAP in bacteria, cDNA is subcloned into an appropriate vector containing an antibiotic resistance gene and an inducible promoter that directs high levels of cDNA transcription. Examples of such promoters include, but are not limited to, the trp-lac (tac) hybrid promoter and the T5 or T7 bacteriophage promoter in conjunction with the lac operator regulatory element. Recombinant vectors are transformed into suitable bacterial hosts, e.g., BL21(DE3). Antibiotic resistant bacteria express CSAP upon induction with isopropyl β-D-thiogalactopyranoside (IPTG). Expression of CSAP in eukaryotic cells is achieved by infecting insect or mammalian cell lines with recombinant Autographa californica nuclear polyhedrosis virus (AcMNPV), commonly known as baculovirus. The nonessential polyhedrin gene of baculovirus is replaced with cDNA encoding CSAP by either homologous recombination or bacterial-mediated transposition involving transfer plasmid intermediates. Viral infectivity is maintained and the strong polyhedrin promoter drives high levels of cDNA transcription. Recombinant baculovirus is used to infect Spodoptera frugiperda (Sf9) insect cells in most cases, or human hepatocytes, in some cases. Infection of the latter requires additional genetic modifications to baculovirus. (See Engelhard, E.K. et al. (1994) Proc. Natl. Acad. Sci. USA 91:3224-3227; Sandig, V. et al. (1996) Hum. Gene Ther. 7:1937-1945.)
In most expression systems, CSAP is synthesized as a fusion protein with, e.g., glutathione S-transferase (GST) or a peptide epitope tag, such as FLAG or 6-His, permitting rapid, single-step, affinity-based purification of recombinant fusion protein from crude cell lysates. GST, a 26-kilodalton enzyme from Schistosoma japonicum, enables the purification of fusion proteins on immobilized glutathione under conditions that maintain protein activity and antigenicity (Amersham Pharmacia Biotech). Following purification, the GST moiety can be proteolytically cleaved from CSAP at specifically engineered sites. FLAG, an 8-amino acid peptide, enables immunoaffinity purification using commercially available monoclonal and polyclonal anti-FLAG antibodies (Eastman Kodak). 6-His, a stretch of six consecutive histidine residues, enables purification on metal-chelate resins (QIAGEN). Methods for protein expression and purification are discussed in Ausubel (1995, supra, ch. 10 and 16). Purified CSAP obtained by these methods can be used directly in the assays shown in Examples XVII and XVIII, where applicable.
XIV. Functional Assays
CSAP function is assessed by expressing the sequences encoding CSAP at physiologically elevated levels in mammalian cell culture systems. cDNA is subcloned into a mammalian expression vector containing a strong promoter that drives high levels of cDNA expression. Vectors of choice include pCMV SPORT (Life Technologies) and pCR3.1 (Invitrogen, Carlsbad CA), both of which contain the cytomegalovirus promoter. 5-10 μg of recombinant vector are transiently transfected into a human cell line, for example, an endothelial or hematopoietic cell line, using either liposome formulations or electroporation. 1-2 μg of an additional plasmid containing sequences encoding a marker protein are co-transfected. Expression of a marker protein provides a means to distinguish transfected cells from nontransfected cells and is a reliable predictor of cDNA expression from the recombinant vector. Marker proteins of choice include, e.g., Green Fluorescent Protein (GFP; Clontech), CD64, or a CD64-GFP fusion protein. Flow cytometry (FCM), an automated, laser optics-based technique, is used to identify transfected cells expressing GFP or CD64-GFP and to evaluate the apoptotic state of the cells and other cellular properties. FCM detects and quantifies the uptake of fluorescent molecules that diagnose events preceding or coincident with cell death. These events include changes in nuclear DNA content as measured by staining of DNA with propidium iodide; changes in cell size and granularity as measured by forward light scatter and 90 degree side light scatter; down-regulation of DNA synthesis as measured by decrease in bromodeoxyuridine uptake; alterations in expression of cell surface and intracellular proteins as measured by reactivity with specific antibodies; and alterations in plasma membrane composition as measured by the binding of fluorescein-conjugated Annexin V protein to the cell surface. Methods in flow cytometry are discussed in Ormerod, M.G. (1994) Flow Cytometry, Oxford, New York NY.
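The gating logic used to compare transfected and nontransfected cells can be sketched as two threshold steps. First, events are split on the marker signal. Then, within each population, the fraction above an Annexin V gate is counted. This is a hypothetical illustration: the field names, gate values, and event data are invented. Because GFP and fluorescein emit in overlapping channels, the example uses a CD64-type marker signal rather than GFP alongside fluorescein-Annexin V. In practice, gates are set from untransfected and unstained controls.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One cytometer event; values are in arbitrary fluorescence units."""
    marker: float   # marker signal, e.g. anti-CD64 staining (hypothetical field)
    annexin: float  # Annexin V signal (hypothetical field)


def split_by_marker(events, marker_gate):
    """Split events into marker-positive (transfected) and marker-negative lists."""
    positive = [e for e in events if e.marker > marker_gate]
    negative = [e for e in events if e.marker <= marker_gate]
    return positive, negative


def annexin_fraction(events, annexin_gate):
    """Fraction of events above the Annexin V gate (0.0 for an empty list)."""
    if not events:
        return 0.0
    return sum(e.annexin > annexin_gate for e in events) / len(events)


if __name__ == "__main__":
    events = [Event(900, 50), Event(850, 400), Event(700, 30), Event(1000, 600),
              Event(20, 50), Event(30, 40), Event(15, 300)]
    transfected, untransfected = split_by_marker(events, marker_gate=100)
    print(annexin_fraction(transfected, 200))    # 0.5
    print(annexin_fraction(untransfected, 200))  # 0.3333333333333333
```

Comparing the two fractions, rather than reporting the transfected value alone, controls for background apoptosis caused by the transfection procedure itself.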
The influence of CSAP on gene expression can be assessed using highly purified populations of cells transfected with sequences encoding CSAP and either CD64 or CD64-GFP. CD64 and CD64-GFP are expressed on the surface of transfected cells and bind to conserved regions of human immunoglobulin G (IgG). Transfected cells are efficiently separated from nontransfected cells using magnetic beads coated with either human IgG or antibody against CD64 (DYNAL, Lake Success NY). mRNA can be purified from the cells using methods well known to those of skill in the art. Expression of mRNA encoding CSAP and other genes of interest can be analyzed by northern analysis or microarray techniques.
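For microarray data, one common way to summarize the effect of CSAP is a per-gene log2 ratio of transfected to control signal after normalizing each sample. The sketch below assumes background-corrected intensities and uses simple total-signal normalization. That normalization is a deliberately basic choice, not a method stated in the text. The gene names and intensity values are hypothetical.

```python
import math


def log2_ratios(test, control):
    """log2(test/control) per gene after scaling each sample to its total signal.

    test and control map gene name -> background-corrected intensity.
    Genes missing from either sample, or with non-positive signal, are skipped.
    """
    genes = sorted(set(test) & set(control))
    t_total = sum(test[g] for g in genes)
    c_total = sum(control[g] for g in genes)
    if t_total <= 0 or c_total <= 0:
        raise ValueError("total signal must be positive in both samples")
    ratios = {}
    for g in genes:
        t = test[g] / t_total
        c = control[g] / c_total
        if t > 0 and c > 0:
            ratios[g] = math.log2(t / c)
    return ratios


if __name__ == "__main__":
    # Hypothetical intensities from CSAP-transfected and control cells.
    transfected = {"CSAP": 800, "GAPDH": 1000, "geneX": 200}
    control = {"CSAP": 100, "GAPDH": 1000, "geneX": 400}
    for gene, value in log2_ratios(transfected, control).items():
        print(gene, round(value, 3))
```

A positive value marks a gene whose share of the total signal rises in the transfected population. More robust normalizations, such as those based on housekeeping genes or intensity-dependent methods, may be preferable for real arrays.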
XV. Production of CSAP Specific Antibodies
CSAP substantially purified using polyacrylamide gel electrophoresis (PAGE; see, e.g., Harrington, M.G. (1990) Methods Enzymol. 182:488-495), or other purification techniques, is used to immunize animals (e.g., rabbits, mice, etc.) and to produce antibodies using standard protocols. Alternatively, the CSAP amino acid sequence is analyzed using LASERGENE software (DNASTAR) to determine regions of high immunogenicity, and a corresponding oligopeptide is synthesized and used to raise antibodies by means known to those of skill in the art. Methods for selection of appropriate epitopes, such as those near the C-terminus or in hydrophilic regions, are well described in the art. (See, e.g., Ausubel, 1995, supra, ch. 11.)
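To illustrate how a hydrophilic candidate epitope might be located, the sketch below slides a window along the sequence and keeps the window with the lowest mean Kyte-Doolittle hydropathy. This is a swapped-in, well-known hydropathy scale. It does not reproduce whatever scoring LASERGENE applies. The example sequence and the short window are hypothetical, chosen to keep the output easy to check. A 15-residue window would match the peptide length given below.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982); negative = hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophilic_window(seq, window=15):
    """Return (start, peptide, mean hydropathy) of the most hydrophilic window.

    start is 0-based. Ties keep the earliest window.
    """
    seq = seq.upper()
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"unrecognized residues: {sorted(unknown)}")
    if len(seq) < window:
        raise ValueError("sequence is shorter than the window")
    best = None
    for i in range(len(seq) - window + 1):
        peptide = seq[i:i + window]
        score = sum(KYTE_DOOLITTLE[aa] for aa in peptide) / window
        if best is None or score < best[2]:
            best = (i, peptide, score)
    return best


if __name__ == "__main__":
    # Hypothetical sequence; a 5-residue window keeps the example short.
    print(most_hydrophilic_window("IIIIIDEKRDIIIII", window=5))
```

Hydropathy is only one heuristic. Surface accessibility, flexibility, and proximity to the C-terminus, as mentioned above, are usually weighed together when choosing a peptide.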
Typically, oligopeptides of about 15 residues in length are synthesized on an ABI 431A peptide synthesizer (Applied Biosystems) using FMOC chemistry and coupled to KLH (Sigma-Aldrich, St. Louis MO) by reaction with N-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS) to increase immunogenicity. (See, e.g., Ausubel, 1995, supra.) Rabbits are immunized with the oligopeptide-KLH complex in complete Freund's adjuvant. Resulting antisera are tested for antipeptide and anti-CSAP activity by, for example, binding the peptide or CSAP to a substrate, blocking with 1% BSA, reacting with rabbit antisera, washing, and reacting with radio-iodinated goat anti-rabbit IgG.
XVI. Purification of Naturally Occurring CSAP Using Specific Antibodies
Naturally occurring or recombinant CSAP is substantially purified by immunoaffinity chromatography using antibodies specific for CSAP. An immunoaffinity column is constructed by covalently coupling anti-CSAP antibody to an activated chromatographic resin, such as CNBr-activated SEPHAROSE (Amersham Pharmacia Biotech). After the coupling, the resin is blocked and washed according to the manufacturer's instructions.
Media containing CSAP are passed over the immunoaffinity column, and the column is washed under conditions that allow the preferential adsorption of CSAP (e.g., high ionic strength buffers in the presence of detergent). The column is eluted under conditions that disrupt antibody/CSAP binding (e.g., a buffer of pH 2 to pH 3, or a high concentration of a chaotrope, such as urea or thiocyanate ion), and CSAP is collected.
XVII. Identification of Molecules Which Interact with CSAP
CSAP, or biologically active fragments thereof, are labeled with ¹²⁵I Bolton-Hunter reagent. (See, e.g., Bolton, A.E. and W.M. Hunter (1973) Biochem. J. 133:529-539.) Candidate molecules previously arrayed in the wells of a multi-well plate are incubated with the labeled CSAP, washed, and any wells with labeled CSAP complex are assayed. Data obtained using different concentrations of CSAP are used to calculate values for the number, affinity, and association of CSAP with the candidate molecules.
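One classical way to turn binding data at several CSAP concentrations into a site number (Bmax) and affinity (Kd) is Scatchard analysis. In this approach, bound/free is linear in bound for a single class of independent sites. The text does not specify the calculation, so this is a swapped-in technique shown as a sketch on hypothetical, noise-free data. Direct nonlinear fitting of the saturation curve is generally preferred for real data because the linear transform distorts measurement error.

```python
def scatchard_fit(free, bound):
    """Estimate (Kd, Bmax) from saturation-binding data via a Scatchard plot.

    Uses the linear form B/F = (Bmax - B) / Kd, fit by ordinary least squares
    with B on the x-axis and B/F on the y-axis: slope = -1/Kd,
    intercept = Bmax/Kd. free and bound must be in consistent units.
    """
    if len(free) != len(bound) or len(free) < 2:
        raise ValueError("need at least two paired (free, bound) points")
    xs = list(bound)
    ys = [b / f for b, f in zip(bound, free)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("bound values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("non-negative slope; data do not fit single-site binding")
    intercept = mean_y - slope * mean_x
    kd = -1.0 / slope
    bmax = intercept * kd
    return kd, bmax


if __name__ == "__main__":
    # Hypothetical noise-free data for Kd = 2.0 and Bmax = 100 (arbitrary units).
    free = [0.5, 1.0, 2.0, 4.0, 8.0]
    bound = [100 * f / (2.0 + f) for f in free]
    kd, bmax = scatchard_fit(free, bound)
    print(round(kd, 3), round(bmax, 3))  # 2.0 100.0
```

A curved Scatchard plot, rather than a straight line, would suggest more than one class of binding site or cooperative binding.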
Alternatively, molecules interacting with CSAP are analyzed using the yeast two-hybrid system as described in Fields, S. and O. Song (1989) Nature 340:245-246, or using commercially available kits based on the two-hybrid system, such as the MATCHMAKER system (Clontech).
CSAP may also be used in the PATHCALLING process (CuraGen Corp., New Haven CT) which employs the yeast two-hybrid system in a high-throughput manner to determine all interactions between the proteins encoded by two large libraries of genes (Nandabalan, K. et al. (2000) U.S. Patent No. 6,057,101).
XVIII. Demonstration of CSAP Activity
A microtubule motility assay for CSAP measures motor protein activity. In this assay, recombinant CSAP is immobilized onto a glass slide or similar substrate. Taxol-stabilized bovine brain microtubules (commercially available) in a solution containing ATP and cytosolic extract are perfused onto the slide. Movement of microtubules as driven by CSAP motor activity can be visualized and quantified using video-enhanced light microscopy and image analysis techniques. CSAP activity is directly proportional to the frequency and velocity of microtubule movement.
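Once image analysis has produced per-frame positions for each microtubule, the frequency and velocity of movement can be summarized as sketched below. Tracking itself is assumed to be already done. Here, "frequency" is interpreted as the fraction of microtubules moving faster than a cutoff. The frame interval, pixel size, cutoff, and track data are all hypothetical parameters.

```python
import math


def mean_speed(track, frame_interval_s, um_per_pixel):
    """Average speed (um/s) along one microtubule track of per-frame (x, y) pixels."""
    if len(track) < 2:
        raise ValueError("a track needs at least two positions")
    path_px = sum(math.dist(a, b) for a, b in zip(track, track[1:]))
    duration_s = frame_interval_s * (len(track) - 1)
    return path_px * um_per_pixel / duration_s


def motility_summary(tracks, frame_interval_s, um_per_pixel, min_speed):
    """Return (fraction of tracks moving, mean speed of moving tracks)."""
    speeds = [mean_speed(t, frame_interval_s, um_per_pixel) for t in tracks]
    moving = [s for s in speeds if s >= min_speed]
    fraction = len(moving) / len(speeds) if speeds else 0.0
    mean_moving = sum(moving) / len(moving) if moving else 0.0
    return fraction, mean_moving


if __name__ == "__main__":
    tracks = [
        [(0, 0), (3, 4), (6, 8)],   # moves 10 px over 2 frame intervals
        [(5, 5), (5, 5), (5, 5)],   # stationary
    ]
    print(motility_summary(tracks, frame_interval_s=0.5, um_per_pixel=0.1,
                           min_speed=0.05))  # (0.5, 1.0)
```

Summing frame-to-frame distances measures path length. Using net displacement instead would discount microtubules that wander without sustained gliding.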
Alternatively, an assay for CSAP measures the formation of protein filaments in vitro. A solution of CSAP at a concentration greater than the "critical concentration" for polymer assembly is applied to carbon-coated grids. Appropriate nucleation sites may be supplied in the solution. The grids are negatively stained with 0.7% (w/v) aqueous uranyl acetate and examined by electron microscopy. The appearance of filaments of approximately 25 nm (microtubules), 8 nm (actin), or 10 nm (intermediate filaments) demonstrates CSAP activity.
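Filament widths measured from the micrographs can be assigned to the three classes named above by nearest nominal diameter, with a tolerance so that ambiguous widths are left unassigned. This is a hypothetical sketch. The 0.9 nm tolerance is an assumption chosen so that widths midway between the 8 nm and 10 nm classes are not forced into either one. Real negative-stain widths vary with staining and measurement method.

```python
# Nominal filament diameters (nm) from the assay description above.
NOMINAL_DIAMETER_NM = {
    "microtubule": 25.0,
    "intermediate filament": 10.0,
    "actin filament": 8.0,
}


def classify_filament(width_nm, tolerance_nm=0.9):
    """Return the filament class whose nominal diameter is nearest width_nm.

    Returns None when the nearest class is farther than tolerance_nm; the 0.9 nm
    default (an assumption) keeps widths midway between 8 and 10 nm unassigned.
    """
    name, nominal = min(NOMINAL_DIAMETER_NM.items(),
                        key=lambda item: abs(item[1] - width_nm))
    return name if abs(nominal - width_nm) <= tolerance_nm else None


if __name__ == "__main__":
    for w in (24.3, 9.7, 8.2, 9.0, 15.0):
        print(w, classify_filament(w))
```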
In another alternative, CSAP activity is measured by the binding of CSAP to protein filaments. A ³⁵S-Met-labeled CSAP sample is incubated with the appropriate filament protein (actin, tubulin, or intermediate filament protein), and complexed protein is collected by immunoprecipitation using an antibody against the filament protein. The immunoprecipitate is then resolved by SDS-PAGE, and the amount of bound CSAP is measured by autoradiography.
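If densitometry of the autoradiograph is used to quantify binding, the bound amount can be expressed as a background-corrected fraction of the labeled CSAP input. The sketch assumes that an input lane carrying the same amount of labeled CSAP as the binding reaction is run on the same gel. The text does not state this. The intensity values are hypothetical.

```python
def fraction_bound(bound_signal, input_signal, background):
    """Background-corrected ratio of co-immunoprecipitated to input CSAP signal.

    All three values are integrated band intensities from the same exposure;
    the result is only meaningful within the linear range of the film or imager.
    """
    total = input_signal - background
    if total <= 0:
        raise ValueError("input signal must exceed background")
    return max(bound_signal - background, 0.0) / total


if __name__ == "__main__":
    # Hypothetical densitometry values in arbitrary units.
    print(fraction_bound(bound_signal=3500, input_signal=12500, background=500))  # 0.25
```

A parallel immunoprecipitation without filament protein gives the nonspecific-binding level to subtract before comparing samples.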
Various modifications and variations of the described methods and systems of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with certain embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention which are obvious to those skilled in molecular biology or related fields are intended to be within the scope of the following claims.
Table 1
Table 2
Table 2 (cont.)
Table 3
Table 3 (cont.)
Columns: SEQ ID NO:; Incyte Polypeptide ID; Amino Acid Residues; Potential Phosphorylation Sites; Potential Glycosylation Sites; Signature Sequences, Domains and Motifs; Analytical Methods and Databases
SEQ ID NO: 6 (cont.), Signature Sequences, Domains and Motifs (Analytical Methods and Databases):
PRECURSOR SIGNAL ADHESION CELL GLYCOPROTEIN IMMUNOGLOBULIN FOLD REPEAT MOLECULE NEURAL PD003129: N122-L231 (BLAST_PRODOM)
PRECURSOR SIGNAL CONTACTIN CELL ADHESION NEUROFASCIN GLYCOPROTEIN GP135 IMMUNOGLOBULIN FOLD PD001890: L732-A844 (BLAST_PRODOM)
CELL ADHESION PRECURSOR SIGNAL MOLECULE IMMUNOGLOBULIN GLYCOPROTEIN TRANSMEMBRANE REPEAT FOLD PD003273: I1156-S1258 (BLAST_PRODOM)
NEURONALGLIAL CELL ADHESION MOLECULE PRECURSOR NGCAM IMMUNOGLOBULIN FOLD GLYCOPROTEIN SIGNAL PD155119: D646-A742 (BLAST_PRODOM)
NEURAL CELL ADHESION MOLECULE L1 DM02463|S26180|1027-1247: Q1029-K1243 (BLAST_DOMO)
IMMUNOGLOBULIN DM00001|S26180|352-436: K351-A436; DM00001|S26180|45-129: T44-S129; DM00001|S26180|452-535: S451-V535 (BLAST_DOMO)
Cell attachment sequence: R931-D933 (MOTIFS)
Table 4
Polynucleotide SEQ ID NO:/Incyte ID/Sequence Length | Sequence Fragments
29/6582721CB1/1685 | 1-538, 58-350, 58-548, 58-632, 58-697, 58-747, 355-1020, 623-1290, 782-1277, 944-1612, 965-1593, 1165-1612, 1165-1683, 1165-1684, 1225-1671, 1229-1678, 1245-1670, 1316-1685
30/2828941CB1/3147 | 1-29^', 12-484, 15-318, 26-459, 35-326, 167-<173, 184-714, 187-599, 438-927, 458-927, 549-839, 550-805, 550-1063, 624-1137, 661-1136, 826-1136, 344-1136, 905-1522, 937-1136, 976-1244, 1038-1705, 1074-1722, 1082-1366, 1174-1465, 1350-1456, 1375-1572, 1375-1616, 1375-1946, 1375-1975, 1415-2061, 1445-1864, 1551-1749, 1675-1889, 1675-2246, 1761-2071, 1763-2212, 1795-2090, 1914-2105, 1914-2135, 1948-2582, 1949-2215, 1949-2381, 1958-2017, 1991-2262, 1991-2497, 2007-2304, 2032-2327, 2033-2361, 2111-2670, 2149-2343, 2149-2677, 2168-2459, 2175-2450, 2185-2484, 2185-2504, 2267-2800, 2286-2531, 2297-2620, 2309-2541, 2309-2543, 2310-3021, 2311-2411, 2336-2845, 2399-2643, 2409-2648, 2410-2504, 2415-2504, 2417-2845, 2424-2680, 2435-2786, 2448-2845, 2487-3147, 2505-2665, 2505-2802, 2505-2807, 2505-2818, 2505-2845, 2505-3046, 2521-2845, 2596-2730, 2596-2845, 2605-2845, 2629-2845, 2639-2845, 2640-2845, 2646-2845, 2661-2845, 2672-2845, 2680-2845, 2682-2845, 2690-2845, 2719-2845, 2723-2845, 2727-2845, 2734-2845, 2742-2845, 2753-2845, 2846-3147, 2918-3147, 2957-3147
31/6260407CB1/5322 | 1-639, 99-785, 212-799, 365-616, 366-985, 372-2815, 401-635, 411-552, 414-922, 421-971, 433-726, 433-727, 433-967, 495-772, 496-604, 503-748, 534-786, 534-989, 545-1129, 603-858, 776-1214, 776-1301, 838-1072, 838-1287, 838-1326, 870-1111, 396-1125, 906-948, 915-1323, 915-1327, 915-1337, 943-985, 946-1004, 975-1107, 1082-1360, 1156-1899, 1161-1374, 1161-1671, 1167-1397, 1270-1521, 1270-1804, 1276-1739, 1344-1403, 1454-1971, 1456-1955, 1469-1954, 1475-1653, 1475-1768, 1475-1905, 1475-1997, 1501-1970, 1536-1796, 1543-1980, 1544-1962, 1580-1917, 1587-1974, 1611-1883, 1691-1969, 1703-1740, 1706-1939, 1747-1800, 1785-1985, 1809-1837, 1879-2250, 1879-2387, 1925-1999, 2008-2641, 2024-2271, 2028-2055, 2032-2576, 2052-2465, 2058-2460, 2074-2309, 2074-2604, 2080-2663, 2098-2663, 2106-2344, 2106-2359, 2108-2706, 2129-2436, 2173-2369, 2401-2677, 2413-2707, 2437-2813, 2627-3236, 2677-3251, 2753-3220, 2779-2994, 2779-3262, 2779-3300, 2779-3341, 2779-3421, 2779-3425, 2837-3550, 2839-3315, 2839-3446, 2881-3152, 2909-3494, 2938-3616, 2948-3616, 2960-3616, 2965-3616, 2968-3616, 3018-3616, 3026-3616, 3058-3616, 3071-3616, 3081-3616, 3082-3616, 3088-3616, 3093-3616, 3105-3613, 3113-3616, 3114-3616, 3117-3531, 3118-3616, 3151-3616, 3155-3616, 3163-3616, 3169-3796, 3218-3559, 3219-3531, 3226-3616, 3233-3616, 3238-3616
31/6260407CB1/5322 (cont.) | 3245-3539, 3248-3542, 3258-3531, 3324-3646, 3341-3616, 3532-3822, 3614-4088, 3666-3997, 3668-3726, 3783-4105, 3876-4380, 4005-4555, 4105-4454, 4180-4555, 4203-4453, 4203-4600, 4205-4493, 4226-4864, 4305-4891, 4465-4888, 4465-4905, 4465-4923, 4465-4970, 4465-4979, 4465-5001, 4465-5002, 4465-5006, 4465-5007, 4465-5010, 4465-5020, 4465-5028, 4465-5033, 4465-5042, 4465-5044, 4465-5046, 4465-5057, 4465-5068, 4465-5075, 4465-5080, 4465-5090, 4465-5096, 4465-5100, 4465-5122, 4467-4986, 4505-5086, 4517-5098, 4517-5165, 4526-5126, 4529-4967, 4548-5132, 4558-5227, 4560-5213, 4571-4850, 4572-5100, 4574-5054, 4574-5062, 4574-5097, 4574-5128, 4574-5176, 4574-5191, 4574-5194, 4574-5202, 4574-5211, 4574-5223, 4574-5225, 4574-5230, 4574-5235, 4574-5240, 4574-5255, 4574-5262, 4574-5270, 4574-5282, 4574-5284, 4574-5322, 4577-5269, 4592-5231, 4602-4948, 4626-5269, 4631-5177, 4640-5269, 4643-5037, 4645-5269, 4652-5111, 4655-4913, 4662-5255, 4665-5252, 4668-5255, 4672-5269, 4677-5269, 4679-4959, 4685-5061, 4686-4853, 4690-5255, 4694-5034, 4702-4943, 4702-5255, 4704-4975, 4704-4976, 4705-4969, 4709-5243, 4720-5220, 4722-5255, 4728-5255, 4734-5009, 4746-5269, 4748-5255, 4761-5241, 4764-5093, 4764-5255, 4787-5255, 4788-5056, 4788-5255, 4790-5053, 4799-5150, 4814-5079, 4821-5165, 4832-5255, 4841-5255
32/7488258CB1/931 | 1-116, 14-913, 518-930, 524-920, 525-930, 534-930, 563-930, 633-931, 673-930, 861-931
33/7948948CB1/5299 | 1-763, 190-777, 344-963, 346-2965, 392-900, 399-949, 411-704, 411-705, 411-945, 512-967, 523-1107, 754-1192, 754-1279, 816-1265, 816-1304, 893-1301, 893-1305, 893-1315, 1134-1877, 1139-1649, 1248-1782, 1254-1717, 1432-1949, 1434-1933, 1447-1932, 1453-1883, 1453-1994, 1479-1948, 1521-1958, 1522-1940, 1558-1895, 1565-1952, 1787-1815, 1903-2515, 1975-2580, 1991-2404, 1997-2399, 2013-2543, 2019-2602, 2037-2602, 2047-2645, 2068-2602, 2069-2602, 2073-2602, 2077-2602, 2090-2594, 2090-2602, 2112-2308, 2134-2602, 2142-2602, 2156-2602, 2186-3915, 2244-2602, 2272-2683, 2285-2602, 2352-2646, 2566-3175, 2616-3190, 2692-3159, 2718-3201, 2718-3239, 2718-3280, 2718-3360, 2718-3364, 2749-2900, 2776-3489, 2778-3254, 2778-3385, 2848-3433, 2877-3555, 2887-3555, 2899-3555, 2904-3555, 2907-3555, 2957-3555, 2965-3555, 2997-3555, 3010-3555, 3020-3555, 3021-3555, 3027-3555, 3032-3555, 3044-3552, 3052-3555, 3053-3555, 3056-3470, 3057-3555, 3090-3555, 3094-3555, 3102-3555, 3108-3714, 3157-3498, 3165-3555, 3172-3555, 3177-3555, 3263-3644, 3280-3555, 3471-3740, 3478-3555, 3553-4006, 3794-4298, 3901-4170, 3923-4473, 4023-4372, 4024-4342, 4098-4473
33/7948948CB1/5299 (cont.) | 4121-4371, 4121-4518, 4123-4411, 4144-4782, 4223-4809, 4302-4561, 4302-4766, 4359-4602, 4383-4621, 4383-4665, 4383-4667, 4383-4712, 4383-4806, 4383-4823, 4383-4841, 4383-4888, 4383-4897, 4383-4919, 4383-4920, 4383-4924, 4383-4925, 4383-4928, 4383-4938, 4383-4946, 4383-4951, 4383-4960, 4383-4962, 4383-4964, 4383-4975, 4383-4986, 4383-4993, 4383-4998, 4383-5008, 4383-5014, 4383-5018, 4383-5040, 4385-4904, 4411-4665, 4423-5004, 4435-5016, 4435-5083, 4444-5044, 4447-4885, 4466-5050, 4476-5145, 4478-5131, 4489-4768, 4490-5018, 4492-4972, 4492-4980, 4492-5015, 4492-5046, 4492-5094, 4492-5109, 4492-5112, 4492-5120, 4492-5129, 4492-5139, 4492-5141, 4492-5152, 4492-5155, 4492-5160, 4492-5169, 4492-5174, 4492-5177, 4492-5189, 4492-5201, 4492-5276, 4495-5250, 4510-5149, 4520-4866, 4544-5234, 4549-5095, 4558-5274, 4561-4955, 4563-5254, 4570-5029, 4573-4831, 4580-5191, 4583-5196, 4586-5176, 4590-5274, 4595-5274, 4597-4877, 4603-4979, 4608-5241, 4612-4952, 4620-5224, 4623-4887, 4638-5138, 4640-5176, 4646-5205, 4652-4927, 4664-5274, 4666-5276, 4679-5160, 4682-5011, 4682-5274, 4705-5249, 4706-4974, 4706-5217, 4708-4971, 4717-5068, 4732-4997, 4739-5083, 4750-5274, 4752-5274, 4754-5299, 4755-5274, 4759-5274, 4761-5256, 4766-5083, 4767-5274, 4768-5088, 4773-5274, 4781-5274, 4782-5274, 4787-5274, 4789-5059, 4796-5274, 4798-5274, 4820-5079, 4826-5274, 4839-5274, 4869-5277, 4879-5274, 4880-5274, 4885-5274, 4896-5274, 4902-5274, 4912-5274, 4924-5274, 4927-5148, 4927-5273, 4992-5274, 5001-5252, 5005-5274, 5008-5274, 5024-5274, 5041-5274, 5051-5274, 5054-5274, 5077-5274, 5105-5274, 5107-5274, 5108-5274, 5115-5261, 5115-5274, 5172-5274, 5188-5274, 5190-5274, 5198-5274, 5227-5274, 5252-5274, 5253-5274
34/3467913CB1/4080 | 1-703, 30-51, 30-55, 333-958, 367-505, 391-958, 392-871, 392-928, 422-958, 442-782, 472-981, 506-781, 834-1459, 1045-1495, 1083-1736, 1235-1587, 1310-1957, 1327-1898, 1334-2165, 1335-2165, 1336-2165, 1377-1956, 1378-1635, 1378-1819, 1395-2165, 1423-1958, 1430-2165, 1452-1977, 1508-2118, 1544-2187, 1693-2298, 1977-2330, 2002-2379, 2048-2505, 2048-2556, 2048-2592, 2051-2412, 2164-2702, 2289-2801, 2308-2727, 2400-2567, 2405-2576, 2587-3394, 2590-3148, 2624-3394, 2633-3394, 2651-3394, 2683-3394, 2716-3394, 2845-3394, 2849-3391, 2855-3380, 2913-4080, 3230-3414
35/7495062CB1/4360 | 1-703, 31-51, 313-345, 333-958, 367-505, 390-4355, 392-871, 392-928, 393-958, 422-958, 442-782, 472-981, 507-781, 834-1412, 1045-1495, 1180-1728, 1235-1587, 1329-1859, 1334-2165, 1335-2165, 1336-2165, 1364-1957, 1378-1635, 1378-1819, 1381-1903, 1395-2165, 1423-1958, 1430-2165, 1452-1977, 1508-2118, 1534-2157, 1544-2136, 1544-2157, 1544-2187, 1544-2218, 1564-2101, 1586-2111, 1639-2157, 1640-2298, 1720-2157, 1725-2157, 1756-2157, 1778-2157, 1808-2157, 1816-2157, 1819-2222, 1883-2033, 1919-2157, 1929-2157, 1956-2157, 1977-2330, 2002-2379, 2014-2199, 2030-2157, 2048-2505, 2048-2556, 2048-2592, 2051-2128, 2051-2412, 2053-2165, 2103-2702, 2201-2360, 2201-2482, 2213-2803, 2218-2803, 2289-2868, 2308-2575, 2308-2727, 2308-2803, 2313-2796, 2316-2796, 2318-2568, 2318-2586, 2318-2803, 2320-2803, 2331-2803, 2337-2803, 2400-2567, 2405-2576, 2421-2796, 2478-2755, 2505-2803, 2516-2803, 2517-2803, 2540-2796, 2629-2803, 2650-2803, 2662-2803, 2665-2803, 2712-2803, 2717-2796, 2734-2796, 2774-2796, 2845-3394, 2849-3391, 2855-3380, 3118-3167, 3118-3171, 3118-3208, 3118-3217, 3118-3248, 3118-3260, 3118-3278, 3118-3337, 3118-3347, 3118-3373, 3118-3384, 3118-3414, 3118-3416, 3118-3580, 3118-3609, 3118-3646, 3121-3652, 3124-3599, 3125-3260, 3125-3295, 3161-3381, 3230-3414, 3245-3325, 3245-3652, 3293-3513, 3299-3652, 3309-3652, 3316-3421, 3331-3652, 3335-3421, 3335-3643, 3413-3466, 3416-3652, 3439-3652, 3541-3652, 3553-3652, 3604-4034, 3604-4042, 3630-3652, 3840-4360, 3841-4360, 3921-4040, 3921-4064, 3921-4157, 3921-4216, 3921-4231, 3921-4239, 3921-4245, 3921-4290, 3921-4360, 3942-4360, 3989-4360, 4004-4325, 4008-4360, 4010-4312
36/284191CB1/2434 | 1-636, 133-759, 156-610, 526-1419, 745-1168, 906-1103, 906-1530, 974-1242, 974-1532, 1011-1265, 1106-1419, 1167-1363, 1167-1368, 1167-1602, 1184-1772, 1208-1508, 1218-1737, 1267-1663, 1322-1774, 1336-1778, 1406-1773, 1417-1594, 1417-1764, 1417-1772, 1417-1777, 1564-1778, 1602-2237, 1711-2029, 1711-2273, 1721-2244, 1758-2306, 1876-2421, 1929-2434, 1939-2178, 1939-2234, 1939-2414, 1939-2426, 1956-2430, 2019-2434, 2027-2310, 2049-2321, 2090-2430, 2098-2430, 2101-2430
37/2361681CB1/2688 | 1-619, 13-618, 21-459, 49-338, 65-644, 68-416, 78-594, 102-606, 273-470, 316-470, 323-906, 429-915, 429-990, 429-1087, 450-906, 464-884, 464-1003, 465-1062, 593-725, 645-890, 660-1328, 706-1328, 720-910, 722-905, 756-1328, 904-1475, 905-1582, 954-1491, 1022-1570, 1068-1248, 1129-1625, 1129-1769, 1161-1391, 1203-1716, 1270-1985, 1276-1826, 1514-1725, 1565-2127, 1661-2133, 1667-2243, 1671-2269, 1719-1825, 1723-2407, 1729-1977, 1741-2129
37/2361681CB1/2688 (cont.) | 1742-1983, 1742-2220, 1770-2258, 1772-2138, 1787-2380, 1790-2097, 1798-2281, 1808-2313, 1812-2430, 1830-2401, 1842-2228, 1851-2418, 1867-2402, 1871-2300, 1892-2152, 1905-2163, 1905-2181, 1925-2410, 1930-2236, 1930-2403, 1939-2298, 1945-2594, 1971-2425, 1975-2425, 1988-2627, 1989-2437, 1991-2443, 1993-2432, 2024-2647, 2034-2422, 2034-2437, 2034-2667, 2038-2654, 2042-2551, 2055-2564, 2057-2369, 2057-2438, 2058-2644, 2063-2646, 2069-2392, 2069-2401, 2074-2438, 2079-2651, 2092-2649, 2111-2444, 2124-2437, 2130-2688, 2150-2434, 2153-2424, 2157-2404, 2178-2422, 2202-2664, 2206-2467, 2208-2657, 2214-2513, 2235-2414, 2251-2495, 2251-2617, 2251-2657, 2290-2435, 2290-2442, 2290-2443, 2297-2443, 2302-2437, 2321-2425, 2321-2437, 2349-2657
38/1683662CB1/4264 | 1-764, 40-596, 40-625, 40-647, 106-836, 217-704, 281-863, 442-869, 491-991, 491-1063, 568-1356, 670-1214, 724-942, 736-1429, 744-1429, 771-1276, 881-1108, 884-1106, 909-1465, 981-1610, 983-1257, 1112-1600, 1143-1655, 1173-1752, 1176-1752, 1206-1786, 1206-1911, 1239-1680, 1288-1554, 1312-1584, 1312-1896, 1377-2036, 1383-1711, 1463-1494, 1661-2055, 1736-2307, 1755-2220, 1759-2364, 1761-2219, 1771-2219, 1778-2362, 1781-2250, 1790-2305, 1813-2423, 1820-2393, 1875-2348, 1897-2534, 1909-2157, 1942-2204, 1942-2396, 1944-2333, 1945-2110, 1948-2304, 1951-2537, 1963-2366, 1975-2549, 1976-2616, 2015-2427, 2017-2541, 2080-2611, 2186-2782, 2201-2830, 2213-2861, 2220-2757, 2220-2788, 2275-2720, 2291-2861, 2296-2797, 2429-3038, 2431-2648, 2440-2941, 2456-2902, 2458-3057, 2469-2748, 2542-2792, 2542-2939, 2542-3104, 2569-2801, 2594-2866, 2666-2890, 2688-2986, 2730-3050, 2865-3127, 2873-3168, 2880-3139, 2900-3142, 2900-3151, 2943-3232, 3043-3242, 3043-3273, 3043-3631, 3076-3746, 3109-3350, 3109-3615, 3118-3433, 3131-3412, 3177-3450, 3191-3464, 3220-3494, 3321-3541, 3328-3644, 3388-3633, 3388-3726, 3388-3760, 3394-3666, 3403-3625, 3404-3675, 3457-3741, 3458-3673, 3458-3675, 3458-3738, 3458-4011, 3478-3688, 3478-4088, 3486-3798, 3486-4014, 3491-3691, 3491-3748, 3526-3750, 3526-4204, 3574-4226, 3603-4240, 3610-3858, 3611-4011, 3616-4241, 3617-4236, 3619-4241, 3655-4211, 3675-4233, 3681-4019, 3682-3924, 3723-3965, 3733-4236, 3748-3996, 3753-4229, 3782-4205, 3800-4031, 3816-4057, 3821-4077, 3825-4018, 3829-4262, 3872-4082, 3986-4212, 3986-4239, 3986-4262, 4004-4264, 4037-4263
43/56022622CB1/2629 | 1-220, 38-726, 40-352, 65-553, 67-604, 68-669, 72-665, 80-879, 96-901, 101-2629, 114-731, 118-438, 134-657, 135-783, 149-560, 162-725, 164-760, 170-932, 174-674, 175-543, 176-832, 218-803, 219-511, 230-390, 230-416, 268-705, 276-481, 313-556, 316-607, 354-404, 357-643, 389-707, 408-661, 415-666, 460-1006, 600-1046, 1124-1376, 1135-1404, 1156-1439, 1161-1434, 1164-1422, 1164-1679, 1176-1748, 1180-1470, 1187-1440, 1187-1715, 1188-1850, 1190-1804, 1191-1440, 1205-1896, 1213-1897, 1214-1457, 1214-1459, 1220-1496, 1242-1567, 1257-1546, 1282-1683, 1294-1548, 1306-1601, 1306-2023, 1306-2051, 1315-1827, 1316-1615, 1320-1611, 1344-1612, 1344-1719, 1344-1759, 1344-1959, 1344-1971, 1347-2111, 1348-1853, 1349-1891, 1351-1993, 1354-1919, 1355-1595, 1364-1835, 1372-2014, 1378-1789, 1389-1663, 1389-1799, 1389-1866, 1389-2018, 1389-2023, 1395-1779, 1409-1691, 1416-2023, 1418-1982, 1430-1685, 1461-1742, 1461-1983, 1478-2013, 1504-2042, 1541-1808, 1541-1835, 1571-1849, 1577-1841, 1587-1861, 1587-1870, 1587-1897, 1589-2071, 1593-1900, 1597-1877, 1613-1894, 1613-1909, 1616-1860, 1619-2227, 1634-1925, 1649-1893, 1659-2257, 1680-1918, 1681-1983, 1702-1978, 1712-2242, 1723-1989, 1723-2204, 1731-1976, 1754-2033, 1759-2005, 1766-2041, 1770-2242, 1770-2255, 1774-2200, 1779-2024, 1779-2242, 1783-2009, 1821-2255, 1833-2242, 1834-2242, 1849-2242, 1850-2242, 1850-2243, 1856-2240, 1863-2245, 1872-2232, 1878-2242, 2026-2618, 2059-2628, 2105-2626, 2110-2597, 2114-2628, 2124-2591, 2126-2628, 2147-2212, 2152-2593, 2154-2628, 2159-2628, 2176-2628, 2190-2628, 2206-2491, 2227-2523, 2229-2512, 2239-2497, 2239-2622, 2243-2506, 2243-2626, 2244-2526, 2244-2616, 2249-2628, 2250-2508, 2253-2509, 2254-2617, 2257-2611, 2258-2626, 2260-2492, 2260-2522, 2260-2532, 2261-2625, 2262-2536, 2264-2590, 2264-2625, 2264-2626, 2264-2628, 2264-2629, 2266-2541, 2269-2628, 2270-2625, 2271-2628, 2272-2628, 2273-2625, 2273-2628, 2275-2627, 2276-2628, 2277-2624, 2277-2628, 2279-2625, 2281-2628, 2285-2626, 2292-2628, 2296-2625, 2298-2549, 2298-2616, 2299-2617, 2305-2625, 2314-2628, 2325-2609, 2330-2626, 2331-2628, 2334-2625, 2336-2626, 2362-2628, 2368-2625
54/2755454CB1/5633 (cont.) | 4661-5139, 4685-5144, 4696-4954, 4706-5379, 4717-5170, 4718-5019, 4721-5130, 4726-5030, 4735-5020, 4738-5268, 4739-5067, 4765-5067, 4771-4978, 4778-5102, 4784-4983, 4784-5359, 4794-5080, 4794-5326, 4837-5118, 4846-5122, 4848-5633, 4853-5134
55/5868348CB1/4587 | 1-640, 1-646, 1-708, 2-702, 3-616, 9-704, 41-769, 51-874, 145-648, 191-861, 265-678, 356-898, 359-992, 3c5-1005, 406-1001, 461-1186, 463-941, 529-1168, 540-1136, 541-1433, 542-1069, 543-1234, 543-4562, 544-769, 544-1031, 544-1056, 544-1069, 544-1076, 544-1077, 544-1079, 544-1133, 544-1212, 552-1005, 554-1184, 557-1162, 560-1054, 561-781, 569-1143, 582-1184, 693-1031, 729-1318, 824-1345, 865-1447, 1018-1585, 1018-1654, 1029-1341, 1159-1439, 1159-1945, 1163-1814, 1194-1711, 1206-1865, 1280-1864, 1389-1978, 1393-1687, 1393-1694, 1397-2063, 1458-1736, 1458-1956, 1461-1835, 1478-2108, 1545-2093, 1548-2198, 1590-1856, 1640-2267, 1698-2182, 1777-2350, 1817-2435, 1830-2037, 1830-2232, 1836-2668, 1853-2114, 1865-2129, 1867-2298, 1893-2154, 1893-2157, 1920-2298, 1920-2352, 1963-2252, 2002-2320, 2076-2607, 2132-2327, 2132-2807, 2133-2867, 2173-2795, 2197-2701, 2247-2914, 2305-2905, 2456-2971, 2460-2921, 2573-2937, 2631-2921, 2734-2891, 2862-3117, 2898-3458, 2899-3071, 2961-3220, 3209-3580, 3512-3579, 4046-4587, 4090-4112, 4140-4176
56/2055455CB1/1509 | 1-241, 1-416, 1-438, 1-446, 1-463, 1-477, 1-496, 1-534, 1-553, 1-567, 1-570, 1-599, 1-613, 1-619, 2-638, 3-494, 6-249, 6-361, 13-215, 13-436, 13-451, 16-445, 26-385, 30-223, 33-560, 40-274, 64-331, 75-439, 95-372, 235-443, 235-446, 235-573, 280-820, 296-909, 396-915, 415-727, 421-625, 427-831, 445-716, 450-914, 465-1002, 487-1022, 488-1155, 515-797, 550-1038, 553-1010, 598-1037, 600-1191, 627-1228, 645-1069, 646-1132, 737-1246, 741-1509, 824-1085, 869-1288, 899-1183, 1027-1194
Table 5
Table 6
Table 6 (cont.)
Table 7
Program Description Reference Parameter Threshold
ABI FACTURA A program that removes vector sequences and Applied Biosystems, Foster City, CA. masks ambiguous bases in nucleic acid sequences.
ABItPARACEL FDF A Fast Data Finder useful in comparing and Applied Biosystems, Foster City, CA; Mismatch <50% annotating amino acid or nucleic acid sequences. Paracel Inc., Pasadena, CA. ABI AutoAssembler A program that assembles nucleic acid sequences. Applied Biosystems, Foster City, CA.
BLAST A Basic Local Alignment Search Tool useful in Altschul, S.F. et al. (1990) J. Mol. Biol. ESTs: Probability value=1.0E-8 sequence similarity search for amino acid and 215:403-410; Altschul, S.F. et al. (1997) or less nucleic acid sequences. BLAST includes five Nucleic Acids Res. 25:3389-3402. Full Length sequences: Probability functions: blastp, blastn, blastx, tblastn, and tblastx. value= l.OE-10 or less
FASTA A Pearson and Lipman algorithm that searches for Pearson, W.R. and D.J. Lipman (1988) Proc. ESTs: fasta E value=1.06E-6 similarity between a query sequence and a group of Natl. Acad Sci. USA 85:2444-2448; Pearson, Assembled ESTs: fasta Identity= sequences of the same type. FASTA comprises as W.R. (1990) Methods Enzymol. 183:63-98; 95% or greater and least five functions: fasta, tfasta, fastx, tfastx, and and Smith, T.F. and M.S. Waterman (1981) Match length=200 bases or greater; ssearch. Adv. Appl. Math. 2:482-489. fastx E value=1.0E-8 or less
Full Length sequences: fastx score=100 or greater
BLIMPS A BLocks IMProved Searcher that matches a Henikoff, S. and J.G. Henikoff (1991) Nucleic Probability value=1.0E-3 or less sequence against those in BLOCKS, PRINTS, Acids Res. 19:6565-6572; Henikoff, J.G. and DOMO, PRODOM, and PFAM databases to search S. Henikoff (1996) Methods Enzymol. for gene families, sequence homology, and structural 266:88-105; and Attwood, T.K. et al. (1997) J. fingerprint regions. Chem. Inf. Comput. Sci. 37:417-424.
HMMER An algorithm for searching a query sequence against Krogh, A. et al. (1994) J. Mol. Biol. PFAM, INCY, SMART, or TIGRFA hidden Markov model (HMM)-based databases of 235:1501-1531; Sonnhammer, E.L.L. et al. hits: Probability value=1.0E-3 or le protein family consensus sequences, such as PFAM, (1988) Nucleic Acids Res. 26:320-322; Signal peptide hits: Score= 0 or INCY, SMART, and TIGRFAM. Durbin, R. et al. (1998) Our World View, in a greater Nutshell, Cambridge Univ. Press, pp. 1-350.
Table 7 (cont.)
Program Description Reference Parameter Threshold
ProfileScan An algorithm that searches for structural and sequence Gribskov, M. et al. (1988) CABIOS 4:61-66; Normalized quality score≥GCG motifs in protem sequences that match sequence patterns Gribskov, M. et al. (1989) Methods Enzymol. specified "HIGH" value for that defined in Prosite. 183:146-159; Bairoch, A. et al. (1997) particular Prosite motif. Nucleic Acids Res. 25:217-221. Generally, score=l.4-2.1.
Phred A base-calling algorithm that examines automated Ewing, B. et al. (1998) Genome Res. sequencer traces with high sensitivity and probability. 8:175-185; Ewing, B. and P. Green (1998) Genome Res. 8:186-194.
Phrap A Phils Revised Assembly Program including SWAT and Smith, T.F. and M.S. Waterman (1981) Adv. Score=120 or greater; CrossMatch, programs based on efficient implementation Appl. Math. 2:482-489; Smith, T.F. and M.S. Match length=56 or greater of the Smith-Waterman algorithm, useful in searching Waterman (1981) J. Mol. Biol. 147:195-197; sequence homology and assembling DNA sequences. and Green, P., University of Washington, Seattle, WA.
Consed: A graphical tool for viewing and editing Phrap assemblies. Reference: Gordon, D. et al. (1998) Genome Res. 8:195-202.
SPScan: A weight matrix analysis program that scans protein sequences for the presence of secretory signal peptides. Reference: Nielsen, H. et al. (1997) Protein Engineering 10:1-6; Claverie, J.M. and S. Audic (1996) CABIOS 12:431-439. Parameter threshold: Score=3.5 or greater.
TMAP: A program that uses weight matrices to delineate transmembrane segments on protein sequences and determine orientation. Reference: Persson, B. and P. Argos (1994) J. Mol. Biol. 237:182-192; Persson, B. and P. Argos (1996) Protein Sci. 5:363-371.
TMHMMER: A program that uses a hidden Markov model (HMM) to delineate transmembrane segments on protein sequences and determine orientation. Reference: Sonnhammer, E.L. et al. (1998) Proc. Sixth Intl. Conf. on Intelligent Systems for Mol. Biol., Glasgow et al., eds., The Am. Assoc. for Artificial Intelligence Press, Menlo Park, CA, pp. 175-182.
Motifs: A program that searches amino acid sequences for patterns that match those defined in Prosite. Reference: Bairoch, A. et al. (1997) Nucleic Acids Res. 25:217-221; Wisconsin Package Program Manual, version 9, pp. M51-59, Genetics Computer Group, Madison, WI.
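The Motifs and ProfileScan entries in Table 7 refer to sequence patterns defined in Prosite. The following is a minimal sketch, assuming a plain Python regular expression is an acceptable stand-in for the GCG tools. It shows how a Prosite pattern such as `N-{P}-[ST]-{P}.` (the N-glycosylation site pattern) can be translated into a regex and scanned against a protein sequence. The helper name `prosite_to_regex` is hypothetical, and this is not the GCG Motifs program. It covers only part of the Prosite syntax: single residues, `x`, `[...]`, `{...}`, `(n)` and `(n,m)` repeats, and `<`/`>` terminal anchors. Forms such as `[G>]` are not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Hypothetical helper: translate a subset of Prosite pattern syntax to a regex."""
    regex_parts = []
    for element in pattern.rstrip('.').split('-'):
        anchor_start = element.startswith('<')   # '<' = N-terminus of the sequence
        anchor_end = element.endswith('>')       # '>' = C-terminus of the sequence
        element = element.lstrip('<').rstrip('>')
        # Split "core", "core(n)" or "core(n,m)" into the residue class and its repeat count.
        m = re.fullmatch(r'(.+?)(?:\((\d+)(?:,(\d+))?\))?', element)
        if m is None:
            raise ValueError(f"unsupported Prosite element: {element!r}")
        core, low, high = m.groups()
        if core == 'x':
            core = '.'                            # any residue
        elif core.startswith('{') and core.endswith('}'):
            core = '[^' + core[1:-1] + ']'        # any residue except those listed
        # "[...]" and single residue letters are already valid regex syntax.
        if low is not None:
            core += '{' + low + (',' + high if high else '') + '}'
        regex_parts.append(('^' if anchor_start else '') + core + ('$' if anchor_end else ''))
    return ''.join(regex_parts)


if __name__ == "__main__":
    hit = re.search(prosite_to_regex("N-{P}-[ST]-{P}."), "MKANGSAL")
    print(hit.start(), hit.group())  # 3 NGSA
```

For example, the translated N-glycosylation pattern is `N[^P][ST][^P]`. Scanning the made-up sequence `MKANGSAL` with it finds `NGSA` starting at index 3.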

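The Phrap entry in Table 7 describes SWAT and CrossMatch as implementations of the Smith-Waterman algorithm. The Python sketch below illustrates only that underlying local-alignment recurrence. It is not SWAT, CrossMatch, or Phrap. It fills the dynamic-programming matrix with a linear gap penalty, floors every cell at zero so that alignments stay local, and traces back from the best-scoring cell. The function name and the scores (match=+2, mismatch=-1, gap=-2) are illustrative assumptions. They do not correspond to the Phrap score and match-length thresholds listed in the table, and production tools generally use substitution matrices and affine gap penalties.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Local alignment with a linear gap penalty (scores are illustrative assumptions)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            # The 0 floor is what makes the alignment local rather than global.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,            # gap in b
                          H[i][j - 1] + gap)            # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append('-')
            i -= 1
        else:
            out_a.append('-')
            out_b.append(b[j - 1])
            j -= 1
    return best, ''.join(reversed(out_a)), ''.join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("ACGT", "ACTT"))  # (5, 'ACGT', 'ACTT')
```

Under these assumed scores, `smith_waterman("ACGT", "ACTT")` returns `(5, 'ACGT', 'ACTT')`: three matches and one mismatch give 2 + 2 - 1 + 2 = 5.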
Claims

What is claimed is:
1. An isolated polypeptide selected from the group consisting of: a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 1-3, SEQ ID NO:5-13, SEQ ID NO: 16-17, and SEQ ID NO: 19-28, c) a polypeptide comprising a naturally occurring amino acid sequence at least 92% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:4, SEQ ID NO: 14, and SEQ ID NO: 15, d) a polypeptide comprising a naturally occurring amino acid sequence at least 95% identical to the amino acid sequence of SEQ ID NO: 18, e) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, and f) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28.
2. An isolated polypeptide of claim 1 comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28.
3. An isolated polynucleotide encoding a polypeptide of claim 1.
4. An isolated polynucleotide encoding a polypeptide of claim 2.
5. An isolated polynucleotide of claim 4 comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56.
6. A recombinant polynucleotide comprising a promoter sequence operably linked to a polynucleotide of claim 3.
7. A cell transformed with a recombinant polynucleotide of claim 6.
8. A transgenic organism comprising a recombinant polynucleotide of claim 6.
9. A method of producing a polypeptide of claim 1, the method comprising: a) culturing a cell under conditions suitable for expression of the polypeptide, wherein said cell is transformed with a recombinant polynucleotide, and said recombinant polynucleotide comprises a promoter sequence operably linked to a polynucleotide encoding the polypeptide of claim 1, and b) recovering the polypeptide so expressed.
10. A method of claim 9, wherein the polypeptide comprises an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28.
11. An isolated antibody which specifically binds to a polypeptide of claim 1.
12. An isolated polynucleotide selected from the group consisting of: a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-56, b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:29-31 and SEQ ID NO:33-56, c) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 92% identical to the polynucleotide sequence of SEQ ID NO:32, d) a polynucleotide complementary to a polynucleotide of a), e) a polynucleotide complementary to a polynucleotide of b), f) a polynucleotide complementary to a polynucleotide of c), and g) an RNA equivalent of a)-f).
13. An isolated polynucleotide comprising at least 60 contiguous nucleotides of a polynucleotide of claim 12.
14. A method of detecting a target polynucleotide in a sample, said target polynucleotide having a sequence of a polynucleotide of claim 12, the method comprising: a) hybridizing the sample with a probe comprising at least 20 contiguous nucleotides comprising a sequence complementary to said target polynucleotide in the sample, and which probe specifically hybridizes to said target polynucleotide, under conditions whereby a hybridization complex is formed between said probe and said target polynucleotide or fragments thereof, and b) detecting the presence or absence of said hybridization complex, and, optionally, if present, the amount thereof.
15. A method of claim 14, wherein the probe comprises at least 60 contiguous nucleotides.
16. A method of detecting a target polynucleotide in a sample, said target polynucleotide having a sequence of a polynucleotide of claim 12, the method comprising: a) amplifying said target polynucleotide or fragment thereof using polymerase chain reaction amplification, and b) detecting the presence or absence of said amplified target polynucleotide or fragment thereof, and, optionally, if present, the amount thereof.
17. A composition comprising a polypeptide of claim 1 and a pharmaceutically acceptable excipient.
18. A composition of claim 17, wherein the polypeptide comprises an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28.
19. A method for treating a disease or condition associated with decreased expression of functional CSAP, comprising administering to a patient in need of such treatment the composition of claim 17.
20. A method of screening a compound for effectiveness as an agonist of a polypeptide of claim 1, the method comprising: a) exposing a sample comprising a polypeptide of claim 1 to a compound, and b) detecting agonist activity in the sample.
21. A composition comprising an agonist compound identified by a method of claim 20 and a pharmaceutically acceptable excipient.
22. A method for treating a disease or condition associated with decreased expression of functional CSAP, comprising administering to a patient in need of such treatment a composition of claim 21.
23. A method of screening a compound for effectiveness as an antagonist of a polypeptide of claim 1, the method comprising: a) exposing a sample comprising a polypeptide of claim 1 to a compound, and b) detecting antagonist activity in the sample.
24. A composition comprising an antagonist compound identified by a method of claim 23 and a pharmaceutically acceptable excipient.
25. A method for treating a disease or condition associated with overexpression of functional CSAP, comprising administering to a patient in need of such treatment a composition of claim 24.
26. A method of screening for a compound that specifically binds to the polypeptide of claim 1, the method comprising: a) combining the polypeptide of claim 1 with at least one test compound under suitable conditions, and b) detecting binding of the polypeptide of claim 1 to the test compound, thereby identifying a compound that specifically binds to the polypeptide of claim 1.
27. A method of screening for a compound that modulates the activity of the polypeptide of claim 1, the method comprising: a) combining the polypeptide of claim 1 with at least one test compound under conditions permissive for the activity of the polypeptide of claim 1, b) assessing the activity of the polypeptide of claim 1 in the presence of the test compound, and c) comparing the activity of the polypeptide of claim 1 in the presence of the test compound with the activity of the polypeptide of claim 1 in the absence of the test compound, wherein a change in the activity of the polypeptide of claim 1 in the presence of the test compound is indicative of a compound that modulates the activity of the polypeptide of claim 1.
28. A method of screening a compound for effectiveness in altering expression of a target polynucleotide, wherein said target polynucleotide comprises a sequence of claim 5, the method comprising: a) exposing a sample comprising the target polynucleotide to a compound, under conditions suitable for the expression of the target polynucleotide, b) detecting altered expression of the target polynucleotide, and c) comparing the expression of the target polynucleotide in the presence of varying amounts of the compound and in the absence of the compound.
29. A method of assessing toxicity of a test compound, the method comprising: a) treating a biological sample containing nucleic acids with the test compound, b) hybridizing the nucleic acids of the treated biological sample with a probe comprising at least 20 contiguous nucleotides of a polynucleotide of claim 12 under conditions whereby a specific hybridization complex is formed between said probe and a target polynucleotide in the biological sample, said target polynucleotide comprising a polynucleotide sequence of a polynucleotide of claim 12 or fragment thereof, c) quantifying the amount of hybridization complex, and d) comparing the amount of hybridization complex in the treated biological sample with the amount of hybridization complex in an untreated biological sample, wherein a difference in the amount of hybridization complex in the treated biological sample is indicative of toxicity of the test compound.
30. A diagnostic test for a condition or disease associated with the expression of CSAP in a biological sample, the method comprising: a) combining the biological sample with an antibody of claim 11, under conditions suitable for the antibody to bind the polypeptide and form an antibody:polypeptide complex, and b) detecting the complex, wherein the presence of the complex correlates with the presence of the polypeptide in the biological sample.
31. The antibody of claim 11, wherein the antibody is: a) a chimeric antibody, b) a single chain antibody, c) a Fab fragment, d) a F(ab')2 fragment, or e) a humanized antibody.
32. A composition comprising an antibody of claim 11 and an acceptable excipient.
33. A method of diagnosing a condition or disease associated with the expression of CSAP in a subject, comprising administering to said subject an effective amount of the composition of claim 32.
34. A composition of claim 32, wherein the antibody is labeled.
35. A method of diagnosing a condition or disease associated with the expression of CSAP in a subject, comprising administering to said subject an effective amount of the composition of claim 34.
36. A method of preparing a polyclonal antibody with the specificity of the antibody of claim 11, the method comprising: a) immunizing an animal with a polypeptide consisting of an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, or an immunogenic fragment thereof, under conditions to elicit an antibody response, b) isolating antibodies from said animal, and c) screening the isolated antibodies with the polypeptide, thereby identifying a polyclonal antibody which specifically binds to a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28.
37. A polyclonal antibody produced by a method of claim 36.
38. A composition comprising the polyclonal antibody of claim 37 and a suitable carrier.
39. A method of making a monoclonal antibody with the specificity of the antibody of claim 11, the method comprising: a) immunizing an animal with a polypeptide consisting of an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28, or an immunogenic fragment thereof, under conditions to elicit an antibody response, b) isolating antibody producing cells from the animal, c) fusing the antibody producing cells with immortalized cells to form monoclonal antibody-producing hybridoma cells, d) culturing the hybridoma cells, and e) isolating from the culture monoclonal antibody which specifically binds to a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28.
40. A monoclonal antibody produced by a method of claim 39.
41. A composition comprising the monoclonal antibody of claim 40 and a suitable carrier.
42. The antibody of claim 11, wherein the antibody is produced by screening a Fab expression library.
43. The antibody of claim 11, wherein the antibody is produced by screening a recombinant immunoglobulin library.
44. A method of detecting a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28 in a sample, the method comprising: a) incubating the antibody of claim 11 with a sample under conditions to allow specific binding of the antibody and the polypeptide, and b) detecting specific binding, wherein specific binding indicates the presence of a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28 in the sample.
45. A method of purifying a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28 from a sample, the method comprising: a) incubating the antibody of claim 11 with a sample under conditions to allow specific binding of the antibody and the polypeptide, and b) separating the antibody from the sample and obtaining the purified polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-28.
46. A microarray wherein at least one element of the microarray is a polynucleotide of claim 13.
47. A method of generating an expression profile of a sample which contains polynucleotides, the method comprising: a) labeling the polynucleotides of the sample, b) contacting the elements of the microarray of claim 46 with the labeled polynucleotides of the sample under conditions suitable for the formation of a hybridization complex, and c) quantifying the expression of the polynucleotides in the sample.
48. An array comprising different nucleotide molecules affixed in distinct physical locations on a solid substrate, wherein at least one of said nucleotide molecules comprises a first oligonucleotide or polynucleotide sequence specifically hybridizable with at least 30 contiguous nucleotides of a target polynucleotide, and wherein said target polynucleotide is a polynucleotide of claim 12.
49. An array of claim 48, wherein said first oligonucleotide or polynucleotide sequence is completely complementary to at least 30 contiguous nucleotides of said target polynucleotide.
50. An array of claim 48, wherein said first oligonucleotide or polynucleotide sequence is completely complementary to at least 60 contiguous nucleotides of said target polynucleotide.
51. An array of claim 48, wherein said first oligonucleotide or polynucleotide sequence is completely complementary to said target polynucleotide.
52. An array of claim 48, which is a microarray.
53. An array of claim 48, further comprising said target polynucleotide hybridized to a nucleotide molecule comprising said first oligonucleotide or polynucleotide sequence.
54. An array of claim 48, wherein a linker joins at least one of said nucleotide molecules to said solid substrate.
55. An array of claim 48, wherein each distinct physical location on the substrate contains multiple nucleotide molecules, and the multiple nucleotide molecules at any single distinct physical location have the same sequence, and each distinct physical location on the substrate contains nucleotide molecules having a sequence which differs from the sequence of nucleotide molecules at another distinct physical location on the substrate.
56. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:1.
57. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:2.
58. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:3.
59. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:4.
60. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:5.
61. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:6.
62. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:7.
63. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:8.
64. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:9.
65. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO: 10.
66. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO: 11.
67. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO: 12.
68. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO: 13.
69. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO: 14.
70. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO: 15.
71. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO: 16.
72. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO: 17.
73. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO: 18.
74. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO: 19.
75. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:20.
76. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:21.
77. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:22.
78. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO: 23.
79. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:24.
80. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:25.
81. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:26.
82. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:27.
83. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO: 28.
84. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:29.
85. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:30.
86. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:31.
87. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:32.
88. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:33.
89. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:34.
90. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:35.
91. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:36.
92. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:37.
93. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:38.
94. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:39.
95. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:40.
96. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:41.
97. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:42.
98. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:43.
99. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:44.
100. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:45.
101. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:46.
102. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:47.
103. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:48.
104. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:49.
105. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:50.
106. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:51.
107. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:52.
108. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:53.
109. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:54.
110. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:55.
111. A polynucleotide of claim 12, comprising the polynucleotide sequence of SEQ ID NO:56.
<110> INCYTE GENOMICS, INC. HAFALIA, April J.A. TANG, Y. Tom YUE, Henry KHAN, Farrah A. ISON, Craig H. BAUGHN, Mariah R. WARREN, Bridget A. DUGGAN, Brendan M. THANGAVELU, Kavitha HONCHELL, Cynthia D. AZIMZAI, Yalda ELLIOTT, Vicki S. BURFORD, Neil DING, Li YUE, Huibin BECHA, Shanya EMERLING, Brooke M. RICHARDSON, Thomas W. LEE, Soo Yeun BANDMAN, Olga LAL, Preeti G. LEE, Sally GIETZEN, Kimberly J. WALIA, Narinder K. GRIFFIN, Jennifer A. LEE, Ernestine A. SWARNAKAR, Anita RING, Huijun Z. JONES, Karen Anne
<120> CYTOSKELETON-ASSOCIATED PROTEINS
<130> PF-0918 PCT
<140> To Be Assigned <141> Herewith
<150> 60/280,508; 60/281,323; 60/283,769; 60/288,609; 60/290,518;
60/291,870; 60/294,451
<151> 2001-03-29; 2001-04-03; 2001-04-13; 2001-05-04; 2001-05-10;
2001-05-18; 2001-05-29
<160> 56
<170> PERL Program
<210> 1
<211> 459
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 6582721CD1
<400> 1
Met Ser Val Arg Phe Ser Ser Thr Ser Arg Arg Leu Gly Ser Cys
1 5 10 15
Gly Gly Thr Gly Ser Val Arg Leu Ser Ser Gly Gly Ala Gly Phe
20 25 30
Gly Ala Gly Asn Thr Cys Gly Val Pro Gly Ile Gly Ser Gly Phe
35 40 45
Ser Cys Ala Phe Gly Gly Ser Ser Ser Ala Gly Gly Tyr Gly Gly
50 55 60
Gly Leu Gly Gly Gly Ser Ala Ser Cys Ala Ala Phe Thr Gly Asn
65 70 75
Glu His Gly Leu Leu Ser Gly Asn Glu Lys Val Thr Met Gln Asn
80 85 90
Leu Asn Asp Arg Leu Ala Ser Tyr Leu Glu Asn Val Arg Ala Leu
95 100 105
Glu Glu Ala Asn Ala Asp Leu Glu Gln Lys Ile Lys Gly Trp Tyr
110 115 120
Glu Lys Phe Gly Pro Gly Ser Cys Arg Gly Leu Asp His Asp Tyr
125 130 135
Ser Arg Tyr Phe Pro Ile Ile Asp Glu Leu Lys Asn Gln Ile Ile
140 145 150
Ser Ala Thr Thr Ser Asn Ala His Val Val Leu Gln Asn Asp Asn
155 160 165
Ala Arg Leu Thr Ala Asp Asp Phe Arg Leu Lys Phe Glu Asn Glu
170 175 180
Leu Ala Leu His Gln Ser Val Glu Ala Asp Ile Asn Ser Leu Arg
185 190 195
Arg Val Leu Asp Glu Leu Thr Leu Cys Arg Thr Asp Leu Glu Ile
200 205 210
Gln Leu Glu Thr Leu Ser Glu Glu Leu Ala Tyr Leu Lys Lys Asn
215 220 225
His Glu Glu Glu Met Lys Ala Leu Gln Cys Ala Ala Gly Gly Asn
230 235 240
Val Asn Val Glu Met Asn Ala Ala Pro Gly Val Asp Leu Thr Val
245 250 255
Leu Leu Asn Asn Met Arg Ala Glu Tyr Glu Ala Leu Ala Glu Gln
260 265 270
Asn Arg Arg Asp Ala Glu Ala Trp Phe Asn Glu Lys Ser Ala Ser
275 280 285
Leu Gln Gln Gln Ile Ser Asp Asp Ala Gly Ala Thr Thr Ser Ala
290 295 300
Arg Asn Glu Leu Ile Glu Met Lys Arg Thr Leu Gln Thr Leu Glu
305 310 315
Ile Glu Leu Gln Ser Leu Leu Ala Thr Lys His Ser Leu Glu Cys
320 325 330
Ser Leu Thr Glu Thr Glu Ser Asn Tyr Cys Ala Gln Leu Ala Gln
335 340 345
Ile Gln Ala Gln Ile Gly Ala Leu Glu Glu Gln Leu His Gln Val
350 355 360
Arg Thr Glu Thr Glu Gly Gln Lys Leu Glu Tyr Glu Gln Leu Leu
365 370 375
Asp Ile Lys Val His Leu Glu Lys Glu Ile Glu Thr Tyr Cys Leu
380 385 390
Leu Ile Asp Gly Glu Asp Gly Ser Cys Ser Lys Ser Lys Gly Tyr
395 400 405
Gly Gly Pro Gly Asn Gln Thr Lys Asp Ser Ser Lys Thr Thr Ile
410 415 420
Val Lys Thr Val Val Glu Glu Ile Asp Pro Arg Gly Lys Val Leu
425 430 435
Ser Ser Arg Val His Thr Val Glu Glu Lys Ser Thr Lys Val Asn
440 445 450
Asn Lys Asn Glu Gln Arg Val Ser Ser
455
<210> 2
<211> 669
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 2828941CD1
<400> 2
Met Gly Glu Lys Asn Gly Asp Ala Lys Thr Phe Trp Met Glu Leu
1 5 10 15
Glu Asp Asp Gly Lys Val Asp Phe Ile Phe Glu Gln Val Gln Asn
20 25 30
Val Leu Gln Ser Leu Lys Gln Lys Ile Lys Asp Gly Ser Ala Thr
35 40 45
Asn Lys Glu Tyr Ile Gln Ala Met Ile Leu Val Asn Glu Ala Thr
50 55 60
Ile Ile Asn Ser Ser Thr Ser Ile Lys Asp Pro Met Pro Val Thr
65 70 75
Gln Lys Glu Gln Glu Asn Lys Ser Asn Ala Phe Pro Ser Thr Ser
80 85 90
Cys Glu Asn Ser Phe Pro Glu Asp Cys Thr Phe Leu Thr Thr Gly
95 100 105
Asn Lys Glu Ile Leu Ser Leu Glu Asp Lys Val Val Asp Phe Arg
110 115 120
Glu Lys Asp Ser Ser Ser Asn Leu Ser Tyr Gln Ser His Asp Cys
125 130 135
Ser Gly Ala Cys Leu Met Lys Met Pro Leu Asn Leu Lys Gly Glu
140 145 150
Asn Pro Leu Gln Leu Pro Ile Lys Cys His Phe Gln Arg Arg His
155 160 165
Ala Lys Thr Asn Ser His Ser Ser Ala Leu His Val Ser Tyr Lys
170 175 180
Thr Pro Cys Gly Arg Ser Leu Arg Asn Val Glu Glu Val Phe Arg
185 190 195
Tyr Leu Leu Glu Thr Glu Cys Asn Phe Leu Phe Thr Asp Asn Phe
200 205 210
Ser Phe Asn Thr Tyr Val Gln Leu Ala Arg Asn Tyr Pro Lys Gln
215 220 225
Lys Glu Val Val Ser Asp Val Asp Ile Ser Asn Gly Val Glu Ser
230 235 240
Val Pro Ile Ser Phe Cys Asn Glu Ile Asp Ser Arg Lys Leu Pro
245 250 255
Gln Phe Lys Tyr Arg Lys Thr Val Trp Pro Arg Ala Tyr Asn Leu
260 265 270
Thr Asn Phe Ser Ser Met Phe Thr Asp Ser Cys Asp Cys Ser Glu
275 280 285
Gly Cys Ile Asp Ile Thr Lys Cys Ala Cys Leu Gln Leu Thr Ala
290 295 300
Arg Asn Ala Lys Thr Ser Pro Leu Ser Ser Asp Lys Ile Thr Thr
305 310 315
Gly Tyr Lys Tyr Lys Arg Leu Gln Arg Gln Ile Pro Thr Gly Ile
320 325 330
Tyr Glu Cys Ser Leu Leu Cys Lys Cys Asn Arg Gln Leu Cys Gln
335 340 345
Asn Arg Val Val Gln His Gly Pro Gln Val Arg Leu Gln Val Phe
350 355 360
Lys Thr Glu Gln Lys Gly Trp Gly Val Arg Cys Leu Asp Asp Ile
365 370 375
Asp Arg Gly Thr Phe Val Cys Ile Tyr Ser Gly Arg Leu Leu Ser
380 385 390
Arg Ala Asn Thr Glu Lys Ser Tyr Gly Ile Asp Glu Asn Gly Arg
395 400 405
Asp Glu Asn Thr Met Lys Asn Ile Phe Ser Lys Lys Arg Lys Leu
410 415 420
Glu Val Ala Cys Ser Asp Cys Glu Val Glu Val Leu Pro Leu Gly
425 430 435
Leu Glu Thr His Pro Arg Thr Ala Lys Thr Glu Lys Cys Pro Pro
440 445 450
Lys Phe Ser Asn Asn Pro Lys Glu Leu Thr Val Glu Thr Lys Tyr
455 460 465
Asp Asn Ile Ser Arg Ile Gln Tyr His Ser Val Ile Arg Asp Pro
470 475 480
Glu Ser Lys Thr Ala Ile Phe Gln His Asn Gly Lys Lys Met Glu
485 490 495
Phe Val Ser Ser Glu Ser Val Thr Pro Glu Asp Asn Asp Gly Phe
500 505 510
Lys Pro Pro Arg Glu His Leu Asn Ser Lys Thr Lys Gly Ala Gln
515 520 525
Lys Asp Ser Ser Ser Asn His Val Asp Glu Phe Glu Asp Asn Leu
530 535 540
Leu Ile Glu Ser Asp Val Ile Asp Ile Thr Lys Tyr Arg Glu Glu
545 550 555
Thr Pro Pro Arg Ser Arg Cys Asn Gln Ala Thr Thr Leu Asp Asn
560 565 570
Gln Asn Ile Lys Lys Ala Ile Glu Val Gln Ile Gln Lys Pro Gln
575 580 585
Glu Gly Arg Ser Thr Ala Cys Gln Arg Gln Gln Val Phe Cys Asp
590 595 600
Glu Glu Leu Leu Ser Glu Thr Lys Asn Thr Ser Ser Asp Ser Leu
605 610 615
Thr Lys Phe Asn Lys Gly Asn Val Phe Leu Leu Asp Ala Thr Lys
620 625 630
Glu Gly Asn Val Gly Arg Phe Leu Asn Ser Leu Thr Leu Ser Pro
635 640 645
Val Ala Gln Ser Gln Leu Thr Ala Thr Ser Ala Ser Gly Val Gln
650 655 660
Ala Ile Leu Met Pro Arg Pro Pro Glu
665
<210> 3
<211> 1614
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 6260407CD1
<400> 3
Met Leu Gly Ala Pro Asp Glu Ser Ser Val Arg Val Ala Val Arg
1 5 10 15
Ile Arg Pro Gln Leu Ala Lys Glu Lys Ile Glu Gly Cys His Ile
20 25 30
Cys Thr Ser Val Thr Pro Gly Glu Pro Gln Val Phe Leu Gly Lys
35 40 45
Asp Lys Ala Phe Thr Phe Asp Tyr Val Phe Asp Ile Asp Ser Gln
50 55 60
Gln Glu Gln Ile Tyr Ile Gln Cys Ile Glu Lys Leu Ile Glu Gly
65 70 75
Cys Phe Glu Gly Tyr Asn Ala Thr Val Phe Ala Tyr Gly Gln Thr
80 85 90
Gly Ala Gly Lys Thr Tyr Thr Met Gly Thr Gly Phe Asp Val Asn
95 100 105
Ile Val Glu Glu Glu Leu Gly Ile Ile Ser Arg Ala Val Lys His
110 115 120
Leu Phe Lys Ser Ile Glu Glu Lys Lys His Ile Ala Ile Lys Asn
125 130 135
Gly Leu Pro Ala Pro Asp Phe Lys Val Asn Ala Gln Phe Leu Glu
140 145 150
Leu Tyr Asn Glu Glu Val Leu Asp Leu Phe Asp Thr Thr Arg Asp
155 160 165
Ile Asp Ala Lys Ser Lys Lys Ser Asn Ile Arg Ile His Glu Asp
170 175 180
Ser Thr Gly Gly Ile Tyr Thr Val Gly Val Thr Thr Arg Thr Val
185 190 195
Asn Thr Glu Ser Glu Met Met Gln Cys Leu Lys Leu Gly Ala Leu
200 205 210
Ser Arg Thr Thr Ala Ser Thr Gln Met Asn Val Gln Ser Ser Arg
215 220 225
Ser His Ala Ile Phe Thr Ile His Val Cys Gln Thr Arg Val Cys
230 235 240
Pro Gln Ile Asp Ala Asp Asn Ala Thr Asp Asn Lys Ile Ile Ser
245 250 255
Glu Ser Ala Gln Met Asn Glu Phe Glu Thr Leu Thr Ala Lys Phe
260 265 270
His Phe Val Asp Leu Ala Gly Ser Glu Arg Leu Lys Arg Thr Gly
275 280 285
Ala Thr Gly Glu Arg Ala Lys Glu Gly Ile Ser Ile Asn Cys Gly
290 295 300
Leu Leu Ala Leu Gly Asn Val Ile Ser Ala Leu Gly Asp Lys Ser
305 310 315
Lys Arg Ala Thr His Val Pro Tyr Arg Asp Ser Lys Leu Thr Arg
320 325 330
Leu Leu Gln Asp Ser Leu Gly Gly Asn Ser Gln Thr Ile Met Ile
335 340 345
Ala Cys Val Ser Pro Ser Asp Arg Asp Phe Met Glu Thr Leu Asn
350 355 360
Thr Leu Lys Tyr Ala Asn Arg Ala Arg Asn Ile Lys Asn Lys Val
365 370 375
Met Val Asn Gln Asp Arg Ala Ser Gln Gln Ile Asn Ala Leu Arg
380 385 390
Ser Glu Ile Thr Arg Leu Gln Met Glu Leu Met Glu Tyr Lys Thr
395 400 405
Gly Lys Arg Ile Ile Asp Glu Glu Gly Val Glu Ser Ile Asn Asp
410 415 420
Met Phe His Glu Asn Ala Met Leu Gln Thr Glu Asn Asn Asn Leu
425 430 435
Arg Val Arg Ile Lys Ala Met Gln Glu Thr Val Asp Ala Leu Arg
440 445 450
Ser Arg Ile Thr Gln Leu Val Ser Asp Gln Ala Asn His Val Leu
455 460 465
Ala Arg Ala Gly Glu Gly Asn Glu Glu Ile Ser Asn Met Ile His
470 475 480
Ser Tyr Ile Lys Glu Ile Glu Asp Leu Arg Ala Lys Leu Leu Glu
485 490 495
Ser Glu Ala Val Asn Glu Asn Leu Arg Lys Asn Leu Thr Arg Ala
500 505 510
Thr Ala Arg Ala Pro Tyr Phe Ser Gly Ser Ser Thr Phe Ser Pro
515 520 525
Thr Ile Leu Ser Ser Asp Lys Glu Thr Ile Glu Ile Ile Asp Leu
530 535 540
Ala Lys Lys Asp Leu Glu Lys Leu Lys Arg Lys Glu Lys Arg Lys
545 550 555
Lys Lys Arg Leu Gln Lys Leu Glu Glu Ser Asn Arg Glu Glu Arg
560 565 570
Ser Val Ala Gly Lys Glu Asp Asn Thr Asp Thr Asp Gln Glu Lys
575 580 585
Lys Glu Glu Lys Gly Val Ser Glu Arg Glu Asn Asn Glu Leu Glu
590 595 600
Val Glu Glu Ser Gln Glu Val Ser Asp His Glu Asp Glu Glu Glu
605 610 615
Glu Glu Glu Glu Glu Glu Asp Asp Ile Asp Gly Gly Glu Ser Ser
620 625 630
Asp Glu Ser Asp Ser Glu Ser Asp Glu Lys Ala Asn Tyr Gln Ala
635 640 645
Asp Leu Ala Asn Ile Thr Cys Glu Ile Ala Ile Lys Gln Lys Leu
650 655 660
Ile Asp Glu Leu Glu Asn Ser Gln Lys Arg Leu Gln Thr Leu Lys
665 670 675
Lys Gln Tyr Glu Glu Lys Leu Met Met Leu Gln His Lys Ile Arg
680 685 690
Asp Thr Gln Leu Glu Arg Asp Gln Val Leu Gln Asn Leu Gly Ser
695 700 705
Val Glu Ser Tyr Ser Glu Glu Lys Ala Lys Lys Val Arg Ser Glu
710 715 720
Tyr Glu Lys Lys Leu Gln Ala Met Asn Lys Glu Leu Gln Arg Leu
725 730 735
Gln Ala Ala Gln Lys Glu His Ala Arg Leu Leu Lys Asn Gln Ser
740 745 750
Gln Tyr Glu Lys Gln Leu Lys Lys Leu Gln Gln Asp Val Met Glu
755 760 765
Met Lys Lys Thr Lys Val Arg Leu Met Lys Gln Met Lys Glu Glu
770 775 780
Gln Glu Lys Ala Arg Leu Thr Glu Ser Arg Arg Asn Arg Glu Ile
785 790 795
Ala Gln Leu Lys Lys Asp Gln Arg Lys Arg Asp His Gln Leu Arg
800 805 810
Leu Leu Glu Ala Gln Lys Arg Asn Gln Glu Val Val Leu Arg Arg
815 820 825
Lys Thr Glu Glu Val Thr Ala Leu Arg Arg Gln Val Arg Pro Met
830 835 840
Ser Asp Lys Val Ala Gly Lys Val Thr Arg Lys Leu Ser Ser Ser
845 850 855 Asp Ala Pro Ala Gin Asp Thr Gly Ser Ser Ala Ala Ala Val Glu
860 865 870
Thr Asp Ala Ser Arg Thr Gly Ala Gin Gin Lys Met Arg He Pro
875 880 885
Val Ala Arg Val Gin Ala Leu Pro Thr Pro Ala Thr Asn Gly Asn
890 895 900 Arg Lys Lys Tyr Gin Arg Lys Gly Leu Thr Gly Arg Val Phe He
905 910 915
Ser Lys Thr Ala Arg Met Lys Trp Gin Leu Leu Glu Arg Arg Val
920 925 930
Thr Asp He He Met Gin Lys Met Thr He Ser Asn Met Glu Ala
935 940 945
Asp Met Asn Arg Leu Leu Lys Gin Arg Glu Glu Leu Thr Lys Arg
950 955 960 Arg Glu Lys Leu Ser Lys Arg Arg Glu Lys He Val Lys Glu Asn
965 970 975
Gly Glu Gly Asp Lys Asn Val Ala Asn He Asn Glu Glu Met Glu
980 985 990 Ser Leu Thr Ala Asn He Asp Tyr He Asn Asp Ser He Ser Asp
995 1000 1005
Cys Gin Ala Asn He Met Gin Met Glu Glu Ala Lys Glu Glu Gly
1010 1015 1020
Glu Thr Leu Asp Val Thr Ala Val He Asn Ala Cys Thr Leu Thr
1025 1030 1035
Glu Ala Arg Tyr Leu Leu Asp His Phe Leu Ser Met Gly He Asn
1040 1045 1050
Lys Gly Leu Gin Ala Ala Gin Lys Glu Ala Gin He Lys Val Leu
1055 1060 1065
Glu Gly Arg Leu Lys Gin Thr Glu He Thr Ser Ala Thr Gin Asn
1070 1075 1080
Gin Leu Leu Phe His Met Leu Lys Glu Lys Ala Glu Leu Asn Pro
1085 1090 1095
Glu Leu Asp Ala Leu Leu Gly His Ala Leu Gin Asp Leu Asp Ser
1100 1105 1110
Val Pro Leu Glu Asn Val Glu Asp Ser Thr Asp Glu Asp Ala Pro
1115 1120 1125
Leu Asn Ser Pro Gly Ser Glu Gly Ser Thr Leu Ser Ser Asp Leu
1130 1135 1140
Met Lys Leu Cys Gly Glu Val Lys Pro Lys Asn Lys Ala Arg Arg
1145 1150 1155
Arg Thr Thr Thr Gln Met Glu Leu Leu Tyr Ala Asp Ser Ser Glu
1160 1165 1170
Leu Ala Ser Asp Thr Ser Thr Gly Asp Ala Ser Leu Pro Gly Pro
1175 1180 1185
Leu Thr Pro Val Ala Glu Gly Gln Glu Ile Gly Met Asn Thr Glu
1190 1195 1200
Thr Ser Gly Thr Ser Ala Arg Glu Lys Glu Leu Ser Pro Pro Pro
1205 1210 1215
Gly Leu Pro Ser Lys Ile Gly Ser Ile Ser Arg Gln Ser Ser Leu
1220 1225 1230
Ser Glu Lys Lys Ile Pro Glu Pro Ser Pro Val Thr Arg Arg Lys
1235 1240 1245
Ala Tyr Glu Lys Ala Glu Lys Ser Lys Ala Lys Glu Gln Lys Gln
1250 1255 1260
Gly Ile Ile Asn Pro Phe Pro Ala Ser Lys Gly Ile Arg Ala Phe
1265 1270 1275
Pro Leu Gln Cys Ile His Ile Ala Glu Gly His Thr Lys Ala Val
1280 1285 1290
Leu Cys Val Asp Ser Thr Asp Asp Leu Leu Phe Thr Gly Ser Lys
1295 1300 1305
Asp Arg Thr Cys Lys Val Trp Asn Leu Val Thr Gly Gln Glu Ile
1310 1315 1320
Met Ser Leu Gly Gly His Pro Asn Asn Val Val Ser Val Lys Tyr
1325 1330 1335
Cys Asn Tyr Thr Ser Leu Val Phe Thr Val Ser Thr Ser Tyr Ile
1340 1345 1350
Lys Val Trp Asp Ile Arg Asp Ser Ala Lys Cys Ile Arg Thr Leu
1355 1360 1365
Thr Ser Ser Gly Gln Val Thr Leu Gly Asp Ala Cys Ser Ala Ser
1370 1375 1380
Thr Ser Arg Thr Val Ala Ile Pro Ser Gly Glu Asn Gln Ile Asn
1385 1390 1395
Gln Ile Ala Leu Asn Pro Thr Gly Thr Phe Leu Tyr Ala Ala Ser
1400 1405 1410
Gly Asn Ala Val Arg Met Trp Asp Leu Lys Arg Phe Gln Ser Thr
1415 1420 1425
Gly Lys Leu Thr Gly His Leu Gly Pro Val Met Cys Leu Thr Val
1430 1435 1440
Asp Gln Ile Ser Ser Gly Gln Asp Leu Ile Ile Thr Gly Ser Lys
1445 1450 1455
Asp His Tyr Ile Lys Met Phe Asp Val Thr Glu Gly Ala Leu Gly
1460 1465 1470
Thr Val Ser Pro Thr His Asn Phe Glu Pro Pro His Tyr Asp Gly
1475 1480 1485
Ile Glu Ala Leu Thr Ile Gln Gly Asp Asn Leu Phe Ser Gly Ser
1490 1495 1500
Arg Asp Asn Gly Ile Lys Lys Trp Asp Leu Thr Gln Lys Asp Leu
1505 1510 1515
Leu Gln Gln Val Pro Asn Ala His Lys Asp Trp Val Cys Ala Leu
1520 1525 1530
Gly Val Val Pro Asp His Pro Val Leu Leu Ser Gly Cys Arg Gly
1535 1540 1545
Gly Ile Leu Lys Val Trp Asn Met Asp Thr Phe Met Pro Val Gly
1550 1555 1560
Glu Met Lys Gly His Asp Ser Pro Ile Asn Ala Ile Cys Val Asn
1565 1570 1575
Ser Thr His Ile Phe Thr Ala Ala Asp Asp Arg Thr Val Arg Ile
1580 1585 1590
Trp Lys Ala Arg Asn Leu Gln Asp Gly Gln Ile Ser Asp Thr Gly
1595 1600 1605
Asp Leu Gly Glu Asp Ile Ala Ser Asn
1610
<210> 4
<211> 299
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 7488258CD1
<400> 4
Met Thr Leu Ser Val Leu Ser Arg Lys Asp Lys Glu Arg Val Ile
1 5 10 15
Arg Arg Leu Leu Leu Gln Ala Pro Pro Gly Glu Phe Val Asn Ala
20 25 30
Phe Asp Asp Leu Cys Leu Leu Ile Arg Asp Glu Lys Leu Met His
35 40 45
His Gln Gly Glu Cys Ala Gly His Gln His Cys Gln Lys Tyr Ser
50 55 60
Val Pro Leu Cys Ile Asp Gly Asn Pro Val Leu Leu Ser His His
65 70 75
Asn Val Met Gly Asp Tyr Arg Phe Phe Asp His Gln Ser Lys Leu
80 85 90
Ser Phe Lys Tyr Asp Leu Leu Gln Asn Gln Leu Lys Asp Ile Gln
95 100 105
Ser His Gly Ile Ile Gln Asn Glu Ala Glu Tyr Leu Arg Val Val
110 115 120
Leu Leu Cys Ala Leu Lys Leu Tyr Val Asn Asp His Tyr Pro Lys
125 130 135
Gly Asn Cys Asn Met Leu Arg Lys Thr Val Lys Ser Lys Glu Tyr
140 145 150
Leu Ile Ala Cys Ile Glu Asp His Asn Tyr Glu Thr Gly Glu Cys
155 160 165
Trp Asn Gly Leu Trp Lys Ser Lys Trp Ile Phe Gln Val Asn Pro
170 175 180
Phe Leu Thr Gln Val Thr Gly Arg Ile Phe Val Gln Ala His Phe
185 190 195
Phe Arg Cys Val Asn Leu His Ile Glu Ile Ser Lys Asp Leu Lys
200 205 210
Glu Ser Leu Glu Ile Val Asn Gln Ala Gln Leu Ala Leu Ser Phe
215 220 225
Ala Arg Leu Val Glu Glu Gln Glu Asn Lys Phe Gln Ala Ala Val
230 235 240
Leu Glu Glu Leu Gln Glu Leu Ser Asn Glu Ala Leu Arg Lys Ile
245 250 255
Leu Arg Arg Asp Leu Pro Val Thr Arg Thr Leu Ile Asp Trp His
260 265 270
Arg Ile Leu Ser Asp Leu Asn Leu Val Met Tyr Pro Lys Leu Gly
275 280 285
Tyr Val Ile Tyr Ser Arg Ser Val Leu Cys Asn Trp Ile Ile
290 295
<210> 5
<211> 1594
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 7948948CD1
<400> 5
Met Leu Asp Ala Pro Asp Glu Ser Ser Val Arg Val Ala Val Arg
1 5 10 15
Ile Arg Pro Gln Leu Ala Lys Glu Lys Ile Glu Gly Cys His Ile
20 25 30
Cys Thr Ser Val Thr Pro Gly Glu Pro Gln Val Phe Leu Gly Lys
35 40 45
Asp Lys Ala Phe Thr Phe Asp Tyr Val Phe Asp Ile Asp Ser Gln
50 55 60
Gln Glu Gln Ile Tyr Ile Gln Cys Ile Glu Lys Leu Ile Glu Gly
65 70 75
Cys Phe Glu Gly Tyr Asn Ala Thr Val Phe Ala Tyr Gly Gln Thr
80 85 90
Gly Ala Gly Lys Thr Tyr Thr Met Gly Thr Gly Phe Asp Val Asn
95 100 105
Ile Val Glu Glu Glu Leu Gly Ile Ile Ser Arg Ala Val Lys His
110 115 120
Leu Phe Lys Ser Ile Glu Glu Lys Lys His Ile Ala Ile Lys Asn
125 130 135
Gly Leu Pro Ala Pro Asp Phe Lys Val Asn Ala Gln Phe Leu Glu
140 145 150
Leu Tyr Asn Glu Glu Val Leu Asp Leu Phe Asp Thr Thr Arg Asp
155 160 165
Ile Asp Ala Lys Ser Lys Lys Ser Asn Ile Arg Ile His Glu Asp
170 175 180
Ser Thr Gly Gly Ile Tyr Thr Val Gly Val Thr Thr Arg Thr Val
185 190 195
Asn Thr Glu Ser Glu Met Met Gln Cys Leu Lys Leu Gly Ala Leu
200 205 210
Ser Arg Thr Thr Ala Ser Thr Gln Met Asn Val Gln Ser Ser Arg
215 220 225
Ser His Ala Ile Phe Thr Ile His Val Cys Gln Thr Arg Val Cys
230 235 240
Pro Gln Ile Asp Ala Asp Asn Ala Thr Asp Asn Lys Ile Ile Ser
245 250 255
Glu Ser Ala Gln Met Asn Glu Phe Glu Thr Leu Thr Ala Lys Phe
260 265 270
His Phe Val Asp Leu Ala Gly Ser Glu Arg Leu Lys Arg Thr Gly
275 280 285
Ala Thr Gly Glu Arg Ala Lys Glu Gly Ile Ser Ile Asn Cys Gly
290 295 300
Leu Leu Ala Leu Gly Asn Val Ile Ser Ala Leu Gly Asp Lys Ser
305 310 315
Lys Arg Ala Thr His Val Pro Tyr Arg Asp Ser Lys Leu Thr Arg
320 325 330
Leu Leu Gln Asp Ser Leu Gly Gly Asn Ser Gln Thr Ile Met Ile
335 340 345
Ala Cys Val Ser Pro Ser Asp Arg Asp Phe Met Glu Thr Leu Asn
350 355 360
Thr Leu Lys Tyr Ala Asn Arg Ala Arg Asn Ile Lys Asn Lys Val
365 370 375
Met Val Asn Gln Asp Arg Ala Ser Gln Gln Ile Asn Ala Leu Arg
380 385 390
Ser Glu Ile Thr Arg Leu Gln Met Glu Leu Met Glu Tyr Lys Thr
395 400 405
Gly Lys Arg Ile Ile Asp Glu Glu Gly Val Glu Ser Ile Asn Asp
410 415 420
Met Phe His Glu Asn Ala Met Leu Gln Thr Glu Asn Asn Asn Leu
425 430 435
Arg Val Arg Ile Lys Ala Met Gln Glu Thr Val Asp Ala Leu Arg
440 445 450
Ser Arg Ile Thr Gln Leu Val Ser Asp Gln Ala Asn His Val Leu
455 460 465
Ala Arg Ala Gly Glu Gly Asn Glu Glu Ile Ser Asn Met Ile His
470 475 480
Ser Tyr Ile Lys Glu Ile Glu Asp Leu Arg Ala Lys Leu Leu Glu
485 490 495
Ser Glu Ala Val Asn Glu Asn Leu Arg Lys Asn Leu Thr Arg Ala
500 505 510
Thr Ala Arg Ala Pro Tyr Phe Ser Gly Ser Ser Thr Phe Ser Pro
515 520 525
Thr Ile Leu Ser Ser Asp Lys Glu Thr Ile Glu Ile Ile Asp Leu
530 535 540
Ala Lys Lys Asp Leu Glu Lys Leu Lys Arg Lys Glu Lys Arg Lys
545 550 555
Lys Lys Ser Val Ala Gly Lys Glu Asp Asn Thr Asp Thr Asp Gln
560 565 570
Glu Lys Lys Glu Glu Lys Gly Val Ser Glu Arg Glu Asn Asn Glu
575 580 585
Leu Glu Val Glu Glu Ser Gln Glu Val Ser Asp His Glu Asp Glu
590 595 600
Glu Glu Glu Glu Glu Glu Glu Glu Asp Asp Ile Asp Gly Gly Glu
605 610 615
Ser Ser Asp Glu Ser Asp Ser Glu Ser Asp Glu Lys Ala Asn Tyr
620 625 630
Gln Ala Asp Leu Ala Asn Ile Thr Cys Glu Ile Ala Ile Lys Gln
635 640 645
Lys Leu Ile Asp Glu Leu Glu Asn Ser Gln Lys Arg Leu Gln Thr
650 655 660
Leu Lys Lys Gln Tyr Glu Glu Lys Leu Met Met Leu Gln His Lys
665 670 675
Ile Arg Asp Thr Gln Leu Glu Arg Asp Gln Val Leu Gln Asn Leu
680 685 690
Gly Ser Val Glu Ser Tyr Ser Glu Glu Lys Ala Lys Lys Val Arg
695 700 705
Ser Glu Tyr Glu Lys Lys Leu Gln Ala Met Asn Lys Glu Leu Gln
710 715 720
Arg Leu Gln Ala Ala Gln Lys Glu His Ala Arg Leu Leu Lys Asn
725 730 735
Gln Ser Gln Tyr Glu Lys Gln Leu Lys Lys Leu Gln Gln Asp Val
740 745 750
Met Glu Met Lys Lys Thr Lys Val Arg Leu Met Lys Gln Met Lys
755 760 765
Glu Glu Gln Glu Lys Ala Arg Leu Thr Glu Ser Arg Arg Asn Arg
770 775 780
Glu Ile Ala Gln Leu Lys Lys Asp Gln Arg Lys Arg Asp His Gln
785 790 795
Leu Arg Leu Leu Glu Ala Gln Lys Arg Asn Gln Glu Val Val Leu
800 805 810
Arg Arg Lys Thr Glu Glu Val Thr Ala Leu Arg Arg Gln Val Arg
815 820 825
Pro Met Ser Asp Lys Val Ala Gly Lys Val Thr Arg Lys Leu Ser
830 835 840
Ser Ser Asp Ala Pro Ala Gln Asp Thr Gly Ser Ser Ala Ala Ala
845 850 855
Val Glu Thr Asp Ala Ser Arg Thr Gly Ala Gln Gln Lys Met Arg
860 865 870
Ile Pro Val Ala Arg Val Gln Ala Leu Pro Thr Pro Ala Thr Asn
875 880 885
Gly Asn Arg Lys Lys Tyr Gln Arg Lys Gly Leu Thr Gly Arg Val
890 895 900
Phe Ile Ser Lys Thr Ala Arg Met Lys Trp Gln Leu Leu Glu Arg
905 910 915
Arg Val Thr Asp Ile Ile Met Gln Lys Met Thr Ile Ser Asn Met
920 925 930
Glu Ala Asp Met Asn Arg Leu Leu Lys Gln Arg Glu Glu Leu Thr
935 940 945
Lys Arg Arg Glu Lys Leu Ser Lys Arg Arg Glu Lys Ile Val Lys
950 955 960
Glu Asn Gly Glu Gly Asp Lys Asn Val Ala Asn Ile Asn Glu Glu
965 970 975
Met Glu Ser Leu Thr Ala Asn Ile Asp Tyr Ile Asn Asp Ser Ile
980 985 990
Ser Asp Cys Gln Ala Asn Ile Met Gln Met Glu Glu Ala Lys Glu
995 1000 1005
Glu Gly Glu Thr Leu Asp Val Thr Ala Val Ile Asn Ala Cys Thr
1010 1015 1020
Leu Thr Glu Ala Arg Tyr Leu Leu Asp His Phe Leu Ser Met Gly
1025 1030 1035
Ile Asn Lys Gly Leu Gln Ala Ala Gln Lys Glu Ala Gln Ile Lys
1040 1045 1050
Val Leu Glu Gly Arg Leu Lys Gln Thr Glu Ile Thr Ser Ala Thr
1055 1060 1065
Gln Asn Gln Leu Leu Phe His Met Leu Lys Glu Lys Ala Glu Leu
1070 1075 1080
Asn Pro Glu Leu Asp Ala Leu Leu Gly His Ala Leu Gln Asp Asn
1085 1090 1095
Val Glu Asp Ser Thr Asp Glu Asp Ala Pro Leu Asn Ser Pro Gly
1100 1105 1110
Ser Glu Gly Ser Thr Leu Ser Ser Asp Leu Met Lys Leu Cys Gly
1115 1120 1125
Glu Val Lys Pro Lys Asn Lys Ala Arg Arg Arg Thr Thr Thr Gln
1130 1135 1140
Met Glu Leu Leu Tyr Ala Asp Ser Ser Glu Leu Ala Ser Asp Thr
1145 1150 1155
Ser Thr Gly Asp Ala Ser Leu Pro Gly Pro Leu Thr Pro Val Ala
1160 1165 1170
Glu Gly Gln Glu Ile Gly Met Asn Thr Glu Thr Ser Gly Thr Ser
1175 1180 1185
Ala Arg Glu Lys Glu Leu Ser Pro Pro Pro Gly Leu Pro Ser Lys
1190 1195 1200
Ile Gly Ser Ile Ser Arg Gln Ser Ser Leu Ser Glu Lys Lys Ile
1205 1210 1215
Pro Glu Pro Ser Pro Val Thr Arg Arg Lys Ala Tyr Glu Lys Ala
1220 1225 1230
Glu Lys Ser Lys Ala Lys Glu Gln Lys Gln Gly Ile Ile Asn Pro
1235 1240 1245
Phe Pro Ala Ser Lys Gly Ile Arg Ala Phe Pro Leu Gln Cys Ile
1250 1255 1260
His Ile Ala Glu Gly His Thr Lys Ala Val Leu Cys Val Asp Ser
1265 1270 1275
Thr Asp Asp Leu Leu Phe Thr Gly Ser Lys Asp Arg Thr Cys Lys
1280 1285 1290
Val Trp Asn Leu Val Thr Gly Gln Glu Ile Met Ser Leu Gly Gly
1295 1300 1305
His Pro Asn Asn Val Val Ser Val Lys Tyr Cys Asn Tyr Thr Ser
1310 1315 1320
Leu Val Phe Thr Val Ser Thr Ser Tyr Ile Lys Val Trp Asp Ile
1325 1330 1335
Arg Asp Ser Ala Lys Cys Ile Arg Thr Leu Thr Ser Ser Gly Gln
1340 1345 1350
Val Thr Leu Gly Asp Ala Cys Ser Ala Ser Thr Ser Arg Thr Val
1355 1360 1365
Ala Ile Pro Ser Gly Glu Asn Gln Ile Asn Gln Ile Ala Leu Asn
1370 1375 1380
Pro Thr Gly Thr Phe Leu Tyr Ala Ala Ser Gly Asn Ala Val Arg
1385 1390 1395
Met Trp Asp Leu Lys Arg Phe Gln Ser Thr Gly Lys Leu Thr Gly
1400 1405 1410
His Leu Gly Pro Val Met Cys Leu Thr Val Asp Gln Ile Ser Ser
1415 1420 1425
Gly Gln Asp Leu Ile Ile Thr Gly Ser Lys Asp His Tyr Ile Lys
1430 1435 1440
Met Phe Asp Val Thr Glu Gly Ala Leu Gly Thr Val Ser Pro Thr
1445 1450 1455
His Asn Phe Glu Pro Pro His Tyr Asp Gly Ile Glu Ala Leu Thr
1460 1465 1470
Ile Gln Gly Asp Asn Leu Phe Ser Gly Ser Arg Asp Asn Gly Ile
1475 1480 1485
Lys Lys Trp Asp Leu Thr Gln Lys Asp Leu Leu Gln Gln Val Pro
1490 1495 1500
Asn Ala His Lys Asp Trp Val Cys Ala Leu Gly Val Val Pro Asp
1505 1510 1515
His Pro Val Leu Leu Ser Gly Cys Arg Gly Gly Ile Leu Lys Val
1520 1525 1530
Trp Asn Met Asp Thr Phe Met Pro Val Gly Glu Met Lys Gly His
1535 1540 1545
Asp Ser Pro Ile Asn Ala Ile Cys Val Asn Ser Thr His Ile Phe
1550 1555 1560
Thr Ala Ala Asp Asp Arg Thr Val Arg Ile Trp Lys Ala Arg Asn
1565 1570 1575
Leu Gln Asp Gly Gln Ile Ser Asp Thr Gly Asp Leu Gly Glu Asp
1580 1585 1590
Ile Ala Ser Asn
<210> 6
<211> 1267
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 3467913CD1
<400> 6
Met Ala Arg Gln Pro Pro Pro Pro Trp Val His Ala Ala Phe Leu
1 5 10 15
Leu Cys Leu Leu Ser Leu Gly Gly Ala Ile Glu Ile Pro Met Asp
20 25 30
Pro Ser Ile Gln Asn Glu Leu Thr Gln Pro Pro Thr Ile Thr Lys
35 40 45
Gln Ser Ala Lys Asp His Ile Val Asp Pro Arg Asp Asn Ile Leu
50 55 60
Ile Glu Cys Glu Ala Lys Gly Asn Pro Ala Pro Ser Phe His Trp
65 70 75
Thr Arg Asn Ser Arg Phe Phe Asn Ile Ala Lys Asp Pro Arg Val
80 85 90
Ser Met Arg Arg Arg Ser Gly Thr Leu Val Ile Asp Phe Arg Ser
95 100 105
Gly Gly Arg Pro Glu Glu Tyr Glu Gly Glu Tyr Gln Cys Phe Ala
110 115 120
Arg Asn Lys Phe Gly Thr Ala Leu Ser Asn Arg Ile Arg Leu Gln
125 130 135
Val Ser Lys Ser Pro Leu Trp Pro Lys Glu Asn Leu Asp Pro Val
140 145 150
Val Val Gln Glu Gly Ala Pro Leu Thr Leu Gln Cys Asn Pro Pro
155 160 165
Pro Gly Leu Pro Ser Pro Val Ile Phe Trp Met Ser Ser Ser Met
170 175 180
Glu Pro Ile Thr Gln Asp Lys Arg Val Ser Gln Gly His Asn Gly
185 190 195
Asp Leu Tyr Phe Ser Asn Val Met Leu Gln Asp Met Gln Thr Asp
200 205 210
Tyr Ser Cys Asn Ala Arg Phe His Phe Thr His Thr Ile Gln Gln
215 220 225
Lys Asn Pro Phe Thr Leu Lys Val Leu Thr Asn His Pro Tyr Asn
230 235 240
Asp Ser Ser Leu Arg Asn His Pro Asp Met Tyr Ser Ala Arg Gly
245 250 255
Val Ala Glu Arg Thr Pro Ser Phe Met Tyr Pro Gln Gly Thr Ala
260 265 270
Ser Ser Gln Met Val Leu Arg Gly Met Asp Leu Leu Leu Glu Cys
275 280 285
Ile Ala Ser Gly Val Pro Thr Pro Asp Ile Ala Trp Tyr Lys Lys
290 295 300
Gly Gly Asp Leu Pro Ser Asp Lys Ala Lys Phe Glu Asn Phe Asn
305 310 315
Lys Ala Leu Arg Ile Thr Asn Val Ser Glu Glu Asp Ser Gly Glu
320 325 330
Tyr Phe Cys Leu Ala Ser Asn Lys Met Gly Ser Ile Arg His Thr
335 340 345
Ile Ser Val Arg Val Lys Ala Ala Pro Tyr Trp Leu Asp Glu Pro
350 355 360
Lys Asn Leu Ile Leu Ala Pro Gly Glu Asp Gly Arg Leu Val Cys
365 370 375
Arg Ala Asn Gly Asn Pro Lys Pro Thr Val Gln Trp Met Val Asn
380 385 390
Gly Glu Pro Leu Gln Ser Ala Pro Pro Asn Pro Asn Arg Glu Val
395 400 405
Ala Gly Asp Thr Ile Ile Phe Arg Asp Thr Gln Ile Ser Ser Arg
410 415 420
Ala Val Tyr Gln Cys Asn Thr Ser Asn Glu His Gly Tyr Leu Leu
425 430 435
Ala Asn Ala Phe Val Ser Val Leu Asp Val Pro Pro Arg Met Leu
440 445 450
Ser Pro Arg Asn Gln Leu Ile Arg Val Ile Leu Tyr Asn Arg Thr
455 460 465
Arg Leu Asp Cys Pro Phe Phe Gly Ser Pro Ile Pro Thr Leu Arg
470 475 480
Trp Phe Lys Asn Gly Gln Gly Ser Asn Leu Asp Gly Gly Asn Tyr
485 490 495
His Val Tyr Glu Asn Gly Ser Leu Glu Ile Lys Met Ile Arg Lys
500 505 510
Glu Asp Gln Gly Ile Tyr Thr Cys Val Ala Thr Asn Ile Leu Gly
515 520 525
Lys Ala Glu Asn Gln Val Arg Leu Glu Val Lys Asp Pro Thr Arg
530 535 540
Ile Tyr Arg Met Pro Glu Asp Gln Val Ala Arg Arg Gly Thr Thr
545 550 555
Val Gln Leu Glu Cys Arg Val Lys His Asp Pro Ser Leu Lys Leu
560 565 570
Thr Val Ser Trp Leu Lys Asp Asp Glu Pro Leu Tyr Ile Gly Asn
575 580 585
Arg Met Lys Lys Glu Asp Asp Ser Leu Thr Ile Phe Gly Val Ala
590 595 600
Glu Arg Asp Gln Gly Ser Tyr Thr Cys Val Ala Ser Thr Glu Leu
605 610 615
Asp Gln Asp Leu Ala Lys Ala Tyr Leu Thr Val Leu Ala Asp Gln
620 625 630
Ala Thr Pro Thr Asn Arg Leu Ala Ala Leu Pro Lys Gly Arg Pro
635 640 645
Asp Arg Pro Arg Asp Leu Glu Leu Thr Asp Leu Ala Glu Arg Ser
650 655 660
Val Arg Leu Thr Trp Ile Pro Gly Asp Ala Asn Asn Ser Pro Ile
665 670 675
Thr Asp Tyr Val Val Gln Phe Glu Glu Asp Gln Phe Gln Pro Gly
680 685 690
Val Trp His Asp His Ser Lys Tyr Pro Gly Ser Val Asn Ser Ala
695 700 705
Val Leu Arg Leu Ser Pro Tyr Val Asn Tyr Gln Phe Arg Val Ile
710 715 720
Ala Ile Asn Glu Val Gly Ser Ser His Pro Ser Leu Pro Ser Glu
725 730 735
Arg Tyr Arg Thr Ser Gly Ala Pro Pro Glu Ser Asn Pro Gly Asp
740 745 750
Val Lys Gly Glu Gly Thr Arg Lys Asn Asn Met Glu Ile Thr Trp
755 760 765
Thr Pro Met Asn Ala Thr Ser Ala Phe Gly Pro Asn Leu Arg Tyr
770 775 780
Ile Val Lys Trp Arg Arg Arg Glu Thr Arg Glu Ala Trp Asn Asn
785 790 795
Val Thr Val Trp Gly Ser Arg Tyr Val Val Gly Gln Thr Pro Val
800 805 810
Tyr Val Pro Tyr Glu Ile Arg Val Gln Ala Glu Asn Asp Phe Gly
815 820 825
Lys Gly Pro Glu Pro Glu Ser Val Ile Gly Tyr Ser Gly Glu Asp
830 835 840
Tyr Pro Arg Ala Ala Pro Thr Glu Val Lys Val Arg Val Met Asn
845 850 855
Ser Thr Ala Ile Ser Leu Gln Trp Asn Arg Val Tyr Ser Asp Thr
860 865 870
Val Gln Gly Gln Leu Arg Glu Tyr Arg Ala Tyr Tyr Trp Arg Glu
875 880 885
Ser Ser Leu Leu Lys Asn Leu Trp Val Ser Gln Lys Arg Gln Gln
890 895 900
Ala Ser Phe Pro Gly Asp Arg Leu Arg Gly Val Val Ser Arg Leu
905 910 915
Phe Pro Tyr Ser Asn Tyr Lys Leu Glu Met Val Val Val Asn Gly
920 925 930
Arg Gly Asp Gly Pro Arg Ser Glu Thr Lys Glu Phe Thr Thr Pro
935 940 945
Glu Gly Val Pro Ser Ala Pro Arg Arg Phe Arg Val Arg Gln Pro
950 955 960
Asn Leu Glu Thr Ile Asn Leu Glu Trp Asp His Pro Glu His Pro
965 970 975
Asn Gly Ile Met Ile Gly Tyr Thr Leu Lys Tyr Val Ala Phe Asn
980 985 990
Gly Thr Lys Val Gly Lys Gln Ile Val Glu Asn Phe Ser Pro Asn
995 1000 1005
Gln Thr Lys Phe Thr Val Gln Arg Thr Asp Pro Val Ser Arg Tyr
1010 1015 1020
Arg Phe Thr Leu Ser Ala Arg Thr Gln Val Gly Ser Gly Glu Ala
1025 1030 1035
Val Thr Glu Glu Ser Pro Ala Pro Pro Asn Glu Ala Pro Pro Thr
1040 1045 1050
Leu Pro Pro Thr Thr Val Gly Ala Thr Gly Ala Val Ser Ser Thr
1055 1060 1065
Asp Ala Thr Ala Ile Ala Ala Thr Thr Glu Ala Thr Thr Val Pro
1070 1075 1080
Ile Ile Pro Thr Val Ala Pro Thr Thr Met Ala Thr Thr Thr Thr
1085 1090 1095
Val Ala Thr Thr Thr Thr Thr Thr Ala Ala Ala Thr Thr Thr Thr
1100 1105 1110
Glu Ser Pro Pro Thr Thr Thr Ser Gly Thr Lys Ile His Glu Ser
1115 1120 1125
Ala Tyr Thr Asn Asn Gln Ala Asp Ile Ala Thr Gln Gly Trp Phe
1130 1135 1140
Ile Gly Leu Met Cys Ala Ile Ala Leu Leu Val Leu Ile Leu Leu
1145 1150 1155
Ile Val Cys Phe Ile Lys Arg Ser Arg Gly Gly Asn Asp Glu Asp
1160 1165 1170
Asn Lys Pro Leu Gln Gly Ser Gln Thr Ser Leu Asp Gly Thr Ile
1175 1180 1185
Lys Gln Gln Val Arg Glu Lys Lys Asp Val Pro Leu Gly Pro Glu
1190 1195 1200
Asp Pro Lys Glu Glu Asp Gly Ser Phe Asp Tyr Arg Cys Ser Asp
1205 1210 1215
Asp Ser Leu Val Asp Tyr Gly Glu Gly Gly Glu Gly Gln Phe Asn
1220 1225 1230
Glu Asp Gly Ser Phe Ile Gly Gln Tyr Thr Val Lys Lys Asp Lys
1235 1240 1245
Glu Glu Thr Glu Gly Asn Glu Ser Ser Glu Ala Thr Ser Pro Val
1250 1255 1260
Asn Ala Ile Tyr Ser Leu Ala
1265
<210> 7
<211> 1359
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 7495062CD1
<400> 7
Met Ala Arg Gln Pro Pro Pro Pro Trp Val His Ala Ala Phe Leu
1 5 10 15
Leu Cys Leu Leu Ser Leu Gly Gly Ala Ile Glu Ile Pro Met Asp
20 25 30
Pro Ser Ile Gln Asn Glu Leu Thr Gln Pro Pro Thr Ile Thr Lys
35 40 45
Gln Ser Ala Lys Asp His Ile Val Asp Pro Arg Asp Asn Ile Leu
50 55 60
Ile Glu Cys Glu Ala Lys Gly Asn Pro Ala Pro Ser Phe His Trp
65 70 75
Thr Arg Asn Ser Arg Phe Phe Asn Ile Ala Lys Asp Pro Arg Val
80 85 90
Ser Met Arg Arg Arg Ser Gly Thr Leu Val Ile Asp Phe Arg Ser
95 100 105
Gly Gly Arg Pro Glu Glu Tyr Glu Gly Glu Tyr Gln Cys Phe Ala
110 115 120
Arg Asn Lys Phe Gly Thr Ala Leu Ser Asn Arg Ile Arg Leu Gln
125 130 135
Val Ser Lys Ser Pro Leu Trp Pro Lys Glu Asn Leu Asp Pro Val
140 145 150
Val Val Gln Glu Gly Ala Pro Leu Thr Leu Gln Cys Asn Pro Pro
155 160 165
Pro Gly Leu Pro Ser Pro Val Ile Phe Trp Met Ser Ser Ser Met
170 175 180
Glu Pro Ile Thr Gln Asp Lys Arg Val Ser Gln Gly His Asn Gly
185 190 195
Asp Leu Tyr Phe Ser Asn Val Met Leu Gln Asp Met Gln Thr Asp
200 205 210
Tyr Ser Cys Asn Ala Arg Phe His Phe Thr His Thr Ile Gln Gln
215 220 225
Lys Asn Pro Phe Thr Leu Lys Val Leu Thr Asn His Pro Tyr Asn
230 235 240
Asp Ser Ser Leu Arg Asn His Pro Asp Met Tyr Ser Ala Arg Gly
245 250 255
Val Ala Glu Arg Thr Pro Ser Phe Met Tyr Pro Gln Gly Thr Ala
260 265 270
Ser Ser Gln Met Val Leu Arg Gly Met Asp Leu Leu Leu Glu Cys
275 280 285
Ile Ala Ser Gly Val Pro Thr Pro Asp Ile Ala Trp Tyr Lys Lys
290 295 300
Gly Gly Asp Leu Pro Ser Asp Lys Ala Lys Phe Glu Asn Phe Asn
305 310 315
Lys Ala Leu Arg Ile Thr Asn Val Ser Glu Glu Asp Ser Gly Glu
320 325 330
Tyr Phe Cys Leu Ala Ser Asn Lys Met Gly Ser Ile Arg His Thr
335 340 345
Ile Ser Val Arg Val Lys Ala Ala Pro Tyr Trp Leu Asp Glu Pro
350 355 360
Lys Asn Leu Ile Leu Ala Pro Gly Glu Asp Gly Arg Leu Val Cys
365 370 375
Arg Ala Asn Gly Asn Pro Lys Pro Thr Val Gln Trp Met Val Asn
380 385 390
Gly Glu Pro Leu Gln Ser Ala Pro Pro Asn Pro Asn Arg Glu Val
395 400 405
Ala Gly Asp Thr Ile Ile Phe Arg Asp Thr Gln Ile Ser Ser Arg
410 415 420
Ala Val Tyr Gln Cys Asn Thr Ser Asn Glu His Gly Tyr Leu Leu
425 430 435
Ala Asn Ala Phe Val Ser Val Leu Asp Val Pro Pro Arg Met Leu
440 445 450
Ser Pro Arg Asn Gln Leu Ile Arg Val Ile Leu Tyr Asn Arg Thr
455 460 465
Arg Leu Asp Cys Pro Phe Phe Gly Ser Pro Ile Pro Thr Leu Arg
470 475 480
Trp Phe Lys Asn Gly Gln Gly Ser Asn Leu Asp Gly Gly Asn Tyr
485 490 495
His Val Tyr Glu Asn Gly Ser Leu Glu Ile Lys Met Ile Arg Lys
500 505 510
Glu Asp Gln Gly Ile Tyr Thr Cys Val Ala Thr Asn Ile Leu Gly
515 520 525
Lys Ala Glu Asn Gln Val Arg Leu Glu Val Lys Asp Pro Thr Arg
530 535 540
Ile Tyr Arg Met Pro Glu Asp Gln Val Ala Arg Arg Gly Thr Thr
545 550 555
Val Gln Leu Glu Cys Arg Val Lys His Asp Pro Ser Leu Lys Leu
560 565 570
Thr Val Ser Trp Leu Lys Asp Asp Glu Pro Leu Tyr Ile Gly Asn
575 580 585
Arg Met Lys Lys Glu Asp Asp Ser Leu Thr Ile Phe Gly Val Ala
590 595 600
Glu Arg Asp Gln Gly Ser Tyr Thr Cys Val Ala Ser Thr Glu Leu
605 610 615
Asp Gln Asp Leu Ala Lys Ala Tyr Leu Thr Val Leu Ala Asp Gln
620 625 630
Ala Thr Pro Thr Asn Arg Leu Ala Ala Leu Pro Lys Gly Arg Pro
635 640 645
Asp Arg Pro Arg Asp Leu Glu Leu Thr Asp Leu Ala Glu Arg Ser
650 655 660
Val Arg Leu Thr Trp Ile Pro Gly Asp Ala Asn Asn Ser Pro Ile
665 670 675
Thr Asp Tyr Val Val Gln Phe Glu Glu Asp Gln Phe Gln Pro Gly
680 685 690
Val Trp His Asp His Ser Lys Tyr Pro Gly Ser Val Asn Ser Ala
695 700 705
Val Leu Arg Leu Ser Pro Tyr Val Asn Tyr Gln Phe Arg Val Ile
710 715 720
Ala Ile Asn Glu Val Gly Ser Ser His Pro Ser Leu Pro Ser Glu
725 730 735
Arg Tyr Arg Thr Ser Gly Ala Pro Pro Glu Ser Asn Pro Gly Asp
740 745 750
Val Lys Gly Glu Gly Thr Arg Lys Asn Asn Met Glu Ile Thr Trp
755 760 765
Thr Pro Met Asn Ala Thr Ser Ala Phe Gly Pro Asn Leu Arg Tyr
770 775 780
Ile Val Lys Trp Arg Arg Arg Glu Thr Arg Glu Ala Trp Asn Asn
785 790 795
Val Thr Val Trp Gly Ser Arg Tyr Val Val Gly Gln Thr Pro Val
800 805 810
Tyr Val Pro Tyr Glu Ile Arg Val Gln Ala Glu Asn Asp Phe Gly
815 820 825
Lys Gly Pro Glu Pro Glu Ser Val Ile Gly Tyr Ser Gly Glu Asp
830 835 840
Tyr Pro Arg Ala Ala Pro Thr Glu Val Lys Val Arg Val Met Asn
845 850 855
Arg Thr Ala Ile Ser Leu Gln Trp Asn Arg Val Tyr Ser Asp Thr
860 865 870
Val Gln Gly Gln Leu Arg Glu Tyr Arg Ala Tyr Tyr Trp Arg Glu
875 880 885
Ser Ser Leu Leu Lys Asn Leu Trp Val Ser Gln Lys Arg Gln Gln
890 895 900
Ala Ser Phe Pro Gly Asp Arg Leu Arg Gly Val Val Ser Arg Leu
905 910 915
Phe Pro Tyr Ser Asn Tyr Lys Leu Glu Met Val Val Val Asn Gly
920 925 930
Arg Gly Asp Gly Pro Arg Ser Glu Thr Lys Glu Phe Thr Thr Pro
935 940 945
Glu Gly Val Pro Ser Ala Pro Arg Arg Phe Arg Val Arg Gln Pro
950 955 960
Asn Leu Glu Thr Ile Asn Leu Glu Trp Asp His Pro Glu His Pro
965 970 975
Asn Gly Ile Met Ile Gly Tyr Thr Leu Lys Tyr Val Ala Phe Asn
980 985 990
Gly Thr Lys Val Gly Lys Gln Ile Val Glu Asn Phe Ser Pro Asn
995 1000 1005
Gln Thr Lys Phe Thr Val Gln Arg Thr Asp Pro Val Ser Arg Tyr
1010 1015 1020
Arg Phe Thr Leu Ser Ala Arg Thr Gln Val Gly Ser Gly Glu Ala
1025 1030 1035
Val Thr Glu Glu Ser Pro Ala Pro Pro Asn Glu Ala Pro Pro Thr
1040 1045 1050
Leu Pro Pro Thr Thr Val Gly Ala Thr Gly Ala Val Ser Ser Thr
1055 1060 1065
Asp Ala Thr Ala Ile Ala Ala Thr Thr Glu Ala Thr Thr Val Pro
1070 1075 1080
Ile Ile Pro Thr Val Ala Pro Thr Thr Met Ala Thr Thr Thr Thr
1085 1090 1095
Val Ala Thr Thr Thr Thr Thr Thr Ala Ala Ala Thr Thr Thr Thr
1100 1105 1110
Glu Ser Pro Pro Thr Thr Thr Ser Gly Thr Lys Ile His Glu Ser
1115 1120 1125
Ala Pro Asp Glu Gln Ser Ile Trp Asn Val Thr Val Leu Pro Asn
1130 1135 1140
Ser Lys Trp Ala Asn Ile Thr Trp Lys His Asn Phe Gly Pro Gly
1145 1150 1155
Thr Asp Phe Val Val Glu Tyr Ile Asp Ser Asn His Thr Lys Lys
1160 1165 1170
Thr Val Pro Val Lys Ala Gln Ala Gln Pro Ile Gln Leu Thr Asp
1175 1180 1185
Leu Tyr Pro Gly Met Thr Tyr Thr Leu Arg Val Tyr Ser Arg Asp
1190 1195 1200
Asn Glu Gly Ile Ser Ser Thr Val Ile Thr Phe Met Thr Ser Thr
1205 1210 1215
Ala Tyr Thr Asn Asn Gln Ala Asp Ile Ala Thr Gln Gly Trp Phe
1220 1225 1230
Ile Gly Leu Met Cys Ala Ile Ala Leu Leu Val Leu Ile Leu Leu
1235 1240 1245
Ile Val Cys Phe Ile Lys Arg Ser Arg Gly Gly Lys Tyr Pro Val
1250 1255 1260
Arg Glu Lys Lys Asp Val Pro Leu Gly Pro Glu Asp Pro Lys Glu
1265 1270 1275
Glu Asp Gly Ser Phe Asp Tyr Ser Asp Glu Asp Asn Lys Pro Leu
1280 1285 1290
Gln Gly Ser Gln Thr Ser Leu Asp Gly Thr Ile Lys Gln Gln Glu
1295 1300 1305
Ser Asp Asp Ser Leu Val Asp Tyr Gly Glu Gly Gly Glu Gly Gln
1310 1315 1320
Phe Asn Glu Asp Gly Ser Leu Ile Gly Gln Tyr Thr Val Lys Lys
1325 1330 1335
Asp Lys Glu Glu Thr Glu Gly Asn Glu Ser Ser Glu Ala Thr Ser
1340 1345 1350
Pro Val Asn Ala Ile Tyr Ser Leu Ala
1355
<210> 8
<211> 452
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 284191CD1
<400> 8
Met Ser Ala Ser Leu Asn Tyr Lys Ser Phe Ser Lys Glu Gln Gln
1 5 10 15
Thr Met Asp Asn Leu Glu Lys Gln Leu Ile Cys Pro Ile Cys Leu
20 25 30
Glu Met Phe Thr Lys Pro Val Val Ile Leu Pro Cys Gln His Asn
35 40 45
Leu Cys Arg Lys Cys Ala Ser Asp Ile Phe Gln Ala Ser Asn Pro
50 55 60
Tyr Leu Pro Thr Arg Gly Gly Thr Thr Met Ala Ser Gly Gly Arg
65 70 75
Phe Arg Cys Pro Ser Cys Arg His Glu Val Val Leu Asp Arg His
80 85 90
Gly Val Tyr Gly Leu Gln Arg Asn Leu Leu Val Glu Asn Ile Ile
95 100 105
Asp Ile Tyr Lys Gln Glu Ser Thr Arg Pro Glu Lys Lys Ser Asp
110 115 120
Gln Pro Met Cys Glu Glu His Glu Glu Glu Arg Ile Asn Ile Tyr
125 130 135
Cys Leu Asn Cys Glu Val Pro Thr Cys Ser Leu Cys Lys Val Phe
140 145 150
Gly Ala His Lys Asp Cys Gln Val Ala Pro Leu Thr His Val Phe
155 160 165
Gln Arg Gln Lys Ser Glu Leu Ser Asp Gly Ile Ala Ile Leu Val
170 175 180
Gly Ser Asn Asp Arg Val Gln Gly Val Ile Ser Gln Leu Glu Asp
185 190 195
Thr Cys Lys Thr Ile Glu Glu Cys Cys Arg Lys Gln Lys Gln Glu
200 205 210
Leu Cys Glu Lys Phe Asp Tyr Leu Tyr Gly Ile Leu Glu Glu Arg
215 220 225
Lys Asn Glu Met Thr Gln Val Ile Thr Arg Thr Gln Glu Glu Lys
230 235 240
Leu Glu His Val Arg Ala Leu Ile Lys Lys Tyr Ser Asp His Leu
245 250 255
Glu Asn Val Ser Lys Leu Val Glu Ser Gly Ile Gln Phe Met Asp
260 265 270
Glu Pro Glu Met Ala Val Phe Leu Gln Asn Ala Lys Thr Leu Leu
275 280 285
Lys Lys Ile Ser Glu Ala Ser Lys Ala Phe Gln Met Glu Lys Ile
290 295 300
Glu His Gly Tyr Glu Asn Met Asn His Phe Thr Val Asn Leu Asn
305 310 315
Arg Glu Glu Lys Ile Ile Arg Glu Ile Asp Phe Tyr Arg Glu Asp
320 325 330
Glu Asp Glu Glu Glu Glu Glu Gly Gly Glu Gly Glu Lys Glu Gly
335 340 345
Glu Gly Glu Val Gly Gly Glu Ala Val Glu Val Glu Glu Val Glu
350 355 360
Asn Val Gln Thr Glu Phe Pro Gly Glu Asp Glu Asn Pro Glu Lys
365 370 375
Ala Ser Glu Leu Ser Gln Val Glu Leu Gln Ala Ala Pro Gly Ala
380 385 390
Leu Pro Val Ser Ser Pro Glu Pro Pro Pro Ala Leu Pro Pro Ala
395 400 405
Ala Asp Ala Pro Val Thr Gln Ile Gly Phe Glu Ala Pro Pro Leu
410 415 420
Gln Gly Gln Ala Ala Ala Pro Ala Ser Gly Ser Gly Ala Asp Ser
425 430 435
Glu Pro Ala Arg His Ile Phe Ser Phe Ser Trp Leu Asn Ser Leu
440 445 450
Asn Glu
<210> 9
<211> 471
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 2361681CD1
<400> 9
Met Ser Arg Arg Val Val Arg Gln Ser Lys Phe Arg His Val Phe
1 5 10 15
Gly Gln Ala Ala Lys Ala Asp Gln Ala Tyr Glu Asp Ile Arg Val
20 25 30
Ser Lys Val Thr Trp Asp Ser Ser Phe Cys Ala Val Asn Pro Lys
35 40 45
Phe Leu Ala Ile Ile Val Glu Ala Gly Gly Gly Gly Ala Phe Ile
50 55 60
Val Leu Pro Leu Ala Lys Thr Gly Arg Val Asp Lys Asn Tyr Pro
65 70 75
Leu Val Thr Gly His Thr Ala Pro Val Leu Asp Ile Asp Trp Cys
80 85 90
Pro His Asn Asp Asn Val Ile Ala Ser Ala Ser Asp Asp Thr Thr
95 100 105
Ile Met Val Trp Gln Ile Pro Asp Tyr Thr Pro Met Arg Asn Ile
110 115 120
Thr Glu Pro Ile Ile Thr Leu Glu Gly His Ser Lys Arg Val Gly
125 130 135
Ile Leu Ser Trp His Pro Thr Ala Arg Asn Val Leu Leu Ser Ala
140 145 150
Gly Gly Asp Asn Val Ile Ile Ile Trp Asn Val Gly Thr Gly Glu
155 160 165
Val Leu Leu Ser Leu Asp Asp Met His Pro Asp Val Ile His Ser
170 175 180
Val Cys Trp Asn Ser Asn Gly Ser Leu Leu Ala Thr Thr Cys Lys
185 190 195
Asp Lys Thr Leu Arg Ile Ile Asp Pro Arg Lys Gly Gln Val Val
200 205 210
Ala Glu Arg Phe Ala Ala His Glu Gly Met Arg Pro Met Arg Ala
215 220 225
Val Phe Thr Arg Gln Gly His Ile Phe Thr Thr Gly Phe Thr Arg
230 235 240
Met Ser Gln Arg Glu Leu Gly Leu Trp Asp Pro Asn Asn Phe Glu
245 250 255
Glu Pro Val Ala Leu Gln Glu Met Asp Thr Ser Asn Gly Val Leu
260 265 270
Leu Pro Phe Tyr Asp Pro Asp Ser Ser Ile Val Tyr Leu Cys Gly
275 280 285
Lys Gly Asp Ser Ser Ile Arg Tyr Phe Glu Ile Thr Asp Glu Pro
290 295 300
Pro Phe Val His Tyr Leu Asn Thr Phe Ser Ser Lys Glu Pro Gln
305 310 315
Arg Gly Met Gly Phe Met Pro Lys Arg Gly Leu Asp Val Ser Lys
320 325 330
Cys Glu Ile Ala Arg Phe Tyr Lys Leu His Glu Arg Lys Cys Glu
335 340 345
Pro Ile Ile Met Thr Val Pro Arg Lys Ser Asp Leu Phe Gln Asp
350 355 360
Asp Leu Tyr Pro Asp Thr Pro Gly Pro Glu Pro Ala Leu Glu Ala
365 370 375
Asp Glu Trp Leu Ser Gly Gln Asp Ala Glu Pro Val Leu Ile Ser
380 385 390
Leu Arg Asp Gly Tyr Val Pro Pro Lys His Arg Glu Leu Arg Val
395 400 405
Thr Lys Arg Asn Ile Leu Asp Val Arg Pro Pro Ser Gly Pro Arg
410 415 420
Arg Ser Gln Ser Ala Ser Asp Ala Pro Leu Ser Gln His Thr Leu
425 430 435
Glu Thr Leu Leu Glu Glu Ile Lys Ala Leu Arg Glu Arg Val Gln
440 445 450
Ala Gln Glu Gln Arg Ile Thr Ala Leu Glu Asn Met Leu Cys Glu
455 460 465
Leu Val Asp Gly Thr Asp
470
<210> 10
<211> 705
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 1683662CD1
<400> 10
Met Thr Ile Glu Asp Leu Pro Asp Phe Pro Leu Glu Gly Asn Pro
1 5 10 15
Leu Phe Gly Arg Tyr Pro Phe Ile Phe Ser Ala Ser Asp Thr Pro
20 25 30
Val Ile Phe Ser Ile Ser Ala Ala Pro Met Pro Ser Asp Cys Glu
35 40 45
Phe Ser Phe Phe Asp Pro Asn Asp Ala Ser Cys Gln Glu Ile Leu
50 55 60
Phe Asp Pro Lys Thr Ser Val Ser Glu Leu Phe Ala Ile Leu Arg
65 70 75
Gln Trp Val Pro Gln Val Gln Gln Asn Ile Asp Ile Ile Gly Asn
80 85 90
Glu Ile Leu Lys Arg Gly Cys Asn Val Asn Asp Arg Asp Gly Leu
95 100 105
Thr Asp Met Thr Leu Leu His Tyr Thr Cys Lys Ser Gly Ala His
110 115 120
Gly Ile Gly Asp Val Glu Thr Ala Val Lys Phe Ala Thr Gln Leu
125 130 135
Ile Asp Leu Gly Ala Asp Ile Ser Leu Arg Ser Arg Trp Thr Asn
140 145 150
Met Asn Ala Leu His Tyr Ala Ala Tyr Phe Asp Val Pro Glu Leu
155 160 165
Ile Arg Val Ile Leu Lys Thr Ser Lys Pro Lys Asp Val Asp Ala
170 175 180
Thr Cys Ser Asp Phe Asn Phe Gly Thr Ala Leu His Ile Ala Ala
185 190 195
Tyr Asn Leu Cys Ala Gly Ala Val Lys Cys Leu Leu Glu Gln Gly
200 205 210
Ala Asn Pro Ala Phe Arg Asn Asp Lys Gly Gln Ile Pro Ala Asp
215 220 225
Val Val Pro Asp Pro Val Asp Met Pro Leu Glu Met Ala Asp Ala
230 235 240
Ala Ala Thr Ala Lys Glu Ile Lys Gln Met Leu Leu Asp Ala Val
245 250 255
Pro Leu Ser Cys Asn Ile Ser Lys Ala Met Leu Pro Asn Tyr Asp
260 265 270
His Val Thr Gly Lys Ala Met Leu Thr Ser Leu Gly Leu Lys Leu
275 280 285
Gly Asp Arg Val Val Ile Ala Gly Gln Lys Val Gly Thr Leu Arg
290 295 300
Phe Cys Gly Thr Thr Glu Phe Ala Ser Gly Gln Trp Ala Gly Ile
305 310 315
Glu Leu Asp Glu Pro Glu Gly Lys Asn Asn Gly Ser Val Gly Lys
320 325 330
Val Gln Tyr Phe Lys Cys Ala Pro Lys Tyr Gly Ile Phe Ala Pro
335 340 345
Leu Ser Lys Ile Ser Lys Ala Lys Gly Arg Arg Lys Asn Ile Thr
350 355 360
His Thr Pro Ser Thr Lys Ala Ala Val Pro Leu Ile Arg Ser Gln
365 370 375
Lys Ile Asp Val Ala His Val Thr Ser Lys Val Asn Thr Gly Leu
380 385 390
Met Thr Ser Lys Lys Asp Ser Ala Ser Glu Ser Thr Leu Ser Leu
395 400 405
Pro Pro Gly Glu Glu Leu Lys Thr Val Thr Glu Lys Asp Val Ala
410 415 420
Leu Leu Gly Ser Val Ser Ser Cys Ser Ser Thr Ser Ser Leu Glu
425 430 435
His Arg Gln Ser Tyr Pro Lys Lys Gln Asn Ala Ile Ser Ser Asn
440 445 450
Lys Lys Thr Met Ser Lys Ser Pro Ser Leu Ser Ser Arg Ala Ser
455 460 465
Ala Gly Leu Asn Ser Ser Ala Thr Ser Thr Ala Asn Asn Ser Arg
470 475 480
Cys Glu Gly Glu Leu Arg Leu Gly Glu Arg Val Leu Val Val Gly
485 490 495
Gln Arg Leu Gly Thr Ile Arg Phe Phe Gly Thr Thr Asn Phe Ala
21/86 500 505 510
Pro Gly Tyr Trp Tyr Gly He Glu Leu Glu Lys Pro His Gly Lys
515 520 525
Asn Asp Gly Ser Val Gly Gly Val Gin Tyr Phe Ser Cys Ser Pro
530 535 540
Arg Tyr Gly He Phe Ala Pro Pro Ser Arg Val Gin Arg Val Thr
545 550 555
Asp Ser Leu Asp Thr Leu Ser Glu He Ser Ser Asn Lys Gin Asn
560 565 570
His Ser Tyr Pro Gly Phe Arg Arg Ser Phe Ser Thr Thr Ser Ala
575 580 r 585
Ser Ser Gin Lys Glu He Asn Arg Arg Asn Ala Phe Ser Lys Ser
590 595 600
Lys Ala Ala Leu Arg Arg Ser Trp Ser Ser Thr Pro Thr Ala Gly
605 610 615
Gly He Glu Gly Ser Val Lys Leu His Glu Gly Ser Gin Val Leu
620 625 630
Leu Thr Ser Ser Asn Glu Met Gly Thr Val Arg Tyr Val Gly Pro
635 640 645
Thr Asp Phe Ala Ser Gly He Trp Leu Gly Leu Glu Leu Arg Ser
650 655 660
Ala Lys Gly Lys Asn Asp Gly Ser Val Gly Asp Lys Arg Tyr Phe
665 670 675
Thr Cys Lys Pro Asn His Gly Val Leu Val Arg Pro Ser Arg Val
680 685 690
Thr Tyr Arg Gly He Asn Gly Ser Lys Leu Val Asp Glu Asn Cys
695 700 705
<210> 11
<211> 997
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 3750444CD1
<400> 11
Met Leu Asn Asn Ile Ser Gly Asp Val Leu Val Ala Ala Gly Phe
1 5 10 15
Val Ala Tyr Leu Gly Pro Phe Thr Gly Gln Tyr Arg Thr Val Leu
20 25 30
Tyr Asp Ser Trp Val Lys Gln Leu Arg Ser His Asn Val Pro His
35 40 45
Thr Ser Glu Pro Thr Leu Ile Gly Thr Leu Gly Asn Pro Val Lys
50 55 60
Ile Arg Ser Trp Gln Ile Ala Gly Leu Pro Asn Asp Thr Leu Ser
65 70 75
Val Glu Asn Gly Val Ile Asn Gln Phe Ser Gln Arg Trp Thr His
80 85 90
Phe Ile Asp Pro Gln Ser Gln Ala Asn Lys Trp Ile Lys Asn Met
95 100 105
Glu Lys Asp Asn Gly Leu Asp Val Phe Lys Leu Ser Asp Arg Asp
110 115 120
Phe Leu Arg Ser Met Glu Asn Ala Ile Arg Phe Gly Lys Pro Cys
125 130 135
Leu Leu Glu Asn Val Gly Glu Glu Leu Asp Pro Ala Leu Glu Pro
140 145 150
Val Leu Leu Lys Gln Thr Tyr Lys Gln Gln Gly Asn Thr Val Leu
155 160 165
Lys Leu Gly Asp Thr Val Ile Pro Tyr His Glu Asp Phe Arg Met
170 175 180
Tyr Ile Thr Thr Lys Leu Pro Asn Pro His Tyr Thr Pro Glu Ile
185 190 195
Ser Thr Lys Leu Thr Leu Ile Asn Phe Thr Leu Ser Pro Ser Gly
200 205 210
Leu Glu Asp Gln Leu Leu Gly Gln Val Val Ala Glu Glu Arg Pro
215 220 225
Asp Leu Glu Glu Ala Lys Asn Gln Leu Ile Ile Ser Asn Ala Lys
230 235 240
Met Arg Gln Glu Leu Lys Asp Ile Glu Asp Gln Ile Leu Tyr Arg
245 250 255
Leu Ser Ser Ser Glu Gly Asn Pro Val Asp Asp Met Glu Leu Ile
260 265 270
Lys Val Leu Glu Ala Ser Lys Met Lys Ala Ala Glu Ile Gln Ala
275 280 285
Lys Val Arg Ile Ala Glu Gln Thr Glu Lys Asp Ile Asp Leu Thr
290 295 300
Arg Met Glu Tyr Ile Pro Val Ala Ile Arg Thr Gln Ile Leu Phe
305 310 315
Phe Cys Val Ser Asp Leu Ala Asn Val Asp Pro Met Tyr Gln Tyr
320 325 330
Ser Leu Glu Trp Phe Leu Asn Ile Phe Leu Ser Gly Ile Ala Asn
335 340 345
Ser Glu Arg Ala Asp Asn Leu Lys Lys Arg Ile Ser Asn Ile Asn
350 355 360
Arg Tyr Leu Thr Tyr Ser Leu Tyr Ser Asn Val Cys Arg Ser Leu
365 370 375
Phe Glu Lys His Lys Leu Met Phe Ala Phe Leu Leu Cys Val Arg
380 385 390
Ile Met Met Asn Glu Gly Lys Ile Asn Gln Ser Glu Trp Arg Tyr
395 400 405
Leu Leu Ser Gly Gly Ser Ile Ser Ile Met Thr Glu Asn Pro Ala
410 415 420
Pro Asp Trp Leu Ser Asp Arg Ala Trp Arg Asp Ile Leu Ala Leu
425 430 435
Ser Asn Leu Pro Thr Phe Ser Ser Phe Ser Ser Asp Phe Val Lys
440 445 450
His Leu Ser Glu Phe Arg Val Ile Phe Asp Ser Leu Glu Pro His
455 460 465
Arg Glu Pro Leu Pro Gly Ile Trp Asp Gln Tyr Leu Asp Gln Phe
470 475 480
Gln Lys Leu Leu Val Leu Arg Cys Leu Arg Gly Asp Lys Val Thr
485 490 495
Asn Ala Met Gln Asp Phe Val Ala Thr Asn Leu Glu Pro Arg Phe
500 505 510
Ile Glu Pro Gln Thr Ala Asn Leu Ser Val Val Phe Lys Asp Ser
515 520 525
Asn Ser Thr Thr Pro Leu Ile Phe Val Leu Ser Pro Gly Thr Asp
530 535 540
Pro Ala Ala Asp Leu Tyr Lys Phe Ala Glu Glu Met Lys Phe Ser
545 550 555
Lys Lys Leu Ser Ala Ile Ser Leu Gly Gln Gly Gln Gly Pro Arg
560 565 570
Ala Glu Ala Met Met Arg Ser Ser Ile Glu Arg Gly Lys Trp Val
575 580 585
Phe Phe Gln Asn Cys His Leu Ala Pro Ser Trp Met Pro Ala Leu
590 595 600
Glu Arg Leu Ile Glu His Ile Asn Pro Asp Lys Val His Arg Asp
605 610 615
Phe Arg Leu Trp Leu Thr Ser Leu Pro Ser Asn Lys Phe Pro Val
620 625 630
Ser Ile Leu Gln Asn Gly Ser Lys Met Thr Ile Glu Pro Pro Arg
635 640 645
Gly Val Arg Ala Asn Leu Leu Lys Ser Tyr Ser Ser Leu Gly Glu
650 655 660
Asp Phe Leu Asn Ser Cys His Lys Val Met Glu Phe Lys Ser Leu
665 670 675
Leu Leu Ser Leu Cys Leu Phe His Gly Asn Ala Leu Glu Arg Arg
680 685 690
Lys Phe Gly Pro Leu Gly Phe Asn Ile Pro Tyr Glu Phe Thr Asp
695 700 705
Gly Asp Leu Arg Ile Cys Ile Ser Gln Leu Lys Met Phe Leu Asp
710 715 720
Glu Tyr Asp Asp Ile Pro Tyr Lys Val Leu Lys Tyr Thr Ala Gly
725 730 735
Glu Ile Asn Tyr Gly Gly Arg Val Thr Asp Asp Trp Asp Arg Arg
740 745 750
Cys Ile Met Asn Ile Leu Glu Asp Phe Tyr Asn Pro Asp Val Leu
755 760 765
Ser Pro Glu His Ser Tyr Ser Ala Ser Gly Ile Tyr His Gln Ile
770 775 780
Pro Pro Thr Tyr Asp Leu His Gly Tyr Leu Ser Tyr Ile Lys Ser
785 790 795
Leu Pro Leu Asn Asp Met Pro Glu Ile Phe Gly Leu His Asp Asn
800 805 810
Ala Asn Ile Thr Phe Ala Gln Asn Glu Thr Phe Ala Leu Leu Gly
815 820 825
Thr Ile Ile Gln Leu Gln Pro Lys Ser Ser Ser Ala Gly Ser Gln
830 835 840
Gly Arg Glu Glu Ile Val Glu Asp Val Thr Gln Asn Ile Leu Leu
845 850 855
Lys Val Pro Glu Pro Ile Asn Leu Gln Trp Val Met Ala Lys Tyr
860 865 870
Pro Val Leu Tyr Glu Glu Ser Met Asn Thr Val Leu Val Gln Glu
875 880 885
Val Ile Arg Tyr Asn Arg Leu Leu Gln Val Ile Thr Gln Thr Leu
890 895 900
Gln Asp Leu Leu Lys Ala Leu Lys Gly Leu Val Val Met Ser Ser
905 910 915
Gln Leu Glu Leu Met Ala Ala Ser Leu Tyr Asn Asn Thr Val Pro
920 925 930
Glu Leu Trp Ser Ala Lys Ala Tyr Pro Ser Leu Lys Pro Leu Ser
935 940 945
Ser Trp Val Met Asp Leu Leu Gln Arg Leu Asp Phe Leu Gln Ala
950 955 960
Trp Ile Gln Asp Gly Ile Pro Ala Val Phe Trp Ile Ser Gly Phe
965 970 975
Phe Phe Pro Gln Ala Cys Leu Asn Arg His Ser Ala Glu Phe Cys
980 985 990
Pro Gln Ile Cys His Leu His
995
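Each entry in this listing uses the WIPO ST.25 numeric identifiers: <210> is the SEQ ID NO, <211> the declared length, <212> the sequence type, <213> the organism, <220>/<221>/<223> the feature annotations, and <400> introduces the residues. Residues appear as three-letter codes, with rows of position numbers interleaved. The following is a minimal sketch, assuming each entry has already been split into the text lines shown here. It converts an entry to a one-letter string and checks that string against the <211> length. The names `parse_entry` and `check_length` are hypothetical helpers, not part of any published tool.

```python
import re

# Standard IUPAC three-letter to one-letter codes for the 20 amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Matches an ST.25 tag line such as "<211> 997" -> ("211", "997").
TAG = re.compile(r"^<(\d{3})>\s*(.*)$")


def parse_entry(lines):
    """Parse one ST.25-style protein entry into (fields, sequence).

    fields maps numeric tags ("210", "211", ...) to their text values.
    sequence is the one-letter string built from the lines after <400>;
    tokens made only of digits (the position-number rows) are skipped.
    An unrecognised residue code raises KeyError.
    """
    fields = {}
    residues = []
    in_sequence = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = TAG.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
            in_sequence = match.group(1) == "400"
            continue
        if not in_sequence:
            continue
        for token in line.split():
            if token.isdigit():
                continue  # position ruler such as "15" or "1005"
            residues.append(THREE_TO_ONE[token])
    return fields, "".join(residues)


def check_length(fields, sequence):
    """Return True if the parsed sequence matches the declared <211> length."""
    return int(fields["211"]) == len(sequence)


if __name__ == "__main__":
    # Only the final row of SEQ ID NO 13 is supplied here, so the length check fails.
    fragment = [
        "<210> 13",
        "<211> 521",
        "<400> 13",
        "Leu Asn Pro Tyr Lys Thr Leu Pro Ser Gln Ala",
        "515 520",
    ]
    fields, seq = parse_entry(fragment)
    print(seq)                        # LNPYKTLPSQA
    print(check_length(fields, seq))  # False (11 of 521 residues)
```

When a complete entry is passed in, for example SEQ ID NO 11 from "<210> 11" through its last residue row, the check should report True if the listing is internally consistent.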
<210> 12
<211> 1360
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 5500608CD1
<400> 12
Met Ala Lys Trp Thr Ile Leu His Leu Ala Asn Leu Ser Ser His
1 5 10 15
Leu Lys Thr Leu Ser Gln Gly Ser Tyr Leu Tyr Leu Lys Leu Thr
20 25 30
Phe Asp Leu Ile Glu Lys Gly Tyr Leu Val Leu Lys Ser Ser Ser
35 40 45
Tyr Lys Val Val Pro Val Ser Leu Ser Glu Val Tyr Leu Leu Gln
50 55 60
Cys Asn Met Lys Phe Pro Thr Gln Ser Ser Phe Asp Arg Val Met
65 70 75
Pro Leu Leu Asn Val Ala Val Ala Ser Leu His Pro Leu Thr Asp
80 85 90
Glu His Ile Phe Gln Ala Ile Asn Ala Gly Ser Ile Glu Gly Thr
95 100 105
Leu Glu Trp Glu Asp Phe Gln Gln Arg Met Glu Asn Leu Ser Met
110 115 120
Phe Leu Ile Lys Arg Arg Asp Met Thr Arg Met Phe Val His Pro
125 130 135
Ser Phe Arg Glu Trp Leu Ile Trp Arg Glu Glu Gly Glu Lys Thr
140 145 150
Lys Phe Leu Cys Asp Pro Arg Ser Gly His Thr Leu Leu Ala Phe
155 160 165
Trp Phe Ser Arg Gln Glu Gly Lys Leu Asn Arg Gln Gln Thr Ile
170 175 180
Glu Leu Gly His His Ile Leu Lys Ala His Ile Phe Lys Gly Leu
185 190 195
Ser Lys Lys Val Gly Val Ser Ser Ser Ile Leu Gln Gly Leu Trp
200 205 210
Ile Ser Tyr Ser Thr Glu Gly Leu Ser Met Ala Leu Ala Ser Leu
215 220 225
Arg Asn Leu Tyr Thr Pro Asn Ile Lys Val Ser Arg Leu Leu Ile
230 235 240
Leu Gly Gly Ala Asn Ile Asn Tyr Arg Thr Glu Val Leu Asn Asn
245 250 255
Ala Pro Ile Leu Cys Val Gln Ser His Leu Gly Tyr Thr Glu Met
260 265 270
Val Ala Leu Leu Leu Glu Phe Gly Ala Asn Val Asp Ala Ser Ser
275 280 285
Glu Ser Gly Leu Thr Pro Leu Gly Tyr Ala Ala Ala Ala Gly Tyr
290 295 300
Leu Ser Ile Val Val Leu Leu Cys Lys Lys Arg Ala Lys Val Asp
305 310 315
His Leu Asp Lys Asn Gly Gln Cys Ala Leu Val His Ala Ala Leu
320 325 330
Arg Gly His Leu Glu Val Val Lys Phe Leu Ile Gln Cys Asp Trp
335 340 345
Thr Met Ala Gly Gln Gln Gln Gly Val Phe Lys Lys Ser His Ala
350 355 360
Ile Gln Gln Ala Leu Ile Ala Ala Ala Ser Met Gly Tyr Thr Glu
365 370 375
Ile Val Ser Tyr Leu Leu Asp Leu Pro Glu Lys Asp Glu Glu Glu
380 385 390
Val Glu Arg Ala Gln Ile Asn Ser Phe Asp Ser Leu Trp Gly Glu
395 400 405
Thr Ala Leu Thr Ala Ala Ala Gly Arg Gly Lys Leu Glu Val Cys
410 415 420
Arg Leu Leu Leu Glu Gln Gly Ala Ala Val Ala Gln Pro Asn Arg
425 430 435
Arg Gly Ala Val Pro Leu Phe Ser Thr Val Arg Gln Gly His Trp
440 445 450
Gln Ile Val Asp Leu Leu Leu Thr His Gly Ala Asp Val Asn Met
455 460 465
Ala Asp Lys Gln Gly Arg Thr Pro Leu Met Met Ala Ala Ser Glu
470 475 480
Gly His Leu Gly Thr Val Asp Phe Leu Leu Ala Gln Gly Ala Ser
485 490 495
Ile Ala Leu Met Asp Lys Glu Gly Leu Thr Ala Leu Ser Trp Ala
500 505 510
Cys Leu Lys Gly His Leu Ser Val Val Arg Ser Leu Val Asp Asn
515 520 525
Gly Ala Ala Thr Asp His Ala Asp Lys Asn Gly Arg Thr Pro Leu
530 535 540
Asp Leu Ala Ala Phe Tyr Gly Asp Ala Glu Val Val Gln Phe Leu
545 550 555
Val Asp His Gly Ala Met Ile Glu His Val Asp Tyr Ser Gly Met
560 565 570
Arg Pro Leu Asp Arg Ala Val Gly Cys Arg Asn Thr Ser Val Val
575 580 585
Val Thr Leu Leu Lys Lys Gly Ala Lys Ile Gly Pro Ala Thr Trp
590 595 600
Ala Met Ala Thr Ser Lys Pro Asp Ile Met Ile Ile Leu Leu Ser
605 610 615
Lys Leu Met Glu Glu Gly Asp Met Phe Tyr Lys Lys Gly Lys Val
620 625 630
Lys Glu Ala Ala Gln Arg Tyr Gln Tyr Ala Leu Lys Lys Phe Pro
635 640 645
Arg Glu Gly Phe Gly Glu Asp Leu Lys Thr Phe Arg Glu Leu Lys
650 655 660
Val Ser Leu Leu Leu Asn Leu Ser Arg Cys Arg Arg Lys Met Asn
665 670 675
Asp Phe Gly Met Ala Glu Glu Phe Ala Thr Lys Ala Leu Glu Leu
680 685 690
Lys Pro Lys Ser Tyr Glu Ala Tyr Tyr Ala Arg Ala Arg Ala Lys
695 700 705
Arg Ser Ser Arg Gln Phe Ala Ala Ala Leu Glu Asp Leu Asn Glu
710 715 720
Ala Ile Lys Leu Cys Pro Asn Asn Arg Glu Ile Gln Arg Leu Leu
725 730 735
Leu Arg Val Glu Glu Glu Cys Arg Gln Met Gln Gln Pro Gln Gln
740 745 750
Pro Pro Pro Pro Pro Gln Pro Gln Gln Gln Leu Pro Glu Glu Ala
755 760 765
Glu Pro Glu Pro Gln His Glu Asp Ile Tyr Ser Val Gln Asp Ile
770 775 780
Phe Glu Glu Glu Tyr Leu Glu Gln Asp Val Glu Asn Val Ser Ile
785 790 795
Gly Leu Gln Thr Glu Ala Arg Pro Ser Gln Gly Leu Pro Val Ile
800 805 810
Gln Ser Pro Pro Ser Ser Pro Pro His Arg Asp Ser Ala Tyr Ile
815 820 825
Ser Ser Ser Pro Leu Gly Ser His Gln Val Phe Asp Phe Arg Ser
830 835 840
Ser Ser Ser Val Gly Ser Pro Thr Arg Gln Thr Tyr Gln Ser Thr
845 850 855
Ser Pro Ala Leu Ser Pro Thr His Gln Asn Ser His Tyr Arg Pro
860 865 870
Ser Pro Pro His Thr Ser Pro Ala His Gln Gly Gly Ser Tyr Arg
875 880 885
Phe Ser Pro Pro Pro Val Gly Gly Gln Gly Lys Glu Tyr Pro Ser
890 895 900
Pro Pro Pro Ser Pro Leu Arg Arg Gly Pro Gln Tyr Arg Ala Ser
905 910 915
Pro Pro Ala Glu Ser Met Ser Val Tyr Arg Ser Gln Ser Gly Ser
920 925 930
Pro Val Arg Tyr Gln Gln Glu Thr Ser Val Ser Gln Leu Pro Gly
935 940 945
Arg Pro Lys Ser Pro Leu Ser Lys Met Ala Gln Arg Pro Tyr Gln
950 955 960
Met Pro Gln Leu Pro Val Ala Val Pro Gln Gln Gly Leu Arg Leu
965 970 975
Gln Pro Ala Lys Ala Gln Ile Val Arg Ser Asn Gln Pro Ser Pro
980 985 990
Ala Val His Ser Ser Thr Val Ile Pro Thr Gly Ala Tyr Gly Gln
995 1000 1005
Val Ala His Ser Met Ala Ser Lys Tyr Gln Ser Ser Gln Gly Asp
1010 1015 1020
Ile Gly Val Ser Gln Ser Arg Leu Val Tyr Gln Gly Ser Ile Gly
1025 1030 1035
Gly Ile Val Gly Asp Gly Arg Pro Val Gln His Val Gln Ala Ser
1040 1045 1050
Leu Ser Ala Gly Ala Ile Cys Gln His Gly Gly Leu Thr Lys Glu
1055 1060 1065
Asp Leu Pro Gln Arg Pro Ser Ser Ala Tyr Arg Gly Gly Val Arg
1070 1075 1080
Tyr Ser Gln Thr Pro Gln Ile Gly Arg Ser Gln Ser Ala Ser Tyr
1085 1090 1095
Tyr Pro Val Cys His Ser Lys Leu Asp Leu Glu Arg Ser Ser Ser
1100 1105 1110
Gln Leu Gly Ser Pro Asp Val Ser His Leu Ile Arg Arg Pro Ile
1115 1120 1125
Ser Val Asn Pro Asn Glu Ile Lys Pro His Pro Pro Thr Pro Arg
1130 1135 1140
Pro Leu Leu His Ser Gln Ser Val Gly Leu Arg Phe Ser Pro Ser
1145 1150 1155
Ser Asn Ser Ile Ser Ser Thr Ser Asn Leu Thr Pro Thr Phe Arg
1160 1165 1170
Pro Ser Ser Ser Ile Gln Gln Met Glu Ile Pro Leu Lys Pro Ala
1175 1180 1185
Tyr Glu Arg Ser Cys Asp Glu Leu Ser Pro Val Ser Pro Thr Gln
1190 1195 1200
Gly Gly Tyr Pro Ser Glu Pro Thr Arg Ser Arg Thr Thr Pro Phe
1205 1210 1215
Met Gly Ile Ile Asp Lys Thr Ala Arg Thr Gln Gln Tyr Pro His
1220 1225 1230
Leu His Gln Gln Asn Arg Thr Trp Ala Val Ser Ser Val Asp Thr
1235 1240 1245
Val Leu Ser Pro Thr Ser Pro Gly Asn Leu Pro Gln Pro Glu Ser
1250 1255 1260
Phe Ser Pro Pro Ser Ser Ile Ser Asn Ile Ala Phe Tyr Asn Lys
1265 1270 1275
Thr Asn Asn Ala Gln Asn Gly His Leu Leu Glu Asp Asp Tyr Tyr
1280 1285 1290
Ser Pro His Gly Met Leu Ala Asn Gly Ser Arg Gly Asp Leu Leu
1295 1300 1305
Glu Arg Val Ser Gln Ala Ser Ser Tyr Pro Asp Val Lys Val Ala
1310 1315 1320
Arg Thr Leu Pro Val Ala Gln Ala Tyr Gln Asp Asn Leu Tyr Arg
1325 1330 1335
Gln Leu Ser Arg Asp Ser Arg Gln Gly Gln Thr Ser Pro Ile Lys
1340 1345 1350
Pro Lys Arg Pro Phe Val Glu Ser Asn Val
1355 1360
<210> 13
<211> 521
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 2962837CD1
<400> 13
Met Leu Pro Arg Arg Pro Leu Ala Trp Pro Ala Trp Leu Leu Arg
1 5 10 15
Gly Ala Pro Gly Ala Ala Gly Ser Trp Gly Arg Pro Val Gly Pro
20 25 30
Leu Ala Arg Arg Gly Cys Cys Ser Ala Pro Gly Thr Pro Glu Val
35 40 45
Pro Leu Thr Arg Glu Arg Tyr Pro Val Arg Arg Leu Pro Phe Ser
50 55 60
Thr Val Ser Lys Gln Asp Leu Ala Ala Phe Glu Arg Ile Val Pro
65 70 75
Gly Gly Val Val Thr Asp Pro Glu Ala Leu Gln Ala Pro Asn Val
80 85 90
Asp Trp Leu Arg Thr Leu Arg Gly Cys Ser Lys Val Leu Leu Arg
95 100 105
Pro Arg Thr Ser Glu Glu Val Ser His Ile Leu Arg His Cys His
110 115 120
Glu Arg Asn Leu Ala Val Asn Pro Gln Gly Gly Asn Thr Gly Met
125 130 135
Val Gly Gly Ser Val Pro Val Phe Asp Glu Ile Ile Leu Ser Thr
140 145 150
Ala Arg Met Asn Arg Val Leu Ser Phe His Ser Val Ser Gly Ile
155 160 165
Leu Val Cys Gln Ala Gly Cys Val Leu Glu Glu Leu Ser Arg Tyr
170 175 180
Val Glu Glu Arg Asp Phe Ile Met Pro Leu Asp Leu Gly Ala Lys
185 190 195
Gly Ser Cys His Ile Gly Gly Asn Val Ala Thr Asn Ala Gly Gly
200 205 210
Leu Arg Phe Leu Arg Tyr Gly Ser Leu His Gly Thr Val Leu Gly
215 220 225
Leu Glu Val Val Leu Ala Asp Gly Thr Val Leu Asp Cys Leu Thr
230 235 240
Ser Leu Arg Lys Asp Asn Thr Gly Tyr Asp Leu Lys Gln Leu Phe
245 250 255
Ile Gly Ser Glu Gly Thr Leu Gly Ile Ile Thr Thr Val Ser Ile
260 265 270
Leu Cys Pro Pro Lys Pro Arg Ala Val Asn Val Ala Phe Leu Gly
275 280 285
Cys Pro Gly Phe Ala Glu Val Leu Gln Thr Phe Ser Thr Cys Lys
290 295 300
Gly Met Leu Gly Glu Ile Leu Ser Ala Phe Glu Phe Met Asp Ala
305 310 315
Val Cys Met Gln Leu Val Gly Arg His Leu His Leu Ala Ser Pro
320 325 330
Val Gln Glu Ser Pro Phe Tyr Val Leu Ile Glu Thr Ser Gly Ser
335 340 345
Asn Ala Gly His Asp Ala Glu Lys Leu Gly His Phe Leu Glu His
350 355 360
Ala Leu Gly Ser Gly Leu Val Thr Asp Gly Thr Met Ala Thr Asp
365 370 375
Gln Arg Lys Val Lys Met Leu Trp Ala Leu Arg Glu Arg Ile Thr
380 385 390
Glu Ala Leu Ser Arg Asp Gly Tyr Val Tyr Lys Tyr Asp Leu Ser
395 400 405
Leu Pro Val Glu Arg Leu Tyr Asp Ile Val Thr Asp Leu Arg Ala
410 415 420
Arg Leu Gly Pro His Ala Lys His Val Val Gly Tyr Gly His Leu
425 430 435
Gly Asp Gly Asn Leu His Leu Asn Val Thr Ala Glu Ala Phe Ser
440 445 450
Pro Ser Leu Leu Ala Ala Leu Glu Pro His Val Tyr Glu Trp Thr
455 460 465
Ala Gly Gln Gln Gly Ser Val Ser Ala Glu His Gly Val Gly Phe
470 475 480
Arg Lys Arg Asp Val Leu Gly Tyr Ser Lys Pro Pro Gly Ala Leu
485 490 495
Gln Leu Met Gln Gln Leu Lys Ala Leu Leu Asp Pro Lys Gly Ile
500 505 510
Leu Asn Pro Tyr Lys Thr Leu Pro Ser Gln Ala
515 520
<210> 14
<211> 523
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 6961277CD1
<400> 14
Met Ser Arg Gln Phe Thr Cys Lys Ser Gly Ala Ala Ala Lys Gly
1 5 10 15
Gly Phe Ser Gly Cys Ser Ala Val Leu Ser Gly Gly Ser Ser Ser
20 25 30
Ser Phe Arg Ala Gly Ser Lys Gly Leu Ser Gly Gly Leu Gly Ser
35 40 45
Arg Ser Leu Tyr Ser Leu Gly Gly Val Arg Ser Leu Asn Val Ala
50 55 60
Ser Gly Ser Gly Lys Ser Gly Gly Tyr Gly Phe Gly Arg Gly Arg
65 70 75
Ala Ser Gly Phe Ala Gly Ser Met Phe Gly Ser Val Ala Leu Gly
80 85 90
Pro Val Cys Pro Thr Val Cys Pro Pro Gly Gly Ile His Gln Val
95 100 105
Thr Ile Asn Glu Ser Leu Leu Ala Pro Leu Asn Val Glu Leu Asp
110 115 120
Pro Lys Ile Gln Lys Val Arg Ala Gln Glu Arg Glu Gln Ile Lys
125 130 135
Ala Leu Asn Asn Lys Phe Ala Ser Phe Ile Asp Lys Val Arg Phe
140 145 150
Leu Glu Gln Gln Asn Gln Val Leu Glu Thr Lys Trp Glu Leu Leu
155 160 165
Gln Gln Leu Asp Leu Asn Asn Cys Lys Asn Asn Leu Glu Pro Ile
170 175 180
Leu Glu Gly Tyr Ile Ser Asn Leu Arg Lys Gln Leu Glu Thr Leu
185 190 195
Ser Gly Asp Arg Val Arg Leu Asp Ser Glu Leu Arg Asn Val Arg
200 205 210
Asp Val Val Glu Asp Tyr Lys Lys Arg Tyr Glu Glu Glu Ile Asn
215 220 225
Lys Arg Thr Ala Ala Glu Asn Glu Phe Val Leu Leu Lys Lys Asp
230 235 240
Val Asp Ala Ala Tyr Ala Asn Lys Val Glu Leu Gln Ala Lys Val
245 250 255
Glu Ser Met Asp Gln Glu Ile Lys Phe Phe Arg Cys Leu Phe Glu
260 265 270
Ala Glu Ile Thr Gln Ile Gln Ser His Ile Ser Asp Met Ser Val
275 280 285
Ile Leu Ser Met Asp Asn Asn Arg Asn Leu Asp Leu Asp Ser Ile
290 295 300
Ile Asp Glu Val Arg Thr Gln Tyr Glu Glu Ile Ala Leu Lys Ser
305 310 315
Lys Ala Glu Ala Glu Ala Leu Tyr Gln Thr Lys Phe Gln Glu Leu
320 325 330
Gln Leu Ala Ala Gly Arg His Gly Asp Asp Leu Lys Asn Thr Lys
335 340 345
Asn Glu Ile Ser Glu Leu Thr Arg Leu Ile Gln Arg Ile Arg Ser
350 355 360
Glu Ile Glu Asn Val Lys Lys Gln Ala Ser Asn Leu Glu Thr Ala
365 370 375
Ile Ala Asp Ala Glu Gln Arg Gly Asp Asn Ala Leu Lys Asp Ala
380 385 390
Arg Ala Lys Leu Asp Glu Leu Glu Gly Ala Leu His Gln Ala Lys
395 400 405
Glu Glu Leu Ala Arg Met Leu Arg Glu Tyr Gln Glu Leu Met Ser
410 415 420
Leu Lys Leu Ala Leu Asp Met Glu Ile Ala Thr Tyr Arg Lys Leu
425 430 435
Leu Glu Ser Glu Glu Cys Arg Met Ser Gly Glu Phe Pro Ser Pro
440 445 450
Val Ser Ile Ser Ile Ile Ser Ser Thr Ser Gly Gly Ser Val Tyr
455 460 465
Gly Phe Arg Pro Ser Met Val Ser Gly Gly Tyr Val Ala Asn Ser
470 475 480
Ser Asn Cys Ile Ser Gly Val Cys Ser Val Arg Gly Gly Glu Gly
485 490 495
Arg Ser Arg Gly Ser Ala Asn Asp Tyr Lys Asp Thr Leu Gly Lys
500 505 510
Gly Ser Ser Leu Ser Ala Pro Ser Lys Lys Thr Ser Arg
515 520
<210> 15
<211> 615
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 56022622CD1
<400> 15
Met Gly Gly Trp Lys Gly Pro Gly Gln Arg Arg Gly Lys Glu Gly
1 5 10 15
Pro Glu Ala Arg Arg Arg Ala Ala Glu Arg Gly Gly Gly Gly Gly
20 25 30
Gly Gly Gly Val Pro Ala Pro Arg Ser Pro Ala Arg Glu Pro Arg
35 40 45
Pro Arg Ser Cys Leu Leu Leu Pro Pro Pro Trp Gly Ala Ala Met
50 55 60
Thr Pro Asp Leu Leu Asn Phe Lys Lys Gly Trp Met Ser Ile Leu
65 70 75
Asp Glu Pro Gly Glu Pro Pro Ser Pro Ser Leu Thr Thr Thr Ser
80 85 90
Thr Ser Gln Trp Lys Lys His Trp Phe Val Leu Thr Asp Ser Ser
95 100 105
Leu Lys Tyr Tyr Arg Asp Ser Thr Ala Glu Glu Ala Asp Glu Leu
110 115 120
Asp Gly Glu Ile Asp Leu Arg Ser Cys Thr Asp Val Thr Glu Tyr
125 130 135
Ala Val Gln Arg Asn Tyr Gly Phe Gln Ile His Thr Lys Asp Ala
140 145 150
Val Tyr Thr Leu Ser Ala Met Thr Ser Gly Ile Arg Arg Asn Trp
155 160 165
Ile Glu Ala Leu Arg Lys Thr Val Arg Pro Thr Ser Ala Pro Asp
170 175 180
Val Thr Lys Leu Ser Asp Ser Asn Lys Glu Asn Ala Leu His Ser
185 190 195
Tyr Ser Thr Gln Lys Gly Pro Leu Lys Ala Gly Glu Gln Arg Ala
200 205 210
Gly Ser Glu Val Ile Ser Arg Gly Gly Pro Arg Lys Ala Asp Gly
215 220 225
Gln Arg Gln Ala Leu Asp Tyr Val Glu Leu Ser Pro Leu Thr Gln
230 235 240
Ala Ser Pro Gln Arg Ala Arg Thr Pro Ala Arg Thr Pro Asp Arg
245 250 255
Leu Ala Lys Gln Glu Glu Leu Glu Arg Asp Leu Ala Gln Arg Ser
260 265 270
Glu Glu Arg Arg Lys Trp Phe Glu Ala Thr Asp Ser Arg Thr Pro
275 280 285
Glu Val Pro Ala Gly Glu Gly Pro Arg Arg Gly Leu Gly Ala Pro
290 295 300
Leu Thr Glu Asp Gln Gln Asn Arg Leu Ser Glu Glu Ile Glu Lys
305 310 315
Lys Trp Gln Glu Leu Glu Lys Leu Pro Leu Arg Glu Asn Lys Arg
320 325 330
Val Pro Leu Thr Ala Leu Leu Asn Gln Ser Arg Gly Glu Arg Arg
335 340 345
Gly Pro Pro Ser Asp Gly His Glu Ala Leu Glu Lys Glu Glu Ala
350 355 360
Cys Glu Arg Ser Leu Ala Glu Met Glu Ser Ser His Gln Gln Val
365 370 375
Met Glu Glu Leu Gln Arg His His Glu Arg Glu Leu Gln Arg Leu
380 385 390
Gln Gln Glu Lys Glu Trp Leu Leu Ala Glu Glu Thr Ala Ala Thr
395 400 405
Ala Ser Ala Ile Glu Ala Met Lys Lys Ala Tyr Gln Glu Glu Leu
410 415 420
Ser Arg Glu Leu Ser Lys Thr Arg Ser Leu Gln Gln Gly Pro Asp
425 430 435
Gly Leu Arg Lys Gln His Gln Ser Asp Val Glu Ala Leu Lys Arg
440 445 450
Glu Leu Gln Val Leu Ser Glu Gln Tyr Ser Gln Lys Cys Leu Glu
455 460 465
Ile Gly Ala Leu Met Arg Gln Ala Glu Glu Arg Glu His Thr Leu
470 475 480
Arg Arg Cys Gln Gln Glu Gly Gln Glu Leu Leu Arg His Asn Gln
485 490 495
Glu Leu His Gly Arg Leu Ser Glu Glu Ile Asp Gln Leu Arg Gly
500 505 510
Phe Ile Ala Ser Gln Gly Met Gly Asn Gly Cys Gly Arg Ser Asn
515 520 525
Glu Arg Ser Ser Cys Glu Leu Glu Val Leu Leu Arg Val Lys Glu
530 535 540
Asn Glu Leu Gln Tyr Leu Lys Lys Glu Val Gln Cys Leu Arg Asp
545 550 555
Glu Leu Gln Met Met Gln Lys Asp Lys Arg Phe Thr Ser Gly Lys
560 565 570
Tyr Gln Asp Val Tyr Val Glu Leu Ser His Ile Lys Thr Arg Ser
575 580 585
Glu Arg Glu Ile Glu Gln Leu Lys Glu His Leu Arg Leu Ala Met
590 595 600
Ala Ala Leu Gln Glu Lys Glu Ser Met Arg Asn Ser Leu Ala Glu
605 610 615
<210> 16
<211> 875
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 542310CD1
<400> 16
Met Ser Arg His His Ser Arg Phe Glu Arg Asp Tyr Arg Val Gly
1 5 10 15
Trp Asp Arg Arg Glu Trp Ser Val Asn Gly Thr His Gly Thr Thr
20 25 30
Ser Ile Cys Ser Val Thr Ser Gly Ala Gly Gly Gly Thr Ala Ser
35 40 45
Ser Leu Ser Val Arg Pro Gly Leu Leu Pro Leu Pro Val Val Pro
50 55 60
Ser Arg Leu Pro Thr Pro Ala Thr Ala Pro Ala Pro Cys Thr Thr
65 70 75
Gly Ser Ser Glu Ala Ile Thr Ser Leu Val Ala Ser Ser Ala Ser
80 85 90
Ala Val Thr Thr Lys Ala Pro Gly Ile Ser Lys Gly Asp Ser Gln
95 100 105
Ser Gln Gly Leu Ala Thr Ser Ile Arg Trp Gly Gln Thr Pro Ile
110 115 120
Asn Gln Ser Thr Pro Trp Asp Thr Asp Glu Pro Pro Ser Lys Gln
125 130 135
Met Arg Glu Ser Asp Asn Pro Gly Thr Gly Pro Trp Val Thr Thr
140 145 150
Val Ala Ala Gly Asn Gln Pro Thr Leu Ile Ala His Ser Tyr Gly
155 160 165
Val Ala Gln Pro Pro Thr Phe Ser Pro Ala Val Asn Val Gln Ala
170 175 180
Pro Val Ile Gly Val Thr Pro Ser Leu Pro Pro His Val Gly Pro
185 190 195
Gln Leu Pro Leu Met Pro Gly His Tyr Ser Leu Pro Gln Pro Pro
200 205 210
Ser Gln Pro Leu Ser Ser Val Val Val Asn Met Pro Ala Gln Ala
215 220 225
Leu Tyr Ala Ser Pro Gln Pro Leu Ala Val Ser Thr Leu Pro Gly
230 235 240
Val Gly Gln Val Ala Arg Pro Gly Pro Thr Ala Val Gly Asn Gly
245 250 255
His Met Ala Gly Pro Leu Leu Pro Pro Pro Pro Pro Ala Gln Pro
260 265 270
Ser Ala Thr Leu Pro Ser Gly Ala Pro Ala Thr Asn Gly Pro Pro
275 280 285
Thr Thr Asp Ser Ala His Gly Leu Gln Met Leu Arg Thr Ile Gly
290 295 300
Val Gly Lys Tyr Glu Phe Thr Asp Pro Gly His Pro Arg Glu Met
305 310 315
Leu Lys Glu Leu Asn Gln Gln Arg Arg Ala Lys Ala Phe Thr Asp
320 325 330
Leu Lys Ile Val Val Glu Gly Arg Glu Phe Glu Val His Gln Asn
335 340 345
Val Leu Ala Ser Cys Ser Leu Tyr Phe Lys Asp Leu Ile Gln Arg
350 355 360
Ser Val Gln Asp Ser Gly Gln Gly Gly Arg Glu Lys Leu Glu Leu
365 370 375
Val Leu Ser Asn Leu Gln Ala Asp Val Leu Glu Leu Leu Leu Glu
380 385 390
Phe Val Tyr Thr Gly Ser Leu Val Ile Asp Ser Ala Asn Ala Lys
395 400 405
Thr Leu Leu Glu Ala Ala Ser Lys Phe Gln Phe His Thr Phe Cys
410 415 420
Lys Val Cys Val Ser Phe Leu Glu Lys Gln Leu Thr Ala Ser Asn
425 430 435
Cys Leu Gly Val Leu Ala Met Ala Glu Ala Met Gln Cys Ser Glu
440 445 450
Leu Tyr His Met Ala Lys Ala Phe Ala Leu Gln Ile Phe Pro Glu
455 460 465
Val Ala Ala Gln Glu Glu Ile Leu Ser Ile Ser Lys Asp Asp Phe
470 475 480
Ile Ala Tyr Val Ser Asn Asp Ser Leu Asn Thr Lys Ala Glu Glu
485 490 495
Leu Val Tyr Glu Thr Val Ile Lys Trp Ile Lys Lys Asp Pro Ala
500 505 510
Thr Arg Thr Gln Tyr Ala Ala Glu Leu Leu Ala Val Val Arg Leu
515 520 525
Pro Phe Ile His Pro Ser Tyr Leu Leu Asn Val Val Asp Asn Glu
530 535 540
Glu Leu Ile Lys Ser Ser Glu Ala Cys Arg Asp Leu Val Asn Glu
545 550 555
Ala Lys Arg Tyr His Met Leu Pro His Ala Arg Gln Glu Met Gln
560 565 570
Thr Pro Arg Thr Arg Pro Arg Leu Ser Ala Gly Val Ala Glu Val
575 580 585
Ile Val Leu Val Gly Gly Arg Gln Met Val Gly Met Thr Gln Arg
590 595 600
Ser Leu Val Ala Val Thr Cys Trp Asn Pro Gln Asn Asn Lys Trp
605 610 615
Tyr Pro Leu Ala Ser Leu Pro Phe Tyr Asp Arg Glu Phe Phe Ser
620 625 630
Val Val Ser Ala Gly Asp Asn Ile Tyr Leu Ser Gly Gly Met Glu
635 640 645
Ser Gly Val Thr Leu Ala Asp Val Trp Cys Tyr Met Ser Leu Leu
650 655 660
Asp Asn Trp Asn Leu Val Ser Arg Met Thr Val Pro Arg Cys Arg
665 670 675
His Asn Ser Leu Val Tyr Asp Gly Lys Ile Tyr Thr Leu Gly Gly
680 685 690
Leu Gly Val Ala Gly Asn Val Asp His Val Glu Arg Tyr Asp Thr
695 700 705
Ile Thr Asn Gln Trp Glu Ala Val Ala Pro Leu Pro Lys Ala Val
710 715 720
His Ser Ala Ala Ala Thr Val Cys Gly Gly Lys Ile Tyr Val Phe
725 730 735
Gly Gly Val Asn Glu Ala Gly Arg Ala Ala Gly Val Leu Gln Ser
740 745 750
Tyr Val Pro Gln Thr Asn Thr Trp Ser Phe Ile Glu Ser Pro Met
755 760 765
Ile Asp Asn Lys Tyr Ala Pro Ala Val Thr Leu Asn Gly Phe Val
770 775 780
Phe Ile Leu Gly Gly Ala Tyr Ala Arg Ala Thr Thr Ile Tyr Asp
785 790 795
Pro Glu Lys Gly Asn Ile Lys Ala Gly Pro Asn Met Asn His Ser
800 805 810
Arg Gln Phe Cys Ser Ala Val Val Leu Asp Gly Lys Ile Tyr Ala
815 820 825
Thr Gly Gly Ile Val Ser Ser Glu Gly Pro Ala Leu Gly Asn Met
830 835 840
Glu Ala Tyr Glu Pro Thr Thr Asn Thr Trp Thr Leu Leu Pro His
845 850 855
Met Pro Cys Pro Val Phe Arg His Gly Cys Val Val Ile Lys Lys
860 865 870
Tyr Ile Gln Ser Gly
875
<210> 17
<211> 405
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 1732825CD1
<400> 17
Met Asn Gly Ala Asn Leu Thr Ala Gln Asp Asp Arg Gly Cys Thr
1 5 10 15
Pro Leu His Leu Ala Ala Thr His Gly His Ser Phe Thr Leu Gln
20 25 30
Ile Met Leu Arg Ser Gly Val Asp Pro Ser Val Thr Asp Lys Arg
35 40 45
Glu Trp Arg Pro Val His Tyr Ala Ala Phe His Gly Arg Leu Gly
50 55 60
Cys Leu Gln Leu Leu Val Lys Trp Gly Cys Ser Ile Glu Asp Val
65 70 75
Asp Tyr Asn Gly Asn Leu Pro Val His Leu Ala Ala Met Glu Gly
80 85 90
His Leu His Cys Phe Lys Phe Leu Val Ser Arg Met Ser Ser Ala
95 100 105
Thr Gln Val Leu Lys Ala Phe Asn Asp Asn Gly Glu Asn Val Leu
110 115 120
Asp Leu Ala Gln Arg Phe Phe Lys Gln Asn Ile Leu Gln Phe Ile
125 130 135
Gln Gly Ala Glu Tyr Glu Gly Lys Asp Leu Glu Asp Gln Glu Thr
140 145 150
Leu Ala Phe Pro Gly His Val Ala Ala Phe Lys Gly Asp Leu Gly
155 160 165
Met Leu Lys Lys Leu Val Glu Asp Gly Val Ile Asn Ile Asn Glu
170 175 180
Arg Ala Asp Asn Gly Ser Thr Pro Met His Lys Ala Ala Gly Gln
185 190 195
Gly His Ile Glu Cys Leu Gln Trp Leu Ile Lys Met Gly Ala Asp
200 205 210
Ser Asn Ile Thr Asn Lys Ala Gly Glu Arg Pro Ser Asp Val Ala
215 220 225
Lys Arg Phe Ala His Leu Ala Ala Val Lys Leu Leu Glu Glu Leu
230 235 240
Gln Lys Tyr Asp Ile Asp Asp Glu Asn Glu Ile Asp Glu Asn Asp
245 250 255
Val Lys Tyr Phe Ile Arg His Gly Val Glu Gly Ser Thr Asp Ala
260 265 270
Lys Asp Asp Leu Cys Leu Ser Asp Leu Asp Lys Thr Asp Ala Arg
275 280 285
Met Arg Ala Tyr Lys Lys Ile Val Glu Leu Arg His Leu Leu Glu
290 295 300
Ile Ala Glu Ser Asn Tyr Lys His Leu Gly Gly Ile Thr Glu Glu
305 310 315
Asp Leu Lys Gln Lys Lys Glu Gln Leu Glu Ser Glu Lys Thr Ile
320 325 330
Lys Glu Leu Gln Gly Gln Leu Glu Tyr Glu Arg Leu Arg Arg Glu
335 340 345
Lys Leu Glu Cys Gln Leu Asp Glu Tyr Arg Ala Glu Val Asp Gln
350 355 360
Leu Arg Glu Thr Leu Glu Lys Ile Gln Val Pro Asn Phe Val Ala
365 370 375
Met Glu Asp Ser Ala Ser Cys Glu Ser Asn Lys Glu Lys Arg Arg
380 385 390
Val Lys Lys Lys Val Ser Ser Gly Gly Val Phe Val Arg Arg Tyr
395 400 405
<210> 18
<211> 2039
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 6170242CD1
<400> 18
Met Phe Asn Leu Met Lys Lys Asp Lys Asp Lys Asp Gly Gly Arg
1 5 10 15
Lys Glu Lys Lys Glu Lys Lys Glu Lys Lys Glu Arg Met Ser Ala
20 25 30
Ala Glu Leu Arg Ser Leu Glu Glu Met Ser Leu Arg Arg Gly Phe
35 40 45
Phe Asn Leu Asn Arg Ser Ser Lys Arg Glu Ser Lys Thr Arg Leu
50 55 60
Glu Ile Ser Asn Pro Ile Pro Ile Lys Val Ala Ser Gly Ser Asp
65 70 75
Leu His Leu Thr Asp Ile Asp Ser Asp Ser Asn Arg Gly Ser Val
80 85 90
Ile Leu Asp Ser Gly His Leu Ser Thr Ala Ser Ser Ser Asp Asp
95 100 105
Leu Lys Gly Glu Glu Gly Ser Phe Arg Gly Ser Val Leu Gln Arg
110 115 120
Ala Ala Lys Phe Gly Ser Leu Ala Lys Gln Asn Ser Gln Met Ile
125 130 135
Val Lys Arg Phe Ser Phe Ser Gln Arg Ser Arg Asp Glu Ser Ala
140 145 150
Ser Glu Thr Ser Thr Pro Ser Glu His Ser Ala Ala Pro Ser Pro
155 160 165
Gln Val Glu Val Arg Thr Leu Glu Gly Gln Leu Val Gln His Pro
170 175 180
Gly Pro Gly Ile Pro Arg Pro Gly His Arg Ser Arg Ala Pro Glu
185 190 195
Leu Val Thr Lys Lys Phe Pro Val Asp Leu Arg Leu Pro Pro Val
200 205 210
Val Pro Leu Pro Pro Pro Thr Leu Arg Glu Leu Glu Leu Gln Arg
215 220 225
Arg Pro Thr Gly Asp Phe Gly Phe Ser Leu Arg Arg Thr Thr Met
230 235 240
Leu Asp Arg Gly Pro Glu Gly Gln Ala Cys Arg Arg Val Val His
245 250 255
Phe Ala Glu Pro Gly Ala Gly Thr Lys Asp Leu Ala Leu Gly Leu
260 265 270
Val Pro Gly Asp Arg Leu Val Glu Ile Asn Gly His Asn Val Glu
275 280 285
Ser Lys Ser Arg Asp Glu Ile Val Glu Met Ile Arg Gln Ser Gly
290 295 300
Asp Ser Val Arg Leu Lys Val Gln Pro Ile Pro Glu Leu Ser Glu
305 310 315
Leu Ser Arg Ser Trp Leu Arg Ser Gly Glu Gly Pro Arg Arg Glu
320 325 330
Pro Ser Asp Ala Lys Thr Glu Glu Gln Ile Ala Ala Glu Glu Ala
335 340 345
Trp Asn Glu Thr Glu Lys Val Trp Leu Val His Arg Asp Gly Phe
350 355 360
Ser Leu Ala Ser Gln Leu Lys Ser Glu Glu Leu Asn Leu Pro Glu
365 370 375
Gly Lys Val Arg Val Lys Leu Asp His Asp Gly Ala Ile Leu Asp
380 385 390
Val Asp Glu Asp Asp Val Glu Lys Ala Asn Ala Pro Ser Cys Asp
395 400 405
Arg Leu Glu Asp Leu Ala Ser Leu Val Tyr Leu Asn Glu Ser Ser
410 415 420
Val Leu His Thr Leu Arg Gln Arg Tyr Gly Ala Ser Leu Leu His
425 430 435
Thr Tyr Ala Gly Pro Ser Leu Leu Val Leu Gly Pro Arg Gly Ala
440 445 450
Pro Ala Val Tyr Ser Glu Lys Val Met His Met Phe Lys Gly Cys
455 460 465
Arg Arg Glu Asp Met Ala Pro His Ile Tyr Ala Val Ala Gln Thr
470 475 480
Ala Tyr Arg Ala Met Leu Met Ser Arg Gln Asp Gln Ser Ile Ile
485 490 495
Leu Leu Gly Ser Ser Gly Ser Gly Lys Thr Thr Ser Cys Gln His
500 505 510
Leu Val Gln Tyr Leu Ala Thr Ile Ala Gly Ile Ser Gly Asn Lys
515 520 525
Val Phe Ser Val Glu Lys Trp Gln Ala Leu Tyr Thr Leu Leu Glu
530 535 540
Ala Phe Gly Asn Ser Pro Thr Ile Ile Asn Gly Asn Ala Thr Arg
545 550 555
Phe Ser Gln Ile Leu Ser Leu Asp Phe Asp Gln Ala Gly Gln Val
560 565 570
Ala Ser Ala Ser Ile Gln Thr Met Leu Leu Glu Lys Leu Arg Val
575 580 585
Ala Arg Arg Pro Ala Ser Glu Ala Thr Phe Asn Val Phe Tyr Tyr
590 595 600
Leu Leu Ala Cys Gly Asp Gly Thr Leu Arg Thr Glu Leu His Leu
605 610 615
Asn His Leu Ala Glu Asn Asn Val Phe Gly Ile Val Pro Leu Ala
620 625 630
Lys Pro Glu Glu Lys Gln Lys Ala Ala Gln Gln Phe Ser Lys Leu
635 640 645
Gln Ala Ala Met Lys Val Leu Gly Ile Ser Pro Asp Glu Gln Lys
650 655 660
Ala Cys Trp Phe Ile Leu Ala Ala Ile Tyr His Leu Gly Ala Ala
665 670 675
Gly Ala Thr Lys Glu Ala Ala Glu Ala Gly Arg Lys Gln Phe Ala
680 685 690
Arg His Glu Trp Ala Gln Lys Ala Ala Tyr Leu Leu Gly Cys Ser
695 700 705
Leu Glu Glu Leu Ser Ser Ala Ile Phe Lys His Gln His Lys Gly
710 715 720
Gly Thr Leu Gln Arg Ser Thr Ser Phe Arg Gln Gly Pro Glu Glu
725 730 735
Ser Gly Leu Gly Asp Gly Thr Gly Pro Lys Leu Ser Ala Leu Glu
740 745 750
Cys Leu Glu Gly Met Ala Ala Gly Leu Tyr Ser Glu Leu Phe Thr
755 760 765
Leu Leu Val Ser Leu Val Asn Arg Ala Leu Lys Ser Ser Gln His
770 775 780
Ser Leu Cys Ser Met Met Ile Val Asp Thr Pro Gly Phe Gln Asn
785 790 795
Pro Glu Gln Gly Gly Ser Ala Arg Gly Ala Ser Phe Glu Glu Leu
800 805 810
Cys His Asn Tyr Thr Gln Asp Arg Leu Gln Arg Leu Phe His Glu
815 820 825
Arg Thr Phe Val Gln Glu Leu Glu Arg Tyr Lys Glu Glu Asn Ile
830 835 840
Glu Leu Ala Phe Asp Asp Leu Glu Pro Pro Thr Asp Asp Ser Val
845 850 855
Ala Ala Val Asp Gln Ala Ser His Gln Ser Leu Val Arg Ser Leu
860 865 870
Ala Arg Thr Asp Glu Ala Arg Gly Leu Leu Trp Leu Leu Glu Glu
875 880 885
Glu Ala Leu Val Pro Gly Ala Ser Glu Asp Thr Leu Leu Glu Arg
890 895 900
Leu Phe Ser Tyr Tyr Gly Pro Gln Glu Gly Asp Lys Lys Gly Gln
905 910 915
Ser Pro Leu Leu His Ser Ser Lys Pro His His Phe Leu Leu Gly
920 925 930
His Ser His Gly Thr Asn Trp Val Glu Tyr Asn Val Thr Gly Trp
935 940 945
Leu Asn Tyr Thr Lys Gln Asn Pro Ala Thr Gln Asn Val Pro Arg
950 955 960
Leu Leu Gln Asp Ser Gln Lys Lys Ile Ile Ser Asn Leu Phe Leu
965 970 975
Gly Arg Ala Gly Ser Ala Thr Val Leu Ser Gly Ser Ile Ala Gly
980 985 990
Leu Glu Gly Gly Ser Gln Leu Ala Leu Arg Arg Ala Thr Ser Met
995 1000 1005
Arg Lys Thr Phe Thr Thr Gly Met Ala Ala Val Lys Lys Lys Ser
1010 1015 1020
Leu Cys Ile Gln Met Lys Leu Gln Val Asp Ala Leu Ile Asp Thr
1025 1030 1035
Ile Lys Lys Ser Lys Leu His Phe Val His Cys Phe Leu Pro Val
1040 1045 1050
Ala Glu Gly Trp Ala Gly Glu Pro Arg Ser Ala Ser Ser Arg Arg
1055 1060 1065
Val Ser Ser Ser Ser Glu Leu Asp Leu Pro Ser Gly Asp His Cys
1070 1075 1080
Glu Ala Gly Leu Leu Gln Leu Asp Val Pro Leu Leu Arg Thr Gln
1085 1090 1095
Leu Arg Gly Ser Arg Leu Leu Asp Ala Met Arg Met Tyr Arg Gln
1100 1105 1110
Gly Tyr Pro Asp His Met Val Phe Ser Glu Phe Arg Arg Arg Phe
1115 1120 1125
Asp Val Leu Ala Pro His Leu Thr Lys Lys His Gly Arg Asn Tyr
1130 1135 1140
Ile Val Val Asp Glu Arg Arg Ala Val Glu Glu Leu Leu Glu Cys
1145 1150 1155
Leu Asp Leu Glu Lys Ser Ser Cys Cys Met Gly Leu Ser Arg Val
1160 1165 1170
Phe Phe Arg Ala Gly Thr Leu Ala Arg Leu Glu Glu Gln Arg Asp
1175 1180 1185
Glu Gln Thr Ser Arg Asn Leu Thr Leu Phe Gln Ala Ala Cys Arg
1190 1195 1200
Gly Tyr Leu Ala Arg Gln His Phe Lys Lys Arg Lys Ile Gln Asp
1205 1210 1215
Leu Ala Ile Arg Cys Val Gln Lys Asn Ile Lys Lys Asn Lys Gly
1220 1225 1230
Val Lys Asp Trp Pro Trp Trp Lys Leu Phe Thr Thr Val Arg Pro
1235 1240 1245
Leu Ile Glu Val Gln Leu Ser Glu Glu Gln Ile Arg Asn Lys Asp
1250 1255 1260
Glu Glu Ile Gln Gln Leu Arg Ser Lys Leu Glu Lys Ala Glu Lys
1265 1270 1275
Glu Arg Asn Glu Leu Arg Leu Asn Ser Asp Arg Leu Glu Ser Arg
1280 1285 1290
Ile Ser Glu Leu Thr Ser Glu Leu Thr Asp Glu Arg Asn Thr Gly
1295 1300 1305
Glu Ser Ala Ser Gln Leu Leu Asp Ala Glu Thr Ala Glu Arg Leu
1310 1315 1320
Arg Ala Glu Lys Glu Met Lys Glu Leu Gln Thr Gln Tyr Asp Ala
1325 1330 1335
Leu Lys Lys Gln Met Glu Val Met Glu Met Glu Val Met Glu Ala
1340 1345 1350
Arg Leu Ile Arg Ala Ala Glu Ile Asn Gly Glu Val Asp Asp Asp
1355 1360 1365
Asp Ala Gly Gly Glu Trp Arg Leu Lys Tyr Glu Arg Ala Val Arg
1370 1375 1380
Glu Val Asp Phe Thr Lys Lys Arg Leu Gln Gln Glu Phe Glu Asp
1385 1390 1395
Lys Leu Glu Val Glu Gin Gin Asn Lys Arg Gin Leu Glu Arg Arg
1400 1405 1410
Leu Gly Asp Leu Gin Ala Asp Ser Glu Glu Ser Gin Arg Ala Leu
1415 1420 1425
Gin Gin Leu Lys Lys Lys Cys Gin Arg Leu Thr Ala Glu Leu Gin
1430 1435 1440
Asp Thr Lys Leu His Leu Glu Gly Gin Gin Val Arg Asn His Glu
1445 1450 1455
Leu Glu Lys Lys Gin Arg Arg Phe Asp Ser Glu Leu Ser Gin Ala
1460 1465 1470
His Glu Glu Ala Gin Arg Glu Lys Leu Gin Arg Glu Lys Leu Gin
1475 1480 1485
Arg Glu Lys Asp Met Leu Leu Ala Glu Ala Phe Ser Leu Lys Gin
1490 1495 1500
Gin Leu Glu Glu Lys Asp Met Asp He Ala Gly Phe Thr Gin Lys
1505 1510 1515
Val Val Ser Leu Glu Ala Glu Leu Gin Asp He Ser Ser Gin Glu
1520 1525 1530
Ser Lys Asp Glu Ala Ser Leu Ala Lys Val Lys Lys Gin Leu Arg
1535 1540 1545
Asp Leu Glu Ala Lys Val Lys Asp Gin Glu Glu Glu Leu Asp Glu
1550 1555 1560
Gin Ala Gly Thr He Gin Met Leu Glu Gin Ala Lys Leu Arg Leu
1565 1570 1575
Glu Met Glu Met Glu Arg Met Arg Gin Thr His Ser Lys Glu Met
1580 1585 1590
Glu Ser Arg Asp Glu Glu Val Glu Glu Ala Arg Gin Ser Cys Gin
1595 1600 1605
Lys Lys Leu Lys Gin Met Glu Val Gin Leu Glu Glu Glu Tyr Glu
1610 1615 1620
Asp Lys Gin Lys Val Leu Arg Glu Lys Arg Glu Leu Glu Gly Lys
1625 1630 1635
Leu Ala Thr Leu Ser Asp Gin Val Asn Arg Arg Asp Phe Glu Ser
1640 1645 1650
Glu Lys Arg Leu Arg Lys Asp Leu Lys Arg Thr Lys Ala Leu Leu
1655 1660 1665
Ala Asp Ala Gin Leu Met Leu Asp His Leu Lys Asn Ser Ala Pro
1670 1675 1680
Ser Lys Arg Glu He Ala Gin Leu Lys Asn Gin Leu Glu Glu Ser
1685 1690 1695
Glu Phe Thr Cys Ala Ala Ala Val Lys Ala Arg Lys Ala Met Glu
1700 1705 1710
Val Glu He Glu Asp Leu His Leu Gin He Asp Asp He Ala Lys
1715 1720 1725
Ala Lys Thr Ala Leu Glu Glu Gin Leu Ser Arg Leu Gin Arg Glu
1730 1735 1740
Lys Asn Glu He Gin Asn Arg Leu Glu Glu Asp Gin Glu Asp Met
1745 1750 1755
Asn Glu Leu Met Lys Lys His Lys Ala Ala Val Ala Gin Ala Ser
1760 1765 1770
Arg Asp Leu Ala Gin He Asn Asp Leu Gin Ala Gin Leu Glu Glu
1775 1780 1785
Ala Asn Lys Glu Lys Gin Glu Leu Gin Glu Lys Leu Gin Ala Leu
1790 1795 1800
Gin Ser Gin Val Glu Phe Leu Glu Gin Ser Met Val Asp Lys Ser
1805 1810 1815
Leu Val Ser Arg Gin Glu Ala Lys He Arg Glu Leu Glu Thr Arg
1820 1825 1830
38/86 Leu Glu Phe Glu Arg Thr Gin Val Lys Arg Leu Glu Ser Leu Ala
1835 1840 1845
Ser Arg Leu Lys Glu Asn Met Glu Lys Leu Thr Glu Glu Arg Asp
1850 1855 1860
Gin Arg He Ala Ala Glu Asn Arg Glu Lys Glu Gin Asn Lys Arg
1865 1870 1875
Leu Gin Arg Gin Leu Arg Asp Thr Lys Glu Glu Met Gly Glu Leu
1880 1885 1890
Ala Arg Lys Glu Ala Glu Ala Ser Arg Lys Lys His Glu Leu Glu
1895 1900 1905
Met Asp Leu Glu Ser Leu Glu Ala Ala Asn Gin Ser Leu Gin Ala
1910 1915 1920
Asp Leu Lys Leu Ala Phe Lys Arg He Gly Asp Leu Gin Ala Ala
1925 1930 1935
He Glu Asp Glu Met Glu Ser Asp Glu Asn Glu Asp Leu He Asn
1940 1945 1950
Ser Glu Gly Asp Ser Asp Val Asp Ser Glu Leu Glu Asp Arg Val
1955 1960 1965
Asp Gly Val Lys Ser Trp Leu Ser Lys Asn Lys Gly Pro Ser Lys
1970 1975 1980
Ala Ala Ser Asp Asp Gly Ser Leu Lys Ser Ser Ser Pro Thr Ser
1985 1990 1995
Tyr Trp Lys Ser Leu Ala Pro Asp Arg Ser Asp Asp Glu His Asp
2000 2005 2010
Pro Leu Asp Asn Thr Ser Arg Pro Arg Tyr Ser His Ser Tyr Leu
2015 2020 2025
Ser Asp Ser Asp Thr Glu Ala Lys Leu Thr Glu Thr Asn Ala
2030 2035
<210> 19
<211> 191
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 2287640CD1
<400> 19
Met Gly Ile Leu Tyr Ser Glu Pro Ile Cys Gln Ala Ala Tyr Gln
1 5 10 15
Asn Asp Phe Gly Gln Val Trp Arg Trp Val Lys Glu Asp Ser Ser
20 25 30
Tyr Ala Asn Val Gln Asp Gly Phe Asn Gly Asp Thr Pro Leu Ile
35 40 45
Cys Ala Cys Arg Arg Gly His Val Arg Ile Val Ser Phe Leu Leu
50 55 60
Arg Arg Asn Ala Asn Val Asn Leu Lys Asn Gln Lys Glu Arg Thr
65 70 75
Cys Leu His Tyr Ala Val Lys Lys Lys Phe Thr Phe Ile Asp Tyr
80 85 90
Leu Leu Ile Ile Leu Leu Met Pro Val Leu Leu Ile Gly Tyr Phe
95 100 105
Leu Met Val Ser Lys Thr Lys Gln Asn Glu Ala Leu Val Arg Met
110 115 120
Leu Leu Asp Ala Gly Val Glu Val Asn Ala Thr Asp Cys Tyr Gly
125 130 135
Cys Thr Ala Leu His Tyr Ala Cys Glu Met Lys Asn Gln Ser Leu
140 145 150
Ile Pro Leu Leu Leu Glu Ala Arg Ala Asp Pro Thr Ile Lys Asn
155 160 165
Lys His Gly Glu Ser Ser Leu Asp Ile Ala Arg Arg Leu Lys Phe
170 175 180
Ser Gln Ile Glu Leu Met Leu Arg Lys Ala Leu
185 190
<210> 20
<211> 887
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 1990526CD1
<400> 20
Met Pro Ser Leu Pro Gln Glu Gly Val Ile Gln Gly Pro Ser Pro
1 5 10 15
Leu Asp Leu Asn Thr Glu Leu Pro Tyr Gln Ser Thr Met Lys Arg
20 25 30
Lys Val Arg Lys Lys Lys Lys Lys Gly Thr Ile Thr Ala Asn Val
35 40 45
Ala Gly Thr Lys Phe Glu Ile Val Arg Leu Val Ile Asp Glu Met
50 55 60
Gly Phe Met Lys Thr Pro Asp Glu Asp Glu Thr Ser Asn Leu Ile
65 70 75
Trp Cys Asp Ser Ala Val Gln Gln Glu Lys Ile Ser Glu Leu Gln
80 85 90
Asn Tyr Gln Arg Ile Asn His Phe Pro Gly Met Gly Glu Ile Cys
95 100 105
Arg Lys Asp Phe Leu Ala Arg Asn Met Thr Lys Met Ile Lys Ser
110 115 120
Arg Pro Leu Asp Tyr Thr Phe Val Pro Arg Thr Trp Ile Phe Pro
125 130 135
Ala Glu Tyr Thr Gln Phe Gln Asn Tyr Val Lys Glu Leu Lys Lys
140 145 150
Lys Arg Lys Gln Lys Thr Phe Ile Val Lys Pro Ala Asn Gly Ala
155 160 165
Met Gly His Gly Ile Ser Leu Ile Arg Asn Gly Asp Lys Leu Pro
170 175 180
Ser Gln Asp His Leu Ile Val Gln Glu Tyr Ile Glu Lys Pro Phe
185 190 195
Leu Met Glu Gly Tyr Lys Phe Asp Leu Arg Ile Tyr Ile Leu Val
200 205 210
Thr Ser Cys Asp Pro Leu Lys Ile Phe Leu Tyr His Asp Gly Leu
215 220 225
Val Arg Met Gly Thr Glu Lys Tyr Ile Pro Pro Asn Glu Ser Asn
230 235 240
Leu Thr Gln Leu Tyr Met His Leu Thr Asn Tyr Ser Val Asn Lys
245 250 255
His Asn Glu His Phe Glu Arg Asp Glu Thr Glu Asn Lys Gly Ser
260 265 270
Lys Arg Ser Ile Lys Trp Phe Thr Glu Phe Leu Gln Ala Asn Gln
275 280 285
His Asp Val Ala Lys Phe Trp Ser Asp Ile Ser Glu Leu Val Val
290 295 300
Lys Thr Leu Ile Val Ala Glu Pro His Val Leu His Ala Tyr Arg
305 310 315
Met Cys Arg Pro Gly Gln Pro Pro Gly Ser Glu Ser Val Cys Phe
320 325 330
Glu Val Leu Gly Phe Asp Ile Leu Leu Asp Arg Lys Leu Lys Pro
335 340 345
Trp Leu Leu Glu Ile Asn Arg Ala Pro Ser Phe Gly Thr Asp Gln
350 355 360
Lys Ile Asp Tyr Asp Val Lys Arg Gly Val Leu Leu Asn Ala Leu
365 370 375
Lys Leu Leu Asn Ile Arg Thr Ser Asp Lys Arg Arg Asn Leu Ala
380 385 390
Lys Gln Lys Ala Glu Ala Gln Arg Arg Leu Tyr Gly Gln Asn Ser
395 400 405
Ile Lys Arg Leu Leu Pro Gly Ser Ser Asp Trp Glu Gln Gln Arg
410 415 420
His Gln Leu Glu Arg Arg Lys Glu Glu Leu Lys Glu Arg Leu Ala
425 430 435
Gln Val Arg Lys Gln Ile Ser Arg Glu Glu His Glu Asn Arg His
440 445 450
Met Gly Asn Tyr Arg Arg Ile Tyr Pro Pro Glu Asp Lys Ala Leu
455 460 465
Leu Glu Lys Tyr Glu Asn Leu Leu Ala Val Ala Phe Gln Thr Phe
470 475 480
Leu Ser Gly Arg Ala Ala Ser Phe Gln Arg Glu Leu Asn Asn Pro
485 490 495
Leu Lys Arg Met Lys Glu Glu Asp Ile Leu Asp Leu Leu Glu Gln
500 505 510
Cys Glu Ile Asp Asp Glu Lys Leu Met Gly Lys Thr Thr Lys Thr
515 520 525
Arg Gly Pro Lys Pro Leu Cys Ser Met Pro Glu Ser Thr Glu Ile
530 535 540
Met Lys Arg Pro Lys Tyr Cys Ser Ser Asp Ser Ser Tyr Asp Ser
545 550 555
Ser Ser Ser Ser Ser Glu Ser Asp Glu Asn Glu Lys Glu Glu Tyr
560 565 570
Gln Asn Lys Lys Arg Glu Lys Gln Val Thr Tyr Asn Leu Lys Pro
575 580 585
Ser Asn His Tyr Lys Leu Ile Gln Gln Pro Ser Ser Ile Arg Arg
590 595 600
Ser Val Ser Cys Pro Arg Ser Ile Ser Ala Gln Ser Pro Ser Ser
605 610 615
Gly Asp Thr Arg Pro Phe Ser Ala Gln Gln Met Ile Ser Val Ser
620 625 630
Arg Pro Thr Ser Ala Ser Arg Ser His Ser Leu Asn Arg Ala Ser
635 640 645
Ser Tyr Met Arg His Leu Pro His Ser Asn Asp Ala Cys Ser Thr
650 655 660
Asn Ser Gln Val Ser Glu Ser Leu Arg Gln Leu Lys Thr Lys Glu
665 670 675
Gln Glu Asp Asp Leu Thr Ser Gln Thr Leu Phe Val Leu Lys Asp
680 685 690
Met Lys Ile Arg Phe Pro Gly Lys Ser Asp Ala Glu Ser Glu Leu
695 700 705
Leu Ile Glu Asp Ile Ile Asp Asn Trp Lys Tyr His Lys Thr Lys
710 715 720
Val Ala Ser Tyr Trp Leu Ile Lys Leu Asp Ser Val Lys Gln Arg
725 730 735
Lys Val Leu Asp Ile Val Lys Thr Ser Ile Arg Thr Val Leu Pro
740 745 750
Arg Ile Trp Lys Val Pro Asp Val Glu Glu Val Asn Leu Tyr Arg
755 760 765
Ile Phe Asn Arg Val Phe Asn Arg Leu Leu Trp Ser Arg Gly Gln
770 775 780
Gly Leu Trp Asn Cys Phe Cys Asp Ser Gly Ser Ser Trp Glu Ser
785 790 795
Ile Phe Asn Lys Ser Pro Glu Val Val Thr Pro Leu Gln Leu Gln
800 805 810
Cys Cys Gln Arg Leu Val Glu Leu Cys Lys Gln Cys Leu Leu Val
815 820 825
Val Tyr Lys Tyr Ala Thr Asp Lys Arg Gly Ser Leu Ser Gly Ile
830 835 840
Gly Pro Asp Trp Gly Asn Ser Arg Tyr Leu Leu Pro Gly Ser Thr
845 850 855
Gln Phe Phe Leu Arg Thr Pro Thr Tyr Asn Leu Lys Tyr Asn Ser
860 865 870
Pro Gly Met Thr Arg Ser Asn Val Leu Phe Thr Ser Arg Tyr Gly
875 880 885
His Leu
<210> 21
<211> 423
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 3742459CD1
<400> 21
Met Asn Ala Leu Leu Leu Ser Ala Trp Phe Gly His Leu Arg Ile
1 5 10 15
Leu Gln Ile Leu Val Asn Ser Gly Ala Lys Ile His Cys Glu Ser
20 25 30
Lys Asp Gly Leu Thr Leu Leu His Cys Ala Ala Gln Lys Gly His
35 40 45
Val Pro Val Leu Ala Phe Ile Met Glu Asp Leu Glu Asp Val Ala
50 55 60
Leu Asp His Val Asp Lys Leu Gly Arg Thr Ala Phe His Arg Ala
65 70 75
Ala Glu His Gly Gln Leu Asp Ala Leu Asp Phe Leu Val Gly Ser
80 85 90
Gly Cys Asp His Asn Val Lys Asp Lys Glu Gly Asn Thr Ala Leu
95 100 105
His Leu Ala Ala Gly Arg Gly His Met Ala Val Leu Gln Arg Leu
110 115 120
Val Asp Ile Gly Leu Asp Leu Glu Glu Gln Asn Ala Glu Gly Leu
125 130 135
Thr Ala Leu His Ser Ala Ala Gly Gly Ser His Pro Asp Cys Val
140 145 150
Gln Leu Leu Leu Arg Ala Gly Ser Thr Val Asn Ala Leu Thr Gln
155 160 165
Lys Asn Leu Ser Cys Leu His Tyr Ala Ala Leu Ser Gly Ser Glu
170 175 180
Asp Val Ser Arg Val Leu Ile His Ala Gly Gly Cys Ala Asn Val
185 190 195
Val Asp His Gln Gly Ala Ser Pro Leu His Leu Ala Val Arg His
200 205 210
Asn Phe Pro Ala Leu Val Arg Leu Leu Ile Asn Ser Asp Ser Asp
215 220 225
Val Asn Ala Val Asp Asn Arg Gln Gln Thr Pro Leu His Leu Ala
230 235 240
Ala Glu His Ala Trp Gln Asp Ile Ala Asp Met Leu Leu Ile Ala
245 250 255
Gly Val Asp Leu Asn Leu Arg Asp Lys Gln Gly Lys Thr Ala Leu
260 265 270
Ala Val Ala Val Arg Ser Asn His Val Ser Leu Val Asp Met Ile
275 280 285
Ile Lys Ala Asp Arg Phe Tyr Arg Trp Glu Lys Asp His Pro Ser
290 295 300
Asp Pro Ser Gly Lys Ser Leu Ser Phe Lys Gln Asp His Arg Gln
305 310 315
Glu Thr Gln Gln Leu Arg Ser Val Leu Trp Arg Leu Ala Ser Arg
320 325 330
Tyr Leu Gln Pro Arg Glu Trp Lys Lys Leu Ala Tyr Ser Trp Glu
335 340 345
Phe Thr Glu Ala His Val Asp Ala Ile Glu Gln Gln Trp Thr Gly
350 355 360
Thr Arg Ser Tyr Gln Glu His Gly His Arg Met Leu Leu Ile Trp
365 370 375
Leu His Gly Val Ala Thr Ala Gly Glu Asn Pro Ser Lys Ala Leu
380 385 390
Phe Glu Gly Leu Val Ala Ile Gly Arg Arg Asp Leu Ala Glu Asn
395 400 405
Ile Arg Lys Lys Ala Asn Ala Ala Pro Ser Ala Pro Arg Arg Cys
410 415 420
Thr Ala Met
<210> 22
<211> 916
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 7468507CD1
<400> 22
Met Glu Val Glu Ser Leu Asn Lys Met Leu Glu Glu Leu Arg Leu
1 5 10 15
Glu Arg Lys Lys Leu Ile Glu Asp Tyr Glu Gly Lys Leu Asn Lys
20 25 30
Ala Gln Ser Phe Tyr Glu Arg Glu Leu Asp Thr Leu Lys Arg Ser
35 40 45
Gln Leu Phe Thr Ala Glu Ser Leu Gln Ala Ser Lys Glu Lys Glu
50 55 60
Ala Asp Leu Arg Lys Glu Phe Gln Gly Gln Glu Ala Ile Leu Arg
65 70 75
Lys Thr Ile Gly Lys Leu Lys Thr Glu Leu Gln Met Val Gln Asp
80 85 90
Glu Ala Gly Ser Leu Leu Asp Lys Cys Gln Lys Leu Gln Thr Ala
95 100 105
Leu Ala Ile Ala Glu Asn Asn Val Gln Val Leu Gln Lys Gln Leu
110 115 120
Asp Asp Ala Lys Glu Gly Glu Met Ala Leu Leu Ser Lys His Lys
125 130 135
Glu Val Glu Ser Glu Leu Ala Ala Ala Arg Glu Arg Leu Gln Gln
140 145 150
Gln Ala Ser Asp Leu Val Leu Lys Ala Ser His Ile Gly Met Leu
155 160 165
Gln Ala Thr Gln Met Thr Gln Glu Val Thr Ile Lys Asp Leu Glu
170 175 180
Ser Glu Lys Ser Arg Val Asn Glu Arg Leu Ser Gln Leu Glu Glu
185 190 195
Glu Arg Ala Phe Leu Arg Ser Lys Thr Gln Ser Leu Asp Glu Glu
200 205 210
Gln Lys Gln Gln Ile Leu Glu Leu Glu Lys Lys Val Asn Glu Ala
215 220 225
Lys Arg Thr Gln Gln Glu Tyr Tyr Glu Arg Glu Leu Lys Asn Leu
230 235 240
Gln Ser Arg Leu Glu Glu Glu Val Thr Gln Leu Asn Glu Ala His
245 250 255
Ser Lys Thr Leu Glu Glu Leu Ala Trp Lys His His Met Ala Ile
260 265 270
Glu Ala Val His Ser Asn Ala Ile Arg Asp Lys Lys Lys Leu Gln
275 280 285
Met Asp Leu Glu Glu Gln His Asn Lys Asp Lys Leu Asn Leu Glu
290 295 300
Glu Asp Lys Asn Gln Leu Gln Gln Glu Leu Glu Asn Leu Lys Glu
305 310 315
Val Leu Glu Asp Lys Leu Asn Thr Ala Asn Gln Glu Ile Gly His
320 325 330
Leu Gln Asp Met Val Arg Lys Ser Glu Gln Gly Leu Gly Ser Ala
335 340 345
Glu Gly Leu Ile Ala Ser Leu Gln Asp Ser Gln Glu Arg Leu Gln
350 355 360
Asn Glu Leu Asp Leu Thr Lys Asp Ser Leu Lys Glu Thr Lys Asp
365 370 375
Ala Leu Leu Asn Val Glu Gly Glu Leu Glu Gln Glu Arg Gln Gln
380 385 390
His Glu Glu Thr Ile Ala Ala Met Lys Glu Glu Glu Lys Leu Lys
395 400 405
Val Asp Lys Met Ala His Asp Leu Glu Ile Lys Trp Thr Glu Asn
410 415 420
Leu Arg Gln Glu Cys Ser Lys Leu Arg Glu Glu Leu Arg Leu Gln
425 430 435
His Glu Glu Asp Lys Lys Ser Ala Met Ser Gln Leu Leu Gln Leu
440 445 450
Lys Asp Arg Glu Lys Asn Ala Ala Arg Asp Ser Trp Gln Lys Lys
455 460 465
Val Glu Asp Leu Leu Asn Gln Ile Ser Leu Leu Lys Gln Asn Leu
470 475 480
Glu Ile Gln Leu Ser Gln Ser Gln Thr Ser Leu Gln Gln Leu Gln
485 490 495
Ala Gln Phe Thr Gln Glu Arg Gln Arg Leu Thr Gln Glu Leu Glu
500 505 510
Glu Leu Glu Glu Gln His Gln Gln Arg His Lys Ser Leu Lys Glu
515 520 525
Ala His Val Leu Ala Phe Gln Thr Met Glu Glu Glu Lys Glu Lys
530 535 540
Glu Gln Arg Ala Leu Glu Asn His Leu Gln Gln Lys His Ser Ala
545 550 555
Glu Leu Gln Ser Leu Lys Asp Ala His Arg Glu Ser Met Glu Gly
560 565 570
Phe Arg Ile Glu Met Glu Gln Glu Leu Gln Thr Leu Arg Phe Glu
575 580 585
Leu Glu Asp Glu Gly Lys Ala Met Leu Ala Ser Leu Arg Ser Glu
590 595 600
Leu Asn His Gln His Ala Ala Ala Ile Asp Leu Leu Arg His Asn
605 610 615
His His Gln Glu Leu Ala Ala Ala Lys Met Glu Leu Glu Arg Ser
620 625 630
Ile Asp Ile Ser Arg Arg Gln Ser Lys Glu His Ile Cys Arg Ile
635 640 645
Thr Asp Leu Gln Glu Glu Leu Arg His Arg Glu His His Ile Ser
650 655 660
Glu Leu Asp Lys Glu Val Gln His Leu His Glu Asn Ile Ser Ala
665 670 675
Leu Thr Lys Glu Leu Glu Phe Lys Gly Lys Glu Ile Leu Arg Ile
680 685 690
Arg Ser Glu Ser Asn Gln Gln Ile Arg Leu His Glu Gln Asp Leu
695 700 705
Asn Lys Arg Leu Glu Lys Glu Leu Asp Val Met Thr Ala Asp His
710 715 720
Leu Arg Glu Lys Asn Ile Met Arg Ala Asp Phe Asn Lys Thr Asn
725 730 735
Glu Leu Leu Lys Glu Ile Asn Ala Ala Leu Gln Val Ser Leu Glu
740 745 750
Glu Met Glu Glu Lys Tyr Leu Met Arg Glu Ser Lys Pro Glu Asp
755 760 765
Ile Gln Met Ile Thr Glu Leu Lys Ala Met Leu Thr Glu Arg Asp
770 775 780
Gln Ile Ile Lys Lys Leu Ile Glu Asp Asn Lys Phe Tyr Gln Leu
785 790 795
Glu Leu Val Asn Arg Glu Thr Asn Phe Asn Lys Val Phe Asn Ser
800 805 810
Ser Pro Thr Val Gly Val Ile Asn Pro Leu Ala Lys Gln Lys Lys
815 820 825
Lys Asn Asp Lys Ser Pro Thr Asn Arg Phe Val Ser Val Pro Asn
830 835 840
Leu Ser Ala Leu Glu Ser Gly Gly Val Gly Asn Gly His Pro Asn
845 850 855
Arg Leu Asp Pro Ile Pro Asn Ser Pro Val His Asp Ile Glu Phe
860 865 870
Asn Ser Ser Lys Pro Leu Pro Gln Pro Val Pro Pro Lys Gly Pro
875 880 885
Lys Thr Phe Leu Ser Pro Ala Gln Ser Glu Ala Ser Pro Val Ala
890 895 900
Ser Pro Asp Pro Gln Arg Gln Glu Trp Phe Ala Arg Tyr Phe Thr
905 910 915
Phe
<210> 23
<211> 399
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 3049682CD1
<400> 23
Met Asp Ser Gln Arg Pro Glu Pro Arg Glu Glu Glu Glu Glu Glu
1 5 10 15
Gln Glu Leu Arg Trp Met Glu Leu Asp Ser Glu Glu Ala Leu Gly
20 25 30
Thr Arg Thr Glu Gly Pro Ser Val Val Gln Gly Trp Gly His Leu
35 40 45
Leu Gln Ala Val Trp Arg Gly Pro Ala Gly Leu Val Thr Gln Leu
50 55 60
Leu Arg Gln Gly Ala Ser Val Glu Glu Arg Asp His Ala Gly Arg
65 70 75
Thr Pro Leu His Leu Ala Val Leu Arg Gly His Ala Pro Leu Val
80 85 90
Arg Leu Leu Leu Gln Arg Gly Ala Pro Val Gly Ala Val Asp Arg
95 100 105
Ala Gly Arg Thr Ala Leu His Glu Ala Ala Trp His Gly His Ser
110 115 120
Arg Val Ala Glu Leu Leu Leu Gln Arg Gly Ala Ser Ala Ala Ala
125 130 135
Arg Ser Gly Thr Gly Leu Thr Pro Leu His Trp Ala Ala Ala Leu
140 145 150
Gly His Thr Leu Leu Ala Ala Arg Leu Leu Glu Ala Pro Gly Pro
155 160 165
Gly Pro Ala Ala Ala Glu Ala Glu Asp Ala Arg Gly Trp Thr Ala
170 175 180
Ala His Trp Ala Ala Ala Gly Gly Arg Leu Ala Val Leu Glu Leu
185 190 195
Leu Ala Ala Gly Gly Ala Gly Leu Asp Gly Ala Leu Leu Val Ala
200 205 210
Ala Ala Ala Gly Arg Gly Ala Ala Leu Arg Phe Leu Leu Ala Arg
215 220 225
Gly Ala Arg Val Asp Ala Arg Asp Gly Ala Gly Ala Thr Ala Leu
230 235 240
Gly Leu Ala Ala Ala Leu Gly Arg Ser Gln Asp Ile Glu Val Leu
245 250 255
Leu Gly His Gly Ala Asp Pro Gly Ile Arg Asp Arg His Gly Arg
260 265 270
Ser Ala Leu His Arg Ala Ala Ala Arg Gly His Leu Leu Ala Val
275 280 285
Gln Leu Leu Val Thr Gln Gly Ala Glu Val Asp Ala Arg Asp Thr
290 295 300
Leu Gly Leu Thr Pro Leu His His Ala Ser Arg Glu Gly His Val
305 310 315
Glu Val Ala Gly Cys Leu Leu Asp Arg Gly Ala Gln Val Asp Ala
320 325 330
Thr Gly Trp Leu Arg Lys Thr Pro Leu His Leu Ala Ala Glu Arg
335 340 345
Gly His Gly Pro Thr Val Gly Leu Leu Leu Ser Arg Gly Ala Ser
350 355 360
Pro Thr Leu Arg Thr Gln Trp Ala Glu Val Ala Gln Met Pro Glu
365 370 375
Gly Asp Leu Pro Gln Ala Leu Pro Glu Leu Gly Gly Gly Glu Lys
380 385 390
Glu Cys Glu Gly Ile Glu Ser Thr Gly
395
<210> 24
<211> 617
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 914468CD1
<400> 24
Met Ala Pro Gly Ala Ala Asp Ala Gln Ile Gly Thr Ala Asp Pro
1 5 10 15
Gly Asp Phe Asp Gln Leu Thr Gln Cys Leu Ile Gln Ala Pro Ser
20 25 30
Asn Arg Pro Tyr Phe Leu Leu Leu Gln Gly Tyr Gln Asp Ala Gln
35 40 45
Asp Phe Val Val Tyr Val Met Thr Arg Glu Gln His Val Phe Gly
50 55 60
Arg Gly Gly Asn Ser Ser Gly Arg Gly Gly Ser Pro Ala Pro Tyr
65 70 75
Val Asp Thr Phe Leu Asn Ala Pro Asp Ile Leu Pro Arg His Cys
80 85 90
Thr Val Arg Ala Gly Pro Glu His Pro Ala Met Val Arg Pro Ser
95 100 105
Arg Gly Ala Pro Val Thr His Asn Gly Cys Leu Leu Leu Arg Glu
110 115 120
Ala Glu Leu His Pro Gly Asp Leu Leu Gly Leu Gly Glu His Phe
125 130 135
Leu Phe Met Tyr Lys Asp Pro Arg Thr Gly Gly Ser Gly Pro Ala
140 145 150
Arg Pro Pro Trp Leu Pro Ala Arg Pro Gly Ala Thr Pro Pro Gly
155 160 165
Pro Gly Trp Ala Phe Ser Cys Arg Leu Cys Gly Arg Gly Leu Gln
170 175 180
Glu Arg Gly Glu Ala Leu Ala Ala Tyr Leu Asp Gly Arg Glu Pro
185 190 195
Val Leu Arg Phe Arg Pro Arg Glu Glu Glu Ala Leu Leu Gly Glu
200 205 210
Ile Val Arg Ala Ala Ala Ala Gly Ser Gly Asp Leu Pro Pro Leu
215 220 225
Gly Pro Ala Thr Leu Leu Ala Leu Cys Val Gln His Ser Ala Arg
230 235 240
Glu Leu Glu Leu Gly His Leu Pro Arg Leu Leu Gly Cys Leu Ala
245 250 255
Arg Leu Ile Lys Glu Ala Val Trp Glu Lys Ile Lys Glu Ile Gly
260 265 270
Asp Arg Gln Pro Glu Asn His Pro Glu Gly Val Pro Glu Val Pro
275 280 285
Leu Thr Pro Glu Ala Val Ser Val Glu Leu Arg Pro Leu Met Leu
290 295 300
Trp Met Ala Asn Thr Thr Glu Leu Leu Ser Phe Val Gln Glu Lys
305 310 315
Val Leu Glu Met Glu Lys Glu Ala Asp Gln Glu Asp Pro Gln Leu
320 325 330
Cys Asn Asp Leu Glu Leu Cys Asp Glu Ala Met Ala Leu Leu Asp
335 340 345
Glu Val Ile Met Cys Thr Phe Gln Gln Ser Val Tyr Tyr Leu Thr
350 355 360
Lys Thr Leu Tyr Ser Thr Leu Pro Ala Leu Leu Asp Ser Asn Pro
365 370 375
Phe Thr Ala Gly Ala Glu Leu Pro Gly Pro Gly Ala Glu Leu Gly
380 385 390
Ala Met Pro Pro Gly Leu Arg Pro Thr Leu Gly Val Phe Gln Ala
395 400 405
Ala Leu Glu Leu Thr Ser Gln Cys Glu Leu His Pro Asp Leu Val
410 415 420
Ser Gln Thr Phe Gly Tyr Leu Phe Phe Phe Ser Asn Ala Ser Leu
425 430 435
Leu Asn Ser Leu Met Glu Arg Gly Gln Gly Arg Pro Phe Tyr Gln
440 445 450
Trp Ser Arg Ala Val Gln Ile Arg Thr Asn Leu Asp Leu Val Leu
455 460 465
Asp Trp Leu Gln Gly Ala Gly Leu Gly Asp Ile Ala Thr Glu Phe
470 475 480
Phe Arg Lys Leu Ser Met Ala Val Asn Leu Leu Cys Val Pro Arg
485 490 495
Thr Ser Leu Leu Lys Ala Ser Trp Ser Ser Leu Arg Thr Asp His
500 505 510
Pro Thr Leu Thr Pro Ala Gln Leu His His Leu Leu Ser His Tyr
515 520 525
Gln Leu Gly Pro Gly Arg Gly Pro Pro Ala Ala Trp Asp Pro Pro
530 535 540
Pro Ala Glu Arg Glu Ala Val Asp Thr Gly Asp Ile Phe Glu Ser
545 550 555
Phe Ser Ser His Pro Pro Leu Ile Leu Pro Leu Gly Ser Ser Arg
560 565 570
Leu Arg Leu Thr Gly Pro Val Thr Asp Asp Ala Leu His Arg Glu
575 580 585
Leu Arg Arg Leu Arg Arg Leu Leu Trp Asp Leu Glu Gln Gln Glu
590 595 600
Leu Pro Ala Asn Tyr Arg His Pro Gly Gly Pro Pro Val Ala Thr
605 610 615
Ser Pro
<210> 25
<211> 305
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 2673631CD1
<400> 25
Met Asp Phe Ile Ser Ile Gln Gln Leu Val Ser Gly Glu Arg Val
1 5 10 15
Glu Gly Lys Val Leu Gly Phe Gly His Gly Val Pro Asp Pro Gly
20 25 30
Ala Trp Pro Ser Asp Trp Arg Arg Gly Pro Gln Glu Ala Val Ala
35 40 45
Arg Glu Lys Leu Lys Leu Glu Glu Glu Lys Lys Lys Lys Leu Glu
50 55 60
Arg Phe Asn Ser Thr Arg Phe Asn Leu Asp Asn Leu Ala Asp Leu
65 70 75
Glu Asn Leu Val Gln Arg Arg Lys Lys Arg Leu Arg His Arg Val
80 85 90
Pro Pro Arg Lys Pro Glu Pro Leu Val Lys Pro Gln Ser Gln Ala
95 100 105
Gln Val Glu Pro Val Gly Leu Glu Met Phe Leu Lys Ala Ala Ala
110 115 120
Glu Asn Gln Glu Tyr Leu Ile Asp Lys Tyr Leu Thr Asp Gly Gly
125 130 135
Asp Pro Asn Ala His Asp Lys Leu His Arg Thr Ala Leu His Trp
140 145 150
Ala Cys Leu Lys Gly His Ser Gln Leu Val Asn Lys Leu Leu Val
155 160 165
Ala Gly Ala Thr Val Asp Ala Arg Asp Leu Leu Asp Arg Thr Pro
170 175 180
Val Phe Trp Ala Cys Arg Gly Gly His Leu Val Ile Leu Lys Gln
185 190 195
Leu Leu Asn Gln Gly Ala Arg Val Asn Ala Arg Asp Lys Ile Gly
200 205 210
Ser Thr Pro Leu His Val Ala Val Arg Thr Arg His Pro Asp Cys
215 220 225
Leu Glu His Leu Ile Glu Cys Gly Ala His Leu Asn Ala Gln Asp
230 235 240
Lys Glu Gly Asp Thr Ala Leu His Glu Ala Val Arg His Gly Ser
245 250 255
Tyr Lys Ala Met Lys Leu Leu Leu Leu Tyr Gly Ala Glu Leu Gly
260 265 270
Val Arg Asn Ala Ala Ser Val Thr Pro Val Gln Leu Ala Arg Asp
275 280 285
Trp Gln Arg Gly Ile Arg Glu Ala Leu Gln Ala His Val Ala His
290 295 300
Pro Arg Thr Arg Cys
305
<210> 26
<211> 1715
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 2755454CD1
<400> 26
Met Ser Val Leu Ile Ser Gln Ser Val Ile Asn Tyr Val Glu Glu
1 5 10 15
Glu Asn Ile Pro Ala Leu Lys Ala Leu Leu Glu Lys Cys Lys Asp
20 25 30
Val Asp Glu Arg Asn Glu Cys Gly Gln Thr Pro Leu Met Ile Ala
35 40 45
Ala Glu Gln Gly Asn Leu Glu Ile Val Lys Glu Leu Ile Lys Asn
50 55 60
Gly Ala Asn Cys Asn Leu Glu Asp Leu Asp Asn Trp Thr Ala Leu
65 70 75
Ile Ser Ala Ser Lys Glu Gly His Val His Ile Val Glu Glu Leu
80 85 90
Leu Lys Cys Gly Val Asn Leu Glu His Arg Asp Met Gly Gly Trp
95 100 105
Thr Ala Leu Met Trp Ala Cys Tyr Lys Gly Arg Thr Asp Val Val
110 115 120
Glu Leu Leu Leu Ser His Gly Ala Asn Pro Ser Val Thr Gly Leu
125 130 135
Gln Tyr Ser Val Tyr Pro Ile Ile Trp Ala Ala Gly Arg Gly His
140 145 150
Ala Asp Ile Val His Leu Leu Leu Gln Asn Gly Ala Lys Val Asn
155 160 165
Cys Ser Asp Lys Tyr Gly Thr Thr Pro Leu Val Trp Ala Ala Arg
170 175 180
Lys Gly His Leu Glu Cys Val Lys His Leu Leu Ala Met Gly Ala
185 190 195
Asp Val Asp Gln Glu Gly Ala Asn Ser Met Thr Ala Leu Ile Val
200 205 210
Ala Val Lys Gly Gly Tyr Thr Gln Ser Val Lys Glu Ile Leu Lys
215 220 225
Arg Asn Pro Asn Val Asn Leu Thr Asp Lys Asp Gly Asn Thr Ala
230 235 240
Leu Met Ile Ala Ser Lys Glu Gly His Thr Glu Ile Val Gln Asp
245 250 255
Leu Leu Asp Ala Gly Thr Tyr Val Asn Ile Pro Asp Arg Ser Gly
260 265 270
Asp Thr Val Leu Ile Gly Ala Val Arg Gly Gly His Val Glu Ile
275 280 285
Val Arg Ala Leu Leu Gln Lys Tyr Ala Asp Ile Asp Ile Arg Gly
290 295 300
Gln Asp Asn Lys Thr Ala Leu Tyr Trp Ala Val Glu Lys Gly Asn
305 310 315
Ala Thr Met Val Arg Asp Ile Leu Gln Cys Asn Pro Asp Thr Glu
320 325 330
Ile Cys Thr Lys Asp Gly Glu Thr Pro Leu Ile Lys Ala Thr Lys
335 340 345
Met Arg Asn Ile Glu Val Val Glu Leu Leu Leu Asp Lys Gly Ala
350 355 360
Lys Val Ser Ala Val Asp Lys Lys Gly Asp Thr Pro Leu His Ile
365 370 375
Ala Ile Arg Gly Arg Ser Arg Lys Leu Ala Glu Leu Leu Leu Arg
380 385 390
Asn Pro Lys Asp Gly Arg Leu Leu Tyr Arg Pro Asn Lys Ala Gly
395 400 405
Glu Thr Pro Tyr Asn Ile Asp Cys Ser His Gln Lys Ser Ile Leu
410 415 420
Thr Gln Ile Phe Gly Ala Arg His Leu Ser Pro Thr Glu Thr Asp
425 430 435
Gly Asp Met Leu Gly Tyr Asp Leu Tyr Ser Ser Ala Leu Ala Asp
440 445 450
Ile Leu Ser Glu Pro Thr Met Gln Pro Pro Ile Cys Val Gly Leu
455 460 465
Tyr Ala Gln Trp Gly Ser Gly Lys Ser Phe Leu Leu Lys Lys Leu
470 475 480
Glu Asp Glu Met Lys Thr Phe Ala Gly Gln Gln Ile Glu Pro Leu
485 490 495
Phe Gln Phe Ser Trp Leu Ile Val Phe Leu Thr Leu Leu Leu Cys
500 505 510
Gly Gly Leu Gly Leu Leu Phe Ala Phe Thr Val His Pro Asn Leu
515 520 525
Gly Ile Ala Val Ser Leu Ser Phe Leu Ala Leu Leu Tyr Ile Phe
530 535 540
Phe Ile Val Ile Tyr Phe Gly Gly Arg Arg Glu Gly Glu Ser Trp
545 550 555
Asn Trp Ala Trp Val Leu Ser Thr Arg Leu Ala Arg His Ile Gly
560 565 570
Tyr Leu Glu Leu Leu Leu Lys Leu Met Phe Val Asn Pro Pro Glu
575 580 585
Leu Pro Glu Gln Thr Thr Lys Ala Leu Pro Val Arg Phe Leu Phe
590 595 600
Thr Asp Tyr Asn Arg Leu Ser Ser Val Gly Gly Glu Thr Ser Leu
605 610 615
Ala Glu Met Ile Ala Thr Leu Ser Asp Ala Cys Glu Arg Glu Phe
620 625 630
Gly Phe Leu Ala Thr Arg Leu Phe Arg Val Phe Lys Thr Glu Asp
635 640 645
Thr Gln Gly Lys Lys Lys Trp Lys Lys Thr Cys Cys Leu Pro Ser
650 655 660
Phe Val Ile Phe Leu Phe Ile Ile Gly Cys Ile Ile Ser Gly Ile
665 670 675
Thr Leu Leu Ala Ile Phe Arg Val Asp Pro Lys His Leu Thr Val
680 685 690
Asn Ala Val Leu Ile Ser Ile Ala Ser Val Val Gly Leu Ala Phe
695 700 705
Val Leu Asn Cys Arg Thr Trp Trp Gln Val Leu Asp Ser Leu Leu
710 715 720
Asn Ser Gln Arg Lys Arg Leu His Asn Ala Ala Ser Lys Leu His
725 730 735
Lys Leu Lys Ser Glu Gly Phe Met Lys Val Leu Lys Cys Glu Val
740 745 750
Glu Leu Met Ala Arg Met Ala Lys Thr Ile Asp Ser Phe Thr Gln
755 760 765
Asn Gln Thr Arg Leu Val Val Ile Ile Asp Gly Leu Asp Ala Cys
770 775 780
Glu Gln Asp Lys Val Leu Gln Met Leu Asp Thr Val Arg Val Leu
785 790 795
Phe Ser Lys Gly Pro Phe Ile Ala Ile Phe Ala Ser Asp Pro His
800 805 810
Ile Ile Ile Lys Ala Ile Asn Gln Asn Leu Asn Ser Val Leu Arg
815 820 825
Asp Ser Asn Ile Asn Gly His Asp Tyr Met Arg Asn Ile Val His
830 835 840
Leu Pro Val Phe Leu Asn Ser Arg Gly Leu Ser Asn Ala Arg Lys
845 850 855
Phe Leu Val Thr Ser Ala Thr Asn Gly Asp Val Pro Cys Ser Asp
860 865 870
Thr Thr Gly Ile Gln Glu Asp Ala Asp Arg Arg Val Ser Gln Asn
875 880 885
Ser Leu Gly Glu Met Thr Lys Leu Gly Ser Lys Thr Ala Leu Asn
890 895 900
Arg Arg Asp Thr Tyr Arg Arg Arg Gln Met Gln Arg Thr Ile Thr
905 910 915
Arg Gln Met Ser Phe Asp Leu Thr Lys Leu Leu Val Thr Glu Asp
920 925 930
Trp Phe Ser Asp Ile Ser Pro Gln Thr Met Arg Arg Leu Leu Asn
935 940 945
Ile Val Ser Val Thr Gly Arg Leu Leu Arg Ala Asn Gln Ile Ser
950 955 960
Phe Asn Trp Asp Arg Leu Ala Ser Trp Ile Asn Leu Thr Glu Gln
965 970 975
Trp Pro Tyr Arg Thr Ser Trp Leu Ile Leu Tyr Leu Glu Glu Thr
980 985 990
Glu Gly Ile Pro Asp Gln Met Thr Leu Lys Thr Ile Tyr Glu Arg
995 1000 1005
Ile Ser Lys Asn Ile Pro Thr Thr Lys Asp Val Glu Pro Leu Leu
1010 1015 1020
Glu Ile Asp Gly Asp Ile Arg Asn Phe Glu Val Phe Leu Ser Ser
1025 1030 1035
Arg Thr Pro Val Leu Val Ala Arg Asp Val Lys Val Phe Leu Pro
1040 1045 1050
Cys Thr Val Asn Leu Asp Pro Lys Leu Arg Glu Ile Ile Ala Asp
1055 1060 1065
Val Arg Ala Ala Arg Glu Gln Ile Ser Ile Gly Gly Leu Ala Tyr
1070 1075 1080
Pro Pro Leu Pro Leu His Glu Gly Pro Pro Arg Ala Pro Ser Gly
1085 1090 1095
Tyr Ser Gln Pro Pro Ser Val Cys Ser Ser Thr Ser Phe Asn Gly
1100 1105 1110
Pro Phe Ala Gly Gly Val Val Ser Pro Gln Pro His Ser Ser Tyr
1115 1120 1125
Tyr Ser Gly Met Thr Gly Pro Gln His Pro Phe Tyr Asn Arg Gly
1130 1135 1140
Ser Gly Pro Ala Pro Gly Pro Val Val Leu Leu Asn Ser Leu Asn
1145 1150 1155
Val Asp Ala Val Cys Glu Lys Leu Lys Gln Ile Glu Gly Leu Asp
1160 1165 1170
Gln Ser Met Leu Pro Gln Tyr Cys Thr Thr Ile Lys Lys Ala Asn
1175 1180 1185
Ile Asn Gly Arg Val Leu Ala Gln Cys Asn Ile Asp Glu Leu Lys
1190 1195 1200
Lys Glu Met Asn Met Asn Phe Gly Asp Trp His Leu Phe Arg Ser
1205 1210 1215
Thr Val Leu Glu Met Arg Asn Ala Glu Ser His Val Val Pro Glu
1220 1225 1230
Asp Pro Arg Phe Leu Ser Glu Ser Ser Ser Gly Pro Ala Pro His
1235 1240 1245
Gly Glu Pro Ala Arg Arg Ala Ser His Asn Glu Leu Pro His Thr
1250 1255 1260
Glu Leu Ser Ser Gln Thr Pro Tyr Thr Leu Asn Phe Ser Phe Glu
1265 1270 1275
Glu Leu Asn Thr Leu Gly Leu Asp Glu Gly Ala Pro Arg His Ser
1280 1285 1290
Asn Leu Ser Trp Gln Ser Gln Thr Arg Arg Thr Pro Ser Leu Ser
1295 1300 1305
Ser Leu Asn Ser Gln Asp Ser Ser Ile Glu Ile Ser Lys Leu Thr
1310 1315 1320
Asp Lys Val Gln Ala Glu Tyr Arg Asp Ala Tyr Arg Glu Tyr Ile
1325 1330 1335
Ala Gln Met Ser Gln Leu Glu Gly Gly Pro Gly Ser Thr Thr Ile
1340 1345 1350
Ser Gly Arg Ser Ser Pro His Ser Thr Tyr Tyr Met Gly Gln Ser
1355 1360 1365
Ser Ser Gly Gly Ser Ile His Ser Asn Leu Glu Gln Glu Lys Gly
1370 1375 1380
Lys Asp Ser Glu Pro Lys Pro Asp Asp Gly Arg Lys Ser Phe Leu
1385 1390 1395
Met Lys Arg Gly Asp Val Ile Asp Tyr Ser Ser Ser Gly Val Ser
1400 1405 1410
Thr Asn Asp Ala Ser Pro Leu Asp Pro Ile Thr Glu Glu Asp Glu
1415 1420 1425
Lys Ser Asp Gln Ser Gly Ser Lys Leu Leu Pro Gly Lys Lys Ser
1430 1435 1440
Ser Glu Arg Ser Ser Leu Phe Gln Thr Asp Leu Lys Leu Lys Gly
1445 1450 1455
Ser Gly Leu Arg Tyr Gln Lys Leu Pro Ser Asp Glu Asp Glu Ser
1460 1465 1470
Gly Thr Glu Glu Ser Asp Asn Thr Pro Leu Leu Lys Asp Asp Lys
1475 1480 1485
Asp Arg Lys Ala Glu Gly Lys Val Glu Arg Val Pro Lys Ser Pro
1490 1495 1500
Glu His Ser Ala Glu Pro Ile Arg Thr Phe Ile Lys Ala Lys Glu
1505 1510 1515
Tyr Leu Ser Asp Ala Leu Leu Asp Lys Lys Asp Ser Ser Asp Ser
1520 1525 1530
Gly Val Arg Ser Ser Glu Ser Ser Pro Asn His Ser Leu His Asn
1535 1540 1545
Glu Val Ala Asp Asp Ser Gln Leu Glu Lys Ala Asn Leu Ile Glu
1550 1555 1560
Leu Glu Asp Asp Ser His Ser Gly Lys Arg Gly Ile Pro His Ser
1565 1570 1575
Leu Ser Gly Leu Gln Asp Pro Ile Ile Ala Arg Met Ser Ile Cys
1580 1585 1590
Ser Glu Asp Lys Lys Ser Pro Ser Glu Cys Ser Leu Ile Ala Ser
1595 1600 1605
Ser Pro Glu Glu Asn Trp Pro Ala Cys Gln Lys Ala Tyr Asn Leu
1610 1615 1620
Asn Arg Thr Pro Ser Thr Val Thr Leu Asn Asn Asn Ser Ala Pro
1625 1630 1635
Ala Asn Arg Ala Asn Gln Asn Phe Asp Glu Met Glu Gly Ile Arg
1640 1645 1650
Glu Thr Ser Gln Val Ile Leu Arg Pro Ser Ser Ser Pro Asn Pro
1655 1660 1665
Thr Thr Ile Gln Asn Glu Asn Leu Lys Ser Met Thr His Lys Arg
1670 1675 1680
Ser Gln Arg Ser Ser Tyr Thr Arg Leu Ser Lys Asp Pro Pro Glu
1685 1690 1695
Leu His Ala Ala Ala Ser Ser Glu Ser Thr Gly Phe Gly Glu Glu
1700 1705 1710
Arg Glu Ser Ile Leu
1715
<210> 27
<211> 1392
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 5868348CD1
<400> 27
Met Ala Ser Val Lys Val Ala Val Arg Val Arg Pro Met Asn Arg
1 5 10 15
Arg Glu Lys Asp Leu Glu Ala Lys Phe Ile Ile Gln Met Glu Lys
20 25 30
Ser Lys Thr Thr Ile Thr Asn Leu Lys Ile Pro Glu Gly Gly Thr
35 40 45
Gly Asp Ser Gly Arg Glu Arg Thr Lys Thr Phe Thr Tyr Asp Phe
50 55 60
Ser Phe Tyr Ser Ala Asp Thr Lys Ser Pro Asp Tyr Val Ser Gln
65 70 75
Glu Met Val Phe Lys Thr Leu Gly Thr Asp Val Val Lys Ser Ala
80 85 90
Phe Glu Gly Tyr Asn Ala Cys Val Phe Ala Tyr Gly Gln Thr Gly
95 100 105
Ser Gly Lys Ser Tyr Thr Met Met Gly Asn Ser Gly Asp Ser Gly
110 115 120
Leu Ile Pro Arg Ile Cys Glu Gly Leu Phe Ser Arg Ile Asn Glu
125 130 135
Thr Thr Arg Trp Asp Glu Ala Ser Phe Arg Thr Glu Val Ser Tyr
140 145 150
Leu Glu Ile Tyr Asn Glu Arg Val Arg Asp Leu Leu Arg Arg Lys
155 160 165
Ser Ser Lys Thr Phe Asn Leu Arg Val Arg Glu His Pro Lys Glu
170 175 180
Gly Pro Tyr Val Glu Asp Leu Ser Lys His Leu Val Gln Asn Tyr
185 190 195
Gly Asp Val Glu Glu Leu Met Asp Ala Gly Asn Ile Asn Arg Thr
200 205 210
Thr Ala Ala Thr Gly Met Asn Asp Val Ser Ser Arg Ser His Ala
215 220 225
Ile Phe Thr Ile Lys Phe Thr Gln Ala Lys Phe Asp Ser Glu Met
230 235 240
Pro Cys Glu Thr Val Ser Lys Ile His Leu Val Asp Leu Ala Gly
245 250 255
Ser Glu Arg Ala Asp Ala Thr Gly Ala Thr Gly Val Arg Leu Lys
260 265 270
Glu Gly Gly Asn Ile Asn Lys Ser Leu Val Thr Leu Gly Asn Val
275 280 285
Ile Ser Ala Leu Ala Asp Leu Ser Gln Asp Ala Ala Asn Thr Leu
290 295 300
Ala Lys Lys Lys Gln Val Phe Val Pro Tyr Arg Asp Ser Val Leu
305 310 315
Thr Trp Leu Leu Lys Asp Ser Leu Gly Gly Asn Ser Lys Thr Ile
320 325 330
Met Ile Ala Thr Ile Ser Pro Ala Asp Val Asn Tyr Gly Glu Thr
335 340 345
Leu Ser Thr Leu Arg Tyr Ala Asn Arg Ala Lys Asn Ile Ile Asn
350 355 360
Lys Pro Thr Ile Asn Glu Asp Ala Asn Val Lys Leu Ile Arg Glu
365 370 375
Leu Arg Ala Glu Ile Ala Arg Leu Lys Thr Leu Leu Ala Gln Gly
380 385 390
Asn Gln Ile Ala Leu Leu Asp Ser Pro Thr Ala Leu Ser Met Glu
395 400 405
Glu Lys Leu Gln Gln Asn Glu Ala Arg Val Gln Glu Leu Thr Lys
410 415 420
Glu Trp Thr Asn Lys Trp Asn Glu Thr Gln Asn Ile Leu Lys Glu
425 430 435
Gln Thr Leu Ala Leu Arg Lys Glu Gly Ile Gly Val Val Leu Asp
440 445 450
Ser Glu Leu Pro His Leu Ile Gly Ile Asp Asp Asp Leu Leu Ser
455 460 465
Thr Gly Ile Ile Leu Tyr His Leu Lys Glu Gly Gln Thr Tyr Val
470 475 480
Gly Arg Asp Asp Ala Ser Thr Glu Gln Asp Ile Val Leu His Gly
485 490 495
Leu Asp Leu Glu Ser Glu His Cys Ile Phe Glu Asn Ile Gly Gly
500 505 510
Thr Val Thr Leu Ile Pro Leu Ser Gly Ser Gln Cys Ser Val Asn
515 520 525
Gly Val Gln Ile Val Glu Ala Thr His Leu Asn Gln Gly Ala Val
530 535 540
Ile Leu Leu Gly Arg Thr Asn Met Phe Arg Phe Asn His Pro Lys
545 550 555
Glu Ala Ala Lys Leu Arg Glu Lys Arg Lys Ser Gly Leu Leu Ser
560 565 570
Ser Phe Ser Leu Ser Met Thr Asp Leu Ser Lys Ser Arg Glu Asn
575 580 585
Leu Ser Ala Val Met Leu Tyr Asn Pro Gly Leu Glu Phe Glu Arg
590 595 600
Gln Gln Arg Glu Glu Leu Glu Lys Leu Glu Ser Lys Arg Lys Leu
605 610 615
Ile Glu Glu Met Glu Glu Lys Gln Lys Ser Asp Lys Ala Glu Leu
620 625 630
Glu Arg Met Gln Gln Glu Val Glu Thr Gln Arg Lys Glu Thr Glu
635 640 645
Ile Val Gln Leu Gln Ile Arg Lys Gln Glu Glu Ser Leu Lys Arg
650 655 660
Arg Ser Phe His Ile Glu Asn Lys Leu Lys Asp Leu Leu Ala Glu
665 670 675
Lys Glu Lys Phe Glu Glu Glu Arg Leu Arg Glu Gln Gln Glu Ile
680 685 690
Glu Leu Gln Lys Lys Arg Gln Glu Glu Glu Thr Phe Leu Arg Val
695 700 705
Gln Glu Glu Leu Gln Arg Leu Lys Glu Leu Asn Asn Asn Glu Lys
710 715 720
Ala Glu Lys Phe Gln Ile Phe Gln Glu Leu Asp Gln Leu Gln Lys
725 730 735
Glu Lys Asp Glu Gln Tyr Ala Lys Leu Glu Leu Glu Lys Lys Arg
740 745 750
Leu Glu Glu Gln Glu Lys Glu Gln Val Met Leu Val Ala His Leu
755 760 765
Glu Glu Gln Leu Arg Glu Lys Gln Glu Met Ile Gln Leu Leu Arg
770 775 780
Arg Gly Glu Val Gln Trp Val Glu Glu Glu Lys Arg Asp Leu Glu
785 790 795
Gly Ile Arg Glu Ser Leu Leu Arg Val Lys Glu Ala Arg Ala Gly
800 805 810
Gly Asp Glu Asp Gly Glu Glu Leu Glu Lys Ala Gln Leu Arg Phe
815 820 825
Phe Glu Phe Lys Arg Arg Gln Leu Val Lys Leu Val Asn Leu Glu
830 835 840
Lys Asp Leu Val Gln Gln Lys Asp Ile Leu Lys Lys Glu Val Gln
845 850 855
Glu Glu Gln Glu Ile Leu Glu Cys Leu Lys Cys Glu His Asp Lys
860 865 870
Glu Ser Arg Leu Leu Glu Lys His Asp Glu Ser Val Thr Asp Val
875 880 885
Thr Glu Val Pro Gln Asp Phe Glu Lys Ile Lys Pro Val Glu Tyr
890 895 900
Arg Leu Gln Tyr Lys Glu Arg Gln Leu Gln Tyr Leu Leu Gln Asn
905 910 915
His Leu Pro Thr Leu Leu Glu Glu Lys Gln Arg Ala Phe Glu Ile
920 925 930
Leu Asp Arg Gly Pro Leu Ser Leu Asp Asn Thr Leu Tyr Gln Val
935 940 945
Glu Lys Glu Met Glu Glu Lys Glu Glu Gln Leu Ala Gln Tyr Gln
950 955 960
Ala Asn Ala Asn Gln Leu Gln Lys Leu Gln Ala Thr Phe Glu Phe
965 970 975
Thr Ala Asn Ile Ala Arg Gln Glu Glu Lys Val Arg Lys Lys Glu
980 985 990
Lys Glu Ile Leu Glu Ser Arg Glu Lys Gln Gln Arg Glu Ala Leu
995 1000 1005
Glu Arg Ala Leu Ala Arg Leu Glu Arg Arg His Ser Ala Leu Gln
1010 1015 1020
Arg His Ser Thr Leu Gly Thr Glu Ile Glu Glu Gln Arg Gln Lys
1025 1030 1035
Leu Ala Ser Leu Asn Ser Gly Ser Arg Glu Gln Ser Gly Leu Gln
1040 1045 1050
Ala Ser Leu Glu Ala Glu Gln Glu Ala Leu Glu Lys Asp Gln Glu
1055 1060 1065
Arg Leu Glu Tyr Glu Ile Gln Gln Leu Lys Gln Lys Ile Tyr Glu
1070 1075 1080
Val Asp Gly Val Gln Lys Asp His His Gly Thr Leu Glu Gly Lys
1085 1090 1095
Val Ala Ser Ser Ser Leu Pro Val Ser Ala Glu Lys Ser His Leu
1100 1105 1110
Val Pro Leu Met Asp Ala Arg Ile Asn Ala Tyr Ile Glu Glu Glu
1115 1120 1125
Val Gln Arg Arg Leu Gln Asp Leu His Arg Val Ile Ser Glu Gly
1130 1135 1140
Cys Ser Thr Ser Ala Asp Thr Met Lys Asp Asn Glu Lys Leu His
1145 1150 1155
Lys Gly Thr Ile Gln Arg Lys Leu Lys Tyr Glu Leu Cys Arg Asp
1160 1165 1170
Leu Leu Cys Val Leu Met Pro Glu Pro Asp Ala Ala Ala Cys Ala
1175 1180 1185
Asn His Pro Leu Leu Gln Gln Asp Leu Val Gln Leu Ser Leu Asp
1190 1195 1200
Trp Lys Thr Glu Ile Pro Asp Leu Val Leu Pro Asn Gly Val Gln
1205 1210 1215
Val Ser Ser Lys Phe Gln Thr Thr Leu Val Asp Met Ile Tyr Phe
1220 1225 1230
Leu His Gly Asn Met Glu Val Asn Val Pro Ser Leu Ala Glu Val
1235 1240 1245
Gln Leu Leu Leu Tyr Thr Thr Val Lys Val Met Gly Asp Ser Gly
1250 1255 1260
His Asp Gln Cys Gln Ser Leu Val Leu Leu Asn Thr His Ile Ala
1265 1270 1275
Leu Val Lys Glu Asp Cys Val Phe Tyr Pro Arg Ile Arg Ser Arg
1280 1285 1290
Asn Ile Pro Pro Pro Gly Ala Gln Phe Asp Val Ile Lys Cys His
1295 1300 1305
Ala Leu Ser Glu Phe Arg Cys Val Val Val Pro Glu Lys Lys Asn
1310 1315 1320
Val Ser Thr Val Glu Leu Val Phe Leu Gln Lys Leu Lys Pro Ser
1325 1330 1335
Val Gly Ser Arg Asn Ser Pro Pro Glu His Leu Gln Glu Ala Pro
1340 1345 1350
Asn Val Gln Leu Phe Thr Thr Pro Leu Tyr Leu Gln Gly Ser Gln
1355 1360 1365
Asn Val Ala Pro Glu Val Trp Lys Leu Thr Phe Asn Ser Gln Asp
1370 1375 1380
Glu Ala Leu Trp Leu Ile Ser His Leu Thr Arg Leu
1385 1390
<210> 28
<211> 337
<212> PRT
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 2055455CD1
<400> 28
Met Ala Glu Gly Gly Ser Pro Asp Gly Arg Ala Gly Pro Gly Leu
1 5 10 15
Arg Ser Ala Gly Arg Asn Leu Lys Glu Trp Leu Arg Glu Gln Phe
20 25 30
Cys Asp His Pro Leu Glu His Cys Glu Asp Thr Arg Leu His Asp
35 40 45
Ala Ala Tyr Val Gly Asp Leu Gln Thr Leu Arg Ser Leu Leu Gln
50 55 60
Glu Glu Ser Tyr Arg Ser Arg Ile Asn Glu Lys Ser Val Trp Cys
65 70 75
Cys Gly Trp Leu Pro Cys Thr Pro Leu Arg Ile Ala Ala Thr Ala
80 85 90
Gly His Gly Ser Cys Val Asp Phe Leu Ile Arg Lys Gly Ala Glu
95 100 105
Val Asp Leu Val Asp Val Lys Gly Gln Thr Ala Leu Tyr Val Ala
110 115 120
Val Val Asn Gly His Leu Glu Ser Thr Gln Ile Leu Leu Glu Ala
125 130 135
Gly Ala Asp Pro Asn Gly Ser Arg His His Arg Ser Thr Pro Val
140 145 150
Tyr His Ala Ser Arg Val Gly Arg Ala Asp Ile Leu Lys Ala Leu
155 160 165
Ile Arg Tyr Gly Ala Asp Val Asp Val Asn His His Leu Thr Pro
170 175 180
Asp Val Gln Pro Arg Phe Ser Arg Arg Leu Thr Ser Leu Val Val
185 190 195
Cys Pro Leu Tyr Ile Ser Ala Ala Tyr His Asn Leu Gln Cys Phe
200 205 210
Arg Leu Leu Leu Leu Ala Gly Ala Asn Pro Asp Phe Asn Cys Asn
215 220 225
Gly Pro Val Asn Thr Gln Gly Phe Tyr Arg Gly Ser Pro Gly Cys
230 235 240
Val Met Asp Ala Val Leu Arg His Gly Cys Glu Ala Ala Phe Val
245 250 255
Ser Leu Leu Val Glu Phe Gly Ala Asn Leu Asn Leu Val Lys Trp
260 265 270
Glu Ser Leu Gly Pro Glu Ser Arg Gly Arg Arg Lys Val Asp Pro
275 280 285
Glu Ala Leu Gln Val Phe Lys Glu Ala Arg Ser Val Pro Arg Thr
290 295 300
Leu Leu Cys Leu Cys Arg Val Ala Val Arg Arg Ala Leu Gly Lys
305 310 315
His Arg Leu His Leu Ile Pro Ser Leu Pro Leu Pro Asp Pro Ile
320 325 330
Lys Lys Phe Leu Leu His Glu
335
<210> 29
<211> 1685
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 6582721CB1
<400> 29 accctaataa tgtgtatata aaggcaaacc aagctgtttg agtaggccgt tcaccatcag 60 agcatcaccg cagaaacaaa ggctccagcc tccggacacc atgtctgtgc gcttttcttc 120 tacctccagg agacttggct cttgcggggg cactggctct gtgaggctct ctagtggggg 180 agcaggcttt ggggctggaa acacatgcgg tgtgccaggc attggaagtg gcttctcttg 240 tgcttttggg ggcagctcat ctgcaggagg ctatggcgga ggtctgggcg ggggaagtgc 300 ttcctgtgct gccttcacag ggaatgagca cggcctcctc tctggcaatg agaaggtgac 360 catgcagaac ctcaacgacc gcttggcctc ctacctggag aatgttcgag ccctagagga 420 ggccaacgct gacttggagc agaagatcaa ggggtggtat gagaaatttg gacctggttc 480 ttgccgtggc cttgatcatg attacagcag atatttccca attattgacg aacttaagaa 540 ccagataatt tctgcaacta ccagtaatgc ccatgttgtc ctgcaaaatg ataatgcaag 600 actaacagct gatgacttca gactaaagtt tgaaaacgag ctagcgcttc accagagcgt 660 ggaggcggac atcaatagtt tgcgaagagt cctggatgag ctgaccttgt gcagaacgga 720 cctggagatc cagctggaaa ctctcagtga ggagctcgct tacctcaaga agaatcatga 780 ggaggaaatg aaagctcttc agtgcgcggc tggaggcaac gtgaacgtgg agatgaacgc 840 ggcccccggg gtagacctca cggttctgct gaacaatatg cgagctgagt acgaagccct 900
cgcagagcag aaccgcaggg acgcggaggc ctggttcaac gaaaagagcg cctcgctgca 960 gcagcagatc tctgacgacg ctggcgccac cacctcagcc cggaatgagc ttatcgagat 1020 gaaacgcact cttcaaaccc ttgagattga acttcagtcc ctcttagcaa cgaaacactc 1080 cctggagtgc tccttgacag agaccgagag taactactgt gcacagctgg cacagatcca 1140 ggctcagatc ggggccctgg aggagcagct gcaccaggtc agaaccgaga ccgagggcca 1200 gaagctcgag tatgagcagc tccttgacat caaggtccac ctggaaaaag aaattgagac 1260 ctactgcctc ctgatagatg gagaagatgg ctcctgttct aaatcaaaag gctatggagg 1320 cccaggaaat caaacaaaag attcatctaa aaccaccatt gtcaaaacag ttgttgaaga 1380 gatagatcct cgtggcaaag ttctctcatc cagagttcac actgtggaag agaaatccac 1440 caaagtcaac aacaagaatg aacagagggt gtcttcctga actccagcct ctgagacaga 1500 atggccccca aattaaaata ccaaaatgaa gctagtttcc taaataaggg tccccttatt 1560 tttctgcttt tcttccaatg aattaagaca agttattttt agaatagtac catttctttg 1620 gctttttctc tatggtggtg tttcaataaa agttcttcct gttgcaagtc aaaaaaaaaa 1680 aaaaa 1685
<210> 30
<211> 3147
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 2828941CB1
<400> 30 ggcggaggtt acgccttccc tcatccccgg tagaggcagg gcgggactgt tgtggttgag 60 atgaaggcta gtaaatggtg aagtacttcc cggccagagg gcacctgcgc tcgggaggtt 120 tgggcggctt ggcgtcggag gagagcccca cccgcggagg aacccagcct tgccaacgga 180 gctggcggag ctcactcctc aggtcaggcg ggcggcgtag aaaacgcagc ggagccaggt 240 gaaaccaagg caccgccgtg gctggccccc gacagttcct ctagccggga ggttggagga 300 gctgaaaacg ccgcggagcc ctcggccgcc cgagcagggg ctggacccca gcccttgcag 360 cctcccttct cctggcaccc aagtgcagtc ctggctgcag aaggggccgc gggcgcactg 420 agtttccaac ctccatttca gcctgtctgt ctcagggtgc agccttaatg agaggtgatt 480 cctaagctgc tgggaacctg aggttgtcaa aggggcggca ggaaatggac agcagtataa 540 aacccagaag cagaacttga aggttaaacc actagcccat ttcacagaat gtttcatcca 600 tttgtggacc aaaagatgga gttggttttt atttttaaaa agataatgtt aatgatctga 660 taccactaca aatatttacg tgagaagatt catggacttg tcttttggtt ggactgtcac 720 tcatttctga aagtttcttc agccacaatt tctatttgaa aattcaagta tcaaaggata 780 ccaggtttag aatggtataa tgatgtattt tgtctgagga ctgcaaattt tatagagacc 840 acagttggat tccagtgata ttctgcaatc aaagtgattt gataaaccta attttgaagc 900 attttatatt tataagcgac atcaaaagat gggagaaaaa aatggcgatg caaaaacttt 960 ctggatggag ctagaagatg atggaaaagt ggacttcatt tttgaacaag tacaaaatgt 1020 gctgcagtca ctgaaacaaa agatcaaaga tgggtctgcc accaataaag aatacatcca 1080 agcaatgatt ctagtgaatg aagcaactat aattaacagt tcaacatcaa taaaggatcc 1140 tatgcctgtg actcagaagg aacaggaaaa caaatccaat gcatttccct ctacatcatg 1200 tgaaaactcc tttccagaag actgtacatt tctaacaaca ggaaataagg aaattctctc 1260 tcttgaagat aaagttgtag actttagaga aaaagactca tcttcgaatt tatcttacca 1320 aagtcatgac tgctctggtg cttgtctgat gaaaatgcca ctgaacttga agggagaaaa 1380 ccctctgcag ctgccaatca aatgtcactt ccaaagacga catgcaaaga caaactctca 1440 ttcttcagca ctccacgtga gttataaaac cccttgtgga aggagtctac gaaacgtgga 1500 ggaagttttt cgttacctgc ttgagacaga gtgtaacttt ttatttacag ataacttttc 1560 tttcaatacc tatgttcagt tggctcggaa ttacccaaag caaaaagaag ttgtttctga 1620 tgtggatatt agcaatggag tggaatcagt gcccatttct ttctgtaatg aaattgacag 1680 tagaaagctc 
ccacagttta agtacagaaa gactgtgtgg cctcgagcat ataatctaac 1740 caacttttcc agcatgttta ctgattcctg tgactgctct gagggctgca tagacataac 1800 aaaatgtgca tgtcttcaac tgacagcaag gaatgccaaa acttccccct tgtcaagtga 1860 caaaataacc actggatata aatataaaag actacagaga cagattccta ctggcattta 1920 tgaatgcagc cttttgtgca aatgtaatcg acaattgtgt caaaaccgag ttgtccaaca 1980 tggtcctcaa gtgaggttac aggtgttcaa aactgagcag aagggatggg gtgtacgctg 2040 tctagatgac attgacagag ggacatttgt ttgcatttat tcaggaagat tactaagcag 2100 agctaacact gaaaaatctt atggtattga tgaaaacggg agagatgaga atactatgaa 2160 aaatatattt tcaaaaaaga ggaaattaga agttgcatgt tcagattgtg aagttgaagt 2220 tctcccatta ggattggaaa cacatcctag aactgctaaa actgagaaat gtccaccaaa 2280
gttcagtaat aatcccaagg agcttactgt ggaaacgaaa tatgataata tttcaagaat 2340 tcaatatcat tcagttatta gagatcctga atccaagaca gccatttttc aacacaatgg 2400 gaaaaaaatg gaatttgttt cctcggagtc tgtcactcca gaagataatg atggatttaa 2460 accaccccga gagcatctga actctaaaac caagggagca caaaaggact caagttcaaa 2520 ccatgttgat gagtttgaag ataatctgct gattgaatca gatgtgatag atataactaa 2580 atatagagaa gaaactccac caaggagcag atgtaaccag gcgaccacat tggataatca 2640 gaatattaaa aaggcaattg aggttcaaat tcagaaaccc caagagggac gatctacagc 2700 atgtcaaaga cagcaggtat tttgtgatga agagttgcta agtgaaacca agaatacttc 2760 atctgattct ctaacaaagt tcaataaagg gaatgtgttt ttattggatg ccacaaaaga 2820 aggaaatgtc ggccgcttcc ttaatagtct cactttgtca ccagtggcac aatctcagct 2880 cactgcaacc tccgcttctg gggttcaagc aattctcatg cctcggcctc ctgagtagct 2940 gagattacag gcgttaatga atcacatgat gaatgtgtgg agatggcggc tagtgggcaa 3000 cagagcaata ctggaatagt gctaatatga ggaaatggta tcatctattt agaagcctcg 3060 gaacgacgat acataatgac tatcttcagc aaagaaattt gttgcttaca atatctcctc 3120 tccaaaaggc ttgtttgtta cagtgat 3147
<210> 31
<211> 5322
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 6260407CB1
<400> 31 cggctcgagg gccgctggcg gcctgttggc ttctccacag gcgcgctcgc cgttcaagcg 60 cgctttgtcc ccgccccaga tcctgggggg tgagcggtgg agaaggggcg ggcgcccgcg 120 agccgtgaat cacctcctcc tcttgctgcc tcagcgccgc cgccaccttt ccattcagtc 180 gcccaacatg gctggagcgc ggcggaggtg agccggccgc ccgcccgcag acgccccagc 240 ctactgcgcc cgagtcccgc ggccccagtg gcgcctcagc tctgcggtgc cgaggcccaa 300 cggctcgatc gctgcccgcc gccagcatgt tgggcgcccc ggacgagagc tccgtgcggg 360 tggctgtcag aataagacca cagcttgcca aagagaagat tgaaggatgc catatttgta 420 catctgtcac accaggagag cctcaggtct tcctagggaa agataaggct tttacttttg 480 actatgtatt tgacattgac tcccagcaag agcagatcta cattcaatgt atagaaaaac 540 taattgaagg ttgctttgaa ggatacaatg ctacagtttt tgcttatgga caaactggag 600 ctggtaaaac atacacaatg ggaacaggat ttgatgttaa cattgttgag gaagaactgg 660 gtattatttc tcgagctgtt aaacaccttt ttaagagtat tgaagaaaaa aaacacatag 720 caattaaaaa tgggcttcct gctccagatt ttaaagtgaa tgcccaattc ttagagctct 780 ataatgaaga ggtccttgac ttatttgata ccactcgtga tattgatgca aaaagtaaaa 840 aatcaaatat aagaattcat gaagattcaa ctggaggaat ttatactgtg ggcgttacaa 900 cacgtactgt gaatacagaa tcagagatga tgcagtgttt gaagttgggt gctttatccc 960 ggacaactgc cagtacccag atgaatgttc agagctctcg ttcacatgcc atttttacca 1020 ttcatgtgtg tcaaaccaga gtgtgtcccc aaatagatgc tgacaatgca actgataata 1080 aaattatttc tgaatcagca cagatgaatg aatttgaaac cctgactgca aagttccatt 1140 ttgttgatct cgcaggatct gaaagactga agcgtactgg agctacaggc gagagggcaa 1200 aagaaggcat ttctatcaac tgtggacttt tggcacttgg caatgtaata agtgccttgg 1260 gagacaagag caagagggcc acacatgtcc cctatagaga ttccaagcta acaagactac 1320 tacaggattc cctcgggggt aatagccaaa caatcatgat agcatgtgtc agcccttcag 1380 acagagactt tatggaaacg ttaaacaccc tgaaatacgc caatcgagct agaaatatca 1440 agaataaggt gatggtcaat caggacagag ctagtcagca aatcaatgca cttcgtagtg 1500 aaatcacacg acttcagatg gagctcatgg agtacaaaac aggtaaaaga ataattgacg 1560 aagagggtgt ggaaagcatc aatgacatgt ttcatgagaa tgctatgcta cagactgaaa 1620 ataataacct gcgtgtaaga attaaagcca tgcaagagac ggttgatgca ttgaggtcca 1680 gaattacaca 
gcttgttagt gatcaggcca accatgttct tgccagagca ggtgaaggaa 1740 atgaggagat tagtaatatg attcatagtt atataaaaga aatcgaagat ctcagggcaa 1800 aattattaga aagtgaagca gtgaatgaga accttcgaaa aaacttgaca agagccacag 1860 caagagcgcc atatttcagc ggatcatcaa ctttttctcc taccatacta tcctcagaca 1920 aagaaaccat tgaaattata gacctagcaa aaaaagattt agagaagttg aaaagaaaag 1980 aaaagaggaa gaaaaaaagg ctacagaaac ttgaggaaag caatcgagaa gaaagaagtg 2040 tggctggtaa agaggataat acagacactg accaagagaa gaaagaagaa aagggtgttt 2100 cggaaagaga aaacaatgaa ttagaagtgg aagaaagtca agaagtgagt gatcatgagg 2160 atgaagaaga ggaggaggag gaggaggaag atgacattga tgggggtgaa agttctgatg 2220
aatcagattc tgaatcagat gaaaaagcca attatcaagc agacttggca aacattactt 2280 gtgaaattgc aattaagcaa aagctgattg atgaactaga aaacagccag aaaagactgc 2340 agactctgaa aaagcagtat gaagagaagc taatgatgct gcaacataaa attcgggata 2400 ctcagcttga aagagaccag gtgcttcaaa acttaggctc ggtagaatct tactcagaag 2460 aaaaagcaaa aaaagttagg tctgaatatg aaaagaaact ccaagccatg aacaaagaac 2520 tgcagagact tcaagcagct caaaaagaac atgcaaggtt gcttaaaaat cagtctcagt 2580 atgaaaagca attgaagaaa ttgcagcagg atgtgatgga aatgaaaaaa acaaaggttc 2640 gcctaatgaa acaaatgaaa gaagaacaag agaaagccag actgactgag tctagaagaa 2700 acagagagat tgctcagttg aaaaaggatc aacgtaaaag agatcatcaa cttagacttc 2760 tggaagccca aaaaagaaac caagaagtgg ttctacgtcg caaaactgaa gaggttacgg 2820 ctcttcgtcg gcaagtaaga cccatgtcag ataaagtggc tgggaaagtt actcggaagc 2880 tgagttcatc tgatgcacct gctcaggaca caggttccag tgcagctgct gtcgaaacag 2940 atgcatcaag gacaggagcc cagcagaaaa tgagaattcc tgtggcgaga gtccaggcct 3000 taccaacgcc ggcaacaaat ggaaacagga aaaaatatca gaggaaagga ttgactggcc 3060 gagtgtttat ttccaagaca gctcgcatga agtggcagct ccttgagcgc agggtcacag 3120 acatcatcat gcagaagatg accatttcca acatggaggc agatatgaat agactcctca 3180 agcaacggga ggaactcaca aaaagacgag agaaactttc aaaaagaagg gagaagatag 3240 tcaaggagaa tggagaggga gataaaaatg tggctaatat caatgaagag atggagtcac 3300 tgactgctaa tatcgattac atcaatgaca gtatttctga ttgtcaggcc aacataatgc 3360 agatggaaga agcaaaggaa gaaggtgaga cattggatgt tactgcagtc attaatgcct 3420 gcacccttac agaagcccga tacctgctag atcacttcct gtcaatgggc atcaataagg 3480 gtcttcaggc tgcccagaaa gaggctcaaa ttaaagtact ggaaggtcga ctcaaacaaa 3540 cagaaataac cagtgctacc caaaaccagc tcttattcca tatgttgaaa gagaaggcag 3600 aattaaatcc tgagctagat gctttactag gccatgcttt acaagatcta gatagcgtac 3660 cattagaaaa tgtagaggat agtactgatg aggatgctcc tttaaacagc ccaggatcag 3720 aaggaagcac gctgtcttca gatctcatga agctttgtgg tgaagtgaaa cctaagaaca 3780 aggcccgaag gagaaccacc actcagatgg aattgctgta tgcagatagc agtgaactag 3840 cttcagacac tagtacagga gatgcctcct tgcctggccc tctcacacct gttgcagaag 3900 
ggcaagagat tggaatgaat acagagacaa gtggtacttc tgctagggaa aaagagctct 3960 ctcccccacc tggcttacct tctaagatag gcagcatttc caggcagtca tctctatcag 4020 aaaaaaaaat tccagagcct tctcctgtaa caaggagaaa ggcatatgag aaagcagaaa 4080 aatcaaaggc caaggaacaa aagcagggca taatcaaccc atttcctgct tcaaaaggaa 4140 tcagagcttt tccacttcag tgtattcaca tagctgaagg gcatacaaaa gctgtgctct 4200 gtgtggattc tactgatgat ctcctcttca ctggatcaaa agatcgtact tgtaaagtat 4260 ggaatctggt gactgggcag gaaataatgt cactgggggg tcatcccaac aatgtcgtgt 4320 ctgtaaaata ctgtaattat accagtttgg tcttcactgt atcaacatct tatattaagg 4380 tgtgggatat cagagattca gcaaagtgca ttcgaacact aacgtcttca ggtcaagtta 4440 ctcttggaga tgcttgttct gcaagtacca gtcgaacagt agctattcct tctggagaga 4500 accagatcaa tcaaattgcc ctaaacccaa ctggcacctt cctctatgct gcttctggaa 4560 atgctgtcag gatgtgggat cttaaaaggt ttcagtctac aggaaagtta acaggacacc 4620 taggccctgt tatgtgcctt actgtggatc agatttccag tggacaagat ctaatcatca 4680 ctggctccaa ggatcattac atcaaaatgt ttgatgttac agaaggagct cttgggactg 4740 tgagtcccac ccacaatttt gaaccccctc attatgatgg catagaagca ctaaccattc 4800 aaggggataa cctatttagt gggtctagag ataatggaat caagaaatgg gacttaactc 4860 aaaaagacct tcttcagcaa gttccaaatg cacataagga ttgggtctgt gccttgggag 4920 tggtgccaga ccacccagtt ttgctcagtg gctgcagagg gggcattttg aaagtctgga 4980 acatggatac ttttatgcca gtgggagaga tgaagggtca tgatagtcct atcaatgcca 5040 tatgtgttaa ttccacccac atttttactg cagctgatga tcgaactgtg agaatttgga 5100 aggctcgcaa tttgcaagat ggtcagatct ctgacacagg agatctgggg gaagatattg 5160 ccagtaatta aacatgaatg aagataggtt gtaaactgaa tgctgtgata atactctgta 5220 ttctttatgg aaatgttgtc ctgtacttac taggccaacg tttaatcggt taccggactt 5280 ttcgtcccgg cgcatttagg tctaaacccg tctccttgtc ct 5322
<210> 32
<211> 931
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 7488258CB1
<400> 32
gttgcaacca aacatgacac ttagcgtgct gagcaggaag gacaaggaaa gagtaattcg 60 cagactgtta ttacaggccc ctccagggga atttgtaaat gcctttgatg atctctgtct 120 gcttatccgt gatgaaaaac ttatgcacca ccaaggtgag tgtgcaggcc accaacactg 180 ccaaaaatat tctgtaccac tctgcatcga tggaaatcca gtactcttgt ctcaccacaa 240 tgtaatgggc gactaccgat tttttgacca tcaaagcaaa ctttctttca aatatgacct 300 gcttcaaaat cagctgaaag acatccaaag tcatggtatc attcagaatg aggcagaata 360 cctgagagtt gttcttctgt gcgccttaaa actgtatgtg aatgaccact atccaaaagg 420 aaattgcaac atgctgagaa aaactgtcaa aagtaaggag tacttgatag cttgcattga 480 agatcacaac tatgaaacag gagagtgctg gaacggactt tggaaatcta aatggatttt 540 ccaagttaac ccatttctaa cccaagtaac gggaagaata tttgtgcaag ctcacttctt 600 caggtgtgtc aaccttcata ttgaaatatc caaggacctg aaagaaagct tggaaatagt 660 taaccaagct caactggctc taagttttgc aaggcttgtg gaagagcaag agaacaaatt 720 tcaagctgca gtcttggaag aattacagga gttatccaat gaagccctga gaaaaattct 780 acgaagggat cttccagtga cccgcactct tattgactgg cacaggatac tctctgactt 840 gaatctggtg atgtatccta aattaggata tgtcatttat tcaagaagtg tgttgtgcaa 900 ctggataata taaagaattg ctcctggtaa a 931
<210> 33
<211> 5299
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 7948948CB1
<400> 33 tagcctgtac gatcactata gcggaaacgc tgatacgcct gtcggtaccg gtcccgaatt 60 cctgggtcga cggggggaga aggggcgggc gcccgcgagc cggtgaatca cctcctcctc 120 ttgctgcctc agcgccgccg ccacctttcc attcagtcgc ccaacatggc tggagcgcgg 180 cggaggtgag ccggccgccc gcccgcagac gccccagcct actgcgcccg agtcccgcgg 240 ccccagtggc gcctcagctc tgcggtgccg aggcccaacg gctcgatcgc tgcccgccgc 300 cagcatgttg gacgccccgg acgagagctc cgtgcgggtg gctgtcagaa taagaccaca 360 gcttgccaaa gagaagattg aaggatgcca tatttgtaca tctgtcacac caggagagcc 420 tcaggtcttc ctagggaaag ataaggcttt tacttttgac tatgtatttg acattgactc 480 ccagcaagag cagatctaca ttcaatgtat agaaaaacta attgaaggtt gctttgaagg 540 atacaatgct acagtttttg cttatggaca aactggagct ggtaaaacat acacaatggg 600 aacaggattt gatgttaaca ttgttgagga agaactgggt attatttctc gagctgttaa 660 acaccttttt aagagtattg aagaaaaaaa acacatagca attaaaaatg ggcttcctgc 720 tccagatttt aaagtgaatg cccaattctt agagctctat aatgaagagg tccttgactt 780 atttgatacc actcgtgata ttgatgcaaa aagtaaaaaa tcaaatataa gaattcatga 840 agattcaact ggaggaattt atactgtggg cgttacaaca cgtactgtga atacagaatc 900 agagatgatg cagtgtttga agttgggtgc tttatcccgg acaactgcca gtacccagat 960 gaatgttcag agctctcgtt cacatgccat ttttaccatt catgtgtgtc aaaccagagt 1020 gtgtccccaa atagatgctg acaatgcaac tgataataaa attatttctg aatcagcaca 1080 gatgaatgaa tttgaaaccc tgactgcaaa gttccatttt gttgatctcg caggatctga 1140 aagactgaag cgtactggag ctacaggcga gagggcaaaa gaaggcattt ctatcaactg 1200 tggacttttg gcacttggca atgtaataag tgccttggga gacaagagca agagggccac 1260 acatgtcccc tatagagatt ccaagctaac aagactacta caggattccc tcgggggtaa 1320 tagccaaaca atcatgatag catgtgtcag cccttcagac agagacttta tggaaacgtt 1380 aaacaccctg aaatacgcca atcgagctag aaatatcaag aataaggtga tggtcaatca 1440 ggacagagct agtcagcaaa tcaatgcact tcgtagtgaa atcacacgac ttcagatgga 1500 gctcatggag tacaaaacag gtaaaagaat aattgacgaa gagggtgtgg aaagcatcaa 1560 tgacatgttt catgagaatg ctatgctaca gactgaaaat aataacctgc gtgtaagaat 1620 taaagccatg caagagacgg ttgatgcatt gaggtccaga attacacagc ttgttagtga 1680 tcaggccaac 
catgttcttg ccagagcagg tgaaggaaat gaggagatta gtaatatgat 1740 tcatagttat ataaaagaaa tcgaagatct cagggcaaaa ttattagaaa gtgaagcagt 1800 gaatgagaac cttcgaaaaa acttgacaag agccacagca agagcgccat atttcagcgg 1860 atcatcaact ttttctccta ccatactatc ctcagacaaa gaaaccattg aaattataga 1920 cctagcaaaa aaagatttag agaagttgaa aagaaaagaa aagaggaaga aaaaaagtgt 1980 ggctggtaaa gaggataata cagacactga ccaagagaag aaagaagaaa agggtgtttc 2040 ggaaagagaa aacaatgaat tagaagtgga agaaagtcaa gaagtgagtg atcatgagga 2100 tgaagaagag gaggaggagg aggaggaaga tgacattgat gggggtgaaa gttctgatga 2160
atcagattct gaatcagatg aaaaagccaa ttatcaagca gacttggcaa acattacttg 2220 tgaaattgca attaagcaaa agctgattga tgaactagaa aacagccaga aaagactgca 2280 gactctgaaa aagcagtatg aagagaagct aatgatgctg caacataaaa ttcgggatac 2340 tcagcttgaa agagaccagg tgcttcaaaa cttaggctcg gtagaatctt actcagaaga 2400 aaaagcaaaa aaagttaggt ctgaatatga aaagaaactc caagccatga acaaagaact 2460 gcagagactt caagcagctc aaaaagaaca tgcaaggttg cttaaaaatc agtctcagta 2520 tgaaaagcaa ttgaagaaat tgcagcagga tgtgatggaa atgaaaaaaa caaaggttcg 2580 cctaatgaaa caaatgaaag aagaacaaga gaaagccaga ctgactgagt ctagaagaaa 2640 cagagagatt gctcagttga aaaaggatca acgtaaaaga gatcatcaac ttagacttct 2700 ggaagcccaa aaaagaaacc aagaagtggt tctacgtcgc aaaactgaag aggttacggc 2760 tcttcgtcgg caagtaagac ccatgtcaga taaagtggct gggaaagtta ctcggaagct 2820 gagttcatct gatgcacctg ctcaggacac aggttccagt gcagctgctg tcgaaacaga 2880 tgcatcaagg acaggagccc agcagaaaat gagaattcct gtggcgagag tccaggcctt 2940 accaacgccg gcaacaaatg gaaacaggaa aaaatatcag aggaaaggat tgactggccg 3000 agtgtttatt tccaagacag ctcgcatgaa gtggcagctc cttgagcgca gggtcacaga 3060 catcatcatg cagaagatga ccatttccaa catggaggca gatatgaata gactcctcaa 3120 gcaacgggag gaactcacaa aaagacgaga gaaactttca aaaagaaggg agaagatagt 3180 caaggagaat ggagagggag ataaaaatgt ggctaatatc aatgaagaga tggagtcact 3240 gactgctaat atcgattaca tcaatgacag tatttctgat tgtcaggcca acataatgca 3300 gatggaagaa gcaaaggaag aaggtgagac attggatgtt actgcagtca ttaatgcctg 3360 cacccttaca gaagcccgat acctgctaga tcacttcctg tcaatgggca tcaataaggg 3420 tcttcaggct gcccagaaag aggctcaaat taaagtactg gaaggtcgac tcaaacaaac 3480 agaaataacc agtgctaccc aaaaccagct cttattccat atgttgaaag agaaggcaga 3540 attaaatcct gagctagatg ctttactagg ccatgcttta caagataatg tagaggatag 3600 tactgatgag gatgctcctt taaacagccc aggatcagaa ggaagcacgc tgtcttcaga 3660 tctcatgaag ctttgtggtg aagtgaaacc taagaacaag gcccgaagga gaaccaccac 3720 tcagatggaa ttgctgtatg cagatagcag tgaactagct tcagacacta gtacaggaga 3780 tgcctccttg cctggccctc tcacacctgt tgcagaaggg caagagattg gaatgaatac 3840 
agagacaagt ggtacttctg ctagggaaaa agagctctct cccccacctg gcttaccttc 3900 taagataggc agcatttcca ggcagtcatc tctatcagaa aaaaaaattc cagagccttc 3960 tcctgtaaca aggagaaagg catatgagaa agcagaaaaa tcaaaggcca aggaacaaaa 4020 gcagggcata atcaacccat ttcctgcttc aaaaggaatc agagcttttc cacttcagtg 4080 tattcacata gctgaagggc atacaaaagc tgtgctctgt gtggattcta ctgatgatct 4140 cctcttcact ggatcaaaag atcgtacttg taaagtatgg aatctggtga ctgggcagga 4200 aataatgtca ctggggggtc atcccaacaa tgtcgtgtct gtaaaatact gtaattatac 4260 cagtttggtc ttcactgtat caacatctta tattaaggtg tgggatatca gagattcagc 4320 aaagtgcatt cgaacactaa cgtcttcagg tcaagttact cttggagatg cttgttctgc 4380 aagtaccagt cgaacagtag ctattccttc tggagagaac cagatcaatc aaattgccct 4440 aaacccaact ggcaccttcc tctatgctgc ttctggaaat gctgtcagga tgtgggatct 4500 taaaaggttt cagtctacag gaaagttaac aggacaccta ggccctgtta tgtgccttac 4560 tgtggatcag atttccagtg gacaagatct aatcatcact ggctccaagg atcattacat 4620 caaaatgttt gatgttacag aaggagctct tgggactgtg agtcccaccc acaattttga 4680 accccctcat tatgatggca tagaagcact aaccattcaa ggggataacc tatttagtgg 4740 gtctagagat aatggaatca agaaatggga cttaactcaa aaagaccttc ttcagcaagt 4800 tccaaatgca cataaggatt gggtctgtgc cttgggagtg gtgccagacc acccagtttt 4860 gctcagtggc tgcagagggg gcattttgaa agtctggaac atggatactt ttatgccagt 4920 gggagagatg aagggtcatg atagtcctat caatgccata tgtgttaatt ccacccacat 4980 ttttactgca gctgatgatc gaactgtgag aatttggaag gctcgcaatt tgcaagatgg 5040 tcagatctct gacacaggag atctggggga agatattgcc agtaattaaa catgaatgaa 5100 gataggttgt aaactgaatg ctgtgataat actctgtatt ctttatggaa aatgttgtcc 5160 tgtacttact aggcaaaacg tatgaatcgg attaactgga aaatatatct gaattcactg 5220 ctgactataa atggtattct aataaaattg tgtactatcc tgtgtgctta gtttaaatcc 5280 tttccgcctg accgctgcg 5299
<210> 34
<211> 4080
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 3467913CB1
<400> 34 tccaagctgg tcgagctcca tcactgatag cggccgcagt gtgctggaaa gagggccgga 60 gcccgagccc ttggaggttg attgacttat gtgcaatttg ggacgctgga gtttaccttc 120 cctccgcagc ctggaacaga gcctcctctg gtgttgcaag gaagaggctg aatgaggcag 180 agaagctgag tgctgtccag gaggcccagt taaagcggct cgaggtgaca agaccccgag 240 tgctggggag cagggagcag ggccaggtgc cgaggatggc caggcagcca ccgccgccct 300 gggtccatgc agccttcctc ctctgcctcc tcagtcttgg cggagccatc gaaattccta 360 tggatccaag cattcagaat gagctgacgc agccgccaac catcaccaag cagtcagcga 420 aggatcacat cgtggacccc cgtgataaca tcctgattga gtgtgaagca aaagggaacc 480 ctgcccccag cttccactgg acacgaaaca gcagattctt caacatcgcc aaggaccccc 540 gggtgtccat gaggaggagg tctgggaccc tggtgattga cttccgcagt ggcgggcggc 600 cggaggaata tgagggggaa tatcagtgct tcgcccgcaa caaatttggc acggccctgt 660 ccaataggat ccgcctgcag gtgtctaaat ctcctctgtg gcccaaggaa aacctagacc 720 ctgtcgtggt ccaagagggc gctcctttga cgctccagtg caaccccccg cctggacttc 780 catccccggt catcttctgg atgagcagct ccatggagcc catcacccaa gacaaacgtg 840 tctctcaggg ccataacgga gacctatact tctccaacgt gatgctgcag gacatgcaga 900 ccgactacag ttgtaacgcc cgcttccact tcacccacac catccagcag aagaaccctt 960 tcaccctcaa ggtcctcacc aaccaccctt ataatgactc gtccttaaga aaccaccctg 1020 acatgtacag tgcccgagga gttgcagaaa gaacaccaag cttcatgtat ccccagggca 1080 ccgcgagcag ccagatggtg cttcgtggca tggacctcct gctggaatgc atcgcctccg 1140 gggtcccaac accagacatc gcatggtaca agaaaggtgg ggacctccca tctgataagg 1200 ccaagtttga gaactttaat aaggccctgc gtatcacaaa tgtctctgag gaagactccg 1260 gggagtattt ctgcctggcc tccaacaaga tgggcagcat ccggcacacg atctcggtga 1320 gagtaaaggc tgctccctac tggctggacg aacccaagaa ccttattctg gctcctggcg 1380 aggatgggag actggtgtgt cgagccaatg gaaaccccaa acccactgtc cagtggatgg 1440 tgaatgggga acctttgcaa tcggcaccac ctaacccaaa ccgtgaggtg gccggagaca 1500 ccatcatctt ccgggacacc cagatcagca gcagggctgt gtaccagtgc aacacctcca 1560 acgagcatgg ctacctgctg gccaacgcct ttgtcagtgt gctggatgtg ccgcctcgga 1620 tgctgtcgcc ccggaaccag ctcattcgag tgattcttta caaccggacg cggctggact 1680 gccctttctt 
tgggtctccc atccccacac tgcgatggtt taagaatggg caaggaagca 1740 acctggatgg tggcaactac catgtttatg agaacggcag tctggaaatt aagatgatcc 1800 gcaaagagga ccagggcatc tacacctgtg tcgccaccaa catcctgggc aaagctgaaa 1860 accaagtccg cctggaggtc aaagacccca ccaggatcta ccggatgccc gaggaccagg 1920 tggccagaag gggcaccacg gtgcagctgg agtgtcgggt gaagcacgac ccctccctga 1980 aactcaccgt ctcctggctg aaggatgacg agccgctcta tattggaaac aggatgaaga 2040 aggaagacga ctccctgacc atctttgggg tggcagagcg ggaccagggc agttacacgt 2100 gtgtcgccag caccgagcta gaccaagacc tggccaaggc ctacctcacc gtgctagctg 2160 atcaggccac tccaactaac cgtttggctg ccctgcccaa aggacggcca gaccggcccc 2220 gggacctgga gctgaccgac ctggccgaga ggagcgtgcg gctgacctgg atccccgggg 2280 atgctaacaa cagccccatc acagactacg tcgtccagtt tgaagaagac cagttccaac 2340 ctggggtctg gcatgaccat tccaagtacc ccggcagcgt taactcagcc gtcctccggc 2400 tgtccccgta tgtcaactac cagttccgtg tcattgccat caacgaggtt gggagcagcc 2460 accccagcct cccatccgag cgctaccgaa ccagtggagc tccccccgag tccaatcctg 2520 gtgacgtgaa gggagagggg accagaaaga acaacatgga gatcacgtgg acgcccatga 2580 atgccacctc ggcctttggc cccaacctgc gctacattgt caagtggagg cggagagaga 2640 ctcgagaggc ctggaacaac gtcacagtgt ggggctctcg ctacgtggtg gggcagaccc 2700 cagtctacgt gccctatgag atccgagtcc aggctgaaaa tgacttcggg aagggccctg 2760 agccagagtc cgtcatcggt tactccggag aagattatcc cagggctgcg cccactgaag 2820 ttaaagtccg agtcatgaac agcacagcca tcagccttca gtggaaccgc gtctactccg 2880 acacggtcca gggccagctc agagagtacc gagcctacta ctggagggag agcagcttgc 2940 tgaagaacct gtgggtgtct cagaagagac agcaagccag cttccctggt gaccgcctcc 3000 gtggcgtggt gtcccgcctc ttcccctaca gtaactacaa gctggagatg gttgtggtca 3060 atgggagagg tgatgggcct cgcagtgaga ccaaggagtt caccaccccg gaaggagtac 3120 ccagtgcccc taggcgtttc cgagtccggc agcccaacct ggagacaatc aacctggaat 3180 gggatcatcc tgagcatcca aatgggatca tgattggata cactctcaaa tatgtggcct 3240 ttaacgggac caaagtagga aagcagatag tggaaaactt ctctcccaat cagaccaagt 3300 tcacggtgca aagaacggac cccgtgtcac gctaccgctt taccctcagc gccaggacgc 3360 aggtgggctc tggggaagcc 
gtcacagagg agtcaccagc acccccgaat gaagctcctc 3420 ccacattgcc cccgactacc gtgggtgcga cgggcgctgt gagcagtacc gatgctactg 3480 ccattgctgc caccaccgaa gccacaacag tccccatcat cccaactgtc gcacctacca 3540 ccatggccac caccaccacc gtcgccacaa ctactacaac cactgctgcc gccaccacca 3600 ccacggagag tcctcccacc accacctccg ggactaagat acacgaatcc gcttacacca 3660 acaaccaagc ggacatcgcc acccagggct ggttcattgg gcttatgtgc gccatcgccc 3720
tcctggtgct gatcctgctc atcgtctgtt tcatcaagag gagtcgcggc ggcaatgatg 3780 aggacaacaa gcccctgcag ggcagtcaga catctctgga cggcaccatc aagcagcagg 3840 tacgagaaaa gaaggatgtt ccccttggcc ctgaagaccc caaggaagag gatggctcat 3900 ttgactatag gtgcagtgac gacagcctgg tggactatgg cgagggtggc gagggtcagt 3960 tcaatgaaga cggctccttc atcggccagt acacggtcaa aaaggacaag gaggaaacag 4020 agggcaacga aagctcagag gccacgtcac ctgtcaatgc tatctactct ctggcctaac 4080
<210> 35
<211> 4360
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 7495062CB1
<400> 35 tccaagctgg tcgagctcca tcactgatag cggccgcagt gtgctggaaa gagggccgga 60 gcccgagccc ttggaggttg attgacttat gtgcaatttg ggacgctgga gtttaccttc 120 σctccgcagc ctggaacaga gcctcctctg gtgttgcaag gaagaggctg aatgaggcag 180 agaagctgag tgctgtccag gaggcccagt taaagcggct cgaggtgaca agaccccgag 240 tgctggggag cagggagcag ggccaggtgc cgaggatggc caggcagcca ccgccgccct 300 gggtccatgc agccttcctc ctctgcctcc tcagtcttgg cggagccatc gaaattccta 360 tggatccaag cattcagaat gagctgacgc agccgccaac catcaccaag cagtcagcga 420 aggatcacat cgtggacccc cgtgataaca tcctgattga gtgtgaagca aaagggaacc 480 ctgcccccag cttccactgg acacgaaaca gcagattctt caacatcgcc aaggaccccc 540 gggtgtccat gaggaggagg tctgggaccc tggtgattga cttccgcagt ggcgggcggc 600 cggaggaata tgagggggaa tatcagtgct tcgcccgcaa caaatttggc acggccctgt 660 ccaataggat ccgcctgcag gtgtctaaat ctcctctgtg gcccaaggaa aacctagacc 720 ctgtcgtggt ccaagagggc gctcctttga cgctccagtg caaccccccg cctggacttc 780 catccccggt catcttctgg atgagcagct ccatggagcc catcacccaa gacaaacgtg 840 tctctcaggg ccataacgga gacctatact tctccaacgt gatgctgcag gacatgcaga 900 ccgactacag ttgtaacgcc cgcttccact tcacccacac catccagcag aagaaccctt 960 tcaccctcaa ggtcctcacc aaccaccctt ataatgactc gtccttaaga aaccaccctg 1020 acatgtacag tgcccgagga gttgcagaaa gaacaccaag cttcatgtat ccccagggca 1080 ccgcgagcag ccagatggtg cttcgtggca tggacctcct gctggaatgc atcgcctccg 1140 gggtcccaac accagacatc gcatggtaca agaaaggtgg ggacctccca tctgataagg 1200 ccaagtttga gaactttaat aaggccctgc gtatcacaaa tgtctctgag gaagactccg 1260 gggagtattt ctgcctggcc tccaacaaga tgggcagcat ccggcacacg atctcggtga 1320 gagtaaaggc tgctccctac tggctggacg aacccaagaa ccttattctg gctcctggcg 1380 aggatgggag actggtgtgt cgagccaatg gaaaccccaa acccactgtc cagtggatgg 1440 tgaatgggga acctttgcaa tcggcaccac ctaacccaaa ccgtgaggtg gccggagaca 1500 ccatcatctt ccgggacacc cagatcagca gcagggctgt gtaccagtgc aacacctcca 1560 acgagcatgg ctacctgctg gccaacgcct ttgtcagtgt gctggatgtg ccgcctcgga 1620 tgctgtcgcc ccggaaccag ctcattcgag tgattcttta caaccggacg cggctggact 1680 gccctttctt 
tgggtctccc atccccacac tgcgatggtt taagaatggg caaggaagca 1740 acctggatgg tggcaactac catgtttatg agaacggcag tctggaaatt aagatgatcc 1800 gcaaagagga ccagggcatc tacacctgtg tcgccaccaa catcctgggc aaagctgaaa 1860 accaagtccg cctggaggtc aaagacccca ccaggatcta ccggatgccc gaggaccagg 1920 tggccagaag gggcaccacg gtgcagctgg agtgtcgggt gaagcacgac ccctccctga 1980 aactcaccgt ctcctggctg aaggatgacg agccgctcta tattggaaac aggatgaaga 2040 aggaagacga ctccctgacc atctttgggg tggcagagcg ggaccagggc agttacacgt 2100 gtgtcgccag caccgagcta gaccaagacc tggccaaggc ctacctcacc gtgctagctg 2160 atcaggccac tccaactaac cgtttggctg ccctgcccaa aggacggcca gaccggcccc 2220 gggacctgga gctgaccgac ctggccgaga ggagcgtgcg gctgacctgg atccccgggg 2280 atgctaacaa cagccccatc acagactacg tcgtccagtt tgaagaagac cagttccaac 2340 ctggggtctg gcatgaccat tccaagtacc ccggcagcgt taactcagcc gtcctccggc 2400 tgtccccgta tgtcaactac cagttccgtg tcattgccat caacgaggtt gggagcagcc 2460 accccagcct cccatccgag cgctaccgaa ccagtggagc accccccgag tccaatcctg 2520 gtgacgtgaa gggagagggg accagaaaga acaacatgga gatcacgtgg acgcccatga 2580 atgccacctc ggcctttggc cccaacctgc gctacattgt caagtggagg cggagagaga 2640 ctcgagaggc ctggaacaac gtcacagtgt ggggctctcg ctacgtggtg gggcagaccc 2700 cagtctacgt gccctatgag atccgagtcc aggctgaaaa tgacttcggg aagggccctg 2760
agccagagtc cgtcatcggt tactccggag aagattatcc cagggctgcg cccactgaag 2820 ttaaagtccg agtcatgaac aggacagcca tcagccttca gtggaaccgc gtctactccg 2880 acacggtcca gggccagctc agagagtacc gagcctacta ctggagggag agcagcttgc 2940 tgaagaacct gtgggtgtct cagaagagac agcaagccag cttccctggt gaccgcctcc 3000 gtggcgtggt gtcccgcctc ttcccctaca gtaactacaa gctggagatg gttgtggtca 3060 atgggagagg tgatgggcct cgcagtgaga ccaaggagtt caccaccccg gaaggagtac 3120 ccagtgcccc taggcgtttc cgagtccggc agcccaacct ggagacaatc aacctggaat 3180 gggatcatcc tgagcatcca aatgggatca tgattggata cactctcaaa tatgtggcct 3240 ttaacgggac caaagtagga aagcagatag tggaaaactt ctctcccaat cagaccaagt 3300 tcacggtgca aagaacggac cccgtgtcac gctaccgctt taccctcagc gccaggacgc 3360 aggtgggctc tggggaagcc gtcacagagg agtcaccagc acccccgaat gaagctcctc 3420 ccacattgcc cccgactacc gtgggtgcga cgggcgctgt gagcagtacc gatgctactg 3480 ccattgctgc caccaccgaa gccacaacag tccccatcat cccaactgtc gcacctacca 3540 ccatggccac caccaccacc gtcgccacaa ctactacaac cactgctgcc gccaccacca 3600 ccacggagag tcctcccacc accacctccg ggactaagat acacgaatcc gcccctgatg 3660 agcagtccat atggaacgtc acggtgctcc ccaacagtaa atgggccaac atcacctgga 3720 agcacaattt cgggcccgga actgactttg tggttgagta catcgacagc aaccatacga 3780 aaaaaactgt cccagttaag gσccaggctc agcctataca gctgacagac ctctatcccg 3840 ggatgacata cacgttgcgg gtttattccc gggacaacga gggcatcagc agtaccgtca 3900 tcacctttat gaccagtaca gcttacacca acaaccaagc ggacatcgcc acccagggct 3960 ggttcattgg gcttatgtgc gccatcgccc tcctggtgct gatcctgctc atcgtctgtt 4020 tcatcaagag gagtcgcggc ggcaagtacc cagtacgaga aaagaaggat gttccccttg 4080 gccctgaaga ccccaaggaa gaggatggct catttgacta tagtgatgag gacaacaagc 4140 ccctgcaggg cagtcagaca tctctggacg gcaccatcaa gcagcaggag agtgacgaca 4200 gcctggtgga ctatggcgag ggtggcgagg gtcagttcaa tgaagacggc tccctcatcg 4260 gccagtacac ggtcaaaaag gacaaggagg aaacagaggg caacgaaagc tcagaggcca 4320 cgtcacctgt caatgctatc tactctctgg cctaacggag 4360
<210> 36
<211> 2434
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 284191CB1
<400> 36 ggaccgcagg ctgctaaaaa cagctccagc acccactcca aaccaggcct gaaacaatgt 60 cctccaccga gagaaacgta aaggacactt gatcacacaa tccctggaat aatatccagg 120 aaacacttgc tggagccact cgcagcaccc ttccctggca gcacacttgg ggacagcgag 180 gagatgagcg catctctgaa ttacaaatct ttttccaaag agcagcagac catggataac 240 ttagagaagc aactcatctg tcccatctgc ttagagatgt tcacgaaacc tgtggtgatt 300 ctcccttgtc agcacaacct gtgtaggaaa tgtgccagtg atattttcca ggcctctaac 360 ccgtatttgc ccacaagagg aggtaccacc atggcatcag ggggccgatt ccgctgccca 420 tcctgtagac atgaagtggt tttggataga catggggtat atggacttca gaggaacctg 480 ctggtggaaa atatcattga catctacaag caggagtcca ccaggccaga aaagaaatcc 540 gaccagccca tgtgcgagga acatgaagag gagcgcatca acatctactg tctgaactgc 600 gaagtaccca cctgctctct gtgcaaggtg tttggtgcac acaaagactg ccaggtggct 660 cccctcactc atgtgttcca gagacagaag tctgagctca gtgatggcat cgccatcctc 720 gtgggcagca acgatcgagt ccagggagtg atcagccagc tggaagacac ctgcaaaact 780 atcgaggaat gttgcagaaa acagaaacaa gagctttgtg agaagtttga ttacctgtat 840 ggcattttgg aggagaggaa gaatgaaatg acccaagtca ttacccgaac ccaagaggag 900 aaactggaac atgtccgtgc tctgatcaaa aagtattctg atcatttgga gaacgtctca 960 aagttggttg agtcaggaat tcagtttatg gatgagccag aaatggcagt gtttctgcag 1020 aatgccaaaa ccctgctaaa aaaaatctcg gaagcatcaa aggcatttca gatggagaaa 1080 atagaacatg gctatgagaa catgaaccac ttcacagtca acctcaatag agaagaaaag 1140 ataatacgtg aaattgactt ttacagagaa gatgaagatg aagaagaaga agaaggcgga 1200 gaaggagaaa aagaaggaga aggagaagtg ggaggagaag cagtagaagt ggaagaggta 1260 gaaaatgttc aaacagagtt tccaggagaa gatgaaaacc cagaaaaagc ttcagagctc 1320 tctcaggtgg agctgcaggc tgcccctggg gcacttccag tttcctctcc agagccacct 1380 ccagccctgc cacctgctgc ggatgcccct gtgacacaga ttggatttga ggctcctccc 1440 ctccagggac aggctgcagc tccagcgagt ggcagtggag ctgattctga gccagctcgc 1500
catatcttct ccttttcctg gttgaactcc ctaaatgaat gatattcatt ccaactgctg 1560 cccctctgtc tgcctggctg agatgcatgt gggcagcagg aagcccaagt gaaattaata 1620 ttatgcagat gatgaaaggg acctctgaac aggatttctg caaaaatagc cccaaactgc 1680 aattccatat gacttatcta acatcttggg gggaaagaat attttgagaa aatagttgca 1740 gaaagcactg gaaataataa acttgatctt atacaaatct tctattgtgt ggaaaatgtt 1800 gtgaagggtg tgtaggtgtg gtacatgtgt atgtcactaa caagtggcaa atggtgaaaa 1860 aagtggtcac tatgcttttg tctctcatag gcactgactt tttgttatta tattatggta 1920 gctttcattt cctttactct ttaacagtgc aggtggtcag tgaaaatcag tgtcaactca 1980 gaagtgactg atttatcaat acatggacaa aaagtaaatc attgaccaaa gctatgaaat 2040 gtttcacaaa gttttcctct tttgcataac agatgtcact ggatgtacat tcagaaatgt 2100 tctttgaatt tggtgacact ttcatggtcc agaaagctga aggcctgggc atctcttgtg 2160 acatttttct aatattagtt ttagattttc acgtattagg cactttagtt gaatcttcca 2220 gcaaaagctg tctactttct cttttattca ctgtggcacc aatctggtaa attgtagaac 2280 aattgcatgt gtttaaatat atatacaaac atatcacaca ttaaatatat atatatttaa 2340 atcatgcttt gttaatattt gtcccaccat aatgcctcct tcagaacata agtgtaactt 2400 tatatgaact cttaaataaa tgatgttttt aaaa 2434
<210> 37
<211> 2688
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 2361681CB1
<400> 37 ggcagcggca gctggggctg cagcggcgcc gggctctaga gagccgcagg atcggccaga 60 gtgcggagct ggacacccgg gtcccagata ctacagacac ccggagaggt ggctccttcg 120 ccctgaagcc ttcctcggcc ccctacgcac tcgggcccct tccgcagagg attcgcagcg 180 tgagcgcccc gcagcccgct caggaccagc tcacaggact aaggaccaaa ggcatttctg 240 ggcactgaga tcctacctct ctgcctgcag ctatgagcag acgtgtggtt cggcaaagca 300 agttccgcca tgtgtttggg caggcagcaa aggccgacca ggcctacgag gacatccgtg 360 tgtccaaggt cacatgggac agctccttct gtgccgtcaa ccccaaattc ctggccatta 420 ttgtggaggc tggaggcggg ggtgccttca tcgtcctgcc tctggccaag acagggcgag 480 tggataagaa ctacccactg gtcactgggc acactgcccc tgtgctggat attgactggt 540 gtccacacaa tgacaacgtt atcgccagtg cctcagacga caccaccatc atggtgtggc 600 agattccaga ctataccccc atgcgcaaca ttacggaacc tatcatcaca cttgagggcc 660 actccaagcg tgtgggcatc ctctcctggc accctactgc caggaatgtc ctgctcagtg 720 caggtggtga caatgtgatc atcatctgga atgtgggcac cggggaggtg ctgctgagcc 780 tggatgatat gcacccagac gtcatccaca gtgtgtgctg gaacagcaac ggtagcctgc 840 tagccaccac ctgcaaggac aagaccttgc gcatcattga ccccagaaaa ggccaagtgg 900 tggcggagag gtttgcggcc cacgagggga tgaggcccat gcgggccgtc ttcacgcgcc 960 agggccatat cttcaccacg ggcttcaccc gcatgagcca gcgagagctg ggcctgtggg 1020 acccgaacaa cttcgaggag ccagtggcac tgcaggagat ggacacaagc aacggggtcc 1080 tattgccctt ttacgatccc gactccagca tcgtctacct gtgtggcaag ggcgacagca 1140 gcattcggta ctttgagatt accgacgagc cgcctttcgt gcactacctg aacacgttca 1200 gcagcaaaga gccgcagcgg ggcatgggtt tcatgcccaa aaggggactg gatgtcagca 1260 agtgtgagat cgcccggttc tacaagctac acgaaagaaa gtgtgaacct atcatcatga 1320 ctgtgccccg caagtcagac ctcttccagg acgatctgta cccggatacg ccaggcccgg 1380 agccggccct agaagcggac gaatggctat ccggccagga cgccgaaccc gtgctcattt 1440 cgctgaggga cggctatgtg ccccccaagc accgcgagct ccgggtcacg aagcgcaaca 1500 tcctggacgt gcgcccgccc tccggccccc gccgcagcca gtcggccagc gacgccccct 1560 tgtcgcagca caccctggag acgctgctgg aagagatcaa ggccctccgc gagcgggtgc 1620 aggcccagga gcagcgcatc acggctctgg agaacatgct gtgcgagctg gtggacggca 1680 cggactagcc 
ccgcgcgcca ggcaggcgga gcggggcggg gcgcacaagc tcggccccgc 1740 cccggctttt agtcccgaac tccggacccc gccttcttgg gctgggcccg ggggcgggac 1800 tggggaggga actccgcccc tcgcgggaga ccagaactct tggagcttag gggagaccca 1860 cgtcgctcca gcggaggctg gactgcgagc ctcgtctggg actcggctgg agctggccta 1920 gggaggcctg gggtaacctg gggggctcag caatggtgct gcacggcgag gtggtgtccc 1980 cctttgtcct ccgcccaggg cagggaaagt gcttagtatt agcgtgatgc ttggggttat 2040 tggagcctga gcttgacctc aaacgggtgg cgatttgatg ggtaccccca ggctggggaa 2100 aatgacagcg cttctcctaa tcagctcact ggattccatc accctgagcg gtaaaccaga 2160
tgggcgtcac cccagttctg cagacacata cacaacccgt ttgctgcaga gccggaccca 2220 gtggctacac ccacagcggt ctgtggtaga gaactctctt ccttctttcc accgacaggg 2280 gcgagggctg cttcctcgcg gcagcccccg cgaagaaatc tcgagagaac tggcatgagg 2340 agttaggttc atcacaaata cacacacact gcccccaacc ctctgccgtt gcctctctca 2400 gaaaaacaag acgtactgaa tgaaatattt tactaagcgt tcagtctgtg cctcctgcat 2460 gggtgggagt gaggggaacg agacccccag cctctgcaaa tgctaccccc aggctcctgg 2520 gagacctggc gatgcactcc tgggctcagg gtccatcagg cagcctctta ccctagagct 2580 ctctccactc tgaggttcag aaggacccca acccacaccg taggcgttcc ccccaagtaa 2640 agttaggtag caaaagcaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaa 2688
<210> 38
<211> 4264
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 1683662CB1
<400> 38 cgcgccgccc ggccgcctgc actgcgcgcg cgcccacccc gcgtgggagg cagcgggagg 60 ggcccggaga ggtgtggagc ggcgcggcgg gaggctccgt gggcggccac gggagacagc 120 gccggcggga gcgcgcctct cggcctttcc tccgcgcccc cgcgtcccca gccggccgct 180 ccgagaggac ccggaggagg caggtggctt tctagaagat gaccatagag gaccttccag 240 attttccatt agaaggaaat cctttgtttg gaagataccc atttatattt tctgcttctg 300 ataccccagt tatcttttcc atttctgcag caccaatgcc ttcagactgt gaattttctt 360 tctttgatcc taatgatgca tcatgccagg aaattctttt tgatcccaaa acttcagttt 420 cagaattatt tgccattttg agacagtggg ttcctcaggt ccaacaaaac attgacatta 480 ttggaaatga gattcttaag agaggttgca atgtgaatga tagagatgga ttgacagata 540 tgactctttt acattatacc tgcaaatctg gagctcatgg tattggtgat gtggaaacag 600 ctgtaaaatt tgcaactcag cttattgacc tcggagcaga cattagtttg cggagtcgct 660 ggacaaacat gaatgctttg cattatgctg cttattttga tgtccctgaa cttataagag 720 tgattttgaa aacatcgaaa ccaaaagatg tggatgccac ttgcagtgat tttaattttg 780 gaacagcttt gcatattgca gcatacaact tgtgtgcagg tgctgtgaag tgcctcttgg 840 agcagggagc aaatcctgca tttaggaatg acaaaggaca gatccctgct gatgttgttc 900 cagacccagt agatatgccg ttagagatgg ctgacgccgc agccactgct aaggaaatca 960 agcagatgct tctagatgcg gtgcctctgt catgtaacat ctcaaaggcc atgctcccaa 1020 attatgatca tgtcactggc aaggcaatgc ttacgtcact tggcctgaag ttgggggatc 1080 gtgttgttat tgcaggacag aaggttggta cattaagatt ttgtggaaca actgaatttg 1140 caagtgggca gtgggctggc attgaactgg atgaaccaga aggaaaaaat aatggaagtg 1200 ttggaaaagt ccagtacttt aaatgtgccc ccaagtatgg tatttttgca cctctttcaa 1260 agataagtaa agcaaaaggt cgaaggaaga atataacaca cactccttct acaaaagctg 1320 ctgtacctct catcaggtcc cagaaaattg acgtagctca tgtgacgtca aaagtaaata 1380 ctggattaat gacatcaaaa aaagatagtg cttctgagtc aacactttca ttgcctcctg 1440 gtgaagaact taaaactgtg acagagaaag atgttgccct gcttggatct gtcagcagct 1500 gctcctctac atcttctttg gaacacagac agagctaccc caagaaacag aatgcaatca 1560 gcagtaacaa gaagacaatg agcaaaagcc cttccctttc atccagagcc agtgctggtt 1620 tgaattcctc agcaacatct acagcaaata atagccgttg cgagggggaa ctccgcctcg 1680 gagagagagt 
gttagtggta ggacagagac tgggcaccat taggttcttt gggacaacaa 1740 acttcgctcc aggatattgg tatggtatag agcttgaaaa accccatggc aagaatgatg 1800 gttcagttgg aggtgtgcag tattttagct gttctccaag atatggaata tttgctcccc 1860 catccagggt gcaaagagta acagattccc tggataccct ttcagaaatt tcttcaaata 1920 aacagaacca ttcttatcct ggttttagga gaagttttag cacaacttct gcttcttccc 1980 aaaaggagat taacagaaga aatgcttttt ccaaatcgaa agctgctttg cgtcgcagtt 2040 ggagcagcac ccccaccgca ggtggcattg aagggagcgt gaagctgcac gaggggtctc 2100 aggtcctgct cacgagctcc aatgagatgg gtactgttag gtatgtgggc cccactgact 2160 ttgcttcagg tatctggctt ggacttgagc tccgaagcgc caagggaaaa aatgatgggt 2220 cagtgggtga caagcgctat ttcacctgta agccgaacca tggagtctta gttcgaccga 2280 gcagagtgac ctatcgggga attaatgggt caaaacttgt ggatgagaat tgttaagctt 2340 ctaaaatatt aaataagctc aaatatatat atttggtgta aataaagagt ccatggtaaa 2400 tggtttactt tatttagcca tattaaaatt ttgaaaatat agttatcttc ttaaaaacca 2460 ttataacaat tcagagagag ttctttacaa agccatgaat atgaactatg gggaatcatg 2520 gttcttttaa agcaattttc aaaataagta ccaattaaag ctttaggttc caagaagatt 2580
ctgggactca ggaagaaaaa gtgccatcag gtgaccagct gttgcatttc ttgcttattc 2640 tgttttgttt ttgcacatca taatggattt ttcttagtgc cctaattgtg aagggtttct 2700 ctagctttgg ttatgtgtaa tgttcacgtg accttttttt tgtcaatcat ttttggaatt 2760 tttctttctt tctgtgcttt attactaata agtccaatga gtgagtagta gctagatgac 2820 tagtatgtag ttttatattt tggtaaaatt atttgccctt tcagaaatgc ctcatctaaa 2880 gatacatgat aattttggag ttggaagggg ccttagaggc tctccagctc tgcttcttgc 2940 ccattgccaa atactgaaat ggaagcccgt cttacctggg gtcactaact ggttggttaa 3000 ctgagctaag aatagactgt gggtctcctc acttgtggcc cagtgctctt tctgctatac 3060 aaaatgtcta atctcagatt tttcttctgc tgcttgactg cttcatctgg atgaactaca 3120 aaaaacccat gattaaggtt tatgaattca agtaataatt agattttttt tgcacagact 3180 tacttaactt ccttattgga tatgtttgta acacataaac acaaagcact tttcaaacat 3240 gatgcacttt tatctttgtg aataatttac tgtcctttcc tcctgggata tgagaaacat 3300 tttaaaaaac gtatttaaca gaagagagca aataaagata tatcaggaag gatgtattag 3360 ttatttactt aaatgtttat aatatctgga ttttttttgt tttgttactc atagaactgg 3420 tgttgtttgc tgtttttatt tctctaattg ttgcagagtt ctgcctgtta caaagctaca 3480 gaactgtatt gtttttattt tccttcttga gcacatgtta acaaactaag cttcacatta 3540 gagtgatgtc ataatgtaaa atgtttgcat tgtggttagg tattgaagtt tatgtcctgt 3600 ctgtgtaaag attcatcttt tattgtaaat atttagactt taccacagaa atattggaac 3660 agtttgcttt ataagattaa aaagcatcct tcagaatgga gcttgccttg tgcttagaaa 3720 taatatgttg aactattttg caatatacta ttttaaatct aaattctgtc acttcgctgc 3780 ctttttaaaa tagtgtggta tttcaaatat tgctagagct attttcctga aatacatttg 3840 caaaataagg ctgctttgta atcaaggaat atttttattg attgaaggaa atgactgtac 3900 tgcgattcaa aagtaaactt attttattat acagattatt tcttaaaaac tctatttata 3960 ccttaacatg aaatccatga ccacaccaaa cttggttatt cataattttt cctgttaaat 4020 ataaaacact gtaagttaaa aacagtaatg ccaacattga atttattttt gaggtcaaag 4080 aaccagttgt tctctttata tttagatgag gatgattgag tccatatact atgtatgttt 4140 acatatacta tacatgcaca ttaggtgttt tcatttgtgt tttgcttatg aaatgtcatt 4200 taaagttcac ttcttgagca tcaataaaaa gggaagctgt gtggttttgg aaaaaaaaaa 4260 aagg 4264
<210> 39
<211> 3930
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 3750444CB1
<400> 39 gttccaaggt cgaaaccgcc ggtgagatcg acctgcaggt ttcagacctg caatcctgaa 60 gcttcaggtg aaagacacat tggaattgtt tcaaccatca gcgacatcga taatgaagag 120 ttccacccac caccatgcca ggtgtccagc ttgcaccctc ccatctccca gtgggtgcgc 180 ccatgcacaa gtaccacttt gtgccaagcc gtggagccca acggcaagcc tgctggaggc 240 ccaggatgac ctgggggtga cacagaggat cctggatgag gcaaaacagc gccttcgtga 300 ggtggaggac ggcatcgcca caatgcaggc taagtaccgg gaatgcatta ccaagaagga 360 ggagctggag ctgaagtgtg agcagtgtga gcagcggctg ggccgagctg gcaaggtgcg 420 caccctcctc ctgcaaggcc tgcaagcggg cccggcccag acaggggcca gaaaggacca 480 gggcgccggt gggtcctggg gtggctgtcc acaccccctt cctggcaacc ccaggtgcca 540 cagtgggtag ggccagcccc aggcccctag cccagcctcc cagagcccac cccacggggc 600 tgcccctcca gctcatcaac gggctgtcgg atgagaaggt gcgctggcag gagacggtgg 660 agaacctgca gtacatgctc aacaacatct ccggcgatgt cctggtggcc gctggctttg 720 tggcctacct gggccccttc acgggccagt accgcacggt gctctacgac agctgggtca 780 agcagctcag gagccacaat gtcccacaca cctccgagcc cacgctaatc gggacgctgg 840 ggaaccctgt gaagatccga tcgtggcaga tcgctggcct ccccaacgac acactgtcag 900 tggagaacgg ggtcatcaac cagttttccc agcgctggac ccacttcatt gaccctcaga 960 gccaggccaa caaatggatc aagaacatgg agaaggacaa tgggctggat gtgttcaagt 1020 tgagtgaccg cgacttcctg cgcagcatgg agaacgccat ccgctttggc aagccatgtc 1080 tcctggagaa cgtgggcgag gagctagacc cagccctgga gccagtgctg ctcaagcaga 1140 cgtacaagca gcagggaaac acggtgctga agctggggga cacggtgatc ccctaccatg 1200 aggacttcag gatgtacatc accaccaagc tgcccaaccc acactacacg cccgagatct 1260 ccaccaaact caccctcatc aacttcaccc tgtcgcccag tggcctagag gaccagctac 1320 tgggccaggt agtggcagag gagcgacccg acctggagga ggccaagaac cagctgatta 1380
tcagtaatgc caagatgcgc caggagctga aggacattga ggaccagatc ctgtaccggc 1440 tcagctcctc cgagggcaac cctgtagatg acatggaact catcaaggtg ctggaagcct 1500 ccaagatgaa ggctgctgag atccaggcca aagtcaggat tgcagagcag acggagaagg 1560 acatcgacct gacgcgcatg gagtacatac ccgtggccat ccgcacccag atcctcttct 1620 tctgtgtgtc cgacctggcc aacgtggacc ccatgtacca gtactccctt gagtggtttc 1680 tcaacatctt cctctcgggc atcgccaact cagagagagc agacaacctg aagaagcgca 1740 tctccaacat caaccgctac ctgacctaca gcctctacag caacgtctgc cgcagcctct 1800 ttgagaagca caagctgatg tttgccttcc tgctgtgtgt tcgcatcatg atgaacgagg 1860 gcaaaatcaa ccagagtgag tggcgatacc tcctgtctgg gggctccatc tcgatcatga 1920 ctgagaatcc ggcaccggac tggctgtcag accgggcttg gcgagacatc ctagcactct 1980 cgaacctgcc aaccttttcc tccttctctt ccgacttcgt gaagcacctc tcagaattcc 2040 gggtcatctt cgacagcctt gagccccacc gggagccttt gcctggcatc tgggaccagt 2100 acctagacca gttccagaag ctgctagtcc tccgctgcct gcgtggggac aaggttacca 2160 acgccatgca ggactttgtg gccaccaacc tggagccacg cttcattgaa ccccagacag 2220 ccaatctgtc agtggtgttc aaagactcca actccaccac acccctcatc tttgtgctgt 2280 cacccggcac agaccctgct gccgacctct acaagtttgc cgaagaaatg aagttctcca 2340 aaaagctctc tgccatctcc ctgggccagg ggcagggccc tcgggcagaa gccatgatgc 2400 gcagctccat agagaggggc aaatgggtct tcttccagaa ctgccacctg gcaccaagct 2460 ggatgccagc cctagaacgc ctcatcgagc acatcaaccc cgacaaggta cacagggact 2520 tccgcctctg gctcaccagc ctgcccagca acaagttccc agtgtccatc ctgcagaacg 2580 gctccaagat gaccattgag ccgccacgcg gtgtcagggc caacctgctg aagtcctata 2640 gtagccttgg tgaagacttc ctcaactcct gccacaaggt gatggagttc aagtctctgc 2700 tgctgtctct gtgcttgttc catgggaacg ccctggagcg ccgtaagttt gggcccctgg 2760 gcttcaacat cccctatgag ttcacggatg gagatctgcg catctgcatc agccagctca 2820 agatgttcct ggacgaatat gatgacatcc cctacaaggt cctcaagtac acggcagggg 2880 agatcaatta cgggggccgt gtcactgatg actgggaccg gcgctgcatc atgaacatct 2940 tggaggactt ctacaaccct gacgtgctct cccctgagca cagctacagc gcctcgggca 3000 tctaccacca gatcccgcct acctacgacc tccacggcta cctctcctac atcaagagcc 3060 
tcccactcaa tgatatgcct gagatctttg gcctgcatga caatgccaac atcacctttg 3120 cccagaacga gacgttcgcc ctcctgggca ccatcatcca gctgcaaccc aaatcatctt 3180 ctgcaggcag ccagggccgg gaggagatag tggaggacgt cacccaaaac attctgctca 3240 aggtgcctga gcctatcaac ttgcaatggg tgatggccaa gtacccagtg ctgtatgagg 3300 aatcaatgaa cacagtacta gtacaagagg tcattaggta caatcggctg ctgcaggtga 3360 tcacacagac actgcaagac ctactcaagg cactcaaggg gctggtagtg atgtcctctc 3420 agctggagct gatggctgcc agcctgtaca acaatactgt gcctgagctc tggagtgcca 3480 aggcctaccc atcgctcaag cctctgtcat catgggtcat ggacctgctg caacgcctgg 3540 actttctgca ggcctggatc caagatggca tcccagctgt cttctggatc agtggattct 3600 tcttccccca ggcatgtctt aacaggcact ctgcagaatt ttgcccgcaa atttgtcatc 3660 tccattgaca ccatctcctt tgatttcaag gtctgggcac agccagggcc aggtcaggtg 3720 acaggctagg gtacagccca gggaggagag gctctgaggc cacggttggt tggcagttgg 3780 gggaccccta agccagggca tggaaagacc caagccagaa gaggccatga gtcccaggaa 3840 cgggtctggg ctgggtccat cagaaatcca caggggcagg gcacagacca caggccatgg 3900 gctaaagtgg taggtacgtg atgatgggca 3930
<210> 40
<211> 5204
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 5500608CB1
<400> 40 caccttcagc ccactcatct atcaccagtg gaagctgccc aggaactccg gaaatgcgca 60 ggcggcagga ggaggctatg cgaagactag cctcgcaggt ggttgcctat cactattgtc 120 aagcagataa tgcctacact tgcttggtgc cagaatttgt ccacaatgtt gctgccttgc 180 tctgccgctc acctcagctg acagcctatc gggagcagct tcttcgggaa cctcacctgc 240 agagcatgct gagccttcgt tcctgtgttc aagaccccat ggcctccttc cggaggggag 300 ttctggagcc actagaaaat ctccataaag agagaaaaga tcccagatga agatttcatc 360 attttaattg atggattaaa tgaagcagaa tttcacaaac cggattatgg ggatacaatt 420 gtatcgtttc tgagtaaaat gatcggaaag tttccttctt ggctcaaact aattgtaaca 480 gttaggacca gtttacagga aattaccaag ctgctgcctt tccataggat ttttttggat 540
cgactagaag agaatgaagc catagaccag gacctgcagg cttacatcct gcaccggata 600 cacagcagct cagagatcca gaataacatt tcacttaatg gcaaaatgga caatactaca 660 tttggcaaac ctcagttctc atctcaagac cctcagtcaa gggtcctatc tatatctgaa 720 acttacattt gacctcatag agaaaggcta tctagtgtta aagagctcta gctacaaagt 780 agttcctgtt tcgctctcag aggtttattt actccagtgc aatatgaagt tcccaaccca 840 gtcttccttt gaccgggtga tgcctctcct gaatgtggca gtggcctctc tccacccact 900 gactgatgag catatcttcc aggccatcaa tgctgggagc attgaaggca cactagaatg 960 ggaggatttt cagcagagaa tggagaacct ctccatgttc ctaatcaagc gcagagacat 1020 gactcgtatg tttgtacatc cttcttttcg agaatggctt atctggagag aagaaggaga 1080 gaaaaccaaa tttctctgtg atccgaggag tggtcacacg ttacttgcct tctggttttc 1140 ccgccaagag ggaaaactaa accgacagca gactattgaa ctgggacatc acatcctcaa 1200 agcacacatt tttaagggtt tgagtaaaaa agttggtgta tcatcctcca tcctccaagg 1260 tctctggatc tcttatagca cagaaggtct ttccatggca ctggcgtctt tacgaaatct 1320 ctacactcca aatataaagg tcagccgact gctgattttg ggaggtgcca atattaatta 1380 ccggacagag gttttaaata atgctccaat tctatgtgtt cagtcccatc ttggttacac 1440 agaaatggta gccctgctgc tggagttcgg ggccaacgtg gatgcctctt ctgaaagtgg 1500 cctgactccc ctgggatatg ctgcagcagc agggtacctg agcattgtgg tgctgctgtg 1560 caagaaacgg gccaaggtgg atcatttgga taagaacggg cagtgtgctt tggttcatgc 1620 tgcactccga ggtcatctgg aggttgtcaa gtttttgatt cagtgtgact ggacgatggc 1680 cggccagcag caaggagtat ttaagaagag ccatgccatc caacaggccc tcattgctgc 1740 agccagcatg ggttatactg agattgtctc ctacctactt gatcttccag aaaaagatga 1800 agaggaagta gagcgagcac agatcaacag ctttgacagt ctctggggag agacagccct 1860 aacagctgca gccggaaggg gcaaactgga ggtgtgccgt ttgctcttgg aacaaggggc 1920 ggcagtggcc cagccaaacc gccgaggagc agtgccacta ttcagcacag tgcgccaggg 1980 ccactggcag attgttgatc ttttactcac ccatggagct gatgtcaaca tggcagacaa 2040 gcagggccgc actcccctga tgatggctgc ttccgaaggc catctaggaa ccgtggactt 2100 tctgcttgca caaggtgcct ccattgctct tatggacaaa gaaggattga cagccctcag 2160 ctgggcttgt ttgaagggcc atctctcagt agtacgttct ctggtggata acggagctgc 2220 cacagaccat 
gctgacaaga atggccgtac cccactggat ctggcagctt tctatggcga 2280 tgctgaggtg gtccagttcc tggtagatca tggggccatg atcgagcacg ttgactacag 2340 tggaatgcgc cctttggata gggcagtggg gtgccggaac acttctgttg ttgtcactct 2400 tctgaagaaa ggagccaaga taggtccagc cacatgggcg atggccacct ccaagccaga 2460 catcatgatc atcctgttga gcaagctgat ggaagagggg gacatgtttt ataagaaagg 2520 taaagtaaag gaagctgccc agcgctacca gtacgccctg aagaagttcc ctagagaagg 2580 gtttggtgag gacttgaaaa ctttccggga actaaaggtg tctctcctcc tcaacctctc 2640 tcggtgtcgc aggaaaatga acgattttgg aatggcggag gaatttgcta ctaaggccct 2700 ggagctgaaa ccgaaatctt atgaagctta ctatgcgaga gcaagggcaa aacgcagcag 2760 cagacagttc gcagcagcct tagaggacct gaacgaggcc atcaagctgt gtcccaacaa 2820 ccgtgagatc cagagacttc tgctgagagt ggaagaagag tgtagacaga tgcagcagcc 2880 acagcagcca ccgccgccac cgcagcctca gcagcagttg ccggaagaag cagaacctga 2940 gccacagcat gaagacatat actctgtaca ggatatattc gaggaggagt acctggaaca 3000 ggatgttgaa aatgtttcca ttggcctcca gacagaggcc cggcccagcc aggggctccc 3060 ggtcatccag agcccaccct cctctccccc gcatcgggac tcagcctaca tctccagctc 3120 acctcttggc tctcatcagg tttttgactt ccggtccagt agttctgtag gctctcccac 3180 tagacagacc tatcagtcca cctcacctgc cctttctcca actcatcaga actcacatta 3240 caggcctagc ccaccacaca cttccccggc tcatcaggga ggatcttacc gtttcagccc 3300 ccctcctgtg ggaggacagg gcaaagaata cccaagccct cccccttccc ctctccggag 3360 aggccctcag tatcgggcca gccctccagc tgaaagtatg agtgtctata gatcccagtc 3420 tggttcaccc gtgcgctatc agcaggaaac aagcgtcagt cagcttcctg gcagacccaa 3480 atctccatta tccaaaatgg cccagcggcc ctaccagatg cctcagctcc ctgtggcagt 3540 tccccagcaa gggctcaggc tacagcctgc caaggcccag attgtgagaa gtaaccagcc 3600 cagcccagcc gtccattcaa gcaccgtcat ccccacagga gcctatggcc aagtagccca 3660 ttcaatggcc agtaaatacc agtcttcaca aggagacata ggagtcagcc agagccggtt 3720 ggtttatcaa gggtcaattg ggggaatcgt aggggatgga aggccggtgc agcatgtcca 3780 agccagcctg agtgcaggcg ccatctgtca gcatggagga ttgaccaaag aggatcttcc 3840 acagcgacct tcctcagcat accgaggtgg cgtgagatac agccagacac cacagatcgg 3900 acgcagccag 
tcagcatcct attacccagt ctgtcactca aaactagatc tggagcgctc 3960 ctccagccaa ctaggttccc ctgatgtgtc gcatttaatc agaagaccta tcagtgtcaa 4020 ccctaacgaa atcaaaccgc acccgccaac tcccaggccg ttgctgcatt cccaaagtgt 4080 aggccttcgc ttctctccat ctagcaatag tatctcctcc acctccaacc taactccgac 4140 cttccggcca tcttcttcca tccagcaaat ggagatccca ctgaaacctg catatgagag 4200 gtcatgtgac gagctgtcgc cagtgtctcc aactcaagga ggttacccca gtgagcccac 4260 ccgatccagg accacaccat tcatggggat catagataaa acagcacgga ctcagcagta 4320
cccccacctc caccagcaga atcggacctg ggcagtgtca tctgtggaca ccgtcctcag 4380 tcccacgtct ccaggcaacc tgcctcagcc tgagtccttc agtccaccat catccatcag 4440 caacattgcc ttttataaca aaaccaacaa tgcacagaat ggccatttgc tggaggacga 4500 ttattacagc ccccatggga tgctggctaa cgggtctcgt ggagacctct tggagcgagt 4560 cagccaggcc tcctcctatc ccgacgtgaa ggtagctcgg actctacctg tggctcaggc 4620 ataccaggac aacctgtaca ggcagctgtc ccgagactct cggcaagggc agacatcccc 4680 tatcaaacca aagagaccgt tcgtggagtc taatgtttaa aagacgtttt gttggagtga 4740 gacccatatg ttttcactgc acattttcag gcttggtttc cacattcgag gtagttctct 4800 ggcttaattt ctcatgtagt ttctgtgtgg tgttcagagg tggcagccca catgctgaaa 4860 tcctttgcat gcagccgact gggaagcggc ctcccgggag ccaggacttc agtttctctt 4920 gtctgtgccc agccacatgc tctctccctc tcttcagatg ccaacgagga gattttcgtg 4980 ctgtgtgctt taacccaggg agatcagaca cactggtcag ctttttccag gagacaatcg 5040 ctttcactga tgttcttgtt gtgtaattgt ctttttcctt ttttaaaaaa taaggtgttc 5100 ttgttcgttt tcttctagaa actttagaaa gagtgcgatg cccctttgcc tttgcatcct 5160 tagccagtgt cacccacaca gccagccgca gcgcattctc atgc 5204
<210> 41
<211> 2271
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 2962837CB1
<400> 41 ggcaaggtcc cggcgaggcc gccgcgagcc tgcgcgtcgc taagtccagg cctgctgcgt 60 ggggcttcgc gcgctcgcgg ggttgcggcc cgggcagggg gagggcccgg gtgctcggag 120 ccttcccttc gctgccctcc tgccccctcc ctgcttctgc aagcgtgttt caatttgtac 180 aacgtgcata aaacatgaaa ttacccttgg ccacttccag gcgcgcagcc agcggctccc 240 tgcccttccc ctccgggccc tgagtaccgg ccccccacca aggaggagcc cgaggtctcc 300 gtcccggcgg cgatgctgcc ccgtcggcct ctggcgtggc ccgcgtggct gttgcggggt 360 gctccgggag ccgcgggttc ttggggtcgg ccggttggcc ccctggcccg cagaggctgc 420 tgctccgccc cggggacccc cgaggtgccg ctgacccggg agcgctaccc cgtgcggcgc 480 ttgccgttct ccacggtgtc taagcaggac ctggccgcct ttgagcgcat cgtgcccggc 540 ggggtcgtca cggacccgga agcgctgcag gctcccaacg tggactggtt gcggacgctg 600 cgaggctgta gcaaggtgct gctgaggcca cggacgtcgg aggaggtgtc ccacatcctc 660 aggcactgcc acgagaggaa cctggccgtg aacccacagg ggggcaacac aggcatggtg 720 ggtggcagcg tccccgtctt tgacgagatc atcctctcca ctgcccgcat gaaccgggtc 780 ctcagcttcc acagcgtgtc tggaattctg gtttgccagg cgggctgcgt cctggaggag 840 ctgagccggt atgtggagga acgggacttc atcatgccgc tggacttagg agccaagggc 900 agctgccaca tcgggggaaa cgtggcaacc aacgctggag gcctgcggtt tcttcgatat 960 ggctcactgc atgggactgt cctgggcctg gaagtggtgc tggccgacgg cactgtcctg 1020 gactgcctga cctccctgag gaaggacaac acgggctatg acctgaagca gctgttcatc 1080 gggtcggagg gcactttggg gatcatcacc acggtgtcca tcttgtgtcc acccaagccc 1140 agggctgtga acgtggcttt cctcggctgc ccaggctttg ctgaggttct gcagaccttc 1200 agcacctgca aggggatgct gggtgagatc ctgtctgcat tcgagttcat ggatgctgtg 1260 tgcatgcagc tggtcgggcg ccatctccac ctggccagcc cggtgcaaga gagtccgttt 1320 tacgtcctca tcgagacttc aggctccaac gcaggccatg acgctgagaa gctgggccac 1380 ttcctggagc acgcgctggg ctccggcctg gtgaccgatg ggaccatggc caccgaccag 1440 aggaaagtca agatgctgtg ggccctgagg gaaaggatca cagaggcgct gagccgggat 1500 ggctacgtgt acaagtacga cctctccctc cctgtggagc ggctctacga catcgtgact 1560 gacctgcgcg cccgcctcgg cccgcacgcc aagcacgtgg tgggctatgg ccaccttgga 1620 gatggtaacc tgcacctcaa tgtgacggcg gaggccttca gcccctcgct cctggctgcc 1680 ctggagcccc 
acgtgtacga gtggacggcc gggcagcagg gcagcgtcag cgcggagcac 1740 ggagtgggct tcaggaagag ggacgtcctg ggctacagca agccaccggg ggccctgcag 1800 ctcatgcagc agctcaaggc cctgctggac cccaagggca tcctcaaccc ctacaagacg 1860 ctgcccagcc aggcctgacg gccactcctg ctgctgccaa ggcccactgg gggtcggcgg 1920 gtggctctcg ggcgggggtg ttgcggtggc tctgagggat gagccggcag tgggcagggg 1980 accaggcacc tggttgaagg gactgggagc ccgcactggg gaactgccgg acgcatgtgc 2040 cctcggtgca gggagcatct ggcagagtgg ggggctgtgg caggcaccct cctttgcagg 2100 gcgaggtggg gcctctgcag ccatcctgga caggccgggg tgtgcggcag cttttgccca 2160 cgtggaagcg gggtgggtct cacttgcgtg gtggccctgt gccatcttgc ctgctgcggc 2220
tgggagcagg cgctgggtgt tggttctgct gttgtgctcg tcccgggatc g 2271
<210> 42
<211> 2270
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 6961277CB1
<400> 42 cggctcgaga tttgccttcc tccctcccgc atctgagctt gtctccacca gcaacatgag 60 ccgccaattc acctgcaagt cgggagctgc cgccaagggg ggcttcagtg gctgctcagc 120 tgtgctctca gggggcagct catcctcctt ccgggcaggg agcaaagggc tcagtggggg 180 gcttggcagc cggagcctct acagcctggg gggtgtccgg agcctcaatg tggccagtgg 240 cagcgggaag agtggaggct atggatttgg ccggggccgg gccagtggct ttgctggaag 300 catgtttggc agtgtggccc tggggcctgt gtgcccaact gtatgcccac ctggaggcat 360 ccaccaggtt accatcaatg agagcctcct ggcccccctc aacgtggagc tggaccccaa 420 gatccagaaa gtgcgtgccc aggagcgaga gcagatcaag gctctgaaca acaagttcgc 480 ctccttcatc gacaaggtgc ggttcctgga gcagcagaac caggtactgg agaccaagtg 540 ggagctgctg cagcagctgg acctgaacaa ctgcaagaac aacctggagc ccatcctcga 600 gggctacatc agcaacctgc ggaagcagct ggagacgctg tctggggaca gggtgaggct 660 ggactcggag ctgaggaatg tgcgggacgt agtggaggac tacaagaaga ggtatgagga 720 ggaaatcaac aagcggacag cagcagagaa cgagtttgtg ctgctcaaga aggatgtgga 780 tgctgcttac gccaataagg tggaactgca ggccaaggtg gaatccatgg accaggagat 840 caagttcttc aggtgtctct ttgaagccga gatcactcag atccagtccc acatcagtga 900 catgtctgtc atcctgtcca tggacaacaa ccggaaccta gacctggaca gcatcattga 960 cgaagtccgc acccagtatg aggagattgc cttgaagagt aaggccgagg ctgaggccct 1020 gtaccagacc aagttccaag agcttcagct ggcagctggc aggcatgggg acgacctcaa 1080 aaacaccaag aatgaaatct cggagctcac tcggctcatc cagagaatcc gctcagagat 1140 cgagaacgtg aagaagcagg cttccaacct ggagacagcc atcgctgatg ctgagcagcg 1200 gggagacaac gccctgaagg atgcccgggc caagctggac gagctggagg gcgccctgca 1260 ccaggccaag gaggagctgg cgcggatgct gcgcgagtac caggagctca tgagcctgaa 1320 gctggccctg gacatggaga tcgccaccta tcgcaagcta ctggagagcg aggagtgcag 1380 gatgtcagga gaatttccct cccctgtcag catctccatc atcagcagca ccagtggcgg 1440 cagtgtctat ggcttccggc ccagcatggt cagcggtggc tatgtggcca acagcagcaa 1500 ctgcatctct ggagtgtgca gcgtgagagg cggggagggc aggagccggg gcagtgccaa 1560 cgattacaaa gacaccctgg ggaagggttc cagcctgagt gcaccctcca agaaaaccag 1620 tcggtagaga agactgcccc gggccccgcc tcattccatg acccggctct ggatcccaca 1680 ctgtacttcc 
cacagcccac tctcagctcc atctccaccc tgctggtcct gctcccatac 1740 acctggcact ggccttggcc acccacttct cccagcctgt gtcttcctga tcctgggaag 1800 gcctggatga ccaagcttgg tgaaattcct ccctgtacac accctattaa ctccttggct 1860 gtggtccccc agctacacca ccagcccagg tcctggctgc cagctttcct cctctgcccg 1920 gcctctagcg cagtcgctaa ctactctgct gggctccctg ggtctctgcc caaggccccg 1980 cacacactgg ggcctagcat agttcctgcc tatgccagga gctggctctg tgtttaagaa 2040 aaggaggact gaaggacaaa caaccaagag tggcccagtc cccaccccca catctagctc 2100 agtctcaaat ctgagtggga ccaagtgcaa ttcagggcct ttttctccac tcacctgcac 2160 ccagaagcag agaaaagcag gcactgttca cttttccttt attcttaatg gccttcctct 2220 gttgcaacct caataaacag cacaatctca aaaaaaaaaa aaaaaaagat 2270
<210> 43
<211> 2629
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 56022622CB1
<400> 43 ggcccgctcg ggtcctccca ggaagtttga aaaaaaaaaa aaaaaagttt tatgggcgga 60 tggaaggggc cggggcagcg tcggggaaag gaagggccgg aggcgcggcg gcgggcggcc 120
71/86 gagaggggcg gcggcggcgg cggcggcggg gttcccgcgc cgcggagccc ggcccgagag 180 ccgcgtccac gttcctgcct cctgctcccg ccgccctggg gcgccgccat gacgcccgat 240 ctgctcaact tcaagaaggg atggatgtcg atcttggacg agcctggaga gcctccctcc 300 ccctcgctca ccaccacctc tacttcgcag tggaagaaac attggtttgt gctgacagat 360 tcaagtctca aatattacag agactccact gctgaggagg cagatgagct ggatggtgag 420 atcgacctgc gttcctgcac ggatgtcact gagtacgcgg tgcagcgcaa ctatggcttc 480 cagatccaca ccaaggatgc tgtctatacc ttgtcggcca tgacctcagg catccggcgg 540 aactggatcg aggctctgag aaagaccgta cgtccaactt cagccccaga tgtcaccaag 600 ctctcggact ctaacaagga gaacgcgctg cacagctaca gcacccagaa gggccccctg 660 aaggcagggg agcagcgggc gggctctgag gtcatcagcc ggggtggccc tcggaaggcg 720 gacgggcagc gtcaggcctt ggactacgtg gagctctcgc cgctgaccca ggcttccccg 780 cagcgggccc gcaccccagc ccgcactcct gaccgcctgg ccaagcagga ggagctggag 840 cgggacctgg cccagcgctc cgaggagcgg cgcaagtggt ttgaggccac agacagcagg 900 accccagagg tgcctgctgg tgaggggccg cgccggggcc tgggtgcccc cctgactgag 960 gaccagcaaa accggcttag tgaggagatc gagaagaagt ggcaggagct ggagaagctg 1020 cccctgcggg agaataagcg ggtgcccctc actgccctgc tcaaccaaag ccgcggagag 1080 cgccgagggc ccccaagtga cggccacgag gcactggaga aggaggaggc atgtgagcgc 1140 agcctggcag agatggagtc ctcgcaccag caggtgatgg aggagctgca gcggcaccac 1200 gagcgggagc tgcagcgcct gcagcaggag aaggagtggc tcctggctga ggagacggca 1260 gccacggcct cagccattga agccatgaag aaggcctacc aggaagagct gagccgagag 1320 ctgagcaaaa cacggagtct ccagcagggc ccggatggcc tccggaagca gcaccagtca 1380 gatgtggagg cactgaagcg agagctgcag gtgctatcgg agcagtactc gcagaagtgc 1440 ctggagattg gggcactcat gcggcaggct gaggagcgcg agcacacgct gcgccgctgc 1500 cagcaggagg gccaggagct gctgcgccac aaccaggagc tgcatggccg cctgtcagag 1560 gagatagacc agctgcgcgg cttcattgcc tcgcagggca tgggcaatgg ctgcgggcgc 1620 agcaacgagc ggagttcctg cgagctagag gtgctgcttc gcgtaaaaga aaacgaactc 1680 cagtacctaa agaaggaggt gcagtgcctc cgggacgagc tccagatgat gcagaaggac 1740 aagcgcttca cctcgggaaa gtaccaggac gtctatgtgg agctgagcca catcaagaca 1800 cggtctgagc 
gggagatcga gcagctgaag gagcacctgc gtcttgccat ggccgccctc 1860 caggagaagg agtcgatgcg caacagcctg gctgagtaga ggtcccgccc agctgcagac 1920 cctccaggct ggaggaccag ccgccctcct tccctcctgg atggaagtaa aaagccaagc 1980 tttctcccca ccctctgtgg gccacacgtg cacttgcacc caccacacac acacacacac 2040 acacacacac acacacagac acacagacac atacgcacac acgtgcacac atgtacacac 2100 ggatacacac acacacacac acactgcata tctgagcgcg cccctcgcac tgggtctcac 2160 cttgcacctt cttcaggatt ttatatgtga agagattttt atatagattt ttttcctttt 2220 tttccaaaac actttatact ttaaaaaaaa aaaaaaaaag caattcctgg tggctgtgtg 2280 cctccaaccc tggtccccct ctgtctccag ccaccctctg cttgggcttc tgagctggtg 2340 gccctggccc agaggtctgg cggaggccca ggcagcagcc atggcggggt gtctctacag 2400 gggagaggcg ggagcctgcc accctcttcc tgccctacct cctactaaca cttcctgccc 2460 catttggacc cgtaccatgg ggctcaggac agagggagct agcagctggc ctccatggcc 2520 ccacagcctc cttcgaggct gtgctgggtg cagaaccgcc agagccaccc aaaaggtgtt 2580 tctcttctgc tccctgaacc tcttaactta ataaaacgtt ccagcagct 2629
<210> 44
<211> 5062
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 542310CB1
<400> 44 gatgagagcc gcgccgcacc gctcatagcc gcacaggtgt acaggcagga ggaccgactt 60 ccctctcccg ggcatcctcc ctgggctgcc gggacggcgt gcggcccgag gaggaggagg 120 aacgagggga gaaggcggag agcaggaacg cgaggaggag gacctggatc cgtttcctcc 180 ggccaggacc cgagcggccc cagccaccgc tacccgccgg cgctgtccgc tctccatcag 240 ccctcctgcg cccacccgcg accccgggct ctctgcgcgt cgggccgggg ccggagccgc 300 gcggccggag actatctggc ttcctggtga tgctcacgct ttgctaagtg ttggcggcca 360 tcgtggtttt cgcatcctgg ggacgaatcc tgagcttgcc agagacgggc ggcgcaaggt 420 ccgggctctg tttccctgtg agaagccgcc tcggcccacc gagatgtccc ggcaccatag 480 ccgcttcgaa agagattacc gggtgggctg ggaccgccgc gaatggagcg tcaacgggac 540 gcatgggacc accagcatct gcagtgtcac ctcgggggcc ggtggcggca cagccagcag 600
cctcagcgtc cggcccggcc tcctgccgct gcccgtggtg ccctcccggc tgcccacccc 660 ggctacagct cctgctccct gcaccaccgg cagcagcgag gccatcacca gcctcgtggc 720 cagctctgcg tctgcggtca ccaccaaggc tcccggcatc tccaaagggg acagtcagtc 780 ccagggactg gcgaccagca tccggtgggg gcagacgcct atcaatcagt ccacaccctg 840 ggacactgat gagccaccct ccaaacagat gagagagagt gacaatccag gcacagggcc 900 atgggtgacc acggtggccg ccgggaacca gcccaccctg atcgcacact cctatggagt 960 ggcccagcct cccaccttca gcccggctgt gaacgtccag gccccggtca ttggggtgac 1020 cccctcactg cctccccacg tggggcccca gctcccgctg atgccaggcc actactcgct 1080 ccctcagccg ccctctcagc cactgagcag cgtggtggtc aacatgcctg cccaggccct 1140 gtatgccagc cctcagcccc tggccgtgtc cacactgccc ggtgtggggc aggtggcccg 1200 cccaggaccc accgctgtgg gcaacggcca catggcaggg cccctgctgc ctccaccgcc 1260 gccagcccag ccgtccgcca ctctccccag tggtgcccct gccaccaatg ggccccccac 1320 aaccgactcg gcccacgggc tgcagatgct gcggaccatt ggcgtgggga agtatgagtt 1380 caccgacccg gggcacccca gagaaatgtt gaaggaattg aaccagcaac gcagagcgaa 1440 agcgtttaca gacctgaaaa ttgttgttga aggcagagag tttgaagtcc accaaaatgt 1500 tctagcttcc tgcagcttgt atttcaagga cctgattcaa aggtccgtgc aagacagcgg 1560 ccagggcggc cgggagaagc tggagctcgt cctgtcgaac ctgcaggcag acgtcctgga 1620 gttgctgctg gagtttgtct acacgggctc cctggtcatc gactcggcca acgccaagac 1680 actgctggag gcggccagca agttccagtt ccacaccttc tgcaaagtct gcgtgtcctt 1740 tctcgagaag cagctgacgg ccagcaactg cctgggcgtg ctggccatgg ccgaggccat 1800 gcagtgcagc gagctctacc acatggccaa ggccttcgcg ctgcagatct tccccgaggt 1860 ggccgcccag gaggagatcc tcagcatctc caaggacgac ttcatcgcct acgtctccaa 1920 cgacagcctc aacaccaagg ctgaggagct ggtgtacgag acagtcatca agtggatcaa 1980 gaaggacccc gcgacacgca cacagtacgc ggctgagctc ctggccgtgg tccgcctccc 2040 cttcatccac cccagctacc tgctcaatgt ggttgacaat gaagagctga tcaagtcatc 2100 agaagcctgc cgggacctgg tgaacgaggc caaacgctac catatgctgc cccacgcccg 2160 ccaggagatg cagacgcccc gaacccggcc gcgcctctct gcaggtgtgg ctgaggtcat 2220 cgtcttggtt gggggccgtc agatggtggg gatgacccag cgctcgctgg tggccgtcac 2280 ctgctggaac 
ccgcagaaca acaagtggta ccccttggcc tcgctgccct tctatgaccg 2340 cgagttcttc agtgtagtga gtgcagggga caacatctac ctctcaggtg ggatggaatc 2400 aggggtgacg ctggctgatg tctggtgcta catgtccctg cttgataact ggaacctcgt 2460 ctccagaatg acagtccccc gctgtcggca caatagcctc gtctacgatg ggaagattta 2520 caccctcggg ggacttggcg tggcaggcaa cgtggaccac gtggagaggt acgacaccat 2580 caccaaccaa tgggaggcgg tggcccctct gcccaaggca gtacactctg ctgcagccac 2640 agtgtgtggc ggcaagatct acgtgtttgg tggggtgaac gaggcaggcc gagctgccgg 2700 cgtcctccag tcttacgttc ctcagaccaa cacgtggagc ttcatcgagt ccccaatgat 2760 tgacaacaag tatgcccccg ctgtcacgct caatggcttc gttttcatcc tgggcggggc 2820 ttatgccaga gccaccacca tctacgaccc tgagaaagga aacattaagg cgggcccaaa 2880 catgaaccac tctcgccagt tctgcagtgc tgtggtgctt gatggcaaga tttatgcaac 2940 tggaggtatt gtcagcagtg aagggcccgc gctgggcaac atggaggcct acgagcccac 3000 aaccaacaca tggaccctcc tcccccacat gccctgccct gtgttcagac acggctgcgt 3060 cgtgataaag aaatatattc aaagcggctg acatcagcag aaagcccacg ataagactgt 3120 ggacaagtct ggtgaggcaa gtgccacgca atgataattt tccagcgaca ccaacaagag 3180 gccaacaaaa cacaatcaag gaactcactg cgctcaacat gttgaatatt ctctacattg 3240 aatgtagaaa atcatcctcg cctttggatg aaacggaggc accgcgcttg gagccgcagg 3300 aaccacgatc ccgccatggg gctggctgcc tcctgaacag gggcgctcgc tctgccaggt 3360 gcaatagagt ttcacgtatt tttcaactgg gagagagaag ctgttttttc cttcctgcag 3420 agcaagcttg atccctaaac aaccatagat cagttatctt atgacaacat taggcatcag 3480 gctctcttgg aataagatca aagtgtcctt atcactttga ttcctacttt tgttttttaa 3540 ccgatctaca ctttcagtgg ccgacagaaa acgagggaca atactgtgca tcacaaggcc 3600 taggaggctg ctggtcccca ctggggctga agagaagccc agctgcccac gcggagccag 3660 gggtggcagc tgtgggacag ccggggagca gggacagcgg tctgtccttc acaggttttt 3720 ctactgtgtt tttgctggag aaggacagtg attgcgctag ctttctctta cccggtatga 3780 attatttaga tttctgaggc attttcttga taaacaaaag gctattttta agtactgaga 3840 ggaggagcag gccacaagag ggataatgtt gtgggaattc ccaaagctct ttgtaggtag 3900 tgccagaggg gggcttttgc tctcattttt ctatgtgcag aatagaggat ctctcctggg 3960 gtgggcgatg cccccatttt 
atttttagaa aaagtaactc ccagacagcc ccataaaagc 4020 tgtgcccaag gaagaagagt ctgctctaga aggagcccgg ttctggctca ggacaccggc 4080 ccagctccct ccatgaggtc aagctgagga ccaggccagt gggaagggaa ggagggagaa 4140 ttagcgtcta taaagcacag gagactattt ttgatattca tagctatata ttaaggcacc 4200 tgccacaaga gctctcagga tggggacagc cttcttagtg gagccatggc agcaaggcct 4260 gagggcatga acagaaccac tcttcttgtc acatacgaac ctgagaaaag ggaagccagg 4320 agggaggtca caccatggct caaaagggaa aggccttccc acttgtcctt agcccctcaa 4380
acctcacacg gtcaacagtt tccattccag ggcaggagaa tgctgccgcc actgcgctgt 4440 tgagttgaag ttggtaccaa atacacattt accactttta tatctgggaa gtcaacttgc 4500 catcgtttca tgataacaac catttataag agaaaaagac aggacacgct ttccatcgtt 4560 cagtatttga tgacacaaaa ttccagttct aacgttgggc atcaacttct agcactacga 4620 gtgtggctcc cacttggaca agataccgag cttcgttatg cagtttttaa tattatttat 4680 tattttaaaa agtaataagc acaaaactac atacattgta tgtcatttaa agtatttatg 4740 tcaaacaggg tgcaagtgtg aacccaagga ctggagcaca aattcctaac tgcctggggc 4800 agggctaatg ttagcattgg tgtgcgtctg cctccaaagg aggttctagt tgtcagcgag 4860 actcaacaca gatgacattg aaattcgttt ctctcctcat ctatcacact ggagcaaaac 4920 tggctatttc tgtgaatgat ataaaacagg gttctctgta atggtattgt acatagtata 4980 tgtttactgt taagttcttg ttatattata ataaatatat ttatagatct agacttggaa 5040 aaaaaaaaaa aaaaaaaagg gsr 5062
<210> 45
<211> 1839
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 1732825CB1
<400> 45 gtgacgacgg agaagagggc cgctgccgct gcagtggctc gtgggtgaga gcaagtgaag 60 accgccgcag catcaggggc ctggactcaa ctcctcccca gagtcggagg tgttgcgcca 120 tgcccggggt ggccaattca ggcccctcca cttcctctag ggagactgca aacccctgtt 180 ccaggaagaa ggtgcatttt ggcagcatac atgatgcagt acgagctgga gatgtaaagc 240 agctttcaga aatagtggta cgtggagcca gcattaatga acttgatgtt ctccataagt 300 ttaccccttt acattggcag cacattctgg aagtttggag tgtcttcatt ggctgctctg 360 gcatggagct gatatcacac acgtaacaac gagaggttgg acagcatctc acatagctgc 420 aatcaggggt caggatgctt gtgtacaggc tcttataatg aatggagcaa atctgacagc 480 ccaggatgac cggggatgca ctcctttaca tcttgctgca actcatggac attctttcac 540 tttacaaata atgctccgaa gtggagtgga tcccagtgtg actgataaga gagaatggag 600 acctgtgcat tatgcagctt ttcatgggcg gcttggctgc ttgcaacttc ttgttaaatg 660 gggttgtagc atagaagatg tggactacaa tggaaacctt ccagttcact tagcagccat 720 ggaaggccac cttcactgtt tcaaattcct agtcagtaga atgagcagtg cgacgcaagt 780 tttaaaagct ttcaatgata atggagaaaa tgtactggat ttggcccaga ggttcttcaa 840 gcagaacatt ttacagttta tccagggggc tgagtatgaa ggaaaagacc tagaggatca 900 ggaaacttta gcatttccag gtcatgtggc tgcctttaag ggtgatttgg ggatgcttaa 960 gaaattagtg gaagatggag taatcaatat taatgagcgt gctgataatg gatcaactcc 1020 tatgcataaa gctgctggac aaggccacat agagtgtttg cagtggttaa ttaaaatggg 1080 agcagacagt aatattacca acaaagcagg ggagagaccc agtgatgtgg caaagaggtt 1140 tgcccatttg gcagcagtga agctgttaga ggagctacag aaatatgata tagatgacga 1200 aaatgaaatt gatgaaaatg atgtgaaata ttttataaga catggtgttg agggaagcac 1260 tgatgccaag gatgatttat gtctgagtga cttggataaa acagatgcca gaatgagagc 1320 ttacaagaaa attgtagaat tgagacacct cctggaaatt gccgagagca actataaaca 1380 cttgggaggc ataacagaag aagatttaaa gcagaagaaa gaacagcttg agtctgaaaa 1440 gaccatcaaa gaactgcagg gccagctgga gtatgaacga ctacgtagag aaaaattaga 1500 atgtcagctt gatgaatatc gagcagaagt tgatcaactc agggaaacac tggaaaaaat 1560 tcaagtccca aactttgtgg ctatggaaga cagcgcttct tgtgagtcaa acaaagagaa 1620 gaggcgagta aaaaaaaagg tttcttctgg aggggtgttt gtgagaaggt actaatcagt 1680 gaaataacta 
aattgacctg ctagattttt ctctttcatt agaaaaattg atataaatgt 1740 gagtctatac aaactatctc agaattactc tgatatgctt ctgttccaat tctgatggca 1800 gaaatgttat attaaagaga tttagagatt ttttaaatg 1839
<210> 46
<211> 7557
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 6170242CB1
<400> 46 ctggagacac atgaggctct gttcgaataa cctttctctc tgtgtgtttc tgtttgcagc 60 agcaaagtgg ggcaccaagg ccctgtgcta agcactcata atcctctggg ggtgctaccc 120 ctacaaacag cacccccacc atgtttaacc taatgaagaa agacaaggac aaagatggcg 180 ggcggaagga gaagaaggag aaaaaggaga aaaaggagcg gatgtcagcg gcagagcttc 240 ggagcctgga ggagatgagc ctgcgacgtg gcttcttcaa cctgaaccgc tcctccaagc 300 gtgaatccaa gacgcgcctg gaaatctcca accccatccc catcaaggtg gccagcggct 360 ctgacctgca cctgactgac attgactccg atagtaaccg gggcagcgtc atcctggact 420 cgggccacct aagtacagcc agctccagcg atgacctcaa gggtgaggag ggtagcttcc 480 gtggctcggt gctgcagcgg gcagccaagt tcggctcact ggccaagcag aactcacaga 540 tgattgtcaa gcgcttttcc ttctcccagc gtagccggga tgagagcgcc tcagaaacct 600 cgacgccctc agagcactct gccgccccct cgccacaggt ggaggtgagg actctagagg 660 gacagctggt gcagcatcct ggcccaggca tccctcgacc agggcaccga tcccgagccc 720 ctgagctagt gactaaaaag ttcccagtcg acctgcgcct gccccccgtg gtgcccctgc 780 ccccacctac cctccgggag ctggagctgc aacgacggcc cactggagac tttggcttct 840 ccctgcggcg cacaaccatg ctggatcggg gccccgaggg ccaggcctgt cggcgtgtgg 900 tccactttgc tgagcctggt gcaggcacca aggacctggc cctggggctg gtgccaggag 960 atcgactggt ggagattaat gggcacaatg tggagagcaa gtccagggat gagattgtgg 1020 agatgatccg gcagtcaggg gacagcgtgc ggctcaaggt gcagcccatt ccagagctca 1080 gcgagctcag caggagctgg ctgcggagcg gcgagggacc tcgcagggag ccatccgatg 1140 cgaaaacaga agaacagatt gcagcagaag aggcctggaa tgagacggag aaggtgtggc 1200 tggtccatag ggacggcttc tcactggcca gtcaactcaa atctgaggag ctcaacttgc 1260 ctgaggggaa ggtgcgtgtg aagctggacc acgatggggc catcctggat gtggatgagg 1320 atgacgttga gaaggctaat gctccctcct gcgaccgtct ggaggatctg gcctcactgg 1380 tgtacctcaa tgagtccagc gtcctgcaca ccttgcgcca gcgctatggc gctagcctgc 1440 tgcacacgta tgctggcccc agcctgctgg ttcttggccc ccgtggggcc cctgctgtgt 1500 actctgagaa ggtgatgcac atgttcaagg gttgtcggcg ggaggacatg gcaccccaca 1560 tctatgcagt ggcccagacc gcatacaggg cgatgctgat gagccgtcag gatcagtcaa 1620 tcatcctcct gggcagtagt ggcagtggca agaccaccag ctgccagcat ctggtgcagt 1680 acctggccac 
catcgcgggc atcagcggga acaaggtgtt ttctgtggag aagtggcagg 1740 ctctgtacac cctcctggaa gcctttggga acagccccac catcattaat ggcaatgcca 1800 cccgcttctc ccagatcctc tccctggact ttgaccaagc tggccaggtg gcctcagcct 1860 ccattcagac aatgcttctg gagaagctgc gtgtggctcg gcgcccagcc agtgaagcca 1920 cattcaacgt cttctactac ctgctggcct gtggggatgg caccctcagg acagagctcc 1980 acctcaacca cttggcagag aacaatgtgt ttgggattgt gccactggcc aagcctgagg 2040 aaaagcagaa ggcagctcag cagtttagta agctgcaggc ggccatgaag gtgctgggca 2100 tctcccccga tgaacagaag gcctgctggt tcattctggc tgccatctac cacctggggg 2160 ctgcgggagc caccaaagaa gctgctgaag ctgggcgcaa gcagtttgcc cgccatgagt 2220 gggcccagaa ggctgcgtac ctactgggct gcagcctgga ggagctgtcc tcagccatct 2280 tcaagcacca gcacaagggt ggcaccctgc agcgctccac ctccttccgc cagggccccg 2340 aggagagtgg cctgggagat gggacaggcc cgaaactgag tgcactggag tgccttgagg 2400 gcatggcggc cggcctctac agcgagctct tcacccttct cgtctccctg gtgaataggg 2460 ctctcaagtc cagccagcac tcactctgct ccatgatgat tgtcgacacc ccgggcttcc 2520 agaaccctga gcagggtggg tcagcccgcg gagcctcctt tgaggagctg tgccacaact 2580 acacccaaga ccggctgcag aggctcttcc acgagcgcac cttcgtgcag gagttggaaa 2640 gatacaagga ggagaacatc gagctggcgt ttgacgactt ggaacccccc acggatgact 2700 ctgtggctgc tgtggaccag gcctcccatc agtccctggt ccgctcgctg gcccgcacag 2760 acgaggcgag gggcctgctc tggctattgg aagaggaggc tctggtgcca ggggccagtg 2820 aggacaccct cctggagcgc cttttctcct attatggccc ccaggaaggt gacaaaaaag 2880 gccaaagccc ccttctgcac agcagcaaac cacaccactt tctcctgggc cacagccatg 2940 gcaccaactg ggtagagtac aatgtgactg gctggctgaa ctacaccaag cagaacccag 3000 ccacccagaa tgtcccccgg ctcctgcagg actcccagaa aaaaatcatc agcaacctgt 3060 ttctgggccg cgcaggcagt gccacggtgc tctctggctc catcgcgggc ctggagggcg 3120 gctcgcagct ggcactgcgc cgggccacca gcatgcggaa aacctttacc acaggcatgg 3180 cggctgtcaa aaagaagtca ctgtgcatcc agatgaagct acaggtggac gccctcatcg 3240 acaccatcaa gaagtcaaag ctgcattttg tgcactgctt cctgcctgta gctgagggct 3300 gggctgggga gccccgttcc gcctcctccc gccgagtcag cagcagcagt gagctggacc 3360 tgccctcggg agaccactgc 
gaggctgggc tcctgcagct cgacgtgccc ctgctccgca 3420 cccagctccg cggctcccgc ctgctcgatg ccatgcgcat gtaccgccaa ggttaccctg 3480 accacatggt gttttccgag ttccgccgcc gctttgatgt cctggccccg cacctgacca 3540 agaaacacgg gcgtaactac atcgtggtgg atgaaaggcg ggcagtggag gagctgctgg 3600 agtgcttgga tctggagaag agcagctgct gcatgggcct gagccgggtg ttcttccggg 3660 cgggcacctt ggcacggctg gaggagcagc gggatgaaca aaccagcagg aacctaaccc 3720
tgttccaagc agcctgcagg ggctacctgg cccgccagca cttcaagaag agaaagatcc 3780 aggacctggc cattcgctgt gtacagaaga acatcaagaa gaacaaaggg gtgaaggact 3840 ggccctggtg gaagcttttt accacagtga ggcccctcat cgaagtacag ctgtcagagg 3900 agcagatccg gaacaaagac gaggagatcc agcagctgcg gagcaagctc gagaaggcgg 3960 agaaggagag gaacgagctg cggctcaaca gtgaccggct ggagagccgg atctcagagc 4020 tgacatcgga gctgacagat gagcgtaaca caggagagtc cgcctcccag ctgctggacg 4080 cggagacagc agagaggctc cgggctgaga aggagatgaa ggaactgcag acccagtacg 4140 atgcactgaa gaagcagatg gaggttatgg aaatggaggt gatggaggcc cgtctcatcc 4200 gggcagcgga gatcaacggg gaagtggatg atgatgatgc aggtggcgag tggcggctga 4260 agtatgagcg ggctgtgcgg gaggtggacc tcaccaagaa acggctccag caggagtttg 4320 aggacaagct ggaggtggag cagcagaaca agaggcagct ggaacggcgg ctcggggacc 4380 tgcaggcaga tagtgaggag agtcagcggg ctctgcagca gctcaagaag aagtgccagc 4440 gactgacggc tgagctgcaa gacaccaagc tgcacctgga gggccagcag gtccgcaacc 4500 acgaactgga gaagaagcag aggaggtttg acagtgagct ctcgcaggcg catgaggagg 4560 cccagcggga gaagctgcag cgggagaagc tgcagcggga gaaggacatg ctcctcgctg 4620 aggctttcag cctgaagcag caactagagg aaaaagacat ggacattgca gggttcaccc 4680 agaaggttgt gtctctagag gcagagctcc aggacatttc ttcccaagag tccaaggatg 4740 aggcttctct ggccaaggtc aagaaacagc tccgggacct ggaggccaaa gtcaaggatc 4800 aggaagaaga gctggatgag caggcaggga ccatccagat gctggaacag gccaagctgc 4860 gtctggagat ggagatggag cggatgagac agacccattc taaggagatg gagagtcggg 4920 atgaggaggt ggaggaggcc cggcagtcgt gtcagaagaa gttaaaacag atggaggtgc 4980 agctagagga agagtatgag gacaagcaga aggttctgcg agagaagcgg gagctggagg 5040 gcaagctcgc caccctcagc gaccaggtga accggcggga ctttgagtca gagaagcggc 5100 tgcggaagga cctgaagcgc accaaggccc tgctggcaga tgcccagctc atgctggacc 5160 acctgaagaa cagtgctccc agcaagcgag agattgccca gctcaagaac cagctggagg 5220 agtcagagtt cacctgtgcg gcagccgtga aagcacggaa agcaatggag gtggagatcg 5280 aagacctgca cctgcagatt gatgacatcg ccaaagccaa gacagcgctg gaggagcagc 5340 tgagccgcct tcagcgtgag aagaatgaga tccagaaccg gctggaggaa gatcaggaag 5400 
acatgaacga attgatgaag aagcacaagg ctgccgtggc tcaggcttcc cgggacctgg 5460 ctcagataaa tgatctccaa gctcagctag aagaagccaa caaagagaag caggagctgc 5520 aggagaagct acaagccctc cagagccagg tggagttcct ggagcagtcc atggtggaca 5580 agtccctggt gagcaggcag gaagctaaga tacgggagct ggagacacgc ctggagtttg 5640 aaaggacgca agtgaaacgg ctggagagcc tggctagccg tctcaaggaa aacatggaga 5700 agctgactga ggagcgggat cagcgcattg cagccgagaa ccgggagaag gaacagaaca 5760 agcggctaca gaggcagctc cgggacacca aggaggagat gggcgagctt gccaggaagg 5820 aggccgaggc gagccgcaag aagcacgaac tggagatgga tctagaaagc ctggaggctg 5880 ctaaccagag cctgcaggct gacctaaagt tggcattcaa gcgcatcggg gacctgcagg 5940 ctgccattga ggatgagatg gagagtgatg agaatgagga cctcatcaac agtgagggag 6000 actctgatgt ggactcggag ctggaggacc gtgttgacgg ggtcaagtcc tggttgtcaa 6060 aaaacaaggg accttccaag gcagcttctg atgatggcag cttaaagagt tccagcccca 6120 ccagctactg gaagtccctt gcccctgatc ggtcagatga tgagcacgac cctctcgaca 6180 acacctccag accgcgatac tcccacagtt atctgagtga cagcgacaca gaggccaagc 6240 tgacggagac taacgcatag cccaggggag tggttggcag ccctctcacc ccagggcctg 6300 tggctgcctg ggcacctctc ccaggaagtg gtggggcacc ggtctccccc acccgactgc 6360 tgatctgcat gggaaacacc ctgaccttct tctgtcaggg gcactttcca ggctatgggt 6420 gtctgatgtc tccacgtgga agaggtgggg gaaagaggag tttctgaaga gaactttttg 6480 ctcctctgtc tcaaaatgcc agactcttgg cttctaccct gtgtcaccgt gggcagtggc 6540 aggtggcctg gcactgcatg gagccagcac gttgacctcc ctctcagctc cctgctcagg 6600 gacggtggac aggttgccta ctgggacact ctaggttgct gggtccatgg ggaggattgg 6660 gggaggagaa gcagtgcctt ccctctcgtg tggggtgggg gctctctctt cttggtgcct 6720 gctgtctttc tactttttaa tttaaatacc caacctctcc atcacagctg catccctgag 6780 agtgggaggg ggctgtagtg gtagctgggg ctcccaagaa cgactcggga atgtcatctc 6840 catcttcacc cttcagagag cagtcctttc tctgtgcagc tggagacgct ggtgaggaga 6900 gccgggtcca ggttcttaag aatgaggtgc ggaggggctc tccggtgctg ctgggctggg 6960 ttgagcaagc ctacgcagac aagtgtgtgt gtggaccatc cgcacctcca gcccccaccc 7020 caccctcttt gtctcagcgt gttatgtgca atgacctatt taaggtaaac ccattccaac 7080 tacagcagtt 
cagggctgat ccaagcactg cctccctcct gctctgtcca ggtggtctgg 7140 accataaact caacttgaga gggaaggctt ggggttgagg acttgtgatc agaaaaactg 7200 aagatggaag ttttggccgg tgctcattag acatgagtcc tcactctgtg tcctgagccc 7260 gtgtcattct tccaacctcc ctgcccccac acacttatcc cagacacaac accatgtggt 7320 ctggaggtcc cagcccccac cctaaaaagg ttatccctga gaactccacc agacttggga 7380 gcccaagtgc agtgcctggt gctgctccca tctgccgccc cccttctctc ctgcaattgg 7440 tttgtactca ctgggctgtg ctctcccctg tttacccgat gtatggaaat aaaggccctt 7500
ttcctcctga aaaaaaaaaa aaaaaaaaaa gggcagccgc tcgcgatcta gaactag 7557
<210> 47
<211> 1118
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 2287640CB1
<400> 47 cggacggtgg gcggacgcgt gggctggcag agcaaatatg actcagaaac cggctcctca 60 gggttgtaac attagatgat acaggcttgg gtcgttacac atgacaccag tgcctttgtt 120 tcattgggct gggctctctg gaaggtgtgc tgctgcctga gctgctggaa aagcactgac 180 aggtgtttgc tagaaaagca ctcctggagc ttgccaccag cttggacttc tagggacttt 240 cctctcagcc aggaaggatt ttgatattca tcagaaatac ctccagaaga ttcaaggagc 300 tgtagaggtg aagtaagcct gtgaaggacc agcatgggaa tcctatactc tgagcccatc 360 tgccaagcag cctatcagaa tgactttgga caagtgtggc ggtgggtgaa agaagacagc 420 agctatgcca acgttcaaga tggctttaat ggagacacgc ccctgatctg tgcttgcagg 480 cgagggcatg tgagaatcgt ttccttcctt ttaagaagaa atgctaatgt caacctcaaa 540 aaccagaaag agagaacctg cttgcattat gctgtgaaga aaaaatttac cttcattgat 600 tatctactaa ttatcctctt aatgcctgtt ctgcttattg ggtatttcct catggtatca 660 aagacaaagc agaatgaggc tcttgtacga atgctacttg atgctggcgt cgaagttaat 720 gctacagatt gttatggctg taccgcatta cattatgcct gtgaaatgaa aaaccagtct 780 cttatccctc tgctcttgga agcccgtgca gaccccacaa taaagaataa gcatggtgag 840 agctcactgg atattgcacg gagattaaaa ttttcccaga ttgaattaat gctaaggaaa 900 gcattgtaat ccttgtgacc acaccgatgg agatacagaa aaagttaacg actggattct 960 atcttcattt tagacttttg gtctgtgggc catttaacct ggatgccacc attttatggg 1020 gataatgatg cttaccatgg ttaatgtttt ggaagagctt tttatttata gcattgttta 1080 ctcagtcaag ttcaccatgg gggaagttgc actgcgat 1118
<210> 48
<211> 3340
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 1990526CB1
<400> 48 ccacggggaa gctgcgaggc gcgggagcac ctgggggacc gcttgcagcg gggacgcgag 60 gacccgggct gggctttcct cacccgggta ccttgttatc ccataacttt ggtatcctga 120 aatctgagga ttccaccaag ataatatgat aagaactttc agtgatttgg ggccatatcc 180 tacttagact aatgtggaat ttccagattt cctgagagct tggtacagca gcacacactg 240 cttgctaatc agcacaggca ataatgccat ctctgcctca agaaggagtt attcagggac 300 cctctcccct ggatttgaat acagaattac cttatcaaag cacaatgaaa aggaaagtca 360 gaaagaagaa aaagaaggga accattacag caaatgttgc cgggacaaag tttgaaattg 420 ttcgtttagt aatagatgaa atgggattta tgaaaactcc agatgaggat gaaacaagta 480 atcttatatg gtgtgattct gctgttcagc aggagaaaat ttcagagctg caaaattatc 540 agaggatcaa ccattttcca ggaatggggg agatctgtag gaaggatttc ttagcaagaa 600 atatgaccaa aatgatcaag tctcggcctc tggattatac ctttgttcct cgaacttgga 660 tctttcctgc tgaatatact caattccaaa attatgtgaa agaattgaag aaaaaacgga 720 agcagaaaac ttttatagtg aaaccagcta atggtgcaat gggtcatggg atttctttga 780 taagaaatgg tgacaaactt ccatctcagg atcatttgat tgttcaagaa tacattgaaa 840 agcctttcct aatggaaggt tacaagtttg acttacgaat ttatattctg gttacatcgt 900 gtgatccact aaaaatattt ctctaccatg atgggcttgt gcgaatgggt acagagaagt 960 acattccacc taatgagtcc aatttgaccc agttatacat gcatctgaca aactactccg 1020 tgaacaagca taatgagcat tttgaacggg atgaaactga gaacaaaggc agcaaacgtt 1080 ccatcaaatg gtttacagaa ttccttcaag caaatcaaca tgatgttgct aagttttgga 1140 gtgatatttc agaattggtg gtaaagaccc tgattgtagc agaacctcat gtcctgcatg 1200 cctatcgaat gtgtagacct ggtcaacctc caggaagcga aagtgtctgc tttgaagtcc 1260
tgggatttga tattttgttg gatagaaaac taaagccatg gcttctggag attaaccgag 1320 ccccaagctt tggaactgat cagaaaatag actatgatgt aaaaagggga gtgctgctaa 1380 atgcgttgaa gctactaaac ataaggacca gtgacaaaag aagaaacttg gccaaacaaa 1440 aagctgaggc tcaaaggagg ctctatggtc aaaattcaat taaaaggctc ttaccaggct 1500 cctcagactg ggaacagcag agacaccagt tggagaggcg gaaagaagag ttgaaagaga 1560 gactcgctca agtacgaaag cagatctcac gagaagaaca tgaaaatcga catatgggga 1620 attatagacg aatttatcct cctgaagata aagcattact tgaaaagtat gaaaatttgt 1680 tagctgttgc ctttcagacc ttcctttcag gaagagcagc ttcattccag cgagagttga 1740 ataatccttt gaaaaggatg aaggaagaag atattttgga tcttctggag caatgtgaaa 1800 ttgatgatga aaagttgatg ggaaaaacta ccaagactcg aggaccaaag cctctgtgtt 1860 ctatgcctga gagtactgag ataatgaaaa gaccaaagta ctgcagcagt gacagcagtt 1920 atgatagtag cagcagctct tcagaatctg acgaaaatga aaaagaagag taccaaaata 1980 agaaaagaga aaagcaagtt acatataatc ttaaaccctc caaccactac aaattaattc 2040 aacaacccag ctccataaga cgttcagtca gctgccctcg gtccatctct gctcaatcac 2100 cttccagtgg ggacacccgc ccattttctg ctcaacaaat gatatctgtt tcacggccaa 2160 cttctgcatc tcggtcacat tccttaaacc gtgcttcctc ctacatgagg catctgcctc 2220 acagtaatga tgcctgctct accaactctc aagtgagtga gtctttgcgg caactgaaaa 2280 caaaagaaca agaagatgat ctaacaagtc agaccttatt tgttctcaaa gacatgaaga 2340 tccggtttcc aggaaagtca gatgcagaat cagaacttct gatagaagat atcattgata 2400 actggaagta tcataaaacc aaagtggctt catattggct cataaaattg gactctgtaa 2460 aacaacgaaa agttttggac atagtgaaaa caagtattcg tacagttctt ccacgcatct 2520 ggaaggtgcc tgatgttgaa gaagtaaatt tatatcggat tttcaaccgg gtttttaatc 2580 gcttactctg gagtcgtggc caagggctgt ggaactgttt ctgtgattca ggatcctctt 2640 gggagagtat attcaataaa agcccggagg tggtgactcc tttgcagctc cagtgttgcc 2700 agcgcctagt ggagctttgt aaacagtgcc tgctagtggt ttacaaatat gcaactgaca 2760 aaagaggatc actttcaggc attggtcctg actggggtaa ttccaggtat ttactaccag 2820 ggagcaccca attcttcttg agaacaccaa cctacaactt gaagtacaat tcacctggaa 2880 tgactcgctc caatgttttg tttacatcca gatatggcca tctgtgaaac agaagggaag 2940 
atcgccattg gttatacata acagcaattc atttttttcc tctgaagttg aacatgcaaa 3000 gaacatgacc attaagtgct gttttatgta tataagacat atatatgtgt gaaaatatat 3060 gcacatatgc accctaataa catatattta ttatattaaa tgatatatga aagaagaatt 3120 agcagaaaat ggaatataag acttaacctt tctggaaacg taataaacca tgttaaaatt 3180 gtttaaaaaa aaaaaaataa aaaggggact aattaggccg ggggtgtttt gtcaatttta 3240 actaaacaaa aggggcggcc cgcctcaagg ggctcccagc tttacgtacg cgggtcattg 3300 ccggggttta ggcccccccc aagggggccc ccaaaatttc 3340
<210> 49
<211> 2230
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 3742459CB1
<400> 49 gcgccctgga gcatgtgaca cgggaccggg tgcgaggggg ccagcgacgc cggccaccaa 60 cgagagtcca cctgaaggag tgcttcctct ggagaggcag ctccacgagg ccgcccgcca 120 gaacaatgtc ggcaggatgc aggagctgat tgggaggagg gttaacacca gggccagaaa 180 ccacgtgggc agggtggccc tgcactgggc tgcaggtgca gggcacgagc aggctgtgcg 240 tctgcttctg gagcacgagg ctgctgtgga cgaggaggat gcggtagggg ccctcacaga 300 ggcccttggt cctctccttg ccttggcccc agcctctgct tccctcctct ctccagtgct 360 gtccttgtct gcaccacccg cctcctgcct ccaaattccc gcctgtttct aaagcaaagc 420 agtgcaactc tctttggatg ctcgggagcc tgctgatcat ttgggatgaa tgcgcttctc 480 ctgtctgcct ggttcggcca cttacgaatc ctccagatct tggtaaactc aggggccaag 540 atccactgtg agagcaagga tggcctgacc ttactgcact gcgcagccca aaaaggccat 600 gtgcctgtgc tggcgttcat aatggaggac ctggaggatg tggccctgga ccacgtagac 660 aagctgggga ggacggcgtt tcacagggca gctgagcacg ggcagctgga tgctctggac 720 ttcctcgtgg gctctggctg tgaccacaat gtcaaagaca aggaggggaa cactgccctt 780 catctggctg ctggtcgggg ccatatggct gtgctgcagc gacttgtgga catcgggctg 840 gacctggagg agcagaatgc ggaaggtctg actgccctgc attcggctgc tggaggatcc 900 caccctgact gtgtgcagct cctcctcagg gctgggagca ccgtgaatgc cctcacccag 960 aaaaacctaa gctgccttca ctatgcagcc ctcagtggct cggaggatgt gtctcgggtc 1020
ctcatccacg caggaggctg cgccaacgtg gttgatcatc agggtgcctc tcctctgcac 1080 ctcgctgtga ggcacaactt ccctgccttg gtccggctcc tcatcaactc cgacagtgac 1140 gtgaatgccg tggacaatag gcagcagacg ccccttcacc tggctgcaga gcacgcctgg 1200 caggacatag cagatatgct cctcattgct ggggttgact taaacctgag agataagcag 1260 ggaaaaaccg ccctggcagt ggctgtccgc agcaaccatg tcagcctggt ggacatgatc 1320 ataaaagctg atcgtttcta cagatgggag aaggaccacc ccagtgatcc ctctgggaag 1380 agcttgtcct ttaagcagga ccatcggcag gaaacacagc agctccgttc tgtgctgtgg 1440 cggctggcct ccaggtatct gcagccccgt gagtggaaga agctggcata ttcctgggag 1500 ttcacggagg cacatgtcga cgccatcgag caacagtgga caggcaccag gagctatcag 1560 gagcacggcc accgaatgct gctcatttgg ctgcatggcg tggccacggc tggtgagaac 1620 cccagcaaag cgctgttcga gggcctcgtg gccattggca ggagggacct ggctgaaaat 1680 atcaggaaga aagcaaacgc agccccgagt gcccccagga ggtgcacagc catgtaaccg 1740 gaggggccag accttcaggc acgtgggacc tcagcgtgtg gagccacctg aacagaagat 1800 gaccatcatt taagggcttt ttaaaaaatc actgttaaca gacctccagg tgattctgct 1860 gaaatgcaca gtcatgcaga gcccaggagg caaatgtttg tacactgatc tttttcatga 1920 ggatgggtcc aagggcctgt aatcccgtcc aacaggctgg agtacaatgg cgagatctca 1980 gctcacggca acctccgcct cccgggttca aatgattctc gtgcctcagc ctcccgagta 2040 gctgggatta caggtgcatg ccatcacagc tggctaattt ttgtattttt agtagagatg 2100 gggtttggcc atgatggcca ggctggaaaa ttgaaacata atttcacatt attccttttt 2160 ccaccttaaa taataagagt agaatacttt ctgtgttttt atctatacac atgaataaat 2220 gctatggctt 2230
<210> 50
<211> 3257
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 7468507CB1
<400> 50 tccaacgcat agtgaccatg tctagagaag tcgaagagat tagaaggaaa ttgaagaaaa 60 attacggagc tttggacaac ttcaagtaca gtttgaaaaa gacaaacgat tggcattgga 120 agacttgcaa gctgctcaca gacgggagat acaagagcta ttgaagtcac agcaggatca 180 cagtgcctca gtaaataaag gccaggaaaa ggcagaggaa ctacacagaa tggaggtgga 240 gtccctaaac aaaatgcttg aggagctaag acttgaacgg aagaaactaa ttgaggatta 300 tgaaggcaag ttgaataaag ctcagtcctt ttatgaacgt gagcttgata ctttgaaaag 360 gtcacagctt tttacagcag aaagcctaca ggccagcaaa gaaaaggaag ctgatcttag 420 aaaagaattt cagggacaag aagcaatttt acgaaaaact ataggaaaat taaagacaga 480 gttacagatg gtacaggatg aagctggaag tcttcttgac aaatgccaaa agcttcagac 540 ggcacttgcc atagcagaga acaatgttca ggttcttcaa aaacagcttg atgatgccaa 600 ggagggagaa atggccctat taagcaagca caaagaagtg gaaagtgagc tagcagctgc 660 cagagaacgt ttacaacagc aagcttcaga tcttgtcctc aaagctagtc atattggaat 720 gcttcaagca actcaaatga cccaggaagt tacaattaaa gatttagaat cagaaaaatc 780 gagagtcaat gagagattat ctcaacttga agaggaaaga gcttttttgc gaagcaaaac 840 ccaaagtctg gatgaagagc agaagcaaca gattctagaa ctggagaaga aagtaaatga 900 agcaaagaga actcagcaag aatattatga aagggaactt aaaaacctgc aaagtagatt 960 ggaagaggag gtgactcaat taaacgaggc ccattctaag actttggaag aattagcttg 1020 gaagcaccat atggcaattg aagctgtcca cagtaatgca attagggata agaaaaaact 1080 gcaaatggat ttggaagaac aacataacaa agataaacta aacctggaag aggataaaaa 1140 tcagcttcaa caagagctag aaaacctaaa ggaagtactg gaagacaagt tgaatacagc 1200 caatcaagag attggccacc tccaagatat ggtaaggaaa agtgaacaag gtcttggctc 1260 tgcagaagga cttattgcta gtcttcagga ctcccaggaa aggcttcaga atgagcttga 1320 cttgactaaa gacagcctaa aggagaccaa ggatgctcta ttaaatgtgg agggtgagct 1380 agaacaagaa aggcaacagc atgaagaaac aattgctgcc atgaaagaag aagagaagct 1440 caaagtggac aaaatggccc atgacttaga aattaagtgg actgaaaatc ttagacaaga 1500 gtgttctaaa cttcgtgaag agttaaggct tcaacatgaa gaggataaga agtcagcaat 1560 gtctcaactt ttgcagttga aagatcgaga gaaaaatgca gcaagagatt catggcagaa 1620 gaaagtagaa gatctcttaa accagatttc cttgctgaaa cagaatctgg agatacagct 1680 ttcccagtct 
cagacttctt tgcaacaact gcaagcccag tttacgcaag aacgacagcg 1740 gcttacgcaa gagcttgaag aattagagga gcaacatcag caaagacaca aatcattaaa 1800 agaagcacat gtccttgcat ttcaaactat ggaagaggaa aaggaaaagg agcaaagagc 1860
tcttgaaaat catttacaac agaagcattc tgcagagctt caatcactaa aagatgcaca 1920 cagagagtca atggagggct tccggataga aatggaacag gaacttcaga ctcttcggtt 1980 tgaattagaa gatgaaggaa aggctatgct tgcttccttg cgctcagaac tcaaccatca 2040 acatgcagct gcaattgatt tgttacggca taatcatcat caagaattgg cagctgctaa 2100 aatggaatta gagagaagca tagacatcag cagaagacag agtaaggagc acatatgtag 2160 aattacagat ctacaagagg aattaagaca cagagagcat cacatctctg aattggataa 2220 ggaggttcag caccttcatg agaatataag tgccctaacc aaagaactgg aatttaaggg 2280 gaaagaaatt ctcagaatac gaagtgaatc taaccaacag ataaggttgc atgaacaaga 2340 tttaaacaag agacttgaaa aagagttgga tgtcatgaca gcagaccacc tcagagagaa 2400 aaatatcatg cgggcagatt ttaataagac taacgagcta ctcaaggaaa taaatgccgc 2460 tttacaagtg tcattagaag aaatggaaga aaaatatcta atgagagaat caaaaccaga 2520 agatatacag atgattacag aattaaaagc catgcttaca gaaagagacc agatcataaa 2580 gaaactaatt gaggataata agttttatca gctggaatta gtcaatcgag aaactaactt 2640 caacaaagtg tttaactcaa gtcctactgt tggtgttatt aatccattgg ctaagcaaaa 2700 gaagaagaat gataaatcac caacaaacag gtttgtgagt gttcccaatc taagtgctct 2760 ggaatctggt ggagtgggca atggacatcc taaccgcctg gatcccattc ctaattctcc 2820 agtccacgat attgagttca acagcagcaa accacttcca cagccagtgc cacctaaagg 2880 gcccaagaca tttttgagtc ctgctcagag tgaagcttct ccagtggctt ctccagatcc 2940 ccagcgccag gagtggtttg cccggtactt cacattctga aagaattgtg ttggcacagc 3000 tctgtataga ctgttactaa gagcatgact ttatacagat tgttatgtaa ataggctttc 3060 ctatgtcaaa cactgtgaat gagaaagtat ttgtctctcc aacttgaaaa tgcactgtat 3120 ttcctgtgat atttattgga atcattctat aaggtactat attatgtgtg taattataac 3180 tgttattttt atttgagatg gaagagtctt taacctttgt aattactgca taataaattt 3240 tgttagaatc aaaaaaa 3257
<210> 51
<211> 2031
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 3049682CB1
<400> 51 cagcttttca gcagcagaca ctccacccca aagcctgcag aagggatttt gtgaagaggg 60 tcaccaggct gagcctcggc cagaacccgt ctacagagga ccctcagcca gagcagaaag 120 ctcctgagcc agctcccttg gatggactcc cagaggcctg agcccagaga ggaggaggag 180 gaggaacagg aactgcggtg gatggagctg gactccgaag aggccctggg aaccaggaca 240 gaggggccta gtgttgtcca gggctggggg cacctgctcc aggccgtgtg gaggggccct 300 gcaggcctgg tgacgcagct gctgcggcaa ggtgccagcg tggaggagag ggaccacgca 360 ggccggaccc cgctccacct ggccgtgctg cggggccacg cgcccctggt gcgtctcctg 420 ctgcagcgag gggccccggt gggcgcggtg gaccgggcgg ggcgcaccgc gctgcacgag 480 gccgcctggc acggacactc gcgggtggcc gagctgctgc tgcagcgcgg ggcctcggcg 540 gcggctcgct ccgggacggg cctcacgccg ctgcactggg ccgctgccct gggccacacg 600 ctgctggccg cgcgcctgct ggaggctccg ggcccgggac ccgcggcagc ggaggcggag 660 gacgcgcgcg gctggacggc ggcgcactgg gcggccgcgg gcgggcggct ggcggtgctg 720 gagctgctgg cggccggcgg cgcgggcctg gacggcgccc tgctcgtggc tgccgctgcg 780 gggcgcgggg cggcgctgcg cttcctcctg gcgcgcgggg cgcgggtgga cgcccgggat 840 ggcgcggggg ccacagcgct gggtctggcg gccgccctag gccgctccca ggacattgag 900 gtgctgctgg gccacggggc agacccaggc atcagggaca ggcatggccg ctctgcgctg 960 cacagggctg ccgcccgagg acacctgctt gccgtccagt tgctggtcac ccagggggcc 1020 gaggtggatg cgcgggacac cctgggcctc acacccctgc atcacgcctc tcgggaaggc 1080 cacgtggagg ttgccggctg cctgctggac aggggtgccc aggtggatgc taccggctgg 1140 ctccgaaaga cccccctaca cctggctgca gagcgagggc atgggcctac cgtggggctt 1200 ctgctgagcc gaggggccag ccccactctg cggacgcagt gggccgaggt ggcccagatg 1260 cctgaggggg acctgcccca ggcgctgcct gaacttggag ggggggagaa ggagtgtgag 1320 ggcatagagt ccacgggctg agccagacag caggctccag gctccaccgc cccagtgatt 1380 tccaggctct ctggctgagg ctgcctgcct ggaggggaca tcagggaaga ggcttccgga 1440 ggaggggatg ggagaaagta ggggatgtgg cttgagctgc agtcacaggc cttggctgga 1500 ccagggatgg cccccagctc ccaggagggc ccactgaccc tgcagctcca gccttctcca 1560 tacttcaaca aagaatgagt tgtggcaatg agggaagaga gaccctctca tagtgtttta 1620 tactcagtac ctgttttaag aaaaaacaac aaggaagtaa aaccaaagac aggcaggcag 1680
cctggcgcta ggcccgaaac caggcctgcg cctgcctggc ctaaacccag tagttgaaaa 1740 tcaattcata acttagaaac cgatgttatt catagattcc agacattgta tagaagaaca 1800 tttgtgaaac tccctgccgt gttctgtttc tctctgaccg ccggtgcatg cagcccctgt 1860 cacgtaccgc ctgcttgctc aaatcaatga cgaccctttc atgtgaaatc ttcggtgttg 1920 tgagccctta aaagggacag aaattgtgca cttggggagc tcggatttta aggcagtagc 1980 ttgccgatgc tcccagctga ataaagccct tccttctaaa aaaaaaaaaa a 2031
<210> 52
<211> 2576
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 914468CB1
<400> 52 tacgtattga aataaaaaaa aaaaagaaga agaacaaatg attcaatgga aaggaatgaa 60 tgaaattcct gagctgaaaa ctgcaagatg ggtattaatc aggacagaaa ggtgttccac 120 gcacagggaa cagaatatgc aaaagcctaa atcctaaatg tgggaagcag cctcacctct 180 ctgcaaccag ttctttgtct cataatctgc agctctgtgt ctatccctgt ctttccaggc 240 tcagcctcac tgttctccat ctctccgcag gcaccggcgc cccttcgtgg cggcacagaa 300 gaaccgctcc cgggcggcgt cgggtggggc agcgctggcc agtcctggcc cggggaccgg 360 atcaggggcc ccagctgggt ctggaggcaa ggagcgctca gaaaacttgt ctttgcggcg 420 cagcgtgtcg gagcttagcc ttcaggggcg gcggcggcgg cagcaggagc ggagacagca 480 ggcacttagc atggccccag gggcagccga cgcccaaatc ggaactgcag accccgggga 540 cttcgatcag ttgactcagt gcctcatcca ggcccccagc aaccgcccct acttcctgct 600 gctccagggc taccaggacg cccaggactt tgtggtgtat gtgatgacgc gagagcagca 660 cgtgtttggg cgaggtggga actcgtctgg ccgcgggggg tccccggctc cctatgtgga 720 caccttcctc aacgccccgg acatcctgcc gcgtcactgc acagtgcgcg cgggccctga 780 gcacccggcc atggtgcgcc cgtcccgggg cgccccagtc acgcacaacg ggtgcctcct 840 gctgcgggag gctgagctgc acccgggcga cctcctgggg ctgggcgagc acttcctgtt 900 catgtacaag gacccccgca ctgggggctc ggggcctgcg aggccgccgt ggctgcccgc 960 gcgccccggg gccacgccgc caggccctgg ctgggccttc tcctgtcgcc tgtgcggccg 1020 cggcctgcag gagcgcggcg aggcactggc cgcctacctg gacggccgtg agccagtcct 1080 gcgcttccgg ccgcgcgagg aggaggcgct gctgggcgag atcgtgcgcg ccgcagccgc 1140 cggctcggga gacctgccgc ccctcgggcc cgccacgctg ctggcgctgt gcgtgcagca 1200 ttccgcccgt gagctggagc tgggccacct gccacgactg ctgggctgcc tggcccggct 1260 catcaaggag gccgtctggg aaaagattaa ggaaattgga gaccgtcagc cagaaaacca 1320 ccctgagggg gtccccgagg tgcccctgac tcctgaagct gtgtctgtgg agctgcggcc 1380 actcatgctg tggatggcca acaccacgga gctgcttagc tttgtgcagg agaaggtgct 1440 ggaaatggag aaggaggctg accaagagga cccacagctc tgcaatgact tggaattatg 1500 tgatgaggcc atggccctcc tggatgaggt catcatgtgt accttccagc agtctgtcta 1560 ctacctcacc aagactctct attcaacgct gcctgctctc ctggatagta accctttcac 1620 agctggtgca gagctgccgg ggcctggcgc ggagctgggg gccatgcctc caggattgag 1680 acctaccctg 
ggcgtgttcc aggcagcctt ggagctgacc agccagtgcg agctgcaccc 1740 tgacctcgtg tctcagactt ttggctactt gttcttcttc tccaacgcat cccttctcaa 1800 ctcgctgatg gaacgaggtc aaggccggcc tttctatcaa tggtcccgag ctgttcaaat 1860 ccgaaccaac ctggacctcg tcttggactg gctacaggga gctgggctgg gcgacattgc 1920 cactgagttc ttccggaaac tctccatggc tgtgaacctg ctctgtgtgc cccgcacttc 1980 cctgctcaag gcttcatgga gcagcctaag aaccgaccac cccaccttga cccccgccca 2040 gctgcaccat ctgctcagcc actatcagct gggccctggc cgcgggccgc cagccgcgtg 2100 ggaccctccc cctgcagagc gggaggctgt ggacacaggg gacatcttcg aaagcttctc 2160 ctcgcacccg cccctcatcc tccccctggg gagctcgcgc ctgcgcctca ctggtccagt 2220 gacggacgat gccttgcacc gtgaactccg taggctccgc cgcctcctct gggatcttga 2280 gcagcaggag ctgccagcca attatcgcca tgggcctccc gtggccacgt ctccttgaga 2340 accaatacca aacgagcgcg cgaaccttga aatgtcacgg gcttctacgg acaggagccc 2400 gcctgagcgc aaagctttct gggagttgta gttcttatcc cgcgtggaat gttgggagat 2460 tgagttttcg ggaagtagcg gatgggacgg tgggagcatg ggcttaggat gtgaatgcca 2520 gggagcaata aaggtatccg tggtatcggc aaaaaaaaaa aaaaaaaaaa aaaaaa 2576
<210> 53
<211> 1534
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 2673631CB1
<400> 53 gactgggggg tgtgaggaac aggggggacc atggacttca tcagcattca gcagttggta 60 agtggagaaa gagttgaagg gaaagtgttg ggatttggac atggagttcc tgaccctgga 120 gcctggccta gtgactggag gaggggcccc caagaggctg tggcccggga gaagctgaaa 180 ttggaagaag agaagaagaa gaaacttgaa agatttaaca gtaccagatt taatctggat 240 aacctggctg acttggaaaa cttggttcaa agacggaaaa agcgactgag acacagagtc 300 ccccccagga aacctgagcc cctggttaag ccgcagtccc aggcccaggt ggagcctgtg 360 ggcctggaga tgttcctgaa ggcagctgct gagaaccagg agtacctgat tgacaagtac 420 ttgacagacg gaggggaccc caatgcccat gacaagctcc accgcaccgc cttgcactgg 480 gcctgtctga agggtcacag ccagctggtg aacaagctgc tggtggcagg tgccacagtg 540 gacgcgcgag acttgctgga caggacacct gtgttctggg cctgccgcgg aggacatctg 600 gtcatcctca aacagctgct taaccaggga gcccgggtca atgcccggga caagatcggg 660 agcacccccc tgcacgtggc agtgcgcacc cggcaccccg actgcctgga gcacctcatc 720 gagtgtggcg cccacctgaa cgcacaggat aaggaagggg acacggctct gcacgaggcc 780 gtgcggcacg gcagctacaa agccatgaag ctactgctgc tctatggggc cgagctgggg 840 gtgcggaacg cggcctccgt gaccccggtg cagctggctc gagactggca gcgcggcatc 900 cgggaggccc tgcaggccca cgtggcgcat ccccgcaccc ggtgctgacc gcagcaccgc 960 cccccgccgc gcctttcgca ctgccaccat tccatcctgt gccccgcccc cgcgtctgca 1020 cctctgtggt tcctgccctc agccctggtt cctccctctc tggcctgtgc cgcctcagca 1080 gccctggcag aactgaagag cggcaccggg cccagcaggc aaagagagag gcctccctgg 1140 cttcgagtgt caggggagcc gcgttccctc ccagggctgg agcagaggac cacaaggcag 1200 cagaaagcgc gggtccagat gagggccagg aaggggagga gagtgagggc caagaacgag 1260 ccttaaggga gcagtcccaa gctggagcca cccagggctg ggtctgggag tcctcagtgt 1320 ccacttgtcc cagaggatcc acctggttca tgaaccctcc ctcactgctc tctgcacatc 1380 acggccacac agcacctgca gggaggctgt ggggaggtgt ggagcaggtg caacaggcag 1440 ctactctcct gggggccaca cggcgggaga gaggattcga tgcagcatga cgatcccttc 1500 ctcccaggca tgacctcttc tcagaacaca gggc 1534
<210> 54
<211> 5633
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 2755454CB1
<400> 54 gcggagaggg aagaatatgg ccgccgggtg tggtgagggc gacgcgcttg cagtcgccgt 60 ctcttgcttc cccgtcctct gacatcgcct gcagccgagc gggcccgttc cgccggagct 120 gaggaccagg tattcaaata aagttaattg cagctttctg tgaaaatgtc agttttgata 180 tcacagagcg tcataaatta tgtagaggaa gaaaacattc ctgctctgaa agctcttctt 240 gaaaaatgca aagatgtaga tgagagaaat gagtgtggcc agactccact gatgatagct 300 gccgaacaag gcaatctgga aatagtgaag gaattaatta agaatggagc taactgcaat 360 ctggaagatt tggataattg gacagcactt atatctgcat cgaaagaagg gcatgtgcac 420 atcgtagagg aactactgaa atgtggggtt aacttggagc accgtgatat gggaggatgg 480 acagctctta tgtgggcatg ttacaaaggc cgtactgacg tagtagagtt gcttctttct 540 catggtgcca atccaagtgt cactggtctg cagtacagtg tttacccaat catttgggca 600 gcagggagag gccatgcaga tatagttcat cttttactgc aaaatggtgc taaagtcaac 660 tgctctgata agtatggaac caccccttta gtttgggctg cacgaaaggg tcatttggaa 720 tgtgtgaaac atttattggc catgggagct gatgtggatc aagaaggagc taattcaatg 780 actgcactta ttgtggcagt gaaaggaggt tacacacagt cagtaaaaga aattttgaag 840 aggaatccaa atgtaaactt aacagataaa gatggaaata cagctttgat gattgcatca 900 aaggagggac atacggagat tgtgcaggat ctgctcgacg ctggaacata tgtgaacata 960 cctgacagga gtggggatac tgtgttgatt ggcgctgtca gaggtggtca tgttgaaatt 1020 gttcgagcgc ttctccaaaa atatgctgat atagacatta gaggacagga taataaaact 1080
gctttgtatt gggctgttga gaaaggaaat gcaacaatgg tgagagatat cttacagtgc 1140 aatcctgaca ctgaaatatg cacaaaggat ggtgaaacgc cacttataaa ggctaccaag 1200 atgagaaaca ttgaagtggt ggagctgctg ctagataaag gtgctaaagt gtctgctgta 1260 gataagaaag gagatactcc cttgcatatt gctattcgtg gaaggagccg gaaactggca 1320 gaactgcttt taagaaatcc caaagatggg cgattacttt ataggcccaa caaagcaggc 1380 gagactcctt ataatattga ctgtagccat cagaagagta ttttaactca aatatttgga 1440 gccagacact tgtctcctac tgaaacagac ggtgacatgc ttggatatga tttatatagc 1500 agtgccctgg cagatattct cagtgagcct accatgcagc cacccatttg tgtggggtta 1560 tatgcacagt ggggaagtgg gaaatctttc ttactcaaga aactagaaga cgaaatgaaa 1620 accttcgccg gacaacagat tgagcctctc tttcagttct catggctcat agtgtttctt 1680 accctgctac tttgtggagg gcttggttta ttgtttgcct tcacggtcca cccaaatctt 1740 ggaatagcag tgtcactgag cttcttggct ctcttatata tattctttat tgtcatttac 1800 tttggtggac gaagagaagg agagagttgg aattgggcct gggtcctcag cactagattg 1860 gcaagacata ttggatattt ggaactcctc cttaaattga tgtttgtgaa tccacctgag 1920 ttgccagagc agactactaa agctttacct gtgaggtttt tgtttacaga ttacaataga 1980 ctgtccagtg taggtggaga aacttctctg gctgaaatga ttgcaaccct ctcggatgct 2040 tgtgaaagag agtttggctt tttggcaacc aggctttttc gagtattcaa gactgaagat 2100 actcagggta aaaagaaatg gaaaaaaaca tgttgtctcc catcttttgt catcttcctt 2160 tttatcattg gctgcattat atctggaatt actcttctgg ctatatttag agttgaccca 2220 aagcatctga ctgtaaatgc tgtcctcata tcaatcgcat ctgtagtggg attggccttt 2280 gtgttgaact gtcgtacatg gtggcaagtg ctggactcgc tcctgaattc ccaaagaaaa 2340 cgcctccata atgcagcctc caaactgcac aaattgaaaa gtgaaggatt catgaaagtt 2400 cttaaatgtg aagtggaatt gatggccagg atggcaaaaa ccattgacag cttcactcag 2460 aatcagacaa ggctggtggt catcatcgat ggattagatg cctgtgagca ggacaaagtc 2520 cttcagatgc tggacactgt ccgagttctg ttttcaaaag gcccgttcat tgccattttt 2580 gcaagtgatc cacatattat cataaaggca attaaccaga acctcaatag tgtgcttcgg 2640 gattcaaata taaatggcca tgactacatg cgcaacatag tccacttgcc tgtgttcctt 2700 aatagtcgtg gactaagcaa tgcaagaaaa tttctcgtaa cttcagcaac aaatggagac 2760
gttccatgct cagatactac agggatacag gaagatgctg acagaagagt ttcacagaac 2820 agccttgggg agatgacaaa acttggtagc aagacagccc tcaatagacg ggacacttac 2880 cgaagaaggc agatgcagag gaccatcact cgccagatgt cctttgatct tacaaaactg 2940 ctggttaccg aggactggtt cagtgacatc agtccccaga ccatgagaag attacttaat 3000 attgtttctg tgacaggacg attactgaga gccaatcaga ttagtttcaa ctgggacagg 3060 cttgctagct ggatcaacct tactgagcag tggccatacc ggacttcatg gctcatatta 3120 tatttggaag agactgaagg tattccagat caaatgacat taaaaaccat ctacgaaaga 3180 atatcaaaga atattccaac aactaaggat gttgagccac ttcttgaaat tgatggagat 3240 ataagaaatt ttgaagtgtt tttgtcttca aggaccccag ttcttgtggc tcgagatgta 3300 aaagtctttt tgccatgcac tgtaaaccta gatcccaaac tacgggaaat tattgcagat 3360 gttcgtgctg ccagagagca gatcagtatt ggaggactgg cgtacccccc gctccctcta 3420 catgagggtc ctcctagggc gccatcaggg tacagccagc ccccatccgt gtgctcttcc 3480 acgtccttca atgggccctt cgcaggtgga gtggtgtcac cacagcctca cagcagctat 3540 tacagcggca tgacgggccc tcagcatccc ttctacaaca gggggtcagg cccagcccca 3600 ggcccagtgg tattactgaa ttcactgaat gtggatgcag tatgtgagaa gctgaaacaa 3660 atagaagggc tggaccagag tatgctgcct cagtattgta ccacgatcaa aaaggcaaac 3720 ataaatggcc gtgtgttagc tcagtgtaac attgatgagc tgaagaaaga gatgaatatg 3780 aattttggag actggcacct tttcagaagc acagtactag aaatgagaaa cgcagaaagc 3840 cacgtggtcc ctgaagaccc acgtttcctc agtgagagca gcagtggccc agccccgcac 3900 ggtgagcctg ctcgccgcgc ttcccacaac gagctgcctc acaccgagct ctccagccag 3960 acgccctaca cactcaactt cagcttcgaa gagctgaaca cgcttggcct ggatgaaggt 4020 gcccctcgtc acagtaatct aagttggcag tcacaaactc gcagaacccc aagtctttcg 4080 agtctcaatt cccaggattc cagtattgaa atttcaaagc ttactgataa ggtgcaggcc 4140 gagtatagag atgcctatag agaatacatt gctcagatgt cccagttaga agggggcccc 4200 gggtctacaa ccattagtgg cagatcttct ccacatagca catattacat gggtcagagt 4260 tcatcagggg gctctattca ttcaaaccta gagcaagaaa aggggaagga tagtgaacca 4320 aagcccgatg atgggaggaa gtcctttcta atgaagaggg gagatgttat cgattattca 4380 tcatcagggg tttccaccaa cgatgcttcc cccctggatc ctatcactga agaagatgaa 4440 aaatcagatc
agtcaggcag taagcttctc ccaggcaaga aatcttccga aaggtcaagc 4500 ctcttccaga cagatttgaa gcttaaggga agtgggctgc gctatcaaaa actcccaagt 4560 gacgaggatg aatctggcac agaagaatca gataacactc cactgctcaa agatgacaaa 4620 gacagaaaag ccgaagggaa agtagagaga gtgccgaagt ctccagaaca cagtgctgag 4680 ccgatcagaa ccttcattaa agccaaagag tatttatcgg atgcgctcct tgacaaaaag 4740 gattcatcgg attcaggagt gagatccagt gaaagttctc ccaatcactc tctgcacaat 4800 gaagtggcgg atgactccca gcttgaaaag gcaaatctca tagagctgga agatgacagt 4860
cacagcggaa agcggggaat cccacatagc ctgagtggcc tgcaagatcc aattatagct 4920 cggatgtcca tttgttcaga agacaagaaa agcccttccg aatgcagctt gatagccagc 4980 agccctgaag aaaactggcc tgcatgccag aaagcctaca acctgaaccg aactcccagc 5040 accgtgactc tgaacaacaa tagtgctcca gccaacagag ccaatcaaaa tttcgatgag 5100 atggagggaa ttagggagac ttctcaagtc attttgaggc ctagttccag tcccaaccca 5160 accactattc agaatgagaa tctaaaaagc atgacacata agcgaagcca acgttcaagt 5220 tacacaaggc tctccaaaga tcctccggag ctccatgcag cagcctcttc tgagagcaca 5280 ggctttggag aagaaagaga aagcattctt tgagaaaaac aagcaaaagg agaagagtgt 5340 tactgtaccc ttatgacaga attgtcctgg attttgactc catccacgcc catcaccttt 5400 ctacattttg ctgacagata actaaccgat gatgagggcc gagggtacaa cacgagacat 5460 cttgccgtgt gacagaaggg agcatgaaaa gccatggttc acacaaggca agcttctgtg 5520 ggctttgtat tagaagcttt cgaactccac taatatatct gtggctttca ttggggcctt 5580 tccccataaa attttttgag accaggggcg accggggatt aaacaacggg cca 5633
<210> 55
<211> 4587
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 5868348CB1
<400> 55 gcgatctgag tagccagcgt cgccggcgac cgcggagttc tgggctagtg ggaccccgcg 60 cgggctggtt cgggatgagc gatggcatcg gtcaaggtgg ccgtgagggt ccggcccatg 120 aatcgcaggg aaaaggactt ggaggccaag ttcattattc agatggagaa aagcaaaacg 180 acaatcacaa acttaaagat accagaagga ggcactgggg actcaggaag agaacggacc 240 aagaccttca cctatgactt ttctttttat tctgctgata caaaaagccc agattacgtt 300 tcacaagaaa tggttttcaa aaccctcggc acagatgtcg tgaagtctgc atttgaaggt 360 tataatgctt gtgtctttgc atatgggcaa actggatctg gaaagtcata cactatgatg 420 ggaaattctg gagattctgg cttaatacct cggatctgtg aaggactctt cagtcggata 480 aatgaaacca ccagatggga tgaagcttct tttcgaactg aagtcagcta cttagaaatt 540 tataacgaac gtgtgagaga tctacttcgg cggaagtcat ctaaaacctt caatttgaga 600 gtccgtgagc atcccaaaga aggcccttat gttgaggatt tatccaaaca tttagtacag 660 aattatggtg acgtagaaga acttatggat gcgggcaata tcaaccggac caccgcagcg 720 actgggatga acgacgtcag tagcaggtct catgccatct tcaccatcaa gttcactcag 780 gctaaatttg attctgaaat gccatgtgaa accgtcagta agatccactt ggttgatctt 840 gccggaagtg agcgtgcaga tgccaccgga gccaccgggg ttaggctaaa ggaaggggga 900 aatattaaca agtccctcgt gactctgggg aacgtcattt ctgccttagc tgatttatct 960 caggatgctg caaatactct tgcaaagaag aagcaagttt tcgtgcctta cagggattct 1020 gtgttgactt ggttgttaaa agatagcctt ggaggaaact ctaaaactat catgattgcc 1080 accatttcac ctgctgatgt caattatgga gaaaccctaa gtactcttcg ctatgcaaat 1140 agagccaaaa acatcatcaa caagcctacc attaatgagg atgccaacgt caaacttatc 1200 cgtgagctgc gagctgaaat agccagactg aaaacgctgc ttgctcaagg gaatcagatt 1260 gccctcttag actcccccac agctttaagt atggaggaaa aacttcagca gaatgaagca 1320 agagttcaag aattgaccaa ggaatggaca aataagtgga atgaaaccca aaatattttg 1380 aaagaacaaa ctctagccct caggaaagaa gggattggag ttgttttgga ttctgaactg 1440 cctcatttga ttggcatcga tgatgacctt ttgagtactg gaatcatctt atatcattta 1500 aaggaaggtc agacatacgt tggtagagac gatgcttcca cggagcaaga tattgttctt 1560 catggccttg acttggagag tgagcattgc atctttgaaa atatcggggg gacagtgact 1620 ctgatacccc tgagtgggtc ccagtgctct gtgaatggtg ttcagatcgt ggaggccaca 1680 catctaaatc 
aaggtgctgt gattctcttg ggaagaacca atatgtttcg ctttaaccat 1740 ccaaaggaag ccgccaagct cagggagaag aggaagagtg gccttctgtc ctccttcagc 1800 ttgtccatga ccgacctctc gaagtcccgt gagaacctgt ctgcagtcat gttgtataac 1860 cccggacttg aatttgagag gcaacagcgt gaagaacttg aaaaattaga aagtaaaagg 1920 aaactcatag aagaaatgga ggaaaagcag aaatcagaca aggctgaact ggagcggatg 1980 cagcaggagg tggagaccca gcgcaaggag acagaaatcg tgcagctcca gattcgcaag 2040 caggaggaga gcctcaaacg ccgcagcttc cacatcgaga acaagctaaa ggatttactt 2100 gcggagaagg aaaaatttga agaggagagg ctgagggaac agcaggaaat cgagctgcag 2160 aagaagagac aagaagaaga gacctttctc cgcgtccaag aagaactcca acgactcaaa 2220 gaactcaaca acaacgagaa ggctgagaag tttcagatat ttcaagaact ggaccagctc 2280 caaaaggaaa aagatgaaca gtatgccaag cttgaactgg aaaaaaagag actagaggag 2340
caggagaagg agcaggtcat gctcgtggcc catctggaag agcagctccg agagaagcag 2400 gagatgatcc agctcctgcg gcgtggggag gtacagtggg tggaagagga gaagagggac 2460 ctggaaggca ttcgggaatc cctcctgcgg gtgaaggagg ctcgtgccgg aggggatgaa 2520 gatggcgagg agttagaaaa ggctcaactg cgtttcttcg aattcaagag aaggcagctt 2580 gtcaagctag tgaacttgga gaaggacctg gttcagcaga aagacatcct gaaaaaagaa 2640 gtccaagaag aacaggagat cctagagtgt ttaaaatgtg aacatgacaa agaatctaga 2700 ttgttggaaa aacatgatga gagtgtcaca gatgtcacgg aagtgcctca agatttcgag 2760 aaaataaagc cagtggagta caggctgcaa tataaagaac gccagctaca gtacctcctg 2820 cagaatcact tgccaactct gttggaagaa aagcagagag catttgaaat tcttgacaga 2880 ggccctctca gcttagacaa cactctttat caagtagaaa aggaaatgga agaaaaagaa 2940 gaacagcttg cacagtacca ggccaatgca aaccagctgc aaaagctcca agccaccttt 3000 gaattcactg ccaacattgc acgtcaggag gaaaaagtga ggaaaaagga aaaggagatt 3060 ttggagtcca gagagaagca gcagagagag gcgctggagc gggccctggc caggctggag 3120 aggagacatt ctgcgctgca gaggcactcc accctgggca cggagattga agagcagagg 3180 cagaaacttg ccagtctgaa cagtggcagc agagagcagt cagggctcca ggctagcctg 3240 gaggctgagc aggaagccct ggagaaggac caggagaggt tagaatatga aatccagcag 3300 ctgaaacaga agatttatga ggtcgatggt gttcaaaaag atcatcatgg gaccctggaa 3360 gggaaggtgg cttcttccag cttgccagtc agtgctgaaa aatcacacct ggttcccctc 3420 atggatgcca ggatcaatgc ttacattgaa gaagaagtcc aaagacgcct tcaggatttg 3480 catcgtgtga ttagtgaagg ctgcagtaca tctgcagaca cgatgaagga taatgagaaa 3540 cttcacaagg gcaccattca acgtaaacta aaatatgagc tgtgtcgtga cctcctgtgt 3600 gtcctgatgc cagagcctga tgccgctgcc tgcgctaatc atcccttgct ccagcaagat 3660 ctggttcagc tttctcttga ttggaaaaca gaaatccctg atttagtttt gccaaatgga 3720 gttcaggtgt catccaaatt ccagactacc ttggttgaca tgatttactt tcttcatgga 3780 aatatggaag tcaatgtccc ttccctggca gaagttcagt tactgctcta cacaacagtg 3840 aaagtcatgg gtgactctgg ccatgaccag tgccagtcgc tagtccttct gaacacccac 3900 attgcactgg tgaaggaaga ctgtgttttt tatccacgca ttcgatctcg aaacatacct 3960 cctccgggtg cacaatttga tgtgatcaaa tgccatgctt taagtgaatt caggtgtgtt 4020
gttgttccag aaaagaaaaa tgtgtcaaca gtagaactag tcttcttaca gaaactcaaa 4080 ccttcagtgg gttccagaaa tagtccacct gagcaccttc aggaagcccc aaatgtccag 4140 ttgttcacca ccccattgta tcttcaaggc agtcagaatg tcgcacctga ggtctggaaa 4200 cttactttca attctcaaga tgaggctctt tggctaatct cacatttgac aagactctaa 4260 ggaggagacc tttaaagatg cactacatgt tttttgagat cattaataaa ataagcattg 4320 tgaaaacagt caaggcaata tgaatatctc cgtgtagcta attgaattgg aactggaaaa 4380 atgcagacct ctaaaattga aaatgtaaat attttaaata tctacaataa aataaaaaca 4440 gctaatagca gagccccaat aaaatatctt tatcatcacc ttgcttcatt ttcttgaaac 4500 tcaggcttgt aaatttgtgc ctgcttcatt atttgtgagg tgattaaagc atttctgatt 4560 gttaaacaaa acaaaaaagg gggggcg 4587
<210> 56
<211> 1509
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<223> Incyte ID No: 2055455CB1
<400> 56 cggaagcatc catggcggag ggcggcagcc cagacgggcg ggcagggccg gggctccgca 60 gtgcaggtcg taatctgaag gagtggctga gggagcaatt ttgtgatcat ccgctggagc 120 actgtgagga cacgaggctc catgatgcag cttacgtcgg ggacctccag accctcagga 180 gcctattgca agaggagagc taccggagcc gcatcaacga gaagtctgtc tggtgctgtg 240 gctggctccc ctgcacaccg ttgcgaatcg cggccactgc aggccatggg agctgtgtgg 300 acttcctcat ccggaagggg gccgaggtgg atctggtgga cgtaaaagga cagacggccc 360 tgtatgtggc tgtggtgaac gggcacctag agagtaccca gatccttctc gaagctggcg 420 cggaccccaa cggaagccgg caccatcgca gcacccctgt ctaccacgcc tctcgcgtgg 480 gccgggcaga catcctgaag gccctcatca ggtacggggc tgatgttgac gtcaaccacc 540 acctgactcc tgatgtccag cctcgattct cccggcggct cacctccttg gtggtctgcc 600 ccttgtacat cagcgcagcc taccacaacc tccagtgctt ccggctgctc ctcctggctg 660 gcgcgaaccc tgacttcaac tgcaatggtc ctgtcaacac acagggattc tacaggggct 720 cccctgggtg cgtcatggat gctgttctgc gccacggctg tgaggcagcc ttcgtgagcc 780 tgctggtaga atttggagcc aacctgaatc tagtgaagtg ggaatcgctg ggcccagagt 840
cgagaggaag aagaaaagtg gaccctgagg ccttgcaggt ctttaaagag gccagaagtg 900 ttcccagaac cttgctgtgt ctgtgccgtg tggctgtgag aagagctctt ggcaaacacc 960 ggcttcatct gattccttcg ctgcctctgc cagaccccat aaagaagttt ctactccatg 1020 agtagactcc aagtgctgcg gttgattcca gtgagggaga aagtgatctg cagggaggtg 1080 gacaccgagc cctgagtgct gtgctgctgc tggtctcctg atggctgttg ctgcagaaga 1140 tgtcctcgta gactgtcatt gctcctcagg tgcctgggcc gctgaacagt ccttgggtca 1200 ttgtcagctg agaggcttat actaaagtta ttattgtttt tcccaaaaaa aaaaaaaaaa 1260 aaaaaaaaaa aaaaaagatg acaaaaaaaa agaagggggg ggccgccacc caataggtgt 1320 gtaccctcgc tgcacacgcg gagttattta ttctcgggca gcgatacttt cgagaggtgt 1380 gtggagagat attatgatat aactttttta agaacggacc accaccagga ggggggcccc 1440 gagatcacaa tgttcgcctt aatgtgtgat tttataacgc gcccactgtg gcggtgttaa 1500 aaagtgtgt 1509
EP02757814A 2001-03-29 2002-03-25 Cytoskeletion-associated proteins Withdrawn EP1373306A4 (en)

Applications Claiming Priority (15)

Application Number Priority Date Filing Date Title
US28050801P 2001-03-29 2001-03-29
US280508P 2001-03-29
US28132301P 2001-04-03 2001-04-03
US281323P 2001-04-03
US28376901P 2001-04-13 2001-04-13
US283769P 2001-04-13
US28860901P 2001-05-04 2001-05-04
US288609P 2001-05-04
US29051801P 2001-05-10 2001-05-10
US290518P 2001-05-10
US29187001P 2001-05-18 2001-05-18
US291870P 2001-05-18
US29445101P 2001-05-29 2001-05-29
US294451P 2001-05-29
PCT/US2002/009288 WO2002079404A2 (en) 2001-03-29 2002-03-25 Cytoskeleton-associated proteins

Publications (2)

Publication Number Publication Date
EP1373306A2 true EP1373306A2 (en) 2004-01-02
EP1373306A4 EP1373306A4 (en) 2005-07-20

Family

ID=27569565

Family Applications (1)

Application Number Title Priority Date Filing Date
EP02757814A Withdrawn EP1373306A4 (en) 2001-03-29 2002-03-25 Cytoskeletion-associated proteins

Country Status (5)

Country Link
EP (1) EP1373306A4 (en)
JP (1) JP2004533227A (en)
AU (1) AU2002306879A1 (en)
CA (1) CA2441654A1 (en)
WO (1) WO2002079404A2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1402031A2 (en) 2001-06-21 2004-03-31 Isis Innovation Limited Atopy
AU2003228872A1 (en) * 2002-05-10 2003-11-11 Incyte Corporation Cell adhesion and extracellular matrix proteins

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
WO2000017355A2 (en) * 1998-09-18 2000-03-30 Incyte Pharmaceuticals, Inc. Human cytoskeleton associated proteins
WO2000056883A1 (en) * 1999-03-23 2000-09-28 Human Genome Sciences, Inc. 49 human secreted proteins
WO2002068579A2 (en) * 2001-01-10 2002-09-06 Pe Corporation (Ny) Kits, such as nucleic acid arrays, comprising a majority of human exons or transcripts, for detecting expression and other uses thereof

Non-Patent Citations (3)

Title
BERNAL S D ET AL: "CYTOSKELETON-ASSOCIATED PROTEINS THEIR ROLE AS CELLULAR INTEGRATORS IN THE NEOPLASTIC PROCESS" CRITICAL REVIEWS IN ONCOLOGY-HEMATOLOGY, vol. 3, no. 3, 1985, pages 191-204, XP009042995 ISSN: 1040-8428 *
SATO H ET AL: "The Genomic Organization of Type I Keratin Genes in Mice" GENOMICS, ACADEMIC PRESS, SAN DIEGO, US, vol. 56, no. 3, 15 March 1999 (1999-03-15), pages 303-309, XP004444979 ISSN: 0888-7543 *
See also references of WO02079404A2 *

Also Published As

Publication number Publication date
AU2002306879A1 (en) 2002-10-15
EP1373306A4 (en) 2005-07-20
WO2002079404A2 (en) 2002-10-10
JP2004533227A (en) 2004-11-04
WO2002079404A3 (en) 2003-03-20
CA2441654A1 (en) 2002-10-10

Similar Documents

Publication Title
WO2002004520A2 (en) Transporters and ion channels
EP1257578A2 (en) Transporters and ion channels
JP2004533205A (en) Transporters and ion channels
WO2002046415A2 (en) Polynucleotide and polypeptide sequences of putative transporters and ion channells
EP1412387A2 (en) Transporters and ion channels
JP2004537254A (en) Transporters and ion channels
US20060035315A1 (en) Transporters and ion channels
US20050027103A1 (en) Structural and cytoskeleton-associated protein
WO2002077237A2 (en) Transporters and ion channels
WO2002053719A2 (en) Cytoskeleton-associated proteins
US20040044184A1 (en) Cytoskeleton-associated proteins
EP1373306A2 (en) Cytoskeletion-associated proteins
US20070276126A1 (en) Cell adhesion and extracellular matrix proteins
WO2003038052A2 (en) Nucleic acid-associated proteins
US20040116670A1 (en) Cytoskeleton-associated proteins
US20040101884A1 (en) Molecules for disease detection and treatment
JP2004516007A (en) Transporters and ion channels
WO2002078420A2 (en) Molecules for disease detection and treatment
US20030216310A1 (en) Transporters and ion channels
WO2003008625A2 (en) Structural and cytoskeleton-associated proteins
US20040096828A1 (en) Cytoskeleton-associated proteins
WO2003031940A2 (en) Structural and cytoskeleton-associated proteins
US20040053291A1 (en) Nucleic acid-associated proteins
US20040127683A1 (en) Transporters and ion channels
WO2004099436A2 (en) Structural and cytoskeleton-associated proteins

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20030916

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the european patent

Extension state: AL LT LV MK RO SI

A4 Supplementary search report drawn up and despatched

Effective date: 20050603

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20050728

RIN1 Information on inventor provided before grant (corrected)

Inventor name: JONES, KAREN ANNE

Inventor name: RING, HUIJUN Z.

Inventor name: SWARNAKAR, ANITA

Inventor name: LEE, ERNESTINE A.

Inventor name: GRIFFIN, JENNIFER A.

Inventor name: CHAWLA, NARINDER K.

Inventor name: GIETZEN, KIMBERLY J.

Inventor name: LEE, SALLY

Inventor name: LAL, PREETI G.

Inventor name: BANDMAN, OLGA

Inventor name: LEE, SOO, YEUN

Inventor name: RICHARDSON, THOMAS, W.

Inventor name: EMERLING, BROOKE, M.

Inventor name: BECHA, SHANYA

Inventor name: YUE, HUIBIN

Inventor name: DING, LI

Inventor name: BURFORD, NEIL

Inventor name: ELLIOTT, VICKI, S.

Inventor name: AZIMZAI, YALDA

Inventor name: HONCHELL, CYNTHIA, D.

Inventor name: THANGAVELU, KAVITHA

Inventor name: DUGGAN, BRENDAN, M.

Inventor name: WARREN, BRIDGET, A.

Inventor name: BAUGHN, MARIAH, R.

Inventor name: ISON, CRAIG, H.

Inventor name: KHAN, FARRAH, A.

Inventor name: YUE, HENRY

Inventor name: TANG, TOM, Y.

Inventor name: HAFALIA, APRIL, J., A.