WO2002026930A9 - Nucleic acids, proteins and antibodies - Google Patents

Nucleic acids, proteins and antibodies

Info

Publication number
WO2002026930A9
Authority
WO
WIPO (PCT)
Prior art keywords
polypeptide
seq
sequence
polynucleotide
polypeptides
Prior art date
Application number
PCT/US2001/029838
Other languages
English (en)
Other versions
WO2002026930A3 (fr)
WO2002026930A2 (fr)
Inventor
Craig A Rosen
Charles E Birse
Original Assignee
Human Genome Sciences Inc
Craig A Rosen
Charles E Birse
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Human Genome Sciences Inc, Craig A Rosen, Charles E Birse
Priority to AU2001296301A (patent AU2001296301A8)
Priority to AU2001296301A (patent AU2001296301A1)
Publication of WO2002026930A2
Publication of WO2002026930A9
Publication of WO2002026930A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K14/00: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/46: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates
    • C07K14/47: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals

Definitions

  • The present invention relates to novel proteins. More specifically, isolated nucleic acid molecules are provided encoding novel polypeptides. Novel polypeptides and antibodies that bind to these polypeptides are provided. Also provided are vectors, host cells, and recombinant and synthetic methods for producing human polynucleotides and/or polypeptides, and antibodies. The invention further relates to diagnostic and therapeutic methods useful for diagnosing, treating, preventing and/or prognosing disorders related to these novel polypeptides. The invention further relates to screening methods for identifying agonists and antagonists of polynucleotides and polypeptides of the invention. The present invention further relates to methods and/or compositions for inhibiting or enhancing the production and function of the polypeptides of the present invention.
  • CD molecules are membrane proteins that were first identified as phenotypic markers for different lymphocyte populations.
  • CD stands for "cluster of differentiation" and refers to a molecule recognized by a "cluster" of monoclonal antibodies that can be used to identify the lineage or distinguish one class of lymphocytes from another.
  • CD proteins were first used for subclassifying T cells, but different CD molecules recognized by specific antibodies now serve as useful markers of B cells and other leukocytes that participate in immune and inflammatory responses.
  • A number of CD molecules, grouped by their molecular families, are discussed below. For further information, see, e.g., Leukocyte Typing IV.
  • Immunoglobulin superfamily: The immunoglobulin (Ig) superfamily of proteins consists of diverse families of proteins involved in cell-cell interactions, cell-surface recognition, intercellular communication, and immune and inflammatory responses. Members of the gene superfamily are structural homologs, but they do not necessarily share similar functions or proximity in the genome.
  • Ig superfamily: Most identified members of the Ig superfamily are integral plasma membrane proteins with Ig domains in the extracellular portions, transmembrane domains composed of hydrophobic amino acids, and widely divergent cytoplasmic tails with little or no homology to one another or to previously identified signal-transduction structures.
  • CAMs: cell adhesion molecules
  • Neural CAMs are thought to play important roles in axonal growth and guidance.
  • ICAMs: intercellular adhesion molecules
  • VCAM-1: vascular cell adhesion molecule-1
  • MAdCAM-1: mucosal addressin cell adhesion molecule-1
  • CD1 molecules have been demonstrated to restrict T cell responses to non-peptide lipid and glycolipid antigens.
  • CD2 functions as an intercellular adhesion molecule and is involved with signal transduction.
  • CD4 and CD8 are T cell surface glycoproteins that serve as accessory molecules by facilitating interactions of T cells with APCs or CTL target cells.
  • CD19 is a critical signal transduction molecule that regulates B lymphocyte development, activation, and differentiation.
  • CD23 is involved in the regulation of IgE synthesis.
  • CD54 reacts with CD11a/CD18 or CD11b/CD18, resulting in immune reaction and/or inflammation.
  • CD58 mediates the adhesion between killer and target cells, antigen presenting cells and T cells, and thymocytes and thymic epithelial cells.
  • CD64 is involved in antigen capture for presentation to T cells and receptor-mediated endocytosis of IgG-antigen complexes.
  • CD86 and CD80 act as ligands for T cell stimulation.
  • CD115 (CSF-1R) is the receptor for CSF-1 and mediates all of the biological effects of this cytokine.
  • Further members of the immunoglobulin superfamily that also fall into the cytokine receptor family include, for example, CD114, CD116, CD122, CD123, CD124, CDw125, CD126, CD127, CD130, CDw131, CD132, and CDw199.
  • CD132 is the receptor for IL-2, IL-4, IL-7, IL-9 and IL-15. Mutations in humans cause XSCID, a disease characterized by the absence of T and NK cells.
  • CDw131 is a key signal transducing molecule of the IL-3, GM-CSF and IL-5 receptors.
  • CD127 is the receptor for interleukin-7.
  • CD124 is a receptor subunit for interleukin-4 and interleukin-13.
  • Integrin assemblies: The integrin gene family consists of homologous genes that encode molecules which promote cell-cell or cell-matrix interactions.
  • the integrin family of heterodimeric receptors provides structural links between the extracellular environment and the cytoskeleton-signaling network and thus helps to regulate cell migration, differentiation, cell cycle progression, apoptosis, phagocytosis, ECM assembly, and metalloproteinase activity. All integrins are heterodimeric cell surface proteins composed of two noncovalently linked polypeptide chains, alpha and beta. The two chains consist of extracellular and transmembrane segments and cytoplasmic tails 30 to 46 amino acids long.
  • the extracellular domains of the two chains bind to various ligands, including extracellular matrix glycoproteins, complement components and proteins on the surface of other cells. Many of the integrins bind to Arg-Gly-Asp (R-G-D) sequences in the ligands, but this is not always the case.
  • the cytoplasmic domains of the integrins interact with cytoskeletal components (including vinculin, talin, actin, alpha-actinin, and tropomyosin), and it is hypothesized that the integrins coordinate (i.e., "integrate") the binding of cells to extracellular proteins with cytoskeleton-dependent motility, shape change and phagocytic responses.
  • Members of the integrin superfamily include molecules involved in beta chain assembly, such as, for example, CD104, CD61, CD29, and CD18; and molecules involved in alpha chain assembly, such as, for example, CD103, CD51, CD49a, CD41, CD11c, CD11b, and CD11a.
  • VLA very late activating
  • The beta2-integrins, also known as the LFA-1 family, play an important role in the adhesion of lymphocytes with other cells, such as accessory cells and vascular endothelium.
  • LFA-1 lymphocytes
  • TM4SF: Cell surface proteins of the tetraspanin superfamily (or TM4SF) are a newly characterized family of proteins which are presumed to span the plasma membrane four times. The function of this family of molecules is poorly understood, but there is evidence that they may be involved in transmembrane signal transduction and regulation of cellular proliferation, development, motility, and tumor cell growth and metastasis, in a number of different cell types.
  • The TM4SF proteins may function as "adaptors" or "molecular facilitators," organizing various cell-surface proteins into large multimeric complexes on the cell surface.
  • TM4SF proteins contain putative small (20-27 amino acids) and large (75-130 amino acids) extracellular domains and four putative hydrophobic transmembrane domains. Many TM4SF proteins are thought to be involved in signal transduction. In fact, several TM4SF members have been shown to associate with tyrosine phosphatases. Members of this family include, for example, CD151, CD82, CD81, CD63, CD53, CD37, and CD9. Several of these members, including CD151, CD81, CD63 and CD9, are known to interact with alpha3/beta1 integrin complexes.
  • tetraspanins e.g., CD81, CD82, CD9, CD63
  • CD37 B cells
  • CD53 lymphoid and myeloid cells
  • Extracellular enzymes: Several CD molecules have enzymatic activity, such as, for example, protein kinase activity. Members include, for example, CDw136 (involved in the induction of migration, morphological change, and proliferation in different target cells), and CD42a-d (the CD42a-d complex serves as the receptor for von Willebrand factor and thrombin; the complex mediates adhesion of platelets to subendothelial matrices).
  • Other members function as phosphatases, such as, for example, CD148, and CD45 (a tyrosine phosphatase which is critical for T and B cell antigen receptor-mediated activation).
  • Further members function as ectoenzymes, such as, for example, CD157 (ADP-ribosyl cyclase and cyclic ADP-ribose hydrolase activity), CD38 (involved in the metabolism of two calcium messengers, cADPR and NAADP), CD26 (cleaves off N-terminal X-Pro or X-Ala dipeptides from polypeptides), and CD39 (ecto-apyrase).
  • CD157: ADP-ribosyl cyclase and cyclic ADP-ribose hydrolase activity
  • CD38: involved in the metabolism of two calcium messengers, cADPR and NAADP
  • CD26: cleaves off N-terminal X-Pro or X-Ala dipeptides from polypeptides
  • Other members function as peptidases, such as, for example, the metallopeptidases CD13 (catalyzes the removal of single amino acids from the amino terminus of small peptides and probably plays a role in their final digestion), CD156, CD143 (peptidyl dipeptide hydrolase involved in the metabolism of two major vaso-active peptides, angiotensin II and bradykinin), and CD10.
  • Other members function as proteoglycans, for example, CD138, or protease cofactors, for example, CD142.
  • Scavenger receptor superfamily: Scavenger receptors (SRs) have been studied primarily due to their ability to bind and internalize modified lipoproteins, suggesting an important role in foam cell formation and the pathogenesis of atherosclerosis. However, the ability of some SRs to function as pattern recognition receptors through their binding of a wide variety of pathogens indicates a potential role in host defence.
  • The scavenger receptor family of proteins includes: class A (type I and II macrophage scavenger receptors, MARCO), class B (CD36, scavenger receptor class BI), mucin-like (CD68/macrosialin, dSR-CI) and endothelial (LOX-1) receptors.
  • Ligand-binding domains: Two motifs have been identified as ligand-binding domains: a charged collagen structure of type I and II receptors, and an immunodominant domain of CD36. These structures can recognize a wide range of negatively charged macromolecules, including oxidized low-density lipoproteins, damaged or apoptotic cells, and pathogenic microorganisms. After binding, these ligands can be either internalized by endocytosis or phagocytosis, or remain at the cell surface and mediate adhesion or lipid transfer through caveolae. Under physiological conditions, scavenger receptors serve to scavenge or clean up cellular debris and other related materials, and they play a role in host defence.
  • Scavenger-receptor class A has been held responsible for the clearance of modified LDL from the blood circulation.
  • Scavenger-receptor BI can facilitate selective uptake of cholesterol esters from HDL.
  • Members of the scavenger receptor superfamily include molecules such as CD163, CD36, CD68, CD6, and CD5, for example. These molecules can recognize a wide range of negatively charged macromolecules, including oxidized low-density lipoproteins, damaged or apoptotic cells, and pathogenic microorganisms; related molecules would be predicted to have similar biological activities.
  • Regulators of complement activation gene family: The C3 convertases of the human complement system are controlled by fluid-phase and membrane proteins in the RCA (regulators of complement activation) family.
  • The RCA members are related structurally (possess approximately 60 amino acid repeating motifs termed short consensus repeats (SCR)), functionally (bind C3b/C4b), and genetically (genes are tightly clustered on chromosome 1 at q3.2). Uncontrolled activation of complement can lead to the formation of the membrane attack complex on self tissues and excessive generation of inflammatory mediators.
  • Members of this family include molecules such as CD46 (cofactor for Factor I proteolytic cleavage of C3b and C4b); CD35 (mediates adherence of C4b/C3b coated particles in preparation for phagocytosis); and CD21 (part of a large signal transduction complex involving CD19, CD81 and Leu13), for example.
  • CD46: cofactor for Factor I proteolytic cleavage of C3b and C4b
  • CD35: mediates adherence of C4b/C3b coated particles in preparation for phagocytosis
  • CD21: part of a large signal transduction complex involving CD19, CD81 and Leu13
  • C-type lectins are a family of carbohydrate-binding proteins intimately involved in diverse processes including vertebrate immune cell signalling and trafficking, activation of innate immunity in both vertebrates and invertebrates, and venom-induced haemostasis. Members of this family include, for example, CD62E, CD62L, CD62P, CD69, CD72, CD94, CD141, and CD161.
  • CD69 is a member of the natural killer (NK) cell gene complex family of signal transducing receptors and may be involved in the pathogenesis of such diseases as rheumatoid arthritis, chronic inflammatory liver diseases, mild asthma, and acquired immunodeficiency syndrome.
  • CD62E, L, and P mediate leukocyte rolling on activated endothelium at inflammatory sites.
  • CD94 forms a complex with NKG2-A which is involved in inhibition of NK cell function.
  • CD141 is critical for the activation of protein C and initiation of the protein C anticoagulant pathway.
  • TNFs and TNF receptor families: Tumor necrosis factors (TNF) alpha and beta are cytokines which act through TNF receptors to regulate numerous biological processes, including protection against infection and induction of shock and inflammatory disease.
  • TNF molecules belong to the "TNF-ligand" superfamily, and act together with their receptors or counter-ligands, the "TNF-receptor" superfamily.
  • TNF family members include, for example, CD70, CD153, and CD154.
  • TNF-R family members include, for example, CD27 (mediates the co-stimulatory signal for T and B cell activation), CD30 (involved in TCR mediated cell death), CD40 (central to T dependent responses), CD95 (involved in the mediation of apoptosis inducing signals), CD120a, CD134, and CDw137 (costimulator of T cell proliferation).
  • Mucins are adhesion molecules that play a part in leukocyte migration, homing and cell-cell interactions. Mucins are thought to bind to L-selectin and initiate leukocyte:endothelial cell interactions. Members include, for example, CD34, CD42b (part of the CD42a-d complex which serves as a receptor for the von Willebrand factor and thrombin), CD43, CD99, CD162 (mediates rolling of leukocytes on activated endothelium, on activated platelets and on other leukocytes at inflammatory sites), and CD164.
  • Some CD molecules are known to be anchored at the cell surface by glycosyl phosphatidyl inositol (GPI). Members include, for example, CD55, CD59, CD73, CD87, and CDw108. Genetic defects in GPI-anchor attachment that cause a reduction or loss of both CD55 and CD59 on erythrocytes produce the symptoms of the disease paroxysmal nocturnal hemoglobinuria (PNH). CD73, like other GPI anchored surface proteins, can mediate costimulatory signals in T cell activation. Very small (e.g., 12-35 amino acids) CD molecules which are formed by displacement of C terminal amino acids by GPI linkage include, for example, CD24 and CD52.
  • Some CD molecules are members of receptor families, such as, for example, G-protein coupled receptors (e.g., CD97, and CD88); tyrosine kinase receptors (e.g., CD135, and CD140a); transferrin receptors (e.g., CD71); LDL receptor (e.g., CD91); TGF-beta type III receptors (e.g., CD105); Fc receptors for IgG (e.g., CD32); chemokine receptors (e.g., CDw128a); and viral receptors (e.g., CD155).
  • G-protein coupled receptors e.g., CD97, and CD88
  • tyrosine kinase receptors e.g., CD135, and CD140a
  • transferrin receptors e.g., CD71
  • LDL receptor e.g., CD91
  • TGF-beta type III receptors e.g., CD105
  • CD15s include, for example, CD15s, CD15, CDw17, CD57, CDw60 (CDw60 antibodies provide costimulatory/comitogenic signals for T cells), CD65, CD65s, CDw75 (cell adhesion, ligand for CD22), CDw76, and CD77 (cross-linking of CD77 induces apoptosis of Burkitt's lymphoma cells).
  • Some CD molecules have been assigned to small molecular families. These include, for example, members of the lamp family (e.g., CD107a); members of the sushi family (e.g., CD25, the IL-2 receptor); and membrane glycoproteins (e.g., CD165, which is involved in the adhesion between thymocytes and thymic epithelial cells).
  • members of the lamp family e.g., CD107a
  • members of the sushi family e.g., CD25, the IL-2 receptor
  • membrane glycoproteins e.g., CD165, which is involved in the adhesion between thymocytes and thymic epithelial cells.
  • CDw12 CD14
  • CD44 and CD44R may be involved in leukocyte attachment to and rolling on endothelial cells, homing to peripheral lymphoid organs and to sites of inflammation
  • CD74 involved in the intracellular sorting of MHC class II molecules
  • CD85 may play a role in T cell activation in peripheral blood
  • CDw92 CDw93
  • CD98 involved in the regulation of cellular activation
  • CDw150 binding of CDw150 to its ligand enhances proliferation and Ig production by activated B cells
  • CD-like proteins are cell surface molecules which play various functions in the immune system, including cell-cell interaction and regulation of inflammation.
  • CD-like proteins with a predominant tissue expression pattern are important targets for targeted drug delivery, tumor-targeted therapy (e.g., including, but not limited to, radioimmunotherapy), antibody mediated attack of diseased tissues or cancers, and immune mediated cytotoxicity.
  • tumor-targeted therapy e.g., including, but not limited to, radioimmunotherapy
  • Table 1 summarizes some of the polynucleotides encompassed by the invention (including cDNA clones related to the sequences (Clone ID NO:Z), contig sequences (contig identifier (Contig ID:) and contig nucleotide sequence identifier (SEQ ID NO:X))) and further summarizes certain characteristics of these polynucleotides and the polypeptides encoded thereby.
  • the first column provides the gene number in the application for each clone identifier.
  • the second column provides a unique clone identifier, "Clone ID NO:Z", for a cDNA clone related to each contig sequence disclosed in Table 1.
  • the third column provides a unique contig identifier, "Contig ID:” for each of the contig sequences disclosed in Table 1.
  • the fourth column provides the sequence identifier, "SEQ ID NO:X”, for each of the contig sequences disclosed in Table 1.
  • the fifth column “ORF (From-To)", provides the location (i.e., nucleotide position numbers) within the polynucleotide sequence of SEQ ID NO:X that delineate the preferred open reading frame (ORF) that encodes the amino acid sequence shown in the sequence listing and referenced in Table 1 as SEQ ID NO:Y (column 6).
  • Column 7 lists residues comprising predicted epitopes contained in the polypeptides encoded by each of the preferred ORFs (SEQ ID NO:Y).
  • polypeptides of the invention comprise, or alternatively consist of, one, two, three, four, five or more of the predicted epitopes described in Table 1.
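  • The "ORF (From-To)" column described above can be read as a pair of nucleotide positions on SEQ ID NO:X. A minimal sketch of how such a pair might be applied follows. It assumes the positions are 1-based and inclusive, and it uses the standard genetic code; the function names `extract_orf` and `translate` and the toy sequence are hypothetical and do not come from the application.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def extract_orf(seq, start, end):
    """Return the ORF for 1-based, inclusive (From-To) positions (an assumption)."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("ORF coordinates fall outside the sequence")
    return seq[start - 1:end].upper()


def translate(orf):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[i:i + 3], "X")  # "X" marks an unrecognized codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy example only; not a sequence from the application.
toy_seq = "ggATGGCCTAAgg"
toy_orf = extract_orf(toy_seq, 3, 11)
toy_protein = translate(toy_orf)
```

In this sketch the translated string plays the role of SEQ ID NO:Y for the toy input; real sequence listings would be read from the deposited data rather than typed inline.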
  • Tissue Distribution shows the expression profile of tissue, cells, and/or cell line libraries which express the polynucleotides of the invention.
  • the first number in column 8 represents the tissue/cell source identifier code corresponding to the key provided in Table 4. Expression of these polynucleotides was not observed in the other tissues and/or cell libraries tested.
  • the second number in column 8 represents the number of times a sequence corresponding to the reference polynucleotide sequence (e.g., SEQ ID NO:X) was identified in the tissue/cell source.
  • tissue/cell source identifier codes in which the first two letters are "AR” designate information generated using DNA array technology. Utilizing this technology, cDNAs were amplified by PCR and then transferred, in duplicate, onto the array. Gene expression was assayed through hybridization of first strand cDNA probes to the DNA array. cDNA probes were generated from total RNA extracted from a variety of different tissues and cell lines.
  • Probe synthesis was performed in the presence of P dCTP, using oligo(dT) to prime reverse transcription. After hybridization, high stringency washing conditions were employed to remove non-specific hybrids from the array. The remaining signal, emanating from each gene target, was measured using a Phosphorimager. Gene expression was reported as Phosphor Stimulating Luminescence (PSL) which reflects the level of phosphor signal generated from the probe hybridized to each of the gene targets represented on the array. A local background signal subtraction was performed before the total signal generated from each array was used to normalize gene expression between the different hybridizations. The value presented after "[array code]:" represents the mean of the duplicate values, following background subtraction and probe normalization.
  • PSL Phosphor Stimulating Luminescence
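  • The array processing described above (local background subtraction, normalization by the total signal of each array, then the mean of the duplicate spots) can be sketched as follows. This is only an illustration under stated assumptions: negative background-corrected values are clipped to zero, and normalization is done by dividing by the array's total corrected signal. Neither detail is specified in the text, and the function name `normalize_array` and the sample numbers are hypothetical.

```python
from statistics import mean


def normalize_array(raw, background):
    """Return one normalized PSL value per gene target for a single array.

    raw and background map each gene target to its duplicate spot readings.
    Assumptions (not from the source): negative corrected values are clipped
    to zero, and values are scaled by the total corrected signal of the array.
    """
    corrected = {
        gene: [max(signal - bg, 0.0) for signal, bg in zip(raw[gene], background[gene])]
        for gene in raw
    }
    total_signal = sum(sum(values) for values in corrected.values())
    if total_signal == 0:
        raise ValueError("array has no signal above background")
    # Report the mean of the duplicates after background subtraction and scaling.
    return {gene: mean(values) / total_signal for gene, values in corrected.items()}


# Hypothetical readings for two gene targets spotted in duplicate.
raw_psl = {"geneA": (12.0, 14.0), "geneB": (7.0, 9.0)}
local_bg = {"geneA": (2.0, 2.0), "geneB": (1.0, 3.0)}
normalized = normalize_array(raw_psl, local_bg)
```

Scaling by the total signal makes values from different hybridizations comparable, which is the stated purpose of the normalization step.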
  • A key to the OMIM reference identification numbers is provided in Table 5.
  • Table 2 summarizes homology and features of some of the polypeptides of the invention.
  • the first column provides a unique clone identifier, "Clone ID NO:Z”, corresponding to a cDNA clone disclosed in Table 1.
  • the second column provides the unique contig identifier, "Contig ID:” corresponding to contigs in Table 1 and allowing for correlation with the information in Table 1.
  • the third column provides the sequence identifier, "SEQ ID NO:X”, for the contig polynucleotide sequence.
  • the fourth column provides the analysis method by which the homology/identity disclosed in the Table was determined.
  • NR non-redundant protein database
  • PFAM protein families
  • polypeptides of the invention comprise, or alternatively consist of, an amino acid sequence encoded by a polynucleotide in SEQ ID NO:X as delineated in columns 8 and 9, or fragments or variants thereof.
  • Table 3 provides polynucleotide sequences that may be disclaimed according to certain embodiments of the invention.
  • the first column provides a unique clone identifier, "Clone ID”, for a cDNA clone related to contig sequences disclosed in Table 1.
  • the second column provides the sequence identifier, "SEQ ID NO:X”, for contig sequences disclosed in Table 1.
  • the third column provides the unique contig identifier, "Contig ID:”, for contigs disclosed in Table 1.
  • the fourth column provides a unique integer 'a' where 'a' is any integer between 1 and the final nucleotide minus 15 of SEQ ID NO:X
  • the fifth column provides a unique integer 'b' where 'b' is any integer between 15 and the final nucleotide of SEQ ID NO:X, where both a and b correspond to the positions of nucleotide residues shown in SEQ ID NO:X, and where b is greater than or equal to a + 14.
  • the uniquely defined integers can be substituted into the general formula of a-b, and used to describe polynucleotides which may be preferably excluded from the invention.
  • preferably excluded from the invention are at least one, two, three, four, five, ten, or more of the polynucleotide sequence(s) having the accession number(s) disclosed in the sixth column of this Table (including for example, published sequence in connection with a particular BAC clone).
  • preferably excluded from the invention are the specific polynucleotide sequence(s) contained in the clones corresponding to at least one, two, three, four, five, ten, or more of the available material having the accession numbers identified in the sixth column of this Table (including for example, the actual sequence contained in an identified BAC clone).
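  • The constraints on the integers 'a' and 'b' in the fourth and fifth columns of Table 3 can be checked mechanically. The sketch below encodes the three stated conditions (a between 1 and the final nucleotide minus 15, b between 15 and the final nucleotide, and b greater than or equal to a + 14) and formats a valid pair with the general formula a-b. The function names are hypothetical, and `seq_len` stands in for the position of the final nucleotide of SEQ ID NO:X.

```python
def is_valid_range(a, b, seq_len):
    """Check the Table 3 constraints on a and b for a sequence of seq_len nucleotides."""
    return (
        1 <= a <= seq_len - 15      # a: between 1 and the final nucleotide minus 15
        and 15 <= b <= seq_len      # b: between 15 and the final nucleotide
        and b >= a + 14             # b is greater than or equal to a + 14
    )


def format_range(a, b, seq_len):
    """Return the 'a-b' description of a polynucleotide, or raise if invalid."""
    if not is_valid_range(a, b, seq_len):
        raise ValueError(f"a={a}, b={b} violates the constraints for length {seq_len}")
    return f"{a}-{b}"


# Hypothetical 100-nucleotide sequence.
shortest = format_range(1, 15, 100)
last_window = format_range(85, 100, 100)
```

Because b must be at least a + 14, every pair that passes the check describes a polynucleotide of at least 15 nucleotides.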
  • Table 4 provides a key to the tissue/cell source identifier code disclosed in Table 1, column 8.
  • Column 1 provides the tissue/cell source identifier code disclosed in Table 1, Column 8.
  • Columns 2-5 provide a description of the tissue or cell source. Codes corresponding to diseased tissues are indicated in column 6 with the word "disease". The use of the word "disease" in column 6 is non-limiting.
  • the tissue or cell source may be specific (e.g. a neoplasm), or may be disease-associated (e.g., a tissue sample from a normal portion of a diseased organ).
  • tissues and/or cells lacking the "disease" designation may still be derived from sources directly or indirectly involved in a disease state or disorder, and therefore may have a further utility in that disease state or disorder.
  • Where the tissue/cell source is a library, column 7 identifies the vector used to generate the library.
  • Table 5 provides a key to the OMIM reference identification numbers disclosed in Table 1, column 10.
  • OMIM reference identification numbers (Column 1) were derived from Online Mendelian Inheritance in Man (Online Mendelian Inheritance in Man, OMIM. McKusick-Nathans Institute for Genetic Medicine, Johns Hopkins University (Baltimore, MD) and National Center for Biotechnology Information, National Library of Medicine, (Bethesda, MD) 2000. World Wide Web URL: http://www.ncbi.nlm.nih.gov/omim/).
  • Column 2 provides diseases associated with the cytologic band disclosed in Table 1, column 9, as determined using the Morbid Map database.
  • Definitions
  • "Isolated" refers to material removed from its original environment (e.g., the natural environment if it is naturally occurring), and thus is altered "by the hand of man" from its natural state.
  • an isolated polynucleotide could be part of a vector or a composition of matter, or could be contained within a cell, and still be “isolated” because that vector, composition of matter, or particular cell is not the original environment of the polynucleotide.
  • isolated does not refer to genomic or cDNA libraries, whole cell total or mRNA preparations, genomic DNA preparations (including those separated by electrophoresis and transferred onto blots), sheared whole cell genomic DNA preparations or other compositions where the art demonstrates no distinguishing features of the polynucleotide/sequences of the present invention.
  • a "polynucleotide” refers to a molecule having a nucleic acid sequence encoding SEQ ID NO:Y or a fragment or variant thereof; a nucleic acid sequence contained in SEQ ID NO:X (as described in column 3 of Table 1) or the complement thereof; or a cDNA sequence contained in Clone ID NO:Z (as described in column 2 of Table 1 and contained within ATCC Deposit No. PTA-2448 or PTA-2449).
  • the polynucleotide can contain the nucleotide sequence of the full length cDNA sequence, including the 5' and 3' untranslated sequences, the coding region, as well as fragments, epitopes, domains, and variants of the nucleic acid sequence.
  • a "polypeptide" refers to a molecule having an amino acid sequence encoded by a polynucleotide of the invention as broadly defined (obviously excluding poly-phenylalanine or poly-lysine peptide sequences which result from translation of a polyA tail of a sequence corresponding to a cDNA).
  • SEQ ID NO:X was often generated by overlapping sequences contained in multiple clones (contig analysis).
  • a representative clone containing all or most of the sequence for SEQ ID NO:X is deposited at Human Genome Sciences, Inc. (HGS) in a catalogued and archived library.
  • HGS Human Genome Sciences, Inc.
  • each clone is identified by a cDNA Clone ID (identifier generally referred to herein as Clone ID NO:Z).
  • Each Clone ID is unique to an individual clone and the Clone ID is all the information needed to retrieve a given clone from the HGS library.
  • the polynucleotides of the invention are at least 15, at least 30, at least 50, at least 100, at least 125, at least 500, or at least 1000 continuous nucleotides but are less than or equal to 300 kb, 200 kb, 100 kb, 50 kb, 15 kb, 10 kb, 7.5kb, 5 kb, 2.5 kb, 2.0 kb, or 1 kb, in length.
  • polynucleotides of the invention comprise a portion of the coding sequences, as disclosed herein, but do not comprise all or a portion of any intron.
  • the polynucleotides comprising coding sequences do not contain coding sequences of a genomic flanking gene (i.e., 5' or 3' to the gene of interest in the genome). In other embodiments, the polynucleotides of the invention do not contain the coding sequence of more than 1000, 500, 250, 100, 50, 25, 20, 15, 10, 5, 4, 3, 2, or 1 genomic flanking gene(s).
  • a "polynucleotide” of the present invention also includes those polynucleotides capable of hybridizing, under stringent hybridization conditions, to sequences contained in SEQ ID NO:X, or the complement thereof (e.g., the complement of any one, two, three, four, or more of the polynucleotide fragments described herein), the polynucleotide sequence delineated in columns 8 and 9 of Table 2 or the complement thereof, and/or cDNA sequences contained in Clone ID NO:Z (e.g., the complement of any one, two, three, four, or more of the polynucleotide fragments, or the cDNA clone within the pool of cDNA clones deposited with the ATCC, described herein).
  • “Stringent hybridization conditions” refers to an overnight incubation at 42 °C in a solution comprising 50% formamide, 5x SSC (750 mM NaCl, 75 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5x Denhardt's solution, 10% dextran sulfate, and 20 µg/ml denatured, sheared salmon sperm DNA, followed by washing the filters in 0.1x SSC at about 65 °C.
  • nucleic acid molecules that hybridize to the polynucleotides of the present invention at lower stringency hybridization conditions.
  • Changes in the stringency of hybridization and signal detection are primarily accomplished through the manipulation of formamide concentration (lower percentages of formamide result in lowered stringency), salt conditions, or temperature.
  • washes performed following stringent hybridization can be done at higher salt concentrations (e.g. 5X SSC).
  • blocking reagents include Denhardt's reagent, BLOTTO, heparin, denatured salmon sperm DNA, and commercially available proprietary formulations.
  • the inclusion of specific blocking reagents may require modification of the hybridization conditions described above, due to problems with compatibility.
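The salt concentrations behind the SSC figures quoted above can be worked out from the stated 5x composition (750 mM NaCl, 75 mM trisodium citrate). The following is a minimal sketch of that arithmetic; the function name `ssc_molarity` is a hypothetical helper introduced only for illustration and is not part of the specification.

```python
# Composition of 1x SSC in mM, derived from the 5x figures quoted in the text
# (5x SSC = 750 mM NaCl, 75 mM trisodium citrate).
SSC_1X_MM = {"NaCl": 750 / 5, "trisodium citrate": 75 / 5}


def ssc_molarity(fold):
    """Return the mM concentration of each SSC component at `fold`-x strength."""
    return {name: round(conc * fold, 3) for name, conc in SSC_1X_MM.items()}


# Hybridization buffer (5x) versus the stringent wash (0.1x).
print(ssc_molarity(5))
print(ssc_molarity(0.1))
```

The 50-fold drop in salt between hybridization and the final wash is what makes the 0.1x SSC wash the stringent step; a wash left at 5x SSC corresponds to the lower-stringency variant mentioned above.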
  • polynucleotide which hybridizes only to polyA+ sequences (such as any 3' terminal polyA+ tract of a cDNA shown in the sequence listing), or to a complementary stretch of T (or U) residues, would not be included in the definition of "polynucleotide,” since such a polynucleotide would hybridize to any nucleic acid molecule containing a poly (A) stretch or the complement thereof (e.g., practically any double-stranded cDNA clone generated using oligo dT as a primer).
  • the polynucleotide of the present invention can be composed of any polyribonucleotide or polydeoxyribonucleotide, which may be unmodified RNA or DNA or modified RNA or DNA.
  • polynucleotides can be composed of single- and double-stranded DNA, DNA that is a mixture of single- and double-stranded regions, single- and double-stranded RNA, and RNA that is a mixture of single- and double-stranded regions, hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded or a mixture of single- and double-stranded regions.
  • polynucleotide can be composed of triple-stranded regions comprising RNA or DNA or both RNA and DNA.
  • a polynucleotide may also contain one or more modified bases or DNA or RNA backbones modified for stability or for other reasons.
  • Modified bases include, for example, tritylated bases and unusual bases such as inosine.
  • a variety of modifications can be made to DNA and RNA; thus, "polynucleotide” embraces chemically, enzymatically, or metabolically modified forms.
  • the polypeptide of the present invention can be composed of amino acids joined to each other by peptide bonds or modified peptide bonds, i.e., peptide isosteres, and may contain amino acids other than the 20 gene-encoded amino acids.
  • the polypeptides may be modified by either natural processes, such as posttranslational processing, or by chemical modification techniques which are well known in the art. Such modifications are well described in basic texts and in more detailed monographs, as well as in a voluminous research literature. Modifications can occur anywhere in a polypeptide, including the peptide backbone, the amino acid side-chains and the amino or carboxyl termini.
  • polypeptides may be branched, for example, as a result of ubiquitination, and they may be cyclic, with or without branching. Cyclic, branched, and branched cyclic polypeptides may result from posttranslational natural processes or may be made by synthetic methods.
  • Modifications include acetylation, acylation, ADP-ribosylation, amidation, covalent attachment of flavin, covalent attachment of a heme moiety, covalent attachment of a nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid derivative, covalent attachment of phosphatidylinositol, cross-linking, cyclization, disulfide bond formation, demethylation, formation of covalent cross-links, formation of cysteine, formation of pyroglutamate, formylation, gamma-carboxylation, glycosylation, GPI anchor formation, hydroxylation, iodination, methylation, myristoylation, oxidation, pegylation, proteolytic processing, phosphorylation, prenylation, racemization, selenoylation, sulfation, transfer-RNA mediated addition of amino acids to proteins such as arginylation, and ubiquitination.
  • SEQ ID NO:X refers to a polynucleotide sequence described, for example, in Tables 1 or 2, while “SEQ ID NO:Y” refers to a polypeptide sequence described in column 6 of Table 1. SEQ ID NO:X is identified by an integer specified in column 4 of Table 1. The polypeptide sequence SEQ ID NO:Y is a translated open reading frame (ORF) encoded by polynucleotide SEQ ID NO:X. "Clone ID NO:Z” refers to a cDNA clone described in column 2 of Table 1.
  • a polypeptide having biological activity refers to a polypeptide exhibiting activity similar to, but not necessarily identical to, an activity of a polypeptide of the present invention, including mature forms, as measured in a particular biological assay, with or without dose dependency. In the case where dose dependency does exist, it need not be identical to that of the polypeptide, but rather substantially similar to the dose-dependence in a given activity as compared to the polypeptide of the present invention (i.e., the candidate polypeptide will exhibit greater activity or not more than about 25-fold less and, preferably, not more than about tenfold less activity, and most preferably, not more than about three-fold less activity relative to the polypeptide of the present invention).
  • Table 1 summarizes some of the polynucleotides encompassed by the invention (including contig sequences (SEQ ID NO:X) and clones (Clone ID NO:Z)) and further summarizes certain characteristics of these polynucleotides and the polypeptides encoded thereby.
  • HDPMQ69 812744 12 1-669 188 Arg-1 to Ala-9, Pro-11 to Glu-17, Leu-19 to Gly-30, Pro-84 to Gly-90, Asn-114 to Val-120, Ser-123 to Phe-128, Gly-132 to Asn-138, Gln-149 to Tyr-154, Arg-203 to Pro-223.
  • the first column in Table 1 provides the gene number in the application corresponding to the clone identifier.
  • the second column in Table 1 provides a unique "Clone ID NO:Z" for a cDNA clone related to each contig sequence disclosed in Table 1.
  • This clone ID references the cDNA clone which contains at least the 5'-most sequence of the assembled contig; at least a portion of SEQ ID NO:X was determined by directly sequencing the referenced clone.
  • the reference clone may have more sequence than described in the sequence listing or the clone may have less. In the vast majority of cases, however, the clone is believed to encode a full-length polypeptide. In the case where a clone is not full-length, a full-length cDNA can be obtained by methods described elsewhere herein.
  • the third column in Table 1 provides a unique "Contig ID” identification for each contig sequence.
  • the fourth column provides the "SEQ ID NO:” identifier for each of the contig polynucleotide sequences disclosed in Table 1.
  • the fifth column, "ORF (From-To)" provides the location (i.e., nucleotide position numbers) within the polynucleotide sequence "SEQ ID NO:X” that delineate the preferred open reading frame (ORF) shown in the sequence listing and referenced in Table 1, column 6, as SEQ ID NO:Y. Where the nucleotide position number "To" is lower than the nucleotide position number "From", the preferred ORF is the reverse complement of the referenced polynucleotide sequence.
  • the sixth column in Table 1 provides the corresponding SEQ ID NO:Y for the polypeptide sequence encoded by the preferred ORF delineated in column 5.
  • the invention provides an amino acid sequence comprising, or alternatively consisting of, a polypeptide encoded by the portion of SEQ ID NO:X delineated by "ORF (From-To)". Also provided are polynucleotides encoding such amino acid sequences and the complementary strand thereto.
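The "ORF (From-To)" convention described above, including the rule that a "To" lower than "From" indicates the reverse complement, can be sketched as follows. This is an illustrative reading only: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive, and in the reverse case both positions are assumed to refer to the strand shown in the sequence listing.

```python
# Complement map for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_orf(seq, orf_from, orf_to):
    """Return the ORF nucleotides for 1-based, inclusive From/To positions.

    When orf_to < orf_from, the ORF is read from the reverse complement
    of the listed sequence (assumed interpretation of the Table 1 rule).
    """
    if orf_from <= orf_to:
        return seq[orf_from - 1:orf_to]
    return reverse_complement(seq[orf_to - 1:orf_from])


# Forward-strand ORF at positions 3-5, and a reverse-strand ORF at 6-4.
print(extract_orf("AAATGCCC", 3, 5))
print(extract_orf("GGGCATTT", 6, 4))
```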
  • polypeptides of the invention comprise, or alternatively consist of, at least one, two, three, four, five or more of the predicted epitopes as described in Table 1. It will be appreciated that depending on the analytical criteria used to predict antigenic determinants, the exact address of the determinant may vary slightly.
  • Column 8 in Table 1 provides an expression profile and library code: count for each of the contig sequences (SEQ ID NO:X) disclosed in Table 1, which can routinely be combined with the information provided in Table 4 and used to determine the tissues, cells, and/or cell line libraries which predominantly express the polynucleotides of the invention.
  • the first number in column 8 represents the tissue/cell source identifier code corresponding to the code and description provided in Table 4.
  • the second number in column 8 represents the number of times a sequence corresponding to the reference polynucleotide sequence was identified in the tissue/cell source.
  • tissue/cell source identifier codes in which the first two letters are "AR" designate information generated using DNA array technology.
  • cDNAs were amplified by PCR and then transferred, in duplicate, onto the array. Gene expression was assayed through hybridization of first strand cDNA probes to the DNA array. cDNA probes were generated from total RNA extracted from a variety of different tissues and cell lines. Probe synthesis was performed in the presence of 33 P dCTP, using oligo(dT) to prime reverse transcription. After hybridization, high stringency washing conditions were employed to remove non-specific hybrids from the array. The remaining signal, emanating from each gene target, was measured using a Phosphorimager.
  • gene expression was reported as Phosphor Stimulating Luminescence, which reflects the level of phosphor signal generated from the probe hybridized to each of the gene targets represented on the array.
  • a local background signal subtraction was performed before the total signal generated from each array was used to normalize gene expression between the different hybridizations.
  • the value presented after "[array code]:” represents the mean of the duplicate values, following background subtraction and probe normalization.
  • One of skill in the art could routinely use this information to identify normal and/or diseased tissue(s) which show a predominant expression pattern of the corresponding polynucleotide of the invention or to identify polynucleotides which show predominant and/or specific tissue and/or cell expression.
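The array value described above (local background subtraction, normalization by the total signal of each array, then the mean of the duplicate spots) can be sketched as below. The text does not give the exact arithmetic, so several details here are assumptions made for illustration: negative background-corrected values are clamped to zero, normalization is a simple division by the array's total corrected signal, and the function name and input layout are hypothetical.

```python
def normalize_array(raw, background):
    """Return {gene: normalized mean signal} for one hybridized array.

    raw and background map each gene to a tuple of duplicate spot readings.
    Assumed steps: subtract local background per spot (clamped at zero),
    divide by the array's total corrected signal, average the duplicates.
    """
    corrected = {
        gene: tuple(max(s - b, 0.0) for s, b in zip(raw[gene], background[gene]))
        for gene in raw
    }
    total = sum(sum(spots) for spots in corrected.values())
    if total == 0:
        raise ValueError("array has no signal above background")
    return {
        gene: (sum(spots) / len(spots)) / total
        for gene, spots in corrected.items()
    }


raw = {"geneA": (110.0, 90.0), "geneB": (60.0, 40.0)}
background = {"geneA": (10.0, 10.0), "geneB": (10.0, 10.0)}
print(normalize_array(raw, background))
```

Dividing by the total signal makes values comparable across hybridizations whose probes differ in overall labeling efficiency, which is the purpose the text gives for the normalization step.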
  • Column 9 in Table 1 provides a chromosomal map location for certain polynucleotides of the invention. Chromosomal location was determined by finding exact matches to EST and cDNA sequences contained in the NCBI (National Center for Biotechnology Information) UniGene database. Each sequence in the UniGene database is assigned to a "cluster"; all of the ESTs, cDNAs, and STSs in a cluster are believed to be derived from a single gene. Chromosomal mapping data is often available for one or more sequence(s) in a UniGene cluster; this data (if consistent) is then applied to the cluster as a whole.
  • a sequence from the UniGene database (the 'Subject') was said to be an exact match if it contained a segment of 50 nucleotides in length such that 48 of those nucleotides were in the same order as found in the Query sequence. If all of the matches that met these criteria were in the same UniGene cluster, and mapping data was available for this cluster, it is indicated in Table 1 under the heading "Cytologic Band". Where a cluster had been further localized to a distinct cytologic band, that band is disclosed; where no banding information was available, but the gene had been localized to a single chromosome, the chromosome is disclosed.
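The "exact match" criterion above (a 50-nucleotide segment with 48 nucleotides in the same order as the Query) can be approximated with a brute-force window comparison. This sketch assumes an ungapped, position-by-position comparison, which is one plausible reading of "in the same order"; the search program actually used is not stated in this passage, and the function name is hypothetical.

```python
def is_exact_match(query, subject, window=50, min_identical=48):
    """Return True if any ungapped `window`-nt segment of subject shares
    at least `min_identical` identical positions with some segment of query.

    Brute force, O(len(query) * len(subject) * window); illustration only.
    """
    for i in range(len(subject) - window + 1):
        s_win = subject[i:i + window]
        for j in range(len(query) - window + 1):
            q_win = query[j:j + window]
            identical = sum(a == b for a, b in zip(s_win, q_win))
            if identical >= min_identical:
                return True
    return False


query = "ACGT" * 20
two_mismatches = "NN" + query[12:60]     # 50 nt, 48 identical positions
three_mismatches = "NNN" + query[13:60]  # 50 nt, 47 identical positions
print(is_exact_match(query, two_mismatches))
print(is_exact_match(query, three_mismatches))
```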
  • Table 2 further characterizes certain encoded polypeptides of the invention, by providing the results of comparisons to protein and protein family databases.
  • the first column provides a unique clone identifier, "Clone ID NO:”, corresponding to a cDNA clone disclosed in Table 1.
  • the second column provides the unique contig identifier, "Contig ID:” which allows correlation with the information in Table 1.
  • the third column provides the sequence identifier, "SEQ ID NO:”, for the contig polynucleotide sequences.
  • the fourth column provides the analysis method by which the homology/identity disclosed in the Table was determined.
  • the fifth column provides a description of the PFAM/NR hit identified by each analysis.
  • NR: non-redundant protein database
  • PFAM: protein families
  • each of the polynucleotides shown in Table 1, column 3 (e.g., SEQ ID NO:X or the 'Query' sequence) was used to search against the NR database.
  • the computer program BLASTX was used to compare a 6-frame translation of the Query sequence to the NR database (for information about the BLASTX algorithm, see Altschul et al., J. Mol. Biol. 215:403-410 (1990); and Gish and States, Nat. Genet. 3:266-272 (1993)).
  • a description of the sequence that is most similar to the Query sequence (the highest scoring 'Subject') is shown in column five of Table 2 and the database accession number for that sequence is provided in column six.
  • the highest scoring 'Subject' is reported in Table 2 if (a) the estimated probability that the match occurred by chance alone is less than 1.0e-07, and (b) the match was not to a known repetitive element.
  • BLASTX returns alignments of short polypeptide segments of the Query and Subject sequences which share a high degree of similarity; these segments are known as High-Scoring Segment Pairs or HSPs.
  • Table 2 reports the degree of similarity between the Query and the Subject for each HSP as a percent identity in Column 7. The percent identity is determined by dividing the number of exact matches between the two aligned sequences in the HSP by the number of Query amino acids in the HSP and multiplying by 100.
  • the polynucleotides of SEQ ID NO:X which encode the polypeptide sequence that generates an HSP are delineated by columns 8 and 9 of Table 2.
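The percent identity calculation described for Column 7 of Table 2 (exact matches divided by the number of Query amino acids in the HSP, times 100) can be written out as a short sketch. The representation of an HSP as two equal-length aligned strings with "-" marking gaps is an assumption made for this example, as is the function name.

```python
def percent_identity(query_aln, subject_aln):
    """Percent identity over one HSP.

    Both arguments are aligned strings of equal length; '-' marks a gap
    (assumed representation). The denominator is the number of Query
    residues in the HSP, as stated in the text.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(q == s and q != "-" for q, s in zip(query_aln, subject_aln))
    query_residues = sum(q != "-" for q in query_aln)
    if query_residues == 0:
        raise ValueError("HSP contains no Query residues")
    return 100.0 * matches / query_residues


# Seven of eight Query residues match the Subject.
print(percent_identity("MKTAYIAK", "MKTAYLAK"))
```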
  • the PFAM database (PFAM version 2.1, Sonnhammer et al., Nucl. Acids Res., 26:320-322, 1998) consists of a series of multiple sequence alignments, one alignment for each protein family. Each multiple sequence alignment is converted into a probability model called a Hidden Markov Model, or HMM, that represents the position-specific variation among the sequences that make up the multiple sequence alignment (see, e.g., Durbin et al., Biological sequence analysis: probabilistic models of proteins and nucleic acids, Cambridge University Press, 1998 for the theory of HMMs).
  • HMMER version 1.8 (Sean Eddy, Washington University in Saint Louis) was used to compare the predicted protein sequence for each Query sequence (SEQ ID NO:Y in Table 1) to each of the HMMs derived from PFAM version 2.1.
  • a HMM derived from PFAM version 2.1 was said to be a significant match to a polypeptide of the invention if the score returned by HMMER 1.8 was greater than 0.8 times the HMMER 1.8 score obtained with the most distantly related known member of that protein family.
  • the description of the PFAM family which shares a significant match with a polypeptide of the invention is listed in column 5 of Table 2, and the database accession number of the PFAM hit is provided in column 6.
  • Column 7 provides the score returned by HMMER version 1.8 for the alignment.
  • Columns 8 and 9 delineate the polynucleotides of SEQ ID NO:X which encode the polypeptide sequence which shows a significant match to a PFAM protein family.
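The significance rule stated above for PFAM hits (a HMMER score greater than 0.8 times the score of the most distantly related known family member) reduces to a single comparison. The sketch below only restates that rule; it does not call HMMER, and the function and parameter names are hypothetical.

```python
def is_significant_pfam_hit(score, most_distant_member_score, factor=0.8):
    """Apply the stated rule: a hit is significant when its HMMER score
    exceeds `factor` times the score obtained with the most distantly
    related known member of the same protein family."""
    return score > factor * most_distant_member_score


# With a reference score of 50.0 the cutoff is 40.0.
print(is_significant_pfam_hit(41.0, 50.0))
print(is_significant_pfam_hit(39.0, 50.0))
```

Tying the cutoff to each family's own most distant member, rather than using one global score, lets the threshold adapt to families whose members are more or less divergent.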
  • the invention provides a protein comprising, or alternatively consisting of, a polypeptide encoded by the polynucleotides of SEQ ID NO:X delineated in columns 8 and 9 of Table 2. Also provided are polynucleotides encoding such proteins, and the complementary strand thereto.
  • nucleotide sequence SEQ ID NO:X and the translated SEQ ID NO:Y are sufficiently accurate and otherwise suitable for a variety of uses well known in the art and described further below.
  • the nucleotide sequences of SEQ ID NO:X are useful for designing nucleic acid hybridization probes that will detect nucleic acid sequences contained in SEQ ID NO:X or the cDNA contained in Clone ID NO:Z. These probes will also hybridize to nucleic acid molecules in biological samples, thereby enabling immediate applications in chromosome mapping, linkage analysis, tissue identification and/or typing, and a variety of forensic and diagnostic methods of the invention.
  • polypeptides identified from SEQ ID NO:Y may be used to generate antibodies which bind specifically to these polypeptides, or fragments thereof, and/or to the polypeptides encoded by the cDNA clones identified in, for example, Table 1.
  • DNA sequences generated by sequencing reactions can contain sequencing errors.
  • the errors exist as misidentified nucleotides, or as insertions or deletions of nucleotides in the generated DNA sequence.
  • the erroneously inserted or deleted nucleotides cause frame shifts in the reading frames of the predicted amino acid sequence. In these cases, the predicted amino acid sequence diverges from the actual amino acid sequence, even though the generated DNA sequence may be greater than 99.9% identical to the actual DNA sequence (for example, one base insertion or deletion in an open reading frame of over 1000 bases).
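The effect of a single-base deletion on the predicted amino acid sequence, as described above, can be illustrated with a toy translation. The short sequence is invented for this example and does not come from the sequence listing; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame 1, ignoring any trailing partial codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


actual = "ATGGCTAAAGGTCTGGAACCGTTTTAA"   # hypothetical true sequence
generated = actual[:6] + actual[7:]      # same read with one base deleted

print(translate(actual))
print(translate(generated))
```

Only one nucleotide differs between the two reads, yet every codon downstream of the deletion is read in a shifted frame, so the predicted protein diverges from that point onward. This is why the deposited clone, rather than the generated sequence alone, serves as the reference.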
  • the present invention provides not only the generated nucleotide sequence identified as SEQ ID NO:X, and a predicted translated amino acid sequence identified as SEQ ID NO:Y, but also a sample of plasmid DNA containing cDNA Clone ID NO:Z (deposited with the ATCC on September 12, 2000 and given ATCC Designation Numbers PTA-2448 and PTA-2449).
  • the nucleotide sequence of each deposited clone can readily be determined by sequencing the deposited clone in accordance with known methods. Further, techniques known in the art can be used to verify the nucleotide sequences of SEQ ID NO:X.
  • amino acid sequence of the protein encoded by a particular clone can also be directly determined by peptide sequencing or by expressing the protein in a suitable host cell containing the deposited human cDNA, collecting the protein, and determining its sequence.
  • Partial cDNA clones can be made full-length by utilizing the rapid amplification of cDNA ends (RACE) procedure described in Frohman, M.A., et al., Proc. Nat'l. Acad. Sci. USA, 85:8998-9002 (1988).
  • Poly A+ or total RNA is reverse transcribed with Superscript II (Gibco/BRL) and an antisense or complementary primer specific to the cDNA sequence.
  • the primer is removed from the reaction with a Microcon Concentrator (Amicon).
  • the first-strand cDNA is then tailed with dATP and terminal deoxynucleotide transferase (Gibco/BRL).
  • the second strand is synthesized from the dA-tail in PCR buffer, Taq DNA polymerase (Perkin-Elmer Cetus), an oligo-dT primer containing three adjacent restriction sites (XhoI, SalI and ClaI) at the 5' end and a primer containing just these restriction sites.
  • This double-stranded cDNA is PCR amplified for 40 cycles with the same primers as well as a nested cDNA-specific antisense primer.
  • the PCR products are size-separated on an ethidium bromide-agarose gel and the region of gel containing cDNA products of the predicted size of the missing protein-coding DNA is removed.
  • cDNA is purified from the agarose with the Magic PCR Prep kit (Promega), restriction digested with XhoI or SalI, and ligated to a plasmid such as pBluescript SKII (Stratagene) at XhoI and EcoRV sites. This DNA is transformed into bacteria and the plasmid clones sequenced to identify the correct protein-coding inserts.
  • RNA is alkaline hydrolyzed after reverse transcription and RNA ligase is used to join a restriction site-containing anchor primer to the first-strand cDNA. This obviates the necessity for the dA-tailing reaction which results in a polyT stretch that is difficult to sequence past.
  • An alternative to generating 5' or 3' cDNA from RNA is to use cDNA library double-stranded DNA.
  • An asymmetric PCR-amplified antisense cDNA strand is synthesized with an antisense cDNA-specific primer and a plasmid-anchored primer. These primers are removed and a symmetric PCR reaction is performed with a nested cDNA-specific antisense primer and the plasmid-anchored primer.
  • RNA oligonucleotide is ligated to the 5' ends of a population of RNA presumably containing full-length gene RNA transcript and a primer set containing a primer specific to the ligated RNA oligonucleotide and a primer specific to a known sequence of the gene of interest, is used to PCR amplify the 5' portion of the desired full length gene which may then be sequenced and used to generate the full length gene.
  • This method starts with total RNA isolated from the desired source; poly A RNA may be used but is not a prerequisite for this procedure.
  • RNA preparation may then be treated with phosphatase if necessary to eliminate 5' phosphate groups on degraded or damaged RNA which may interfere with the later RNA ligase step.
  • the phosphatase if used is then inactivated and the RNA is treated with tobacco acid pyrophosphatase in order to remove the cap structure present at the 5' ends of messenger RNAs.
  • This reaction leaves a 5' phosphate group at the 5' end of the cap cleaved RNA which can then be ligated to an RNA oligonucleotide using T4 RNA ligase.
  • This modified RNA preparation can then be used as a template for first strand cDNA synthesis using a gene specific oligonucleotide.
  • the first strand synthesis reaction can then be used as a template for PCR amplification of the desired 5' end using a primer specific to the ligated RNA oligonucleotide and a primer specific to the known sequence of the gene of interest.
  • the resultant product is then sequenced and analyzed to confirm that the 5' end sequence belongs to the relevant gene.
  • the present invention also relates to vectors or plasmids which include such DNA sequences, as well as the use of the DNA sequences.
  • the material deposited with the ATCC (deposited with the ATCC on September 12, 2000 and given ATCC Designation Numbers PTA-2448 and PTA-2449) is a mixture of cDNA clones derived from a variety of human tissues and cloned in either a plasmid vector or a phage vector. These deposits are referred to as "the deposits" herein.
  • the deposited material includes cDNA clones corresponding to SEQ ID NO:X described, for example, in Table 1 (Clone ID NO:Z).
  • a clone which is isolatable from the ATCC Deposits by use of a sequence listed as SEQ ID NO:X may include the entire coding region of a human gene or in other cases such clone may include a substantial portion of the coding region of a human gene.
  • although the sequence listing may in some instances list only a portion of the DNA sequence in a clone included in the ATCC Deposits, it is well within the ability of one skilled in the art to sequence the DNA included in a clone contained in the ATCC Deposits by use of a sequence (or portion thereof) described in, for example, Tables 1 or 2 by procedures hereinafter further described, and others apparent to those skilled in the art.
  • Patent Nos. 5,128,256 and 5,286,636 Uni-Zap XR (U.S. Patent Nos. 5,128,256 and 5,286,636), Zap Express (U.S. Patent Nos. 5,128,256 and 5,286,636), pBluescript (pBS) (Short, J. M. et al., Nucleic Acids Res. 16:7583-7600 (1988); Alting-Mees, M. A. and Short, J. M., Nucleic Acids Res. 17:9494 (1989)) and pBK (Alting-Mees, M. A.
  • phagemid pBS may be excised from the Lambda Zap and Uni-Zap XR vectors, and phagemid pBK may be excised from the Zap Express vector. Both phagemids may be transformed into E. coli strain XL-1 Blue, also available from Stratagene.
  • Vectors pSport1, pCMVSport 1.0, pCMVSport 2.0 and pCMVSport 3.0 were obtained from Life Technologies, Inc., P. O. Box 6009, Gaithersburg, MD 20897. All Sport vectors contain an ampicillin resistance gene and may be transformed into E. coli strain DH10B, also available from Life Technologies. See, for instance, Gruber, C. E., et al., Focus 15:59- (1993). Vector lafmid BA (Bento Soares, Columbia University, New York, NY) contains an ampicillin resistance gene and can be transformed into E. coli strain XL-1 Blue.
  • Vector pCR®2.1, which is available from Invitrogen, 1600 Faraday Avenue, Carlsbad, CA 92008, contains an ampicillin resistance gene and may be transformed into E. coli strain DH10B, available from Life Technologies. See, for instance, Clark, J. M., Nuc. Acids Res. 16:9677-9686 (1988) and Mead, D. et al., Bio/Technology 9: (1991).
  • the present invention also relates to the genes corresponding to SEQ ID NO:X, SEQ ID NO:Y, and/or the deposited clone (Clone ID NO:Z).
  • the corresponding gene can be isolated in accordance with known methods using the sequence information disclosed herein. Such methods include preparing probes or primers from the disclosed sequence and identifying or amplifying the corresponding gene from appropriate sources of genomic material.
  • allelic variants, orthologs, and/or species homologs are also provided in the present invention. Procedures known in the art can be used to obtain full-length genes, allelic variants, splice variants, full-length coding portions, orthologs, and/or species homologs of genes corresponding to SEQ ID NO:X or the complement thereof, polypeptides encoded by genes corresponding to SEQ ID NO:X or the complement thereof, and/or the cDNA contained in Clone ID NO:Z, using information from the sequences disclosed herein or the clones deposited with the ATCC. For example, allelic variants and/or species homologs may be isolated and identified by making suitable probes or primers from the sequences provided herein and screening a suitable nucleic acid source for allelic variants and/or the desired homologue.
  • polypeptides of the invention can be prepared in any suitable manner. Such polypeptides include isolated naturally occurring polypeptides, recombinantly produced polypeptides, synthetically produced polypeptides, or polypeptides produced by a combination of these methods. Means for preparing such polypeptides are well understood in the art.
  • the polypeptides may be in the form of the secreted protein, including the mature form, or may be a part of a larger protein, such as a fusion protein (see below). It is often advantageous to include an additional amino acid sequence which contains secretory or leader sequences, pro-sequences, sequences which aid in purification, such as multiple histidine residues, or an additional sequence for stability during recombinant production.
  • polypeptides of the present invention are preferably provided in an isolated form, and preferably are substantially purified.
  • a recombinantly produced version of a polypeptide, including the secreted polypeptide can be substantially purified using techniques described herein or otherwise known in the art, such as, for example, by the one-step method described in Smith and Johnson, Gene 67:31-40 (1988).
  • Polypeptides of the invention also can be purified from natural, synthetic or recombinant sources using techniques described herein or otherwise known in the art, such as, for example, antibodies of the invention raised against the polypeptides of the present invention in methods which are well known in the art.
  • the present invention provides a polynucleotide comprising, or alternatively consisting of, the nucleic acid sequence of SEQ ID NO:X, and/or the cDNA sequence contained in Clone ID NO:Z.
  • the present invention also provides a polypeptide comprising, or alternatively, consisting of, the polypeptide sequence of SEQ ID NO:Y, a polypeptide encoded by SEQ ID NO:X or a complement thereof, and/or a polypeptide encoded by the cDNA contained in Clone ID NO:Z.
  • Polynucleotides encoding a polypeptide comprising, or alternatively consisting of the polypeptide sequence of SEQ ID NO:Y, a polypeptide encoded by SEQ ID NO:X, and/or a polypeptide encoded by the cDNA contained in Clone ID NO:Z are also encompassed by the invention.
  • the present invention further encompasses a polynucleotide comprising, or alternatively consisting of, the complement of the nucleic acid sequence of SEQ ID NO:X, a nucleic acid sequence encoding a polypeptide encoded by the complement of the nucleic acid sequence of SEQ ID NO:X, and/or the cDNA contained in Clone ID NO:Z.
  • polynucleotide sequences, such as EST sequences, are publicly available and accessible through sequence databases and may have been publicly available prior to conception of the present invention. Preferably, such related polynucleotides are specifically excluded from the scope of the present invention.
  • for each contig sequence (SEQ ID NO:X) listed in the fourth column of Table 1, preferably excluded are one or more polynucleotides comprising a nucleotide sequence described by the general formula of a-b, where a is any integer between 1 and the final nucleotide minus 15 of SEQ ID NO:X, b is an integer of 15 to the final nucleotide of SEQ ID NO:X, where both a and b correspond to the positions of nucleotide residues shown in SEQ ID NO:X, and where b is greater than or equal to a + 14.
  • polynucleotides comprising a nucleotide sequence described by the general formula of a-b, where a and b are integers as defined in columns 4 and 5, respectively, of Table 3.
  • the polynucleotides of the invention do not consist of at least one, two, three, four, five, ten, or more of the specific polynucleotide sequences referenced by the Genbank Accession No. as disclosed in column 6 of Table 3 (including for example, published sequence in connection with a particular BAC clone).
  • preferably excluded from the invention are the specific polynucleotide sequence(s) contained in the clones corresponding to at least one, two, three, four, five, ten, or more of the available material having the accession numbers identified in the sixth column of this Table (including for example, the actual sequence contained in an identified BAC clone). In no way is this listing meant to encompass all of the sequences which may be excluded by the general formula, it is just a representative example. All references available through these accessions are hereby incorporated by reference in their entirety.
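The general formula a-b used above to describe excluded polynucleotides can be expressed as a simple validity check. This sketch follows the bounds exactly as worded in the text (a from 1 to the final nucleotide minus 15, b from 15 to the final nucleotide, and b at least a + 14); the function name and the `length` parameter, standing for the final nucleotide position of SEQ ID NO:X, are hypothetical.

```python
def is_valid_a_b(a, b, length):
    """Check whether (a, b) satisfies the a-b general formula as worded:
    1 <= a <= length - 15, 15 <= b <= length, and b >= a + 14,
    where `length` is the final nucleotide position of the contig."""
    return (
        1 <= a <= length - 15
        and 15 <= b <= length
        and b >= a + 14
    )


# For a 100-nt contig: the shortest permitted span is 15 nt (b = a + 14).
print(is_valid_a_b(1, 15, 100))
print(is_valid_a_b(1, 14, 100))
print(is_valid_a_b(86, 100, 100))
```

The constraint b >= a + 14 means every described fragment is at least 15 nucleotides long, matching the minimum polynucleotide length recited elsewhere in the specification.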
  • AA196784 AW967555, N71300, AW962131, AI820789, AI383003, AI092628, AA179929, AI301811, AW665275, AI635377, AI202935, AA355370, AW811175, AI274614, AA995447, BE878232, AW811176, T89653, AA205505, AA348524, AA554037, H69789, AA211589, AA491705, AI244923, AW593673, AW973146, AA169800, AA205463, AI261331, T67419, BF906346, BF736981, AI863156, AA206489, BE875888, AA434396, AC005392, AC008069, AL009050, AC011604, AC005549, AL355352, AC005410, AC008134, AJ133269, AL035698, AC00
  • HMMCHO 56 1082413 1 - 766 15 - 780 AI049671, AI034206, AW079877, AI791913, BF741641, BF681546, AI792133, AU144517,
  • HCESF90 78 1091570 1 - 4792 15 - 4806 AL526737, AL527357, AL520133, AL527319, AL526770, BE531120, BE615116, BG249131 , BG167313, BE900315, AL039400, BE884344, BG249782, AV753572, BF308572, BE893950, BE542224, N25612, AI553834, BF663432, AU134574, AW512989, AL042597, AI871719, BF058945, AA453727, AU127908, AW499567, N36098, BF339648, BE278930, BF057402, BF061439, BE392204, AA486431, BF924312, AU145864, BE439990, BE615650, AA662205, AI140127, AI457676, AI811001, N24770, AI669340, BF
  • HAICQ70 80 1091887 1 - 3267 15 - 3281 BE734515, AA446042, AA187765, AW963708, BE144726, AW388267, AA128252, BF238982, AA705646, AI745576, AA887587, AW269821, W79190, AI751939, AI860344,
  • HDAAN31 100 1094668 1 - 862 15 - 876 AA620978, AI652117, Z99396, AW979232, AW973270, AW969885, AW979204, AW973202, AW973213, AW976515, AW975976, AW979165, AW976510, AW975975, AW969884, AW970113, AW972943, AW973770, AW970860, AW969759, AW979083, BF592735, AW971732, AW976982, AW970587, AW975876, AW973650, AW979175, AW979064, AW969637, AW975938, AW975138, AW979002, AW975966, AW975968, AW972762, AW979116, AW972884, AW973987, AW975910, AW
  • AR BF038480 BG003969, AW877209, AX015418, AR085071, AL137638, U69263, U69262, AR080280, AR091518, AR093384, AR035224, 185513, AR009152, AR009151, AR027099, AX009487, AR093392, 125041, AR093383, AR069374, 125027, AR069375, 144515, 126928, 126930, 126927, 144516, A94046, A94054, 105393, A10617, AR028792, A01323, AR034783, AX030966, 163120, AR067733, AX009486, AX029455, AR054109, AR064322, AR064323, AR064320, AR064321, A94048, A94061, AX027785, AR038307, A01324, A32110, A49045, AR038321
  • the present invention is directed to variants of the polynucleotide sequence disclosed in SEQ ID NO:X or the complementary strand thereto, nucleotide sequences encoding the polypeptide of SEQ ID NO:Y, the nucleotide sequence of SEQ ID NO:X encoding the polypeptide sequence as defined in column 7 of Table 1, nucleotide sequences encoding the polypeptide as defined in column 7 of Table 1, the nucleotide sequence as defined in columns 8 and 9 of Table 2, nucleotide sequences encoding the polypeptide encoded by the nucleotide sequence as defined in columns 8 and 9 of Table 2, the cDNA sequence contained in Clone ID NO:Z, and/or nucleotide sequences encoding the polypeptide encoded by the cDNA sequence contained in Clone ID NO:Z.
  • the present invention also encompasses variants of the polypeptide sequence disclosed in SEQ ID NO:Y, the polypeptide sequence as defined in column 7 of Table 1, a polypeptide sequence encoded by the polynucleotide sequence in SEQ ID NO:X, a polypeptide sequence encoded by the nucleotide sequence as defined in columns 8 and 9 of Table 2, a polypeptide sequence encoded by the complement of the polynucleotide sequence in SEQ ID NO:X, and/or a polypeptide sequence encoded by the cDNA sequence contained in Clone ID NO:Z.
  • Variant refers to a polynucleotide or polypeptide differing from the polynucleotide or polypeptide of the present invention, but retaining essential properties thereof. Generally, variants are overall closely similar, and, in many regions, identical to the polynucleotide or polypeptide of the present invention.
  • one aspect of the invention provides an isolated nucleic acid molecule comprising, or alternatively consisting of, a polynucleotide having a nucleotide sequence selected from the group consisting of: (a) a nucleotide sequence described in SEQ D
  • the present invention is also directed to nucleic acid molecules which comprise, or alternatively consist of, a nucleotide sequence which is at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identical to, for example, any of the nucleotide sequences in (a), (b), (c), (d), (e), (f), (g), (h), (i), or (j) above, the nucleotide coding sequence in SEQ ID NO:X or the complementary strand thereto, the nucleotide coding sequence of the cDNA contained in Clone ID NO:Z or the complementary strand thereto, a nucleotide sequence encoding the polypeptide of SEQ ID NO:Y, a nucleotide sequence encoding a polypeptide sequence encoded by the nucleotide sequence in SEQ ID NO:X, a polypeptide sequence encoded by the complement of the polynucleotide sequence in SEQ ID NO:
  • Polynucleotides which hybridize to the complement of these nucleic acid molecules under stringent hybridization conditions or alternatively, under lower stringency conditions, are also encompassed by the invention, as are polypeptides encoded by these polynucleotides and nucleic acids.
  • the invention encompasses nucleic acid molecules which comprise, or alternatively, consist of a polynucleotide which hybridizes under stringent hybridization conditions, or alternatively, under lower stringency conditions, to a polynucleotide in (a), (b), (c), (d), (e), (f), (g), (h), or (i), above, as are polypeptides encoded by these polynucleotides.
  • the invention provides a purified protein comprising, or alternatively consisting of, a polypeptide having an amino acid sequence selected from the group consisting of: (a) the complete amino acid sequence of SEQ ID NO:Y or the complete amino acid sequence encoded by the cDNA in Clone ID NO:Z; (b) the amino acid sequence of a mature form of a polypeptide having the amino acid sequence of SEQ ID NO:Y or the amino acid sequence encoded by the cDNA in Clone ID NO:Z; (c) the amino acid sequence of a biologically active fragment of a polypeptide having the complete amino acid sequence of SEQ ID NO:Y or the complete amino acid sequence encoded by the cDNA in Clone ID NO:Z; and (d) the amino acid sequence of an antigenic fragment of a polypeptide having the complete amino acid sequence of SEQ ID NO:Y or the complete amino acid sequence encoded by the cDNA in Clone ID NO:Z.
  • the present invention is also directed to proteins which comprise, or alternatively consist of, an amino acid sequence which is at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identical to, for example, any of the amino acid sequences in (a), (b), (c), or (d), above, the amino acid sequence shown in SEQ ID NO:Y, the amino acid sequence encoded by the cDNA contained in Clone ID NO:Z, the amino acid sequence of the polypeptide encoded by the nucleotide sequence in SEQ ID NO:X as defined in columns 8 and 9 of Table 2, the amino acid sequence as defined in column 7 of Table 1, an amino acid sequence encoded by the nucleotide sequence in SEQ ID NO:X, and an amino acid sequence encoded by the complement of the polynucleotide sequence in SEQ ID NO:X.
  • fragments of these polypeptides are also provided (e.g., those fragments described herein).
  • Further proteins encoded by polynucleotides which hybridize to the complement of the nucleic acid molecules encoding these amino acid sequences under stringent hybridization conditions or alternatively, under lower stringency conditions, are also encompassed by the invention, as are the polynucleotides encoding these proteins.
  • by a nucleic acid having a nucleotide sequence at least, for example, 95% "identical" to a reference nucleotide sequence of the present invention, it is intended that the nucleotide sequence of the nucleic acid is identical to the reference sequence except that the nucleotide sequence may include up to five point mutations per each 100 nucleotides of the reference nucleotide sequence encoding the polypeptide.
  • to obtain a nucleic acid having a nucleotide sequence at least 95% identical to a reference nucleotide sequence, up to 5% of the nucleotides in the reference sequence may be deleted or substituted with another nucleotide, or a number of nucleotides up to 5% of the total nucleotides in the reference sequence may be inserted into the reference sequence.
  • the query sequence may be an entire sequence referred to in Table 1 or 2 as the ORF (open reading frame), or any fragment specified as described herein.
  • whether any particular nucleic acid molecule or polypeptide is at least 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identical to a nucleotide sequence of the present invention can be determined conventionally using known computer programs.
  • a preferred method for determining the best overall match between a query sequence (a sequence of the present invention) and a subject sequence, also referred to as a global sequence alignment, can be determined using the FASTDB computer program based on the algorithm of Brutlag et al. (Comp. App. Biosci. 6:237-245 (1990)). In a sequence alignment the query and subject sequences are both DNA sequences.
  • an RNA sequence can be compared by converting U's to T's.
  • the result of said global sequence alignment is expressed as percent identity.
  • the FASTDB program does not account for 5' and 3' truncations of the subject sequence when calculating percent identity.
  • the percent identity is corrected by calculating the number of bases of the query sequence that are 5' and 3' of the subject sequence, which are not matched/aligned, as a percent of the total bases of the query sequence. Whether a nucleotide is matched/aligned is determined by results of the FASTDB sequence alignment. This percentage is then subtracted from the percent identity, calculated by the above FASTDB program using the specified parameters, to arrive at a final percent identity score. This corrected score is what is used for the purposes of the present invention. Only bases outside the 5' and 3' bases of the subject sequence, as displayed by the FASTDB alignment, which are not matched/aligned with the query sequence, are calculated for the purposes of manually adjusting the percent identity score.
  • a 90 base subject sequence is aligned to a 100 base query sequence to determine percent identity.
  • the deletions occur at the 5' end of the subject sequence and therefore, the FASTDB alignment does not show a match/alignment of the first 10 bases at the 5' end.
  • the 10 unpaired bases represent 10% of the sequence (number of bases at the 5' and 3' ends not matched/total number of bases in the query sequence) so 10% is subtracted from the percent identity score calculated by the FASTDB program. If the remaining 90 bases were perfectly matched the final percent identity would be 90%.
  • a 90 base subject sequence is compared with a 100 base query sequence.
  • deletions are internal deletions so that there are no bases on the 5' or 3' of the subject sequence which are not matched/aligned with the query.
  • percent identity calculated by FASTDB is not manually corrected.
  • bases 5' and 3' of the subject sequence which are not matched/aligned with the query sequence are manually corrected for. No other manual corrections are to be made for the purposes of the present invention.
  • by a polypeptide having an amino acid sequence at least, for example, 95% "identical" to a query amino acid sequence of the present invention, it is intended that the amino acid sequence of the subject polypeptide is identical to the query sequence except that the subject polypeptide sequence may include up to five amino acid alterations per each 100 amino acids of the query amino acid sequence.
  • up to 5% of the amino acid residues in the subject sequence may be inserted, deleted, (indels) or substituted with another amino acid.
  • These alterations of the reference sequence may occur at the amino or carboxy terminal positions of the reference amino acid sequence or anywhere between those terminal positions, interspersed either individually among residues in the reference sequence or in one or more contiguous groups within the reference sequence.
  • whether any particular polypeptide is at least 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identical to, for instance, the amino acid sequence of a polypeptide referred to in Table 1 (e.g., the amino acid sequence identified in column 6) or Table 2 (e.g., the amino acid sequence of the polypeptide encoded by the polynucleotide sequence defined in columns 8 and 9 of Table 2) or a fragment thereof, the amino acid sequence of the polypeptide encoded by the nucleotide sequence in SEQ ID NO:X or a fragment thereof, or the amino acid sequence of the polypeptide encoded by cDNA contained in Clone ID NO:Z, or a fragment thereof, can be determined conventionally using known computer programs.
  • a preferred method for determining the best overall match between a query sequence (a sequence of the present invention) and a subject sequence can be determined using the FASTDB computer program based on the algorithm of Brutlag et al. (Comp. App. Biosci. 6:237-245 (1990)).
  • the query and subject sequences are either both nucleotide sequences or both amino acid sequences.
  • the result of said global sequence alignment is expressed as percent identity.
  • the percent identity is corrected by calculating the number of residues of the query sequence that are N- and C-terminal of the subject sequence, which are not matched/aligned with a corresponding subject residue, as a percent of the total residues of the query sequence. Whether a residue is matched/aligned is determined by results of the FASTDB sequence alignment.
  • This percentage is then subtracted from the percent identity, calculated by the above FASTDB program using the specified parameters, to arrive at a final percent identity score.
  • This final percent identity score is what is used for the purposes of the present invention. Only residues to the N- and C-termini of the subject sequence, which are not matched/aligned with the query sequence, are considered for the purposes of manually adjusting the percent identity score. That is, only query residue positions outside the farthest N- and C-terminal residues of the subject sequence are considered. For example, a 90 amino acid residue subject sequence is aligned with a 100 residue query sequence to determine percent identity.
  • the deletion occurs at the N-terminus of the subject sequence and therefore, the FASTDB alignment does not show a matching/alignment of the first 10 residues at the N-terminus.
  • the 10 unpaired residues represent 10% of the sequence (number of residues at the N- and C-termini not matched/total number of residues in the query sequence) so 10% is subtracted from the percent identity score calculated by the FASTDB program. If the remaining 90 residues were perfectly matched the final percent identity would be 90%.
  • a 90 residue subject sequence is compared with a 100 residue query sequence. This time the deletions are internal deletions so there are no residues at the N- or C-termini of the subject sequence which are not matched/aligned with the query.
  • the polynucleotide variants of the invention may contain alterations in the coding regions, non-coding regions, or both. Especially preferred are polynucleotide variants containing alterations which produce silent substitutions, additions, or deletions, but do not alter the properties or activities of the encoded polypeptide. Nucleotide variants produced by silent substitutions due to the degeneracy of the genetic code are preferred. Moreover, polypeptide variants in which less than 50, less than 40, less than 30, less than 20, less than 10, or 5-50, 5-25, 5-10, 1-5, or 1-2 amino acids are substituted, deleted, or added in any combination are also preferred.
  • Polynucleotide variants can be produced for a variety of reasons, e.g., to optimize codon expression for a particular host (change codons in the human mRNA to those preferred by a bacterial host such as E. coli).
  • Naturally occurring variants are called "allelic variants," and refer to one of several alternate forms of a gene occupying a given locus on a chromosome of an organism. (Genes II, Lewin, B., ed., John Wiley & Sons, New York (1985)). These allelic variants can vary at either the polynucleotide and/or polypeptide level and are included in the present invention. Alternatively, non-naturally occurring variants may be produced by mutagenesis techniques or by direct synthesis.
  • variants may be generated to improve or alter the characteristics of the polypeptides of the present invention.
  • one or more amino acids can be deleted from the N-terminus or C-terminus of the polypeptide of the present invention without substantial loss of biological function.
  • Ron et al., J. Biol. Chem. 268:2984-2988 (1993), reported variant KGF proteins having heparin binding activity even after deleting 3, 8, or 27 amino-terminal amino acid residues.
  • Interferon gamma exhibited up to ten times higher activity after deleting 8-10 amino acid residues from the carboxy terminus of this protein. (Dobeli et al., J. Biotechnology 7:199-216 (1988).)
  • the invention further includes polypeptide variants which show a functional activity (e.g., biological activity) of the polypeptides of the invention.
  • Such variants include deletions, insertions, inversions, repeats, and substitutions selected according to general rules known in the art so as to have little effect on activity.
  • the present application is directed to nucleic acid molecules at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identical to the nucleic acid sequences disclosed herein, (e.g., encoding a polypeptide having the amino acid sequence of an N and/or C terminal deletion), irrespective of whether they encode a polypeptide having functional activity.
  • even where a particular nucleic acid molecule does not encode a polypeptide having functional activity, one of skill in the art would still know how to use the nucleic acid molecule, for instance, as a hybridization probe or a polymerase chain reaction (PCR) primer.
  • uses of the nucleic acid molecules of the present invention that do not encode a polypeptide having functional activity include, inter alia, (1) isolating a gene or allelic or splice variants thereof in a cDNA library; (2) in situ hybridization (e.g., "FISH") to metaphase chromosomal spreads to provide precise chromosomal location of the gene, as described in Verma et al., Human Chromosomes: A Manual of Basic Techniques, Pergamon Press, New York (1988); (3) Northern Blot analysis for detecting mRNA expression in specific tissues (e.g., normal or diseased tissues); and (4) in situ hybridization (e.g., histochemistry) for detecting mRNA expression in specific tissues (e.g., normal or diseased tissues).
  • preferred, however, are nucleic acid molecules having sequences at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identical to the nucleic acid sequences disclosed herein, which do, in fact, encode a polypeptide having functional activity.
  • by a polypeptide having "functional activity" is meant a polypeptide capable of displaying one or more known functional activities associated with a full-length (complete) protein of the invention.
  • Such functional activities include, but are not limited to, biological activity, antigenicity [ability to bind (or compete with a polypeptide of the invention for binding) to an anti-polypeptide of the invention antibody], immunogenicity (ability to generate antibody which binds to a specific polypeptide of the invention), ability to form multimers with polypeptides of the invention, and ability to bind to a receptor or ligand for a polypeptide of the invention.
  • the functional activity of the polypeptides, and fragments, variants and derivatives of the invention, can be assayed by various methods.
  • various immunoassays known in the art can be used, including but not limited to, competitive and non-competitive assay systems using techniques such as radioimmunoassays, ELISA (enzyme linked immunosorbent assay), "sandwich" immunoassays, immunoradiometric assays, gel diffusion precipitation reactions, immunodiffusion assays, in situ immunoassays (using colloidal gold, enzyme or radioisotope labels, for example), western blots, precipitation reactions, agglutination assays (e.g., gel agglutination assays, hemagglutination assays), complement fixation assays, immunofluorescence assays, protein A assays, and immunoelectrophoresis assays, etc.
  • antibody binding is detected by detecting a label on the primary antibody.
  • the primary antibody is detected by detecting binding of a secondary antibody or reagent to the primary antibody.
  • the secondary antibody is labeled. Many means are known in the art for detecting binding in an immunoassay and are within the scope of the present invention.
  • binding can be assayed, e.g., by means well-known in the art, such as, for example, reducing and non-reducing gel chromatography, protein affinity chromatography, and affinity blotting. See generally, Phizicky et al., Microbiol. Rev. 59:94-123 (1995).
  • the ability of physiological correlates of a polypeptide of the present invention to bind to a substrate(s) of the polypeptide of the invention can be routinely assayed using techniques known in the art.
  • a large number of the nucleic acid molecules having a sequence at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to, for example, the nucleic acid sequence of the cDNA contained in Clone ID NO:Z, the nucleic acid sequence referred to in Table 1 (SEQ ID NO:X), the nucleic acid sequence disclosed in Table 2 (e.g., the nucleic acid sequence delineated in columns 8 and 9) or fragments thereof will encode polypeptides "having functional activity."
  • since degenerate variants of any of these nucleotide sequences all encode the same polypeptide, in many instances this will be clear to the skilled artisan even without performing the above described comparison assay.
  • for such nucleic acid molecules that are not degenerate variants, a reasonable number will also encode a polypeptide having functional activity. This is because the skilled artisan is fully aware of amino acid substitutions that are either less likely or not likely to significantly affect protein function (e.g., replacing one aliphatic amino acid with a second aliphatic amino acid), as further described below.
  • the first strategy exploits the tolerance of amino acid substitutions by natural selection during the process of evolution. By comparing amino acid sequences in different species, conserved amino acids can be identified. These conserved amino acids are likely important for protein function. In contrast, the amino acid positions where substitutions have been tolerated by natural selection indicate that these positions are not critical for protein function. Thus, positions tolerating amino acid substitution could be modified while still maintaining biological activity of the protein.
  • the second strategy uses genetic engineering to introduce amino acid changes at specific positions of a cloned gene to identify regions critical for protein function. For example, site directed mutagenesis or alanine-scanning mutagenesis (introduction of single alanine mutations at every residue in the molecule) can be used. See Cunningham and Wells, Science 244: 1081-1085 (1989). The resulting mutant molecules can then be tested for biological activity.
  • tolerated conservative amino acid substitutions involve replacement of the aliphatic or hydrophobic amino acids Ala, Val, Leu and Ile; replacement of the hydroxyl residues Ser and Thr; replacement of the acidic residues Asp and Glu; replacement of the amide residues Asn and Gln; replacement of the basic residues Lys, Arg, and His; replacement of the aromatic residues Phe, Tyr, and Trp; and replacement of the small-sized amino acids Ala, Ser, Thr, Met, and Gly.
  • variants of the present invention include (i) substitutions with one or more of the non-conserved amino acid residues, where the substituted amino acid residues may or may not be one encoded by the genetic code, or (ii) substitutions with one or more of the amino acid residues having a substituent group, or (iii) fusion of the mature polypeptide with another compound, such as a compound to increase the stability and/or solubility of the polypeptide (for example, polyethylene glycol), (iv) fusion of the polypeptide with additional amino acids, such as, for example, an IgG Fc fusion region peptide, serum albumin (preferably human serum albumin) or a fragment thereof, or leader or secretory sequence, or a sequence facilitating purification, or (v) fusion of the polypeptide with another compound, such as albumin (including but not limited to recombinant albumin (see, e.g., U.S.
  • polypeptide variants containing amino acid substitutions of charged amino acids with other charged or neutral amino acids may produce proteins with improved characteristics, such as less aggregation. Aggregation of pharmaceutical formulations both reduces activity and increases clearance due to the aggregate's immunogenic activity. See Pinckard et al., Clin. Exp. Immunol. 2:331-340 (1967); Robbins et al., Diabetes 36: 838-845 (1987); Cleland et al., Crit. Rev. Therapeutic Drug Carrier Systems 10:307-377 (1993).
  • a further embodiment of the invention relates to polypeptides which comprise the amino acid sequence of a polypeptide having an amino acid sequence which contains at least one amino acid substitution, but not more than 50 amino acid substitutions, even more preferably, not more than 40 amino acid substitutions, still more preferably, not more than 30 amino acid substitutions, and still even more preferably, not more than 20 amino acid substitutions from a polypeptide sequence disclosed herein.
  • a polypeptide preferably has an amino acid sequence which comprises the amino acid sequence of a polypeptide of SEQ ID NO:Y, an amino acid sequence encoded by SEQ ID NO:X, an amino acid sequence encoded by the portion of SEQ ID NO:X as defined in columns 8 and 9 of Table 2, an amino acid sequence encoded by the complement of SEQ ID NO:X, and/or an amino acid sequence encoded by cDNA contained in Clone ID NO:Z which contains, in order of ever-increasing preference, at least one, but not more than 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1 amino acid substitutions.
  • the polypeptides of the invention comprise, or alternatively, consist of, fragments or variants of a reference amino acid sequence selected from: (a) the amino acid sequence of SEQ ID NO:Y or fragments thereof (e.g., the mature form and/or other fragments described herein); (b) the amino acid sequence encoded by SEQ ID NO:X or fragments thereof; (c) the amino acid sequence encoded by the complement of SEQ ID NO:X or fragments thereof; (d) the amino acid sequence encoded by the portion of SEQ ID NO:X as defined in columns 8 and 9 of Table 2 or fragments thereof; and (e) the amino acid sequence encoded by cDNA contained in Clone ID NO:Z or fragments thereof; wherein the fragments or variants have 1-5, 5-10, 5-25, 5-50, 10-50 or 50-150 amino acid residue additions, substitutions, and/or deletions when compared to the reference amino acid sequence.
  • the amino acid substitutions are conservative.
  • a "polynucleotide fragment" refers to a polynucleotide having a nucleic acid sequence which, for example: is a portion of the cDNA contained in Clone ID NO:Z or the complementary strand thereto; is a portion of the polynucleotide sequence encoding the polypeptide encoded by the cDNA contained in Clone ID NO:Z or the complementary strand thereto; is a portion of a polynucleotide sequence encoding the amino acid sequence encoded by the region of SEQ ID NO:X as defined in columns 8 and 9 of Table 2 or the complementary strand thereto; is a portion of the polynucleotide sequence of SEQ ID NO:X as defined in columns 8 and 9 of Table 2 or the complementary strand thereto; is a portion of the polynucleotide sequence of SEQ ID NO
  • the polynucleotide fragments of the invention are preferably at least about 15 nt, and more preferably at least about 20 nt, still more preferably at least about 30 nt, and even more preferably, at least about 40 nt, at least about 50 nt, at least about 75 nt, or at least about 150 nt in length.
  • a fragment "at least 20 nt in length," for example, is intended to include 20 or more contiguous bases from the cDNA sequence contained in Clone ID NO:Z, or the nucleotide sequence shown in SEQ ID NO:X or the complementary strand thereto.
  • these nucleotide fragments have uses that include, but are not limited to, diagnostic probes and primers as discussed herein.
  • larger fragments (e.g., at least 160, 170, 180, 190, 200, 250, 500, 600, 1000, or 2000 nucleotides in length) are also encompassed by the invention.
  • polynucleotide fragments of the invention comprise, or alternatively consist of, a sequence from about nucleotide number 1-50, 51-100, 101-150, 151-200, 201-250, 251-300, 301-350, 351-400, 401-450, 451-500, 501-550, 551-600, 601-650, 651-700, 701-750, 751-800, 801-850, 851-900, 901-950, 951-1000, 1001-1050, 1051-1100, 1101-1150, 1151-1200, 1201-1250, 1251-1300, 1301-1350, 1351-1400, 1401-1450, 1451-1500, 1501-1550, 1551-1600, 1601-1650, 1651-1700, 1701-1750, 1751-1800, 1801-1850, 1851-1900, 1901-1950, 1951-2000, 2001-2050, 2051-2100, 2101-2150, 2151-2200, 2201-2250, 2251-2
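
The manual correction of the FASTDB percent identity for terminal truncations, described in the list above, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not FASTDB itself: the function name `corrected_percent_identity` and its parameters are hypothetical, and the caller is assumed to supply both the percent identity reported by FASTDB and the number of query positions lying outside the ends of the subject sequence (5'/3' for nucleotides, N-/C-terminal for polypeptides) that are not matched/aligned.

```python
def corrected_percent_identity(fastdb_identity: float,
                               query_length: int,
                               unmatched_start: int = 0,
                               unmatched_end: int = 0) -> float:
    """Subtract the terminal-overhang percentage from a FASTDB percent identity.

    fastdb_identity  -- percent identity reported by the alignment (assumed input)
    query_length     -- total number of bases or residues in the query sequence
    unmatched_start  -- query positions beyond the 5' (or N-terminal) end of the
                        subject that are not matched/aligned
    unmatched_end    -- query positions beyond the 3' (or C-terminal) end of the
                        subject that are not matched/aligned
    Internal deletions are NOT counted here; they receive no manual correction.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    if unmatched_start < 0 or unmatched_end < 0:
        raise ValueError("unmatched counts must be non-negative")
    if unmatched_start + unmatched_end > query_length:
        raise ValueError("unmatched positions cannot exceed the query length")

    # Unmatched terminal positions as a percent of the total query length.
    overhang_percent = 100.0 * (unmatched_start + unmatched_end) / query_length
    return fastdb_identity - overhang_percent


if __name__ == "__main__":
    # 90-unit subject vs. 100-unit query, first 10 query positions unmatched,
    # remaining 90 perfectly matched (FASTDB value of 100.0 is assumed).
    print(corrected_percent_identity(100.0, 100, unmatched_start=10))  # 90.0

    # Internal deletions only: nothing lies outside the subject ends, so the
    # FASTDB value (a hypothetical 90.0 here) is left unchanged.
    print(corrected_percent_identity(90.0, 100))  # 90.0
```

Under these assumptions the first call reproduces the worked example in which 10% is subtracted to give a final percent identity of 90%, and the second reproduces the internal-deletion case in which no manual correction is made.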
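
The groups of tolerated conservative amino acid substitutions listed above (aliphatic/hydrophobic, hydroxyl, acidic, amide, basic, aromatic, and small-sized residues) lend themselves to a simple lookup. The sketch below is illustrative only: the names `CONSERVATIVE_GROUPS`, `is_conservative`, and `count_substitutions` are hypothetical, the residues are written in standard one-letter codes, and the two sequences are assumed to be pre-aligned and of equal length with no insertions or deletions.

```python
from typing import Tuple

# Substitution groups from the list above, in standard one-letter codes.
CONSERVATIVE_GROUPS = {
    "aliphatic/hydrophobic": frozenset("AVLI"),   # Ala, Val, Leu, Ile
    "hydroxyl": frozenset("ST"),                  # Ser, Thr
    "acidic": frozenset("DE"),                    # Asp, Glu
    "amide": frozenset("NQ"),                     # Asn, Gln
    "basic": frozenset("KRH"),                    # Lys, Arg, His
    "aromatic": frozenset("FYW"),                 # Phe, Tyr, Trp
    "small": frozenset("ASTMG"),                  # Ala, Ser, Thr, Met, Gly
}


def is_conservative(original: str, replacement: str) -> bool:
    """Return True if both residues differ and share at least one group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group
               for group in CONSERVATIVE_GROUPS.values())


def count_substitutions(reference: str, variant: str) -> Tuple[int, int]:
    """Count (conservative, non-conservative) substitutions position by position.

    Assumes equal-length, pre-aligned sequences (an illustrative simplification).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length")
    conservative = other = 0
    for ref_res, var_res in zip(reference.upper(), variant.upper()):
        if ref_res == var_res:
            continue
        if is_conservative(ref_res, var_res):
            conservative += 1
        else:
            other += 1
    return conservative, other


if __name__ == "__main__":
    print(count_substitutions("ALDK", "AIEK"))  # (2, 0): Leu->Ile, Asp->Glu
    print(count_substitutions("ALDK", "ALKK"))  # (0, 1): Asp->Lys
```

A count of this kind could be compared against the substitution limits stated in the embodiments above (for example, not more than 50, 40, 30, or 20 substitutions); handling of insertions and deletions would require a real alignment and is outside this sketch.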

Abstract

The present invention relates to novel proteins and, more particularly, to isolated nucleic acid molecules encoding novel polypeptides. The invention also relates to novel polypeptides and to antibodies that bind to these polypeptides. The invention further relates to vectors, host cells, and recombinant and synthetic methods for producing human polynucleotides and/or polypeptides, and antibodies. The invention also relates to diagnostic and therapeutic methods useful for the diagnosis, treatment, prevention and/or prognosis of disorders related to these novel polypeptides. Finally, the invention relates to screening methods for identifying agonists and antagonists of these polynucleotides and polypeptides, and to methods and/or compositions for inhibiting or enhancing the production and function of these polypeptides.
PCT/US2001/029838 2000-09-26 2001-09-25 Nucleic acids, proteins, and antibodies WO2002026930A2 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU2001296301A AU2001296301A8 (en) 2000-09-26 2001-09-25 Nucleic acids, proteins, and antibodies
AU2001296301A AU2001296301A1 (en) 2000-09-26 2001-09-25 Nucleic acids, proteins, and antibodies

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US23548400P 2000-09-26 2000-09-26
US60/235,484 2000-09-26

Publications (3)

Publication Number Publication Date
WO2002026930A2 WO2002026930A2 (fr) 2002-04-04
WO2002026930A9 WO2002026930A9 (fr) 2005-04-07
WO2002026930A3 WO2002026930A3 (fr) 2009-06-11

Family

ID=22885694

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/029838 WO2002026930A2 (fr) 2000-09-26 2001-09-25 Nucleic acids, proteins, and antibodies

Country Status (2)

Country Link
AU (2) AU2001296301A1 (fr)
WO (1) WO2002026930A2 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9107862B2 (en) 2007-09-04 2015-08-18 Compugen Ltd. Polypeptides and polynucleotides, and uses thereof as a drug target for producing drugs and biologics
US9394330B2 (en) 2012-03-21 2016-07-19 Alios Biopharma, Inc. Solid forms of a thiophosphoramidate nucleotide prodrug
US9428574B2 (en) 2011-06-30 2016-08-30 Compugen Ltd. Polypeptides and uses thereof for treatment of autoimmune disorders and infection

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2337100C (fr) 1998-08-07 2011-09-20 Immunex Corporation Lymphoid dendritic cell adhesion molecule (LDCAM) polypeptides and polynucleotides encoding said polypeptides
EP1513934B1 (fr) * 2002-06-06 2011-03-02 Oncotherapy Science, Inc. Genes and polypeptides relating to human colon cancers
MXPA06000974A (es) 2003-07-25 2006-08-31 Amgen Inc LDCAM antagonists and agonists and methods for their use
WO2005059504A2 (fr) * 2003-12-12 2005-06-30 Bayer Healthcare Ag Diagnostics and therapeutics for diseases associated with G protein-coupled receptor GPR34 (GPR34)
CA2810928A1 (fr) 2010-09-22 2012-03-29 Alios Biopharma, Inc. Substituted nucleotide analogs
CA2860234A1 (fr) 2011-12-22 2013-06-27 Alios Biopharma, Inc. Substituted phosphorothioate nucleotide analogs
BR112014018481A2 (pt) 2012-02-01 2017-07-04 Compugen Ltd Monoclonal or polyclonal antibody or an antigen-binding fragment thereof, polynucleotide, monoclonal antibody, vector, hybridoma, antibody, hybridoma 5166-2 and/or 5166-9, antibody or antigen-binding fragment, pharmaceutical composition, use of the antibody or antibody-binding fragment, method for treating cancer, method for diagnosing cancer in an individual, antibody, method, composition or use
WO2013142157A1 (fr) 2012-03-22 2013-09-26 Alios Biopharma, Inc. Pharmaceutical combinations comprising a thionucleotide analog
LT3124976T (lt) * 2015-07-28 2018-12-27 F. Hoffmann-La Roche Ag Improved bacterial endotoxin test for the detection of endotoxins

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9107862B2 (en) 2007-09-04 2015-08-18 Compugen Ltd. Polypeptides and polynucleotides, and uses thereof as a drug target for producing drugs and biologics
US9375466B2 (en) 2007-09-04 2016-06-28 Compugen Ltd Polypeptides and polynucleotides, and uses thereof as a drug target for producing drugs and biologics
US9555087B2 (en) 2007-09-04 2017-01-31 Compugen Ltd Polypeptides and polynucleotides, and uses thereof as a drug target for producing drugs and biologics
US9428574B2 (en) 2011-06-30 2016-08-30 Compugen Ltd. Polypeptides and uses thereof for treatment of autoimmune disorders and infection
US9394330B2 (en) 2012-03-21 2016-07-19 Alios Biopharma, Inc. Solid forms of a thiophosphoramidate nucleotide prodrug

Also Published As

Publication number Publication date
WO2002026930A3 (fr) 2009-06-11
AU2001296301A8 (en) 2009-07-30
AU2001296301A1 (en) 2002-04-08
WO2002026930A2 (fr) 2002-04-04

Similar Documents

Publication Publication Date Title
WO2001055208A1 (fr) Nucleic acids, proteins, and antibodies
WO2001054472A2 (fr) Nucleic acids, proteins, and antibodies
WO2001055306A2 (fr) Nucleic acids, proteins, and antibodies
WO2002026930A9 (fr) Nucleic acids, proteins, and antibodies
WO2001055440A1 (fr) Nucleic acids, proteins, and antibodies
WO2002072763A2 (fr) Nucleic acids, proteins, and antibodies
EP1254152A2 (fr) Nucleic acids, proteins, and antibodies
WO2001055164A1 (fr) Nucleic acids, proteins, and antibodies
EP1252312A1 (fr) Nucleic acids, proteins, and antibodies
WO2001055449A1 (fr) Nucleic acids, proteins, and antibodies
WO2001055305A2 (fr) Nucleic acids, proteins, and antibodies
EP1254165A2 (fr) Nucleic acids, proteins, and antibodies
EP1254170A2 (fr) Nucleic acids, proteins, and antibodies
EP1254272A2 (fr) Nucleic acids, proteins, and antibodies
EP1252326A1 (fr) Nucleic acids, proteins, and antibodies

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

AK Designated states

Kind code of ref document: C2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: C2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

COP Corrected version of pamphlet

Free format text: PAGES 1-321, SEQUENCE LISTING, ADDED

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
NENP Non-entry into the national phase

Ref country code: DE

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase
COP Corrected version of pamphlet

Free format text: PAGES 1-321, SEQUENCE LISTING, ADDED

NENP Non-entry into the national phase

Ref country code: JP

DPE2 Request for preliminary examination filed before expiration of 19th month from priority date (PCT application filed from 20040101)