AU2005248530A1 - Differential expression of markers in ovarian cancer - Google Patents
- Publication number
- AU2005248530A1
- Authority
- AU
- Australia
- Prior art keywords
- pea
- node
- amino acid
- amino acids
- homologous
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/112—Disease subtyping, staging or classification
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Organic Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Engineering & Computer Science (AREA)
- Immunology (AREA)
- Pathology (AREA)
- Analytical Chemistry (AREA)
- Zoology (AREA)
- Genetics & Genomics (AREA)
- Wood Science & Technology (AREA)
- Physics & Mathematics (AREA)
- Biotechnology (AREA)
- Microbiology (AREA)
- Molecular Biology (AREA)
- Hospice & Palliative Care (AREA)
- Biophysics (AREA)
- Oncology (AREA)
- Biochemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Peptides Or Proteins (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Description
WO 2005/116850 PCT/IB2005/002555

DIFFERENTIAL EXPRESSION OF MARKERS IN OVARIAN CANCER

FIELD OF THE INVENTION

The present invention relates to novel nucleotide and protein sequences that are diagnostic markers for ovarian cancer, and to assays and methods of use thereof.

BACKGROUND OF THE INVENTION

Ovarian cancer causes more deaths than any other cancer of the female reproductive system. An estimated 25,580 new cases will be diagnosed during 2004 in the United States, and approximately 16,090 of these women will die of the disease. Despite advances in the management of advanced ovarian cancer, 70% to 80% of patients will ultimately succumb to disease that is diagnosed in late stages. When ovarian cancer is diagnosed in stage I, more than 90% of patients can be cured with conventional surgery and chemotherapy. At present, however, only 25% of ovarian cancers are detected in stage I. Detection of a greater fraction of ovarian cancers at an early stage might significantly affect survival. A worldwide research effort aimed at early detection of ovarian cancer is currently under way, and finding molecular markers for the disease is one of its major research topics (J Clin Oncol. 2003 May 15;21(10 Suppl):200-5).

No single marker has been shown to be sufficiently sensitive or specific to contribute to the diagnosis of ovarian cancer. The marker currently used most frequently is CA-125 (Br J Cancer. 2000 May;82(9):1535-8). Its properties do not support its use for screening, but it is a major diagnostic tool. CA-125 belongs to the epithelial sialomucin group of markers and is the best documented and best performing single marker in this group. CA-125 is also known as mucin 16; although it is a membrane protein, it can be found in serum. Its sensitivity is greatest for serous and endometrioid ovarian tumors, compared with mucinous or clear cell tumors.
Besides diagnosis, CA-125 can be used for disease monitoring (Eur J Gynaecol Oncol. 2000;21(1):64-9). In about 70% of patients, a rising CA-125 level may be the first indication of relapse, predating clinical relapse by a median of 4 months. The serum concentration of CA-125 is raised by the vascular invasion, tissue destruction and inflammation associated with malignant disease, and it is elevated in over 90% of women with advanced ovarian cancer. Yet CA-125 is not specific to ovarian cancer: it is elevated in 40% of all patients with advanced intra-abdominal malignancy. Levels can also be elevated during menstruation or pregnancy and in benign conditions such as endometriosis, peritonitis or cirrhosis, particularly with ascites. Because of its high molecular weight, CA-125 cannot be detected in urine samples.

There are other ovarian cancer markers originating from epithelial mucins, but none can replace CA-125 because of their poorer specificity and sensitivity. These other markers may prove complementary to CA-125. CA-50, CA 54-61, CA-195 and CA 19-9 all appear to have greater sensitivity for mucinous tumors, while STN and TAG-72 have better sensitivity for clear cell tumors (Dis Markers. 2004;20(2):53-70).

Kallikreins, a family of serine proteases, and other protease-related proteins are also potential markers for ovarian cancer. Indeed, the entire kallikrein family maps to a region on chromosome 19q that has been shown to be amplified in ovarian cancers. In particular, kallikrein 6 (protease M) and kallikrein 10 have been reported to have sensitivity up to 75% and specificity up to 100%. Matrix metalloproteinases (MMPs) are another family of proteases useful in ovarian cancer screening and prognosis; MMP-2 was reported to have 66% sensitivity and 100% specificity in one study.
Cathepsin L, a cysteine protease, has been described as having a lower false positive rate than CA-125. Given their proteolytic role, these proteases would likely be active in invasion and metastasis formation, and indeed these markers appear to have higher sensitivity for advanced stages of the disease. Because of their relatively low molecular weight, such proteases are candidate urine markers, that is, markers that can be detected in urine samples (Dis Markers. 2004;20(2):53-70).

Hormones have a role in normal ovarian physiology. It is therefore not surprising that hormones, as well as growth and inhibitory factors, are suitable for ovarian cancer detection. Measurements of gonadotropin fragments in urine were found to have sensitivity up to 83% and specificity up to 92% for detecting ovarian cancer. Inhibins, members of the transforming growth factor (TGF) beta superfamily, have been shown to have diagnostic value in detecting granulosa cell tumor, a relatively uncommon type of ovarian cancer associated with a better overall prognosis. Serum inhibin is an ovarian product that decreases to undetectable levels after menopause; however, certain ovarian cancers (mucinous carcinomas and sex cord stromal tumours such as granulosa cell tumours) continue to produce inhibin. Studies have shown that inhibin assays that detect all inhibin forms (as opposed to tests detecting specific members of the inhibin family) provide the best sensitivity/specificity characteristics as an ovarian cancer diagnostic test (Mol Cell Endocrinol. 2002 May 31;191(1):97-103). Measurement of serum TGF-alpha itself was found to have sensitivity up to 70% and specificity of 89% in early stage disease. The growth factor mesothelin was also found to have diagnostic value, but only for late stage disease.
Immunohistochemistry is frequently used to assess tumor origin and stage when a pathological tissue sample is available. A few molecular markers have been shown to have diagnostic value in immunohistochemistry of ovarian cancer, among them epidermal growth factor, p53 and HER-2. p53 expression is much lower in early stage than in late stage disease, and high p53 expression is more characteristic of invasive serous tumors than of mucinous tumors. No benign tumors stain for p53. HER-2 is found in less than 25% of newly diagnosed ovarian cancers. Granulosa cell tumors of the ovary generally have a better prognosis, with late relapse and/or metastasis formation; however, about 50% of patients still die within 20 years of diagnosis. In this tumor type, immunohistochemical staining of estrogen receptor beta (ERb) and proliferating cell nuclear antigen (PCNA) showed that loss of ERb expression and high PCNA expression characterized a subgroup of granulosa cell tumors with a worse outcome (Histopathology. 2003 Sep;43(3):254-62). Survivin expression was also shown to correlate with tumor grade, histologic type and mutant p53, but its actual correlation with survival is questionable (Mod Pathol. 2004 Feb;17(2):264).

Many other markers have been tested over the years for ovarian cancer detection. Some have shown only limited value, while others are still under investigation. Among them are TPA and TPS, two cytokeratins whose inclusion in a panel with CA-125 gave sensitivity up to 93% and specificity up to 98%. LPA (lysophosphatidic acid) was a very promising marker, with one study demonstrating 98% sensitivity and 90% specificity. However, LPA is very unstable and requires quick processing and freezing of plasma, and therefore has limited use.

As previously described, no single marker has been shown to be sufficiently sensitive or specific to contribute to the diagnosis of ovarian cancer.
Therefore, combinations of markers in panels are being tested, usually with CA-125 as one of the panel members. The best performing panel combinations so far have been CA-125 with CA 15-3 (sensitivity of 93% and specificity of 93%), CA-125 with CEA (which has very little sensitivity by itself; specificity of 93%), and CA-125 with TAG-72 and CA 15-3, where specificity rises to 95% but sensitivity is diminished (Dis Markers. 2004;20(2):53-70).

SUMMARY OF THE INVENTION

The background art does not teach or suggest markers for ovarian cancer that are sufficiently sensitive and/or accurate, alone or in combination.

The present invention overcomes these deficiencies of the background art by providing novel markers for ovarian cancer that are both sensitive and accurate. These markers are differentially expressed, and preferably overexpressed, specifically in ovarian cancer as opposed to normal ovarian tissue. The measurement of these markers, alone or in combination, in patient (biological) samples provides information that the diagnostician can correlate with a probable diagnosis of ovarian cancer. The markers of the present invention, alone or in combination, show a high degree of differential detection between ovarian cancer and non-cancerous states.
According to preferred embodiments of the present invention, examples of suitable biological samples include, but are not limited to, blood, serum, plasma, blood cells, urine, sputum, saliva, stool, spinal fluid or CSF, lymph fluid, the external secretions of the skin and of the respiratory, intestinal, and genitourinary tracts, tears, milk, neuronal tissue, ovarian tissue, any human organ or tissue (including any tumor or normal tissue), any sample obtained by lavage (for example of the bronchial system or of the female reproductive system), and samples of in vivo cell culture constituents. In a preferred embodiment, the biological sample comprises ovarian tissue and/or a serum sample and/or a urine sample and/or secretions or other samples from the female reproductive system and/or any other tissue or liquid sample. The sample can optionally be diluted with a suitable eluant before it is contacted with an antibody and/or subjected to any other diagnostic assay.

Information given in the text with regard to cellular localization was determined using four software programs: (i) tmhmm (from the Center for Biological Sequence Analysis, Technical University of Denmark DTU, http://www.cbs.dtu.dk/services/TMHMM/TMHMM2.0b.guide.php) or (ii) tmpred (from EMBnet, maintained by the ISREC Bioinformatics group and the LICR Information Technology Office, Ludwig Institute for Cancer Research, Swiss Institute of Bioinformatics, http://www.ch.embnet.org/software/TMPRED_form.html) for transmembrane region prediction; and (iii) signalp_hmm or (iv) signalp_nn (both from the Center for Biological Sequence Analysis, Technical University of Denmark DTU, http://www.cbs.dtu.dk/services/SignalP/background/prediction.php) for signal peptide prediction.
The terms "signalp_hmm" and "signalp_nn" refer to two modes of operation of the program SignalP: hmm refers to a hidden Markov model, while nn refers to neural networks. Localization was also determined through manual inspection of known protein localization and/or gene structure, and through the use of heuristics by the individual inventor. In some cases, for the manual inspection of cellular localization predictions, the inventors used the ProLoc computational platform [Einat Hazkani-Covo, Erez Levanon, Galit Rotman, Dan Graur and Amit Novik (2004) "Evolution of multicellularity in metazoa: comparative analysis of the subcellular localization of proteins in Saccharomyces, Drosophila and Caenorhabditis." Cell Biology International 2004;28(3):171-8], which predicts protein localization based on various parameters including protein domains (e.g., prediction of transmembrane regions and their localization within the protein), pI, protein length, amino acid composition, homology to pre-annotated proteins, recognition of sequence patterns that direct the protein to a certain organelle (such as a nuclear localization signal (NLS) or a mitochondrial localization signal), signal peptide and anchor modeling, and unique Pfam domains that are specific to a single compartment.

Information is given in the text with regard to SNPs (single nucleotide polymorphisms). The abbreviations are as follows. "T -> C", for example, means that the SNP results in a change from T to C at the position given in the table. Similarly, "M -> Q", for example, means that the SNP has caused a change in the corresponding amino acid sequence from methionine (M) to glutamine (Q). If a space appears in place of a letter on the right-hand side of a nucleotide sequence SNP, a frameshift has occurred. A frameshift may also be indicated with a hyphen (-). A stop codon is indicated with an asterisk (*) on the right-hand side.
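The SNP notation described above is regular enough to be parsed mechanically. The following Python sketch is illustrative only: it assumes each annotation is a bare string such as "T -> C" (tolerating an optional space inside the arrow, as in "T - > C"), and the names `parse_snp` and `SnpChange` and the effect labels are hypothetical, not part of any tool used by the inventors.

```python
import re
from typing import NamedTuple


class SnpChange(NamedTuple):
    ref: str     # residue on the left-hand side (before the change)
    alt: str     # residue on the right-hand side ("" for a frameshift)
    effect: str  # "substitution", "frameshift" or "stop"


# One uppercase residue, an arrow written "->" or "- >", then an optional
# right-hand symbol: a letter, "*" (stop) or "-" (frameshift); blank is allowed.
_SNP_RE = re.compile(r"^\s*([A-Z])\s*-\s*>\s*([A-Z*\-]?)\s*$")


def parse_snp(text: str) -> SnpChange:
    """Parse an annotation such as "T -> C", "M -> Q", "A -> -" or "Q -> *"."""
    match = _SNP_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized SNP notation: {text!r}")
    ref, alt = match.group(1), match.group(2)
    if alt in ("", "-"):
        # A blank or a hyphen on the right-hand side indicates a frameshift.
        return SnpChange(ref, "", "frameshift")
    if alt == "*":
        # An asterisk on the right-hand side indicates a stop codon.
        return SnpChange(ref, "*", "stop")
    return SnpChange(ref, alt, "substitution")


print(parse_snp("T - > C"))  # SnpChange(ref='T', alt='C', effect='substitution')
```

Treating a blank and a hyphen identically follows the statement that either may mark a frameshift; rejecting anything else with `ValueError` keeps malformed annotations from being silently misread.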
As part of the description of an SNP, a comment may be found in parentheses after the description of the SNP itself. This comment may include an FTId, which is an identifier of a SwissProt entry that was created with the indicated SNP. An FTId is a unique and stable feature identifier that allows links to be constructed directly from position-specific annotation in the feature table to specialized protein-related databases. The FTId is always the last component of a feature in the description field, in the form FTId=XXX_number, in which XXX is the 3-letter code for the specific feature key, separated by an underscore from a 6-digit number. In the table of amino acid mutations of the wild type proteins of the selected splice variants of the invention, the header of the first column is "SNP position(s) on amino acid sequence", representing the position of a known mutation on the amino acid sequence.

SNPs may optionally be used as diagnostic markers according to the present invention, alone or in combination with one or more other SNPs and/or any other diagnostic marker. Preferred embodiments of the present invention comprise such SNPs, including but not limited to novel SNPs on the known (WT or wild type) protein sequences given below, novel nucleic acid and/or amino acid sequences formed through such SNPs, and/or any SNP on a variant amino acid and/or nucleic acid sequence described herein.

Information given in the text with regard to homology to known proteins was determined by Smith-Waterman version 5.1.2 using the following special (non-default) parameters:

-model=sw.model
-GAPEXT=0
-GAPOP=100.0
-MATRIX=blosum100

Information is given with regard to overexpression of a cluster in cancer based on ESTs.
A key to the p-values with regard to the analysis of such overexpression is as follows:
- library-based statistics: P-value without including the level of expression in cell lines (P1)
- library-based statistics: P-value including the level of expression in cell lines (P2)
- EST clone statistics: P-value without including the level of expression in cell lines (SP1)
- EST clone statistics: predicted overexpression ratio without including the level of expression in cell lines (R3)
- EST clone statistics: P-value including the level of expression in cell lines (SP2)
- EST clone statistics: predicted overexpression ratio including the level of expression in cell lines (R4)

Library-based statistics refer to statistics over an entire library, while EST clone statistics refer to expression only for ESTs from a particular tissue or cancer.

Information is given with regard to overexpression of a cluster in cancer based on microarrays. As a microarray reference, in the specific segment paragraphs, the unabbreviated tissue name was used to refer to the type of chip on which expression was measured. There are two types of microarray results: those from microarrays prepared according to a design by the present inventors, for which the microarray fabrication procedure is described in detail in the Materials and Experimental Procedures section herein; and those from microarrays using Affymetrix technology. For microarrays prepared according to a design by the present inventors, the probe name begins with the name of the cluster (gene), followed by an identifying number. These probes are listed below with their respective sequences.
>H61775_0_11_0
CCCCAGCTTTTATAGAGCGGCCCAAGGAAGAATATTTCCAAGAAGTAGGG
>HSAPHOL_0_11_0
GGAACATTCTGGATCTGACCCTCCCAGTCTCATCTCCTGACCCTCCCACT
>HUMGRP5E_0_0_16630
GCTGATATGGAAGTTGGGGAATCTGAATTGCCAGAGAATCTTGGGAAGAG
>HUMGRP5E_0_2_0
TCTCATAGAAGCAAAGGAGAACAGAAACCACCAGCCACCTCAACCCAAGG
>D56406_0_5_0
TCTGACTTTTACGGACTTGGCTTGTTAGAAGGCTGAAAGATGATGGCAGG
>M77904_0_8_0
AGTCTGTGTTTGAGGGTGAAGGCTCAGCAACCCTGATGTCTGCCAACTAC
>Z25299_0_3_0
AACTCTGGCACCTTGGGCTGTGGAAGGCTCTGGAAAGTCCTTCAAAGCTG
>Z44808_0_8_0
AAAAGCATGAGTTTCTGACCAGCGTTCTGGACGCGCTGTCCACGGACATG
>Z44808_0_0_72347
ATGTTCTTAGGAGGCAAGCCAGGAGAAGCCGGGTCTGACTTTTCAGCTCA
>Z44808_0_0_72349
TCCTCCAGACCCAAAGCCACAACCCATCGCAAGTCAAGAACACTTTCCAG
>S67314_0_0_741
CACAGAGCCAGGATGTTCTTCTGACCTCAGTATCTACTCCAGCTCCAGCT
>S67314_0_0_744
TGGCATGCTGGAACATGGACTCTAGCTAGCAAGAAGGGCTCAAGGAGGTG
>Z39337_0_0_66755
GCAGGGGTTAAAAGGACGTTCCAGAAGCATCTGGGGACAGAACCAGCCTC
>Z39337_0_9_0
TAATAAACGCAGCGACGTGAGGGTCCTGATTCTCCCTGGTTTTACCCCAG
>HUMPHOSLIP_0_0_18458
AAGGAAGCAGGACCAGTGGATGTGAGGCGTGGTCGAAGAACAACAGAAAG
>HUMPHOSLIP_0_0_18487
ACAGGGGCCAGATGGTGACCCATGACCCAGCCTAAAAGGCAGCCAGAGGG
>M78530_0_6_0
CTFTCCTACACACATCTAGACGTTCAAGTTTGCAAATCAGTTTTTAGCAAG
>HSMUC1A_0_37_0
AAAAGGAGACTTCGGCTACCCAGAGAAGTTCAGTGCCCAGCTCTACTGAG
>HSMUC1A_0_0_11364
AAAGGCTGGCATAGGGGGAGGTTTCCCAGGTAGAAGAAGAAGTGTCAGCA
>HSMUC1A_0_0_11365
AATTAACCCTTTGAGAGCTGGCCAGGACTCTGGACTGATTACCCCAGCCT

Oligonucleotide microarray results taken from Affymetrix data were obtained from chips available from Affymetrix Inc., Santa Clara, CA, USA (see, for example, data regarding the Human Genome U133 (HG-U133) Set at www.affymetrix.com/products/arrays/specific/hgu133.affx; the GeneChip Human Genome U133A 2.0 Array at www.affymetrix.com/products/arrays/specific/hgu133av2.affx; and the Human Genome U133 Plus 2.0 Array at www.affymetrix.com/products/arrays/specific/hgu133plus.affx). The probe names follow the Affymetrix naming convention.
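Because the inventors' probe listing above uses a FASTA-style layout (a ">" name line followed by the sequence), it can be loaded and summarized with a few lines of Python. This is a minimal sketch under stated assumptions: the cluster (gene) name is taken to be the text before the first underscore, which matches the listed names but is not a rule stated in the text, and all function names are hypothetical. GC content ignores characters other than A, C, G and T, so a garbled base such as the "F" in the M78530 probe sequence does not raise an error.

```python
from collections import defaultdict


def parse_probes(text: str) -> dict[str, str]:
    """Read FASTA-style records: a ">name" line followed by sequence line(s)."""
    probes: dict[str, str] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            probes[name] = ""
        elif name is not None:
            probes[name] += line.upper()  # join wrapped sequence lines
    return probes


def cluster_of(probe_name: str) -> str:
    """Assumed rule: the cluster (gene) name precedes the first underscore."""
    return probe_name.split("_", 1)[0]


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among A/C/G/T characters; other characters are ignored."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def group_by_cluster(probes: dict[str, str]) -> dict[str, list[str]]:
    """Map each cluster name to the probe names derived from it, in input order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in probes:
        groups[cluster_of(name)].append(name)
    return dict(groups)


listing = """
>HSMUC1A_0_0_11364
AAAGGCTGGCATAGGGGGAGGTTTCCCAGGTAGAAGAAGAAGTGTCAGCA
>HSMUC1A_0_0_11365
AATTAACCCTTTGAGAGCTGGCCAGGACTCTGGACTGATTACCCCAGCCT
"""
probes = parse_probes(listing)
for cluster, names in group_by_cluster(probes).items():
    for name in names:
        print(cluster, name, len(probes[name]), round(gc_fraction(probes[name]), 2))
```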
The data are available from the NCBI Gene Expression Omnibus (see www.ncbi.nlm.nih.gov/projects/geo/ and Edgar et al., Nucleic Acids Research, 2002, Vol. 30, No. 1, 207-210). The dataset (including results) is available from www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1133 for the Series GSE1133 database (published in March 2004); a reference to these results is Su et al. (Proc Natl Acad Sci U S A. 2004 Apr 20;101(16):6062-7. Epub 2004 Apr 09).

The following list of tissue abbreviations was used in the TAA histograms. The term "TAA" stands for "Tumor Associated Antigen", and the TAA histograms given in the text represent the cancerous tissue expression pattern as predicted by the biomarker selection engine, as described in detail in Examples 1-5 below (the first word is the abbreviation and the second word is the full name):
("BONE", "bone");
("COL", "colon");
("EPI", "epithelial");
("GEN", "general");
("LIVER", "liver");
("LUN", "lung");
("LYMPH", "lymph nodes");
("MARROW", "bone marrow");
("OVA", "ovary");
("PANCREAS", "pancreas");
("PRO", "prostate");
("STOMACH", "stomach");
("TCELL", "T cells");
("THYROID", "Thyroid");
("MAM", "breast");
("BRAIN", "brain");
("UTERUS", "uterus");
("SKIN", "skin");
("KIDNEY", "kidney");
("MUSCLE", "muscle");
("ADREN", "adrenal");
("HEAD", "head and neck");
("BLADDER", "bladder");

It should be noted that the terms "segment", "seg" and "node" are used interchangeably in reference to nucleic acid sequences of the present invention; they refer to portions of nucleic acid sequences that were shown to have one or more properties as described below. They are also the building blocks that were used to construct complete nucleic acid sequences, as described in greater detail below.
Optionally and preferably, they are examples of oligonucleotides that are embodiments of the present invention, for example as amplicons or hybridization units, and/or as sequences from which primers and/or complementary oligonucleotides may optionally be derived, and/or for any other use.

Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Margham, The Harper Collins Dictionary of Biology (1991). All of these are hereby incorporated by reference as if fully set forth herein. As used herein, the following terms have the meanings ascribed to them unless specified otherwise.

As used herein, the phrase "ovarian cancer" refers to cancers of the ovary including, but not limited to, ovarian epithelial tumors (serous, mucinous, endometrioid, clear cell, and Brenner tumor), ovarian germ-cell tumors (teratoma, dysgerminoma, endodermal sinus tumor, and embryonal carcinoma) and ovarian stromal tumors (originating from granulosa, theca, Sertoli, Leydig, and collagen-producing stromal cells).
The term "marker" in the context of the present invention refers to a nucleic acid fragment, a peptide, or a polypeptide that is differentially present in a sample taken from subjects (patients) having ovarian cancer as compared with a comparable sample taken from subjects who do not have ovarian cancer.

The phrase "differentially present" refers to differences in the quantity of a marker present in a sample taken from patients having ovarian cancer as compared with a comparable sample taken from patients who do not have ovarian cancer. For example, a nucleic acid fragment may optionally be differentially present between the two samples if the amount of the nucleic acid fragment in one sample is significantly different from the amount in the other sample, for example as measured by hybridization and/or NAT-based assays. A polypeptide is differentially present between the two samples if the amount of the polypeptide in one sample is significantly different from the amount in the other sample. It should be noted that a marker that is detectable in one sample but not in the other can be considered to be differentially present.

As used herein, the phrase "diagnostic" means identifying the presence or nature of a pathologic condition. Diagnostic methods differ in their sensitivity and specificity. The "sensitivity" of a diagnostic assay is the percentage of diseased individuals who test positive (percent of "true positives"). Diseased individuals not detected by the assay are "false negatives." Subjects who are not diseased and who test negative in the assay are termed "true negatives." The "specificity" of a diagnostic assay is 1 minus the false positive rate, where the "false positive" rate is defined as the proportion of those without the disease who test positive.
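The definitions of sensitivity and specificity above translate directly into counts of true positives, false negatives, true negatives and false positives. The following Python sketch, with hypothetical names `Confusion` and `tally`, computes both measures as fractions from paired disease-status and test-result observations; it assumes at least one diseased and one non-diseased subject, since otherwise the divisions raise `ZeroDivisionError`.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    tp: int  # diseased, test positive ("true positives")
    fn: int  # diseased, test negative ("false negatives")
    tn: int  # not diseased, test negative ("true negatives")
    fp: int  # not diseased, test positive

    @property
    def sensitivity(self) -> float:
        # Fraction of diseased individuals who test positive.
        return self.tp / (self.tp + self.fn)

    @property
    def false_positive_rate(self) -> float:
        # Proportion of those without the disease who test positive.
        return self.fp / (self.fp + self.tn)

    @property
    def specificity(self) -> float:
        # Specificity is 1 minus the false positive rate.
        return 1.0 - self.false_positive_rate


def tally(diseased: Sequence[bool], positive: Sequence[bool]) -> Confusion:
    """Count outcomes for paired (true disease status, test result) observations."""
    if len(diseased) != len(positive):
        raise ValueError("inputs must have the same length")
    tp = fn = tn = fp = 0
    for d, p in zip(diseased, positive):
        if d and p:
            tp += 1
        elif d:
            fn += 1
        elif p:
            fp += 1
        else:
            tn += 1
    return Confusion(tp=tp, fn=fn, tn=tn, fp=fp)


c = tally([True, True, True, False, False, False, False],
          [True, True, False, False, False, False, True])
print(c, c.sensitivity, c.specificity)
```

Expressing specificity as `1.0 - false_positive_rate` mirrors the wording of the definition; it is numerically the same as `tn / (tn + fp)`.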
While a particular diagnostic method may not provide a definitive diagnosis of a condition, it suffices if the method provides a positive indication that aids in diagnosis.

As used herein, the phrase "diagnosing" refers to classifying a disease or a symptom, determining the severity of the disease, monitoring disease progression, and/or forecasting the outcome of a disease and/or prospects of recovery. The term "detecting" may also optionally encompass any of the above.

Diagnosis of a disease according to the present invention can be effected by determining the level of a polynucleotide or a polypeptide of the present invention in a biological sample obtained from the subject, wherein the level determined can be correlated with predisposition to, or the presence or absence of, the disease. It should be noted that a "biological sample obtained from the subject" may also optionally comprise a sample that has not been physically removed from the subject, as described in greater detail below.

As used herein, the term "level" refers to expression levels of RNA and/or protein or to the DNA copy number of a marker of the present invention.

Typically, the level of the marker in a biological sample obtained from the subject is different (i.e., increased or decreased) from the level of the same variant in a similar sample obtained from a healthy individual (examples of biological samples are described herein).

Numerous well known tissue or fluid collection methods can be used to collect the biological sample from the subject in order to determine the level of DNA, RNA and/or polypeptide of the variant of interest in the subject. Examples include, but are not limited to, fine needle biopsy, needle biopsy, core needle biopsy, surgical biopsy (e.g., brain biopsy), and lavage. Regardless of the procedure employed, once a biopsy or sample is obtained, the level of the variant can be determined and a diagnosis can thus be made.
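As an illustration of comparing the level of a marker in a subject's sample with its level in a similar sample from a healthy individual, the following Python sketch labels the difference as increased, decreased or not different using a fold-change cutoff. The function name `compare_level` and the default two-fold cutoff are assumptions made for illustration; the text does not prescribe a particular threshold or statistic.

```python
import math


def compare_level(subject_level: float, healthy_level: float,
                  fold_cutoff: float = 2.0) -> tuple[float, str]:
    """Return (log2 fold change, call) for a subject level versus a healthy level.

    Both levels must be in the same units (absolute, e.g. micrograms/ml, or a
    relative signal intensity). The default 2.0 cutoff is illustrative only.
    """
    if subject_level <= 0 or healthy_level <= 0:
        raise ValueError("levels must be positive to compute a fold change")
    if fold_cutoff <= 1:
        raise ValueError("fold_cutoff must be greater than 1")
    log2_fc = math.log2(subject_level / healthy_level)
    threshold = math.log2(fold_cutoff)
    if log2_fc >= threshold:
        return log2_fc, "increased"
    if log2_fc <= -threshold:
        return log2_fc, "decreased"
    return log2_fc, "not different"


print(compare_level(8.0, 2.0))
print(compare_level(1.0, 4.0))
```

Working on a log2 scale makes increases and decreases symmetric (a two-fold rise is +1, a two-fold drop is -1). The sketch requires positive levels, so the case of a marker detectable in one sample but not in the other, which the text also treats as differentially present, would need separate handling.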
Determining the level of the same variant in normal tissues of the same origin is preferably carried out alongside, in order to detect elevated expression and/or amplification and/or decreased expression of the variant relative to the normal tissues.

A "test amount" of a marker refers to an amount of a marker in a subject's sample that is consistent with a diagnosis of ovarian cancer. A test amount can be either an absolute amount (e.g., micrograms/ml) or a relative amount (e.g., relative signal intensity).

A "control amount" of a marker can be any amount or range of amounts to be compared against a test amount of a marker. For example, a control amount of a marker can be the amount of the marker in a patient with ovarian cancer or in a person without ovarian cancer. A control amount can be either an absolute amount (e.g., micrograms/ml) or a relative amount (e.g., relative signal intensity).

"Detect" refers to identifying the presence, absence or amount of the object to be detected.

A "label" includes any moiety or item detectable by spectroscopic, photochemical, biochemical, immunochemical, or chemical means. For example, useful labels include 32P, 35S, fluorescent dyes, electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin-streptavidin, digoxigenin, haptens and proteins for which antisera or monoclonal antibodies are available, or nucleic acid molecules with a sequence complementary to a target. The label often generates a measurable signal, such as a radioactive, chromogenic, or fluorescent signal, that can be used to quantify the amount of bound label in a sample. The label can be incorporated in or attached to a primer or probe either covalently or through ionic, van der Waals or hydrogen bonds, e.g., by incorporation of radioactive nucleotides, or of biotinylated nucleotides that are recognized by streptavidin. The label may be directly or indirectly detectable.
Indirect detection can involve the binding of a second label to the first label, directly or indirectly. For example, the label can be the ligand of a binding partner, such as biotin, which is a binding partner for streptavidin, or a nucleotide sequence, which is the binding partner for a complementary sequence to which it can specifically hybridize. The binding partner may itself be directly detectable; for example, an antibody may itself be labeled with a fluorescent molecule. The binding partner may also be indirectly detectable; for example, a nucleic acid having a complementary nucleotide sequence can be part of a branched DNA molecule that is in turn detectable through hybridization with other labeled nucleic acid molecules (see, e.g., P. D. Fahrlander and A. Klausner, Bio/Technology 6:1165 (1988)). Quantitation of the signal is achieved by, e.g., scintillation counting, densitometry, or flow cytometry.

Exemplary detectable labels, optionally and preferably for use with immunoassays, include but are not limited to magnetic beads, fluorescent dyes, radiolabels, enzymes (e.g., horseradish peroxidase, alkaline phosphatase and others commonly used in an ELISA), and colorimetric labels such as colloidal gold or colored glass or plastic beads. Alternatively, the marker in the sample can be detected using an indirect assay, wherein, for example, a second, labeled antibody is used to detect bound marker-specific antibody, and/or in a competition or inhibition assay wherein, for example, a monoclonal antibody that binds to a distinct epitope of the marker is incubated simultaneously with the mixture.

"Immunoassay" is an assay that uses an antibody to specifically bind an antigen. The immunoassay is characterized by the use of the specific binding properties of a particular antibody to isolate, target, and/or quantify the antigen.
The phrase "specifically (or selectively) binds" to an antibody, or "specifically (or selectively) immunoreactive with", when referring to a protein or peptide (or other epitope), refers to a binding reaction that is determinative of the presence of the protein in a heterogeneous population of proteins and other biologics. Thus, under designated immunoassay conditions, the specified antibodies bind to a particular protein with a signal at least two times greater than the background (non-specific signal) and do not bind in a significant amount to other proteins present in the sample. Specific binding to an antibody under such conditions may require an antibody that is selected for its specificity for a particular protein. For example, polyclonal antibodies raised to seminal basic protein from a specific species such as rat, mouse, or human can be selected to obtain only those polyclonal antibodies that are specifically immunoreactive with seminal basic protein and not with other proteins, except for polymorphic variants and alleles of seminal basic protein. This selection may be achieved by subtracting out antibodies that cross-react with seminal basic protein molecules from other species. A variety of immunoassay formats may be used to select antibodies specifically immunoreactive with a particular protein. For example, solid-phase ELISA immunoassays are routinely used to select antibodies specifically immunoreactive with a protein (see, e.g., Harlow & Lane, Antibodies, A Laboratory Manual (1988), for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity). Typically, a specific or selective reaction will be at least twice the background signal or noise, and more typically more than 10 to 100 times background.
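The signal-to-background criteria above (at least twice background, more typically 10 to 100 times background) can be expressed as a simple ratio check. This is a minimal sketch: the function names and the three descriptive categories are hypothetical, and the numbers in the example are illustrative values, not data from this document.

```python
def signal_to_background(specific_signal: float, background: float) -> float:
    """Ratio of the signal from the target protein to the non-specific background."""
    if background <= 0:
        raise ValueError("background must be positive")
    return specific_signal / background


def describe_binding(specific_signal: float, background: float) -> str:
    """Classify a binding reaction using the 2x and 10x background thresholds."""
    ratio = signal_to_background(specific_signal, background)
    if ratio < 2:
        return "not specific (below twice background)"
    if ratio < 10:
        return "specific (at least twice background)"
    return "specific (typical range, 10x background or more)"


print(describe_binding(450.0, 30.0))  # ratio 15
print(describe_binding(50.0, 30.0))   # ratio about 1.7
```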
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
H61775_T21
H61775_T22

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
H61775_node_2
H61775_node_4
H61775_node_6
H61775_node_8
H61775_node_0
H61775_node_5

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
H61775_P16
H61775_P17

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
HUMCEA_PEA_1_T8
HUMCEA_PEA_1_T9
HUMCEA_PEA_1_T20
HUMCEA_PEA_1_T25
HUMCEA_PEA_1_T26

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
HUMCEA_PEA_1_node_0
HUMCEA_PEA_1_node_2
HUMCEA_PEA_1_node_11
HUMCEA_PEA_1_node_12
HUMCEA_PEA_1_node_31
HUMCEA_PEA_1_node_36
HUMCEA_PEA_1_node_44
HUMCEA_PEA_1_node_46
HUMCEA_PEA_1_node_63
HUMCEA_PEA_1_node_65
HUMCEA_PEA_1_node_67
HUMCEA_PEA_1_node_3
HUMCEA_PEA_1_node_7
HUMCEA_PEA_1_node_8
HUMCEA_PEA_1_node_9
HUMCEA_PEA_1_node_10
HUMCEA_PEA_1_node_15
HUMCEA_PEA_1_node_16
HUMCEA_PEA_1_node_17
HUMCEA_PEA_1_node_18
HUMCEA_PEA_1_node_19
HUMCEA_PEA_1_node_20
HUMCEA_PEA_1_node_21
HUMCEA_PEA_1_node_22
HUMCEA_PEA_1_node_23
HUMCEA_PEA_1_node_24
HUMCEA_PEA_1_node_27
HUMCEA_PEA_1_node_29
HUMCEA_PEA_1_node_30
HUMCEA_PEA_1_node_33
HUMCEA_PEA_1_node_34
HUMCEA_PEA_1_node_35
HUMCEA_PEA_1_node_45
HUMCEA_PEA_1_node_50
HUMCEA_PEA_1_node_51
HUMCEA_PEA_1_node_56
HUMCEA_PEA_1_node_57
HUMCEA_PEA_1_node_58
HUMCEA_PEA_1_node_60
HUMCEA_PEA_1_node_61
HUMCEA_PEA_1_node_62
HUMCEA_PEA_1_node_64

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name          Corresponding Transcript(s)
HUMCEA_PEA_1_P4       HUMCEA_PEA_1_T8
HUMCEA_PEA_1_P5       HUMCEA_PEA_1_T9
HUMCEA_PEA_1_P14      HUMCEA_PEA_1_T20
HUMCEA_PEA_1_P19      HUMCEA_PEA_1_T25
HUMCEA_PEA_1_P20      HUMCEA_PEA_1_T26

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
HUMEDF_PEA_2_T5
HUMEDF_PEA_2_T10
HUMEDF_PEA_2_T11

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
HUMEDF_PEA_2_node_6
HUMEDF_PEA_2_node_11
HUMEDF_PEA_2_node_18
HUMEDF_PEA_2_node_19
HUMEDF_PEA_2_node_22
HUMEDF_PEA_2_node_2
HUMEDF_PEA_2_node_8
HUMEDF_PEA_2_node_20

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name          Corresponding Transcript(s)
HUMEDF_PEA_2_P5       HUMEDF_PEA_2_T10
HUMEDF_PEA_2_P6       HUMEDF_PEA_2_T11
HUMEDF_PEA_2_P8       HUMEDF_PEA_2_T5

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
HSAPHOL_T10
HSAPHOL_T4
HSAPHOL_T5
HSAPHOL_T6
HSAPHOL_T7
HSAPHOL_T8
HSAPHOL_T9

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
HSAPHOL_node_11
HSAPHOL_node_13
HSAPHOL_node_15
HSAPHOL_node_19
HSAPHOL_node_2
HSAPHOL_node_21
HSAPHOL_node_23
HSAPHOL_node_26
HSAPHOL_node_28
HSAPHOL_node_38
HSAPHOL_node_40
HSAPHOL_node_42
HSAPHOL_node_16
HSAPHOL_node_25
HSAPHOL_node_34
HSAPHOL_node_35
HSAPHOL_node_36
HSAPHOL_node_41

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
HSAPHOL_P2
HSAPHOL_P3
HSAPHOL_P4
HSAPHOL_P5
HSAPHOL_P6
HSAPHOL_P7
HSAPHOL_P8

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
T10888_PEA_1_T1
T10888_PEA_1_T4
T10888_PEA_1_T5
T10888_PEA_1_T6

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
T10888_PEA_1_node_11
T10888_PEA_1_node_12
T10888_PEA_1_node_17
T10888_PEA_1_node_4
T10888_PEA_1_node_6
T10888_PEA_1_node_7
T10888_PEA_1_node_9
T10888_PEA_1_node_15

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
T10888_PEA_1_P2
T10888_PEA_1_P4
T10888_PEA_1_P5
T10888_PEA_1_P6

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
HSECADH_T11
HSECADH_T18
HSECADH_T19
HSECADH_T20

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
HSECADH_node_2
HSECADH_node_14
HSECADH_node_15
HSECADH_node_21
HSECADH_node_22
HSECADH_node_25
HSECADH_node_26
HSECADH_node_48
HSECADH_node_52
HSECADH_node_53
HSECADH_node_54
HSECADH_node_57
HSECADH_node_60
HSECADH_node_62
HSECADH_node_63
HSECADH_node_7
HSECADH_node_1
HSECADH_node_11
HSECADH_node_12
HSECADH_node_17
HSECADH_node_18
HSECADH_node_19
HSECADH_node_3
HSECADH_node_42
HSECADH_node_45
HSECADH_node_46
HSECADH_node_55
HSECADH_node_56
HSECADH_node_58
HSECADH_node_59

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
HSECADH_P9
HSECADH_P13
HSECADH_P14
HSECADH_P15

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
HUMGRP5E_T4
HUMGRP5E_T5

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
HUMGRP5E_node_0
HUMGRP5E_node_2
HUMGRP5E_node_8
HUMGRP5E_node_3
HUMGRP5E_node_7

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
HUMGRP5E_P4
HUMGRP5E_P5

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
R11723_PEA_1_T15
R11723_PEA_1_T17
R11723_PEA_1_T19
R11723_PEA_1_T20
R11723_PEA_1_T5
R11723_PEA_1_T6

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
R11723_PEA_1_node_13
R11723_PEA_1_node_16
R11723_PEA_1_node_19
R11723_PEA_1_node_2
R11723_PEA_1_node_22
R11723_PEA_1_node_31
R11723_PEA_1_node_10
R11723_PEA_1_node_11
R11723_PEA_1_node_15
R11723_PEA_1_node_18
R11723_PEA_1_node_20
R11723_PEA_1_node_21
R11723_PEA_1_node_23
R11723_PEA_1_node_24
R11723_PEA_1_node_25
R11723_PEA_1_node_26
R11723_PEA_1_node_27
R11723_PEA_1_node_28
R11723_PEA_1_node_29
R11723_PEA_1_node_3
R11723_PEA_1_node_30
R11723_PEA_1_node_4
R11723_PEA_1_node_5
R11723_PEA_1_node_6
R11723_PEA_1_node_7
R11723_PEA_1_node_8

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
R11723_PEA_1_P2
R11723_PEA_1_P6
R11723_PEA_1_P7
R11723_PEA_1_P13
R11723_PEA_1_P10

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
D56406_PEA_1_T3
D56406_PEA_1_T6
D56406_PEA_1_T7

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
D56406_PEA_1_node_0
D56406_PEA_1_node_13
D56406_PEA_1_node_11
D56406_PEA_1_node_2
D56406_PEA_1_node_3
D56406_PEA_1_node_5
D56406_PEA_1_node_6
D56406_PEA_1_node_7
D56406_PEA_1_node_8
D56406_PEA_1_node_9

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
D56406_PEA_1_P2
D56406_PEA_1_P5
D56406_PEA_1_P6

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
H53393_PEA_1_T10
H53393_PEA_1_T11
H53393_PEA_1_T3
H53393_PEA_1_T9

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
H53393_PEA_1_node_0
H53393_PEA_1_node_10
H53393_PEA_1_node_12
H53393_PEA_1_node_13
H53393_PEA_1_node_15
H53393_PEA_1_node_17
H53393_PEA_1_node_19
H53393_PEA_1_node_23
H53393_PEA_1_node_24
H53393_PEA_1_node_25
H53393_PEA_1_node_29
H53393_PEA_1_node_4
H53393_PEA_1_node_6
H53393_PEA_1_node_8
H53393_PEA_1_node_21
H53393_PEA_1_node_22

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
H53393_PEA_1_P2
H53393_PEA_1_P3
H53393_PEA_1_P6

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
HSU40434_PEA_1_T13

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
HSU40434_PEA_1_node_1
HSU40434_PEA_1_node_3016
HSU40434_PEA_1_node_30
HSU40434_PEA_1_node_3257
HSU40434_PEA_1_node_57
HSU40434_PEA_1_node_10
HSU40434_PEA_1_node_10
HSU40434_PEA_1_node_13
HSU40434_PEA_1_node_218
HSU40434_PEA_1_node_2
HSU40434_PEA_1_node_20
HSU40434_PEA_1_node_21
HSU40434_PEA_1_node_23
HSU40434_PEA_1_node_24
HSU40434_PEA_1_node_26
HSU40434_PEA_1_node_328
HSU40434_PEA_1_node_3
HSU40434_PEA_1_node_35
HSU40434_PEA_1_node_36
HSU40434_PEA_1_node_37
HSU40434_PEA_1_node_38
HSU40434_PEA_1_node_39
HSU40434_PEA_1_node_40
HSU40434_PEA_1_node_41
HSU40434_PEA_1_node_42
HSU40434_PEA_1_node_43
HSU40434_PEA_1_node_44
HSU40434_PEA_1_node_47
HSU40434_PEA_1_node_48
HSU40434_PEA_1_node_51
HSU40434_PEA_1_node_52
HSU40434_PEA_1_node_53
HSU40434_PEA_1_node_54
HSU40434_PEA_1_node_56
HSU40434_PEA_1_node_7
HSU40434_PEA_1_node_8

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
HSU40434_PEA_1_P12

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
M77904_T11
M77904_T3
M77904_T8
M77904_T9

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
M77904_node_0
M77904_node_11
M77904_node_12
M77904_node_14
M77904_node_15
M77904_node_17
M77904_node_2
M77904_node_21
M77904_node_23
M77904_node_24
M77904_node_27
M77904_node_28
M77904_node_4
M77904_node_6
M77904_node_7
M77904_node_8
M77904_node_9
M77904_node_19
M77904_node_22
M77904_node_25
M77904_node_26

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
M77904_P2
M77904_P4
M77904_P5
M77904_P7

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
Z25299_PEA_2_T1
Z25299_PEA_2_T2
Z25299_PEA_2_T3
Z25299_PEA_2_T6
Z25299_PEA_2_T9

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
Z25299_PEA_2_node_20
Z25299_PEA_2_node_21
Z25299_PEA_2_node_23
Z25299_PEA_2_node_24
Z25299_PEA_2_node_8
Z25299_PEA_2_node_12
Z25299_PEA_2_node_13
Z25299_PEA_2_node_14
Z25299_PEA_2_node_17
Z25299_PEA_2_node_18
Z25299_PEA_2_node_19

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
Z25299_PEA_2_P2
Z25299_PEA_2_P3
Z25299_PEA_2_P7
Z25299_PEA_2_P10

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
T39971_T10
T39971_T12
T39971_T16
T39971_T5

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
T39971_node_0
T39971_node_18
T39971_node_21
T39971_node_22
T39971_node_23
T39971_node_31
T39971_node_33
T39971_node_7
T39971_node_1
T39971_node_10
T39971_node_11
T39971_node_12
T39971_node_15
T39971_node_16
T39971_node_17
T39971_node_26
T39971_node_27
T39971_node_28
T39971_node_29
T39971_node_3
T39971_node_30
T39971_node_34
T39971_node_35
T39971_node_36
T39971_node_4
T39971_node_5
T39971_node_8
T39971_node_9

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
T39971_P6
T39971_P9
T39971_P11
T39971_P12

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
Z44808_PEA_1_T1
Z44808_PEA_1_T4
Z44808_PEA_1_T5
Z44808_PEA_1_T8
Z44808_PEA_1_T9

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
Z44808_PEA_1_node_0
Z44808_PEA_1_node_16
Z44808_PEA_1_node_2
Z44808_PEA_1_node_24
Z44808_PEA_1_node_32
Z44808_PEA_1_node_33
Z44808_PEA_1_node_36
Z44808_PEA_1_node_37
Z44808_PEA_1_node_41
Z44808_PEA_1_node_11
Z44808_PEA_1_node_13
Z44808_PEA_1_node_18
Z44808_PEA_1_node_22
Z44808_PEA_1_node_26
Z44808_PEA_1_node_30
Z44808_PEA_1_node_34
Z44808_PEA_1_node_35
Z44808_PEA_1_node_39
Z44808_PEA_1_node_4
Z44808_PEA_1_node_6
Z44808_PEA_1_node_8

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
Z44808_PEA_1_P5
Z44808_PEA_1_P6
Z44808_PEA_1_P7
Z44808_PEA_1_P11

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
S67314_PEA_1_T4
S67314_PEA_1_T5
S67314_PEA_1_T6
S67314_PEA_1_T7

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
S67314_PEA_1_node_0
S67314_PEA_1_node_11
S67314_PEA_1_node_13
S67314_PEA_1_node_15
S67314_PEA_1_node_17
S67314_PEA_1_node_4
S67314_PEA_1_node_10
S67314_PEA_1_node_3

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
S67314_PEA_1_P4
S67314_PEA_1_P5
S67314_PEA_1_P6
S67314_PEA_1_P7

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
Z39337_PEA_2_PEA_1_T3
Z39337_PEA_2_PEA_1_T6
Z39337_PEA_2_PEA_1_T12

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
Z39337_PEA_2_PEA_1_node_2
Z39337_PEA_2_PEA_1_node_15
Z39337_PEA_2_PEA_1_node_16
Z39337_PEA_2_PEA_1_node_18
Z39337_PEA_2_PEA_1_node_21
Z39337_PEA_2_PEA_1_node_22
Z39337_PEA_2_PEA_1_node_3
Z39337_PEA_2_PEA_1_node_5
Z39337_PEA_2_PEA_1_node_6
Z39337_PEA_2_PEA_1_node_10
Z39337_PEA_2_PEA_1_node_11
Z39337_PEA_2_PEA_1_node_14

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
Z39337_PEA_2_PEA_1_P4
Z39337_PEA_2_PEA_1_P9
Z39337_PEA_2_PEA_1_P13

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
HUMPHOSLIP_PEA_2_T6
HUMPHOSLIP_PEA_2_T7
HUMPHOSLIP_PEA_2_T14
HUMPHOSLIP_PEA_2_T16
HUMPHOSLIP_PEA_2_T17
HUMPHOSLIP_PEA_2_T18
HUMPHOSLIP_PEA_2_T19

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
HUMPHOSLIP_PEA_2_node_0
HUMPHOSLIP_PEA_2_node_19
HUMPHOSLIP_PEA_2_node_34
HUMPHOSLIP_PEA_2_node_68
HUMPHOSLIP_PEA_2_node_70
HUMPHOSLIP_PEA_2_node_75
HUMPHOSLIP_PEA_2_node_2
HUMPHOSLIP_PEA_2_node_3
HUMPHOSLIP_PEA_2_node_4
HUMPHOSLIP_PEA_2_node_6
HUMPHOSLIP_PEA_2_node_7
HUMPHOSLIP_PEA_2_node_8
HUMPHOSLIP_PEA_2_node_9
HUMPHOSLIP_PEA_2_node_14
HUMPHOSLIP_PEA_2_node_15
HUMPHOSLIP_PEA_2_node_16
HUMPHOSLIP_PEA_2_node_17
HUMPHOSLIP_PEA_2_node_23
HUMPHOSLIP_PEA_2_node_24
HUMPHOSLIP_PEA_2_node_25
HUMPHOSLIP_PEA_2_node_26
HUMPHOSLIP_PEA_2_node_29
HUMPHOSLIP_PEA_2_node_30
HUMPHOSLIP_PEA_2_node_33
HUMPHOSLIP_PEA_2_node_36
HUMPHOSLIP_PEA_2_node_37
HUMPHOSLIP_PEA_2_node_39
HUMPHOSLIP_PEA_2_node_40
HUMPHOSLIP_PEA_2_node_41
HUMPHOSLIP_PEA_2_node_42
HUMPHOSLIP_PEA_2_node_44
HUMPHOSLIP_PEA_2_node_45
HUMPHOSLIP_PEA_2_node_47
HUMPHOSLIP_PEA_2_node_51
HUMPHOSLIP_PEA_2_node_52
HUMPHOSLIP_PEA_2_node_53
HUMPHOSLIP_PEA_2_node_54
HUMPHOSLIP_PEA_2_node_55
HUMPHOSLIP_PEA_2_node_58
HUMPHOSLIP_PEA_2_node_59
HUMPHOSLIP_PEA_2_node_60
HUMPHOSLIP_PEA_2_node_61
HUMPHOSLIP_PEA_2_node_62
HUMPHOSLIP_PEA_2_node_63
HUMPHOSLIP_PEA_2_node_64
HUMPHOSLIP_PEA_2_node_65
HUMPHOSLIP_PEA_2_node_66
HUMPHOSLIP_PEA_2_node_67
HUMPHOSLIP_PEA_2_node_69
HUMPHOSLIP_PEA_2_node_71
HUMPHOSLIP_PEA_2_node_72
HUMPHOSLIP_PEA_2_node_73
HUMPHOSLIP_PEA_2_node_74

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
HUMPHOSLIP_PEA_2_P10
HUMPHOSLIP_PEA_2_P12
HUMPHOSLIP_PEA_2_P30
HUMPHOSLIP_PEA_2_P31
HUMPHOSLIP_PEA_2_P33
HUMPHOSLIP_PEA_2_P34
HUMPHOSLIP_PEA_2_P35

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
T59832_T6
T59832_T8
T59832_T11
T59832_T15
T59832_T22

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
T59832_node_2
T59832_node_9
T59832_node_29
T59832_node_3
T59832_node_4
T59832_node_5
T59832_node_6
T59832_node_8
T59832_node_9
T59832_node_10
T59832_node_11
T59832_node_12
T59832_node_14
T59832_node_16
T59832_node_19
T59832_node_20
T59832_node_25
T59832_node_26
T59832_node_27
T59832_node_28
T59832_node_30
T59832_node_31
T59832_node_32
T59832_node_34
T59832_node_35
T59832_node_36
T59832_node_37
T59832_node_38

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
T59832_P5
T59832_P7
T59832_P9
T59832_P12
T59832_P18

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
HSCP2_PEA_1_T4
HSCP2_PEA_1_T13
HSCP2_PEA_1_T19
HSCP2_PEA_1_T20
HSCP2_PEA_1_T22
HSCP2_PEA_1_T23
HSCP2_PEA_1_T25
HSCP2_PEA_1_T31
HSCP2_PEA_1_T33
HSCP2_PEA_1_T34
HSCP2_PEA_1_T45
HSCP2_PEA_1_T50

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
HSCP2_PEA_1_node_0
HSCP2_PEA_1_node_13
HSCP2_PEA_1_node_6
HSCP2_PEA_1_node_8
HSCP2_PEA_1_node_1026
HSCP2_PEA_1_node_14
HSCP2_PEA_1_node_23
HSCP2_PEA_1_node_26
HSCP2_PEA_1_node_29
HSCP2_PEA_1_node_31
HSCP2_PEA_1_node_32
HSCP2_PEA_1_node_34
HSCP2_PEA_1_node_52
HSCP2_PEA_1_node_58
HSCP2_PEA_1_node_72
HSCP2_PEA_1_node_73
HSCP2_PEA_1_node_74
HSCP2_PEA_1_node_76
HSCP2_PEA_1_node_78
HSCP2_PEA_1_node_80
HSCP2_PEA_1_node_84
HSCP2_PEA_1_node_4
HSCP2_PEA_1_node_7
HSCP2_PEA_1_node_13
HSCP2_PEA_1_node_15
HSCP2_PEA_1_node_16
HSCP2_PEA_1_node_18
HSCP2_PEA_1_node_20
HSCP2_PEA_1_node_21
HSCP2_PEA_1_node_37
HSCP2_PEA_1_node_38
HSCP2_PEA_1_node_39
HSCP2_PEA_1_node_41
HSCP2_PEA_1_node_42
HSCP2_PEA_1_node_46
HSCP2_PEA_1_node_47
HSCP2_PEA_1_node_50
HSCP2_PEA_1_node_51
HSCP2_PEA_1_node_55
HSCP2_PEA_1_node_56
HSCP2_PEA_1_node_60
HSCP2_PEA_1_node_61
HSCP2_PEA_1_node_67
HSCP2_PEA_1_node_68
HSCP2_PEA_1_node_69
HSCP2_PEA_1_node_70
HSCP2_PEA_1_node_75
HSCP2_PEA_1_node_77
HSCP2_PEA_1_node_79
HSCP2_PEA_1_node_82

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
HSCP2_PEA_1_P4
HSCP2_PEA_1_P8
HSCP2_PEA_1_P14
HSCP2_PEA_1_P25
HSCP2_PEA_1_P2
HSCP2_PEA_1_P16
HSCP2_PEA_1_P6
HSCP2_PEA_1_P22
HSCP2_PEA_1_P24
HSCP2_PEA_1_P25
HSCP2_PEA_1_P33

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
HUMTEN_PEA_1_T4
HUMTEN_PEA_1_T5
HUMTEN_PEA_1_T6
HUMTEN_PEA_1_T7
HUMTEN_PEA_1_T11
HUMTEN_PEA_1_T14
HUMTEN_PEA_1_T16
HUMTEN_PEA_1_T17
HUMTEN_PEA_1_T18
HUMTEN_PEA_1_T19
HUMTEN_PEA_1_T20
HUMTEN_PEA_1_T23
HUMTEN_PEA_1_T32
HUMTEN_PEA_1_T35
HUMTEN_PEA_1_T36
HUMTEN_PEA_1_T37
HUMTEN_PEA_1_T39
HUMTEN_PEA_1_T40
HUMTEN_PEA_1_T41

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
HUMTEN_PEA_1_node_0
HUMTEN_PEA_1_node_2
HUMTEN_PEA_1_node_5
HUMTEN_PEA_1_node_2
HUMTEN_PEA_1_node_6
HUMTEN_PEA_1_node_11
HUMTEN_PEA_1_node_12
HUMTEN_PEA_1_node_16
HUMTEN_PEA_1_node_19
HUMTEN_PEA_1_node_23
HUMTEN_PEA_1_node_27
HUMTEN_PEA_1_node_28
HUMTEN_PEA_1_node_30
HUMTEN_PEA_1_node_32
HUMTEN_PEA_1_node_33
HUMTEN_PEA_1_node_35
HUMTEN_PEA_1_node_38
HUMTEN_PEA_1_node_40
HUMTEN_PEA_1_node_42
HUMTEN_PEA_1_node_43
HUMTEN_PEA_1_node_44
HUMTEN_PEA_1_node_45
HUMTEN_PEA_1_node_46
HUMTEN_PEA_1_node_47
HUMTEN_PEA_1_node_49
HUMTEN_PEA_1_node_51
HUMTEN_PEA_1_node_56
HUMTEN_PEA_1_node_65
HUMTEN_PEA_1_node_71
HUMTEN_PEA_1_node_73
HUMTEN_PEA_1_node_76
HUMTEN_PEA_1_node_79
HUMTEN_PEA_1_node_83
HUMTEN_PEA_1_node_89
HUMTEN_PEA_1_node_7
HUMTEN_PEA_1_node_8
HUMTEN_PEA_1_node_9
HUMTEN_PEA_1_node_14
HUMTEN_PEA_1_node_17
HUMTEN_PEA_1_node_21
HUMTEN_PEA_1_node_22
HUMTEN_PEA_1_node_25
HUMTEN_PEA_1_node_36
HUMTEN_PEA_1_node_53
HUMTEN_PEA_1_node_54
HUMTEN_PEA_1_node_57
HUMTEN_PEA_1_node_61
HUMTEN_PEA_1_node_62
HUMTEN_PEA_1_node_67
HUMTEN_PEA_1_node_68
HUMTEN_PEA_1_node_69
HUMTEN_PEA_1_node_70
HUMTEN_PEA_1_node_72
HUMTEN_PEA_1_node_84
HUMTEN_PEA_1_node_85
HUMTEN_PEA_1_node_86
HUMTEN_PEA_1_node_87
HUMTEN_PEA_1_node_88

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
HUMTEN_PEA_1_P5
HUMTEN_PEA_1_P6
HUMTEN_PEA_1_P7
HUMTEN_PEA_1_P8
HUMTEN_PEA_1_P10
HUMTEN_PEA_1_P11
HUMTEN_PEA_1_P13
HUMTEN_PEA_1_P14
HUMTEN_PEA_1_P15
HUMTEN_PEA_1_P16
HUMTEN_PEA_1_P17
HUMTEN_PEA_1_P20
HUMTEN_PEA_1_P26
HUMTEN_PEA_1_P27
HUMTEN_PEA_1_P28
HUMTEN_PEA_1_P29
HUMTEN_PEA_1_P30
HUMTEN_PEA_1_P31
HUMTEN_PEA_1_P32

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
HUMOSTRO_PEA_1_PEA_1_T14
HUMOSTRO_PEA_1_PEA_1_T16
HUMOSTRO_PEA_1_PEA_1_T30

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
HUMOSTRO_PEA_1_PEA_1_node_0
HUMOSTRO_PEA_1_PEA_1_node_10
HUMOSTRO_PEA_1_PEA_1_node_16
HUMOSTRO_PEA_1_PEA_1_node_23
HUMOSTRO_PEA_1_PEA_1_node_31
HUMOSTRO_PEA_1_PEA_1_node_43
HUMOSTRO_PEA_1_PEA_1_node_3
HUMOSTRO_PEA_1_PEA_1_node_5
HUMOSTRO_PEA_1_PEA_1_node_7
HUMOSTRO_PEA_1_PEA_1_node_8
HUMOSTRO_PEA_1_PEA_1_node_15
HUMOSTRO_PEA_1_PEA_1_node_17
HUMOSTRO_PEA_1_PEA_1_node_20
HUMOSTRO_PEA_1_PEA_1_node_21
HUMOSTRO_PEA_1_PEA_1_node_22
HUMOSTRO_PEA_1_PEA_1_node_24
HUMOSTRO_PEA_1_PEA_1_node_26
HUMOSTRO_PEA_1_PEA_1_node_27
HUMOSTRO_PEA_1_PEA_1_node_28
HUMOSTRO_PEA_1_PEA_1_node_29
HUMOSTRO_PEA_1_PEA_1_node_30
HUMOSTRO_PEA_1_PEA_1_node_32
HUMOSTRO_PEA_1_PEA_1_node_34
HUMOSTRO_PEA_1_PEA_1_node_36
HUMOSTRO_PEA_1_PEA_1_node_37
HUMOSTRO_PEA_1_PEA_1_node_38
HUMOSTRO_PEA_1_PEA_1_node_39
HUMOSTRO_PEA_1_PEA_1_node_40
HUMOSTRO_PEA_1_PEA_1_node_41
HUMOSTRO_PEA_1_PEA_1_node_42

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
HUMOSTRO_PEA_1_PEA_1_P21
HUMOSTRO_PEA_1_PEA_1_P25
HUMOSTRO_PEA_1_PEA_1_P30

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
T46984_PEA_1_T2
T46984_PEA_1_T3
T46984_PEA_1_T12
T46984_PEA_1_T13
T46984_PEA_1_T14
T46984_PEA_1_T15
T46984_PEA_1_T19
T46984_PEA_1_T23
T46984_PEA_1_T27
T46984_PEA_1_T32
T46984_PEA_1_T34
T46984_PEA_1_T35
T46984_PEA_1_T40
T46984_PEA_1_T42
T46984_PEA_1_T43
T46984_PEA_1_T46
T46984_PEA_1_T47
T46984_PEA_1_T48
T46984_PEA_1_T51
T46984_PEA_1_T52
T46984_PEA_1_T54

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
T46984_PEA_1_node_2
T46984_PEA_1_node_4
T46984_PEA_1_node_6
T46984_PEA_1_node_12
T46984_PEA_1_node_14
T46984_PEA_1_node_25
T46984_PEA_1_node_29
T46984_PEA_1_node_34
T46984_PEA_1_node_46
T46984_PEA_1_node_47
T46984_PEA_1_node_52
T46984_PEA_1_node_65
T46984_PEA_1_node_69
T46984_PEA_1_node_75
T46984_PEA_1_node_86
T46984_PEA_1_node_9
T46984_PEA_1_node_13
T46984_PEA_1_node_19
T46984_PEA_1_node_21
T46984_PEA_1_node_22
T46984_PEA_1_node_26
T46984_PEA_1_node_28
T46984_PEA_1_node_31
T46984_PEA_1_node_32
T46984_PEA_1_node_38
T46984_PEA_1_node_39
T46984_PEA_1_node_40
T46984_PEA_1_node_42
T46984_PEA_1_node_43
T46984_PEA_1_node_48
T46984_PEA_1_node_49
T46984_PEA_1_node_50
T46984_PEA_1_node_51
T46984_PEA_1_node_53
T46984_PEA_1_node_54
T46984_PEA_1_node_55
T46984_PEA_1_node_57
T46984_PEA_1_node_60
T46984_PEA_1_node_62
T46984_PEA_1_node_66
T46984_PEA_1_node_67
T46984_PEA_1_node_70
T46984_PEA_1_node_71
T46984_PEA_1_node_72
T46984_PEA_1_node_73
T46984_PEA_1_node_74
T46984_PEA_1_node_83
T46984_PEA_1_node_84
T46984_PEA_1_node_85

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
T46984_PEA_1_P2
T46984_PEA_1_P3
T46984_PEA_1_P20
T46984_PEA_1_P32
T46984_PEA_1_P32
T46984_PEA_1_P21
T46984_PEA_1_P27
T46984_PEA_1_P32
T46984_PEA_1_P34
T46984_PEA_1_P38
T46984_PEA_1_P39
T46984_PEA_1_P3945
T46984_PEA_1_P45
T46984_PEA_1_P46

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
M78530_PEA_1_T11
M78530_PEA_1_T12
M78530_PEA_1_T13

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
M78530_PEA_1_node_0
M78530_PEA_1_node_15
M78530_PEA_1_node_16
M78530_PEA_1_node_19
M78530_PEA_1_node_21
M78530_PEA_1_node_23
M78530_PEA_1_node_27
M78530_PEA_1_node_29
M78530_PEA_1_node_36
M78530_PEA_1_node_37
M78530_PEA_1_node_2
M78530_PEA_1_node_4
M78530_PEA_1_node_5
M78530_PEA_1_node_7
M78530_PEA_1_node_9
M78530_PEA_1_node_10
M78530_PEA_1_node_18
M78530_PEA_1_node_25
M78530_PEA_1_node_30
M78530_PEA_1_node_33
M78530_PEA_1_node_34

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
M78530_PEA_1_P15
M78530_PEA_1_P16
M78530_PEA_1_P17

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
T48119_T2

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
T48119_node_0
T48119_node_11
T48119_node_13
T48119_node_38
T48119_node_41
T48119_node_45
T48119_node_47
T48119_node_4
T48119_node_8
T48119_node_15
T48119_node_17
T48119_node_20
T48119_node_22
T48119_node_26
T48119_node_28
T48119_node_31
T48119_node_32
T48119_node_33
T48119_node_44

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
T48119_P2

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below:

Transcript Name
HSMUC1A_PEA_1_T12
HSMUC1A_PEA_1_T26
HSMUC1A_PEA_1_T28
HSMUC1A_PEA_1_T29
HSMUC1A_PEA_1_T30
HSMUC1A_PEA_1_T31
HSMUC1A_PEA_1_T33
HSMUC1A_PEA_1_T34
HSMUC1A_PEA_1_T35
HSMUC1A_PEA_1_T36
HSMUC1A_PEA_1_T40
HSMUC1A_PEA_1_T42
HSMUC1A_PEA_1_T43
HSMUC1A_PEA_1_T47

and/or a nucleic acid sequence comprising a sequence in the table below:

Segment Name
HSMUC1A_PEA_1_node_0
HSMUC1A_PEA_1_node_14
HSMUC1A_PEA_1_node_24
HSMUC1A_PEA_1_node_2938
HSMUC1A_PEA_1_node_3
HSMUC1A_PEA_1_node_38
HSMUC1A_PEA_1_node_3
HSMUC1A_PEA_1_node_4
HSMUC1A_PEA_1_node_5
HSMUC1A_PEA_1_node_6
HSMUC1A_PEA_1_node_7
HSMUC1A_PEA_1_node_17
HSMUC1A_PEA_1_node_18
HSMUC1A_PEA_1_node_20
HSMUC1A_PEA_1_node_21
HSMUC1A_PEA_1_node_23
HSMUC1A_PEA_1_node_26
HSMUC1A_PEA_1_node_27
HSMUC1A_PEA_1_node_31
HSMUC1A_PEA_1_node_34
HSMUC1A_PEA_1_node_36
HSMUC1A_PEA_1_node_37

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:

Protein Name
HSMUC1A_PEA_1_P25
HSMUC1A_PEA_1_P29
HSMUC1A_PEA_1_P30
HSMUC1A_PEA_1_P32
HSMUC1A_PEA_1_P36
HSMUC1A_PEA_1_P39
HSMUC1A_PEA_1_P45
HSMUC1A_PEA_1_P49
HSMUC1A_PEA_1_P52
HSMUC1A_PEA_1_P53
HSMUC1A_PEA_1_P56
HSMUC1A_PEA_1_P58
HSMUC1A_PEA_1_P59
HSMUC1A_PEA_1_P63

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSMUC1A_PEA_1_P63, comprising a first amino acid sequence being at least 90% homologous to MTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSV, corresponding to amino acids 1-45 of MUC1_HUMAN, which also corresponds to amino acids 1-45 of HSMUC1A_PEA_1_P63, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EEEVSADQVSVGASGVLGSFKEARNAPSFLSWSFSMGPSK, corresponding to amino acids 46-85 of HSMUC1A_PEA_1_P63, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSMUC1A_PEA_1_P63, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EEEVSADQVSVGASGVLGSFKEARNAPSFLSWSFSMGPSK in HSMUC1A_PEA_1_P63.
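The structure of the HSMUC1A_PEA_1_P63 chimera (a 45-residue head at least 90% homologous to amino acids 1-45 of MUC1_HUMAN, contiguous with a 40-residue tail at least 70% homologous to the stated unique sequence) can be illustrated with a simple check. This sketch assumes that "homologous" can be approximated by ungapped percent identity over equal-length segments; homology is commonly assessed with alignment software that allows gaps, so this is a simplification, and the function names are hypothetical.

```python
# Reference segments quoted in the text above.
HEAD_REF = "MTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSV"  # aa 1-45 of MUC1_HUMAN
TAIL_REF = "EEEVSADQVSVGASGVLGSFKEARNAPSFLSWSFSMGPSK"  # aa 46-85 of HSMUC1A_PEA_1_P63


def percent_identity(query: str, reference: str) -> float:
    """Ungapped percent identity between two equal-length sequences (simplifying assumption)."""
    if len(query) != len(reference):
        raise ValueError("sequences must have equal length for ungapped comparison")
    matches = sum(q == r for q, r in zip(query, reference))
    return 100.0 * matches / len(reference)


def matches_p63_chimera(candidate: str, head_min: float = 90.0,
                        tail_min: float = 70.0) -> bool:
    """Check head/tail thresholds; tail_min may be raised to 80, 85, 90 or 95."""
    n_head = len(HEAD_REF)
    if len(candidate) != n_head + len(TAIL_REF):
        return False
    # Contiguous and in sequential order: head first, tail immediately after it.
    head, tail = candidate[:n_head], candidate[n_head:]
    return (percent_identity(head, HEAD_REF) >= head_min
            and percent_identity(tail, TAIL_REF) >= tail_min)


print(matches_p63_chimera(HEAD_REF + TAIL_REF))  # True: identical to the stated sequence
print(matches_p63_chimera("AAAAA" + HEAD_REF[5:] + TAIL_REF))  # False: head 40/45 (88.9%)
```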
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFALFFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFVAEPDNKNVYKFELDTSERKIEFDSASGTYTLYLIIGDATLKNPILWNV corresponding to amino acids 1 - 498 of RIB2_HUMAN, which also corresponds to amino acids 1 - 498 of T46984_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VCA corresponding to amino acids 499 - 501 of T46984_PEA_1_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFALFFQLVDVNTGAELTPHQ corresponding to amino acids 1 - 433 of RIB2_HUMAN, which also corresponds to amino acids 1 - 433 of T46984_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ICHIWKLIFLP corresponding to amino acids 434 - 444 of T46984_PEA_1_P3, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ICHIWKLIFLP in T46984_PEA_1_P3.
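The tiered thresholds ("at least 70%, optionally at least about 80%, ... most preferably at least about 95%") can be illustrated with a short Python sketch. The homology measure actually intended by the patent is not stated in this excerpt; the sketch assumes, purely for illustration, plain ungapped position-by-position identity between equal-length sequences, which is a simplification of alignment-based tools such as BLAST. The function names and tier labels are hypothetical.

```python
from typing import Optional, Tuple

# Tiers taken from the claim wording; the labels are illustrative only.
TIERS: Tuple[Tuple[float, str], ...] = (
    (95.0, "most preferably"),
    (90.0, "more preferably"),
    (85.0, "preferably"),
    (80.0, "optionally"),
    (70.0, "minimum"),
)


def percent_identity(query: str, reference: str) -> float:
    """Ungapped, position-by-position identity of two equal-length sequences.

    This is a simplifying assumption; no alignment is performed.
    """
    if not reference or len(query) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for q, r in zip(query, reference) if q == r)
    return 100.0 * matches / len(reference)


def highest_tier(query: str, reference: str) -> Optional[str]:
    """Return the label of the strictest tier met, or None if below 70%."""
    pid = percent_identity(query, reference)
    for threshold, label in TIERS:
        if pid >= threshold:
            return label
    return None


if __name__ == "__main__":
    ref = "ICHIWKLIFLP"  # tail of T46984_PEA_1_P3 (amino acids 434 - 444)
    print(highest_tier("ICHIWKLIFLA", ref))  # 10/11 identical -> more preferably
```

For an 11-residue tail each substitution costs about 9 percentage points, so one mismatch still meets the 90% tier, while four mismatches (about 64%) fall below the 70% floor.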
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFALFFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFVAEPDNKNVYKFELDTSERKIEFDSASGTYTLYLIIGDATLKNPILWNV corresponding to amino acids 1 - 498 of RIB2_HUMAN, which also corresponds to amino acids 1 - 498 of T46984_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LMDQK corresponding to amino acids 499 - 503 of T46984_PEA_1_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LMDQK in T46984_PEA_1_P10.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFALFFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFVAEPDNKNVYKFELDTSERKIEFDSASGTYTLYLIIGDATLKNPILWNVADVVIKFPEEEAPSTVLSQNLFTPKQEIQHLFREPEKRPPTVVSNTFTALILSPLLLLFALWIRIGANVSNFTFAPSTIIFHLGHAAMLGLMYVYWTQLNMFQTLKYLAILGSVTFLAGNRMLAQQAVKR corresponding to amino acids 1 - 628 of RIB2_HUMAN, which also corresponds to amino acids 1 - 628 of T46984_PEA_1_P11.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P12, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMN corresponding to amino acids 1 - 338 of RIB2_HUMAN, which also corresponds to amino acids 1 - 338 of T46984_PEA_1_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SQDLH corresponding to amino acids 339 - 343 of T46984_PEA_1_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SQDLH in T46984_PEA_1_P12.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P21, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence M corresponding to amino acids 1 - 1 of T46984_PEA_1_P21, and a second amino acid sequence being at least 90% homologous to KACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFALFFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFVAEPDNKNVYKFELDTSERKIEFDSASGTYTLYLIIGDATLKNPILWNVADVVIKFPEEEAPSTVLSQNLFTPKQEIQHLFREPEKRPPTVVSNTFTALILSPLLLLFALWIRIGANVSNFTFAPSTIIFHLGHAAMLGLMYVYWTQLNMFQTLKYLAILGSVTFLAGNRMLAQQAVKRTAH corresponding to amino acids 70 - 631 of RIB2_HUMAN, which also corresponds to amino acids 2 - 563 of T46984_PEA_1_P21, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
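Statements such as "amino acids 70 - 631 of RIB2_HUMAN, which also corresponds to amino acids 2 - 563 of T46984_PEA_1_P21" describe an ungapped block whose two numberings differ by a constant offset (here 68). The following is a minimal Python sketch of translating residue numbers between the reference protein and the variant; the `AlignedBlock` type and both function names are hypothetical.

```python
from typing import NamedTuple, Optional


class AlignedBlock(NamedTuple):
    """Ungapped block: reference residues ref_start..ref_end (1-based, inclusive)
    map one-to-one onto variant residues beginning at var_start."""

    ref_start: int
    ref_end: int
    var_start: int


def ref_to_variant(block: AlignedBlock, ref_pos: int) -> Optional[int]:
    """Map a reference residue number to variant numbering; None if outside the block."""
    if not block.ref_start <= ref_pos <= block.ref_end:
        return None
    return block.var_start + (ref_pos - block.ref_start)


def variant_to_ref(block: AlignedBlock, var_pos: int) -> Optional[int]:
    """Map a variant residue number back to reference numbering; None if outside."""
    var_end = block.var_start + (block.ref_end - block.ref_start)
    if not block.var_start <= var_pos <= var_end:
        return None
    return block.ref_start + (var_pos - block.var_start)


# T46984_PEA_1_P21: RIB2_HUMAN 70 - 631 corresponds to variant 2 - 563.
P21_BLOCK = AlignedBlock(ref_start=70, ref_end=631, var_start=2)

if __name__ == "__main__":
    print(ref_to_variant(P21_BLOCK, 631))  # 563
    print(variant_to_ref(P21_BLOCK, 1))  # None: residue 1 (M) is outside the block
```

The same structure covers the later M78530 variants, for example `AlignedBlock(1, 582, 84)` for "amino acids 1 - 582 of O94862, which also corresponds to amino acids 84 - 665".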
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P27, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFA corresponding to amino acids 1 - 415 of RIB2_HUMAN, which also corresponds to amino acids 1 - 415 of T46984_PEA_1_P27, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence FGSGLVPMSPTSLLLLARLYFTWDMLLCWDSCMSTGLSSTCSRP corresponding to amino acids 416 - 459 of T46984_PEA_1_P27, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P27, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence FGSGLVPMSPTSLLLLARLYFTWDMLLCWDSCMSTGLSSTCSRP in T46984_PEA_1_P27.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P32, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDNRYIANTVE corresponding to amino acids 1 - 364 of RIB2_HUMAN, which also corresponds to amino acids 1 - 364 of T46984_PEA_1_P32, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GQVRWLTPVIPALWEAKAGGSPEVRSSILAWPT corresponding to amino acids 365 - 397 of T46984_PEA_1_P32, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P32, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GQVRWLTPVIPALWEAKAGGSPEVRSSILAWPT in T46984_PEA_1_P32.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P34, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPLTQATVKLEHAKSVASRATVLQKTSFTPVG corresponding to amino acids 1 - 329 of RIB2_HUMAN, which also corresponds to amino acids 1 - 329 of T46984_PEA_1_P34.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P35, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAI corresponding to amino acids 1 - 287 of RIB2_HUMAN, which also corresponds to amino acids 1 - 287 of T46984_PEA_1_P35, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GCWPSRQSREQHISSRRKMEILKTECQEKESRTIHSMRRKMEKKNFI corresponding to amino acids 288 - 334 of T46984_PEA_1_P35, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P35, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GCWPSRQSREQHISSRRKMEILKTECQEKESRTIHSMRRKMEKKNFI in T46984_PEA_1_P35.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P38, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEAL corresponding to amino acids 1 - 145 of RIB2_HUMAN, which also corresponds to amino acids 1 - 145 of T46984_PEA_1_P38, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MDPDWCQCLQLHFCS corresponding to amino acids 146 - 160 of T46984_PEA_1_P38, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P38, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MDPDWCQCLQLHFCS in T46984_PEA_1_P38.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P39, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLA corresponding to amino acids 1 - 160 of RIB2_HUMAN, which also corresponds to amino acids 1 - 160 of T46984_PEA_1_P39.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P45, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCE corresponding to amino acids 1 - 101 of RIB2_HUMAN, which also corresponds to amino acids 1 - 101 of T46984_PEA_1_P45, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NSPGSADSIPPVPAG corresponding to amino acids 102 - 116 of T46984_PEA_1_P45, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P45, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NSPGSADSIPPVPAG in T46984_PEA_1_P45.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P46, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSLGAQVPDAK corresponding to amino acids 1 - 69 of RIB2_HUMAN, which also corresponds to amino acids 1 - 69 of T46984_PEA_1_P46, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NSPGSADSIPPVPAG corresponding to amino acids 70 - 84 of T46984_PEA_1_P46, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P46, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NSPGSADSIPPVPAG in T46984_PEA_1_P46.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78530_PEA_1_P15, comprising a first amino acid sequence being at least 90% homologous to MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLNVRAAPSAEFSVDRTRHLMSFLTMMGPSPDWNVGLSAEDLCTKECGWVQKVVQDLIPWDAGTDSGVTYESPNKPTIPQEKIRPLTSLDHPQSPFYDPEGGSITQVARVVIERIARKGEQCNIVPDNVDDIVADLAPEEKDEDDTPETCIYSNWSPWSACSSSTCDKGKRMRQRMLKAQLDLSVPCPDTQDFQPCMGPGCSDEDGSTCTMSEWITWSPCSISCGMGMRSRERYVKQFPEDGSVCTLPTEE corresponding to amino acids 1 - 544 of Q9HCB6, which also corresponds to amino acids 1 - 544 of M78530_PEA_1_P15, a bridging amino acid T corresponding to amino acid 545 of M78530_PEA_1_P15, a second amino acid sequence being at least 90% homologous to EKCTVNEECSPSSCLMTEWGEWDECSATCGMGMKKRHRMIKMNPADGSMCKAETSQAEKCMMPECHTIPCLLSPWSEWSDCSVTCGKGMRTRQRMLKSLAELGDCNEDLEQVEKCMLPEC corresponding to amino acids 546 - 665 of Q9HCB6, which also corresponds to amino acids 546 - 665 of M78530_PEA_1_P15, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RKSWSSSRPITSMFLSPGSPEPASANTARS corresponding to amino acids 666 - 695 of M78530_PEA_1_P15, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M78530_PEA_1_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RKSWSSSRPITSMFLSPGSPEPASANTARS in M78530_PEA_1_P15.
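The M78530_PEA_1_P15 description splits the variant into four consecutive regions, one of them a single bridging residue. A small Python sketch can confirm that such a layout tiles residues 1 - 695 with no gaps or overlaps and can report which region a given residue belongs to, using a binary search over region start positions. The layout list is transcribed from the paragraph above; the region labels and function names are hypothetical.

```python
import bisect
from typing import List, Tuple

# (start, end, label), 1-based inclusive, transcribed from the M78530_PEA_1_P15 text.
Region = Tuple[int, int, str]
P15_LAYOUT: List[Region] = [
    (1, 544, "first sequence (Q9HCB6 1-544)"),
    (545, 545, "bridging amino acid T"),
    (546, 665, "second sequence (Q9HCB6 546-665)"),
    (666, 695, "unique tail"),
]


def check_tiling(layout: List[Region], length: int) -> bool:
    """True if the regions cover residues 1..length exactly, in order."""
    expected = 1
    for start, end, _ in layout:
        if start != expected or end < start:
            return False
        expected = end + 1
    return expected == length + 1


def region_of(layout: List[Region], pos: int) -> str:
    """Binary-search the region containing residue `pos` (layout must be sorted)."""
    starts = [start for start, _, _ in layout]
    i = bisect.bisect_right(starts, pos) - 1
    if i < 0 or pos > layout[i][1]:
        raise IndexError(f"residue {pos} is outside the layout")
    return layout[i][2]


if __name__ == "__main__":
    print(check_tiling(P15_LAYOUT, 695))  # True
    print(region_of(P15_LAYOUT, 545))  # bridging amino acid T
```

`bisect_right` returns the insertion point after any equal start value, so subtracting one selects the last region whose start is at or below the queried residue. This is why a one-residue region such as the bridging T at position 545 is handled without special cases.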
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78530_PEA_1_P15, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLS corresponding to amino acids 1 - 83 of M78530_PEA_1_P15, a second amino acid sequence being at least 90% homologous to AAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLNVRAAPSAEFSVDRTRHLMSFLTMMGPSPDWNVGLSAEDLCTKECGWVQKVVQDLIPWDAGTDSGVTYESPNKPTIPQEKIRPLTSLDHPQSPFYDPEGGSITQVARVVIERIARKGEQCNIVPDNVDDIVADLAPEEKDEDDTPETCIYSNWSPWSACSSSTCDKGKRMRQRMLKAQLDLSVPCPDTQDFQPCMGPGCSDEDGSTCTMSEWITWSPCSISCGMGMRSRERYVKQFPEDGSVCTLPTEETEKCTVNEECSPSSCLMTEWGEWDECSATCGMGMKKRHRMIKMNPADGSMCKAETSQAEKCMMPECHTIPCLLSPWSEWSDCSVTCGKGMRTRQRMLKSLAELGDCNEDLEQVEKCMLPEC corresponding to amino acids 1 - 582 of O94862, which also corresponds to amino acids 84 - 665 of M78530_PEA_1_P15, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RKSWSSSRPITSMFLSPGSPEPASANTARS corresponding to amino acids 666 - 695 of M78530_PEA_1_P15, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of M78530_PEA_1_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLS of M78530_PEA_1_P15.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M78530_PEA_1_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RKSWSSSRPITSMFLSPGSPEPASANTARS in M78530_PEA_1_P15.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78530_PEA_1_P16, comprising a first amino acid sequence being at least 90% homologous to MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLNV corresponding to amino acids 1 - 297 of Q8NCD7, which also corresponds to amino acids 1 - 297 of M78530_PEA_1_P16.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78530_PEA_1_P16, comprising a first amino acid sequence being at least 90% homologous to MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLNV corresponding to amino acids 1 - 297 of Q9HCB6, which also corresponds to amino acids 1 - 297 of M78530_PEA_1_P16.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78530_PEA_1_P16, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLS corresponding to amino acids 1 - 83 of M78530_PEA_1_P16, and a second amino acid sequence being at least 90% homologous to AAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLNV corresponding to amino acids 1 - 214 of O94862, which also corresponds to amino acids 84 - 297 of M78530_PEA_1_P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of M78530_PEA_1_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLS of M78530_PEA_1_P16.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78530_PEA_1_P17, comprising a first amino acid sequence being at least 90% homologous to MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQ corresponding to amino acids 1 - 275 of Q8NCD7, which also corresponds to amino acids 1 - 275 of M78530_PEA_1_P17, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRQKNHRMTK corresponding to amino acids 276 - 285 of M78530_PEA_1_P17, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M78530_PEA_1_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRQKNHRMTK in M78530_PEA_1_P17.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78530_PEA_1_P17, comprising a first amino acid sequence being at least 90% homologous to MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQ corresponding to amino acids 1 - 275 of Q9HCB6, which also corresponds to amino acids 1 - 275 of M78530_PEA_1_P17, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRQKNHRMTK corresponding to amino acids 276 - 285 of M78530_PEA_1_P17, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M78530_PEA_1_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRQKNHRMTK in M78530_PEA_1_P17.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78530_PEA_1_P17, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLS corresponding to amino acids 1 - 83 of M78530_PEA_1_P17, a second amino acid sequence being at least 90% homologous to AAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQ corresponding to amino acids 1 - 192 of O94862, which also corresponds to amino acids 84 - 275 of M78530_PEA_1_P17, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRQKNHRMTK corresponding to amino acids 276 - 285 of M78530_PEA_1_P17, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of M78530_PEA_1_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLS of M78530_PEA_1_P17.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M78530_PEA_1_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRQKNHRMTK in M78530_PEA_1_P17.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T48119_P2, comprising a first amino acid sequence being at least 90% homologous to MTRQMASSGASGGKIDNSVLVLIVGLSTVGAGAYAYKTMKEDEKRYNERISGLGLTPEQKQKKAALSASEGEEVPQDKAPSHVPFLLIGGGTAAFAAARSIRARDPGARVLIVSEDPELPYMRPPLSKELWFSDDPNVTKTLRFKQWNGKERSIYFQPPSFYVSAQDLPHIENGGVAVLTGKKVVQLDVRDNMVKLNDGSQITYEKCLIATGGTPRSLSAIDRAGAEVKSRTTLFRKIGDFRSLEKISREVKSITIIGGGFLGSELACALGRKARALGTEVIQLFPEKGNMGKILPEYLSNWTMEKVRREGVKVMPNAIVQSVGVSSGKLLIKLKDGRKVETDHIVAAVGLEPNVELAKTGGLEIDSDFGGFRVNAELQARSNIWVAGDAACFYDIKLGRRRVEHHDHAVVSGRLAGENMTGAAKPYWHQSMFWSDLGPDVGYEAIGLVDSSLPTVGVFAKATAQDNPKSATEQSGTGIRSESETESEASEITIPPSTPAVPQAPVQGEDYGKGVIFYLRDKVVVGIVLWNIFNRMPIARKIIKDGEQHEDLNEVAKLFNIHED corresponding to amino acids 50 - 613 of PCD8_HUMAN, which also corresponds to amino acids 1 - 564 of T48119_P2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P6, comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKG corresponding to amino acids 1 - 276 of VTNC_HUMAN, which also corresponds to amino acids 1 - 276 of T39971_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TQGVVGD corresponding to amino acids 277 - 283 of T39971_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T39971_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TQGVVGD in T39971_P6.
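The tiered thresholds used throughout (70%, 80%, 85%, 90% and 95%) can be illustrated with a simple identity score. This sketch assumes that "homologous" is read as ungapped percent identity between equal-length sequences. The patent's own definition of homology is not part of this excerpt and may rely on a specific alignment method, so treat the function below as an illustration only. The sequence `TQGVVGE` is a hypothetical single-substitution variant of the T39971_P6 tail.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped, position-by-position percent identity of two equal-length sequences.

    Assumption: one simple reading of "homologous"; the patent's own definition
    (and any alignment tool it relies on) is not reproduced here.
    """
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# The tiered minimum-identity levels recited in the claims.
THRESHOLDS = (70, 80, 85, 90, 95)


def thresholds_met(a: str, b: str) -> list[int]:
    """Return every tier whose minimum identity the pair reaches."""
    pid = percent_identity(a, b)
    return [t for t in THRESHOLDS if pid >= t]


# "TQGVVGE" is a hypothetical single-substitution variant of the tail TQGVVGD.
print(round(percent_identity("TQGVVGD", "TQGVVGE"), 2))  # 85.71
print(thresholds_met("TQGVVGD", "TQGVVGE"))              # [70, 80, 85]
```

With one mismatch in seven residues the identity is about 85.7%. Such a variant would therefore fall within the 70%, 80% and 85% tiers but not the 90% or 95% tiers, which shows how coarse these tiers are for very short tails.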
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P9, comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRT corresponding to amino acids 1 - 325 of VTNC_HUMAN, which also corresponds to amino acids 1 - 325 of T39971_P9, and a second amino acid sequence being at least 90% homologous to SGMAPRPSLAKKQRFRHRNRKGYRSQRGHSRGRNQNSRRPSRATWLSLFSSEESNLGANNYDDYRMDWLVPATCEPIQSVFFFSGDKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 357 - 478 of VTNC_HUMAN, which also corresponds to amino acids 326 - 447 of T39971_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T39971_P9, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TS, having a structure as follows: a sequence starting from any of amino acid numbers 325-x to 325; and ending at any of amino acid numbers 326 + ((n-2) - x), in which x varies from 0 to n-2.
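The edge-portion language describes every window of length n that straddles the junction between positions 325 and 326 of T39971_P9. As x runs from 0 to n-2, the window starts at 325-x and ends at 326 + ((n-2) - x), so its length is always n. The following minimal Python sketch implements that enumeration. The function names are hypothetical. `edge_peptides` applies the same rule to the residues flanking the junction, taken from the two VTNC_HUMAN segments quoted above.

```python
def edge_windows(junction: int, n: int) -> list[tuple[int, int]]:
    """1-based (start, end) windows of length n that include positions junction and junction + 1.

    Mirrors the stated pattern: start = junction - x and
    end = (junction + 1) + ((n - 2) - x), for x = 0 .. n - 2.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    return [(junction - x, junction + 1 + (n - 2) - x) for x in range(n - 1)]


def edge_peptides(left: str, right: str, n: int) -> list[str]:
    """Peptides of length n taking x + 1 residues from the end of `left`
    and n - 1 - x residues from the start of `right`, for x = 0 .. n - 2.
    Windows that would run past either flank are skipped."""
    peptides = []
    for x in range(n - 1):
        take_left, take_right = x + 1, n - 1 - x
        if take_left <= len(left) and take_right <= len(right):
            peptides.append(left[-take_left:] + right[:take_right])
    return peptides


windows = edge_windows(325, 10)
print(len(windows), windows[0], windows[-1])  # 9 (325, 334) (317, 326)

# Flanks around the T39971_P9 junction: the last residues of the first
# segment and the first residues of the second segment, as quoted above.
print(edge_peptides("FELLFWGRT", "SGMAPRPSLA", 4))  # ['TSGM', 'RTSG', 'GRTS']
```

For n = 10 there are n - 1 = 9 windows, and every resulting peptide contains the junction dipeptide TS. The other edge portions in this section use the same rule. For example, the SD junction at 326/327 of T39971_P11 and the TD junction at 170/171 of Z44808_PEA_1_P11 differ only in the junction position.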
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P11, comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRTS corresponding to amino acids 1 - 326 of VTNC_HUMAN, which also corresponds to amino acids 1 - 326 of T39971_P11, and a second amino acid sequence being at least 90% homologous to DKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 442 - 478 of VTNC_HUMAN, which also corresponds to amino acids 327 - 363 of T39971_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order.
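Statements such as "amino acids 442 - 478 of VTNC_HUMAN, which also corresponds to amino acids 327 - 363 of T39971_P11" define a piecewise position map between a variant and its reference. The sketch below converts variant positions to reference positions and reports the reference interval that has no counterpart in the variant (here VTNC_HUMAN 327 - 441). It assumes 1-based inclusive coordinates and blocks that are ordered along both sequences, and its helper names are hypothetical.

```python
from typing import Optional

# Alignment blocks as (variant_start, variant_end, reference_start), 1-based, inclusive.
# Values are taken from the T39971_P11 / VTNC_HUMAN paragraph above.
T39971_P11_BLOCKS = [
    (1, 326, 1),      # VTNC_HUMAN 1-326   -> T39971_P11 1-326
    (327, 363, 442),  # VTNC_HUMAN 442-478 -> T39971_P11 327-363
]


def to_reference(pos: int, blocks: list[tuple[int, int, int]]) -> Optional[int]:
    """Map a 1-based variant position to its reference position, or None if not covered."""
    for v_start, v_end, r_start in blocks:
        if v_start <= pos <= v_end:
            return r_start + (pos - v_start)
    return None


def skipped_reference_ranges(blocks: list[tuple[int, int, int]]) -> list[tuple[int, int]]:
    """Reference intervals lying between consecutive blocks, i.e. absent from the variant."""
    spans = [(r, r + (v_end - v_start)) for v_start, v_end, r in blocks]
    gaps = []
    for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
        if next_start > prev_end + 1:
            gaps.append((prev_end + 1, next_start - 1))
    return gaps


print(to_reference(326, T39971_P11_BLOCKS))         # 326
print(to_reference(327, T39971_P11_BLOCKS))         # 442
print(to_reference(363, T39971_P11_BLOCKS))         # 478
print(skipped_reference_ranges(T39971_P11_BLOCKS))  # [(327, 441)]
```

Encoding each block by its start positions and a single shared length makes an inconsistent pair of ranges impossible to represent. This matters because each pair of ranges in the text (for example 163 - 493 and 68 - 398 for HUMPHOSLIP_PEA_2_P10) must have equal lengths.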
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T39971_P11, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SD, having a structure as follows: a sequence starting from any of amino acid numbers 326-x to 326; and ending at any of amino acid numbers 327 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P11, comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRTS corresponding to amino acids 1 - 326 of Q9BSH7, which also corresponds to amino acids 1 - 326 of T39971_P11, and a second amino acid sequence being at least 90% homologous to DKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 442 - 478 of Q9BSH7, which also corresponds to amino acids 327 - 363 of T39971_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T39971_P11, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SD, having a structure as follows: a sequence starting from any of amino acid numbers 326-x to 326; and ending at any of amino acid numbers 327 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P12, comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFK corresponding to amino acids 1 - 223 of VTNC_HUMAN, which also corresponds to amino acids 1 - 223 of T39971_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VPGAVGQGRKHLGRV corresponding to amino acids 224 - 238 of T39971_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T39971_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPGAVGQGRKHLGRV in T39971_P12.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P12, comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFK corresponding to amino acids 1 - 223 of Q9BSH7, which also corresponds to amino acids 1 - 223 of T39971_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VPGAVGQGRKHLGRV corresponding to amino acids 224 - 238 of T39971_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T39971_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPGAVGQGRKHLGRV in T39971_P12.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z44808_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEERVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ corresponding to amino acids 1 - 441 of SMO2_HUMAN, which also corresponds to amino acids 1 - 441 of Z44808_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DAMVVSSRPKATTHRKSRTLSRR corresponding to amino acids 442 - 464 of Z44808_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z44808_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DAMVVSSRPKATTHRKSRTLSRR in Z44808_PEA_1_P5.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z44808_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEERVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQELMGCLGVAKEDGKADTKKRH corresponding to amino acids 1 - 428 of SMO2_HUMAN, which also corresponds to amino acids 1 - 428 of Z44808_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RSKRNL corresponding to amino acids 429 - 434 of Z44808_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z44808_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RSKRNL in Z44808_PEA_1_P6.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z44808_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEERVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ corresponding to amino acids 1 - 441 of SMO2_HUMAN, which also corresponds to amino acids 1 - 441 of Z44808_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLWLRGKVSFYCF corresponding to amino acids 442 - 454 of Z44808_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z44808_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLWLRGKVSFYCF in Z44808_PEA_1_P7.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z44808_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKT corresponding to amino acids 1 - 170 of SMO2_HUMAN, which also corresponds to amino acids 1 - 170 of Z44808_PEA_1_P11, and a second amino acid sequence being at least 90% homologous to DIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEERVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQPRKQG corresponding to amino acids 188 - 446 of SMO2_HUMAN, which also corresponds to amino acids 171 - 429 of Z44808_PEA_1_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of Z44808_PEA_1_P11, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TD, having a structure as follows: a sequence starting from any of amino acid numbers 170-x to 170; and ending at any of amino acid numbers 171 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S67314_PEA_1_P4, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRWATLELYLIGYYYCSFSQACSKKPSPPLRAVEAGTREWLWVRVVSGGNFLCSGFGLTQAGTQILPYRLHDCGQITFSKCNCKTGINNTNLVGLLGSL corresponding to amino acids 117 - 215 of S67314_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of S67314_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRWATLELYLIGYYYCSFSQACSKKPSPPLRAVEAGTREWLWVRVVSGGNFLCSGFGLTQAGTQILPYRLHDCGQITFSKCNCKTGINNTNLVGLLGSL in S67314_PEA_1_P4.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S67314_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of AAP35373, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRWATLELYLIGYYYCSFSQACSKKPSPPLRAVEAGTREWLWVRVVSGGNFLCSGFGLTQAGTQILPYRLHDCGQITFSKCNCKTGINNTNLVGLLGSL corresponding to amino acids 117 - 215 of S67314_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of S67314_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRWATLELYLIGYYYCSFSQACSKKPSPPLRAVEAGTREWLWVRVVSGGNFLCSGFGLTQAGTQILPYRLHDCGQITFSKCNCKTGINNTNLVGLLGSL in S67314_PEA_1_P4.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S67314_PEA_1_P5, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSYNNHGWEELRVGKSIV corresponding to amino acids 117 - 178 of S67314_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of S67314_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSYNNHGWEELRVGKSIV in S67314_PEA_1_P5.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S67314_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of AAP35373, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSYNNHGWEELRVGKSIV corresponding to amino acids 117 - 178 of S67314_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of S67314_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSYNNHGWEELRVGKSIV in S67314_PEA_1_P5.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S67314_PEA_1_P6, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MEKLQLRNVK corresponding to amino acids 117 - 126 of S67314_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of S67314_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MEKLQLRNVK in S67314_PEA_1_P6.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S67314_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of AAP35373, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MEKLQLRNVK corresponding to amino acids 117 - 126 of S67314_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of S67314_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MEKLQLRNVK in S67314_PEA_1_P6.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S67314_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MVDAFLGTWKLVDSKNFDDYMKSL corresponding to amino acids 1 - 24 of FABH_HUMAN, which also corresponds to amino acids 1 - 24 of S67314_PEA_1_P7, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AHILITFPLPS corresponding to amino acids 25 - 35 of S67314_PEA_1_P7, and a third amino acid sequence being at least 90% homologous to GVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA corresponding to amino acids 25 - 133 of FABH_HUMAN, which also corresponds to amino acids 36 - 144 of S67314_PEA_1_P7, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of S67314_PEA_1_P7, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for AHILITFPLPS, corresponding to S67314_PEA_1_P7.
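When both the variant and the reference sequence are known, the variant-specific segment can be located by comparing shared prefixes and suffixes. That segment may be an internal insertion such as AHILITFPLPS in S67314_PEA_1_P7, or a C-terminal tail such as MEKLQLRNVK in S67314_PEA_1_P6. The following is a greedy sketch, not a sequence-alignment method, and the function name is hypothetical. The FABH_HUMAN reference used here is simply the concatenation of the two segments quoted above (amino acids 1 - 24 and 25 - 133).

```python
def split_against_reference(variant: str, reference: str) -> tuple[int, str, int]:
    """Split `variant` into (shared prefix length, variant-specific middle, shared suffix length).

    Greedy prefix/suffix comparison, not a true alignment: if the variant-specific
    segment begins or ends with the same residue the reference has at that point,
    the reported boundary can shift.
    """
    limit = min(len(variant), len(reference))
    prefix = 0
    while prefix < limit and variant[prefix] == reference[prefix]:
        prefix += 1
    suffix = 0
    # Never let the shared suffix overlap the shared prefix.
    while suffix < limit - prefix and variant[-1 - suffix] == reference[-1 - suffix]:
        suffix += 1
    return prefix, variant[prefix:len(variant) - suffix], suffix


# FABH_HUMAN amino acids 1-24 and 25-133 as quoted in the S67314_PEA_1_P7 paragraph.
FABH_1_24 = "MVDAFLGTWKLVDSKNFDDYMKSL"
FABH_25_133 = ("GVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSI"
               "VTLDGGKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA")
reference = FABH_1_24 + FABH_25_133            # 133 residues
p7 = FABH_1_24 + "AHILITFPLPS" + FABH_25_133  # 144 residues: internal insertion

print(split_against_reference(p7, reference))  # (24, 'AHILITFPLPS', 109)

# A C-terminal tail is the special case with no shared suffix (S67314_PEA_1_P6).
p6 = reference[:116] + "MEKLQLRNVK"
print(split_against_reference(p6, reference))  # (116, 'MEKLQLRNVK', 0)
```

The reported prefix and suffix lengths reproduce the stated coordinates: FABH_HUMAN 1 - 24 maps to variant 1 - 24, and the 109 shared C-terminal residues map FABH_HUMAN 25 - 133 to variant 36 - 144. A real analysis would use a proper aligner instead. The greedy version is only meant to make the coordinate statements concrete.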
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S67314_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MVDAFLGTWKLVDSKNFDDYMKSL corresponding to amino acids 1 - 24 of AAP35373, which also corresponds to amino acids 1 - 24 of S67314_PEA_1_P7, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AHILITFPLPS corresponding to amino acids 25 - 35 of S67314_PEA_1_P7, and a third amino acid sequence being at least 90% homologous to GVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA corresponding to amino acids 25 - 133 of AAP35373, which also corresponds to amino acids 36 - 144 of S67314_PEA_1_P7, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of S67314_PEA_1_P7, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for AHILITFPLPS, corresponding to S67314_PEA_1_P7.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z39337_PEA_2_PEA_1_P4, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWLPLSGAA corresponding to amino acids 1 - 9 of Z39337_PEA_2_PEA_1_P4, and a second amino acid sequence being at least 90% homologous to MKKLMVVLSLIAAAWAEEQNKLVHGGPCDKTSHPYQAALYTSGHLLCGGVLIHPLWVLTAAHCKKPNLQVFLGKHNLRQRESSQEQSSVVRAVIHPDYDAASHDQDIMLLRLARPAKLSELIQPLPLERDCSANTTSCHILGWGKTADGDFPDTIQCAYIHLVSREECEHAYPGQITQNMLCAGDEKYGKDSCQGDSGGPLVCGDHLRGLVSWGNIPCGSKEKPGVYTNVCRYTNWIQKTIQAK corresponding to amino acids 1 - 244 of KLK6_HUMAN, which also corresponds to amino acids 10 - 253 of Z39337_PEA_2_PEA_1_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of Z39337_PEA_2_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWLPLSGAA of Z39337_PEA_2_PEA_1_P4.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z39337_PEA_2_PEA_1_P9, comprising a first amino acid sequence being at least 90% homologous to MKKLMVVLSLIAAAWAEEQNKLVHGGPCDKTSHPYQAALYTSGHLLCGGVLIHPLWVLTAAHCKKPNLQVFLGKHNLRQRESSQEQSSVVRAVIHPDYDAASHDQDIMLLRLARPAKLSELIQPLPLERDCSANTTSCHILGWGKTADG corresponding to amino acids 1 - 149 of KLK6_HUMAN, which also corresponds to amino acids 1 - 149 of Z39337_PEA_2_PEA_1_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence Q corresponding to amino acid 150 of Z39337_PEA_2_PEA_1_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P10, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISE corresponding to amino acids 1 - 67 of PLTP_HUMAN, which also corresponds to amino acids 1 - 67 of HUMPHOSLIP_PEA_2_P10, and a second amino acid sequence being at least 90% homologous to KVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLLDTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPNRAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLDMLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASVTIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHSALESLALIPLQAPLKTMLQIGVMPMLNERTWRGVQIPLPEGINFVHEVVTNHAGFLTIGADLHFAKGLREVIEKNRPADVRASTAPTPSTAAV corresponding to amino acids 163 - 493 of PLTP_HUMAN, which also corresponds to amino acids 68 - 398 of HUMPHOSLIP_PEA_2_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMPHOSLIP_PEA_2_P10, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EK, having a structure as follows: a sequence starting from any of amino acid numbers 67-x to 67; and ending at any of amino acid numbers 68 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P12, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLLDTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPNRAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLDMLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASVTIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHSALESLALIPLQAPLKTMLQIGVMPMLN corresponding to amino acids 1 - 427 of PLTP_HUMAN, which also corresponds to amino acids 1 - 427 of HUMPHOSLIP_PEA_2_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GKAGV corresponding to amino acids 428 - 432 of HUMPHOSLIP_PEA_2_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKAGV in HUMPHOSLIP_PEA_2_P12.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P31, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISE corresponding to amino acids 1 - 67 of PLTP_HUMAN, which also corresponds to amino acids 1 - 67 of HUMPHOSLIP_PEA_2_P31, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PGLERGADKFPVVGGSSLFLALDLTLRPPVG corresponding to amino acids 68 - 98 of HUMPHOSLIP_PEA_2_P31, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P31, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PGLERGADKFPVVGGSSLFLALDLTLRPPVG in HUMPHOSLIP_PEA_2_P31.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P33, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQ corresponding to amino acids 1 - 183 of PLTP_HUMAN, which also corresponds to amino acids 1 - 183 of HUMPHOSLIP_PEA_2_P33, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VWAATGRRVARVGMLSL corresponding to amino acids 184 - 200 of HUMPHOSLIP_PEA_2_P33, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P33, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VWAATGRRVARVGMLSL in HUMPHOSLIP_PEA_2_P33.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P34, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLLDTVPV corresponding to amino acids 1 - 205 of PLTP_HUMAN, which also corresponds to amino acids 1 - 205 of HUMPHOSLIP_PEA_2_P34, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LWTSLLALTIPS corresponding to amino acids 206 - 217 of HUMPHOSLIP_PEA_2_P34, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P34, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LWTSLLALTIPS in HUMPHOSLIP_PEA_2_P34.
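The tail and head embodiments are stated as graded homology thresholds: at least 70%, 80%, 85%, 90% or 95%. This part of the text does not say how homology is computed. Purely as an illustration, the Python sketch below assumes the simplest case, an ungapped, position-by-position identity between a candidate peptide and a tail of the same length. Real comparisons would normally rely on a sequence alignment method. The candidate peptide `LWTSLLALTIPA` in the example is hypothetical.

```python
TIERS = (95, 90, 85, 80, 70)  # thresholds named in the text, highest first


def percent_identity(candidate, reference):
    """Ungapped percent identity of two equal-length amino acid strings."""
    if len(candidate) != len(reference) or not reference:
        raise ValueError("this sketch only compares non-empty, equal-length sequences")
    matches = sum(1 for a, b in zip(candidate, reference) if a == b)
    return 100.0 * matches / len(reference)


def highest_tier(identity, tiers=TIERS):
    """Return the highest threshold that is met, or None if below all of them."""
    for tier in tiers:
        if identity >= tier:
            return tier
    return None


if __name__ == "__main__":
    tail = "LWTSLLALTIPS"        # tail of HUMPHOSLIP_PEA_2_P34
    candidate = "LWTSLLALTIPA"   # hypothetical single-substitution variant
    score = percent_identity(candidate, tail)
    print(round(score, 1), highest_tier(score))
```

Under this reading, a single substitution in a five-residue tail such as GKAGV already lowers identity to 80%. For such short tails, therefore, the 85%, 90% and 95% tiers would admit only the exact sequence.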
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P35, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWF corresponding to amino acids 1 - 109 of PLTP_HUMAN, which also corresponds to amino acids 1 - 109 of HUMPHOSLIP_PEA_2_P35, a second amino acid sequence, being a bridging amino acid sequence comprising L, a third amino acid sequence being at least 90% homologous to KVYDFLSTFITSGMRFLLNQQ corresponding to amino acids 163 - 183 of PLTP_HUMAN, which also corresponds to amino acids 111 - 131 of HUMPHOSLIP_PEA_2_P35, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VWAATGRRVARVGMLSL corresponding to amino acids 132 - 148 of HUMPHOSLIP_PEA_2_P35, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
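The HUMPHOSLIP_PEA_2_P35 paragraph lists four contiguous parts: 109 residues, the bridging L, 21 residues and a 17-residue tail. The short Python sketch below uses a hypothetical `segment_ranges` helper to derive each part's 1-based range from the part lengths. It reproduces the positions 1 - 109, 110, 111 - 131 and 132 - 148 given in the text.

```python
from itertools import accumulate


def segment_ranges(lengths):
    """Turn consecutive segment lengths into 1-based inclusive (start, end) ranges."""
    ends = list(accumulate(lengths))
    return [(end - length + 1, end) for length, end in zip(lengths, ends)]


if __name__ == "__main__":
    third = "KVYDFLSTFITSGMRFLLNQQ"   # PLTP_HUMAN 163 - 183
    tail = "VWAATGRRVARVGMLSL"       # variant-specific tail
    lengths = [109, len("L"), len(third), len(tail)]
    print(segment_ranges(lengths))
    # [(1, 109), (110, 110), (111, 131), (132, 148)]
```

The same arithmetic applies to the other bridged variants in this section. For HSCP2_PEA_1_P14, for instance, part lengths of 621, 1 and 372 give the ranges 1 - 621, 622 and 623 - 994.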
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of HUMPHOSLIP_PEA_2_P35, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise FLK, having a structure as follows (numbering according to HUMPHOSLIP_PEA_2_P35): a sequence starting from any of amino acid numbers 109-x to 109 and ending at any of amino acid numbers 111 + ((n-2) - x), in which x varies from 0 to n-2.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P35, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VWAATGRRVARVGMLSL in HUMPHOSLIP_PEA_2_P35.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P7, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 12 - 223 of GILT_HUMAN, which also corresponds to amino acids 1 - 212 of T59832_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRIFLALSLTLIVPWSQGWTRQRDQR corresponding to amino acids 213 - 238 of T59832_P7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTLIVPWSQGWTRQRDQR in T59832_P7.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P9, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 12 - 214 of GILT_HUMAN, which also corresponds to amino acids 1 - 203 of T59832_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204 - 244 of T59832_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P12, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVE corresponding to amino acids 12 - 141 of GILT_HUMAN, which also corresponds to amino acids 1 - 130 of T59832_P12, and a second amino acid sequence being at least 90% homologous to CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 173 - 261 of GILT_HUMAN, which also corresponds to amino acids 131 - 219 of T59832_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T59832_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130 and ending at any of amino acid numbers 131 + ((n-2) - x), in which x varies from 0 to n-2.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P18, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 12 - 55 of GILT_HUMAN, which also corresponds to amino acids 1 - 44 of T59832_P18, and a second amino acid sequence being at least 90% homologous to CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 173 - 261 of GILT_HUMAN, which also corresponds to amino acids 45 - 133 of T59832_P18, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T59832_P18, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44 and ending at any of amino acid numbers 45 + ((n-2) - x), in which x varies from 0 to n-2.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHGHSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGMETTYTVLQNE corresponding to amino acids 1 - 1060 of CERU_HUMAN, which also corresponds to amino acids 1 - 1060 of HSCP2_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GGTSM corresponding to amino acids 1061 - 1065 of HSCP2_PEA_1_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCP2_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GGTSM in HSCP2_PEA_1_P4.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P8, comprising a first amino acid sequence being at least 90% homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHGHSFQYK corresponding to amino acids 1 - 1006 of CERU_HUMAN, which also corresponds to amino acids 1 - 1006 of HSCP2_PEA_1_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KCFQEHLEFGYSTAM corresponding to amino acids 1007 - 1021 of HSCP2_PEA_1_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCP2_PEA_1_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KCFQEHLEFGYSTAM in HSCP2_PEA_1_P8.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMFTTAPDQVDKEDEDFQESNKMH corresponding to amino acids 1 - 621 of CERU_HUMAN, which also corresponds to amino acids 1 - 621 of HSCP2_PEA_1_P14, a second amino acid sequence, being a bridging amino acid sequence comprising W, and a third amino acid sequence being at least 90% homologous to TFNVECLTTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHGHSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGMETTYTVLQNEDTKSG corresponding to amino acids 694 - 1065 of CERU_HUMAN, which also corresponds to amino acids 623 - 994 of HSCP2_PEA_1_P14, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of HSCP2_PEA_1_P14, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise HWT, having a structure as follows (numbering according to HSCP2_PEA_1_P14): a sequence starting from any of amino acid numbers 621-x to 621 and ending at any of amino acid numbers 623 + ((n-2) - x), in which x varies from 0 to n-2.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P15, comprising a first amino acid sequence being at least 90% homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHGHSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGMETTYTVLQNE corresponding to amino acids 1 - 1060 of CERU_HUMAN, which also corresponds to amino acids 1 - 1060 of HSCP2_PEA_1_P15, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEYPASSETHRRIWNVIYPITVSVIILFQISTKE corresponding to amino acids 1061 - 1094 of HSCP2_PEA_1_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCP2_PEA_1_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEYPASSETHRRIWNVIYPITVSVIILFQISTKE in HSCP2_PEA_1_P15.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ corresponding to amino acids 1 - 761 of CERU_HUMAN, which also corresponds to amino acids 1 - 761 of HSCP2_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence K corresponding to amino acid 762 of HSCP2_PEA_1_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P16, comprising a first amino acid sequence being at least 90% homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHGHSFQYKH corresponding to amino acids 1 - 1007 of CERU_HUMAN, which also corresponds to amino acids 1 - 1007 of HSCP2_PEA_1_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLRLTGEYGM corresponding to amino acids 1008 - 1017 of HSCP2_PEA_1_P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCP2_PEA_1_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLRLTGEYGM in HSCP2_PEA_1_P16.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHGHSFQYK corresponding to amino acids 1 - 1006 of CERU_HUMAN, which also corresponds to amino acids 1 - 1006 of HSCP2_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GSL corresponding to amino acids 1007 - 1009 of HSCP2_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P22, comprising a first amino acid sequence being at least 90% homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHSHGITYYKEHE corresponding to amino acids 1 - 131 of CERU_HUMAN, which also corresponds to amino acids 1 - 131 of HSCP2_PEA_1_P22, a second amino acid sequence, being a bridging amino acid sequence comprising A, and a third amino acid sequence being at least 90% homologous to VNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHGHSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGMETTYTVLQNEDTKSG corresponding to amino acids 262 - 1065 of CERU_HUMAN, which also corresponds to amino acids 133 - 936 of HSCP2_PEA_1_P22, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of HSCP2_PEA_1_P22, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EAV, having a structure as follows (numbering according to HSCP2_PEA_1_P22): a sequence starting from any of amino acid numbers 131-x to 131 and ending at any of amino acid numbers 133 + ((n-2) - x), in which x varies from 0 to n-2.
15 According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCP2_PEA 1 _P24, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MPLTMGKRNLFLLTP corresponding to amino acids 1 - 15 of HSCP2_PEA 1_P24, and a 20 second amino acid sequence being at least 90 % homologous to VNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFHGQALTNKNYRIDTINLFP ATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFFQVQECNKSSSKDNIRGKHVRHY YIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTN RKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNY 25 NPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIFTGLI GPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMFTTAPDQVDKEDE DFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAGNEADVHGIYFSGNTYLWR GERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDHYTGGMKQKYTVNQCRRQSEDS TFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVV 30 YRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTE
SSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVCRR
WO 2005/116850 PCT/IB2005/002555 101 PYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDEEFIESNKM HAI NGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHGHSFQYKHRGVYSSDVF DIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGMETTYTVLQNEDTKSG corresponding to amino acids 262 - 1065 of CERU_HUMAN, which also corresponds to amino acids 16 - 819 of HSCP2_PEA_1_P24, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of HSCP2_PEA_1_P24, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MPLTMGKRNLFLLTP of HSCP2_PEA_1_P24.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P25, comprising a first amino acid sequence being at least 90% homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF QVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG TTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY PLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES LLLEDNIRMFTTAPDQVDKEDEDFQESNKMH corresponding to amino acids 1 - 621 of CERU_HUMAN, which also corresponds to amino acids 1 - 621 of HSCP2_PEA_1_P25, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CKYCIIHQSTKLF corresponding to amino acids 622 - 634 of HSCP2_PEA_1_P25, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
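Each chimeric claim describes a variant as segments placed end to end, using 1-based, inclusive residue ranges. The short Python sketch below illustrates that reading only; the helper names `segment_length` and `assemble_chimera` are hypothetical and are not taken from the patent. It confirms that the stated range 622 - 634 for the HSCP2_PEA_1_P25 tail matches the 13 residues of CKYCIIHQSTKLF.

```python
def segment_length(start: int, end: int) -> int:
    """Residue count of a 1-based, inclusive range such as "622 - 634"."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


def assemble_chimera(segments: list[str]) -> str:
    """Join segments N- to C-terminus, i.e. contiguous and in sequential order."""
    return "".join(segments)


# Unique C-terminal tail of HSCP2_PEA_1_P25, stated to span residues 622-634.
P25_TAIL = "CKYCIIHQSTKLF"
print(len(P25_TAIL), segment_length(622, 634))  # 13 13

# Placeholder shared segment (hypothetical, NOT the real 621-residue prefix).
shared = "MKIL"
chimera = assemble_chimera([shared, P25_TAIL])
print(chimera[len(shared):] == P25_TAIL)  # True
```

With the real 621-residue CERU_HUMAN prefix in place of the placeholder, the tail would begin at position 622, as the claim states.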
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCP2_PEA_1_P25, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CKYCIIHQSTKLF in HSCP2_PEA_1_P25.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P33, comprising a first amino acid sequence being at least 90% homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY HSHIDAPKDIASGLIGPLIICKK corresponding to amino acids 1 - 202 of CERU_HUMAN, which also corresponds to amino acids 1 - 202 of HSCP2_PEA_1_P33, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GTSSPYCTCYMTKRQGQGSLSFKKKSSLLC corresponding to amino acids 203 - 232 of HSCP2_PEA_1_P33, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCP2_PEA_1_P33, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GTSSPYCTCYMTKRQGQGSLSFKKKSSLLC in HSCP2_PEA_1_P33.
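The tail claims use nested identity thresholds (70%, 80%, 85%, 90%, 95%). This excerpt does not define how homology is computed, so the sketch below assumes a simple ungapped, position-by-position percent identity between equal-length sequences. Real comparisons would normally use an alignment tool such as BLAST. The helpers `percent_identity` and `tiers_met` and the substituted variant are hypothetical illustrations.

```python
THRESHOLDS = (70, 80, 85, 90, 95)  # tiers named in the claims, in percent


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity of two equal-length sequences, in percent (an assumption)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def tiers_met(identity: float) -> list[int]:
    """Every claim threshold that the identity value reaches."""
    return [t for t in THRESHOLDS if identity >= t]


P33_TAIL = "GTSSPYCTCYMTKRQGQGSLSFKKKSSLLC"  # 30 residues
variant = "AAA" + P33_TAIL[3:]  # hypothetical: first three residues substituted

pid = percent_identity(P33_TAIL, variant)
print(pid, tiers_met(pid))  # 90.0 [70, 80, 85, 90]
```

Three substitutions in a 30-residue tail leave 27 matches, which meets the 90% tier but not the 95% tier.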
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG
VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG
RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA T corresponding to amino acids 1 - 1525 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1525 of HUMTEN_PEA_1_P5, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TEPKPQLGTLIFSNITPKSFNMSWTTQAGLFAKIVINVSDAHSLHESQQFTVSGDAKQAH ITGLVENTGYDVSVAGTTLAGDPTRPLTAFVI corresponding to amino acids 1526 - 1617 of HUMTEN_PEA_1_P5, and a third amino acid sequence being at least 90% homologous to TEALPLLENLTISDINPYGFTVSWMASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLE LRGLITGIGYEVMVSGFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADE
GVFDNFVLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQTVSAI
ATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVDGTKTQTR LVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALDGPSGLVTANITDSEALARWQPAIATV DSYVISYTGEKVPEITRTVSGNTVEYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDL DSPRDLTATEVQSETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLS PSTHYTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGDK AQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDRREEFWLGLDNLN KITAQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYKLKVEGYSGTAGDSMAYHN GRSFSTFDKDTDSAITNCALSYKGAFWYRNCHRVNLMGRYGDNNHSQGVNWFHWKG HEHSIQFAEMKLRPSNFRNLEGRRKRA corresponding to amino acids 1526 - 2201 of TENA_HUMAN_V1, which also corresponds to amino acids 1618 - 2293 of HUMTEN_PEA_1_P5, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of HUMTEN_PEA_1_P5, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for TEPKPQLGTLIFSNITPKSFNMSWTTQAGLFAKIVINVSDAHSLHESQQFTVSGDAKQAH ITGLVENTGYDVSVAGTTLAGDPTRPLTAFVI, corresponding to HUMTEN_PEA_1_P5.
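HUMTEN_PEA_1_P5 is built from three blocks: variant residues 1-1525 match TENA_HUMAN_V1 1-1525, variant residues 1526-1617 are the inserted segment, and variant residues 1618-2293 match TENA_HUMAN_V1 1526-2201. The following Python sketch shows one way to convert a variant position to a reference position from those stated ranges. The block table layout and the function name `variant_to_reference` are hypothetical conveniences, not part of the disclosure.

```python
from typing import Optional

# (variant_start, variant_end, reference_start) for each reference-derived block
# of HUMTEN_PEA_1_P5, taken from the ranges stated above (1-based, inclusive).
P5_BLOCKS = [
    (1, 1525, 1),        # shared with TENA_HUMAN_V1 1-1525
    (1618, 2293, 1526),  # shared with TENA_HUMAN_V1 1526-2201
]
# Variant positions 1526-1617 form the inserted (variant-specific) segment.


def variant_to_reference(pos: int, blocks=P5_BLOCKS) -> Optional[int]:
    """Map a variant position to TENA_HUMAN_V1, or None inside an inserted segment."""
    for v_start, v_end, r_start in blocks:
        if v_start <= pos <= v_end:
            return r_start + (pos - v_start)
    return None


for p in (1525, 1526, 1618, 2293):
    print(p, variant_to_reference(p))
```

Because each shared block keeps its length, a single offset per block is enough; positions falling between blocks have no reference counterpart.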
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC
SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG
FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA TTE corresponding to amino acids 1 - 1527 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1527 of HUMTEN_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PKPQLGTLIFSNITPKSFNMSWTTQAGLFAKIVINVSDAHSLHESQQFTVSGDAKQAHIT GLVENTGYDVSVAGTTLAGDPTRPLTAFVITGTQSEVLTCLTQREKEISHLKGKFNKNTI FTANVYSLIFN corresponding to amino acids 1528 - 1658 of HUMTEN_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
PKPQLGTLIFSNITPKSFNMSWTTQAGLFAKIVINVSDAHSLHESQQFTVSGDAKQAHIT
GLVENTGYDVSVAGTTLAGDPTRPLTAFVITGTQSEVLTCLTQREKEISHLKGKFNKNTI FTANVYSLIFN in HUMTEN_PEA_1_P6.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA
TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA
TTEALPLLENLTISDINPYGFTVSWMASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKL ELRGLITGIGYEVMVSGFTQGHQTKPLRAEIVT corresponding to amino acids 1 - 1617 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1617 of HUMTEN_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GISNQVSHLFLFLVPFCVICLPDRHDFNIFVHIPYLIHKCSLLFHLLPTLPLVICT corresponding to amino acids 1618 - 1673 of HUMTEN_PEA_1_P7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GISNQVSHLFLFLVPFCVICLPDRHDFNIFVHIPYLIHKCSLLFHLLPTLPLVICT in HUMTEN_PEA_1_P7.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P8, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI
LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG
EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA T corresponding to amino acids 1 - 1525 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1525 of HUMTEN_PEA_1_P8, and a second amino acid sequence being at least 90% homologous to TEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNFVLKIRDTKKQSEPLEITLLAPERTRD LTGLREATEYEIELYGISKGRRSQTVSAIATTAMGSPKEVIFSDITENSATVSWRAPTAQV ESFRITYVPITGGTPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTA LDGPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTVEYALTDLE PATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQSETALLTWRPPRASVTGYL LVYESVDGTVKEVIVGPDTTSYSLADLSPSTHYTAKIQALNGPLRSNMIQTIFTTIGLLYP FPKDCSQAMLNGDTTSGLYTIYLNGDKAQALEVFCDMTSDGGGWIVFLRRKNGRENF YQNWKAYAAGFGDRREEFWLGLDNLNKITAQGQYELRVDLRDHGETAFAVYDKFSV GDAKTRYKLKVEGYSGTAGDSMAYHNGRSFSTFDKDTDSAITNCALSYKGAFWYRNC HRVNLMGRYGDNNHSQGVNWFHWKGHEHSIQFAEMKLRPSNFRNLEGRRKRA corresponding to amino acids 1617 - 2201 of TENA_HUMAN_V1, which also corresponds to amino acids 1526 - 2110 of HUMTEN_PEA_1_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
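HUMTEN_PEA_1_P8 keeps TENA_HUMAN_V1 residues 1-1525 and 1617-2201, so the reference residues between those blocks are absent from the variant. A hypothetical helper, `skipped_reference_ranges`, can recover that gap from the stated block coordinates. The sketch assumes TENA_HUMAN_V1 is 2201 residues long, which matches the highest position cited in these claims. It is an illustration only, not a method described in the patent.

```python
# Each block is (variant_start, variant_end, reference_start), 1-based inclusive.
Block = tuple[int, int, int]


def skipped_reference_ranges(blocks: list[Block], ref_length: int) -> list[tuple[int, int]]:
    """Reference ranges (1-based, inclusive) not covered by any shared block."""
    covered = sorted((r, r + (v_end - v_start)) for v_start, v_end, r in blocks)
    gaps = []
    next_pos = 1
    for r_start, r_end in covered:
        if r_start > next_pos:
            gaps.append((next_pos, r_start - 1))
        next_pos = max(next_pos, r_end + 1)
    if next_pos <= ref_length:
        gaps.append((next_pos, ref_length))
    return gaps


P8_BLOCKS = [(1, 1525, 1), (1526, 2110, 1617)]
gaps = skipped_reference_ranges(P8_BLOCKS, 2201)
print(gaps)  # [(1526, 1616)]
print(2201 - sum(e - s + 1 for s, e in gaps))  # 2110, the stated variant length
```

The 91 missing reference residues account exactly for the difference between the 2201-residue reference and the 2110-residue variant.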
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMTEN_PEA_1_P8, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TT, having a structure as follows: a sequence starting from any of amino acid numbers 1525-x to 1525; and ending at any of amino acid numbers 1526 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE
TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVL corresponding to amino acids 1 - 1252 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1252 of HUMTEN_PEA_1_P10, and a second amino acid sequence being at least 90% homologous to TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA TTEALPLLENLTISDINPYGFTVSWMASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKL ELRGLITGIGYEVMVSGFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTAD EGVFDNFVLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQTVSA IATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVDGTKTQT RLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALDGPSGLVTANITDSEALARWQPAIAT VDSYVISYTGEKVPEITRTVSGNTVEYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTD LDSPRDLTATEVQSETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADL SPSTHYTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGD KAQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDRREEFWLGLDNL NKITAQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYKLKVEGYSGTAGDSMAYH NGRSFSTFDKDTDSAITNCALSYKGAFWYRNCHRVNLMGRYGDNNHSQGVNWFHWK GHEHSIQFAEMKLRPSNFRNLEGRRKRA corresponding to amino acids 1344 - 2201 of TENA_HUMAN_V1, which also corresponds to amino acids 1253 - 2110 of HUMTEN_PEA_1_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMTEN_PEA_1_P10, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise LT, having a structure as follows: a sequence starting from any of amino acid numbers 1252-x to 1252; and ending at any of amino acid numbers 1253 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P13, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE
TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV corresponding to amino acids 1 - 1343 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1343 of HUMTEN_PEA_1_P13, and a second amino acid sequence being at least 90% homologous to TAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVDGTKTQTRLV KLIPGVEYLVSIIAMKGFEESEPVSGSFTTALDGPSGLVTANITDSEALARWQPAIATVDS YVISYTGEKVPEITRTVSGNTVEYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDS PRDLTATEVQSETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPS
THYTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGDKAQ
ALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDRREEFWLGLDNLNKIT AQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYKLKVEGYSGTAGDSMAYHNGRS FSTFDKDTDSAITNCALSYKGAFWYRNCHRVNLMGRYGDNNHSQGVNWFHWKGHEH SIQFAEMKLRPSNFRNLEGRRKRA corresponding to amino acids 1708 - 2201 of TENA_HUMAN_V1, which also corresponds to amino acids 1344 - 1837 of HUMTEN_PEA_1_P13, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMTEN_PEA_1_P13, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise VT, having a structure as follows: a sequence starting from any of amino acid numbers 1343-x to 1343; and ending at any of amino acid numbers 1344 + ((n-2) - x), in which x varies from 0 to n-2.
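The edge-portion structure can be read as: every window of length n that contains both residues flanking the junction. For HUMTEN_PEA_1_P13 those residues are positions 1343 (V) and 1344 (T). The Python sketch below enumerates those windows directly from the stated formula, start = 1343 - x and end = 1344 + ((n - 2) - x) for x = 0 .. n - 2. The function name `edge_windows` is a hypothetical helper, not part of the disclosure.

```python
def edge_windows(left: int, n: int) -> list[tuple[int, int]]:
    """All (start, end) windows of length n that span the junction left|left+1.

    Follows the stated structure: start = left - x and
    end = (left + 1) + ((n - 2) - x), for x = 0 .. n - 2.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    right = left + 1
    return [(left - x, right + ((n - 2) - x)) for x in range(n - 1)]


windows = edge_windows(1343, 10)  # HUMTEN_PEA_1_P13 junction (V|T), n = 10
print(len(windows), windows[0], windows[-1])  # 9 (1343, 1352) (1335, 1344)
```

For any n there are n - 1 such windows, and each has length (1344 + n - 2 - x) - (1343 - x) + 1 = n.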
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI
LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG
EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA TTEALPLLENLTISDINPYGFTVSWMASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKL ELRGLITGIGYEVMVSGFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTAD EGVFDNFVLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQTVSA IATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVDGTKTQT RLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALDGPSGLVTANITDSEALARWQPAIAT VDSYVISYTGEKVPEITRTVSGNTVEYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTD LDSPRDLTATEVQSETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADL SPSTHYTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGD KAQALEVFCDMTSDGGGWIV corresponding to amino acids 1 - 2025 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 2025 of HUMTEN_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence STTRDCRALRPRGRGRGQSRGGEEGDLLLMHSDTPMCEALQDSACHTEALRNSLLNKR MGNTLATF corresponding to amino acids 2026 - 2091 of HUMTEN_PEA_1_P14, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence STTRDCRALRPRGRGRGQSRGGEEGDLLLMHSDTPMCEALQDSACHTEALRNSLLNKR MGNTLATF in HUMTEN_PEA_1_P14.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P15, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKAS corresponding to amino acids 1 - 1070 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1070 of HUMTEN_PEA_1_P15, and a second amino acid sequence being at least 90% homologous to
TEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNFVLKIRDTKKQSEPLEITLLAPERTRD
LTGLREATEYEIELYGISKGRRSQTVSAIATTAMGSPKEVIFSDITENSATVSWRAPTAQV ESFRITYVPITGGTPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTA LDGPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTVEYALTDLE PATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQSETALLTWRPPRASVTGYL LVYESVDGTVKEVIVGPDTTSYSLADLSPSTHYTAKIQALNGPLRSNMIQTIFTTIGLLYP FPKDCSQAMLNGDTTSGLYTIYLNGDKAQALEVFCDMTSDGGGWIVFLRRKNGRENF YQNWKAYAAGFGDRREEFWLGLDNLNKITAQGQYELRVDLRDHGETAFAVYDKFSV GDAKTRYKLKVEGYSGTAGDSMAYHNGRSFSTFDKDTDSAITNCALSYKGAFWYRNC HRVNLMGRYGDNNHSQGVNWFHWKGHEHSIQFAEMKLRPSNFRNLEGRRKRA corresponding to amino acids 1617 - 2201 of TENA_HUMAN_V1, which also corresponds to amino acids 1071 - 1655 of HUMTEN_PEA_1_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMTEN_PEA_1_P15, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise ST, having a structure as follows: a sequence starting from any of amino acid numbers 1070-x to 1070; and ending at any of amino acid numbers 1071 + ((n-2) - x), in which x varies from 0 to n-2.
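In residue terms, an edge portion takes x + 1 residues from the end of the first segment and n - 1 - x residues from the start of the second. That is why every such peptide contains the junction dipeptide (ST for HUMTEN_PEA_1_P15). The following Python sketch builds these peptides from the flanking sequences quoted in the HUMTEN_PEA_1_P15 claim. The helper names `edge_peptide` and `junction_dipeptide` are hypothetical, and only the flanks are used, not the full variant.

```python
def junction_dipeptide(first: str, second: str) -> str:
    """Last residue of the first segment followed by the first residue of the second."""
    return first[-1] + second[0]


def edge_peptide(first: str, second: str, x: int, n: int) -> str:
    """Length-n edge peptide with x + 1 residues from the end of `first`.

    Mirrors the stated structure: start = junction - x,
    end = junction + 1 + ((n - 2) - x), for x = 0 .. n - 2.
    """
    if not 0 <= x <= n - 2:
        raise ValueError("x must lie in 0 .. n - 2")
    left, right = x + 1, n - 1 - x  # residues taken from each side
    if left > len(first) or right > len(second):
        raise ValueError("flanking segments are too short for this window")
    return first[-left:] + second[:right]


# Flanks around the HUMTEN_PEA_1_P15 junction (1070|1071), copied from the claim text.
FIRST_END = "TAEKGRHKSKPARVKAS"
SECOND_START = "TEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNFVLKIRDTKKQSEPLEITLLAPERTRD"

pep = edge_peptide(FIRST_END, SECOND_START, x=4, n=10)
print(pep, junction_dipeptide(FIRST_END, SECOND_START))  # RVKASTEAEP ST
```

For every allowed x, the junction dipeptide sits at index x of the returned peptide.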
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P16, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC
PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC
SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKAS corresponding to amino acids 1 - 1070 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1070 of HUMTEN_PEA_1_P16, and a second amino acid sequence being at least 90% homologous to TAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVDGTKTQTRLV KLIPGVEYLVSIIAMKGFEESEPVSGSFTTALDGPSGLVTANITDSEALARWQPAIATVDS YVISYTGEKVPEITRTVSGNTVEYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDS PRDLTATEVQSETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPS THYTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGDKAQ ALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDRREEFWLGLDNLNKIT AQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYKLKVEGYSGTAGDSMAYHNGRS FSTFDKDTDSAITNCALSYKGAFWYRNCHRVNLMGRYGDNNHSQGVNWFHWKGHEH SIQFAEMKLRPSNFRNLEGRRKRA corresponding to amino acids 1708 - 2201 of TENA_HUMAN_V1, which also corresponds to amino acids 1071 - 1564 of HUMTEN_PEA_1_P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
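Every "corresponding to amino acids a - b of the reference, which also corresponds to amino acids c - d of the variant" clause implies that both ranges have the same length. The Python sketch below applies that check to the reference-to-variant mappings stated in this section, and also reports the constant offset between the two numberings. The table layout and the helpers `mismatched` and `offset` are hypothetical conveniences for illustration.

```python
# (variant, reference, ref_start, ref_end, var_start, var_end), 1-based inclusive,
# transcribed from the "corresponding to ... which also corresponds to" clauses above.
MAPPINGS = [
    ("HSCP2_PEA_1_P24", "CERU_HUMAN", 262, 1065, 16, 819),
    ("HUMTEN_PEA_1_P5", "TENA_HUMAN_V1", 1526, 2201, 1618, 2293),
    ("HUMTEN_PEA_1_P8", "TENA_HUMAN_V1", 1617, 2201, 1526, 2110),
    ("HUMTEN_PEA_1_P10", "TENA_HUMAN_V1", 1344, 2201, 1253, 2110),
    ("HUMTEN_PEA_1_P13", "TENA_HUMAN_V1", 1708, 2201, 1344, 1837),
    ("HUMTEN_PEA_1_P15", "TENA_HUMAN_V1", 1617, 2201, 1071, 1655),
    ("HUMTEN_PEA_1_P16", "TENA_HUMAN_V1", 1708, 2201, 1071, 1564),
]


def mismatched(mappings):
    """Entries whose reference and variant ranges differ in length."""
    return [m for m in mappings if m[3] - m[2] != m[5] - m[4]]


def offset(mapping) -> int:
    """Amount to add to a variant position to obtain the reference position."""
    _, _, ref_start, _, var_start, _ = mapping
    return ref_start - var_start


print(mismatched(MAPPINGS))  # []
print([offset(m) for m in MAPPINGS])  # [246, -92, 91, 91, 364, 546, 637]
```

An empty result means every stated pair of ranges is internally consistent; a non-empty result would point to a transcription error in a claim.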
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMTEN_PEA_1_P16, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise ST, having a structure as follows: a sequence starting from any of amino acid numbers 1070-x to 1070; and ending at any of amino acid numbers 1071 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P17, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE
TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA
TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA
TTEALPLLENLTISDINPYGFTVSWMASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKL ELRGLITGIGYEVMVSGFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTAD EGVFDNFVLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQTVSA IATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVDGTKTQT RLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALDGPSGLVTANITDSEALARWQPAIAT VDSYVISYTGEKVPEITRTVSGNTVEYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTD LDSPRDLTATEVQSETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADL SPSTHYTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGD KAQALEVFCDMTSDGGGWIV corresponding to amino acids 1 - 2025 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 2025 of HUMTEN_PEA_1_P17, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TPWPTTMADPSPPLTRTQIQPSPTVLCPTKGLSGTGTVTVST corresponding to amino acids 2026 - 2067 of HUMTEN_PEA_1_P17, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TPWPTTMADPSPPLTRTQIQPSPTVLCPTKGLSGTGTVTVST in HUMTEN_PEA_1_P17.
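The tiered thresholds (at least 70%, optionally 80%, preferably 85%, more preferably 90%, most preferably 95%) can be made concrete with a small sketch. It assumes a simple ungapped, position-by-position identity over sequences of equal length. The text does not say how homology is measured, and an alignment tool would normally be used in practice, so `percent_identity` and `preference_tier` are hypothetical helpers for illustration only.

```python
from typing import Optional

# Tail of HUMTEN_PEA_1_P17 as given in the text (amino acids 2026-2067).
P17_TAIL = "TPWPTTMADPSPPLTRTQIQPSPTVLCPTKGLSGTGTVTVST"

# Thresholds from the claim wording, checked from strictest to loosest.
TIERS = [
    (95.0, "most preferably"),
    (90.0, "more preferably"),
    (85.0, "preferably"),
    (80.0, "optionally"),
    (70.0, "minimum"),
]


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity: share of aligned positions carrying the same residue."""
    if len(a) != len(b) or not a:
        raise ValueError("this sketch only compares non-empty, equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def preference_tier(identity: float) -> Optional[str]:
    """Return the strictest tier the identity satisfies, or None below 70%."""
    for threshold, label in TIERS:
        if identity >= threshold:
            return label
    return None


if __name__ == "__main__":
    candidate = "A" + P17_TAIL[1:]  # a single substitution at the first residue
    identity = percent_identity(P17_TAIL, candidate)
    print(f"{identity:.1f}% -> {preference_tier(identity)}")
```

The tail has 42 residues, matching the range 2026 - 2067, so one substitution still leaves 41/42 (about 97.6%) identity and falls in the "most preferably" tier.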
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P20, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG
RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH
TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA TTEALPLLENLTISDINPYGFTVSWMASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKL ELRGLITGIGYEVMVSGFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTAD EGVFDNFVLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQTVSA IATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVDGTKTQT RLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALDGPSGLVTANITDSEALARWQPAIAT VDSYVISYTGEKVPEITRTVSGNTVEYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTD LDSPRDLTATEVQSETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADL SPSTHYTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGD KAQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDRREEFWLG corresponding to amino acids 1 - 2057 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 2057 of HUMTEN_PEA_1_P20, and a second amino acid sequence being at least
70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NAALHVYI corresponding to amino acids 2058 - 2065 of HUMTEN_PEA_1_P20, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P20, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NAALHVYI in HUMTEN_PEA_1_P20.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P26, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE
TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL
TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE
ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA TTEALPLLENLTISDINPYGFTVSWMASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKL ELRGLITGIGYEVMVSGFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTAD EGVFDNFVLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQTVSA IATT corresponding to amino acids 1 - 1708 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1708 of HUMTEN_PEA_1_P26, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GTVNKQERTEKSHDSGVFFSQG corresponding to amino acids 1709 - 1730 of HUMTEN_PEA_1_P26, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P26, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GTVNKQERTEKSHDSGVFFSQG in HUMTEN_PEA_1_P26.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P27, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG
RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH
TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV T corresponding to amino acids 1 - 1344 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1344 of HUMTEN_PEA_1_P27, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GI corresponding to amino acids 1345 - 1346 of HUMTEN_PEA_1_P27, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P28, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG
VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG
RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLT corresponding to amino acids 1 - 1253 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1253 of HUMTEN_PEA_1_P28, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GILDEFTNSLPPLCLCSGGIKALSCFKLGSAPTTLGKYQ corresponding to amino acids 1254 - 1292 of HUMTEN_PEA_1_P28, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P28, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GILDEFTNSLPPLCLCSGGIKALSCFKLGSAPTTLGKYQ in HUMTEN_PEA_1_P28.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P29, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKAST corresponding to amino acids 1 - 1071 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1071 of HUMTEN_PEA_1_P29, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GESALSFLQTLG corresponding to amino acids 1072 - 1083 of HUMTEN_PEA_1_P29, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P29, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GESALSFLQTLG in HUMTEN_PEA_1_P29.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P30, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTG corresponding to amino acids 1 - 954 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 954 of HUMTEN_PEA_1_P30, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ELCISASLSQPALEGP corresponding to amino acids 955 - 970 of HUMTEN_PEA_1_P30, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P30, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ELCISASLSQPALEGP in HUMTEN_PEA_1_P30.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P31, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTR corresponding to amino acids 1 - 802 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 802 of HUMTEN_PEA_1_P31, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EYHL corresponding to amino acids 803 - 806 of HUMTEN_PEA_1_P31, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P31, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EYHL in HUMTEN_PEA_1_P31.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P32, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVAT corresponding to amino acids 1 - 710 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 710 of HUMTEN_PEA_1_P32, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CE corresponding to amino acids 711 - 712 of HUMTEN_PEA_1_P32, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
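Each HUMTEN_PEA_1 variant above shares an N-terminal prefix with TENA_HUMAN_V1 and ends in a short unique segment whose coordinates are stated explicitly. A quick consistency check makes those statements testable. The sketch below copies the values from the text; the dictionary layout and the `coordinate_problems` helper are assumptions for illustration. It checks that each segment starts one residue after its prefix ends and that its length equals end - start + 1.

```python
from typing import Dict, List, Tuple

# name -> (last residue of the shared TENA_HUMAN_V1 prefix, unique C-terminal
#          segment, stated start, stated end), all copied from the text above.
VARIANT_TAILS: Dict[str, Tuple[int, str, int, int]] = {
    "HUMTEN_PEA_1_P17": (2025, "TPWPTTMADPSPPLTRTQIQPSPTVLCPTKGLSGTGTVTVST", 2026, 2067),
    "HUMTEN_PEA_1_P20": (2057, "NAALHVYI", 2058, 2065),
    "HUMTEN_PEA_1_P26": (1708, "GTVNKQERTEKSHDSGVFFSQG", 1709, 1730),
    "HUMTEN_PEA_1_P27": (1344, "GI", 1345, 1346),
    "HUMTEN_PEA_1_P28": (1253, "GILDEFTNSLPPLCLCSGGIKALSCFKLGSAPTTLGKYQ", 1254, 1292),
    "HUMTEN_PEA_1_P29": (1071, "GESALSFLQTLG", 1072, 1083),
    "HUMTEN_PEA_1_P30": (954, "ELCISASLSQPALEGP", 955, 970),
    "HUMTEN_PEA_1_P31": (802, "EYHL", 803, 806),
    "HUMTEN_PEA_1_P32": (710, "CE", 711, 712),
}


def coordinate_problems(tails: Dict[str, Tuple[int, str, int, int]]) -> List[str]:
    """List every segment whose stated coordinates disagree with its sequence."""
    problems = []
    for name, (prefix_end, segment, start, end) in tails.items():
        if start != prefix_end + 1:
            problems.append(f"{name}: segment does not start right after the prefix")
        if len(segment) != end - start + 1:
            problems.append(f"{name}: {len(segment)} residues but range {start}-{end}")
    return problems


if __name__ == "__main__":
    issues = coordinate_problems(VARIANT_TAILS)
    print(issues or "all segment coordinates are consistent")
```

For the nine variants listed, the check reports no problems, so the stated ranges agree with the printed sequences.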
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMOSTRO_PEA_1_PEA_1_P21, comprising a first amino acid sequence being at least 90% homologous to MRIAVICFCLLGITCAIPVKQADSGSSEEKQLYNKYPDAVATWLNPDPSQKQNLLAPQ corresponding to amino acids 1 - 58 of OSTP_HUMAN, which also corresponds to amino acids 1 - 58 of HUMOSTRO_PEA_1_PEA_1_P21, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VFLNFS corresponding to amino acids 59 - 64 of HUMOSTRO_PEA_1_PEA_1_P21, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMOSTRO_PEA_1_PEA_1_P21, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VFLNFS in HUMOSTRO_PEA_1_PEA_1_P21.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMOSTRO_PEA_1_PEA_1_P25, comprising a first amino acid sequence being at least 90% homologous to MRIAVICFCLLGITCAIPVKQADSGSSEEKQ corresponding to amino acids 1 - 31 of OSTP_HUMAN, which also corresponds to amino acids 1 - 31 of HUMOSTRO_PEA_1_PEA_1_P25, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence H corresponding to amino acids 32 - 32 of HUMOSTRO_PEA_1_PEA_1_P25, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMOSTRO_PEA_1_PEA_1_P30, comprising a first amino acid sequence being at least 90% homologous to MRIAVICFCLLGITCAIPVKQADSGSSEEKQ corresponding to amino acids 1 - 31 of OSTP_HUMAN, which also corresponds to amino acids 1 - 31 of HUMOSTRO_PEA_1_PEA_1_P30, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSIFYVFI corresponding to amino acids 32 - 39 of HUMOSTRO_PEA_1_PEA_1_P30, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMOSTRO_PEA_1_PEA_1_P30, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSIFYVFI in HUMOSTRO_PEA_1_PEA_1_P30.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for H61775_P16, comprising a first amino acid sequence being at least 90% homologous to
MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWL
RFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 11 - 93 of Q9P2J2, which also corresponds to amino acids 1 - 83 of H61775_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCW RSSCSVTLQV corresponding to amino acids 84 - 152 of H61775_P16, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of H61775_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCW RSSCSVTLQV in H61775_P16.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for H61775_P16, comprising a first amino acid sequence being at least 90% homologous to MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWL RFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 1 - 83 of AAQ88495, which also corresponds to amino acids 1 - 83 of H61775_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCW RSSCSVTLQV corresponding to amino acids 84 - 152 of H61775_P16, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of H61775_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCW RSSCSVTLQV in H61775_P16.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for H61775_P17, comprising a first amino acid sequence being at least 90% homologous to MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWL RFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 11 - 93 of Q9P2J2, which also corresponds to amino acids 1 - 83 of H61775_P17.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for H61775_P17, comprising a first amino acid sequence being at least 90% homologous to MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWL RFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 1 - 83 of AAQ88495, which also corresponds to amino acids 1 - 83 of H61775_P17.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P2, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PHSGPAAAFIRRRGWWPGPRCA corresponding to amino acids 1 - 22 of HSAPHOL_P2, a second amino acid sequence being at least 90% homologous to PATPRPLSWLRAPTRLCLDGPSPVLCA corresponding to amino acids 1 - 27 of AAH21289, which also corresponds to amino acids 23 - 49 of HSAPHOL_P2, and a third amino acid sequence being at least 90% homologous to EKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQL HHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAAT ERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNE MPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLD GLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVT DPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAG SLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYK
VVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQN YVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 83 - 586 of AAH21289, which also corresponds to amino acids 50 - 553 of HSAPHOL_P2, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of HSAPHOL_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PHSGPAAAFIRRRGWWPGPRCA of HSAPHOL_P2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P2, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AE, having a structure as follows: a sequence starting from any of amino acid numbers 49-x to 49; and ending at any of amino acid numbers 50 + ((n-2) - x), in which x varies from 0 to n-2.
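The edge-portion wording describes every window of length n that contains both junction residues, here A at position 49 and E at position 50 of HSAPHOL_P2. For each x from 0 to n-2, the window starts at 49 - x and ends at 50 + ((n-2) - x), so its length is always n. The sketch below enumerates these windows. It reads "starting from any of amino acid numbers 49-x" as the start position, which is an interpretation of the claim language. The 14-residue demo fragment is residues 42 - 55 of HSAPHOL_P2, assembled from the sequences given above. The demo uses n = 4 to keep the output short, although that is below the claimed minimum of about 10.

```python
from typing import List, Tuple


def edge_windows(n: int, left: int) -> List[Tuple[int, int]]:
    """All (start, end) windows of length n covering residues `left` and `left + 1`.

    Follows the claim pattern start = left - x, end = (left + 1) + ((n - 2) - x)
    for x = 0 .. n - 2.  Positions are 1-based and inclusive.
    """
    if n < 2:
        raise ValueError("an edge portion must include both junction residues")
    return [(left - x, (left + 1) + ((n - 2) - x)) for x in range(n - 1)]


def edge_peptides(fragment: str, n: int, left: int, first_pos: int = 1) -> List[str]:
    """Cut each window out of `fragment`, whose first residue is numbered `first_pos`."""
    last_pos = first_pos + len(fragment) - 1
    peptides = []
    for start, end in edge_windows(n, left):
        if start >= first_pos and end <= last_pos:  # skip windows the fragment cannot cover
            peptides.append(fragment[start - first_pos:end - first_pos + 1])
    return peptides


if __name__ == "__main__":
    # Residues 42-55 of HSAPHOL_P2: ...GPSPVLCA | EKEKDP..., junction between 49 (A) and 50 (E).
    fragment = "GPSPVLCAEKEKDP"
    print(edge_windows(4, 49))                           # [(49, 52), (48, 51), (47, 50)]
    print(edge_peptides(fragment, 4, 49, first_pos=42))  # ['AEKE', 'CAEK', 'LCAE']
```

There are always n - 1 such windows, and every peptide returned contains the junction pair "AE". The same function covers the ST junction of HUMTEN_PEA_1_P16 (left = 1070) and the PG junction of HSAPHOL_P3 (left = 20).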
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P2, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PHSGPAAAFIRRRGWWPGPRCAPATPRPLSWLRAPTRLCLDGPSPVLCA corresponding to amino acids 1 - 49 of HSAPHOL_P2, and a second amino acid sequence being at least 90% homologous to EKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQL HHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAAT ERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNE MPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLD GLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVT DPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAG SLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYK VVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQN YVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 21 - 524 of PPBT_HUMAN, which also corresponds to amino acids 50 - 553 of HSAPHOL_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of HSAPHOL_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PHSGPAAAFIRRRGWWPGPRCAPATPRPLSWLRAPTRLCLDGPSPVLCA of HSAPHOL_P2.
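HSAPHOL_P2 is described in two ways. The first uses three pieces: a 22-residue head, AAH21289 1 - 27, and AAH21289 83 - 586. The second uses two pieces: a 49-residue head and PPBT_HUMAN 21 - 524. The short sketch below checks that the pieces of each description tile positions 1 - 553 without gaps or overlaps. It also checks that every reference-derived piece has the same length in variant and reference coordinates. The segment-table layout and the `tiling_problems` helper are assumptions for illustration; the numbers are copied from the text.

```python
from typing import List, Optional, Tuple

# (variant_start, variant_end, source, source_start, source_end);
# source None marks a piece with no reference counterpart (the unique head).
Segment = Tuple[int, int, Optional[str], Optional[int], Optional[int]]

P2_THREE_PIECES: List[Segment] = [
    (1, 22, None, None, None),
    (23, 49, "AAH21289", 1, 27),
    (50, 553, "AAH21289", 83, 586),
]
P2_TWO_PIECES: List[Segment] = [
    (1, 49, None, None, None),
    (50, 553, "PPBT_HUMAN", 21, 524),
]


def tiling_problems(segments: List[Segment], length: int) -> List[str]:
    """Report gaps, overlaps and length mismatches in a chimera description."""
    problems = []
    expected_start = 1
    for var_start, var_end, source, src_start, src_end in segments:
        if var_start != expected_start:
            problems.append(f"gap or overlap before position {var_start}")
        if source is not None and (var_end - var_start) != (src_end - src_start):
            problems.append(f"{source} {src_start}-{src_end} differs in length "
                            f"from {var_start}-{var_end}")
        expected_start = var_end + 1
    if expected_start - 1 != length:
        problems.append(f"segments end at {expected_start - 1}, expected {length}")
    return problems


if __name__ == "__main__":
    print(tiling_problems(P2_THREE_PIECES, 553))  # []
    print(tiling_problems(P2_TWO_PIECES, 553))    # []
```

Both descriptions pass, which is consistent with their phrase "contiguous and in a sequential order".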
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P2, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AE, having a structure as follows: a sequence starting from any of amino acid numbers 49-x to 49; and ending at any of amino acid numbers 50 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P3, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVP corresponding to amino acids 63 - 82 of AAH21289, which also corresponds to amino acids 1 - 20 of HSAPHOL_P3, and a second amino acid sequence being at least 90% homologous to GMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYL CGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSA AYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTD VEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFE PGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQAL HEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKK PFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKG PMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSV LF corresponding to amino acids 123 - 586 of AAH21289, which also corresponds to amino acids 21 - 484 of HSAPHOL_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P3, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PG, having a structure as follows: a sequence starting from any of amino acid numbers 20-x to 20; and ending at any of amino acid numbers 21 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P3, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVP corresponding to amino acids 1 - 20 of PPBT_HUMAN, which also corresponds to amino acids 1 - 20 of HSAPHOL_P3, and a second amino acid sequence being at least 90% homologous to GMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 61 - 524 of PPBT_HUMAN, which also corresponds to amino acids 21 - 484 of HSAPHOL_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order.
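Applied to actual residues, the same structure for HSAPHOL_P3 (junction between P20 and G21) yields peptides that all contain the dipeptide PG. The sketch below uses only residues 1-40 copied from the text: the first segment plus the start of the second. The helper name `edge_peptides` is hypothetical. A length of n = 10 is chosen because it is the smallest recited value.

```python
# Residues 1-20 of HSAPHOL_P3 and the first 20 residues of its second segment (21-40),
# copied from the text; the rest of the second segment is omitted for brevity.
FIRST = "MISPFLVLAIGTCLTNSLVP"
SECOND_START = "GMGVSTVTAARILKGQLHHN"


def edge_peptides(seq: str, k: int, n: int) -> list[str]:
    """All length-n peptides of seq that start at k - x and end at (k + 1) + ((n - 2) - x)."""
    peptides = []
    for x in range(n - 1):
        start = k - x                        # 1-based, inclusive
        end = (k + 1) + ((n - 2) - x)        # 1-based, inclusive
        peptides.append(seq[start - 1:end])  # convert to a 0-based Python slice
    return peptides


chimera = FIRST + SECOND_START
k = len(FIRST)  # junction after residue 20
peptides = edge_peptides(chimera, k, 10)
print(peptides[0])   # PGMGVSTVTA
print(peptides[-1])  # TCLTNSLVPG
# The junction dipeptide sits at offset x inside the peptide generated for x.
print(all(p[x:x + 2] == "PG" for x, p in enumerate(peptides)))  # True
```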
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P3, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PG, having a structure as follows: a sequence starting from any of amino acid numbers 20-x to 20; and ending at any of amino acid numbers 21 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P4, comprising a first amino acid sequence being at least 90% homologous to MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 124 - 586 of AAH21289, which also corresponds to amino acids 1 - 463 of HSAPHOL_P4.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P4, comprising a first amino acid sequence being at least 90% homologous to MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 62 - 524 of PPBT_HUMAN, which also corresponds to amino acids 1 - 463 of HSAPHOL_P4.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P5, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEM corresponding to amino acids 63 - 417 of AAH21289, which also corresponds to amino acids 1 - 355 of HSAPHOL_P5, and a second amino acid sequence being at least 90% homologous to DHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 440 - 586 of AAH21289, which also corresponds to amino acids 356 - 502 of HSAPHOL_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
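The HSAPHOL_P3 and HSAPHOL_P4 mappings are each recited against two references, AAH21289 and PPBT_HUMAN. A minimal consistency check, assuming 1-based inclusive coordinates, can confirm three things: each pair of ranges has equal length, the variant-side ranges are contiguous, and the two references differ by a constant offset of 62 over these regions. The `Segment` type and helper functions are illustrative and are not part of any published tool.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """One recited correspondence; all coordinates are 1-based and inclusive."""
    ref: str
    ref_start: int
    ref_end: int
    var_start: int
    var_end: int


def check_segments(segments: list[Segment]) -> None:
    """Raise ValueError unless each pair of ranges has equal length and the
    variant-side ranges follow one another without gaps or overlaps."""
    for s in segments:
        if s.ref_end - s.ref_start != s.var_end - s.var_start:
            raise ValueError(f"length mismatch in {s}")
    for a, b in zip(segments, segments[1:]):
        if b.var_start != a.var_end + 1:
            raise ValueError(f"{a} and {b} are not contiguous in the variant")


def ref_offset(a: Segment, b: Segment) -> int:
    """Offset between two references that cover the same variant residues."""
    if (a.var_start, a.var_end) != (b.var_start, b.var_end):
        raise ValueError("segments must cover the same variant residues")
    return a.ref_start - b.ref_start


# HSAPHOL_P3 as recited against AAH21289 and against PPBT_HUMAN.
p3_aah = [Segment("AAH21289", 63, 82, 1, 20), Segment("AAH21289", 123, 586, 21, 484)]
p3_ppbt = [Segment("PPBT_HUMAN", 1, 20, 1, 20), Segment("PPBT_HUMAN", 61, 524, 21, 484)]
# HSAPHOL_P4 as recited against the same two references.
p4_aah = [Segment("AAH21289", 124, 586, 1, 463)]
p4_ppbt = [Segment("PPBT_HUMAN", 62, 524, 1, 463)]

for segs in (p3_aah, p3_ppbt, p4_aah, p4_ppbt):
    check_segments(segs)  # passes silently when the ranges are consistent

offsets = [ref_offset(a, b) for a, b in zip(p3_aah + p4_aah, p3_ppbt + p4_ppbt)]
print(offsets)  # [62, 62, 62]
```

The same length check catches a reference range that is one residue longer or shorter than the variant range it is said to correspond to.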
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P5, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise MD, having a structure as follows: a sequence starting from any of amino acid numbers 355-x to 355; and ending at any of amino acid numbers 356 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P5, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEM corresponding to amino acids 1 - 355 of PPBT_HUMAN, which also corresponds to amino acids 1 - 355 of HSAPHOL_P5, and a second amino acid sequence being at least 90% homologous to DHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 378 - 524 of PPBT_HUMAN, which also corresponds to amino acids 356 - 502 of HSAPHOL_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P5, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise MD, having a structure as follows: a sequence starting from any of amino acid numbers 355-x to 355; and ending at any of amino acid numbers 356 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P6, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL corresponding to amino acids 63 - 349 of AAH21289, which also corresponds to amino acids 1 - 287 of HSAPHOL_P6, and a second amino acid sequence being at least 90% homologous to GGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 395 - 586 of AAH21289, which also corresponds to amino acids 288 - 479 of HSAPHOL_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise LG, having a structure as follows: a sequence starting from any of amino acid numbers 287-x to 287; and ending at any of amino acid numbers 288 + ((n-2) - x), in which x varies from 0 to n-2.
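When the two segments of a variant come from non-adjacent regions of the same reference, the residues between them are absent from the variant. For HSAPHOL_P5 and HSAPHOL_P6 as recited against AAH21289, the absent ranges follow directly from the segment boundaries, and the sketch below computes them. Interpreting such a gap biologically (for example, as a skipped exon) is an assumption the text does not make, and the helper name is hypothetical.

```python
from typing import Optional, Tuple


def skipped_reference_range(first_ref_end: int, second_ref_start: int) -> Optional[Tuple[int, int]]:
    """Reference residues lying between two consecutive recited segments.

    Returns a 1-based inclusive (start, end) tuple, or None when the
    segments are adjacent in the reference.
    """
    if second_ref_start <= first_ref_end + 1:
        return None
    return (first_ref_end + 1, second_ref_start - 1)


# HSAPHOL_P5 against AAH21289: segments 63-417 and 440-586.
p5_gap = skipped_reference_range(417, 440)
# HSAPHOL_P6 against AAH21289: segments 63-349 and 395-586.
p6_gap = skipped_reference_range(349, 395)
print(p5_gap, p6_gap)  # (418, 439) (350, 394)
for start, end in (p5_gap, p6_gap):
    print(end - start + 1)  # 22, then 45 reference residues absent from the variant
```

The variant-side coordinates stay contiguous (355 to 356, and 287 to 288) because only the reference side has a gap.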
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P6, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL corresponding to amino acids 1 - 287 of PPBT_HUMAN, which also corresponds to amino acids 1 - 287 of HSAPHOL_P6, and a second amino acid sequence being at least 90% homologous to GGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 333 - 524 of PPBT_HUMAN, which also corresponds to amino acids 288 - 479 of HSAPHOL_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise LG, having a structure as follows: a sequence starting from any of amino acid numbers 287-x to 287; and ending at any of amino acid numbers 288 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P7, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYK corresponding to amino acids 63 - 326 of AAH21289, which also corresponds to amino acids 1 - 264 of HSAPHOL_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP corresponding to amino acids 265 - 306 of HSAPHOL_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSAPHOL_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP in HSAPHOL_P7.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P7, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPR corresponding to amino acids 1 - 262 of PPBT_HUMAN, which also corresponds to amino acids 1 - 262 of HSAPHOL_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence YKLPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP corresponding to amino acids 263 - 306 of HSAPHOL_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSAPHOL_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence YKLPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP in HSAPHOL_P7.
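The two HSAPHOL_P7 claims above differ only in where the shared region is taken to end: residue 264 against AAH21289, or residue 262 against PPBT_HUMAN. The unique tail therefore starts either with LPPR... or with YKLPPR.... A short sketch, with a hypothetical helper, shows that both readings place the tail at the C-terminus and give the same total length of 306 residues.

```python
def tail_coordinates(shared_len: int, tail: str) -> tuple[int, int]:
    """1-based inclusive coordinates of a tail appended after residues 1..shared_len."""
    if not tail:
        raise ValueError("tail must be non-empty")
    return shared_len + 1, shared_len + len(tail)


P7_TAIL = "LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP"  # as recited for HSAPHOL_P7

# Reading aligned to AAH21289: 264 shared residues, then the 42-residue tail.
print(tail_coordinates(264, P7_TAIL))         # (265, 306)
# Reading aligned to PPBT_HUMAN: 262 shared residues, then "YK" plus the same tail.
print(tail_coordinates(262, "YK" + P7_TAIL))  # (263, 306)
```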
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P7, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYK corresponding to amino acids 1 - 264 of O75090, which also corresponds to amino acids 1 - 264 of HSAPHOL_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP corresponding to amino acids 265 - 306 of HSAPHOL_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSAPHOL_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP in HSAPHOL_P7.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P8, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG corresponding to amino acids 63 - 350 of AAH21289, which also corresponds to amino acids 1 - 288 of HSAPHOL_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP corresponding to amino acids 289 - 316 of HSAPHOL_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSAPHOL_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP in HSAPHOL_P8.
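A tail such as that of HSAPHOL_P8 begins where the variant stops matching the reference. Assuming both sequences are aligned from residue 1 without gaps, the boundary is the length of their longest common prefix. The sketch below applies this to short excerpts copied from sequences recited in this section: the reference continues GLFEPG... after residue 288, while HSAPHOL_P8 continues KWRGWR.... The helper is hypothetical, and a real comparison would use the full sequences.

```python
def shared_prefix_length(a: str, b: str) -> int:
    """Number of leading residues two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Excerpts (not full sequences) that start at the same residue position.
reference = "PHNVDYLLGLFEPGDMQ"  # reference residues continuing past residue 288
variant = "PHNVDYLLGKWRGWRGG"    # HSAPHOL_P8 around the same position

n = shared_prefix_length(reference, variant)
print(variant[:n])  # PHNVDYLLG  (the last shared residue, G, is residue 288 of HSAPHOL_P8)
print(variant[n:])  # KWRGWRGG   (start of the tail, residue 289 onward)
```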
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P8, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG corresponding to amino acids 1 - 288 of PPBT_HUMAN, which also corresponds to amino acids 1 - 288 of HSAPHOL_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP corresponding to amino acids 289 - 316 of HSAPHOL_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSAPHOL_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP in HSAPHOL_P8.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P8, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG corresponding to amino acids 1 - 288 of O75090, which also corresponds to amino acids 1 - 288 of HSAPHOL_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP corresponding to amino acids 289 - 316 of HSAPHOL_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSAPHOL_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP in HSAPHOL_P8.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T10888_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLYGPDVPTISPSKANYRPGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGSYMCQAHNSATGLNRTTVTMITVS corresponding to amino acids 1 - 319 of CEA6_HUMAN, which also corresponds to amino acids 1 - 319 of T10888_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DWTRP corresponding to amino acids 320 - 324 of T10888_PEA_1_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T10888_PEA_1_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DWTRP in T10888_PEA_1_P2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T10888_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL corresponding to amino acids 1 - 234 of CEA6_HUMAN, which also corresponds to amino acids 1 - 234 of T10888_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLLSSQLWPPSASRLECWPGWL corresponding to amino acids 235 - 256 of T10888_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T10888_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLLSSQLWPPSASRLECWPGWL in T10888_PEA_1_P4.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T10888_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL corresponding to amino acids 1 - 234 of Q13774, which also corresponds to amino acids 1 - 234 of T10888_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLLSSQLWPPSASRLECWPGWL corresponding to amino acids 235 - 256 of T10888_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T10888_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLLSSQLWPPSASRLECWPGWL in T10888_PEA_1_P4.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T10888_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLYGPDVPTISPSKANYRPGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGSYMCQAHNSATGLNRTTVTMITVSG corresponding to amino acids 1 - 320 of CEA6_HUMAN, which also corresponds to amino acids 1 - 320 of T10888_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KWIHEALASHFQVESGSQRRARKKFSFPTCVQGAHANPKFSPEPSQFTSADSFPLVFLFFVVFCFLISHV corresponding to amino acids 321 - 390 of T10888_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T10888_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWIHEALASHFQVESGSQRRARKKFSFPTCVQGAHANPKFSPEPSQFTSADSFPLVFLFFVVFCFLISHV in T10888_PEA_1_P5.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T10888_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVY corresponding to amino acids 1 - 141 of CEA6_HUMAN, which also corresponds to amino acids 1 - 141 of T10888_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence REYFHMTSGCWGSVLLPTYGIVRPGLCLWPSLHYILYQGLDI corresponding to amino acids 142 - 183 of T10888_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T10888_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence REYFHMTSGCWGSVLLPTYGIVRPGLCLWPSLHYILYQGLDI in T10888_PEA_1_P6.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P9, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEG corresponding to amino acids 1 - 274 of Q9UII7, which also corresponds to amino acids 1 - 274 of HSECADH_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG corresponding to amino acids 275 - 322 of HSECADH_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSECADH_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG in HSECADH_P9.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P9, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEG corresponding to amino acids 1 - 274 of Q9UII8, which also corresponds to amino acids 1 - 274 of HSECADH_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG corresponding to amino acids 275 - 322 of HSECADH_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSECADH_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG in HSECADH_P9.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P9, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEG corresponding to amino acids 1 - 274 of CAD1_HUMAN, which also corresponds to amino acids 1 - 274 of HSECADH_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG corresponding to amino acids 275 - 322 of HSECADH_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSECADH_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG in HSECADH_P9.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P13, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADLQGEGLSTTATAVITVTDTNDNPPIFNPTT corresponding to amino acids 1 - 379 of Q9UII7, which also corresponds to amino acids 1 - 379 of HSECADH_P13, and a second amino acid sequence VIL corresponding to amino acids 380 - 382 of HSECADH_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P13, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADLQGEGLSTTATAVITVTDTNDNPPIFNPTT corresponding to amino acids 1 - 379 of Q9UII8, which also corresponds to amino acids 1 - 379 of HSECADH_P13, and a second amino acid sequence VIL corresponding to amino acids 380 - 382 of HSECADH_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
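The HSECADH_P13 claims recite a separate requirement for each segment. The first sequence must be at least 90% homologous to the recited residues 1-379, while the second is the exact tripeptide VIL. A minimal sketch can score each segment of a candidate separately. It again assumes ungapped percent identity rather than the specification's own homology measure. It works on a 27-residue excerpt (residues 356-382) copied from the text, and the candidate sequence is hypothetical.

```python
def segment_identities(candidate: str, refs: list[str]) -> list[float]:
    """Split candidate into consecutive pieces sized like refs and return the
    ungapped percent identity of each piece against its reference."""
    if len(candidate) != sum(len(r) for r in refs):
        raise ValueError("candidate length must equal the summed segment lengths")
    identities, pos = [], 0
    for ref in refs:
        piece = candidate[pos:pos + len(ref)]
        matches = sum(p == r for p, r in zip(piece, ref))
        identities.append(100.0 * matches / len(ref))
        pos += len(ref)
    return identities


# Excerpt: residues 356-379 of the first HSECADH_P13 segment, then VIL (380-382).
first_ref = "STTATAVITVTDTNDNPPIFNPTT"
second_ref = "VIL"
# Hypothetical candidate with a single substitution (T -> S) at residue 379.
candidate = "STTATAVITVTDTNDNPPIFNPTS" + "VIL"

first_id, second_id = segment_identities(candidate, [first_ref, second_ref])
print(round(first_id, 1), second_id)          # 95.8 100.0
print(first_id >= 90 and second_id == 100.0)  # True
```

Scoring segments separately keeps a highly divergent short tail from being masked by a long, well-conserved first segment, which a single whole-sequence percentage could hide.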
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P13, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADLQGEGLSTTATAVITVTDTNDNPPIFNPTT corresponding to amino acids 1 - 379 of CAD1_HUMAN, which also corresponds to amino acids 1 - 379 of HSECADH_P13, and a second amino acid sequence VIL corresponding to amino acids 380 - 382 of HSECADH_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P14, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE corresponding to amino acids 1 - 336 of Q9UII7, which also corresponds to amino acids 1 - 336 of HSECADH_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV corresponding to amino acids 337 - 373 of HSECADH_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSECADH_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV in HSECADH_P14.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P14, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE corresponding to amino acids 1 - 336 of Q9UII8, which also corresponds to amino acids 1 - 336 of HSECADH_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV corresponding to amino acids 337 - 373 of HSECADH_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSECADH_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV in HSECADH_P14.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P14, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE corresponding to amino acids 1 - 336 of CAD1_HUMAN, which also corresponds to amino acids 1 - 336 of HSECADH_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV corresponding to amino acids 337 - 373 of HSECADH_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSECADH_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV in HSECADH_P14.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P15, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYT corresponding to amino acids 1 - 229 of Q9UII7, which also corresponds to amino acids 1 - 229 of HSECADH_P15, and a second amino acid sequence VSIS corresponding to amino acids 230 - 233 of HSECADH_P15, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P15, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYT corresponding to amino acids 1 - 229 of Q9UII8, which also corresponds to amino acids 1 - 229 of HSECADH_P15, and a second amino acid sequence VSIS corresponding to amino acids 230 - 233 of HSECADH_P15, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P15, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYT corresponding to amino acids 1 - 229 of CAD1_HUMAN, which also corresponds to amino acids 1 - 229 of HSECADH_P15, and a second amino acid sequence VSIS corresponding to amino acids 230 - 233 of HSECADH_P15, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P5, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 12 - 55 of GILT_HUMAN, which also corresponds to amino acids 1 - 44 of T59832_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VGTATGRAGWREQAPCRGTRLLLSPQTSQGKTRAPRGRCPCRVPGKTLFSSRRCGHTPSVPFRFRIPHLRGAAASTRLVPPKGSMSAYCVLLGQELGSPFVAQGTSSAAGQGPPACILAATLDAFIPARAGLACLWDLLGRCPRG corresponding to amino acids 45 - 189 of T59832_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VGTATGRAGWREQAPCRGTRLLLSPQTSQGKTRAPRGRCPCRVPGKTLFSSRRCGHTPSVPFRFRIPHLRGAAASTRLVPPKGSMSAYCVLLGQELGSPFVAQGTSSAAGQGPPACILAATLDAFIPARAGLACLWDLLGRCPRG in T59832_P5.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P7, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 12 - 223 of GILT_HUMAN, which also corresponds to amino acids 1 - 212 of T59832_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRIFLALSLTLIVPWSQGWTRQRDQR corresponding to amino acids 213 - 238 of T59832_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTLIVPWSQGWTRQRDQR in T59832_P7.
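Residue ranges in these embodiments are 1-based and inclusive, so amino acids 213 - 238 of T59832_P7 span 26 residues. The sketch below, assuming plain one-letter sequence strings and using a hypothetical helper `residues`, converts such ranges to Python slices and checks the quoted tail against a T59832_P7 assembled from its two quoted segments.

```python
def residues(seq: str, start: int, end: int) -> str:
    """Residues start..end of seq, using the 1-based, inclusive numbering of the text.

    Hypothetical helper: Python slices are 0-based and end-exclusive, hence
    the start - 1 offset and the unchanged end.
    """
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"range {start}-{end} is outside 1-{len(seq)}")
    return seq[start - 1:end]


# T59832_P7 assembled from the segments quoted above: amino acids 1 - 212
# (shared with GILT_HUMAN 12 - 223) followed by the tail, amino acids 213 - 238.
P7_FIRST = (
    "MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA"
    "PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC"
    "QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM"
    "ECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG"
)
P7_TAIL = "VRIFLALSLTLIVPWSQGWTRQRDQR"
P7 = P7_FIRST + P7_TAIL

print(len(P7))                             # 238
print(residues(P7, 213, 238) == P7_TAIL)   # True
print(residues(P7, 1, 44))                 # stretch also quoted for T59832_P5
```

Keeping the text's numbering in the helper's signature, rather than converting at every call site, makes it harder to introduce off-by-one errors when comparing against the stated coordinates.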
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P7, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 1 - 212 of BAC98466, which also corresponds to amino acids 1 - 212 of T59832_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRIFLALSLTLIVPWSQGWTRQRDQR corresponding to amino acids 213 - 238 of T59832_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTLIVPWSQGWTRQRDQR in T59832_P7.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P7, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLV corresponding to amino acids 1 - 90 of T59832_P7, and a second amino acid sequence being at least 90% homologous to MEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGVRIFLALSLTLIVPWSQGWTRQRDQR corresponding to amino acids 1 - 148 of BAC85622, which also corresponds to amino acids 91 - 238 of T59832_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of T59832_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLV of T59832_P7.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P7, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 1 - 212 of Q8WU77, which also corresponds to amino acids 1 - 212 of T59832_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRIFLALSLTLIVPWSQGWTRQRDQR corresponding to amino acids 213 - 238 of T59832_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTLIVPWSQGWTRQRDQR in T59832_P7.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P9, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 12 - 214 of GILT_HUMAN, which also corresponds to amino acids 1 - 203 of T59832_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204 - 244 of T59832_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P9, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 1 - 203 of BAC98466, which also corresponds to amino acids 1 - 203 of T59832_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204 - 244 of T59832_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P9, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLV corresponding to amino acids 1 - 90 of T59832_P9, a second amino acid sequence being at least 90% homologous to MEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 1 - 113 of BAC85622, which also corresponds to amino acids 91 - 203 of T59832_P9, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204 - 244 of T59832_P9, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLV of T59832_P9.
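A multi-part chimera such as the three-part T59832_P9 layout above is internally consistent only if its segments abut without gaps or overlaps, each reference-derived segment spans the same number of residues in the reference and in the variant, and the last segment ends at the final residue. The Python sketch below checks those conditions from the stated coordinates alone; the `Segment` type and the `check_chimera` function are hypothetical illustrations, not part of the specification.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    """One contiguous piece of a chimeric polypeptide (1-based, inclusive)."""
    var_start: int
    var_end: int
    source: Optional[str] = None  # reference entry; None if no reference is cited
    ref_start: Optional[int] = None
    ref_end: Optional[int] = None


def check_chimera(segments: List[Segment], total_length: int) -> List[str]:
    """Return a list of inconsistencies; an empty list means the layout is coherent."""
    problems = []
    expected_start = 1
    for seg in segments:
        # Each segment must begin right after the previous one ends.
        if seg.var_start != expected_start:
            problems.append(f"segment at {seg.var_start} does not follow {expected_start - 1}")
        # Reference-derived segments must have equal lengths on both sides.
        if seg.source is not None:
            var_len = seg.var_end - seg.var_start + 1
            ref_len = seg.ref_end - seg.ref_start + 1
            if var_len != ref_len:
                problems.append(f"{seg.source}: {ref_len} reference vs {var_len} variant residues")
        expected_start = seg.var_end + 1
    if expected_start - 1 != total_length:
        problems.append(f"segments end at {expected_start - 1}, not {total_length}")
    return problems


# The three-part T59832_P9 layout, coordinates as stated in the text.
T59832_P9 = [
    Segment(1, 90),                        # head, no reference cited
    Segment(91, 203, "BAC85622", 1, 113),  # shared with BAC85622
    Segment(204, 244),                     # tail, no reference cited
]
print(check_chimera(T59832_P9, 244))  # []
```

The same check can be applied to the four-part T59832_P12 layout (1 - 90, 91 - 130, 131 - 181 and 182 - 219), where the two middle segments map to BAC85622 1 - 40 and 72 - 122.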
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P9, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 1 - 203 of Q8WU77, which also corresponds to amino acids 1 - 203 of T59832_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204 - 244 of T59832_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P12, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVE corresponding to amino acids 12 - 141 of GILT_HUMAN, which also corresponds to amino acids 1 - 130 of T59832_P12, and a second amino acid sequence being at least 90% homologous to CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 173 - 261 of GILT_HUMAN, which also corresponds to amino acids 131 - 219 of T59832_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T59832_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130; and ending at any of amino acid numbers 131 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P12, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLV corresponding to amino acids 1 - 90 of T59832_P12, a second amino acid sequence being at least 90% homologous to MEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVE corresponding to amino acids 1 - 40 of BAC85622, which also corresponds to amino acids 91 - 130 of T59832_P12, a third amino acid sequence being at least 90% homologous to CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 72 - 122 of BAC85622, which also corresponds to amino acids 131 - 181 of T59832_P12, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 182 - 219 of T59832_P12, wherein said first, second, third and fourth amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of T59832_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLV of T59832_P12.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T59832_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130; and ending at any of amino acid numbers 131 + ((n-2) - x), in which x varies from 0 to n-2.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK in T59832_P12.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P12, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVE corresponding to amino acids 1 - 130 of Q8WU77, which also corresponds to amino acids 1 - 130 of T59832_P12, and a second amino acid sequence being at least 90% homologous to CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 162 - 250 of Q8WU77, which also corresponds to amino acids 131 - 219 of T59832_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order.
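The edge-portion formula can be read as a sliding window: for a window of length n and each x from 0 to n-2, the window starts at residue 130 - x and ends at residue 131 + ((n-2) - x), so it always has length n and always contains residues 130 and 131 (the E and C at the T59832_P12 junction). A minimal Python sketch, assuming T59832_P12 is the concatenation of the four segments quoted in the text and that windows running past either end of the protein are simply skipped, enumerates these windows. The function name `edge_windows` is hypothetical.

```python
def edge_windows(seq: str, junction: int, n: int):
    """Yield (start, end, peptide) for each length-n window spanning a junction.

    `junction` is the 1-based number of the last residue before the junction
    (130 for T59832_P12). Following the text, start = junction - x and
    end = junction + 1 + ((n - 2) - x) for x = 0 .. n - 2, so every window
    contains residues `junction` and `junction + 1`.
    """
    for x in range(n - 1):  # x = 0, 1, ..., n - 2
        start = junction - x
        end = junction + 1 + ((n - 2) - x)
        if start < 1 or end > len(seq):
            continue  # assumption: windows past either end are not counted
        yield start, end, seq[start - 1:end]


# T59832_P12 assembled from the four segments quoted in the text
# (amino acids 1 - 90, 91 - 130, 131 - 181 and 182 - 219).
P12 = (
    "MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA"
    "PLVNVTLYYEALCGGCRAFLIRELFPTWLLV"
    "MEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVE"
    "CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG"
    "KPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK"
)

windows = list(edge_windows(P12, 130, 10))
print(len(P12), P12[129:131])  # 219 EC
print(windows[0])              # (130, 139, 'ECLQLYAPGL')
print(windows[-1])             # (122, 131, 'EECKFNKVEC')
```

With `junction=44` the same function describes the KC edge of T59832_P18, and with `junction=127` the KG edge of HUMGRP5E_P4.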
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T59832_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130; and ending at any of amino acid numbers 131 + ((n-2) - x), in which x varies from 0 to n-2.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P18, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 12 - 55 of GILT_HUMAN, which also corresponds to amino acids 1 - 44 of T59832_P18, and a second amino acid sequence being at least 90% homologous to CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 173 - 261 of GILT_HUMAN, which also corresponds to amino acids 45 - 133 of T59832_P18, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T59832_P18, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44; and ending at any of amino acid numbers 45 + ((n-2) - x), in which x varies from 0 to n-2.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P18, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 1 - 44 of Q8WU77, which also corresponds to amino acids 1 - 44 of T59832_P18, and a second amino acid sequence being at least 90% homologous to CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 162 - 250 of Q8WU77, which also corresponds to amino acids 45 - 133 of T59832_P18, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T59832_P18, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44; and ending at any of amino acid numbers 45 + ((n-2) - x), in which x varies from 0 to n-2.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P18, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 1 - 44 of Q8NEI4, which also corresponds to amino acids 1 - 44 of T59832_P18, and a second amino acid sequence being at least 90% homologous to CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 162 - 250 of Q8NEI4, which also corresponds to amino acids 45 - 133 of T59832_P18, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T59832_P18, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44; and ending at any of amino acid numbers 45 + ((n-2) - x), in which x varies from 0 to n-2.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMGRP5E_P4, comprising a first amino acid sequence being at least 90% homologous to MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLMGKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQPKALGNQQPSWDSEDSSNFKDVGSKGK corresponding to amino acids 1 - 127 of GRP_HUMAN, which also corresponds to amino acids 1 - 127 of HUMGRP5E_P4, and a second amino acid sequence being at least 90% homologous to GSQREGRNPQLNQQ corresponding to amino acids 135 - 148 of GRP_HUMAN, which also corresponds to amino acids 128 - 141 of HUMGRP5E_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
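HUMGRP5E_P4 aligns to GRP_HUMAN in two segments (1 - 127 to 1 - 127, and 135 - 148 to 128 - 141), so GRP_HUMAN residues 128 - 134 have no counterpart in the variant. Residue numbers can be translated with a small segment table; the Python sketch below uses hypothetical helper names `ref_to_variant` and `variant_to_ref`, takes its segments from the text, and returns `None` for residues absent on the other side.

```python
from typing import List, Optional, Tuple

# (reference start, reference end, variant start) for each aligned segment,
# as stated in the text. GRP_HUMAN 128 - 134 falls between the two segments.
Segments = List[Tuple[int, int, int]]
HUMGRP5E_P4: Segments = [(1, 127, 1), (135, 148, 128)]


def ref_to_variant(ref_pos: int, segments: Segments) -> Optional[int]:
    """Map a reference residue number to the variant, or None if it is absent."""
    for ref_start, ref_end, var_start in segments:
        if ref_start <= ref_pos <= ref_end:
            return var_start + (ref_pos - ref_start)
    return None


def variant_to_ref(var_pos: int, segments: Segments) -> Optional[int]:
    """Inverse mapping; None for variant residues with no reference counterpart."""
    for ref_start, ref_end, var_start in segments:
        var_end = var_start + (ref_end - ref_start)
        if var_start <= var_pos <= var_end:
            return ref_start + (var_pos - var_start)
    return None


for pos in (127, 130, 135, 148):
    print(pos, "->", ref_to_variant(pos, HUMGRP5E_P4))
# 127 -> 127, 130 -> None, 135 -> 128, 148 -> 141
```

The same table form covers T59832_P12 against GILT_HUMAN, for example `[(12, 141, 1), (173, 261, 131)]`, where GILT_HUMAN 142 - 172 is skipped.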
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMGRP5E_P4, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KG, having a structure as follows: a sequence starting from any of amino acid numbers 127-x to 127; and ending at any of amino acid numbers 128 + ((n-2) - x), in which x varies from 0 to n-2.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMGRP5E_P5, comprising a first amino acid sequence being at least 90% homologous to MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLMGKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQPKALGNQQPSWDSEDSSNFKDVGSKGK corresponding to amino acids 1 - 127 of GRP_HUMAN, which also corresponds to amino acids 1 - 127 of HUMGRP5E_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DSLLQVLNVKEGTPS corresponding to amino acids 128 - 142 of HUMGRP5E_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMGRP5E_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DSLLQVLNVKEGTPS in HUMGRP5E_P5.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALHKAGAVGGGVR corresponding to amino acids 1 - 110 of R11723_PEA_1_P6, and a second amino acid sequence being at least 90% homologous to MYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 1 - 112 of Q8IXM0, which also corresponds to amino acids 111 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALHKAGAVGGGVR of R11723_PEA_1_P6.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 1 - 83 of Q96AC2, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 1 - 83 of Q8N2G4, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 24 - 106 of BAC85518, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 1 - 64 of Q96AC2, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 1 - 64 of Q8N2G4, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG corresponding to amino acids 1 - 5 of R11723_PEA_1_P7, a second amino acid sequence being at least 90% homologous to IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 22 - 80 of BAC85273, which also corresponds to amino acids 6 - 64 of R11723_PEA_1_P7, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG of R11723_PEA_1_P7.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
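The three-part structure of R11723_PEA_1_P7 (a novel head at residues 1-5, a segment matching BAC85273 residues 22-80 at variant positions 6-64, and a novel tail at residues 65-93) amounts to a coordinate map between the variant and its reference. The sketch below is a minimal illustration, assuming 1-based inclusive coordinates and ungapped segments; the `Segment` type and the `is_contiguous` and `to_source` helpers are hypothetical names, not part of the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    var_start: int                   # 1-based inclusive start in the variant
    var_end: int                     # 1-based inclusive end in the variant
    source: Optional[str] = None     # reference protein, or None for a novel segment
    src_start: Optional[int] = None  # reference position aligned to var_start


def is_contiguous(segments: list[Segment]) -> bool:
    """True if the segments tile the variant from residue 1 with no gap or overlap."""
    expected = 1
    for seg in segments:
        if seg.var_start != expected or seg.var_end < seg.var_start:
            return False
        expected = seg.var_end + 1
    return True


def to_source(segments: list[Segment], pos: int) -> tuple[Optional[str], Optional[int]]:
    """Map a 1-based variant position to (reference name, reference position).
    Positions inside a novel segment map to (None, None)."""
    for seg in segments:
        if seg.var_start <= pos <= seg.var_end:
            if seg.source is None or seg.src_start is None:
                return None, None
            return seg.source, seg.src_start + (pos - seg.var_start)
    raise ValueError(f"position {pos} lies outside the variant")


# R11723_PEA_1_P7 as described above: novel head MWVLG (1-5), BAC85273 22-80
# at positions 6-64, and the novel tail SHCVT... (65-93).
P7 = [
    Segment(1, 5),
    Segment(6, 64, "BAC85273", 22),
    Segment(65, 93),
]

print(is_contiguous(P7), to_source(P7, 6), to_source(P7, 64), to_source(P7, 70))
```

Here variant position 6 maps to BAC85273 residue 22 and position 64 to residue 80, while positions in the head and tail have no counterpart in BAC85273. The same map, with the middle segment ending at variant position 63 (BAC85273 22-79), describes R11723_PEA_1_P10.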
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 24 - 87 of BAC85518, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P13, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DTKRTNTLLFEMRHFAKQLTT corresponding to amino acids 64 - 84 of R11723_PEA_1_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P13, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DTKRTNTLLFEMRHFAKQLTT in R11723_PEA_1_P13.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q8N2G4, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG corresponding to amino acids 1 - 5 of R11723_PEA_1_P10, a second amino acid sequence being at least 90% homologous to IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 22 - 79 of BAC85273, which also corresponds to amino acids 6 - 63 of R11723_PEA_1_P10, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG of R11723_PEA_1_P10.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 24 - 86 of BAC85518, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for D56406_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSKISKAHVPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEAMLTIYQLHKICHSRAFQHWE corresponding to amino acids 1 - 120 of NEUT_HUMAN, which also corresponds to amino acids 1 - 120 of D56406_PEA_1_P2, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ARWLTPVIPALWEAETGGSRGQEMETIPANT corresponding to amino acids 121 - 151 of D56406_PEA_1_P2, and a third amino acid sequence being at least 90% homologous to LIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY corresponding to amino acids 121 - 170 of NEUT_HUMAN, which also corresponds to amino acids 152 - 201 of D56406_PEA_1_P2, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of D56406_PEA_1_P2, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for ARWLTPVIPALWEAETGGSRGQEMETIPANT, corresponding to D56406_PEA_1_P2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for D56406_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MMAGMKIQLVCMLLLAFSSWSLC corresponding to amino acids 1 - 23 of NEUT_HUMAN, which also corresponds to amino acids 1 - 23 of D56406_PEA_1_P5, and a second amino acid sequence being at least 90% homologous to SEEEMKALEADFLTNMHTSKISKAHVPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEAMLTIYQLHKICHSRAFQHWELIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY corresponding to amino acids 26 - 170 of NEUT_HUMAN, which also corresponds to amino acids 24 - 168 of D56406_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of D56406_PEA_1_P5, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise CS, having a structure as follows: a sequence starting from any of amino acid numbers 23-x to 24; and ending at any of amino acid numbers 24 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for D56406_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSK corresponding to amino acids 1 - 45 of NEUT_HUMAN, which also corresponds to amino acids 1 - 45 of D56406_PEA_1_P6, and a second amino acid sequence being at least 90% homologous to LIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY corresponding to amino acids 121 - 170 of NEUT_HUMAN, which also corresponds to amino acids 46 - 95 of D56406_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of D56406_PEA_1_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KL, having a structure as follows: a sequence starting from any of amino acid numbers 45-x to 46; and ending at any of amino acid numbers 46 + ((n-2) - x), in which x varies from 0 to n-2.
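Comparing the stated coordinates of D56406_PEA_1_P5 and D56406_PEA_1_P6 with NEUT_HUMAN shows which reference residues each variant lacks: P5 jumps from NEUT_HUMAN residue 23 to residue 26, and P6 from residue 45 to residue 121. The sketch below derives these skipped ranges from the segment coordinates alone; it assumes every segment is aligned to the same reference without internal gaps, and the `AlignedSegment` type and `skipped_reference_ranges` function are hypothetical names.

```python
from typing import NamedTuple


class AlignedSegment(NamedTuple):
    ref_start: int  # 1-based inclusive position in the reference protein
    ref_end: int
    var_start: int  # 1-based inclusive position in the variant
    var_end: int


def skipped_reference_ranges(segments: list[AlignedSegment]) -> list[tuple[int, int]]:
    """Return reference ranges lying between consecutive segments, i.e. residues
    of the reference that have no counterpart in the variant."""
    for seg in segments:
        if seg.ref_end - seg.ref_start != seg.var_end - seg.var_start:
            raise ValueError(f"segment lengths differ: {seg}")
    gaps = []
    for prev, nxt in zip(segments, segments[1:]):
        if nxt.var_start != prev.var_end + 1:
            raise ValueError("variant segments are not contiguous")
        if nxt.ref_start > prev.ref_end + 1:
            gaps.append((prev.ref_end + 1, nxt.ref_start - 1))
    return gaps


# Coordinates relative to NEUT_HUMAN, as stated in the text.
P5 = [AlignedSegment(1, 23, 1, 23), AlignedSegment(26, 170, 24, 168)]
P6 = [AlignedSegment(1, 45, 1, 45), AlignedSegment(121, 170, 46, 95)]

for name, segs in (("D56406_PEA_1_P5", P5), ("D56406_PEA_1_P6", P6)):
    print(name, skipped_reference_ranges(segs))
```

On these coordinates, P5 lacks NEUT_HUMAN residues 24-25 and P6 lacks residues 46-120 (75 residues); the edge portions CS and KL are the dipeptides that bridge each skipped range.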
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for H53393_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNRSKRSWMWNQFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDLFIINENTGDIQATKRLDREEKPVYILRAQAINRRTGRPVEPESEFIIKIHDINDNEPIFTKEVYTATVPEMSDVGTFVVQVTATDADDPTYGNSAKVVYSILQGQPYFSVESETGIIKTALLNMDRENREQYQVVIQAKDMGGQMGGLSGTTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADVGENAEIEYSITDGEGLDMFDVITDQETQEGIITVKKLLDFEKKKVYTLKVEASNPYVEPRFLYLGPFKDSATVRIVVEDVDEPPVFSKLAYILQIREDAQINTTIGSVTAQDPDAARNPVKYSVDRHTDMDRIFNIDSGNGSIFTSKLLDRETLLWHNITVIATEINNPKQSSRVPLYIKVLDVNDNAPEFAEFYETFVCEKAKADQLIQTLHAVDKDDPYSGHQFSFSLAPEAASGSNFTIQDNK corresponding to amino acids 1 - 543 of CAD6_HUMAN, which also corresponds to amino acids 1 - 543 of H53393_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GK corresponding to amino acids 544 - 545 of H53393_PEA_1_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for H53393_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNRSKRSWMWNQFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDLFIINENTGDIQATKRLDREEKPVYILRAQAINRRTGRPVEPESEFIIKIHDINDNEPIFTKEVYTATVPEMSDVGTFVVQVTATDADDPTYGNSAKVVYSILQGQPYFSVESETGIIKTALLNMDRENREQYQVVIQAKDMGGQMGGLSGTTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADVGENAEIEYSITDGEGLDMFDVITDQETQEGIITVKKLLDFEKKKVYTLKVEASNPYVEPRFLYLGPFKDSATVRIVVEDVDEPPVFSKLAYILQIREDAQINTTIGSVTAQDPDAARNPVKYSVDRHTDMDRIFNIDSGNGSIFTSKLLDRETLLWHNITVIATEINNPKQSSRVPLYIKVLDVNDNAPEFAEFYETFVCEKAKADQ corresponding to amino acids 1 - 504 of CAD6_HUMAN, which also corresponds to amino acids 1 - 504 of H53393_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RFGFSLS corresponding to amino acids 505 - 511 of H53393_PEA_1_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of H53393_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RFGFSLS in H53393_PEA_1_P3.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for H53393_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNRSKRSWMWNQFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDLFIINENTGDIQATKRLDREEKPVYILRAQAINRRTGRPVEPESEFIIKIHDINDNEPIFTKEVYTATVPEMSDVGTFVVQVTATDADDPTYGNSAKVVYSILQGQPYFSVESETGIIKTALLNMDRENREQYQVVIQAKDMGGQMGGLSGTTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADVGENAEIEYSITDGEGLDMFDVITDQETQEGIITVKK corresponding to amino acids 1 - 333 of CAD6_HUMAN, which also corresponds to amino acids 1 - 333 of H53393_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VMPLLKHHTE corresponding to amino acids 334 - 343 of H53393_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of H53393_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VMPLLKHHTE in H53393_PEA_1_P6.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSU40434_PEA_1_P12, comprising a first amino acid sequence being at least 90% homologous to MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQEAAPLDGVLANPPNISSLSPRQLLGFPCAEVSGLSTERVRELAVALAQKNVKLSTEQLRCLAHRLSEPPEDLDALPLDLLLFLNPDAFSGPQACTRFFSRITKANVDLLPRGAPERQRLLPAALACWGVRGSLLSEADVRALGGLACDLPGRFVAESAEVLLPRLVSCPGPLDQDQQEAARAALQGGGPPYGPPSTWSVSTMDALRGLLPVLGQPIIRSIPQGIVAAWRQRSSRDPSWRQPERTILRPRFRREVEKTACPSGKKAREIDESLIFYKKWELEACVDAALLATQMDRVNAIPFTYEQLDVLKHKLDELYPQGYPESVIQHLGYLFLKMSPEDIRKWNVTSLETLKALLEVNKGHEMSPQVATLIDRFVKGRGQLDKDTLDTLTAFYPGYLCSLSPEELSSVPPSSIW corresponding to amino acids 1 - 458 of Q14859, which also corresponds to amino acids 1 - 458 of HSU40434_PEA_1_P12.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSU40434_PEA_1_P12, comprising a first amino acid sequence being at least 90% homologous to MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQ corresponding to amino acids 1 - 43 of Q9BTR2, which also corresponds to amino acids 1 - 43 of HSU40434_PEA_1_P12, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence E corresponding to amino acids 44 - 44 of HSU40434_PEA_1_P12, and a third amino acid sequence being at least 90% homologous to AAPLDGVLANPPNISSLSPRQLLGFPCAEVSGLSTERVRELAVALAQKNVKLSTEQLRCLAHRLSEPPEDLDALPLDLLLFLNPDAFSGPQACTRFFSRITKANVDLLPRGAPERQRLLPAALACWGVRGSLLSEADVRALGGLACDLPGRFVAESAEVLLPRLVSCPGPLDQDQQEAARAALQGGGPPYGPPSTWSVSTMDALRGLLPVLGQPIIRSIPQGIVAAWRQRSSRDPSWRQPERTILRPRFRREVEKTACPSGKKAREIDESLIFYKKWELEACVDAALLATQMDRVNAIPFTYEQLDVLKHKLDELYPQGYPESVIQHLGYLFLKMSPEDIRKWNVTSLETLKALLEVNKGHEMSPQVATLIDRFVKGRGQLDKDTLDTLTAFYPGYLCSLSPEELSSVPPSSIW corresponding to amino acids 44 - 457 of Q9BTR2, which also corresponds to amino acids 45 - 458 of HSU40434_PEA_1_P12, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of HSU40434_PEA_1_P12, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for E, corresponding to HSU40434_PEA_1_P12.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M77904_P2, comprising a first amino acid sequence being at least 90% homologous to MLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDELMTWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGNMAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNES corresponding to amino acids 67 - 341 of Q8WU91, which also corresponds to amino acids 1 - 275 of M77904_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NKIYVVDLSNERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTISCTDHRYCQRKSYSLQVPSDILHLPVELHDFSWKLLVPKDRLSLVLVPAQKLQQHTHEKPCNTSFSYLVASAIPSQDLYFGSFCPGGSIKQIQVKQNISVTLRTFAPSFQQEASRQGLTVSFIPYFKEEGVFTVTPDTKSKVYLRTPNWDRGLPSLTSVSWNISVPRDQVACLTFFKERSGVVCQTGRAFMIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLFSVTLTPRTVDLTVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVGIYNGNINTEMPRQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVDTYRPFQGTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEPYTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE corresponding to amino acids 276 - 770 of M77904_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order.
WO 2005/116850 PCT/IB2005/002555 176
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M77904_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NKIYVVDLSNERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTISCTDHRYCQRKSYSLQVPSDILHLPVELHDFSWKLLVPKDRLSLVLVPAQKLQQHTHEKPCNTSFSYLVASAIPSQDLYFGSFCPGGSIKQIQVKQNISVTLRTFAPSFQQEASRQGLTVSFIPYFKEEGVFTVTPDTKSKVYLRTPNWDRGLPSLTSVSWNISVPRDQVACLTFFKERSGVVCQTGRAFMIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLFSVTLTPRTVDLTVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVGIYNGNINTEMPRQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVDTYRPFQGTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEPYTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE in M77904_P2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M77904_P2, comprising a first amino acid sequence being at least 90% homologous to MLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDELMTWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGNMAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLSNERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTISCTDHRYCQRKSYSLQVPSDILHLPVELHDFSWKLLVPKDRLSLVLVPAQKLQQHTHEKPCNTSFSYLVASAIPSQDLYFGSFCPGGSIKQIQVKQNISVTLRTFAPSFQQEASRQGLTVSFIPYFKEEGVFTVTPDTKSKVYLRTPNWDRGLPSLTSVSWNISVPRDQVACLTFFKERSGVVCQTGRAFMIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLFSVTLTPRTVDLTVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVGIYNGNINTEMPRQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVDTYRPFQGTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEPYTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE corresponding to amino acids 67 - 836 of Q96QU7, which also corresponds to amino acids 1 - 770 of M77904_P2.
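A stated correspondence such as "amino acids 67 - 836 of Q96QU7, which also corresponds to amino acids 1 - 770 of M77904_P2" amounts to a constant offset between two numbering schemes, provided the segment contains no insertions or deletions. The minimal Python sketch below assumes exactly that and converts variant residue numbers into known-protein residue numbers; `make_mapper` is a hypothetical helper used only for illustration. The same arithmetic applies to the M77904_P5 segments described further on (606 - 836 of Q96QU7 and 419 - 649 of Q9H8C2, each corresponding to 1 - 231 of M77904_P5).

```python
def make_mapper(variant_start: int, variant_end: int, known_start: int):
    """Build a function mapping a 1-based residue number in a variant to the
    corresponding residue number in a known protein.

    Assumes the stated segment is an ungapped, one-to-one correspondence, so
    the two numberings differ only by a constant offset.
    """
    offset = known_start - variant_start

    def to_known(position: int) -> int:
        if not variant_start <= position <= variant_end:
            raise ValueError(f"residue {position} lies outside the stated segment")
        return position + offset

    return to_known


# Correspondences as stated in the text (variant segment -> known protein).
p2_to_q96qu7 = make_mapper(1, 770, 67)   # M77904_P2 1-770 <-> Q96QU7 67-836
p5_to_q96qu7 = make_mapper(1, 231, 606)  # M77904_P5 1-231 <-> Q96QU7 606-836
p5_to_q9h8c2 = make_mapper(1, 231, 419)  # M77904_P5 1-231 <-> Q9H8C2 419-649

print(p2_to_q96qu7(770), p5_to_q96qu7(231), p5_to_q9h8c2(231))  # 836 836 649
```

Checking that the mapped end point (770 + 66 = 836) lands on the stated end of the known-protein range is a quick consistency test for this kind of claim.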
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M77904_P4, comprising a first amino acid sequence being at least 90% homologous to MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDELMTWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGNMAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNES corresponding to amino acids 1 - 341 of Q8WU91, which also corresponds to amino acids 1 - 341 of M77904_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NKIYVVDLSNERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTISTPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGSLEVRSSRPAWPTW corresponding to amino acids 342 - 487 of M77904_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M77904_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NKIYVVDLSNERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTISTPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGSLEVRSSRPAWPTW in M77904_P4.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M77904_P4, comprising a first amino acid sequence being at least 90% homologous to MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDELMTWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGNMAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLSNERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTIS corresponding to amino acids 1 - 416 of Q9H5V8, which also corresponds to amino acids 1 - 416 of M77904_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGSLEVRSSRPAWPTW corresponding to amino acids 417 - 487 of M77904_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M77904_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGSLEVRSSRPAWPTW in M77904_P4.
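In the chimeras above, the second (tail) sequence is stated to be contiguous with, and to follow, the first. Assuming the known-protein portion occupies positions 1 to k of the variant, a tail of length L then spans positions k + 1 to k + L. The short Python sketch below applies this arithmetic to the 71-residue M77904_P4 tail that follows position 416; `tail_coordinates` is a hypothetical name used only for illustration.

```python
def tail_coordinates(known_length: int, tail: str) -> tuple[int, int]:
    """Return the 1-based, inclusive (start, end) positions of a tail that
    directly follows a known-protein portion of `known_length` residues."""
    if known_length < 0 or not tail:
        raise ValueError("known_length must be >= 0 and tail must be non-empty")
    start = known_length + 1
    end = known_length + len(tail)
    return start, end


# Tail sequences copied from the M77904 embodiments in the text.
P4_TAIL = "TPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGSLEVRSSRPAWPTW"
P7_TAIL = "EKAPPCYLIRLKHTRSSLF"

print(tail_coordinates(416, P4_TAIL))  # (417, 487)
print(tail_coordinates(219, P7_TAIL))  # (220, 238)
```

The same check reproduces the 220 - 238 range stated for the 19-residue M77904_P7 tail, which makes it a cheap way to catch transcription errors in sequence or coordinate claims.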
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M77904_P4, comprising a first amino acid sequence being at least 90% homologous to MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDELMTWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGNMAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLSNERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTIS corresponding to amino acids 1 - 416 of Q96QU7, which also corresponds to amino acids 1 - 416 of M77904_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGSLEVRSSRPAWPTW corresponding to amino acids 417 - 487 of M77904_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M77904_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGSLEVRSSRPAWPTW in M77904_P4.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M77904_P5, comprising a first amino acid sequence being at least 90% homologous to MIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLFSVTLTPRTVDLTVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVGIYNGNINTEMPRQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVDTYRPFQGTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEPYTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE corresponding to amino acids 606 - 836 of Q96QU7, which also corresponds to amino acids 1 - 231 of M77904_P5.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M77904_P5, comprising a first amino acid sequence being at least 90% homologous to MIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLFSVTLTPRTVDLTVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVGIYNGNINTEMPRQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVDTYRPFQGTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEPYTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE corresponding to amino acids 419 - 649 of Q9H8C2, which also corresponds to amino acids 1 - 231 of M77904_P5.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M77904_P7, comprising a first amino acid sequence being at least 90% homologous to MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKR corresponding to amino acids 1 - 219 of Q8WU91, which also corresponds to amino acids 1 - 219 of M77904_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EKAPPCYLIRLKHTRSSLF corresponding to amino acids 220 - 238 of M77904_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M77904_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EKAPPCYLIRLKHTRSSLF in M77904_P7.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M77904_P7, comprising a first amino acid sequence being at least 90% homologous to MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKR corresponding to amino acids 1 - 219 of Q9H5V8, which also corresponds to amino acids 1 - 219 of M77904_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EKAPPCYLIRLKHTRSSLF corresponding to amino acids 220 - 238 of M77904_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M77904_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EKAPPCYLIRLKHTRSSLF in M77904_P7.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M77904_P7, comprising a first amino acid sequence being at least 90% homologous to MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKR corresponding to amino acids 1 - 219 of Q96QU7, which also corresponds to amino acids 1 - 219 of M77904_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EKAPPCYLIRLKHTRSSLF corresponding to amino acids 220 - 238 of M77904_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M77904_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EKAPPCYLIRLKHTRSSLF in M77904_P7.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z25299_PEA_2_P2, comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLNPPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK corresponding to amino acids 1 - 131 of ALKI_HUMAN, which also corresponds to amino acids 1 - 131 of Z25299_PEA_2_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GKQGMRAH corresponding to amino acids 132 - 139 of Z25299_PEA_2_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z25299_PEA_2_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKQGMRAH in Z25299_PEA_2_P2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z25299_PEA_2_P3, comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLNPPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK corresponding to amino acids 1 - 131 of ALKI_HUMAN, which also corresponds to amino acids 1 - 131 of Z25299_PEA_2_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEKRHHKQLRDQEVDPLEMRRHSAG corresponding to amino acids 132 - 156 of Z25299_PEA_2_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z25299_PEA_2_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEKRHHKQLRDQEVDPLEMRRHSAG in Z25299_PEA_2_P3.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z25299_PEA_2_P7, comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCPGKKRCCPDTCGIKCLDPVDTPNP corresponding to amino acids 1 - 81 of ALKI_HUMAN, which also corresponds to amino acids 1 - 81 of Z25299_PEA_2_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RGSLGSAQ corresponding to amino acids 82 - 89 of Z25299_PEA_2_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z25299_PEA_2_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RGSLGSAQ in Z25299_PEA_2_P7.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z25299_PEA_2_P10, comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPT corresponding to amino acids 1 - 82 of ALKI_HUMAN, which also corresponds to amino acids 1 - 82 of Z25299_PEA_2_P10.
According to preferred embodiments of the present invention, there is provided an antibody capable of specifically binding to an epitope of an amino acid sequence as described herein. Optionally the amino acid sequence corresponds to a bridge, edge portion, tail, head or insertion as described herein.
Optionally the antibody is capable of differentiating between a splice variant having said epitope and a corresponding known protein.
According to preferred embodiments of the present invention, there is provided a kit for detecting ovarian cancer, the kit being capable of detecting overexpression of a splice variant as described herein.
Optionally the kit comprises a NAT-based technology.
Optionally the kit further comprises at least one primer pair capable of selectively hybridizing to a nucleic acid sequence as described herein.
Optionally the kit further comprises at least one oligonucleotide capable of selectively hybridizing to a nucleic acid sequence as described herein.
Optionally the kit comprises an antibody as described herein.
Optionally the kit further comprises at least one reagent for performing an ELISA or a Western blot.
According to preferred embodiments of the present invention, there is provided a method for detecting ovarian cancer, comprising detecting overexpression of a splice variant as described herein.
Optionally detecting overexpression is performed with a NAT-based technology.
Optionally detecting overexpression is performed with an immunoassay.
Optionally the immunoassay comprises an antibody as described herein.
According to preferred embodiments of the present invention, there is provided a biomarker capable of detecting ovarian cancer, comprising any of the above nucleic acid sequences or a fragment thereof, or any of the above amino acid sequences or a fragment thereof.
According to preferred embodiments of the present invention, there is provided a method for screening for ovarian cancer, comprising detecting ovarian cancer cells with a biomarker or an antibody or a method or assay as described herein.
According to preferred embodiments of the present invention, there is provided a method for diagnosing ovarian cancer, comprising detecting ovarian cancer cells with a biomarker or an antibody or a method or assay as described herein.
According to preferred embodiments of the present invention, there is provided a method for monitoring disease progression and/or treatment efficacy and/or relapse of ovarian cancer, comprising detecting ovarian cancer cells with a biomarker or an antibody or a method or assay as described herein.
According to preferred embodiments of the present invention, there is provided a method of selecting a therapy for ovarian cancer, comprising detecting ovarian cancer cells with a biomarker or an antibody or a method or assay as described herein and selecting a therapy according to said detection.
According to preferred embodiments of the present invention, preferably any of the above nucleic acid and/or amino acid sequences further comprises any sequence having at least about 70%, preferably at least about 80%, more preferably at least about 90%, most preferably at least about 95% homology thereto.
Unless otherwise noted, all experimental data relate to variants of the present invention, named according to the segment being tested (as expression was tested through RT-PCR as described).
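The homology thresholds above (70%, 80%, 90%, 95%) are stated without the method used to score them. One common, simple measure is percent identity over an existing alignment. The Python sketch below assumes the two sequences have already been aligned to equal length with '-' marking gaps, counts gap columns as mismatches, and is not a substitute for a proper alignment tool such as BLAST. `percent_identity` and `highest_tier` are hypothetical names, and the substituted tail in the example is invented for illustration.

```python
from typing import Optional


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns holding the same residue in both sequences.

    Assumes the inputs are already aligned: equal length, '-' marks a gap.
    Gap columns count against identity.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("inputs must be non-empty, pre-aligned and of equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def highest_tier(identity: float, tiers=(70, 80, 90, 95)) -> Optional[int]:
    """Return the highest of the recited thresholds that `identity` meets, if any."""
    met = [t for t in tiers if identity >= t]
    return max(met) if met else None


# One substitution in the 8-residue tail GKQGMRAH (hypothetical variant).
score = percent_identity("GKQGMRAH", "GKQGMRAA")
print(score, highest_tier(score))  # 87.5 80
```

Because identity depends on how gaps are placed and counted, two tools can report different percentages for the same pair; the threshold check is only as meaningful as the alignment fed into it.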
All nucleic acid sequences and/or amino acid sequences shown herein as embodiments of the present invention relate to their isolated form, as isolated polynucleotides (including for all transcripts), oligonucleotides (including for all segments, amplicons and primers), peptides (including for all tails, bridges, insertions or heads, optionally including other antibody epitopes as described herein) and/or polypeptides (including for all proteins). It should be noted that oligonucleotide and polynucleotide, or peptide and polypeptide, may optionally be used interchangeably.
BRIEF DESCRIPTION OF DRAWINGS
Figure 1 is a schematic summary of the cancer biomarker selection engine and the wet validation stages.
Figure 2 is a schematic illustration depicting grouping of transcripts of a given cluster based on presence or absence of unique sequence regions.
Figure 3 is a schematic summary of quantitative real-time PCR analysis.
Figure 4 is a schematic presentation of the oligonucleotide-based microarray fabrication.
Figure 5 is a schematic summary of the oligonucleotide-based microarray experimental flow.
Figure 6 shows cancer and cell-line vs. normal tissue expression for.
Figure 7 shows expression of segment 8 in H61775 in cancerous vs. non-cancerous tissues.
Figure 8 shows expression of segment in H61775 in normal tissues.
Figure 9 shows cancer and cell-line vs. normal tissue expression.
Figure 10 is a histogram showing overexpression of T junc11-17 transcripts in cancerous ovary samples relative to the normal samples.
Figure 11 is a histogram showing expression of T junc11-17 transcripts in normal tissues.
Figure 12 shows cancer and cell-line vs. normal tissue expression.
Figure 13 is a histogram showing overexpression of HUMGRP5E junc3-7 transcripts in cancerous ovary samples relative to the normal samples.
Figure 14 is a histogram showing expression of HUMGRP5E junc3-7 transcripts in normal tissues.
Figure 15 shows cancer and cell-line vs. normal tissue expression.
Figure 16 is a histogram showing overexpression of R11723 seg13 transcripts in cancerous ovary samples relative to the normal PM samples.
Figure 17 is a histogram showing expression of R11723 seg13 transcripts in normal tissue samples.
Figure 18 is a histogram showing overexpression of R11723 junc11-18 transcripts in cancerous ovary samples relative to the normal samples.
Figure 19 is a histogram showing expression of R11723 junc11-18 transcripts in normal tissue samples.
Figure 20 shows cancer and cell-line vs. normal tissue expression.
Figure 21 is a histogram showing overexpression of H53393 seg13 transcripts in cancerous ovary samples relative to the normal samples.
Figure 22 is a histogram showing overexpression of H53393 junc21-22 transcripts in cancerous ovary samples relative to the normal samples.
Figure 23 shows cancer and cell-line vs. normal tissue expression.
Figure 24 shows cancer and cell-line vs. normal tissue expression.
Figure 25 shows cancer and cell-line vs. normal tissue expression.
Figure 26 is a histogram showing overexpression of Z25299 junc13-14-21 transcripts in cancerous ovary samples relative to the normal samples.
Figures 27A and 27B are histograms showing overexpression of Z25299 seg20 transcripts in cancerous ovary samples relative to the normal samples (27A) and expression in normal tissues (27B).
Figures 28A and 28B are histograms showing overexpression of Z25299 seg23 transcripts in cancerous ovary samples relative to the normal samples (28A) and expression in normal tissues (28B).
Figure 29 shows cancer and cell-line vs. normal tissue expression.
Figure 30 is a histogram showing downregulation of T39971 junc23-33R transcripts in cancerous ovary samples relative to the normal samples.
Figure 31 is a histogram showing expression of T39971 junc23-33R transcripts in normal tissues.
Figure 32 shows cancer and cell-line vs. normal tissue expression.
Figures 33A and 33B are histograms showing downregulation of Z44808 junc8-11 transcripts in cancerous ovary samples relative to the normal samples (33A) and expression in normal tissues (33B).
Figure 34 shows cancer and cell-line vs. normal tissue expression.
Figure 35 shows cancer and cell-line vs. normal tissue expression.
Figure 36 shows cancer and cell-line vs. normal tissue expression.
Figure 37 shows cancer and cell-line vs. normal tissue expression.
Figure 38 shows cancer and cell-line vs. normal tissue expression.
Figure 39 shows cancer and cell-line vs. normal tissue expression.
Figure 40 shows cancer and cell-line vs. normal tissue expression.
Figure 41 shows cancer and cell-line vs. normal tissue expression.
Figure 42 shows cancer and cell-line vs. normal tissue expression.
Figure 43 is a histogram showing differential expression of a variety of transcripts in cancerous ovary samples relative to the normal samples.
Figure 44 shows cancer and cell-line vs. normal tissue expression.
DESCRIPTION OF PREFERRED EMBODIMENTS
The present invention relates to novel markers for ovarian cancer that are both sensitive and accurate. Biomolecular sequences (amino acid and/or nucleic acid sequences) uncovered using the methodology of the present invention and described herein can be efficiently utilized as tissue or pathological markers and/or as drugs or drug targets for treating or preventing a disease.
Furthermore, at least certain of these markers are able to distinguish between various types of ovarian cancer, such as ovarian epithelial tumors (serous, mucinous, endometrioid, clear cell, and Brenner tumor), ovarian germ-cell tumors (teratoma, dysgerminoma, endodermal sinus tumor, and embryonal carcinoma) and ovarian stromal tumors (originating from granulosa, theca, Sertoli, Leydig, and collagen-producing stromal cells), alone or in combination. These markers are differentially expressed, and preferably overexpressed, in ovarian cancer specifically, as opposed to normal ovarian tissue. The measurement of these markers, alone or in combination, in patient samples provides information that the diagnostician can correlate with a probable diagnosis of ovarian cancer. The markers of the present invention, alone or in combination, show a high degree of differential detection between ovarian cancer and non-cancerous states.
The markers of the present invention, alone or in combination, can be used for prognosis, prediction, screening, early diagnosis, staging, therapy selection and treatment monitoring of ovarian cancer. For example, optionally and preferably, these markers may be used for staging ovarian cancer and/or monitoring the progression of the disease. Furthermore, the markers of the present invention, alone or in combination, can be used for detection of the source of metastasis found in anatomical places other than the ovary. Also, one or more of the markers may optionally be used in combination with one or more other ovarian cancer markers (other than those described herein).
According to an optional embodiment of the present invention, such a combination may be used to differentiate between various types of ovarian cancer, such as ovarian epithelial tumors (serous, mucinous, endometrioid, clear cell, and Brenner tumor), ovarian germ-cell tumors (teratoma, dysgerminoma, endodermal sinus tumor, and embryonal carcinoma) and ovarian stromal tumors (originating from granulosa, theca, Sertoli, Leydig, or collagen-producing stromal cells).
These markers are specifically released to the bloodstream under conditions of ovarian cancer (or one of the above indicative conditions), and/or are otherwise expressed at a much higher level and/or specifically expressed in ovarian cancer tissue or cells, and/or tissue or cells under one of the above indicative conditions. The measurement of these markers, alone or in combination, in patient samples provides information that the diagnostician can correlate with a probable diagnosis of ovarian cancer and/or a condition that is indicative of a higher risk for ovarian cancer.
The present invention therefore also relates to diagnostic assays for ovarian cancer, and methods of use of such markers for detection of ovarian cancer, optionally and preferably in a sample taken from a subject (patient), which is more preferably some type of blood sample.
In another embodiment, the present invention relates to bridges, tails, heads and/or insertions, and/or analogs, homologs and derivatives of such peptides. Such bridges, tails, heads and/or insertions are described in greater detail below with regard to the Examples.
As used herein, a "tail" refers to a peptide sequence at the end of an amino acid sequence that is unique to a splice variant according to the present invention. Therefore, a splice variant having such a tail may optionally be considered as a chimera, in that at least a first portion of the splice variant is typically highly homologous (often 100% identical) to a portion of the corresponding known protein, while at least a second portion of the variant comprises the tail.
As used herein, a "head" refers to a peptide sequence at the beginning of an amino acid sequence that is unique to a splice variant according to the present invention. Therefore, a splice variant having such a head may optionally be considered as a chimera, in that at least a first portion of the splice variant comprises the head, while at least a second portion is typically highly homologous (often 100% identical) to a portion of the corresponding known protein.
As used herein, "an edge portion" refers to a connection between two portions of a splice variant according to the present invention that were not joined in the wild type or known protein. An edge may optionally arise due to a join between the above "known protein" portion of a variant and the tail, for example, and/or may occur if an internal portion of the wild type sequence is no longer present, such that two portions of the sequence that were not contiguous in the known protein are now contiguous in the splice variant.
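Under the definitions above, and assuming the shared portion is 100% identical to the known protein (the text notes that it often is), a tail and its edge can be located by walking along both sequences until they first differ; a head is found the same way from the C-terminal end. The Python sketch below uses hypothetical function names. Its tail example uses the ALKI_HUMAN residues 1 - 131 and the Z25299_PEA_2_P7 tail RGSLGSAQ recited earlier in this section; the head example uses invented toy sequences.

```python
def split_tail(known: str, variant: str) -> tuple[int, str]:
    """Split `variant` into a leading portion identical to `known` and a tail.

    Returns (k, tail) with variant[:k] == known[:k] and tail == variant[k:].
    Assumes the shared portion is 100% identical.
    """
    k = 0
    limit = min(len(known), len(variant))
    while k < limit and known[k] == variant[k]:
        k += 1
    return k, variant[k:]


def split_head(known: str, variant: str) -> tuple[str, int]:
    """Mirror image of split_tail: returns (head, k), where the variant ends
    with the last k residues of `known` and `head` precedes them."""
    k = 0
    limit = min(len(known), len(variant))
    while k < limit and known[-1 - k] == variant[-1 - k]:
        k += 1
    return variant[: len(variant) - k], k


# Residues 1-131 of ALKI_HUMAN as given in the text.
ALKI_1_131 = (
    "MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCP"
    "GKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLNPPNFCEMDGQCKRDLK"
    "CCMGMCGKSCVSPVK"
)
p7 = ALKI_1_131[:81] + "RGSLGSAQ"  # Z25299_PEA_2_P7: residues 1-81 plus its tail
print(split_tail(ALKI_1_131, p7))          # (81, 'RGSLGSAQ')
print(split_head("MKSSGLFP", "WWGLFP"))    # ('WW', 4)  (toy sequences)
```

The edge then joins residues k and k + 1 of the variant (81 and 82 here). One caveat of this shortcut: if a tail happens to begin with the same residue that the known protein has at that position, the split point shifts, so coordinates obtained this way should be checked against the transcript structure.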
A "bridge" may optionally be an edge portion as described above, but may also include a join between a head and a "known protein" portion of a variant, or a join between a tail and a "known protein" portion of a variant, or a join between an insertion and a "known protein" portion of a variant.
Optionally and preferably, a bridge between a tail or a head or a unique insertion, and a "known protein" portion of a variant, comprises at least about 10 amino acids, more preferably at least about 20 amino acids, most preferably at least about 30 amino acids, and even more preferably at least about 40 amino acids, in which at least one amino acid is from the tail/head/insertion and at least one amino acid is from the "known protein" portion of a variant. Also optionally, the bridge may comprise any number of amino acids from about 10 to about 40 amino acids (for example, 10, 11, 12, 13...37, 38, 39, 40 amino acids in length, or any number in between).
It should be noted that a bridge cannot be extended beyond the length of the sequence in either direction, and it should be assumed that every bridge description is to be read in such manner that the bridge length does not extend beyond the sequence itself. Furthermore, bridges are described with regard to a sliding window in certain contexts below.
For example, certain descriptions of the bridges feature the following format: a bridge between two edges (in which a portion of the known protein is not present in the variant) may optionally be described as follows: a bridge portion of CONTIG-NAME_P1 (representing the name of the protein), comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise XX (2 amino acids in the center of the bridge, one from each end of the edge), having a structure as follows (numbering according to the sequence of CONTIG-NAME_P1): a sequence starting at amino acid number 49 - x (for example; that is, at any of amino acid numbers 49 - (n - 2) to 49) and ending at amino acid number 50 + ((n - 2) - x) (for example), in which x varies from 0 to n - 2. Each such window is exactly n amino acids long, since (50 + (n - 2) - x) - (49 - x) + 1 = n. In this example, it should also be read as including bridges in which n is any number of amino acids between 10 and 50 amino acids in length. Furthermore, the bridge polypeptide cannot extend beyond the sequence, so it should be read such that 49 - x (for example) is not less than 1, nor 50 + ((n - 2) - x) (for example) greater than the total sequence length.
In another embodiment, this invention provides antibodies specifically recognizing the splice variants and polypeptide fragments thereof of this invention. Preferably such antibodies differentially recognize splice variants of the present invention but do not recognize a corresponding known protein (such known proteins are discussed with regard to their splice variants in the Examples below).
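The sliding-window description above can be enumerated directly. The Python sketch below assumes 1-based, inclusive residue numbering and an edge between residues `left` and `left + 1` (49 and 50 in the text's example). Each x from 0 to n - 2 gives one window, and windows that would start before residue 1 or end after the last residue are dropped, as the text requires. `bridge_windows` and its parameters are hypothetical names.

```python
def bridge_windows(left: int, n: int, seq_len: int) -> list[tuple[int, int]]:
    """Enumerate 1-based, inclusive (start, end) windows of length n that
    contain both edge residues `left` and `left + 1`.

    Follows the scheme in the text: start = left - x and
    end = (left + 1) + ((n - 2) - x) for x = 0 .. n - 2, keeping only
    windows that stay within residues 1 .. seq_len.
    """
    if n < 2:
        raise ValueError("a bridge must include both edge residues")
    windows = []
    for x in range(n - 1):  # x = 0, 1, ..., n - 2
        start = left - x
        end = (left + 1) + ((n - 2) - x)
        if start >= 1 and end <= seq_len:
            windows.append((start, end))
    return windows


# Edge between residues 49 and 50 (the example in the text), 10-residue bridges.
print(bridge_windows(49, 10, 500)[:2])  # [(49, 58), (48, 57)]
print(bridge_windows(49, 10, 52))       # [(43, 52), (42, 51), (41, 50)]
```

The second call shows the end-of-sequence rule at work: in a 52-residue sequence only the three windows that do not run past residue 52 survive.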
In another embodiment, this invention provides an isolated nucleic acid molecule encoding for a splice variant according to the present invention, having a nucleotide sequence as set forth in any one of the sequences listed herein, or a sequence complementary thereto. In another embodiment, this invention provides an isolated nucleic acid molecule, having a nucleotide sequence as set forth in any one of the sequences listed herein, or a sequence complementary thereto. In another embodiment, this invention provides an oligonucleotide of at least about 12 nucleotides, specifically hybridizable with the nucleic acid molecules of this invention. In another embodiment, this invention provides vectors, cells, liposomes and compositions comprising the isolated nucleic acids of this invention.
In another embodiment, this invention provides a method for detecting a splice variant according to the present invention in a biological sample, comprising: contacting a biological sample with an antibody specifically recognizing a splice variant according to the present invention under conditions whereby the antibody specifically interacts with the splice variant in the biological sample but does not recognize known corresponding proteins (wherein the known protein is discussed with regard to its splice variant(s) in the Examples below), and detecting said interaction; wherein the presence of an interaction correlates with the presence of a splice variant in the biological sample.
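The oligonucleotide embodiment (at least about 12 nucleotides, specifically hybridizable with a nucleic acid of the invention) can be illustrated at the sequence level. The Python sketch below only checks for exact Watson-Crick complementarity between a DNA probe and a DNA (for example cDNA) target, both written 5' to 3'. Real hybridization specificity also depends on temperature, salt, mismatches and secondary structure, none of which is modelled here. The function names and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def perfect_match_sites(probe: str, target: str, min_len: int = 12) -> list[int]:
    """0-based positions in `target` (5'->3') where `probe` (5'->3') could pair
    with no mismatches, i.e. where `target` contains the probe's reverse complement."""
    if len(probe) < min_len:
        raise ValueError(f"probe must be at least {min_len} nucleotides long")
    site = reverse_complement(probe).upper()
    target = target.upper()
    hits = []
    i = target.find(site)
    while i != -1:
        hits.append(i)
        i = target.find(site, i + 1)
    return hits


# Hypothetical 12-nt probe and target, for illustration only.
probe = "ACGTTGCAAGCT"
target = "GGG" + reverse_complement(probe) + "TTT"
print(perfect_match_sites(probe, target))  # [3]
```

For a splice-variant-specific probe, the same check would typically be run against both the variant and the known transcript, with a useful probe matching only the former (for example, one spanning a variant-specific junction).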
In another embodiment, this invention provides a method for detecting splice variant nucleic acid sequences in a biological sample, comprising: hybridizing the isolated nucleic acid molecules, or oligonucleotide fragments thereof of at least about a minimum length, to nucleic acid material of a biological sample and detecting a hybridization complex; wherein the presence of a hybridization complex correlates with the presence of a splice variant nucleic acid sequence in the biological sample. According to the present invention, the splice variants described herein are non-limiting examples of markers for diagnosing ovarian cancer. Each splice variant marker of the present invention can be used alone or in combination, for various uses, including but not limited to prognosis, prediction, screening, early diagnosis, determination of progression, therapy selection and treatment monitoring of ovarian cancer. According to optional but preferred embodiments of the present invention, any marker according to the present invention may optionally be used alone or in combination. Such a combination may optionally comprise a plurality of markers described herein, optionally including any subcombination of markers, and/or a combination featuring at least one other marker, for example a known marker. Furthermore, such a combination may optionally and preferably be used as described above with regard to determining a ratio between a quantitative or semi-quantitative measurement of any marker described herein and any other marker described herein, and/or any other known marker, and/or any other marker. With regard to such a ratio between any marker described herein (or a combination thereof) and a known marker, more preferably the known marker comprises the "known protein" as described in greater detail below with regard to each cluster or gene.
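The text does not fix how such a ratio is computed. As a minimal sketch, assuming two numeric, non-negative measurements, one of the marker and one of a reference (for example the corresponding known protein), taken from the same sample, the ratio and optionally its log2 form can be computed as below. The log2 option is a common reporting convention assumed here, not something the text specifies, and `marker_ratio` is a hypothetical helper.

```python
import math


def marker_ratio(marker_level, reference_level, log2=False):
    """Ratio of one marker's (semi-)quantitative level to a reference marker's
    level measured in the same sample, optionally returned as a log2 ratio."""
    if marker_level < 0 or reference_level <= 0:
        raise ValueError("marker level must be >= 0 and reference level > 0")
    ratio = marker_level / reference_level
    if log2:
        if ratio == 0:
            raise ValueError("log2 ratio is undefined when the marker level is 0")
        return math.log2(ratio)
    return ratio
```

For example, a variant measured at 8.0 units against a known-protein level of 2.0 units gives a ratio of 4.0, or a log2 ratio of 2.0.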
According to other preferred embodiments of the present invention, a splice variant protein or a fragment thereof, or a splice variant nucleic acid sequence or a fragment thereof, may be featured as a biomarker for detecting ovarian cancer and/or an indicative condition, such that a biomarker may optionally comprise any of the above. According to still other preferred embodiments, the present invention optionally and preferably encompasses any amino acid sequence or fragment thereof encoded by a nucleic acid sequence corresponding to a splice variant protein as described herein. Any oligopeptide or peptide relating to such an amino acid sequence or fragment thereof may optionally also (additionally or alternatively) be used as a biomarker, including but not limited to the unique amino acid sequences of these proteins that are depicted as tails, heads, insertions, edges or bridges. The present invention also optionally encompasses antibodies capable of recognizing, and/or being elicited by, such oligopeptides or peptides.
The present invention also optionally and preferably encompasses any nucleic acid sequence or fragment thereof, or amino acid sequence or fragment thereof, corresponding to a splice variant of the present invention as described above, optionally for any application. Non-limiting examples of methods or assays are described below. The present invention also relates to kits based upon such diagnostic methods or assays.
Nucleic acid sequences and Oligonucleotides
Various embodiments of the present invention encompass the nucleic acid sequences described hereinabove; fragments thereof, sequences hybridizable therewith, sequences homologous thereto, sequences encoding similar polypeptides with different codon usage, and altered sequences characterized by mutations, such as deletion, insertion or substitution of one or more nucleotides, either naturally occurring or artificially induced, either randomly or in a targeted fashion.
The present invention encompasses nucleic acid sequences described herein; fragments thereof, sequences hybridizable therewith, sequences homologous thereto [e.g., at least 50 %, at least 55 %, at least 60 %, at least 65 %, at least 70 %, at least 75 %, at least 80 %, at least 85 %, at least 95 % or more, say 100 % identical to the nucleic acid sequences set forth below], sequences encoding similar polypeptides with different codon usage, and altered sequences characterized by mutations, such as deletion, insertion or substitution of one or more nucleotides, either naturally occurring or man-induced, either randomly or in a targeted fashion. The present invention also encompasses homologous nucleic acid sequences (i.e., which form a part of a polynucleotide sequence of the present invention) which include sequence regions unique to the polynucleotides of the present invention. In cases where the polynucleotide sequences of the present invention encode previously unidentified polypeptides, the present invention also encompasses novel polypeptides or portions thereof, which are encoded by the isolated polynucleotide and respective nucleic acid fragments thereof described hereinabove. The terms "nucleic acid fragment", "oligonucleotide" and "polynucleotide" are used herein interchangeably to refer to a polymer of nucleic acids. A polynucleotide sequence of the present invention refers to a single- or double-stranded nucleic acid sequence which is isolated and provided in the form of an RNA sequence, a complementary polynucleotide sequence (cDNA), a genomic polynucleotide sequence and/or a composite polynucleotide sequence (e.g., a combination of the above). As used herein the phrase "complementary polynucleotide sequence" refers to a sequence which results from reverse transcription of messenger RNA using a reverse transcriptase or any other RNA-dependent DNA polymerase.
Such a sequence can be subsequently amplified in vivo or in vitro using a DNA-dependent DNA polymerase. As used herein the phrase "genomic polynucleotide sequence" refers to a sequence derived (isolated) from a chromosome; it thus represents a contiguous portion of a chromosome. As used herein the phrase "composite polynucleotide sequence" refers to a sequence which is composed of genomic and cDNA sequences. A composite sequence can include some exonal sequences required to encode the polypeptide of the present invention, as well as some intronic sequences interposed between them. The intronic sequences can be of any source, including other genes, and typically will include conserved splicing signal sequences. Such intronic sequences may further include cis-acting expression regulatory elements.
Preferred embodiments of the present invention encompass oligonucleotide probes.
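The homology ranges given above (from at least 50 % up to 100 % identity) presuppose a way of scoring identity, which the text does not specify. One simple convention, assumed here, is the percentage of identical columns in an existing pairwise alignment. The sketch below computes it for two pre-aligned strings with "-" as the gap character; producing the alignment itself (for example with BLAST or a Needleman-Wunsch implementation) is outside the sketch, and `percent_identity` is a hypothetical name.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are skipped; a gap opposite a
    base counts as a non-identical column. Comparison is case-insensitive.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # a shared gap column carries no information
        columns += 1
        if a == b:
            identical += 1
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * identical / columns
```

Under this convention, a single mismatch over 10 aligned columns gives 90.0 %. Other tools normalize differently (for example by the length of the shorter sequence), so reported identities can differ between programs.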
An example of an oligonucleotide probe which can be utilized by the present invention is a single-stranded polynucleotide which includes a sequence complementary to the unique sequence region of any variant according to the present invention, including but not limited to a nucleotide sequence coding for an amino acid sequence of a bridge, tail, head and/or insertion according to the present invention, and/or the equivalent portions of any nucleotide sequence given herein (including but not limited to a nucleotide sequence of a node, segment or amplicon described herein). Alternatively, an oligonucleotide probe of the present invention can be designed to hybridize with a nucleic acid sequence encompassed by any of the above nucleic acid sequences, particularly the portions specified above, including but not limited to a nucleotide sequence coding for an amino acid sequence of a bridge, tail, head and/or insertion according to the present invention, and/or the equivalent portions of any nucleotide sequence given herein (including but not limited to a nucleotide sequence of a node, segment or amplicon described herein). Oligonucleotides designed according to the teachings of the present invention can be generated according to any oligonucleotide synthesis method known in the art, such as enzymatic synthesis or solid-phase synthesis. Equipment and reagents for executing solid-phase synthesis are commercially available from, for example, Applied Biosystems. Any other means for such synthesis may also be employed; the actual synthesis of the oligonucleotides is well within the capabilities of one skilled in the art and can be accomplished via established methodologies as detailed in, for example, "Molecular Cloning: A Laboratory Manual" Sambrook et al., (1989); "Current Protocols in Molecular Biology" Volumes I-III Ausubel, R. M., ed.
(1994); Ausubel et al., "Current Protocols in Molecular Biology", John Wiley and Sons, Baltimore, Maryland (1989); Perbal, "A Practical Guide to Molecular Cloning", John Wiley & Sons, New York (1988) and "Oligonucleotide Synthesis" Gait, M. J., ed. (1984), utilizing solid-phase chemistry, e.g. cyanoethyl phosphoramidite followed by deprotection, desalting and purification by, for example, an automated trityl-on method or HPLC. Oligonucleotides used according to this aspect of the present invention are those having a length selected from a range of about 10 to about 200 bases, preferably about 15 to about 150 bases, more preferably about 20 to about 100 bases, most preferably about 20 to about 50 bases. Preferably, the oligonucleotide of the present invention features at least 17, at least 18, at least 19, at least 20, at least 22, at least 25, at least 30 or at least 40 bases specifically hybridizable with the biomarkers of the present invention. The oligonucleotides of the present invention may comprise heterocyclic nucleosides consisting of purine and pyrimidine bases, bonded in a 3' to 5' phosphodiester linkage. Preferably used oligonucleotides are those modified at one or more of the backbone, internucleoside linkages or bases, as is broadly described hereinunder. Specific examples of preferred oligonucleotides useful according to this aspect of the present invention include oligonucleotides containing modified backbones or non-natural internucleoside linkages. Oligonucleotides having modified backbones include those that retain a phosphorus atom in the backbone, as disclosed in U.S. Pat. Nos. 4,469,863; 4,476,301; 5,023,243; 5,177,196; 5,188,897; 5,264,423; 5,276,019; 5,278,302; 5,286,717; 5,321,131; 5,399,676; 5,405,939; 5,453,496; 5,455,233; 5,466,677; 5,476,925; 5,519,126; 5,536,821; 5,541,306; 5,550,111; 5,563,253; 5,571,799; 5,587,361; and 5,625,050.
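A probe that distinguishes a variant from the known transcript can be placed across a sequence unique to the variant, for example the nucleotide sequence coding for a bridge. The sketch below works under stated assumptions: the variant transcript is a DNA string written 5'->3', a hypothetical 0-based `junction` index marks the first base after the splice junction, and the probe length defaults to 20 bases (within the most preferred range above). It returns the antisense probe and checks, by exact substring search, that the probe's target occurs in the variant but not in the known transcript. Exact matching is a simplification, since real specificity also depends on partial matches and hybridization conditions; all function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def junction_probe(variant, junction, length=20):
    """Antisense probe centred on a splice junction of `variant`.

    `junction` is the 0-based index of the first base after the junction;
    the probe's target starts length // 2 bases upstream of it, so it always
    contains bases from both sides of the junction.
    """
    if length < 2:
        raise ValueError("probe must span at least one base on each side")
    start = junction - length // 2
    end = start + length
    if start < 0 or end > len(variant):
        raise ValueError("probe would extend beyond the transcript")
    return reverse_complement(variant[start:end])


def is_variant_specific(probe, variant, known):
    """True if the probe's exact target is in the variant but not in the known transcript."""
    target = reverse_complement(probe)
    return target in variant.upper() and target not in known.upper()
```

As a usage example, if the known transcript is exon1 + exon2 + exon3 and the variant skips exon2, a probe centred on the exon1/exon3 junction of the variant passes `is_variant_specific`, while a probe lying entirely within exon1 does not.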
Preferred modified oligonucleotide backbones include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkyl phosphotriesters, methyl and other alkyl phosphonates including 3'-alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates including 3'-amino phosphoramidate and aminoalkylphosphoramidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, and boranophosphates having normal 3'-5' linkages, 2'-5' linked analogs of these, and those having inverted polarity wherein the adjacent pairs of nucleoside units are linked 3'-5' to 5'-3' or 2'-5' to 5'-2'. Various salts, mixed salts and free acid forms can also be used. Alternatively, modified oligonucleotide backbones that do not include a phosphorus atom have backbones that are formed by short-chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short-chain heteroatomic or heterocyclic internucleoside linkages. These include those having morpholino linkages (formed in part from the sugar portion of a nucleoside); siloxane backbones; sulfide, sulfoxide and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; alkene-containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S and CH2 component parts, as disclosed in U.S. Pat. Nos. 5,034,506; 5,166,315; 5,185,444; 5,214,134; 5,216,141; 5,235,033; 5,264,562; 5,264,564; 5,405,938; 5,434,257; 5,466,677; 5,470,967; 5,489,677; 5,541,307; 5,561,225; 5,596,086; 5,602,240; 5,608,046; 5,610,289; 5,618,704; 5,623,070; 5,663,312; 5,633,360; 5,677,437; and 5,677,439.
Other oligonucleotides which can be used according to the present invention are those in which both the sugar and the internucleoside linkage (i.e., the backbone) of the nucleotide units are replaced with novel groups. The base units are maintained for complementation with the appropriate polynucleotide target. An example of such an oligonucleotide mimetic is peptide nucleic acid (PNA). United States patents that teach the preparation of PNA compounds include, but are not limited to, U.S. Pat. Nos. 5,539,082; 5,714,331; and 5,719,262, each of which is herein incorporated by reference. Other backbone modifications which can be used in the present invention are disclosed in U.S. Pat. No. 6,303,374. Oligonucleotides of the present invention may also include base modifications or substitutions. As used herein, "unmodified" or "natural" bases include the purine bases adenine (A) and guanine (G), and the pyrimidine bases thymine (T), cytosine (C) and uracil (U). Modified bases include but are not limited to other synthetic and natural bases such as 5-methylcytosine (5-me-C), 5-hydroxymethyl cytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl uracil and cytosine, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other 8-substituted adenines and guanines, 5-halo (particularly 5-bromo), 5-trifluoromethyl and other 5-substituted uracils and cytosines, 7-methylguanine and 7-methyladenine, 8-azaguanine and 8-azaadenine, 7-deazaguanine and 7-deazaadenine and 3-deazaguanine and 3-deazaadenine.
Further bases particularly useful for increasing the binding affinity of the oligomeric compounds of the invention include 5-substituted pyrimidines, 6-azapyrimidines and N-2, N-6 and O-6 substituted purines, including 2-aminopropyladenine, 5-propynyluracil and 5-propynylcytosine. 5-Methylcytosine substitutions have been shown to increase nucleic acid duplex stability by 0.6-1.2 °C and are presently preferred base substitutions, even more particularly when combined with 2'-O-methoxyethyl sugar modifications.
Another modification of the oligonucleotides of the invention involves chemically linking to the oligonucleotide one or more moieties or conjugates which enhance the activity, cellular distribution or cellular uptake of the oligonucleotide. Such moieties include but are not limited to lipid moieties such as a cholesterol moiety, cholic acid, a thioether, e.g., hexyl-S-tritylthiol, a thiocholesterol, an aliphatic chain, e.g., dodecandiol or undecyl residues, a phospholipid, e.g., di-hexadecyl-rac-glycerol or triethylammonium 1,2-di-O-hexadecyl-rac-glycero-3-H-phosphonate, a polyamine or a polyethylene glycol chain, or adamantane acetic acid, a palmityl moiety, or an octadecylamine or hexylamino-carbonyl-oxycholesterol moiety, as disclosed in U.S. Pat. No. 6,303,374. It is not necessary for all positions in a given oligonucleotide molecule to be uniformly modified; in fact, more than one of the aforementioned modifications may be incorporated in a single compound or even at a single nucleoside within an oligonucleotide. It will be appreciated that oligonucleotides of the present invention may include further modifications for more efficient use as diagnostic agents and/or to increase bioavailability and therapeutic efficacy and reduce cytotoxicity. To enable cellular expression of the polynucleotides of the present invention, a nucleic acid construct according to the present invention may be used, which includes at least a coding region of one of the above nucleic acid sequences, and further includes at least one cis-acting regulatory element. As used herein, the phrase "cis-acting regulatory element" refers to a polynucleotide sequence, preferably a promoter, which binds a trans-acting regulator and regulates the transcription of a coding sequence located downstream thereto. Any suitable promoter sequence can be used by the nucleic acid construct of the present invention.
Preferably, the promoter utilized by the nucleic acid construct of the present invention is active in the specific cell population transformed. Examples of cell type-specific and/or tissue-specific promoters include promoters such as albumin that is liver-specific, lymphoid-specific promoters [Calame et al., (1988) Adv. Immunol. 43:235-275], in particular promoters of T-cell receptors [Winoto et al., (1989) EMBO J. 8:729-733] and immunoglobulins [Banerji et al. (1983) Cell 33:729-740], neuron-specific promoters such as the neurofilament promoter [Byrne et al. (1989) Proc. Natl. Acad. Sci. USA 86:5473-5477], pancreas-specific promoters [Edlund et al. (1985) Science 230:912-916] or mammary gland-specific promoters such as the milk whey promoter (U.S. Pat. No. 4,873,316 and European Application Publication No. 264,166). The nucleic acid construct of the present invention can further include an enhancer, which can be adjacent or distant to the promoter sequence and can function in up-regulating the transcription therefrom. The nucleic acid construct of the present invention preferably further includes an appropriate selectable marker and/or an origin of replication. Preferably, the nucleic acid construct utilized is a shuttle vector, which can propagate both in E. coli (wherein the construct comprises an appropriate selectable marker and origin of replication) and be compatible for propagation in cells, or integration in a gene and a tissue of choice. The construct according to the present invention can be, for example, a plasmid, a bacmid, a phagemid, a cosmid, a phage, a virus or an artificial chromosome. Examples of suitable constructs include, but are not limited to, pcDNA3, pcDNA3.1 (+/-), pGL3, pZeoSV2 (+/-), pDisplay, pEF/myc/cyto and pCMV/myc/cyto, each of which is commercially available from Invitrogen Co. (www.invitrogen.com).
Examples of retroviral vector and packaging systems are those sold by Clontech, San Diego, Calif., including Retro-X vectors pLNCX and pLXSN, which permit cloning into multiple cloning sites and in which the transgene is transcribed from the CMV promoter. Vectors derived from Mo-MuLV are also included, such as pBabe, where the transgene will be transcribed from the 5' LTR promoter. Currently preferred in vivo nucleic acid transfer techniques include transfection with viral or non-viral constructs, such as adenovirus, lentivirus, Herpes simplex I virus, or adeno-associated virus (AAV), and lipid-based systems. Useful lipids for lipid-mediated transfer of the gene are, for example, DOTMA, DOPE, and DC-Chol [Tonkinson et al., Cancer Investigation, 14(1): 54-65 (1996)]. The most preferred constructs for use in gene therapy are viruses, most preferably adenoviruses, AAV, lentiviruses, or retroviruses. A viral construct such as a retroviral construct includes at least one transcriptional promoter/enhancer or locus-defining element(s), or other elements that control gene expression by other means such as alternate splicing, nuclear RNA export, or post-translational modification of messenger. Such vector constructs also include a packaging signal, long terminal repeats (LTRs) or portions thereof, and positive and negative strand primer binding sites appropriate to the virus used, unless these are already present in the viral construct. In addition, such a construct typically includes a signal sequence for secretion of the peptide from a host cell in which it is placed. Preferably the signal sequence for this purpose is a mammalian signal sequence or the signal sequence of the polypeptide variants of the present invention. Optionally, the construct may also include a signal that directs polyadenylation, as well as one or more restriction sites and a translation termination sequence.
By way of example, such constructs will typically include a 5' LTR, a tRNA binding site, a packaging signal, an origin of second-strand DNA synthesis, and a 3' LTR or a portion thereof. Other vectors can be used that are non-viral, such as cationic lipids, polylysine, and dendrimers.
Hybridization assays
Detection of a nucleic acid of interest in a biological sample may optionally be effected by hybridization-based assays using an oligonucleotide probe (non-limiting examples of probes according to the present invention were previously described). Traditional hybridization assays include PCR, RT-PCR, real-time PCR, RNase protection, in situ hybridization, primer extension, Southern blots (DNA detection), dot or slot blots (DNA, RNA), and Northern blots (RNA detection) (NAT-type assays are described in greater detail below). More recently, PNAs have been described (Nielsen et al. 1999, Current Opin. Biotechnol. 10:71-75). Other detection methods include kits containing probes on a dipstick setup and the like. Hybridization-based assays which allow the detection of a variant of interest (i.e., DNA or RNA) in a biological sample rely on the use of oligonucleotides which can be 10, 15, 20, or 30 to 100 nucleotides long, preferably from 10 to 50, more preferably from 40 to 50 nucleotides long. Thus, the isolated polynucleotides (oligonucleotides) of the present invention are preferably hybridizable with any of the herein described nucleic acid sequences under moderate to stringent hybridization conditions.
Moderate to stringent hybridization conditions are characterized by a hybridization solution such as one containing 10 % dextran sulfate, 1 M NaCl, 1 % SDS and 5 × 10⁶ cpm ³²P-labeled probe, at 65 °C, with a final wash solution of 0.2 × SSC and 0.1 % SDS and a final wash at 65 °C, whereas moderate hybridization is effected using a hybridization solution containing 10 % dextran sulfate, 1 M NaCl, 1 % SDS and 5 × 10⁶ cpm ³²P-labeled probe, at 65 °C, with a final wash solution of 1 × SSC and 0.1 % SDS and a final wash at 50 °C.
More generally, hybridization of short nucleic acids (below 200 bp in length, e.g. 17-40 bp in length) can be effected using the following exemplary hybridization protocols, which can be modified according to the desired stringency: (i) hybridization solution of 6 × SSC and 1 % SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS, 100 µg/ml denatured salmon sperm DNA and 0.1 % nonfat dried milk, hybridization temperature of 1-1.5 °C below the Tm, final wash solution of 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS at 1-1.5 °C below the Tm; (ii) hybridization solution of 6 × SSC and 0.1 % SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS, 100 µg/ml denatured salmon sperm DNA and 0.1 % nonfat dried milk, hybridization temperature of 2-2.5 °C below the Tm, final wash solution of 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS at 1-1.5 °C below the Tm, final wash solution of 6 × SSC, and final wash at 22 °C; (iii) hybridization solution of 6 × SSC and 1 % SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS, 100 µg/ml denatured salmon sperm DNA and 0.1 % nonfat dried milk, hybridization temperature. The detection of hybrid duplexes can be carried out by a number of methods. Typically, hybridization duplexes are separated from unhybridized nucleic acids and the labels bound to the duplexes are then detected. Such labels refer to radioactive, fluorescent, biological or enzymatic tags or labels of standard use in the art. A label can be conjugated to either the oligonucleotide probes or the nucleic acids derived from the biological sample. Probes can be labeled according to numerous well-known methods. Non-limiting examples of radioactive labels include ³H, ¹⁴C, ³²P, and ³⁵S.
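Protocols (i) and (ii) set the hybridization temperature 1-1.5 °C and 2-2.5 °C below the probe's Tm, respectively. The text does not say how Tm is obtained, so the sketch below substitutes the Wallace rule, Tm = 2(A+T) + 4(G+C) °C. This is a rough rule of thumb for short oligonucleotides that ignores salt, probe concentration and nearest-neighbour effects; it is a swapped-in technique, not the method of the text. Protocol (iii) is omitted because its temperature is not given above, and the function names are hypothetical.

```python
def wallace_tm(oligo):
    """Rough melting temperature (deg C) of a short DNA oligo by the Wallace
    rule: Tm = 2 * (A + T) + 4 * (G + C)."""
    seq = oligo.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("only unambiguous A/C/G/T bases are supported")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


# Offsets below Tm (deg C) for protocols (i) and (ii) in the text: (nearest, farthest).
PROTOCOL_OFFSETS = {"i": (1.0, 1.5), "ii": (2.0, 2.5)}


def hybridization_window(oligo, protocol="i"):
    """Return the (low, high) hybridization temperature range for a protocol."""
    tm = wallace_tm(oligo)
    near, far = PROTOCOL_OFFSETS[protocol]
    return tm - far, tm - near
```

For a 20-mer containing five of each base, the estimated Tm is 60 °C, so protocol (i) gives a window of 58.5-59.0 °C and protocol (ii) gives 57.5-58.0 °C.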
Non-limiting examples of detectable markers include ligands, fluorophores, chemiluminescent agents, enzymes, and antibodies. Other detectable markers for use with probes, which can enable an increase in sensitivity of the method of the invention, include biotin and radionucleotides. It will become evident to the person of ordinary skill that the choice of a particular label dictates the manner in which it is bound to the probe. For example, oligonucleotides of the present invention can be labeled subsequent to synthesis, by incorporating biotinylated dNTPs or rNTPs, or some similar means (e.g., photo-cross-linking a psoralen derivative of biotin to RNAs), followed by addition of labeled streptavidin (e.g., phycoerythrin-conjugated streptavidin) or the equivalent. Alternatively, when fluorescently labeled oligonucleotide probes are used, fluorescein, lissamine, phycoerythrin, rhodamine (Perkin Elmer Cetus), Cy2, Cy3, Cy3.5, Cy5, Cy5.5, Cy7, FluorX (Amersham) and others [e.g., Kricka et al. (1992), Academic Press San Diego, Calif.] can be attached to the oligonucleotides. Those skilled in the art will appreciate that wash steps may be employed to wash away excess target DNA or probe as well as unbound conjugate. Further, standard heterogeneous assay formats are suitable for detecting the hybrids using the labels present on the oligonucleotide primers and probes. It will be appreciated that a variety of controls may be usefully employed to improve accuracy of hybridization assays. For instance, samples may be hybridized to an irrelevant probe and treated with RNase A prior to hybridization, to assess false hybridization. Although the present invention is not specifically dependent on the use of a label for the detection of a particular nucleic acid sequence, such a label might be beneficial, by increasing the sensitivity of the detection. Furthermore, it enables automation.
As commonly known, radioactive nucleotides can be incorporated into probes of the invention by several methods. Probes of the invention can be utilized with naturally occurring sugar-phosphate backbones as well as modified backbones including phosphorothioates, dithionates, alkyl phosphonates and α-nucleotides and the like. Probes of the invention can be constructed of either ribonucleic acid (RNA) or deoxyribonucleic acid (DNA), and preferably of DNA.
NAT Assays
Detection of a nucleic acid of interest in a biological sample may also optionally be effected by NAT-based assays, which involve nucleic acid amplification technology, such as PCR for example (or variations thereof such as real-time PCR for example). As used herein, a "primer" defines an oligonucleotide which is capable of annealing to (hybridizing with) a target sequence, thereby creating a double-stranded region which can serve as an initiation point for DNA synthesis under suitable conditions. Amplification of a selected, or target, nucleic acid sequence may be carried out by a number of suitable methods. See generally Kwoh et al., 1990, Am. Biotechnol. Lab. 8:14. Numerous amplification techniques have been described and can be readily adapted to suit particular needs of a person of ordinary skill.
Non-limiting examples of amplification techniques include polymerase chain reaction (PCR), ligase chain reaction (LCR), strand displacement amplification (SDA), transcription-based amplification, the Qβ replicase system and NASBA (Kwoh et al., 1989, Proc. Natl. Acad. Sci. USA 86, 1173-1177; Lizardi et al., 1988, BioTechnology 6:1197-1202; Malek et al., 1994, Methods Mol. Biol., 28:253-260; and Sambrook et al., 1989, supra). The terminology "amplification pair" (or "primer pair") refers herein to a pair of oligonucleotides (oligos) of the present invention, which are selected to be used together in amplifying a selected nucleic acid sequence by one of a number of types of amplification processes, preferably a polymerase chain reaction. Other types of amplification processes include ligase chain reaction, strand displacement amplification, or nucleic acid sequence-based amplification, as explained in greater detail below. As commonly known in the art, the oligos are designed to bind to a complementary sequence under selected conditions. In one particular embodiment, a nucleic acid sample from a patient is amplified under conditions which favor the amplification of the most abundant differentially expressed nucleic acid. In one preferred embodiment, RT-PCR is carried out on an mRNA sample from a patient under conditions which favor the amplification of the most abundant mRNA. In another preferred embodiment, the amplification of the differentially expressed nucleic acids is carried out simultaneously. It will be realized by a person skilled in the art that such methods could be adapted for the detection of differentially expressed proteins instead of differentially expressed nucleic acid sequences.
The nucleic acid (i.e. DNA or RNA) for practicing the present invention may be obtained according to well-known methods. Oligonucleotide primers of the present invention may be of any suitable length, depending on the particular assay format and the particular needs and targeted genomes employed. Optionally, the oligonucleotide primers are at least 12 nucleotides in length, preferably between 15 and 24 nucleotides, and they may be adapted to be especially suited to a chosen nucleic acid amplification system. As commonly known in the art, the oligonucleotide primers can be designed by taking into consideration the melting point of their hybridization with their targeted sequence (Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 2nd Edition, CSH Laboratories; Ausubel et al., 1989, in Current Protocols in Molecular Biology, John Wiley & Sons Inc., N.Y.). It will be appreciated that antisense oligonucleotides may be employed to quantify expression of a splice isoform of interest. Such detection is effected at the pre-mRNA level. Essentially, the ability to quantitate transcription from a splice site of interest can be effected based on splice site accessibility. Oligonucleotides may compete with splicing factors for the splice site sequences. Thus, low activity of the antisense oligonucleotide is indicative of splicing activity. The polymerase chain reaction and other nucleic acid amplification reactions are well known in the art (various non-limiting examples of these reactions are described in greater detail below). The pair of oligonucleotides according to this aspect of the present invention are preferably selected to have compatible melting temperatures (Tm), e.g., melting temperatures which differ by less than 7 °C, preferably less than 5 °C, more preferably less than 4 °C, most preferably less than 3 °C, ideally between 0 °C and 3 °C.
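Before use, a candidate primer pair can be screened on two points: whether its melting temperatures are compatible, and what product it should yield on the intended template. The sketch below assumes the Tm values are supplied by whatever estimation method is in use, encodes the 7/5/4/3 °C thresholds from the text, and predicts the product by exact-match search on the template's top strand. All function names are hypothetical, and mismatched or multiple priming sites are deliberately not modelled.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Tm-difference thresholds (deg C) from the text, tightest first.
TM_THRESHOLDS = (3.0, 4.0, 5.0, 7.0)


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T; other characters are kept)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tightest_tm_threshold(tm_forward, tm_reverse):
    """Smallest threshold the pair's Tm difference falls below, or None if the
    melting temperatures differ by 7 deg C or more."""
    diff = abs(tm_forward - tm_reverse)
    for limit in TM_THRESHOLDS:
        if diff < limit:
            return limit
    return None


def predicted_amplicon(template, forward, reverse):
    """Expected PCR product for a primer pair on a template given as its top
    strand (5'->3'), or None if either primer has no exact site.

    The forward primer matches the top strand directly; the reverse primer
    anneals to the top strand, so its reverse complement is searched for
    downstream of the forward primer. Only the first exact site is used.
    """
    top = template.upper()
    f_start = top.find(forward.upper())
    if f_start == -1:
        return None
    r_site = reverse_complement(reverse)
    r_start = top.find(r_site, f_start + len(forward))
    if r_start == -1:
        return None
    return top[f_start:r_start + len(r_site)]
```

Under these assumptions, Tm values of 60.0 and 58.5 °C meet the tightest (3 °C) threshold. The predicted product runs from the 5' end of the forward primer to the end of the reverse primer's site, so its length is fixed by the relative positions of the two primers.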
Polymerase Chain Reaction (PCR): The polymerase chain reaction (PCR), as described in U.S. Pat. Nos. 4,683,195 and 4,683,202 to Mullis and Mullis et al., is a method of increasing the concentration of a segment of target sequence in a mixture of genomic DNA without cloning or purification. This technology provides one approach to the problem of low target sequence concentration. PCR can be used to directly increase the concentration of the target to an easily detectable level. This process for amplifying the target sequence involves introducing a molar excess of two oligonucleotide primers, which are complementary to their respective strands of the double-stranded target sequence, into the DNA mixture containing the desired target sequence. The mixture is denatured and then allowed to hybridize. Following hybridization, the primers are extended with polymerase so as to form complementary strands. The steps of denaturation, hybridization (annealing) and polymerase extension (elongation) can be repeated as often as needed in order to obtain relatively high concentrations of a segment of the desired target sequence. The length of the segment of the desired target sequence is determined by the relative positions of the primers with respect to each other; therefore, this length is a controllable parameter. Because the desired segments of the target sequence become the dominant sequences (in terms of concentration) in the mixture, they are said to be "PCR-amplified."
Ligase Chain Reaction (LCR or LAR): The ligase chain reaction [LCR; sometimes referred to as "Ligase Amplification Reaction" (LAR)] has developed into a well-recognized alternative method of amplifying nucleic acids.
In LCR, four oligonucleotides (two adjacent oligonucleotides which uniquely hybridize to one strand of the target DNA, and a complementary set of adjacent oligonucleotides which hybridize to the opposite strand) are mixed, and DNA ligase is added to the mixture. Provided that there is complete complementarity at the junction, ligase will covalently link each set of hybridized molecules. Importantly, in LCR, two probes are ligated together only when they base-pair with sequences in the target sample without gaps or mismatches. Repeated cycles of denaturation and ligation amplify a short segment of DNA. LCR has also been used in combination with PCR to achieve enhanced detection of single-base changes: see for example Segev, PCT Publication No. WO9001069 A1 (1990). However, because the four oligonucleotides used in this assay can pair to form two short ligatable fragments, there is the potential for the generation of target-independent background signal. The use of LCR for mutant screening is limited to the examination of specific nucleic acid positions.
Self-Sustained Synthetic Reaction (3SR/NASBA): The self-sustained sequence replication reaction (3SR) is a transcription-based in vitro amplification system that can exponentially amplify RNA sequences at a uniform temperature. The amplified RNA can then be utilized for mutation detection. In this method, an oligonucleotide primer is used to add a phage RNA polymerase promoter to the 5' end of the sequence of interest. In a cocktail of enzymes and substrates that includes a second primer, reverse transcriptase, RNase H, RNA polymerase and ribo- and deoxyribonucleoside triphosphates, the target sequence undergoes repeated rounds of transcription, cDNA synthesis and second-strand synthesis to amplify the area of interest. The use of 3SR to detect mutations is kinetically limited to screening small segments of DNA (e.g., 200-300 base pairs).
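The LCR requirement described above, that two adjacent probes are joined only when both base-pair with the target without gaps or mismatches, can be illustrated with an idealized string model. In this sketch (an assumption for illustration, not a model of ligase kinetics), each probe is written in the same sense as the strand shown, so a perfect match means identical characters, and a junction is treated as ligatable only when both probes match with zero mismatches and abut with no gap. Sequences and function names are hypothetical.

```python
from typing import List, Tuple


def mismatches(a: str, b: str) -> int:
    """Count positional mismatches; any length difference counts as unmatched bases."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def junction_report(strand: str, left: str, right: str, site: int) -> Tuple[int, int, bool]:
    """Place `left` at `site` and `right` immediately after it (no gap).

    Returns (left mismatches, right mismatches, ligatable); ligation is assumed
    to require zero mismatches in both probes.
    """
    left_window = strand[site:site + len(left)]
    right_start = site + len(left)
    right_window = strand[right_start:right_start + len(right)]
    lm = mismatches(left_window, left)
    rm = mismatches(right_window, right)
    return lm, rm, lm == 0 and rm == 0


def ligation_sites(strand: str, left: str, right: str) -> List[int]:
    """All positions where the two probes abut with a perfect match."""
    span = len(left) + len(right)
    return [i for i in range(len(strand) - span + 1)
            if junction_report(strand, left, right, i)[2]]


wild_type = "TTGACCGTAGGCATT"
mutant = "TTGACCGCAGGCATT"  # T->C change at the first base covered by the right probe
print(junction_report(wild_type, "GACCG", "TAGGC", 2))  # (0, 0, True)
print(junction_report(mutant, "GACCG", "TAGGC", 2))     # (0, 1, False)
print(ligation_sites(wild_type, "GACCG", "TAGGC"))      # [2]
print(ligation_sites(mutant, "GACCG", "TAGGC"))         # []
```

A single mismatch at the junction leaves one probe nearly complementary, yet in this model it blocks ligation, which is why LCR can report a base change at the junction position but not elsewhere.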
Q-Beta (Qβ) Replicase: In this method, a probe which recognizes the sequence of interest is attached to the replicatable RNA template for Qβ replicase. A previously identified major problem with false positives resulting from the replication of unhybridized probes has been addressed through use of a sequence-specific ligation step. However, available thermostable DNA ligases are not effective on this RNA substrate, so the ligation must be performed by T4 DNA ligase at low temperatures (37 °C). This prevents the use of high temperature as a means of achieving specificity as in the LCR; the ligation event can be used to detect a mutation at the junction site, but not elsewhere.
A successful diagnostic method must be very specific. A straightforward method of controlling the specificity of nucleic acid hybridization is by controlling the temperature of the reaction. While the 3SR/NASBA and Qβ systems are both able to generate a large quantity of signal, one or more of the enzymes involved in each cannot be used at high temperature (i.e., > 55 °C). Therefore, the reaction temperatures cannot be raised to prevent non-specific hybridization of the probes. If probes are shortened in order to make them melt more easily at low temperatures, the likelihood of having more than one perfect match in a complex genome increases. For these reasons, PCR and LCR currently dominate the research field in detection technologies.
The basis of the amplification procedure in PCR and LCR is the fact that the products of one cycle become usable templates in all subsequent cycles, consequently doubling the population with each cycle. The final yield of any such doubling system can be expressed as (1+X)^n = y, where "X" is the mean efficiency (the fraction of templates copied in each cycle, e.g., 1.0 for 100 %), "n" is the number of cycles, and "y" is the overall efficiency, or yield, of the reaction.
If every copy of a target DNA is utilized as a template in every cycle of a polymerase chain reaction, then the mean efficiency is 100 %. If 20 cycles of PCR are performed, then the yield will be 2^20, or 1,048,576 copies of the starting material. If the reaction conditions reduce the mean efficiency to 85 %, then the yield in those 20 cycles will be only 1.85^20, or 220,513 copies of the starting material. In other words, a PCR running at 85 % efficiency will yield only 21 % as much final product as a reaction running at 100 % efficiency. A reaction that is reduced to 50 % mean efficiency will yield less than 1 % of the possible product.
In practice, routine polymerase chain reactions rarely achieve the theoretical maximum yield, and PCRs are usually run for more than 20 cycles to compensate for the lower yield. At 50 % mean efficiency, it would take about 34 cycles to achieve the million-fold amplification theoretically possible in 20, and at lower efficiencies, the number of cycles required becomes prohibitive. In addition, any background products that amplify with a better mean efficiency than the intended target will become the dominant products.
Also, many variables can influence the mean efficiency of PCR, including target DNA length and secondary structure, primer length and design, primer and dNTP concentrations, and buffer composition, to name but a few. Contamination of the reaction with exogenous DNA (e.g., DNA spilled onto lab surfaces) or cross-contamination is also a major consideration. Reaction conditions must be carefully optimized for each different primer pair and target sequence, and the process can take days, even for an experienced investigator. The laboriousness of this process, including numerous technical considerations and other factors, presents a significant drawback to using PCR in the clinical setting. Indeed, PCR has yet to penetrate the clinical market in a significant way.
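The yield arithmetic above follows directly from (1+X)^n = y. This short Python sketch reproduces the figures quoted in the text and solves the same relation for the number of cycles needed at a given efficiency; the function names are illustrative only.

```python
import math


def pcr_yield(efficiency: float, cycles: int) -> float:
    """Fold amplification y = (1 + X)^n for mean per-cycle efficiency X (0 to 1)."""
    return (1.0 + efficiency) ** cycles


def cycles_needed(fold: float, efficiency: float) -> float:
    """Solve (1 + X)^n = fold for n; the result is real-valued."""
    return math.log(fold) / math.log(1.0 + efficiency)


ideal = pcr_yield(1.0, 20)
print(ideal)                                  # 1048576.0
print(round(pcr_yield(0.85, 20)))             # 220513
print(f"{pcr_yield(0.85, 20) / ideal:.2f}")   # 0.21
print(f"{pcr_yield(0.50, 20) / ideal:.4f}")   # 0.0032
print(f"{cycles_needed(1e6, 0.50):.1f}")      # 34.1
```

Since n comes out at about 34.1, 34 cycles at 50 % efficiency give roughly, though not quite, a million-fold amplification (1.5^34 ≈ 9.7 × 10^5).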
The same concerns arise with LCR, as LCR must also be optimized to use different oligonucleotide sequences for each target sequence. In addition, both methods require expensive equipment capable of precise temperature cycling.
Many applications of nucleic acid detection technologies, such as studies of allelic variation, involve not only detection of a specific sequence in a complex background, but also discrimination between sequences with few, or single, nucleotide differences. One method for the detection of allele-specific variants by PCR is based upon the fact that it is difficult for Taq polymerase to synthesize a DNA strand when there is a mismatch between the template strand and the 3' end of the primer. An allele-specific variant may be detected by the use of a primer that is perfectly matched with only one of the possible alleles; the mismatch to the other allele acts to prevent the extension of the primer, thereby preventing the amplification of that sequence. This method has a substantial limitation in that the base composition of the mismatch influences the ability to prevent extension across the mismatch, and certain mismatches do not prevent extension or have only a minimal effect.
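The allele-specific PCR design described above can be sketched as follows. The model is idealized (an assumption, not a statement about polymerase behavior): a primer written in the same sense as the template strand shown is predicted to extend only if its 3'-terminal base matches and there are no other mismatches. As the text notes, real polymerases extend through some 3' mismatches, so this captures the design logic rather than predicting actual results. Sequences and names are hypothetical.

```python
from typing import Dict, List


def predicted_to_extend(template: str, primer: str, start: int) -> bool:
    """Idealized rule: extension only if the primer matches template[start:] exactly,
    including its 3'-terminal base (the last character)."""
    window = template[start:start + len(primer)]
    if len(window) != len(primer):
        return False
    if window[-1] != primer[-1]:
        return False  # 3'-terminal mismatch: extension assumed to be blocked
    return window[:-1] == primer[:-1]  # no internal mismatches tolerated in this model


def alleles_detected(template: str, allele_primers: Dict[str, str], start: int) -> List[str]:
    """Names of the allele-specific primers predicted to extend on this template."""
    return [name for name, primer in allele_primers.items()
            if predicted_to_extend(template, primer, start)]


reference = "ACGTTGCAAGTCCA"
variant = "ACGTTGCAGGTCCA"                          # A->G substitution at index 8
primers = {"ref": "ACGTTGCAA", "alt": "ACGTTGCAG"}  # 3' ends sit on the variable base
print(alleles_detected(reference, primers, 0))     # ['ref']
print(alleles_detected(variant, primers, 0))       # ['alt']
```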
A similar 3'-mismatch strategy is used with greater effect to prevent ligation in the LCR. Any mismatch effectively blocks the action of the thermostable ligase, but LCR still has the drawback of target-independent background ligation products initiating the amplification. Moreover, the combination of PCR with subsequent LCR to identify the nucleotides at individual positions is clearly a cumbersome proposition for the clinical laboratory.
The direct detection method according to various preferred embodiments of the present invention may be, for example, a cycling probe reaction (CPR) or a branched DNA analysis.
When a sufficient amount of a nucleic acid to be detected is available, there are advantages to detecting that sequence directly, instead of making more copies of that target (e.g., as in PCR and LCR). Most notably, a method that does not amplify the signal exponentially is more amenable to quantitative analysis. Even if the signal is enhanced by attaching multiple dyes to a single oligonucleotide, the correlation between the final signal intensity and the amount of target is direct. Such a system has the additional advantage that the products of the reaction will not themselves promote further reaction, so contamination of lab surfaces by the products is less of a concern. Recently devised techniques have sought to eliminate the use of radioactivity and/or improve the sensitivity in automatable formats. Two examples are the "Cycling Probe Reaction" (CPR) and "Branched DNA" (bDNA).
Cycling probe reaction (CPR): The cycling probe reaction (CPR) uses a long chimeric oligonucleotide in which a central portion is made of RNA while the two termini are made of DNA. Hybridization of the probe to a target DNA and exposure to a thermostable RNase H causes the RNA portion to be digested.
This destabilizes the remaining DNA portions of the duplex, releasing the remainder of the probe from the target DNA and allowing another probe molecule to repeat the process. The signal, in the form of cleaved probe molecules, accumulates at a linear rate. While the repeating process increases the signal, the RNA portion of the oligonucleotide is vulnerable to RNases that may be carried through sample preparation.
Branched DNA: Branched DNA (bDNA) involves oligonucleotides with branched structures that allow each individual oligonucleotide to carry 35 to 40 labels (e.g., alkaline phosphatase enzymes). While this enhances the signal from a hybridization event, signal from non-specific binding is similarly increased.
The detection of at least one sequence change according to various preferred embodiments of the present invention may be accomplished by, for example, restriction fragment length polymorphism (RFLP) analysis, allele-specific oligonucleotide (ASO) analysis, Denaturing/Temperature Gradient Gel Electrophoresis (DGGE/TGGE), Single-Strand Conformation Polymorphism (SSCP) analysis or Dideoxy fingerprinting (ddF).
The demand for tests which allow the detection of specific nucleic acid sequences and sequence changes is growing rapidly in clinical diagnostics. As nucleic acid sequence data for genes from humans and pathogenic organisms accumulate, the demand for fast, cost-effective, and easy-to-use tests for mutations within specific sequences is rapidly increasing.
A handful of methods have been devised to scan nucleic acid segments for mutations. One option is to determine the entire gene sequence of each test sample (e.g., a bacterial isolate). For sequences under approximately 600 nucleotides, this may be accomplished using amplified material (e.g., PCR reaction products). This avoids the time and expense associated with cloning the segment of interest.
However, specialized equipment and highly trained personnel are required, and the method is too labor-intensive and expensive to be practical and effective in the clinical setting.
In view of the difficulties associated with sequencing, a given segment of nucleic acid may be characterized on several other levels. At the lowest resolution, the size of the molecule can be determined by electrophoresis by comparison to a known standard run on the same gel. A more detailed picture of the molecule may be achieved by cleavage with combinations of restriction enzymes prior to electrophoresis, to allow construction of an ordered map. The presence of specific sequences within the fragment can be detected by hybridization of a labeled probe, or the precise nucleotide sequence can be determined by partial chemical degradation or by primer extension in the presence of chain-terminating nucleotide analogs.
Restriction fragment length polymorphism (RFLP): For detection of single-base differences between like sequences, the requirements of the analysis are often at the highest level of resolution. For cases in which the position of the nucleotide in question is known in advance, several methods have been developed for examining single base changes without direct sequencing. For example, if a mutation of interest happens to fall within a restriction recognition sequence, a change in the pattern of digestion can be used as a diagnostic tool (e.g., restriction fragment length polymorphism [RFLP] analysis).
Single point mutations have also been detected by the creation or destruction of RFLPs. Mutations are detected and localized by the presence and size of the RNA fragments generated by cleavage at the mismatches.
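The RFLP principle described above, that a point mutation creating or destroying a restriction site changes the digestion pattern, can be illustrated with a toy digest. In this sketch the enzyme is EcoRI (recognition site GAATTC, cut between G and A), chosen only as a familiar example; the sequences are hypothetical, and only one strand of a linear molecule is modeled.

```python
import re
from typing import List


def digest(seq: str, site: str, cut_offset: int) -> List[int]:
    """Return fragment lengths after cutting a linear sequence at every occurrence of `site`.

    `cut_offset` is the number of bases of the site lying left of the cut on this
    strand (1 for EcoRI, G^AATTC). A zero-width lookahead also finds overlapping sites.
    """
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


wild_type = "TTTTGAATTCAAAAAAAAAA"  # contains one EcoRI site
mutant = "TTTTGAGTTCAAAAAAAAAA"     # A->G point mutation destroys the site
print(digest(wild_type, "GAATTC", 1))  # [5, 15]
print(digest(mutant, "GAATTC", 1))     # [20]
```

On a gel, the wild-type allele in this example would give two bands and the mutant one; a mutation lying outside every recognition site would leave the pattern unchanged.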
Single nucleotide mismatches in DNA heteroduplexes are also recognized and cleaved by some chemicals, providing an alternative strategy to detect single base substitutions, generically named "Mismatch Chemical Cleavage" (MCC). However, this method requires the use of osmium tetroxide and piperidine, two highly noxious chemicals which are not suited for use in a clinical laboratory.
RFLP analysis suffers from low sensitivity and requires a large amount of sample. When RFLP analysis is used for the detection of point mutations, it is, by its nature, limited to the detection of only those single base changes which fall within a restriction sequence of a known restriction endonuclease. Moreover, the majority of the available enzymes have 4 to 6 base-pair recognition sequences and cleave too frequently for many large-scale DNA manipulations. Thus, it is applicable only in a small fraction of cases, as most mutations do not fall within such sites.
A handful of rare-cutting restriction enzymes with 8 base-pair specificities have been isolated and these are widely used in genetic mapping, but these enzymes are few in number, are limited to the recognition of G+C-rich sequences, and cleave at sites that tend to be highly clustered. Recently, endonucleases encoded by group I introns have been discovered that might have greater than 12 base-pair specificity, but again, these are few in number.
Allele specific oligonucleotide (ASO): If the change is not in a recognition sequence, then allele-specific oligonucleotides (ASOs) can be designed to hybridize in proximity to the mutated nucleotide, such that a primer extension or ligation event can be used as the indicator of a match or a mismatch. Hybridization with radioactively labeled allele-specific oligonucleotides (ASO) has also been applied to the detection of specific point mutations. The method is based on the differences in the melting temperature of short DNA fragments differing by a single nucleotide.
Stringent hybridization and washing conditions can differentiate between mutant and wild-type alleles. The ASO approach applied to PCR products has also been extensively utilized by various researchers to detect and characterize point mutations in ras genes and gsp/gip oncogenes. Because of the presence of various nucleotide changes in multiple positions, the ASO method requires the use of many oligonucleotides to cover all possible oncogenic mutations.
With either of the techniques described above (i.e., RFLP and ASO), the precise location of the suspected mutation must be known in advance of the test. That is to say, they are inapplicable when one needs to detect the presence of a mutation at an unknown position within a gene or sequence of interest.
Denaturing/Temperature Gradient Gel Electrophoresis (DGGE/TGGE): Two other methods rely on detecting changes in electrophoretic mobility in response to minor sequence changes. One of these methods, termed "Denaturing Gradient Gel Electrophoresis" (DGGE), is based on the observation that slightly different sequences will display different patterns of local melting when electrophoretically resolved on a gradient gel. In this manner, variants can be distinguished, as differences in melting properties of homoduplexes versus heteroduplexes differing in a single nucleotide can detect the presence of mutations in the target sequences because of the corresponding changes in their electrophoretic mobilities. The fragments to be analyzed, usually PCR products, are "clamped" at one end by a long stretch of G-C base pairs (30-80) to allow complete denaturation of the sequence of interest without complete dissociation of the strands. The attachment of a GC "clamp" to the DNA fragments increases the fraction of mutations that can be recognized by DGGE. Attaching a GC clamp to one primer is critical to ensure that the sequence of interest has a lower dissociation temperature than the clamped end.
Modifications of the technique have been developed using temperature gradients, and the method can also be applied to RNA:RNA duplexes.
Limitations on the utility of DGGE include the requirement that the denaturing conditions must be optimized for each type of DNA to be tested. Furthermore, the method requires specialized equipment to prepare the gels and maintain the needed high temperatures during electrophoresis. The expense associated with the synthesis of the clamping tail on one oligonucleotide for each sequence to be tested is also a major consideration. In addition, long running times are required for DGGE. The long running time of DGGE was shortened in a modification of DGGE called constant denaturant gel electrophoresis (CDGE). CDGE requires that gels be run under different denaturant conditions in order to reach high efficiency for the detection of mutations.
A technique analogous to DGGE, termed temperature gradient gel electrophoresis (TGGE), uses a thermal gradient rather than a chemical denaturant gradient. TGGE requires the use of specialized equipment which can generate a temperature gradient perpendicularly oriented relative to the electrical field. TGGE can detect mutations in relatively small fragments of DNA; therefore, scanning of large gene segments requires the use of multiple PCR products prior to running the gel.
Single-Strand Conformation Polymorphism (SSCP): Another common method, called "Single-Strand Conformation Polymorphism" (SSCP), was developed by Hayashi, Sekiya and colleagues and is based on the observation that single strands of nucleic acid can take on characteristic conformations in non-denaturing conditions, and these conformations influence electrophoretic mobility. The complementary strands assume sufficiently different structures that one strand may be resolved from the other.
Changes in sequences within the fragment will also change the conformation, consequently altering the mobility and allowing this to be used as an assay for sequence variations.
The SSCP process involves denaturing a DNA segment (e.g., a PCR product) that is labeled on both strands, followed by slow electrophoretic separation on a non-denaturing polyacrylamide gel, so that intra-molecular interactions can form and not be disturbed during the run. This technique is extremely sensitive to variations in gel composition and temperature. A serious limitation of this method is the relative difficulty encountered in comparing data generated in different laboratories under apparently similar conditions.
Dideoxy fingerprinting (ddF): Dideoxy fingerprinting (ddF) is another technique developed to scan genes for the presence of mutations. The ddF technique combines components of Sanger dideoxy sequencing with SSCP. A dideoxy sequencing reaction is performed using one dideoxy terminator, and the reaction products are then electrophoresed on non-denaturing polyacrylamide gels to detect alterations in mobility of the termination segments as in SSCP analysis. While ddF is an improvement over SSCP in terms of increased sensitivity, ddF requires the use of expensive dideoxynucleotides, and this technique is still limited to the analysis of fragments of the size suitable for SSCP (i.e., fragments of 200-300 bases for optimal detection of mutations).
In addition to the above limitations, all of these methods are limited as to the size of the nucleic acid fragment that can be analyzed. For the direct sequencing approach, sequences of greater than 600 base pairs require cloning, with the consequent delays and expense of either deletion sub-cloning or primer walking, in order to cover the entire fragment. SSCP and DGGE have even more severe size limitations. Because of reduced sensitivity to sequence changes, these methods are not considered suitable for larger fragments.
Although SSCP is reportedly able to detect 90 % of single-base substitutions within a 200 base-pair fragment, the detection rate drops to less than 50 % for 400 base-pair fragments. Similarly, the sensitivity of DGGE decreases as the length of the fragment reaches 500 base pairs. The ddF technique, as a combination of direct sequencing and SSCP, is also limited by the relatively small size of the DNA that can be screened.
According to a presently preferred embodiment of the present invention, the step of searching for any of the nucleic acid sequences described here, in tumor cells or in cells derived from a cancer patient, is effected by any suitable technique, including, but not limited to, nucleic acid sequencing, polymerase chain reaction, ligase chain reaction, self-sustained synthetic reaction, Qβ replicase, cycling probe reaction, branched DNA, restriction fragment length polymorphism analysis, mismatch chemical cleavage, heteroduplex analysis, allele-specific oligonucleotides, denaturing gradient gel electrophoresis, constant denaturant gel electrophoresis, temperature gradient gel electrophoresis and dideoxy fingerprinting.
Detection may also optionally be performed with a chip or other such device. The nucleic acid sample which includes the candidate region to be analyzed is preferably isolated, amplified and labeled with a reporter group. This reporter group can be a fluorescent group such as phycoerythrin. The labeled nucleic acid is then incubated with the probes immobilized on the chip using a fluidics station. The fabrication of fluidics devices, and particularly of microcapillary devices, in silicon and glass substrates has been described in the art. Once the reaction is completed, the chip is inserted into a scanner and patterns of hybridization are detected.
The hybridization data are collected as a signal emitted from the reporter groups already incorporated into the nucleic acid, which is now bound to the probes attached to the chip. Since the sequence and position of each probe immobilized on the chip are known, the identity of the nucleic acid hybridized to a given probe can be determined.
It will be appreciated that when utilized along with automated equipment, the above described detection methods can be used to screen multiple samples for a disease and/or pathological condition both rapidly and easily.
Amino acid sequences and peptides
The terms "polypeptide," "peptide" and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues is an analog or mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers. Polypeptides can be modified, e.g., by the addition of carbohydrate residues to form glycoproteins. The terms "polypeptide," "peptide" and "protein" include glycoproteins, as well as non-glycoproteins.
Polypeptide products can be biochemically synthesized, such as by employing standard solid phase techniques. Such methods include but are not limited to exclusive solid phase synthesis, partial solid phase synthesis methods, fragment condensation, and classical solution synthesis. These methods are preferably used when the peptide is relatively short (i.e., ≤10 kDa) and/or when it cannot be produced by recombinant techniques (i.e., not encoded by a nucleic acid sequence) and therefore involves different chemistry. Solid phase polypeptide synthesis procedures are well known in the art and further described by John Morrow Stewart and Janis Dillaha Young, Solid Phase Peptide Synthesis (2nd Ed., Pierce Chemical Company, 1984).
Synthetic polypeptides can optionally be purified by preparative high performance liquid chromatography [Creighton T. (1983) Proteins, structures and molecular principles. WH Freeman and Co. N.Y.], after which their composition can be confirmed via amino acid sequencing.
In cases where large amounts of a polypeptide are desired, it can be generated using recombinant techniques such as those described by Bitter et al. (1987) Methods in Enzymol. 153:516-544, Studier et al. (1990) Methods in Enzymol. 185:60-89, Brisson et al. (1984) Nature 310:511-514, Takamatsu et al. (1987) EMBO J. 6:307-311, Coruzzi et al. (1984) EMBO J. 3:1671-1680, Brogli et al. (1984) Science 224:838-843, Gurley et al. (1986) Mol. Cell. Biol. 6:559-565 and Weissbach & Weissbach, 1988, Methods for Plant Molecular Biology, Academic Press, NY, Section VIII, pp 421-463.
The present invention also encompasses polypeptides encoded by the polynucleotide sequences of the present invention, as well as polypeptides according to the amino acid sequences described herein. The present invention also encompasses homologues of these polypeptides; such homologues can be at least 50 %, at least 55 %, at least 60 %, at least 65 %, at least 70 %, at least 75 %, at least 80 %, at least 85 %, at least 95 % or more, say 100 %, homologous to the amino acid sequences set forth below, as can be determined using the BlastP software of the National Center of Biotechnology Information (NCBI) using default parameters, optionally and preferably including the following: filtering on (this option filters repetitive or low-complexity sequences from the query using the Seg (protein) program), scoring matrix BLOSUM62 for proteins, word size 3, E value 10, gap costs 11, 1 (initialization and extension), and number of alignments shown 50.
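For readers reproducing the protein homology comparison, the BlastP settings listed above correspond to options of the NCBI BLAST+ `blastp` command-line program. The sketch below only assembles the argument list in Python; the file names are placeholders, and mapping "filtering on" to `-seg yes` is an interpretation, so the flag is set explicitly rather than relying on program defaults.

```python
import shlex
from typing import List


def blastp_command(query_fasta: str, subject_fasta: str) -> List[str]:
    """Assemble a BLAST+ blastp argument list mirroring the parameters listed above."""
    return [
        "blastp",
        "-query", query_fasta,
        "-subject", subject_fasta,
        "-seg", "yes",            # low-complexity filtering with SEG ("filtering on")
        "-matrix", "BLOSUM62",    # scoring matrix
        "-word_size", "3",
        "-evalue", "10",
        "-gapopen", "11",         # gap initialization cost
        "-gapextend", "1",        # gap extension cost
        "-num_alignments", "50",  # number of alignments shown
    ]


cmd = blastp_command("query.fasta", "subject.fasta")  # placeholder file names
print(shlex.join(cmd))
# With BLAST+ installed, the search could then be run via subprocess.run(cmd, check=True).
```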
Nucleotide (nucleic acid) sequence homology/identity is preferably determined by using the BlastN software of the National Center of Biotechnology Information (NCBI) using default parameters, which preferably include using the DUST filter program, and also preferably include an E value of 10, filtering of low-complexity sequences and a word size of 11. Finally, the present invention also encompasses fragments of the above described polypeptides and polypeptides having mutations, such as deletions, insertions or substitutions of one or more amino acids, either naturally occurring or artificially induced, either randomly or in a targeted fashion.
It will be appreciated that peptides identified according to the present invention may be degradation products, synthetic peptides or recombinant peptides, as well as peptidomimetics, typically synthetic peptides and peptoids and semipeptoids which are peptide analogs, which may have, for example, modifications rendering the peptides more stable while in a body or more capable of penetrating into cells. Such modifications include, but are not limited to, N terminus modification, C terminus modification, peptide bond modification, including, but not limited to, CH2-NH, CH2-S, CH2-S=O, O=C-NH, CH2-O, CH2-CH2, S=C-NH, CH=CH or CF=CH, backbone modifications, and residue modification. Methods for preparing peptidomimetic compounds are well known in the art and are specified. Further details in this respect are provided hereinunder.
Peptide bonds (-CO-NH-) within the peptide may be substituted, for example, by N-methylated bonds (-N(CH3)-CO-), ester bonds (-C(R)H-C-O-O-C(R)-N-), ketomethylene bonds (-CO-CH2-), α-aza bonds (-NH-N(R)-CO-), wherein R is any alkyl, e.g., methyl, carba bonds (-CH2-NH-), hydroxyethylene bonds (-CH(OH)-CH2-), thioamide bonds (-CS-NH-), olefinic double bonds (-CH=CH-), retro amide bonds (-NH-CO-), or peptide derivatives (-N(R)-CH2-CO-), wherein R is the "normal" side chain, naturally presented on the carbon atom. These modifications can occur at any of the bonds along the peptide chain and even at several (2-3) at the same time.
Natural aromatic amino acids, Trp, Tyr and Phe, may be substituted by synthetic non-natural amino acids such as phenylglycine, TIC, naphthylalanine (Nol), ring-methylated derivatives of Phe, halogenated derivatives of Phe or O-methyl-Tyr.
In addition to the above, the peptides of the present invention may also include one or more modified amino acids or one or more non-amino acid monomers (e.g., fatty acids, complex carbohydrates, etc.).
As used herein in the specification and in the claims section below, the term "amino acid" or "amino acids" is understood to include the 20 naturally occurring amino acids; those amino acids often modified post-translationally in vivo, including, for example, hydroxyproline, phosphoserine and phosphothreonine; and other unusual amino acids including, but not limited to, 2-aminoadipic acid, hydroxylysine, isodesmosine, nor-valine, nor-leucine and ornithine. Furthermore, the term "amino acid" includes both D- and L-amino acids. Table 1 lists non-conventional or modified amino acids which can be used with the present invention.
Table 1
Non-conventional amino acid | Code | Non-conventional amino acid | Code
α-aminobutyric acid | Abu | L-N-methylalanine | Nmala
α-amino-α-methylbutyrate | Mgabu | L-N-methylarginine | Nmarg
aminocyclopropane-carboxylate | Cpro | L-N-methylasparagine | Nmasn
- | - | L-N-methylaspartic acid | Nmasp
aminoisobutyric acid | Aib | L-N-methylcysteine | Nmcys
aminonorbornyl-carboxylate | Norb | L-N-methylglutamine | Nmgln
- | - | L-N-methylglutamic acid | Nmglu
cyclohexylalanine | Chexa | L-N-methylhistidine | Nmhis
cyclopentylalanine | Cpen | L-N-methylisoleucine | Nmile
D-alanine | Dal | L-N-methylleucine | Nmleu
D-arginine | Darg | L-N-methyllysine | Nmlys
D-aspartic acid | Dasp | L-N-methylmethionine | Nmmet
D-cysteine | Dcys | L-N-methylnorleucine | Nmnle
D-glutamine | Dgln | L-N-methylnorvaline | Nmnva
D-glutamic acid | Dglu | L-N-methylornithine | Nmorn
D-histidine | Dhis | L-N-methylphenylalanine | Nmphe
D-isoleucine | Dile | L-N-methylproline | Nmpro
D-leucine | Dleu | L-N-methylserine | Nmser
D-lysine | Dlys | L-N-methylthreonine | Nmthr
D-methionine | Dmet | L-N-methyltryptophan | Nmtrp
D-ornithine | Dorn | L-N-methyltyrosine | Nmtyr
D-phenylalanine | Dphe | L-N-methylvaline | Nmval
D-proline | Dpro | L-N-methylethylglycine | Nmetg
D-serine | Dser | L-N-methyl-t-butylglycine | Nmtbug
D-threonine | Dthr | L-norleucine | Nle
D-tryptophan | Dtrp | L-norvaline | Nva
D-tyrosine | Dtyr | α-methyl-aminoisobutyrate | Maib
D-valine | Dval | α-methyl-γ-aminobutyrate | Mgabu
D-α-methylalanine | Dmala | α-methylcyclohexylalanine | Mchexa
D-α-methylarginine | Dmarg | α-methylcyclopentylalanine | Mcpen
D-α-methylasparagine | Dmasn | α-methyl-α-naphthylalanine | Manap
D-α-methylaspartate | Dmasp | α-methylpenicillamine | Mpen
D-α-methylcysteine | Dmcys | N-(4-aminobutyl)glycine | Nlys
D-α-methylglutamine | Dmgln | N-(2-aminoethyl)glycine | Naeg
D-α-methylhistidine | Dmhis | N-(3-aminopropyl)glycine | Norn
D-α-methylisoleucine | Dmile | N-amino-α-methylbutyrate | Nmaabu
D-α-methylleucine | Dmleu | α-naphthylalanine | Anap
D-α-methyllysine | Dmlys | N-benzylglycine | Nphe
D-α-methylmethionine | Dmmet | N-(2-carbamylethyl)glycine | Ngln
D-α-methylornithine | Dmorn | N-(carbamylmethyl)glycine | Nasn
D-α-methylphenylalanine | Dmphe | N-(2-carboxyethyl)glycine | Nglu
D-α-methylproline | Dmpro | N-(carboxymethyl)glycine | Nasp
D-α-methylserine | Dmser | N-cyclobutylglycine | Ncbut
D-α-methylthreonine | Dmthr | N-cycloheptylglycine | Nchep
D-α-methyltryptophan | Dmtrp | N-cyclohexylglycine | Nchex
D-α-methyltyrosine | Dmty | N-cyclodecylglycine | Ncdec
D-α-methylvaline | Dmval | N-cyclododecylglycine | Ncdod
D-N-methylalanine | Dnmala | N-cyclooctylglycine | Ncoct
D-N-methylarginine | Dnmarg | N-cyclopropylglycine | Ncpro
D-N-methylasparagine | Dnmasn | N-cycloundecylglycine | Ncund
D-N-methylaspartate | Dnmasp | N-(2,2-diphenylethyl)glycine | Nbhm
D-N-methylcysteine | Dnmcys | N-(3,3-diphenylpropyl)glycine | Nbhe
D-N-methylglutamine | Dnmgln | N-(3-guanidinopropyl)glycine | Narg
D-N-methylglutamate | Dnmglu | N-(1-hydroxyethyl)glycine | Nthr
D-N-methylhistidine | Dnmhis | N-(hydroxyethyl)glycine | Nser
D-N-methylisoleucine | Dnmile | N-(imidazolylethyl)glycine | Nhis
D-N-methylleucine | Dnmleu | N-(3-indolylethyl)glycine | Nhtrp
D-N-methyllysine | Dnmlys | N-methyl-γ-aminobutyrate | Nmgabu
N-methylcyclohexylalanine | Nmchexa | D-N-methylmethionine | Dnmmet
D-N-methylornithine | Dnmorn | N-methylcyclopentylalanine | Nmcpen
N-methylglycine | Nala | D-N-methylphenylalanine | Dnmphe
N-methylaminoisobutyrate | Nmaib | D-N-methylproline | Dnmpro
N-(1-methylpropyl)glycine | Nile | D-N-methylserine | Dnmser
N-(2-methylpropyl)glycine | Nleu | D-N-methylthreonine | Dnmthr
D-N-methyltryptophan | Dnmtrp | N-(1-methylethyl)glycine | Nval
D-N-methyltyrosine | Dnmtyr | N-methyl-α-naphthylalanine | Nmanap
D-N-methylvaline | Dnmval | N-methylpenicillamine | Nmpen
γ-aminobutyric acid | Gabu | N-(p-hydroxyphenyl)glycine | Nhtyr
L-t-butylglycine | Tbug | N-(thiomethyl)glycine | Ncys
L-ethylglycine | Etg | penicillamine | Pen
L-homophenylalanine | Hphe | L-α-methylalanine | Mala
L-α-methylarginine | Marg | L-α-methylasparagine | Masn
L-α-methylaspartate | Masp | L-α-methyl-t-butylglycine | Mtbug
L-α-methylcysteine | Mcys | L-α-methylethylglycine | Metg
L-α-methylglutamine | Mgln | L-α-methylglutamate | Mglu
L-α-methylhistidine | Mhis | L-α-methylhomophenylalanine | Mhphe
L-α-methylisoleucine | Mile | N-(2-methylthioethyl)glycine | Nmet
L-α-methylleucine | Mleu | L-α-methyllysine | Mlys
L-α-methylmethionine | Mmet | L-α-methylnorleucine | Mnle
L-α-methylnorvaline | Mnva | L-α-methylornithine | Morn
L-α-methylphenylalanine | Mphe | L-α-methylproline | Mpro
L-α-methylserine | Mser | L-α-methylthreonine | Mthr
L-α-methyltryptophan | Mtrp | L-α-methyltyrosine | Mtyr
L-α-methylvaline | Mval | L-N-methylhomophenylalanine | Nmhphe
N-(N-(2,2-diphenylethyl)carbamylmethyl)glycine | Nnbhm | N-(N-(3,3-diphenylpropyl)carbamylmethyl)glycine | Nnbhe
1-carboxy-1-(2,2-diphenylethylamino)cyclopropane | Nmbc | - | -
Since the peptides of the present invention are preferably utilized in diagnostics, which require the peptides to be in soluble form, the peptides of the present invention preferably include one or more non-natural or natural polar amino acids, including but not limited to serine and threonine, which are capable of increasing peptide solubility due to their hydroxyl-containing side chains.
The peptides of the present invention are preferably utilized in a linear form, although it will be appreciated that in cases where cyclization does not severely interfere with peptide characteristics, cyclic forms of the peptide can also be utilized.
The peptides of the present invention can be synthesized, for example, by using standard solid-phase techniques. These methods include exclusive solid-phase synthesis, which is well known in the art, partial solid-phase synthesis methods, fragment condensation and classical solution synthesis. These methods are preferably used when the peptide is relatively short (i.e., less than 10 kDa) and/or when it cannot be produced by recombinant techniques (i.e., when it is not encoded by a nucleic acid sequence) and therefore involves different chemistry.
Synthetic peptides can be purified by preparative high-performance liquid chromatography, and their composition can be confirmed via amino acid sequencing.
In cases where large amounts of the peptides of the present invention are desired, the peptides can be generated using recombinant techniques such as those described by Bitter et al. (1987) Methods in Enzymol. 153:516-544; Studier et al. (1990) Methods in Enzymol. 185:60-89; Brisson et al. (1984) Nature 310:511-514; Takamatsu et al. (1987) EMBO J. 6:307-311; Coruzzi et al. (1984) EMBO J. 3:1671-1680; Broglie et al. (1984) Science 224:838-843; Gurley et al. (1986) Mol. Cell. Biol. 6:559-565; and Weissbach & Weissbach, 1988, Methods for Plant Molecular Biology, Academic Press, NY, Section VIII, pp. 421-463, and also as described above.
Antibodies
"Antibody" refers to a polypeptide ligand that is preferably substantially encoded by an immunoglobulin gene or immunoglobulin genes, or fragments thereof, which specifically binds and recognizes an epitope (e.g., an antigen). The recognized immunoglobulin genes include the kappa and lambda light chain constant region genes, the alpha, gamma, delta, epsilon and mu heavy chain constant region genes, and the myriad immunoglobulin variable region genes.
Antibodies exist, e.g., as intact immunoglobulins or as a number of well-characterized fragments produced by digestion with various peptidases. These include, e.g., Fab' and F(ab')2 fragments. The term "antibody," as used herein, also includes antibody fragments either produced by the modification of whole antibodies or synthesized de novo using recombinant DNA methodologies. It also includes polyclonal antibodies, monoclonal antibodies, chimeric antibodies, humanized antibodies and single chain antibodies. The "Fc" portion of an antibody refers to that portion of an immunoglobulin heavy chain that comprises one or more heavy chain constant region domains, CH1, CH2 and CH3, but does not include the heavy chain variable region.
The functional fragments of antibodies, such as Fab, F(ab')2 and Fv, that are capable of binding to an antigen are described as follows:
(1) Fab, the fragment which contains a monovalent antigen-binding fragment of an antibody molecule, can be produced by digestion of whole antibody with the enzyme papain to yield an intact light chain and a portion of one heavy chain;
(2) Fab', the fragment of an antibody molecule that can be obtained by treating whole antibody with pepsin, followed by reduction, to yield an intact light chain and a portion of the heavy chain; two Fab' fragments are obtained per antibody molecule;
(3) F(ab')2, the fragment of the antibody that can be obtained by treating whole antibody with the enzyme pepsin without subsequent reduction; F(ab')2 is a dimer of two Fab' fragments held together by two disulfide bonds;
(4) Fv, defined as a genetically engineered fragment containing the variable region of the light chain and the variable region of the heavy chain expressed as two chains; and
(5) Single chain antibody ("SCA"), a genetically engineered molecule containing the variable region of the light chain and the variable region of the heavy chain, linked by a suitable
polypeptide linker as a genetically fused single chain molecule.
Methods of producing polyclonal and monoclonal antibodies as well as fragments thereof are well known in the art (see, for example, Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, New York, 1988, incorporated herein by reference).
Antibody fragments according to the present invention can be prepared by proteolytic hydrolysis of the antibody or by expression in E. coli or mammalian cells (e.g., Chinese hamster ovary cell culture or other protein expression systems) of DNA encoding the fragment. Antibody fragments can be obtained by pepsin or papain digestion of whole antibodies by conventional methods. For example, antibody fragments can be produced by enzymatic cleavage of antibodies with pepsin to provide a 5S fragment denoted F(ab')2. This fragment can be further cleaved using a thiol reducing agent, and optionally a blocking group for the sulfhydryl groups resulting from cleavage of disulfide linkages, to produce 3.5S Fab' monovalent fragments. Alternatively, enzymatic cleavage using papain directly produces two monovalent Fab fragments and an Fc fragment. These methods are described, for example, by Goldenberg, U.S. Pat. Nos. 4,036,945 and 4,331,647, and references contained therein, which patents are hereby incorporated by reference in their entirety. See also Porter, R.
R. [Biochem. J. 73:119-126 (1959)]. Other methods of cleaving antibodies, such as separation of heavy chains to form monovalent light-heavy chain fragments, further cleavage of fragments, or other enzymatic, chemical or genetic techniques, may also be used, so long as the fragments bind to the antigen that is recognized by the intact antibody.
Fv fragments comprise an association of VH and VL chains. This association may be noncovalent, as described in Inbar et al. [Proc. Nat'l Acad. Sci. USA 69:2659-62 (1972)]. Alternatively, the variable chains can be linked by an intermolecular disulfide bond or cross-linked by chemicals such as glutaraldehyde. Preferably, the Fv fragments comprise VH and VL chains connected by a peptide linker. These single-chain antigen binding proteins (sFv) are prepared by constructing a structural gene comprising DNA sequences encoding the VH and VL domains connected by an oligonucleotide. The structural gene is inserted into an expression vector, which is subsequently introduced into a host cell such as E. coli. The recombinant host cells synthesize a single polypeptide chain with a linker peptide bridging the two V domains. Methods for producing sFvs are described, for example, by Whitlow and Filpula [Methods 2:97-105 (1991)]; Bird et al. [Science 242:423-426 (1988)]; Pack et al. [Bio/Technology 11:1271-77 (1993)]; and in U.S. Pat. No. 4,946,778, which is hereby incorporated by reference in its entirety.
Another form of an antibody fragment is a peptide corresponding to a single complementarity-determining region (CDR). CDR peptides ("minimal recognition units") can be obtained by constructing genes encoding the CDR of an antibody of interest. Such genes are prepared, for example, by using the polymerase chain reaction to synthesize the variable region from RNA of antibody-producing cells. See, for example, Larrick and Fry [Methods, 2:106-10 (1991)].
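The following Python sketch illustrates the sFv layout described above, in which VH and VL are joined by a peptide linker into one chain, by assembling the protein sequence of such a construct. It is a hypothetical helper and not part of the original method. The function name `assemble_scfv` and the choice of a (Gly4Ser)3 linker are assumptions: that linker is a common choice in the literature, but the text does not specify one. The example fragments are placeholders.

```python
# Hypothetical sketch: assemble the amino acid sequence of a single-chain Fv
# (sFv) by joining a VH and a VL domain through a flexible peptide linker.
# ASSUMPTION: the (Gly4Ser)3 linker is a common literature choice; the text
# does not specify which linker is used.

GLY4SER_X3 = "GGGGS" * 3  # 15-residue flexible linker (assumed, not from the source)

VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes


def assemble_scfv(vh: str, vl: str, linker: str = GLY4SER_X3, vh_first: bool = True) -> str:
    """Return VH-linker-VL (or VL-linker-VH) as a single polypeptide string."""
    parts = {"vh": vh.upper(), "vl": vl.upper(), "linker": linker.upper()}
    for name, seq in parts.items():
        if not seq or not set(seq) <= VALID_RESIDUES:
            raise ValueError(f"{name} must be a non-empty string of standard one-letter codes")
    first, second = (parts["vh"], parts["vl"]) if vh_first else (parts["vl"], parts["vh"])
    return first + parts["linker"] + second


if __name__ == "__main__":
    # Placeholder fragments only; real V domains are on the order of 110 residues.
    print(assemble_scfv("EVQLVESGG", "DIQMTQSPS"))
```

Either domain order can in principle be used, so the order is a parameter here. The DNA-level construction (structural gene, vector, host cell) is not modeled.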
Humanized forms of non-human (e.g., murine) antibodies are chimeric molecules of immunoglobulins, immunoglobulin chains or fragments thereof (such as Fv, Fab, Fab', F(ab')2 or other antigen-binding subsequences of antibodies) which contain minimal sequence derived from non-human immunoglobulin. Humanized antibodies include human immunoglobulins (recipient antibody) in which residues from a complementarity-determining region (CDR) of the recipient are replaced by residues from a CDR of a non-human species (donor antibody) such as mouse, rat or rabbit having the desired specificity, affinity and capacity. In some instances, Fv framework (FR) residues of the human immunoglobulin are replaced by corresponding non-human residues. Humanized antibodies may also comprise residues which are found neither in the recipient antibody nor in the imported CDR or framework sequences. In general, the humanized antibody will comprise substantially all of at least one, and typically two, variable domains, in which all or substantially all of the CDR regions correspond to those of a non-human immunoglobulin and all or substantially all of the FR regions are those of a human immunoglobulin consensus sequence. The humanized antibody optimally also will comprise at least a portion of an immunoglobulin constant region (Fc), typically that of a human immunoglobulin [Jones et al., Nature, 321:522-525 (1986); Riechmann et al., Nature, 332:323-329 (1988); and Presta, Curr. Op. Struct. Biol., 2:593-596 (1992)].
Methods for humanizing non-human antibodies are well known in the art. Generally, a humanized antibody has one or more amino acid residues introduced into it from a source which is non-human. These non-human amino acid residues are often referred to as import residues, which are typically taken from an import variable domain.
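The CDR-grafting idea above can be made concrete with a short Python sketch. It copies donor CDR segments, and optionally individual framework "import residues", into a human acceptor variable domain. This is a toy model under stated assumptions: `graft_cdrs` is a hypothetical name, the two sequences are assumed to be pre-aligned to equal length, and the caller supplies the CDR boundaries instead of deriving them from a numbering scheme. The example sequences are made up.

```python
# Minimal sketch of CDR grafting ("humanization"): copy the CDR segments of a
# non-human donor variable domain into a human acceptor variable domain,
# optionally together with selected framework "import residues".
# ASSUMPTIONS: donor and acceptor are already aligned to equal length, and the
# caller supplies the CDR boundaries; no numbering scheme (Kabat, Chothia,
# IMGT) is implemented here, and the example sequences are made up.

from typing import Iterable, Tuple


def graft_cdrs(acceptor: str, donor: str,
               cdr_ranges: Iterable[Tuple[int, int]],
               import_positions: Iterable[int] = ()) -> str:
    """Return the acceptor sequence with donor residues at the given positions.

    cdr_ranges: 0-based, half-open (start, end) intervals.
    import_positions: individual 0-based framework positions taken from the donor.
    """
    if len(acceptor) != len(donor):
        raise ValueError("this sketch expects pre-aligned sequences of equal length")
    grafted = list(acceptor)
    for start, end in cdr_ranges:
        if not 0 <= start < end <= len(acceptor):
            raise ValueError(f"invalid CDR range {(start, end)}")
        grafted[start:end] = donor[start:end]  # replace the whole CDR segment
    for pos in import_positions:
        if not 0 <= pos < len(acceptor):
            raise ValueError(f"invalid framework position {pos}")
        grafted[pos] = donor[pos]  # single framework back-substitution
    return "".join(grafted)


if __name__ == "__main__":
    human = "QQQQQQQQQQ"  # toy acceptor
    mouse = "WWWWWWWWWW"  # toy donor
    print(graft_cdrs(human, mouse, [(2, 4), (6, 8)], import_positions=[9]))
```

Keeping framework back-substitutions as a separate argument mirrors the distinction drawn above between grafted CDR residues and the occasional non-human framework residues.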
Humanization can essentially be performed following the method of Winter and co-workers [Jones et al., Nature, 321:522-525 (1986); Riechmann et al., Nature 332:323-327 (1988); Verhoeyen et al., Science, 239:1534-1536 (1988)], by substituting rodent CDRs or CDR sequences for the corresponding sequences of a human antibody. Accordingly, such humanized antibodies are chimeric antibodies (U.S. Pat. No. 4,816,567), wherein substantially less than an intact human variable domain has been substituted by the corresponding sequence from a non-human species. In practice, humanized antibodies are typically human antibodies in which some CDR residues and possibly some FR residues are substituted by residues from analogous sites in rodent antibodies.
Human antibodies can also be produced using various techniques known in the art, including phage display libraries [Hoogenboom and Winter, J. Mol. Biol., 227:381 (1991); Marks et al., J. Mol. Biol., 222:581 (1991)]. The techniques of Cole et al. and Boerner et al. are also available for the preparation of human monoclonal antibodies [Cole et al., Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, p. 77 (1985); and Boerner et al., J. Immunol., 147(1):86-95 (1991)]. Similarly, human antibodies can be made by introducing human immunoglobulin loci into transgenic animals, e.g., mice in which the endogenous immunoglobulin genes have been partially or completely inactivated. Upon challenge, human antibody production is observed that closely resembles that seen in humans in all respects, including gene rearrangement, assembly and antibody repertoire. This approach is described, for example, in U.S. Pat. Nos.
5,545,807; 5,545,806; 5,569,825; 5,625,126; 5,633,425; and 5,661,016, and in the following scientific publications: Marks et al., Bio/Technology 10:779-783 (1992); Lonberg et al., Nature 368:856-859 (1994); Morrison, Nature 368:812-13 (1994); Fishwild et al., Nature Biotechnology 14:845-51 (1996); Neuberger, Nature Biotechnology 14:826 (1996); and Lonberg and Huszar, Intern. Rev. Immunol. 13:65-93 (1995).
Preferably, the antibody of this aspect of the present invention specifically binds at least one epitope of the polypeptide variants of the present invention. As used herein, the term "epitope" refers to any antigenic determinant on an antigen to which the paratope of an antibody binds. Epitopic determinants usually consist of chemically active surface groupings of molecules such as amino acids or carbohydrate side chains and usually have specific three-dimensional structural characteristics, as well as specific charge characteristics.
Optionally, a unique epitope may be created in a variant due to a change in one or more post-translational modifications, including but not limited to glycosylation and/or phosphorylation, as described below. Such a change may also cause a new epitope to be created, for example through removal of glycosylation at a particular site.
An epitope according to the present invention may also optionally comprise part or all of a unique sequence portion of a variant according to the present invention in combination with at least one other portion of the variant which is not contiguous to the unique sequence portion in the linear polypeptide itself, yet which together are able to form an epitope. One or more unique sequence portions may optionally combine with one or more other non-contiguous portions of the variant (including a portion which may have high homology to a portion of the known protein) to form an epitope.
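For the contiguous case, candidate peptides that include part of a unique sequence portion can be enumerated mechanically. The Python sketch below lists every window of k consecutive residues that overlaps a given unique region. The function name, the half-open coordinates and the toy sequence are assumptions for illustration. Epitopes assembled from non-contiguous portions, as described above, are outside its scope.

```python
# Sketch: enumerate candidate *linear* peptides of length k that include at
# least one residue of a variant's unique sequence portion (e.g., a tail or a
# bridge). Only contiguous (linear) candidates are covered; conformational
# epitopes formed with non-contiguous portions cannot be found this way.
# ASSUMPTION: the unique region is given as 0-based, half-open coordinates.

from typing import List, Tuple


def peptides_touching_region(seq: str, region: Tuple[int, int], k: int) -> List[str]:
    start, end = region
    if not 0 <= start < end <= len(seq):
        raise ValueError("region must lie within the sequence")
    if k < 1:
        raise ValueError("k must be positive")
    windows = []
    for i in range(len(seq) - k + 1):
        # window [i, i + k) overlaps [start, end) iff i < end and i + k > start
        if i < end and i + k > start:
            windows.append(seq[i:i + k])
    return windows


if __name__ == "__main__":
    toy_variant = "ABCDEFGHIJ"  # made-up sequence; "EF" plays the unique region
    print(peptides_touching_region(toy_variant, (4, 6), 3))
```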
Immunoassays
In another embodiment of the present invention, an immunoassay can be used to qualitatively or quantitatively detect and analyze markers in a sample. This method comprises: providing an antibody that specifically binds to a marker; contacting a sample with the antibody; and detecting the presence of a complex of the antibody bound to the marker in the sample.
To prepare an antibody that specifically binds to a marker, purified protein markers can be used. Antibodies that specifically bind to a protein marker can be prepared using any suitable methods known in the art.
After the antibody is provided, a marker can be detected and/or quantified using any of a number of well-recognized immunological binding assays. Useful assays include, for example, an enzyme immunoassay (EIA) such as an enzyme-linked immunosorbent assay (ELISA), a radioimmunoassay (RIA), a Western blot assay or a slot blot assay (see, e.g., U.S. Pat. Nos. 4,366,241; 4,376,110; 4,517,288; and 4,837,168). Generally, a sample obtained from a subject can be contacted with the antibody that specifically binds the marker.
Optionally, the antibody can be fixed to a solid support to facilitate washing and subsequent isolation of the complex, prior to contacting the antibody with a sample. Examples of solid supports include but are not limited to glass or plastic in the form of, e.g., a microtiter plate, a stick, a bead or a microbead. Antibodies can also be attached to a solid support.
After incubating the sample with antibodies, the mixture is washed and the antibody-marker complex formed can be detected. This can be accomplished by incubating the washed mixture with a detection reagent. Alternatively, the marker in the sample can be detected using an indirect assay, wherein, for example, a second, labeled antibody is used to detect bound marker-specific antibody, and/or in a competition or inhibition assay wherein, for example, a monoclonal antibody which binds to a distinct epitope of the marker is incubated simultaneously with the mixture.
Throughout the assays, incubation and/or washing steps may be required after each combination of reagents. Incubation steps can vary from about 5 seconds to several hours, preferably from about 5 minutes to about 24 hours. However, the incubation time will depend upon the assay format, marker, volume of solution, concentrations and the like.
Usually the assays will be carried out at ambient temperature, although they can be conducted over a range of temperatures, such as 10 °C to 40 °C.
The immunoassay can be used to determine a test amount of a marker in a sample from a subject. First, a test amount of a marker in a sample can be detected using the immunoassay methods described above. If a marker is present in the sample, it will form an antibody-marker complex with an antibody that specifically binds the marker under suitable incubation conditions described above. The amount of an antibody-marker complex can optionally be determined by comparing it to a standard. As noted above, the test amount of marker need not be measured in absolute units, as long as the unit of measurement can be compared to a control amount and/or signal.
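Comparison to a standard is often done with a standard curve. The Python sketch below fits a straight line to standards of known amount and inverts it for a test signal. It assumes a linear response over the calibrated range, and the function names and numbers are hypothetical. Because the result is expressed in the units of the standards, it works equally well with relative rather than absolute units, consistent with the paragraph above.

```python
# Sketch: estimate a test amount of marker from an immunoassay signal by
# comparison with a standard. ASSUMPTION: the response is linear over the
# calibrated range (as for a well-calibrated ELISA); many real assays instead
# use a four-parameter logistic fit, which is not shown here.

from typing import Sequence, Tuple


def fit_standard_curve(amounts: Sequence[float], signals: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of signal = slope * amount + intercept."""
    if len(amounts) != len(signals) or len(amounts) < 2:
        raise ValueError("need at least two paired standards")
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    if sxx == 0:
        raise ValueError("standards must span more than one amount")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_amount(signal: float, slope: float, intercept: float) -> float:
    """Invert the standard curve; the result is in the units of the standards."""
    if slope == 0:
        raise ValueError("flat standard curve cannot be inverted")
    return (signal - intercept) / slope


if __name__ == "__main__":
    slope, intercept = fit_standard_curve([0, 1, 2, 4], [0.1, 0.3, 0.5, 0.9])
    print(estimate_amount(0.7, slope, intercept))  # about 3 standard units
```

Signals outside the range spanned by the standards should not be extrapolated, because linearity is only assumed within that range.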
Preferably used are antibodies which specifically interact with the polypeptides of the present invention and not with wild type proteins or other isoforms thereof, for example. Such antibodies are directed, for example, to the unique sequence portions of the polypeptide variants of the present invention, including but not limited to bridges, heads, tails and insertions described in greater detail below. Preferred embodiments of antibodies according to the present invention are described in greater detail with regard to the section entitled "Antibodies".
Radioimmunoassay (RIA): In one version, this method involves precipitation of the desired substrate with a specific antibody and a radiolabeled antibody-binding protein (e.g., protein A labeled with I-125) immobilized on a precipitable carrier such as agarose beads. The number of counts in the precipitated pellet is proportional to the amount of substrate.
In an alternate version of the RIA, a labeled substrate and an unlabeled antibody-binding protein are employed. A sample containing an unknown amount of substrate is added in varying amounts. The decrease in precipitated counts from the labeled substrate is proportional to the amount of substrate in the added sample.
Enzyme-linked immunosorbent assay (ELISA): This method involves fixation of a sample (e.g., fixed cells or a proteinaceous solution) containing a protein substrate to a surface such as a well of a microtiter plate. A substrate-specific antibody coupled to an enzyme is applied and allowed to bind to the substrate. Presence of the antibody is then detected and quantitated by a colorimetric reaction employing the enzyme coupled to the antibody. Enzymes commonly employed in this method include horseradish peroxidase and alkaline phosphatase.
If well calibrated and within the linear range of response, the amount of substrate present in the sample is proportional to the amount of color produced. A substrate standard is generally employed to improve quantitative accuracy.
Western blot: This method involves separation of a substrate from other proteins by means of an acrylamide gel, followed by transfer of the substrate to a membrane (e.g., nylon or PVDF). Presence of the substrate is then detected by antibodies specific to the substrate, which are in turn detected by antibody-binding reagents. Antibody-binding reagents may be, for example, protein A or other antibodies. Antibody-binding reagents may be radiolabeled or enzyme-linked as described hereinabove. Detection may be by autoradiography, colorimetric reaction or chemiluminescence. This method allows both quantitation of an amount of substrate and determination of its identity by a relative position on the membrane, which is indicative of a migration distance in the acrylamide gel during electrophoresis.
Immunohistochemical analysis: This method involves detection of a substrate in situ in fixed cells by substrate-specific antibodies. The substrate-specific antibodies may be enzyme-linked or linked to fluorophores. Detection is by microscopy and subjective evaluation. If enzyme-linked antibodies are employed, a colorimetric reaction may be required.
Fluorescence-activated cell sorting (FACS): This method involves detection of a substrate in situ in cells by substrate-specific antibodies. The substrate-specific antibodies are linked to fluorophores. Detection is by means of a cell sorting machine which reads the wavelength of light emitted from each cell as it passes through a light beam. This method may employ two or more antibodies simultaneously.
Radio-imaging Methods
These methods include, but are not limited to, positron emission tomography (PET) and single photon emission computed tomography (SPECT).
Both of these techniques are non-invasive, and can be used to detect and/or measure a wide variety of tissue events and/or functions, such as detecting cancerous cells, for example. Unlike PET, SPECT can optionally be used with two labels simultaneously. SPECT has some other advantages as well, for example with regard to cost and the types of labels that can be used. For example, US Patent No. 6,696,686 describes the use of SPECT for detection of breast cancer, and is hereby incorporated by reference as if fully set forth herein.
Display Libraries
According to still another aspect of the present invention there is provided a display library comprising a plurality of display vehicles (such as phages, viruses or bacteria), each displaying at least 6, at least 7, at least 8, at least 9, at least 10, 10-15, 12-17, 15-20, 15-30 or 20-50 consecutive amino acids derived from the polypeptide sequences of the present invention.
Methods of constructing such display libraries are well known in the art. Such methods are described in, for example, Young AC, et al., "The three-dimensional structures of a polysaccharide binding antibody to Cryptococcus neoformans and its complex with a peptide from a phage display library: implications for the identification of peptide mimotopes" J Mol Biol 1997 Dec 12;274(4):622-34; Giebel LB et al. "Screening of cyclic peptide phage libraries identifies ligands that bind streptavidin with high affinities" Biochemistry 1995 Nov 28;34(47):15430-5; Davies EL et al., "Selection of specific phage-display antibodies using libraries derived from chicken immunoglobulin genes" J Immunol Methods 1995 Oct 12;186(1):125-35; Jones C et al. "Current trends in molecular recognition and bioseparation" J Chromatogr A 1995 Jul 14;707(1):3-22; Deng SJ et al. "Basis for selection of improved carbohydrate-binding single-chain antibodies from synthetic gene libraries" Proc Natl Acad Sci U S A 1995 May 23;92(11):4992-6; and Deng SJ et al. "Selection of antibody single-chain variable fragments with improved carbohydrate binding by phage display" J Biol Chem 1994 Apr 1;269(13):9533-8, which are incorporated herein by reference.
The following sections relate to Candidate Marker Examples (first section) and to Experimental Data for these Marker Examples (second section). It should be noted that Table numbering is restarted within each section.
CANDIDATE MARKER EXAMPLES SECTION
This Section relates to Examples of sequences according to the present invention, including illustrative methods of selection thereof.
Description of the methodology undertaken to uncover the biomolecular sequences of the present invention
Human ESTs and cDNAs were obtained from GenBank version 136 (June 15, 2003; ftp.ncbi.nih.gov/genbank/release.notes/gbl36.release.notes); the NCBI genome assembly of April 2003; RefSeq sequences from June 2003; GenBank version 139 (December 2003); the Human Genome from NCBI (Build 34) (from October 2003); RefSeq sequences from December 2003; and the LifeSeq library of Incyte Corporation (ESTs only; Wilmington, DE, USA). With regard to GenBank sequences, the human EST sequences from the EST (GBEST) section and the human mRNA sequences from the primate (GBPRI) section were used; the human nucleotide RefSeq mRNA sequences were also used (see, for example, www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html; for a reference to the EST section, see www.ncbi.nlm.nih.gov/dbEST/; a general reference to dbEST, the EST database in GenBank, may be found in Boguski et al., Nat Genet. 1993 Aug;4(4):332-3; all of which are hereby incorporated by reference as if fully set forth herein).
Novel splice variants were predicted using the LEADS clustering and assembly system as described in Sorek, R., Ast, G. & Graur, D. Alu-containing exons are alternatively spliced. Genome Res 12, 1060-7 (2002); US Patent No. 6,625,545; and U.S. Pat. Appl. No. 10/426,002, published as US20040101876 on May 27, 2004; all of which are hereby incorporated by reference as if fully set forth herein. Briefly, the software cleans the expressed sequences of repeats, vectors and immunoglobulins. It then aligns the expressed sequences to the genome, taking alternative splicing into account, and clusters overlapping expressed sequences into "clusters" that represent genes or partial genes.
These were annotated using the GeneCarta (Compugen, Tel-Aviv, Israel) platform. The GeneCarta platform includes a rich pool of annotations, sequence information (particularly of spliced sequences), chromosomal information, alignments, and additional information such as SNPs, gene ontology terms, expression profiles, functional analyses, detailed domain structures, known and predicted proteins and detailed homology reports.
A brief explanation is provided with regard to the method of selecting the candidates. However, it should be noted that this explanation is provided for descriptive purposes only, and is not intended to be limiting in any way. The potential markers were identified by a computational process that was designed to find genes and/or their splice variants that are over-expressed in tumor tissues, by using databases of expressed sequences. Various parameters related to the information in the EST libraries, determined according to a manual classification process, were used to assist in locating genes and/or splice variants thereof that are over-expressed in cancerous tissues. The detailed description of the selection method is presented in Example 1 below.
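The clustering step mentioned above, which groups expressed sequences whose genome alignments overlap, can be illustrated with simple interval merging. This Python sketch is a deliberate simplification and not the LEADS algorithm. It treats each alignment as a single genomic span on one chromosome and strand and ignores exon structure and alternative splicing. The tuple layout is an assumption.

```python
# Simplified sketch of the clustering idea: expressed sequences whose genomic
# alignments overlap (same chromosome and strand) are grouped, transitively,
# into one cluster. This is NOT the LEADS implementation: it uses each
# alignment's overall span and ignores exon structure and splicing.

from typing import Dict, List, Tuple

# (sequence id, chromosome, strand, start, end), 0-based half-open coordinates
Alignment = Tuple[str, str, str, int, int]


def cluster_by_overlap(alignments: List[Alignment]) -> List[List[str]]:
    by_locus: Dict[Tuple[str, str], List[Alignment]] = {}
    for aln in alignments:
        by_locus.setdefault((aln[1], aln[2]), []).append(aln)

    clusters: List[List[str]] = []
    for locus in sorted(by_locus):
        ordered = sorted(by_locus[locus], key=lambda a: a[3])  # sort by start
        current = [ordered[0][0]]
        current_end = ordered[0][4]
        for seq_id, _chrom, _strand, start, end in ordered[1:]:
            if start < current_end:  # overlaps the growing cluster
                current.append(seq_id)
                current_end = max(current_end, end)
            else:  # gap: close the current cluster and start a new one
                clusters.append(current)
                current, current_end = [seq_id], end
        clusters.append(current)
    return clusters
```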
The cancer biomarker selection engine and the subsequent wet validation stages are schematically summarized in Figure 1.
EXAMPLE 1
Identification of differentially expressed gene products - Algorithm
In order to distinguish between differentially expressed gene products and constitutively expressed genes (i.e., housekeeping genes), an algorithm based on an analysis of frequencies was configured. A specific algorithm for identification of transcripts over-expressed in cancer is described hereinbelow.
Dry analysis
Library annotation - EST libraries are manually classified according to:
(i) Tissue origin
(ii) Biological source - Examples of frequently used biological sources for construction of EST libraries include cancer cell-lines; normal tissues; cancer tissues; fetal tissues; and others such as normal cell lines and pools of normal cell-lines, cancer cell-lines and combinations thereof. A specific description of abbreviations used below with regard to these tissues/cell lines etc. is given above.
(iii) Protocol of library construction - various methods are known in the art for library construction, including normalized library construction; non-normalized library construction; subtracted libraries; ORESTES and others. It will be appreciated that at times the protocol of library construction is not indicated.
The following rules were followed:
EST libraries originating from identical biological samples are considered as a single library.
EST libraries which included above-average levels of contamination, such as DNA contamination for example, were eliminated. The presence of such contamination was determined as follows. For each library, the number of unspliced ESTs that are not fully contained within other spliced sequences was counted.
If the percentage of such sequences (as compared to all other sequences) was at least 4 standard deviations above the average for all libraries being analyzed, this library was tagged as being contaminated and was eliminated from further consideration in the below analysis (see also Sorek, R. & Safer, H.M. A novel algorithm for computational identification of contaminated EST libraries. Nucleic Acids Res 31, 1067-74 (2003) for further details).
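The contamination rule of Example 1 can be sketched in Python as follows. Two details are assumptions: the input format (per-library counts of suspicious unspliced ESTs and of total ESTs) and the use of the population standard deviation. The text specifies only the 4-standard-deviation threshold.

```python
# Sketch of the contamination filter: for each library, take the fraction of
# its ESTs that are unspliced and not fully contained in other spliced
# sequences, then flag libraries whose fraction is at least 4 standard
# deviations above the mean over all libraries.
# ASSUMPTIONS: per-library counts are already available, and the population
# standard deviation is used (the text does not say which).

from statistics import mean, pstdev
from typing import Dict, Set, Tuple


def flag_contaminated(counts: Dict[str, Tuple[int, int]], n_sd: float = 4.0) -> Set[str]:
    """counts maps library id -> (suspicious unspliced ESTs, total ESTs)."""
    fractions = {lib: bad / total for lib, (bad, total) in counts.items() if total > 0}
    if len(fractions) < 2:
        return set()
    values = list(fractions.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return set()
    cutoff = mu + n_sd * sd
    return {lib for lib, frac in fractions.items() if frac >= cutoff}
```

With the population standard deviation, no single value among n libraries can lie more than sqrt(n - 1) standard deviations above the mean. In this sketch, therefore, the 4-SD rule can only flag a library when at least 17 libraries are analyzed.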
Clusters (genes) having at least five sequences, including at least two sequences from the tissue of interest, were analyzed. Splice variants were identified by using the LEADS software package as described above.

EXAMPLE 2
Identification of genes overexpressed in cancer
Two different scoring algorithms were developed.
Libraries score - candidate sequences which are supported by a number of cancer libraries are more likely to serve as specific and effective diagnostic markers.
The basic algorithm - for each cluster, the number of cancer and normal libraries contributing sequences to the cluster was counted. A Fisher exact test was used to check if cancer libraries are significantly over-represented in the cluster as compared to the total number of cancer and normal libraries.
Library counting: Small libraries (e.g., less than 1000 sequences) were excluded from consideration unless they participate in the cluster. For this reason, the total number of libraries is actually adjusted for each cluster.
Clones no. score - Generally, when the number of ESTs is much higher in the cancer libraries relative to the normal libraries, it might indicate actual overexpression.
The algorithm -
Clone counting: For counting EST clones, each library protocol class was given a weight based on our belief of how much the protocol reflects actual expression levels:
(i) non-normalized: 1
(ii) normalized: 0.2
(iii) all other classes: 0.1
Clones number score - The total weighted number of EST clones from cancer libraries was compared to the EST clones from normal libraries. To avoid cases where one library contributes the majority of the score, the contribution of the library that gives the most clones for a given cluster was limited to 2 clones. The score was computed as
$$\mathrm{score} = \frac{(c+1)/C}{(n+1)/N}$$
where:
c - weighted number of "cancer" clones in the cluster.
C - weighted number of clones in all "cancer" libraries.
n - weighted number of "normal" clones in the cluster.
N - weighted number of clones in all "normal" libraries.
Clones number score significance - A Fisher exact test was used to check if EST clones from cancer libraries are significantly over-represented in the cluster as compared to the total number of EST clones from cancer and normal libraries.
Two search approaches were used to find either general cancer-specific candidates or tumor-specific candidates:
* Libraries/sequences originating from tumor tissues are counted, as well as libraries originating from cancer cell lines ("normal" cell lines were ignored).
* Only libraries/sequences originating from tumor tissues are counted.

EXAMPLE 3
Identification of tissue specific genes
For detection of tissue-specific clusters, tissue libraries/sequences were compared to the total number of libraries/sequences in the cluster. Statistical tools similar to those described above were employed to identify tissue-specific genes. Tissue abbreviations are the same as for cancerous tissues, but are indicated with the header "normal tissue".
The algorithm - for each tested tissue T and for each tested cluster, the following were examined:
1. Each cluster includes at least 2 libraries from the tissue T, and at least 3 clones (weighted as described above) from tissue T in the cluster; and
2. Clones from the tissue T are at least 40% of all the clones participating in the tested cluster.
Fisher exact test P-values were computed for both library and weighted clone counts to check that the counts are statistically significant.

EXAMPLE 4
Identification of splice variants overexpressed in cancer, of clusters which are not overexpressed in cancer
Cancer-specific splice variants containing a unique region were identified.
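The scoring rules of EXAMPLES 2 and 3 can be sketched as follows. This is an illustrative sketch, not the LEADS code: the one-sided Fisher exact test is written out as the hypergeometric upper tail (so it needs integer counts; weighted clone counts would have to be rounded or tested differently, which is an assumption), the 2-clone cap is assumed to apply after protocol weighting, and all names and example numbers are hypothetical.

```python
from math import comb

# Protocol weights from the clone-counting rule; every other class gets 0.1.
PROTOCOL_WEIGHT = {"non-normalized": 1.0, "normalized": 0.2}
OTHER_WEIGHT = 0.1


def fisher_over_representation(a, b, c, d):
    """One-sided Fisher exact P-value that cell a is over-represented in the
    2x2 table [[a, b], [c, d]] (e.g. rows: in cluster / not in cluster,
    columns: cancer / normal). Counts must be non-negative integers."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = sum(comb(row1, k) * comb(n - row1, col1 - k)
                for k in range(a, min(row1, col1) + 1))
    return upper / comb(n, col1)


def weighted_clone_count(clones_by_library, protocol_by_library, top_cap=2.0):
    """Sum protocol-weighted clone counts for one cluster, capping the single
    largest library contribution at top_cap (cap applied after weighting)."""
    contrib = {lib: n * PROTOCOL_WEIGHT.get(protocol_by_library[lib], OTHER_WEIGHT)
               for lib, n in clones_by_library.items()}
    if contrib:
        top = max(contrib, key=contrib.get)
        contrib[top] = min(contrib[top], top_cap)
    return sum(contrib.values())


def clones_number_score(c, C, n, N):
    """((c + 1) / C) / ((n + 1) / N) for weighted cancer (c, C) and normal (n, N) clones."""
    return ((c + 1) / C) / ((n + 1) / N)


def is_tissue_specific(n_libs_t, clones_t, clones_total):
    """EXAMPLE 3 thresholds: >= 2 libraries and >= 3 weighted clones from
    tissue T, with T contributing >= 40% of the cluster's clones."""
    return n_libs_t >= 2 and clones_t >= 3 and clones_t >= 0.4 * clones_total


print(fisher_over_representation(3, 1, 1, 3))  # 17/70, about 0.243
print(weighted_clone_count({"L1": 5, "L2": 3, "L3": 10},
                           {"L1": "non-normalized", "L2": "normalized", "L3": "subtracted"}))
print(clones_number_score(9, 1000, 1, 2000))
print(is_tissue_specific(2, 3.0, 7.0))  # True
```

In the weighted-count example, library L1 would contribute 5 weighted clones on its own, but the cap reduces it to 2, so the cluster total is about 3.6 rather than 6.6.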
Identification of unique sequence regions in splice variants
A region is defined as a group of adjacent exons that always appear or do not appear together in each splice variant.
A "segment" (sometimes also referred to as "seg" or "node") is defined as the shortest contiguous transcribed region without known splicing inside.
Only reliable ESTs were considered for region and segment analysis. An EST was defined as unreliable if:
(i) It is unspliced;
(ii) It is not covered by RNA;
(iii) It is not covered by spliced ESTs; and
(iv) Its alignment to the genome ends in proximity of a long poly-A stretch or starts in proximity of a long poly-T stretch.
Only reliable regions were selected for further scoring. Unique sequence regions were considered reliable if they were:
(i) Aligned to the genome; and
(ii) Supported by more than 2 ESTs.
The algorithm
Each unique sequence region divides the set of transcripts into 2 groups:
(i) Transcripts containing this region (group TA).
(ii) Transcripts not containing this region (group TB).
The set of EST clones of every cluster is divided into 3 groups:
(i) Supporting (originating from) transcripts of group TA (S1).
(ii) Supporting transcripts of group TB (S2).
(iii) Supporting transcripts from both groups (S3).
Library and clones number scores described above were given to the S1 group.
Fisher exact test P-values were used to check if:
S1 is significantly enriched by cancer EST clones compared to S2; and
S1 is significantly enriched by cancer EST clones compared to the cluster background (S1+S2+S3).
Identification of unique sequence regions and division of the group of transcripts accordingly is illustrated in Figure 2. Each of these unique sequence regions corresponds to a segment, also termed herein a "node".
Region 1: common to all transcripts, thus it is preferably not considered for determining differential expression between variants;
Region 2: specific to Transcript 1;
Region 3: specific to Transcripts 2+3;
Region 4: specific to Transcript 3;
Region 5: specific to Transcripts 1 and 2;
Region 6: specific to Transcript 1.

EXAMPLE 5
Identification of cancer specific splice variants of genes overexpressed in cancer
A search was performed for EST-supported (no mRNA) regions for genes of:
(i) known cancer markers;
(ii) genes shown to be overexpressed in cancer in published microarray experiments.
Reliable EST-supported regions were defined as supported by a minimum of one of the following:
(i) 3 spliced ESTs; or
(ii) 2 spliced ESTs from 2 libraries; or
(iii) 10 unspliced ESTs from 2 libraries; or
(iv) 3 libraries.
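The region-based steps of EXAMPLES 4 and 5 can be sketched as below. This is a minimal illustration, not the LEADS implementation: it assumes the four unreliable-EST conditions must all hold (reading the list's final "and" as joining every item), it represents each EST clone simply by the set of transcripts it is compatible with, and the clone names are hypothetical. The transcript-to-region layout follows the Figure 2 description.

```python
from typing import Dict, Set, Tuple


def is_unreliable_est(unspliced: bool, covered_by_rna: bool,
                      covered_by_spliced_est: bool, polya_or_polyt_end: bool) -> bool:
    """Unreliable only when all four listed conditions hold (assumed AND)."""
    return (unspliced and not covered_by_rna
            and not covered_by_spliced_est and polya_or_polyt_end)


def split_clones(region: str,
                 transcript_regions: Dict[str, Set[str]],
                 clone_transcripts: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Divide EST clones into S1 (support only TA transcripts, which contain
    the region), S2 (support only TB transcripts) and S3 (support both)."""
    ta = {t for t, regions in transcript_regions.items() if region in regions}
    tb = set(transcript_regions) - ta
    s1, s2, s3 = set(), set(), set()
    for clone, supported in clone_transcripts.items():
        hits_a, hits_b = bool(supported & ta), bool(supported & tb)
        if hits_a and hits_b:
            s3.add(clone)
        elif hits_a:
            s1.add(clone)
        elif hits_b:
            s2.add(clone)
    return s1, s2, s3


def is_reliable_est_region(n_spliced: int, n_spliced_libs: int,
                           n_unspliced: int, n_unspliced_libs: int, n_libs: int) -> bool:
    """EXAMPLE 5: at least one of the four support criteria is met."""
    return (n_spliced >= 3
            or (n_spliced >= 2 and n_spliced_libs >= 2)
            or (n_unspliced >= 10 and n_unspliced_libs >= 2)
            or n_libs >= 3)


# Figure 2 layout: the regions contained in each transcript.
transcripts = {"T1": {"1", "2", "5", "6"}, "T2": {"1", "3", "5"}, "T3": {"1", "3", "4"}}
# Hypothetical clones and the transcripts each one is compatible with.
clones = {"e1": {"T1"}, "e2": {"T2", "T3"}, "e3": {"T1", "T2", "T3"}, "e4": {"T3"}}
s1, s2, s3 = split_clones("3", transcripts, clones)
print(sorted(s1), sorted(s2), sorted(s3))  # ['e2', 'e4'] ['e1'] ['e3']
```

For Region 3 (specific to Transcripts 2+3), a clone seen only on Transcript 1 falls into S2, and a clone compatible with all three transcripts (for example one lying in Region 1) falls into S3. S1 is the group that is then scored against S2 and against the full background.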
Actual Marker Examples
The following examples relate to specific actual marker examples. It should be noted that Table numbering is restarted within each example related to a particular Cluster, as indicated by the titles below.

EXPERIMENTAL EXAMPLES SECTION
This Section relates to Examples describing experiments involving these sequences, and illustrative, non-limiting examples of methods, assays and uses thereof. The materials and experimental procedures are explained first, as all experiments used them as a basis for the work that was performed.
The markers of the present invention were tested with regard to their expression in various cancerous and non-cancerous tissue samples. A description of the samples used in the panel is provided in Table 1 below. A description of the samples used in the normal tissue panel is provided in Table 2 below. Tests were then performed as described in the "Materials and Experimental Procedures" section below.

Table 1: Tissue samples in testing panel
Sample name | Lot number | Source | Tissue | Pathology | Grade | Gender/age
2-A-Pap Adeno G2 | ILS-1408 | ABS | ovary | Papillary adenocarcinoma | 2 | 53/F
3-A-Pap Adeno G2 | ILS-1431 | ABS | ovary | Papillary adenocarcinoma | 2 | 52/F
4-A-Pap CystAdeno G2 | ILS-7286 | ABS | ovary | Papillary cystadenocarcinoma | 2 | 50/F
1-A-Pap Adeno G3 | ILS-1406 | ABS | ovary | Papillary adenocarcinoma | 3 | 73/F
14-B-Adeno G2 | A501111 | BioChain | ovary | Adenocarcinoma | 2 | 41/F
5-G-Adeno G3 | 99-12-G432 | GOG | ovary | Adenocarcinoma (Stage 3C) | 3 | 46/F
6-A-Adeno G3 | A0106 | ABS | ovary | Adenocarcinoma | 3 | 51/F
7-A-Adeno G3 | IND-00375 | ABS | ovary | Adenocarcinoma | 3 | 59/F
8-B-Adeno G3 | A501113 | BioChain | ovary | Adenocarcinoma | 3 | 60/F
9-G-Adeno G3 | 99-06-G901 | GOG | ovary | Adenocarcinoma (maybe serous) | 3 | 84/F
10-B-Adeno G3 | A407069 | Biochain | ovary | Adenocarcinoma | 3 | 60/F
11-B-Adeno G3 | A407068 | Biochain | ovary | Adenocarcinoma | 3 | 49/F
12-B-Adeno G3 | A406023 | Biochain | ovary | Adenocarcinoma | 3 | 45/F
13-G-Adeno G3 | 94-05-7603 | GOG | right ovary | Metastasis adenocarcinoma | 3 | 67/F
15-B-Adeno G3 | A407065 | BioChain | ovary | Carcinoma | 3 | 27/F
16-Ct-Adeno | 1090387 | Clontech | ovary | Carcinoma NOS | | F
22-A-Muc CystAde G2 | A0139 | ABS | ovary | Mucinous cystadenocarcinoma (Stage 1C) | 2 | 72/F
21-G-Muc CystAde G2-3 | 95-10-G020 | GOG | ovary | Mucinous cystadenocarcinoma (Stage 2) | 2-3 | 44/F
23-A-Muc CystAde G3 | VNM-00187 | ABS | ovary | Mucinous cystadenocarcinoma with low malignant | 3 | 45/F
17-B-Muc Adeno G3 | A504084 | BioChain | ovary | Mucinous adenocarcinoma | 3 | 51/F
18-B-Muc Adeno G3 | A504083 | BioChain | ovary | Mucinous adenocarcinoma | 3 | 45/F
19-B-Muc Adeno G3 | A504085 | BioChain | ovary | Mucinous adenocarcinoma | | 34/F
20-A-Pap Muc CystAde | USA-00273 | ABS | ovary | Papillary mucinous cystadenocarcinoma | | 45/F
33-B-Pap Sero CystAde G1 | A503175 | BioChain | ovary | Serous papillary cystadenocarcinoma | 1 | 41/F
25-A-Pap Sero Adeno G3 | N0021 | ABS | ovary | Papillary serous adenocarcinoma (Stage T3CN1MX) | 3 | 55/F
24-G-Pap Sero Adeno G3 | 2001-07-G801 | GOG | ovary | Papillary serous adenocarcinoma | 3 | 68/F
30-G-Pap Sero Adeno G3 | 2001-08-G011 | GOG | ovary | Papillary serous carcinoma (Stage 1C) | 3 | 72/F
70-G-Pap Sero Adeno G3 | 95-08-G069 | GOG | ovary | Papillary serous adenocarcinoma | 3 | F
31-B-Pap Sero CystAde G3 | A503176 | BioChain | ovary | Serous papillary cystadenocarcinoma | 3 | 52/F
32-G-Pap Sero CystAde G3 | 93-09-4901 | GOG | ovary | Serous papillary cystadenocarcinoma | 3 | F
66-G-Pap Sero Adeno G3 SIV | 2000-01-G413 | GOG | ovary | Papillary serous carcinoma (metastasis of primary peritoneum) (Stage 4) | | F
29-G-Sero Adeno G3 | 2001-12-G035 | GOG | right ovary | Serous adenocarcinoma (Stage 3A) | 3 | 50/F
41-G-Mix Sero/Muc/Endo G2 | 98-03-G803 | GOG | ovary | Mixed epithelial cystadenocarcinoma with mucinous, endometrioid, squamous and papillary serous (Stage 2) | 2 | 38
40-G-Mix Sero/Endo G2 | 95-11-G006 | GOG | ovary, endometrium | Papillary serous and endometrioid cystadenocarcinoma (Stage 3C) | 2 | 49/F
37-G-Mix Sero/Endo G3 | 2002-05-G513 | GOG | ovary | Mixed serous and endometrioid adenocarcinoma | 3 | 56/F
38-G-Mix Sero/Endo G3 | 2002-05-G509 | GOG | ovary | Mixed serous and endometrioid adenocarcinoma of mullerian (Stage 3C) | 3 | 64/F
39-G-Mix Sero/Endo G3 | 2001-12-G037 | GOG | ovary | Mixed serous and endometrioid adenocarcinoma | 3 | F
36-G-Endo Adeno G1-2 | 2000-09-G621 | GOG | ovary | Endometrial adenocarcinoma | 1-2 | 69/F
35-G-Endo Adeno G2 | 94-08-7604 | GOG | right ovary | Endometrioid adenocarcinoma | 2 | 39/F
34-G-Pap Endo Adeno G3 | 95-04-2002 | GOG | ovary | Papillary endometrioid adenocarcinoma (Stage 3C) | 3 | 68/F
43-G-Clear cell Adeno G3 | 2001-10-G002 | GOG | ovary | Clear cell adenocarcinoma | 3 | 74/F
44-G-Clear cell Adeno | 2001-07-G084 | GOG | ovary | Clear cell adenocarcinoma (Stage 3A) | | 73/F
42-G-Adeno borderline | 98-08-G001 | GOG | ovary | Epithelial adenocarcinoma of borderline malignancy | | 46/F
59-G-Sero CysAdenoFibroma | 98-12-G401 | GOG | ovary | Serous CysAdenoFibroma | | 77/F
63-G-Sero CysAdenoFibroma | 2000-10-G620 | GOG | ovary | Serous CysAdenoFibroma of borderline malignancy | | 71/F
64-G-Ben Sero CysAdenoma | 99-06-G039 | GOG | ovary | Benign serous CysAdenoma | | 57/F
56-G-Ben Muc CysAdeno | 99-01-G407 | GOG | left ovary | Benign mucinous cystadenoma | | 46/F
62-G-Ben Muc CysAdenoma | 99-10-G442 | GOG | ovary | Benign mucinous cystadenoma | | 32/F
60-G-Muc CysAdenoma | 99-01-G043 | GOG | ovary | Mucinous cystadenoma | | 40/F
61-G-Muc CysAdenoma | 99-07-G011 | GOG | ovary | Mucinous cystadenoma | | 63/F
65-G-Endometrioma | 97-11-G320 | GOG | right ovary | Endometrioma | | 41/F
57-B-Thecoma | A407066 | BioChain | ovary | Thecoma | | 56/F
58-CG-Stru teratoma | CG-177 | Ichilov | ovary | Struma ovary/monodermal teratoma | | 58/F
50-B-N M8 | A501114 | BioChain | ovary | Normal (matched tumor A501113) | | 60/F
49-B-N M14 | A501112 | BioChain | ovary | Normal (matched tumor A501111) | | 41/F
69-G-N M24 | 2001-07-G801N | GOG | ovary | Normal (matched tumor 2001-07-G801) | | 68/F
67-G-N M38 | 2002-05-G509N | GOG | ovary | Normal (matched tumor 2002-05-G509) | | 64/F
51-G-N M41 | 98-03-G803N | GOG | ovary | Normal (matched tumor 98-03-G803) | | 38/F
52-G-N M42 | 98-08-G001N | GOG | ovary | Normal (matched tumor 98-08-G001) | | 46/F
68-G-N M56 | 99-01-G407N | GOG | ovary | Normal (matched benign 99-01-G407) | | 46/F
72-G-N M66 | 2000-01-G413N | GOG | ovary | Normal (matched tumor 2000-01-G413) | | F
73-G-N M59 | 98-12-G401N | GOG | ovary | Normal (matched tumor 98-12-G401) | | 77/F
74-G-N M65 | 97-11-G320N | GOG | ovary | Normal (matched tumor 97-11-G320) | | 41/F
75-G-N M60 | 99-01-G043N | GOG | ovary | Normal (matched tumor 99-01-G043) | | 40/F
45-B-N | A503274 | BioChain | ovary | Normal PM | | 41/F
46-B-N | A504086 | BioChain | ovary | Normal PM | | 41/F
48-B-N | A504087 | BioChain | ovary | Normal PM | | 51/F
47-Am-N | 061P43A | Ambion | ovary | Normal (CLOSED HEAD) | | 16/F
71-CG-N | CG-188-7 | Ichilov | ovary | Normal PM | | 49/F

Table 2: Tissue samples in normal panel
Sample name | Lot no. | Source | Tissue | Pathology | Sex/Age
1-Am-Colon (C71) | 071P10B | Ambion | Colon | PM | F/43
2-B-Colon (C69) | A411078 | Biochain | Colon | PM-Pool of 10 | M&F
3-Cl-Colon (C70) | 1110101 | Clontech | Colon | PM-Pool of 3 | M&F
4-Am-Small Intestine | 091P0201A | Ambion | Small Intestine | PM | M/75
5-B-Small Intestine | A501158 | Biochain | Small Intestine | PM | M/63
6-B-Rectum | A605138 | Biochain | Rectum | PM | M/25
7-B-Rectum | A610297 | Biochain | Rectum | PM | M/24
8-B-Rectum | A610298 | Biochain | Rectum | PM | M/27
9-Am-Stomach | 110P04A | Ambion | Stomach | PM | M/16
10-B-Stomach | A501159 | Biochain | Stomach | PM | M/24
11-B-Esophagus | A603814 | Biochain | Esophagus | PM | M/26
12-B-Esophagus | A603813 | Biochain | Esophagus | PM | M/41
13-Am-Pancreas | 071P25C | Ambion | Pancreas | PM | M/25
14-CG-Pancreas | CG-255-2 | Ichilov | Pancreas | PM | M/75
15-B-Lung | A409363 | Biochain | Lung | PM | F/26
16-Am-Lung (L93) | 111P0103A | Ambion | Lung | PM | F/61
17-B-Lung (L92) | A503204 | Biochain | Lung | PM | M/28
18-Am-Ovary (047) | 061P43A | Ambion | Ovary | PM | F/16
19-B-Ovary (048) | A504087 | Biochain | Ovary | PM | F/51
20-B-Ovary (046) | A504086 | Biochain | Ovary | PM | F/41
21-Am-Cervix | 101P0101A | Ambion | Cervix | PM | F/40
22-B-Cervix | A408211 | Biochain | Cervix | PM | F/36
23-B-Cervix | A504089 | Biochain | Cervix | PM-Pool of 5 | M&F
24-B-Uterus | A411074 | Biochain | Uterus | PM-Pool of 10 | M&F
25-B-Uterus | A409248 | Biochain | Uterus | PM | F/43
26-B-Uterus | A504090 | Biochain | Uterus | PM-Pool of 5 | M&F
27-B-Bladder | A501157 | Biochain | Bladder | PM | M/29
28-Am-Bladder | 071P02C | Ambion | Bladder | PM | M/20
29-B-Bladder | A504088 | Biochain | Bladder | PM-Pool of 5 | M&F
30-Am-Placenta | 021P33A | Ambion | Placenta | PB | F/33
31-B-Placenta | A410165 | Biochain | Placenta | PB | F/26
32-B-Placenta | A411073 | Biochain | Placenta | PB-Pool of 5 | M&F
33-B-Breast (B59) | A607155 | Biochain | Breast | PM | F/36
34-Am-Breast (B63) | 26486 | Ambion | Breast | PM | F/43
35-Am-Breast (B64) | 23036 | Ambion | Breast | PM | F/57
36-Cl-Prostate (P53) | 1070317 | Clontech | Prostate | PB-Pool of 47 | M&F
37-Am-Prostate (P42) | 061P04A | Ambion | Prostate | PM | M/47
38-Am-Prostate (P59) | 25955 | Ambion | Prostate | PM | M/62
39-Am-Testis | 111P0104A | Ambion | Testis | PM | M/25
40-B-Testis | A411147 | Biochain | Testis | PM | M/74
41-Cl-Testis | 1110320 | Clontech | Testis | PB-Pool of 45 | M&F
42-CG-Adrenal | CG-184-10 | Ichilov | Adrenal | PM | F/81
43-B-Adrenal | A610374 | Biochain | Adrenal | PM | F/83
44-B-Heart | A411077 | Biochain | Heart | PB-Pool of 5 | M&F
45-CG-Heart | CG-255-9 | Ichilov | Heart | PM | M/75
46-CG-Heart | CG-227-1 | Ichilov | Heart | PM | F/36
47-Am-Liver | 081P0101A | Ambion | Liver | PM | M/64
48-CG-Liver | CG-93-3 | Ichilov | Liver | PM | F/19
49-CG-Liver | CG-124-4 | Ichilov | Liver | PM | F/34
50-Cl-BM | 1110932 | Clontech | Bone Marrow | PM-Pool of 8 | M&F
51-CGEN-Blood | WBC#5 | CGEN | Blood | | M
52-CGEN-Blood | WBC#4 | CGEN | Blood | | M
53-CGEN-Blood | WBC#3 | CGEN | Blood | | M
54-CG-Spleen | CG-267 | Ichilov | Spleen | PM | F/25
55-CG-Spleen | 111P0106B | Ambion | Spleen | PM | M/25
56-CG-Spleen | A409246 | Biochain | Spleen | PM | F/12
56-CG-Thymus | CG-98-7 | Ichilov | Thymus | PM | F/28
58-Am-Thymus | 101P0101A | Ambion | Thymus | PM | M/14
59-B-Thymus | A409278 | Biochain | Thymus | PM | M/28
60-B-Thyroid | A610287 | Biochain | Thyroid | PM | M/27
61-B-Thyroid | A610286 | Biochain | Thyroid | PM | M/24
62-CG-Thyroid | CG-119-2 | Ichilov | Thyroid | PM | F/66
63-Cl-Salivary Gland | 1070319 | Clontech | Salivary Gland | PM-Pool of 24 | M&F
64-Am-Kidney | 111P0101B | Ambion | Kidney | PM-Pool of 14 | M&F
65-Cl-Kidney | 1110970 | Clontech | Kidney | PM-Pool of 14 | M&F
66-B-Kidney | A411080 | Biochain | Kidney | PM-Pool of 5 | M&F
67-CG-Cerebellum | CG-183-5 | Ichilov | Cerebellum | PM | M/74
68-CG-Cerebellum | CG-212-5 | Ichilov | Cerebellum | PM | M/54
69-B-Brain | A411322 | Biochain | Brain | PM | M/28
70-Cl-Brain | 1120022 | Clontech | Brain | PM-Pool of 2 | M&F
71-B-Brain | A411079 | Biochain | Brain | PM-Pool of 2 | M&F
72-CG-Brain | CG-151-1 | Ichilov | Brain | PM | F/86
73-Am-Skeletal Muscle | 101P013A | Ambion | Skeletal Muscle | PM | F/28
74-Cl-Skeletal Muscle | 1061038 | Clontech | Skeletal Muscle | PM-Pool of 2 | M&F

Materials and Experimental Procedures
RNA preparation - RNA was obtained from Clontech (Franklin Lakes, NJ USA 07417, www.clontech.com), BioChain Inst. Inc. (Hayward, CA 94545 USA, www.biochain.com), ABS (Wilmington, DE 19801, USA, http://www.absbioreagents.com) or Ambion (Austin, TX 78744 USA, http://www.ambion.com). Alternatively, RNA was generated from tissue samples using TRI-Reagent (Molecular Research Center), according to the manufacturer's instructions. Tissue and RNA samples were obtained from patients or postmortem. Total RNA samples were treated with DNase I (Ambion) and purified using RNeasy columns (Qiagen).
RT PCR - Purified RNA (1 µg) was mixed with 150 ng random hexamer primers (Invitrogen) and 500 µM dNTP in a total volume of 15.6 µl. The mixture was incubated for 5 min at 65 °C and then quickly chilled on ice. Thereafter, 5 µl of 5X SuperScript II first-strand buffer (Invitrogen), 2.4 µl 0.1 M DTT and 40 units RNasin (Promega) were added, and the mixture was incubated for 10 min at 25 °C, followed by further incubation at 42 °C for 2 min. Then, 1 µl (200 units) of SuperScript II (Invitrogen) was added, and the reaction (final volume of 25 µl) was incubated for 50 min at 42 °C and then inactivated at 70 °C for 15 min. The resulting cDNA was diluted 1:20 in TE buffer (10 mM Tris pH 8, 1 mM EDTA pH 8).
Real-Time RT-PCR analysis - cDNA (5 µl), prepared as described above, was used as a template in Real-Time PCR reactions using the SYBR Green I assay (PE Applied Biosystems) with specific primers and UNG enzyme (Eurogentec or ABI or Roche). The amplification was effected as follows: 50 °C for 2 min, 95 °C for 10 min, and then 40 cycles of 95 °C for 15 sec, followed by 60 °C for 1 min. Detection was performed using the PE Applied Biosystems SDS 7000.
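The relative quantification applied to these real-time runs (described in the following paragraph: Q = efficiency^-Ct, with the efficiency taken from a standard curve of serial RT dilutions, and each quantity normalized to the geometric mean of several housekeeping genes) can be sketched as follows. This is a hedged illustration, not the SDS 7000 software's algorithm. The formula E = 10^(-1/slope), taken from the slope of Ct against log10 input, is the usual standard-curve estimate of the per-cycle amplification factor and is an assumption here, as are the function names and the idealized example numbers.

```python
from math import log2, prod


def efficiency_from_standard_curve(log10_input, ct):
    """Per-cycle amplification factor from a least-squares fit of Ct against
    log10(relative input): E = 10 ** (-1 / slope); E = 2 means perfect doubling."""
    n = len(ct)
    mx, my = sum(log10_input) / n, sum(ct) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_input, ct))
    sxx = sum((x - mx) ** 2 for x in log10_input)
    return 10 ** (-1 / (sxy / sxx))


def relative_quantity(efficiency, ct):
    """Q = efficiency ** (-Ct)."""
    return efficiency ** (-ct)


def normalized_quantity(target_q, housekeeping_qs):
    """Divide a target quantity by the geometric mean of housekeeping-gene quantities."""
    geo_mean = prod(housekeeping_qs) ** (1 / len(housekeeping_qs))
    return target_q / geo_mean


# Idealized 10-fold dilution series: each dilution adds log2(10) cycles.
dilutions = [0, -1, -2, -3]
cts = [20 - log2(10) * x for x in dilutions]
E = efficiency_from_standard_curve(dilutions, cts)  # about 2.0
hk = [relative_quantity(E, 20), relative_quantity(E, 22)]
print(round(normalized_quantity(relative_quantity(E, 25), hk), 4))  # 0.0625
```

With perfect doubling, a target at Ct 25 against housekeeping genes at Ct 20 and 22 (geometric mean equivalent to Ct 21) comes out at 2^-4 = 0.0625. The geometric mean keeps a single highly expressed housekeeping gene from dominating the normalization factor.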
The cycle in which the reactions achieved a threshold level (Ct) of fluorescence was registered and was used to calculate the relative transcript quantity in the RT reactions. The relative quantity was calculated using the equation $Q = \mathrm{efficiency}^{-C_t}$. The efficiency of the PCR reaction was calculated from a standard curve, created by using serial dilutions of several reverse transcription (RT) reactions. To minimize inherent differences in the RT reaction, the resulting relative quantities were normalized to the geometric mean of the relative quantities of several housekeeping (HSKP) genes.
A schematic summary of quantitative real-time PCR analysis is presented in Figure 3. As shown, the x-axis shows the cycle number. The Ct (threshold cycle) is the cycle at which the amplification curve crosses the fluorescence threshold that was set in the experiment. This point is a calculated cycle number at which the PCR product signal is above the background level (passive dye ROX) and still in the geometric/exponential phase (as shown, once the level of fluorescence crosses the measurement threshold, it has a geometrically increasing phase, during which measurements are most accurate, followed by a linear phase and a plateau phase; for quantitative measurements, the latter two phases do not provide accurate measurements). The y-axis shows the normalized reporter fluorescence. It should be noted that this type of analysis provides relative quantification.
The sequences of the housekeeping genes measured in all the examples on the ovarian cancer panel were as follows:
SDHA (GenBank Accession No. NM_004168)
SDHA Forward primer: TGGGAACAAGAGGGCATCTG
SDHA Reverse primer: CCACCACTGCATCAAATTCATG
SDHA-amplicon:
TGGGAACAAGAGGGCATCTGCTAAAGTTTCAGATTCCATTTCTGCTCAGTATCCAGT
AGTGGATCATGAATTTGATGCAGTGGTGG
PBGD (GenBank Accession No. BC019323)
PBGD Forward primer: TGAGAGTGATTCGCGTGGG
PBGD Reverse primer: CCAGGGTACGAGGCTTTCAAT
PBGD-amplicon:
TGAGAGTGATTCGCGTGGGTACCCGCAAGAGCCAGCTTGCTCGCATACAGACGGACAGTGTGGTGGCAACATTGAAAGCCTCGTACCCTGG
HPRT1 (GenBank Accession No. NM_000194)
HPRT1 Forward primer: TGACACTGGCAAAACAATGCA
HPRT1 Reverse primer: GGTCCTTTTCACCAGCAAGCT
HPRT1-amplicon:
TGACACTGGCAAAACAATGCAGACTTTGCTTTCCTTGGTCAGGCAGTATAATCCAAAGATGGTCAAGGTCGCAAGCTTGCTGGTGAAAAGGACC
GAPDH (GenBank Accession No. BC026907)
GAPDH Forward primer: TGCACCACCAACTGCTTAGC
GAPDH Reverse primer: CCATCACGCCACAGTTTCC
GAPDH-amplicon:
TGCACCACCAACTGCTTAGCACCCCTGGCCAAGGTCATCCATGACAACTTTGGTATCGTGGAAGGACTCATGACCACAGTCCATGCCATCACTGCCACCCAGAAGACTGTGGATGG
The sequences of the housekeeping genes measured in all the examples on the normal tissue samples panel were as follows:
RPL19 (GenBank Accession No. NM_000981)
RPL19 Forward primer: TGGCAAGAAGAAGGTCTGGTTAG
RPL19 Reverse primer: TGATCAGCCCATCTTTGATGAG
RPL19-amplicon:
TGGCAAGAAGAAGGTCTGGTTAGACCCCAATGAGACCAATGAAATCGCCAATGCCAACTCCCGTCAGCAGATCCGGAAGCTCATCAAAGATGGGCTGATCA
TATA box (GenBank Accession No. NM_003194)
TATA box Forward primer: CGGTTTGCTGCGGTAATCAT
TATA box Reverse primer: TTTCTTGCTGCCAGTCTGGAC
TATA box-amplicon:
CGGTTTGCTGCGGTAATCATGAGGATAAGAGAGCCACGAACCACGGCACTGATTTTCAGTTCTGGGAAAATGGTGTGCACAGGAGCCAAGAGTGAAGAACAGTCCAGACTGGCAGCAAGAAA
Ubiquitin (GenBank Accession No. BC000449)
Ubiquitin Forward primer: ATTTGGGTCGCGGTTCTTG
Ubiquitin Reverse primer: TGCCTTGACATTCTCGATGGT
Ubiquitin C-amplicon:
ATTTGGGTCGCGGTTCTTGTTTGTGGATCGCTGTGATCGTCACTTGACAATGCAGATCTTCGTGAAGACTCTGACTGGTAAGACCATCACCCTCGAGGTTGAGCCCAGTGACACCATCGAGAATGTCAAGGCA
SDHA (GenBank Accession No.
NM_004168)
SDHA Forward primer: TGGGAACAAGAGGGCATCTG
SDHA Reverse primer: CCACCACTGCATCAAATTCATG
SDHA-amplicon:
TGGGAACAAGAGGGCATCTGCTAAAGTTTCAGATTCCATTTCTGCTCAGTATCCAGTAGTGGATCATGAATTTGATGCAGTGGTGG

Oligonucleotide-based microarray experiment protocol
Microarray fabrication
Microarrays (chips) were printed by pin deposition using the MicroGrid II MGII 600 robot from BioRobotics Limited (Cambridge, UK). 50-mer oligonucleotide target sequences were designed by Compugen Ltd (Tel-Aviv, IL) as described by A. Shoshan et al., "Optical technologies and informatics", Proceedings of SPIE, Vol. 4266, pp. 86-95 (2001). The designed oligonucleotides were synthesized and purified by desalting with the Sigma-Genosys system (The Woodlands, TX, US), and all of the oligonucleotides were joined to a C6 amino-modified linker at the 5' end, or attached directly to CodeLink slides (Cat #25-6700-01, Amersham Bioscience, Piscataway, NJ, US). The 50-mer oligonucleotides, forming the target sequences, were first suspended in Ultra-pure DDW (Cat # 01-866-IA, Kibbutz Beit-Haemek, Israel) to a concentration of 50 µM. Before printing the slides, the oligonucleotides were resuspended in 300 mM sodium phosphate (pH 8.5) to a final concentration of 150 mM and printed at 35-40% relative humidity at 21 °C.
Each slide contained a total of 9792 features in 32 subarrays. Of these features, 4224 features were sequences of interest according to the present invention and negative controls that were printed in duplicate. An additional 288 features (96 target sequences printed in triplicate) contained housekeeping genes from Human Evaluation Library2, Compugen Ltd, Israel. Another 384 features are E. coli spikes 1-6, which are oligos to E. coli genes that are commercially available in the Array Control product (Array control - sense oligo spots, Ambion Inc., Austin, TX, Cat #1781, Lot #112K06).
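The housekeeping-gene primer and amplicon lists above can be sanity-checked mechanically: in each entry, the amplicon (written 5' to 3') should begin with the forward primer and end with the reverse complement of the reverse primer. The sketch below is an illustrative check with hypothetical function names; it uses the SDHA and PBGD sequences exactly as listed above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def primers_flank_amplicon(forward, reverse, amplicon):
    """True if the amplicon starts with the forward primer and ends with the
    reverse complement of the reverse primer (all written 5' -> 3')."""
    return amplicon.startswith(forward) and amplicon.endswith(reverse_complement(reverse))


SDHA_FWD = "TGGGAACAAGAGGGCATCTG"
SDHA_REV = "CCACCACTGCATCAAATTCATG"
SDHA_AMPLICON = ("TGGGAACAAGAGGGCATCTGCTAAAGTTTCAGATTCCATTTCTGCTCAGTATCCAGT"
                 "AGTGGATCATGAATTTGATGCAGTGGTGG")

PBGD_FWD = "TGAGAGTGATTCGCGTGGG"
PBGD_REV = "CCAGGGTACGAGGCTTTCAAT"
PBGD_AMPLICON = ("TGAGAGTGATTCGCGTGGGTACCCGCAAGAGCCAGCTTGCTCGCATACAGACGGAC"
                 "AGTGTGGTGGCAACATTGAAAGCCTCGTACCCTGG")

print(primers_flank_amplicon(SDHA_FWD, SDHA_REV, SDHA_AMPLICON))  # True
print(primers_flank_amplicon(PBGD_FWD, PBGD_REV, PBGD_AMPLICON))  # True
```

The same two-line check can be applied to every primer set in the lists. It flags any entry whose printed amplicon does not end in the complement of its reverse primer.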
Post-coupling processing of printed slides
After the spotting of the oligonucleotides to the glass (CodeLink) slides, the slides were incubated for 24 hours in a sealed saturated NaCl humidification chamber (relative humidity 70-75%).
Slides were treated for blocking of the residual reactive groups by incubating them in blocking solution at 50 °C for 15 minutes (10 ml/slide of buffer containing 0.1 M Tris, 50 mM ethanolamine, 0.1% SDS). The slides were then rinsed twice with Ultra-pure DDW (double distilled water). The slides were then washed with wash solution (10 ml/slide, 4X SSC, 0.1% SDS) at 50 °C for 30 minutes on the shaker. The slides were then rinsed twice with Ultra-pure DDW, followed by drying by centrifugation for 3 minutes at 800 rpm.
Next, in order to assist in automatic operation of the hybridization protocol, the slides were treated with Ventana Discovery hybridization station barcode adhesives. The printed slides were loaded on a Bio-Optica (Milan, Italy) hematology staining device and were incubated for 10 minutes in 50 ml of 3-Aminopropyl Triethoxysilane (Sigma A3648 lot #122K589). Excess fluid was dried, and the slides were then incubated for three hours at 20 mm Hg in a dark vacuum desiccator (Pelco 2251, Ted Pella, Inc., Redding, CA).
The following protocol was then followed with the Genisphere 900-RP (random primer), with mini elute columns on the Ventana Discovery HybStation™, to perform the microarray experiments. Briefly, the protocol was performed as described in the instructions and information provided with the device itself. The protocol included cDNA synthesis and labeling. cDNA concentration was measured with the TBS-380 (Turner Biosystems, Sunnyvale, CA) PicoFluor, which is used with the OliGreen ssDNA Quantitation reagent and kit. Hybridization was performed with the Ventana Hybridization device, according to the provided protocols (Discovery Hybridization Station, Tucson, AZ).
The slides were then scanned with a GenePix 4000B dual laser scanner from Axon Instruments Inc., and analyzed by GenePix Pro 5.0 software.
A schematic summary of the oligonucleotide-based microarray fabrication and the experimental flow is presented in Figures 4 and 5.
Briefly, as shown in Figure 4, DNA oligonucleotides at 25 µM were deposited (printed) onto Amersham 'CodeLink' glass slides, generating a well-defined 'spot'. These slides are covered with a long-chain, hydrophilic polymer chemistry that creates an active 3-D surface that covalently binds the 5' end of the DNA oligonucleotides via the C6-amine modification. This binding ensures that the full length of the DNA oligonucleotides is available for hybridization to the cDNA, and also allows lower background, high sensitivity and reproducibility.
Figure 5 shows a schematic method for performing the microarray experiments. It should be noted that stages on the left-hand or right-hand side may optionally be performed in any order, including in parallel, until stage 4 (hybridization). Briefly, on the left-hand side, the target oligonucleotides are spotted on a glass microscope slide (although optionally other materials could be used) to form a spotted slide (stage 1). On the right-hand side, control sample RNA and cancer sample RNA are Cy3 and Cy5 labeled, respectively (stage 2), to form labeled probes. It should be noted that the control and cancer samples come from corresponding tissues (for example, normal prostate tissue and cancerous prostate tissue). Furthermore, the tissue from which the RNA was taken is indicated below in the specific examples of data for particular clusters, with regard to overexpression of an oligonucleotide from a "chip" (microarray), as for example "prostate" for chips in which prostate cancerous tissue and normal tissue were tested as described above. In stage 3, the probes are mixed.
In stage 4, hybridization is performed to form a processed slide. In stage 5, the slide is washed and scanned to form an image file, followed by data analysis in stage 6.

DESCRIPTION FOR CLUSTER H61775
Cluster H61775 features 2 transcript(s) and 6 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Transcript Name | SEQ ID NO
H61775_T21 | 1
H61775_T22 | 2
Table 2 - Segments of interest
Segment Name | SEQ ID NO
H61775_node_2 | 3
H61775_node_4 | 4
H61775_node_6 | 5
H61775_node_8 | 6
H61775_node_0 | 7
H61775_node_5 | 8
Table 3 - Proteins of interest
Protein Name | SEQ ID NO
H61775_P16 | 9
H61775_P17 | 10
Cluster H61775 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the right-hand column of the table and the numbers on the y-axis of Figure 6 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
Overall, the following results were obtained, as shown with regard to the histograms in Figure 6 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: brain malignant tumors and a mixture of malignant tumors from different tissues.
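The "parts per million" figure used for the normal tissue distribution is the cluster's weighted EST expression divided by the weighted expression of all ESTs in the same category, scaled by 10^6. The sketch below shows that ratio. The function name and the example counts are hypothetical; the weighting itself is assumed to follow the clone-counting rule of EXAMPLE 2.

```python
def parts_per_million(cluster_ests, category_ests):
    """Weighted ESTs of one cluster per million weighted ESTs, per category.

    cluster_ests: category -> weighted EST count for the cluster.
    category_ests: category -> weighted EST count for all ESTs in the category.
    """
    return {cat: 1e6 * cluster_ests.get(cat, 0.0) / total
            for cat, total in category_ests.items()}


# Hypothetical counts: 0.8 weighted ESTs out of 100,000 in "breast".
print(parts_per_million({"breast": 0.8}, {"breast": 100_000, "colon": 250_000}))
```

Expressing each category per million ESTs lets tissues sequenced to very different depths be compared on one axis.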
Table 4 - Normal tissue distribution
Name of Tissue | Number
bladder | 0
brain | 0
colon | 0
epithelial | 10
general | 3
breast | 8
muscle | 0
ovary | 0
pancreas | 0
prostate | 0
uterus | 0
Table 5 - P values and ratios for expression in cancerous tissue
Name of Tissue | P1 | P2 | SP1 | R3 | SP2 | R4
bladder | 3.1e-01 | 3.8e-01 | 3.2e-01 | 2.5 | 4.6e-01 | 1.9
brain | 8.8e-02 | 6.5e-02 | 1 | 3.5 | 4.1e-04 | 5.8
colon | 5.6e-01 | 6.4e-01 | 1 | 1.1 | 1 | 1.1
epithelial | 3.0e-02 | 1.3e-01 | 2.3e-02 | 2.1 | 3.2e-01 | 1.2
general | 1.3e-06 | 4.9e-05 | 1.0e-07 | 6.3 | 1.5e-06 | 4.3
breast | 4.7e-01 | 3.7e-01 | 3.3e-01 | 2.0 | 4.6e-01 | 1.6
muscle | 2.3e-01 | 2.9e-01 | 1.5e-01 | 6.8 | 3.9e-01 | 2.6
ovary | 3.8e-01 | 4.2e-01 | 1.5e-01 | 2.4 | 2.6e-01 | 1.9
pancreas | 3.3e-01 | 4.4e-01 | 4.2e-01 | 2.4 | 5.3e-01 | 1.9
prostate | 7.3e-01 | 7.8e-01 | 6.7e-01 | 1.5 | 7.5e-01 | 1.3
uterus | 1.0e-01 | 2.6e-01 | 2.9e-01 | 2.6 | 5.1e-01 | 1.8
As noted above, cluster H61775 features 2 transcript(s), which were listed in Table 1 above. A description of each variant protein according to the present invention is now provided.
Variant protein H61775_P16 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) H61775_T21. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between H61775_P16 and Q9P2J2 (SEQ ID NO:953):
1. An isolated chimeric polypeptide encoding for H61775_P16, comprising a first amino acid sequence being at least 90% homologous to MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 11-93 of Q9P2J2, which also corresponds to amino acids 1-83 of H61775_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCWRSSCSVTLQV corresponding to amino acids 84-152 of H61775_P16, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of H61775_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCWRSSCSVTLQV in H61775_P16.
Comparison report between H61775_P16 and AAQ88495 (SEQ ID NO:954):

1. An isolated chimeric polypeptide encoding for H61775_P16, comprising a first amino acid sequence being at least 90% homologous to MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 1-83 of AAQ88495, which also corresponds to amino acids 1-83 of H61775_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCWRSSCSVTLQV corresponding to amino acids 84-152 of H61775_P16, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of H61775_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCWRSSCSVTLQV in H61775_P16.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
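The localization reasoning above is a consensus over yes/no calls from two kinds of predictors. The sketch below encodes that rule in Python under stated assumptions: each program's output has already been reduced to a boolean, and the function name, the "membrane" branch and the "undetermined" fallback are hypothetical additions, not part of SignalP or any other tool's interface.

```python
def call_localization(signal_peptide_calls: list[bool],
                      tm_region_calls: list[bool]) -> str:
    """Hypothetical consensus over per-program yes/no predictions.

    'secreted'     : every signal-peptide predictor calls a signal peptide
                     and no trans-membrane predictor calls a TM region.
    'membrane'     : every trans-membrane predictor calls a TM region.
    'undetermined' : anything else (no rule for this case is stated).
    """
    if not signal_peptide_calls or not tm_region_calls:
        raise ValueError("need at least one prediction of each kind")
    if all(signal_peptide_calls) and not any(tm_region_calls):
        return "secreted"
    if all(tm_region_calls):
        return "membrane"
    return "undetermined"


# H61775_P16: both signal-peptide programs say yes, neither TM program does.
print(call_localization([True, True], [False, False]))  # secreted
```

Requiring agreement from every program is a conservative choice; a single dissenting predictor yields "undetermined" rather than a call.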
Variant protein H61775_P16 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H61775_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 6 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
14 | I -> T | No
138 | G -> R | No
34 | G -> E | Yes
48 | G -> R | No
91 | R -> * | Yes

Variant protein H61775_P16 is encoded by the following transcript(s): H61775_T21, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript H61775_T21 is shown in bold; this coding portion starts at position 261 and ends at position 716. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H61775_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 7 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
117 | T -> C | Yes
200 | T -> C | No
672 | G -> C | No
222 | T -> C | Yes
301 | T -> C | No
361 | G -> A | Yes
377 | G -> A | No
400 | -> C | No
402 | G -> C | No
531 | C -> T | Yes
566 | T -> C | No

Variant protein H61775_P17 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) H61775_T22. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between H61775_P17 and Q9P2J2:

1. An isolated chimeric polypeptide encoding for H61775_P17, comprising a first amino acid sequence being at least 90% homologous to MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 11-93 of Q9P2J2, which also corresponds to amino acids 1-83 of H61775_P17.

Comparison report between H61775_P17 and AAQ88495:

1. An isolated chimeric polypeptide encoding for H61775_P17, comprising a first amino acid sequence being at least 90% homologous to MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 1-83 of AAQ88495, which also corresponds to amino acids 1-83 of H61775_P17.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
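Tables 6 and 7 above are linked by the coding coordinates of H61775_T21 (coding portion from nucleotide 261 to 716). A nucleotide SNP at 1-based transcript position p inside the coding portion falls in codon (p - 261) // 3 + 1. The sketch below assumes 1-based transcript coordinates and a coding portion that is read in frame from its first nucleotide. The function name is hypothetical.

```python
from typing import Optional


def nt_to_aa_position(nt_pos: int, cds_start: int, cds_end: int) -> Optional[int]:
    """Map a 1-based transcript position to a 1-based amino acid position.

    Returns None when the position lies outside the coding portion
    (for example in the 5' or 3' untranslated region)."""
    if not cds_start <= nt_pos <= cds_end:
        return None
    return (nt_pos - cds_start) // 3 + 1


# Coding portion of H61775_T21 as stated in the text.
CDS_START, CDS_END = 261, 716
assert (CDS_END - CDS_START + 1) % 3 == 0  # 456 nt = 152 codons

for pos in (117, 301, 361, 377, 402, 531, 672):
    print(pos, nt_to_aa_position(pos, CDS_START, CDS_END))
```

This prints None for position 117, which lies upstream of the coding portion. Positions 301, 361, 402, 531 and 672 map to residues 14, 34, 48, 91 and 138, matching the non-silent entries of Table 6. Position 377 maps to residue 39, which is absent from Table 6 and is therefore presumably a silent change. The same arithmetic applied to the HSAPHOL_T4 coding portion (positions 1 to 1659) maps nucleotide 458 to residue 153.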
Variant protein H61775_P17 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H61775_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 8 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
14 | I -> T | No
34 | G -> E | Yes
48 | G -> R | No

Variant protein H61775_P17 is encoded by the following transcript(s): H61775_T22, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript H61775_T22 is shown in bold; this coding portion starts at position 261 and ends at position 509. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H61775_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 9 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
117 | T -> C | Yes
200 | T -> C | No
222 | T -> C | Yes
301 | T -> C | No
361 | G -> A | Yes
377 | G -> A | No
400 | -> C | No
402 | G -> C | No
596 | T -> A | Yes

As noted above, cluster H61775 features 6 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster H61775_node_2 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H61775_T21 and H61775_T22. Table 10 below describes the starting and ending position of this segment on each transcript.

Table 10 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
H61775_T21 | 87 | 318
H61775_T22 | 87 | 318

Segment cluster H61775_node_4 according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H61775_T21 and H61775_T22. Table 11 below describes the starting and ending position of this segment on each transcript.

Table 11 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
H61775_T21 | 319 | 507
H61775_T22 | 319 | 507

Segment cluster H61775_node_6 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H61775_T22. Table 12 below describes the starting and ending position of this segment on each transcript.

Table 12 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
H61775_T22 | 515 | 715

Segment cluster H61775_node_8 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H61775_T21. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
H61775_T21 | 508 | 1205

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster H61775_node_0 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H61775_T21 and H61775_T22. Table 14 below describes the starting and ending position of this segment on each transcript.

Table 14 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
H61775_T21 | 1 | 86
H61775_T22 | 1 | 86

Segment cluster H61775_node_5 according to the present invention can be found in the following transcript(s): H61775_T22. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
H61775_T22 | 508 | 514

Variant protein alignment to the previously known protein:

Sequence name: /tmp/Psw0RJLCti/aLAXQjXh07:Q9P2J2

Sequence documentation:

Alignment of: H61775_P16 x Q9P2J2

Alignment segment 1/1:

Quality: 803.00
Escore: 0
Matching length: 83
Total length: 83
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:

1  MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRP 50
   ||||||||||||||||||||||||||||||||||||||||||||||||||
11 MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRP 60

51 PLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG 83
   |||||||||||||||||||||||||||||||||
61 PLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG 93

Sequence name: /tmp/Psw0RJLCti/aLAXQjXh07:AAQ88495

Sequence documentation:

Alignment of: H61775_P16 x AAQ88495
Alignment segment 1/1:

Quality: 803.00
Escore: 0
Matching length: 83
Total length: 83
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:

1  MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRP 50
   ||||||||||||||||||||||||||||||||||||||||||||||||||
1  MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRP 50

51 PLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG 83
   |||||||||||||||||||||||||||||||||
51 PLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG 83

Sequence name: /tmp/naab8yR3GC/pSM412IL5o:Q9P2J2

Sequence documentation:

Alignment of: H61775_P17 x Q9P2J2

Alignment segment 1/1:

Quality: 803.00
Escore: 0
Matching length: 83
Total length: 83
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:

1  MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRP 50
   ||||||||||||||||||||||||||||||||||||||||||||||||||
11 MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRP 60

51 PLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG 83
   |||||||||||||||||||||||||||||||||
61 PLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG 93

Sequence name: /tmp/naab8yR3GC/pSM412IL5o:AAQ88495

Sequence documentation:

Alignment of: H61775_P17 x AAQ88495

Alignment segment 1/1:

Quality: 803.00
Escore: 0
Matching length: 83
Total length: 83
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:

1  MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRP 50
   ||||||||||||||||||||||||||||||||||||||||||||||||||
1  MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRP 50

51 PLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG 83
   |||||||||||||||||||||||||||||||||
51 PLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG 83
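Each alignment report above gives a matching length, a percent identity and a gap count. The text does not define these fields or say how similarity is scored; similarity would also require a substitution matrix, which is not given. The Python sketch below therefore shows only one plausible way to tally identity and gaps for two equal-length aligned rows, with '-' marking gaps. The function name and the exact definitions used are assumptions.

```python
def alignment_stats(row_a: str, row_b: str) -> dict:
    """Tally identity and gaps for two aligned rows of equal length ('-' = gap).

    Assumed definitions: matching length = columns without a gap in either row;
    percent identity = identical residues / matching length * 100;
    gaps = number of gap runs (openings) counted over both rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    identical = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")
    gap_columns = sum(1 for a, b in zip(row_a, row_b) if a == "-" or b == "-")
    gaps = 0
    for row in (row_a, row_b):
        in_gap = False
        for ch in row:
            if ch == "-" and not in_gap:
                gaps += 1          # start of a new gap run
            in_gap = ch == "-"
    aligned = len(row_a) - gap_columns
    return {
        "matching_length": aligned,
        "percent_identity": 100.0 * identical / aligned if aligned else 0.0,
        "gaps": gaps,
    }


# H61775_P16 residues 1-83 against Q9P2J2 residues 11-93 (identical, gapless).
SEG = ("MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRP"
       "PLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG")
print(alignment_stats(SEG, SEG))
```

For this gapless, identical pair, the function returns a matching length of 83, 100.0 percent identity and 0 gaps, consistent with the reports above.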
Expression of immunoglobulin superfamily, member 9 H61775 transcripts which are detectable by amplicon as depicted in sequence name H61775seg8 in normal and cancerous ovary tissues.

Expression of immunoglobulin superfamily, member 9 transcripts detectable by or according to H61775seg8 amplicon(s) and H61775seg8F2 and H61775seg8R2 primers was measured by real-time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.

Figure 7 is a histogram showing over-expression of the above-indicated immunoglobulin superfamily, member 9 transcripts in cancerous ovary samples relative to the normal samples. (Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.)

As is evident from Figure 7, the expression of immunoglobulin superfamily, member 9 transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel"), including the benign samples (Sample Nos. 56, 62, 64). Notably, over-expression of at least 5-fold was found in 21 out of 43 adenocarcinoma samples.
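The normalization described above has two steps. First, each sample's target quantity is divided by the geometric mean of its housekeeping-gene quantities. Second, that result is divided by the median normalized value of the reference (normal PM) samples. The Python sketch below illustrates these steps with the standard library. The sample names and quantities are made up and are not data from Figure 7, and the function name is hypothetical.

```python
from statistics import geometric_mean, median


def fold_over_reference(target: dict[str, float],
                        housekeeping: dict[str, list[float]],
                        reference: list[str]) -> dict[str, float]:
    """Normalize each sample's target quantity to the geometric mean of its
    housekeeping-gene quantities, then divide by the median normalized value
    of the reference samples."""
    normalized = {s: q / geometric_mean(housekeeping[s]) for s, q in target.items()}
    ref_median = median([normalized[s] for s in reference])
    return {s: v / ref_median for s, v in normalized.items()}


# Made-up quantities: four housekeeping genes per sample (geometric mean 4.0).
hk = {s: [1.0, 4.0, 4.0, 16.0] for s in ("N1", "N2", "N3", "T1", "T2")}
target = {"N1": 4.0, "N2": 8.0, "N3": 12.0, "T1": 48.0, "T2": 16.0}

folds = fold_over_reference(target, hk, reference=["N1", "N2", "N3"])
over_5x = [s for s in ("T1", "T2") if folds[s] >= 5]
print({s: round(v, 2) for s, v in folds.items()}, over_5x)
```

In this toy run, the reference median is sample N2, so T1 comes out at about 6-fold and T2 at about 2-fold. Only T1 therefore passes the "at least 5-fold" cut used in the text.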
Statistical analysis was applied to verify the significance of these results, as described below.

The P value for the difference in the expression levels of immunoglobulin superfamily, member 9 transcripts detectable by the above amplicon(s) in ovary cancer samples versus the normal tissue samples was determined by t-test as 2.76E-4. The above value demonstrates statistical significance of the results.
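The text reports a t-test P value but does not say which t-test variant was used. As an illustration only, the Python sketch below computes Welch's unequal-variance t statistic and its Welch-Satterthwaite degrees of freedom with the standard library. Choosing Welch's variant is an assumption, not the authors' stated method. Turning t and df into a P value requires the t distribution's survival function (for example `scipy.stats.t.sf`); `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the whole test. Neither call is used here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return (t, df) for Welch's two-sample t-test (unequal variances)."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two values")
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t, df = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), round(df, 3))  # -3.674 4.0
```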
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: H61775seg8F2 forward primer; and H61775seg8R2 reverse primer.

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: H61775seg8.

H61775seg8F2 (SEQ ID NO:955): GAAGGCTCTTGTCACTTACTAGCCAT
H61775seg8R2 (SEQ ID NO:956): TGTCACCATATTTAATCCTCCCAA
Amplicon (SEQ ID NO:957): GAAGGCTCTTGTCACTTACTAGCCATGTGATTTTGGAAAGAAACTTAACATTAATTCCTTCAGCTACAATGGAATTCTTGGGAGGATTAAATATGGTGACA

Expression of immunoglobulin superfamily, member 9 H61775 transcripts which are detectable by amplicon as depicted in sequence name H61775seg8 in different normal tissues.

Expression of immunoglobulin superfamily, member 9 transcripts detectable by or according to H61775seg8 amplicon(s) and H61775seg8F and H61775seg8R primers was measured by real-time PCR. In parallel, the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 2, "Tissue samples in normal panel", above), to obtain a value of relative expression of each sample relative to the median of the ovary samples.

The results are described in Figure 8, presenting the histogram showing the expression of H61775 transcripts which are detectable by amplicon as depicted in sequence name H61775seg8, in different normal tissues. Amplicon and primers are as above.

DESCRIPTION FOR CLUSTER HSAPHOL

Cluster HSAPHOL features 7 transcript(s) and 18 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.

Table 1 - Transcripts of interest
Transcript Name | SEQ ID NO:
HSAPHOL_T10 | 11
HSAPHOL_T4 | 12
HSAPHOL_T5 | 13
HSAPHOL_T6 | 14
HSAPHOL_T7 | 15
HSAPHOL_T8 | 16
HSAPHOL_T9 | 17

Table 2 - Segments of interest
Segment Name | SEQ ID NO:
HSAPHOL_node_11 | 18
HSAPHOL_node_13 | 19
HSAPHOL_node_15 | 20
HSAPHOL_node_19 | 21
HSAPHOL_node_2 | 22
HSAPHOL_node_21 | 23
HSAPHOL_node_23 | 24
HSAPHOL_node_26 | 25
HSAPHOL_node_28 | 26
HSAPHOL_node_38 | 27
HSAPHOL_node_40 | 28
HSAPHOL_node_42 | 29
HSAPHOL_node_16 | 30
HSAPHOL_node_25 | 31
HSAPHOL_node_34 | 32
HSAPHOL_node_35 | 33
HSAPHOL_node_36 | 34
HSAPHOL_node_41 | 35

Table 3 - Proteins of interest
Protein Name | SEQ ID NO:
HSAPHOL_P2 | 37
HSAPHOL_P3 | 38
HSAPHOL_P4 | 39
HSAPHOL_P5 | 40
HSAPHOL_P6 | 41
HSAPHOL_P7 | 42
HSAPHOL_P8 | 43

These sequences are variants of the known protein Alkaline phosphatase, tissue-nonspecific isozyme precursor (SwissProt accession identifier PPBT_HUMAN; known also according to the synonyms EC 3.1.3.1; AP-TNAP; Liver/bone/kidney isozyme; TNSALP), SEQ ID NO: 36, referred to herein as the previously known protein. The variant proteins according to the present invention are variant(s) of a known diagnostic marker, called Alkaline Phosphatase.
Protein Alkaline phosphatase, tissue-nonspecific isozyme precursor is known or believed to have the following function(s): this isozyme may play a role in skeletal mineralization. The sequence for protein Alkaline phosphatase, tissue-nonspecific isozyme precursor is given at the end of the application, as "Alkaline phosphatase, tissue-nonspecific isozyme precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.

Table 4 - Amino acid mutations for Known Protein
SNP position(s) on amino acid sequence | Comment
28 | Y -> C (in hypophosphatasia; infantile; 7% of activity). /FTId=VAR_013972.
33 | A -> V (in hypophosphatasia). /FTId=VAR_006147.
111 | A -> T (in hypophosphatasia; odonto). /FTId=VAR_006151.
116 | A -> T (in hypophosphatasia; loss of activity). /FTId=VAR_013977.
120 | G -> R (in hypophosphatasia). /FTId=VAR_013978.
129 | G -> R (in hypophosphatasia). /FTId=VAR_013979.
132 | A -> V (in hypophosphatasia). /FTId=VAR_013146.
134 | T -> N (in hypophosphatasia; 9% of activity). /FTId=VAR_011082.
136 | R -> H (in hypophosphatasia; moderate; 33% of activity). /FTId=VAR_006152.
152 | R -> H (in hypophosphatasia). /FTId=VAR_013980.
162 | G -> V (in hypophosphatasia; severe; 1% of activity). /FTId=VAR_006153.
170 | N -> D (in hypophosphatasia). /FTId=VAR_013981.
40 | A -> V (in hypophosphatasia; 2% of activity). /FTId=VAR_011081.
171 | H -> Y (in hypophosphatasia; severe; 2% of activity). /FTId=VAR_006154.
176 | A -> T (in hypophosphatasia). /FTId=VAR_011083.
177 | A -> T (in hypophosphatasia; adult type). /FTId=VAR_006155.
179 | A -> T (in hypophosphatasia). /FTId=VAR_006156.
181 | S -> L (in hypophosphatasia; 1% of activity). /FTId=VAR_013982.
184 | R -> W (in hypophosphatasia; loss of activity). /FTId=VAR_013983.
191 | E -> G (in hypophosphatasia; odonto). /FTId=VAR_006157.
191 | E -> K (in hypophosphatasia; moderate; frequent mutation in European countries). /FTId=VAR_006158.
201 | C -> Y (in hypophosphatasia). /FTId=VAR_006159.
207 | Q -> P (in hypophosphatasia). /FTId=VAR_006160.
51 | A -> V (in hypophosphatasia). /FTId=VAR_013973.
211 | N -> D (in hypophosphatasia). /FTId=VAR_013984.
220 | G -> V (in hypophosphatasia; odonto). /FTId=VAR_013985.
223 | R -> W (in hypophosphatasia; 3% of activity). /FTId=VAR_013986.
224 | K -> E (in hypophosphatasia; infantile; partial loss of activity). /FTId=VAR_011084.
235 | E -> G (in hypophosphatasia). /FTId=VAR_013987.
246 | R -> S (in hypophosphatasia; 4% of activity). /FTId=VAR_011085.
249 | G -> V (in hypophosphatasia; partial loss of activity). /FTId=VAR_013988.
263 | H -> Y (common polymorphism). /FTId=VAR_006161.
289 | L -> F (in hypophosphatasia). /FTId=VAR_006162.
291 | E -> K (in hypophosphatasia; moderate; 8% of activity). /FTId=VAR_013989.
62 | M -> L (in hypophosphatasia; moderate; 27% of activity). /FTId=VAR_006148.
294 | D -> A (in hypophosphatasia). /FTId=VAR_006163.
294 | D -> Y (in hypophosphatasia). /FTId=VAR_013990.
306 | D -> V (in hypophosphatasia). /FTId=VAR_006164.
326 | G -> R (in hypophosphatasia; in a patient also carrying Lys-291). /FTId=VAR_013991.
327 | F -> G (in hypophosphatasia; requires 2 nucleotide substitutions). /FTId=VAR_013992.
327 | F -> L (in hypophosphatasia; childhood). /FTId=VAR_006165.
334 | G -> D (in hypophosphatasia). /FTId=VAR_006166.
348 | A -> T (in hypophosphatasia). /FTId=VAR_011086.
378 | D -> V (in hypophosphatasia; loss of activity). /FTId=VAR_006167.
381 | H -> R (in hypophosphatasia). /FTId=VAR_011087.
63 | G -> V (in hypophosphatasia; loss of activity). /FTId=VAR_013974.
382 | V -> I (in hypophosphatasia). /FTId=VAR_006168.
391 | R -> C (in hypophosphatasia; moderate; 10% of activity). /FTId=VAR_013993.
399 | A -> S (in hypophosphatasia). /FTId=VAR_013994.
406 | D -> G (in hypophosphatasia; 15% of activity). /FTId=VAR_011088.
423 | V -> A (in hypophosphatasia; 16% of activity). /FTId=VAR_013995.
426 | G -> C (in hypophosphatasia; infantile; partial loss of activity). /FTId=VAR_011089.
436 | Y -> H (in hypophosphatasia). /FTId=VAR_006169.
445 | S -> P (in hypophosphatasia; severe; 2% of activity). /FTId=VAR_013996.
450 | R -> C (in hypophosphatasia; severe; 4% of activity). /FTId=VAR_013997.
450 | R -> H (in hypophosphatasia). /FTId=VAR_011090.
71 | R -> C (in hypophosphatasia). /FTId=VAR_006149.
456 | G -> R (in hypophosphatasia; loss of activity). /FTId=VAR_011091.
459 | V -> M (in hypophosphatasia; infantile). /FTId=VAR_013998.
473 | G -> S (in hypophosphatasia). /FTId=VAR_013999.
476 | E -> K (in hypophosphatasia). /FTId=VAR_006170.
478 | N -> I (in hypophosphatasia; 9% of activity). /FTId=VAR_011092.
489 | C -> S (in hypophosphatasia; 9% of activity). /FTId=VAR_011093.
490 | I -> F (in hypophosphatasia; odonto; partial loss of activity). /FTId=VAR_014000.
491 | G -> R (in hypophosphatasia). /FTId=VAR_014001.
522 | V -> A. /FTId=VAR_011094.
29 | W -> A
71 | R -> H (in hypophosphatasia). /FTId=VAR_013975.
104 | N -> K
71 | R -> P (in hypophosphatasia). /FTId=VAR_006150.
75 | G -> S (in hypophosphatasia; severe; 3.5% of activity). /FTId=VAR_013976.

Protein Alkaline phosphatase, tissue-nonspecific isozyme precursor localization is believed to be attached to the membrane by a GPI anchor.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: skeletal development; ossification; metabolism, which are annotation(s) related to Biological Process; magnesium binding; alkaline phosphatase; hydrolase, which are annotation(s) related to Molecular Function; and integral membrane protein, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or LocusLink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.

As noted above, cluster HSAPHOL features 7 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Alkaline phosphatase, tissue-nonspecific isozyme precursor. A description of each variant protein according to the present invention is now provided.

Variant protein HSAPHOL_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSAPHOL_T4. An alignment is given to the known protein (Alkaline phosphatase, tissue-nonspecific isozyme precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSAPHOL_P2 and AAH21289 (SEQ ID NO: 36):

1. An isolated chimeric polypeptide encoding for HSAPHOL_P2, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PHSGPAAAFIRRRGWWPGPRCA corresponding to amino acids 1-22 of HSAPHOL_P2, a second amino acid sequence being at least 90% homologous to PATPRPLSWLRAPTRLCLDGPSPVLCA corresponding to amino acids 1-27 of AAH21289, which also corresponds to amino acids 23-49 of HSAPHOL_P2, and a third amino acid sequence being at least 90% homologous to EKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 83-586 of AAH21289, which also corresponds to amino acids 50-553 of HSAPHOL_P2, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of HSAPHOL_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PHSGPAAAFIRRRGWWPGPRCA of HSAPHOL_P2.
3. An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P2, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AE, having a structure as follows: a sequence starting from any of amino acid numbers 49-x to 49; and ending at any of amino acid numbers 50+((n-2)-x), in which x varies from 0 to n-2.

Comparison report between HSAPHOL_P2 and PPBT_HUMAN:

1. An isolated chimeric polypeptide encoding for HSAPHOL_P2, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PHSGPAAAFIRRRGWWPGPRCAPATPRPLSWLRAPTRLCLDGPSPVLCA corresponding to amino acids 1-49 of HSAPHOL_P2, and a second amino acid sequence being at least 90% homologous to EKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 21-524 of PPBT_HUMAN, which also corresponds to amino acids 50-553 of HSAPHOL_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of HSAPHOL_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PHSGPAAAFIRRRGWWPGPRCAPATPRPLSWLRAPTRLCLDGPSPVLCA of HSAPHOL_P2.

3. An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P2, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AE, having a structure as follows: a sequence starting from any of amino acid numbers 49-x to 49; and ending at any of amino acid numbers 50+((n-2)-x), in which x varies from 0 to n-2.
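The edge-portion claim describes every window of length n that starts at residue 49-x and ends at residue 50+((n-2)-x), for x = 0 to n-2. Each such window spans n residues and includes the A49/E50 junction ("AE"). The Python sketch below enumerates those windows. The function name is hypothetical. Windows that would run past either end of the protein are simply dropped, which is an assumption, since the claim does not address sequence bounds. The test sequence is the N-terminal portion of HSAPHOL_P2 (residues 1-106), copied from the comparison report above.

```python
def edge_windows(junction: int, n: int, seq_len: int) -> list[tuple[int, int]]:
    """1-based inclusive (start, end) windows of length n with
    start = junction - x and end = junction + 1 + ((n - 2) - x), x = 0 .. n - 2,
    so every window contains residues `junction` and `junction + 1`.
    Windows extending outside 1..seq_len are dropped (an assumption)."""
    if n < 2:
        raise ValueError("a window must be at least 2 residues long")
    windows = []
    for x in range(n - 1):
        start, end = junction - x, junction + 1 + (n - 2 - x)
        if start >= 1 and end <= seq_len:
            windows.append((start, end))
    return windows


# N-terminal portion of HSAPHOL_P2 (residues 1-106) from the comparison report.
P2_START = ("PHSGPAAAFIRRRGWWPGPRCA"                 # residues 1-22 (head)
            "PATPRPLSWLRAPTRLCLDGPSPVLCA"            # residues 23-49
            "EKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQL")  # 50-106

for start, end in edge_windows(junction=49, n=10, seq_len=len(P2_START)):
    print(start, end, P2_START[start - 1:end])
```

With n = 10, this yields nine windows, from (41, 50) to (49, 58), and each contains "AE". Calling the same function with junction=20 would describe the PG edge claimed for HSAPHOL_P3.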
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region, and similarity to known proteins suggests a GPI anchor.

Variant protein HSAPHOL_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 5 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
153 | N -> S | Yes
172 | Q -> | No
551 | V -> A | No
206 | A -> | No
272 | R -> | No
292 | Y -> H | Yes
342 | V -> | No
344 | V -> | No
354 | K -> | No
354 | K -> Q | No
380 | E -> | No

Variant protein HSAPHOL_P2 is encoded by the following transcript(s): HSAPHOL_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSAPHOL_T4 is shown in bold; this coding portion starts at position 1 and ends at position 1659.
The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the sequence of variant protein HSAPHOL_P2 provides support for the deduced sequence of this variant protein according to the present invention).

Table 6 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
417 | C -> T | Yes
458 | A -> G | Yes
1140 | G -> | No
1509 | C -> T | Yes
1629 | G -> T | Yes
1652 | T -> C | No
1727 | C -> T | Yes
1788 | G -> A | Yes
1895 | A -> C | Yes
2050 | C -> T | Yes
2095 | A -> G | Yes
2240 | G -> | No
516 | G -> | No
2347 | -> A | No
2364 | T -> G | No
617 | C -> | No
815 | G -> | No
874 | T -> C | Yes
1026 | G -> | No
1032 | G -> | No
1060 | A -> | No
1060 | A -> C | No

Variant protein HSAPHOL_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSAPHOL_T5. An alignment is given to the known protein (Alkaline phosphatase, tissue-nonspecific isozyme precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
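The nucleotide positions in Table 6 can be related to the residue positions in Table 5 by locating the codon that contains each SNP. Assuming 1-based, inclusive transcript coordinates and the HSAPHOL_T4 coding portion of positions 1-1659, residue = (position - start) // 3 + 1. The helper `snp_to_codon` below is a hypothetical sketch of that arithmetic, not a tool described in the application.

```python
from typing import Optional, Tuple


def snp_to_codon(position: int, cds_start: int, cds_end: int) -> Optional[Tuple[int, int]]:
    """Map a 1-based transcript position to (residue number, position within codon).

    Returns None when the SNP lies outside the coding portion
    [cds_start, cds_end], e.g. in an untranslated region.
    """
    if not cds_start <= position <= cds_end:
        return None
    offset = position - cds_start
    return offset // 3 + 1, offset % 3 + 1


# Selected known SNPs of HSAPHOL_T4 from Table 6; coding portion 1-1659.
for nt in (458, 874, 1652, 1727):
    print(nt, snp_to_codon(nt, 1, 1659))
```

Positions 458, 874 and 1652 land on residues 153, 292 and 551, the residues listed in Table 5 as N -> S, Y -> H and V -> A. Their within-codon positions (2, 1 and 2) are consistent with those substitutions under the standard genetic code (AAY to AGY, TAY to CAY, GTN to GCN). Position 1727 lies after the coding portion, so it returns None.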
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSAPHOL_P3 and AAH21289:

1. An isolated chimeric polypeptide encoding for HSAPHOL_P3, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVP corresponding to amino acids 63 - 82 of AAH21289, which also corresponds to amino acids 1 - 20 of HSAPHOL_P3, and a second amino acid sequence being at least 90% homologous to GMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 123 - 586 of AAH21289, which also corresponds to amino acids 21 - 484 of HSAPHOL_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P3, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PG, having a structure as follows: a sequence starting from any of amino acid numbers 20-x to 20 and ending at any of amino acid numbers 21+((n-2)-x), in which x varies from 0 to n-2.
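Each comparison report pairs residue ranges of the known protein with residue ranges of the variant. For HSAPHOL_P3 against AAH21289, residues 63-82 map to 1-20 and residues 123-586 map to 21-484, so residues 83-122 of AAH21289 have no counterpart in the variant. A small Python sketch of that lookup, assuming ungapped, equal-length segment pairs as the report states (the `ref_to_variant` name and segment table layout are illustrative):

```python
from typing import List, Optional, Tuple

# (reference start, reference end, variant start, variant end), 1-based, inclusive.
# Values taken from the HSAPHOL_P3 vs AAH21289 comparison report.
P3_VS_AAH21289: List[Tuple[int, int, int, int]] = [
    (63, 82, 1, 20),
    (123, 586, 21, 484),
]


def ref_to_variant(ref_pos: int, segments: List[Tuple[int, int, int, int]]) -> Optional[int]:
    """Translate a reference residue number into the variant's numbering,
    or return None if the residue lies outside every aligned segment."""
    for r_start, r_end, v_start, v_end in segments:
        # Segments are treated as ungapped, so both ranges must be equally long.
        if r_end - r_start != v_end - v_start:
            raise ValueError("segment ranges differ in length")
        if r_start <= ref_pos <= r_end:
            return v_start + (ref_pos - r_start)
    return None


for pos in (63, 100, 123, 586):
    print(pos, ref_to_variant(pos, P3_VS_AAH21289))
```

Residue 100 of AAH21289 falls in the unaligned stretch and maps to None; residues 63, 123 and 586 map to 1, 21 and 484 of HSAPHOL_P3.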
Comparison report between HSAPHOL_P3 and PPBT_HUMAN:

1. An isolated chimeric polypeptide encoding for HSAPHOL_P3, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVP corresponding to amino acids 1 - 20 of PPBT_HUMAN, which also corresponds to amino acids 1 - 20 of HSAPHOL_P3, and a second amino acid sequence being at least 90% homologous to GMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 61 - 524 of PPBT_HUMAN, which also corresponds to amino acids 21 - 484 of HSAPHOL_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P3, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PG, having a structure as follows: a sequence starting from any of amino acid numbers 20-x to 20 and ending at any of amino acid numbers 21+((n-2)-x), in which x varies from 0 to n-2.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs.
The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because of manual inspection of known protein localization and/or gene structure, and/or similarity to known proteins.

Variant protein HSAPHOL_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the sequence of variant protein HSAPHOL_P3 provides support for the deduced sequence of this variant protein according to the present invention).

Table 7 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
103 | Q -> | No
137 | A -> | No
84 | N -> S | Yes
10 | I -> | No
203 | R -> | No
223 | Y -> H | Yes
273 | V -> | No
275 | V -> | No
285 | K -> | No
285 | K -> Q | No
311 | E -> | No
482 | V -> A | No

Variant protein HSAPHOL_P3 is encoded by the following transcript(s): HSAPHOL_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSAPHOL_T5 is shown in bold; this coding portion starts at position 253 and ends at position 1704. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the sequence of variant protein HSAPHOL_P3 provides support for the deduced sequence of this variant protein according to the present invention).

Table 8 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
179 | G -> C | No
231 | A -> | No
1071 | G -> | No
1077 | G -> | No
1105 | A -> | No
1105 | A -> C | No
1185 | G -> | No
1554 | C -> T | Yes
1674 | G -> T | Yes
1697 | T -> C | No
1772 | C -> T | Yes
1833 | G -> A | Yes
232 | A -> T | No
1940 | A -> C | Yes
2095 | C -> T | Yes
2140 | A -> G | Yes
2285 | G -> | No
2392 | -> A | No
2409 | T -> G | No
281 | T -> | No
462 | C -> T | Yes
503 | A -> G | Yes
561 | G -> | No
662 | C -> | No
860 | G -> | No
919 | T -> C | Yes

Variant protein HSAPHOL_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSAPHOL_T6. An alignment is given to the known protein (Alkaline phosphatase, tissue-nonspecific isozyme precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSAPHOL_P4 and AAH21289:

1. An isolated chimeric polypeptide encoding for HSAPHOL_P4, comprising a first amino acid sequence being at least 90% homologous to MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 124 - 586 of AAH21289, which also corresponds to amino acids 1 - 463 of HSAPHOL_P4.
Comparison report between HSAPHOL_P4 and PPBT_HUMAN:

1. An isolated chimeric polypeptide encoding for HSAPHOL_P4, comprising a first amino acid sequence being at least 90% homologous to MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 62 - 524 of PPBT_HUMAN, which also corresponds to amino acids 1 - 463 of HSAPHOL_P4.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although only one of the two trans-membrane region prediction programs (Tmpred: 1, Tmhmm: 0) predicts that this protein has a trans-membrane region, similarity to known proteins suggests a GPI anchor. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.

Variant protein HSAPHOL_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the sequence of variant protein HSAPHOL_P4 provides support for the deduced sequence of this variant protein according to the present invention).

Table 9 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
116 | A -> | No
182 | R -> | No
82 | Q -> | No
202 | Y -> H | Yes
252 | V -> | No
254 | V -> | No
264 | K -> | No
264 | K -> Q | No
290 | E -> | No
461 | V -> A | No
63 | N -> S | Yes

Variant protein HSAPHOL_P4 is encoded by the following transcript(s): HSAPHOL_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSAPHOL_T6 is shown in bold; this coding portion starts at position 215 and ends at position 1603.
The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the sequence of variant protein HSAPHOL_P4 provides support for the deduced sequence of this variant protein according to the present invention).

Table 10 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
361 | C -> T | Yes
402 | A -> G | Yes
1084 | G -> | No
1453 | C -> T | Yes
1573 | G -> T | Yes
1596 | T -> C | No
1671 | C -> T | Yes
1732 | G -> A | Yes
1839 | A -> C | Yes
1994 | C -> T | Yes
2039 | A -> G | Yes
2184 | G -> | No
460 | G -> | No
2291 | -> A | No
2308 | T -> G | No
561 | C -> | No
759 | G -> | No
818 | T -> C | Yes
970 | G -> | No
976 | G -> | No
1004 | A -> | No
1004 | A -> C | No

Variant protein HSAPHOL_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSAPHOL_T7. An alignment is given to the known protein (Alkaline phosphatase, tissue-nonspecific isozyme precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSAPHOL_P5 and AAH21289:

1. An isolated chimeric polypeptide encoding for HSAPHOL_P5, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEM corresponding to amino acids 63 - 417 of AAH21289, which also corresponds to amino acids 1 - 355 of HSAPHOL_P5, and a second amino acid sequence being at least 90% homologous to DHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 440 - 586 of AAH21289, which also corresponds to amino acids 356 - 502 of HSAPHOL_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P5, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise MD, having a structure as follows: a sequence starting from any of amino acid numbers 355-x to 355 and ending at any of amino acid numbers 356+((n-2)-x), in which x varies from 0 to n-2.

Comparison report between HSAPHOL_P5 and PPBT_HUMAN:

1. An isolated chimeric polypeptide encoding for HSAPHOL_P5, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEM corresponding to amino acids 1 - 355 of PPBT_HUMAN, which also corresponds to amino acids 1 - 355 of HSAPHOL_P5, and a second amino acid sequence being at least 90% homologous to DHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 377 - 524 of PPBT_HUMAN, which also corresponds to amino acids 356 - 502 of HSAPHOL_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P5, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise MD, having a structure as follows: a sequence starting from any of amino acid numbers 355-x to 355 and ending at any of amino acid numbers 356+((n-2)-x), in which x varies from 0 to n-2.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because of manual inspection of known protein localization and/or gene structure, and/or similarity to known proteins.

Variant protein HSAPHOL_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the sequence of variant protein HSAPHOL_P5 provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
124 | N -> S | Yes
143 | Q -> | No
500 | V -> A | No
10 | I -> | No
177 | A -> | No
243 | R -> | No
263 | Y -> H | Yes
313 | V -> | No
315 | V -> | No
325 | K -> | No
325 | K -> Q | No
351 | E -> | No

Variant protein HSAPHOL_P5 is encoded by the following transcript(s): HSAPHOL_T7, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSAPHOL_T7 is shown in bold; this coding portion starts at position 253 and ends at position 1758. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the sequence of variant protein HSAPHOL_P5 provides support for the deduced sequence of this variant protein according to the present invention).

Table 12 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
179 | G -> C | No
231 | A -> | No
1191 | G -> | No
1197 | G -> | No
1225 | A -> | No
1225 | A -> C | No
1305 | G -> | No
1608 | C -> T | Yes
1728 | G -> T | Yes
1751 | T -> C | No
1826 | C -> T | Yes
1887 | G -> A | Yes
232 | A -> T | No
1994 | A -> C | Yes
2149 | C -> T | Yes
2194 | A -> G | Yes
2339 | G -> | No
2446 | -> A | No
2463 | T -> G | No
281 | T -> | No
582 | C -> T | Yes
623 | A -> G | Yes
681 | G -> | No
782 | C -> | No
980 | G -> | No
1039 | T -> C | Yes

Variant protein HSAPHOL_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSAPHOL_T8. An alignment is given to the known protein (Alkaline phosphatase, tissue-nonspecific isozyme precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSAPHOL_P6 and AAH21289:

1. An isolated chimeric polypeptide encoding for HSAPHOL_P6, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL corresponding to amino acids 63 - 349 of AAH21289, which also corresponds to amino acids 1 - 287 of HSAPHOL_P6, and a second amino acid sequence being at least 90% homologous to GGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 395 - 586 of AAH21289, which also corresponds to amino acids 288 - 479 of HSAPHOL_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise LG, having a structure as follows: a sequence starting from any of amino acid numbers 287-x to 287 and ending at any of amino acid numbers 288+((n-2)-x), in which x varies from 0 to n-2.
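Reading consecutive segment pairs of a comparison report side by side shows where a variant departs from the known protein. For HSAPHOL_P6 against AAH21289, the first segment ends at reference residue 349 and the second resumes at 395, while the variant numbering runs on from 287 to 288; this implies that reference residues 350-394 (45 residues) are absent from the variant. A short Python sketch of that check, assuming segments listed in order with 1-based inclusive coordinates (the `junction_gaps` helper is hypothetical):

```python
from typing import List, Tuple

# (ref_start, ref_end, var_start, var_end), 1-based, inclusive, listed in order.
Segment = Tuple[int, int, int, int]


def junction_gaps(segments: List[Segment]) -> List[Tuple[int, int]]:
    """For each pair of consecutive segments, return
    (reference residues skipped, variant residues inserted) at the junction.
    A result of (k, 0) means k reference residues are absent from the variant."""
    gaps = []
    for (_, ref_end, _, var_end), (ref_next, _, var_next, _) in zip(segments, segments[1:]):
        gaps.append((ref_next - ref_end - 1, var_next - var_end - 1))
    return gaps


# HSAPHOL_P6 vs AAH21289, as given in the comparison report above.
p6_vs_aah21289 = [(63, 349, 1, 287), (395, 586, 288, 479)]
print(junction_gaps(p6_vs_aah21289))  # [(45, 0)]
```

The same call on the HSAPHOL_P5 segments (63-417/1-355 and 440-586/356-502) reports 22 skipped reference residues, and on the HSAPHOL_P3 segments (63-82/1-20 and 123-586/21-484) it reports 40.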
Comparison report between HSAPHOL_P6 and PPBT_HUMAN:

1. An isolated chimeric polypeptide encoding for HSAPHOL_P6, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL corresponding to amino acids 1 - 287 of PPBT_HUMAN, which also corresponds to amino acids 1 - 287 of HSAPHOL_P6, and a second amino acid sequence being at least 90% homologous to GGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 333 - 524 of PPBT_HUMAN, which also corresponds to amino acids 288 - 479 of HSAPHOL_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise LG, having a structure as follows: a sequence starting from any of amino acid numbers 287-x to 287 and ending at any of amino acid numbers 288+((n-2)-x), in which x varies from 0 to n-2.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs.
The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both signal-peptide prediction programs predict that this protein has a signal peptide, at least one of the two trans-membrane region prediction programs predicts that this protein has a trans-membrane region, and similarity to known proteins suggests a GPI anchor.

Variant protein HSAPHOL_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the sequence of variant protein HSAPHOL_P6 provides support for the deduced sequence of this variant protein according to the present invention).

Table 13 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
124 | N -> S | Yes
143 | Q -> | No
177 | A -> | No
243 | R -> | No
263 | Y -> H | Yes
306 | E -> | No
477 | V -> A | No
10 | I -> | No

Variant protein HSAPHOL_P6 is encoded by the following transcript(s): HSAPHOL_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSAPHOL_T8 is shown in bold; this coding portion starts at position 253 and ends at position 1689. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the sequence of variant protein HSAPHOL_P6 provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
179 | G -> C | No
231 | A -> | No
1170 | G -> | No
1539 | C -> T | Yes
1659 | G -> T | Yes
1682 | T -> C | No
1757 | C -> T | Yes
1818 | G -> A | Yes
1925 | A -> C | Yes
2080 | C -> T | Yes
2125 | A -> G | Yes
2270 | G -> | No
232 | A -> T | No
2377 | -> A | No
2394 | T -> G | No
281 | T -> | No
582 | C -> T | Yes
623 | A -> G | Yes
681 | G -> | No
782 | C -> | No
980 | G -> | No
1039 | T -> C | Yes

Variant protein HSAPHOL_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSAPHOL_T9. An alignment is given to the known protein (Alkaline phosphatase, tissue-nonspecific isozyme precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSAPHOL_P7 and AAH21289:

1. An isolated chimeric polypeptide encoding for HSAPHOL_P7, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYK corresponding to amino acids 63 - 326 of AAH21289, which also corresponds to amino acids 1 - 264 of HSAPHOL_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP corresponding to amino acids 265 - 306 of HSAPHOL_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSAPHOL_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP in HSAPHOL_P7.

Comparison report between HSAPHOL_P7 and PPBT_HUMAN:

1. An isolated chimeric polypeptide encoding for HSAPHOL_P7, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPR corresponding to amino acids 1 - 262 of PPBT_HUMAN, which also corresponds to amino acids 1 - 262 of HSAPHOL_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence YKLPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP corresponding to amino acids 263 - 306 of HSAPHOL_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HSAPHOL_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence YKLPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP in HSAPHOL_P7.
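The tail claims recite graded thresholds (at least 70%, 80%, 85%, 90% or 95% homologous). This passage does not define how homology is scored, so the sketch below is only a simplified illustration: it treats homology as ungapped percent identity between two sequences of equal length and reports the highest recited threshold that a candidate meets. The scoring choice, the function names and the single-substitution "mutant" are all assumptions; practical comparisons would normally use a gapped alignment.

```python
from typing import Optional

THRESHOLDS = (95.0, 90.0, 85.0, 80.0, 70.0)  # percentages recited in the claims


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity between two equal-length sequences, in percent."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def best_threshold_met(a: str, b: str) -> Optional[float]:
    """Highest recited threshold reached, or None if below 70%."""
    pid = percent_identity(a, b)
    for t in THRESHOLDS:
        if pid >= t:
            return t
    return None


tail = "LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP"  # HSAPHOL_P7 tail, 42 residues
mutant = "A" + tail[1:]  # hypothetical single substitution at the first residue
print(round(percent_identity(tail, mutant), 2), best_threshold_met(tail, mutant))
```

One substitution in the 42-residue tail leaves 41/42 identical positions (about 97.6%), which clears the highest recited threshold of 95%.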
Comparison report between HSAPHOL_P7 and O75090:

1. An isolated chimeric polypeptide encoding for HSAPHOL_P7, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYK corresponding to amino acids 1 - 264 of O75090, which also corresponds to amino acids 1 - 264 of HSAPHOL_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP corresponding to amino acids 265 - 306 of HSAPHOL_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HSAPHOL_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP in HSAPHOL_P7.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
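The localization statements in this section combine three kinds of evidence: signal-peptide predictions (two programs), trans-membrane predictions (two programs, Tmpred and Tmhmm where named) and similarity to known GPI-anchored proteins. The function below is only a hypothetical encoding of the reasoning as it is worded for HSAPHOL_P7 (secreted) and for the membrane-assigned variants; it is not the application's pipeline, and localizations decided by manual inspection are outside its scope. The `Evidence` fields and the rule order are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    signal_peptide_votes: int  # signal-peptide predictors (of 2) calling a signal peptide
    tm_votes: int              # trans-membrane predictors (of 2) calling a TM region
    gpi_similarity: bool       # similarity to known proteins suggests a GPI anchor


def localize(ev: Evidence) -> str:
    """Hypothetical rule set paraphrasing the reasons given in the text."""
    # Any TM call or GPI-anchor similarity was treated as support for "membrane".
    if ev.tm_votes >= 1 or ev.gpi_similarity:
        return "membrane"
    # Both signal-peptide programs positive and no TM call: "secreted".
    if ev.signal_peptide_votes == 2:
        return "secreted"
    return "undetermined"


# HSAPHOL_P7: both signal-peptide programs positive, no TM prediction.
print(localize(Evidence(signal_peptide_votes=2, tm_votes=0, gpi_similarity=False)))
# HSAPHOL_P4: Tmpred 1, Tmhmm 0, GPI-anchor similarity, predicted non-secreted.
print(localize(Evidence(signal_peptide_votes=0, tm_votes=1, gpi_similarity=True)))
```

The first call prints "secreted" and the second "membrane", matching the assignments stated for HSAPHOL_P7 and HSAPHOL_P4.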
Variant protein HSAPHOL_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 15 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the sequence of variant protein HSAPHOL_P7 provides support for the deduced sequence of this variant protein according to the present invention).

Table 15 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
124 | N -> S | Yes
143 | Q -> | No
177 | A -> | No
243 | R -> | No
263 | Y -> H | Yes
273 | N -> T | Yes
10 | I -> | No

Variant protein HSAPHOL_P7 is encoded by the following transcript(s): HSAPHOL_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSAPHOL_T9 is shown in bold; this coding portion starts at position 253 and ends at position 1170. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the sequence of variant protein HSAPHOL_P7 provides support for the deduced sequence of this variant protein according to the present invention).

Table 16 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
179 | G -> C | No
231 | A -> | No
1070 | A -> C | Yes
1225 | C -> T | Yes
1270 | A -> G | Yes
1415 | G -> | No
1522 | -> A | No
1539 | T -> G | No
232 | A -> T | No
281 | T -> | No
582 | C -> T | Yes
623 | A -> G | Yes
681 | G -> | No
782 | C -> | No
980 | G -> | No
1039 | T -> C | Yes

Variant protein HSAPHOL_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSAPHOL_T10.
An alignment is given to the known protein (Alkaline phosphatase, tissue-nonspecific isozyme precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSAPHOL_P8 and AAH21289:

1. An isolated chimeric polypeptide encoding for HSAPHOL_P8, comprising a first amino acid sequence being at least 90% homologous to
MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL
GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT
AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA
TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK
NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG
corresponding to amino acids 63 - 350 of AAH21289, which also corresponds to amino acids 1 - 288 of HSAPHOL_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP corresponding to amino acids 289 - 316 of HSAPHOL_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HSAPHOL_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP in HSAPHOL_P8.

Comparison report between HSAPHOL_P8 and PPBT_HUMAN:

1. An isolated chimeric polypeptide encoding for HSAPHOL_P8, comprising a first amino acid sequence being at least 90% homologous to
MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL
GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT
AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA
TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK
NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG
corresponding to amino acids 1 - 288 of PPBT_HUMAN, which also corresponds to amino acids 1 - 288 of HSAPHOL_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP corresponding to amino acids 289 - 316 of HSAPHOL_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HSAPHOL_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP in HSAPHOL_P8.
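The two-part structure described in these comparison reports (a head segment at least 90% homologous to the reference, immediately followed by a 28-residue tail at least 70% homologous to KWRGWRGGCMARSLVAGAACGQHLGTRP) can be illustrated with a small check. This is a sketch only: it uses simple ungapped percent identity as a stand-in for "homologous", which the application may define differently, and the function names and the 20-residue `HEAD_TOY` are hypothetical stand-ins for the full 288-residue first sequence.

```python
TAIL_P8 = "KWRGWRGGCMARSLVAGAACGQHLGTRP"  # residues 289-316 of HSAPHOL_P8 (from the text)


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def meets_chimera_thresholds(candidate: str, head_ref: str, tail_ref: str,
                             head_min: float = 90.0,
                             tail_min: float = 70.0) -> bool:
    """Hypothetical check of the claimed two-part structure.

    The first len(head_ref) residues must be at least head_min % identical to
    head_ref, and the residues immediately after them (contiguous, in order)
    must be at least tail_min % identical to tail_ref.
    """
    n_head = len(head_ref)
    if len(candidate) != n_head + len(tail_ref):
        return False
    head, tail = candidate[:n_head], candidate[n_head:]
    return (percent_identity(head, head_ref) >= head_min
            and percent_identity(tail, tail_ref) >= tail_min)


# Toy stand-in for the 288-residue first sequence: only its first 20 residues.
HEAD_TOY = "MISPFLVLAIGTCLTNSLVP"

print(meets_chimera_thresholds(HEAD_TOY + TAIL_P8, HEAD_TOY, TAIL_P8))  # True
print(meets_chimera_thresholds(HEAD_TOY + "A" * 9 + TAIL_P8[9:],
                               HEAD_TOY, TAIL_P8))  # False: only 19/28 tail residues match
```

With a 28-residue tail, the 70% threshold tolerates up to 8 substituted positions (20/28 is about 71.4%), but not 9 (19/28 is about 67.9%).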
Comparison report between HSAPHOL_P8 and O75090 (SEQ ID NO:958):

1. An isolated chimeric polypeptide encoding for HSAPHOL_P8, comprising a first amino acid sequence being at least 90% homologous to
MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL
GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT
AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA
TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK
NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG
corresponding to amino acids 1 - 288 of O75090, which also corresponds to amino acids 1 - 288 of HSAPHOL_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP corresponding to amino acids 289 - 316 of HSAPHOL_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSAPHOL_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP in HSAPHOL_P8.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HSAPHOL_P8 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein HSAPHOL_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 17 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
10 | I -> | No
124 | N -> S | Yes
143 | Q -> | No
177 | A -> | No
243 | R -> | No
263 | Y -> H | Yes
294 | R -> S | Yes
305 | G -> R | Yes
307 | A -> V | Yes

Variant protein HSAPHOL_P8 is encoded by the following transcript(s): HSAPHOL_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSAPHOL_T10 is shown in bold; this coding portion starts at position 253 and ends at position 1200.
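Assuming the stated coordinates are 1-based and inclusive, the length of the HSAPHOL_T10 coding portion gives a quick consistency check against the residue numbering used for HSAPHOL_P8 in the comparison reports:

$$
\frac{1200 - 253 + 1}{3} = \frac{948}{3} = 316\ \text{codons}
$$

This equals the last residue number (316) cited for HSAPHOL_P8, which suggests, although the text does not say so, that the stated coding portion does not include the stop codon. The same calculation for HSAPHOL_T9 (positions 253 to 1170) gives 918 / 3 = 306 codons.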
The transcript also has the following SNPs as listed in Table 18 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein HSAPHOL_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 18 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
179 | G -> C | No
231 | A -> | No
232 | A -> T | No
281 | T -> | No
582 | C -> T | Yes
623 | A -> G | Yes
681 | G -> | No
782 | C -> | No
980 | G -> | No
1039 | T -> C | Yes
1134 | G -> T | Yes
1165 | G -> A | Yes
1172 | C -> T | Yes
1376 | T -> C | Yes
1384 | G -> C | Yes
1565 | T -> G | Yes

As noted above, cluster HSAPHOL features 18 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster HSAPHOL_node_11 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T5, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 19 below describes the starting and ending position of this segment on each transcript.

Table 19 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSAPHOL_T10 | 149 | 313
HSAPHOL_T5 | 149 | 313
HSAPHOL_T7 | 149 | 313
HSAPHOL_T8 | 149 | 313
HSAPHOL_T9 | 149 | 313

Segment cluster HSAPHOL_node_13 according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described.
This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T4, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 20 below describes the starting and ending position of this segment on each transcript.

Table 20 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSAPHOL_T10 | 314 | 433
HSAPHOL_T4 | 149 | 268
HSAPHOL_T7 | 314 | 433
HSAPHOL_T8 | 314 | 433
HSAPHOL_T9 | 314 | 433

Segment cluster HSAPHOL_node_15 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T6. Table 21 below describes the starting and ending position of this segment on each transcript.

Table 21 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSAPHOL_T6 | 1 | 212

Segment cluster HSAPHOL_node_9 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 22 below describes the starting and ending position of this segment on each transcript.

Table 22 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSAPHOL_T10 | 550 | 724
HSAPHOL_T4 | 385 | 559
HSAPHOL_T5 | 430 | 604
HSAPHOL_T6 | 329 | 503
HSAPHOL_T7 | 550 | 724
HSAPHOL_T8 | 550 | 724
HSAPHOL_T9 | 550 | 724

Segment cluster HSAPHOL_node_2 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSAPHOL_T10 | 1 | 148
HSAPHOL_T4 | 1 | 148
HSAPHOL_T5 | 1 | 148
HSAPHOL_T7 | 1 | 148
HSAPHOL_T8 | 1 | 148
HSAPHOL_T9 | 1 | 148

Segment cluster HSAPHOL_node_21 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 24 below describes the starting and ending position of this segment on each transcript.

Table 24 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSAPHOL_T10 | 725 | 900
HSAPHOL_T4 | 560 | 735
HSAPHOL_T5 | 605 | 780
HSAPHOL_T6 | 504 | 679
HSAPHOL_T7 | 725 | 900
HSAPHOL_T8 | 725 | 900
HSAPHOL_T9 | 725 | 900

Segment cluster HSAPHOL_node_23 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 25 below describes the starting and ending position of this segment on each transcript.

Table 25 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSAPHOL_T10 | 901 | 1044
HSAPHOL_T4 | 736 | 879
HSAPHOL_T5 | 781 | 924
HSAPHOL_T6 | 680 | 823
HSAPHOL_T7 | 901 | 1044
HSAPHOL_T8 | 901 | 1044
HSAPHOL_T9 | 901 | 1044

Segment cluster HSAPHOL_node_26 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSAPHOL_T10 | 1115 | 1572

Segment cluster HSAPHOL_node_28 according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6 and HSAPHOL_T7. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSAPHOL_T4 | 950 | 1084
HSAPHOL_T5 | 995 | 1129
HSAPHOL_T6 | 894 | 1028
HSAPHOL_T7 | 1115 | 1249

Segment cluster HSAPHOL_node_38 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7 and HSAPHOL_T8. Table 28 below describes the starting and ending position of this segment on each transcript.

Table 28 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSAPHOL_T4 | 1277 | 1396
HSAPHOL_T5 | 1322 | 1441
HSAPHOL_T6 | 1221 | 1340
HSAPHOL_T7 | 1376 | 1495
HSAPHOL_T8 | 1307 | 1426

Segment cluster HSAPHOL_node_40 according to the present invention is supported by 69 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7 and HSAPHOL_T8. Table 29 below describes the starting and ending position of this segment on each transcript.

Table 29 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSAPHOL_T4 | 1397 | 1759
HSAPHOL_T5 | 1442 | 1804
HSAPHOL_T6 | 1341 | 1703
HSAPHOL_T7 | 1496 | 1858
HSAPHOL_T8 | 1427 | 1789

Segment cluster HSAPHOL_node_42 according to the present invention is supported by 99 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSAPHOL_T4 | 1870 | 2426
HSAPHOL_T5 | 1915 | 2471
HSAPHOL_T6 | 1814 | 2370
HSAPHOL_T7 | 1969 | 2525
HSAPHOL_T8 | 1900 | 2456
HSAPHOL_T9 | 1045 | 1601

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster HSAPHOL_node_16 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 31 below describes the starting and ending position of this segment on each transcript.

Table 31 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSAPHOL_T10 | 434 | 549
HSAPHOL_T4 | 269 | 384
HSAPHOL_T5 | 314 | 429
HSAPHOL_T6 | 213 | 328
HSAPHOL_T7 | 434 | 549
HSAPHOL_T8 | 434 | 549
HSAPHOL_T9 | 434 | 549

Segment cluster HSAPHOL_node_25 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7 and HSAPHOL_T8. Table 32 below describes the starting and ending position of this segment on each transcript.

Table 32 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSAPHOL_T10 | 1045 | 1114
HSAPHOL_T4 | 880 | 949
HSAPHOL_T5 | 925 | 994
HSAPHOL_T6 | 824 | 893
HSAPHOL_T7 | 1045 | 1114
HSAPHOL_T8 | 1045 | 1114

Segment cluster HSAPHOL_node_34 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described.
This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7 and HSAPHOL_T8. Table 33 below describes the starting and ending position of this segment on each transcript.

Table 33 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSAPHOL_T4 | 1085 | 1155
HSAPHOL_T5 | 1130 | 1200
HSAPHOL_T6 | 1029 | 1099
HSAPHOL_T7 | 1250 | 1320
HSAPHOL_T8 | 1115 | 1185

Segment cluster HSAPHOL_node_35 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6 and HSAPHOL_T8. Table 34 below describes the starting and ending position of this segment on each transcript.

Table 34 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSAPHOL_T4 | 1156 | 1221
HSAPHOL_T5 | 1201 | 1266
HSAPHOL_T6 | 1100 | 1165
HSAPHOL_T8 | 1186 | 1251

Segment cluster HSAPHOL_node_36 according to the present invention is supported by 47 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7 and HSAPHOL_T8. Table 35 below describes the starting and ending position of this segment on each transcript.

Table 35 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSAPHOL_T4 | 1222 | 1276
HSAPHOL_T5 | 1267 | 1321
HSAPHOL_T6 | 1166 | 1220
HSAPHOL_T7 | 1321 | 1375
HSAPHOL_T8 | 1252 | 1306

Segment cluster HSAPHOL_node_41 according to the present invention is supported by 60 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7 and HSAPHOL_T8. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSAPHOL_T4 | 1760 | 1869
HSAPHOL_T5 | 1805 | 1914
HSAPHOL_T6 | 1704 | 1813
HSAPHOL_T7 | 1859 | 1968
HSAPHOL_T8 | 1790 | 1899

Microarray (chip) data is also available for this gene as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (with regard to ovarian cancer), as shown in Table 37.

Table 37 - Oligonucleotides related to this gene
Oligonucleotide name | Overexpressed in cancers | Chip reference
HSAPHOL_0_11_0 | Ovarian cancer | Ovary

Variant protein alignment to the previously known protein:

Sequence name: /tmp/rTOip7OHMr/xEFXPsrVLD:PPBT_HUMAN

Sequence documentation:

Alignment of: HSAPHOL_P2 x PPBT_HUMAN

Alignment segment 1/1:

Quality: 4926.00
Escore: 0
Matching length: 507
Total length: 507
Matching Percent Similarity: 99.61
Matching Percent Identity: 99.41
Total Percent Similarity: 99.61
Total Percent Identity: 99.41
Gaps: 0

Alignment:

 47 LCAEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTV 96
    |  |||||||||||||||||||||||||||||||||||||||||||||||
 18 LVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTV 67

 97 TAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAY 146
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 68 TAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAY 117

147 LCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTT 196
    ||||||||||||||||||||||||||||||||||||||||||||||||||
118 LCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTT 167

197 RVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDV 246
    ||||||||||||||||||||||||||||||||||||||||||||||||||
168 RVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDV 217

247 IMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSH 296
    |||||||||||||||||||||||||||||||||||||||||||||:||||
218 IMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRHKHSH 267

297 FIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAI 346
    ||||||||||||||||||||||||||||||||||||||||||||||||||
268 FIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAI 317

347 QILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTS 396
    ||||||||||||||||||||||||||||||||||||||||||||||||||
318 QILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTS 367

397 SEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGN 446
    ||||||||||||||||||||||||||||||||||||||||||||||||||
368 SEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGN 417

447 GPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPM 496
    ||||||||||||||||||||||||||||||||||||||||||||||||||
418 GPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPM 467

497 AHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALAL 546
    ||||||||||||||||||||||||||||||||||||||||||||||||||
468 AHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALAL 517

547 YPLSVLF 553
    |||||||
518 YPLSVLF 524

Sequence name: /tmp/rTOip70HMr/xEFXPsrVLD:AAH21289

Sequence documentation:

Alignment of: HSAPHOL_P2 x AAH21289

Alignment segment 1/1:

Quality: 5108.00
Escore: 0
Matching length: 531
Total length: 586
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 90.61
Total Percent Identity: 90.61
Gaps: 1

Alignment:

 23 PATPRPLSWLRAPTRLCLDGPSPVLCA....................... 49
    |||||||||||||||||||||||||||
  1 PATPRPLSWLRAPTRLCLDGPSPVLCAGLEHQLTSDHCQPTPSHPRRLHL 50

 50 ................................EKEKDPKYWRDQAQETLK 67
                                    ||||||||||||||||||
 51 WAPGIKQVLGCTMISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLK 100

 68 YALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLE 117
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLE 150

118 MDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSR 167
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 MDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSR 200

168 CNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYS 217
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYS 250

218 DNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYES 267
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 DNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYES 300

268 DEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG 317
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 DEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG 350

318 LFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDH 367
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 LFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDH 400

368 GHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGY 417
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 GHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGY 450

418 TPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAH 467
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 TPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAH 500

468 NNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAA 517
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 NNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAA 550

518 CIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF 553
    ||||||||||||||||||||||||||||||||||||
551 CIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF 586

Sequence name: /tmp/pYLJnulFqm/UcqrrsA3UA:PPBT_HUMAN

Sequence documentation:

Alignment of: HSAPHOL_P3 x PPBT_HUMAN

Alignment segment 1/1:

Quality: 4615.00
Escore: 0
Matching length: 484
Total length: 524
Matching Percent Similarity: 100.00
Matching Percent Identity: 99.79
Total Percent Similarity: 92.37
Total Percent Identity: 92.18
Gaps: 1

Alignment:

  1 MISPFLVLAIGTCLTNSLVP.............................. 20
    ||||||||||||||||||||
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50

 21 ..........GMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 60
              ||||||||||||||||||||||||||||||||||||||||
 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100

 61 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 110
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150

111 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 160
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200

161 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 210
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250

211 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 260
    ||||||||||||:|||||||||||||||||||||||||||||||||||||
251 DLVDTWKSFKPRHKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 300

261 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 310
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 350

311 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 360
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 400

361 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 410
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 450

411 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 460
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 500

461 SSAGSLAAGPLLLALALYPLSVLF 484
    ||||||||||||||||||||||||
501 SSAGSLAAGPLLLALALYPLSVLF 524

Sequence name: /tmp/pYLJnulFqm/UcqrrsA3UA:AAH21289

Sequence documentation:

Alignment of: HSAPHOL_P3 x AAH21289

Alignment segment 1/1:

Quality: 4626.00
Escore: 0
Matching length: 484
Total length: 524
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 92.37
Total Percent Identity: 92.37
Gaps: 1

Alignment:

  1 MISPFLVLAIGTCLTNSLVP.............................. 20
    ||||||||||||||||||||
 63 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 112

 21 ..........GMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 60
              ||||||||||||||||||||||||||||||||||||||||
113 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 162

 61 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 110
    ||||||||||||||||||||||||||||||||||||||||||||||||||
163 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 212

111 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 160
    ||||||||||||||||||||||||||||||||||||||||||||||||||
213 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 262

161 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 210
    ||||||||||||||||||||||||||||||||||||||||||||||||||
263 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 312

211 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 260
    ||||||||||||||||||||||||||||||||||||||||||||||||||
313 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 362

261 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 310
    ||||||||||||||||||||||||||||||||||||||||||||||||||
363 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 412

311 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 360
    ||||||||||||||||||||||||||||||||||||||||||||||||||
413 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 462

361 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 410
    ||||||||||||||||||||||||||||||||||||||||||||||||||
463 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 512

411 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 460
    ||||||||||||||||||||||||||||||||||||||||||||||||||
513 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 562

461 SSAGSLAAGPLLLALALYPLSVLF 484
    ||||||||||||||||||||||||
563 SSAGSLAAGPLLLALALYPLSVLF 586

Sequence name: /tmp/iYbOicGuUc/lMWHKKvSld:PPBT_HUMAN

Sequence documentation:

Alignment of: HSAPHOL_P4 x PPBT_HUMAN

Alignment segment 1/1:

Quality: 4517.00
Escore: 0
Matching length: 463
Total length: 463
Matching Percent Similarity: 100.00
Matching Percent Identity: 99.78
Total Percent Similarity: 100.00
Total Percent Identity: 99.78
Gaps: 0

Alignment:

  1 MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSA 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 62 MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSA 111

 51 GTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
112 GTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSV 161

101 GIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHN 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
162 GIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHN 211

151 IRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
212 IRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKP 261

201 RYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSE 250
    |:||||||||||||||||||||||||||||||||||||||||||||||||
262 RHKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSE 311

251 MVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
312 MVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQ 361

301 AGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFT 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
362 AGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFT 411

351 AILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAV 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
412 AILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAV 461

401 FSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPL 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
462 FSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPL 511

451 LLALALYPLSVLF 463
    |||||||||||||
512 LLALALYPLSVLF 524

Sequence name: /tmp/iYbOicGuUc/1MWHKKvS1d:AAH21289

Sequence documentation:

Alignment of: HSAPHOL_P4 x AAH21289

Alignment segment 1/1:

Quality: 4528.00
Escore: 0
Matching length: 463
Total length: 463
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:

  1 MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSA 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
124 MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSA 173

 51 GTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
174 GTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSV 223

101 GIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHN 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
224 GIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHN 273

151 IRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
274 IRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKP 323

201 RYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSE 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
324 RYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSE 373

251 MVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
374 MVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQ 423

301 AGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFT 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
424 AGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFT 473

351 AILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAV 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
474 AILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAV 523

401 FSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPL 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
524 FSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPL 573

451 LLALALYPLSVLF 463
    |||||||||||||
574 LLALALYPLSVLF 586

Sequence name: /tmp/v0YiupJ4xl/W6HH5Tm6Ym:PPBT_HUMAN

Sequence documentation:

Alignment of: HSAPHOL_P5 x PPBT_HUMAN

Alignment segment 1/1:

Quality: 4816.00
Escore: 0
Matching length: 502
Total length: 524
Matching Percent Similarity: 100.00
Matching Percent Identity: 99.80
Total Percent Similarity: 95.80
Total Percent Identity: 95.61
Gaps: 1

Alignment:

  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250

251 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 300
    ||||||||||||:|||||||||||||||||||||||||||||||||||||
251 DLVDTWKSFKPRHKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 300

301 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 350

351 EAVEM......................DHSHVFTFGGYTPRGNSIFGLAP 378
    |||||                      |||||||||||||||||||||||
351 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 400

379 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 428
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 450

429 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 478
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 500

479 SSAGSLAAGPLLLALALYPLSVLF 502
    ||||||||||||||||||||||||
501 SSAGSLAAGPLLLALALYPLSVLF 524

Sequence name: /tmp/v0YiupJ4xl/W6HH5Tm6Ym:AAH21289

Sequence documentation:

Alignment of: HSAPHOL_P5 x AAH21289

Alignment segment 1/1:

Quality: 4827.00
Escore: 0
Matching length: 502
Total length: 524
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 95.80
Total Percent Identity: 95.80
Gaps: 1

Alignment:

  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 63 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 112

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
113 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 162

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
163 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 212

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
213 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 262

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
263 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 312

251 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
313 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 362

301 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
363 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 412

351 EAVEM......................DHSHVFTFGGYTPRGNSIFGLAP 378
    |||||                      |||||||||||||||||||||||
413 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 462

379 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 428
    ||||||||||||||||||||||||||||||||||||||||||||||||||
463 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 512

429 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 478
    ||||||||||||||||||||||||||||||||||||||||||||||||||
513 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 562

479 SSAGSLAAGPLLLALALYPLSVLF 502
    ||||||||||||||||||||||||
563 SSAGSLAAGPLLLALALYPLSVLF 586

Sequence name: /tmp/Llylq0ddii/1FFtdNNCUx:PPBT_HUMAN

Sequence documentation:

Alignment of: HSAPHOL_P6 x PPBT_HUMAN
Alignment segment 1/1:
Quality: 4575.00 Escore: 0
Matching length: 479 Total length: 524
Matching Percent Similarity: 100.00 Matching Percent Identity: 99.79
Total Percent Similarity: 91.41 Total Percent Identity: 91.22
Gaps: 1
Alignment:

  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250

251 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL............. 287
    ||||||||||||:||||||||||||||||||||||||
251 DLVDTWKSFKPRHKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 300

288 ................................GGRIDHGHHEGKAKQALH 305
                                    ||||||||||||||||||
301 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 350

306 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 355
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 400

356 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 405
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 450

406 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 455
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 500

456 SSAGSLAAGPLLLALALYPLSVLF 479
    ||||||||||||||||||||||||
501 SSAGSLAAGPLLLALALYPLSVLF 524

Sequence name: /tmp/Llylq0ddii/1FFtdNNCUx:AAH21289
Sequence documentation:
Alignment of: HSAPHOL_P6 x AAH21289
Alignment segment 1/1:
Quality: 4586.00 Escore: 0
Matching length: 479 Total length: 524
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 91.41 Total Percent Identity: 91.41
Gaps: 1
Alignment:

  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 63 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 112

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
113 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 162

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
163 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 212

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
213 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 262

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
263 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 312

251 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL............. 287
    |||||||||||||||||||||||||||||||||||||
313 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 362

288 ................................GGRIDHGHHEGKAKQALH 305
                                    ||||||||||||||||||
363 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 412

306 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 355
    ||||||||||||||||||||||||||||||||||||||||||||||||||
413 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 462

356 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 405
    ||||||||||||||||||||||||||||||||||||||||||||||||||
463 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 512

406 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 455
    ||||||||||||||||||||||||||||||||||||||||||||||||||
513 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 562

456 SSAGSLAAGPLLLALALYPLSVLF 479
    ||||||||||||||||||||||||
563 SSAGSLAAGPLLLALALYPLSVLF 586

Sequence name: /tmp/K05Xam2Hdo/CV0GTdjKcW:PPBT_HUMAN
Sequence documentation:
Alignment of: HSAPHOL_P7 x PPBT_HUMAN
Alignment segment 1/1:
Quality: 2574.00 Escore: 0
Matching length: 264 Total length: 264
Matching Percent Similarity: 100.00 Matching Percent Identity: 99.62
Total Percent Similarity: 100.00 Total Percent Identity: 99.62
Gaps: 0
Alignment:

  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250

251 DLVDTWKSFKPRYK 264
    ||||||||||||:|
251 DLVDTWKSFKPRHK 264

Sequence name: /tmp/KO5Xam2Hdo/CVOGTdjKcW:AAH21289
Sequence documentation:
Alignment of: HSAPHOL_P7 x AAH21289
Alignment segment 1/1:
Quality: 2585.00 Escore: 0
Matching length: 264 Total length: 264
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 63 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 112

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
113 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 162

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
163 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 212

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
213 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 262

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
263 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 312

251 DLVDTWKSFKPRYK 264
    ||||||||||||||
313 DLVDTWKSFKPRYK 326

Sequence name: /tmp/K05Xam2Hdo/CVOGTdjKcW:O75090
Sequence documentation:
Alignment of: HSAPHOL_P7 x O75090
Alignment segment 1/1:
Quality: 2585.00 Escore: 0
Matching length: 264 Total length: 264
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250

251 DLVDTWKSFKPRYK 264
    ||||||||||||||
251 DLVDTWKSFKPRYK 264

Sequence name: /tmp/H6G7vkGMmy/rSljwUOCll:PPBT_HUMAN
Sequence documentation:
Alignment of: HSAPHOL_P8 x PPBT_HUMAN
Alignment segment 1/1:
Quality: 2819.00 Escore: 0
Matching length: 288 Total length: 288
Matching Percent Similarity: 100.00 Matching Percent Identity: 99.65
Total Percent Similarity: 100.00 Total Percent Identity: 99.65
Gaps: 0
Alignment:

  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250

251 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG 288
    ||||||||||||:|||||||||||||||||||||||||
251 DLVDTWKSFKPRHKHSHFIWNRTELLTLDPHNVDYLLG 288

Sequence name: /tmp/H6G7vkGMmy/rSljwUOCll:AAH21289
Sequence documentation:
Alignment of: HSAPHOL_P8 x AAH21289
Alignment segment 1/1:
Quality: 2830.00 Escore: 0
Matching length: 288 Total length: 288
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 63 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 112

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
113 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 162

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
163 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 212

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
213 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 262

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
263 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 312

251 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG 288
    ||||||||||||||||||||||||||||||||||||||
313 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG 350

Sequence name: /tmp/H6G7vkGMmy/rSljwUOCll:O75090
Sequence documentation:
Alignment of: HSAPHOL_P8 x O75090
Alignment segment 1/1:
Quality: 2830.00 Escore: 0
Matching length: 288 Total length: 288
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250

251 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG 288
    ||||||||||||||||||||||||||||||||||||||
251 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG 288

DESCRIPTION FOR CLUSTER T10888

Cluster T10888 features 4 transcript(s) and 8 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.

Table 1 - Transcripts of interest
Transcript Name | SEQ ID NO:
T10888_PEA_1_T1 | 44
T10888_PEA_1_T4 | 45
T10888_PEA_1_T5 | 46
T10888_PEA_1_T6 | 47

Table 2 - Segments of interest
Segment Name | SEQ ID NO:
T10888_PEA_1_node_11 | 48
T10888_PEA_1_node_12 | 49
T10888_PEA_1_node_17 | 50
T10888_PEA_1_node_4 | 51
T10888_PEA_1_node_6 | 52
T10888_PEA_1_node_7 | 53
T10888_PEA_1_node_9 | 54
T10888_PEA_1_node_15 | 55

Table 3 - Proteins of interest
Protein Name | SEQ ID NO:
T10888_PEA_1_P2 | 57
T10888_PEA_1_P4 | 58
T10888_PEA_1_P5 | 59
T10888_PEA_1_P6 | 60

These sequences are variants of the known protein Carcinoembryonic antigen-related cell adhesion molecule 6 precursor (SwissProt accession identifier CEA6_HUMAN; also known by the synonyms Normal cross-reacting antigen, Nonspecific crossreacting antigen and CD66c antigen), SEQ ID NO:56, referred to herein as the previously known protein.

The sequence for protein Carcinoembryonic antigen-related cell adhesion molecule 6 precursor is given at the end of the application, as "Carcinoembryonic antigen-related cell adhesion molecule 6 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.

Table 4 - Amino acid mutations for Known Protein
SNP position(s) on amino acid sequence | Alternative amino acid(s)
138 | F -> L
239 | V -> G

The protein Carcinoembryonic antigen-related cell adhesion molecule 6 precursor is believed to be attached to the membrane by a GPI anchor.
The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Cancer. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Immunostimulant. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Imaging agent; Anticancer; Immunostimulant; Immunoconjugate; Monoclonal antibody, murine; Antisense therapy; antibody.

The following GO annotations apply to the previously known protein: signal transduction and cell-cell signaling, which are annotations related to Biological Process; and integral plasma membrane protein, which is an annotation related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or LocusLink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.

Cluster T10888 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods.
The term "number" in the right-hand column of Table 5 and the numbers on the y-axis of Figure 9 refer to the weighted expression of ESTs in each category as "parts per million" (the ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, expressed in parts per million).
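The parts-per-million normalization described above is a simple ratio scaled by one million. The following is a minimal Python sketch under stated assumptions: the helper name `ests_per_million` is hypothetical, it uses plain (unweighted) EST counts because the weighting scheme is not described in this section, and the counts in the usage line are invented for illustration rather than taken from the application.

```python
def ests_per_million(cluster_ests: float, category_ests: float) -> float:
    """Express a cluster's EST count as parts per million of all ESTs in a category.

    Hypothetical helper: plain counts only; the weighting applied in the
    application is not reproduced here.
    """
    if category_ests <= 0:
        raise ValueError("category_ests must be positive")
    # Multiply first so that exact integer inputs give an exact result.
    return cluster_ests * 1_000_000 / category_ests


# Invented counts: 5 cluster ESTs out of 50,000 ESTs in one tissue category.
print(ests_per_million(5, 50_000))  # 100.0
```

Expressing every category on the same per-million scale is what allows tissues with very different library sizes to be compared in one histogram.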
Overall, the following results were obtained, as shown in the histograms in Figure 9 and in Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: colorectal cancer, a mixture of malignant tumors from different tissues, pancreas carcinoma and gastric carcinoma.

Table 5 - Normal tissue distribution
Name of Tissue | Number
bladder | 0
colon | 107
epithelial | 52
general | 22
head and neck | 40
lung | 237
breast | 0
pancreas | 32
prostate | 12
stomach | 0

Table 6 - P values and ratios for expression in cancerous tissue
Name of Tissue | P1 | P2 | SP1 | R3 | SP2 | R4
bladder | 5.4e-01 | 3.4e-01 | 5.6e-01 | 1.8 | 4.6e-01 | 1.9
colon | 1.2e-01 | 1.7e-01 | 2.8e-05 | 3.7 | 7.9e-04 | 2.8
epithelial | 3.3e-02 | 2.1e-01 | 2.8e-20 | 2.8 | 4.8e-10 | 1.9
general | 3.3e-05 | 2.2e-03 | 1.9e-44 | 4.9 | 4.6e-27 | 3.3
head and neck | 4.6e-01 | 4.3e-01 | 1 | 0.8 | 7.5e-01 | 1.0
lung | 7.6e-01 | 8.2e-01 | 8.9e-01 | 0.6 | 1 | 0.3
breast | 3.7e-02 | 4.1e-02 | 1.5e-01 | 3.3 | 3.1e-01 | 2.4
pancreas | 2.6e-01 | 2.4e-01 | 8.6e-23 | 2.8 | 1.5e-19 | 4.5
prostate | 9.1e-01 | 9.3e-01 | 4.1e-02 | 1.2 | 1.0e-01 | 1.0
stomach | 4.5e-02 | 5.6e-02 | 5.1e-04 | 4.1 | 4.7e-04 | 6.3

As noted above, cluster T10888 features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode protein(s) that are variant(s) of protein Carcinoembryonic antigen-related cell adhesion molecule 6 precursor. A description of each variant protein according to the present invention is now provided.

Variant protein T10888_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T10888_PEA_1_T1. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T10888_PEA_1_P2 and CEA6_HUMAN:

1. An isolated chimeric polypeptide encoding for T10888_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to
MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLP
QNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTG
FYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYL
WWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLY
GPDVPTISPSKANYRPGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGS
YMCQAHNSATGLNRTTVTMITVS
corresponding to amino acids 1 - 319 of CEA6_HUMAN, which also corresponds to amino acids 1 - 319 of T10888_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DWTRP corresponding to amino acids 320 - 324 of T10888_PEA_1_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of T10888_PEA_1_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%,
The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.. 10 Variant protein T10888_PEA_1 P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7, (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888 PEA 1 P2 sequence provides support for the deduced sequence of this variant protein according to the 15 present invention). Table 7 - Amino acid mutations SNP position(s} on amino aci Aternive amino acid() Previously known SNP? 13 V ->No 232 N -> D No 324 P -> No 63 I -> No 92 G -> No Variant protein T10888_PEAlP2 is encoded by the following transcript(s): T10888_PEA _IT1, for which the sequence(s) is/are given at the end of the application. The 20 coding portion of transcript T10888_PEA 1_TI is shown in bold; this coding portion starts at position 151 and ends at position 1122. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of WO 2005/116850 PCT/IB2005/002555 347 known SNPs in variant protein T10888_PEA_I_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Nucleic acid SNPs SNP position Oni uclotide Alternative nuclicaci Previously kliuin SNP? 
119 | C -> T | No
120 | A -> T | No
1062 | A -> G | Yes
1120 | C -> | No
1297 | G -> T | Yes
1501 | A -> G | Yes
1824 | G -> A | No
2036 | A -> C | No
2036 | A -> G | No
2095 | A -> C | No
2242 | A -> C | No
2245 | A -> C | No
189 | C -> | No
2250 | A -> T | Yes
2339 | C -> A | Yes
276 | G -> A | Yes
338 | T -> | No
424 | G -> | No
546 | A -> G | No
702 | C -> T | No
844 | A -> G | No
930 | C -> T | Yes

Variant protein T10888_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T10888_PEA_1_T4. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T10888_PEA_1_P4 and CEA6_HUMAN:

1. An isolated chimeric polypeptide encoding for T10888_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to
MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLP
QNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTG
FYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYL
WWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL
corresponding to amino acids 1 - 234 of CEA6_HUMAN, which also corresponds to amino acids 1 - 234 of T10888_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLLSSQLWPPSASRLECWPGWL corresponding to amino acids 235 - 256 of T10888_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T10888_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLLSSQLWPPSASRLECWPGWL in T10888_PEA_1_P4.

Comparison report between T10888_PEA_1_P4 and Q13774 (SEQ ID NO:959):

1. An isolated chimeric polypeptide encoding for T10888_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to
MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLP
QNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTG
FYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYL
WWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL
corresponding to amino acids 1 - 234 of Q13774, which also corresponds to amino acids 1 - 234 of T10888_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLLSSQLWPPSASRLECWPGWL corresponding to amino acids 235 - 256 of T10888_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of T10888_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLLSSQLWPPSASRLECWPGWL in T10888_PEA_1_P4.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted.
The protein is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein T10888_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein T10888_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 9 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
13 | V -> | No
232 | N -> D | No
63 | I -> | No
92 | G -> | No

Variant protein T10888_PEA_1_P4 is encoded by the following transcript(s): T10888_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T10888_PEA_1_T4 is shown in bold; this coding portion starts at position 151 and ends at position 918. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein T10888_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 10 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
119 | C -> T | No
120 | A -> T | No
978 | C -> | No
1155 | G -> T | Yes
1359 | A -> G | Yes
1682 | G -> A | No
1894 | A -> C | No
1894 | A -> G | No
1953 | A -> C | No
2100 | A -> C | No
2103 | A -> C | No
2108 | A -> T | Yes
189 | C -> | No
2197 | C -> A | Yes
276 | G -> A | Yes
338 | T -> | No
424 | G -> | No
546 | A -> G | No
702 | C -> T | No
844 | A -> G | No
958 | G -> | No

Variant protein T10888_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T10888_PEA_1_T5. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T10888_PEA_1_P5 and CEA6_HUMAN:

1. An isolated chimeric polypeptide encoding for T10888_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to
MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLP
QNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTG
FYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYL
WWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLY
GPDVPTISPSKANYRPGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGS
YMCQAHNSATGLNRTTVTMITVSG
corresponding to amino acids 1 - 320 of CEA6_HUMAN, which also corresponds to amino acids 1 - 320 of T10888_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
KWIHEALASHFQVESGSQRRARKKFSFPTCVQGAHANPKFSPEPSQFTSADSFPLVFLFF
VVFCFLISHV corresponding to amino acids 321 - 390 of T10888_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of T10888_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
KWIHEALASHFQVESGSQRRARKKFSFPTCVQGAHANPKFSPEPSQFTSADSFPLVFLFF
VVFCFLISHV in T10888_PEA_1_P5.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein is believed to be membrane-localized because, although both signal-peptide prediction programs agree that this protein has a signal peptide, both trans-membrane region prediction programs predict that this protein has a trans-membrane region downstream of this signal peptide.

Variant protein T10888_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein T10888_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 11 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
13 | V -> | No
232 | N -> D | No
63 | I -> | No
92 | G -> | No

Variant protein T10888_PEA_1_P5 is encoded by the following transcript(s): T10888_PEA_1_T5, for which the sequence(s) is/are given at the end of the application.
The coding portion of transcript T10888_PEA_1_T5 is shown in bold; this coding portion starts at position 151 and ends at position 1320. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the sequence of variant protein T10888_PEA_1_P5 provides support for the deduced sequence of this variant protein according to the present invention).

Table 12 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
119 | C -> T | No
120 | A -> T | No
1062 | A -> G | Yes
1943 | C -> A | Yes
2609 | C -> T | Yes
2647 | C -> G | No
2701 | C -> T | Yes
2841 | T -> C | Yes
189 | C -> | No
276 | G -> A | Yes
338 | T -> | No
424 | G -> | No
546 | A -> G | No
702 | C -> T | No
844 | A -> G | No
930 | C -> T | Yes

Variant protein T10888_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T10888_PEA_1_T6. An alignment is given to the known protein (Carcinoembryonic antigen related cell adhesion molecule 6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
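The coding-portion coordinates quoted for the transcripts (positions 151 to 1320 of T10888_PEA_1_T5, and 151 to 699 of T10888_PEA_1_T6 further on) can be cross-checked against the residue numbers used in the claims and SNP tables. The sketch below assumes 1-based, inclusive transcript coordinates, which the text does not state explicitly; the helper names `codon_count` and `residue_at` are hypothetical.

```python
def codon_count(cds_start: int, cds_end: int) -> int:
    """Codons in a coding portion given as 1-based, inclusive transcript positions."""
    length = cds_end - cds_start + 1
    if length % 3:
        raise ValueError(f"coding portion length {length} is not a multiple of 3")
    return length // 3


def residue_at(cds_start: int, nt_position: int) -> int:
    """1-based residue whose codon contains the given transcript position."""
    if nt_position < cds_start:
        raise ValueError("position lies upstream of the coding portion")
    return (nt_position - cds_start) // 3 + 1


# Coordinates quoted in the text for T10888_PEA_1_T5 and T10888_PEA_1_T6.
print(codon_count(151, 1320))                               # 390
print(codon_count(151, 699))                                # 183
print([residue_at(151, p) for p in (189, 338, 424, 844)])   # [13, 63, 92, 232]
```

Under that assumption, the codon counts equal the last residue numbers quoted for P5 (390) and P6 (183). The nucleotide SNPs at positions 189, 338, 424 and 844 fall in codons 13, 63, 92 and 232, which are the amino acid positions listed in the amino acid mutation tables. This agreement is offered as a consistency check only.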
Comparison report between T10888_PEA_1_P6 and CEA6_HUMAN:

1. An isolated chimeric polypeptide encoding for T10888_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVY corresponding to amino acids 1 - 141 of CEA6_HUMAN, which also corresponds to amino acids 1 - 141 of T10888_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence REYFHMTSGCWGSVLLPTYGIVRPGLCLWPSLHYILYQGLDI corresponding to amino acids 142 - 183 of T10888_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of T10888_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence REYFHMTSGCWGSVLLPTYGIVRPGLCLWPSLHYILYQGLDI in T10888_PEA_1_P6.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
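The alignment reports given later in this section quote both a "Matching" and a "Total" percent identity. The sketch below shows that arithmetic. It assumes, without confirmation from the text, that identity is the number of identical columns divided either by the gap-free column count or by the full alignment length. The function name is hypothetical, and "homologous" as used in the claims may be defined differently. Under this reading, the second T10888_PEA_1_P6 x CEA6_HUMAN report (141 identical residues over a 183-residue alignment) comes out at the reported 100.00% matching and 77.05% total identity.

```python
def percent_identity(query: str, subject: str) -> tuple[float, float]:
    """Return (matching %, total %) identity for two pre-aligned strings.

    '.' marks a column in which one sequence has no residue (a gap).
    Matching % divides identical columns by the number of gap-free columns;
    total % divides them by the full alignment length.
    """
    if len(query) != len(subject):
        raise ValueError("aligned strings must have equal length")
    paired = [(q, s) for q, s in zip(query, subject) if q != "." and s != "."]
    identical = sum(q == s for q, s in paired)
    return (round(100 * identical / len(paired), 2),
            round(100 * identical / len(query), 2))


# Residues 1-141 shared by T10888_PEA_1_P6 and CEA6_HUMAN, then the 42-residue P6 tail.
HEAD = ("MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE"
        "VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET"
        "IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVY")
TAIL = "REYFHMTSGCWGSVLLPTYGIVRPGLCLWPSLHYILYQGLDI"
QUERY = HEAD + TAIL
SUBJECT = HEAD + "." * len(TAIL)   # no CEA6_HUMAN residues opposite the tail

print(percent_identity(QUERY, SUBJECT))  # (100.0, 77.05)
```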
Variant protein T10888_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the sequence of variant protein T10888_PEA_1_P6 provides support for the deduced sequence of this variant protein according to the present invention).

Table 13 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
13 | V -> | No
63 | I -> | No
92 | G -> | No

Variant protein T10888_PEA_1_P6 is encoded by the following transcript(s): T10888_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T10888_PEA_1_T6 is shown in bold; this coding portion starts at position 151 and ends at position 699. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the sequence of variant protein T10888_PEA_1_P6 provides support for the deduced sequence of this variant protein according to the present invention).

Table 14 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
119 | C -> T | No
120 | A -> T | No
189 | C -> | No
276 | G -> A | Yes
338 | T -> | No
424 | G -> | No
546 | A -> G | No

As noted above, cluster T10888 features 8 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster T10888_PEA_1_node_11 according to the present invention is supported by 57 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T1 and T10888_PEA_1_T5. Table 15 below describes the starting and ending position of this segment on each transcript.

Table 15 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T10888_PEA_1_T1 | 854 | 1108
T10888_PEA_1_T5 | 854 | 1108

Segment cluster T10888_PEA_1_node_12 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T5. Table 16 below describes the starting and ending position of this segment on each transcript.

Table 16 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T10888_PEA_1_T5 | 1109 | 3004

Segment cluster T10888_PEA_1_node_17 according to the present invention is supported by 160 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T1 and T10888_PEA_1_T4. Table 17 below describes the starting and ending position of this segment on each transcript.

Table 17 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T10888_PEA_1_T1 | 1109 | 2518
T10888_PEA_1_T4 | 967 | 2376

Segment cluster T10888_PEA_1_node_4 according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T1, T10888_PEA_1_T4, T10888_PEA_1_T5 and T10888_PEA_1_T6. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T10888_PEA_1_T1 | 1 | 214
T10888_PEA_1_T4 | 1 | 214
T10888_PEA_1_T5 | 1 | 214
T10888_PEA_1_T6 | 1 | 214

Segment cluster T10888_PEA_1_node_6 according to the present invention is supported by 81 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T1, T10888_PEA_1_T4, T10888_PEA_1_T5 and T10888_PEA_1_T6. Table 19 below describes the starting and ending position of this segment on each transcript.

Table 19 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T10888_PEA_1_T1 | 215 | 574
T10888_PEA_1_T4 | 215 | 574
T10888_PEA_1_T5 | 215 | 574
T10888_PEA_1_T6 | 215 | 574

Segment cluster T10888_PEA_1_node_7 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T6. Table 20 below describes the starting and ending position of this segment on each transcript.

Table 20 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T10888_PEA_1_T6 | 575 | 1410

Segment cluster T10888_PEA_1_node_9 according to the present invention is supported by 72 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T1, T10888_PEA_1_T4 and T10888_PEA_1_T5. Table 21 below describes the starting and ending position of this segment on each transcript.

Table 21 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T10888_PEA_1_T1 | 575 | 853
T10888_PEA_1_T4 | 575 | 853
T10888_PEA_1_T5 | 575 | 853

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided.
These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster T10888_PEA_1_node_15 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T4. Table 22 below describes the starting and ending position of this segment on each transcript.

Table 22 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T10888_PEA_1_T4 | 854 | 966

Variant protein alignment to the previously known protein:

Sequence name: /tmp/tM4EgaoKvm/vuztUrlRc7:CEA6_HUMAN
Sequence documentation:
Alignment of: T10888_PEA_1_P2 x CEA6_HUMAN
Alignment segment 1/1:
Quality: 3163.00
Escore: 0
Matching length: 319
Total length: 319
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50

 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100

101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150

151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200

201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLYGPDVPTISPSKANYR 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLYGPDVPTISPSKANYR 250

251 PGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGSYMCQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 PGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGSYMCQ 300

301 AHNSATGLNRTTVTMITVS 319
    |||||||||||||||||||
301 AHNSATGLNRTTVTMITVS 319

Sequence name: /tmp/Yjllgj7TCe/PgdufzLOlW:CEA6_HUMAN
Sequence documentation:
Alignment of: T10888_PEA_1_P4 x CEA6_HUMAN
Alignment segment 1/1:
Quality: 2310.00
Escore: 0
Matching length: 234
Total length: 234
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50

 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100

101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150

151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200

201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL 234
    ||||||||||||||||||||||||||||||||||
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL 234

Sequence name: /tmp/Yjllgj7TCe/PgdufzLOlW:Q13774
Sequence documentation:
Alignment of: T10888_PEA_1_P4 x Q13774
Alignment segment 1/1:
Quality: 2310.00
Escore: 0
Matching length: 234
Total length: 234
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50

 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100

101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150

151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200

201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL 234
    ||||||||||||||||||||||||||||||||||
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL 234

Sequence name: /tmp/x5xDBacdpj/rTXRGepv3y:CEA6_HUMAN
Sequence documentation:
Alignment of: T10888_PEA_1_P5 x CEA6_HUMAN
Alignment segment 1/1:
Quality: 3172.00
Escore: 0
Matching length: 320
Total length: 320
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50

 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100

101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150

151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200

201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLYGPDVPTISPSKANYR 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLYGPDVPTISPSKANYR 250

251 PGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGSYMCQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 PGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGSYMCQ 300

301 AHNSATGLNRTTVTMITVSG 320
    ||||||||||||||||||||
301 AHNSATGLNRTTVTMITVSG 320

Sequence name: /tmp/VAhvYFeatq/QNEM573uCo:CEA6_HUMAN
Sequence documentation:
Alignment of: T10888_PEA_1_P6 x CEA6_HUMAN
Alignment segment 1/1:
Quality: 1393.00
Escore: 0
Matching length: 143
Total length: 143
Matching Percent Similarity: 99.30
Matching Percent Identity: 99.30
Total Percent Similarity: 99.30
Total Percent Identity: 99.30
Gaps: 0
Alignment:

  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50

 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100

101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYRE 143
    ||||||||||||||||||||||||||||||||||||||||| |
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPE 143

Alignment of: T10888_PEA_1_P6 x CEA6_HUMAN
Alignment segment 1/1:
Quality: 101.00
Escore: 0
Matching length: 141
Total length: 183
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 77.05
Total Percent Identity: 77.05
Gaps: 1
Alignment:

  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50

 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100

101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYREYFHMTSG 150
    |||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVY......... 141

151 CWGSVLLPTYGIVRPGLCLWPSLHYILYQGLDI 183

141 ................................. 141

Expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 T10888 transcripts which are detectable by amplicon as depicted in sequence name T10888junc11-17 in normal and cancerous ovary tissues

Expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 transcripts detectable by or according to the T10888junc11-17 amplicon(s) and the T10888junc11-17F and T10888junc11-17R primers was measured by real-time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and GAPDH (GenBank Accession No. BC026907; amplicon - GAPDH-amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48 and 71, Table 1, "Tissue samples in testing panel", above) to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.

Figure 10 is a histogram showing over-expression of the above-indicated CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 transcripts in cancerous ovary samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained. The number and percentage of samples that exhibit at least 20-fold over-expression, out of the total number of samples tested, are indicated at the bottom.
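The per-sample arithmetic described above can be sketched as a short Python function. It divides each sample by the geometric mean of its housekeeping genes, divides again by the median of the normal post-mortem samples, and counts samples at or above 20-fold. This is a minimal illustration under stated assumptions, not the software actually used. The function name, the dictionary layout and all quantities in the usage example are hypothetical, and the averaging of duplicate experiments is left out.

```python
from statistics import geometric_mean, median


def fold_over_normal(target, housekeeping, normal_ids, threshold=20.0):
    """Normalize each sample to its housekeeping genes, then to the normal median.

    target:       sample id -> quantity of the target amplicon
    housekeeping: sample id -> quantities of the housekeeping-gene amplicons
    normal_ids:   ids of the reference (e.g. normal post-mortem) samples
    Returns the fold value per sample and the ids at or above `threshold`.
    """
    normalized = {s: q / geometric_mean(housekeeping[s]) for s, q in target.items()}
    reference = median(normalized[s] for s in normal_ids)
    fold = {s: v / reference for s, v in normalized.items()}
    over = [s for s, f in fold.items() if f >= threshold]
    return fold, over


# Hypothetical quantities, for illustration only.
target = {"N1": 1.0, "N2": 2.0, "N3": 3.0, "C1": 240.0, "C2": 10.0}
hk = {s: [1.0, 1.0, 1.0, 1.0] for s in target}
hk["C1"] = [2.0, 8.0, 2.0, 8.0]          # geometric mean of about 4.0
fold, over = fold_over_normal(target, hk, ["N1", "N2", "N3"])
print(round(fold["C1"], 1), over)         # 30.0 ['C1']
```

The geometric mean is used here because the text specifies it for the housekeeping genes. It keeps a single highly expressed reference gene from dominating the normalization factor.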
As is evident from Figure 10, the expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 45-48 and 71, Table 1, "Tissue samples in testing panel"), including the benign samples (Sample Nos. 56-65). Notably, an over-expression of at least 20-fold was found in 25 out of 43 adenocarcinoma samples.

Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 transcripts detectable by the above amplicon(s) in ovary cancer samples versus the normal tissue samples was determined by T test as 3.79E-02.
A threshold of 20-fold over-expression was found to differentiate between cancer and normal samples with a P value of 1.97E-02, as checked by Fisher's exact test. The above values demonstrate the statistical significance of the results.

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: T10888junc11-17F forward primer; and T10888junc11-17R reverse primer.

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: T10888junc11-17.

T10888junc11-17F (SEQ ID NO:960)
CCAGCAATCCACACAAGAGCT

T10888junc11-17R (SEQ ID NO:961)
CAGGGTCTGGTCCAATCAGAG

T10888junc11-17 (SEQ ID NO:962)
CCAGCAATCCACACAAGAGCTCTTTATCCCCAACATCACTGTGAATAATAGCGGAT
CCTATATGTGCCAAGCCCATAACTCAGCCACTGGCCTCAATAGGACCACAGTCACG
ATGATCACAGTCTCTGATTGGACCAGACCCTG

Expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 T10888 transcripts which are detectable by amplicon as depicted in sequence name T10888junc11-17 in different normal tissues
Expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 transcripts detectable by or according to the T10888junc11-17 amplicon(s) and the T10888junc11-17F and T10888junc11-17R primers was measured by real-time PCR. In parallel, the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 2 above, "Tissue samples in normal panel") to obtain a value of expression of each sample relative to the median of the ovary samples.

The results are described in Figure 11, which presents a histogram of the expression of T10888 transcripts detectable by the amplicon depicted in sequence name T10888junc11-17 in different normal tissues. Amplicon and primers are as above.

DESCRIPTION FOR CLUSTER HSECADH

Cluster HSECADH features 4 transcript(s) and 30 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.

Table 1 - Transcripts of interest
Transcript Name | SEQ ID NO:
HSECADH_T11 | 61
HSECADH_T18 | 62
HSECADH_T19 | 63
HSECADH_T20 | 64

Table 2 - Segments of interest
Segment Name | SEQ ID NO:
HSECADH_node_0 | 65
HSECADH_node_14 | 66
HSECADH_node_15 | 67
HSECADH_node_21 | 68
HSECADH_node_22 | 69
HSECADH_node_25 | 70
HSECADH_node_26 | 71
HSECADH_node_48 | 72
HSECADH_node_52 | 73
HSECADH_node_53 | 74
HSECADH_node_54 | 75
HSECADH_node_57 | 76
HSECADH_node_60 | 77
HSECADH_node_62 | 78
HSECADH_node_63 | 79
HSECADH_node_7 | 80
HSECADH_node_1 | 81
HSECADH_node_11 | 82
HSECADH_node_12 | 83
HSECADH_node_17 | 84
HSECADH_node_18 | 85
HSECADH_node_19 | 86
HSECADH_node_3 | 87
HSECADH_node_42 | 88
HSECADH_node_45 | 89
HSECADH_node_46 | 90
HSECADH_node_55 | 91
HSECADH_node_56 | 92
HSECADH_node_58 | 93
HSECADH_node_59 | 94

Table 3 - Proteins of interest
Protein Name | SEQ ID NO:
HSECADH_P9 | 96
HSECADH_P13 | 97
HSECADH_P14 | 98
HSECADH_P15 | 99

These sequences are variants of the known protein Epithelial-cadherin precursor (SwissProt accession identifier CADH1_HUMAN; known also according to the synonyms E-cadherin; Uvomorulin; Cadherin-1; CAM 120/80), SEQ ID NO:95, referred to herein as the previously known protein.

The variant proteins according to the present invention are variants of a known diagnostic marker, called E-Cadherin.

Protein Epithelial-cadherin is known or believed to have the following function(s): Cadherins are calcium-dependent cell adhesion proteins. They preferentially interact with themselves in a homophilic manner in connecting cells; cadherins may thus contribute to the sorting of heterogeneous cell types. E-cadherin has a potent invasive suppressor role. It is also a ligand for integrin alpha-E/beta-7. The sequence for protein Epithelial-cadherin precursor is given at the end of the application, as "Epithelial-cadherin precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.

Table 4 - Amino acid mutations for Known Protein
SNP position(s) on amino acid sequence | Comment
123 | H -> Y (in diffuse gastric cancer). /FTId=VAR_001306.
193 | T -> P (in diffuse gastric cancer). /FTId=VAR_001307.
418 - 423 | Missing (in gastric carcinoma). /FTId=VAR_001313.
463 | E -> Q (in diffuse gastric cancer). /FTId=VAR_001314.
470 | T -> I. /FTId=VAR_001315.
473 | V -> D (in diffuse gastric cancer). /FTId=VAR_001317.
487 | V -> A (in HDGC). /FTId=VAR_008713.
592 | A -> T (in thyroid cancer; may play a role in colorectal carcinogenesis). /FTId=VAR_001318.
598 | R -> Q (in diffuse gastric cancer). /FTId=VAR_001319.
617 | A -> T (in endometrial cancer; loss of heterozygosity). /FTId=VAR_001320.
711 | L -> V (in endometrial cancer). /FTId=VAR_001321.
838 | S -> G (in ovarian cancer; loss of heterozygosity). /FTId=VAR_001322.
244 | D -> G (in HDGC). /FTId=VAR_008712.
10 | A -> G
16 - 51 | QVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRV -> RSPLGSQERSPPPCLTRELHVHGAPAPPEKRPR
68 - 75 | YFSLDTRF -> IFLTPIP
95 - 102 | QIHFLVYA -> TDPFLGLR
483 | A -> G
530 | A -> R
543 | S -> F
615 | I -> H
634 - 636 | ASA -> RVP
868 | R -> P
270 | S -> A (may contribute to prostate cancer). /FTId=VAR_013970.
882 | D -> H
274 - 277 | Missing (in gastric adenocarcinoma). /FTId=VAR_001308.
315 | N -> S (in lobular breast carcinoma). /FTId=VAR_001309.
336 | E -> D. /FTId=VAR_001310.
340 | T -> A (in HDGC and colorectal cancer). /FTId=VAR_013971.
370 | D -> A (in diffuse gastric cancer). /FTId=VAR_001311.
400 | Missing (in gastric carcinoma; loss of heterozygosity). /FTId=VAR_001312.

Protein Epithelial-cadherin is believed to be localized as a type I membrane protein.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: cell adhesion; homophilic cell adhesion, which are annotation(s) related to Biological Process; calcium binding; protein binding, which are annotation(s) related to Molecular Function; and membrane; integral membrane protein, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or LocusLink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.

Cluster HSECADH can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the right-hand column of the table and the numbers on the y-axis of Figure 12 refer to weighted expression of ESTs in each category, as "parts per million" (the ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, expressed in parts per million).

Overall, the following results were obtained, as shown in the histograms in Figure 12 and in Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: a mixture of malignant tumors from different tissues and ovarian carcinoma.
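The parts-per-million figure defined above is a ratio of EST counts. The sketch below assumes unweighted counts; the text refers to weighted expression but does not describe the weighting here. The function name and the counts are hypothetical.

```python
def ests_per_million(cluster_ests: int, category_ests: int) -> float:
    """Share of a tissue category's ESTs that belong to one cluster, in parts per million."""
    if category_ests <= 0:
        raise ValueError("category must contain at least one EST")
    return 1_000_000 * cluster_ests / category_ests


# Hypothetical counts for one tissue category.
print(ests_per_million(66, 100_000))  # 660.0
```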
Table 5 - Normal tissue distribution
Name of Tissue | Number
bladder | 41
brain | 3
colon | 299
epithelial | 190
general | 67
head and neck | 10
kidney | 103
liver | 9
lung | 93
breast | 52
ovary | 0
pancreas | 105
prostate | 279
skin | 457
stomach | 659
Thyroid | 64
uterus | 118

Table 6 - P values and ratios for expression in cancerous tissue
Name of Tissue | P1 | P2 | SP1 | R3 | SP2 | R4
bladder | 3.9e-01 | 3.4e-01 | 4.1e-01 | 1.7 | 3.8e-01 | 1.7
brain | 3.7e-01 | 4.9e-01 | 1 | 1.4 | 1 | 1.0
colon | 6.6e-01 | 7.4e-01 | 9.5e-01 | 0.6 | 9.3e-01 | 0.5
epithelial | 1.3e-01 | 6.8e-01 | 9.5e-01 | 0.8 | 1 | 0.5
general | 1.6e-06 | 1.5e-03 | 6.3e-05 | 1.5 | 5.6e-01 | 0.9
head and neck | 1.5e-01 | 2.7e-01 | 4.6e-01 | 2.1 | 7.5e-01 | 1.2
kidney | 8.3e-01 | 8.7e-01 | 9.9e-01 | 0.4 | 1 | 0.3
liver | 4.4e-01 | 6.9e-01 | 1 | 1.7 | 6.9e-01 | 1.5
lung | 7.2e-01 | 8.8e-01 | 7.5e-01 | 0.9 | 9.9e-01 | 0.4
breast | 7.5e-02 | 1.1e-01 | 3.1e-01 | 1.7 | 5.1e-01 | 1.2
ovary | 4.5e-02 | 3.6e-02 | 4.7e-03 | 3.8 | 1.4e-02 | 3.5
pancreas | 5.5e-01 | 6.5e-01 | 2.4e-01 | 0.9 | 5.2e-01 | 0.7
prostate | 8.1e-01 | 8.5e-01 | 6.4e-01 | 0.8 | 9.0e-01 | 0.6
skin | 5.7e-01 | 7.4e-01 | 1 | 0.0 | 1 | 0.1
stomach | 2.2e-01 | 5.2e-01 | 1 | 0.2 | 1 | 0.1
Thyroid | 5.5e-01 | 5.5e-01 | 4.4e-01 | 1.6 | 4.4e-01 | 1.6
uterus | 5.0e-02 | 2.4e-01 | 1.0e-01 | 1.3 | 5.8e-01 | 0.8

As noted above, cluster HSECADH features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Epithelial-cadherin precursor. A description of each variant protein according to the present invention is now provided.

Variant protein HSECADH_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSECADH_T11. An alignment is given to the known protein (Epithelial-cadherin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSECADH_P9 and Q9UII7 (SEQ ID NO:963):

1. An isolated chimeric polypeptide encoding for HSECADH_P9, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEG corresponding to amino acids 1 - 274 of Q9UII7, which also corresponds to amino acids 1 - 274 of HSECADH_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG corresponding to amino acids 275 - 322 of HSECADH_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HSECADH_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG in HSECADH_P9.
Comparison report between HSECADH_P9 and Q9UII8 (SEQ ID NO:964): 10 1 .An isolated chimeric polypeptide encoding for HSECADHP9, comprising a first amino acid sequence being at least 90 % homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI 15 KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEG corresponding to amino acids 1 274 of Q9UII8, which also corresponds to amino acids 1 - 274 of HSECADHP9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having 20 the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG corresponding to amino acids 275 - 322 of HSECADH P9, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2.An isolated polypeptide encoding for a tail of HSECADHP9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably 25 at least about 90% and most preferably at least about 95% homologous to the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG in HSECADH P9. Comparison report between HSECADH_P9 and CADI_HUMAN: 1.An isolated chimeric polypeptide encoding for HSECADHP9, comprising a first amino 30 acid sequence being at least 90 % homologous to
MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED
CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT
VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI
KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN
GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEG
corresponding to amino acids 1 - 274 of CADI_HUMAN, which also corresponds to amino acids 1 - 274 of HSECADH_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG corresponding to amino acids 275 - 322 of HSECADH_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSECADH_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG in HSECADH_P9.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
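The localization call above combines two kinds of prediction: every signal-peptide predictor must report a signal peptide, and no trans-membrane predictor may report a trans-membrane region. Apart from SignalP, the section does not name the programs. The minimal Python sketch below therefore represents each predictor only as a boolean. The function `localize` and its fallback label are hypothetical, and the source states only the "secreted" branch of the rule.

```python
def localize(signal_peptide_calls, tm_region_calls):
    """Apply the stated rule: "secreted" only if ALL signal-peptide
    predictors report a signal peptide and NO trans-membrane predictor
    reports a trans-membrane region. The fallback label is hypothetical;
    the source does not describe the other outcomes."""
    if signal_peptide_calls and all(signal_peptide_calls) and not any(tm_region_calls):
        return "secreted"
    return "not determined by this rule"


# Two signal-peptide predictors (e.g. SignalP and one unnamed program) and
# two trans-membrane predictors, as described for HSECADH_P9.
print(localize([True, True], [False, False]))
```

Using `all` and `any` over lists reflects the "both ... neither" wording directly. An empty list of signal-peptide calls is rejected explicitly, because `all([])` is `True` in Python.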
Variant protein HSECADH_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSECADH_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
127 | P -> T | No
141 | T -> A | No
276 | A -> V | No
Variant protein HSECADH_P9 is encoded by the following transcript(s): HSECADH_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSECADH_T11 is shown in bold; this coding portion starts at position 125 and ends at position 1090. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSECADH_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
71 | G -> C | Yes
469 | G -> A | Yes
503 | C -> A | No
545 | A -> G | No
847 | A -> G | No
951 | C -> T | No
1331 | T -> C | No
1377 | G -> A | No
1487 | C -> A | Yes
1487 | C -> G | Yes
1487 | C -> T | Yes
1556 | C -> A | Yes
1556 | C -> G | Yes
1556 | C -> T | Yes
1603 | G -> A | Yes
1604 | G -> A | Yes
1688 | A -> G | Yes
1712 | T -> | No
1890 | T -> G | No
1895 | T -> G | No
2090 | C -> T | Yes
2621 | T -> A | Yes
2621 | T -> C | Yes
2621 | T -> G | Yes
2797 | -> G | No
2849 | G -> A | No
2992 | A -> C | No
3027 | C -> G | No
3029 | C -> A | No
3134 | T -> | No
3211 | T -> | No
3258 | A -> G | No
3336 | T -> C | Yes
Variant protein HSECADH_P13 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSECADH_T18. An alignment is given to the known protein (Epithelial-cadherin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSECADH_P13 and Q9UII7:
1. An isolated chimeric polypeptide encoding for HSECADH_P13, comprising a first amino acid sequence being at least 90 % homologous to
MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED
CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT
VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI
KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN
GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNT
YNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADLQGEGL
STTATAVITVTDTNDNPPIFNPTT
corresponding to amino acids 1 - 379 of Q9UII7, which also corresponds to amino acids 1 - 379 of HSECADH_P13, and a second amino acid sequence VIL corresponding to amino acids 380 - 382 of HSECADH_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
Comparison report between HSECADH_P13 and Q9UII8:
1. An isolated chimeric polypeptide encoding for HSECADH_P13, comprising a first amino acid sequence being at least 90 % homologous to
MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED
CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT
VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI
KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN
GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNT
YNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADLQGEGL
STTATAVITVTDTNDNPPIFNPTT
corresponding to amino acids 1 - 379 of Q9UII8, which also corresponds to amino acids 1 - 379 of HSECADH_P13, and a second amino acid sequence VIL corresponding to amino acids 380 - 382 of HSECADH_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
Comparison report between HSECADH_P13 and CADI_HUMAN:
1. An isolated chimeric polypeptide encoding for HSECADH_P13, comprising a first amino acid sequence being at least 90 % homologous to
MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED
CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT
VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI
KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN
GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNT
YNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADLQGEGL
STTATAVITVTDTNDNPPIFNPTT
corresponding to amino acids 1 - 379 of CADI_HUMAN, which also corresponds to amino acids 1 - 379 of HSECADH_P13, and a second amino acid sequence VIL corresponding to amino acids 380 - 382 of HSECADH_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs.
The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSECADH_P13 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSECADH_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
127 | P -> T | No
141 | T -> A | No
Variant protein HSECADH_P13 is encoded by the following transcript(s): HSECADH_T18, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSECADH_T18 is shown in bold; this coding portion starts at position 125 and ends at position 1270. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSECADH_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
71 | G -> C | Yes
469 | G -> A | Yes
503 | C -> A | No
545 | A -> G | No
847 | A -> G | No
1545 | A -> G | Yes
Variant protein HSECADH_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSECADH_T19. An alignment is given to the known protein (Epithelial-cadherin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSECADH_P14 and Q9UII7:
1. An isolated chimeric polypeptide encoding for HSECADH_P14, comprising a first amino acid sequence being at least 90 % homologous to
MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED
CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT
VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI
KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN
GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNT
YNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE
corresponding to amino acids 1 - 336 of Q9UII7, which also corresponds to amino acids 1 - 336 of HSECADH_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV corresponding to amino acids 337 - 373 of HSECADH_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSECADH_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV in HSECADH_P14.
Comparison report between HSECADH_P14 and Q9UII8:
1. An isolated chimeric polypeptide encoding for HSECADH_P14, comprising a first amino acid sequence being at least 90 % homologous to
MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED
CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT
VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI
KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN
GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNT
YNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE
corresponding to amino acids 1 - 336 of Q9UII8, which also corresponds to amino acids 1 - 336 of HSECADH_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV corresponding to amino acids 337 - 373 of HSECADH_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSECADH_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV in HSECADH_P14.
Comparison report between HSECADH_P14 and CADI_HUMAN:
1. An isolated chimeric polypeptide encoding for HSECADH_P14, comprising a first amino acid sequence being at least 90 % homologous to
MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED
CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT
VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI
KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN
GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNT
YNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE
corresponding to amino acids 1 - 336 of CADI_HUMAN, which also corresponds to amino acids 1 - 336 of HSECADH_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV corresponding to amino acids 337 - 373 of HSECADH_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSECADH_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV in HSECADH_P14.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
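The coordinates stated in this section can be cross-checked against each other. For each variant, the length of the coding portion on its transcript, divided by three, should equal the length of the region shared with the known protein plus the length of the variant-specific tail. For example, for HSECADH_P9: 1090 - 125 + 1 = 966 nt = 322 codons = 274 + 48 residues. The Python sketch below does this arithmetic for all four variants, including HSECADH_P15, which is described further on. It assumes inclusive 1-based coordinates, and `check_variant` is a hypothetical helper.

```python
# Values copied from this section: coding-portion start and end on the
# transcript, length of the N-terminal region shared with the known
# protein, and the variant-specific C-terminal sequence.
VARIANTS = {
    "HSECADH_P9 / HSECADH_T11": (125, 1090, 274, "TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG"),
    "HSECADH_P13 / HSECADH_T18": (125, 1270, 379, "VIL"),
    "HSECADH_P14 / HSECADH_T19": (125, 1243, 336, "VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV"),
    "HSECADH_P15 / HSECADH_T20": (125, 823, 229, "VSIS"),
}


def check_variant(cds_start, cds_end, shared_len, tail):
    """Return (codons in the coding portion, residues in shared region + tail)."""
    cds_len = cds_end - cds_start + 1  # inclusive coordinates (assumed)
    if cds_len % 3 != 0:
        raise ValueError("coding portion is not a whole number of codons")
    return cds_len // 3, shared_len + len(tail)


for name, params in VARIANTS.items():
    codons, residues = check_variant(*params)
    print(f"{name}: {codons} codons, {residues} residues")
```

All four variants agree (322, 382, 373 and 233), which supports reading the stated positions as inclusive.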
Variant protein HSECADH_P14 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSECADH_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
127 | P -> T | No
141 | T -> A | No
Variant protein HSECADH_P14 is encoded by the following transcript(s): HSECADH_T19, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSECADH_T19 is shown in bold; this coding portion starts at position 125 and ends at position 1243. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSECADH_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
71 | G -> C | Yes
469 | G -> A | Yes
503 | C -> A | No
545 | A -> G | No
847 | A -> G | No
Variant protein HSECADH_P15 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSECADH_T20. An alignment is given to the known protein (Epithelial-cadherin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
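Because the coding portion of transcript HSECADH_T19 is given as positions 125 to 1243, each nucleotide SNP in Table 12 can be placed on a codon of HSECADH_P14. The Python sketch below assumes 1-based inclusive transcript coordinates. It returns the codon number together with the base's position inside that codon (1 to 3), or `None` for positions outside the coding portion. The name `transcript_to_codon` is hypothetical.

```python
def transcript_to_codon(pos, cds_start, cds_end):
    """Map a 1-based transcript position to (codon number, position in codon).

    Returns None when the position lies outside the coding portion
    (for example in the 5' or 3' untranslated region).
    """
    if not cds_start <= pos <= cds_end:
        return None
    offset = pos - cds_start
    return offset // 3 + 1, offset % 3 + 1


# Coding portion of HSECADH_T19 and the nucleotide SNPs of Table 12.
T19_CDS = (125, 1243)
for snp in (71, 469, 503, 545, 847):
    print(snp, transcript_to_codon(snp, *T19_CDS))
```

With these coordinates, 503 and 545 fall on the first base of codons 127 and 141. This matches the P -> T and T -> A entries in Table 11: a C -> A change turns a proline codon (CCN) into a threonine codon (ACN), and an A -> G change turns ACN into an alanine codon (GCN). Positions 469 and 847 fall on third codon positions. That is consistent with their absence from Table 11, though it does not by itself prove they are silent. Position 71 lies before the coding portion.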
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSECADH_P15 and Q9UII7:
1. An isolated chimeric polypeptide encoding for HSECADH_P15, comprising a first amino acid sequence being at least 90 % homologous to
MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED
CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT
VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI
KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYT
corresponding to amino acids 1 - 229 of Q9UII7, which also corresponds to amino acids 1 - 229 of HSECADH_P15, and a second amino acid sequence VSIS corresponding to amino acids 230 - 233 of HSECADH_P15, wherein said first and second amino acid sequences are contiguous and in a sequential order.
Comparison report between HSECADH_P15 and Q9UII8:
1. An isolated chimeric polypeptide encoding for HSECADH_P15, comprising a first amino acid sequence being at least 90 % homologous to
MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED
CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT
VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI
KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYT
corresponding to amino acids 1 - 229 of Q9UII8, which also corresponds to amino acids 1 - 229 of HSECADH_P15, and a second amino acid sequence VSIS corresponding to amino acids 230 - 233 of HSECADH_P15, wherein said first and second amino acid sequences are contiguous and in a sequential order.
Comparison report between HSECADH_P15 and CADI_HUMAN:
1. An isolated chimeric polypeptide encoding for HSECADH_P15, comprising a first amino acid sequence being at least 90 % homologous to
MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED
CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT
VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI
KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYT
corresponding to amino acids 1 - 229 of CADI_HUMAN, which also corresponds to amino acids 1 - 229 of HSECADH_P15, and a second amino acid sequence VSIS corresponding to amino acids 230 - 233 of HSECADH_P15, wherein said first and second amino acid sequences are contiguous and in a sequential order.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSECADH_P15 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSECADH_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
127 | P -> T | No
141 | T -> A | No
Variant protein HSECADH_P15 is encoded by the following transcript(s): HSECADH_T20, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSECADH_T20 is shown in bold; this coding portion starts at position 125 and ends at position 823. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSECADH_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
71 | G -> C | Yes
469 | G -> A | Yes
503 | C -> A | No
545 | A -> G | No
955 | G -> A | Yes
As noted above, cluster HSECADH features 30 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
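Each segment is described by its starting and ending position on each transcript, so the segment tables in this description can be used to place a transcript SNP within a segment. The Python sketch below uses the HSECADH_T20 coordinates given in Tables 15-17, 30-33 and 37. It assumes 1-based inclusive positions, which is consistent with adjacent segments such as 1-166 and 167-172. It first checks that these segments tile T20 without gaps or overlaps, then looks up the segment containing each Table 14 SNP. The helpers `is_tiled` and `segment_at` are hypothetical.

```python
from bisect import bisect_right

# Segment coordinates on HSECADH_T20 (1-based, inclusive), sorted by start.
T20_SEGMENTS = [
    ("HSECADH_node_0", 1, 166),
    ("HSECADH_node_1", 167, 172),
    ("HSECADH_node_3", 173, 287),
    ("HSECADH_node_7", 288, 511),
    ("HSECADH_node_11", 512, 592),
    ("HSECADH_node_12", 593, 655),
    ("HSECADH_node_14", 656, 811),
    ("HSECADH_node_15", 812, 970),
]


def is_tiled(segments):
    """True if segments start at 1 and abut with no gaps or overlaps."""
    expected = 1
    for _, start, end in segments:
        if start != expected or end < start:
            return False
        expected = end + 1
    return True


def segment_at(segments, pos):
    """Name of the segment covering pos, or None (segments must be sorted)."""
    starts = [start for _, start, _ in segments]
    i = bisect_right(starts, pos) - 1
    if i >= 0 and pos <= segments[i][2]:
        return segments[i][0]
    return None


print(is_tiled(T20_SEGMENTS))
for snp in (71, 469, 503, 545, 955):  # nucleotide SNPs from Table 14
    print(snp, segment_at(T20_SEGMENTS, snp))
```

On these coordinates, SNP 955 falls in HSECADH_node_15, which is found only in HSECADH_T20 and lies past the end of that transcript's coding portion at position 823. The binary search via `bisect_right` is simply a convenient lookup. A linear scan would serve equally well for a handful of segments.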
Segment cluster HSECADH_node_0 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11, HSECADH_T18, HSECADH_T19 and HSECADH_T20. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T11 | 1 | 166
HSECADH_T18 | 1 | 166
HSECADH_T19 | 1 | 166
HSECADH_T20 | 1 | 166
Segment cluster HSECADH_node_14 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11, HSECADH_T18, HSECADH_T19 and HSECADH_T20. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T11 | 656 | 811
HSECADH_T18 | 656 | 811
HSECADH_T19 | 656 | 811
HSECADH_T20 | 656 | 811
Segment cluster HSECADH_node_15 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T20. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T20 | 812 | 970
Segment cluster HSECADH_node_21 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T18 and HSECADH_T19. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T18 | 957 | 1132
HSECADH_T19 | 957 | 1132
Segment cluster HSECADH_node_22 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T19. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T19 | 1133 | 1269
Segment cluster HSECADH_node_25 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T18. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T18 | 1133 | 1261
Segment cluster HSECADH_node_26 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T18. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T18 | 1262 | 1584
Segment cluster HSECADH_node_48 according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T11 | 1149 | 1292
Segment cluster HSECADH_node_52 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T11 | 1293 | 1449
Segment cluster HSECADH_node_53 according to the present invention is supported by 59 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T11 | 1450 | 1933
Segment cluster HSECADH_node_54 according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T11 | 1934 | 2053
Segment cluster HSECADH_node_57 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T11 | 2241 | 2430
Segment cluster HSECADH_node_60 according to the present invention is supported by 260 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T11 | 2504 | 3096
Segment cluster HSECADH_node_62 according to the present invention is supported by 173 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T11 | 3097 | 3245
Segment cluster HSECADH_node_63 according to the present invention is supported by 162 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T11 | 3246 | 3544
Segment cluster HSECADH_node_7 according to the present invention is supported by 21 libraries.
The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11, HSECADH_T18, HSECADH_T19 and HSECADH_T20. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T11 | 288 | 511
HSECADH_T18 | 288 | 511
HSECADH_T19 | 288 | 511
HSECADH_T20 | 288 | 511
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSECADH_node_1 according to the present invention can be found in the following transcript(s): HSECADH_T11, HSECADH_T18, HSECADH_T19 and HSECADH_T20. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T11 | 167 | 172
HSECADH_T18 | 167 | 172
HSECADH_T19 | 167 | 172
HSECADH_T20 | 167 | 172
Segment cluster HSECADH_node_11 according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11, HSECADH_T18, HSECADH_T19 and HSECADH_T20. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T11 | 512 | 592
HSECADH_T18 | 512 | 592
HSECADH_T19 | 512 | 592
HSECADH_T20 | 512 | 592
Segment cluster HSECADH_node_12 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11, HSECADH_T18, HSECADH_T19 and HSECADH_T20.
Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T11 | 593 | 655
HSECADH_T18 | 593 | 655
HSECADH_T19 | 593 | 655
HSECADH_T20 | 593 | 655
Segment cluster HSECADH_node_17 according to the present invention can be found in the following transcript(s): HSECADH_T11, HSECADH_T18 and HSECADH_T19. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T11 | 812 | 827
HSECADH_T18 | 812 | 827
HSECADH_T19 | 812 | 827
Segment cluster HSECADH_node_18 according to the present invention is supported by 41 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11, HSECADH_T18 and HSECADH_T19. Table 35 below describes the starting and ending position of this segment on each transcript.
Table 35 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T11 | 828 | 944
HSECADH_T18 | 828 | 944
HSECADH_T19 | 828 | 944
Segment cluster HSECADH_node_19 according to the present invention can be found in the following transcript(s): HSECADH_T18 and HSECADH_T19. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T18 | 945 | 956
HSECADH_T19 | 945 | 956
Segment cluster HSECADH_node_3 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11, HSECADH_T18, HSECADH_T19 and HSECADH_T20. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSECADH_T11 | 173 | 287
HSECADH_T18 | 173 | 287
HSECADH_T19 | 173 | 287
HSECADH_T20 | 173 | 287
Segment cluster HSECADH_node_42 according to the present invention is supported by 43 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11.
Table 38 below describes the starting and ending position of this segment on each transcript.

Table 38 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSECADH_T11        945                          1017

Segment cluster HSECADH_node_45 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 39 below describes the starting and ending position of this segment on each transcript.

Table 39 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSECADH_T11        1018                         1051

Segment cluster HSECADH_node_46 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 40 below describes the starting and ending position of this segment on each transcript.

Table 40 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSECADH_T11        1052                         1148

Segment cluster HSECADH_node_55 according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 41 below describes the starting and ending position of this segment on each transcript.

Table 41 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSECADH_T11        2054                         2166

Segment cluster HSECADH_node_56 according to the present invention is supported by 42 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 42 below describes the starting and ending position of this segment on each transcript.
Table 42 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSECADH_T11        2167                         2240

Segment cluster HSECADH_node_58 according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 43 below describes the starting and ending position of this segment on each transcript.

Table 43 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSECADH_T11        2431                         2481

Segment cluster HSECADH_node_59 according to the present invention can be found in the following transcript(s): HSECADH_T11. Table 44 below describes the starting and ending position of this segment on each transcript.

Table 44 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSECADH_T11        2482                         2503

Variant protein alignment to the previously known protein:

Sequence name: /tmp/2xOI2XZlA3/JXvUszCm30:Q9UII7
Sequence documentation:
Alignment of: HSECADH_P9 x Q9UII7
Alignment segment 1/1:
Quality: 2727.00
Escore: 0
Matching length: 274
Total length: 274
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50

 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100

101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150

151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200

201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250

251 TVTDQNDNKPEFTQEVFKGSVMEG 274
    ||||||||||||||||||||||||
251 TVTDQNDNKPEFTQEVFKGSVMEG 274

Sequence name: /tmp/2x0I2XZlA3/JXvUszCm30:Q9UII8
Sequence documentation:
Alignment of: HSECADH_P9 x Q9UII8
Alignment segment 1/1:
Quality: 2727.00
Escore: 0
Matching length: 274
Total length: 274
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50

 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100

101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150

151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200

201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250

251 TVTDQNDNKPEFTQEVFKGSVMEG 274
    ||||||||||||||||||||||||
251 TVTDQNDNKPEFTQEVFKGSVMEG 274

Sequence name: /tmp/2xOI2XZlA3/JXvUszCm30:CAD1_HUMAN
Sequence documentation:
Alignment of: HSECADH_P9 x CAD1_HUMAN
Alignment segment 1/1:
Quality: 2727.00
Escore: 0
Matching length: 274
Total length: 274
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50

 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100

101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150

151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200

201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250

251 TVTDQNDNKPEFTQEVFKGSVMEG 274
    ||||||||||||||||||||||||
251 TVTDQNDNKPEFTQEVFKGSVMEG 274

Sequence name: /tmp/e5Y8HiBmjB/iwybld8ikl:Q9UII7
Sequence documentation:
Alignment of: HSECADH_P13 x Q9UII7
Alignment segment 1/1:
Quality: 3720.00
Escore: 0
Matching length: 379
Total length: 379
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50

 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100

101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150

151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200

201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250

251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300

301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADL 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADL 350

351 QGEGLSTTATAVITVTDTNDNPPIFNPTT 379
    |||||||||||||||||||||||||||||
351 QGEGLSTTATAVITVTDTNDNPPIFNPTT 379

Sequence name: /tmp/e5Y8HiBmjB/iwybld8ikl:Q9UII8
Sequence documentation:
Alignment of: HSECADH_P13 x Q9UII8
Alignment segment 1/1:
Quality: 3720.00
Escore: 0
Matching length: 379
Total length: 379
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50

 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100

101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150

151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200

201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250

251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300

301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADL 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADL 350

351 QGEGLSTTATAVITVTDTNDNPPIFNPTT 379
    |||||||||||||||||||||||||||||
351 QGEGLSTTATAVITVTDTNDNPPIFNPTT 379

Sequence name: /tmp/e5Y8HiBmjB/iwybld8ikl:CAD1_HUMAN
Sequence documentation:
Alignment of: HSECADH_P13 x CAD1_HUMAN
Alignment segment 1/1:
Quality: 3720.00
Escore: 0
Matching length: 379
Total length: 379
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50

 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100

101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150

151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200

201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250

251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300

301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADL 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADL 350

351 QGEGLSTTATAVITVTDTNDNPPIFNPTT 379
    |||||||||||||||||||||||||||||
351 QGEGLSTTATAVITVTDTNDNPPIFNPTT 379

Sequence name: /tmp/RtiX8vFyZe/iovNeRHKWU:Q9UII7
Sequence documentation:
Alignment of: HSECADH_P14 x Q9UII7
Alignment segment 1/1:
Quality: 3313.00
Escore: 0
Matching length: 336
Total length: 336
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50

 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100

101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150

151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200

201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250

251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300

301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE 336
    ||||||||||||||||||||||||||||||||||||
301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE 336

Sequence name: /tmp/RtiX8vFyZe/iovNeRHKWU:Q9UII8
Sequence documentation:
Alignment of: HSECADH_P14 x Q9UII8
Alignment segment 1/1:
Quality: 3313.00
Escore: 0
Matching length: 336
Total length: 336
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50

 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100

101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150

151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200

201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250

251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300

301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE 336
    ||||||||||||||||||||||||||||||||||||
301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE 336

Sequence name: /tmp/RtiX8vFyZe/iovNeRHKWU:CAD1_HUMAN
Sequence documentation:
Alignment of: HSECADH_P14 x CAD1_HUMAN
Alignment segment 1/1:
Quality: 3313.00
Escore: 0
Matching length: 336
Total length: 336
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50

 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100

101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150

151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200

201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250

251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300

301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE 336
    ||||||||||||||||||||||||||||||||||||
301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE 336

Sequence name: /tmp/rMRrwmuokD/lrmk2jOfgw:Q9UII7
Sequence documentation:
Alignment of: HSECADH_P15 x Q9UII7
Alignment segment 1/1:
Quality: 2289.00
Escore: 0
Matching length: 229
Total length: 229
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50

 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100

101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150

151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200

201 PVGVFIIERETGWLKVTEPLDRERIATYT 229
    |||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYT 229

Sequence name: /tmp/rMRrwmuokD/lrmk2jOfgw:Q9UII8
Sequence documentation:
Alignment of: HSECADH_P15 x Q9UII8
Alignment segment 1/1:
Quality: 2289.00
Escore: 0
Matching length: 229
Total length: 229
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50

 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100

101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150

151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200

201 PVGVFIIERETGWLKVTEPLDRERIATYT 229
    |||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYT 229

Sequence name: /tmp/rMRrwmuokD/lrmk2jOfgw:CAD1_HUMAN
Sequence documentation:
Alignment of: HSECADH_P15 x CAD1_HUMAN
Alignment segment 1/1:
Quality: 2289.00
Escore: 0
Matching length: 229
Total length: 229
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50

 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100

101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150

151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200

201 PVGVFIIERETGWLKVTEPLDRERIATYT 229
    |||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYT 229

DESCRIPTION FOR CLUSTER HUMGRP5E

Cluster HUMGRP5E features 2 transcript(s) and 5 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.

Table 1 - Transcripts of interest
Transcript name    SEQ ID NO:
HUMGRP5E_T4        100
HUMGRP5E_T5        101

Table 2 - Segments of interest
Segment name       SEQ ID NO:
HUMGRP5E_node_0    102
HUMGRP5E_node_2    103
HUMGRP5E_node_8    104
HUMGRP5E_node_3    105
HUMGRP5E_node_7    106

Table 3 - Proteins of interest
Protein name       SEQ ID NO:
HUMGRP5E_P4        108
HUMGRP5E_P5        109

These sequences are variants of the known protein Gastrin-releasing peptide precursor (SwissProt accession identifier GRP_HUMAN; known also according to the synonyms GRP; GRP-10), SEQ ID NO: 107, referred to herein as the previously known protein.
Gastrin-releasing peptide is known or believed to have the following function(s): stimulates gastrin release as well as other gastrointestinal hormones. The sequence for protein Gastrin-releasing peptide precursor is given at the end of the application, as "Gastrin-releasing peptide precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.

Table 4 - Amino acid mutations for Known Protein
SNP position(s) on amino acid sequence    Comment
4                                         S -> R

The localization of protein Gastrin-releasing peptide is believed to be secreted.

The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Diabetes, Type II. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Bombesin antagonist; Insulinotropin agonist. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anorectic/Antiobesity; Releasing hormone; Anticancer; Respiratory; Antidiabetic.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: signal transduction; neuropeptide signaling pathway, which are annotation(s) related to Biological Process; growth factor, which are annotation(s) related to Molecular Function; and secreted, which are annotation(s) related to Cellular Component.
The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or LocusLink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.

As noted above, cluster HUMGRP5E features 2 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Gastrin-releasing peptide precursor. A description of each variant protein according to the present invention is now provided.

Variant protein HUMGRP5E_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMGRP5E_T4. An alignment is given to the known protein (Gastrin-releasing peptide precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMGRP5E_P4 and GRP_HUMAN:

1. An isolated chimeric polypeptide encoding for HUMGRP5E_P4, comprising a first amino acid sequence being at least 90% homologous to MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLMGKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQPKALGNQQPSWDSEDSSNFKDVGSKGK corresponding to amino acids 1-127 of GRP_HUMAN, which also corresponds to amino acids 1-127 of HUMGRP5E_P4, and a second amino acid sequence being at least 90% homologous to GSQREGRNPQLNQQ corresponding to amino acids 135-148 of GRP_HUMAN, which also corresponds to amino acids 128-141 of HUMGRP5E_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HUMGRP5E_P4, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KG, having a structure as follows: a sequence starting from any of amino acid numbers 127 - x to 127; and ending at any of amino acid numbers 128 + ((n - 2) - x), in which x varies from 0 to n - 2.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMGRP5E_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein HUMGRP5E_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 5 - Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
4                                         S -> R                       Yes

Variant protein HUMGRP5E_P4 is encoded by the following transcript(s): HUMGRP5E_T4, for which the sequence(s) is/are given at the end of the application.
The coding portion of transcript HUMGRP5E_T4 is shown in bold; this coding portion starts at position 622 and ends at position 1044. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein HUMGRP5E_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 6 - Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
541                                    -> T                        No
542                                    G -> T                      No
631                                    A -> C                      Yes
672                                    G -> A                      Yes
1340                                   C ->                        No
1340                                   C -> A                      No
1341                                   A ->                        No
1341                                   A -> G                      No

Variant protein HUMGRP5E_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMGRP5E_T5. An alignment is given to the known protein (Gastrin-releasing peptide precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMGRP5E_P5 and GRP_HUMAN:

1. An isolated chimeric polypeptide encoding for HUMGRP5E_P5, comprising a first amino acid sequence being at least 90% homologous to MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLMGKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQPKALGNQQPSWDSEDSSNFKDVGSKGK corresponding to amino acids 1-127 of GRP_HUMAN, which also corresponds to amino acids 1-127 of HUMGRP5E_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DSLLQVLNVKEGTPS corresponding to amino acids 128-142 of HUMGRP5E_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HUMGRP5E_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DSLLQVLNVKEGTPS in HUMGRP5E_P5.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
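Both comparison reports describe a variant as GRP_HUMAN residues 1-127 followed by a variant-specific C-terminal piece, and item 2 of the HUMGRP5E_P4 report defines the edge portion by a start/end formula around residues 127 and 128. The Python sketch below rebuilds both variants from the exact sequences quoted in the reports (it ignores the homology ranges) and enumerates the length-n windows that the formula describes. The helper `edge_peptides` and its `junction` parameter are hypothetical names chosen for illustration; they are not part of the application.

```python
# Hypothetical sketch: rebuild HUMGRP5E_P4 and HUMGRP5E_P5 from the exact
# sequences quoted in the comparison reports (homology ranges are ignored)
# and list the edge-portion windows described for HUMGRP5E_P4.
# All positions are 1-based and inclusive, as in the reports.

# Residues 1-127, shared by GRP_HUMAN, HUMGRP5E_P4 and HUMGRP5E_P5.
SHARED_1_127 = (
    "MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLM"
    "GKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQ"
    "PKALGNQQPSWDSEDSSNFKDVGSKGK"
)
GRP_135_148 = "GSQREGRNPQLNQQ"  # GRP_HUMAN 135-148 = HUMGRP5E_P4 128-141
P5_TAIL = "DSLLQVLNVKEGTPS"      # HUMGRP5E_P5 128-142 (unique tail)

p4 = SHARED_1_127 + GRP_135_148
p5 = SHARED_1_127 + P5_TAIL


def edge_peptides(seq, junction, n):
    """Return the length-n windows of `seq` that contain residues
    `junction` and `junction + 1`, following the report's pattern:
    start = junction - x, end = (junction + 1) + ((n - 2) - x), x = 0..n-2.
    """
    windows = []
    for x in range(n - 1):  # x runs from 0 to n-2 inclusive
        start = junction - x
        end = (junction + 1) + ((n - 2) - x)
        if start >= 1 and end <= len(seq):  # skip windows that run off either end
            windows.append(seq[start - 1:end])  # 1-based inclusive -> Python slice
    return windows


if __name__ == "__main__":
    print(len(p4), len(p5))
    print(p4[126:128])  # residues 127 and 128 of HUMGRP5E_P4
    for peptide in edge_peptides(p4, junction=127, n=10):
        print(peptide)
```

With n = 10 the sketch yields nine windows (x = 0 to 8), each of length 10 and each carrying the K-G pair formed by residues 127 and 128; a larger n yields more, longer windows.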
Variant protein HUMGRP5E_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein HUMGRP5E_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 7 - Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
4                                         S -> R                       Yes

Variant protein HUMGRP5E_P5 is encoded by the following transcript(s): HUMGRP5E_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMGRP5E_T5 is shown in bold; this coding portion starts at position 622 and ends at position 1047. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein HUMGRP5E_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 8 - Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
541                                    -> T                        No
542                                    G -> T                      No
631                                    A -> C                      Yes
672                                    G -> A                      Yes
1354                                   C ->                        No
1354                                   C -> A                      No
1355                                   A ->                        No
1355                                   A -> G                      No

As noted above, cluster HUMGRP5E features 5 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
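As a quick arithmetic check on the two coding portions stated above (positions 622 to 1044 on HUMGRP5E_T4 and 622 to 1047 on HUMGRP5E_T5, both taken as 1-based and inclusive), the Python sketch below converts each interval into nucleotide and codon counts. The `CODING_PORTIONS` dictionary and the `codon_count` helper are hypothetical names used only for illustration.

```python
# Hypothetical sketch: convert the stated coding portions (1-based, inclusive)
# into nucleotide and codon counts.

CODING_PORTIONS = {
    "HUMGRP5E_T4": (622, 1044),  # encodes HUMGRP5E_P4
    "HUMGRP5E_T5": (622, 1047),  # encodes HUMGRP5E_P5
}


def codon_count(start, end):
    """Return the number of whole codons in the inclusive interval start..end."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"interval {start}-{end} is not a whole number of codons")
    return length // 3


for transcript, (start, end) in CODING_PORTIONS.items():
    print(f"{transcript}: {end - start + 1} nt, {codon_count(start, end)} codons")
```

The intervals give 423 and 426 nucleotides, that is 141 and 142 codons, which matches the 141 and 142 residues of HUMGRP5E_P4 and HUMGRP5E_P5 in the comparison reports. As an inference rather than a statement in the application, this suggests that the stated end positions do not include a stop codon.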
Segment cluster HUMGRP5E_node_0 according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGRP5E_T4 and HUMGRP5E_T5. Table 9 below describes the starting and ending position of this segment on each transcript.
Table 9 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMGRP5E_T4 | 1 | 760
HUMGRP5E_T5 | 1 | 760
Segment cluster HUMGRP5E_node_2 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGRP5E_T4 and HUMGRP5E_T5. Table 10 below describes the starting and ending position of this segment on each transcript.
Table 10 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMGRP5E_T4 | 761 | 984
HUMGRP5E_T5 | 761 | 984
Segment cluster HUMGRP5E_node_8 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGRP5E_T4 and HUMGRP5E_T5. Table 11 below describes the starting and ending position of this segment on each transcript.
Table 11 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMGRP5E_T4 | 1004 | 1362
HUMGRP5E_T5 | 1018 | 1376
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMGRP5E_node_3 according to the present invention can be found in the following transcript(s): HUMGRP5E_T4 and HUMGRP5E_T5. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMGRP5E_T4 | 985 | 1003
HUMGRP5E_T5 | 985 | 1003
Segment cluster HUMGRP5E_node_7 according to the present invention can be found in the following transcript(s): HUMGRP5E_T5. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMGRP5E_T5 | 1004 | 1017
Variant protein alignment to the previously known protein:

Sequence name: GRP_HUMAN
Sequence documentation:
Alignment of: HUMGRP5E_P4 x GRP_HUMAN
Alignment segment 1/1:
Quality: 1291.00
Escore: 0
Matching length: 141
Total length: 148
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 95.27
Total Percent Identity: 95.27
Gaps: 1
Alignment:
  1 MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLM  50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLM  50

 51 GKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 GKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQ 100

101 PKALGNQQPSWDSEDSSNFKDVGSKGK.......GSQREGRNPQLNQQ 141
    |||||||||||||||||||||||||||       ||||||||||||||
101 PKALGNQQPSWDSEDSSNFKDVGSKGKVGRLSAPGSQREGRNPQLNQQ 148

Sequence name: GRP_HUMAN
Sequence documentation:
Alignment of: HUMGRP5E_P5 x GRP_HUMAN
Alignment segment 1/1:
Quality: 1248.00
Escore: 0
Matching length: 127
Total length: 127
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLM  50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLM  50

 51 GKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 GKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQ 100

101 PKALGNQQPSWDSEDSSNFKDVGSKGK 127
    |||||||||||||||||||||||||||
101 PKALGNQQPSWDSEDSSNFKDVGSKGK 127

Expression of GRP_HUMAN - gastrin-releasing peptide HUMGRP5E transcripts which are detectable by amplicon as depicted in sequence name HUMGRP5Ejunc3-7 in normal and cancerous ovary tissues
Expression of GRP_HUMAN - gastrin-releasing peptide transcripts detectable by or according to HUMGRP5Ejunc3-7 amplicon(s) and HUMGRP5Ejunc3-7F and HUMGRP5Ejunc3-7R primers was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes.
The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48 and 71, Table 1 above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.
Figure 13 is a histogram showing over-expression of the above-indicated GRP_HUMAN - gastrin-releasing peptide transcripts in cancerous ovary samples relative to the normal samples. (Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.) As is evident from Figure 13, the expression of GRP_HUMAN - gastrin-releasing peptide transcripts detectable by the above amplicon(s) was higher in several cancerous samples than in the non-cancerous samples (Sample Nos. 45, 47-48 and 71, Table 1 above, "Tissue samples in testing panel") and the benign samples (Sample Nos. 57-62, Table 1 above, "Tissue samples in testing panel"). Notably, an over-expression of at least 5 fold was found in 13 out of 43 adenocarcinoma samples.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HUMGRP5Ejunc3-7F forward primer; and HUMGRP5Ejunc3-7R reverse primer.
The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HUMGRP5Ejunc3-7.
HUMGRP5Ejunc3-7F (SEQ ID NO:965)
ACCAGCCACCTCAACCCA
HUMGRP5Ejunc3-7R (SEQ ID NO:966)
CTGGAGCAGAGAGTCTTTGCCT
HUMGRP5Ejunc3-7 (SEQ ID NO:967)
ACCAGCCACCTCAACCCAAGGCCCTGGGCAATCAGCAGCCTTCGTGGGATTCAGAGGATAGCAGCAACTTCAAAGATGTAGGTTCAAAAGGCAAAGACTCTCTGCTCCAG
Expression of GRP_HUMAN - gastrin-releasing peptide HUMGRP5E transcripts, which are detectable by amplicon as depicted in sequence name HUMGRP5Ejunc3-7, in different normal tissues
Expression of GRP_HUMAN - gastrin-releasing peptide transcripts detectable by or according to HUMGRP5Ejunc3-7 amplicon(s) and HUMGRP5Ejunc3-7F and HUMGRP5Ejunc3-7R primers was measured by real time PCR. In parallel the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the breast samples (Sample Nos. 33-35 above), to obtain a value of relative expression of each sample relative to the median of the breast samples.
The results are described in Figure 14, presenting the histogram showing the expression of HUMGRP5E transcripts, which are detectable by amplicon as depicted in sequence name HUMGRP5Ejunc3-7, in different normal tissues. Primers and amplicons are as above.
DESCRIPTION FOR CLUSTER R11723
Cluster R11723 features 6 transcript(s) and 26 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Transcript name | SEQ ID NO:
R11723_PEA_1_T15 | 110
R11723_PEA_1_T17 | 111
R11723_PEA_1_T19 | 112
R11723_PEA_1_T20 | 113
R11723_PEA_1_T5 | 114
R11723_PEA_1_T6 | 115
Table 2 - Segments of interest
Segment name | SEQ ID NO:
R11723_PEA_1_node_13 | 116
R11723_PEA_1_node_16 | 117
R11723_PEA_1_node_19 | 118
R11723_PEA_1_node_2 | 119
R11723_PEA_1_node_22 | 120
R11723_PEA_1_node_31 | 121
R11723_PEA_1_node_10 | 122
R11723_PEA_1_node_11 | 123
R11723_PEA_1_node_15 | 124
R11723_PEA_1_node_18 | 125
R11723_PEA_1_node_20 | 126
R11723_PEA_1_node_21 | 127
R11723_PEA_1_node_23 | 128
R11723_PEA_1_node_24 | 129
R11723_PEA_1_node_25 | 130
R11723_PEA_1_node_26 | 131
R11723_PEA_1_node_27 | 132
R11723_PEA_1_node_28 | 133
R11723_PEA_1_node_29 | 134
R11723_PEA_1_node_3 | 135
R11723_PEA_1_node_30 | 136
R11723_PEA_1_node_4 | 137
R11723_PEA_1_node_5 | 138
R11723_PEA_1_node_6 | 139
R11723_PEA_1_node_7 | 140
R11723_PEA_1_node_8 | 141
Table 3 - Proteins of interest
Protein name | SEQ ID NO:
R11723_PEA_1_P2 | 142
R11723_PEA_1_P6 | 143
R11723_PEA_1_P7 | 144
R11723_PEA_1_P13 | 145
R11723_PEA_1_P10 | 146
Cluster R11723 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the right-hand column of the table and the numbers on the y-axis of Figure 15 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
Overall, the following results were obtained as shown with regard to the histograms in Figure 15 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues and kidney malignant tumors.
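The "parts per million" measure defined above is a ratio scaled by one million. The sketch below shows only that scaling step. The text does not say how the ESTs are weighted, so the function simply takes already-weighted totals as input; the counts in the example are invented for illustration and are not taken from Table 4.

```python
def expression_ppm(cluster_est_weight, category_est_weight):
    """Cluster EST expression within a tissue category, in parts per million."""
    if category_est_weight <= 0:
        raise ValueError("the category must contain some EST weight")
    # Multiply before dividing so integer inputs give an exactly rounded result.
    return cluster_est_weight * 1_000_000 / category_est_weight


# Invented numbers: 15 weighted cluster ESTs out of 500,000 in one category.
print(expression_ppm(15, 500_000))  # 30.0
```

Under this definition, the value 30 listed for brain in Table 4 would correspond to 30 weighted cluster ESTs per million weighted ESTs in that category.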
Table 4 - Normal tissue distribution
Name of Tissue | Number
adrenal | 0
brain | 30
epithelial | 3
general | 17
head and neck | 0
kidney | 0
lung | 0
breast | 0
ovary | 0
pancreas | 10
skin | 0
uterus | 0
Table 5 - P values and ratios for expression in cancerous tissue
Name of Tissue | P1 | P2 | SP1 | R3 | SP2 | R4
adrenal | 4.2e-01 | 4.6e-01 | 4.6e-01 | 2.2 | 5.3e-01 | 1.9
brain | 2.2e-01 | 2.0e-01 | 1.2e-02 | 2.8 | 5.0e-02 | 2.0
epithelial | 3.0e-05 | 6.3e-05 | 1.8e-05 | 6.3 | 3.4e-06 | 6.4
general | 7.2e-03 | 4.0e-02 | 1.3e-04 | 2.1 | 1.1e-03 | 1.7
head and neck | 1 | 5.0e-01 | 1 | 1.0 | 7.5e-01 | 1.3
kidney | 1.5e-01 | 2.4e-01 | 4.4e-03 | 5.4 | 2.8e-02 | 3.6
lung | 1.2e-01 | 1.6e-01 | 1 | 1.6 | 1 | 1.3
breast | 5.9e-01 | 4.4e-01 | 1 | 1.1 | 6.8e-01 | 1.5
ovary | 1.6e-02 | 1.3e-02 | 1.0e-01 | 3.8 | 7.0e-02 | 3.5
pancreas | 5.5e-01 | 2.0e-01 | 3.9e-01 | 1.9 | 1.4e-01 | 2.7
skin | 1 | 4.4e-01 | 1 | 1.0 | 1.9e-02 | 2.1
uterus | 1.5e-02 | 5.4e-02 | 1.9e-01 | 3.1 | 1.4e-01 | 2.5
It should be noted that the variants of this cluster are variants of the hypothetical protein PSEC0181 (referred to herein as "PSEC"). Furthermore, use of the known protein (WT protein) for detection of ovarian cancer, alone or in combination with one or more variants of this cluster and/or of any other cluster and/or of any known marker, also comprises an embodiment of the present invention.
As described in greater detail below, in ovarian cancer, the variants of the present invention show a similar expression pattern to that of PSEC, except that at least one variant shows greater overexpression than PSEC in ovarian cancer.
As noted above, cluster R11723 features 6 transcript(s), which were listed in Table 1 above. A description of each variant protein according to the present invention is now provided.
Variant protein R11723_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T6.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein R11723_PEA_1_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
107 | H -> P | Yes
70 | G -> | No
70 | G -> C | No
Variant protein R11723_PEA_1_P2 is encoded by the following transcript(s): R11723_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T6 is shown in bold; this coding portion starts at position 1716 and ends at position 2051. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
SNP position(s) on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
1231 | C -> T | Yes
1278 | G -> C | Yes
1923 | G -> | No
1923 | G -> T | No
2035 | A -> C | Yes
2048 | A -> C | No
2057 | A -> G | Yes
Variant protein R11723_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T15. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between R11723_PEA_1_P6 and Q8IXM0 (SEQ ID NO:968):
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALHKAGAVGGGVR corresponding to amino acids 1-110 of R11723_PEA_1_P6, and a second amino acid sequence being at least 90% homologous to MYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 1-112 of Q8IXM0, which also corresponds to amino acids 111-222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALHKAGAVGGGVR of R11723_PEA_1_P6.
Comparison report between R11723_PEA_1_P6 and Q96AC2 (SEQ ID NO:969):
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 1-83 of Q96AC2, which also corresponds to amino acids 1-83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84-222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
Comparison report between R11723_PEA_1_P6 and Q8N2G4 (SEQ ID NO:970):
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 1-83 of Q8N2G4, which also corresponds to amino acids 1-83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84-222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
Comparison report between R11723_PEA_1_P6 and BAC85518 (SEQ ID NO:971):
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 24-106 of BAC85518, which also corresponds to amino acids 1-83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84-222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
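The comparison reports above describe R11723_PEA_1_P6 as a concatenation of shared and unique stretches, so their coordinates can be cross-checked mechanically. The sketch below rebuilds the 222-residue sequence from the sequences quoted in the reports and confirms three things: the 1-110 / 111-222 split (against Q8IXM0) and the 1-83 / 84-222 split (against Q96AC2, Q8N2G4 and BAC85518) describe the same sequence; a BAC85518 position converts to a P6 position through the stated correspondence (BAC85518 24-106 = P6 1-83); and residues 180 and 217, where Table 8 below lists SNPs, are G and H. The helper names are illustrative. Only exact string comparison is shown, not the homology scoring behind the percentage thresholds.

```python
# Sequences quoted in the comparison reports for R11723_PEA_1_P6.
HEAD_1_83 = (
    "MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV"
    "MEQSAGIMYRKSCASSAACLIASAG"
)
UNIQUE_1_110 = HEAD_1_83 + "SPCRGLAPGREEQRALHKAGAVGGGVR"
Q8IXM0_1_112 = (
    "MYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHV"
    "RPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ"
)
TAIL_84_222 = (
    "SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLL"
    "RGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQ"
    "CHNNQPWADTSRRERQRKEKHSMRTQ"
)


def segment(sequence, start, end):
    """Return residues start..end (1-based, inclusive) of a sequence."""
    return sequence[start - 1:end]


def bac85518_to_p6(position, offset=23):
    """Map a BAC85518 residue (24-106) to R11723_PEA_1_P6 (1-83)."""
    if not 24 <= position <= 106:
        raise ValueError("position outside the shared BAC85518 stretch")
    return position - offset


p6 = UNIQUE_1_110 + Q8IXM0_1_112
assert p6 == HEAD_1_83 + TAIL_84_222  # both splits describe one sequence
print(len(HEAD_1_83), len(TAIL_84_222), len(p6))  # 83 139 222
print(segment(p6, 111, 222) == Q8IXM0_1_112)  # True
print(bac85518_to_p6(24), bac85518_to_p6(106))  # 1 83
print(segment(p6, 180, 180), segment(p6, 217, 217))  # G H
```

Keeping each quoted stretch as a named constant makes it easy to confirm that the boundaries stated in the different comparison reports are mutually consistent before any homology percentage is applied to them.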
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein R11723_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
180 | G -> | No
180 | G -> C | No
217 | H -> P | Yes
Variant protein R11723_PEA_1_P6 is encoded by the following transcript(s): R11723_PEA_1_T15, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T15 is shown in bold; this coding portion starts at position 434 and ends at position 1099. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
SNP position(s) on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
971 | G -> | No
971 | G -> T | No
1083 | A -> C | Yes
1096 | A -> C | No
1105 | A -> G | Yes
Variant protein R11723_PEA_1_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T17. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between R11723_PEA_1_P7 and Q96AC2:
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 1-64 of Q96AC2, which also corresponds to amino acids 1-64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65-93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
Comparison report between R11723_PEA_1_P7 and Q8N2G4:
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 1-64 of Q8N2G4, which also corresponds to amino acids 1-64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65-93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
Comparison report between R11723_PEA_1_P7 and BAC85273 (SEQ ID NO:972):
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG corresponding to amino acids 1-5 of R11723_PEA_1_P7, a second amino acid sequence being at least 90% homologous to IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 22-80 of BAC85273, which also corresponds to amino acids 6-64 of R11723_PEA_1_P7, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65-93 of R11723_PEA_1_P7, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG of R11723_PEA_1_P7.
3. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
Comparison report between R11723_PEA_1_P7 and BAC85518:
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 24-87 of BAC85518, which also corresponds to amino acids 1-64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65-93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein R11723_PEA_1_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
67 | C -> S | Yes
Variant protein R11723_PEA_1_P7 is encoded by the following transcript(s): R11723_PEA_1_T17, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T17 is shown in bold; this coding portion starts at position 434 and ends at position 712. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Nucleic acid SNPs
SNP position(s) on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
625 | G -> T | Yes
633 | G -> C | Yes
1303 | C -> T | Yes
Variant protein R11723_PEA_1_P13 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T19 and R11723_PEA_1_T5. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between R11723_PEA_1_P13 and Q96AC2:
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P13, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1-63 of Q96AC2, which also corresponds to amino acids 1-63 of R11723_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DTKRTNTLLFEMRHFAKQLTT corresponding to amino acids 64-84 of R11723_PEA_1_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P13, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DTKRTNTLLFEMRHFAKQLTT in R11723_PEA_1_P13.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein R11723_PEA_1_P13 is encoded by the following transcript(s): R11723_PEA_1_T19, for which the sequence(s) is/are given at the end of the application.
The coding portion of transcript R11723_PEA_1_T19 is shown in bold; this coding portion starts at position 434 and ends at position 685. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein R11723_PEA_1_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 12 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
778 | G -> T | Yes
786 | G -> C | Yes
1456 | C -> T | Yes

Variant protein R11723_PEA_1_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T20. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between R11723_PEA_1_P10 and Q96AC2:

1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.

Comparison report between R11723_PEA_1_P10 and Q8N2G4:

1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q8N2G4, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.

Comparison report between R11723_PEA_1_P10 and BAC85273:

1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG corresponding to amino acids 1 - 5 of R11723_PEA_1_P10, a second amino acid sequence being at least 90% homologous to IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 22 - 79 of BAC85273, which also corresponds to amino acids 6 - 63 of R11723_PEA_1_P10, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG of R11723_PEA_1_P10.
3. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.

Comparison report between R11723_PEA_1_P10 and BAC85518:

1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 24 - 86 of BAC85518, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted.
The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein R11723_PEA_1_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein R11723_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 13 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
66 | V -> F | Yes

Variant protein R11723_PEA_1_P10 is encoded by the following transcript(s): R11723_PEA_1_T20, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T20 is shown in bold; this coding portion starts at position 434 and ends at position 703. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein R11723_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 14 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
629 | G -> T | Yes
637 | G -> C | Yes
1307 | C -> T | Yes

As noted above, cluster R11723 features 26 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application.
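The known SNPs in Tables 11, 12 and 14 appear to be the same three polymorphisms reported on transcripts R11723_PEA_1_T17, R11723_PEA_1_T19 and R11723_PEA_1_T20, all lying inside segment R11723_PEA_1_node_16, whose location on each transcript is given in Table 16 below. The following minimal Python sketch illustrates the coordinate arithmetic involved. The helper names `lift_over` and `codon_of` are hypothetical, and the sketch assumes that positions are 1-based and inclusive and that each coding portion is read in frame from its stated starting position.

```python
# Hypothetical helpers for cross-checking the SNP tables of cluster R11723.
# All coordinates are copied from the text (1-based, inclusive).

CODING_PORTION = {  # transcript -> (coding start, coding end)
    "R11723_PEA_1_T17": (434, 712),
    "R11723_PEA_1_T19": (434, 685),
    "R11723_PEA_1_T20": (434, 703),
}

NODE_16 = {  # transcript -> (segment start, segment end), Table 16
    "R11723_PEA_1_T17": (624, 1367),
    "R11723_PEA_1_T19": (777, 1520),
    "R11723_PEA_1_T20": (628, 1371),
}


def lift_over(pos, src, dst, segment=NODE_16):
    """Map a position inside a shared segment from transcript src to transcript dst."""
    src_start, src_end = segment[src]
    if not src_start <= pos <= src_end:
        raise ValueError(f"position {pos} is outside the segment on {src}")
    return pos - src_start + segment[dst][0]


def codon_of(pos, transcript, coding=CODING_PORTION):
    """Return (codon number, base within codon), or None outside the coding portion."""
    start, end = coding[transcript]
    if not start <= pos <= end:
        return None
    offset = pos - start
    return offset // 3 + 1, offset % 3 + 1


if __name__ == "__main__":
    for snp in (625, 633, 1303):  # Table 11 positions on T17
        on_t19 = lift_over(snp, "R11723_PEA_1_T17", "R11723_PEA_1_T19")
        on_t20 = lift_over(snp, "R11723_PEA_1_T17", "R11723_PEA_1_T20")
        print(snp, on_t19, on_t20)  # compare with Tables 12 and 14
    print(codon_of(633, "R11723_PEA_1_T17"))  # compare with Table 10
    print(codon_of(629, "R11723_PEA_1_T20"))  # compare with Table 13
```

Run as a script, the sketch maps positions 625, 633 and 1303 of T17 to 778, 786 and 1456 on T19 (Table 12) and to 629, 637 and 1307 on T20 (Table 14). It places position 633 of T17 at the second base of codon 67, where G -> C turns a Cys codon (TGT/TGC) into a Ser codon (TCT/TCC), in line with Table 10, and position 629 of T20 at the first base of codon 66, where G -> T can turn a Val codon (GTT/GTC) into Phe (TTT/TTC), in line with Table 13. Because node_16 starts 4 nucleotides later on T20 (after the 4-nucleotide segment node_15), the same polymorphism falls in a different codon phase in P7 and P10, while the T19 positions lie beyond that transcript's coding portion (434 to 685).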
The segment(s) of cluster R11723 are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster R11723_PEA_1_node_13 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T19, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 15 below describes the starting and ending position of this segment on each transcript.

Table 15 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T19 | 624 | 776
R11723_PEA_1_T5 | 624 | 776
R11723_PEA_1_T6 | 658 | 810

Segment cluster R11723_PEA_1_node_16 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T17, R11723_PEA_1_T19 and R11723_PEA_1_T20. Table 16 below describes the starting and ending position of this segment on each transcript.

Table 16 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T17 | 624 | 1367
R11723_PEA_1_T19 | 777 | 1520
R11723_PEA_1_T20 | 628 | 1371

Segment cluster R11723_PEA_1_node_19 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 17 below describes the starting and ending position of this segment on each transcript.

Table 17 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T5 | 835 | 1008
R11723_PEA_1_T6 | 869 | 1042

Segment cluster R11723_PEA_1_node_2 according to the present invention is supported by 29 libraries.
The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 18 below describes the starting and ending position of this segment on each transcript.

Table 18 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T15 | 1 | 309
R11723_PEA_1_T17 | 1 | 309
R11723_PEA_1_T19 | 1 | 309
R11723_PEA_1_T20 | 1 | 309
R11723_PEA_1_T5 | 1 | 309
R11723_PEA_1_T6 | 1 | 309

Segment cluster R11723_PEA_1_node_22 according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 19 below describes the starting and ending position of this segment on each transcript.

Table 19 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T5 | 1083 | 1569
R11723_PEA_1_T6 | 1117 | 1603

Segment cluster R11723_PEA_1_node_31 according to the present invention is supported by 70 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 20 below describes the starting and ending position of this segment on each transcript (it should be noted that these transcripts show alternative polyadenylation).

Table 20 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T15 | 1060 | 1295
R11723_PEA_1_T5 | 1978 | 2213
R11723_PEA_1_T6 | 2012 | 2247

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided.
These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster R11723_PEA_1_node_10 according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 21 below describes the starting and ending position of this segment on each transcript.

Table 21 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T15 | 486 | 529
R11723_PEA_1_T17 | 486 | 529
R11723_PEA_1_T19 | 486 | 529
R11723_PEA_1_T20 | 486 | 529
R11723_PEA_1_T5 | 486 | 529
R11723_PEA_1_T6 | 520 | 563

Segment cluster R11723_PEA_1_node_11 according to the present invention is supported by 42 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 22 below describes the starting and ending position of this segment on each transcript.

Table 22 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T15 | 530 | 623
R11723_PEA_1_T17 | 530 | 623
R11723_PEA_1_T19 | 530 | 623
R11723_PEA_1_T20 | 530 | 623
R11723_PEA_1_T5 | 530 | 623
R11723_PEA_1_T6 | 564 | 657

Segment cluster R11723_PEA_1_node_15 according to the present invention can be found in the following transcript(s): R11723_PEA_1_T20. Table 23 below describes the starting and ending position of this segment on each transcript.

Table 23 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T20 | 624 | 627

Segment cluster R11723_PEA_1_node_18 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described.
This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 24 below describes the starting and ending position of this segment on each transcript.

Table 24 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T15 | 624 | 681
R11723_PEA_1_T5 | 777 | 834
R11723_PEA_1_T6 | 811 | 868

Segment cluster R11723_PEA_1_node_20 according to the present invention can be found in the following transcript(s): R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 25 below describes the starting and ending position of this segment on each transcript.

Table 25 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T5 | 1009 | 1019
R11723_PEA_1_T6 | 1043 | 1053

Segment cluster R11723_PEA_1_node_21 according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 26 below describes the starting and ending position of this segment on each transcript.

Table 26 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T5 | 1020 | 1082
R11723_PEA_1_T6 | 1054 | 1116

Segment cluster R11723_PEA_1_node_23 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T5 | 1570 | 1599
R11723_PEA_1_T6 | 1604 | 1633

Segment cluster R11723_PEA_1_node_24 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 28 below describes the starting and ending position of this segment on each transcript.

Table 28 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T15 | 682 | 765
R11723_PEA_1_T5 | 1600 | 1683
R11723_PEA_1_T6 | 1634 | 1717

Segment cluster R11723_PEA_1_node_25 according to the present invention is supported by 54 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 29 below describes the starting and ending position of this segment on each transcript.

Table 29 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T15 | 766 | 791
R11723_PEA_1_T5 | 1684 | 1709
R11723_PEA_1_T6 | 1718 | 1743

Segment cluster R11723_PEA_1_node_26 according to the present invention is supported by 62 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 30 below describes the starting and ending position of this segment on each transcript.

Table 30 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T15 | 792 | 904
R11723_PEA_1_T5 | 1710 | 1822
R11723_PEA_1_T6 | 1744 | 1856

Segment cluster R11723_PEA_1_node_27 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described.
This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 31 below describes the starting and ending position of this segment on each transcript.

Table 31 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T15 | 905 | 986
R11723_PEA_1_T5 | 1823 | 1904
R11723_PEA_1_T6 | 1857 | 1938

Segment cluster R11723_PEA_1_node_28 according to the present invention can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T15 | 987 | 1010
R11723_PEA_1_T5 | 1905 | 1928
R11723_PEA_1_T6 | 1939 | 1962

Segment cluster R11723_PEA_1_node_29 according to the present invention is supported by 69 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 33 below describes the starting and ending position of this segment on each transcript.

Table 33 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T15 | 1011 | 1038
R11723_PEA_1_T5 | 1929 | 1956
R11723_PEA_1_T6 | 1963 | 1990

Segment cluster R11723_PEA_1_node_3 according to the present invention can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 34 below describes the starting and ending position of this segment on each transcript.

Table 34 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T15 | 310 | 319
R11723_PEA_1_T17 | 310 | 319
R11723_PEA_1_T19 | 310 | 319
R11723_PEA_1_T20 | 310 | 319
R11723_PEA_1_T5 | 310 | 319
R11723_PEA_1_T6 | 310 | 319

Segment cluster R11723_PEA_1_node_30 according to the present invention can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 35 below describes the starting and ending position of this segment on each transcript.

Table 35 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T15 | 1039 | 1059
R11723_PEA_1_T5 | 1957 | 1977
R11723_PEA_1_T6 | 1991 | 2011

Segment cluster R11723_PEA_1_node_4 according to the present invention is supported by 25 libraries.
The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 36 below describes the starting and ending position of this segment on each transcript.

Table 36 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T15 | 320 | 371
R11723_PEA_1_T17 | 320 | 371
R11723_PEA_1_T19 | 320 | 371
R11723_PEA_1_T20 | 320 | 371
R11723_PEA_1_T5 | 320 | 371
R11723_PEA_1_T6 | 320 | 371

Segment cluster R11723_PEA_1_node_5 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 37 below describes the starting and ending position of this segment on each transcript.

Table 37 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T15 | 372 | 414
R11723_PEA_1_T17 | 372 | 414
R11723_PEA_1_T19 | 372 | 414
R11723_PEA_1_T20 | 372 | 414
R11723_PEA_1_T5 | 372 | 414
R11723_PEA_1_T6 | 372 | 414

Segment cluster R11723_PEA_1_node_6 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 38 below describes the starting and ending position of this segment on each transcript.
Table 38 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T15 | 415 | 446
R11723_PEA_1_T17 | 415 | 446
R11723_PEA_1_T19 | 415 | 446
R11723_PEA_1_T20 | 415 | 446
R11723_PEA_1_T5 | 415 | 446
R11723_PEA_1_T6 | 415 | 446

Segment cluster R11723_PEA_1_node_7 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 39 below describes the starting and ending position of this segment on each transcript.

Table 39 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T15 | 447 | 485
R11723_PEA_1_T17 | 447 | 485
R11723_PEA_1_T19 | 447 | 485
R11723_PEA_1_T20 | 447 | 485
R11723_PEA_1_T5 | 447 | 485
R11723_PEA_1_T6 | 447 | 485

Segment cluster R11723_PEA_1_node_8 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T6. Table 40 below describes the starting and ending position of this segment on each transcript.
Table 40 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R11723_PEA_1_T6 | 486 | 519

Variant protein alignment to the previously known protein:

Sequence name: /tmp/gp6eQTLWqk/mFtjUpUzhb:Q8IXM0
Sequence documentation:
Alignment of: R11723_PEA_1_P6 x Q8IXM0
Alignment segment 1/1:
Quality: 1128.00
Escore: 0
Matching length: 112
Total length: 112
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

111 MYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLRE 160
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLRE 50

161 GEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRE 210
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 GEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRE 100

211 RQRKEKHSMRTQ 222
    ||||||||||||
101 RQRKEKHSMRTQ 112

Sequence name: /tmp/gp6eQTLWqk/mFtjUpUzhb:Q96AC2
Sequence documentation:
Alignment of: R11723_PEA_1_P6 x Q96AC2
Alignment segment 1/1:
Quality: 835.00
Escore: 0
Matching length: 83
Total length: 83
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50

 51 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 83
    |||||||||||||||||||||||||||||||||
 51 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 83

Sequence name: /tmp/gp6eQTLWqk/mFtjUpUzhb:Q8N2G4
Sequence documentation:
Alignment of: R11723_PEA_1_P6 x Q8N2G4
Alignment segment 1/1:
Quality: 835.00
Escore: 0
Matching length: 83
Total length: 83
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50

 51 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 83
    |||||||||||||||||||||||||||||||||
 51 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 83

Sequence name: /tmp/gp6eQTLWqk/mFtjUpUzhb:BAC85518
Sequence documentation:
Alignment of: R11723_PEA_1_P6 x BAC85518
Alignment segment 1/1:
Quality: 835.00
Escore: 0
Matching length: 83
Total length: 83
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 24 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 73

 51 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 83
    |||||||||||||||||||||||||||||||||
 74 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 106

Sequence name: /tmp/VXjdFlzdBX/bexTxTh0Th:Q96AC2
Sequence documentation:
Alignment of: R11723_PEA_1_P7 x Q96AC2
Alignment segment 1/1:
Quality: 654.00
Escore: 0
Matching length: 64
Total length: 64
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50

 51 QDMCQKEVMEQSAG 64
    ||||||||||||||
 51 QDMCQKEVMEQSAG 64

Sequence name: /tmp/VXjdFlzdBX/bexTxTh0Th:Q8N2G4
Sequence documentation:
Alignment of: R11723_PEA_1_P7 x Q8N2G4
Alignment segment 1/1:
Quality: 654.00
Escore: 0
Matching length: 64
Total length: 64
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50

 51 QDMCQKEVMEQSAG 64
    ||||||||||||||
 51 QDMCQKEVMEQSAG 64

Sequence name: /tmp/VXjdFlzdBX/bexTxTh0Th:BAC85273
Sequence documentation:
Alignment of: R11723_PEA_1_P7 x BAC85273
Alignment segment 1/1:
Quality: 600.00
Escore: 0
Matching length: 59
Total length: 59
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  6 IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQ 55
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 22 IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQ 71

 56 KEVMEQSAG 64
    |||||||||
 72 KEVMEQSAG 80

Sequence name: /tmp/VXjdFlzdBX/bexTxTh0Th:BAC85518
Sequence documentation:
Alignment of: R11723_PEA_1_P7 x BAC85518
Alignment segment 1/1:
Quality: 654.00
Escore: 0
Matching length: 64
Total length: 64
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 24 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 73

 51 QDMCQKEVMEQSAG 64
    ||||||||||||||
 74 QDMCQKEVMEQSAG 87

Sequence name: /tmp/OLMSexEmIh/pc7Z7XmlYR:Q96AC2
Sequence documentation:
Alignment of: R11723_PEA_1_P10 x Q96AC2
Alignment segment 1/1:
Quality: 645.00
Escore: 0
Matching length: 63
Total length: 63
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50

 51 QDMCQKEVMEQSA 63
    |||||||||||||
 51 QDMCQKEVMEQSA 63

Sequence name: /tmp/OLMSexEmIh/pc7Z7XmlYR:Q8N2G4
Sequence documentation:
Alignment of: R11723_PEA_1_P10 x Q8N2G4
Alignment segment 1/1:
Quality: 645.00
Escore: 0
Matching length: 63
Total length: 63
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50

 51 QDMCQKEVMEQSA 63
    |||||||||||||
 51 QDMCQKEVMEQSA 63

Sequence name: /tmp/OLMSexEmIh/pc7Z7XmlYR:BAC85273
Sequence documentation:
Alignment of: R11723_PEA_1_P10 x BAC85273
Alignment segment 1/1:
Quality: 591.00
Escore: 0
Matching length: 58
Total length: 58
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  6 IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQ 55
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 22 IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQ 71

 56 KEVMEQSA 63
    ||||||||
 72 KEVMEQSA 79

Sequence name: /tmp/OLMSexEmIh/pc7Z7XmlYR:BAC85518
Sequence documentation:
Alignment of: R11723_PEA_1_P10 x BAC85518
Alignment segment 1/1:
Quality: 645.00
Escore: 0
Matching length: 63
Total length: 63
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 24 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 73

 51 QDMCQKEVMEQSA 63
    |||||||||||||
 74 QDMCQKEVMEQSA 86

Alignment of: R11723_PEA_1_P13 x Q96AC2
Alignment segment 1/1:
Quality: 645.00  Escore: 0
Matching length: 63  Total length: 63
Matching Percent Similarity: 100.00  Matching Percent Identity: 100.00
Total Percent Similarity: 100.00  Total Percent Identity: 100.00
Gaps: 0
Alignment:
   1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
     ||||||||||||||||||||||||||||||||||||||||||||||||||
   1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
  51 QDMCQKEVMEQSA 63
     |||||||||||||
  51 QDMCQKEVMEQSA 63

Expression of R11723 transcripts which are detectable by amplicon as depicted in sequence R11723seg13 in normal and cancerous ovary tissues

Expression of transcripts detectable by or according to seg13, using the R11723seg13 amplicon(s) and the R11723seg13F and R11723seg13R primers, was measured by real-time PCR. In parallel, the expression of four housekeeping genes was measured in the same way: PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon). For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel", above). This gives a fold up-regulation value for each sample relative to the median of the normal PM samples.

Figure 16 is a histogram showing overexpression of the above-indicated transcripts in cancerous ovary samples relative to the normal PM samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.
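The normalization described above has two steps. First, each target quantity is divided by the geometric mean of the housekeeping-gene quantities. Second, the result is divided by the median of the normal reference samples. The Python sketch below shows both steps and is only an illustration under stated assumptions, not the authors' software. The function names and the input layout (a mapping from sample ID to a target quantity and a list of housekeeping quantities) are hypothetical, and the numbers are invented rather than taken from Figure 16.

```python
from math import prod
from statistics import median


def geometric_mean(values):
    """Geometric mean of strictly positive quantities."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


def fold_change_vs_reference(samples, reference_ids):
    """Fold change of every sample relative to the median of the reference samples.

    samples: {sample_id: (target_quantity, [housekeeping_quantities])}  (assumed layout)
    reference_ids: IDs of the reference samples (e.g. the normal PM samples)
    """
    # Step 1: normalize the target quantity to the geometric mean of the housekeeping genes.
    normalized = {
        sid: target / geometric_mean(hk) for sid, (target, hk) in samples.items()
    }
    # Step 2: divide by the median normalized quantity of the reference samples.
    ref_median = median(normalized[sid] for sid in reference_ids)
    return {sid: value / ref_median for sid, value in normalized.items()}


def count_at_least(folds, sample_ids, threshold=5.0):
    """Number of samples whose fold change reaches the threshold (e.g. 5-fold)."""
    return sum(1 for sid in sample_ids if folds[sid] >= threshold)


# Invented quantities for illustration only (not data from the document).
samples = {
    "N1": (2.0, [1.0, 4.0]),
    "N2": (4.0, [2.0, 2.0]),
    "N3": (3.0, [1.0, 1.0]),
    "C1": (20.0, [1.0, 4.0]),
    "C2": (6.0, [1.0, 1.0]),
}
folds = fold_change_vs_reference(samples, ["N1", "N2", "N3"])
print(folds["C1"], count_at_least(folds, ["C1", "C2"]))  # prints: 5.0 1
```

The normal-tissue panels follow the same two steps, with the ovary samples (Sample Nos. 18-20) as the reference set instead of the normal PM samples.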
As is evident from Figure 16, the expression of transcripts detectable by the above amplicon(s) was significantly higher in cancer samples than in the non-cancerous samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel"). Notably, overexpression of at least 5-fold was found in 23 out of 43 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below.

The P value for the difference in expression levels of transcripts detectable by the above amplicon(s) between ovarian cancer samples and normal tissue samples was determined by t-test to be 4.76E-04. A threshold of 5-fold overexpression was found to differentiate between cancer and normal samples with a P value of 2.48E-02, as checked by Fisher's exact test. These values demonstrate that the results are statistically significant.
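The sketch below shows what a Fisher's exact test on a threshold looks like in Python, using only the standard library. It computes a one-sided p-value from the hypergeometric distribution for a 2x2 table of samples above or below the 5-fold threshold, split into cancer and normal. The function name and table layout are illustrative assumptions. The document does not say whether its test was one- or two-sided, nor how many normal samples exceeded the threshold, so this sketch is not meant to reproduce the reported P value of 2.48E-02. If SciPy is available, `scipy.stats.fisher_exact` performs the same test and supports one- and two-sided alternatives.

```python
from math import comb


def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher's exact test (enrichment of cell a) for a 2x2 table.

    Assumed layout:
                   >= threshold   < threshold
        cancer          a              b
        normal          c              d

    Returns P(X >= a), where X is hypergeometric with all margins fixed.
    """
    row1 = a + b        # cancer samples
    col1 = a + c        # samples at or above the threshold
    n = a + b + c + d   # all samples
    upper = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


print(fisher_exact_one_sided(3, 0, 0, 3))  # prints: 0.05
```

In the usage line, 0.05 is the probability (1 in 20) of the most extreme split of six samples, three cancer and three normal, when exactly three lie above the threshold.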
Primer pairs are also optionally and preferably encompassed within the present invention. For example, the following primer pair was used in the above experiment, as a non-limiting illustrative example only of a suitable primer pair: the R11723seg13F forward primer and the R11723seg13R reverse primer.

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair. For example, the following amplicon was obtained in the above experiment, as a non-limiting illustrative example only of a suitable amplicon: R11723seg13.

R11723seg13F (SEQ ID NO:973) - ACACTAAAAGAACAAACACCTTGCTC
R11723seg13R (SEQ ID NO:974) - TCCTCAGAAGGCACATGAAAGA
R11723seg13 (SEQ ID NO:975) - ACACTAAAAGAACAAACACCTTGCTCTTCGAGATGAGACATTTTGCCAAGCAGTTGACCACTTAGTTCTCAAGAAGCAACTATCTCTTTCATGTGCCTTCTGAGGA

Expression of R11723 transcripts which are detectable by amplicon as depicted in sequence name R11723seg13 in different normal tissues

Expression of R11723 transcripts detectable by or according to the R11723seg13 amplicon and the R11723seg13F and R11723seg13R primers was measured by real-time PCR. In parallel, the expression of four housekeeping genes was measured in the same way: RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon). For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 2 above, "Tissue samples in normal panel"). This gives the expression of each sample relative to the median of the ovary samples.
Figure 17 is a histogram showing the expression of R11723 transcripts detectable by the amplicon depicted in sequence name R11723seg13, in different normal tissues. Primers and amplicon are as above.
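A few lines of Python can sanity-check the primer and amplicon sequences listed above (SEQ ID NOs: 973-975). A PCR amplicon should begin with the forward primer and end with the reverse complement of the reverse primer. This minimal sketch assumes plain uppercase A/C/G/T strings. The helper names are hypothetical, and the document does not describe such a check.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon_matches_primers(amplicon, forward, reverse):
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(reverse_complement(reverse))


# R11723seg13 primers and amplicon (SEQ ID NOs: 973-975), as listed above.
seg13_f = "ACACTAAAAGAACAAACACCTTGCTC"
seg13_r = "TCCTCAGAAGGCACATGAAAGA"
seg13_amplicon = (
    "ACACTAAAAGAACAAACACCTTGCTCTTCGAGATGAGACATTTTGCCAAGCAGTTG"
    "ACCACTTAGTTCTCAAGAAGCAACTATCTCTTTCATGTGCCTTCTGAGGA"
)
print(amplicon_matches_primers(seg13_amplicon, seg13_f, seg13_r), len(seg13_amplicon))  # prints: True 106
```

The same check applies to any other primer pair and amplicon in this application. It only confirms the primer sites at the amplicon ends; it says nothing about primer specificity.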
Expression of R11723 transcripts which are detectable by amplicon as depicted in sequence R11723junc11-18 in normal and cancerous ovary tissues

Expression of transcripts detectable by or according to junc11-18, using the R11723junc11-18 amplicon and the R11723junc11-18F and R11723junc11-18R primers, was measured by real-time PCR. Note that the variants of this cluster are variants of the hypothetical protein PSEC0181 (referred to herein as "PSEC"). Furthermore, use of the known protein (WT protein) for detection of ovarian cancer, alone or in combination with one or more variants of this cluster and/or of any other cluster and/or of any known marker, also comprises an embodiment of the present invention. In parallel, the expression of four housekeeping genes was measured in the same way: PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon). For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, 71, Table 1, above: "Tissue samples in ovarian cancer testing panel"). This gives a fold up-regulation value for each sample relative to the median of the normal PM samples.

Figure 18 is a histogram showing overexpression of the above-indicated transcripts in cancerous ovary samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.
As is evident from Figure 18, the expression of transcripts detectable by the above amplicon was higher in cancer samples than in the non-cancerous samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in ovarian cancer testing panel"). Notably, overexpression of at least 5-fold was found in 23 out of 43 adenocarcinoma samples.

Primer pairs are also optionally and preferably encompassed within the present invention. For example, the following primer pair was used in the above experiment, as a non-limiting illustrative example only of a suitable primer pair: the R11723junc11-18F forward primer and the R11723junc11-18R reverse primer.

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair. For example, the following amplicon was obtained in the above experiment, as a non-limiting illustrative example only of a suitable amplicon: R11723junc11-18.

R11723junc11-18F (SEQ ID NO:976) - AGTGATGGAGCAAAGTGCCG
R11723junc11-18R (SEQ ID NO:977) - CAGCAGCTGATGCAAACTGAG
R11723junc11-18 (SEQ ID NO:978) - AGTGATGGAGCAAAGTGCCGGGATCATGTACCGCAAGTCCTGTGCATCATCAGCGGCCTGTCTCATCGCCTCTGCCGGGTACCAGTCCTTCTGCTCCCCAGGGAAACTGAACTCAGTTTGCATCAGCTGCTG

Expression of R11723 transcripts which are detectable by amplicon as depicted in sequence name R11723junc11-18 in different normal tissues

Expression of R11723 transcripts detectable by or according to the R11723junc11-18 amplicon and the R11723junc11-18F and R11723junc11-18R primers was measured by real-time PCR. In parallel, the expression of four housekeeping genes was measured in the same way: RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), UBC (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon).
For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 2 above: "Tissue samples in normal panel"). This gives the expression of each sample relative to the median of the ovary samples.
Figure 19 is a histogram showing the expression of R11723 transcripts detectable by the amplicon depicted in sequence name R11723junc11-18, in different normal tissues. Amplicon and primers are as above.

DESCRIPTION FOR CLUSTER D56406

Cluster D56406 features 3 transcript(s) and 10 segment(s) of interest. Their names are given in Tables 1 and 2, respectively, and their sequences are given at the end of the application. The selected protein variants are given in Table 3.

Table 1 - Transcripts of interest
Transcript Name | SEQ ID NO:
D56406_PEA_1_T3 | 147
D56406_PEA_1_T6 | 148
D56406_PEA_1_T7 | 149

Table 2 - Segments of interest
Segment Name | SEQ ID NO:
D56406_PEA_1_node_0 | 150
D56406_PEA_1_node_13 | 151
D56406_PEA_1_node_11 | 152
D56406_PEA_1_node_2 | 153
D56406_PEA_1_node_3 | 154
D56406_PEA_1_node_5 | 155
D56406_PEA_1_node_6 | 156
D56406_PEA_1_node_7 | 157
D56406_PEA_1_node_8 | 158
D56406_PEA_1_node_9 | 159

Table 3 - Proteins of interest
Protein Name | SEQ ID NO:
D56406_PEA_1_P2 | 161
D56406_PEA_1_P5 | 162
D56406_PEA_1_P6 | 163

These sequences are variants of the known protein Neurotensin/neuromedin N precursor [Contains: Large neuromedin N (NmN-125); Neuromedin N (NmN) (NN); Neurotensin (NT); Tail peptide] (SwissProt accession identifier NEUT_HUMAN), SEQ ID NO: 160, referred to herein as the previously known protein.

Protein Neurotensin/neuromedin N precursor is known or believed to have the following function(s): Neurotensin may play an endocrine or paracrine role in the regulation of fat metabolism, and it causes contraction of smooth muscle. The sequence for protein Neurotensin/neuromedin N precursor is given at the end of the application as "Neurotensin/neuromedin N precursor [Contains: Large neuromedin N (NmN-125); Neuromedin N (NmN) (NN); Neurotensin (NT); Tail peptide] amino acid sequence".
The localization of protein Neurotensin/neuromedin N precursor is believed to be: secreted; packaged within secretory vesicles.

The following GO annotation(s) apply to the previously known protein: signal transduction (Biological Process); neuropeptide hormone (Molecular Function); and extracellular; soluble fraction (Cellular Component). The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>, or LocusLink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.

As noted above, cluster D56406 features 3 transcript(s), which were listed in Table 1 above. These transcript(s) encode protein(s) that are variant(s) of protein Neurotensin/neuromedin N precursor. A description of each variant protein according to the present invention is now provided.
Variant protein D56406_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D56406_PEA_1_T3. An alignment to the known protein (Neurotensin/neuromedin N precursor) is given at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between D56406_PEA_1_P2 and NEUT_HUMAN:

1. An isolated chimeric polypeptide encoding for D56406_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSKISKAHVPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEAMLTIYQLHKICHSRAFQHWE corresponding to amino acids 1-120 of NEUT_HUMAN, which also corresponds to amino acids 1-120 of D56406_PEA_1_P2; a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ARWLTPVIPALWEAETGGSRGQEMETIPANT corresponding to amino acids 121-151 of D56406_PEA_1_P2; and a third amino acid sequence being at least 90% homologous to LIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY corresponding to amino acids 121-170 of NEUT_HUMAN, which also corresponds to amino acids 152-201 of D56406_PEA_1_P2; wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of D56406_PEA_1_P2, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for ARWLTPVIPALWEAETGGSRGQEMETIPANT, corresponding to D56406_PEA_1_P2.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein D56406_PEA_1_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms), listed in Table 4 by their position(s) on the amino acid sequence together with the alternative amino acid(s). The last column indicates whether the SNP is previously known. The presence of known SNPs in the variant protein D56406_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention.

Table 4 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
30 | M -> V | No
44 | S -> P | No
84 | V -> | No
84 | V -> A | No

Variant protein D56406_PEA_1_P2 is encoded by the following transcript(s): D56406_PEA_1_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D56406_PEA_1_T3 is shown in bold; this coding portion starts at position 106 and ends at position 708.
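A short Python sketch can cross-check the numbers in this description. It assumes the quoted sequences are transcribed exactly and that transcript coordinates are 1-based and inclusive. It does four things:

(a) assembles D56406_PEA_1_P2 from the three segments of item 1 and confirms their stated ranges;
(b) computes total percent identity as identical aligned residues divided by alignment length, which reproduces the 84.58% reported in the P2 x NEUT_HUMAN alignment for this cluster;
(c) converts the 106-708 coding portion into 201 codons, the length of P2;
(d) maps nucleotide SNP positions on T3 to codons.

Under these assumptions, the SNPs at positions 193, 235 and 356 fall in codons 30, 44 and 84, the amino-acid positions in Table 4. For example, an A-to-G change at the first base of codon 30 turns ATG (Met, the only methionine codon) into GTG (Val), consistent with M -> V. The helper names are hypothetical, and the identity formula is inferred from the reported figures rather than stated in the document.

```python
# Segments quoted in item 1 of the P2 comparison report.
FIRST = (
    "MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSKISKAHVPSWKMT"
    "LLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEAMLTIYQLHKICHSRAF"
    "QHWE"
)  # NEUT_HUMAN 1-120 = P2 1-120
INSERT = "ARWLTPVIPALWEAETGGSRGQEMETIPANT"  # P2 121-151, no NEUT_HUMAN counterpart
THIRD = "LIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY"  # NEUT 121-170 = P2 152-201

P2 = FIRST + INSERT + THIRD
NEUT = FIRST + THIRD  # the first and third segments together cover NEUT_HUMAN 1-170


def total_percent_identity(identical, alignment_length):
    """Identical aligned residues as a percentage of the whole alignment length."""
    return round(100.0 * identical / alignment_length, 2)


def coding_length_in_codons(cds_start, cds_end):
    """Codons in a 1-based, inclusive coding interval."""
    length = cds_end - cds_start + 1
    if length % 3:
        raise ValueError("coding interval is not a whole number of codons")
    return length // 3


def nucleotide_to_codon(position, cds_start, cds_end):
    """Map a transcript position to (codon number, position in codon); None outside the CDS."""
    if not cds_start <= position <= cds_end:
        return None
    offset = position - cds_start
    return offset // 3 + 1, offset % 3 + 1


print(len(P2), len(NEUT))                          # prints: 201 170
print(total_percent_identity(len(NEUT), len(P2)))  # prints: 84.58
print(coding_length_in_codons(106, 708))           # prints: 201
for pos in (94, 193, 235, 356):
    print(pos, nucleotide_to_codon(pos, 106, 708))
```

The same helpers give 168 and 95 codons for the T6 (106-609) and T7 (106-390) coding portions, matching the lengths of D56406_PEA_1_P5 and D56406_PEA_1_P6. They also give total identities of 98.82% and 55.88% for those variants.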
The transcript also has the following SNPs, listed in Table 5 by their position on the nucleotide sequence together with the alternative nucleic acid. The last column indicates whether the SNP is previously known. The presence of known SNPs in the variant protein D56406_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention.

Table 5 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
94 | G -> T | No
95 | A -> T | No
858 | T -> G | Yes
103 | A -> G | Yes
193 | A -> G | No
235 | T -> C | No
339 | T -> C | No
356 | T -> | No
356 | T -> C | No
417 | A -> T | No
757 | T -> | No

Variant protein D56406_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D56406_PEA_1_T6. An alignment to the known protein (Neurotensin/neuromedin N precursor) is given at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between D56406_PEA_1_P5 and NEUT_HUMAN:

1. An isolated chimeric polypeptide encoding for D56406_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MMAGMKIQLVCMLLLAFSSWSLC corresponding to amino acids 1-23 of NEUT_HUMAN, which also corresponds to amino acids 1-23 of D56406_PEA_1_P5, and a second amino acid sequence being at least 90% homologous to SEEEMKALEADFLTNMHTSKISKAHVPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEAMLTIYQLHKICHSRAFQHWELIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY corresponding to amino acids 26-170 of NEUT_HUMAN, which also corresponds to amino acids 24-168 of D56406_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of D56406_PEA_1_P5, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise CS, having a structure as follows: a sequence starting from any of amino acid numbers 23-x to 23, and ending at any of amino acid numbers 24 + ((n-2) - x), in which x varies from 0 to n-2.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein D56406_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms), listed in Table 6 by their position(s) on the amino acid sequence together with the alternative amino acid(s). The last column indicates whether the SNP is previously known. The presence of known SNPs in the variant protein D56406_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention.

Table 6 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
28 | M -> V | No
42 | S -> P | No
82 | V -> | No
82 | V -> A | No

Variant protein D56406_PEA_1_P5 is encoded by the following transcript(s): D56406_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D56406_PEA_1_T6 is shown in bold; this coding portion starts at position 106 and ends at position 609. The transcript also has the following SNPs, listed in Table 7 by their position on the nucleotide sequence together with the alternative nucleic acid. The last column indicates whether the SNP is previously known. The presence of known SNPs in the variant protein D56406_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention.

Table 7 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
94 | G -> T | No
95 | A -> T | No
759 | T -> G | Yes
806 | G -> A | Yes
1014 | T -> G | No
1178 | T -> G | No
103 | A -> G | Yes
187 | A -> G | No
229 | T -> C | No
333 | T -> C | No
350 | T -> | No
350 | T -> C | No
411 | A -> T | No
658 | T -> | No

Variant protein D56406_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D56406_PEA_1_T7. An alignment to the known protein (Neurotensin/neuromedin N precursor) is given at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between D56406_PEA_1_P6 and NEUT_HUMAN:

1. An isolated chimeric polypeptide encoding for D56406_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSK corresponding to amino acids 1-45 of NEUT_HUMAN, which also corresponds to amino acids 1-45 of D56406_PEA_1_P6, and a second amino acid sequence being at least 90% homologous to LIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY corresponding to amino acids 121-170 of NEUT_HUMAN, which also corresponds to amino acids 46-95 of D56406_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of D56406_PEA_1_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KL, having a structure as follows: a sequence starting from any of amino acid numbers 45-x to 45, and ending at any of amino acid numbers 46 + ((n-2) - x), in which x varies from 0 to n-2.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted.
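Item 2 above defines the edge portion of D56406_PEA_1_P6 through a family of length-n windows. Read with 1-based inclusive positions, each window starts at position 45 - x and ends at 46 + ((n-2) - x), for x from 0 to n-2. Every such window therefore contains residues 45 and 46, the KL dipeptide created by joining NEUT_HUMAN residue 45 to residue 121. The minimal sketch below enumerates those windows under that reading. The helper name and the choice n = 10 are illustrative assumptions, and P6 is assembled from the two NEUT_HUMAN segments quoted in item 1.

```python
def edge_windows(junction, n):
    """1-based inclusive (start, end) pairs of every length-n window that spans
    residues `junction` and `junction + 1`: start = junction - x and
    end = junction + 1 + ((n - 2) - x), for x = 0 .. n - 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return [(junction - x, junction + 1 + (n - 2 - x)) for x in range(n - 1)]


# D56406_PEA_1_P6: NEUT_HUMAN 1-45 followed by NEUT_HUMAN 121-170 (item 1 above).
P6 = (
    "MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSK"
    "LIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY"
)

windows = edge_windows(45, 10)
peptides = [P6[start - 1:end] for start, end in windows]
print(len(P6), windows[0], windows[-1])                   # prints: 95 (45, 54) (37, 46)
print(all(len(p) == 10 and "KL" in p for p in peptides))  # prints: True
```

With junction 23 and the CS dipeptide, the same function describes the edge portion of D56406_PEA_1_P5.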
The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein D56406_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms), listed in Table 8 by their position(s) on the amino acid sequence together with the alternative amino acid(s). The last column indicates whether the SNP is previously known. The presence of known SNPs in the variant protein D56406_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention.

Table 8 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
30 | M -> V | No
44 | S -> P | No

Variant protein D56406_PEA_1_P6 is encoded by the following transcript(s): D56406_PEA_1_T7, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D56406_PEA_1_T7 is shown in bold; this coding portion starts at position 106 and ends at position 390. The transcript also has the following SNPs, listed in Table 9 by their position on the nucleotide sequence together with the alternative nucleic acid. The last column indicates whether the SNP is previously known. The presence of known SNPs in the variant protein D56406_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention.
Table 9 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
94 | G -> T | No
95 | A -> T | No
103 | A -> G | Yes
193 | A -> G | No
235 | T -> C | No
439 | T -> | No
540 | T -> G | Yes
587 | G -> A | Yes
795 | T -> G | No
959 | T -> G | No

As noted above, cluster D56406 features 10 segment(s), which were listed in Table 2 above and whose sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) that are described separately here because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster D56406_PEA_1_node_0 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3, D56406_PEA_1_T6 and D56406_PEA_1_T7. Table 10 below describes the starting and ending position of this segment on each transcript.

Table 10 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
D56406_PEA_1_T3 | 1 | 178
D56406_PEA_1_T6 | 1 | 178
D56406_PEA_1_T7 | 1 | 178

Microarray (chip) data is also available for this segment. As described above with regard to the cluster itself, various oligonucleotides were tested for differential expression in various disease conditions, particularly cancer. The oligonucleotides found to hit this segment (with regard to ovarian cancer) are shown in Table 11.

Table 11 - Oligonucleotides related to this segment
Oligonucleotide name | Overexpressed in cancer | Chip reference
D56406_0_5_0 | ovarian carcinoma | OVA

Segment cluster D56406_PEA_1_node_13 according to the present invention is supported by 43 libraries. The number of libraries was determined as previously described.
This segment can be found in the following transcript(s): D56406_PEA_1_T3, D56406_PEA_1_T6 and D56406_PEA_1_T7. Table 12 below describes the starting and ending position of this segment on each transcript.

Table 12 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
D56406_PEA_1_T3 | 559 | 902
D56406_PEA_1_T6 | 460 | 1239
D56406_PEA_1_T7 | 241 | 1020

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length and are therefore included in a separate description.

Segment cluster D56406_PEA_1_node_11 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3. Table 13 below describes the starting and ending position of this segment on each transcript.

Table 13 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
D56406_PEA_1_T3 | 466 | 558

Segment cluster D56406_PEA_1_node_2 according to the present invention can be found in the following transcript(s): D56406_PEA_1_T3 and D56406_PEA_1_T7. Table 14 below describes the starting and ending position of this segment on each transcript.

Table 14 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
D56406_PEA_1_T3 | 179 | 184
D56406_PEA_1_T7 | 179 | 184

Segment cluster D56406_PEA_1_node_3 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3, D56406_PEA_1_T6 and D56406_PEA_1_T7. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
D56406_PEA_1_T3 | 185 | 240
D56406_PEA_1_T6 | 179 | 234
D56406_PEA_1_T7 | 185 | 240

Segment cluster D56406_PEA_1_node_5 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3 and D56406_PEA_1_T6. Table 16 below describes the starting and ending position of this segment on each transcript.

Table 16 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
D56406_PEA_1_T3 | 241 | 355
D56406_PEA_1_T6 | 235 | 349

Segment cluster D56406_PEA_1_node_6 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3 and D56406_PEA_1_T6. Table 17 below describes the starting and ending position of this segment on each transcript.

Table 17 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
D56406_PEA_1_T3 | 356 | 389
D56406_PEA_1_T6 | 350 | 383

Segment cluster D56406_PEA_1_node_7 according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3 and D56406_PEA_1_T6. Table 18 below describes the starting and ending position of this segment on each transcript.

Table 18 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
D56406_PEA_1_T3 | 390 | 415
D56406_PEA_1_T6 | 384 | 409

Segment cluster D56406_PEA_1_node_8 according to the present invention can be found in the following transcript(s): D56406_PEA_1_T3 and D56406_PEA_1_T6. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
D56406_PEA_1_T3 | 416 | 423
D56406_PEA_1_T6 | 410 | 417

Segment cluster D56406_PEA_1_node_9 according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3 and D56406_PEA_1_T6. Table 20 below describes the starting and ending position of this segment on each transcript.

Table 20 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
D56406_PEA_1_T3 | 424 | 465
D56406_PEA_1_T6 | 418 | 459

WO 2005/116850 PCT/IB2005/002555

Variant protein alignment to the previously known protein:

Sequence name: /tmp/jU49325aMA/8FOXuN7La5:NEUT_HUMAN
Sequence documentation:
Alignment of: D56406_PEA_1_P2 x NEUT_HUMAN
Alignment segment 1/1:
Quality: 1591.00
Escore: 0
Matching length: 170
Total length: 201
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 84.58
Total Percent Identity: 84.58
Gaps: 1
Alignment:
1 MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSKISKAH 50
1 MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSKISKAH 50

51 VPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEA 100
51 VPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEA 100

101 MLTIYQLHKICHSRAFQHWEARWLTPVIPALWEAETGGSRGQEMETIPAN 150
101 MLTIYQLHKICHSRAFQHWE.............................. 120

151 TLIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYY 200
121 LIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYY 169

201 Y 201
170 Y 170

Sequence name: /tmp/wWui8Kd4y9/zbf3ihRwnR:NEUT_HUMAN
Sequence documentation:
Alignment of: D56406_PEA_1_P5 x NEUT_HUMAN
Alignment segment 1/1:
Quality: 1572.00
Escore: 0
Matching length: 168
Total length: 170
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 98.82
Total Percent Identity: 98.82
Gaps: 1
Alignment:
1 MMAGMKIQLVCMLLLAFSSWSLC..SEEEMKALEADFLTNMHTSKISKAH 48
1 MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSKISKAH 50

49 VPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEA 98
51 VPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEA 100

99 MLTIYQLHKICHSRAFQHWELIQEDILDTGNDKNGKEEVIKRKIPYILKR 148
101 MLTIYQLHKICHSRAFQHWELIQEDILDTGNDKNGKEEVIKRKIPYILKR 150

149 QLYENKPRRPYILKRDSYYY 168
151 QLYENKPRRPYILKRDSYYY 170

Sequence name: /tmp/f5d07fF5D7/E4N5xjUIAN:NEUT_HUMAN
Sequence documentation:
Alignment of: D56406_PEA_1_P6 x NEUT_HUMAN
Alignment segment 1/1:
Quality: 844.00
Escore: 0
Matching length: 95
Total length: 170
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 55.88
Total Percent Identity: 55.88
Gaps: 1
Alignment:
1 MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSK..... 45
1 MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSKISKAH 50

45 .................................................. 45
51 VPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEA 100

46 ....................LIQEDILDTGNDKNGKEEVIKRKIPYILKR 75
101 MLTIYQLHKICHSRAFQHWELIQEDILDTGNDKNGKEEVIKRKIPYILKR 150

76 QLYENKPRRPYILKRDSYYY 95
151 QLYENKPRRPYILKRDSYYY 170

DESCRIPTION FOR CLUSTER H53393

Cluster H53393 features 4 transcript(s) and 16 segment(s) of interest, the names of which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.

Table 1 - Transcripts of interest
Transcript name | SEQ ID NO:
H53393_PEA_1_T10 | 164
H53393_PEA_1_T11 | 165
H53393_PEA_1_T3 | 166
H53393_PEA_1_T9 | 167

Table 2 - Segments of interest
Segment name | SEQ ID NO:
H53393_PEA_1_node_0 | 168
H53393_PEA_1_node_10 | 169
H53393_PEA_1_node_12 | 170
H53393_PEA_1_node_13 | 171
H53393_PEA_1_node_15 | 172
H53393_PEA_1_node_17 | 173
H53393_PEA_1_node_19 | 174
H53393_PEA_1_node_23 | 175
H53393_PEA_1_node_24 | 176
H53393_PEA_1_node_25 | 177
H53393_PEA_1_node_29 | 178
H53393_PEA_1_node_4 | 179
H53393_PEA_1_node_6 | 180
H53393_PEA_1_node_8 | 181
H53393_PEA_1_node_21 | 182
H53393_PEA_1_node_22 | 183

Table 3 - Proteins of interest
Protein name | SEQ ID NO:
H53393_PEA_1_P2 | 185
H53393_PEA_1_P3 | 186
H53393_PEA_1_P6 | 187

These sequences are variants of the known protein Cadherin-6 precursor (SwissProt accession identifier CAD6_HUMAN; also known by the synonyms Kidney-cadherin and K-cadherin), SEQ ID NO: 184, referred to herein as the previously known protein.

Protein Cadherin-6 precursor is known or believed to have the following function(s): Cadherins are calcium-dependent cell adhesion proteins. They preferentially interact with themselves in a homophilic manner in connecting cells; cadherins may thus contribute to the sorting of heterogeneous cell types. The sequence for protein Cadherin-6 precursor is given at the end of the application, as "Cadherin-6 precursor amino acid sequence".
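The identity figures in the D56406 alignment reports above appear to follow from simple column counts: Matching Percent Identity as identical positions over the matching length, and Total Percent Identity as identical positions over the total alignment length, gaps included. The reports do not define their statistics, so the following Python sketch rests on that assumption; the function name `alignment_stats` is illustrative. It rebuilds the D56406_PEA_1_P5 and D56406_PEA_1_P6 rows from the NEUT_HUMAN sequence shown in those reports. Similarity percentages depend on a substitution matrix and are not reproduced.

```python
# Sketch (assumption): recompute identity statistics from two gapped, equal-length
# alignment rows. "." marks a gap, as in the alignment reports above.

NEUT_HUMAN = (
    "MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSKISKAH"
    "VPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEA"
    "MLTIYQLHKICHSRAFQHWELIQEDILDTGNDKNGKEEVIKRKIPYILKR"
    "QLYENKPRRPYILKRDSYYY"
)


def alignment_stats(top: str, bottom: str, gap: str = ".") -> dict:
    """Return matching length, total length and the two identity percentages."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    total = len(top)  # every alignment column, gap columns included
    matched = [(a, b) for a, b in zip(top, bottom) if a != gap and b != gap]
    identical = sum(1 for a, b in matched if a == b)
    return {
        "matching_length": len(matched),
        "total_length": total,
        "matching_percent_identity": round(100 * identical / len(matched), 2),
        "total_percent_identity": round(100 * identical / total, 2),
    }


# D56406_PEA_1_P5 lacks NEUT_HUMAN residues 24-25 ("SD").
p5_row = NEUT_HUMAN[:23] + ".." + NEUT_HUMAN[25:]
print(alignment_stats(p5_row, NEUT_HUMAN))
# {'matching_length': 168, 'total_length': 170,
#  'matching_percent_identity': 100.0, 'total_percent_identity': 98.82}

# D56406_PEA_1_P6: a 75-column gap in place of NEUT_HUMAN residues 46-120.
p6_row = NEUT_HUMAN[:45] + "." * 75 + NEUT_HUMAN[120:]
print(alignment_stats(p6_row, NEUT_HUMAN)["total_percent_identity"])  # 55.88
```

Under the same reading, 170 identical positions over a total alignment length of 201 gives the 84.58 reported for D56406_PEA_1_P2.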
Known polymorphisms for this sequence are as shown in Table 4.

Table 4 - Amino acid mutations for Known Protein
SNP position(s) on amino acid sequence | Comment
421 | V -> I
425 | T -> I

Protein Cadherin-6 precursor is believed to be localized as a type I membrane protein.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: cell adhesion and homophilic cell adhesion, which are annotation(s) related to Biological Process; calcium binding and protein binding, which are annotation(s) related to Molecular Function; and integral membrane protein, which is an annotation related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.

Cluster H53393 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left-hand column of the table and the numbers on the y-axis of Figure 20 refer to weighted expression of ESTs in each category, as "parts per million" (the ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, expressed in parts per million).

Overall, the following results were obtained, as shown in the histograms in Figure 20 and in Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues, and ovarian carcinoma.
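The parts-per-million figure described above amounts to a simple ratio. A minimal sketch, assuming unweighted EST counts (the weighting scheme referred to in the text is not specified in this section); the function name and the counts are hypothetical:

```python
# Sketch (assumption): unweighted EST counts stand in for the weighted
# expression used in the application.

def ests_per_million(cluster_ests: int, category_ests: int) -> float:
    """ESTs of one cluster per million ESTs of a tissue category."""
    if category_ests <= 0:
        raise ValueError("category must contain at least one EST")
    return cluster_ests * 1_000_000 / category_ests


# Hypothetical counts, not taken from the application.
print(ests_per_million(3, 200_000))  # 15.0
```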
Table 5 - Normal tissue distribution
Name of tissue | Number
epithelial | 2
general | 5
kidney | 15
lung | 6
muscle | 5
ovary | 0
uterus | 0

Table 6 - P values and ratios for expression in cancerous tissue
Name of tissue | P1 | P2 | SP1 | R3 | SP2 | R4
epithelial | 1.4e-01 | 1.1e-01 | 1.8e-04 | 6.3 | 2.5e-05 | 5.9
general | 2.0e-01 | 8.6e-02 | 1.1e-04 | 3.1 | 1.3e-06 | 3.2
kidney | 5.5e-01 | 4.4e-01 | 3.4e-01 | 1.7 | 8.2e-02 | 2.3
lung | 9.5e-01 | 8.5e-01 | 1 | 0.6 | 6.2e-01 | 1.1
muscle | 9.2e-01 | 4.8e-01 | 1 | 0.8 | 3.9e-01 | 2.0
ovary | 7.1e-02 | 3.0e-02 | 1.5e-02 | 5.2 | 2.9e-03 | 5.9
uterus | 8.2e-02 | 1.4e-01 | 1.9e-01 | 3.0 | 3.3e-01 | 2.2

As noted above, cluster H53393 features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode protein(s) that are variant(s) of protein Cadherin-6 precursor. A description of each variant protein according to the present invention is now provided.

Variant protein H53393_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) H53393_PEA_1_T10. An alignment to the known protein (Cadherin-6 precursor) is given at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between H53393_PEA_1_P2 and CAD6_HUMAN:

1. An isolated chimeric polypeptide encoding for H53393_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to
MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNRSKRSWMWN
QFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDLFIINENTGDIQATKRLD
REEKPVYILRAQAINRRTGRPVEPESEFIIKIHDINDNEPIFTKEVYTATVPEMSDVGTFVV
QVTATDADDPTYGNSAKVVYSILQGQPYFSVESETGIIKTALLNMDRENREQYQVVIQA
KDMGGQMGGLSGTTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADV
GENAEIEYSITDGEGLDMFDVITDQETQEGIITVKKLLDFEKKKVYTLKVEASNPYVEPR
FLYLGPFKDSATVRIVVEDVDEPPVFSKLAYILQIREDAQINTTIGSVTAQDPDAARNPV
KYSVDRHTDMDRIFNIDSGNGSIFTSKLLDRETLLWHNITVIATEINNPKQSSRVPLYIKV
LDVNDNAPEFAEFYETFVCEKAKADQLIQTLHAVDKDDPYSGHQFSFSLAPEAASGSNF
TIQDNK corresponding to amino acids 1 - 543 of CAD6_HUMAN, which also corresponds to amino acids 1 - 543 of H53393_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GK corresponding to amino acids 544 - 545 of H53393_PEA_1_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein H53393_PEA_1_P2 is encoded by the following transcript(s): H53393_PEA_1_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript H53393_PEA_1_T10 is shown in bold; this coding portion starts at position 327 and ends at position 1961.
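The coding coordinates above can be checked against the protein length with a little arithmetic. The sketch below assumes 1-based, inclusive coordinates and a coding portion that ends before the stop codon; under those assumptions positions 327-1961 span 1,635 nucleotides, or 545 codons, matching the 543 residues shared with CAD6_HUMAN plus the GK tail. The helper names are hypothetical. It also maps a transcript position, such as 1208 (one of the SNP positions listed for this transcript in Table 7), to its codon.

```python
# Sketch (assumptions): 1-based inclusive coordinates; the coding portion
# excludes the stop codon. Helper names are illustrative only.

def codon_count(cds_start: int, cds_end: int) -> int:
    """Number of codons in a coding portion."""
    length = cds_end - cds_start + 1
    if length % 3:
        raise ValueError(f"coding length {length} is not a multiple of 3")
    return length // 3


def locate(position: int, cds_start: int, cds_end: int):
    """Return (codon number, base 1-3 within the codon), or None outside the CDS."""
    if not cds_start <= position <= cds_end:
        return None
    offset = position - cds_start
    return offset // 3 + 1, offset % 3 + 1


# H53393_PEA_1_T10 coding portion: 327-1961.
print(codon_count(327, 1961))   # 545
print(locate(1208, 327, 1961))  # (294, 3)
print(locate(2309, 327, 1961))  # None (downstream of the coding portion)
```

The same arithmetic gives 511 codons for positions 327-1859 and 343 codons for positions 327-1355, the coding portions given for the transcripts of H53393_PEA_1_P3 and H53393_PEA_1_P6, respectively.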
The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein H53393_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 7 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
1208 | C -> T | Yes
1407 | T -> C | Yes
1851 | T -> C | Yes
1886 | G -> A | Yes
2309 | C -> T | Yes
2736 | T -> C | Yes
2762 | G -> T | Yes

Variant protein H53393_PEA_1_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) H53393_PEA_1_T11 and H53393_PEA_1_T3. An alignment to the known protein (Cadherin-6 precursor) is given at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between H53393_PEA_1_P3 and CAD6_HUMAN:

1. An isolated chimeric polypeptide encoding for H53393_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to
MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNRSKRSWMWN
QFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDLFIINENTGDIQATKRLD
REEKPVYILRAQAINRRTGRPVEPESEFIIKIHDINDNEPIFTKEVYTATVPEMSDVGTFVV
QVTATDADDPTYGNSAKVVYSILQGQPYFSVESETGIIKTALLNMDRENREQYQVVIQA
KDMGGQMGGLSGTTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADV
GENAEIEYSITDGEGLDMFDVITDQETQEGIITVKKLLDFEKKKVYTLKVEASNPYVEPR
FLYLGPFKDSATVRIVVEDVDEPPVFSKLAYILQIREDAQINTTIGSVTAQDPDAARNPV
KYSVDRHTDMDRIFNIDSGNGSIFTSKLLDRETLLWHNITVIATEINNPKQSSRVPLYIKV
LDVNDNAPEFAEFYETFVCEKAKADQ corresponding to amino acids 1 - 504 of CAD6_HUMAN, which also corresponds to amino acids 1 - 504 of H53393_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RFGFSLS corresponding to amino acids 505 - 511 of H53393_PEA_1_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of H53393_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RFGFSLS in H53393_PEA_1_P3.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted.
The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein H53393_PEA_1_P3 is encoded by the following transcript(s): H53393_PEA_1_T11 and H53393_PEA_1_T3, for which the sequence(s) is/are given at the end of the application.

The coding portion of transcript H53393_PEA_1_T11 is shown in bold; this coding portion starts at position 327 and ends at position 1859. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein H53393_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 8 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
1208 | C -> T | Yes
1407 | T -> C | Yes
1871 | T -> C | Yes
1906 | G -> A | Yes
2329 | C -> T | Yes
2756 | T -> C | Yes
2782 | G -> T | Yes

The coding portion of transcript H53393_PEA_1_T3 is shown in bold; this coding portion starts at position 327 and ends at position 1859. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein H53393_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 9 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
1208 | C -> T | Yes
1407 | T -> C | Yes
1871 | T -> C | Yes
1906 | G -> A | Yes
2149 | C -> T | Yes
3425 | T -> | No
3492 | C -> G | Yes

Variant protein H53393_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) H53393_PEA_1_T9. An alignment to the known protein (Cadherin-6 precursor) is given at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between H53393_PEA_1_P6 and CAD6_HUMAN:

1. An isolated chimeric polypeptide encoding for H53393_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to
MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNRSKRSWMWN
QFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDLFIINENTGDIQATKRLD
REEKPVYILRAQAINRRTGRPVEPESEFIIKIHDINDNEPIFTKEVYTATVPEMSDVGTFVV
QVTATDADDPTYGNSAKVVYSILQGQPYFSVESETGIIKTALLNMDRENREQYQVVIQA
KDMGGQMGGLSGTTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADV
GENAEIEYSITDGEGLDMFDVITDQETQEGIITVKK corresponding to amino acids 1 - 333 of CAD6_HUMAN, which also corresponds to amino acids 1 - 333 of H53393_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VMPLLKHHTE corresponding to amino acids 334 - 343 of H53393_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of H53393_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VMPLLKHHTE in H53393_PEA_1_P6.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein H53393_PEA_1_P6 is encoded by the following transcript(s): H53393_PEA_1_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript H53393_PEA_1_T9 is shown in bold; this coding portion starts at position 327 and ends at position 1355. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein H53393_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 10 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
1208 | C -> T | Yes

As noted above, cluster H53393 features 16 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application.
These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster H53393_PEA_1_node_0 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10, H53393_PEA_1_T11, H53393_PEA_1_T3 and H53393_PEA_1_T9. Table 11 below describes the starting and ending position of this segment on each transcript.

Table 11 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
H53393_PEA_1_T10 | 1 | 198
H53393_PEA_1_T11 | 1 | 198
H53393_PEA_1_T3 | 1 | 198
H53393_PEA_1_T9 | 1 | 198

Segment cluster H53393_PEA_1_node_10 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10, H53393_PEA_1_T11, H53393_PEA_1_T3 and H53393_PEA_1_T9. Table 12 below describes the starting and ending position of this segment on each transcript.

Table 12 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
H53393_PEA_1_T10 | 970 | 1137
H53393_PEA_1_T11 | 970 | 1137
H53393_PEA_1_T3 | 970 | 1137
H53393_PEA_1_T9 | 970 | 1137

Segment cluster H53393_PEA_1_node_12 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10, H53393_PEA_1_T11, H53393_PEA_1_T3 and H53393_PEA_1_T9. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
H53393_PEA_1_T10 | 1138 | 1325
H53393_PEA_1_T11 | 1138 | 1325
H53393_PEA_1_T3 | 1138 | 1325
H53393_PEA_1_T9 | 1138 | 1325

Segment cluster H53393_PEA_1_node_13 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T9. Table 14 below describes the starting and ending position of this segment on each transcript.

Table 14 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
H53393_PEA_1_T9 | 1326 | 1625

Segment cluster H53393_PEA_1_node_15 according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10, H53393_PEA_1_T11 and H53393_PEA_1_T3. Table 15 below describes the starting and ending position of this segment on each transcript.

Table 15 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
H53393_PEA_1_T10 | 1326 | 1579
H53393_PEA_1_T11 | 1326 | 1579
H53393_PEA_1_T3 | 1326 | 1579

Segment cluster H53393_PEA_1_node_17 according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10, H53393_PEA_1_T11 and H53393_PEA_1_T3. Table 16 below describes the starting and ending position of this segment on each transcript.

Table 16 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
H53393_PEA_1_T10 | 1580 | 1716
H53393_PEA_1_T11 | 1580 | 1716
H53393_PEA_1_T3 | 1580 | 1716

Segment cluster H53393_PEA_1_node_19 according to the present invention is supported by 16 libraries. The number of libraries was determined as previously described.
This segment can be found in the following transcript(s): H53393_PEA_1_T10, H53393_PEA_1_T11 and H53393_PEA_1_T3. Table 17 below describes the starting and ending position of this segment on each transcript.

Table 17 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
H53393_PEA_1_T10 | 1717 | 1838
H53393_PEA_1_T11 | 1717 | 1838
H53393_PEA_1_T3 | 1717 | 1838

Segment cluster H53393_PEA_1_node_23 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10 and H53393_PEA_1_T11. Table 18 below describes the starting and ending position of this segment on each transcript.

Table 18 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
H53393_PEA_1_T10 | 1957 | 2136
H53393_PEA_1_T11 | 1977 | 2156

Segment cluster H53393_PEA_1_node_24 according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10, H53393_PEA_1_T11 and H53393_PEA_1_T3. Table 19 below describes the starting and ending position of this segment on each transcript.

Table 19 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
H53393_PEA_1_T10 | 2137 | 2388
H53393_PEA_1_T11 | 2157 | 2408
H53393_PEA_1_T3 | 1977 | 2228

Segment cluster H53393_PEA_1_node_25 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10 and H53393_PEA_1_T11. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
H53393_PEA_1_T10 | 2389 | 2873
H53393_PEA_1_T11 | 2409 | 2893

Segment cluster H53393_PEA_1_node_29 according to the present invention is supported by 41 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T3. Table 21 below describes the starting and ending position of this segment on each transcript.

Table 21 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
H53393_PEA_1_T3 | 2229 | 3998

Segment cluster H53393_PEA_1_node_4 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10, H53393_PEA_1_T11, H53393_PEA_1_T3 and H53393_PEA_1_T9. Table 22 below describes the starting and ending position of this segment on each transcript.

Table 22 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
H53393_PEA_1_T10 | 199 | 554
H53393_PEA_1_T11 | 199 | 554
H53393_PEA_1_T3 | 199 | 554
H53393_PEA_1_T9 | 199 | 554

Segment cluster H53393_PEA_1_node_6 according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10, H53393_PEA_1_T11, H53393_PEA_1_T3 and H53393_PEA_1_T9. Table 23 below describes the starting and ending position of this segment on each transcript.

Table 23 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
H53393_PEA_1_T10 | 555 | 849
H53393_PEA_1_T11 | 555 | 849
H53393_PEA_1_T3 | 555 | 849
H53393_PEA_1_T9 | 555 | 849

Segment cluster H53393_PEA_1_node_8 according to the present invention is supported by 12 libraries.
The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10, H53393_PEA_1_T11, H53393_PEA_1_T3 and H53393_PEA_1_T9. Table 24 below describes the starting and ending position of this segment on each transcript.

Table 24 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
H53393_PEA_1_T10 | 850 | 969
H53393_PEA_1_T11 | 850 | 969
H53393_PEA_1_T3 | 850 | 969
H53393_PEA_1_T9 | 850 | 969

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster H53393_PEA_1_node_21 according to the present invention can be found in the following transcript(s): H53393_PEA_1_T11 and H53393_PEA_1_T3. Table 25 below describes the starting and ending position of this segment on each transcript.

Table 25 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
H53393_PEA_1_T11 | 1839 | 1858
H53393_PEA_1_T3 | 1839 | 1858

Segment cluster H53393_PEA_1_node_22 according to the present invention is supported by 16 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10, H53393_PEA_1_T11 and H53393_PEA_1_T3. Table 26 below describes the starting and ending position of this segment on each transcript.

Table 26 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
H53393_PEA_1_T10 | 1839 | 1956
H53393_PEA_1_T11 | 1859 | 1976
H53393_PEA_1_T3 | 1859 | 1976

Variant protein alignment to the previously known protein:

Sequence name: /tmp/oAlc9u2qp7/lHgSZJi6aI:CAD6_HUMAN
Sequence documentation:
Alignment of: H53393_PEA_1_P2 x CAD6_HUMAN
Alignment segment 1/1:
Quality: 5281.00
Escore: 0
Matching length: 543
Total length: 543
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNR 50
1 MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNR 50

51 SKRSWMWNQFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDL 100
51 SKRSWMWNQFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDL 100

101 FIINENTGDIQATKRLDREEKPVYILRAQAINRRTGRPVEPESEFIIKIH 150
101 FIINENTGDIQATKRLDREEKPVYILRAQAINRRTGRPVEPESEFIIKIH 150

151 DINDNEPIFTKEVYTATVPEMSDVGTFVVQVTATDADDPTYGNSAKVVYS 200
151 DINDNEPIFTKEVYTATVPEMSDVGTFVVQVTATDADDPTYGNSAKVVYS 200

201 ILQGQPYFSVESETGIIKTALLNMDRENREQYQVVIQAKDMGGQMGGLSG 250
201 ILQGQPYFSVESETGIIKTALLNMDRENREQYQVVIQAKDMGGQMGGLSG 250

251 TTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADVGEN 300
251 TTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADVGEN 300

301 AEIEYSITDGEGLDMFDVITDQETQEGIITVKKLLDFEKKKVYTLKVEAS 350
301 AEIEYSITDGEGLDMFDVITDQETQEGIITVKKLLDFEKKKVYTLKVEAS 350

351 NPYVEPRFLYLGPFKDSATVRIVVEDVDEPPVFSKLAYILQIREDAQINT 400
351 NPYVEPRFLYLGPFKDSATVRIVVEDVDEPPVFSKLAYILQIREDAQINT 400

401 TIGSVTAQDPDAARNPVKYSVDRHTDMDRIFNIDSGNGSIFTSKLLDRET 450
401 TIGSVTAQDPDAARNPVKYSVDRHTDMDRIFNIDSGNGSIFTSKLLDRET 450

451 LLWHNITVIATEINNPKQSSRVPLYIKVLDVNDNAPEFAEFYETFVCEKA 500
451 LLWHNITVIATEINNPKQSSRVPLYIKVLDVNDNAPEFAEFYETFVCEKA 500

501 KADQLIQTLHAVDKDDPYSGHQFSFSLAPEAASGSNFTIQDNK 543
501 KADQLIQTLHAVDKDDPYSGHQFSFSLAPEAASGSNFTIQDNK 543

Sequence name: /tmp/I80QylyXbk/TPOIdLltx5:CAD6_HUMAN
Sequence documentation:
Alignment of: H53393_PEA_1_P3 x CAD6_HUMAN
Alignment segment 1/1:
Quality: 4900.00
Escore: 0
Matching length: 504
Total length: 504
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNR 50
1 MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNR 50

51 SKRSWMWNQFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDL 100
51 SKRSWMWNQFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDL 100

101 FIINENTGDIQATKRLDREEKPVYILRAQAINRRTGRPVEPESEFIIKIH 150
101 FIINENTGDIQATKRLDREEKPVYILRAQAINRRTGRPVEPESEFIIKIH 150

151 DINDNEPIFTKEVYTATVPEMSDVGTFVVQVTATDADDPTYGNSAKVVYS 200
151 DINDNEPIFTKEVYTATVPEMSDVGTFVVQVTATDADDPTYGNSAKVVYS 200

201 ILQGQPYFSVESETGIIKTALLNMDRENREQYQVVIQAKDMGGQMGGLSG 250
201 ILQGQPYFSVESETGIIKTALLNMDRENREQYQVVIQAKDMGGQMGGLSG 250

251 TTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADVGEN 300
251 TTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADVGEN 300

301 AEIEYSITDGEGLDMFDVITDQETQEGIITVKKLLDFEKKKVYTLKVEAS 350
301 AEIEYSITDGEGLDMFDVITDQETQEGIITVKKLLDFEKKKVYTLKVEAS 350

351 NPYVEPRFLYLGPFKDSATVRIVVEDVDEPPVFSKLAYILQIREDAQINT 400
351 NPYVEPRFLYLGPFKDSATVRIVVEDVDEPPVFSKLAYILQIREDAQINT 400

401 TIGSVTAQDPDAARNPVKYSVDRHTDMDRIFNIDSGNGSIFTSKLLDRET 450
401 TIGSVTAQDPDAARNPVKYSVDRHTDMDRIFNIDSGNGSIFTSKLLDRET 450

451 LLWHNITVIATEINNPKQSSRVPLYIKVLDVNDNAPEFAEFYETFVCEKA 500
451 LLWHNITVIATEINNPKQSSRVPLYIKVLDVNDNAPEFAEFYETFVCEKA 500

501 KADQ 504
501 KADQ 504

Sequence name: /tmp/NtvjwylOCi/cSLi3091on:CAD6_HUMAN
Sequence documentation:
Alignment of: H53393_PEA_1_P6 x CAD6_HUMAN
Alignment segment 1/1:
Quality: 3247.00
Escore: 0
Matching length: 335
Total length: 335
Matching Percent Similarity: 100.00
Matching Percent Identity: 99.40
Total Percent Similarity: 100.00
Total Percent Identity: 99.40
Gaps: 0
Alignment:
1 MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNR 50
1 MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNR 50

51 SKRSWMWNQFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDL 100
51 SKRSWMWNQFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDL 100

101 FIINENTGDIQATKRLDREEKPVYILRAQAINRRTGRPVEPESEFIIKIH 150
101 FIINENTGDIQATKRLDREEKPVYILRAQAINRRTGRPVEPESEFIIKIH 150

151 DINDNEPIFTKEVYTATVPEMSDVGTFVVQVTATDADDPTYGNSAKVVYS 200
151 DINDNEPIFTKEVYTATVPEMSDVGTFVVQVTATDADDPTYGNSAKVVYS 200

201 ILQGQPYFSVESETGIIKTALLNMDRENREQYQVVIQAKDMGGQMGGLSG 250
201 ILQGQPYFSVESETGIIKTALLNMDRENREQYQVVIQAKDMGGQMGGLSG 250

251 TTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADVGEN 300
251 TTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADVGEN 300

301 AEIEYSITDGEGLDMFDVITDQETQEGIITVKKVM 335
301 AEIEYSITDGEGLDMFDVITDQETQEGIITVKKLL 335

Expression of CAD6_HUMAN Cadherin-6 [Precursor]; Kidney-cadherin; K-cadherin H53393 transcripts which are detectable by amplicon as depicted in sequence name H53393 seg13 in normal and cancerous ovary tissues

Expression of CAD6_HUMAN Cadherin-6 [Precursor]; Kidney-cadherin; K-cadherin transcripts detectable by or according to seg13, H53393 seg13 amplicon(s) and H53393 seg13F and H53393 seg13R primers was measured by real time PCR. In this specific example, the real time PCR reaction efficiency was assumed to be 2 and was not calculated by a standard curve reaction (as detailed above in the section "Real-Time RT-PCR analysis"). In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1 amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48 and 71, Table 1, "Tissue samples in testing panel", above) to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.
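The normalization steps just described can be sketched as follows. This is not the applicant's analysis code: it assumes quantities of 2 raised to minus the Ct value (the efficiency of 2 stated above), takes the median over the normalized quantities of the normal samples, and uses hypothetical Ct values and sample labels.

```python
# Sketch (assumptions): quantity = EFFICIENCY ** -Ct with EFFICIENCY = 2;
# all Ct values and sample labels below are hypothetical.
from statistics import geometric_mean, median

EFFICIENCY = 2.0


def quantity(ct: float) -> float:
    return EFFICIENCY ** -ct


def normalized(target_ct: float, housekeeping_cts: list[float]) -> float:
    """Target quantity divided by the geometric mean of housekeeping quantities."""
    return quantity(target_ct) / geometric_mean([quantity(c) for c in housekeeping_cts])


def fold_changes(samples: dict, normal_ids: list) -> dict:
    """Fold up-regulation of every sample relative to the median normal sample."""
    norm = {sid: normalized(t, hk) for sid, (t, hk) in samples.items()}
    baseline = median(norm[sid] for sid in normal_ids)
    return {sid: value / baseline for sid, value in norm.items()}


# (target Ct, [PBGD, HPRT1, SDHA, GAPDH] Ct values); hypothetical numbers.
panel = {
    "normal_A": (30.0, [24.0, 25.0, 23.0, 18.0]),
    "normal_B": (30.5, [24.0, 25.0, 23.0, 18.0]),
    "normal_C": (29.5, [24.0, 25.0, 23.0, 18.0]),
    "tumor_1": (27.0, [24.0, 25.0, 23.0, 18.0]),
}
folds = fold_changes(panel, ["normal_A", "normal_B", "normal_C"])
print(round(folds["tumor_1"], 2))  # 8.0
```

Because every hypothetical sample shares the same housekeeping Ct values here, the tumor sample's 3-cycle advantage over the median normal sample translates directly into a fold of 2 cubed.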
Figure 21 is a histogram showing overexpression of the above-indicated CAD6_HUMAN Cadherin-6 [Precursor] transcripts in cancerous ovary samples relative to the normal samples. As is evident from Figure 21, the expression of CAD6_HUMAN Cadherin-6 [Precursor] transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 45-48 and 71, Table 1, "Tissue samples in testing panel"). Notably, overexpression of at least 5-fold was found in 19 out of 43 adenocarcinoma samples.
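The 5-fold threshold comparison can be checked with a one-sided Fisher exact test on a 2x2 table. The sketch below assumes that the compared groups were the 43 adenocarcinoma samples (19 at or above 5-fold) and the five normal PM samples (Sample Nos. 45-48 and 71, taken here to be all below the threshold); under that assumption it gives about 0.0694, consistent with the P value of 6.94E-02 reported in the statistical analysis that follows. The function name is illustrative.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table

        [[a, b],   # cancer: at/above threshold, below threshold
         [c, d]]   # normal: at/above threshold, below threshold

    Returns P(X >= a), where X (the number of above-threshold cancer samples)
    follows the hypergeometric distribution implied by the fixed margins.
    """
    n = a + b + c + d
    cancer_total = a + b
    above_total = a + c
    denom = comb(n, cancer_total)
    p = 0.0
    for x in range(a, min(cancer_total, above_total) + 1):
        # math.comb returns 0 when k > n, so impossible tables add nothing
        p += comb(above_total, x) * comb(n - above_total, cancer_total - x) / denom
    return p


# 19 of 43 adenocarcinomas at or above 5-fold; 0 of 5 normal PM samples (assumed).
p = fisher_exact_greater(19, 24, 0, 5)
print(f"{p:.3g}")
```

For this table the sum has a single term, C(29,24)/C(48,43) = 118755/1712304. A two-sided test on the same table gives a larger value (about 0.14), so the reported figure appears to correspond to the one-sided test.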
Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of CAD6_HUMAN Cadherin-6 [Precursor] transcripts detectable by the above amplicon(s) in ovary cancer samples versus the normal tissue samples was determined by T test as 5.5E-03. A threshold of 5-fold overexpression was found to differentiate between cancer and normal samples with a P value of 6.94E-02, as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: H53393 seg13F forward primer; and H53393 seg13R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: H53393 seg13.

H53393 seg13 Forward primer (SEQ ID NO:979): AATGCCGCTTCTTAAACACCA
H53393 seg13 Reverse primer (SEQ ID NO:980): AGAACTGGCATTTTTCTGAAAATAATAA
H53393 seg13 Amplicon (SEQ ID NO:981):
AATGCCGCTTCTTAAACACCATACAGAGTGAACCCATTTACTTTTCTCCAGTTCCTA
AGTTACCAGGGGCAATTATATCTCACATAAACATTCCTTTAGATTTTTATTTTACTTA
TTATTTTCAGAAAAATGCCAGTTCT

Expression of CAD6_HUMAN Cadherin-6 [Precursor] H53393 transcripts which are detectable by amplicon as depicted in sequence name H53393 junc21-22 in normal and cancerous ovary tissues

Expression of CAD6_HUMAN Cadherin-6 [Precursor] transcripts detectable by or according to junc21-22, H53393 junc21-22 amplicon(s) and H53393 junc21-22F and H53393 junc21-22R primers was measured by real-time PCR.
In this specific example, the real-time PCR reaction efficiency was assumed to be 2 and was not calculated by a standard curve reaction (as detailed above in the section "Real-Time RT-PCR analysis"). In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon: PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon: HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon: SDHA-amplicon) and GAPDH (GenBank Accession No. BC026907; amplicon: GAPDH-amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48 and 71, Table 1, "Tissue samples in testing panel", above) to obtain the fold up-regulation of each sample relative to the median of the normal PM samples.

Figure 22 is a histogram showing overexpression of the above-indicated CAD6_HUMAN Cadherin-6 [Precursor] transcripts in cancerous ovary samples relative to the normal samples. As is evident from Figure 22, the expression of CAD6_HUMAN Cadherin-6 [Precursor] transcripts detectable by the above amplicon(s) in cancer samples was higher than in the non-cancerous samples (Sample Nos. 45-48 and 71, Table 1, "Tissue samples in testing panel"). Notably, overexpression of at least 5-fold was found in 23 out of 43 adenocarcinoma samples.

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: H53393 junc21-22F forward primer; and H53393 junc21-22R reverse primer.
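A primer pair and the amplicon it produces should agree: the amplicon should begin with the forward primer and end with the reverse complement of the reverse primer. The sketch below checks this for the seg13 sequences given above (SEQ ID NOs: 979-981) and the junc21-22 sequences given with this experiment (SEQ ID NOs: 982-984). The helper names are illustrative, and only unambiguous A/C/G/T bases are handled.

```python
# Complement table for unambiguous DNA bases only (assumption: no IUPAC codes).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def primers_flank_amplicon(forward, reverse, amplicon):
    """True if the amplicon starts with the forward primer and ends with the
    reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(reverse_complement(reverse))


ASSAYS = {
    "H53393 seg13": (
        "AATGCCGCTTCTTAAACACCA",
        "AGAACTGGCATTTTTCTGAAAATAATAA",
        "AATGCCGCTTCTTAAACACCATACAGAGTGAACCCATTTACTTTTCTCCAGTTCCTA"
        "AGTTACCAGGGGCAATTATATCTCACATAAACATTCCTTTAGATTTTTATTTTACTTA"
        "TTATTTTCAGAAAAATGCCAGTTCT",
    ),
    "H53393 junc21-22": (
        "TGGTTTTTCTCTTAGTTGATTCAGACC",
        "GAGCCACTGGCTGCTTCAG",
        "TGGTTTTTCTCTTAGTTGATTCAGACCTTGCATGCTGTTGACAAGGATGACCCTTATA"
        "GTGGGCACCAATTTTCGTTTTCCTTGGCCCCTGAAGCAGCCAGTGGCTC",
    ),
}

for name, (fwd, rev, amp) in ASSAYS.items():
    print(name, len(amp), primers_flank_amplicon(fwd, rev, amp))
```

Both assays pass: for example, the reverse complement of the seg13 reverse primer, TTATTATTTTCAGAAAAATGCCAGTTCT, is exactly the 3' end of the seg13 amplicon.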
The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: H53393 junc21-22.

H53393 junc21-22 Forward primer (SEQ ID NO:982): TGGTTTTTCTCTTAGTTGATTCAGACC
H53393 junc21-22 Reverse primer (SEQ ID NO:983): GAGCCACTGGCTGCTTCAG
H53393 junc21-22 Amplicon (SEQ ID NO:984):
TGGTTTTTCTCTTAGTTGATTCAGACCTTGCATGCTGTTGACAAGGATGACCCTTATA
GTGGGCACCAATTTTCGTTTTCCTTGGCCCCTGAAGCAGCCAGTGGCTC

DESCRIPTION FOR CLUSTER HSU40434

Cluster HSU40434 features 1 transcript(s) and 36 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.

Table 1 - Transcripts of interest
Transcript name | SEQ ID NO:
HSU40434_PEA_1_T13 | 188

Table 2 - Segments of interest
Segment name | SEQ ID NO:
HSU40434_PEA_1_node_1 | 189
HSU40434_PEA_1_node_16 | 190
HSU40434_PEA_1_node_30 | 191
HSU40434_PEA_1_node_32 | 192
HSU40434_PEA_1_node_57 | 193
HSU40434_PEA_1_node_0 | 194
HSU40434_PEA_1_node_10 | 195
HSU40434_PEA_1_node_13 | 196
HSU40434_PEA_1_node_18 | 197
HSU40434_PEA_1_node_2 | 198
HSU40434_PEA_1_node_20 | 199
HSU40434_PEA_1_node_21 | 200
HSU40434_PEA_1_node_23 | 201
HSU40434_PEA_1_node_24 | 202
HSU40434_PEA_1_node_26 | 203
HSU40434_PEA_1_node_28 | 204
HSU40434_PEA_1_node_3 | 205
HSU40434_PEA_1_node_35 | 206
HSU40434_PEA_1_node_36 | 207
HSU40434_PEA_1_node_37 | 208
HSU40434_PEA_1_node_38 | 209
HSU40434_PEA_1_node_39 | 210
HSU40434_PEA_1_node_40 | 211
HSU40434_PEA_1_node_41 | 212
HSU40434_PEA_1_node_42 | 213
HSU40434_PEA_1_node_43 | 214
HSU40434_PEA_1_node_44 | 215
HSU40434_PEA_1_node_47 | 216
HSU40434_PEA_1_node_48 | 217
HSU40434_PEA_1_node_51 | 218
HSU40434_PEA_1_node_52 | 219
HSU40434_PEA_1_node_53 | 220
HSU40434_PEA_1_node_54 | 221
HSU40434_PEA_1_node_56 | 222
HSU40434_PEA_1_node_7 | 223
HSU40434_PEA_1_node_8 | 224

Table 3 - Proteins of interest
Protein name | SEQ ID NO:
HSU40434_PEA_1_P12 | 226

These sequences are variants of the known protein Mesothelin precursor (SwissProt accession identifier MSLN_HUMAN; known also according to the synonym CAK1 antigen), SEQ ID NO: 225, referred to herein as the previously known protein.

The variant proteins according to the present invention are variants of a known diagnostic marker, called Mesothelin (CAK-1).

Protein Mesothelin precursor is known or believed to have the following function(s): may play a role in cellular adhesion; antigenic protein reactive with antibody K1. The sequence for protein Mesothelin precursor is given at the end of the application, as "Mesothelin precursor amino acid sequence". Protein Mesothelin precursor is believed to be attached to the membrane by a GPI anchor.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: cell adhesion, which are annotation(s) related to Biological Process; protein binding, which are annotation(s) related to Molecular Function; and membrane, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or LocusLink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.

Cluster HSU40434 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods.
The term "number" in the left-hand column of the table and the numbers on the y-axis of Figure 23 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).

Overall, the following results were obtained as shown with regard to the histograms in Figure 23 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues, ovarian carcinoma and pancreas carcinoma.

Table 4 - Normal tissue distribution
Name of Tissue | Number
brain | 2
colon | 0
epithelial | 9
general | 4
kidney | 0
liver | 0
lung | 32
ovary | 0
pancreas | 2
prostate | 2
stomach | 0
Thyroid | 0
uterus | 4

Table 5 - P values and ratios for expression in cancerous tissue
Name of Tissue | P1 | P2 | SP1 | R3 | SP2 | R4
brain | 5.1e-01 | 3.1e-01 | 1 | 0.9 | 2.5e-01 | 2.7
colon | 1.7e-01 | 1.7e-01 | 3.4e-01 | 2.4 | 4.6e-01 | 2.0
epithelial | 4.3e-03 | 2.3e-03 | 9.3e-12 | 6.7 | 6.1e-08 | 4.5
general | 4.0e-05 | 1.5e-05 | 3.9e-24 | 11.6 | 1.5e-17 | 7.5
kidney | 4.1e-01 | 5.1e-01 | 1.1e-01 | 3.2 | 2.4e-01 | 2.3
liver | 1 | 6.8e-01 | 1 | 1.0 | 4.8e-01 | 1.9
lung | 5.4e-01 | 7.9e-01 | 4.8e-01 | 1.3 | 8.4e-01 | 0.7
ovary | 8.2e-02 | 6.3e-02 | 4.8e-06 | 11.3 | 1.5e-04 | 8.0
pancreas | 2.3e-01 | 8.7e-02 | 1.8e-04 | 5.4 | 2.4e-04 | 6.1
prostate | 9.7e-01 | 9.3e-01 | 1 | 0.9 | 7.5e-01 | 1.2
stomach | 1 | 3.0e-01 | 1 | 1.0 | 2.1e-01 | 2.3
Thyroid | 5.0e-01 | 5.0e-01 | 6.7e-01 | 1.5 | 6.7e-01 | 1.5
uterus | 9.0e-02 | 5.6e-02 | 8.5e-02 | 3.3 | 1.1e-01 | 2.8

As noted above, cluster HSU40434 features 1 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Mesothelin precursor. A description of each variant protein according to the present invention is now provided.
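Tables 6 and 7, given with variant HSU40434_PEA_1_P12, are linked through the coding portion of transcript HSU40434_PEA_1_T13, which is stated to start at position 420 and end at position 1793. A minimal sketch of the position arithmetic, assuming 1-based positions and a reading frame that begins at position 420 (the function name is illustrative):

```python
CDS_START = 420  # first coding nucleotide of HSU40434_PEA_1_T13 (from the text)
CDS_END = 1793   # last coding nucleotide (from the text)


def transcript_to_codon(pos, cds_start=CDS_START, cds_end=CDS_END):
    """Map a 1-based transcript position to (amino acid position, position
    within the codon), both 1-based. Positions outside the coding portion
    raise ValueError."""
    if not cds_start <= pos <= cds_end:
        raise ValueError(f"position {pos} is outside the coding portion {cds_start}-{cds_end}")
    offset = pos - cds_start
    return offset // 3 + 1, offset % 3 + 1


# Nucleotide SNPs from Table 7 that fall inside the coding portion.
for nt in (557, 572, 771, 835, 904, 1124, 1408, 1444, 1623):
    aa, codon_pos = transcript_to_codon(nt)
    print(f"nt {nt} -> aa {aa}, codon position {codon_pos}")
```

Under this assumption the coding portion spans (1793 - 420 + 1) / 3 = 458 codons, matching the 458-residue length of HSU40434_PEA_1_P12 in the alignments. Nucleotide SNPs 771, 835, 904, 1408, 1444 and 1623 map to amino acid positions 118, 139, 162, 330, 342 and 402, the positions listed in Table 6. Positions 572 and 1124, whose alternatives are blank in Table 7, fall in codons 51 and 235, which Table 6 also lists with a blank alternative; position 557 falls at the third base of codon 46, which Table 6 does not list. Positions 170 and 334 lie upstream of the coding portion, and positions from 1931 onward lie downstream of it.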
Variant protein HSU40434_PEA_1_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSU40434_PEA_1_T13. An alignment is given to the known protein (Mesothelin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSU40434_PEA_1_P12 and Q14859 (SEQ ID NO:985):

1. An isolated chimeric polypeptide encoding for HSU40434_PEA_1_P12, comprising a first amino acid sequence being at least 90% homologous to
MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQEAAPLDGVLANPPNISS
LSPRQLLGFPCAEVSGLSTERVRELAVALAQKNVKLSTEQLRCLAHRLSEPPEDLDALP
LDLLLFLNPDAFSGPQACTRFFSRITKANVDLLPRGAPERQRLLPAALACWGVRGSLLS
EADVRALGGLACDLPGRFVAESAEVLLPRLVSCPGPLDQDQQEAARAALQGGGPPYGP
PSTWSVSTMDALRGLLPVLGQPIIRSIPQGIVAAWRQRSSRDPSWRQPERTILRPRFRRE
VEKTACPSGKKAREIDESLIFYKKWELEACVDAALLATQMDRVNAIPFTYEQLDVLKH
KLDELYPQGYPESVIQHLGYLFLKMSPEDIRKWNVTSLETLKALLEVNKGHEMSPQVA
TLIDRFVKGRGQLDKDTLDTLTAFYPGYLCSLSPEELSSVPPSSIW
corresponding to amino acids 1 - 458 of Q14859, which also corresponds to amino acids 1 - 458 of HSU40434_PEA_1_P12.
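The alignment reports in this document give both a matching and a total percent identity. Judging from the reported figures, matching percent identity appears to be identical residues divided by the aligned (gap-free) columns, and total percent identity identical residues divided by the full alignment length including gaps; this reading is an assumption, not a documented definition of the alignment program. The sketch uses the first 50 columns of the HSU40434_PEA_1_P12 x Q9BTR2 alignment, where Q9BTR2 has a single gap (written here as "-") opposite residue 44 (E) of the variant. Names are illustrative.

```python
def identity_stats(query, subject, gap="-"):
    """Return (matching % identity, total % identity) for two aligned strings.

    Matching identity: identical columns / columns without a gap.
    Total identity:    identical columns / all alignment columns.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    aligned = [(q, s) for q, s in zip(query, subject) if q != gap and s != gap]
    identical = sum(1 for q, s in aligned if q == s)
    return 100.0 * identical / len(aligned), 100.0 * identical / len(query)


# Columns 1-50 of HSU40434_PEA_1_P12 against Q9BTR2.
p12 = "MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQEAAPLDG"
q9btr2 = "MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQ-AAPLDG"
print(identity_stats(p12, q9btr2))

# The chimeric description: residues 1-43 of Q9BTR2, then E, then the rest.
ungapped = q9btr2.replace("-", "")
print(ungapped[:43] + "E" + ungapped[43:] == p12)
```

On this 50-column window the matching identity is 100% and the total identity 98%. Applied to the full reports, the same ratios give 457/458 = 99.78% total identity for Q9BTR2 (457 matching columns, 458 in total) and 333/335 = 99.40% for the H53393_PEA_1_P6 x CAD6_HUMAN alignment, whose only two differences are the VM/LL pair at positions 334-335.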
Comparison report between HSU40434_PEA_1_P12 and Q9BTR2 (SEQ ID NO:986):

1. An isolated chimeric polypeptide encoding for HSU40434_PEA_1_P12, comprising a first amino acid sequence being at least 90% homologous to MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQ corresponding to amino acids 1 - 43 of Q9BTR2, which also corresponds to amino acids 1 - 43 of HSU40434_PEA_1_P12, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence E corresponding to amino acids 44 - 44 of HSU40434_PEA_1_P12, and a third amino acid sequence being at least 90% homologous to
AAPLDGVLANPPNISSLSPRQLLGFPCAEVSGLSTERVRELAVALAQKNVKLSTEQLRC
LAHRLSEPPEDLDALPLDLLLFLNPDAFSGPQACTRFFSRITKANVDLLPRGAPERQRLL
PAALACWGVRGSLLSEADVRALGGLACDLPGRFVAESAEVLLPRLVSCPGPLDQDQQE
AARAALQGGGPPYGPPSTWSVSTMDALRGLLPVLGQPIIRSIPQGIVAAWRQRSSRDPS
WRQPERTILRPRFRREVEKTACPSGKKAREIDESLIFYKKWELEACVDAALLATQMDRV
NAIPFTYEQLDVLKHKLDELYPQGYPESVIQHLGYLFLKMSPEDIRKWNVTSLETLKAL
LEVNKGHEMSPQVATLIDRFVKGRGQLDKDTLDTLTAFYPGYLCSLSPEELSSVPPSSIW
corresponding to amino acids 44 - 457 of Q9BTR2, which also corresponds to amino acids 45 - 458 of HSU40434_PEA_1_P12, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for an edge portion of HSU40434_PEA_1_P12, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for E, corresponding to HSU40434_PEA_1_P12.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HSU40434_PEA_1_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSU40434_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 6 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
118 | L -> V | No
139 | R -> H | No
162 | L -> Q | No
235 | G -> | No
330 | A -> V | No
342 | I -> N | No
402 | N -> D | No
51 | V -> | No

Variant protein HSU40434_PEA_1_P12 is encoded by the following transcript(s): HSU40434_PEA_1_T13, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSU40434_PEA_1_T13 is shown in bold; this coding portion starts at position 420 and ends at position 1793. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSU40434_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 7 - Nucleic acid SNPs
SNP position(s) on nucleotide sequence | Alternative nucleic acid(s) | Previously known SNP?
170 | G -> A | Yes
334 | G -> A | Yes
1623 | A -> G | No
1931 | G -> | No
1955 | A -> G | No
2270 | A -> G | No
2352 | C -> | No
2431 | G -> A | No
2482 | C -> A | No
2483 | C -> A | No
557 | G -> A | No
572 | C -> | No
771 | C -> G | No
835 | G -> A | No
904 | T -> A | No
1124 | C -> | No
1408 | C -> T | No
1444 | T -> A | No

As noted above, cluster HSU40434 features 36 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster HSU40434_PEA_1_node_1 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 8 below describes the starting and ending position of this segment on each transcript.
Table 8 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 58 | 308

Segment cluster HSU40434_PEA_1_node_16 according to the present invention is supported by 30 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 9 below describes the starting and ending position of this segment on each transcript.

Table 9 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 599 | 719

Segment cluster HSU40434_PEA_1_node_30 according to the present invention is supported by 37 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 10 below describes the starting and ending position of this segment on each transcript.

Table 10 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 1315 | 1493

Segment cluster HSU40434_PEA_1_node_32 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 11 below describes the starting and ending position of this segment on each transcript.

Table 11 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 1494 | 1649

Segment cluster HSU40434_PEA_1_node_57 according to the present invention is supported by 53 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 12 below describes the starting and ending position of this segment on each transcript.

Table 12 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 2307 | 2499

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster HSU40434_PEA_1_node_0 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 13 below describes the starting and ending position of this segment on each transcript.

Table 13 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 1 | 57

Segment cluster HSU40434_PEA_1_node_10 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13.
Table 14 below describes the starting and ending position of this segment on each transcript.

Table 14 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 505 | 548

Segment cluster HSU40434_PEA_1_node_13 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 15 below describes the starting and ending position of this segment on each transcript.

Table 15 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 549 | 598

Segment cluster HSU40434_PEA_1_node_18 according to the present invention is supported by 30 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 720 | 799

Segment cluster HSU40434_PEA_1_node_2 according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 17 below describes the starting and ending position of this segment on each transcript.

Table 17 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 309 | 368

Segment cluster HSU40434_PEA_1_node_20 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 18 below describes the starting and ending position of this segment on each transcript.

Table 18 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 800 | 905

Segment cluster HSU40434_PEA_1_node_21 according to the present invention can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 19 below describes the starting and ending position of this segment on each transcript.

Table 19 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 906 | 929

Segment cluster HSU40434_PEA_1_node_23 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 930 | 1043

Segment cluster HSU40434_PEA_1_node_24 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 21 below describes the starting and ending position of this segment on each transcript.

Table 21 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 1044 | 1123

Segment cluster HSU40434_PEA_1_node_26 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 22 below describes the starting and ending position of this segment on each transcript.

Table 22 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 1124 | 1214

Segment cluster HSU40434_PEA_1_node_28 according to the present invention is supported by 30 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 23 below describes the starting and ending position of this segment on each transcript.

Table 23 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 1215 | 1314

Segment cluster HSU40434_PEA_1_node_3 according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 369 | 410

Segment cluster HSU40434_PEA_1_node_35 according to the present invention is supported by 43 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 25 below describes the starting and ending position of this segment on each transcript.

Table 25 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 1650 | 1679

Segment cluster HSU40434_PEA_1_node_36 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 26 below describes the starting and ending position of this segment on each transcript.

Table 26 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 1680 | 1753

Segment cluster HSU40434_PEA_1_node_37 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 27 below describes the starting and ending position of this segment on each transcript.

Table 27 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 1754 | 1792

Segment cluster HSU40434_PEA_1_node_38 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 1793 | 1866

Segment cluster HSU40434_PEA_1_node_39 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 29 below describes the starting and ending position of this segment on each transcript.

Table 29 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 1867 | 1909

Segment cluster HSU40434_PEA_1_node_40 according to the present invention can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 30 below describes the starting and ending position of this segment on each transcript.

Table 30 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 1910 | 1930

Segment cluster HSU40434_PEA_1_node_41 according to the present invention can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 31 below describes the starting and ending position of this segment on each transcript.

Table 31 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 1931 | 1948

Segment cluster HSU40434_PEA_1_node_42 according to the present invention can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 32 below describes the starting and ending position of this segment on each transcript.

Table 32 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 1949 | 1972

Segment cluster HSU40434_PEA_1_node_43 according to the present invention can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 33 below describes the starting and ending position of this segment on each transcript.

Table 33 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 1973 | 1990

Segment cluster HSU40434_PEA_1_node_44 according to the present invention can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 34 below describes the starting and ending position of this segment on each transcript.

Table 34 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 1991 | 1994

Segment cluster HSU40434_PEA_1_node_47 according to the present invention is supported by 49 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 35 below describes the starting and ending position of this segment on each transcript.

Table 35 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 1995 | 2032

Segment cluster HSU40434_PEA_1_node_48 according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 36 below describes the starting and ending position of this segment on each transcript.

Table 36 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 2033 | 2089

Segment cluster HSU40434_PEA_1_node_51 according to the present invention can be found in the following transcript(s): HSU40434_PEA_1_T13.
Table 37 below describes the starting and ending position of this segment on each transcript.

Table 37 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 2090 | 2113

Segment cluster HSU40434_PEA_1_node_52 according to the present invention is supported by 52 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 38 below describes the starting and ending position of this segment on each transcript.

Table 38 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 2114 | 2140

Segment cluster HSU40434_PEA_1_node_53 according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 39 below describes the starting and ending position of this segment on each transcript.

Table 39 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 2141 | 2197

Segment cluster HSU40434_PEA_1_node_54 according to the present invention is supported by 56 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 40 below describes the starting and ending position of this segment on each transcript.

Table 40 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 2198 | 2276

Segment cluster HSU40434_PEA_1_node_56 according to the present invention is supported by 49 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13.
Table 41 below describes the starting and ending position of this segment on each transcript.

Table 41 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 2277 | 2306

Segment cluster HSU40434_PEA_1_node_7 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 42 below describes the starting and ending position of this segment on each transcript.
Table 42 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 411 | 464

Segment cluster HSU40434_PEA_1_node_8 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 43 below describes the starting and ending position of this segment on each transcript.

Table 43 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 465 | 504

Variant protein alignment to the previously known protein:

Sequence name: /tmp/tZTolplA9i/eTMhjqGV2R:Q14859
Sequence documentation:
Alignment of: HSU40434_PEA_1_P12 x Q14859
Alignment segment 1/1:
Quality: 4448.00
Escore: 0
Matching length: 458
Total length: 458
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQEAAPLDG 50
1 MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQEAAPLDG 50

51 VLANPPNISSLSPRQLLGFPCAEVSGLSTERVRELAVALAQKNVKLSTEQ 100
51 VLANPPNISSLSPRQLLGFPCAEVSGLSTERVRELAVALAQKNVKLSTEQ 100

101 LRCLAHRLSEPPEDLDALPLDLLLFLNPDAFSGPQACTRFFSRITKANVD 150
101 LRCLAHRLSEPPEDLDALPLDLLLFLNPDAFSGPQACTRFFSRITKANVD 150

151 LLPRGAPERQRLLPAALACWGVRGSLLSEADVRALGGLACDLPGRFVAES 200
151 LLPRGAPERQRLLPAALACWGVRGSLLSEADVRALGGLACDLPGRFVAES 200

201 AEVLLPRLVSCPGPLDQDQQEAARAALQGGGPPYGPPSTWSVSTMDALRG 250
201 AEVLLPRLVSCPGPLDQDQQEAARAALQGGGPPYGPPSTWSVSTMDALRG 250

251 LLPVLGQPIIRSIPQGIVAAWRQRSSRDPSWRQPERTILRPRFRREVEKT 300
251 LLPVLGQPIIRSIPQGIVAAWRQRSSRDPSWRQPERTILRPRFRREVEKT 300

301 ACPSGKKAREIDESLIFYKKWELEACVDAALLATQMDRVNAIPFTYEQLD 350
301 ACPSGKKAREIDESLIFYKKWELEACVDAALLATQMDRVNAIPFTYEQLD 350

351 VLKHKLDELYPQGYPESVIQHLGYLFLKMSPEDIRKWNVTSLETLKALLE 400
351 VLKHKLDELYPQGYPESVIQHLGYLFLKMSPEDIRKWNVTSLETLKALLE 400

401 VNKGHEMSPQVATLIDRFVKGRGQLDKDTLDTLTAFYPGYLCSLSPEELS 450
401 VNKGHEMSPQVATLIDRFVKGRGQLDKDTLDTLTAFYPGYLCSLSPEELS 450

451 SVPPSSIW 458
451 SVPPSSIW 458

Sequence name: /tmp/tZTolplA9i/eTMhjqGV2R:Q9BTR2
Sequence documentation:
Alignment of: HSU40434_PEA_1_P12 x Q9BTR2
Alignment segment 1/1:
Quality: 4338.00
Escore: 0
Matching length: 457
Total length: 458
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 99.78
Total Percent Identity: 99.78
Gaps: 1
Alignment:
1 MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQEAAPLDG 50
1 MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQ.AAPLDG 49

51 VLANPPNISSLSPRQLLGFPCAEVSGLSTERVRELAVALAQKNVKLSTEQ 100
50 VLANPPNISSLSPRQLLGFPCAEVSGLSTERVRELAVALAQKNVKLSTEQ 99

101 LRCLAHRLSEPPEDLDALPLDLLLFLNPDAFSGPQACTRFFSRITKANVD 150
100 LRCLAHRLSEPPEDLDALPLDLLLFLNPDAFSGPQACTRFFSRITKANVD 149

151 LLPRGAPERQRLLPAALACWGVRGSLLSEADVRALGGLACDLPGRFVAES 200
150 LLPRGAPERQRLLPAALACWGVRGSLLSEADVRALGGLACDLPGRFVAES 199

201 AEVLLPRLVSCPGPLDQDQQEAARAALQGGGPPYGPPSTWSVSTMDALRG 250
200 AEVLLPRLVSCPGPLDQDQQEAARAALQGGGPPYGPPSTWSVSTMDALRG 249

251 LLPVLGQPIIRSIPQGIVAAWRQRSSRDPSWRQPERTILRPRFRREVEKT 300
250 LLPVLGQPIIRSIPQGIVAAWRQRSSRDPSWRQPERTILRPRFRREVEKT 299

301 ACPSGKKAREIDESLIFYKKWELEACVDAALLATQMDRVNAIPFTYEQLD 350
300 ACPSGKKAREIDESLIFYKKWELEACVDAALLATQMDRVNAIPFTYEQLD 349

351 VLKHKLDELYPQGYPESVIQHLGYLFLKMSPEDIRKWNVTSLETLKALLE 400
350 VLKHKLDELYPQGYPESVIQHLGYLFLKMSPEDIRKWNVTSLETLKALLE 399

401 VNKGHEMSPQVATLIDRFVKGRGQLDKDTLDTLTAFYPGYLCSLSPEELS 450
400 VNKGHEMSPQVATLIDRFVKGRGQLDKDTLDTLTAFYPGYLCSLSPEELS 449

451 SVPPSSIW 458
450 SVPPSSIW 457

Sequence name: /tmp/tZTolplA9i/eTMhjqGV2R:MSLN_HUMAN
Sequence documentation:
Alignment of: HSU40434_PEA_1_P12 x MSLN_HUMAN
Alignment segment 1/1:
Quality: 4074.00
Escore: 0
Matching length: 440
Total length: 448
Matching Percent Similarity: 98.86
Matching Percent Identity: 97.95
Total Percent Similarity: 97.10
Total Percent Identity: 96.21
Gaps: 1
Alignment:
19 GSLLFLLFSLGWVQPSRTLAGETGQEAAPLDGVLANPPNISSLSPRQLLG 68
17 GSLLFLLFSLGWVHPARTLAGETGTESAPLGGVLTTPHNISSLSPRQLLG 66

69 FPCAEVSGLSTERVRELAVALAQKNVKLSTEQLRCLAHRLSEPPEDLDAL 118
67 FPCAEVSGLSTERVRELAVALAQKNVKLSTEQLRCLAHRLSEPPEDLDAL 116

119 PLDLLLFLNPDAFSGPQACTRFFSRITKANVDLLPRGAPERQRLLPAALA 168
117 PLDLLLFLNPDAFSGPQACTRFFSRITKANVDLLPRGAPERQRLLPAALA 166

169 CWGVRGSLLSEADVRALGGLACDLPGRFVAESAEVLLPRLVSCPGPLDQD 218
167 CWGVRGSLLSEADVRALGGLACDLPGRFVAESAEVLLPRLVSCPGPLDQD 216

219 QQEAARAALQGGGPPYGPPSTWSVSTMDALRGLLPVLGQPIIRSIPQGIV 268
217 QQEAARAALQGGGPPYGPPSTWSVSTMDALRGLLPVLGQPIIRSIPQGIV 266

269 AAWRQRSSRDPSWRQPERTILRPRFRREVEKTACPSGKKAREIDESLIFY 318
267 AAWRQRSSRDPSWRQPERTILRPRFRREVEKTACPSGKKAREIDESLIFY 316

319 KKWELEACVDAALLATQMDRVNAIPFTYEQLDVLKHKLDELYPQGYPESV 368
317 KKWELEACVDAALLATQMDRVNAIPFTYEQLDVLKHKLDELYPQGYPESV 366

369 IQHLGYLFLKMSPEDIRKWNVTSLETLKALLEVNKGHEMS........PQ 410
367 IQHLGYLFLKMSPEDIRKWNVTSLETLKALLEVDKGHEMSPQAPRRPLPQ 416

411 VATLIDRFVKGRGQLDKDTLDTLTAFYPGYLCSLSPEELSSVPPSSIW 458
417 VATLIDRFVKGRGQLDKDTLDTLTAFYPGYLCSLSPEELSSVPPSSIW 464

DESCRIPTION FOR CLUSTER M77904

Cluster M77904 features 4 transcript(s) and 21 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Transcript Name | SEQ ID NO:
M77904_T11 | 227
M77904_T3 | 228
M77904_T8 | 229
M77904_T9 | 230

Table 2 - Segments of interest
Segment Name | SEQ ID NO:
M77904_node_0 | 231
M77904_node_11 | 232
M77904_node_12 | 233
M77904_node_14 | 234
M77904_node_15 | 235
M77904_node_17 | 236
M77904_node_2 | 237
M77904_node_21 | 238
M77904_node_23 | 239
M77904_node_24 | 240
M77904_node_27 | 241
M77904_node_28 | 242
M77904_node_4 | 243
M77904_node_6 | 244
M77904_node_7 | 245
M77904_node_8 | 246
M77904_node_9 | 247
M77904_node_19 | 248
M77904_node_22 | 249
M77904_node_25 | 250
M77904_node_26 | 251

Table 3 - Proteins of interest
Protein Name | SEQ ID NO:
M77904_P2 | 252
M77904_P4 | 253
M77904_P5 | 254
M77904_P7 | 255

Cluster M77904 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 24 refer to weighted expression of ESTs in each category, as "parts per million" (the ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, expressed in parts per million).

Overall, the following results were obtained, as shown in the histograms in Figure 24 and in Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: a mixture of malignant tumors from different tissues.
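The parts-per-million figure described above is the number of ESTs from the cluster in a tissue category, divided by all ESTs in that category and scaled by one million. A minimal sketch of the unweighted ratio follows; the function name and the counts in the usage line are hypothetical, and any additional weighting applied by the "previously described" method is not reproduced here.

```python
def ests_per_million(cluster_ests: int, category_ests: int) -> float:
    """Expression of one cluster in one tissue category, in parts per million of all ESTs."""
    if category_ests <= 0:
        raise ValueError("the tissue category must contain at least one EST")
    if not 0 <= cluster_ests <= category_ests:
        raise ValueError("cluster EST count must lie between 0 and the category total")
    # Multiply before dividing so that round numbers stay exact in floating point.
    return cluster_ests * 1_000_000 / category_ests


# Hypothetical counts: 7 cluster ESTs among 50,000 ESTs sequenced from one tissue.
print(ests_per_million(7, 50_000))  # 140.0
```

On this scale a value such as 140 in Table 4 would correspond to 140 cluster ESTs per million ESTs in that tissue category, before any weighting.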
Table 4 - Normal tissue distribution
Name of Tissue | Number
bladder | 0
brain | 0
colon | 94
epithelial | 35
general | 15
kidney | 0
liver | 0
lung | 33
breast | 140
bone marrow | 0
ovary | 0
pancreas | 26
prostate | 94
stomach | 36
Thyroid | 0
uterus | 22

Table 5 - P values and ratios for expression in cancerous tissue
Name of Tissue | P1 | P2 | SP1 | R3 | SP2 | R4
bladder | 5.4e-01 | 3.4e-01 | 5.6e-01 | 1.8 | 3.2e-01 | 2.4
brain | 8.8e-02 | 1.3e-01 | 4.8e-02 | 8.1 | 1.1e-01 | 5.1
colon | 3.8e-01 | 3.8e-01 | 8.7e-01 | 0.8 | 8.2e-01 | 0.8
epithelial | 3.1e-02 | 1.5e-02 | 4.9e-01 | 1.1 | 3.9e-02 | 1.4
general | 2.0e-04 | 3.4e-05 | 4.1e-03 | 2.0 | 6.2e-07 | 2.5
kidney | 6.5e-01 | 3.5e-01 | 1 | 1.1 | 1.4e-02 | 4.0
liver | 1 | 3.0e-01 | 1 | 1.0 | 2.3e-01 | 2.0
lung | 5.9e-01 | 4.8e-01 | 8.8e-01 | 0.7 | 3.4e-01 | 1.2
breast | 8.7e-01 | 8.8e-01 | 1 | 0.2 | 9.4e-01 | 0.3
bone marrow | 1 | 4.2e-01 | 1 | 1.0 | 5.3e-01 | 2.1
ovary | 1.3e-01 | 9.4e-02 | 3.2e-01 | 2.4 | 3.4e-01 | 2.2
pancreas | 5.1e-01 | 5.2e-01 | 2.1e-01 | 1.8 | 7.6e-02 | 1.8
prostate | 8.6e-01 | 8.0e-01 | 9.2e-01 | 0.5 | 8.4e-01 | 0.6
stomach | 2.7e-01 | 1.9e-01 | 5.0e-01 | 1.5 | 2.7e-01 | 1.8
Thyroid | 6.4e-01 | 6.4e-01 | 6.7e-01 | 1.5 | 6.7e-01 | 1.5
uterus | 1.2e-01 | 3.4e-01 | 5.9e-01 | 1.4 | 8.2e-01 | 0.9

As noted above, cluster M77904 features 4 transcript(s), which were listed in Table 1 above. A description of each variant protein according to the present invention is now provided.

Variant protein M77904_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M77904_T3. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M77904_P2 and Q8WU91 (SEQ ID NO:987):

1. An isolated chimeric polypeptide encoding for M77904_P2, comprising a first amino acid sequence being at least 90% homologous to MLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDELMTWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGNMAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNES corresponding to amino acids 67 - 341 of Q8WU91, which also corresponds to amino acids 1 - 275 of M77904_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NKIYVVDLSNERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTISCTDHRYCQRKSYSLQVPSDILHLPVELHDFSWKLLVPKDRLSLVLVPAQKLQQHTHEKPCNTSFSYLVASAIPSQDLYFGSFCPGGSIKQIQVKQNISVTLRTFAPSFQQEASRQGLTVSFIPYFKEEGVFTVTPDTKSKVYLRTPNWDRGLPSLTSVSWNISVPRDQVACLTFFKERSGVVCQTGRAFMIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLFSVTLTPRTVDLTVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVGIYNGNINTEMPRQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVDTYRPFQGTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEPYTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE corresponding to amino acids 276 - 770 of M77904_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of M77904_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NKIYVVDLSNERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTISCTDHRYCQRKSYSLQVPSDILHLPVELHDFSWKLLVPKDRLSLVLVPAQKLQQHTHEKPCNTSFSYLVASAIPSQDLYFGSFCPGGSIKQIQVKQNISVTLRTFAPSFQQEASRQGLTVSFIPYFKEEGVFTVTPDTKSKVYLRTPNWDRGLPSLTSVSWNISVPRDQVACLTFFKERSGVVCQTGRAFMIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLFSVTLTPRTVDLTVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVGIYNGNINTEMPRQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVDTYRPFQGTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEPYTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE in M77904_P2.

Comparison report between M77904_P2 and Q96QU7 (SEQ ID NO:988):

1. An isolated chimeric polypeptide encoding for M77904_P2, comprising a first amino acid sequence being at least 90% homologous to MLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDELMTWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGNMAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLSNERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTISCTDHRYCQRKSYSLQVPSDILHLPVELHDFSWKLLVPKDRLSLVLVPAQKLQQHTHEKPCNTSFSYLVASAIPSQDLYFGSFCPGGSIKQIQVKQNISVTLRTFAPSFQQEASRQGLTVSFIPYFKEEGVFTVTPDTKSKVYLRTPNWDRGLPSLTSVSWNISVPRDQVACLTFFKERSGVVCQTGRAFMIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLFSVTLTPRTVDLTVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVGIYNGNINTEMPRQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVDTYRPFQGTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEPYTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE corresponding to amino acids 67 - 836 of Q96QU7, which also corresponds to amino acids 1 - 770 of M77904_P2.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
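The localization calls in this section rest on a simple consensus over two signal-peptide predictors and two trans-membrane predictors: "membrane" when both trans-membrane programs predict a trans-membrane region and both signal-peptide programs predict a non-secreted protein, and "secreted" when both signal-peptide programs predict a signal peptide and neither trans-membrane program predicts a trans-membrane region. The sketch below encodes only those two stated rules; the predictors themselves are not modeled (SignalP is the only one named), the function name is illustrative, and returning "undetermined" for any other combination is an assumption, since the text does not say how disagreements are resolved.

```python
from typing import Sequence


def consensus_localization(signal_peptide: Sequence[bool], transmembrane: Sequence[bool]) -> str:
    """Combine per-program predictions into a localization call.

    signal_peptide: one entry per signal-peptide predictor (True = signal peptide predicted)
    transmembrane:  one entry per trans-membrane predictor (True = TM region predicted)
    """
    if not signal_peptide or not transmembrane:
        raise ValueError("need at least one prediction of each kind")
    # "Membrane": every TM predictor finds a TM region and no predictor finds a signal peptide.
    if all(transmembrane) and not any(signal_peptide):
        return "membrane"
    # "Secreted": every predictor finds a signal peptide and no TM predictor finds a TM region.
    if all(signal_peptide) and not any(transmembrane):
        return "secreted"
    # Any other combination is not covered by the rules stated in the text (assumption).
    return "undetermined"


print(consensus_localization([False, False], [True, True]))  # membrane
print(consensus_localization([True, True], [False, False]))  # secreted
```

Requiring agreement between both programs of each kind trades sensitivity for specificity: a protein on which the predictors disagree is left unassigned rather than forced into one class.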
Variant protein M77904_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M77904_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 6 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
263 | Q -> R | No
459 | Q -> R | Yes
643 | G -> D | Yes

Variant protein M77904_P2 is encoded by the following transcript(s): M77904_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M77904_T3 is shown in bold; this coding portion starts at position 238 and ends at position 2547. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M77904_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
561 | C -> T | No
585 | T -> C | No
3276 | T -> G | Yes
3465 | C -> T | Yes
3760 | A -> T | Yes
3830 | G -> A | Yes
3900 | A -> G | Yes
3960 | C -> A | Yes
4114 | G -> A | Yes
4613 | C -> T | Yes
5050 | G -> A | No
5309 | A -> C | Yes
957 | G -> A | Yes
5329 | A -> G | Yes
5420 | T -> C | Yes
5490 | T -> C | Yes
5507 | C -> A | Yes
5511 | G -> A | Yes
5578 | T -> G | Yes
5662 | A -> C | No
1025 | A -> G | No
1613 | A -> G | Yes
1623 | C -> T | Yes
2085 | T -> C | No
2165 | G -> A | Yes
3043 | T -> C | No
3122 | G -> A | Yes

Variant protein M77904_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M77904_T8. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M77904_P4 and Q8WU91:

1. An isolated chimeric polypeptide encoding for M77904_P4, comprising a first amino acid sequence being at least 90% homologous to MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDELMTWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGNMAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNES corresponding to amino acids 1 - 341 of Q8WU91, which also corresponds to amino acids 1 - 341 of M77904_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NKIYVVDLSNERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTISTPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGSLEVRSSRPAWPTW corresponding to amino acids 342 - 487 of M77904_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of M77904_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NKIYVVDLSNERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTISTPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGSLEVRSSRPAWPTW in M77904_P4.

Comparison report between M77904_P4 and Q9H5V8 (SEQ ID NO:989):

1. An isolated chimeric polypeptide encoding for M77904_P4, comprising a first amino acid sequence being at least 90% homologous to MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDELMTWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGNMAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLSNERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTIS corresponding to amino acids 1 - 416 of Q9H5V8, which also corresponds to amino acids 1 - 416 of M77904_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGSLEVRSSRPAWPTW corresponding to amino acids 417 - 487 of M77904_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of M77904_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGSLEVRSSRPAWPTW in M77904_P4.
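Each comparison report pairs a residue range on a published protein with a range on the variant, for example amino acids 67 - 341 of Q8WU91 with amino acids 1 - 275 of M77904_P2. Within a gap-free range the correspondence is a constant offset, and the condition that the first and second amino acid sequences are "contiguous and in a sequential order" means the variant-specific second sequence begins one residue after the shared first sequence ends. A minimal sketch, assuming no gaps inside a range; the `Block` type and helper names are illustrative and do not come from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Gap-free correspondence between a published protein and a variant (1-based, inclusive)."""
    ref_start: int
    ref_end: int
    var_start: int

    @property
    def var_end(self) -> int:
        return self.var_start + (self.ref_end - self.ref_start)

    def ref_to_var(self, ref_pos: int) -> int:
        """Translate a residue number on the published protein to the variant."""
        if not self.ref_start <= ref_pos <= self.ref_end:
            raise ValueError(f"position {ref_pos} is outside {self.ref_start}-{self.ref_end}")
        return self.var_start + (ref_pos - self.ref_start)


def is_contiguous(head: Block, tail_start: int) -> bool:
    """True when the variant-specific tail starts right after the shared head ends."""
    return tail_start == head.var_end + 1


# Amino acids 67 - 341 of Q8WU91 correspond to amino acids 1 - 275 of M77904_P2,
# and the M77904_P2-specific second sequence starts at amino acid 276.
head = Block(ref_start=67, ref_end=341, var_start=1)
print(head.var_end, head.ref_to_var(100), is_contiguous(head, 276))  # 275 34 True
```

The same arithmetic checks the other reports in this cluster, e.g. `Block(606, 836, 1).var_end` gives 231, the length of M77904_P5 aligned to Q96QU7.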
Comparison report between M77904_P4 and Q96QU7:

1. An isolated chimeric polypeptide encoding for M77904_P4, comprising a first amino acid sequence being at least 90% homologous to MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDELMTWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGNMAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLSNERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTIS corresponding to amino acids 1 - 416 of Q96QU7, which also corresponds to amino acids 1 - 416 of M77904_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGSLEVRSSRPAWPTW corresponding to amino acids 417 - 487 of M77904_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of M77904_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGSLEVRSSRPAWPTW in M77904_P4.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein M77904_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M77904_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 8 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
329 | Q -> R | No

Variant protein M77904_P4 is encoded by the following transcript(s): M77904_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M77904_T8 is shown in bold; this coding portion starts at position 137 and ends at position 1597.
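The coding-portion coordinates can be checked against the protein lengths implied by the comparison reports, whose last residue numbers are 770 for M77904_P2, 487 for M77904_P4, 231 for M77904_P5 and 238 for M77904_P7. Assuming the start and end positions are 1-based and inclusive, each coding portion spans a whole number of codons equal to the protein length, which suggests the reported end position is the last base of the final sense codon rather than of the stop codon. A minimal sketch; the function name is illustrative.

```python
def codon_count(cds_start: int, cds_end: int) -> int:
    """Number of codons in a coding portion given 1-based, inclusive transcript coordinates."""
    length = cds_end - cds_start + 1
    if length <= 0 or length % 3:
        raise ValueError(f"a coding portion of {length} nt is not a whole number of codons")
    return length // 3


# Coding portions reported in this section: transcript -> (start, end, protein, protein length).
reported = {
    "M77904_T3": (238, 2547, "M77904_P2", 770),
    "M77904_T8": (137, 1597, "M77904_P4", 487),
    "M77904_T9": (1226, 1918, "M77904_P5", 231),
    "M77904_T11": (137, 850, "M77904_P7", 238),
}
for transcript, (start, end, protein, aa_len) in reported.items():
    n = codon_count(start, end)
    print(transcript, protein, n, n == aa_len)
```

Raising an error when the length is not a multiple of three flags coordinates that cannot describe an in-frame coding portion, which is a cheap consistency check on transcribed tables.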
The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M77904_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
54 | G -> | No
59 | G -> | No
131 | G -> C | Yes
658 | C -> T | No
682 | T -> C | No
1054 | G -> A | Yes
1122 | A -> G | No

Variant protein M77904_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M77904_T9. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M77904_P5 and Q96QU7:

1. An isolated chimeric polypeptide encoding for M77904_P5, comprising a first amino acid sequence being at least 90% homologous to MIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLFSVTLTPRTVDLTVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVGIYNGNINTEMPRQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVDTYRPFQGTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEPYTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE corresponding to amino acids 606 - 836 of Q96QU7, which also corresponds to amino acids 1 - 231 of M77904_P5.

Comparison report between M77904_P5 and Q9H8C2 (SEQ ID NO:990):

1. An isolated chimeric polypeptide encoding for M77904_P5, comprising a first amino acid sequence being at least 90% homologous to MIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLFSVTLTPRTVDLTVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVGIYNGNINTEMPRQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVDTYRPFQGTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEPYTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE corresponding to amino acids 419 - 649 of Q9H8C2, which also corresponds to amino acids 1 - 231 of M77904_P5.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.

Variant protein M77904_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M77904_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 10 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
104 | G -> D | Yes

Variant protein M77904_P5 is encoded by the following transcript(s): M77904_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M77904_T9 is shown in bold; this coding portion starts at position 1226 and ends at position 1918.
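A nucleotide SNP inside a coding portion can be placed on the protein by counting codons from the coding-portion start. Assuming 1-based positions and a coding portion that begins with the first base of codon 1, the nucleotide and amino acid tables line up: on M77904_T3 (coding portion 238 - 2547), SNP 1025 A -> G falls in codon 263 (Table 6: 263 Q -> R), and SNPs 1613 and 2165 fall in codons 459 and 643; on M77904_T9 (coding portion 1226 - 1918), SNP 1536 G -> A in Table 11 below falls in codon 104, matching the G -> D change in Table 10. In each case the change hits the second base of the codon, consistent with the amino acid substitutions listed. A minimal sketch; the function name is illustrative.

```python
from typing import Optional, Tuple


def snp_to_codon(snp_pos: int, cds_start: int, cds_end: int) -> Optional[Tuple[int, int]]:
    """Map a 1-based transcript position to (amino acid position, position within codon 1-3).

    Returns None when the SNP lies outside the coding portion (5' or 3' untranslated region).
    """
    if not cds_start <= snp_pos <= cds_end:
        return None
    offset = snp_pos - cds_start
    return offset // 3 + 1, offset % 3 + 1


# Coding portion of M77904_T3 and a few SNP positions from Table 7.
for nt in (1025, 1613, 2165, 3276):
    print(nt, snp_to_codon(nt, 238, 2547))
# 1025 (263, 2)
# 1613 (459, 2)
# 2165 (643, 2)
# 3276 None
```

Only SNPs that map inside the coding portion can appear in the amino acid tables; a position such as 3276 on M77904_T3 lies past the coding portion and therefore has no counterpart in Table 6.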
The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M77904_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
248 | A -> C | Yes
318 | G -> C | Yes
3131 | A -> T | Yes
3201 | G -> A | Yes
3271 | A -> G | Yes
3331 | C -> A | Yes
3485 | G -> A | Yes
3984 | C -> T | Yes
4421 | G -> A | No
4680 | A -> C | Yes
4700 | A -> G | Yes
4791 | T -> C | Yes
984 | A -> G | Yes
4861 | T -> C | Yes
4878 | C -> A | Yes
4882 | G -> A | Yes
4949 | T -> G | Yes
5033 | A -> C | No
994 | C -> T | Yes
1456 | T -> C | No
1536 | G -> A | Yes
2414 | T -> C | No
2493 | G -> A | Yes
2647 | T -> G | Yes
2836 | C -> T | Yes

Variant protein M77904_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M77904_T11. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M77904_P7 and Q8WU91:

1. An isolated chimeric polypeptide encoding for M77904_P7, comprising a first amino acid sequence being at least 90% homologous to MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKR corresponding to amino acids 1 - 219 of Q8WU91, which also corresponds to amino acids 1 - 219 of M77904_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EKAPPCYLIRLKHTRSSLF corresponding to amino acids 220 - 238 of M77904_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of M77904_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EKAPPCYLIRLKHTRSSLF in M77904_P7.

Comparison report between M77904_P7 and Q9H5V8:

1. An isolated chimeric polypeptide encoding for M77904_P7, comprising a first amino acid sequence being at least 90% homologous to MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKR corresponding to amino acids 1 - 219 of Q9H5V8, which also corresponds to amino acids 1 - 219 of M77904_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EKAPPCYLIRLKHTRSSLF corresponding to amino acids 220 - 238 of M77904_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of M77904_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EKAPPCYLIRLKHTRSSLF in M77904_P7.
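The claims express sequence relationships as percent homology thresholds, and the alignment summaries for HSU40434_PEA_1_P12 report "Matching" and "Total" percent identity. The reported numbers are consistent with dividing the count of identical positions by the matching length (columns with a residue on both sides) and by the total length (all columns, gaps included): for the MSLN_HUMAN alignment, 97.95% of 440 and 96.21% of 448 both correspond to about 431 identical residues, and for Q9BTR2, 457 of 458 gives 99.78%. A minimal sketch of that arithmetic, assuming "." marks a gap as in those alignments; similarity would additionally require a substitution matrix, which the text does not specify and which is not reproduced here, and whether the claims' "homologous" is computed this way is not stated.

```python
GAP = "."  # gap symbol used in the alignment listings of this document


def percent(count: int, length: int) -> float:
    """Percentage rounded to two decimals, as in the alignment summaries."""
    return round(100 * count / length, 2)


def identity_stats(query: str, subject: str) -> dict:
    """Identity over gap-free columns ("matching") and over all columns ("total")."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    pairs = list(zip(query, subject))
    matching = [(a, b) for a, b in pairs if a != GAP and b != GAP]
    identical = sum(1 for a, b in matching if a == b)
    return {
        "identical": identical,
        "matching_length": len(matching),
        "total_length": len(pairs),
        "matching_pct_identity": percent(identical, len(matching)),
        "total_pct_identity": percent(identical, len(pairs)),
    }


# First row of the HSU40434_PEA_1_P12 x Q9BTR2 alignment (one-residue gap).
q = "MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQEAAPLDG"
s = "MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQ.AAPLDG"
print(identity_stats(q, s))
# Whole-alignment figures reported for HSU40434_PEA_1_P12 x MSLN_HUMAN:
print(percent(431, 440), percent(431, 448))  # 97.95 96.21
```

Separating "matching" from "total" identity keeps a long insertion, such as the eight-residue gap in the MSLN_HUMAN alignment, from lowering the identity of the aligned columns themselves.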
Comparison report between M77904_P7 and Q96QU7:

1. An isolated chimeric polypeptide encoding for M77904_P7, comprising a first amino acid sequence being at least 90% homologous to MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKR corresponding to amino acids 1 - 219 of Q96QU7, which also corresponds to amino acids 1 - 219 of M77904_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EKAPPCYLIRLKHTRSSLF corresponding to amino acids 220 - 238 of M77904_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of M77904_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EKAPPCYLIRLKHTRSSLF in M77904_P7.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein M77904_P7 is encoded by the following transcript(s): M77904_T11, for which the sequence(s) is/are given at the end of the application.
The coding portion of transcript M77904_T11 is shown in bold; this coding portion starts at position 137 and ends at position 850. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M77904_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 12 - Nucleic acid SNPs
SNP position(s) on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
54 | G -> | No
59 | G -> | No
2361 | A -> G | No
131 | G -> C | Yes
658 | C -> T | No
682 | T -> C | No
943 | C -> T | Yes
1667 | G -> A | No
1700 | G -> A | No
1807 | T -> C | Yes
2293 | G -> A | Yes

As noted above, cluster M77904 features 21 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
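The SNP positions in Table 12 can be related to the coding portion of M77904_T11 (positions 137 - 850) with simple coordinate arithmetic. This is a minimal sketch, assuming that all positions are 1-based and inclusive on the M77904_T11 nucleotide sequence; the function name `locate` is hypothetical, and the sketch says nothing about whether a given change alters the encoded amino acid.

```python
from typing import Optional, Tuple

CDS_START, CDS_END = 137, 850  # coding portion of M77904_T11, 1-based, inclusive


def locate(pos: int,
           cds_start: int = CDS_START,
           cds_end: int = CDS_END) -> Tuple[str, Optional[int], Optional[int]]:
    """Place a 1-based transcript position relative to the coding portion.

    Returns (region, codon_number, position_in_codon); the last two are
    1-based and are None outside the coding portion.
    """
    if pos < cds_start:
        return ("5' UTR", None, None)
    if pos > cds_end:
        return ("3' UTR", None, None)
    offset = pos - cds_start
    return ("CDS", offset // 3 + 1, offset % 3 + 1)


for snp in (131, 658, 682, 943):  # SNP positions taken from Table 12
    print(snp, locate(snp))

print((CDS_END - CDS_START + 1) // 3)  # number of codons in positions 137-850
```

Under these assumptions, position 131 lies 5' of the coding portion, positions 658 and 682 fall at the third position of codons 174 and 182, and position 943 lies 3' of it. Positions 137 - 850 span 714 nucleotides, i.e. 238 codons, which matches the 238 residues of M77904_P7 (amino acids 1 - 219 plus 220 - 238) if the reported range does not include the stop codon.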
Segment cluster M77904_node_0 according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T11 and M77904_T8. Table 13 below describes the starting and ending position of this segment on each transcript.

Table 13 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
M77904_T11 | 1 | 218
M77904_T8 | 1 | 218

Segment cluster M77904_node_11 according to the present invention is supported by 37 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3 and M77904_T8. Table 14 below describes the starting and ending position of this segment on each transcript.

Table 14 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
M77904_T3 | 1064 | 1285
M77904_T8 | 1161 | 1382

Segment cluster M77904_node_12 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T8. Table 15 below describes the starting and ending position of this segment on each transcript.

Table 15 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
M77904_T8 | 1383 | 1785

Segment cluster M77904_node_14 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T9. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
M77904_T9 | 1 | 656

Segment cluster M77904_node_15 according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3 and M77904_T9. Table 17 below describes the starting and ending position of this segment on each transcript.

Table 17 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
M77904_T3 | 1286 | 1666
M77904_T9 | 657 | 1037

Segment cluster M77904_node_17 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3 and M77904_T9. Table 18 below describes the starting and ending position of this segment on each transcript.

Table 18 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
M77904_T3 | 1667 | 2032
M77904_T9 | 1038 | 1403

Segment cluster M77904_node_2 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3. Table 19 below describes the starting and ending position of this segment on each transcript.

Table 19 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
M77904_T3 | 1 | 121

Segment cluster M77904_node_21 according to the present invention is supported by 54 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3 and M77904_T9. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
M77904_T3 | 2121 | 4095
M77904_T9 | 1492 | 3466

Segment cluster M77904_node_23 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3 and M77904_T9. Table 21 below describes the starting and ending position of this segment on each transcript.

Table 21 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
M77904_T3 | 4106 | 4375
M77904_T9 | 3477 | 3746

Segment cluster M77904_node_24 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3 and M77904_T9. Table 22 below describes the starting and ending position of this segment on each transcript.

Table 22 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
M77904_T3 | 4376 | 4785
M77904_T9 | 3747 | 4156

Segment cluster M77904_node_27 according to the present invention is supported by 81 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3 and M77904_T9. Table 23 below describes the starting and ending position of this segment on each transcript.

Table 23 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
M77904_T3 | 4994 | 5482
M77904_T9 | 4365 | 4853

Segment cluster M77904_node_28 according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3 and M77904_T9.
Table 24 below describes the starting and ending position of this segment on each transcript.

Table 24 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
M77904_T3 | 5483 | 5914
M77904_T9 | 4854 | 5285

Segment cluster M77904_node_4 according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T11, M77904_T3 and M77904_T8. Table 25 below describes the starting and ending position of this segment on each transcript.

Table 25 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
M77904_T11 | 219 | 428
M77904_T3 | 122 | 331
M77904_T8 | 219 | 428

Segment cluster M77904_node_6 according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T11, M77904_T3 and M77904_T8. Table 26 below describes the starting and ending position of this segment on each transcript.

Table 26 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
M77904_T11 | 429 | 791
M77904_T3 | 332 | 694
M77904_T8 | 429 | 791

Segment cluster M77904_node_7 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T11. Table 27 below describes the starting and ending position of this segment on each transcript.

Table 27 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
M77904_T11 | 792 | 2030

Segment cluster M77904_node_8 according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described.
This segment can be found in the following transcript(s): M77904_T11, M77904_T3 and M77904_T8. Table 28 below describes the starting and ending position of this segment on each transcript.

Table 28 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
M77904_T11 | 2031 | 2399
M77904_T3 | 695 | 1063
M77904_T8 | 792 | 1160

Segment cluster M77904_node_9 according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T11. Table 29 below describes the starting and ending position of this segment on each transcript.

Table 29 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
M77904_T11 | 2400 | 2658

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
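Read together, the segment tables imply that the long segments lie end to end along transcripts M77904_T11 and M77904_T8 (transcripts M77904_T3 and M77904_T9 additionally contain the short segments described separately). The sketch below, with hypothetical names `check_tiling` and `segment_at`, checks that tiling and looks up which segment covers a transcript position, assuming 1-based inclusive coordinates as listed in the tables.

```python
from typing import List, Tuple

Segment = Tuple[str, int, int]  # (segment name, start, end), 1-based, inclusive

# M77904_T11 coordinates copied from Tables 13, 25, 26, 27, 28 and 29
T11: List[Segment] = [("node_0", 1, 218), ("node_4", 219, 428),
                      ("node_6", 429, 791), ("node_7", 792, 2030),
                      ("node_8", 2031, 2399), ("node_9", 2400, 2658)]
# M77904_T8 coordinates copied from Tables 13, 14, 15, 25, 26 and 28
T8: List[Segment] = [("node_0", 1, 218), ("node_4", 219, 428),
                     ("node_6", 429, 791), ("node_8", 792, 1160),
                     ("node_11", 1161, 1382), ("node_12", 1383, 1785)]


def check_tiling(segments: List[Segment]) -> List[str]:
    """List every gap or overlap between consecutive segments (empty = clean)."""
    ordered = sorted(segments, key=lambda seg: seg[1])
    problems = []
    for (name_a, _, end_a), (name_b, start_b, _) in zip(ordered, ordered[1:]):
        if start_b != end_a + 1:
            problems.append(f"{name_a} ends at {end_a}, {name_b} starts at {start_b}")
    return problems


def segment_at(segments: List[Segment], pos: int) -> str:
    """Name of the segment covering a transcript position, or "" if none does."""
    for name, start, end in segments:
        if start <= pos <= end:
            return name
    return ""


print(check_tiling(T11), check_tiling(T8))  # both lists are empty
print(segment_at(T11, 1667))                # SNP position from Table 12
```

Both transcripts tile without gaps or overlaps, and the Table 12 SNP at position 1667 of M77904_T11 falls in node_7, the segment found only in that transcript.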
Segment cluster M77904_node_19 according to the present invention is supported by 42 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3 and M77904_T9. Table 30 below describes the starting and ending position of this segment on each transcript.

Table 30 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
M77904_T3 | 2033 | 2120
M77904_T9 | 1404 | 1491

Segment cluster M77904_node_22 according to the present invention can be found in the following transcript(s): M77904_T3 and M77904_T9. Table 31 below describes the starting and ending position of this segment on each transcript.

Table 31 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
M77904_T3 | 4096 | 4105
M77904_T9 | 3467 | 3476

Segment cluster M77904_node_25 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3 and M77904_T9. Table 32 below describes the starting and ending position of this segment on each transcript.

Table 32 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
M77904_T3 | 4786 | 4896
M77904_T9 | 4157 | 4267

Segment cluster M77904_node_26 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3 and M77904_T9. Table 33 below describes the starting and ending position of this segment on each transcript.

Table 33 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
M77904_T3 | 4897 | 4993
M77904_T9 | 4268 | 4364

Microarray (chip) data is also available for this gene as follows.
As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotide was found to hit this segment (with regard to ovarian cancer), as shown in Table 33.

Table 33 - Oligonucleotide related to this gene
Oligonucleotide name | Overexpressed in cancers | Chip reference
M77904_0_8_0 | Ovarian cancer | Ovary

Variant protein alignment to the previously known protein:

Sequence name: /tmp/c2Fe8npYgJ/QPDZHH46X1:Q8WU91
Sequence documentation:
Alignment of: M77904_P2 x Q8WU91
Alignment segment 1/1:
Quality: 2730.00
Escore: 0
Matching length: 275
Total length: 275
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1   MLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
67  MLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTS 116

51  LLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGR 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
117 LLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGR 166

101 IDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
167 IDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSS 216

151 IKRLCIIESVFEGEGSATLMSANYPEGFPEDELMTWQFVVPAHLRASVSF 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
217 IKRLCIIESVFEGEGSATLMSANYPEGFPEDELMTWQFVVPAHLRASVSF 266

201 LNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGNMAGNFNLSLQGC 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
267 LNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGNMAGNFNLSLQGC 316

251 DQDAQSPGILRLQFQVLVQHPQNES 275
    |||||||||||||||||||||||||
317 DQDAQSPGILRLQFQVLVQHPQNES 341

Sequence name: /tmp/c2Fe8npYgJ/QPDZHH46X1:Q96QU7
Sequence documentation:
Alignment of: M77904_P2 x Q96QU7
Alignment segment 1/1:
Quality: 7633.00
Escore: 0
Matching length: 770
Total length: 770
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1   MLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
67  MLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTS 116

51  LLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGR 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
117 LLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGR 166

101 IDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
167 IDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSS 216

151 IKRLCIIESVFEGEGSATLMSANYPEGFPEDELMTWQFVVPAHLRASVSF 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
217 IKRLCIIESVFEGEGSATLMSANYPEGFPEDELMTWQFVVPAHLRASVSF 266

201 LNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGNMAGNFNLSLQGC 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
267 LNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGNMAGNFNLSLQGC 316

251 DQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLSNERAMSLTIEPRPVKQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
317 DQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLSNERAMSLTIEPRPVKQ 366

301 SRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTIS 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
367 SRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTIS 416

351 CTDHRYCQRKSYSLQVPSDILHLPVELHDFSWKLLVPKDRLSLVLVPAQK 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
417 CTDHRYCQRKSYSLQVPSDILHLPVELHDFSWKLLVPKDRLSLVLVPAQK 466

401 LQQHTHEKPCNTSFSYLVASAIPSQDLYFGSFCPGGSIKQIQVKQNISVT 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
467 LQQHTHEKPCNTSFSYLVASAIPSQDLYFGSFCPGGSIKQIQVKQNISVT 516

451 LRTFAPSFQQEASRQGLTVSFIPYFKEEGVFTVTPDTKSKVYLRTPNWDR 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
517 LRTFAPSFQQEASRQGLTVSFIPYFKEEGVFTVTPDTKSKVYLRTPNWDR 566

501 GLPSLTSVSWNISVPRDQVACLTFFKERSGVVCQTGRAFMIIQEQRTRAE 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
567 GLPSLTSVSWNISVPRDQVACLTFFKERSGVVCQTGRAFMIIQEQRTRAE 616

551 EIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLFSVTLTPRTVDL 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
617 EIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLFSVTLTPRTVDL 666

601 TVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVGIYNGNINTEMP 650
    ||||||||||||||||||||||||||||||||||||||||||||||||||
667 TVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVGIYNGNINTEMP 716

651 RQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVDTYRPFQ 700
    ||||||||||||||||||||||||||||||||||||||||||||||||||
717 RQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVDTYRPFQ 766

701 GTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEPYTFSHPNNGDV 750
    ||||||||||||||||||||||||||||||||||||||||||||||||||
767 GTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEPYTFSHPNNGDV 816

751 SSKDTDIPLLNTQEPMEPAE 770
    ||||||||||||||||||||
817 SSKDTDIPLLNTQEPMEPAE 836

Sequence name: /tmp/4AUsKD5TnV/TBRg9DoebW:Q8WU91
Sequence documentation:
Alignment of: M77904_P4 x Q8WU91
Alignment segment 1/1:
Quality: 3341.00
Escore: 0
Matching length: 341
Total length: 341
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1   MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
1   MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50

51  LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
51  LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100

101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150

151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200

201 FHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDELM 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 FHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDELM 250

251 TWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDK 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDK 300

301 QPGNMAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNES 341
    |||||||||||||||||||||||||||||||||||||||||
301 QPGNMAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNES 341

Sequence name: /tmp/4AUsKD5TnV/TBRg9DoebW:Q9H5V8
Sequence documentation:
Alignment of: M77904_P4 x Q9H5V8
Alignment segment 1/1:
Quality: 4081.00
Escore: 0
Matching length: 416
Total length: 416
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1   MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
1   MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50

51  LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
51  LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100

101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150

151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200

201 FHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDELM 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 FHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDELM 250

251 TWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDK 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDK 300

301 QPGNMAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLS 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 QPGNMAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLS 350

351 NERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFL 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 NERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFL 400

401 CDDLTRLWMNVEKTIS 416
    ||||||||||||||||
401 CDDLTRLWMNVEKTIS 416

Sequence name: /tmp/4AUsKD5TnV/TBRg9DoebW:Q96QU7
Sequence documentation:
Alignment of: M77904_P4 x Q96QU7
Alignment segment 1/1:
Quality: 4081.00
Escore: 0
Matching length: 416
Total length: 416
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1   MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
1   MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50

51  LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
51  LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100

101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150

151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200

201 FHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDELM 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 FHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDELM 250

251 TWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDK 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDK 300

301 QPGNMAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLS 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 QPGNMAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLS 350

351 NERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFL 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 NERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFL 400

401 CDDLTRLWMNVEKTIS 416
    ||||||||||||||||
401 CDDLTRLWMNVEKTIS 416

Sequence name: /tmp/IChL9nLIus/pmgyBTHuqO:Q96QU7
Sequence documentation:
Alignment of: M77904_P5 x Q96QU7
Alignment segment 1/1:
Quality: 2285.00
Escore: 0
Matching length: 231
Total length: 231
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1   MIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLF 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
606 MIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLF 655

51  SVTLTPRTVDLTVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVG 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
656 SVTLTPRTVDLTVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVG 705

101 IYNGNINTEMPRQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFL 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
706 IYNGNINTEMPRQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFL 755

151 QPEVDTYRPFQGTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
756 QPEVDTYRPFQGTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEP 805

201 YTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE 231
    |||||||||||||||||||||||||||||||
806 YTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE 836

Sequence name: /tmp/IChL9nLIus/pmgyBTHuqO:Q9H8C2
Sequence documentation:
Alignment of: M77904_P5 x Q9H8C2
Alignment segment 1/1:
Quality: 2285.00
Escore: 0
Matching length: 231
Total length: 231
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1   MIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLF 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
419 MIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLF 468

51  SVTLTPRTVDLTVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVG 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
469 SVTLTPRTVDLTVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVG 518

101 IYNGNINTEMPRQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFL 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
519 IYNGNINTEMPRQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFL 568

151 QPEVDTYRPFQGTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
569 QPEVDTYRPFQGTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEP 618

201 YTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE 231
    |||||||||||||||||||||||||||||||
619 YTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE 649

Sequence name: /tmp/sQqi6hWOGJ/KjbKmDd574:Q8WU91
Sequence documentation:
Alignment of: M77904_P7 x Q8WU91
Alignment segment 1/1:
Quality: 2124.00
Escore: 0
Matching length: 219
Total length: 219
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1   MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
1   MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50

51  LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
51  LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100

101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150

151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200

201 FHPRNVSGFSIANRSSIKR 219
    |||||||||||||||||||
201 FHPRNVSGFSIANRSSIKR 219

Sequence name: /tmp/sQqi6hWOGJ/KjbKmDd574:Q9H5V8
Sequence documentation:
Alignment of: M77904_P7 x Q9H5V8
Alignment segment 1/1:
Quality: 2124.00
Escore: 0
Matching length: 219
Total length: 219
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1   MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
1   MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50

51  LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
51  LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100

101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150

151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200

201 FHPRNVSGFSIANRSSIKR 219
    |||||||||||||||||||
201 FHPRNVSGFSIANRSSIKR 219

Sequence name: /tmp/sQqi6hWOGJ/KjbKmDd574:Q96QU7
Sequence documentation:
Alignment of: M77904_P7 x Q96QU7
Alignment segment 1/1:
Quality: 2124.00
Escore: 0
Matching length: 219
Total length: 219
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1   MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
1   MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50

51  LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
51  LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100

101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150

151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200

201 FHPRNVSGFSIANRSSIKR 219
    |||||||||||||||||||
201 FHPRNVSGFSIANRSSIKR 219

DESCRIPTION FOR CLUSTER Z25299

Cluster Z25299 features 5 transcript(s) and 11 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Transcript Name | SEQ ID NO
Z25299_PEA_2_T1 | 256
Z25299_PEA_2_T2 | 257
Z25299_PEA_2_T3 | 258
Z25299_PEA_2_T6 | 259
Z25299_PEA_2_T9 | 260

Table 2 - Segments of interest
Segment Name | SEQ ID NO
Z25299_PEA_2_node_20 | 261
Z25299_PEA_2_node_21 | 262
Z25299_PEA_2_node_23 | 263
Z25299_PEA_2_node_24 | 264
Z25299_PEA_2_node_8 | 265
Z25299_PEA_2_node_12 | 266
Z25299_PEA_2_node_13 | 267
Z25299_PEA_2_node_14 | 268
Z25299_PEA_2_node_17 | 269
Z25299_PEA_2_node_18 | 270
Z25299_PEA_2_node_19 | 271

Table 3 - Proteins of interest
Protein Name | SEQ ID NO
Z25299_PEA_2_P2 | 273
Z25299_PEA_2_P3 | 274
Z25299_PEA_2_P7 | 275
Z25299_PEA_2_P10 | 276

These sequences are variants of the known protein Antileukoproteinase 1 precursor (SwissProt accession identifier ALK1_HUMAN; known also according to the synonyms ALP; HUSI-1; Seminal proteinase inhibitor; Secretory leukocyte protease inhibitor; BLPI; Mucus proteinase inhibitor; MPI; WAP four-disulfide core domain protein 4; Protease inhibitor WAP4), SEQ ID NO: 272, referred to herein as the previously known protein.

Protein Antileukoproteinase 1 precursor is known or believed to have the following function(s): acid-stable proteinase inhibitor with strong affinities for trypsin, chymotrypsin, elastase, and cathepsin G; may prevent elastase-mediated damage to oral and possibly other mucosal tissues. The sequence for protein Antileukoproteinase 1 precursor is given at the end of the application, as "Antileukoproteinase 1 precursor amino acid sequence". Protein Antileukoproteinase 1 precursor localization is believed to be secreted.

It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Elastase inhibitor; Tryptase inhibitor.
A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anti-inflammatory; Antiasthma. The following GO annotation(s) apply to the previously known protein: proteinase inhibitor; serine protease inhibitor, which are annotation(s) related to Molecular Function. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or LocusLink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster Z25299 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left-hand column of the table and the numbers on the y-axis of Figure 25 refer to weighted expression of ESTs in each category, as "parts per million" (the ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, expressed in parts per million).

Overall, the following results were obtained, as shown in the histograms in Figure 25 and in Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: brain malignant tumors, a mixture of malignant tumors from different tissues, and ovarian carcinoma.

Table 4 - Normal tissue distribution
Name of Tissue | Number
bladder | 82
bone | 6
brain | 0
colon | 37
epithelial | 145
general | 73
head and neck | 638
kidney | 26
liver | 68
lung | 465
breast | 52
ovary | 0
pancreas | 20
prostate | 36
skin | 215
stomach | 219
uterus | 113

Table 5 - P values and ratios for expression in cancerous tissue
Name of Tissue | P1 | P2 | SP1 | R3 | SP2 | R4
bladder | 8.2e-01 | 8.5e-01 | 9.2e-01 | 0.6 | 9.7e-01 | 0.5
bone | 5.5e-01 | 7.3e-01 | 4.0e-01 | 2.1 | 4.9e-01 | 1.5
brain | 8.8e-02 | 1.5e-01 | 2.3e-03 | 7.7 | 1.2e-02 | 4.8
colon | 3.3e-01 | 2.8e-01 | 4.2e-01 | 1.6 | 4.2e-01 | 1.5
epithelial | 2.5e-01 | 7.6e-01 | 3.8e-01 | 1.0 | 1 | 0.6
general | 6.4e-03 | 2.5e-01 | 1.7e-06 | 1.6 | 5.2e-01 | 0.9
head and neck | 3.6e-01 | 5.9e-01 | 7.6e-01 | 0.6 | 1 | 0.3
kidney | 7.4e-01 | 8.4e-01 | 2.1e-01 | 2.1 | 4.2e-01 | 1.4
liver | 4.1e-01 | 9.1e-01 | 4.2e-02 | 3.2 | 6.4e-01 | 0.8
lung | 7.6e-01 | 8.3e-01 | 9.8e-01 | 0.5 | 1 | 0.3
breast | 5.0e-01 | 5.5e-01 | 9.8e-02 | 1.6 | 3.4e-01 | 1.1
ovary | 3.7e-02 | 3.0e-02 | 6.9e-03 | 6.1 | 4.9e-03 | 5.6
pancreas | 3.8e-01 | 3.6e-01 | 3.6e-01 | 1.7 | 3.9e-01 | 1.5
prostate | 9.1e-01 | 9.2e-01 | 8.9e-01 | 0.5 | 9.4e-01 | 0.5
skin | 6.0e-01 | 8.1e-01 | 9.3e-01 | 0.4 | 1 | 0.1
stomach | 3.0e-01 | 8.1e-01 | 9.1e-01 | 0.6 | 1 | 0.3
uterus | 1.6e-01 | 1.3e-01 | 3.2e-02 | 1.6 | 3.0e-01 | 1.1

As noted above, cluster Z25299 features 5 transcript(s), which were listed in Table 1 above. These transcript(s) encode protein(s) which are variant(s) of protein Antileukoproteinase 1 precursor. A description of each variant protein according to the present invention is now provided.

Variant protein Z25299_PEA_2_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z25299_PEA_2_T1. An alignment to the known protein (Antileukoproteinase 1 precursor) is given at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z25299_PEA_2_P2 and ALK1_HUMAN:

1. An isolated chimeric polypeptide encoding for Z25299_PEA_2_P2, comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLNPPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK corresponding to amino acids 1 - 131 of ALK1_HUMAN, which also corresponds to amino acids 1 - 131 of Z25299_PEA_2_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GKQGMRAH corresponding to amino acids 132 - 139 of Z25299_PEA_2_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of Z25299_PEA_2_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKQGMRAH in Z25299_PEA_2_P2.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z25299_PEA_2_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein Z25299_PEA_2_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 6 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
136 | M -> T | Yes
20 | P -> | No
43 | C -> R | No
48 | K -> N | No
83 | R -> K | No
84 | R -> W | No

Variant protein Z25299_PEA_2_P2 is encoded by the following transcript(s): Z25299_PEA_2_T1, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z25299_PEA_2_T1 is shown in bold; this coding portion starts at position 124 and ends at position 540.
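The coding coordinates above can be related to protein coordinates with simple arithmetic. The sketch below assumes 1-based, inclusive positions and an uninterrupted reading frame from the coding start; the function names are illustrative only. With these assumptions the codon count of each reported coding portion equals the length of the encoded variant (P2: 139, P3: 156, P7: 89, P10: 82 residues), which suggests the reported spans exclude the stop codon, and nucleotide SNPs 183, 250, 267, 371 and 373 (Table 7) fall in codons 20, 43, 48, 83 and 84, matching Table 6.

```python
# Sketch: map transcript (nucleotide) coordinates to protein coordinates.
# Assumes 1-based inclusive positions and an open reading frame that runs
# without interruption from the coding start.

CODING_START = 124  # coding start reported for T1, T2, T6 and T9


def protein_length(coding_start: int, coding_end: int) -> int:
    """Number of codons in an inclusive coding interval."""
    span = coding_end - coding_start + 1
    if span % 3:
        raise ValueError(f"coding span of {span} nt is not a multiple of 3")
    return span // 3


def snp_to_codon(nt_pos: int, coding_start: int = CODING_START):
    """Return (amino acid position, position within codon), both 1-based.

    Positions upstream of the coding start (5' UTR) return None; the coding
    end is not checked here.
    """
    offset = nt_pos - coding_start
    if offset < 0:
        return None
    return offset // 3 + 1, offset % 3 + 1


coding = {"T1": (124, 540), "T2": (124, 591), "T6": (124, 390), "T9": (124, 369)}
print({name: protein_length(*span) for name, span in coding.items()})
# {'T1': 139, 'T2': 156, 'T6': 89, 'T9': 82}

# Known SNP 530 T->C lies in the second base of codon 136; since Met is
# encoded only by ATG, the change gives ACG (Thr), i.e. M -> T as in Table 6.
print(snp_to_codon(530))  # (136, 2)
print(snp_to_codon(122))  # None (5' UTR)
```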
The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein Z25299_PEA_2_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 7 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
122 | C -> T | No
123 | C -> T | No
530 | T -> C | Yes
989 | C -> T | Yes
1127 | C -> T | Yes
1162 | A -> C | Yes
1180 | A -> C | Yes
1183 | A -> C | Yes
1216 | A -> C | Yes
1262 | G -> A | Yes
183 | T -> | No
250 | T -> C | No
267 | A -> C | No
267 | A -> G | No
339 | C -> T | Yes
371 | G -> A | No
373 | A -> T | No
435 | C -> T | No

Variant protein Z25299_PEA_2_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z25299_PEA_2_T2. An alignment to the known protein (Antileukoproteinase 1 precursor) is given at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z25299_PEA_2_P3 and ALK1_HUMAN:

1. An isolated chimeric polypeptide encoding for Z25299_PEA_2_P3, comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLNPPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK corresponding to amino acids 1 - 131 of ALK1_HUMAN, which also corresponds to amino acids 1 - 131 of Z25299_PEA_2_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEKRHHKQLRDQEVDPLEMRRHSAG corresponding to amino acids 132 - 156 of Z25299_PEA_2_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of Z25299_PEA_2_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEKRHHKQLRDQEVDPLEMRRHSAG in Z25299_PEA_2_P3.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
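The tiered thresholds in the tail claims (70%, 80%, 85%, 90%, 95%) translate into a minimum number of matching residues over a short tail. As a simplifying assumption not stated in the source, the sketch below reads "homologous" as ungapped percent identity over the whole tail, so the required count is rounded up; the helper names and the substituted tail used in the example are hypothetical.

```python
# Sketch: minimum identical residues for each homology tier over a tail.
# Assumption (not from the source): homology = ungapped identity over the
# full tail length, so fractional counts are rounded up.

def min_identical(tail_length: int, percent: int) -> int:
    # Ceiling division in integer arithmetic avoids floating-point rounding.
    return -(-percent * tail_length // 100)


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity between two equal-length sequences, in percent."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


tiers = (70, 80, 85, 90, 95)
tails = {"P2": "GKQGMRAH", "P3": "GEKRHHKQLRDQEVDPLEMRRHSAG"}
for name, tail in tails.items():
    print(name, len(tail), [min_identical(len(tail), p) for p in tiers])
# P2 8 [6, 7, 7, 8, 8]
# P3 25 [18, 20, 22, 23, 24]

# A hypothetical tail differing from GKQGMRAH at one position:
print(percent_identity("GKQGMRAH", "GKQGMRAT"))  # 87.5
```

Under this reading, a single substitution in the 8-residue P2 tail still satisfies the 70% to 85% tiers but not the 90% or 95% tiers.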
Variant protein Z25299_PEA_2_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein Z25299_PEA_2_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 8 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
20 | P -> | No
43 | C -> R | No
48 | K -> N | No
83 | R -> K | No
84 | R -> W | No

Variant protein Z25299_PEA_2_P3 is encoded by the following transcript(s): Z25299_PEA_2_T2, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z25299_PEA_2_T2 is shown in bold; this coding portion starts at position 124 and ends at position 591. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein Z25299_PEA_2_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 9 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
122 | C -> T | No
123 | C -> T | No
183 | T -> | No
250 | T -> C | No
267 | A -> C | No
267 | A -> G | No
339 | C -> T | Yes
371 | G -> A | No
373 | A -> T | No
435 | C -> T | No

Variant protein Z25299_PEA_2_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z25299_PEA_2_T6. An alignment to the known protein (Antileukoproteinase 1 precursor) is given at the end of the application.
One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z25299_PEA_2_P7 and ALK1_HUMAN:

1. An isolated chimeric polypeptide encoding for Z25299_PEA_2_P7, comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCPGKKRCCPDTCGIKCLDPVDTPNP corresponding to amino acids 1 - 81 of ALK1_HUMAN, which also corresponds to amino acids 1 - 81 of Z25299_PEA_2_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RGSLGSAQ corresponding to amino acids 82 - 89 of Z25299_PEA_2_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of Z25299_PEA_2_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RGSLGSAQ in Z25299_PEA_2_P7.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
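The comparison reports describe each chimeric variant as a shared N-terminal stretch of ALK1_HUMAN followed, contiguously and in order, by a variant-specific tail. The sketch below assembles P2, P3 and P7 that way and reports where each tail lies; the 131-residue prefix is copied from the alignments in this application, while the constant and function names are illustrative only.

```python
# Sketch: build each variant as "first N residues of ALK1_HUMAN + tail",
# following the comparison reports. Names are illustrative, not from the source.

ALK1_PREFIX_131 = (
    "MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE"  # residues 1-50
    "CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLN"  # residues 51-100
    "PPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK"                     # residues 101-131
)

# variant: (number of shared N-terminal residues, variant-specific tail)
VARIANTS = {
    "P2": (131, "GKQGMRAH"),
    "P3": (131, "GEKRHHKQLRDQEVDPLEMRRHSAG"),
    "P7": (81, "RGSLGSAQ"),
}


def assemble(shared: int, tail: str) -> str:
    """Join the first `shared` prefix residues to the tail, contiguously."""
    return ALK1_PREFIX_131[:shared] + tail


for name, (shared, tail) in VARIANTS.items():
    seq = assemble(shared, tail)
    # The tail occupies residues shared + 1 through len(seq), 1-based.
    print(name, len(seq), f"tail {shared + 1}-{len(seq)}")
# P2 139 tail 132-139
# P3 156 tail 132-156
# P7 89 tail 82-89
```

The printed ranges agree with the residue numbering in claims 1 of the P2, P3 and P7 comparison reports.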
Variant protein Z25299_PEA_2_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein Z25299_PEA_2_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 10 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
20 | P -> | No
43 | C -> R | No
48 | K -> N | No
82 | R -> S | No

Variant protein Z25299_PEA_2_P7 is encoded by the following transcript(s): Z25299_PEA_2_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z25299_PEA_2_T6 is shown in bold; this coding portion starts at position 124 and ends at position 390. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein Z25299_PEA_2_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 11 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
122 | C -> T | No
123 | C -> T | No
576 | A -> C | Yes
594 | A -> C | Yes
597 | A -> C | Yes
630 | A -> C | Yes
676 | G -> A | Yes
183 | T -> | No
250 | T -> C | No
267 | A -> C | No
267 | A -> G | No
339 | C -> T | Yes
369 | A -> T | No
431 | C -> T | No
541 | C -> T | Yes

Variant protein Z25299_PEA_2_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z25299_PEA_2_T9.
An alignment to the known protein (Antileukoproteinase 1 precursor) is given at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z25299_PEA_2_P10 and ALK1_HUMAN:

1. An isolated chimeric polypeptide encoding for Z25299_PEA_2_P10, comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPT corresponding to amino acids 1 - 82 of ALK1_HUMAN, which also corresponds to amino acids 1 - 82 of Z25299_PEA_2_P10.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z25299_PEA_2_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 12 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein Z25299_PEA_2_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 12 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
20 | P -> | No
43 | C -> R | No
48 | K -> N | No

Variant protein Z25299_PEA_2_P10 is encoded by the following transcript(s): Z25299_PEA_2_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z25299_PEA_2_T9 is shown in bold; this coding portion starts at position 124 and ends at position 369. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein Z25299_PEA_2_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 13 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
122 | C -> T | No
123 | C -> T | No
451 | A -> C | Yes
484 | A -> C | Yes
530 | G -> A | Yes
183 | T -> | No
250 | T -> C | No
267 | A -> C | No
267 | A -> G | No
339 | C -> T | Yes
395 | C -> T | Yes
430 | A -> C | Yes
448 | A -> C | Yes

As noted above, cluster Z25299 features 11 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster Z25299_PEA_2_node_20 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z25299_PEA_2_T1 | 518 | 1099

Segment cluster Z25299_PEA_2_node_21 according to the present invention is supported by 162 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T6 and Z25299_PEA_2_T9. Table 15 below describes the starting and ending position of this segment on each transcript.

Table 15 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z25299_PEA_2_T1 | 1100 | 1292
Z25299_PEA_2_T6 | 514 | 706
Z25299_PEA_2_T9 | 368 | 560

Segment cluster Z25299_PEA_2_node_23 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T2. Table 16 below describes the starting and ending position of this segment on each transcript.

Table 16 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z25299_PEA_2_T2 | 518 | 707

Segment cluster Z25299_PEA_2_node_24 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T2 and Z25299_PEA_2_T3. Table 17 below describes the starting and ending position of this segment on each transcript.

Table 17 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z25299_PEA_2_T2 | 708 | 886
Z25299_PEA_2_T3 | 518 | 696

Segment cluster Z25299_PEA_2_node_8 according to the present invention is supported by 218 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T2, Z25299_PEA_2_T3, Z25299_PEA_2_T6 and Z25299_PEA_2_T9.
Table 18 below describes the starting and ending position of this segment on each transcript.

Table 18 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z25299_PEA_2_T1 | 1 | 208
Z25299_PEA_2_T2 | 1 | 208
Z25299_PEA_2_T3 | 1 | 208
Z25299_PEA_2_T6 | 1 | 208
Z25299_PEA_2_T9 | 1 | 208

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (with regard to ovarian cancer), shown in Table 19.

Table 19 - Oligonucleotides related to this segment
Oligonucleotide name | Overexpressed in cancers | Chip reference
Z25299_0_3_0 | ovarian carcinoma | OVA

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster Z25299_PEA_2_node_12 according to the present invention is supported by 228 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T2, Z25299_PEA_2_T3, Z25299_PEA_2_T6 and Z25299_PEA_2_T9. Table 20 below describes the starting and ending position of this segment on each transcript.

Table 20 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z25299_PEA_2_T1 | 209 | 245
Z25299_PEA_2_T2 | 209 | 245
Z25299_PEA_2_T3 | 209 | 245
Z25299_PEA_2_T6 | 209 | 245
Z25299_PEA_2_T9 | 209 | 245

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer.
The following oligonucleotides were found to hit this segment (in relation to ovarian cancer), shown in Table 21.
Table 21 - Oligonucleotides related to this segment
Oligonucleotide name | Overexpressed in cancers | Chip reference
Z25299_0_3_0 | ovarian carcinoma | OVA

Segment cluster Z25299_PEA_2_node_13 according to the present invention is supported by 246 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T2, Z25299_PEA_2_T3, Z25299_PEA_2_T6 and Z25299_PEA_2_T9. Table 22 below describes the starting and ending position of this segment on each transcript.

Table 22 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z25299_PEA_2_T1 | 246 | 357
Z25299_PEA_2_T2 | 246 | 357
Z25299_PEA_2_T3 | 246 | 357
Z25299_PEA_2_T6 | 246 | 357
Z25299_PEA_2_T9 | 246 | 357

Segment cluster Z25299_PEA_2_node_14 according to the present invention can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T2, Z25299_PEA_2_T3, Z25299_PEA_2_T6 and Z25299_PEA_2_T9. Table 23 below describes the starting and ending position of this segment on each transcript.

Table 23 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z25299_PEA_2_T1 | 358 | 367
Z25299_PEA_2_T2 | 358 | 367
Z25299_PEA_2_T3 | 358 | 367
Z25299_PEA_2_T6 | 358 | 367
Z25299_PEA_2_T9 | 358 | 367

Segment cluster Z25299_PEA_2_node_17 according to the present invention can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T2 and Z25299_PEA_2_T3. Table 24 below describes the starting and ending position of this segment on each transcript.

Table 24 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z25299_PEA_2_T1 | 368 | 371
Z25299_PEA_2_T2 | 368 | 371
Z25299_PEA_2_T3 | 368 | 371

Segment cluster Z25299_PEA_2_node_18 according to the present invention is supported by 221 libraries.
The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T2, Z25299_PEA_2_T3 and Z25299_PEA_2_T6. Table 25 below describes the starting and ending position of this segment on each transcript.

Table 25 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z25299_PEA_2_T1 | 372 | 427
Z25299_PEA_2_T2 | 372 | 427
Z25299_PEA_2_T3 | 372 | 427
Z25299_PEA_2_T6 | 368 | 423

Segment cluster Z25299_PEA_2_node_19 according to the present invention is supported by 197 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T2, Z25299_PEA_2_T3 and Z25299_PEA_2_T6. Table 26 below describes the starting and ending position of this segment on each transcript.

Table 26 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z25299_PEA_2_T1 | 428 | 517
Z25299_PEA_2_T2 | 428 | 517
Z25299_PEA_2_T3 | 428 | 517
Z25299_PEA_2_T6 | 424 | 513

Variant protein alignment to the previously known protein:

Sequence name: /tmp/oXgeQ4MeyL/K6VqblMQu2:ALK1_HUMAN
Sequence documentation:
Alignment of: Z25299_PEA_2_P2 x ALK1_HUMAN
Alignment segment 1/1:
Quality: 1371.00   Escore: 0
Matching length: 131   Total length: 131
Matching Percent Similarity: 100.00   Matching Percent Identity: 100.00
Total Percent Similarity: 100.00   Total Percent Identity: 100.00
Gaps: 0
Alignment:
1   MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
1   MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50

51  CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLN 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
51  CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLN 100

101 PPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK 131
    |||||||||||||||||||||||||||||||
101 PPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK 131

Sequence name: /tmp/rbf314VLIm/yR43i4SbP4:ALK1_HUMAN
Sequence documentation:
Alignment of: Z25299_PEA_2_P3 x ALK1_HUMAN
Alignment segment 1/1:
Quality: 1371.00   Escore: 0
Matching length: 131   Total length: 131
Matching Percent Similarity: 100.00   Matching Percent Identity: 100.00
Total Percent Similarity: 100.00   Total Percent Identity: 100.00
Gaps: 0
Alignment:
1   MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
1   MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50

51  CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLN 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
51  CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLN 100

101 PPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK 131
    |||||||||||||||||||||||||||||||
101 PPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK 131

Sequence name: /tmp/KCtSXACZXe/rK4T6LKeRX:ALK1_HUMAN
Sequence documentation:
Alignment of: Z25299_PEA_2_P7 x ALK1_HUMAN
Alignment segment 1/1:
Quality: 835.00   Escore: 0
Matching length: 81   Total length: 81
Matching Percent Similarity: 100.00   Matching Percent Identity: 100.00
Total Percent Similarity: 100.00   Total Percent Identity: 100.00
Gaps: 0
Alignment:
1   MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
1   MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50

51  CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNP 81
    |||||||||||||||||||||||||||||||
51  CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNP 81

Sequence name: /tmp/LcBlcAxB6c/NSI9pqfxoU:ALK1_HUMAN
Sequence documentation:
Alignment of: Z25299_PEA_2_P10 x ALK1_HUMAN
Alignment segment 1/1:
Quality: 844.00   Escore: 0
Matching length: 82   Total length: 82
Matching Percent Similarity: 100.00   Matching Percent Identity: 100.00
Total Percent Similarity: 100.00   Total Percent Identity: 100.00
Gaps: 0
Alignment:
1   MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
1   MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50

51  CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPT 82
    ||||||||||||||||||||||||||||||||
51  CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPT 82

Expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor Z25299 transcripts which are detectable by amplicon as depicted in sequence name Z25299junc13-14-21 in normal and cancerous ovary tissues

Expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by or according to junc13-14-21, Z25299junc13-14-21 amplicon(s) and Z25299junc13-14-21F and Z25299junc13-14-21R primers was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel", above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.

Figure 26 is a histogram showing overexpression of the above-indicated Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts in cancerous ovary samples relative to the normal samples.
The number and percentage of samples that exhibit at least 5-fold overexpression, out of the total number of samples tested, are indicated at the bottom. As is evident from Figure 26, the expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel"). Notably, an overexpression of at least 5-fold was found in 12 out of 42 adenocarcinoma samples.

Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by the above amplicon(s) in ovary cancer samples versus the normal tissue samples was determined by t-test as 3.0E-04. This value demonstrates the statistical significance of the results.

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Z25299junc13-14-21F forward primer; and Z25299junc13-14-21R reverse primer.

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Z25299junc13-14-21.
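The relative-quantification steps described above (target quantity divided by the geometric mean of the housekeeping-gene quantities, then divided by the median of the normalized normal post-mortem samples, followed by counting samples at or above a fold threshold) can be sketched as follows. The sample IDs and quantities are invented placeholders, not data from the application, and the function names are hypothetical; the four-element lists stand in for PBGD, HPRT1, SDHA and GAPDH.

```python
from statistics import geometric_mean, median

# Sketch of the normalization described above. All sample IDs and quantities
# are invented placeholders; they are not data from the application.


def normalize(target, housekeeping):
    """Target quantity divided by the geometric mean of housekeeping genes."""
    return target / geometric_mean(housekeeping)


def fold_vs_normal(samples, normal_ids):
    """Fold up-regulation of every sample relative to the median normal sample."""
    norm = {sid: normalize(t, hk) for sid, (t, hk) in samples.items()}
    baseline = median(norm[sid] for sid in normal_ids)
    return {sid: value / baseline for sid, value in norm.items()}


def count_at_least(folds, sample_ids, threshold):
    """(number of samples at or above threshold, number of samples tested)."""
    hits = sum(folds[sid] >= threshold for sid in sample_ids)
    return hits, len(sample_ids)


samples = {  # sample id: (target quantity, [housekeeping-gene quantities])
    "N1": (2.0, [1.0, 1.0, 1.0, 1.0]),
    "N2": (1.0, [1.0, 1.0, 1.0, 1.0]),
    "N3": (4.0, [2.0, 2.0, 2.0, 2.0]),
    "C1": (24.0, [2.0, 2.0, 2.0, 2.0]),
    "C2": (3.0, [1.0, 1.0, 1.0, 1.0]),
}
folds = fold_vs_normal(samples, ["N1", "N2", "N3"])
print(count_at_least(folds, ["C1", "C2"], 5))  # (1, 2)
```

Here the normal median is about 2.0, so C1 (normalized 12.0) is about 6-fold and passes a 5-fold threshold, while C2 (about 1.5-fold) does not. The geometric mean keeps a single highly expressed housekeeping gene from dominating the reference.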
Z25299junc13-14-21 Forward primer (SEQ ID NO:991): ACCCCAAACCCAACTTGATTC
Z25299junc13-14-21 Reverse primer (SEQ ID NO:992): TCAGTGGTGGAGCCAAGTCTC
Z25299junc13-14-21 Amplicon (SEQ ID NO:993): ACCCCAAACCCAACTTGATTCCTGCCATATGGAGGAGGCTCTGGAGTCCTGCTCTGTGTGGTCCAGGTCCTTTCCACCCTGAGACTTGGCTCCACCACTGA

Expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor Z25299 transcripts, which are detectable by amplicon as depicted in sequence name Z25299seg20 in normal and cancerous ovary tissues

Expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by or according to seg20, Z25299seg20 amplicon(s) and Z25299seg20F and Z25299seg20R primers was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel", above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.

Figure 27A is a histogram showing overexpression of the above-indicated Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts in cancerous ovary samples relative to the normal samples.
As is evident from Figure 27A, the expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel"). Notably, an over-expression of at least 10-fold was found in 30 out of 43 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by the above amplicon(s) in ovary cancer samples versus the normal tissue samples was determined by t-test as 9.81E-07. A threshold of 10-fold over-expression was found to differentiate between cancer and normal samples with a P value of 5E-03, as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Z25299 seg20F forward primer; and Z25299 seg20R reverse primer.

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Z25299 seg20.
Z25299 seg20 Forward primer (SEQ ID NO:994): CTCCTGAACCCTACTCCAAGCA
Z25299 seg20 Reverse primer (SEQ ID NO:995): CAGGCGATCCTATGGAAATCC
Z25299 seg20 Amplicon (SEQ ID NO:996): CTCCTGAACCCTACTCCAAGCACAGCCTCTGTCTGACTCCCTTGTCCTTCAAGAGAACTGTTCTCCAGGTCTCAGGGCCAGGATTTCCATAGGATCGCCTG

Expression of Secretory leukocyte protease inhibitor (Acid-stable proteinase inhibitor with strong affinities for trypsin, chymotrypsin, elastase, and cathepsin G) Z25299 transcripts which are detectable by amplicon as depicted in sequence name Z25299seg20 in different normal tissues

Expression of Secretory leukocyte protease inhibitor transcripts detectable by or according to Z25299seg20 amplicon(s) and primers Z25299seg20F and Z25299seg20R was measured by real-time PCR. In parallel, the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 1 above, "Tissue samples in testing panel"), to obtain a value of relative expression of each sample relative to the median of the ovary samples. Primers and amplicon are as above. Results are shown in Figure 27B.
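A quick consistency check on the primer and amplicon sequences listed in this section: a PCR amplicon should begin with the forward primer and end with the reverse complement of the reverse primer. The Python sketch below performs that check on the Z25299 seg20 sequences; it assumes plain uppercase A/C/G/T strings and is an illustration only, not a procedure described in the application.

```python
# Translation table for base complementation (A<->T, C<->G)
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def amplicon_matches_primers(amplicon, forward, reverse):
    """True if the amplicon starts with the forward primer and ends with the
    reverse complement of the reverse primer."""
    return (amplicon.startswith(forward)
            and amplicon.endswith(reverse_complement(reverse)))

# Z25299 seg20 sequences (SEQ ID NOs: 994-996), as listed above
SEG20_F = "CTCCTGAACCCTACTCCAAGCA"
SEG20_R = "CAGGCGATCCTATGGAAATCC"
SEG20_AMPLICON = ("CTCCTGAACCCTACTCCAAGCACAGCCTCTGTCTGACTCCCTTGTCCTTCAAGAGAA"
                  "CTGTTCTCCAGGTCTCAGGGCCAGGATTTCCATAGGATCGCCTG")

print(amplicon_matches_primers(SEG20_AMPLICON, SEG20_F, SEG20_R))  # True
```

The same check also passes for the Z25299 junc13-14-21 (SEQ ID NOs: 991-993) and Z25299 seg23 (SEQ ID NOs: 997-999) sequences.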
Expression of Secretory leukocyte protease inhibitor Z25299 transcripts, which are detectable by amplicon as depicted in sequence name Z25299 seg23 in normal and cancerous ovary tissues

Expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by or according to seg23, Z25299 seg23 amplicon(s) and Z25299 seg23F and Z25299 seg23R primers was measured by real-time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel" above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 28A is a histogram showing over-expression of the above-indicated Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts in cancerous ovary samples relative to the normal samples. As is evident from Figure 28A, the expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel"). Notably, an over-expression of at least 10-fold was found in 31 out of 43 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below.
The P value for the difference in the expression levels of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by the above amplicon(s) in ovary cancer samples versus the normal tissue samples was determined by t-test as 2.48E-07. A threshold of 10-fold over-expression was found to differentiate between cancer and normal samples with a P value of 3.61E-03, as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Z25299 seg23F forward primer; and Z25299 seg23R reverse primer.

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Z25299 seg23.

Z25299 seg23 Forward primer (SEQ ID NO:997): CAAGCAATTGAGGGACCAGG
Z25299 seg23 Reverse primer (SEQ ID NO:998): CAAAAAACATTGTTAATGAGAGAGATGAC
Z25299 seg23 Amplicon (SEQ ID NO:999): CAAGCAATTGAGGGACCAGGAAGTGGATCCTCTAGAGATGAGGAGGCATTCTGCTGGATGACTTTTAAAAATGTTTTCTCCAGAGTCATCTCTCTCATTAACAATGTTTTTTG

Expression of Secretory leukocyte protease inhibitor Z25299 transcripts which are detectable by amplicon as depicted in sequence name Z25299seg23 in different normal tissues

Expression of Secretory leukocyte protease inhibitor transcripts detectable by or according to Z25299seg23 amplicon(s) and primers (as above) Z25299seg23F and Z25299seg23R was measured by real-time PCR. In parallel, the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 1 above, "Tissue samples in testing panel"), to obtain a value of relative expression of each sample relative to the median of the ovary samples. Results are shown in Figure 28B.
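The Fisher's exact test on a 10-fold threshold compares a 2x2 table: cancer versus normal samples, above versus below the threshold. The Python sketch below computes a one-sided (cancer-enriched) p-value from hypergeometric probabilities using only the standard library. The text does not state the sidedness of the test or how many normal samples exceeded the threshold; as our assumption, if none of the five normal PM samples (Sample Nos. 45-48, 71) did, the tables for 30 of 43 (seg20) and 31 of 43 (seg23) adenocarcinoma samples give values consistent with the reported 5E-03 and 3.61E-03.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table

                  above threshold   below threshold
        cancer          a                 b
        normal          c                 d

    Returns P(X >= a), where X is the number of cancer samples above the
    threshold under the hypergeometric distribution with all margins fixed."""
    n_cancer = a + b          # row total: cancer samples
    n_above = a + c           # column total: samples above the threshold
    n = a + b + c + d
    hits = 0
    for x in range(a, min(n_cancer, n_above) + 1):
        hits += comb(n_cancer, x) * comb(n - n_cancer, n_above - x)
    return hits / comb(n, n_above)   # exact integer ratio, one final division

# Assumed tables: no normal PM sample above 10-fold (not stated in the text)
print(round(fisher_exact_greater(30, 13, 0, 5), 5))  # 0.005   (seg20)
print(round(fisher_exact_greater(31, 12, 0, 5), 5))  # 0.00361 (seg23)
```

Keeping the counts as integers until the last step avoids accumulating floating-point error across the summed hypergeometric terms.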
DESCRIPTION FOR CLUSTER T39971

Cluster T39971 features 4 transcript(s) and 28 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.

Table 1 - Transcripts of interest
Transcript Name | SEQ ID NO:
T39971_T10 | 570
T39971_T12 | 571
T39971_T16 | 572
T39971_T5 | 573

Table 2 - Segments of interest
Segment Name | SEQ ID NO:
T39971_node_0 | 574
T39971_node_18 | 575
T39971_node_21 | 576
T39971_node_22 | 577
T39971_node_23 | 578
T39971_node_31 | 579
T39971_node_33 | 580
T39971_node_7 | 581
T39971_node_1 | 582
T39971_node_10 | 583
T39971_node_11 | 584
T39971_node_12 | 585
T39971_node_15 | 586
T39971_node_16 | 587
T39971_node_17 | 588
T39971_node_26 | 589
T39971_node_27 | 590
T39971_node_28 | 591
T39971_node_29 | 592
T39971_node_3 | 593
T39971_node_30 | 594
T39971_node_34 | 595
T39971_node_35 | 596
T39971_node_36 | 597
T39971_node_4 | 598
T39971_node_5 | 599
T39971_node_8 | 600
T39971_node_9 | 601

Table 3 - Proteins of interest
Protein Name | SEQ ID NO:
T39971_P6 | 603
T39971_P9 | 604
T39971_P11 | 605
T39971_P12 | 606

These sequences are variants of the known protein Vitronectin precursor (SwissProt accession identifier VTNC_HUMAN; known also according to the synonyms Serum spreading factor; S-protein; V75), SEQ ID NO: 602, referred to herein as the previously known protein.

Protein Vitronectin precursor is known or believed to have the following function(s): Vitronectin is a cell adhesion and spreading factor found in serum and tissues. Vitronectin interacts with glycosaminoglycans and proteoglycans. It is recognized by certain members of the integrin family and serves as a cell-to-substrate adhesion molecule. It is an inhibitor of the membrane-damaging effect of the terminal cytolytic complement pathway.
The sequence for protein Vitronectin precursor is given at the end of the application, as "Vitronectin precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.

Table 4 - Amino acid mutations for Known Protein
SNP position(s) on amino acid sequence | Comment
122 | A -> S. /FTId=VAR_012983.
268 | R -> Q. /FTId=VAR_012984.
400 | T -> M. /FTId=VAR_012985.
50 | C -> N
225 | S -> N
366 | A -> T

Protein Vitronectin precursor localization is believed to be extracellular. The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Cancer, melanoma. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Alphavbeta3 integrin antagonist; Apoptosis agonist. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anticancer. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: immune response; cell adhesion, which are annotation(s) related to Biological Process; protein binding; heparin binding, which are annotation(s) related to Molecular Function; and extracellular space, which are annotation(s) related to Cellular Component.
The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or LocusLink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.

Cluster T39971 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the right-hand column of the table and the numbers on the y-axis of Figure 29 refer to weighted expression of ESTs in each category, as "parts per million" (the ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, in parts per million). Overall, the following results were obtained, as shown in the histograms in Figure 29 and in Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: liver cancer, lung malignant tumors and pancreas carcinoma.
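As a worked illustration of the "parts per million" measure: if a tissue category contains N ESTs and w of them belong to the cluster, the reported number is w x 10^6 / N. The Python sketch below shows only that ratio; the EST weighting mentioned in the text is not specified in this passage and is assumed to be applied to the inputs beforehand, and the counts in the usage line are hypothetical.

```python
def ests_per_million(cluster_ests, total_ests):
    """Cluster expression in one tissue category as parts per million of all
    ESTs in that category. Any EST weighting is assumed to be applied to the
    inputs beforehand; the weighting scheme is not reproduced here."""
    if total_ests <= 0:
        raise ValueError("total_ests must be positive")
    return cluster_ests * 1_000_000 / total_ests

# Hypothetical counts: 2 cluster ESTs among 40,000 ESTs in a category
print(ests_per_million(2, 40_000))  # 50.0
```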
Table 5 - Normal tissue distribution
Name of Tissue | Number
adrenal | 60
bladder | 0
bone | 0
brain | 9
colon | 0
epithelial | 79
general | 29
liver | 2164
lung | 0
lymph nodes | 0
breast | 0
pancreas | 0
prostate | 0
skin | 0
uterus | 0

Table 6 - P values and ratios for expression in cancerous tissue
Name of Tissue | P1 | P2 | SP1 | R3 | SP2 | R4
adrenal | 6.9e-01 | 7.4e-01 | 2.0e-02 | 2.3 | 5.3e-02 | 1.8
bladder | 5.4e-01 | 6.0e-01 | 5.6e-01 | 1.8 | 6.8e-01 | 1.5
bone | 1 | 6.7e-01 | 1 | 1.0 | 7.0e-01 | 1.4
brain | 8.0e-01 | 8.6e-01 | 3.0e-01 | 1.9 | 5.3e-01 | 1.2
colon | 4.2e-01 | 4.8e-01 | 7.0e-01 | 1.6 | 7.7e-01 | 1.4
epithelial | 6.6e-01 | 5.7e-01 | 1.0e-01 | 0.8 | 8.7e-01 | 0.6
general | 5.1e-01 | 3.8e-01 | 9.2e-08 | 1.6 | 8.3e-04 | 1.3
liver | 1 | 6.7e-01 | 2.3e-03 | 0.3 | 1 | 0.2
lung | 2.4e-01 | 9.1e-02 | 1.7e-01 | 4.3 | 8.1e-03 | 5.0
lymph nodes | 1 | 5.7e-01 | 1 | 1.0 | 5.8e-01 | 2.3
breast | 1 | 6.7e-01 | 1 | 1.0 | 8.2e-01 | 1.2
pancreas | 9.5e-02 | 1.8e-01 | 1.5e-11 | 6.5 | 8.2e-09 | 4.6
prostate | 7.3e-01 | 6.0e-01 | 6.7e-01 | 1.5 | 5.6e-01 | 1.7
skin | 1 | 4.4e-01 | 1 | 1.0 | 6.4e-01 | 1.6
uterus | 5.0e-01 | 2.6e-01 | 1 | 1.1 | 8.0e-01 | 1.4

As noted above, cluster T39971 features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode protein(s) which are variant(s) of protein Vitronectin precursor. A description of each variant protein according to the present invention is now provided.

Variant protein T39971_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T39971_T5. An alignment is given to the known protein (Vitronectin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T39971_P6 and VTNC_HUMAN:

1. An isolated chimeric polypeptide encoding for T39971_P6, comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKG corresponding to amino acids 1 - 276 of VTNC_HUMAN, which also corresponds to amino acids 1 - 276 of T39971_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TQGVVGD corresponding to amino acids 277 - 283 of T39971_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of T39971_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TQGVVGD in T39971_P6.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
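The tiered thresholds in claim 2 above (70%, 80%, 85%, 90%, 95%) can be made concrete with a simple measure. The Python sketch below uses ungapped percent identity between two equal-length peptides as a stand-in; this is our simplifying assumption, since this passage does not define how "homologous" is scored and a real comparison would normally use a gapped alignment. The query peptide is hypothetical.

```python
def percent_identity(a, b):
    """Ungapped percent identity between two equal-length peptide strings."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)

def highest_tier(identity, tiers=(95, 90, 85, 80, 70)):
    """Return the highest claimed threshold that is met, or None."""
    for t in tiers:
        if identity >= t:
            return t
    return None

TAIL_P6 = "TQGVVGD"   # amino acids 277-283 of T39971_P6
query = "TQGVIGD"     # hypothetical peptide with one substitution
pid = percent_identity(TAIL_P6, query)
print(round(pid, 1), highest_tier(pid))  # 85.7 85
```

For a 7-residue tail each mismatch costs about 14 percentage points, so a single substitution already drops the peptide below the 90% tier.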
Variant protein T39971_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein T39971_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
122 | A -> S | Yes
145 | G -> | No
268 | R -> Q | Yes
280 | V -> A | Yes
180 | C -> | No
180 | C -> W | No
192 | Y -> | No
209 | A -> | No
211 | T -> | No
267 | G -> | No
267 | G -> A | No
268 | R -> | No

Variant protein T39971_P6 is encoded by the following transcript(s): T39971_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T39971_T5 is shown in bold; this coding portion starts at position 756 and ends at position 1604. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein T39971_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 8 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
417 | G -> C | Yes
459 | T -> C | Yes
1387 | C -> | No
1406 | -> A | No
1406 | -> G | No
1555 | G -> | No
1555 | G -> C | No
1558 | G -> | No
1558 | G -> A | Yes
1594 | T -> C | Yes
1642 | T -> C | Yes
1770 | C -> T | Yes
529 | G -> T | Yes
1982 | A -> G | No
2007 | G -> | No
2029 | T -> C | No
2094 | T -> C | No
2117 | C -> G | No
2123 | C -> T | Yes
2152 | C -> T | Yes
2182 | G -> T | No
2185 | A -> C | No
2297 | T -> C | Yes
1119 | G -> T | Yes
2411 | G -> | No
2411 | G -> T | No
2487 | T -> C | Yes
1188 | G -> | No
1295 | C -> | No
1295 | C -> G | No
1324 | -> T | No
1331 | C -> | No
1381 | C -> | No

Variant protein T39971_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T39971_T10. An alignment is given to the known protein (Vitronectin precursor) at the end of the application.
One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T39971_P9 and VTNC_HUMAN:

1. An isolated chimeric polypeptide encoding for T39971_P9, comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRT corresponding to amino acids 1 - 325 of VTNC_HUMAN, which also corresponds to amino acids 1 - 325 of T39971_P9, and a second amino acid sequence being at least 90% homologous to SGMAPRPSLAKKQRFRHRNRKGYRSQRGHSRGRNQNSRRPSRATWLSLFSSEESNLGANNYDDYRMDWLVPATCEPIQSVFFFSGDKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 357 - 478 of VTNC_HUMAN, which also corresponds to amino acids 326 - 447 of T39971_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of T39971_P9, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TS, having a structure as follows: a sequence starting from any of amino acid numbers 325-x to 325; and ending at any of amino acid numbers 326 + ((n-2) - x), in which x varies from 0 to n-2.
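The two-block structure of the comparison report amounts to a coordinate map between the variant and VTNC_HUMAN. The Python sketch below encodes each aligned block as (variant start, variant end, known start), using the coordinates stated above for T39971_P9 and those stated further on for T39971_P11, and derives which VTNC_HUMAN residues have no counterpart in the variant. It assumes ungapped blocks and 1-based, inclusive numbering; the helper names are illustrative.

```python
# Aligned blocks as (variant_start, variant_end, known_start), 1-based and
# inclusive, taken from the comparison reports against VTNC_HUMAN.
P9_BLOCKS = [(1, 325, 1), (326, 447, 357)]
P11_BLOCKS = [(1, 326, 1), (327, 363, 442)]

def variant_to_known(pos, blocks):
    """Map a variant residue number to the aligned known-protein residue."""
    for v_start, v_end, k_start in blocks:
        if v_start <= pos <= v_end:
            return k_start + (pos - v_start)
    raise ValueError(f"position {pos} is outside every aligned block")

def skipped_known_residues(blocks):
    """Known-protein intervals with no counterpart between consecutive blocks."""
    gaps = []
    for (v_start, v_end, k_start), (_, _, next_k_start) in zip(blocks, blocks[1:]):
        k_end = k_start + (v_end - v_start)
        if next_k_start > k_end + 1:
            gaps.append((k_end + 1, next_k_start - 1))
    return gaps

print(variant_to_known(326, P9_BLOCKS), variant_to_known(447, P9_BLOCKS))  # 357 478
print(skipped_known_residues(P9_BLOCKS))   # [(326, 356)]
print(skipped_known_residues(P11_BLOCKS))  # [(327, 441)]
```

Read this way, T39971_P9 lacks VTNC_HUMAN residues 326-356 and T39971_P11 lacks residues 327-441, with the retained C-terminal stretch shifted accordingly.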
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein T39971_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein T39971_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 9 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
122 | A -> S | Yes
145 | G -> | No
268 | R -> Q | Yes
328 | M -> T | No
350 | S -> P | No
369 | T -> M | Yes
379 | S -> I | No
380 | N -> T | No
180 | C -> | No
180 | C -> W | No
192 | Y -> | No
209 | A -> | No
211 | T -> | No
267 | G -> | No
267 | G -> A | No
268 | R -> | No

Variant protein T39971_P9 is encoded by the following transcript(s): T39971_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T39971_T10 is shown in bold; this coding portion starts at position 756 and ends at position 2096.
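The coding-portion coordinates reported for each transcript (for example 756-1604 for T39971_T5 and 756-2096 for T39971_T10) can be cross-checked against the protein lengths, and nucleotide SNP positions can be converted to codon positions. The Python sketch below assumes 1-based, inclusive transcript coordinates and a coding portion that begins with the first base of codon 1 and excludes the stop codon. Under these assumptions the four coding portions give 283, 447, 363 and 238 codons, matching the lengths of T39971_P6, P9, P11 and P12 implied by their comparison reports; nucleotide 1119 falls in codon 122 and nucleotide 1594 in codon 280, matching the A -> S and V -> A entries of Table 7. The function names are illustrative.

```python
def codon_count(cds_start, cds_end):
    """Number of complete codons in a 1-based, inclusive coding interval."""
    length = cds_end - cds_start + 1
    if length % 3:
        raise ValueError(f"coding length {length} is not a multiple of 3")
    return length // 3

def nt_to_codon(nt_pos, cds_start):
    """Map a 1-based transcript position inside the coding portion to
    (codon number, position within the codon, 1-3)."""
    offset = nt_pos - cds_start
    if offset < 0:
        raise ValueError("position lies upstream of the coding portion")
    return offset // 3 + 1, offset % 3 + 1

# Coding portions reported for the four T39971 transcripts
CDS = {"T39971_T5": (756, 1604), "T39971_T10": (756, 2096),
       "T39971_T12": (756, 1844), "T39971_T16": (756, 1469)}

for name, (start, end) in CDS.items():
    print(name, codon_count(start, end))  # 283, 447, 363, 238

print(nt_to_codon(1119, 756), nt_to_codon(1594, 756))  # (122, 1) (280, 2)
```

By the same mapping, nucleotides 1738 and 1861 of T39971_T10 fall in codons 328 and 369, in line with the M -> T and T -> M entries of Table 9.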
The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein T39971_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 10 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
417 | G -> C | Yes
459 | T -> C | Yes
1387 | C -> | No
1406 | -> A | No
1406 | -> G | No
1555 | G -> | No
1555 | G -> C | No
1558 | G -> | No
1558 | G -> A | Yes
1738 | T -> C | No
1803 | T -> C | No
1826 | C -> G | No
529 | G -> T | Yes
1832 | C -> T | Yes
1861 | C -> T | Yes
1891 | G -> T | No
1894 | A -> C | No
2006 | T -> C | Yes
2120 | G -> | No
2120 | G -> T | No
2196 | T -> C | Yes
1119 | G -> T | Yes
1188 | G -> | No
1295 | C -> | No
1295 | C -> G | No
1324 | -> T | No
1331 | C -> | No
1381 | C -> | No

Variant protein T39971_P11 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T39971_T12. An alignment is given to the known protein (Vitronectin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T39971_P11 and VTNC_HUMAN:

1. An isolated chimeric polypeptide encoding for T39971_P11, comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRTS corresponding to amino acids 1 - 326 of VTNC_HUMAN, which also corresponds to amino acids 1 - 326 of T39971_P11, and a second amino acid sequence being at least 90% homologous to DKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 442 - 478 of VTNC_HUMAN, which also corresponds to amino acids 327 - 363 of T39971_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of T39971_P11, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SD, having a structure as follows: a sequence starting from any of amino acid numbers 326-x to 326; and ending at any of amino acid numbers 327 + ((n-2) - x), in which x varies from 0 to n-2.
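The "edge portion" language in claim 2 describes every length-n window that contains both residues flanking the junction (here residue 326, S, and residue 327, D, of T39971_P11). The Python sketch below enumerates those windows exactly as the claim's formula states, start = 326 - x and end = 327 + ((n - 2) - x) for x = 0 .. n - 2, and extracts the peptides from a short stretch of T39971_P11 (residues 311-336) assembled from the sequences given above. The function names and the choice n = 10 are illustrative.

```python
def edge_windows(junction, n):
    """All (start, end) windows of length n, 1-based and inclusive, that span
    the junction between residue `junction` and residue `junction + 1`:
    start = junction - x, end = junction + 1 + ((n - 2) - x), x = 0 .. n - 2."""
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    return [(junction - x, junction + 1 + (n - 2 - x)) for x in range(n - 1)]

def edge_peptides(seq, junction, n, first_residue=1):
    """Peptides for edge_windows() taken from `seq`, whose first character is
    residue `first_residue`; windows running off either end are skipped."""
    out = []
    for start, end in edge_windows(junction, n):
        i, j = start - first_residue, end - first_residue + 1
        if i >= 0 and j <= len(seq):
            out.append(seq[i:j])
    return out

# Residues 311-336 of T39971_P11 (...GRTS | DKYY...), from the sequences above
FRAGMENT = "DSWEDIFELLFWGRTSDKYYRVNLRT"
peps = edge_peptides(FRAGMENT, junction=326, n=10, first_residue=311)
print(len(peps), peps[0], peps[-1])  # 9 SDKYYRVNLR ELLFWGRTSD
```

With junction = 325, the same function reproduces the windows described for the TS junction of T39971_P9.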
Comparison report between T39971_P11 and Q9BSH7 (SEQ ID NO:1000):

1. An isolated chimeric polypeptide encoding for T39971_P11, comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRTS corresponding to amino acids 1 - 326 of Q9BSH7, which also corresponds to amino acids 1 - 326 of T39971_P11, and a second amino acid sequence being at least 90% homologous to DKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 442 - 478 of Q9BSH7, which also corresponds to amino acids 327 - 363 of T39971_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of T39971_P11, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SD, having a structure as follows: a sequence starting from any of amino acid numbers 326-x to 326; and ending at any of amino acid numbers 327 + ((n-2) - x), in which x varies from 0 to n-2.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted.
The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein T39971_P11 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein T39971_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 11 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
122 | A -> S | Yes
145 | G -> | No
268 | R -> Q | Yes
180 | C -> | No
180 | C -> W | No
192 | Y -> | No
209 | A -> | No
211 | T -> | No
267 | G -> | No
267 | G -> A | No
268 | R -> | No

Variant protein T39971_P11 is encoded by the following transcript(s): T39971_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T39971_T12 is shown in bold; this coding portion starts at position 756 and ends at position 1844. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein T39971_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 12 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
417 | G -> C | Yes
459 | T -> C | Yes
1387 | C -> | No
1406 | -> A | No
1406 | -> G | No
1555 | G -> | No
1555 | G -> C | No
1558 | G -> | No
1558 | G -> A | Yes
1754 | T -> C | Yes
1868 | G -> | No
1868 | G -> T | No
529 | G -> T | Yes
1944 | T -> C | Yes
1119 | G -> T | Yes
1188 | G -> | No
1295 | C -> | No
1295 | C -> G | No
1324 | -> T | No
1331 | C -> | No
1381 | C -> | No

Variant protein T39971_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T39971_T16. An alignment is given to the known protein (Vitronectin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T39971_P12 and VTNC_HUMAN:

1. An isolated chimeric polypeptide encoding for T39971_P12, comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFK corresponding to amino acids 1 - 223 of VTNC_HUMAN, which also corresponds to amino acids 1 - 223 of T39971_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VPGAVGQGRKHLGRV corresponding to amino acids 224 - 238 of T39971_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T39971_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPGAVGQGRKHLGRV in T39971_P12.
Comparison report between T39971_P12 and Q9BSH7:

1. An isolated chimeric polypeptide encoding for T39971_P12, comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFK corresponding to amino acids 1 - 223 of Q9BSH7, which also corresponds to amino acids 1 - 223 of T39971_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VPGAVGQGRKHLGRV corresponding to amino acids 224 - 238 of T39971_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of T39971_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPGAVGQGRKHLGRV in T39971_P12.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein T39971_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 13 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
122 | A -> S | Yes
145 | G -> | No
180 | C -> | No
180 | C -> W | No
192 | Y -> | No
209 | A -> | No
211 | T -> | No

Variant protein T39971_P12 is encoded by the following transcript(s): T39971_T16, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T39971_T16 is shown in bold; this coding portion starts at position 756 and ends at position 1469. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 14 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
417 | G -> C | Yes
459 | T -> C | Yes
1387 | C -> | No
1406 | -> A | No
1406 | -> G | No
529 | G -> T | Yes
1119 | G -> T | Yes
1188 | G -> | No
1295 | C -> | No
1295 | C -> G | No
1324 | -> T | No
1331 | C -> | No
1381 | C -> | No

As noted above, cluster T39971 features 28 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application.
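Table 14 uses transcript coordinates, while Table 13 uses residue numbers. A minimal sketch, assuming 1-based inclusive coordinates and a reading frame that begins at position 756, converts between the two. Under that assumption, nucleotide positions 1119, 1188, 1295, 1331, 1381 and 1387 fall in residues 122, 145, 180, 192, 209 and 211, which are the residues listed in Table 13, while positions 417, 459 and 529 lie upstream of the coding portion. The function and constant names are illustrative.

```python
CDS_START = 756   # first nucleotide of the coding portion of T39971_T16
CDS_END = 1469    # last nucleotide of the coding portion

def codon_of(nt_pos: int) -> tuple[int, int]:
    """Map a 1-based transcript position to (residue number, position within codon)."""
    if not CDS_START <= nt_pos <= CDS_END:
        raise ValueError(f"position {nt_pos} is outside the coding portion")
    offset = nt_pos - CDS_START
    return offset // 3 + 1, offset % 3 + 1

# Table 14 positions that map onto residues listed in Table 13.
TABLE14_CODING = (1119, 1188, 1295, 1331, 1381, 1387)

print("coding portion:", CDS_END - CDS_START + 1, "nt")
for pos in TABLE14_CODING:
    residue, codon_pos = codon_of(pos)
    print(f"nt {pos} -> residue {residue}, codon position {codon_pos}")
```

The coding portion spans 714 nt, a whole number of codons. Calling `codon_of(417)` raises `ValueError`, because that SNP lies upstream of position 756.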
The segment(s) of cluster T39971 are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster T39971_node_0 according to the present invention is supported by 76 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 15 below describes the starting and ending position of this segment on each transcript.

Table 15 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 1 | 810
T39971_T12 | 1 | 810
T39971_T16 | 1 | 810
T39971_T5 | 1 | 810

Segment cluster T39971_node_ 8 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T16. Table 16 below describes the starting and ending position of this segment on each transcript.

Table 16 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T16 | 1425 | 1592

Segment cluster T39971_node_21 according to the present invention is supported by 99 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12 and T39971_T5. Table 17 below describes the starting and ending position of this segment on each transcript.

Table 17 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 1425 | 1581
T39971_T12 | 1425 | 1581
T39971_T5 | 1425 | 1581

Segment cluster T39971_node_22 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described.
This segment can be found in the following transcript(s): T39971_T5. Table 18 below describes the starting and ending position of this segment on each transcript.

Table 18 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T5 | 1582 | 1779

Segment cluster T39971_node_23 according to the present invention is supported by 101 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12 and T39971_T5. Table 19 below describes the starting and ending position of this segment on each transcript.

Table 19 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 1582 | 1734
T39971_T12 | 1582 | 1734
T39971_T5 | 1780 | 1932

Segment cluster T39971_node_31 according to the present invention is supported by 94 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10 and T39971_T5. Table 20 below describes the starting and ending position of this segment on each transcript.

Table 20 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 1847 | 1986
T39971_T5 | 2138 | 2277

Segment cluster T39971_node_33 according to the present invention is supported by 77 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12 and T39971_T5. Table 21 below describes the starting and ending position of this segment on each transcript.

Table 21 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 1987 | 2113
T39971_T12 | 1735 | 1861
T39971_T5 | 2278 | 2404

Segment cluster T39971_node_7 according to the present invention is supported by 87 libraries. The number of libraries was determined as previously described.
This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 940 | 1162
T39971_T12 | 940 | 1162
T39971_T16 | 940 | 1162
T39971_T5 | 940 | 1162

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster T39971_node_1 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 23 below describes the starting and ending position of this segment on each transcript.

Table 23 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 811 | 819
T39971_T12 | 811 | 819
T39971_T16 | 811 | 819
T39971_T5 | 811 | 819

Segment cluster T39971_node_10 according to the present invention is supported by 77 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 24 below describes the starting and ending position of this segment on each transcript.

Table 24 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 1189 | 1232
T39971_T12 | 1189 | 1232
T39971_T16 | 1189 | 1232
T39971_T5 | 1189 | 1232

Segment cluster T39971_node_11 according to the present invention is supported by 79 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 1233 | 1270
T39971_T12 | 1233 | 1270
T39971_T16 | 1233 | 1270
T39971_T5 | 1233 | 1270

Segment cluster T39971_node_12 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 26 below describes the starting and ending position of this segment on each transcript.

Table 26 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 1271 | 1284
T39971_T12 | 1271 | 1284
T39971_T16 | 1271 | 1284
T39971_T5 | 1271 | 1284

Segment cluster T39971_node_15 according to the present invention is supported by 79 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 27 below describes the starting and ending position of this segment on each transcript.

Table 27 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 1285 | 1316
T39971_T12 | 1285 | 1316
T39971_T16 | 1285 | 1316
T39971_T5 | 1285 | 1316

Segment cluster T39971_node_16 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 28 below describes the starting and ending position of this segment on each transcript.

Table 28 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 1317 | 1340
T39971_T12 | 1317 | 1340
T39971_T16 | 1317 | 1340
T39971_T5 | 1317 | 1340

Segment cluster T39971_node_17 according to the present invention is supported by 86 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5.
Table 29 below describes the starting and ending position of this segment on each transcript.

Table 29 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 1341 | 1424
T39971_T12 | 1341 | 1424
T39971_T16 | 1341 | 1424
T39971_T5 | 1341 | 1424

Segment cluster T39971_node_26 according to the present invention is supported by 85 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T5. Table 30 below describes the starting and ending position of this segment on each transcript.

Table 30 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T5 | 1933 | 1974

Segment cluster T39971_node_27 according to the present invention is supported by 90 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T5. Table 31 below describes the starting and ending position of this segment on each transcript.

Table 31 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T5 | 1975 | 2025

Segment cluster T39971_node_28 according to the present invention can be found in the following transcript(s): T39971_T10 and T39971_T5. Table 32 below describes the starting and ending position of this segment on each transcript.

Table 32 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 1735 | 1743
T39971_T5 | 2026 | 2034

Segment cluster T39971_node_29 according to the present invention is supported by 99 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10 and T39971_T5.
Table 33 below describes the starting and ending position of this segment on each transcript.

Table 33 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 1744 | 1838
T39971_T5 | 2035 | 2129

Segment cluster T39971_node_3 according to the present invention is supported by 78 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 34 below describes the starting and ending position of this segment on each transcript.

Table 34 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 820 | 861
T39971_T12 | 820 | 861
T39971_T16 | 820 | 861
T39971_T5 | 820 | 861

Segment cluster T39971_node_30 according to the present invention can be found in the following transcript(s): T39971_T10 and T39971_T5. Table 35 below describes the starting and ending position of this segment on each transcript.

Table 35 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 1839 | 1846
T39971_T5 | 2130 | 2137

Segment cluster T39971_node_34 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12 and T39971_T5. Table 36 below describes the starting and ending position of this segment on each transcript.

Table 36 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 2114 | 2120
T39971_T12 | 1862 | 1868
T39971_T5 | 2405 | 2411

Segment cluster T39971_node_35 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12 and T39971_T5. Table 37 below describes the starting and ending position of this segment on each transcript.

Table 37 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 2121 | 2137
T39971_T12 | 1869 | 1885
T39971_T5 | 2412 | 2428

Segment cluster T39971_node_36 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12 and T39971_T5. Table 38 below describes the starting and ending position of this segment on each transcript.

Table 38 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 2138 | 2199
T39971_T12 | 1886 | 1947
T39971_T5 | 2429 | 2490

Segment cluster T39971_node_4 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 39 below describes the starting and ending position of this segment on each transcript.
Table 39 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 862 | 881
T39971_T12 | 862 | 881
T39971_T16 | 862 | 881
T39971_T5 | 862 | 881

Segment cluster T39971_node_5 according to the present invention is supported by 80 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 40 below describes the starting and ending position of this segment on each transcript.

Table 40 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 882 | 939
T39971_T12 | 882 | 939
T39971_T16 | 882 | 939
T39971_T5 | 882 | 939

Segment cluster T39971_node_8 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 41 below describes the starting and ending position of this segment on each transcript.

Table 41 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 1163 | 1168
T39971_T12 | 1163 | 1168
T39971_T16 | 1163 | 1168
T39971_T5 | 1163 | 1168

Segment cluster T39971_node_9 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 42 below describes the starting and ending position of this segment on each transcript.
Table 42 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T39971_T10 | 1169 | 1188
T39971_T12 | 1169 | 1188
T39971_T16 | 1169 | 1188
T39971_T5 | 1169 | 1188

Variant protein alignment to the previously known protein:

Sequence name: /tmp/xkraCL2OcZ/43L7YcPH7x:VTNC_HUMAN
Sequence documentation:
Alignment of: T39971_P6 x VTNC_HUMAN
Alignment segment 1/1:
Quality: 2774.00
Escore: 0
Matching length: 278
Total length: 278
Matching Percent Similarity: 99.64
Matching Percent Identity: 99.64
Total Percent Similarity: 99.64
Total Percent Identity: 99.64
Gaps: 0
Alignment:

1   MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
1   MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50

51  CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
51  CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100

101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150

151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200

201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250

251 PDNVDAALALPAHSYSGRERVYFFKGTQ 278
    |||||||||||||||||||||||||| |
251 PDNVDAALALPAHSYSGRERVYFFKGKQ 278

Sequence name: /tmp/X4DeeuSlB4/yMubSR5FPs:VTNC_HUMAN
Sequence documentation:
Alignment of: T39971_P9 x VTNC_HUMAN
Alignment segment 1/1:
Quality: 4430.00
Escore: 0
Matching length: 447
Total length: 478
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 93.51
Total Percent Identity: 93.51
Gaps: 1
Alignment:

1   MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
1   MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50

51  CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
51  CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100

101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150

151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200

201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250

251 PDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSA 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 PDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSA 300

301 VFEHFAMMQRDSWEDIFELLFWGRT......................... 325
    |||||||||||||||||||||||||
301 VFEHFAMMQRDSWEDIFELLFWGRTSAGTRQPQFISRDWHGVPGQVDAAM 350

326 ......SGMAPRPSLAKKQRFRHRNRKGYRSQRGHSRGRNQNSRRPSRAT 369
          ||||||||||||||||||||||||||||||||||||||||||||
351 AGRIYISGMAPRPSLAKKQRFRHRNRKGYRSQRGHSRGRNQNSRRPSRAT 400

370 WLSLFSSEESNLGANNYDDYRMDWLVPATCEPIQSVFFFSGDKYYRVNLR 419
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 WLSLFSSEESNLGANNYDDYRMDWLVPATCEPIQSVFFFSGDKYYRVNLR 450

420 TRRVDTVDPPYPRSIAQYWLGCPAPGHL 447
    ||||||||||||||||||||||||||||
451 TRRVDTVDPPYPRSIAQYWLGCPAPGHL 478

Sequence name: /tmp/jvplVtnxNy/wxNSeFVZZw:VTNC_HUMAN
Sequence documentation:
Alignment of: T39971_P11 x VTNC_HUMAN
Alignment segment 1/1:
Quality: 3576.00
Escore: 0
Matching length: 363
Total length: 478
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 75.94
Total Percent Identity: 75.94
Gaps: 1
Alignment:

1   MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
1   MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50

51  CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
51  CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100

101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150

151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200

201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250

251 PDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSA 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 PDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSA 300

301 VFEHFAMMQRDSWEDIFELLFWGRTS........................ 326
    ||||||||||||||||||||||||||
301 VFEHFAMMQRDSWEDIFELLFWGRTSAGTRQPQFISRDWHGVPGQVDAAM 350

326 .................................................. 326
351 AGRIYISGMAPRPSLAKKQRFRHRNRKGYRSQRGHSRGRNQNSRRPSRAT 400

327 .........................................DKYYRVNLR 335
                                             |||||||||
401 WLSLFSSEESNLGANNYDDYRMDWLVPATCEPIQSVFFFSGDKYYRVNLR 450

336 TRRVDTVDPPYPRSIAQYWLGCPAPGHL 363
    ||||||||||||||||||||||||||||
451 TRRVDTVDPPYPRSIAQYWLGCPAPGHL 478

Sequence name: /tmp/jvplVtnxNy/wxNSeFVZZw:Q9BSH7
Sequence documentation:
Alignment of: T39971_P11 x Q9BSH7
Alignment segment 1/1:
Quality: 3576.00
Escore: 0
Matching length: 363
Total length: 478
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 75.94
Total Percent Identity: 75.94
Gaps: 1
Alignment:

1   MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
1   MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50

51  CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
51  CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100

101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150

151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200

201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250

251 PDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSA 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 PDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSA 300

301 VFEHFAMMQRDSWEDIFELLFWGRTS........................ 326
    ||||||||||||||||||||||||||
301 VFEHFAMMQRDSWEDIFELLFWGRTSAGTRQPQFISRDWHGVPGQVDAAM 350

326 .................................................. 326
351 AGRIYISGMAPRPSLAKKQRFRHRNRKGYRSQRGHSRGRNQNSRRPSRAM 400

327 .........................................DKYYRVNLR 335
                                             |||||||||
401 WLSLFSSEESNLGANNYDDYRMDWLVPATCEPIQSVFFFSGDKYYRVNLR 450

336 TRRVDTVDPPYPRSIAQYWLGCPAPGHL 363
    ||||||||||||||||||||||||||||
451 TRRVDTVDPPYPRSIAQYWLGCPAPGHL 478

Sequence name: /tmp/fgebv7ir4i/48bTBMziJO:VTNC_HUMAN
Sequence documentation:
Alignment of: T39971_P12 x VTNC_HUMAN
Alignment segment 1/1:
Quality: 2237.00
Escore: 0
Matching length: 223
Total length: 223
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

1   MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
1   MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50

51  CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
51  CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100

101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150

151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200

201 GIEGPIDAAFTRINCQGKTYLFK 223
    |||||||||||||||||||||||
201 GIEGPIDAAFTRINCQGKTYLFK 223

Sequence name: /tmp/fgebv7ir4i/48bTBMziJ0O:Q9BSH7
Sequence documentation:
Alignment of: T39971_P12 x Q9BSH7
Alignment segment 1/1:
Quality: 2237.00
Escore: 0
Matching length: 223
Total length: 223
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

1   MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
1   MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50

51  CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
51  CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100

101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150

151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200

201 GIEGPIDAAFTRINCQGKTYLFK 223
    |||||||||||||||||||||||
201 GIEGPIDAAFTRINCQGKTYLFK 223

Expression of VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein), T39971 transcripts, which are detectable by amplicon as depicted in sequence name T39971junc23-33 in normal and cancerous ovary tissues

Expression of VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein) transcripts detectable by or according to T39971junc23-33 amplicon(s) and T39971junc23-33F and T39971junc23-33R primers was measured by real-time PCR. In parallel the expression of four housekeeping genes PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No.
NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, Table 1, "Tissue samples in testing panel", above), to obtain a value of fold differential expression for each sample relative to the median of the normal PM samples.

Figure 30 is a histogram showing downregulation of the above-indicated VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein) transcripts in cancerous ovary samples relative to the normal samples.

As is evident from Figure 30, the expression of VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein) transcripts detectable by the above amplicon(s) in most cancer samples was significantly lower than in the non-cancerous samples (Sample Nos. 45-48, Table 1, "Tissue samples in testing panel", above).

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: T39971junc23-33F forward primer; and T39971junc23-33R reverse primer.

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: T39971junc23-33.
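The normalization described above has two steps: divide each sample's target quantity by the geometric mean of its housekeeping-gene quantities, then divide by the median of the normalized values of the normal PM samples. A minimal sketch of that arithmetic follows. All quantities and sample labels are made up for illustration and are not data from this application; only the four housekeeping genes and the use of Sample Nos. 45-48 as the reference come from the text.

```python
from statistics import geometric_mean, median

def normalize(target: float, housekeeping: list[float]) -> float:
    """Divide the target quantity by the geometric mean of the housekeeping-gene quantities."""
    return target / geometric_mean(housekeeping)

def fold_change(normalized: dict[str, float], reference: list[str]) -> dict[str, float]:
    """Express each normalized value relative to the median of the reference samples."""
    ref_median = median(normalized[name] for name in reference)
    return {name: value / ref_median for name, value in normalized.items()}

# Hypothetical quantities: (target amplicon, [PBGD, HPRT1, SDHA, GAPDH]).
raw = {
    "PM_45": (8.0, [2.0, 8.0, 4.0, 4.0]),
    "PM_46": (12.0, [4.0, 4.0, 4.0, 4.0]),
    "PM_47": (12.0, [4.0, 4.0, 4.0, 4.0]),
    "PM_48": (16.0, [4.0, 4.0, 4.0, 4.0]),
    "tumor_1": (3.0, [4.0, 4.0, 4.0, 4.0]),
    "tumor_2": (6.0, [4.0, 4.0, 4.0, 4.0]),
}

normalized = {name: normalize(t, hk) for name, (t, hk) in raw.items()}
folds = fold_change(normalized, ["PM_45", "PM_46", "PM_47", "PM_48"])
for name, fold in folds.items():
    print(name, round(fold, 2))
```

With these invented numbers the normal median is 3.0, so the two tumor entries come out at 0.25 and 0.5. Values below 1 indicate lower expression than the normal median, which is the direction of change reported for Figure 30. The geometric mean keeps one unusually high or low housekeeping gene from dominating the normalization factor.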
T39971junc23-33 Forward primer (SEQ ID NO:1001): GGGGCAGAACCTCTGACAAG
T39971junc23-33 Reverse primer (SEQ ID NO:1002): GGGCAGCCCAGCCAGTA
T39971junc23-33 Amplicon (SEQ ID NO:1003):
GGGGCAGAACCTCTGACAAGTACTACCGAGTCAATCTTCGCACACGGCGAGTGGACACTGTGGACCCTCCCTACCCACGCTCCATCGCTCAGTACTGGCTGGGCTGCCC

Expression of VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein), T39971 transcripts, which are detectable by amplicon as depicted in sequence name T39971junc23-33 in different normal tissues
Expression of VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein) transcripts detectable by or according to T39971junc23-33 amplicon and T39971junc23-33F and T39971junc23-33R primers was measured by real-time PCR. In parallel the expression of four housekeeping genes RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the breast samples (Sample Nos. 33-35, Table 2, "Tissue samples in normal panel", above), to obtain a value of relative expression of each sample relative to the median of the breast samples.

The results are shown in Figure 31, a histogram showing the expression of T39971 transcripts, which are detectable by amplicon as depicted in sequence name T39971junc23-33, in different normal tissues.

Primers and amplicon are as above.

DESCRIPTION FOR CLUSTER Z44808

Cluster Z44808 features 5 transcript(s) and 21 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Transcript Name | SEQ ID NO:
Z44808_PEA_1_T81 | 607
Z44808_PEA_1_T4 | 608
Z44808_PEA_1_T5 | 609
Z44808_PEA_1_T8 | 610
Z44808_PEA_1_T9 | 611

Table 2 - Segments of interest
Segment Name | SEQ ID NO:
Z44808_PEA_1_node_0 | 612
Z44808_PEA_1_node_16 | 613
Z44808_PEA_1_node_2 | 614
Z44808_PEA_1_node_24 | 615
Z44808_PEA_1_node_32 | 616
Z44808_PEA_1_node_33 | 617
Z44808_PEA_1_node_36 | 618
Z44808_PEA_1_node_37 | 619
Z44808_PEA_1_node_41 | 620
Z44808_PEA_1_node_11 | 621
Z44808_PEA_1_node_13 | 622
Z44808_PEA_1_node_18 | 623
Z44808_PEA_1_node_22 | 624
Z44808_PEA_1_node_26 | 625
Z44808_PEA_1_node_30 | 626
Z44808_PEA_1_node_34 | 627
Z44808_PEA_1_node_35 | 628
Z44808_PEA_1_node_39 | 629
Z44808_PEA_1_node_4 | 630
Z44808_PEA_1_node_6 | 631
Z44808_PEA_1_node_8 | 632

Table 3 - Proteins of interest
Protein Name | SEQ ID NO:
Z44808_PEA_1_P5 | 634
Z44808_PEA_1_P6 | 635
Z44808_PEA_1_P7 | 636
Z44808_PEA_1_P11 | 637

These sequences are variants of the known protein SPARC related modular calcium-binding protein 2 precursor (SwissProt accession identifier SMO2_HUMAN; known also according to the synonyms Secreted modular calcium-binding protein 2; SMOC-2; Smooth muscle-associated protein 2; SMAP-2; MSTP117), SEQ ID NO: 633, referred to herein as the previously known protein.

The sequence for protein SPARC related modular calcium-binding protein 2 precursor is given at the end of the application, as "SPARC related modular calcium-binding protein 2 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.

Table 4 - Amino acid mutations for Known Protein
SNP position(s) on amino acid sequence | Comment
169 - 170 | KT -> TR
212 | S -> P
429 - 446 | TPRGHAESTSNRQPRKQG -> RSKRNL
434 | A -> V
439 | N -> Y

Protein SPARC related modular calcium-binding protein 2 precursor localization is believed to be secreted.
Cluster Z44808 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the right-hand column of the table and the numbers on the y-axis of Figure 32 refer to the weighted expression of ESTs in each category, expressed as parts per million (the ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, scaled to parts per million).
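A minimal sketch of the parts-per-million figure described above, assuming the inputs are already-weighted EST counts (the weighting itself is defined elsewhere in the application and is not modelled here). The function name and the example counts are hypothetical.

```python
def parts_per_million(cluster_weighted_ests: float, category_weighted_ests: float) -> float:
    """Expression of one cluster in a tissue category, per million ESTs of that category.

    Both arguments are assumed to be already-weighted EST counts.
    """
    if category_weighted_ests <= 0:
        raise ValueError("category total must be positive")
    # Multiply first, then divide, to keep round inputs exact.
    return cluster_weighted_ests * 1_000_000 / category_weighted_ests


# Hypothetical counts: 5 weighted ESTs of a cluster among 50,000 in a category.
print(parts_per_million(5, 50_000))  # 100.0
```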
Overall, the following results were obtained, as shown in the histograms in Figure 32 and in Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: colorectal cancer, lung cancer and pancreas carcinoma.

Table 5 - Normal tissue distribution
Name of Tissue | Number
bladder | 123
bone | 304
brain | 18
colon | 0
epithelial | 40
general | 37
kidney | 2
lung | 0
breast | 61
ovary | 116
pancreas | 0
prostate | 128
stomach | 36
uterus | 195

Table 6 - P values and ratios for expression in cancerous tissue
Name of Tissue | P1 | P2 | SP1 | R3 | SP2 | R4
bladder | 6.8e-01 | 7.6e-01 | 7.7e-01 | 0.8 | 9.1e-01 | 0.6
bone | 7.0e-01 | 8.8e-01 | 9.9e-01 | 0.3 | 1 | 0.2
brain | 6.8e-01 | 7.2e-01 | 3.0e-02 | 2.6 | 1.7e-01 | 1.6
colon | 9.2e-03 | 1.3e-02 | 1.2e-01 | 3.6 | 1.6e-01 | 3.1
epithelial | 2.1e-02 | 4.0e-01 | 1.0e-04 | 1.9 | 2.7e-01 | 1.0
general | 2.6e-02 | 7.2e-01 | 4.9e-07 | 1.9 | 3.0e-01 | 1.0
kidney | 7.3e-01 | 8.1e-01 | 1 | 1.0 | 1 | 1.0
lung | 4.0e-03 | 1.8e-02 | 8.0e-04 | 12.2 | 2.1e-02 | 6.0
breast | 4.8e-01 | 6.1e-01 | 9.8e-02 | 2.0 | 3.9e-01 | 1.2
ovary | 8.1e-01 | 8.3e-01 | 9.1e-01 | 0.6 | 9.7e-01 | 0.5
pancreas | 1.2e-01 | 2.1e-01 | 1.0e-03 | 6.5 | 5.9e-03 | 4.6
prostate | 8.4e-01 | 8.9e-01 | 9.0e-01 | 0.6 | 9.8e-01 | 0.4
stomach | 5.0e-01 | 8.7e-01 | 9.6e-04 | 1.5 | 1.9e-01 | 0.8
uterus | 6.7e-01 | 7.9e-01 | 9.2e-01 | 0.5 | 1 | 0.3

As noted above, cluster Z44808 features 5 transcript(s), which were listed in Table 1 above. These transcript(s) encode protein(s) that are variant(s) of protein SPARC related modular calcium-binding protein 2 precursor. A description of each variant protein according to the present invention is now provided.

Variant protein Z44808_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z44808_PEA_1_T4. An alignment is given to the known protein (SPARC related modular calcium-binding protein 2 precursor) at the end of the application.
One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z44808_PEA_1_P5 and SMO2_HUMAN:

1. An isolated chimeric polypeptide encoding for Z44808_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEERVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ corresponding to amino acids 1-441 of SMO2_HUMAN, which also corresponds to amino acids 1-441 of Z44808_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DAMVVSSRPKATTHRKSRTLSRR corresponding to amino acids 442-464 of Z44808_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of Z44808_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DAMVVSSRPKATTHRKSRTLSRR in Z44808_PEA_1_P5.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs.
The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z44808_PEA_1_P5 is encoded by the following transcript(s): Z44808_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z44808_PEA_1_T4 is shown in bold; this coding portion starts at position 586 and ends at position 1977. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 7 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
549 | A -> G | No
648 | T -> G | No
4403 | G -> T | No
4456 | G -> A | Yes
4964 | G -> C | Yes
1025 | C -> | No
1677 | T -> C | No
2691 | C -> T | Yes
3900 | T -> C | No
3929 | G -> A | Yes
4099 | G -> T | Yes
4281 | T -> C | No
4319 | G -> C | Yes

Variant protein Z44808_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z44808_PEA_1_T5. An alignment is given to the known protein (SPARC related modular calcium-binding protein 2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z44808_PEA_1_P6 and SMO2_HUMAN:

1. An isolated chimeric polypeptide encoding for Z44808_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEERVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQELMGCLGVAKEDGKADTKKRH corresponding to amino acids 1-428 of SMO2_HUMAN, which also corresponds to amino acids 1-428 of Z44808_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RSKRNL corresponding to amino acids 429-434 of Z44808_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of Z44808_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RSKRNL in Z44808_PEA_1_P6.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted.
The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z44808_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 8 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
147 | A -> | No

Variant protein Z44808_PEA_1_P6 is encoded by the following transcript(s): Z44808_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z44808_PEA_1_T5 is shown in bold; this coding portion starts at position 586 and ends at position 1887. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 9 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
549 | A -> G | No
648 | T -> G | No
2866 | G -> A | Yes
3374 | G -> C | Yes
1025 | C -> | No
1677 | T -> C | No
2310 | T -> C | No
2339 | G -> A | Yes
2509 | G -> T | Yes
2691 | T -> C | No
2729 | G -> C | Yes
2813 | G -> T | No

Variant protein Z44808_PEA_1_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z44808_PEA_1_T9. An alignment is given to the known protein (SPARC related modular calcium-binding protein 2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z44808_PEA_1_P7 and SMO2_HUMAN:

1. An isolated chimeric polypeptide encoding for Z44808_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEERVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ corresponding to amino acids 1-441 of SMO2_HUMAN, which also corresponds to amino acids 1-441 of Z44808_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLWLRGKVSFYCF corresponding to amino acids 442-454 of Z44808_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of Z44808_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLWLRGKVSFYCF in Z44808_PEA_1_P7.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z44808_PEA_1_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 10 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
147 | A -> | No

Variant protein Z44808_PEA_1_P7 is encoded by the following transcript(s): Z44808_PEA_1_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z44808_PEA_1_T9 is shown in bold; this coding portion starts at position 586 and ends at position 1947.
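Because the coding portion of Z44808_PEA_1_T9 starts at position 586, a nucleotide SNP position can be mapped to a codon by simple arithmetic. The sketch below assumes that position 586 is the first base of codon 1 and that the reading frame is uninterrupted within the coding portion; the function names are hypothetical. Applied to the positions of Table 11, which follows, it places position 1025 in codon 147, consistent with the amino acid change at position 147 listed in Table 10, while positions 549 and 2169 fall outside the coding portion.

```python
from typing import Optional


def codon_number(position: int, cds_start: int, cds_end: int) -> Optional[int]:
    """1-based codon number of a transcript position, or None outside the coding portion.

    Assumes cds_start is the first base of codon 1.
    """
    if not cds_start <= position <= cds_end:
        return None
    return (position - cds_start) // 3 + 1


def base_in_codon(position: int, cds_start: int) -> int:
    """Position of the base within its codon: 1, 2 or 3."""
    return (position - cds_start) % 3 + 1


T9_CDS = (586, 1947)  # coding portion of Z44808_PEA_1_T9, as stated in the text

for snp in (549, 648, 1025, 1677, 2169):  # nucleotide positions listed in Table 11
    codon = codon_number(snp, *T9_CDS)
    where = f"codon {codon}" if codon is not None else "outside coding portion"
    print(snp, where)
```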
The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 11 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
549 | A -> G | No
648 | T -> G | No
1025 | C -> | No
1677 | T -> C | No
2169 | C -> A | Yes

Variant protein Z44808_PEA_1_P11 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z44808_PEA_1_T11. The identification of this transcript was performed using a non-EST-based method for the identification of alternative splicing, described in the following reference: "Sorek R et al., Genome Res. (2004) 14:1617-23." An alignment is given to the known protein (SPARC related modular calcium-binding protein 2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z44808_PEA_1_P11 and SMO2_HUMAN:

1. An isolated chimeric polypeptide encoding for Z44808_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKT corresponding to amino acids 1-170 of SMO2_HUMAN, which also corresponds to amino acids 1-170 of Z44808_PEA_1_P11, and a second amino acid sequence being at least 90% homologous to DIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEERVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQPRKQG corresponding to amino acids 188-446 of SMO2_HUMAN, which also corresponds to amino acids 171-429 of Z44808_PEA_1_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of Z44808_PEA_1_P11, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TD, having a structure as follows: a sequence starting from any of amino acid numbers 170-x to 170; and ending at any of amino acid numbers 171+((n-2)-x), in which x varies from 0 to n-2.
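The "edge portion" definition in item 2 can be enumerated directly. In this hypothetical sketch, left is the last residue before the junction (170 for Z44808_PEA_1_P11, where residues 170 and 171 are T and D); for each x from 0 to n-2 the window runs from left - x to (left + 1) + ((n - 2) - x), so every window is n residues long and contains both junction residues. The protein_length bound is an added assumption that drops windows running past either end of the 429-residue protein.

```python
from typing import Iterator, Tuple


def edge_windows(left: int, n: int, protein_length: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) windows of length n containing both `left` and `left + 1`.

    start = left - x and end = (left + 1) + ((n - 2) - x) for x = 0 .. n - 2;
    windows running past either end of the protein are skipped.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    for x in range(n - 1):
        start = left - x
        end = left + 1 + (n - 2 - x)
        if start >= 1 and end <= protein_length:
            yield start, end


# Junction of Z44808_PEA_1_P11 (429 residues) at residues 170/171, with n = 10.
for start, end in edge_windows(170, 10, 429):
    print(start, end)
```

For n = 10 this yields nine windows, from 170-179 down to 162-171.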
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z44808_PEA_1_P11 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 12 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 12 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
147 | A -> | No

Variant protein Z44808_PEA_1_P11 is encoded by the following transcript(s): Z44808_PEA_1_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z44808_PEA_1_T11 is shown in bold; this coding portion starts at position 586 and ends at position 1872. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
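With the coding portions of all four transcripts now stated (T4: 586-1977, T5: 586-1887, T9: 586-1947, T11: 586-1872), the comparison reports can be cross-checked against them. The sketch below is an illustrative aid with hypothetical names; it takes SMO2_HUMAN residues 1-441 from the alignment given later in this section and residues 442-446 as PRKQG (Table 4), assembles each variant from the SMO2_HUMAN ranges and tails stated in its comparison report, and prints its length next to the length of the corresponding coding span.

```python
"""Assemble the Z44808 variants from their described parts and compare with coding spans (sketch)."""

SMO2_HUMAN_SEQ = (
    "MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK"
    "PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ"
    "ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT"
    "PRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQ"
    "VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ"
    "CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ"
    "LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER"
    "VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN"
    "DKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ"
    "PRKQG"  # residues 442-446, assumed from Table 4
)


def residues(seq: str, start: int, end: int) -> str:
    """Residues start..end of seq (1-based, inclusive)."""
    return seq[start - 1:end]


VARIANTS = {
    "Z44808_PEA_1_P5": residues(SMO2_HUMAN_SEQ, 1, 441) + "DAMVVSSRPKATTHRKSRTLSRR",
    "Z44808_PEA_1_P6": residues(SMO2_HUMAN_SEQ, 1, 428) + "RSKRNL",
    "Z44808_PEA_1_P7": residues(SMO2_HUMAN_SEQ, 1, 441) + "LLWLRGKVSFYCF",
    "Z44808_PEA_1_P11": residues(SMO2_HUMAN_SEQ, 1, 170) + residues(SMO2_HUMAN_SEQ, 188, 446),
}

CODING_PORTIONS = {
    "Z44808_PEA_1_P5": (586, 1977),   # Z44808_PEA_1_T4
    "Z44808_PEA_1_P6": (586, 1887),   # Z44808_PEA_1_T5
    "Z44808_PEA_1_P7": (586, 1947),   # Z44808_PEA_1_T9
    "Z44808_PEA_1_P11": (586, 1872),  # Z44808_PEA_1_T11
}

for name, protein in VARIANTS.items():
    start, end = CODING_PORTIONS[name]
    span = end - start + 1
    print(f"{name}: {len(protein)} residues; coding span {span} nt = {span / 3:g} codons")
```

Each coding span is exactly three times the assembled length (464, 434, 454 and 429 residues, respectively). As an inference rather than a statement of the application, this suggests that the stated coding portions end at the last sense codon and do not include the stop codon.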
Table 13 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
549 | A -> G | No
648 | T -> G | No
2720 | G -> A | Yes
3228 | G -> C | Yes
1025 | C -> | No
1626 | T -> C | No
2164 | T -> C | No
2193 | G -> A | Yes
2363 | G -> T | Yes
2545 | T -> C | No
2583 | G -> C | Yes
2667 | G -> T | No

As noted above, cluster Z44808 features 21 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster Z44808_PEA_1_node_0 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 14 below describes the starting and ending position of this segment on each transcript.

Table 14 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 | 1 | 669
Z44808_PEA_1_T4 | 1 | 669
Z44808_PEA_1_T5 | 1 | 669
Z44808_PEA_1_T8 | 1 | 669
Z44808_PEA_1_T9 | 1 | 669

Segment cluster Z44808_PEA_1_node_16 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 | 1172 | 1358
Z44808_PEA_1_T4 | 1223 | 1409
Z44808_PEA_1_T5 | 1223 | 1409
Z44808_PEA_1_T8 | 1223 | 1409
Z44808_PEA_1_T9 | 1223 | 1409

Segment cluster Z44808_PEA_1_node_2 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 16 below describes the starting and ending position of this segment on each transcript.

Table 16 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 | 670 | 841
Z44808_PEA_1_T4 | 670 | 841
Z44808_PEA_1_T5 | 670 | 841
Z44808_PEA_1_T8 | 670 | 841
Z44808_PEA_1_T9 | 670 | 841

Segment cluster Z44808_PEA_1_node_24 according to the present invention is supported by 52 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 17 below describes the starting and ending position of this segment on each transcript.

Table 17 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 | 1545 | 1819
Z44808_PEA_1_T4 | 1596 | 1870
Z44808_PEA_1_T5 | 1596 | 1870
Z44808_PEA_1_T8 | 1596 | 1870
Z44808_PEA_1_T9 | 1596 | 1870

Segment cluster Z44808_PEA_1_node_32 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T4 and Z44808_PEA_1_T8. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T4 | 1909 | 3593
Z44808_PEA_1_T8 | 1909 | 2397

Segment cluster Z44808_PEA_1_node_33 according to the present invention is supported by 133 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4 and Z44808_PEA_1_T5. Table 20 below describes the starting and ending position of this segment on each transcript.

Table 20 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 | 1858 | 2734
Z44808_PEA_1_T4 | 3594 | 4470
Z44808_PEA_1_T5 | 2004 | 2880

Segment cluster Z44808_PEA_1_node_36 according to the present invention is supported by 117 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4 and Z44808_PEA_1_T5. Table 21 below describes the starting and ending position of this segment on each transcript.

Table 21 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 | 2829 | 3080
Z44808_PEA_1_T4 | 4565 | 4816
Z44808_PEA_1_T5 | 2975 | 3226

Segment cluster Z44808_PEA_1_node_37 according to the present invention is supported by 120 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4 and Z44808_PEA_1_T5. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 | 3081 | 3429
Z44808_PEA_1_T4 | 4817 | 5165
Z44808_PEA_1_T5 | 3227 | 3575

Segment cluster Z44808_PEA_1_node_41 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T9. Table 23 below describes the starting and ending position of this segment on each transcript.

Table 23 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T9 | 1974 | 2206

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster Z44808_PEA_1_node_11 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T4 | 1097 | 1147
Z44808_PEA_1_T5 | 1097 | 1147
Z44808_PEA_1_T8 | 1097 | 1147
Z44808_PEA_1_T9 | 1097 | 1147

Segment cluster Z44808_PEA_1_node_13 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 25 below describes the starting and ending position of this segment on each transcript.

Table 25 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 | 1097 | 1171
Z44808_PEA_1_T4 | 1148 | 1222
Z44808_PEA_1_T5 | 1148 | 1222
Z44808_PEA_1_T8 | 1148 | 1222
Z44808_PEA_1_T9 | 1148 | 1222

Segment cluster Z44808_PEA_1_node_18 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 26 below describes the starting and ending position of this segment on each transcript.

Table 26 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 | 1359 | 1441
Z44808_PEA_1_T4 | 1410 | 1492
Z44808_PEA_1_T5 | 1410 | 1492
Z44808_PEA_1_T8 | 1410 | 1492
Z44808_PEA_1_T9 | 1410 | 1492

Segment cluster Z44808_PEA_1_node_22 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9.
Table 27 below describes the starting and ending position of this segment on each transcript.

Table 27 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 | 1442 | 1544
Z44808_PEA_1_T4 | 1493 | 1595
Z44808_PEA_1_T5 | 1493 | 1595
Z44808_PEA_1_T8 | 1493 | 1595
Z44808_PEA_1_T9 | 1493 | 1595

Segment cluster Z44808_PEA_1_node_26 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T5. Table 29 below describes the starting and ending position of this segment on each transcript.

Table 29 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T5 | 1871 | 1965

Segment cluster Z44808_PEA_1_node_30 according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 31 below describes the starting and ending position of this segment on each transcript.

Table 31 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 | 1820 | 1857
Z44808_PEA_1_T4 | 1871 | 1908
Z44808_PEA_1_T5 | 1966 | 2003
Z44808_PEA_1_T8 | 1871 | 1908
Z44808_PEA_1_T9 | 1871 | 1908

Segment cluster Z44808_PEA_1_node_34 according to the present invention is supported by 70 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4 and Z44808_PEA_1_T5. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 | 2735 | 2809
Z44808_PEA_1_T4 | 4471 | 4545
Z44808_PEA_1_T5 | 2881 | 2955

Segment cluster Z44808_PEA_1_node_35 according to the present invention can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4 and Z44808_PEA_1_T5. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 | 2810 | 2828
Z44808_PEA_1_T4 | 4546 | 4564
Z44808_PEA_1_T5 | 2956 | 2974

Segment cluster Z44808_PEA_1_node_39 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T9. Table 34 below describes the starting and ending position of this segment on each transcript.

Table 34 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T9 | 1909 | 1973

Segment cluster Z44808_PEA_1_node_4 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 35 below describes the starting and ending position of this segment on each transcript.

Table 35 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 | 842 | 948
Z44808_PEA_1_T4 | 842 | 948
Z44808_PEA_1_T5 | 842 | 948
Z44808_PEA_1_T8 | 842 | 948
Z44808_PEA_1_T9 | 842 | 948

Segment cluster Z44808_PEA_1_node_6 according to the present invention is supported by 30 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 36 below describes the starting and ending position of this segment on each transcript.
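The segment location tables of this section (Tables 14-37) can also be checked for internal consistency. The hypothetical sketch below lists the coordinates of the 18 segments found on Z44808_PEA_1_T4 (node_26, node_39 and node_41 are absent from that transcript) and reports any gap or overlap between consecutive segments. With these coordinates it reports none, i.e. the listed segments tile T4 contiguously from position 1 to 5165.

```python
from typing import List, Tuple

# (segment, start, end) on Z44808_PEA_1_T4, copied from the segment location tables.
T4_SEGMENTS: List[Tuple[str, int, int]] = [
    ("node_0", 1, 669), ("node_2", 670, 841), ("node_4", 842, 948),
    ("node_6", 949, 1048), ("node_8", 1049, 1096), ("node_11", 1097, 1147),
    ("node_13", 1148, 1222), ("node_16", 1223, 1409), ("node_18", 1410, 1492),
    ("node_22", 1493, 1595), ("node_24", 1596, 1870), ("node_30", 1871, 1908),
    ("node_32", 1909, 3593), ("node_33", 3594, 4470), ("node_34", 4471, 4545),
    ("node_35", 4546, 4564), ("node_36", 4565, 4816), ("node_37", 4817, 5165),
]


def tiling_problems(segments: List[Tuple[str, int, int]]) -> List[str]:
    """Report gaps or overlaps between consecutive segments, ordered by start position."""
    problems = []
    ordered = sorted(segments, key=lambda seg: seg[1])
    for (name_a, _, end_a), (name_b, start_b, _) in zip(ordered, ordered[1:]):
        if start_b > end_a + 1:
            problems.append(f"gap of {start_b - end_a - 1} nt between {name_a} and {name_b}")
        elif start_b <= end_a:
            problems.append(f"overlap of {end_a - start_b + 1} nt between {name_a} and {name_b}")
    return problems


print(tiling_problems(T4_SEGMENTS) or "contiguous")
for name, start, end in T4_SEGMENTS:
    print(f"{name}: {end - start + 1} nt")
```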
Table 36 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 | 949 | 1048
Z44808_PEA_1_T4 | 949 | 1048
Z44808_PEA_1_T5 | 949 | 1048
Z44808_PEA_1_T8 | 949 | 1048
Z44808_PEA_1_T9 | 949 | 1048

Segment cluster Z44808_PEA_1_node_8 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 37 below describes the starting and ending position of this segment on each transcript.

Table 37 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 | 1049 | 1096
Z44808_PEA_1_T4 | 1049 | 1096
Z44808_PEA_1_T5 | 1049 | 1096
Z44808_PEA_1_T8 | 1049 | 1096
Z44808_PEA_1_T9 | 1049 | 1096

Variant protein alignment to the previously known protein:

Sequence name: /tmp/vUqLu6eAVZ/K3JDuPvaLo:SMO2_HUMAN

Sequence documentation:

Alignment of: Z44808_PEA_1_P5 x SMO2_HUMAN

Alignment segment 1/1:

Quality: 4440.00
Escore: 0
Matching length: 441
Total length: 441
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:

  1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50

 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100

101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150

151 PRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQ 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQ 200

201 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 250

251 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 300

301 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 350

351 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 400

401 DKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ 441
    |||||||||||||||||||||||||||||||||||||||||
401 DKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ 441

Sequence name: /tmp/QSUNfTsJ5y/kLOw5Vb6SD:SMO2_HUMAN

Sequence documentation:

Alignment of: Z44808_PEA_1_P6 x SMO2_HUMAN

Alignment segment 1/1:

Quality: 4310.00
Escore: 0
Matching length: 428
Total length: 428
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:

  1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50

 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100

101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150

151 PRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQ 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQ 200

201 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
l l l l l l I l 201 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 250 15 251 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 300 I Il l l l l lI l l l Il l I I I I l l Il1ll l l l l l l II I 251 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 300 301 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 350 2 0 I l l l l l ll I I Il lll l I Il l I 301 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 350 351 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 400 Il l l l l l l l l l l l l l l I1l l l lI Ill l l l l l l l l I l l l l II I 25 351 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 400 401 DKSISVQELMGCLGVAKEDGKADTKKRH 428 I Ii l l l l1 l l l lI I l I I I I 401 DKSISVQELMGCLGVAKEDGKADTKKRH 428 30 WO 2005/116850 PCT/IB2005/002555 685 5 Sequence name: /tmp/MZVdR4PVdM/5uN8RwViJ1:SMO2 HUMAN Sequence documentation: Alignment of: Z44808 PEA 1 P7 x SMO2 HUMAN 10 Alignment segment 1/1: Quality: 4440.00 Escore: 0 15 Matching length: 441 Total length: 441 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent 20 Identity: 100.00 Gaps: 0 Alignment: 25 1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50 I I I I I lI I I I I I I l l l l l l I l l l l l I l l l lI I I I I I I I l l lI I 1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100 3 0 I l l I I II I I I I I l l l l I I I I I I I I l l II I I I I I I I lI I I I I I I l 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100 WO 2005/116850 PCT/IB2005/002555 686 101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150 I 'I I IIIII II i II 11 i i I iii I 101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150 5 151 PRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQ 200 II I I l l l I l l l l l l l l I I I I I l l I I I I I I I I I I I 151 PRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQ 200 10 201 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 250 Ill l I IIIIII 
ll1lll llll llli llll li iil l I Ii 201 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 250 251 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 300 1 5 l l l l I ll l l l l l l l I I I I I I l l I I 251 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 300 301 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 350 I 1111 1111111 111111 111 1111l II ll Illli 20 301 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 350 351 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 400 I I l l l l l II I I I I I l I I l l I I I I I I I I I I I I II I 351 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 400 25 401 DKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ 441 l I l l I l l l l l i l l l lI l l I l l l III l l l l l Il l 401 DKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ 441 30 WO 2005/116850 PCT/IB2005/002555 687 Sequence name: /tmp/3fGVxqLloe/J5mQduAd0F:SMO2_HUMAN 5 Sequence documentation: Alignment of: Z44808 PEA 1 P11 x SMO2 HUMAN 10 Alignment segment 1/1: Quality: 4228.00 Escore: 0 Matching length: 429 Total 15 length: 446 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 96.19 Total Percent Identity: 96.19 20 Gaps: 1 Alignment: 1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50 2 5 I l l l l il l l l l l l l llI l l l l Il l l l l l l l l l l l l l l I I I I 1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100 30 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100II I II 30 51 PLCAS DGRTFLSRCEFQRAKCKDPQLE IAYRGNCKDVSRCVAERKYTQEQ 100 WO 2005/116850 PCT/IB2005/002555 688 101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150 I l l l l I l l l l l lI ll l l l l l l l i l II I l I l l I IIll l 101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150 5 151 PRCPGSVNEKLPQREGTGKT. 
.................DIASRYPTLWTEQ 183 II I I l I l l I I l l 151 PRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQ 200 184 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 233 10 I l l l l l l I l l l l l l l l l l l Ill l l l l l l l l l l I 201 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 250 234 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 283 15 251 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 300 284 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 333 I I l l l l I l l I I l l l I I I l I l l l l l l l l l l l l l l l l I 301 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 350 20 334 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 383 Il l l l l l I l l l l ll l l l l l l I l il I 351 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 400 25 384 DKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQPRKQG 429 I II Il lIlIll l l l lI ~ l l l l ~ l l ~ l II I I I I 401 DKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQPRKQG 446 30 WO 2005/116850 PCT/IB2005/002555 689 Expression of SMO2 HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) Z44808 transcripts, which are detectable by amplicon as depicted in sequence name Z44808 junc8-11 in normal and cancerous ovary tissues 5 Expression of SMO2_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) transcripts detectable by or according to junc8-11, Z44808 junc8-1 1 amplicon(s) and Z44808 junc8-11 F and Z44808 junc8-11 R primers was measured by real time PCR. In parallel the expression of four housekeeping genes -PBGD (GenBank Accession No. BC019323; 10 amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA amplicon), and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon) was measured similarly. 
For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, 71, Table 1, "Tissue sample in testing panel", above). The reciprocal of this ratio was then calculated to obtain a value of fold down-regulation for each sample relative to the median of the normal PM samples.

Figure 33A is a histogram showing down-regulation of the above-indicated SMO2_HUMAN SPARC related modular calcium-binding protein 2 precursor transcripts in cancerous ovary samples relative to the normal samples. As is evident from Figure 33A, the expression of SMO2_HUMAN SPARC related modular calcium-binding protein 2 precursor transcripts detectable by the above amplicon(s) in cancer samples was significantly lower than in the non-cancerous samples (Sample Nos. 45-48, 71, Table 1, "Tissue sample in testing panel"). Notably, down-regulation of at least 5-fold was found in 33 out of 43 adenocarcinoma samples.

Statistical analysis was applied to verify the significance of these results, as described below.

The P value for the difference in the expression levels of SMO2_HUMAN SPARC related modular calcium-binding protein 2 precursor transcripts detectable by the above amplicon(s) in ovary cancer samples versus the normal tissue samples was determined by T test as 4.47E-05. The fold down-regulation threshold was found to differentiate between cancer and normal samples with a P value of 1.75E-03, as checked by Fisher's exact test. The above values demonstrate the statistical significance of the results.
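The normalization, fold-change and Fisher-test steps described above can be sketched in Python. This is a minimal sketch, not the analysis code used for the application: the function names are hypothetical, the example quantities are invented, and the qPCR quantities are assumed to be on a linear scale already. The text does not give the 2x2 counts behind the Fisher test; assuming the 5-fold threshold and that none of the five normal PM samples (Sample Nos. 45-48, 71) exceeded it, a one-sided test on the table [[33, 10], [0, 5]] gives about 1.75E-03, in line with the reported value. Whether the original test was one- or two-sided is not stated. The `relative_expression` helper corresponds to the normal-tissue panel computation, which divides by the median of the ovary samples without taking a reciprocal.

```python
import math
from statistics import geometric_mean, median


def normalize(target_qty, housekeeping_qtys):
    """Normalize a target quantity to the geometric mean of housekeeping-gene quantities."""
    return target_qty / geometric_mean(housekeeping_qtys)


def fold_down_regulation(sample_norm, normal_norms):
    """Reciprocal of (normalized sample / median of the normalized normal samples)."""
    ratio = sample_norm / median(normal_norms)
    return 1.0 / ratio


def relative_expression(sample_norm, reference_norms):
    """Normalized sample divided by the median of a reference group (no reciprocal)."""
    return sample_norm / median(reference_norms)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows: cancer, normal. Columns: above threshold, below threshold.
    Returns P(X >= a), where X is the hypergeometric number of above-threshold
    samples that fall in the cancer row when all margins are fixed.
    """
    n = a + b + c + d
    row1 = a + b  # number of cancer samples
    col1 = a + c  # number of samples above the threshold
    total = math.comb(n, row1)
    tail = sum(
        math.comb(col1, x) * math.comb(n - col1, row1 - x)
        for x in range(a, min(row1, col1) + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical quantities, for illustration only.
    sample = normalize(0.5, [1.0, 4.0])          # 0.5 / geometric mean 2.0
    normals = [1.0, 1.25, 1.25, 1.5, 2.0]        # five hypothetical normal PM samples
    print(fold_down_regulation(sample, normals))
    # 33 of 43 cancers above the threshold; 0 of 5 normals is an assumption.
    print(fisher_exact_greater(33, 10, 0, 5))
```

With these inputs the sample is 5-fold down-regulated, and the Fisher value is 3003/1712304, roughly 1.754E-03. Computing the hypergeometric tail directly with `math.comb` keeps the sketch dependency-free; a statistics library could be used instead.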
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Z44808 junc8-11F forward primer; and Z44808 junc8-11R reverse primer.

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Z44808 junc8-11.

Z44808 junc8-11 Forward primer (SEQ ID NO: 1004): GAAGGCACAGGAAAAACAGATATTG
Z44808 junc8-11 Reverse primer (SEQ ID NO: 1005): TGGTGCTCTTGGTCACAGGAT
Z44808 junc8-11 Amplicon (SEQ ID NO: 1006):
GAAGGCACAGGAAAAACAGATATTGCATCACGTTACCCTACCCTTTGGACTGAACA
GGTTAAAAGTCGGCAGAACAAAACCAATAAGAATTCAGTGTCATCCTGTGACCAAG
AGCACCA

Expression of SMO2_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) Z44808 transcripts which are detectable by amplicon as depicted in sequence name Z44808 junc8-11 in different normal tissues

Expression of SMO2_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) transcripts detectable by or according to the Z44808 junc8-11 amplicon(s) and the primers Z44808 junc8-11F and Z44808 junc8-11R was measured by real time PCR. In parallel, the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), was measured similarly.
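The primer and amplicon sequences listed above can be checked against each other: the amplicon should begin with the forward primer and end with the reverse complement of the reverse primer. A minimal sketch in Python; only the three sequences come from the text, and the helper names are hypothetical.

```python
# Sequences as listed above (SEQ ID NOs: 1004, 1005 and 1006).
FORWARD = "GAAGGCACAGGAAAAACAGATATTG"
REVERSE = "TGGTGCTCTTGGTCACAGGAT"
AMPLICON = (
    "GAAGGCACAGGAAAAACAGATATTGCATCACGTTACCCTACCCTTTGGACTGAACA"
    "GGTTAAAAGTCGGCAGAACAAAACCAATAAGAATTCAGTGTCATCCTGTGACCAAG"
    "AGCACCA"
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def primers_flank(amplicon: str, forward: str, reverse: str) -> bool:
    """True if the amplicon starts with the forward primer and ends with the
    reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(reverse_complement(reverse))


print(len(AMPLICON), reverse_complement(REVERSE), primers_flank(AMPLICON, FORWARD, REVERSE))
```

Run as written, this prints the amplicon length (119 bp), the reverse complement of the reverse primer (ATCCTGTGACCAAGAGCACCA) and True, confirming that the listed primers bound the listed amplicon.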
For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 2: Tissue samples in normal panel, above) to obtain a value of relative expression of each sample relative to the median of the ovary samples. Results are shown in Figure 33B. Primers and amplicon are as above.

DESCRIPTION FOR CLUSTER S67314

Cluster S67314 features 4 transcript(s) and 8 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.

Table 1 - Transcripts of interest
Transcript Name | SEQ ID NO:
S67314_PEA_1_T4 | 638
S67314_PEA_1_T5 | 639
S67314_PEA_1_T6 | 640
S67314_PEA_1_T7 | 641

Table 2 - Segments of interest
Segment Name | SEQ ID NO:
S67314_PEA_1_node_0 | 642
S67314_PEA_1_node_11 | 643
S67314_PEA_1_node_13 | 644
S67314_PEA_1_node_15 | 645
S67314_PEA_1_node_17 | 646
S67314_PEA_1_node_4 | 647
S67314_PEA_1_node_10 | 648
S67314_PEA_1_node_3 | 649

Table 3 - Proteins of interest
Protein Name | SEQ ID NO:
S67314_PEA_1_P4 | 651
S67314_PEA_1_P5 | 652
S67314_PEA_1_P6 | 653
S67314_PEA_1_P7 | 654

These sequences are variants of the known protein Fatty acid-binding protein, heart (SwissProt accession identifier FABH_HUMAN; known also according to the synonyms H-FABP; Muscle fatty acid-binding protein; M-FABP; Mammary-derived growth inhibitor; MDGI), SEQ ID NO: 650, referred to herein as the previously known protein.

Protein Fatty acid-binding protein is known or believed to have the following function(s): FABPs are thought to play a role in the intracellular transport of long-chain fatty acids and their acyl-CoA esters.
The sequence for protein Fatty acid-binding protein is given at the end of the application, as "Fatty acid-binding protein, heart amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.

Table 4 - Amino acid mutations for Known Protein
SNP position(s) on amino acid sequence | Comment
1 | V -> A
104 | L -> K
124 | C -> S
129 | E -> Q

Protein Fatty acid-binding protein localization is believed to be cytoplasmic.
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: negative control of cell proliferation, which are annotation(s) related to Biological Process; and lipid binding, which are annotation(s) related to Molecular Function.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or LocusLink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.

As noted above, cluster S67314 features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode protein(s) which are variant(s) of protein Fatty acid-binding protein. A description of each variant protein according to the present invention is now provided.

Variant protein S67314_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S67314_PEA_1_T4. An alignment is given to the known protein (Fatty acid-binding protein) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between S67314_PEA_1_P4 and FABH_HUMAN:

1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P4, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRWATLELYLIGYYYCSFSQACSKKPSPPLRAVEAGTREWLWVRVVSGGNFLCSGFGLTQAGTQILPYRLHDCGQITFSKCNCKTGINNTNLVGLLGSL corresponding to amino acids 117 - 215 of S67314_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of S67314_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRWATLELYLIGYYYCSFSQACSKKPSPPLRAVEAGTREWLWVRVVSGGNFLCSGFGLTQAGTQILPYRLHDCGQITFSKCNCKTGINNTNLVGLLGSL in S67314_PEA_1_P4.
Comparison report between S67314_PEA_1_P4 and AAP35373 (SEQ ID NO: 1007):

1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of AAP35373, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRWATLELYLIGYYYCSFSQACSKKPSPPLRAVEAGTREWLWVRVVSGGNFLCSGFGLTQAGTQILPYRLHDCGQITFSKCNCKTGINNTNLVGLLGSL corresponding to amino acids 117 - 215 of S67314_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of S67314_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRWATLELYLIGYYYCSFSQACSKKPSPPLRAVEAGTREWLWVRVVSGGNFLCSGFGLTQAGTQILPYRLHDCGQITFSKCNCKTGINNTNLVGLLGSL in S67314_PEA_1_P4.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
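The localization call above combines two kinds of prediction into a single rule. It can be sketched as below, assuming each program's output has already been reduced to a yes/no call. The text names SignalP but not the other programs, so the inputs here are anonymous flags, the function name is hypothetical, and any combination other than the intracellular case is reported as undetermined rather than guessed.

```python
from typing import Sequence


def infer_localization(tm_region_predicted: Sequence[bool],
                       signal_peptide_predicted: Sequence[bool]) -> str:
    """Apply the consensus rule described in the text.

    tm_region_predicted: one flag per trans-membrane region predictor
        (True means that predictor found a trans-membrane region).
    signal_peptide_predicted: one flag per signal-peptide predictor
        (True means that predictor calls the protein secreted).
    """
    no_tm_region = not any(tm_region_predicted)
    non_secreted = len(signal_peptide_predicted) > 0 and not any(signal_peptide_predicted)
    if no_tm_region and non_secreted:
        return "intracellular"
    # The text does not say how other combinations are resolved.
    return "undetermined by this rule"


# S67314_PEA_1_P4: neither trans-membrane predictor fires and both
# signal-peptide predictors call the protein non-secreted.
print(infer_localization([False, False], [False, False]))
```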
Variant protein S67314_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 5 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
53 | K -> R | Yes

Variant protein S67314_PEA_1_P4 is encoded by the following transcript(s): S67314_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S67314_PEA_1_T4 is shown in bold; this coding portion starts at position 925 and ends at position 1569. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 6 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
580 | T -> C | Yes
1082 | A -> G | Yes
1670 | A -> C | Yes

Variant protein S67314_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S67314_PEA_1_T5. An alignment is given to the known protein (Fatty acid-binding protein, heart) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
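The coding-portion coordinates given for transcript S67314_PEA_1_T4 can be related to the SNP tables for P4. A minimal sketch, assuming 1-based inclusive coordinates and that translation starts at the first base of the stated coding portion. Positions 925-1569 span 645 nt, exactly 215 codons, which matches the 215 residues of S67314_PEA_1_P4 (so the stated coding portion appears not to include the stop codon). Under these assumptions nucleotide 1082 falls at the second base of codon 53, the same residue as the K -> R change in Table 5 (an A -> G change at the second base of a lysine codon, AAA or AAG, gives an arginine codon, AGA or AGG), while 580 lies upstream and 1670 downstream of the coding portion. The function names are hypothetical.

```python
def cds_length_in_codons(cds_start: int, cds_end: int) -> int:
    """Number of codons in a 1-based, inclusive coding portion."""
    length = cds_end - cds_start + 1
    if length % 3:
        raise ValueError(f"coding portion of {length} nt is not a whole number of codons")
    return length // 3


def locate_snp(nt_pos: int, cds_start: int, cds_end: int):
    """Classify a transcript position as 5' UTR, 3' UTR or coding.

    For coding positions, also return the 1-based codon number (the amino acid
    position when translation starts at cds_start) and the base (1-3) in that codon.
    """
    if nt_pos < cds_start:
        return ("5' UTR", None, None)
    if nt_pos > cds_end:
        return ("3' UTR", None, None)
    offset = nt_pos - cds_start
    return ("coding", offset // 3 + 1, offset % 3 + 1)


# Coding portions (start, end) as stated in the text for each transcript.
CODING_PORTIONS = {
    "S67314_PEA_1_T4": (925, 1569),
    "S67314_PEA_1_T5": (925, 1458),
    "S67314_PEA_1_T6": (925, 1302),
    "S67314_PEA_1_T7": (925, 1356),
}

if __name__ == "__main__":
    for name, (start, end) in CODING_PORTIONS.items():
        print(name, cds_length_in_codons(start, end))
    t4_start, t4_end = CODING_PORTIONS["S67314_PEA_1_T4"]
    for pos in (580, 1082, 1670):  # Table 6 positions
        print(pos, locate_snp(pos, t4_start, t4_end))
```

The same functions apply to the coding portions stated for T5, T6 and T7 in the following sections, giving 178, 126 and 144 codons, which match the lengths of P5, P6 and P7; for T7, position 1115 falls at the second base of codon 64, the residue listed for P7's K -> R change.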
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between S67314_PEA_1_P5 and FABH_HUMAN:

1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P5, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSYNNHGWEELRVGKSIV corresponding to amino acids 117 - 178 of S67314_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of S67314_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSYNNHGWEELRVGKSIV in S67314_PEA_1_P5.
Comparison report between S67314_PEA_1_P5 and AAP35373:

1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of AAP35373, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSYNNHGWEELRVGKSIV corresponding to amino acids 117 - 178 of S67314_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of S67314_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSYNNHGWEELRVGKSIV in S67314_PEA_1_P5.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein S67314_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 7 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
53 | K -> R | Yes

Variant protein S67314_PEA_1_P5 is encoded by the following transcript(s): S67314_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S67314_PEA_1_T5 is shown in bold; this coding portion starts at position 925 and ends at position 1458. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 8 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
580 | T -> C | Yes
1082 | A -> G | Yes
1326 | A -> G | Yes

Variant protein S67314_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S67314_PEA_1_T6. An alignment is given to the known protein (Fatty acid-binding protein, heart) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between S67314_PEA_1_P6 and FABH_HUMAN:

1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P6, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MEKLQLRNVK corresponding to amino acids 117 - 126 of S67314_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of S67314_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MEKLQLRNVK in S67314_PEA_1_P6.
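The comparison reports for P4, P5 and P6 all describe the variant as the shared FABH_HUMAN region (residues 1-116) followed directly by a variant-specific tail. That coordinate bookkeeping can be checked with a short sketch: the head and tail sequences are copied from the reports, the helper name is hypothetical, and only the exact-match case is modelled (the percent-homology ranges in the reports are not).

```python
# Residues 1-116 shared by FABH_HUMAN and variants P4, P5 and P6 (from the reports).
HEAD = (
    "MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTF"
    "KNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL"
)

# Variant-specific tails with their stated 1-based, inclusive coordinates.
TAILS = {
    "S67314_PEA_1_P4": (
        "VRWATLELYLIGYYYCSFSQACSKKPSPPLRAVEAGTREWLWVRVVSGGNFLCSGFGL"
        "TQAGTQILPYRLHDCGQITFSKCNCKTGINNTNLVGLLGSL",
        117,
        215,
    ),
    "S67314_PEA_1_P5": (
        "DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSYNNHGWEELRVG"
        "KSIV",
        117,
        178,
    ),
    "S67314_PEA_1_P6": ("MEKLQLRNVK", 117, 126),
}


def assemble_chimera(head: str, tail: str, tail_start: int, tail_end: int) -> str:
    """Join head and tail contiguously, checking the stated coordinates."""
    if len(head) != tail_start - 1:
        raise ValueError("head does not end immediately before the tail")
    if len(tail) != tail_end - tail_start + 1:
        raise ValueError("tail length does not match its stated coordinates")
    return head + tail


if __name__ == "__main__":
    for name, (tail, start, end) in TAILS.items():
        variant = assemble_chimera(HEAD, tail, start, end)
        print(name, len(variant))
```

Each stated range is consistent: the assembled variants are 215, 178 and 126 residues long, matching the last coordinate given for P4, P5 and P6.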
Comparison report between S67314_PEA_1_P6 and AAP35373:

1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of AAP35373, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MEKLQLRNVK corresponding to amino acids 117 - 126 of S67314_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of S67314_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MEKLQLRNVK in S67314_PEA_1_P6.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein S67314_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 9 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
53 | K -> R | Yes

Variant protein S67314_PEA_1_P6 is encoded by the following transcript(s): S67314_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S67314_PEA_1_T6 is shown in bold; this coding portion starts at position 925 and ends at position 1302. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 10 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
580 | T -> C | Yes
1082 | A -> G | Yes
1444 | T -> C | Yes

Variant protein S67314_PEA_1_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S67314_PEA_1_T7. An alignment is given to the known protein (Fatty acid-binding protein) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.

A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between S67314_PEA_1_P7 and FABH_HUMAN:

1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MVDAFLGTWKLVDSKNFDDYMKSL corresponding to amino acids 1 - 24 of FABH_HUMAN, which also corresponds to amino acids 1 - 24 of S67314_PEA_1_P7, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AHILITFPLPS corresponding to amino acids 25 - 35 of S67314_PEA_1_P7, and a third amino acid sequence being at least 90% homologous to GVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA corresponding to amino acids 25 - 133 of FABH_HUMAN, which also corresponds to amino acids 36 - 144 of S67314_PEA_1_P7, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of S67314_PEA_1_P7, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for AHILITFPLPS, corresponding to S67314_PEA_1_P7.

Comparison report between S67314_PEA_1_P7 and AAP35373:

1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MVDAFLGTWKLVDSKNFDDYMKSL corresponding to amino acids 1 - 24 of AAP35373, which also corresponds to amino acids 1 - 24 of S67314_PEA_1_P7, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AHILITFPLPS corresponding to amino acids 25 - 35 of S67314_PEA_1_P7, and a third amino acid sequence being at least 90% homologous to GVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA corresponding to amino acids 25 - 133 of AAP35373, which also corresponds to amino acids 36 - 144 of S67314_PEA_1_P7, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for an edge portion of S67314_PEA_1_P7, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for AHILITFPLPS, corresponding to S67314_PEA_1_P7.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs.
The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.

Variant protein S67314_PEA_1_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 11 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
64 | K -> R | Yes

Variant protein S67314_PEA_1_P7 is encoded by the following transcript(s): S67314_PEA_1_T7, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S67314_PEA_1_T7 is shown in bold; this coding portion starts at position 925 and ends at position 1356. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
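The coordinates reported for S67314_PEA_1_P7 can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not part of the application. It rests on three assumptions: the helper names are hypothetical, positions are taken as 1-based and inclusive, and the coding portion 925-1356 is assumed to exclude the stop codon, which is what the arithmetic implies. The code assembles P7 from the three segments quoted in the comparison reports and checks that the result has 144 residues, matching 432 nt / 3. It then maps the A -> G change at transcript position 1115 (Table 12) to a codon. It also recomputes the "Total Percent Identity" values of the P7 alignments later in this section as Matching length / Total length.

```python
# Illustrative consistency check for S67314_PEA_1_P7; helper names are hypothetical.

# Segments quoted in the comparison reports.
HEAD = "MVDAFLGTWKLVDSKNFDDYMKSL"   # known protein residues 1-24 -> P7 1-24
EDGE = "AHILITFPLPS"                # unique edge, P7 residues 25-35
TAIL = ("GVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSI"
        "VTLDGGKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA")  # known 25-133 -> P7 36-144

P7 = HEAD + EDGE + TAIL


def residue_span(seq: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive coordinates."""
    return seq[start - 1:end]


def codon_of(nt_pos: int, cds_start: int) -> tuple[int, int]:
    """Map a 1-based transcript position to (codon number, position within codon)."""
    offset = nt_pos - cds_start
    return offset // 3 + 1, offset % 3 + 1


def cds_length_in_codons(cds_start: int, cds_end: int) -> int:
    """Number of codons in an inclusive coding range (assumed to exclude the stop codon)."""
    length = cds_end - cds_start + 1
    assert length % 3 == 0, "coding portion is not a whole number of codons"
    return length // 3


if __name__ == "__main__":
    print(len(P7))                              # 144
    print(cds_length_in_codons(925, 1356))      # 144
    print(residue_span(P7, 25, 35))             # AHILITFPLPS
    print(residue_span(P7, 36, 144) == TAIL)    # True
    print(codon_of(1115, 925))                  # (64, 2)
    # Total Percent Identity = Matching length / Total length (P7 x FABH_HUMAN, P7 x AAP35373)
    print(round(100 * 132 / 143, 2), round(100 * 133 / 144, 2))  # 92.31 92.36
```

A change at the second base of codon 64 is consistent with the K -> R substitution in Table 11. Lysine codons (AAA, AAG) become arginine codons (AGA, AGG) when their second base changes from A to G. The actual codon is not stated in this section.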
Table 12 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
580 | T -> C | Yes
1115 | A -> G | Yes
1344 | T -> C | Yes
1522 | -> T | No
1540 | -> A | No
1540 | -> T | No
1578 | G -> A | Yes
1652 | G -> A | Yes
2263 | G -> A | Yes
2605 | T -> C | Yes
2772 | G -> A | Yes
2896 | C -> A | Yes
2918 | G -> C | Yes
3003 | A -> G | Yes
3074 | T -> G | Yes

As noted above, cluster S67314 features 8 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster S67314_PEA_1_node_0 according to the present invention is supported by 90 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S67314_PEA_1_T4, S67314_PEA_1_T5, S67314_PEA_1_T6 and S67314_PEA_1_T7. Table 13 below describes the starting and ending position of this segment on each transcript.

Table 13 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
S67314_PEA_1_T4 | 1 | 997
S67314_PEA_1_T5 | 1 | 997
S67314_PEA_1_T6 | 1 | 997
S67314_PEA_1_T7 | 1 | 997

Segment cluster S67314_PEA_1_node_11 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S67314_PEA_1_T4. Table 14 below describes the starting and ending position of this segment on each transcript.

Table 14 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
S67314_PEA_1_T4 | 1273 | 2110

Segment cluster S67314_PEA_1_node_13 according to the present invention is supported by 76 libraries. The number of libraries was determined as previously described.
This segment can be found in the following transcript(s): S67314_PEA_1_T7. Table 15 below describes the starting and ending position of this segment on each transcript.

Table 15 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
S67314_PEA_1_T7 | 1306 | 3531

Segment cluster S67314_PEA_1_node_15 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S67314_PEA_1_T5. Table 16 below describes the starting and ending position of this segment on each transcript.

Table 16 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
S67314_PEA_1_T5 | 1273 | 1733

Segment cluster S67314_PEA_1_node_17 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S67314_PEA_1_T6. Table 17 below describes the starting and ending position of this segment on each transcript.

Table 17 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
S67314_PEA_1_T6 | 1273 | 1822

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (with regard to ovarian cancer), shown in Table 18.

Table 18 - Oligonucleotides related to this segment
Oligonucleotide name | Overexpressed in cancers | Chip reference
S67314_0_0_744 | ovarian carcinoma | OVA

Segment cluster S67314_PEA_1_node_4 according to the present invention is supported by 101 libraries. The number of libraries was determined as previously described.
This segment can be found in the following transcript(s): S67314_PEA_1_T4, S67314_PEA_1_T5, S67314_PEA_1_T6 and S67314_PEA_1_T7. Table 19 below describes the starting and ending position of this segment on each transcript.

Table 19 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
S67314_PEA_1_T4 | 998 | 1170
S67314_PEA_1_T5 | 998 | 1170
S67314_PEA_1_T6 | 998 | 1170
S67314_PEA_1_T7 | 1031 | 1203

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster S67314_PEA_1_node_10 according to the present invention is supported by 64 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S67314_PEA_1_T4, S67314_PEA_1_T5, S67314_PEA_1_T6 and S67314_PEA_1_T7. Table 20 below describes the starting and ending position of this segment on each transcript.

Table 20 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
S67314_PEA_1_T4 | 1171 | 1272
S67314_PEA_1_T5 | 1171 | 1272
S67314_PEA_1_T6 | 1171 | 1272
S67314_PEA_1_T7 | 1204 | 1305

Segment cluster S67314_PEA_1_node_3 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S67314_PEA_1_T7. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
S67314_PEA_1_T7 | 998 | 1030

Variant protein alignment to the previously known protein:

Sequence name: /tmp/EQ0nMn6tqU/R73CUVKUk5:FABH_HUMAN
Sequence documentation:
Alignment of: S67314_PEA_1_P4 x FABH_HUMAN
Alignment segment 1/1:
Quality: 1095.00
Escore: 0
Matching length: 115
Total length: 115
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:
  2 VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILT 51
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILT 50

 52 LKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQ 101
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQ 100

102 ETTLVRELIDGKLIL 116
    |||||||||||||||
101 ETTLVRELIDGKLIL 115

Sequence name: /tmp/EQ0nMn6tqU/R73CUVKUk5:AAP35373
Sequence documentation:
Alignment of: S67314_PEA_1_P4 x AAP35373
Alignment segment 1/1:
Quality: 1107.00
Escore: 0
Matching length: 116
Total length: 116
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:
  1 MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDIL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDIL 50

 51 TLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDG 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 TLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDG 100

101 QETTLVRELIDGKLIL 116
    ||||||||||||||||
101 QETTLVRELIDGKLIL 116

Sequence name: /tmp/ql4YPIBbdQ/SeofJfCmJW:FABH_HUMAN
Sequence documentation:
Alignment of: S67314_PEA_1_P5 x FABH_HUMAN
Alignment segment 1/1:
Quality: 1095.00
Escore: 0
Matching length: 115
Total length: 115
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:
  2 VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILT 51
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILT 50

 52 LKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQ 101
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQ 100

102 ETTLVRELIDGKLIL 116
    |||||||||||||||
101 ETTLVRELIDGKLIL 115

Sequence name: /tmp/ql4YPIBbdQ/SeofJfCmJW:AAP35373
Sequence documentation:
Alignment of: S67314_PEA_1_P5 x AAP35373
Alignment segment 1/1:
Quality: 1107.00
Escore: 0
Matching length: 116
Total length: 116
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:
  1 MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDIL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDIL 50

 51 TLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDG 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 TLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDG 100

101 QETTLVRELIDGKLIL 116
    ||||||||||||||||
101 QETTLVRELIDGKLIL 116

Sequence name: /tmp/PXra2DxLlv/Q8GTrzNMVX:FABH_HUMAN
Sequence documentation:
Alignment of: S67314_PEA_1_P6 x FABH_HUMAN
Alignment segment 1/1:
Quality: 1095.00
Escore: 0
Matching length: 115
Total length: 115
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  2 VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILT 51
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILT 50

 52 LKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQ 101
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQ 100

102 ETTLVRELIDGKLIL 116
    |||||||||||||||
101 ETTLVRELIDGKLIL 115

Sequence name: /tmp/PXra2DxLlv/Q8GTrzNMVX:AAP35373
Sequence documentation:
Alignment of: S67314_PEA_1_P6 x AAP35373
Alignment segment 1/1:
Quality: 1107.00
Escore: 0
Matching length: 116
Total length: 116
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:
  1 MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDIL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDIL 50

 51 TLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDG 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 TLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDG 100

101 QETTLVRELIDGKLIL 116
    ||||||||||||||||
101 QETTLVRELIDGKLIL 116

Sequence name: /tmp/xYzWyViDom/twDu3T69pd:FABH_HUMAN
Sequence documentation:
Alignment of: S67314_PEA_1_P7 x FABH_HUMAN
Alignment segment 1/1:
Quality: 1160.00
Escore: 0
Matching length: 132
Total length: 143
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 92.31
Total Percent Identity: 92.31
Gaps: 1

Alignment:
  2 VDAFLGTWKLVDSKNFDDYMKSLAHILITFPLPSGVGFATRQVASMTKPT 51
    |||||||||||||||||||||||           ||||||||||||||||
  1 VDAFLGTWKLVDSKNFDDYMKSL...........GVGFATRQVASMTKPT 39

 52 TIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGG 101
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 40 TIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGG 89

102 KLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA 144
    |||||||||||||||||||||||||||||||||||||||||||
 90 KLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA 132

Sequence name: /tmp/xYzWyViDom/twDu3T69pd:AAP35373
Sequence documentation:
Alignment of: S67314_PEA_1_P7 x AAP35373
Alignment segment 1/1:
Quality: 1172.00
Escore: 0
Matching length: 133
Total length: 144
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 92.36
Total Percent Identity: 92.36
Gaps: 1

Alignment:
  1 MVDAFLGTWKLVDSKNFDDYMKSLAHILITFPLPSGVGFATRQVASMTKP 50
    ||||||||||||||||||||||||           |||||||||||||||
  1 MVDAFLGTWKLVDSKNFDDYMKSL...........GVGFATRQVASMTKP 39

 51 TTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDG 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 40 TTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDG 89

101 GKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA 144
    ||||||||||||||||||||||||||||||||||||||||||||
 90 GKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA 133

DESCRIPTION FOR CLUSTER Z39337

Cluster Z39337 features 3 transcript(s) and 12 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively. The sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Transcript name | SEQ ID NO:
Z39337_PEA_2_PEA_1_T3 | 655
Z39337_PEA_2_PEA_1_T6 | 656
Z39337_PEA_2_PEA_1_T12 | 657

Table 2 - Segments of interest
Segment name | SEQ ID NO:
Z39337_PEA_2_PEA_1_node_2 | 658
Z39337_PEA_2_PEA_1_node_15 | 659
Z39337_PEA_2_PEA_1_node_16 | 660
Z39337_PEA_2_PEA_1_node_18 | 661
Z39337_PEA_2_PEA_1_node_21 | 662
Z39337_PEA_2_PEA_1_node_22 | 663
Z39337_PEA_2_PEA_1_node_3 | 664
Z39337_PEA_2_PEA_1_node_5 | 665
Z39337_PEA_2_PEA_1_node_6 | 666
Z39337_PEA_2_PEA_1_node_10 | 667
Z39337_PEA_2_PEA_1_node_11 | 668
Z39337_PEA_2_PEA_1_node_14 | 669

Table 3 - Proteins of interest
Protein name | SEQ ID NO: | Corresponding transcript(s)
Z39337_PEA_2_PEA_1_P4 | 671 | Z39337_PEA_2_PEA_1_T3
Z39337_PEA_2_PEA_1_P9 | 672 | Z39337_PEA_2_PEA_1_T12
Z39337_PEA_2_PEA_1_P13 | 673 | Z39337_PEA_2_PEA_1_T6

These sequences are variants of the known protein Kallikrein 6 precursor (SwissProt accession identifier KLK6_HUMAN; known also according to the synonyms EC 3.4.21.-; Protease M; Neurosin; Zyme; SP59), SEQ ID NO: 670, referred to herein as the previously known protein.

The sequence for protein Kallikrein 6 precursor is given at the end of the application, as "Kallikrein 6 precursor amino acid sequence". Protein Kallikrein 6 precursor localization is believed to be secreted.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: central nervous system development; response to wounding; protein autoprocessing, which are annotation(s) related to Biological Process; chymotrypsin; tissue kallikrein; trypsin; protein binding; hydrolase, which are annotation(s) related to Molecular Function; and extracellular; cytoplasm, which are annotation(s) related to Cellular Component.
The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or LocusLink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.

Cluster Z39337 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 34 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).

Overall, the following results were obtained as shown with regard to the histograms in Figure 34 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors and gastric carcinoma.

Table 4 - Normal tissue distribution
Name of tissue | Number
brain | 56
colon | 0
epithelial | 3
general | 11
head and neck | 0
kidney | 26
breast | 52
ovary | 0
prostate | 0
stomach | 0
uterus | 0

Table 5 - P values and ratios for expression in cancerous tissue
Name of tissue | P1 | P2 | SP1 | R3 | SP2 | R4
brain | 8.0e-01 | 8.4e-01 | 9.6e-01 | 0.5 | 1 | 0.3
colon | 1.2e-01 | 8.1e-02 | 4.9e-01 | 1.9 | 7.4e-02 | 2.2
epithelial | 2.0e-02 | 1.8e-02 | 1.0e-05 | 4.3 | 7.8e-15 | 6.9
general | 4.1e-02 | 1.1e-01 | 4.3e-06 | 2.3 | 1.6e-16 | 2.6
head and neck | 2.1e-01 | 3.3e-01 | 1 | 1.7 | 1 | 1.2
kidney | 8.9e-01 | 9.2e-01 | 8.2e-01 | 0.8 | 9.1e-01 | 0.6
breast | 9.1e-01 | 9.1e-01 | 1 | 0.5 | 9.7e-01 | 0.6
ovary | 1.4e-01 | 1.7e-01 | 4.7e-03 | 2.9 | 2.4e-02 | 2.2
prostate | 7.3e-01 | 7.8e-01 | 4.5e-01 | 2.0 | 5.6e-01 | 1.7
stomach | 3.6e-01 | 1.1e-01 | 1 | 1.0 | 8.9e-08 | 5.3
uterus | 4.7e-01 | 4.0e-01 | 1.9e-01 | 2.0 | 3.3e-01 | 1.7

As noted above, cluster Z39337 features 3 transcript(s), which were listed in Table 1 above.
These transcript(s) encode for protein(s) which are variant(s) of protein Kallikrein 6 precursor. A description of each variant protein according to the present invention is now provided.

Variant protein Z39337_PEA_2_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z39337_PEA_2_PEA_1_T3. An alignment is given to the known protein (Kallikrein 6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z39337_PEA_2_PEA_1_P4 and KLK6_HUMAN:

1. An isolated chimeric polypeptide encoding for Z39337_PEA_2_PEA_1_P4, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWLPLSGAA corresponding to amino acids 1-9 of Z39337_PEA_2_PEA_1_P4, and a second amino acid sequence being at least 90% homologous to MKKLMVVLSLIAAAWAEEQNKLVHGGPCDKTSHPYQAALYTSGHLLCGGVLIHPLWVLTAAHCKKPNLQVFLGKHNLRQRESSQEQSSVVRAVIHPDYDAASHDQDIMLLRLARPAKLSELIQPLPLERDCSANTTSCHILGWGKTADGDFPDTIQCAYIHLVSREECEHAYPGQITQNMLCAGDEKYGKDSCQGDSGGPLVCGDHLRGLVSWGNIPCGSKEKPGVYTNVCRYTNWIQKTIQAK corresponding to amino acids 1-244 of KLK6_HUMAN, which also corresponds to amino acids 10-253 of Z39337_PEA_2_PEA_1_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of Z39337_PEA_2_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWLPLSGAA of Z39337_PEA_2_PEA_1_P4.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z39337_PEA_2_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z39337_PEA_2_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 6 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
238 | N -> | No

The glycosylation sites of variant protein Z39337_PEA_2_PEA_1_P4, as compared to the known protein Kallikrein 6 precursor, are described in Table 7 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
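The head-plus-known-protein structure of the Kallikrein 6 variants can be checked in the same spirit. The sketch below is illustrative and is not part of the application. It rebuilds KLK6_HUMAN from the 244-residue sequence quoted in the P4 comparison report. It then forms P4 (MWLPLSGAA followed by KLK6 1-244) and P9 (KLK6 1-149 followed by Q), and scans each sequence for the N-X-S/T sequon (X not proline), the standard consensus for N-linked glycosylation. The application does not say how its glycosylation sites were assigned, so the use of a sequon scan is an assumption. On these sequences it does reproduce the known site at 134, which shifts to 143 in P4 (Table 7) and stays at 134 in P9 (Table 9). Protein lengths are also compared with the coding portions reported for T3 (87-845) and T12 (298-747). Function names are hypothetical.

```python
import re

# KLK6_HUMAN residues 1-244, as quoted in the P4 comparison report.
KLK6 = (
    "MKKLMVVLSLIAAAWAEEQNKLVHGGPCDKTSHPYQAALYTSGHLLCGGVLIHPLWV"
    "LTAAHCKKPNLQVFLGKHNLRQRESSQEQSSVVRAVIHPDYDAASHDQDIMLLRLARP"
    "AKLSELIQPLPLERDCSANTTSCHILGWGKTADGDFPDTIQCAYIHLVSREECEHAYPGQ"
    "ITQNMLCAGDEKYGKDSCQGDSGGPLVCGDHLRGLVSWGNIPCGSKEKPGVYTNVCR"
    "YTNWIQKTIQAK"
)

P4 = "MWLPLSGAA" + KLK6   # unique head (1-9) + KLK6 1-244 (P4 10-253)
P9 = KLK6[:149] + "Q"    # KLK6 1-149 + unique residue 150

# Zero-width lookahead so overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq: str) -> list[int]:
    """1-based positions of Asn residues in N-X-S/T motifs (X != Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


def codons(cds_start: int, cds_end: int) -> int:
    """Codon count of an inclusive coding range (assumed to exclude the stop codon)."""
    return (cds_end - cds_start + 1) // 3


if __name__ == "__main__":
    print(len(KLK6), len(P4), codons(87, 845))  # 244 253 253
    print(len(P9), codons(298, 747))            # 150 150
    print(sequon_positions(KLK6))               # [134]
    print(sequon_positions(P4))                 # [143]
    print(sequon_positions(P9))                 # [134]
```

The same arithmetic applied to transcript T6 (coding portion 298-417) gives 40 codons for P13.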
Table 7 - Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein
134 | yes | 143

Variant protein Z39337_PEA_2_PEA_1_P4 is encoded by the following transcript(s): Z39337_PEA_2_PEA_1_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z39337_PEA_2_PEA_1_T3 is shown in bold; this coding portion starts at position 87 and ends at position 845. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z39337_PEA_2_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 8 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
87 | A -> G | Yes
396 | -> G | No
599 | G -> C | Yes
799 | A -> | No
995 | C -> | No
995 | C -> G | No
1184 | C -> | No
1294 | T -> A | Yes

Variant protein Z39337_PEA_2_PEA_1_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z39337_PEA_2_PEA_1_T12. An alignment is given to the known protein (Kallikrein 6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z39337_PEA_2_PEA_1_P9 and KLK6_HUMAN:

1. An isolated chimeric polypeptide encoding for Z39337_PEA_2_PEA_1_P9, comprising a first amino acid sequence being at least 90% homologous to MKKLMVVLSLIAAAWAEEQNKLVHGGPCDKTSHPYQAALYTSGHLLCGGVLIHPLWVLTAAHCKKPNLQVFLGKHNLRQRESSQEQSSVVRAVIHPDYDAASHDQDIMLLRLARPAKLSELIQPLPLERDCSANTTSCHILGWGKTADG corresponding to amino acids 1-149 of KLK6_HUMAN, which also corresponds to amino acids 1-149 of Z39337_PEA_2_PEA_1_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence Q corresponding to amino acids 150-150 of Z39337_PEA_2_PEA_1_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide.

The glycosylation sites of variant protein Z39337_PEA_2_PEA_1_P9, as compared to the known protein Kallikrein 6 precursor, are described in Table 9 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

Table 9 - Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein
134 | yes | 134

Variant protein Z39337_PEA_2_PEA_1_P9 is encoded by the following transcript(s): Z39337_PEA_2_PEA_1_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z39337_PEA_2_PEA_1_T12 is shown in bold; this coding portion starts at position 298 and ends at position 747. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z39337_PEA_2_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 10 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
81 | G -> | No
102 | G -> T | Yes
147 | G -> A | Yes
270 | G -> | No
270 | G -> A | No
580 | -> G | No
784 | T -> C | Yes
802 | G -> A | Yes

Variant protein Z39337_PEA_2_PEA_1_P13 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z39337_PEA_2_PEA_1_T6. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z39337_PEA_2_PEA_1_P13 is encoded by the following transcript(s): Z39337_PEA_2_PEA_1_T6, for which the sequence(s) is/are given at the end of the application.
The coding portion of transcript Z39337_PEA_2_PEA_1_T6 is shown in bold; this coding portion starts at position 298 and ends at position 417. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z39337_PEA_2_PEA_1_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 11 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
81 | G -> | No
102 | G -> T | Yes
147 | G -> A | Yes
270 | G -> | No
270 | G -> A | No
423 | -> G | No
626 | G -> C | Yes
826 | A -> | No
1022 | C -> | No
1022 | C -> G | No
1211 | C -> | No
1321 | T -> A | Yes

As noted above, cluster Z39337 features 12 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster Z39337_PEA_2_PEA_1_node_2 according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T6 and Z39337_PEA_2_PEA_1_T12. Table 12 below describes the starting and ending position of this segment on each transcript.

Table 12 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z39337_PEA_2_PEA_1_T6 | 1 | 237
Z39337_PEA_2_PEA_1_T12 | 1 | 237

Segment cluster Z39337_PEA_2_PEA_1_node_15 according to the present invention is supported by 54 libraries. The number of libraries was determined as previously described.
This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T3, Z39337_PEA_2_PEA_1_T6 and Z39337_PEA_2_PEA_1_T12. Table 13 below describes the starting and ending position of this segment on each transcript.

Table 13 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z39337_PEA_2_PEA_1_T3 | 363 | 558
Z39337_PEA_2_PEA_1_T6 | 390 | 585
Z39337_PEA_2_PEA_1_T12 | 547 | 742

Segment cluster Z39337_PEA_2_PEA_1_node_16 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T12. Table 14 below describes the starting and ending position of this segment on each transcript.

Table 14 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z39337_PEA_2_PEA_1_T12 | 743 | 1402

Segment cluster Z39337_PEA_2_PEA_1_node_18 according to the present invention is supported by 53 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T3 and Z39337_PEA_2_PEA_1_T6. Table 15 below describes the starting and ending position of this segment on each transcript.

Table 15 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z39337_PEA_2_PEA_1_T3 | 559 | 695
Z39337_PEA_2_PEA_1_T6 | 586 | 722

Segment cluster Z39337_PEA_2_PEA_1_node_21 according to the present invention is supported by 81 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T3 and Z39337_PEA_2_PEA_1_T6. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z39337_PEA_2_PEA_1_T3 | 696 | 1112
Z39337_PEA_2_PEA_1_T6 | 723 | 1139

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (with regard to ovarian cancer), shown in Table 17.

Table 17 - Oligonucleotides related to this segment
Oligonucleotide name | Overexpressed in cancers | Chip reference
Z39337_0_9_0 | ovarian carcinoma | OVA

Segment cluster Z39337_PEA_2_PEA_1_node_22 according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T3 and Z39337_PEA_2_PEA_1_T6. Table 18 below describes the starting and ending position of this segment on each transcript.

Table 18 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z39337_PEA_2_PEA_1_T3 | 1113 | 1387
Z39337_PEA_2_PEA_1_T6 | 1140 | 1414

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster Z39337_PEA_2_PEA_1_node_3 according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T6 and Z39337_PEA_2_PEA_1_T12. Table 19 below describes the starting and ending position of this segment on each transcript.

Table 19 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z39337_PEA_2_PEA_1_T6 | 238 | 289
Z39337_PEA_2_PEA_1_T12 | 238 | 289

Segment cluster Z39337_PEA_2_PEA_1_node_5 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T3. Table 20 below describes the starting and ending position of this segment on each transcript.

Table 20 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z39337_PEA_2_PEA_1_T3 | 1 | 105

Segment cluster Z39337_PEA_2_PEA_1_node_6 according to the present invention is supported by 56 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T3, Z39337_PEA_2_PEA_1_T6 and Z39337_PEA_2_PEA_1_T12. Table 21 below describes the starting and ending position of this segment on each transcript.

Table 21 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z39337_PEA_2_PEA_1_T3 | 106 | 153
Z39337_PEA_2_PEA_1_T6 | 290 | 337
Z39337_PEA_2_PEA_1_T12 | 290 | 337

Segment cluster Z39337_PEA_2_PEA_1_node_10 according to the present invention is supported by 60 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T3 and Z39337_PEA_2_PEA_1_T12.
Table 22 below describes the starting and ending position of this segment on each transcript.

Table 22 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z39337_PEA_2_PEA_1_T3 | 154 | 207
Z39337_PEA_2_PEA_1_T12 | 338 | 391

Segment cluster Z39337_PEA_2_PEA_1_node_11 according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T3 and Z39337_PEA_2_PEA_1_T12. Table 23 below describes the starting and ending position of this segment on each transcript.

Table 23 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z39337_PEA_2_PEA_1_T3 | 208 | 310
Z39337_PEA_2_PEA_1_T12 | 392 | 494

Segment cluster Z39337_PEA_2_PEA_1_node_14 according to the present invention is supported by 49 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T3, Z39337_PEA_2_PEA_1_T6 and Z39337_PEA_2_PEA_1_T12. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z39337_PEA_2_PEA_1_T3 | 311 | 362
Z39337_PEA_2_PEA_1_T6 | 338 | 389
Z39337_PEA_2_PEA_1_T12 | 495 | 546

Variant protein alignment to the previously known protein:

Sequence name: KLK6_HUMAN
Sequence documentation:
Alignment of: Z39337_PEA_2_PEA_1_P4 x KLK6_HUMAN
Alignment segment 1/1:
Quality: 2444.00
Escore: 0
Matching length: 244
Total length: 244
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:

 10 MKKLMVVLSLIAAAWAEEQNKLVHGGPCDKTSHPYQAALYTSGHLLCGGV 59
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MKKLMVVLSLIAAAWAEEQNKLVHGGPCDKTSHPYQAALYTSGHLLCGGV 50

 60 LIHPLWVLTAAHCKKPNLQVFLGKHNLRQRESSQEQSSVVRAVIHPDYDA 109
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LIHPLWVLTAAHCKKPNLQVFLGKHNLRQRESSQEQSSVVRAVIHPDYDA 100

110 ASHDQDIMLLRLARPAKLSELIQPLPLERDCSANTTSCHILGWGKTADGD 159
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 ASHDQDIMLLRLARPAKLSELIQPLPLERDCSANTTSCHILGWGKTADGD 150

160 FPDTIQCAYIHLVSREECEHAYPGQITQNMLCAGDEKYGKDSCQGDSGGP 209
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 FPDTIQCAYIHLVSREECEHAYPGQITQNMLCAGDEKYGKDSCQGDSGGP 200

210 LVCGDHLRGLVSWGNIPCGSKEKPGVYTNVCRYTNWIQKTIQAK 253
    ||||||||||||||||||||||||||||||||||||||||||||
201 LVCGDHLRGLVSWGNIPCGSKEKPGVYTNVCRYTNWIQKTIQAK 244

Sequence name: KLK6_HUMAN
Sequence documentation:
Alignment of: Z39337_PEA_2_PEA_1_P9 x KLK6_HUMAN
Alignment segment 1/1:
Quality: 1471.00
Escore: 0
Matching length: 149
Total length: 149
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:

  1 MKKLMVVLSLIAAAWAEEQNKLVHGGPCDKTSHPYQAALYTSGHLLCGGV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MKKLMVVLSLIAAAWAEEQNKLVHGGPCDKTSHPYQAALYTSGHLLCGGV 50

 51 LIHPLWVLTAAHCKKPNLQVFLGKHNLRQRESSQEQSSVVRAVIHPDYDA 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LIHPLWVLTAAHCKKPNLQVFLGKHNLRQRESSQEQSSVVRAVIHPDYDA 100

101 ASHDQDIMLLRLARPAKLSELIQPLPLERDCSANTTSCHILGWGKTADG 149
    |||||||||||||||||||||||||||||||||||||||||||||||||
101 ASHDQDIMLLRLARPAKLSELIQPLPLERDCSANTTSCHILGWGKTADG 149

DESCRIPTION FOR CLUSTER HUMPHOSLIP

Cluster HUMPHOSLIP features 7 transcript(s) and 53 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.

Table 1 - Transcripts of interest
Transcript Name | SEQ ID NO:
HUMPHOSLIP_PEA_2_T6 | 674
HUMPHOSLIP_PEA_2_T7 | 675
HUMPHOSLIP_PEA_2_T14 | 676
HUMPHOSLIP_PEA_2_T16 | 677
HUMPHOSLIP_PEA_2_T17 | 678
HUMPHOSLIP_PEA_2_T18 | 679
HUMPHOSLIP_PEA_2_T19 | 680

Table 2 - Segments of interest
Segment Name | SEQ ID NO:
HUMPHOSLIP_PEA_2_node_0 | 681
HUMPHOSLIP_PEA_2_node_19 | 682
HUMPHOSLIP_PEA_2_node_34 | 683
HUMPHOSLIP_PEA_2_node_68 | 684
HUMPHOSLIP_PEA_2_node_70 | 685
HUMPHOSLIP_PEA_2_node_75 | 686
HUMPHOSLIP_PEA_2_node_2 | 687
HUMPHOSLIP_PEA_2_node_3 | 688
HUMPHOSLIP_PEA_2_node_4 | 689
HUMPHOSLIP_PEA_2_node_6 | 690
HUMPHOSLIP_PEA_2_node_7 | 691
HUMPHOSLIP_PEA_2_node_8 | 692
HUMPHOSLIP_PEA_2_node_9 | 693
HUMPHOSLIP_PEA_2_node_14 | 694
HUMPHOSLIP_PEA_2_node_15 | 695
HUMPHOSLIP_PEA_2_node_16 | 696
HUMPHOSLIP_PEA_2_node_17 | 697
HUMPHOSLIP_PEA_2_node_23 | 698
HUMPHOSLIP_PEA_2_node_24 | 699
HUMPHOSLIP_PEA_2_node_25 | 700
HUMPHOSLIP_PEA_2_node_26 | 701
HUMPHOSLIP_PEA_2_node_29 | 702
HUMPHOSLIP_PEA_2_node_30 | 703
HUMPHOSLIP_PEA_2_node_33 | 704
HUMPHOSLIP_PEA_2_node_36 | 705
HUMPHOSLIP_PEA_2_node_37 | 706
HUMPHOSLIP_PEA_2_node_39 | 707
HUMPHOSLIP_PEA_2_node_40 | 708
HUMPHOSLIP_PEA_2_node_41 | 709
HUMPHOSLIP_PEA_2_node_42 | 710
HUMPHOSLIP_PEA_2_node_44 | 711
HUMPHOSLIP_PEA_2_node_45 | 712
HUMPHOSLIP_PEA_2_node_47 | 713
HUMPHOSLIP_PEA_2_node_51 | 714
HUMPHOSLIP_PEA_2_node_52 | 715
HUMPHOSLIP_PEA_2_node_53 | 716
HUMPHOSLIP_PEA_2_node_54 | 717
HUMPHOSLIP_PEA_2_node_55 | 718
HUMPHOSLIP_PEA_2_node_58 | 719
HUMPHOSLIP_PEA_2_node_59 | 720
HUMPHOSLIP_PEA_2_node_60 | 721
HUMPHOSLIP_PEA_2_node_61 | 722
HUMPHOSLIP_PEA_2_node_62 | 723
HUMPHOSLIP_PEA_2_node_63 | 724
HUMPHOSLIP_PEA_2_node_64 | 725
HUMPHOSLIP_PEA_2_node_65 | 726
HUMPHOSLIP_PEA_2_node_66 | 727
HUMPHOSLIP_PEA_2_node_67 | 728
HUMPHOSLIP_PEA_2_node_69 | 729
HUMPHOSLIP_PEA_2_node_71 | 730
HUMPHOSLIP_PEA_2_node_72 | 731
HUMPHOSLIP_PEA_2_node_73 | 732
HUMPHOSLIP_PEA_2_node_74 | 733

Table 3 - Proteins of interest
Protein Name | SEQ ID NO: | Corresponding Transcript(s)
HUMPHOSLIP_PEA_2_P10 | 735 | HUMPHOSLIP_PEA_2_T17
HUMPHOSLIP_PEA_2_P12 | 736 | HUMPHOSLIP_PEA_2_T19
HUMPHOSLIP_PEA_2_P30 | 737 | HUMPHOSLIP_PEA_2_T6
HUMPHOSLIP_PEA_2_P31 | 738 | HUMPHOSLIP_PEA_2_T7
HUMPHOSLIP_PEA_2_P33 | 739 | HUMPHOSLIP_PEA_2_T14
HUMPHOSLIP_PEA_2_P34 | 740 | HUMPHOSLIP_PEA_2_T16
HUMPHOSLIP_PEA_2_P35 | 741 | HUMPHOSLIP_PEA_2_T18

These sequences are variants of the known protein Phospholipid transfer protein precursor (SwissProt accession identifier PLTP_HUMAN; also known by the synonym Lipid transfer protein II), SEQ ID NO: 734, referred to herein as the previously known protein.
Protein Phospholipid transfer protein precursor is known or believed to have the following function(s): it converts HDL into larger and smaller particles, and may play a key role in extracellular phospholipid transport and modulation of HDL particles. The sequence for protein Phospholipid transfer protein precursor is given at the end of the application, as "Phospholipid transfer protein precursor amino acid sequence". Known polymorphisms for this sequence are shown in Table 4.

Table 4 - Amino acid mutations for Known Protein
SNP position(s) on amino acid sequence | Comment
282 | R -> Q. /FTId=VAR_017020.
372 | R -> H. /FTId=VAR_017021.
380 | R -> W (in dbSNP:6065903). /FTId=VAR_017022.
444 | F -> L (in dbSNP:1804161). /FTId=VAR_012073.
487 | T -> K (in dbSNP:1056929). /FTId=VAR_012074.
18 | E -> V

Protein Phospholipid transfer protein precursor localization is believed to be Secreted.

The following GO Annotation(s) apply to the previously known protein: lipid metabolism and lipid transport, which are annotation(s) related to Biological Process; lipid binding, which is an annotation related to Molecular Function; and extracellular, which is an annotation related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>, or LocusLink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.

For this cluster, at least one oligonucleotide was found to demonstrate overexpression of the cluster, although not of at least one transcript/segment as listed below. Microarray (chip) data is also available for this cluster as follows. Various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer, as previously described.
The following oligonucleotides were found to hit this cluster but not the other segments/transcripts below (with regard to ovarian cancer), as shown in Table 5.

Table 5 - Oligonucleotides related to this cluster
Oligonucleotide name | Overexpressed in cancers | Chip reference
HUMPHOSLIP_0_0_18458 | ovarian carcinoma | OVA

As noted above, cluster HUMPHOSLIP features 7 transcript(s), which were listed in Table 1 above. These transcript(s) encode protein(s) which are variant(s) of protein Phospholipid transfer protein precursor. A description of each variant protein according to the present invention is now provided.

Variant protein HUMPHOSLIP_PEA_2_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T17. An alignment to the known protein (Phospholipid transfer protein precursor) is given at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMPHOSLIP_PEA_2_P10 and PLTP_HUMAN:

1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P10, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISE corresponding to amino acids 1 - 67 of PLTP_HUMAN, which also corresponds to amino acids 1 - 67 of HUMPHOSLIP_PEA_2_P10, and a second amino acid sequence being at least 90% homologous to KVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLLDTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPNRAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLDMLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASVTIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHSALESLALIPLQAPLKTMLQIGVMPMLNERTWRGVQIPLPEGINFVHEVVTNHAGFLTIGADLHFAKGLREVIEKNRPADVRASTAPTPSTAAV corresponding to amino acids 163 - 493 of PLTP_HUMAN, which also corresponds to amino acids 68 - 398 of HUMPHOSLIP_PEA_2_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of HUMPHOSLIP_PEA_2_P10, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EK, having a structure as follows: a sequence starting from any of amino acid numbers 67-x to 67; and ending at any of amino acid numbers 68 + ((n-2) - x), in which x varies from 0 to n-2.
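The comparison report above describes HUMPHOSLIP_PEA_2_P10 as two blocks of PLTP_HUMAN (1-67 and 163-493) joined at the E67/K68 junction, and claim 2 defines the edge portions around that junction with a formula in n and x. A minimal Python sketch, assuming 1-based inclusive residue numbering, can map known-protein positions onto the variant and enumerate the edge windows. The function and variable names are illustrative only and do not come from the application.

```python
# Aligned blocks of HUMPHOSLIP_PEA_2_P10 against PLTP_HUMAN, from the comparison report:
# (first variant position, last variant position, first known-protein position), 1-based.
P10_BLOCKS = [
    (1, 67, 1),      # variant 1-67   = PLTP_HUMAN 1-67
    (68, 398, 163),  # variant 68-398 = PLTP_HUMAN 163-493
]


def known_to_variant(pos, blocks=P10_BLOCKS):
    """Map a PLTP_HUMAN residue number onto the variant; None if it is not retained."""
    for var_start, var_end, known_start in blocks:
        known_end = known_start + (var_end - var_start)
        if known_start <= pos <= known_end:
            return var_start + (pos - known_start)
    return None


def edge_windows(n, left=67):
    """All (start, end) windows of length n that contain both `left` and `left + 1`.

    Follows the claim: start = left - x, end = (left + 1) + ((n - 2) - x),
    for x = 0 .. n - 2, so every window has exactly n residues.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    return [(left - x, left + 1 + (n - 2 - x)) for x in range(n - 1)]


if __name__ == "__main__":
    for site in (64, 94, 117, 143, 245, 398):  # known-protein glycosylation sites
        print(site, "->", known_to_variant(site))
    print(edge_windows(10))
```

With these blocks, known residues 245 and 398 map to 150 and 303, residues 94, 117 and 143 (inside the skipped region 68-162) map to nothing, and the known polymorphism at 444 (F -> L) lands at variant position 349, which agrees with the P10 glycosylation and SNP tables that follow. For n = 10 the claim yields n - 1 = 9 windows, from 67-76 down to 59-68.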
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMPHOSLIP_PEA_2_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 6 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
16 | H -> R | Yes
18 | E -> V | Yes
113 | S -> F | Yes
118 | V -> | No
140 | R -> | No
140 | R -> P | No
150 | N -> | No
160 | P -> | No
201 | P -> | No
274 | M -> | No
285 | R -> W | Yes
292 | Q -> | No
315 | L -> * | No
330 | M -> I | Yes
349 | F -> L | Yes
392 | T -> K | Yes

The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P10, as compared to the known protein Phospholipid transfer protein precursor, are described in Table 7 (given according to their position(s) on the known amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column gives the position of the site on the variant protein, which may differ from its position on the known protein).
Table 7 - Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein?
94 | no |
143 | no |
64 | yes | 64
245 | yes | 150
398 | yes | 303
117 | no |

Variant protein HUMPHOSLIP_PEA_2_P10 is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T17, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T17 is shown in bold; this coding portion starts at position 276 and ends at position 1469. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 8 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
174 | G -> T | No
175 | A -> T | No
322 | A -> G | Yes
328 | A -> T | Yes
431 | G -> A | Yes
551 | C -> T | Yes
613 | C -> T | Yes
628 | T -> | No
694 | G -> | No
694 | G -> C | No
723 | A -> | No
753 | C -> | No
876 | C -> | No
1037 | C -> T | Yes
1097 | G -> | No
1128 | C -> T | Yes
1149 | C -> | No
1219 | T -> A | No
1230 | C -> T | Yes
1265 | G -> C | Yes
1322 | T -> A | Yes
1450 | C -> A | Yes
1469 | C -> T | No
1549 | C -> T | Yes
1565 | A -> G | No
1565 | A -> T | No
1630 | A -> G | Yes
1654 | T -> A | No
1731 | G -> T | Yes
1864 | G -> A | Yes
1893 | G -> T | Yes
2073 | G -> A | Yes
2269 | C -> T | Yes
2325 | G -> T | Yes
2465 | C -> T | Yes
2566 | C -> T | Yes
2881 | A -> G | No

Variant protein HUMPHOSLIP_PEA_2_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T19. An alignment to the known protein (Phospholipid transfer protein precursor) is given at the end of the application.
One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMPHOSLIP_PEA_2_P12 and PLTP_HUMAN:

1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P12, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLLDTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPNRAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLDMLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASVTIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHSALESLALIPLQAPLKTMLQIGVMPMLN corresponding to amino acids 1 - 427 of PLTP_HUMAN, which also corresponds to amino acids 1 - 427 of HUMPHOSLIP_PEA_2_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GKAGV corresponding to amino acids 428 - 432 of HUMPHOSLIP_PEA_2_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKAGV in HUMPHOSLIP_PEA_2_P12.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs.
The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMPHOSLIP_PEA_2_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 9 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
16 | H -> R | Yes
18 | E -> V | Yes
81 | D -> H | Yes
124 | S -> Y | Yes
160 | T -> | No
160 | T -> N | No
208 | S -> F | Yes
213 | V -> | No
235 | R -> P | No
235 | R -> | No
245 | N -> | No
255 | P -> | No
296 | P -> | No
369 | M -> | No
380 | R -> W | Yes
387 | Q -> | No
410 | L -> * | No
425 | M -> I | Yes

The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P12, as compared to the known protein Phospholipid transfer protein precursor, are described in Table 10 (given according to their position(s) on the known amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column gives the position of the site on the variant protein, which may differ from its position on the known protein).

Table 10 - Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein?
94 | yes | 94
143 | yes | 143
64 | yes | 64
245 | yes | 245
398 | yes | 398
117 | yes | 117

Variant protein HUMPHOSLIP_PEA_2_P12 is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T19, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T19 is shown in bold; this coding portion starts at position 276 and ends at position 1571. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 11 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
174 | G -> T | No
175 | A -> T | No
322 | A -> G | Yes
328 | A -> T | Yes
431 | G -> A | Yes
516 | G -> C | Yes
644 | G -> A | Yes
646 | C -> A | Yes
754 | C -> | No
754 | C -> A | No
836 | C -> T | Yes
898 | C -> T | Yes
913 | T -> | No
979 | G -> | No
979 | G -> C | No
1008 | A -> | No
1038 | C -> | No
1161 | C -> | No
1322 | C -> T | Yes
1382 | G -> | No
1413 | C -> T | Yes
1434 | C -> | No
1504 | T -> A | No
1515 | C -> T | Yes
1550 | G -> C | Yes
1690 | T -> A | Yes
1818 | C -> A | Yes
1837 | C -> T | No
1917 | C -> T | Yes
1933 | A -> G | No
1933 | A -> T | No
1998 | A -> G | Yes
2022 | T -> A | No
2099 | G -> T | Yes
2232 | G -> A | Yes
2261 | G -> T | Yes
2441 | G -> A | Yes
2637 | C -> T | Yes
2693 | G -> T | Yes
2833 | C -> T | Yes
2934 | C -> T | Yes
3249 | A -> G | No

Variant protein HUMPHOSLIP_PEA_2_P30 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T6.
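The nucleotide SNP tables for transcripts HUMPHOSLIP_PEA_2_T17 and HUMPHOSLIP_PEA_2_T19 are indexed by transcript position, while the amino acid tables are indexed by protein position; both coding portions are stated to start at position 276. A minimal Python sketch, assuming 1-based positions and that the stated start is the first base of the start codon, converts one coordinate system into the other. It only locates the affected codon; deciding whether a change is silent would also need the codon sequence, which is not reproduced here.

```python
def nucleotide_to_codon(nt_pos, cds_start, cds_end):
    """Return (amino-acid position, base within codon 1-3) for a coding nucleotide.

    Positions are 1-based; cds_start is assumed to be the first base of the start
    codon. Returns None for nucleotides outside the coding portion (UTR SNPs).
    """
    if not cds_start <= nt_pos <= cds_end:
        return None
    offset = nt_pos - cds_start
    return offset // 3 + 1, offset % 3 + 1


def codon_count(cds_start, cds_end):
    """Number of complete codons in the coding portion."""
    length = cds_end - cds_start + 1
    if length % 3:
        raise ValueError(f"coding portion of {length} nt is not a whole number of codons")
    return length // 3


if __name__ == "__main__":
    T17_CDS = (276, 1469)  # coding portion of HUMPHOSLIP_PEA_2_T17, from the text
    for nt in (174, 322, 328, 613, 1128, 1322, 1450, 2881):
        print(nt, nucleotide_to_codon(nt, *T17_CDS))
    print("codons:", codon_count(*T17_CDS))
```

Applied to T17, nucleotide 322 (A -> G) falls on the second base of codon 16, where Table 6 lists H -> R, and nucleotide 1128 (C -> T) falls on the first base of codon 285, where Table 6 lists R -> W. The T17 coding portion holds 398 codons and the T19 coding portion holds 432, matching the residue ranges given in the P10 and P12 comparison reports.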
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMPHOSLIP_PEA_2_P30 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 12 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P30 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 12 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
16 | H -> R | Yes
18 | E -> V | Yes
37 | R -> Q | Yes

Variant protein HUMPHOSLIP_PEA_2_P30 is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T6, for which the sequence(s) is/are given at the end of the application.
The coding portion of transcript HUMPHOSLIP_PEA_2_T6 is shown in bold; this coding portion starts at position 276 and ends at position 431. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P30 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 13 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
174 | G -> T | No
175 | A -> T | No
322 | A -> G | Yes
328 | A -> T | Yes
385 | G -> A | Yes
470 | G -> C | Yes
598 | G -> A | Yes
600 | C -> A | Yes
708 | C -> | No
708 | C -> A | No
790 | C -> T | Yes
852 | C -> T | Yes
867 | T -> | No
933 | G -> | No
933 | G -> C | No
962 | A -> | No
992 | C -> | No
1115 | C -> | No
1276 | C -> T | Yes
1336 | G -> | No
1367 | C -> T | Yes
1388 | C -> | No
1458 | T -> A | No
1469 | C -> T | Yes
1504 | G -> C | Yes
1561 | T -> A | Yes
1689 | C -> A | Yes
1708 | C -> T | No
1788 | C -> T | Yes
1804 | A -> G | No
1804 | A -> T | No
1869 | A -> G | Yes
1893 | T -> A | No
1970 | G -> T | Yes
2103 | G -> A | Yes
2132 | G -> T | Yes
2312 | G -> A | Yes
2508 | C -> T | Yes
2564 | G -> T | Yes
2704 | C -> T | Yes
2805 | C -> T | Yes
3120 | A -> G | No

Variant protein HUMPHOSLIP_PEA_2_P31 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T7. An alignment to the known protein (Phospholipid transfer protein precursor) is given at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMPHOSLIP_PEA_2_P31 and PLTP_HUMAN:

1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P31, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISE corresponding to amino acids 1 - 67 of PLTP_HUMAN, which also corresponds to amino acids 1 - 67 of HUMPHOSLIP_PEA_2_P31, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PGLERGADKFPVVGGSSLFLALDLTLRPPVG corresponding to amino acids 68 - 98 of HUMPHOSLIP_PEA_2_P31, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P31, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PGLERGADKFPVVGGSSLFLALDLTLRPPVG in HUMPHOSLIP_PEA_2_P31.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
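The localization statement repeated for each variant follows one explicit rule: the protein is called secreted when both signal-peptide prediction programs predict a signal peptide and neither trans-membrane prediction program predicts a trans-membrane region. A minimal Python sketch of that rule is below. Which programs supply the calls is not specified here (SignalP is named as one source), so the calls are modelled as plain booleans, and the `PredictorCalls` container is a hypothetical name. Because the text only describes the secreted outcome, every other combination returns None instead of guessing a compartment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictorCalls:
    """Per-program yes/no calls (hypothetical container, not from the application)."""
    signal_peptide: tuple  # one bool per signal-peptide prediction program
    transmembrane: tuple   # one bool per trans-membrane region prediction program


def infer_localization(calls):
    """Return "secreted" if every signal-peptide call is positive and no TM call is.

    An empty signal-peptide tuple is treated as "no evidence", not as "all positive".
    """
    if calls.signal_peptide and all(calls.signal_peptide) and not any(calls.transmembrane):
        return "secreted"
    return None


if __name__ == "__main__":
    print(infer_localization(PredictorCalls((True, True), (False, False))))
    print(infer_localization(PredictorCalls((True, False), (False, False))))
```

Requiring agreement from both signal-peptide programs, rather than either one, matches the wording "both ... predict" in the text and keeps the call conservative.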
Variant protein HUMPHOSLIP_PEA_2_P31 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 14 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P31 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 14 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
16 | H -> R | Yes
18 | E -> V | Yes

The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P31, as compared to the known protein Phospholipid transfer protein precursor, are described in Table 15 (given according to their position(s) on the known amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column gives the position of the site on the variant protein, which may differ from its position on the known protein).

Table 15 - Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein?
94 | no |
143 | no |
64 | yes | 64
245 | no |
398 | no |
117 | no |

Variant protein HUMPHOSLIP_PEA_2_P31 is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T7, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T7 is shown in bold; this coding portion starts at position 276 and ends at position 569. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P31 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 16 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
174 | G -> T | No
175 | A -> T | No
322 | A -> G | Yes
328 | A -> T | Yes
431 | G -> A | Yes
608 | G -> C | Yes
736 | G -> A | Yes
738 | C -> A | Yes
846 | C -> | No
846 | C -> A | No
928 | C -> T | Yes
990 | C -> T | Yes
1005 | T -> | No
1071 | G -> | No
1071 | G -> C | No
1100 | A -> | No
1130 | C -> | No
1253 | C -> | No
1414 | C -> T | Yes
1474 | G -> | No
1505 | C -> T | Yes
1526 | C -> | No
1596 | T -> A | No
1607 | C -> T | Yes
1642 | G -> C | Yes
1699 | T -> A | Yes
1827 | C -> A | Yes
1846 | C -> T | No
1926 | C -> T | Yes
1942 | A -> G | No
1942 | A -> T | No
2007 | A -> G | Yes
2031 | T -> A | No
2108 | G -> T | Yes
2241 | G -> A | Yes
2270 | G -> T | Yes
2450 | G -> A | Yes
2646 | C -> T | Yes
2702 | G -> T | Yes
2842 | C -> T | Yes
2943 | C -> T | Yes
3258 | A -> G | No

Variant protein HUMPHOSLIP_PEA_2_P33 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T14. An alignment to the known protein (Phospholipid transfer protein precursor) is given at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMPHOSLIP_PEA_2_P33 and PLTP_HUMAN:

1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P33, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQ corresponding to amino acids 1 - 183 of PLTP_HUMAN, which also corresponds to amino acids 1 - 183 of HUMPHOSLIP_PEA_2_P33, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VWAATGRRVARVGMLSL corresponding to amino acids 184 - 200 of HUMPHOSLIP_PEA_2_P33, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P33, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VWAATGRRVARVGMLSL in HUMPHOSLIP_PEA_2_P33.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
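The tail claims above grade homology to a short unique sequence (here VWAATGRRVARVGMLSL, residues 184-200 of HUMPHOSLIP_PEA_2_P33) against tiered thresholds of 70, 80, 85, 90 and 95 percent. This section does not define how homology is measured, so the minimal Python sketch below assumes the simplest reading, ungapped percent identity between two sequences of equal length; a real comparison would normally use a gapped alignment instead. The function names are illustrative only.

```python
# Thresholds named in the claims, highest first.
THRESHOLDS = (95, 90, 85, 80, 70)


def percent_identity(query, reference):
    """Ungapped percent identity of two equal-length sequences (a simplification)."""
    if not query or len(query) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(query, reference))
    return 100.0 * matches / len(reference)


def best_threshold(query, reference):
    """Highest claim threshold the query meets, or None if it is below 70%."""
    pid = percent_identity(query, reference)
    for threshold in THRESHOLDS:
        if pid >= threshold:
            return threshold
    return None


if __name__ == "__main__":
    tail = "VWAATGRRVARVGMLSL"  # residues 184-200 of HUMPHOSLIP_PEA_2_P33
    for query in (tail, "A" + tail[1:], "AA" + tail[2:]):
        print(query, round(percent_identity(query, tail), 1), best_threshold(query, tail))
```

On a 17-residue tail each substitution costs about 5.9 percentage points, so one change (about 94.1%) still meets the 90% tier, two changes (about 88.2%) meet only the 85% tier, and six changes (about 64.7%) fall below the 70% floor.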
Variant protein HUMPHOSLIP_PEA_2_P33 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17. They are given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed, and the last column indicates whether the SNP is known or not. The presence of known SNPs in the sequence of variant protein HUMPHOSLIP_PEA_2_P33 provides support for the deduced sequence of this variant protein according to the present invention.

Table 17 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
16 | H -> R | Yes
18 | E -> V | Yes
81 | D -> H | Yes
124 | S -> Y | Yes
160 | T -> | No
160 | T -> N | No

The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P33, as compared to the known protein Phospholipid transfer protein precursor, are described in Table 18. The first column gives the position(s) on the known amino acid sequence, the second column indicates whether the glycosylation site is present in the variant protein, and the last column gives the corresponding position on the variant protein, if any.

Table 18 - Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein?
94 | yes | 94
143 | yes | 143
64 | yes | 64
245 | no |
398 | no |
117 | yes | 117

Variant protein HUMPHOSLIP_PEA_2_P33 is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T14, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T14 is shown in bold; this coding portion starts at position 276 and ends at position 875.
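The coordinates can be checked for consistency with a few lines of arithmetic. This check is an illustration only. It assumes that the coding-portion coordinates are 1-based and inclusive, and the text does not say whether the reported end position includes a stop codon. Under that reading, the coding-portion length divided by three should equal the length of the encoded variant protein. The protein lengths are the last residue numbers given in the comparison reports: 200 for P33, 217 for P34 and 148 for P35. The coordinates for T14 are reported above, and those for T16 and T18 are reported below. The helper `codon_count` is hypothetical.

```python
# Hypothetical helper: number of codons in a coding portion given
# 1-based, inclusive start/end coordinates on the transcript.
def codon_count(start: int, end: int) -> int:
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"coding portion of {length} nt is not a whole number of codons")
    return length // 3


# (transcript, coding start, coding end, variant protein, protein length),
# with values as stated in the text for T14, T16 and T18.
CODING_PORTIONS = [
    ("HUMPHOSLIP_PEA_2_T14", 276, 875, "HUMPHOSLIP_PEA_2_P33", 200),
    ("HUMPHOSLIP_PEA_2_T16", 276, 926, "HUMPHOSLIP_PEA_2_P34", 217),
    ("HUMPHOSLIP_PEA_2_T18", 276, 719, "HUMPHOSLIP_PEA_2_P35", 148),
]

for transcript, start, end, protein, aa_len in CODING_PORTIONS:
    codons = codon_count(start, end)
    print(f"{transcript}: {codons} codons; {protein} has {aa_len} amino acids")
```

For each transcript the codon count equals the stated protein length (600 nt / 3 = 200, 651 / 3 = 217, 444 / 3 = 148). This agreement is consistent with the assumed coordinate convention.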
The transcript also has the following SNPs as listed in Table 19. They are given according to their position on the nucleotide sequence, with the alternative nucleic acid listed, and the last column indicates whether the SNP is known or not. The presence of known SNPs in the sequence of variant protein HUMPHOSLIP_PEA_2_P33 provides support for the deduced sequence of this variant protein according to the present invention.

Table 19 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
174 | G -> T | No
175 | A -> T | No
322 | A -> G | Yes
328 | A -> T | Yes
431 | G -> A | Yes
516 | G -> C | Yes
644 | G -> A | Yes
646 | C -> A | Yes
754 | C -> | No
754 | C -> A | No
921 | C -> T | Yes
983 | C -> T | Yes
998 | T -> | No
1064 | G -> | No
1064 | G -> C | No
1093 | A -> | No
1123 | C -> | No
1246 | C -> | No
1407 | C -> T | Yes
1467 | G -> | No
1498 | C -> T | Yes
1519 | C -> | No
1589 | T -> A | No
1600 | C -> T | Yes
1635 | G -> C | Yes
1692 | T -> A | Yes
1820 | C -> A | Yes
1839 | C -> T | No
1919 | C -> T | Yes
1935 | A -> G | No
1935 | A -> T | No
2000 | A -> G | Yes
2024 | T -> A | No
2101 | G -> T | Yes
2234 | G -> A | Yes
2263 | G -> T | Yes
2443 | G -> A | Yes
2639 | C -> T | Yes
2695 | G -> T | Yes
2835 | C -> T | Yes
2936 | C -> T | Yes
3251 | A -> G | No

Variant protein HUMPHOSLIP_PEA_2_P34 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T16. An alignment to the known protein (Phospholipid transfer protein precursor) is given at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMPHOSLIP_PEA_2_P34 and PLTP_HUMAN:

1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P34, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLLDTVPV corresponding to amino acids 1-205 of PLTP_HUMAN, which also corresponds to amino acids 1-205 of HUMPHOSLIP_PEA_2_P34, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LWTSLLALTIPS corresponding to amino acids 206-217 of HUMPHOSLIP_PEA_2_P34, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P34, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LWTSLLALTIPS in HUMPHOSLIP_PEA_2_P34.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HUMPHOSLIP_PEA_2_P34 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 20. They are given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed, and the last column indicates whether the SNP is known or not. The presence of known SNPs in the sequence of variant protein HUMPHOSLIP_PEA_2_P34 provides support for the deduced sequence of this variant protein according to the present invention.

Table 20 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
16 | H -> R | Yes
18 | E -> V | Yes
81 | D -> H | Yes
124 | S -> Y | Yes
160 | T -> | No
160 | T -> N | No
211 | L -> | No

The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P34, as compared to the known protein Phospholipid transfer protein precursor, are described in Table 21. The first column gives the position(s) on the known amino acid sequence, the second column indicates whether the glycosylation site is present in the variant protein, and the last column gives the corresponding position on the variant protein, if any.

Table 21 - Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein?
94 | yes | 94
143 | yes | 143
64 | yes | 64
245 | no |
398 | no |
117 | yes | 117

Variant protein HUMPHOSLIP_PEA_2_P34 is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T16, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T16 is shown in bold; this coding portion starts at position 276 and ends at position 926.
The transcript also has the following SNPs as listed in Table 22. They are given according to their position on the nucleotide sequence, with the alternative nucleic acid listed, and the last column indicates whether the SNP is known or not. The presence of known SNPs in the sequence of variant protein HUMPHOSLIP_PEA_2_P34 provides support for the deduced sequence of this variant protein according to the present invention.

Table 22 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
174 | G -> T | No
175 | A -> T | No
322 | A -> G | Yes
328 | A -> T | Yes
431 | G -> A | Yes
516 | G -> C | Yes
644 | G -> A | Yes
646 | C -> A | Yes
754 | C -> | No
754 | C -> A | No
836 | C -> T | Yes
891 | C -> T | Yes
906 | T -> | No
972 | G -> | No
972 | G -> C | No
1001 | A -> | No
1031 | C -> | No
1154 | C -> | No
1315 | C -> T | Yes
1375 | G -> | No
1406 | C -> T | Yes
1427 | C -> | No
1497 | T -> A | No
1508 | C -> T | Yes
1543 | G -> C | Yes
1600 | T -> A | Yes
1728 | C -> A | Yes
1747 | C -> T | No
1827 | C -> T | Yes
1843 | A -> G | No
1843 | A -> T | No
1908 | A -> G | Yes
1932 | T -> A | No
2009 | G -> T | Yes
2142 | G -> A | Yes
2171 | G -> T | Yes
2351 | G -> A | Yes
2547 | C -> T | Yes
2603 | G -> T | Yes
2743 | C -> T | Yes
2844 | C -> T | Yes
3159 | A -> G | No

Variant protein HUMPHOSLIP_PEA_2_P35 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T18. An alignment to the known protein (Phospholipid transfer protein precursor) is given at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMPHOSLIP_PEA_2_P35 and PLTP_HUMAN:

1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P35, comprising:
- a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWF corresponding to amino acids 1-109 of PLTP_HUMAN, which also corresponds to amino acids 1-109 of HUMPHOSLIP_PEA_2_P35;
- a second, bridging amino acid sequence comprising L;
- a third amino acid sequence being at least 90% homologous to KVYDFLSTFITSGMRFLLNQQ corresponding to amino acids 163-183 of PLTP_HUMAN, which also corresponds to amino acids 111-131 of HUMPHOSLIP_PEA_2_P35; and
- a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VWAATGRRVARVGMLSL corresponding to amino acids 132-148 of HUMPHOSLIP_PEA_2_P35.

Said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
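The comparison report describes HUMPHOSLIP_PEA_2_P35 as four pieces:
- PLTP_HUMAN residues 1-109;
- a bridging L at position 110;
- PLTP_HUMAN residues 163-183 at positions 111-131;
- a novel tail.

The aligned blocks can therefore translate a residue number on the known protein into the variant's numbering. Residues 110-162 of PLTP_HUMAN have no counterpart in P35.

The Python sketch below illustrates that arithmetic. The helper `map_position` and the block lists are hypothetical. The P33 and P34 blocks (1-183 and 1-205) come from their own comparison reports. The sketch is not the software used to build the tables. Applied to the known-protein glycosylation positions, it gives the same "present" calls and variant positions as listed for each variant in Tables 18, 21 and 24.

```python
from typing import Optional

# Aligned blocks as (known_start, known_end, variant_start), 1-based, inclusive,
# taken from the comparison reports against PLTP_HUMAN.
BLOCKS = {
    "HUMPHOSLIP_PEA_2_P33": [(1, 183, 1)],
    "HUMPHOSLIP_PEA_2_P34": [(1, 205, 1)],
    # P35: residues 1-109, then (after the bridging L at 110) residues 163-183
    "HUMPHOSLIP_PEA_2_P35": [(1, 109, 1), (163, 183, 111)],
}


def map_position(known_pos: int, blocks) -> Optional[int]:
    """Return the variant position aligned to known_pos, or None if unaligned."""
    for known_start, known_end, variant_start in blocks:
        if known_start <= known_pos <= known_end:
            return variant_start + (known_pos - known_start)
    return None


# Known glycosylation positions on PLTP_HUMAN, in the order used in the tables.
KNOWN_SITES = (94, 143, 64, 245, 398, 117)

for variant, blocks in BLOCKS.items():
    print(variant, [map_position(site, blocks) for site in KNOWN_SITES])
```

For P35 this prints `[94, None, 64, None, None, None]`, where `None` marks a site that is not present in the variant.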
2. An isolated polypeptide encoding for an edge portion of HUMPHOSLIP_PEA_2_P35, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise FLK having a structure as follows (numbering according to HUMPHOSLIP_PEA_2_P35): a sequence starting from any of amino acid numbers 109-x to 109; and ending at any of amino acid numbers 111 + ((n-2) - x), in which x varies from 0 to n-2.

3. An isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P35, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VWAATGRRVARVGMLSL in HUMPHOSLIP_PEA_2_P35.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HUMPHOSLIP_PEA_2_P35 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 23. They are given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed, and the last column indicates whether the SNP is known or not. The presence of known SNPs in the sequence of variant protein HUMPHOSLIP_PEA_2_P35 provides support for the deduced sequence of this variant protein according to the present invention.

Table 23 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
16 | H -> R | Yes
18 | E -> V | Yes
81 | D -> H | Yes

The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P35, as compared to the known protein Phospholipid transfer protein precursor, are described in Table 24. The first column gives the position(s) on the known amino acid sequence, the second column indicates whether the glycosylation site is present in the variant protein, and the last column gives the corresponding position on the variant protein, if any.

Table 24 - Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein?
94 | yes | 94
143 | no |
64 | yes | 64
245 | no |
398 | no |
117 | no |

Variant protein HUMPHOSLIP_PEA_2_P35 is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T18, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T18 is shown in bold; this coding portion starts at position 276 and ends at position 719.
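The known glycosylation positions 64, 94, 117 and 143 lie inside the sequences quoted in the comparison reports. Each of them falls on an asparagine followed by any residue other than proline and then serine or threonine (N-X-S/T), which is the canonical N-glycosylation sequon. The text does not state how the glycosylation sites were identified. The Python sketch below is therefore only an assumed, sequon-based cross-check, not the method behind Tables 18, 21 and 24. It assembles P33, P34 and P35 from the pieces named in each comparison report and then lists the sequon positions in each variant. The names `PLTP_1_183`, `VARIANTS` and `sequon_positions` are hypothetical.

```python
import re

# PLTP_HUMAN residues 1-183, as quoted in the comparison reports.
PLTP_1_183 = (
    "MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGH"
    "FYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYINAS"
    "AEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRF"
    "LLNQQ"
)

# Variant sequences assembled from the pieces named in each comparison report.
VARIANTS = {
    "HUMPHOSLIP_PEA_2_P33": PLTP_1_183 + "VWAATGRRVARVGMLSL",
    "HUMPHOSLIP_PEA_2_P34": PLTP_1_183 + "ICPVLYHAGTVLLNSLLDTVPV" + "LWTSLLALTIPS",
    # Residues 1-109, bridging L, residues 163-183 (0-based slice 162:183), tail.
    "HUMPHOSLIP_PEA_2_P35": PLTP_1_183[:109] + "L" + PLTP_1_183[162:183] + "VWAATGRRVARVGMLSL",
}

# N, then any residue except P, then S or T; the zero-width lookahead
# also reports sequons that overlap one another.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq: str) -> list[int]:
    """1-based positions of the asparagine in every N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


for name, seq in VARIANTS.items():
    print(name, len(seq), sequon_positions(seq))
```

The assembled lengths are 200, 217 and 148 residues. Under this assumption P33 and P34 each have sequons at 64, 94, 117 and 143, and P35 has sequons only at 64 and 94. This matches the "present" calls in the glycosylation tables.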
The transcript also has the following SNPs as listed in Table 25. They are given according to their position on the nucleotide sequence, with the alternative nucleic acid listed, and the last column indicates whether the SNP is known or not. The presence of known SNPs in the sequence of variant protein HUMPHOSLIP_PEA_2_P35 provides support for the deduced sequence of this variant protein according to the present invention.
Table 25 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
174 | G -> T | No
175 | A -> T | No
322 | A -> G | Yes
328 | A -> T | Yes
431 | G -> A | Yes
516 | G -> C | Yes
765 | C -> T | Yes
827 | C -> T | Yes
842 | T -> | No
908 | G -> | No
908 | G -> C | No
937 | A -> | No
967 | C -> | No
1090 | C -> | No
1251 | C -> T | Yes
1311 | G -> | No
1342 | C -> T | Yes
1363 | C -> | No
1433 | T -> A | No
1444 | C -> T | Yes
1479 | G -> C | Yes
1536 | T -> A | Yes
1664 | C -> A | Yes
1683 | C -> T | No
1763 | C -> T | Yes
1779 | A -> G | No
1779 | A -> T | No
1844 | A -> G | Yes
1868 | T -> A | No
1945 | G -> T | Yes
2078 | G -> A | Yes
2107 | G -> T | Yes
2287 | G -> A | Yes
2483 | C -> T | Yes
2539 | G -> T | Yes
2679 | C -> T | Yes
2780 | C -> T | Yes
3095 | A -> G | No

As noted above, cluster HUMPHOSLIP features 53 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster HUMPHOSLIP_PEA_2_node_0 according to the present invention is supported by 150 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1 | 264
HUMPHOSLIP_PEA_2_T7 | 1 | 264
HUMPHOSLIP_PEA_2_T14 | 1 | 264
HUMPHOSLIP_PEA_2_T16 | 1 | 264
HUMPHOSLIP_PEA_2_T17 | 1 | 264
HUMPHOSLIP_PEA_2_T18 | 1 | 264
HUMPHOSLIP_PEA_2_T19 | 1 | 264

Segment cluster HUMPHOSLIP_PEA_2_node_19 according to the present invention is supported by 186 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16 and HUMPHOSLIP_PEA_2_T19. Table 27 below describes the starting and ending position of this segment on each transcript.

Table 27 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 559 | 714
HUMPHOSLIP_PEA_2_T7 | 697 | 852
HUMPHOSLIP_PEA_2_T14 | 605 | 760
HUMPHOSLIP_PEA_2_T16 | 605 | 760
HUMPHOSLIP_PEA_2_T19 | 605 | 760

Segment cluster HUMPHOSLIP_PEA_2_node_34 according to the present invention is supported by 191 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 28 below describes the starting and ending position of this segment on each transcript.

Table 28 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 971 | 1111
HUMPHOSLIP_PEA_2_T7 | 1109 | 1249
HUMPHOSLIP_PEA_2_T14 | 1102 | 1242
HUMPHOSLIP_PEA_2_T16 | 1010 | 1150
HUMPHOSLIP_PEA_2_T17 | 732 | 872
HUMPHOSLIP_PEA_2_T18 | 946 | 1086
HUMPHOSLIP_PEA_2_T19 | 1017 | 1157

Segment cluster HUMPHOSLIP_PEA_2_node_68 according to the present invention is supported by 131 libraries.
The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 29 below describes the starting and ending position of this segment on each transcript.

Table 29 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1867 | 2285
HUMPHOSLIP_PEA_2_T7 | 2005 | 2423
HUMPHOSLIP_PEA_2_T14 | 1998 | 2416
HUMPHOSLIP_PEA_2_T16 | 1906 | 2324
HUMPHOSLIP_PEA_2_T17 | 1628 | 2046
HUMPHOSLIP_PEA_2_T18 | 1842 | 2260
HUMPHOSLIP_PEA_2_T19 | 1996 | 2414

Segment cluster HUMPHOSLIP_PEA_2_node_70 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 30 below describes the starting and ending position of this segment on each transcript.

Table 30 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 2298 | 2529
HUMPHOSLIP_PEA_2_T7 | 2436 | 2667
HUMPHOSLIP_PEA_2_T14 | 2429 | 2660
HUMPHOSLIP_PEA_2_T16 | 2337 | 2568
HUMPHOSLIP_PEA_2_T17 | 2059 | 2290
HUMPHOSLIP_PEA_2_T18 | 2273 | 2504
HUMPHOSLIP_PEA_2_T19 | 2427 | 2658

Segment cluster HUMPHOSLIP_PEA_2_node_75 according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19.
Table 31 below describes the starting and ending position of this segment on each transcript.

Table 31 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 2846 | 3125
HUMPHOSLIP_PEA_2_T7 | 2984 | 3263
HUMPHOSLIP_PEA_2_T14 | 2977 | 3256
HUMPHOSLIP_PEA_2_T16 | 2885 | 3164
HUMPHOSLIP_PEA_2_T17 | 2607 | 2886
HUMPHOSLIP_PEA_2_T18 | 2821 | 3100
HUMPHOSLIP_PEA_2_T19 | 2975 | 3254

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster HUMPHOSLIP_PEA_2_node_2 according to the present invention is supported by 159 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 32 below describes the starting and ending position of this segment on each transcript.

Table 32 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 265 | 337
HUMPHOSLIP_PEA_2_T7 | 265 | 337
HUMPHOSLIP_PEA_2_T14 | 265 | 337
HUMPHOSLIP_PEA_2_T16 | 265 | 337
HUMPHOSLIP_PEA_2_T17 | 265 | 337
HUMPHOSLIP_PEA_2_T18 | 265 | 337
HUMPHOSLIP_PEA_2_T19 | 265 | 337

Segment cluster HUMPHOSLIP_PEA_2_node_3 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T7 | 338 | 355
HUMPHOSLIP_PEA_2_T14 | 338 | 355
HUMPHOSLIP_PEA_2_T16 | 338 | 355
HUMPHOSLIP_PEA_2_T17 | 338 | 355
HUMPHOSLIP_PEA_2_T18 | 338 | 355
HUMPHOSLIP_PEA_2_T19 | 338 | 355

Segment cluster HUMPHOSLIP_PEA_2_node_4 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 34 below describes the starting and ending position of this segment on each transcript.

Table 34 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T7 | 356 | 375
HUMPHOSLIP_PEA_2_T14 | 356 | 375
HUMPHOSLIP_PEA_2_T16 | 356 | 375
HUMPHOSLIP_PEA_2_T17 | 356 | 375
HUMPHOSLIP_PEA_2_T18 | 356 | 375
HUMPHOSLIP_PEA_2_T19 | 356 | 375

Segment cluster HUMPHOSLIP_PEA_2_node_6 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 35 below describes the starting and ending position of this segment on each transcript.

Table 35 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T7 | 376 | 383
HUMPHOSLIP_PEA_2_T14 | 376 | 383
HUMPHOSLIP_PEA_2_T16 | 376 | 383
HUMPHOSLIP_PEA_2_T17 | 376 | 383
HUMPHOSLIP_PEA_2_T18 | 376 | 383
HUMPHOSLIP_PEA_2_T19 | 376 | 383

Segment cluster HUMPHOSLIP_PEA_2_node_7 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 338 | 343
HUMPHOSLIP_PEA_2_T7 | 384 | 389
HUMPHOSLIP_PEA_2_T14 | 384 | 389
HUMPHOSLIP_PEA_2_T16 | 384 | 389
HUMPHOSLIP_PEA_2_T17 | 384 | 389
HUMPHOSLIP_PEA_2_T18 | 384 | 389
HUMPHOSLIP_PEA_2_T19 | 384 | 389

Segment cluster HUMPHOSLIP_PEA_2_node_8 according to the present invention is supported by 171 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 37 below describes the starting and ending position of this segment on each transcript.

Table 37 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 344 | 378
HUMPHOSLIP_PEA_2_T7 | 390 | 424
HUMPHOSLIP_PEA_2_T14 | 390 | 424
HUMPHOSLIP_PEA_2_T16 | 390 | 424
HUMPHOSLIP_PEA_2_T17 | 390 | 424
HUMPHOSLIP_PEA_2_T18 | 390 | 424
HUMPHOSLIP_PEA_2_T19 | 390 | 424

Segment cluster HUMPHOSLIP_PEA_2_node_9 according to the present invention is supported by 168 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 38 below describes the starting and ending position of this segment on each transcript.

Table 38 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 379 | 429
HUMPHOSLIP_PEA_2_T7 | 425 | 475
HUMPHOSLIP_PEA_2_T14 | 425 | 475
HUMPHOSLIP_PEA_2_T16 | 425 | 475
HUMPHOSLIP_PEA_2_T17 | 425 | 475
HUMPHOSLIP_PEA_2_T18 | 425 | 475
HUMPHOSLIP_PEA_2_T19 | 425 | 475

Segment cluster HUMPHOSLIP_PEA_2_node_14 according to the present invention is supported by 6 libraries.
The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T7. Table 39 below describes the starting and ending position of this segment on each transcript.

Table 39 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T7 | 476 | 567

Segment cluster HUMPHOSLIP_PEA_2_node_15 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 40 below describes the starting and ending position of this segment on each transcript.
Table 40 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 430 | 445
HUMPHOSLIP_PEA_2_T7 | 568 | 583
HUMPHOSLIP_PEA_2_T14 | 476 | 491
HUMPHOSLIP_PEA_2_T16 | 476 | 491
HUMPHOSLIP_PEA_2_T18 | 476 | 491
HUMPHOSLIP_PEA_2_T19 | 476 | 491

Segment cluster HUMPHOSLIP_PEA_2_node_16 according to the present invention is supported by 179 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 41 below describes the starting and ending position of this segment on each transcript.

Table 41 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 446 | 534
HUMPHOSLIP_PEA_2_T7 | 584 | 672
HUMPHOSLIP_PEA_2_T14 | 492 | 580
HUMPHOSLIP_PEA_2_T16 | 492 | 580
HUMPHOSLIP_PEA_2_T18 | 492 | 580
HUMPHOSLIP_PEA_2_T19 | 492 | 580

Segment cluster HUMPHOSLIP_PEA_2_node_17 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 42 below describes the starting and ending position of this segment on each transcript.

Table 42 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 535 | 558
HUMPHOSLIP_PEA_2_T7 | 673 | 696
HUMPHOSLIP_PEA_2_T14 | 581 | 604
HUMPHOSLIP_PEA_2_T16 | 581 | 604
HUMPHOSLIP_PEA_2_T18 | 581 | 604
HUMPHOSLIP_PEA_2_T19 | 581 | 604

Segment cluster HUMPHOSLIP_PEA_2_node_23 according to the present invention is supported by 168 libraries. The number of libraries was determined as previously described.
This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 43 below describes the starting and ending position of this segment on each transcript.

Table 43 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 715 | 766
HUMPHOSLIP_PEA_2_T7 | 853 | 904
HUMPHOSLIP_PEA_2_T14 | 761 | 812
HUMPHOSLIP_PEA_2_T16 | 761 | 812
HUMPHOSLIP_PEA_2_T17 | 476 | 527
HUMPHOSLIP_PEA_2_T18 | 605 | 656
HUMPHOSLIP_PEA_2_T19 | 761 | 812

Segment cluster HUMPHOSLIP_PEA_2_node_24 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 44 below describes the starting and ending position of this segment on each transcript.

Table 44 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 767 | 778
HUMPHOSLIP_PEA_2_T7 | 905 | 916
HUMPHOSLIP_PEA_2_T14 | 813 | 824
HUMPHOSLIP_PEA_2_T16 | 813 | 824
HUMPHOSLIP_PEA_2_T17 | 528 | 539
HUMPHOSLIP_PEA_2_T18 | 657 | 668
HUMPHOSLIP_PEA_2_T19 | 813 | 824

Segment cluster HUMPHOSLIP_PEA_2_node_25 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T14 and HUMPHOSLIP_PEA_2_T18. Table 45 below describes the starting and ending position of this segment on each transcript.

Table 45 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T14 | 825 | 909
HUMPHOSLIP_PEA_2_T18 | 669 | 753

Segment cluster HUMPHOSLIP_PEA_2_node_26 according to the present invention is supported by 163 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 46 below describes the starting and ending position of this segment on each transcript.

Table 46 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 779 | 842
HUMPHOSLIP_PEA_2_T7 | 917 | 980
HUMPHOSLIP_PEA_2_T14 | 910 | 973
HUMPHOSLIP_PEA_2_T16 | 825 | 888
HUMPHOSLIP_PEA_2_T17 | 540 | 603
HUMPHOSLIP_PEA_2_T18 | 754 | 817
HUMPHOSLIP_PEA_2_T19 | 825 | 888

Segment cluster HUMPHOSLIP_PEA_2_node_29 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 47 below describes the starting and ending position of this segment on each transcript.

Table 47 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 843 | 849
HUMPHOSLIP_PEA_2_T7 | 981 | 987
HUMPHOSLIP_PEA_2_T14 | 974 | 980
HUMPHOSLIP_PEA_2_T17 | 604 | 610
HUMPHOSLIP_PEA_2_T18 | 818 | 824
HUMPHOSLIP_PEA_2_T19 | 889 | 895

Segment cluster HUMPHOSLIP_PEA_2_node_30 according to the present invention is supported by 181 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 48 below describes the starting and ending position of this segment on each transcript.
Table 48 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 850 | 934
HUMPHOSLIP_PEA_2_T7 | 988 | 1072
HUMPHOSLIP_PEA_2_T14 | 981 | 1065
HUMPHOSLIP_PEA_2_T16 | 889 | 973
HUMPHOSLIP_PEA_2_T17 | 611 | 695
HUMPHOSLIP_PEA_2_T18 | 825 | 909
HUMPHOSLIP_PEA_2_T19 | 896 | 980

Segment cluster HUMPHOSLIP_PEA_2_node_33 according to the present invention is supported by 173 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 49 below describes the starting and ending position of this segment on each transcript.

Table 49 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 935 | 970
HUMPHOSLIP_PEA_2_T7 | 1073 | 1108
HUMPHOSLIP_PEA_2_T14 | 1066 | 1101
HUMPHOSLIP_PEA_2_T16 | 974 | 1009
HUMPHOSLIP_PEA_2_T17 | 696 | 731
HUMPHOSLIP_PEA_2_T18 | 910 | 945
HUMPHOSLIP_PEA_2_T19 | 981 | 1016

Segment cluster HUMPHOSLIP_PEA_2_node_36 according to the present invention is supported by 163 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 50 below describes the starting and ending position of this segment on each transcript.
Table 50 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1112 | 1156
HUMPHOSLIP_PEA_2_T7 | 1250 | 1294
HUMPHOSLIP_PEA_2_T14 | 1243 | 1287
HUMPHOSLIP_PEA_2_T16 | 1151 | 1195
HUMPHOSLIP_PEA_2_T17 | 873 | 917
HUMPHOSLIP_PEA_2_T18 | 1087 | 1131
HUMPHOSLIP_PEA_2_T19 | 1158 | 1202

Segment cluster HUMPHOSLIP_PEA_2_node_37 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 51 below describes the starting and ending position of this segment on each transcript.

Table 51 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1157 | 1171
HUMPHOSLIP_PEA_2_T7 | 1295 | 1309
HUMPHOSLIP_PEA_2_T14 | 1288 | 1302
HUMPHOSLIP_PEA_2_T16 | 1196 | 1210
HUMPHOSLIP_PEA_2_T17 | 918 | 932
HUMPHOSLIP_PEA_2_T18 | 1132 | 1146
HUMPHOSLIP_PEA_2_T19 | 1203 | 1217

Segment cluster HUMPHOSLIP_PEA_2_node_39 according to the present invention is supported by 166 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 52 below describes the starting and ending position of this segment on each transcript.

Table 52 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1172 | 1201
HUMPHOSLIP_PEA_2_T7 | 1310 | 1339
HUMPHOSLIP_PEA_2_T14 | 1303 | 1332
HUMPHOSLIP_PEA_2_T16 | 1211 | 1240
HUMPHOSLIP_PEA_2_T17 | 933 | 962
HUMPHOSLIP_PEA_2_T18 | 1147 | 1176
HUMPHOSLIP_PEA_2_T19 | 1218 | 1247

Segment cluster HUMPHOSLIP_PEA_2_node_40 according to the present invention is supported by 199 libraries.
The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 53 below describes the starting and ending position of this segment on each transcript.

Table 53 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1202 | 1288
HUMPHOSLIP_PEA_2_T7 | 1340 | 1426
HUMPHOSLIP_PEA_2_T14 | 1333 | 1419
HUMPHOSLIP_PEA_2_T16 | 1241 | 1327
HUMPHOSLIP_PEA_2_T17 | 963 | 1049
HUMPHOSLIP_PEA_2_T18 | 1177 | 1263
HUMPHOSLIP_PEA_2_T19 | 1248 | 1334

Segment cluster HUMPHOSLIP_PEA_2_node_41 according to the present invention is supported by 186 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 54 below describes the starting and ending position of this segment on each transcript.

Table 54 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1289 | 1318
HUMPHOSLIP_PEA_2_T7 | 1427 | 1456
HUMPHOSLIP_PEA_2_T14 | 1420 | 1449
HUMPHOSLIP_PEA_2_T16 | 1328 | 1357
HUMPHOSLIP_PEA_2_T17 | 1050 | 1079
HUMPHOSLIP_PEA_2_T18 | 1264 | 1293
HUMPHOSLIP_PEA_2_T19 | 1335 | 1364

Segment cluster HUMPHOSLIP_PEA_2_node_42 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 55 below describes the starting and ending position of this segment on each transcript.
Table 55 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1319 | 1336
HUMPHOSLIP_PEA_2_T7 | 1457 | 1474
HUMPHOSLIP_PEA_2_T14 | 1450 | 1467
HUMPHOSLIP_PEA_2_T16 | 1358 | 1375
HUMPHOSLIP_PEA_2_T17 | 1080 | 1097
HUMPHOSLIP_PEA_2_T18 | 1294 | 1311
HUMPHOSLIP_PEA_2_T19 | 1365 | 1382

Segment cluster HUMPHOSLIP_PEA_2_node_44 according to the present invention is supported by 185 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 56 below describes the starting and ending position of this segment on each transcript.

Table 56 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1337 | 1363
HUMPHOSLIP_PEA_2_T7 | 1475 | 1501
HUMPHOSLIP_PEA_2_T14 | 1468 | 1494
HUMPHOSLIP_PEA_2_T16 | 1376 | 1402
HUMPHOSLIP_PEA_2_T17 | 1098 | 1124
HUMPHOSLIP_PEA_2_T18 | 1312 | 1338
HUMPHOSLIP_PEA_2_T19 | 1383 | 1409

Segment cluster HUMPHOSLIP_PEA_2_node_45 according to the present invention is supported by 197 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 57 below describes the starting and ending position of this segment on each transcript.
Table 57 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1364 | 1404
HUMPHOSLIP_PEA_2_T7 | 1502 | 1542
HUMPHOSLIP_PEA_2_T14 | 1495 | 1535
HUMPHOSLIP_PEA_2_T16 | 1403 | 1443
HUMPHOSLIP_PEA_2_T17 | 1125 | 1165
HUMPHOSLIP_PEA_2_T18 | 1339 | 1379
HUMPHOSLIP_PEA_2_T19 | 1410 | 1450

Segment cluster HUMPHOSLIP_PEA_2_node_47 according to the present invention is supported by 223 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 58 below describes the starting and ending position of this segment on each transcript.

Table 58 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1405 | 1447
HUMPHOSLIP_PEA_2_T7 | 1543 | 1585
HUMPHOSLIP_PEA_2_T14 | 1536 | 1578
HUMPHOSLIP_PEA_2_T16 | 1444 | 1486
HUMPHOSLIP_PEA_2_T17 | 1166 | 1208
HUMPHOSLIP_PEA_2_T18 | 1380 | 1422
HUMPHOSLIP_PEA_2_T19 | 1451 | 1493

Segment cluster HUMPHOSLIP_PEA_2_node_51 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 59 below describes the starting and ending position of this segment on each transcript.
Table 59 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1448 | 1462
HUMPHOSLIP_PEA_2_T7 | 1586 | 1600
HUMPHOSLIP_PEA_2_T14 | 1579 | 1593
HUMPHOSLIP_PEA_2_T16 | 1487 | 1501
HUMPHOSLIP_PEA_2_T17 | 1209 | 1223
HUMPHOSLIP_PEA_2_T18 | 1423 | 1437
HUMPHOSLIP_PEA_2_T19 | 1494 | 1508

Segment cluster HUMPHOSLIP_PEA_2_node_52 according to the present invention is supported by 235 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 60 below describes the starting and ending position of this segment on each transcript.

Table 60 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1463 | 1511
HUMPHOSLIP_PEA_2_T7 | 1601 | 1649
HUMPHOSLIP_PEA_2_T14 | 1594 | 1642
HUMPHOSLIP_PEA_2_T16 | 1502 | 1550
HUMPHOSLIP_PEA_2_T17 | 1224 | 1272
HUMPHOSLIP_PEA_2_T18 | 1438 | 1486
HUMPHOSLIP_PEA_2_T19 | 1509 | 1557

Segment cluster HUMPHOSLIP_PEA_2_node_53 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T19. Table 61 below describes the starting and ending position of this segment on each transcript.

Table 61 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T19 | 1558 | 1640

Segment cluster HUMPHOSLIP_PEA_2_node_54 according to the present invention is supported by 236 libraries. The number of libraries was determined as previously described.
This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 62 below describes the starting and ending position of this segment on each transcript.

Table 62 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1512 | 1552
HUMPHOSLIP_PEA_2_T7 | 1650 | 1690
HUMPHOSLIP_PEA_2_T14 | 1643 | 1683
HUMPHOSLIP_PEA_2_T16 | 1551 | 1591
HUMPHOSLIP_PEA_2_T17 | 1273 | 1313
HUMPHOSLIP_PEA_2_T18 | 1487 | 1527
HUMPHOSLIP_PEA_2_T19 | 1641 | 1681

Segment cluster HUMPHOSLIP_PEA_2_node_55 according to the present invention is supported by 232 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 63 below describes the starting and ending position of this segment on each transcript.

Table 63 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1553 | 1588
HUMPHOSLIP_PEA_2_T7 | 1691 | 1726
HUMPHOSLIP_PEA_2_T14 | 1684 | 1719
HUMPHOSLIP_PEA_2_T16 | 1592 | 1627
HUMPHOSLIP_PEA_2_T17 | 1314 | 1349
HUMPHOSLIP_PEA_2_T18 | 1528 | 1563
HUMPHOSLIP_PEA_2_T19 | 1682 | 1717

Segment cluster HUMPHOSLIP_PEA_2_node_58 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 64 below describes the starting and ending position of this segment on each transcript.
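The tables also show how a segment that is missing from one transcript, or unique to it, shifts the coordinates of everything downstream. Node_29 (Table 47, 7 positions) is not listed for HUMPHOSLIP_PEA_2_T16. Node_53 (Table 61, positions 1558 to 1640, 83 positions) is listed only for HUMPHOSLIP_PEA_2_T19. The sketch below again assumes 1-based inclusive coordinates and uses hypothetical names. It takes start positions copied from Tables 46, 48, 60 and 62 and computes each segment's start on T16 and T19 relative to its start on HUMPHOSLIP_PEA_2_T6.

```python
# Start positions copied from Tables 46 (node_26), 48 (node_30),
# 60 (node_52) and 62 (node_54); keys are HUMPHOSLIP_PEA_2 transcript suffixes.
STARTS = {
    "node_26": {"T6": 779, "T16": 825, "T19": 825},
    "node_30": {"T6": 850, "T16": 889, "T19": 896},
    "node_52": {"T6": 1463, "T16": 1502, "T19": 1509},
    "node_54": {"T6": 1512, "T16": 1551, "T19": 1641},
}

# Inclusive lengths (assumed 1-based coordinates) of the two segments
# that are not shared by every transcript.
NODE_29_LEN = 849 - 843 + 1    # Table 47; not listed for T16
NODE_53_LEN = 1640 - 1558 + 1  # Table 61; listed only for T19


def offset(node, transcript, reference="T6"):
    """Start of `node` on `transcript` minus its start on `reference`."""
    return STARTS[node][transcript] - STARTS[node][reference]


for node in STARTS:
    print(node, offset(node, "T16"), offset(node, "T19"))
# node_26 46 46
# node_30 39 46
# node_52 39 46
# node_54 39 129
```

Across node_29, the T16 offset drops from 46 to 39, which is exactly that segment's length. Across node_53, the T19 offset rises from 46 to 129, which is exactly 83. This is only a consistency check on the published coordinates. It makes no claim about how the transcripts were assembled.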
Table 64 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1589 | 1612
HUMPHOSLIP_PEA_2_T7 | 1727 | 1750
HUMPHOSLIP_PEA_2_T14 | 1720 | 1743
HUMPHOSLIP_PEA_2_T16 | 1628 | 1651
HUMPHOSLIP_PEA_2_T17 | 1350 | 1373
HUMPHOSLIP_PEA_2_T18 | 1564 | 1587
HUMPHOSLIP_PEA_2_T19 | 1718 | 1741

Segment cluster HUMPHOSLIP_PEA_2_node_59 according to the present invention is supported by 230 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 65 below describes the starting and ending position of this segment on each transcript.

Table 65 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1613 | 1648
HUMPHOSLIP_PEA_2_T7 | 1751 | 1786
HUMPHOSLIP_PEA_2_T14 | 1744 | 1779
HUMPHOSLIP_PEA_2_T16 | 1652 | 1687
HUMPHOSLIP_PEA_2_T17 | 1374 | 1409
HUMPHOSLIP_PEA_2_T18 | 1588 | 1623
HUMPHOSLIP_PEA_2_T19 | 1742 | 1777

Segment cluster HUMPHOSLIP_PEA_2_node_60 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 66 below describes the starting and ending position of this segment on each transcript.
Table 66 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1649 | 1671
HUMPHOSLIP_PEA_2_T7 | 1787 | 1809
HUMPHOSLIP_PEA_2_T14 | 1780 | 1802
HUMPHOSLIP_PEA_2_T16 | 1688 | 1710
HUMPHOSLIP_PEA_2_T17 | 1410 | 1432
HUMPHOSLIP_PEA_2_T18 | 1624 | 1646
HUMPHOSLIP_PEA_2_T19 | 1778 | 1800

Segment cluster HUMPHOSLIP_PEA_2_node_61 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 67 below describes the starting and ending position of this segment on each transcript.

Table 67 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1672 | 1680
HUMPHOSLIP_PEA_2_T7 | 1810 | 1818
HUMPHOSLIP_PEA_2_T14 | 1803 | 1811
HUMPHOSLIP_PEA_2_T16 | 1711 | 1719
HUMPHOSLIP_PEA_2_T17 | 1433 | 1441
HUMPHOSLIP_PEA_2_T18 | 1647 | 1655
HUMPHOSLIP_PEA_2_T19 | 1801 | 1809

Segment cluster HUMPHOSLIP_PEA_2_node_62 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 68 below describes the starting and ending position of this segment on each transcript.
Table 68 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1681 | 1703
HUMPHOSLIP_PEA_2_T7 | 1819 | 1841
HUMPHOSLIP_PEA_2_T14 | 1812 | 1834
HUMPHOSLIP_PEA_2_T16 | 1720 | 1742
HUMPHOSLIP_PEA_2_T17 | 1442 | 1464
HUMPHOSLIP_PEA_2_T18 | 1656 | 1678
HUMPHOSLIP_PEA_2_T19 | 1810 | 1832

Segment cluster HUMPHOSLIP_PEA_2_node_63 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 69 below describes the starting and ending position of this segment on each transcript.

Table 69 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1704 | 1727
HUMPHOSLIP_PEA_2_T7 | 1842 | 1865
HUMPHOSLIP_PEA_2_T14 | 1835 | 1858
HUMPHOSLIP_PEA_2_T16 | 1743 | 1766
HUMPHOSLIP_PEA_2_T17 | 1465 | 1488
HUMPHOSLIP_PEA_2_T18 | 1679 | 1702
HUMPHOSLIP_PEA_2_T19 | 1833 | 1856

Segment cluster HUMPHOSLIP_PEA_2_node_64 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 70 below describes the starting and ending position of this segment on each transcript.
Table 70 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1728 | 1734
HUMPHOSLIP_PEA_2_T7 | 1866 | 1872
HUMPHOSLIP_PEA_2_T14 | 1859 | 1865
HUMPHOSLIP_PEA_2_T16 | 1767 | 1773
HUMPHOSLIP_PEA_2_T17 | 1489 | 1495
HUMPHOSLIP_PEA_2_T18 | 1703 | 1709
HUMPHOSLIP_PEA_2_T19 | 1857 | 1863

Segment cluster HUMPHOSLIP_PEA_2_node_65 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 71 below describes the starting and ending position of this segment on each transcript.

Table 71 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1735 | 1754
HUMPHOSLIP_PEA_2_T7 | 1873 | 1892
HUMPHOSLIP_PEA_2_T14 | 1866 | 1885
HUMPHOSLIP_PEA_2_T16 | 1774 | 1793
HUMPHOSLIP_PEA_2_T17 | 1496 | 1515
HUMPHOSLIP_PEA_2_T18 | 1710 | 1729
HUMPHOSLIP_PEA_2_T19 | 1864 | 1883

Segment cluster HUMPHOSLIP_PEA_2_node_66 according to the present invention is supported by 180 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 72 below describes the starting and ending position of this segment on each transcript.
Table 72 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1755 | 1844
HUMPHOSLIP_PEA_2_T7 | 1893 | 1982
HUMPHOSLIP_PEA_2_T14 | 1886 | 1975
HUMPHOSLIP_PEA_2_T16 | 1794 | 1883
HUMPHOSLIP_PEA_2_T17 | 1516 | 1605
HUMPHOSLIP_PEA_2_T18 | 1730 | 1819
HUMPHOSLIP_PEA_2_T19 | 1884 | 1973

Segment cluster HUMPHOSLIP_PEA_2_node_67 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 73 below describes the starting and ending position of this segment on each transcript.

Table 73 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 1845 | 1866
HUMPHOSLIP_PEA_2_T7 | 1983 | 2004
HUMPHOSLIP_PEA_2_T14 | 1976 | 1997
HUMPHOSLIP_PEA_2_T16 | 1884 | 1905
HUMPHOSLIP_PEA_2_T17 | 1606 | 1627
HUMPHOSLIP_PEA_2_T18 | 1820 | 1841
HUMPHOSLIP_PEA_2_T19 | 1974 | 1995

Segment cluster HUMPHOSLIP_PEA_2_node_69 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 74 below describes the starting and ending position of this segment on each transcript.
Table 74 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 2286 | 2297
HUMPHOSLIP_PEA_2_T7 | 2424 | 2435
HUMPHOSLIP_PEA_2_T14 | 2417 | 2428
HUMPHOSLIP_PEA_2_T16 | 2325 | 2336
HUMPHOSLIP_PEA_2_T17 | 2047 | 2058
HUMPHOSLIP_PEA_2_T18 | 2261 | 2272
HUMPHOSLIP_PEA_2_T19 | 2415 | 2426

Segment cluster HUMPHOSLIP_PEA_2_node_71 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 75 below describes the starting and ending position of this segment on each transcript.

Table 75 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 2530 | 2542
HUMPHOSLIP_PEA_2_T7 | 2668 | 2680
HUMPHOSLIP_PEA_2_T14 | 2661 | 2673
HUMPHOSLIP_PEA_2_T16 | 2569 | 2581
HUMPHOSLIP_PEA_2_T17 | 2291 | 2303
HUMPHOSLIP_PEA_2_T18 | 2505 | 2517
HUMPHOSLIP_PEA_2_T19 | 2659 | 2671

Segment cluster HUMPHOSLIP_PEA_2_node_72 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 76 below describes the starting and ending position of this segment on each transcript.

Table 76 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 2543 | 2647
HUMPHOSLIP_PEA_2_T7 | 2681 | 2785
HUMPHOSLIP_PEA_2_T14 | 2674 | 2778
HUMPHOSLIP_PEA_2_T16 | 2582 | 2686
HUMPHOSLIP_PEA_2_T17 | 2304 | 2408
HUMPHOSLIP_PEA_2_T18 | 2518 | 2622
HUMPHOSLIP_PEA_2_T19 | 2672 | 2776

Segment cluster HUMPHOSLIP_PEA_2_node_73 according to the present invention is supported by 5 libraries.
The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 77 below describes the starting and ending position of this segment on each transcript.

Table 77 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 2648 | 2755
HUMPHOSLIP_PEA_2_T7 | 2786 | 2893
HUMPHOSLIP_PEA_2_T14 | 2779 | 2886
HUMPHOSLIP_PEA_2_T16 | 2687 | 2794
HUMPHOSLIP_PEA_2_T17 | 2409 | 2516
HUMPHOSLIP_PEA_2_T18 | 2623 | 2730
HUMPHOSLIP_PEA_2_T19 | 2777 | 2884

Segment cluster HUMPHOSLIP_PEA_2_node_74 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 78 below describes the starting and ending position of this segment on each transcript.
Table 78 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMPHOSLIP_PEA_2_T6 | 2756 | 2845
HUMPHOSLIP_PEA_2_T7 | 2894 | 2983
HUMPHOSLIP_PEA_2_T14 | 2887 | 2976
HUMPHOSLIP_PEA_2_T16 | 2795 | 2884
HUMPHOSLIP_PEA_2_T17 | 2517 | 2606
HUMPHOSLIP_PEA_2_T18 | 2731 | 2820
HUMPHOSLIP_PEA_2_T19 | 2885 | 2974

Variant protein alignment to the previously known protein:

Sequence name: PLTP_HUMAN
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P10 x PLTP_HUMAN
Alignment segment 1/1:
Quality: 3716.00 Escore: 0
Matching length: 398 Total length: 493
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 80.73 Total Percent Identity: 80.73
Gaps: 1
Alignment:

1   MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
1   MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50

51  IPDLRGKEGHFYYNISE................................. 67
    |||||||||||||||||
51  IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100

67  .................................................. 67
101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150

68  ............KVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLL 105
                ||||||||||||||||||||||||||||||||||||||
151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLL 200

106 DTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPN 155
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 DTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPN 250

156 RAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLD 205
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 RAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLD 300

206 MLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASV 255
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 MLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASV 350

256 TIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHS 305
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 TIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHS 400

306 ALESLALIPLQAPLKTMLQIGVMPMLNERTWRGVQIPLPEGINFVHEVVT 355
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 ALESLALIPLQAPLKTMLQIGVMPMLNERTWRGVQIPLPEGINFVHEVVT 450

356 NHAGFLTIGADLHFAKGLREVIEKNRPADVRASTAPTPSTAAV 398
    |||||||||||||||||||||||||||||||||||||||||||
451 NHAGFLTIGADLHFAKGLREVIEKNRPADVRASTAPTPSTAAV 493

Sequence name: PLTP_HUMAN
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P12 x PLTP_HUMAN
Alignment segment 1/1:
Quality: 4101.00 Escore: 0
Matching length: 427 Total length: 427
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:

1   MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
1   MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50

51  IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
51  IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100

101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150

151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLL 200

201 DTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPN 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 DTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPN 250

251 RAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLD 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 RAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLD 300

301 MLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASV 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 MLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASV 350

351 TIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHS 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 TIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHS 400

401 ALESLALIPLQAPLKTMLQIGVMPMLN 427
    |||||||||||||||||||||||||||
401 ALESLALIPLQAPLKTMLQIGVMPMLN 427

Sequence name: PLTP_HUMAN
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P31 x PLTP_HUMAN
Alignment segment 1/1:
Quality: 639.00 Escore: 0
Matching length: 67 Total length: 67
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:

1   MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
1   MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50

51  IPDLRGKEGHFYYNISE 67
    |||||||||||||||||
51  IPDLRGKEGHFYYNISE 67
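The summary lines above each alignment appear to be simple ratios. The reported figures are consistent with this reading:

- Matching Percent Identity is the number of identical positions divided by the matching length.
- Total Percent Identity is the number of identical positions divided by the total length.
- The two similarity figures are computed the same way, counting similar positions.
- Each value is multiplied by 100 and rounded to two decimals.

This reading is inferred from the numbers and is not stated in the application. The sketch below recomputes the figures for the first alignment in this section (398 matching positions out of 493) and for the P33 and P35 alignments that follow. The identical-position counts for P33 (183) and P35 (130) are read off those alignments, which show one and two similar but non-identical positions, respectively. The class and function names are illustrative.

```python
from typing import NamedTuple


class AlignmentSummary(NamedTuple):
    matching_length: int
    total_length: int
    identical: int  # identical positions within the matched region
    similar: int    # identical plus similar positions within the matched region


def percent(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as printed in the alignment headers."""
    return round(100 * part / whole, 2)


def report(s: AlignmentSummary) -> dict:
    """Recompute the four percentages shown above each alignment
    (assumed interpretation, inferred from the reported values)."""
    return {
        "matching_similarity": percent(s.similar, s.matching_length),
        "matching_identity": percent(s.identical, s.matching_length),
        "total_similarity": percent(s.similar, s.total_length),
        "total_identity": percent(s.identical, s.total_length),
        "unmatched_positions": s.total_length - s.matching_length,
    }


# Lengths taken from the alignment headers; identical counts for P33 and
# P35 are inferred from the alignments themselves.
FIRST = AlignmentSummary(398, 493, 398, 398)
P33 = AlignmentSummary(184, 184, 183, 184)
P35 = AlignmentSummary(132, 184, 130, 132)

for name, summary in [("first", FIRST), ("P33", P33), ("P35", P35)]:
    print(name, report(summary))
```

For the first alignment, the sketch gives a total identity of 80.73 and 95 unmatched positions, matching the single 95-residue gap shown in that alignment. For P35 it gives 98.48, 71.74 and 70.65, the matching identity, total similarity and total identity reported for that alignment.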
Sequence name: PLTP_HUMAN
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P33 x PLTP_HUMAN
Alignment segment 1/1:
Quality: 1767.00 Escore: 0
Matching length: 184 Total length: 184
Matching Percent Similarity: 100.00 Matching Percent Identity: 99.46
Total Percent Similarity: 100.00 Total Percent Identity: 99.46
Gaps: 0
Alignment:

1   MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
1   MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50

51  IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
51  IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100

101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150

151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQV 184
    |||||||||||||||||||||||||||||||||:
151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQI 184

Sequence name: PLTP_HUMAN
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P34 x PLTP_HUMAN
Alignment segment 1/1:
Quality: 1971.00 Escore: 0
Matching length: 205 Total length: 205
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:

1   MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
1   MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50

51  IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
51  IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100

101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150

151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLL 200

201 DTVPV 205
    |||||
201 DTVPV 205

Sequence name: PLTP_HUMAN
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P35 x PLTP_HUMAN
Alignment segment 1/1:
Quality: 1158.00 Escore: 0
Matching length: 132 Total length: 184
Matching Percent Similarity: 100.00 Matching Percent Identity: 98.48
Total Percent Similarity: 71.74 Total Percent Identity: 70.65
Gaps: 1
Alignment:

1   MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
1   MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50

51  IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
51  IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100

101 FRRQLLYWFL........................................ 110
    |||||||||:
101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150

111 ............KVYDFLSTFITSGMRFLLNQQV 132
                |||||||||||||||||||||:
151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQI 184

DESCRIPTION FOR CLUSTER T59832

Cluster T59832 features 5 transcript(s) and 30 segment(s) of interest, the names of which are given in Tables 1 and 2, respectively. The sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Transcript name | SEQ ID NO:
T59832_T6 | 742
T59832_T8 | 743
T59832_T11 | 744
T59832_T15 | 745
T59832_T22 | 746

Table 2 - Segments of interest
Segment name | SEQ ID NO:
T59832_node_1 | 747
T59832_node_7 | 748
T59832_node_29 | 749
T59832_node_39 | 750
T59832_node_2 | 751
T59832_node_3 | 752
T59832_node_4 | 753
T59832_node_5 | 754
T59832_node_6 | 755
T59832_node_8 | 756
T59832_node_9 | 757
T59832_node_10 | 758
T59832_node_11 | 759
T59832_node_12 | 760
T59832_node_14 | 761
T59832_node_16 | 762
T59832_node_19 | 763
T59832_node_20 | 764
T59832_node_25 | 765
T59832_node_26 | 766
T59832_node_27 | 767
T59832_node_28 | 768
T59832_node_30 | 769
T59832_node_31 | 770
T59832_node_32 | 771
T59832_node_34 | 772
T59832_node_35 | 773
T59832_node_36 | 774
T59832_node_37 | 775
T59832_node_38 | 776

Table 3 - Proteins of interest
Protein name | SEQ ID NO: | Corresponding transcript(s)
T59832_P5 | 778 | T59832_T6
T59832_P7 | 779 | T59832_T8
T59832_P9 | 780 | T59832_T11
T59832_P12 | 781 | T59832_T15
T59832_P18 | 782 | T59832_T22

These sequences are variants of the known protein Gamma-interferon inducible lysosomal thiol reductase precursor (SwissProt accession identifier GILT_HUMAN; also known by the synonym Gamma-interferon-inducible protein IP-30), SEQ ID NO: 777, referred to herein as the previously known protein.

Protein Gamma-interferon inducible lysosomal thiol reductase precursor is known or believed to have the following function(s): cleaves disulfide bonds in proteins by reduction. May facilitate the complete unfolding of proteins destined for lysosomal degradation. May be involved in MHC class II-restricted antigen processing. The sequence for protein Gamma-interferon inducible lysosomal thiol reductase precursor is given at the end of the application, as "Gamma-interferon inducible lysosomal thiol reductase precursor amino acid sequence". Known polymorphisms for this sequence are shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
SNP position(s) on amino acid sequence    Comment
109        L -> S
130        H -> L
157-261    IVCMEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK -> MSGMAWKSLRTWREVCHYACSSTPQGCRQNYHGVCNGGPRHAAHARQRPADRCSPATARVCALGHRQWETLGRSDPAPYPCLPVVPGQEAGCLPFLNQLPPECLLRVLAGGLRRAHGRRVGTRLPAFFSDPDPRHLLLTNWKILCIP

The localization of protein Gamma-interferon inducible lysosomal thiol reductase precursor is believed to be lysosomal. The following GO annotation(s) apply to the previously known protein: extracellular; lysosome, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>, or LocusLink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.

Cluster T59832 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "Number" in Table 5 and the numbers on the y-axis of Figure 35 refer to weighted expression of ESTs in each category, as "parts per million" (the ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, expressed in parts per million).

Overall, the following results were obtained, as shown in the histograms in Figure 35 and in Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: brain malignant tumors, breast malignant tumors, ovarian carcinoma and pancreas carcinoma.
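The parts-per-million figure described above can be sketched as follows. This is a minimal illustration, assuming the (already weighted) EST counts for the cluster and the total EST counts for each tissue category are available as plain dictionaries; the weighting scheme referred to in the text is described elsewhere in the application and is not reproduced here, and the function name ppm_by_category and the example counts are hypothetical.

```python
from typing import Dict, Mapping


def ppm_by_category(cluster_counts: Mapping[str, float],
                    category_totals: Mapping[str, float]) -> Dict[str, float]:
    """Express a cluster's EST count in each tissue category as parts per
    million of all ESTs in that category (hypothetical helper).

    Categories missing from cluster_counts are reported as 0.0.
    """
    result: Dict[str, float] = {}
    for category, total in category_totals.items():
        if total <= 0:
            raise ValueError(f"category {category!r} has no ESTs")
        # Multiply before dividing so integer counts give exact results.
        result[category] = cluster_counts.get(category, 0) * 1_000_000 / total
    return result


# Hypothetical counts, not taken from the application.
print(ppm_by_category({"lung": 46}, {"lung": 100_000, "ovary": 50_000}))
# -> {'lung': 460.0, 'ovary': 0.0}
```

Keeping the normalization per category is what makes values from tissues with very different sequencing depth comparable in a single table.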
Table 5 - Normal tissue distribution
Name of Tissue    Number
adrenal           208
bladder           205
bone              200
brain             18
colon             236
epithelial        143
general           280
head and neck     192
kidney            71
liver             53
lung              459
lymph nodes       248
breast            0
bone marrow       94
ovary             0
pancreas          20
prostate          86
skin              29
stomach           109
T cells           557
Thyroid           0
uterus            63

Table 6 - P values and ratios for expression in cancerous tissue
Name of Tissue    P1       P2       SP1      R3    SP2      R4
adrenal           4.9e-01  5.9e-01  4.7e-03  1.1   2.9e-02  0.8
bladder           3.7e-01  5.6e-01  3.7e-02  1.3   2.5e-01  0.9
bone              6.6e-01  6.7e-01  3.4e-01  0.6   9.1e-01  0.4
brain             1.8e-01  2.9e-01  4.3e-03  3.8   2.8e-02  2.5
colon             4.4e-01  5.2e-01  6.1e-01  0.9   8.1e-01  0.7
epithelial        2.5e-02  1.6e-01  1.2e-05  1.6   9.8e-02  1.1
general           1.3e-02  1.6e-01  1        0.8   1        0.6
head and neck     3.4e-01  3.3e-01  1        0.4   9.4e-01  0.5
kidney            7.7e-01  8.5e-01  1.4e-01  1.3   4.2e-01  0.9
liver             8.3e-01  7.6e-01  1        0.5   1        0.6
lung              5.7e-01  8.3e-01  3.5e-01  0.8   9.8e-01  0.5
lymph nodes       5.7e-01  6.6e-01  7.6e-01  0.8   3.6e-02  1.1
breast            5.0e-02  1.3e-01  2.5e-03  6.5   4.4e-02  3.6
bone marrow       6.2e-01  7.8e-01  1        0.3   9.5e-01  0.5
ovary             2.2e-01  9.4e-02  3.2e-03  6.1   8.3e-03  5.3
pancreas          9.0e-02  1.6e-02  1.1e-03  4.0   7.9e-04  4.2
prostate          8.1e-01  8.0e-01  5.7e-01  0.9   4.1e-01  0.9
skin              1.6e-01  1.2e-01  2.3e-02  6.0   1.0e-02  2.2
stomach           5.5e-01  7.4e-01  9.4e-01  0.6   4.9e-01  1.0
T cells           1        6.7e-01  6.9e-01  1.0   9.8e-01  0.5
Thyroid           2.3e-01  2.3e-01  5.9e-02  2.5   5.9e-02  2.5
uterus            7.4e-02  4.7e-02  2.2e-02  2.0   6.2e-02  1.7

As noted above, cluster T59832 features 5 transcript(s), which were listed in Table 1 above. These transcript(s) encode protein(s) which are variant(s) of protein Gamma-interferon inducible lysosomal thiol reductase precursor. A description of each variant protein according to the present invention is now provided.

Variant protein T59832_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T59832_T6.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein T59832_P5 is encoded by the following transcript(s): T59832_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T59832_T6 is shown in bold; this coding portion starts at position 149 and ends at position 715. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 7 - Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
61      C -> T    Yes
148     G -> T    Yes
212     -> A      No
241     G -> T    No
244     A -> G    Yes
962     C -> T    Yes
1074    G -> A    Yes
1248    G -> C    Yes
1441    G -> A    Yes
1443    G -> A    No
1505    G -> C    Yes
1651    T ->      No
1652    T -> G    Yes
1717    C -> A    No
1722    C ->      No
1722    C -> G    No
1752    A -> G    Yes
1817    A -> G    Yes
1854    C ->      No
1854    C -> A    No
1871    C -> T    Yes
1886    T -> G    No
1906    G -> A    No
1906    G -> C    No
1942    C ->      No
1942    C -> T    No
1971    C ->      No
1986    G -> A    No
2001    G -> T    Yes
2008    A ->      No
2030    -> T      No
2031    C -> T    No
2050    C ->      No
2056    A -> G    Yes
2068    G -> A    Yes
2111    A -> C    Yes
2136    A -> C    Yes
2144    T -> C    Yes

Variant protein T59832_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T59832_T8. An alignment is given to the known protein (Gamma-interferon inducible lysosomal thiol reductase precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T59832_P7 and GILT_HUMAN:

1. An isolated chimeric polypeptide encoding for T59832_P7, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 12-223 of GILT_HUMAN, which also corresponds to amino acids 1-212 of T59832_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRIFLALSLTLIVPWSQGWTRQRDQR corresponding to amino acids 213-238 of T59832_P7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of T59832_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTLIVPWSQGWTRQRDQR in T59832_P7.
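Each comparison report pairs a range of GILT_HUMAN residues with a range of variant residues (for T59832_P7, GILT_HUMAN 12-223 with T59832_P7 1-212). The following is a minimal sketch of using such paired ranges to carry a position on the known protein over to the variant, assuming each pair is an ungapped one-to-one block, as the equal-length ranges imply. The block tuples and the function name known_to_variant are illustrative and are not part of the application. With the single P7 block, known-protein positions 119, 106 and 74 map to 108, 95 and 63, the same shift of 11 residues that appears in the glycosylation-site table for this variant.

```python
from typing import Optional, Sequence, Tuple

# (variant_start, variant_end, known_start), 1-based and inclusive.
Block = Tuple[int, int, int]


def known_to_variant(known_pos: int, blocks: Sequence[Block]) -> Optional[int]:
    """Return the variant residue aligned to known_pos, or None if the
    position falls outside every aligned block (e.g. in a skipped region)."""
    for variant_start, variant_end, known_start in blocks:
        known_end = known_start + (variant_end - variant_start)
        if known_start <= known_pos <= known_end:
            return variant_start + (known_pos - known_start)
    return None


# Blocks transcribed from the comparison reports in this section.
P7_BLOCKS = [(1, 212, 12)]
P12_BLOCKS = [(1, 130, 12), (131, 219, 173)]
P18_BLOCKS = [(1, 44, 12), (45, 133, 173)]

print([known_to_variant(p, P7_BLOCKS) for p in (119, 106, 74)])   # [108, 95, 63]
print([known_to_variant(p, P18_BLOCKS) for p in (119, 106, 74)])  # [None, None, None]
```

The P18 result agrees with the glycosylation table for T59832_P18, where all three sites are reported as absent, because GILT_HUMAN residues 56-172 are not represented in that variant.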
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide.

Variant protein T59832_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 8 - Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
76     R -> Q    Yes
77     A -> T    No
146    I ->      No
146    I -> M    Yes
168    P -> Q    No
170    L ->      No
170    L -> V    No
180    M -> V    Yes

The glycosylation sites of variant protein T59832_P7, as compared to the known protein Gamma-interferon inducible lysosomal thiol reductase precursor, are described in Table 9 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column gives the corresponding position on the variant protein).

Table 9 - Glycosylation site(s)
Position(s) on known amino acid sequence    Present in variant protein?    Position in variant protein
119    yes    108
106    yes    95
74     yes    63

Variant protein T59832_P7 is encoded by the following transcript(s): T59832_T8, for which the sequence(s) is/are given at the end of the application.
The coding portion of transcript T59832_T8 is shown in bold; this coding portion starts at position 149 and ends at position 862.

The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 10 - Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
61      C -> T    Yes
148     G -> T    Yes
212     -> A      No
241     G -> T    No
244     A -> G    Yes
375     G -> A    Yes
377     G -> A    No
439     G -> C    Yes
585     T ->      No
586     T -> G    Yes
651     C -> A    No
656     C ->      No
656     C -> G    No
686     A -> G    Yes
751     A -> G    Yes
1004    T -> G    Yes
1206    C ->      No
1206    C -> A    No
1223    C -> T    Yes
1238    T -> G    No
1258    G -> A    No
1258    G -> C    No
1294    C ->      No
1294    C -> T    No
1323    C ->      No
1338    G -> A    No
1353    G -> T    Yes
1360    A ->      No
1382    -> T      No
1383    C -> T    No
1402    C ->      No
1408    A -> G    Yes
1420    G -> A    Yes
1463    A -> C    Yes
1488    A -> C    Yes
1496    T -> C    Yes

Variant protein T59832_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T59832_T11. An alignment is given to the known protein (Gamma-interferon inducible lysosomal thiol reductase precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T59832_P9 and GILT_HUMAN:

1. An isolated chimeric polypeptide encoding for T59832_P9, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 12-214 of GILT_HUMAN, which also corresponds to amino acids 1-203 of T59832_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204-244 of T59832_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
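The localization rule stated above (secreted when both signal-peptide predictors call a signal peptide and neither trans-membrane predictor calls a trans-membrane region) can be written as a small predicate. This sketch assumes the per-program calls have already been obtained as booleans; it does not run SignalP or any other program, and the PredictorCalls type and the is_predicted_secreted name are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PredictorCalls:
    """Boolean calls from each prediction program (hypothetical container)."""
    signal_peptide: Sequence[bool]   # one entry per signal-peptide predictor
    transmembrane: Sequence[bool]    # one entry per trans-membrane predictor


def is_predicted_secreted(calls: PredictorCalls) -> bool:
    """Secreted only if every signal-peptide predictor reports a signal
    peptide and no trans-membrane predictor reports a trans-membrane region."""
    has_signal = len(calls.signal_peptide) > 0 and all(calls.signal_peptide)
    has_tm = any(calls.transmembrane)
    return has_signal and not has_tm


print(is_predicted_secreted(PredictorCalls([True, True], [False, False])))  # True
print(is_predicted_secreted(PredictorCalls([True, True], [True, False])))   # False
```

For T59832_P7 the text cites only the signal-peptide evidence, so applying this predicate there would also require the trans-membrane calls.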
Variant protein T59832_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 11 - Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
76     R -> Q    Yes
77     A -> T    No
146    I ->      No
146    I -> M    Yes
168    P -> Q    No
170    L ->      No
170    L -> V    No
180    M -> V    Yes
204    N ->      No
204    N -> K    No
210    P -> L    Yes
215    L -> W    No
222    A -> T    No
222    A -> P    No
234    P ->      No
234    P -> S    No
243    G ->      No

The glycosylation sites of variant protein T59832_P9, as compared to the known protein Gamma-interferon inducible lysosomal thiol reductase precursor, are described in Table 12 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column gives the corresponding position on the variant protein).

Table 12 - Glycosylation site(s)
Position(s) on known amino acid sequence    Present in variant protein?    Position in variant protein
119    yes    108
106    yes    95
74     yes    63

Variant protein T59832_P9 is encoded by the following transcript(s): T59832_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T59832_T11 is shown in bold; this coding portion starts at position 149 and ends at position 880.
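The coding-portion coordinates given for each transcript can be checked against the variant protein lengths with simple codon arithmetic. This sketch assumes 1-based inclusive coordinates and assumes that the stated end position is the last base of the final sense codon (the stop codon is not counted). That assumption is inferred rather than stated, but it reproduces the lengths given in this section: 149-862 gives 238 residues for T59832_P7, 149-880 gives 244 for T59832_P9, 149-805 gives 219 for T59832_P12 and 149-547 gives 133 for T59832_P18. The same mapping places nucleotide 375 of T59832_T8 at the second base of codon 76, matching the amino acid SNP 76 R -> Q of T59832_P7. The function names are hypothetical.

```python
from typing import Tuple


def encoded_residues(cds_start: int, cds_end: int) -> int:
    """Number of codons in a coding portion given as 1-based, inclusive
    transcript coordinates (assumed here to exclude the stop codon)."""
    length = cds_end - cds_start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"{cds_start}-{cds_end} is not a whole number of codons")
    return length // 3


def nucleotide_to_residue(nt_pos: int, cds_start: int, cds_end: int) -> Tuple[int, int]:
    """Map a transcript position inside the coding portion to
    (residue number, base position 1-3 within that codon), both 1-based."""
    if not cds_start <= nt_pos <= cds_end:
        raise ValueError(f"position {nt_pos} lies outside {cds_start}-{cds_end}")
    offset = nt_pos - cds_start
    return offset // 3 + 1, offset % 3 + 1


print(encoded_residues(149, 880))              # 244 (T59832_T11 -> T59832_P9)
print(nucleotide_to_residue(760, 149, 880))    # (204, 3)
```

For example, SNP 760 C -> A on T59832_T11 falls at the third base of codon 204, which is where Table 11 lists 204 N -> K for T59832_P9.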
The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 13 - Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
61      C -> T    Yes
148     G -> T    Yes
212     -> A      No
241     G -> T    No
244     A -> G    Yes
375     G -> A    Yes
377     G -> A    No
439     G -> C    Yes
585     T ->      No
586     T -> G    Yes
651     C -> A    No
656     C ->      No
656     C -> G    No
686     A -> G    Yes
751     A -> G    Yes
760     C ->      No
760     C -> A    No
777     C -> T    Yes
792     T -> G    No
812     G -> A    No
812     G -> C    No
848     C ->      No
848     C -> T    No
877     C ->      No
892     G -> A    No
907     G -> T    Yes
914     A ->      No
936     -> T      No
937     C -> T    No
956     C ->      No
962     A -> G    Yes
974     G -> A    Yes
1017    A -> C    Yes
1042    A -> C    Yes
1050    T -> C    Yes

Variant protein T59832_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T59832_T15. An alignment is given to the known protein (Gamma-interferon inducible lysosomal thiol reductase precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T59832_P12 and GILT_HUMAN:

1. An isolated chimeric polypeptide encoding for T59832_P12, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVE corresponding to amino acids 12-141 of GILT_HUMAN, which also corresponds to amino acids 1-130 of T59832_P12, and a second amino acid sequence being at least 90% homologous to CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 173-261 of GILT_HUMAN, which also corresponds to amino acids 131-219 of T59832_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of T59832_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130 and ending at any of amino acid numbers 131+((n-2)-x), in which x varies from 0 to n-2.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted.
The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein T59832_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 14 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 14 - Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
76     R -> Q    Yes
77     A -> T    No
137    P -> Q    No
139    L ->      No
139    L -> V    No
149    M -> V    Yes
183    P ->      No
183    P -> T    No
200    G -> A    No
200    G -> D    No
212    S ->      No
212    S -> F    No

The glycosylation sites of variant protein T59832_P12, as compared to the known protein Gamma-interferon inducible lysosomal thiol reductase precursor, are described in Table 15 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column gives the corresponding position on the variant protein).

Table 15 - Glycosylation site(s)
Position(s) on known amino acid sequence    Present in variant protein?    Position in variant protein
119    yes    108
106    yes    95
74     yes    63

Variant protein T59832_P12 is encoded by the following transcript(s): T59832_T15, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T59832_T15 is shown in bold; this coding portion starts at position 149 and ends at position 805.
The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 16 - Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
61     C -> T    Yes
148    G -> T    Yes
212    -> A      No
241    G -> T    No
244    A -> G    Yes
375    G -> A    Yes
377    G -> A    No
439    G -> C    Yes
558    C -> A    No
563    C ->      No
563    C -> G    No
593    A -> G    Yes
658    A -> G    Yes
695    C ->      No
695    C -> A    No
712    C -> T    Yes
727    T -> G    No
747    G -> A    No
747    G -> C    No
783    C ->      No
783    C -> T    No
812    C ->      No
827    G -> A    No
842    G -> T    Yes
849    A ->      No
871    -> T      No
872    C -> T    No
891    C ->      No
897    A -> G    Yes
909    G -> A    Yes
952    A -> C    Yes
977    A -> C    Yes
985    T -> C    Yes

Variant protein T59832_P18 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T59832_T22. An alignment is given to the known protein (Gamma-interferon inducible lysosomal thiol reductase precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T59832_P18 and GILT_HUMAN:

1. An isolated chimeric polypeptide encoding for T59832_P18, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 12-55 of GILT_HUMAN, which also corresponds to amino acids 1-44 of T59832_P18, and a second amino acid sequence being at least 90% homologous to CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 173-261 of GILT_HUMAN, which also corresponds to amino acids 45-133 of T59832_P18, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of T59832_P18, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44 and ending at any of amino acid numbers 45+((n-2)-x), in which x varies from 0 to n-2.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted.
The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein T59832_P18 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P18 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 17 - Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
51     P -> Q    No
53     L -> V    No
53     L ->      No
63     M -> V    Yes
97     P ->      No
97     P -> T    No
114    G -> A    No
114    G -> D    No
126    S -> F    No
126    S ->      No

The glycosylation sites of variant protein T59832_P18, as compared to the known protein Gamma-interferon inducible lysosomal thiol reductase precursor, are described in Table 18 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

Table 18 - Glycosylation site(s)
Position(s) on known amino acid sequence    Present in variant protein?
119    no
106    no
74     no

Variant protein T59832_P18 is encoded by the following transcript(s): T59832_T22, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T59832_T22 is shown in bold; this coding portion starts at position 149 and ends at position 547.
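The edge-portion definition given for T59832_P18 (and the corresponding one for T59832_P12, with the junction between residues 130 and 131) enumerates every window of length n that covers both residues at the junction. For each x from 0 to n-2, the window starts at J-x and ends at J+1+((n-2)-x), where J is the last residue of the first segment (44 for T59832_P18, giving the pair KC). The following is a minimal sketch of that enumeration. The function names are hypothetical, and windows running past either end of the protein are dropped when peptides are extracted, a case the definition does not address explicitly.

```python
from typing import List, Tuple


def junction_windows(junction: int, n: int) -> List[Tuple[int, int]]:
    """(start, end) pairs, 1-based and inclusive, of every length-n window
    that contains residues `junction` and `junction + 1`."""
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    # x runs from 0 to n-2; the window starts x residues before the junction.
    return [(junction - x, junction + 1 + ((n - 2) - x)) for x in range(n - 1)]


def junction_peptides(seq: str, junction: int, n: int) -> List[str]:
    """Peptides for the windows above, skipping any that fall off either end."""
    return [seq[start - 1:end]
            for start, end in junction_windows(junction, n)
            if start >= 1 and end <= len(seq)]


# T59832_P18 = GILT_HUMAN 12-55 followed directly by GILT_HUMAN 173-261.
FIRST = "MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK"
SECOND = ("CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLED"
          "QTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK")
P18 = FIRST + SECOND

windows = junction_windows(len(FIRST), 10)
print(windows[0], windows[-1])                  # (44, 53) (36, 45)
print(junction_peptides(P18, len(FIRST), 10)[0])  # KCLQLYAPGL
```

Every window has length n, because its end minus its start plus one equals n for every x, so the enumeration yields exactly n-1 distinct peptides when none of them runs off the sequence.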
The transcript also has the following SNPs as listed in Table 19 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P18 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 19 - Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
61     C -> T    Yes
148    G -> T    Yes
212    -> A      No
241    G -> T    No
244    A -> G    Yes
300    C -> A    No
305    C ->      No
305    C -> G    No
335    A -> G    Yes
400    A -> G    Yes
437    C ->      No
437    C -> A    No
454    C -> T    Yes
469    T -> G    No
489    G -> A    No
489    G -> C    No
525    C ->      No
525    C -> T    No
554    C ->      No
569    G -> A    No
584    G -> T    Yes
591    A ->      No
613    -> T      No
614    C -> T    No
633    C ->      No
639    A -> G    Yes
651    G -> A    Yes
694    A -> C    Yes
719    A -> C    Yes
727    T -> C    Yes

As noted above, cluster T59832 features 30 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster T59832_node_1 according to the present invention is supported by 62 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T59832_T6          1                            123
T59832_T8          1                            123
T59832_T11         1                            123
T59832_T15         1                            123
T59832_T22         1                            123

Segment cluster T59832_node_7 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T59832_T6          281                          1346

Segment cluster T59832_node_29 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T8. Table 22 below describes the starting and ending position of this segment on each transcript.

Table 22 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T59832_T8          785                          1202

Segment cluster T59832_node_39 according to the present invention is supported by 195 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 23 below describes the starting and ending position of this segment on each transcript.

Table 23 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T59832_T6          2125                         2178
T59832_T8          1477                         1530
T59832_T11         1031                         1084
T59832_T15         966                          1019
T59832_T22         708                          761

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster T59832_node_2 according to the present invention is supported by 258 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T59832_T6          124                          154
T59832_T8          124                          154
T59832_T11         124                          154
T59832_T15         124                          154
T59832_T22         124                          154

Segment cluster T59832_node_3 according to the present invention can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 25 below describes the starting and ending position of this segment on each transcript.

Table 25 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T59832_T6          155                          172
T59832_T8          155                          172
T59832_T11         155                          172
T59832_T15         155                          172
T59832_T22         155                          172

Segment cluster T59832_node_4 according to the present invention is supported by 296 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 26 below describes the starting and ending position of this segment on each transcript.

Table 26 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T59832_T6          173                          223
T59832_T8          173                          223
T59832_T11         173                          223
T59832_T15         173                          223
T59832_T22         173                          223

Segment cluster T59832_node_5 according to the present invention is supported by 305 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T59832_T6          224                          259
T59832_T8          224                          259
T59832_T11         224                          259
T59832_T15         224                          259
T59832_T22         224                          259

Segment cluster T59832_node_6 according to the present invention can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 28 below describes the starting and ending position of this segment on each transcript.

Table 28 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T59832_T6          260                          280
T59832_T8          260                          280
T59832_T11         260                          280
T59832_T15         260                          280
T59832_T22         260                          280

Segment cluster T59832_node_8 according to the present invention can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11 and T59832_T15. Table 29 below describes the starting and ending position of this segment on each transcript.

Table 29 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T59832_T6          1347                         1367
T59832_T8          281                          301
T59832_T11         281                          301
T59832_T15         281                          301

Segment cluster T59832_node_9 according to the present invention is supported by 330 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11 and T59832_T15. Table 30 below describes the starting and ending position of this segment on each transcript.

Table 30 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T59832_T6          1368                         1403
T59832_T8          302                          337
T59832_T11         302                          337
T59832_T15         302                          337

Segment cluster T59832_node_10 according to the present invention is supported by 332 libraries. The number of libraries was determined as previously described.
This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11 and T59832_T15. Table 31 below describes the starting and ending position of this segment on each transcript.

Table 31 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T59832_T6 | 1404 | 1448
T59832_T8 | 338 | 382
T59832_T11 | 338 | 382
T59832_T15 | 338 | 382

Segment cluster T59832_node_11 according to the present invention is supported by 306 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11 and T59832_T15. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T59832_T6 | 1449 | 1483
T59832_T8 | 383 | 417
T59832_T11 | 383 | 417
T59832_T15 | 383 | 417

Segment cluster T59832_node_12 according to the present invention is supported by 280 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11 and T59832_T15. Table 33 below describes the starting and ending position of this segment on each transcript.

Table 33 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T59832_T6 | 1484 | 1529
T59832_T8 | 418 | 463
T59832_T11 | 418 | 463
T59832_T15 | 418 | 463

Segment cluster T59832_node_14 according to the present invention is supported by 280 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11 and T59832_T15. Table 34 below describes the starting and ending position of this segment on each transcript.

Table 34 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T59832_T6 | 1530 | 1568
T59832_T8 | 464 | 502
T59832_T11 | 464 | 502
T59832_T15 | 464 | 502

Segment cluster T59832_node_16 according to the present invention is supported by 287 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11 and T59832_T15. Table 35 below describes the starting and ending position of this segment on each transcript.
Table 35 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T59832_T6 | 1569 | 1604
T59832_T8 | 503 | 538
T59832_T11 | 503 | 538
T59832_T15 | 503 | 538

Segment cluster T59832_node_19 according to the present invention is supported by 300 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8 and T59832_T11. Table 36 below describes the starting and ending position of this segment on each transcript.

Table 36 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T59832_T6 | 1605 | 1643
T59832_T8 | 539 | 577
T59832_T11 | 539 | 577

Segment cluster T59832_node_20 according to the present invention is supported by 318 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8 and T59832_T11. Table 37 below describes the starting and ending position of this segment on each transcript.

Table 37 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T59832_T6 | 1644 | 1697
T59832_T8 | 578 | 631
T59832_T11 | 578 | 631

Segment cluster T59832_node_25 according to the present invention can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 38 below describes the starting and ending position of this segment on each transcript.

Table 38 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T59832_T6 | 1698 | 1719
T59832_T8 | 632 | 653
T59832_T11 | 632 | 653
T59832_T15 | 539 | 560
T59832_T22 | 281 | 302

Segment cluster T59832_node_26 according to the present invention is supported by 342 libraries. The number of libraries was determined as previously described.
This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 39 below describes the starting and ending position of this segment on each transcript.

Table 39 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T59832_T6 | 1720 | 1783
T59832_T8 | 654 | 717
T59832_T11 | 654 | 717
T59832_T15 | 561 | 624
T59832_T22 | 303 | 366

Segment cluster T59832_node_27 according to the present invention is supported by 314 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 40 below describes the starting and ending position of this segment on each transcript.

Table 40 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T59832_T6 | 1784 | 1822
T59832_T8 | 718 | 756
T59832_T11 | 718 | 756
T59832_T15 | 625 | 663
T59832_T22 | 367 | 405

Segment cluster T59832_node_28 according to the present invention is supported by 284 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T15 and T59832_T22. Table 41 below describes the starting and ending position of this segment on each transcript.

Table 41 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T59832_T6 | 1823 | 1850
T59832_T8 | 757 | 784
T59832_T15 | 664 | 691
T59832_T22 | 406 | 433

Segment cluster T59832_node_30 according to the present invention can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 42 below describes the starting and ending position of this segment on each transcript.
Table 42 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T59832_T6 | 1851 | 1854
T59832_T8 | 1203 | 1206
T59832_T11 | 757 | 760
T59832_T15 | 692 | 695
T59832_T22 | 434 | 437

Segment cluster T59832_node_31 according to the present invention can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 43 below describes the starting and ending position of this segment on each transcript.

Table 43 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T59832_T6 | 1855 | 1874
T59832_T8 | 1207 | 1226
T59832_T11 | 761 | 780
T59832_T15 | 696 | 715
T59832_T22 | 438 | 457

Segment cluster T59832_node_32 according to the present invention is supported by 287 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 44 below describes the starting and ending position of this segment on each transcript.

Table 44 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T59832_T6 | 1875 | 1904
T59832_T8 | 1227 | 1256
T59832_T11 | 781 | 810
T59832_T15 | 716 | 745
T59832_T22 | 458 | 487

Segment cluster T59832_node_34 according to the present invention can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 45 below describes the starting and ending position of this segment on each transcript.

Table 45 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T59832_T6 | 1905 | 1926
T59832_T8 | 1257 | 1278
T59832_T11 | 811 | 832
T59832_T15 | 746 | 767
T59832_T22 | 488 | 509

Segment cluster T59832_node_35 according to the present invention can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22.
Table 46 below describes the starting and ending position of this segment on each transcript.

Table 46 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T59832_T6 | 1927 | 1930
T59832_T8 | 1279 | 1282
T59832_T11 | 833 | 836
T59832_T15 | 768 | 771
T59832_T22 | 510 | 513

Segment cluster T59832_node_36 according to the present invention can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 47 below describes the starting and ending position of this segment on each transcript.

Table 47 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T59832_T6 | 1931 | 1939
T59832_T8 | 1283 | 1291
T59832_T11 | 837 | 845
T59832_T15 | 772 | 780
T59832_T22 | 514 | 522

Segment cluster T59832_node_37 according to the present invention is supported by 300 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 48 below describes the starting and ending position of this segment on each transcript.

Table 48 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T59832_T6 | 1940 | 2039
T59832_T8 | 1292 | 1391
T59832_T11 | 846 | 945
T59832_T15 | 781 | 880
T59832_T22 | 523 | 622

Segment cluster T59832_node_38 according to the present invention is supported by 247 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 49 below describes the starting and ending position of this segment on each transcript.
Table 49 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
T59832_T6 | 2040 | 2124
T59832_T8 | 1392 | 1476
T59832_T11 | 946 | 1030
T59832_T15 | 881 | 965
T59832_T22 | 623 | 707

Variant protein alignment to the previously known protein:

Sequence name: GILT_HUMAN

Sequence documentation:

Alignment of: T59832_P7 x GILT_HUMAN

Alignment segment 1/1:
Quality: 2110.00
Escore: 0
Matching length: 212
Total length: 212
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:
  1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 12 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 61

 51 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 62 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 111

101 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
112 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 161

151 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
162 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 211

201 PHEYVPWVTVNG 212
    ||||||||||||
212 PHEYVPWVTVNG 223

Sequence name: GILT_HUMAN

Sequence documentation:

Alignment of: T59832_P9 x GILT_HUMAN

Alignment segment 1/1:
Quality: 2016.00
Escore: 0
Matching length: 203
Total length: 203
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:
  1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 12 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 61

 51 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 62 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 111

101 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
112 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 161

151 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
162 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 211

201 PHE 203
    |||
212 PHE 214

Sequence name: GILT_HUMAN

Sequence documentation:

Alignment of: T59832_P12 x GILT_HUMAN

Alignment segment 1/1:
Quality: 2084.00
Escore: 0
Matching length: 219
Total length: 250
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 87.60
Total Percent Identity: 87.60
Gaps: 1

Alignment:
  1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 12 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 61

 51 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 62 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 111

101 YGNAQEQNVSGRWEFKCQHGEEECKFNKVE.................... 130
    ||||||||||||||||||||||||||||||
112 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 161

131 ...........CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 169
               |||||||||||||||||||||||||||||||||||||||
162 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 211

170 PHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK 219
    ||||||||||||||||||||||||||||||||||||||||||||||||||
212 PHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK 261

Sequence name: GILT_HUMAN

Sequence documentation:

Alignment of: T59832_P18 x GILT_HUMAN

Alignment segment 1/1:
Quality: 1222.00
Escore: 0
Matching length: 133
Total length: 250
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 53.20
Total Percent Identity: 53.20
Gaps: 1

Alignment:
  1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK...... 44
    ||||||||||||||||||||||||||||||||||||||||||||
 12 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 61

 44 .................................................. 44
 62 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 111

 44 .................................................. 44
112 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 161

 45 ...........CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 83
               |||||||||||||||||||||||||||||||||||||||
162 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 211

 84 PHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK 133
    ||||||||||||||||||||||||||||||||||||||||||||||||||
212 PHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK 261

Expression of Homo sapiens interferon, gamma-inducible protein 30 (IFI30) T59832 transcripts which are detectable by amplicon as depicted in sequence name T59832 junc6-25-26 in normal and cancerous ovary tissues

Expression of Homo sapiens interferon, gamma-inducible protein 30 (IFI30) transcripts detectable by or according to junc6-25-26, T59832 junc6-25-26 amplicon(s) and primers T59832 junc6-25-26F and T59832 junc6-25-26R was measured by real-time PCR. In parallel, the expression of four housekeeping genes was measured similarly: PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and GAPDH (GenBank Accession No. BC026907; amplicon - GAPDH-amplicon). For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48 and 71, Table 1, above) to obtain a value of fold differential expression for each sample relative to the median of the normal PM samples.

In one experiment, no differential expression was observed in the cancerous samples relative to the normal PM samples, although this may be due to a problem with that specific experiment.
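The normalization described above has two steps. First, the target quantity is divided by the geometric mean of the four housekeeping quantities. Second, the result is divided by the median of the normal PM samples. The Python sketch below illustrates this, assuming each sample is represented by its target-amplicon quantity plus its PBGD, HPRT1, SDHA and GAPDH quantities. The function names, sample identifiers and numbers are hypothetical placeholders. The text does not report the underlying real-time PCR quantities, and this is not presented as the authors' analysis code.

```python
from statistics import geometric_mean, median


def normalize(target_qty: float, housekeeping_qtys: list) -> float:
    """Divide the target amplicon quantity by the geometric mean of the housekeeping quantities."""
    return target_qty / geometric_mean(housekeeping_qtys)


def fold_differential_expression(samples: dict, normal_pm_ids: list) -> dict:
    """Return each sample's normalized quantity divided by the median of the normal PM samples.

    samples maps a sample id to (target quantity, [housekeeping quantities]).
    """
    normalized = {sid: normalize(t, hk) for sid, (t, hk) in samples.items()}
    baseline = median(normalized[sid] for sid in normal_pm_ids)
    return {sid: q / baseline for sid, q in normalized.items()}


if __name__ == "__main__":
    # Hypothetical quantities: three normal PM samples and one tumor sample.
    samples = {
        "PM_a": (2.0, [1.0, 1.0, 1.0, 1.0]),
        "PM_b": (4.0, [1.0, 1.0, 1.0, 1.0]),
        "PM_c": (6.0, [1.0, 1.0, 1.0, 1.0]),
        "tumor_a": (16.0, [2.0, 2.0, 2.0, 2.0]),
    }
    folds = fold_differential_expression(samples, ["PM_a", "PM_b", "PM_c"])
    for sid, fold in folds.items():
        print(sid, round(fold, 2))
```

With these toy numbers, the PM median of the normalized quantities is 4, so tumor_a comes out at about 2-fold. The geometric mean limits the influence of a single highly expressed housekeeping gene on the reference. The median makes the PM baseline less sensitive to one outlying normal sample.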
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: T59832 junc6-25-26F forward primer; and T59832 junc6-25-26R reverse primer.

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: T59832 junc6-25-26.

Forward primer T59832 junc6-25-26F (SEQ ID NO: 1008): CCACCAGTTAACTACAAGTGCCTG
Reverse primer T59832 junc6-25-26R (SEQ ID NO: 1009): GCGTGCATGAGCTGCATG
Amplicon T59832 junc6-25-26 (SEQ ID NO: 1010): CCACCAGTTAACTACAAGTGCCTGCAGCTCTACGCCCCAGGGCTGTCGCCAGACACTATCATGGAGTGTGCAATGGGGGACCGCGGCATGCAGCTCATGCACGC

DESCRIPTION FOR CLUSTER HSCP2

Cluster HSCP2 features 12 transcript(s) and 50 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Transcript Name | SEQ ID NO:
HSCP2_PEA_1_T4 | 783
HSCP2_PEA_1_T13 | 784
HSCP2_PEA_1_T19 | 785
HSCP2_PEA_1_T20 | 786
HSCP2_PEA_1_T22 | 787
HSCP2_PEA_1_T23 | 788
HSCP2_PEA_1_T25 | 789
HSCP2_PEA_1_T31 | 790
HSCP2_PEA_1_T33 | 791
HSCP2_PEA_1_T34 | 792
HSCP2_PEA_1_T45 | 793
HSCP2_PEA_1_T50 | 794

Table 2 - Segments of interest
Segment Name | SEQ ID NO:
HSCP2_PEA_1_node_0 | 795
HSCP2_PEA_1_node_3 | 796
HSCP2_PEA_1_node_6 | 797
HSCP2_PEA_1_node_8 | 798
HSCP2_PEA_1_node_10 | 799
HSCP2_PEA_1_node_14 | 800
HSCP2_PEA_1_node_23 | 801
HSCP2_PEA_1_node_26 | 802
HSCP2_PEA_1_node_29 | 803
HSCP2_PEA_1_node_31 | 804
HSCP2_PEA_1_node_32 | 805
HSCP2_PEA_1_node_34 | 806
HSCP2_PEA_1_node_52 | 807
HSCP2_PEA_1_node_58 | 808
HSCP2_PEA_1_node_72 | 809
HSCP2_PEA_1_node_73 | 810
HSCP2_PEA_1_node_74 | 811
HSCP2_PEA_1_node_76 | 812
HSCP2_PEA_1_node_78 | 813
HSCP2_PEA_1_node_80 | 814
HSCP2_PEA_1_node_84 | 815
HSCP2_PEA_1_node_4 | 816
HSCP2_PEA_1_node_7 | 817
HSCP2_PEA_1_node_13 | 818
HSCP2_PEA_1_node_15 | 819
HSCP2_PEA_1_node_16 | 820
HSCP2_PEA_1_node_18 | 821
HSCP2_PEA_1_node_20 | 822
HSCP2_PEA_1_node_21 | 823
HSCP2_PEA_1_node_37 | 824
HSCP2_PEA_1_node_38 | 825
HSCP2_PEA_1_node_39 | 826
HSCP2_PEA_1_node_41 | 827
HSCP2_PEA_1_node_42 | 828
HSCP2_PEA_1_node_46 | 829
HSCP2_PEA_1_node_47 | 830
HSCP2_PEA_1_node_50 | 831
HSCP2_PEA_1_node_51 | 832
HSCP2_PEA_1_node_55 | 833
HSCP2_PEA_1_node_56 | 834
HSCP2_PEA_1_node_60 | 835
HSCP2_PEA_1_node_61 | 836
HSCP2_PEA_1_node_67 | 837
HSCP2_PEA_1_node_68 | 838
HSCP2_PEA_1_node_69 | 839
HSCP2_PEA_1_node_70 | 840
HSCP2_PEA_1_node_75 | 841
HSCP2_PEA_1_node_77 | 842
HSCP2_PEA_1_node_79 | 843
HSCP2_PEA_1_node_82 | 844

Table 3 - Proteins of interest
Protein Name | SEQ ID NO: | Corresponding Transcript(s)
HSCP2_PEA_1_P4 | 846 | HSCP2_PEA_1_T4; HSCP2_PEA_1_T50
HSCP2_PEA_1_P8 | 847 | HSCP2_PEA_1_T13
HSCP2_PEA_1_P14 | 848 | HSCP2_PEA_1_T19
HSCP2_PEA_1_P15 | 849 | HSCP2_PEA_1_T20
HSCP2_PEA_1_P2 | 850 | HSCP2_PEA_1_T22
HSCP2_PEA_1_P16 | 851 | HSCP2_PEA_1_T23
HSCP2_PEA_1_P6 | 852 | HSCP2_PEA_1_T25
HSCP2_PEA_1_P22 | 853 | HSCP2_PEA_1_T31
HSCP2_PEA_1_P24 | 854 | HSCP2_PEA_1_T33
HSCP2_PEA_1_P25 | 855 | HSCP2_PEA_1_T34
HSCP2_PEA_1_P33 | 856 | HSCP2_PEA_1_T45

These sequences are variants of the known protein Ceruloplasmin precursor (SwissProt accession identifier CERU_HUMAN; also known by the synonyms EC 1.16.3.1 and Ferroxidase), SEQ ID NO: 845, referred to herein as the previously known protein.

Protein Ceruloplasmin precursor is known or believed to have the following function(s): Ceruloplasmin is a blue, copper-binding (6-7 atoms per molecule) glycoprotein found in plasma. Four possible functions are ferroxidase activity, amine oxidase activity, copper transport and homeostasis, and superoxide dismutase activity. The sequence for protein Ceruloplasmin precursor is given at the end of the application, as "Ceruloplasmin precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.

Table 4 - Amino acid mutations for Known Protein
SNP position(s) on amino acid sequence | Comment
79 | T -> G. /FTId=VAR_001043.
449 | L -> G. /FTId=VAR_001044.
1060 | E -> EGEYP

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: ion transport; copper ion transport; copper homeostasis; iron homeostasis, which are annotation(s) related to Biological Process; ferroxidase; copper ion transporter; copper binding; oxidoreductase, which are annotation(s) related to Molecular Function; and extracellular space, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or LocusLink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.

Cluster HSCP2 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer.
Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left-hand column of the table and the numbers on the y-axis of Figure 36 refer to weighted expression of ESTs in each category, as "parts per million" (the ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, expressed in parts per million).

Overall, the following results were obtained as shown with regard to the histograms in Figure 36 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: kidney malignant tumors and ovarian carcinoma.

Table 5 - Normal tissue distribution
Name of Tissue | Number
bladder | 0
bone | 9
brain | 48
epithelial | 100
general | 58
head and neck | 0
kidney | 4
liver | 1818
lung | 96
lymph nodes | 18
breast | 43
bone marrow | 0
ovary | 0
pancreas | 10
prostate | 6
thyroid | 0
uterus | 113

Table 6 - P values and ratios for expression in cancerous tissue
Name of Tissue | P1 | P2 | SP1 | R3 | SP2 | R4
bladder | 5.4e-01 | 6.0e-01 | 5.6e-01 | 1.8 | 6.8e-01 | 1.5
bone | 6.3e-01 | 8.3e-01 | 1 | 1.0 | 7.0e-01 | 1.2
brain | 8.1e-01 | 8.4e-01 | 9.8e-01 | 0.3 | 1 | 0.2
epithelial | 2.5e-01 | 5.8e-01 | 1.9e-03 | 1.3 | 2.4e-01 | 0.9
general | 4.0e-01 | 7.6e-01 | 1.0e-08 | 1.8 | 7.4e-04 | 1.2
head and neck | 2.1e-01 | 3.3e-01 | 2.1e-01 | 4.3 | 5.6e-01 | 1.9
kidney | 4.0e-01 | 4.4e-01 | 2.9e-04 | 8.5 | 2.3e-03 | 6.1
liver | 2.9e-01 | 8.3e-01 | 1 | 0.3 | 1 | 0.1
lung | 8.4e-01 | 9.0e-01 | 4.4e-02 | 1.1 | 5.6e-01 | 0.6
lymph nodes | 5.8e-01 | 8.2e-01 | 4.9e-01 | 1.8 | 8.2e-01 | 0.9
breast | 3.2e-01 | 3.7e-01 | 2.3e-01 | 2.1 | 5.7e-01 | 1.3
bone marrow | 1 | 6.7e-01 | 1 | 1.0 | 5.3e-01 | 1.9
ovary | 7.8e-03 | 7.0e-03 | 7.0e-04 | 7.5 | 4.9e-03 | 5.6
pancreas | 2.3e-01 | 4.0e-01 | 1.2e-03 | 2.5 | 9.4e-03 | 1.8
prostate | 9.7e-01 | 9.3e-01 | 1 | 0.8 | 7.4e-05 | 1.3
thyroid | 5.0e-01 | 5.0e-01 | 6.7e-01 | 1.5 | 6.7e-01 | 1.5
uterus | 2.4e-01 | 1.7e-01 | 6.5e-04 | 2.1 | 7.2e-02 | 1.3

As noted above, cluster HSCP2 features 12 transcript(s), which were listed in Table 1 above.
These transcript(s) encode protein(s) which are variant(s) of protein Ceruloplasmin precursor. A description of each variant protein according to the present invention is now provided.

Variant protein HSCP2_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCP2_PEA_1_T4 and HSCP2_PEA_1_T50. An alignment is given to the known protein (Ceruloplasmin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSCP2_PEA_1_P4 and CERU_HUMAN:

1. An isolated chimeric polypeptide encoding for HSCP2_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to
MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD
RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS
HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY
HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY
CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH
AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF
QVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG
TTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY
PLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD
PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES
LLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYL
FSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDH
YTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ
NVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDK
VKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAY
YSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYS
DHPEKVNKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHT
VHFHGHSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGMETT
YTVLQNE corresponding to amino acids 1 - 1060 of CERU_HUMAN, which also corresponds to amino acids 1 - 1060 of HSCP2_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GGTSM corresponding to amino acids 1061 - 1065 of HSCP2_PEA_1_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HSCP2_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GGTSM in HSCP2_PEA_1_P4.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
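The two claims above describe HSCP2_PEA_1_P4 as residues 1-1060 of CERU_HUMAN followed by the five-residue tail GGTSM. Each part has its own minimum percent homology. The text does not specify how homology is scored, and a real comparison would normally use a sequence alignment. Purely as an assumption, the Python sketch below reads "homologous" as ungapped percent identity over segments of equal length and checks a candidate against that composition. `fits_chimeric_claim` is a hypothetical helper. The 10-residue head in the usage lines is a stand-in, not the real 1060-residue CERU_HUMAN segment.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two non-empty sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("expected two non-empty sequences of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def fits_chimeric_claim(candidate: str, head_ref: str, tail_ref: str,
                        head_min: float = 90.0, tail_min: float = 70.0) -> bool:
    """Check a candidate as a head part followed by a tail part (hypothetical helper).

    The split point is len(head_ref); each part is scored by ungapped identity
    against its reference and compared with its minimum threshold.
    """
    if len(candidate) != len(head_ref) + len(tail_ref):
        return False
    head, tail = candidate[:len(head_ref)], candidate[len(head_ref):]
    return (percent_identity(head, head_ref) >= head_min
            and percent_identity(tail, tail_ref) >= tail_min)


if __name__ == "__main__":
    toy_head = "MKILILGIFL"  # stand-in for CERU_HUMAN residues 1-1060
    print(fits_chimeric_claim(toy_head + "GGTSM", toy_head, "GGTSM"))  # tail 100%
    print(fits_chimeric_claim(toy_head + "GGTSA", toy_head, "GGTSM"))  # tail 80%
    print(fits_chimeric_claim(toy_head + "GGAAA", toy_head, "GGTSM"))  # tail 40%
```

The same matched-over-total ratio reproduces the Total Percent Identity values in the GILT_HUMAN alignment reports earlier in this section. A matching length of 219 over a total length of 250 gives 87.60, and 133 over 250 gives 53.20.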
Variant protein HSCP2_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein HSCP2_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 7 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
26 | I -> | No
29 | I -> | No
37 | S -> P | No
47 | V -> | No
54 | I -> V | No
63 | I -> | No
92 | F -> S | No
117 | Y -> N | No
148 | K -> R | No
173 | N -> | No
186 | P -> | No
190 | A -> | No
190 | A -> G | No
213 | I -> | No
218 | V -> M | No
221 | F -> | No
235 | N -> D | No
253 | F -> L | No
275 | M -> T | No
286 | F -> L | No
298 | F -> S | No
305 | T -> A | No
445 | H -> Y | No
451 | P -> A | No
477 | P -> L | No
493 | P -> | No
507 | S -> P | No
535 | L -> P | No
544 | D -> E | Yes
584 | V -> A | No
598 | R -> K | Yes
607 | V -> G | Yes
640 | D -> G | No
660 | F -> S | No
675 | A -> | No
711 | Q -> | No
727 | F -> S | No
748 | Q -> | No
759 | Q -> | No
759 | Q -> P | No
789 | D -> N | No
927 | E -> K | Yes
1040 | C -> W | No

The glycosylation sites of variant protein HSCP2_PEA_1_P4, as compared to the known protein Ceruloplasmin precursor, are described in Table 8 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

Table 8 - Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein?
138 | yes | 138
762 | yes | 762
397 | yes | 397
358 | yes | 358

Variant protein HSCP2_PEA_1_P4 is encoded by the following transcript(s): HSCP2_PEA_1_T4 and HSCP2_PEA_1_T50, for which the sequence(s) is/are given at the end of the application.
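The coding portion of transcript HSCP2_PEA_1_T4 is given below as nucleotides 250 to 3444. That is 3195 nucleotides, or 1065 codons, consistent with the 1065 residues of HSCP2_PEA_1_P4 (residues 1-1060 plus the tail at 1061-1065). Assuming a single uninterrupted reading frame that begins at position 250, integer arithmetic maps a transcript SNP position to a residue of the variant protein, as in the Python sketch below. The function and constant names are illustrative only.

Applied to the SNPs marked as previously known in the nucleotide tables, positions 1881, 2042, 2069 and 3028 map to residues 544, 598, 607 and 927. These are exactly the four previously known changes in Table 7. Known positions 1053 and 2199 map to residues 268 and 650, which do not appear in the non-silent list of Table 7 and are therefore presumably silent.

```python
CDS_START = 250   # first nucleotide of the HSCP2_PEA_1_T4 coding portion, per the text
CDS_END = 3444    # last nucleotide of the coding portion, per the text


def nucleotide_to_residue(nt_pos: int, cds_start: int = CDS_START, cds_end: int = CDS_END):
    """Map a 1-based transcript position to (residue number, position within codon 1-3).

    Returns None when the position lies outside the coding portion (e.g. in a UTR).
    Assumes one uninterrupted reading frame starting at cds_start.
    """
    if not cds_start <= nt_pos <= cds_end:
        return None
    offset = nt_pos - cds_start
    return offset // 3 + 1, offset % 3 + 1


if __name__ == "__main__":
    print((CDS_END - CDS_START + 1) // 3)  # number of codons in the coding portion
    for nt in (358, 524, 1053, 1881, 2042, 2069, 2199, 3028, 5131):
        print(nt, nucleotide_to_residue(nt))
```

For example, position 1881 (T -> A) falls on the third base of codon 544. This fits the D -> E change listed for that residue, since replacing a third-position T with A turns an aspartate codon (GAT) into a glutamate codon (GAA).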
The coding portion of transcript HSCP2_PEA_1_T4 is shown in bold; this coding portion starts at position 250 and ends at position 3444. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 9 - Nucleic acid SNPs
SNP position(s) on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
63 | A -> | No
201 | G -> T | No
326 | T -> | No
335 | T -> | No
358 | T -> C | No
360 | T -> C | No
389 | T -> | No
409 | A -> G | No
437 | T -> | No
524 | T -> C | No
591 | T -> C | No
598 | T -> A | No
692 | A -> G | No
768 | T -> | No
807 | A -> | No
807 | A -> G | No
818 | C -> | No
818 | C -> G | No
837 | T -> C | No
887 | T -> | No
901 | G -> A | No
910 | T -> | No
952 | A -> G | No
1006 | T -> C | No
1053 | A -> G | Yes
1073 | T -> C | No
1107 | T -> G | No
1142 | T -> C | No
1162 | A -> G | No
1284 | A -> G | No
1287 | C -> T | No
1353 | G -> A | No
1582 | C -> T | No
1600 | C -> G | No
1617 | G -> A | No
1679 | C -> T | No
1728 | A -> | No
1768 | T -> C | No
1851 | T -> C | No
1853 | T -> C | No
1881 | T -> A | Yes
1938 | A -> G | No
2000 | T -> C | No
2042 | G -> A | Yes
2055 | T -> C | No
2069 | T -> G | Yes
2139 | T -> C | No
2168 | A -> G | No
2199 | A -> C | Yes
2228 | T -> C | No
2274 | A -> | No
2364 | C -> T | No
2381 | A -> | No
2429 | T -> C | No
2492 | A -> | No
2525 | A -> | No
2525 | A -> C | No
2614 | G -> A | No
3028 | G -> A | Yes
3240 | T -> C | No
3276 | A -> G | No
3369 | C -> G | No
5131 | C -> A | Yes
6091 | T -> | No
6106 | A -> C | Yes
6366 | G -> A | No
6564 | G -> A | Yes

The coding portion of transcript HSCP2_PEA_1_T50 is shown in bold; this coding portion starts at position 250 and ends at position 3444.
The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 10 - Nucleic acid SNPs
SNP position(s) on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
63 | A -> | No
201 | G -> T | No
326 | T -> | No
335 | T -> | No
358 | T -> C | No
360 | T -> C | No
389 | T -> | No
409 | A -> G | No
437 | T -> | No
524 | T -> C | No
591 | T -> C | No
598 | T -> A | No
692 | A -> G | No
768 | T -> | No
807 | A -> | No
807 | A -> G | No
818 | C -> | No
818 | C -> G | No
837 | T -> C | No
887 | T -> | No
901 | G -> A | No
910 | T -> | No
952 | A -> G | No
1006 | T -> C | No
1053 | A -> G | Yes
1073 | T -> C | No
1107 | T -> G | No
1142 | T -> C | No
1162 | A -> G | No
1284 | A -> G | No
1287 | C -> T | No
1353 | G -> A | No
1582 | C -> T | No
1600 | C -> G | No
1617 | G -> A | No
1679 | C -> T | No
1728 | A -> | No
1768 | T -> C | No
1851 | T -> C | No
1853 | T -> C | No
1881 | T -> A | Yes
1938 | A -> G | No
2000 | T -> C | No
2042 | G -> A | Yes
2055 | T -> C | No
2069 | T -> G | Yes
2139 | T -> C | No
2168 | A -> G | No
2199 | A -> C | Yes
2228 | T -> C | No
2274 | A -> | No
2364 | C -> T | No
2381 | A -> | No
2429 | T -> C | No
2492 | A -> | No
2525 | A -> | No
2525 | A -> C | No
2614 | G -> A | No
3028 | G -> A | Yes
3240 | T -> C | No
3276 | A -> G | No
3369 | C -> G | No

Variant protein HSCP2_PEA_1_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCP2_PEA_1_T13. An alignment is given to the known protein (Ceruloplasmin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSCP2_PEA_1_P8 and CERU_HUMAN:

1. An isolated chimeric polypeptide encoding for HSCP2_PEA_1_P8, comprising a first amino acid sequence being at least 90% homologous to
MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD
RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS
HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY
HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY
CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH
AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF
QVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG
TTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY
PLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD
PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES
LLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYL
FSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDH
YTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ
NVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDK
VKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAY
YSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYS
DHPEKVNKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHT
VHFHGHSFQYK
corresponding to amino acids 1-1006 of CERU_HUMAN, which also corresponds to amino acids 1-1006 of HSCP2_PEA_1_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KCFQEHLEFGYSTAM corresponding to amino acids 1007-1021 of HSCP2_PEA_1_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
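Each comparison report of this kind describes the variant as a shared CERU_HUMAN prefix followed directly by a variant-specific tail, so the variant's length is the prefix length plus the tail length. The short Python sketch below uses the tail sequences and residue ranges stated in the comparison reports of this section to check that each tail begins immediately after the prefix and that its length matches the stated range. `TAILS` and `chimera_length` are illustrative names, not part of the application.

```python
# Shared CERU_HUMAN prefix end, tail sequence, and stated tail range
# (first and last residue), as given in the comparison reports.
TAILS = {
    "HSCP2_PEA_1_P8": (1006, "KCFQEHLEFGYSTAM", (1007, 1021)),
    "HSCP2_PEA_1_P15": (1060, "GEYPASSETHRRIWNVIYPITVSVIILFQISTKE", (1061, 1094)),
    "HSCP2_PEA_1_P16": (1007, "LLRLTGEYGM", (1008, 1017)),
    "HSCP2_PEA_1_P6": (1006, "GSL", (1007, 1009)),
    "HSCP2_PEA_1_P2": (761, "K", (762, 762)),
}


def chimera_length(prefix_end, tail, tail_range):
    """Check that the tail directly follows the shared prefix ('contiguous and in a
    sequential order') and that its length fits the stated range; return total length."""
    start, end = tail_range
    if start != prefix_end + 1:
        raise ValueError("tail does not start immediately after the shared prefix")
    if end - start + 1 != len(tail):
        raise ValueError("tail length does not match its residue range")
    return prefix_end + len(tail)


for name, (prefix_end, tail, tail_range) in TAILS.items():
    print(name, chimera_length(prefix_end, tail, tail_range))
```

For HSCP2_PEA_1_P8 this gives 1,006 + 15 = 1,021 residues, matching the stated tail range 1007-1021.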
2. An isolated polypeptide encoding for a tail of HSCP2_PEA_1_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KCFQEHLEFGYSTAM in HSCP2_PEA_1_P8.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HSCP2_PEA_1_P8 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 11 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
26 | I -> | No
29 | I -> | No
37 | S -> P | No
47 | V -> | No
54 | I -> V | No
63 | I -> | No
92 | F -> S | No
117 | Y -> N | No
148 | K -> R | No
173 | N -> | No
186 | P -> | No
190 | A -> | No
190 | A -> G | No
213 | I -> | No
218 | V -> M | No
221 | F -> | No
235 | N -> D | No
253 | F -> L | No
275 | M -> T | No
286 | F -> L | No
298 | F -> S | No
305 | T -> A | No
445 | H -> Y | No
451 | P -> A | No
477 | P -> L | No
493 | P -> | No
507 | S -> P | No
535 | L -> P | No
544 | D -> E | Yes
584 | V -> A | No
598 | R -> K | Yes
607 | V -> G | Yes
640 | D -> G | No
660 | F -> S | No
675 | A -> | No
711 | Q -> | No
727 | F -> S | No
748 | Q -> | No
759 | Q -> | No
759 | Q -> P | No
789 | D -> N | No
927 | E -> K | Yes
1020 | A -> G | No

The glycosylation sites of variant protein HSCP2_PEA_1_P8, as compared to the known protein Ceruloplasmin precursor, are described in Table 12 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column gives the position of the site on the variant protein).

Table 12 - Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein?
138 | yes | 138
762 | yes | 762
397 | yes | 397
358 | yes | 358

Variant protein HSCP2_PEA_1_P8 is encoded by the following transcript(s): HSCP2_PEA_1_T13, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCP2_PEA_1_T13 is shown in bold; this coding portion starts at position 250 and ends at position 3312. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).
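The stated coding-portion coordinates agree with the variant protein lengths when the coding portion is counted as whole codons: for HSCP2_PEA_1_T13, positions 250-3312 span 3,063 nucleotides, i.e. 1,021 codons, the length of HSCP2_PEA_1_P8 (amino acids 1-1021). The Python sketch below repeats the count for the coordinates given in this section. Reading the end coordinate as the last base of the last sense codon (stop codon not counted) is an assumption inferred from these numbers, not something the text states.

```python
# Coding-portion coordinates (1-based, inclusive) as stated for each transcript.
CODING_PORTIONS = {
    "HSCP2_PEA_1_T4": (250, 3444),
    "HSCP2_PEA_1_T13": (250, 3312),
    "HSCP2_PEA_1_T19": (250, 3231),
    "HSCP2_PEA_1_T20": (250, 3531),
    "HSCP2_PEA_1_T22": (250, 2535),
    "HSCP2_PEA_1_T23": (250, 3300),
}


def codon_count(start, end):
    """Number of codons in an inclusive coding interval; rejects partial codons."""
    length = end - start + 1
    if length % 3:
        raise ValueError(f"{length} nt is not a whole number of codons")
    return length // 3


for transcript, (start, end) in CODING_PORTIONS.items():
    print(transcript, codon_count(start, end))
```

The other counts line up the same way: 994 for T19 (HSCP2_PEA_1_P14), 1,094 for T20 (HSCP2_PEA_1_P15), 762 for T22 (HSCP2_PEA_1_P2) and 1,017 for T23 (HSCP2_PEA_1_P16); T4 gives 1,065, the last CERU_HUMAN residue cited in the HSCP2_PEA_1_P14 comparison report.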
Table 13 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
63 | A -> | No
201 | G -> T | No
326 | T -> | No
335 | T -> | No
358 | T -> C | No
360 | T -> C | No
389 | T -> | No
409 | A -> G | No
437 | T -> | No
524 | T -> C | No
591 | T -> C | No
598 | T -> A | No
692 | A -> G | No
768 | T -> | No
807 | A -> | No
807 | A -> G | No
818 | C -> | No
818 | C -> G | No
837 | T -> C | No
887 | T -> | No
901 | G -> A | No
910 | T -> | No
952 | A -> G | No
1006 | T -> C | No
1053 | A -> G | Yes
1073 | T -> C | No
1107 | T -> G | No
1142 | T -> C | No
1162 | A -> G | No
1284 | A -> G | No
1287 | C -> T | No
1353 | G -> A | No
1582 | C -> T | No
1600 | C -> G | No
1617 | G -> A | No
1679 | C -> T | No
1728 | A -> | No
1768 | T -> C | No
1851 | T -> C | No
1853 | T -> C | No
1881 | T -> A | Yes
1938 | A -> G | No
2000 | T -> C | No
2042 | G -> A | Yes
2055 | T -> C | No
2069 | T -> G | Yes
2139 | T -> C | No
2168 | A -> G | No
2199 | A -> C | Yes
2228 | T -> C | No
2274 | A -> | No
2364 | C -> T | No
2381 | A -> | No
2429 | T -> C | No
2492 | A -> | No
2525 | A -> | No
2525 | A -> C | No
2614 | G -> A | No
3028 | G -> A | Yes
3240 | T -> C | No
3308 | C -> G | No
3880 | T -> | No
3895 | A -> C | Yes
4155 | G -> A | No
4353 | G -> A | Yes

Variant protein HSCP2_PEA_1_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCP2_PEA_1_T19. An alignment is given to the known protein (Ceruloplasmin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSCP2_PEA_1_P14 and CERU_HUMAN:

1. An isolated chimeric polypeptide encoding for HSCP2_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to
MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD
RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS
HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY
HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY
CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH
AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF
QVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG
TTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY
PLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD
PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES
LLLEDNIRMFTTAPDQVDKEDEDFQESNKMH
corresponding to amino acids 1-621 of CERU_HUMAN, which also corresponds to amino acids 1-621 of HSCP2_PEA_1_P14, a second amino acid sequence being a bridging amino acid sequence comprising W, and a third amino acid sequence being at least 90% homologous to
TFNVECLTTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE
WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGIL
GPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGA
GTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENE
SWYLDDNIKTYSDHPEKVNKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYL
MGMGNEIDLHTVHFHGHSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHV
TDHIHAGMETTYTVLQNEDTKSG
corresponding to amino acids 694-1065 of CERU_HUMAN, which also corresponds to amino acids 623-994 of HSCP2_PEA_1_P14, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
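The comparison report for HSCP2_PEA_1_P14 defines a simple coordinate map: residues 1-621 are unchanged, residue 622 is the bridging W, and CERU_HUMAN residues 694-1065 move down by 71 to positions 623-994. The Python sketch below applies that map. The segment table is transcribed from the report; `P14_SEGMENTS` and `known_to_variant` are hypothetical names.

```python
# (variant_start, variant_end, known_start) for each segment copied from CERU_HUMAN,
# transcribed from the comparison report for HSCP2_PEA_1_P14.
P14_SEGMENTS = [
    (1, 621, 1),      # CERU_HUMAN 1-621    -> variant 1-621
    (623, 994, 694),  # CERU_HUMAN 694-1065 -> variant 623-994
]
# Variant residue 622 is the bridging W and has no CERU_HUMAN counterpart.


def known_to_variant(known_pos, segments):
    """Map a CERU_HUMAN residue number onto the variant, or None if it was removed."""
    for v_start, v_end, k_start in segments:
        k_end = k_start + (v_end - v_start)
        if k_start <= known_pos <= k_end:
            return v_start + (known_pos - k_start)
    return None


for pos in (138, 358, 397, 640, 711, 762, 927):
    print(pos, known_to_variant(pos, P14_SEGMENTS))
```

The map reproduces the glycosylation site listed as 762 -> 691 in Table 15 and the shift of the known E -> K SNP from residue 927 (Table 11) to 856 (Table 14); CERU_HUMAN residue 640 lies in the removed region 622-693 and so has no counterpart.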
2. An isolated polypeptide encoding for an edge portion of HSCP2_PEA_1_P14, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise HWT having a structure as follows (numbering according to HSCP2_PEA_1_P14): a sequence starting from any of amino acid numbers 621 - x to 621; and ending at any of amino acid numbers 623 + ((n - 2) - x), in which x varies from 0 to n - 2.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HSCP2_PEA_1_P14 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 14 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 14 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
26 | I -> | No
29 | I -> | No
37 | S -> P | No
47 | V -> | No
54 | I -> V | No
63 | I -> | No
92 | F -> S | No
117 | Y -> N | No
148 | K -> R | No
173 | N -> | No
186 | P -> | No
190 | A -> | No
190 | A -> G | No
213 | I -> | No
218 | V -> M | No
221 | F -> | No
235 | N -> D | No
253 | F -> L | No
275 | M -> T | No
286 | F -> L | No
298 | F -> S | No
305 | T -> A | No
445 | H -> Y | No
451 | P -> A | No
477 | P -> L | No
493 | P -> | No
507 | S -> P | No
535 | L -> P | No
544 | D -> E | Yes
584 | V -> A | No
598 | R -> K | Yes
607 | V -> G | Yes
640 | Q -> | No
656 | F -> S | No
677 | Q -> | No
688 | Q -> | No
688 | Q -> P | No
718 | D -> N | No
856 | E -> K | Yes
969 | C -> W | No

The glycosylation sites of variant protein HSCP2_PEA_1_P14, as compared to the known protein Ceruloplasmin precursor, are described in Table 15 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column gives the position of the site on the variant protein).

Table 15 - Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein?
138 | yes | 138
762 | yes | 691
397 | yes | 397
358 | yes | 358

Variant protein HSCP2_PEA_1_P14 is encoded by the following transcript(s): HSCP2_PEA_1_T19, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCP2_PEA_1_T19 is shown in bold; this coding portion starts at position 250 and ends at position 3231. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention).
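The edge-portion claim for HSCP2_PEA_1_P14 defines its windows by a start coordinate 621 - x and an end coordinate 623 + ((n - 2) - x), with x running from 0 to n - 2. Because x is never negative and never exceeds n - 2, every such window starts at or before residue 621 and ends at or after residue 623, so it always contains the H-W-T junction formed by the bridging W. The Python sketch below enumerates the windows exactly as the claim words them and checks that junction coverage; `edge_windows` is a hypothetical helper.

```python
def edge_windows(n):
    """Yield (start, end) residue numbers for x = 0 .. n-2, following the claim's
    wording: start = 621 - x, end = 623 + ((n - 2) - x) (HSCP2_PEA_1_P14 numbering)."""
    for x in range(n - 1):          # x takes the values 0, 1, ..., n-2
        yield 621 - x, 623 + ((n - 2) - x)


JUNCTION_FIRST, JUNCTION_LAST = 621, 623   # H (621), bridging W (622), T (623)

for n in (10, 20, 30):
    windows = list(edge_windows(n))
    # Every window must cover the whole H-W-T junction.
    assert all(s <= JUNCTION_FIRST and e >= JUNCTION_LAST for s, e in windows)
    print(n, len(windows), windows[0], windows[-1])
```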
Table 16 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
63 | A -> | No
201 | G -> T | No
326 | T -> | No
335 | T -> | No
358 | T -> C | No
360 | T -> C | No
389 | T -> | No
409 | A -> G | No
437 | T -> | No
524 | T -> C | No
591 | T -> C | No
598 | T -> A | No
692 | A -> G | No
768 | T -> | No
807 | A -> | No
807 | A -> G | No
818 | C -> | No
818 | C -> G | No
837 | T -> C | No
887 | T -> | No
901 | G -> A | No
910 | T -> | No
952 | A -> G | No
1006 | T -> C | No
1053 | A -> G | Yes
1073 | T -> C | No
1107 | T -> G | No
1142 | T -> C | No
1162 | A -> G | No
1284 | A -> G | No
1287 | C -> T | No
1353 | G -> A | No
1582 | C -> T | No
1600 | C -> G | No
1617 | G -> A | No
1679 | C -> T | No
1728 | A -> | No
1768 | T -> C | No
1851 | T -> C | No
1853 | T -> C | No
1881 | T -> A | Yes
1938 | A -> G | No
2000 | T -> C | No
2042 | G -> A | Yes
2055 | T -> C | No
2069 | T -> G | Yes
2151 | C -> T | No
2168 | A -> | No
2216 | T -> C | No
2279 | A -> | No
2312 | A -> | No
2312 | A -> C | No
2401 | G -> A | No
2815 | G -> A | Yes
3027 | T -> C | No
3063 | A -> G | No
3156 | C -> G | No
3728 | T -> | No
3743 | A -> C | Yes
4003 | G -> A | No
4201 | G -> A | Yes

Variant protein HSCP2_PEA_1_P15 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCP2_PEA_1_T20. An alignment is given to the known protein (Ceruloplasmin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSCP2_PEA_1_P15 and CERU_HUMAN:

1. An isolated chimeric polypeptide encoding for HSCP2_PEA_1_P15, comprising a first amino acid sequence being at least 90% homologous to
MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD
RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS
HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY
HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY
CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH
AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF
QVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG
TTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY
PLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD
PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES
LLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYL
FSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDH
YTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ
NVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDK
VKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAY
YSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYS
DHPEKVNKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHT
VHFHGHSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGMETT
YTVLQNE
corresponding to amino acids 1-1060 of CERU_HUMAN, which also corresponds to amino acids 1-1060 of HSCP2_PEA_1_P15, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEYPASSETHRRIWNVIYPITVSVIILFQISTKE corresponding to amino acids 1061-1094 of HSCP2_PEA_1_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HSCP2_PEA_1_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEYPASSETHRRIWNVIYPITVSVIILFQISTKE in HSCP2_PEA_1_P15.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted.
The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HSCP2_PEA_1_P15 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 17 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
26 | I -> | No
29 | I -> | No
37 | S -> P | No
47 | V -> | No
54 | I -> V | No
63 | I -> | No
92 | F -> S | No
117 | Y -> N | No
148 | K -> R | No
173 | N -> | No
186 | P -> | No
190 | A -> | No
190 | A -> G | No
213 | I -> | No
218 | V -> M | No
221 | F -> | No
235 | N -> D | No
253 | F -> L | No
275 | M -> T | No
286 | F -> L | No
298 | F -> S | No
305 | T -> A | No
445 | H -> Y | No
451 | P -> A | No
477 | P -> L | No
493 | P -> | No
507 | S -> P | No
535 | L -> P | No
544 | D -> E | Yes
584 | V -> A | No
598 | R -> K | Yes
607 | V -> G | Yes
640 | D -> G | No
660 | F -> S | No
675 | A -> | No
711 | Q -> | No
727 | F -> S | No
748 | Q -> | No
759 | Q -> | No
759 | Q -> P | No
789 | D -> N | No
927 | E -> K | Yes
1040 | C -> W | No

The glycosylation sites of variant protein HSCP2_PEA_1_P15, as compared to the known protein Ceruloplasmin precursor, are described in Table 18 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column gives the position of the site on the variant protein).

Table 18 - Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein?
138 | yes | 138
762 | yes | 762
397 | yes | 397
358 | yes | 358

Variant protein HSCP2_PEA_1_P15 is encoded by the following transcript(s): HSCP2_PEA_1_T20, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCP2_PEA_1_T20 is shown in bold; this coding portion starts at position 250 and ends at position 3531. The transcript also has the following SNPs as listed in Table 19 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 19 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
63 | A -> | No
201 | G -> T | No
326 | T -> | No
335 | T -> | No
358 | T -> C | No
360 | T -> C | No
389 | T -> | No
409 | A -> G | No
437 | T -> | No
524 | T -> C | No
591 | T -> C | No
598 | T -> A | No
692 | A -> G | No
768 | T -> | No
807 | A -> | No
807 | A -> G | No
818 | C -> | No
818 | C -> G | No
837 | T -> C | No
887 | T -> | No
901 | G -> A | No
910 | T -> | No
952 | A -> G | No
1006 | T -> C | No
1053 | A -> G | Yes
1073 | T -> C | No
1107 | T -> G | No
1142 | T -> C | No
1162 | A -> G | No
1284 | A -> G | No
1287 | C -> T | No
1353 | G -> A | No
1582 | C -> T | No
1600 | C -> G | No
1617 | G -> A | No
1679 | C -> T | No
1728 | A -> | No
1768 | T -> C | No
1851 | T -> C | No
1853 | T -> C | No
1881 | T -> A | Yes
1938 | A -> G | No
2000 | T -> C | No
2042 | G -> A | Yes
2055 | T -> C | No
2069 | T -> G | Yes
2139 | T -> C | No
2168 | A -> G | No
2199 | A -> C | Yes
2228 | T -> C | No
2274 | A -> | No
2364 | C -> T | No
2381 | A -> | No
2429 | T -> C | No
2492 | A -> | No
2525 | A -> | No
2525 | A -> C | No
2614 | G -> A | No
3028 | G -> A | Yes
3240 | T -> C | No
3276 | A -> G | No
3369 | C -> G | No
3623 | T -> | Yes
3828 | G -> T | No
3978 | T -> | No
3979 | C -> | No

Variant protein HSCP2_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application;
it is encoded by transcript(s) HSCP2_PEA_1_T22. An alignment is given to the known protein (Ceruloplasmin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSCP2_PEA_1_P2 and CERU_HUMAN:

1. An isolated chimeric polypeptide encoding for HSCP2_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to
MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD
RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS
HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY
HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY
CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH
AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF
QVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG
TTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY
PLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD
PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES
LLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYL
FSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDH
YTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ
corresponding to amino acids 1-761 of CERU_HUMAN, which also corresponds to amino acids 1-761 of HSCP2_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence K corresponding to amino acid 762 of HSCP2_PEA_1_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
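HSCP2_PEA_1_P2 shares CERU_HUMAN residues 1-761 and then ends with K at position 762, where CERU_HUMAN has the N of an N-V-S triplet (residues 762-764 in the aligned sequences). That fits Table 21, which lists the site at 762 as absent from this variant. The Python sketch below scans for the N-X-S/T sequon (X not proline) commonly used to flag candidate N-linked glycosylation sites. The application does not state how its glycosylation sites were predicted, so treating them as sequons is an assumption; `sequon_positions` is a hypothetical helper.

```python
import re

# N-X-S/T sequon with X not proline; the lookahead lets overlapping sequons all match.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(fragment, first_residue=1):
    """Return residue numbers of the N in each N-X-S/T sequon of `fragment`,
    numbering the fragment's first residue as `first_residue`."""
    return [m.start() + first_residue for m in SEQUON.finditer(fragment)]


# CERU_HUMAN residues 756-767, read off the aligned sequences in the comparison reports.
known = "HHLQEQNVSNAF"
# HSCP2_PEA_1_P2 residues 756-762: the shared prefix ends at 761, followed by K.
variant = "HHLQEQK"

print(sequon_positions(known, 756))
print(sequon_positions(variant, 756))
```

The first call reports a sequon starting at residue 762; the variant fragment has none.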
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HSCP2_PEA_1_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 20 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 20 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
26 | I -> | No
29 | I -> | No
37 | S -> P | No
47 | V -> | No
54 | I -> V | No
63 | I -> | No
92 | F -> S | No
117 | Y -> N | No
148 | K -> R | No
173 | N -> | No
186 | P -> | No
190 | A -> | No
190 | A -> G | No
213 | I -> | No
218 | V -> M | No
221 | F -> | No
235 | N -> D | No
253 | F -> L | No
275 | M -> T | No
286 | F -> L | No
298 | F -> S | No
305 | T -> A | No
445 | H -> Y | No
451 | P -> A | No
477 | P -> L | No
493 | P -> | No
507 | S -> P | No
535 | L -> P | No
544 | D -> E | Yes
584 | V -> A | No
598 | R -> K | Yes
607 | V -> G | Yes
640 | D -> G | No
660 | F -> S | No
675 | A -> | No
711 | Q -> | No
727 | F -> S | No
748 | Q -> | No
759 | Q -> | No
759 | Q -> P | No

The glycosylation sites of variant protein HSCP2_PEA_1_P2, as compared to the known protein Ceruloplasmin precursor, are described in Table 21 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column gives the position of the site on the variant protein).

Table 21 - Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein?
138 | yes | 138
762 | no |
397 | yes | 397
358 | yes | 358

Variant protein HSCP2_PEA_1_P2 is encoded by the following transcript(s): HSCP2_PEA_1_T22, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCP2_PEA_1_T22 is shown in bold; this coding portion starts at position 250 and ends at position 2535. The transcript also has the following SNPs as listed in Table 22 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 22 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
63 | A -> | No
201 | G -> T | No
326 | T -> | No
335 | T -> | No
358 | T -> C | No
360 | T -> C | No
389 | T -> | No
409 | A -> G | No
437 | T -> | No
524 | T -> C | No
591 | T -> C | No
598 | T -> A | No
692 | A -> G | No
768 | T -> | No
807 | A -> | No
807 | A -> G | No
818 | C -> | No
818 | C -> G | No
837 | T -> C | No
887 | T -> | No
901 | G -> A | No
910 | T -> | No
952 | A -> G | No
1006 | T -> C | No
1053 | A -> G | Yes
1073 | T -> C | No
1107 | T -> G | No
1142 | T -> C | No
1162 | A -> G | No
1284 | A -> G | No
1287 | C -> T | No
1353 | G -> A | No
1582 | C -> T | No
1600 | C -> G | No
1617 | G -> A | No
1679 | C -> T | No
1728 | A -> | No
1768 | T -> C | No
1851 | T -> C | No
1853 | T -> C | No
1881 | T -> A | Yes
1938 | A -> G | No
2000 | T -> C | No
2042 | G -> A | Yes
2055 | T -> C | No
2069 | T -> G | Yes
2139 | T -> C | No
2168 | A -> G | No
2199 | A -> C | Yes
2228 | T -> C | No
2274 | A -> | No
2364 | C -> T | No
2381 | A -> | No
2429 | T -> C | No
2492 | A -> | No
2525 | A -> | No
2525 | A -> C | No
2565 | A -> | No
2676 | G -> A | No
3195 | T -> A | Yes
3482 | G -> A | Yes
3542 | A -> G | No
3975 | G -> A | No

Variant protein HSCP2_PEA_1_P16 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCP2_PEA_1_T23. An alignment is given to the known protein (Ceruloplasmin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSCP2_PEA_1_P16 and CERU_HUMAN:

1. An isolated chimeric polypeptide encoding for HSCP2_PEA_1_P16, comprising a first amino acid sequence being at least 90% homologous to
MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD
RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS
HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY
HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY
CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH
AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF
QVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG
TTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY
PLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD
PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES
LLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYL
FSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDH
YTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ
NVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDK
VKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAY
YSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYS
DHPEKVNKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHT
VHFHGHSFQYKH
corresponding to amino acids 1-1007 of CERU_HUMAN, which also corresponds to amino acids 1-1007 of HSCP2_PEA_1_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLRLTGEYGM corresponding to amino acids 1008-1017 of HSCP2_PEA_1_P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HSCP2_PEA_1_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLRLTGEYGM in HSCP2_PEA_1_P16.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs.
The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HSCP2_PEA_1_P16 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 23 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 23 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
26 | I -> | No
29 | I -> | No
37 | S -> P | No
47 | V -> | No
54 | I -> V | No
63 | I -> | No
92 | F -> S | No
117 | Y -> N | No
148 | K -> R | No
173 | N -> | No
186 | P -> | No
190 | A -> G | No
190 | A -> | No
213 | I -> | No
218 | V -> M | No
221 | F -> | No
235 | N -> D | No
253 | F -> L | No
275 | M -> T | No
286 | F -> L | No
298 | F -> S | No
305 | T -> A | No
445 | H -> Y | No
451 | P -> A | No
477 | P -> L | No
493 | P -> | No
507 | S -> P | No
535 | L -> P | No
544 | D -> E | Yes
584 | V -> A | No
598 | R -> K | Yes
607 | V -> G | Yes
640 | D -> G | No
660 | F -> S | No
675 | A -> | No
711 | Q -> | No
727 | F -> S | No
748 | Q -> | No
759 | Q -> | No
759 | Q -> P | No
789 | D -> N | No
927 | E -> K | Yes

The glycosylation sites of variant protein HSCP2_PEA_1_P16, as compared to the known protein Ceruloplasmin precursor, are described in Table 24 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column gives the position of the site on the variant protein).
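The localization call for these variants follows a simple rule: a variant is called secreted when both signal-peptide prediction programs predict a signal peptide and neither trans-membrane region prediction program predicts a trans-membrane region. The Python sketch below encodes that rule. Each program's output is abstracted as a single boolean, which is an assumption (apart from SignalP the programs are not named, and their raw outputs are not given); `call_localization` is a hypothetical helper, and the label it returns when the rule does not apply is a placeholder.

```python
from collections.abc import Sequence


def call_localization(signal_peptide_calls: Sequence[bool],
                      transmembrane_calls: Sequence[bool]) -> str:
    """Apply the stated rule: 'secreted' when every signal-peptide program predicts a
    signal peptide and no trans-membrane program predicts a trans-membrane region."""
    if signal_peptide_calls and all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    # Placeholder label: the text does not describe the other outcomes.
    return "not assigned by this rule"


# Two programs of each kind, as described in the text.
print(call_localization([True, True], [False, False]))
print(call_localization([True, False], [False, False]))
```

The empty-list guard prevents `all([])`, which is `True` in Python, from calling a protein secreted when no signal-peptide prediction was supplied.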
Table 24 - Glycosylation site(s) WO 2005/116850 PCT/IB2005/002555 892 Position(s) on known amino Prescnt in variant protein? Position in variant protein acid seqiueCeC 138 yes 138 762 yes 762 397 yes 397 358 yes 358 Variant protein HSCP2_PEA 1 P16 is encoded by the following transcript(s): HSCP2_PEA_1_T23, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCP2_PEA_1_T23 is shown in bold; this coding portion starts at 5 position 250 and ends at position 3300. The transcript also has the following SNPs as listed in Table 25 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P 16 sequence provides support for the deduced sequence of this variant protein according to the present invention). 10 Table 25 - Nucleic acid SNPs SNP position On nuloide Alternative nucleic cid Previously known SNP? 
63 | A -> | No
201 | G -> T | No
326 | T -> | No
335 | T -> | No
358 | T -> C | No
360 | T -> C | No
389 | T -> | No
409 | A -> G | No
437 | T -> | No
524 | T -> C | No
591 | T -> C | No
598 | T -> A | No
692 | A -> G | No
768 | T -> | No
807 | A -> | No
807 | A -> G | No
818 | C -> | No
818 | C -> G | No
837 | T -> C | No
887 | T -> | No
901 | G -> A | No
910 | T -> | No
952 | A -> G | No
1006 | T -> C | No
1053 | A -> G | Yes
1073 | T -> C | No
1107 | T -> G | No
1142 | T -> C | No
1162 | A -> G | No
1284 | A -> G | No
1287 | C -> T | No
1353 | G -> A | No
1582 | C -> T | No
1600 | C -> G | No
1617 | G -> A | No
1679 | C -> T | No
1728 | A -> | No
1768 | T -> C | No
1851 | T -> C | No
1853 | T -> C | No
1881 | T -> A | Yes
1938 | A -> G | No
2000 | T -> C | No
2042 | G -> A | Yes
2055 | T -> C | No
2069 | T -> G | Yes
2139 | T -> C | No
2168 | A -> G | No
2199 | A -> C | Yes
2228 | T -> C | No
2274 | A -> | No
2364 | C -> T | No
2381 | A -> | No
2429 | T -> C | No
2492 | A -> | No
2525 | A -> | No
2525 | A -> C | No
2614 | G -> A | No
3028 | G -> A | Yes
3240 | T -> C | No
3448 | T -> | Yes
3653 | G -> T | No
3803 | T -> | No
3804 | C -> | No
Variant protein HSCP2_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCP2_PEA_1_T25. An alignment is given to the known protein (Ceruloplasmin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
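Tables 23 and 25 are linked through the coding-portion coordinates. Assuming the coding portion of HSCP2_PEA_1_T23 starts at nucleotide 250 (as stated above) and is read in frame without gaps, a nucleotide SNP at transcript position p falls in codon (p - 250) // 3 + 1. The minimal Python sketch below illustrates that mapping; the function name `snp_to_codon` and the selection of table rows are illustrative assumptions, not part of the application.

```python
from typing import Tuple


def snp_to_codon(nt_pos: int, cds_start: int) -> Tuple[int, int]:
    """Map a 1-based transcript position to (codon number, position within codon).

    Assumes the coding portion begins at cds_start (1-based) and is read
    in frame without gaps, so codon 1 occupies cds_start..cds_start + 2.
    """
    if nt_pos < cds_start:
        raise ValueError("position lies upstream of the coding portion")
    offset = nt_pos - cds_start
    return offset // 3 + 1, offset % 3 + 1


# Coding portion of HSCP2_PEA_1_T23 starts at nucleotide 250.
T23_CDS_START = 250

# Selected Table 25 positions paired with the Table 23 positions they appear to match.
pairs = {326: 26, 335: 29, 358: 37, 409: 54, 524: 92}

for nt, aa in pairs.items():
    codon, codon_pos = snp_to_codon(nt, T23_CDS_START)
    print(f"nt {nt} -> codon {codon}, position {codon_pos} (Table 23 lists {aa})")
```

Under this mapping, nucleotide 360 (T -> C) falls on the third position of codon 37, whose first base is the T altered by SNP 358 (S -> P). A serine codon beginning with T is TCN, and any third-position change in TCN still encodes serine, which is consistent with position 360 having no entry in Table 23.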
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSCP2_PEA_1_P6 and CERU_HUMAN:
1. An isolated chimeric polypeptide encoding for HSCP2_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to
MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD
RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS
HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY
HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY
CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH
AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF
QVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG
TTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY
PLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD
PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES
LLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYL
FSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDH
YTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ
NVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDK
VKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAY
YSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYS
DHPEKVNKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHT
VHFHGHSFQYK
corresponding to amino acids 1 - 1006 of CERU_HUMAN, which also corresponds to amino acids 1 - 1006 of HSCP2_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GSL corresponding to amino acids 1007 - 1009 of HSCP2_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
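Each row of the amino acid mutation tables implies a reference residue at a given position. As a consistency check, the short Python sketch below compares those residues with the CERU_HUMAN segment printed in the comparison report above, restricted to residues 1-202 to keep the example short. The helper `mismatched_refs` and the choice of rows (the first twelve positions of Table 23) are illustrative assumptions.

```python
# First 202 residues of CERU_HUMAN as printed in the comparison report above.
CERU_1_202 = (
    "MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD"
    "RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS"
    "HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY"
    "HSHIDAPKDIASGLIGPLIICKK"
)

# (position, reference residue) pairs read from the left side of the
# "Alternative amino acid(s)" column of Table 23.
snp_refs = [(26, "I"), (29, "I"), (37, "S"), (47, "V"), (54, "I"), (63, "I"),
            (92, "F"), (117, "Y"), (148, "K"), (173, "N"), (186, "P"), (190, "A")]


def mismatched_refs(seq, refs):
    """Return (position, expected, found) for every 1-based position whose
    residue in seq differs from the reference residue given in the table."""
    return [(pos, ref, seq[pos - 1]) for pos, ref in refs if seq[pos - 1] != ref]


print(len(CERU_1_202), mismatched_refs(CERU_1_202, snp_refs))  # 202 []
```

An empty list means every sampled table row agrees with the printed sequence, which is a cheap way to catch transcription errors in either the tables or the sequence.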
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSCP2_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 26 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 26 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
26 | I -> | No
29 | I -> | No
37 | S -> P | No
47 | V -> | No
54 | I -> V | No
63 | I -> | No
92 | F -> S | No
117 | Y -> N | No
148 | K -> R | No
173 | N -> | No
186 | P -> | No
190 | A -> | No
190 | A -> G | No
213 | I -> | No
218 | V -> M | No
221 | F -> | No
235 | N -> D | No
253 | F -> L | No
275 | M -> T | No
286 | F -> L | No
298 | F -> S | No
305 | T -> A | No
445 | H -> Y | No
451 | P -> A | No
477 | P -> L | No
493 | P -> | No
507 | S -> P | No
535 | L -> P | No
544 | D -> E | Yes
584 | V -> A | No
598 | R -> K | Yes
607 | V -> G | Yes
640 | D -> G | No
660 | F -> S | No
675 | A -> | No
711 | Q -> | No
727 | F -> S | No
748 | Q -> | No
759 | Q -> | No
759 | Q -> P | No
789 | D -> N | No
927 | E -> K | Yes
1008 | S -> G | No
The glycosylation sites of variant protein HSCP2_PEA_1_P6, as compared to the known protein Ceruloplasmin precursor, are described in Table 27 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column gives the corresponding position on the variant protein).
Table 27 - Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein
138 | yes | 138
762 | yes | 762
397 | yes | 397
358 | yes | 358
Variant protein HSCP2_PEA_1_P6 is encoded by the following transcript(s): HSCP2_PEA_1_T25, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCP2_PEA_1_T25 is shown in bold; this coding portion starts at position 250 and ends at position 3276. The transcript also has the following SNPs as listed in Table 28 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 28 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
63 | A -> | No
201 | G -> T | No
326 | T -> | No
335 | T -> | No
358 | T -> C | No
360 | T -> C | No
389 | T -> | No
409 | A -> G | No
437 | T -> | No
524 | T -> C | No
591 | T -> C | No
598 | T -> A | No
692 | A -> G | No
768 | T -> | No
807 | A -> | No
807 | A -> G | No
818 | C -> | No
818 | C -> G | No
837 | T -> C | No
887 | T -> | No
901 | G -> A | No
910 | T -> | No
952 | A -> G | No
1006 | T -> C | No
1053 | A -> G | Yes
1073 | T -> C | No
1107 | T -> G | No
1142 | T -> C | No
1162 | A -> G | No
1284 | A -> G | No
1287 | C -> T | No
1353 | G -> A | No
1582 | C -> T | No
1600 | C -> G | No
1617 | G -> A | No
1679 | C -> T | No
1728 | A -> | No
1768 | T -> C | No
1851 | T -> C | No
1853 | T -> C | No
1881 | T -> A | Yes
1938 | A -> G | No
2000 | T -> C | No
2042 | G -> A | Yes
2055 | T -> C | No
2069 | T -> G | Yes
2139 | T -> C | No
2168 | A -> G | No
2199 | A -> C | Yes
2228 | T -> C | No
2274 | A -> | No
2364 | C -> T | No
2381 | A -> | No
2429 | T -> C | No
2492 | A -> | No
2525 | A -> | No
2525 | A -> C | No
2614 | G -> A | No
3028 | G -> A | Yes
3240 | T -> C | No
3271 | A -> G | No
3364 | C -> G | No
Variant protein HSCP2_PEA_1_P22 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCP2_PEA_1_T31. An alignment is given to the known protein (Ceruloplasmin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSCP2_PEA_1_P22 and CERU_HUMAN:
1. An isolated chimeric polypeptide encoding for HSCP2_PEA_1_P22, comprising a first amino acid sequence being at least 90% homologous to
MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD
RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS
HGITYYKEHE
corresponding to amino acids 1 - 131 of CERU_HUMAN, which also corresponds to amino acids 1 - 131 of HSCP2_PEA_1_P22, a second amino acid sequence, being a bridging amino acid sequence comprising A, and a third amino acid sequence being at least 90% homologous to
VNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFHGQALTNKNYRIDTINLFP
ATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFFQVQECNKSSSKDNIRGKHVRHY
YIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTN
RKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNY
NPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIFTGLI
GPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMFTTAPDQVDKEDE
DFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAGNEADVHGIYFSGNTYLWR
GERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDHYTGGMKQKYTVNQCRRQSEDS
TFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVV
YRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTE
SSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVCRR
PYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDEEFIESNKMHAI
NGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHGHSFQYKHRGVYSSDVF
DIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGMETTYTVLQNEDTKSG
corresponding to amino acids 262 - 1065 of CERU_HUMAN, which also corresponds to amino acids 133 - 936 of HSCP2_PEA_1_P22, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
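The residue correspondence in the comparison report also determines where each known glycosylation site of Ceruloplasmin precursor lands on the variant. The Python sketch below, a minimal illustration under the assumption that only the aligned blocks named in the report carry over, maps known positions through those blocks; the names `P22_BLOCKS` and `map_position` are hypothetical. Its output matches the positions listed for this variant in Table 30.

```python
from typing import List, Optional, Tuple

# (first CERU_HUMAN residue, last CERU_HUMAN residue, first variant residue),
# read off the comparison report for HSCP2_PEA_1_P22.
Block = Tuple[int, int, int]
P22_BLOCKS: List[Block] = [
    (1, 131, 1),       # CERU_HUMAN 1-131    -> P22 1-131
    (262, 1065, 133),  # CERU_HUMAN 262-1065 -> P22 133-936
]


def map_position(known_pos: int, blocks: List[Block]) -> Optional[int]:
    """Return the variant position aligned to a known-protein position,
    or None when the position lies outside every aligned block."""
    for ref_start, ref_end, var_start in blocks:
        if ref_start <= known_pos <= ref_end:
            return var_start + (known_pos - ref_start)
    return None


for site in (138, 762, 397, 358):
    print(site, "->", map_position(site, P22_BLOCKS))
# 138 -> None, 762 -> 633, 397 -> 268, 358 -> 229
```

The same function applied to the single block reported for HSCP2_PEA_1_P24, `[(262, 1065, 16)]`, gives 516, 151 and 112 for sites 762, 397 and 358, and None for site 138.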
2. An isolated polypeptide encoding for an edge portion of HSCP2_PEA_1_P22, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EAV having a structure as follows (numbering according to HSCP2_PEA_1_P22): a sequence starting from any of amino acid numbers 131-x to 131; and ending at any of amino acid numbers 133 + ((n-2) - x), in which x varies from 0 to n-2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSCP2_PEA_1_P22 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 29 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P22 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 29 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
26 | I -> | No
29 | I -> | No
37 | S -> P | No
47 | V -> | No
54 | I -> V | No
63 | I -> | No
92 | F -> S | No
117 | Y -> N | No
146 | M -> T | No
157 | F -> L | No
169 | F -> S | No
176 | T -> A | No
316 | H -> Y | No
322 | P -> A | No
348 | P -> L | No
364 | P -> | No
378 | S -> P | No
406 | L -> P | No
415 | D -> E | Yes
455 | V -> A | No
469 | R -> K | Yes
478 | V -> G | Yes
511 | D -> G | No
531 | F -> S | No
546 | A -> | No
582 | Q -> | No
598 | F -> S | No
619 | Q -> | No
630 | Q -> P | No
630 | Q -> | No
660 | D -> N | No
798 | E -> K | Yes
911 | C -> W | No
The glycosylation sites of variant protein HSCP2_PEA_1_P22, as compared to the known protein Ceruloplasmin precursor, are described in Table 30 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column gives the corresponding position on the variant protein).
Table 30 - Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein
138 | no |
762 | yes | 633
397 | yes | 268
358 | yes | 229
Variant protein HSCP2_PEA_1_P22 is encoded by the following transcript(s): HSCP2_PEA_1_T31, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCP2_PEA_1_T31 is shown in bold; this coding portion starts at position 250 and ends at position 3057. The transcript also has the following SNPs as listed in Table 31 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P22 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 31 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
63 | A -> | No
201 | G -> T | No
326 | T -> | No
335 | T -> | No
358 | T -> C | No
360 | T -> C | No
389 | T -> | No
409 | A -> G | No
437 | T -> | No
524 | T -> C | No
591 | T -> C | No
598 | T -> A | No
666 | A -> G | Yes
686 | T -> C | No
720 | T -> G | No
755 | T -> C | No
775 | A -> G | No
897 | A -> G | No
900 | C -> T | No
966 | G -> A | No
1195 | C -> T | No
1213 | C -> G | No
1230 | G -> A | No
1292 | C -> T | No
1341 | A -> | No
1381 | T -> C | No
1464 | T -> C | No
1466 | T -> C | No
1494 | T -> A | Yes
1551 | A -> G | No
1613 | T -> C | No
1655 | G -> A | Yes
1668 | T -> C | No
1682 | T -> G | Yes
1752 | T -> C | No
1781 | A -> G | No
1812 | A -> C | Yes
1841 | T -> C | No
1887 | A -> | No
1977 | C -> T | No
1994 | A -> | No
2042 | T -> C | No
2105 | A -> | No
2138 | A -> | No
2138 | A -> C | No
2227 | G -> A | No
2641 | G -> A | Yes
2853 | T -> C | No
2889 | A -> G | No
2982 | C -> G | No
3554 | T -> | No
3569 | A -> C | Yes
3829 | G -> A | No
4027 | G -> A | Yes
Variant protein HSCP2_PEA_1_P24 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCP2_PEA_1_T33.
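Tables 25 and 31 can be reconciled with the P22 comparison report. CERU_HUMAN residue 262 becomes P22 residue 133, a net loss of 129 codons; if that is the only length difference upstream, SNPs downstream of the variant region should sit 3 x 129 = 387 nucleotides earlier on HSCP2_PEA_1_T31 than on HSCP2_PEA_1_T23. The Python sketch below checks this on a handful of entries (the pairs chosen are illustrative) and then maps one such pair back to codons, assuming both coding portions start at nucleotide 250 as stated.

```python
# Downstream nucleotide SNP positions on HSCP2_PEA_1_T23 (Table 25) ...
T23_SNPS = [1053, 1073, 1107, 1142, 1162, 1284, 1287, 1353, 1582]
# ... and the Table 31 (HSCP2_PEA_1_T31) entries with the same alleles.
T31_SNPS = [666, 686, 720, 755, 775, 897, 900, 966, 1195]

# CERU_HUMAN residue 262 is aligned to P22 residue 133: 129 codons fewer.
CODON_SHIFT = 262 - 133
NT_SHIFT = 3 * CODON_SHIFT

shifted = [pos - NT_SHIFT for pos in T23_SNPS]
print(NT_SHIFT, shifted == T31_SNPS)  # 387 True

# Both coding portions start at nucleotide 250, so the T -> C pair 1073/686
# falls in codon 275 of P16 and codon 146 of P22 (M -> T in Tables 23 and 29).
CDS_START = 250
print((1073 - CDS_START) // 3 + 1, (686 - CDS_START) // 3 + 1)  # 275 146
```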
An alignment is given to the known protein (Ceruloplasmin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSCP2_PEA_1_P24 and CERU_HUMAN:
1. An isolated chimeric polypeptide encoding for HSCP2_PEA_1_P24, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MPLTMGKRNLFLLTP corresponding to amino acids 1 - 15 of HSCP2_PEA_1_P24, and a second amino acid sequence being at least 90% homologous to
VNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFHGQALTNKNYRIDTINLFP
ATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFFQVQECNKSSSKDNIRGKHVRHY
YIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTN
RKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNY
NPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIFTGLI
GPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMFTTAPDQVDKEDE
DFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAGNEADVHGIYFSGNTYLWR
GERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDHYTGGMKQKYTVNQCRRQSEDS
TFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVV
YRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTE
SSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVCRR
PYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDEEFIESNKMHAI
NGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHGHSFQYKHRGVYSSDVF
DIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGMETTYTVLQNEDTKSG
corresponding to amino acids 262 - 1065 of CERU_HUMAN, which also corresponds to amino acids 16 - 819 of HSCP2_PEA_1_P24, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of HSCP2_PEA_1_P24, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MPLTMGKRNLFLLTP of HSCP2_PEA_1_P24.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: non-secretory protein; NN: YES) predicts that this protein has a signal peptide.
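The localization statements in this section follow a simple decision pattern. The Python sketch below encodes one possible reading of it and is not the applicants' pipeline: the `PredictorCalls` structure and `looks_secreted` function are hypothetical, the "at least one signal-peptide call" threshold is inferred from the HSCP2_PEA_1_P24 case rather than stated as a rule, and the trans-membrane calls for P24 are assumed negative because the text does not report them. The NN and HMM labels are presumably SignalP's neural-network and hidden-Markov-model outputs.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PredictorCalls:
    """Hypothetical per-protein results: one boolean per prediction program."""
    signal_peptide: Sequence[bool]  # e.g. the NN and HMM signal-peptide calls
    transmembrane: Sequence[bool]   # one call per trans-membrane predictor


def looks_secreted(calls: PredictorCalls) -> bool:
    """Simplified reading of the stated rationale: at least one
    signal-peptide call and no trans-membrane call."""
    return any(calls.signal_peptide) and not any(calls.transmembrane)


# Most variants in this section: both signal-peptide programs agree, no TM region.
typical = PredictorCalls(signal_peptide=(True, True), transmembrane=(False, False))
# HSCP2_PEA_1_P24: NN says yes, HMM says non-secretory (TM calls assumed negative).
p24 = PredictorCalls(signal_peptide=(True, False), transmembrane=(False, False))

print(looks_secreted(typical), looks_secreted(p24))  # True True
```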
Variant protein HSCP2_PEA_1_P24 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 32 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P24 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 32 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
3 | L -> P | No
13 | L -> | No
29 | M -> T | No
40 | F -> L | No
52 | F -> S | No
59 | T -> A | No
199 | H -> Y | No
205 | P -> A | No
231 | P -> L | No
247 | P -> | No
261 | S -> P | No
289 | L -> P | No
298 | D -> E | Yes
338 | V -> A | No
352 | R -> K | Yes
361 | V -> G | Yes
394 | D -> G | No
414 | F -> S | No
429 | A -> | No
465 | Q -> | No
481 | F -> S | No
502 | Q -> | No
513 | Q -> P | No
513 | Q -> | No
543 | D -> N | No
681 | E -> K | Yes
794 | C -> W | No
The glycosylation sites of variant protein HSCP2_PEA_1_P24, as compared to the known protein Ceruloplasmin precursor, are described in Table 33 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column gives the corresponding position on the variant protein).
Table 33 - Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein
138 | no |
762 | yes | 516
397 | yes | 151
358 | yes | 112
Variant protein HSCP2_PEA_1_P24 is encoded by the following transcript(s): HSCP2_PEA_1_T33, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCP2_PEA_1_T33 is shown in bold; this coding portion starts at position 353 and ends at position 2809. The transcript also has the following SNPs as listed in Table 34 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P24 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 34 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
63 | A -> | No
201 | G -> T | No
326 | T -> | No
335 | T -> | No
358 | T -> C | No
360 | T -> C | No
389 | T -> | No
418 | A -> G | Yes
438 | T -> C | No
472 | T -> G | No
507 | T -> C | No
527 | A -> G | No
649 | A -> G | No
652 | C -> T | No
718 | G -> A | No
947 | C -> T | No
965 | C -> G | No
982 | G -> A | No
1044 | C -> T | No
1093 | A -> | No
1133 | T -> C | No
1216 | T -> C | No
1218 | T -> C | No
1246 | T -> A | Yes
1303 | A -> G | No
1365 | T -> C | No
1407 | G -> A | Yes
1420 | T -> C | No
1434 | T -> G | Yes
1504 | T -> C | No
1533 | A -> G | No
1564 | A -> C | Yes
1593 | T -> C | No
1639 | A -> | No
1729 | C -> T | No
1746 | A -> | No
1794 | T -> C | No
1857 | A -> | No
1890 | A -> | No
1890 | A -> C | No
1979 | G -> A | No
2393 | G -> A | Yes
2605 | T -> C | No
2641 | A -> G | No
2734 | C -> G | No
3306 | T -> | No
3321 | A -> C | Yes
3581 | G -> A | No
3779 | G -> A | Yes
Variant protein HSCP2_PEA_1_P25 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCP2_PEA_1_T34. An alignment is given to the known protein (Ceruloplasmin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSCP2_PEA_1_P25 and CERU_HUMAN:
1. An isolated chimeric polypeptide encoding for HSCP2_PEA_1_P25, comprising a first amino acid sequence being at least 90% homologous to
MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD
RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS
HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY
HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY
CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH
AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF
QVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG
TTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY
PLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD
PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES
LLLEDNIRMFTTAPDQVDKEDEDFQESNKMH
corresponding to amino acids 1 - 621 of CERU_HUMAN, which also corresponds to amino acids 1 - 621 of HSCP2_PEA_1_P25, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CKYCIIHQSTKLF corresponding to amino acids 622 - 634 of HSCP2_PEA_1_P25, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSCP2_PEA_1_P25, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CKYCIIHQSTKLF in HSCP2_PEA_1_P25.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSCP2_PEA_1_P25 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 35 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P25 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 35 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
26 | I -> | No
29 | I -> | No
37 | S -> P | No
47 | V -> | No
54 | I -> V | No
63 | I -> | No
92 | F -> S | No
117 | Y -> N | No
148 | K -> R | No
173 | N -> | No
186 | P -> | No
190 | A -> G | No
190 | A -> | No
213 | I -> | No
218 | V -> M | No
221 | F -> | No
235 | N -> D | No
253 | F -> L | No
275 | M -> T | No
286 | F -> L | No
298 | F -> S | No
305 | T -> A | No
445 | H -> Y | No
451 | P -> A | No
477 | P -> L | No
493 | P -> | No
507 | S -> P | No
535 | L -> P | No
544 | D -> E | Yes
584 | V -> A | No
598 | R -> K | Yes
607 | V -> G | Yes
The glycosylation sites of variant protein HSCP2_PEA_1_P25, as compared to the known protein Ceruloplasmin precursor, are described in Table 36 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column gives the corresponding position on the variant protein).
Table 36 - Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein
138 | yes | 138
762 | no |
397 | yes | 397
358 | yes | 358
Variant protein HSCP2_PEA_1_P25 is encoded by the following transcript(s): HSCP2_PEA_1_T34, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCP2_PEA_1_T34 is shown in bold; this coding portion starts at position 250 and ends at position 2151.
The transcript also has the following SNPs as listed in Table 37 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P25 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 37 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
63 | A -> | No
201 | G -> T | No
326 | T -> | No
335 | T -> | No
358 | T -> C | No
360 | T -> C | No
389 | T -> | No
409 | A -> G | No
437 | T -> | No
524 | T -> C | No
591 | T -> C | No
598 | T -> A | No
692 | A -> G | No
768 | T -> | No
807 | A -> | No
807 | A -> G | No
818 | C -> | No
818 | C -> G | No
837 | T -> C | No
887 | T -> | No
901 | G -> A | No
910 | T -> | No
952 | A -> G | No
1006 | T -> C | No
1053 | A -> G | Yes
1073 | T -> C | No
1107 | T -> G | No
1142 | T -> C | No
1162 | A -> G | No
1284 | A -> G | No
1287 | C -> T | No
1353 | G -> A | No
1582 | C -> T | No
1600 | C -> G | No
1617 | G -> A | No
1679 | C -> T | No
1728 | A -> | No
1768 | T -> C | No
1851 | T -> C | No
1853 | T -> C | No
1881 | T -> A | Yes
1938 | A -> G | No
2000 | T -> C | No
2042 | G -> A | Yes
2055 | T -> C | No
2069 | T -> G | Yes
Variant protein HSCP2_PEA_1_P33 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCP2_PEA_1_T45. An alignment is given to the known protein (Ceruloplasmin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSCP2_PEA_1_P33 and CERU_HUMAN:
1. An isolated chimeric polypeptide encoding for HSCP2_PEA_1_P33, comprising a first amino acid sequence being at least 90% homologous to
MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD
RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS
HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY
HSHIDAPKDIASGLIGPLIICKK
corresponding to amino acids 1 - 202 of CERU_HUMAN, which also corresponds to amino acids 1 - 202 of HSCP2_PEA_1_P33, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GTSSPYCTCYMTKRQGQGSLSFKKKSSLLC corresponding to amino acids 203 - 232 of HSCP2_PEA_1_P33, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSCP2_PEA_1_P33, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GTSSPYCTCYMTKRQGQGSLSFKKKSSLLC in HSCP2_PEA_1_P33.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSCP2_PEA_1_P33 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 38 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P33 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 38 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
26 | I -> | No
29 | I -> | No
37 | S -> P | No
47 | V -> | No
54 | I -> V | No
63 | I -> | No
92 | F -> S | No
117 | Y -> N | No
148 | K -> R | No
173 | N -> | No
186 | P -> | No
190 | A -> G | No
190 | A -> | No
The glycosylation sites of variant protein HSCP2_PEA_1_P33, as compared to the known protein Ceruloplasmin precursor, are described in Table 39 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column gives the corresponding position on the variant protein).
Table 39 - Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein
138 | yes | 138
762 | no |
397 | no |
358 | no |
Variant protein HSCP2_PEA_1_P33 is encoded by the following transcript(s): HSCP2_PEA_1_T45, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCP2_PEA_1_T45 is shown in bold; this coding portion starts at position 250 and ends at position 945.
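The glycosylation tables do not say how the sites were identified. Assuming they are N-linked sites matching the standard N-X-S/T sequon (X not proline), a short Python scan reproduces Table 39 for HSCP2_PEA_1_P33. The sketch further assumes that P33 residues 1-202 are identical to the CERU_HUMAN segment printed in its comparison report (the report only requires at least 90% homology, and P33's own sequence is given at the end of the application), followed by the 30-residue tail; the names used are illustrative.

```python
import re

# Assumed P33 sequence: CERU_HUMAN residues 1-202 as printed in the comparison
# report, followed by the unique tail (residues 203-232).
CERU_1_202 = (
    "MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD"
    "RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS"
    "HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY"
    "HSHIDAPKDIASGLIGPLIICKK"
)
P33_TAIL = "GTSSPYCTCYMTKRQGQGSLSFKKKSSLLC"
P33 = CERU_1_202 + P33_TAIL


def sequon_positions(seq: str) -> list:
    """1-based positions of N in N-X-S/T motifs where X is not proline.

    Only the N is consumed; the lookahead checks the next two residues, so
    overlapping motifs are all reported."""
    return [m.start() + 1 for m in re.finditer(r"N(?=[^P][ST])", seq)]


print(len(P33), sequon_positions(P33))  # 232 [138]
```

The single hit at position 138 (N-T-T) agrees with Table 39, where only site 138 survives in this truncated variant. The 232-residue length also matches the T45 coding portion (945 - 250 + 1 = 696 nucleotides, or 232 codons).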
The transcript also has the following SNPs as listed in Table 40 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P33 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 40 - Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
63 | A -> | No
201 | G -> T | No
326 | T -> | No
335 | T -> | No
358 | T -> C | No
360 | T -> C | No
389 | T -> | No
409 | A -> G | No
437 | T -> | No
524 | T -> C | No
591 | T -> C | No
598 | T -> A | No
692 | A -> G | No
768 | T -> | No
807 | A -> | No
807 | A -> G | No
818 | C -> | No
818 | C -> G | No
837 | T -> C | No
1099 | T -> A | Yes
As noted above, cluster HSCP2 features 50 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSCP2_PEA_1_node_0 according to the present invention is supported by 53 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34, HSCP2_PEA_1_T45 and HSCP2_PEA_1_T50. Table 41 below describes the starting and ending position of this segment on each transcript.
Table 41 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 1 | 395
HSCP2_PEA_1_T13 | 1 | 395
HSCP2_PEA_1_T19 | 1 | 395
HSCP2_PEA_1_T20 | 1 | 395
HSCP2_PEA_1_T22 | 1 | 395
HSCP2_PEA_1_T23 | 1 | 395
HSCP2_PEA_1_T25 | 1 | 395
HSCP2_PEA_1_T31 | 1 | 395
HSCP2_PEA_1_T33 | 1 | 395
HSCP2_PEA_1_T34 | 1 | 395
HSCP2_PEA_1_T45 | 1 | 395
HSCP2_PEA_1_T50 | 1 | 395
Segment cluster HSCP2_PEA_1_node_3 according to the present invention is supported by 53 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T34, HSCP2_PEA_1_T45 and HSCP2_PEA_1_T50. Table 42 below describes the starting and ending position of this segment on each transcript.
Table 42 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 396 | 587
HSCP2_PEA_1_T13 | 396 | 587
HSCP2_PEA_1_T19 | 396 | 587
HSCP2_PEA_1_T20 | 396 | 587
HSCP2_PEA_1_T22 | 396 | 587
HSCP2_PEA_1_T23 | 396 | 587
HSCP2_PEA_1_T25 | 396 | 587
HSCP2_PEA_1_T31 | 396 | 587
HSCP2_PEA_1_T34 | 396 | 587
HSCP2_PEA_1_T45 | 396 | 587
HSCP2_PEA_1_T50 | 396 | 587
Segment cluster HSCP2_PEA_1_node_6 according to the present invention is supported by 63 libraries.
The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T34, HSCP2_PEA_1_T45 and HSCP2_PEA_1_T50. Table 43 below describes the starting and ending position of this segment on each transcript.

Table 43 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 644 | 830
HSCP2_PEA_1_T13 | 644 | 830
HSCP2_PEA_1_T19 | 644 | 830
HSCP2_PEA_1_T20 | 644 | 830
HSCP2_PEA_1_T22 | 644 | 830
HSCP2_PEA_1_T23 | 644 | 830
HSCP2_PEA_1_T25 | 644 | 830
HSCP2_PEA_1_T34 | 644 | 830
HSCP2_PEA_1_T45 | 644 | 830
HSCP2_PEA_1_T50 | 644 | 830

Segment cluster HSCP2_PEA_1_node_8 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T45. Table 44 below describes the starting and ending position of this segment on each transcript.

Table 44 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T45 | 857 | 1634

Segment cluster HSCP2_PEA_1_node_10 according to the present invention is supported by 57 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 45 below describes the starting and ending position of this segment on each transcript.
Table 45 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 857 | 1030
HSCP2_PEA_1_T13 | 857 | 1030
HSCP2_PEA_1_T19 | 857 | 1030
HSCP2_PEA_1_T20 | 857 | 1030
HSCP2_PEA_1_T22 | 857 | 1030
HSCP2_PEA_1_T23 | 857 | 1030
HSCP2_PEA_1_T25 | 857 | 1030
HSCP2_PEA_1_T34 | 857 | 1030
HSCP2_PEA_1_T50 | 857 | 1030

Segment cluster HSCP2_PEA_1_node_14 according to the present invention is supported by 49 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 46 below describes the starting and ending position of this segment on each transcript.

Table 46 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 1089 | 1236
HSCP2_PEA_1_T13 | 1089 | 1236
HSCP2_PEA_1_T19 | 1089 | 1236
HSCP2_PEA_1_T20 | 1089 | 1236
HSCP2_PEA_1_T22 | 1089 | 1236
HSCP2_PEA_1_T23 | 1089 | 1236
HSCP2_PEA_1_T25 | 1089 | 1236
HSCP2_PEA_1_T31 | 702 | 849
HSCP2_PEA_1_T33 | 454 | 601
HSCP2_PEA_1_T34 | 1089 | 1236
HSCP2_PEA_1_T50 | 1089 | 1236

Segment cluster HSCP2_PEA_1_node_23 according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 47 below describes the starting and ending position of this segment on each transcript.
Table 47 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 1458 | 1597
HSCP2_PEA_1_T13 | 1458 | 1597
HSCP2_PEA_1_T19 | 1458 | 1597
HSCP2_PEA_1_T20 | 1458 | 1597
HSCP2_PEA_1_T22 | 1458 | 1597
HSCP2_PEA_1_T23 | 1458 | 1597
HSCP2_PEA_1_T25 | 1458 | 1597
HSCP2_PEA_1_T31 | 1071 | 1210
HSCP2_PEA_1_T33 | 823 | 962
HSCP2_PEA_1_T34 | 1458 | 1597
HSCP2_PEA_1_T50 | 1458 | 1597

Segment cluster HSCP2_PEA_1_node_26 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 48 below describes the starting and ending position of this segment on each transcript.

Table 48 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 1598 | 1750
HSCP2_PEA_1_T13 | 1598 | 1750
HSCP2_PEA_1_T19 | 1598 | 1750
HSCP2_PEA_1_T20 | 1598 | 1750
HSCP2_PEA_1_T22 | 1598 | 1750
HSCP2_PEA_1_T23 | 1598 | 1750
HSCP2_PEA_1_T25 | 1598 | 1750
HSCP2_PEA_1_T31 | 1211 | 1363
HSCP2_PEA_1_T33 | 963 | 1115
HSCP2_PEA_1_T34 | 1598 | 1750
HSCP2_PEA_1_T50 | 1598 | 1750

Segment cluster HSCP2_PEA_1_node_29 according to the present invention is supported by 64 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 49 below describes the starting and ending position of this segment on each transcript.
Table 49 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 1751 | 1962
HSCP2_PEA_1_T13 | 1751 | 1962
HSCP2_PEA_1_T19 | 1751 | 1962
HSCP2_PEA_1_T20 | 1751 | 1962
HSCP2_PEA_1_T22 | 1751 | 1962
HSCP2_PEA_1_T23 | 1751 | 1962
HSCP2_PEA_1_T25 | 1751 | 1962
HSCP2_PEA_1_T31 | 1364 | 1575
HSCP2_PEA_1_T33 | 1116 | 1327
HSCP2_PEA_1_T34 | 1751 | 1962
HSCP2_PEA_1_T50 | 1751 | 1962

Segment cluster HSCP2_PEA_1_node_31 according to the present invention is supported by 72 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 50 below describes the starting and ending position of this segment on each transcript.

Table 50 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 1963 | 2113
HSCP2_PEA_1_T13 | 1963 | 2113
HSCP2_PEA_1_T19 | 1963 | 2113
HSCP2_PEA_1_T20 | 1963 | 2113
HSCP2_PEA_1_T22 | 1963 | 2113
HSCP2_PEA_1_T23 | 1963 | 2113
HSCP2_PEA_1_T25 | 1963 | 2113
HSCP2_PEA_1_T31 | 1576 | 1726
HSCP2_PEA_1_T33 | 1328 | 1478
HSCP2_PEA_1_T34 | 1963 | 2113
HSCP2_PEA_1_T50 | 1963 | 2113

Segment cluster HSCP2_PEA_1_node_32 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T34. Table 51 below describes the starting and ending position of this segment on each transcript.

Table 51 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T34 | 2114 | 2246

Segment cluster HSCP2_PEA_1_node_34 according to the present invention is supported by 65 libraries.
The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 52 below describes the starting and ending position of this segment on each transcript.

Table 52 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 2114 | 2326
HSCP2_PEA_1_T13 | 2114 | 2326
HSCP2_PEA_1_T20 | 2114 | 2326
HSCP2_PEA_1_T22 | 2114 | 2326
HSCP2_PEA_1_T23 | 2114 | 2326
HSCP2_PEA_1_T25 | 2114 | 2326
HSCP2_PEA_1_T31 | 1727 | 1939
HSCP2_PEA_1_T33 | 1479 | 1691
HSCP2_PEA_1_T50 | 2114 | 2326

Segment cluster HSCP2_PEA_1_node_52 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T22. Table 53 below describes the starting and ending position of this segment on each transcript.

Table 53 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T22 | 2866 | 4061

Segment cluster HSCP2_PEA_1_node_58 according to the present invention is supported by 89 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 54 below describes the starting and ending position of this segment on each transcript.
Table 54 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 2911 | 3127
HSCP2_PEA_1_T13 | 2911 | 3127
HSCP2_PEA_1_T19 | 2698 | 2914
HSCP2_PEA_1_T20 | 2911 | 3127
HSCP2_PEA_1_T23 | 2911 | 3127
HSCP2_PEA_1_T25 | 2911 | 3127
HSCP2_PEA_1_T31 | 2524 | 2740
HSCP2_PEA_1_T33 | 2276 | 2492
HSCP2_PEA_1_T50 | 2911 | 3127

Segment cluster HSCP2_PEA_1_node_72 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4 and HSCP2_PEA_1_T50. Table 55 below describes the starting and ending position of this segment on each transcript.

Table 55 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 3431 | 3636
HSCP2_PEA_1_T50 | 3431 | 3636

Segment cluster HSCP2_PEA_1_node_73 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4. Table 56 below describes the starting and ending position of this segment on each transcript.

Table 56 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 3637 | 5580

Segment cluster HSCP2_PEA_1_node_74 according to the present invention is supported by 86 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31 and HSCP2_PEA_1_T33. Table 57 below describes the starting and ending position of this segment on each transcript.
Table 57 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 5581 | 5882
HSCP2_PEA_1_T13 | 3370 | 3671
HSCP2_PEA_1_T19 | 3218 | 3519
HSCP2_PEA_1_T25 | 3426 | 3568
HSCP2_PEA_1_T31 | 3044 | 3345
HSCP2_PEA_1_T33 | 2796 | 3097

Segment cluster HSCP2_PEA_1_node_76 according to the present invention is supported by 69 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T31 and HSCP2_PEA_1_T33. Table 58 below describes the starting and ending position of this segment on each transcript.

Table 58 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 5936 | 6215
HSCP2_PEA_1_T13 | 3725 | 4004
HSCP2_PEA_1_T19 | 3573 | 3852
HSCP2_PEA_1_T31 | 3399 | 3678
HSCP2_PEA_1_T33 | 3151 | 3430

Segment cluster HSCP2_PEA_1_node_78 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T31 and HSCP2_PEA_1_T33. Table 59 below describes the starting and ending position of this segment on each transcript.

Table 59 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 6270 | 6494
HSCP2_PEA_1_T13 | 4059 | 4283
HSCP2_PEA_1_T19 | 3907 | 4131
HSCP2_PEA_1_T31 | 3733 | 3957
HSCP2_PEA_1_T33 | 3485 | 3709

Segment cluster HSCP2_PEA_1_node_80 according to the present invention is supported by 59 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T31 and HSCP2_PEA_1_T33. Table 60 below describes the starting and ending position of this segment on each transcript.
Table 60 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 6549 | 6807
HSCP2_PEA_1_T13 | 4338 | 4596
HSCP2_PEA_1_T19 | 4186 | 4444
HSCP2_PEA_1_T31 | 4012 | 4270
HSCP2_PEA_1_T33 | 3764 | 4022

Segment cluster HSCP2_PEA_1_node_84 according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T20 and HSCP2_PEA_1_T23. Table 61 below describes the starting and ending position of this segment on each transcript.

Table 61 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T20 | 3548 | 4013
HSCP2_PEA_1_T23 | 3373 | 3838

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length and are therefore included in a separate description.

Segment cluster HSCP2_PEA_1_node_4 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T34, HSCP2_PEA_1_T45 and HSCP2_PEA_1_T50. Table 62 below describes the starting and ending position of this segment on each transcript.

Table 62 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 588 | 643
HSCP2_PEA_1_T13 | 588 | 643
HSCP2_PEA_1_T19 | 588 | 643
HSCP2_PEA_1_T20 | 588 | 643
HSCP2_PEA_1_T22 | 588 | 643
HSCP2_PEA_1_T23 | 588 | 643
HSCP2_PEA_1_T25 | 588 | 643
HSCP2_PEA_1_T31 | 588 | 643
HSCP2_PEA_1_T34 | 588 | 643
HSCP2_PEA_1_T45 | 588 | 643
HSCP2_PEA_1_T50 | 588 | 643

Segment cluster HSCP2_PEA_1_node_7 according to the present invention is supported by 56 libraries.
The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T34, HSCP2_PEA_1_T45 and HSCP2_PEA_1_T50. Table 63 below describes the starting and ending position of this segment on each transcript.

Table 63 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 831 | 856
HSCP2_PEA_1_T13 | 831 | 856
HSCP2_PEA_1_T19 | 831 | 856
HSCP2_PEA_1_T20 | 831 | 856
HSCP2_PEA_1_T22 | 831 | 856
HSCP2_PEA_1_T23 | 831 | 856
HSCP2_PEA_1_T25 | 831 | 856
HSCP2_PEA_1_T34 | 831 | 856
HSCP2_PEA_1_T45 | 831 | 856
HSCP2_PEA_1_T50 | 831 | 856

Segment cluster HSCP2_PEA_1_node_13 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 64 below describes the starting and ending position of this segment on each transcript.

Table 64 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 1031 | 1088
HSCP2_PEA_1_T13 | 1031 | 1088
HSCP2_PEA_1_T19 | 1031 | 1088
HSCP2_PEA_1_T20 | 1031 | 1088
HSCP2_PEA_1_T22 | 1031 | 1088
HSCP2_PEA_1_T23 | 1031 | 1088
HSCP2_PEA_1_T25 | 1031 | 1088
HSCP2_PEA_1_T31 | 644 | 701
HSCP2_PEA_1_T33 | 396 | 453
HSCP2_PEA_1_T34 | 1031 | 1088
HSCP2_PEA_1_T50 | 1031 | 1088

Segment cluster HSCP2_PEA_1_node_15 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described.
This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 65 below describes the starting and ending position of this segment on each transcript.

Table 65 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 1237 | 1272
HSCP2_PEA_1_T13 | 1237 | 1272
HSCP2_PEA_1_T19 | 1237 | 1272
HSCP2_PEA_1_T20 | 1237 | 1272
HSCP2_PEA_1_T22 | 1237 | 1272
HSCP2_PEA_1_T23 | 1237 | 1272
HSCP2_PEA_1_T25 | 1237 | 1272
HSCP2_PEA_1_T31 | 850 | 885
HSCP2_PEA_1_T33 | 602 | 637
HSCP2_PEA_1_T34 | 1237 | 1272
HSCP2_PEA_1_T50 | 1237 | 1272

Segment cluster HSCP2_PEA_1_node_16 according to the present invention can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 66 below describes the starting and ending position of this segment on each transcript.

Table 66 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 1273 | 1285
HSCP2_PEA_1_T13 | 1273 | 1285
HSCP2_PEA_1_T19 | 1273 | 1285
HSCP2_PEA_1_T20 | 1273 | 1285
HSCP2_PEA_1_T22 | 1273 | 1285
HSCP2_PEA_1_T23 | 1273 | 1285
HSCP2_PEA_1_T25 | 1273 | 1285
HSCP2_PEA_1_T31 | 886 | 898
HSCP2_PEA_1_T33 | 638 | 650
HSCP2_PEA_1_T34 | 1273 | 1285
HSCP2_PEA_1_T50 | 1273 | 1285

Segment cluster HSCP2_PEA_1_node_18 according to the present invention can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 67 below describes the starting and ending position of this segment on each transcript.
Table 67 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 1286 | 1308
HSCP2_PEA_1_T13 | 1286 | 1308
HSCP2_PEA_1_T19 | 1286 | 1308
HSCP2_PEA_1_T20 | 1286 | 1308
HSCP2_PEA_1_T22 | 1286 | 1308
HSCP2_PEA_1_T23 | 1286 | 1308
HSCP2_PEA_1_T25 | 1286 | 1308
HSCP2_PEA_1_T31 | 899 | 921
HSCP2_PEA_1_T33 | 651 | 673
HSCP2_PEA_1_T34 | 1286 | 1308
HSCP2_PEA_1_T50 | 1286 | 1308

Segment cluster HSCP2_PEA_1_node_20 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 68 below describes the starting and ending position of this segment on each transcript.

Table 68 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 1309 | 1374
HSCP2_PEA_1_T13 | 1309 | 1374
HSCP2_PEA_1_T19 | 1309 | 1374
HSCP2_PEA_1_T20 | 1309 | 1374
HSCP2_PEA_1_T22 | 1309 | 1374
HSCP2_PEA_1_T23 | 1309 | 1374
HSCP2_PEA_1_T25 | 1309 | 1374
HSCP2_PEA_1_T31 | 922 | 987
HSCP2_PEA_1_T33 | 674 | 739
HSCP2_PEA_1_T34 | 1309 | 1374
HSCP2_PEA_1_T50 | 1309 | 1374

Segment cluster HSCP2_PEA_1_node_21 according to the present invention is supported by 49 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 69 below describes the starting and ending position of this segment on each transcript.

Table 69 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 1375 | 1457
HSCP2_PEA_1_T13 | 1375 | 1457
HSCP2_PEA_1_T19 | 1375 | 1457
HSCP2_PEA_1_T20 | 1375 | 1457
HSCP2_PEA_1_T22 | 1375 | 1457
HSCP2_PEA_1_T23 | 1375 | 1457
HSCP2_PEA_1_T25 | 1375 | 1457
HSCP2_PEA_1_T31 | 988 | 1070
HSCP2_PEA_1_T33 | 740 | 822
HSCP2_PEA_1_T34 | 1375 | 1457
HSCP2_PEA_1_T50 | 1375 | 1457

Segment cluster HSCP2_PEA_1_node_37 according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 70 below describes the starting and ending position of this segment on each transcript.

Table 70 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 2327 | 2368
HSCP2_PEA_1_T13 | 2327 | 2368
HSCP2_PEA_1_T19 | 2114 | 2155
HSCP2_PEA_1_T20 | 2327 | 2368
HSCP2_PEA_1_T22 | 2327 | 2368
HSCP2_PEA_1_T23 | 2327 | 2368
HSCP2_PEA_1_T25 | 2327 | 2368
HSCP2_PEA_1_T31 | 1940 | 1981
HSCP2_PEA_1_T33 | 1692 | 1733
HSCP2_PEA_1_T50 | 2327 | 2368

Segment cluster HSCP2_PEA_1_node_38 according to the present invention is supported by 59 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 71 below describes the starting and ending position of this segment on each transcript.

Table 71 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 2369 | 2442
HSCP2_PEA_1_T13 | 2369 | 2442
HSCP2_PEA_1_T19 | 2156 | 2229
HSCP2_PEA_1_T20 | 2369 | 2442
HSCP2_PEA_1_T22 | 2369 | 2442
HSCP2_PEA_1_T23 | 2369 | 2442
HSCP2_PEA_1_T25 | 2369 | 2442
HSCP2_PEA_1_T31 | 1982 | 2055
HSCP2_PEA_1_T33 | 1734 | 1807
HSCP2_PEA_1_T50 | 2369 | 2442

Segment cluster HSCP2_PEA_1_node_39 according to the present invention is supported by 57 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 72 below describes the starting and ending position of this segment on each transcript.

Table 72 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 2443 | 2505
HSCP2_PEA_1_T13 | 2443 | 2505
HSCP2_PEA_1_T19 | 2230 | 2292
HSCP2_PEA_1_T20 | 2443 | 2505
HSCP2_PEA_1_T22 | 2443 | 2505
HSCP2_PEA_1_T23 | 2443 | 2505
HSCP2_PEA_1_T25 | 2443 | 2505
HSCP2_PEA_1_T31 | 2056 | 2118
HSCP2_PEA_1_T33 | 1808 | 1870
HSCP2_PEA_1_T50 | 2443 | 2505

Segment cluster HSCP2_PEA_1_node_41 according to the present invention is supported by 60 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 73 below describes the starting and ending position of this segment on each transcript.
Table 73 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 2506 | 2534
HSCP2_PEA_1_T13 | 2506 | 2534
HSCP2_PEA_1_T19 | 2293 | 2321
HSCP2_PEA_1_T20 | 2506 | 2534
HSCP2_PEA_1_T22 | 2506 | 2534
HSCP2_PEA_1_T23 | 2506 | 2534
HSCP2_PEA_1_T25 | 2506 | 2534
HSCP2_PEA_1_T31 | 2119 | 2147
HSCP2_PEA_1_T33 | 1871 | 1899
HSCP2_PEA_1_T50 | 2506 | 2534

Segment cluster HSCP2_PEA_1_node_42 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T22. Table 74 below describes the starting and ending position of this segment on each transcript.

Table 74 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T22 | 2535 | 2596

Segment cluster HSCP2_PEA_1_node_46 according to the present invention can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 75 below describes the starting and ending position of this segment on each transcript.

Table 75 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 2535 | 2559
HSCP2_PEA_1_T13 | 2535 | 2559
HSCP2_PEA_1_T19 | 2322 | 2346
HSCP2_PEA_1_T20 | 2535 | 2559
HSCP2_PEA_1_T22 | 2597 | 2621
HSCP2_PEA_1_T23 | 2535 | 2559
HSCP2_PEA_1_T25 | 2535 | 2559
HSCP2_PEA_1_T31 | 2148 | 2172
HSCP2_PEA_1_T33 | 1900 | 1924
HSCP2_PEA_1_T50 | 2535 | 2559

Segment cluster HSCP2_PEA_1_node_47 according to the present invention is supported by 59 libraries. The number of libraries was determined as previously described.
This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 76 below describes the starting and ending position of this segment on each transcript.

Table 76 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 2560 | 2674
HSCP2_PEA_1_T13 | 2560 | 2674
HSCP2_PEA_1_T19 | 2347 | 2461
HSCP2_PEA_1_T20 | 2560 | 2674
HSCP2_PEA_1_T22 | 2622 | 2736
HSCP2_PEA_1_T23 | 2560 | 2674
HSCP2_PEA_1_T25 | 2560 | 2674
HSCP2_PEA_1_T31 | 2173 | 2287
HSCP2_PEA_1_T33 | 1925 | 2039
HSCP2_PEA_1_T50 | 2560 | 2674

Segment cluster HSCP2_PEA_1_node_50 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 77 below describes the starting and ending position of this segment on each transcript.

Table 77 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 2675 | 2731
HSCP2_PEA_1_T13 | 2675 | 2731
HSCP2_PEA_1_T19 | 2462 | 2518
HSCP2_PEA_1_T20 | 2675 | 2731
HSCP2_PEA_1_T22 | 2737 | 2793
HSCP2_PEA_1_T23 | 2675 | 2731
HSCP2_PEA_1_T25 | 2675 | 2731
HSCP2_PEA_1_T31 | 2288 | 2344
HSCP2_PEA_1_T33 | 2040 | 2096
HSCP2_PEA_1_T50 | 2675 | 2731

Segment cluster HSCP2_PEA_1_node_51 according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described.
This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 78 below describes the starting and ending position of this segment on each transcript.

Table 78 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 2732 | 2803
HSCP2_PEA_1_T13 | 2732 | 2803
HSCP2_PEA_1_T19 | 2519 | 2590
HSCP2_PEA_1_T20 | 2732 | 2803
HSCP2_PEA_1_T22 | 2794 | 2865
HSCP2_PEA_1_T23 | 2732 | 2803
HSCP2_PEA_1_T25 | 2732 | 2803
HSCP2_PEA_1_T31 | 2345 | 2416
HSCP2_PEA_1_T33 | 2097 | 2168
HSCP2_PEA_1_T50 | 2732 | 2803

Segment cluster HSCP2_PEA_1_node_55 according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 79 below describes the starting and ending position of this segment on each transcript.
Table 79 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 2804 | 2880
HSCP2_PEA_1_T13 | 2804 | 2880
HSCP2_PEA_1_T19 | 2591 | 2667
HSCP2_PEA_1_T20 | 2804 | 2880
HSCP2_PEA_1_T23 | 2804 | 2880
HSCP2_PEA_1_T25 | 2804 | 2880
HSCP2_PEA_1_T31 | 2417 | 2493
HSCP2_PEA_1_T33 | 2169 | 2245
HSCP2_PEA_1_T50 | 2804 | 2880

Segment cluster HSCP2_PEA_1_node_56 according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 80 below describes the starting and ending position of this segment on each transcript.

Table 80 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 2881 | 2910
HSCP2_PEA_1_T13 | 2881 | 2910
HSCP2_PEA_1_T19 | 2668 | 2697
HSCP2_PEA_1_T20 | 2881 | 2910
HSCP2_PEA_1_T23 | 2881 | 2910
HSCP2_PEA_1_T25 | 2881 | 2910
HSCP2_PEA_1_T31 | 2494 | 2523
HSCP2_PEA_1_T33 | 2246 | 2275
HSCP2_PEA_1_T50 | 2881 | 2910

Segment cluster HSCP2_PEA_1_node_60 according to the present invention is supported by 90 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 81 below describes the starting and ending position of this segment on each transcript.

Table 81 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 3128 | 3234
HSCP2_PEA_1_T13 | 3128 | 3234
HSCP2_PEA_1_T19 | 2915 | 3021
HSCP2_PEA_1_T20 | 3128 | 3234
HSCP2_PEA_1_T23 | 3128 | 3234
HSCP2_PEA_1_T25 | 3128 | 3234
HSCP2_PEA_1_T31 | 2741 | 2847
HSCP2_PEA_1_T33 | 2493 | 2599
HSCP2_PEA_1_T50 | 3128 | 3234

Segment cluster HSCP2_PEA_1_node_61 according to the present invention is supported by 81 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 82 below describes the starting and ending position of this segment on each transcript.

Table 82 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 3235 | 3267
HSCP2_PEA_1_T13 | 3235 | 3267
HSCP2_PEA_1_T19 | 3022 | 3054
HSCP2_PEA_1_T20 | 3235 | 3267
HSCP2_PEA_1_T23 | 3235 | 3267
HSCP2_PEA_1_T25 | 3235 | 3267
HSCP2_PEA_1_T31 | 2848 | 2880
HSCP2_PEA_1_T33 | 2600 | 2632
HSCP2_PEA_1_T50 | 3235 | 3267

Segment cluster HSCP2_PEA_1_node_67 according to the present invention can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 83 below describes the starting and ending position of this segment on each transcript.

Table 83 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSCP2_PEA_1_T4 | 3268 | 3272
HSCP2_PEA_1_T19 | 3055 | 3059
HSCP2_PEA_1_T20 | 3268 | 3272
HSCP2_PEA_1_T31 | 2881 | 2885
HSCP2_PEA_1_T33 | 2633 | 2637
HSCP2_PEA_1_T50 | 3268 | 3272

Segment cluster HSCP2_PEA_1_node_68 according to the present invention is supported by 88 libraries. The number of libraries was determined as previously described.
This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 84 below describes the starting and ending position of this segment on each transcript.

Table 84 - Segment location on transcripts
Transcript name     Segment starting position   Segment ending position
HSCP2_PEA_1_T4      3273                        3328
HSCP2_PEA_1_T19     3060                        3115
HSCP2_PEA_1_T20     3273                        3328
HSCP2_PEA_1_T25     3268                        3323
HSCP2_PEA_1_T31     2886                        2941
HSCP2_PEA_1_T33     2638                        2693
HSCP2_PEA_1_T50     3273                        3328

Segment cluster HSCP2_PEA_1_node_69 according to the present invention is supported by 96 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 85 below describes the starting and ending position of this segment on each transcript.

Table 85 - Segment location on transcripts
Transcript name     Segment starting position   Segment ending position
HSCP2_PEA_1_T4      3329                        3430
HSCP2_PEA_1_T13     3268                        3369
HSCP2_PEA_1_T19     3116                        3217
HSCP2_PEA_1_T20     3329                        3430
HSCP2_PEA_1_T25     3324                        3425
HSCP2_PEA_1_T31     2942                        3043
HSCP2_PEA_1_T33     2694                        2795
HSCP2_PEA_1_T50     3329                        3430

Segment cluster HSCP2_PEA_1_node_70 according to the present invention can be found in the following transcript(s): HSCP2_PEA_1_T20. Table 86 below describes the starting and ending position of this segment on each transcript.

Table 86 - Segment location on transcripts
Transcript name     Segment starting position   Segment ending position
HSCP2_PEA_1_T20     3431                        3442

Segment cluster HSCP2_PEA_1_node_75 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T31 and HSCP2_PEA_1_T33.
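As an illustrative cross-check of Tables 81 to 85, which is not part of the original disclosure, the sketch below treats each start/end pair as 1-based, inclusive coordinates. That reading is an assumption; the text does not state the convention. It then tests whether consecutive segments abut on two transcripts. The helper names `segment_length` and `gaps_between` are hypothetical. Under this assumption, node_60, node_61, node_67, node_68 and node_69 tile HSCP2_PEA_1_T4 with no gaps. On HSCP2_PEA_1_T25, which is not listed as carrying node_67, node_68 starts immediately after node_61.

```python
"""Hypothetical sanity check: do consecutive segments tile a transcript?"""

# (segment, start, end) copied from Tables 81-85. Coordinates are ASSUMED
# to be 1-based and inclusive; the tables themselves do not say so.
SEGMENTS = {
    "HSCP2_PEA_1_T4": [
        ("node_60", 3128, 3234),
        ("node_61", 3235, 3267),
        ("node_67", 3268, 3272),
        ("node_68", 3273, 3328),
        ("node_69", 3329, 3430),
    ],
    "HSCP2_PEA_1_T25": [  # node_67 is not listed for this transcript
        ("node_60", 3128, 3234),
        ("node_61", 3235, 3267),
        ("node_68", 3268, 3323),
        ("node_69", 3324, 3425),
    ],
}


def segment_length(start: int, end: int) -> int:
    """Number of positions in a segment when both endpoints are included."""
    return end - start + 1


def gaps_between(segments):
    """Return (left, right, gap) for every pair of consecutive segments whose
    boundaries do not abut; gap > 0 is a hole, gap < 0 an overlap."""
    problems = []
    for (left, _, left_end), (right, right_start, _) in zip(segments, segments[1:]):
        gap = right_start - left_end - 1
        if gap != 0:
            problems.append((left, right, gap))
    return problems


if __name__ == "__main__":
    for transcript, segments in SEGMENTS.items():
        lengths = {name: segment_length(s, e) for name, s, e in segments}
        print(transcript, lengths, "gaps:", gaps_between(segments))
```

Each shared segment has the same length on both transcripts (for example, 107 positions for node_60 and 56 for node_68). This is what one would expect if the tables list the same sequence at different offsets.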
Table 87 below describes the starting and ending position of this segment on each transcript.

Table 87 - Segment location on transcripts
Transcript name     Segment starting position   Segment ending position
HSCP2_PEA_1_T4      5883                        5935
HSCP2_PEA_1_T13     3672                        3724
HSCP2_PEA_1_T19     3520                        3572
HSCP2_PEA_1_T31     3346                        3398
HSCP2_PEA_1_T33     3098                        3150

Segment cluster HSCP2_PEA_1_node_77 according to the present invention is supported by 47 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T31 and HSCP2_PEA_1_T33. Table 88 below describes the starting and ending position of this segment on each transcript.

Table 88 - Segment location on transcripts
Transcript name     Segment starting position   Segment ending position
HSCP2_PEA_1_T4      6216                        6269
HSCP2_PEA_1_T13     4005                        4058
HSCP2_PEA_1_T19     3853                        3906
HSCP2_PEA_1_T31     3679                        3732
HSCP2_PEA_1_T33     3431                        3484

Segment cluster HSCP2_PEA_1_node_79 according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T31 and HSCP2_PEA_1_T33. Table 89 below describes the starting and ending position of this segment on each transcript.

Table 89 - Segment location on transcripts
Transcript name     Segment starting position   Segment ending position
HSCP2_PEA_1_T4      6495                        6548
HSCP2_PEA_1_T13     4284                        4337
HSCP2_PEA_1_T19     4132                        4185
HSCP2_PEA_1_T31     3958                        4011
HSCP2_PEA_1_T33     3710                        3763

Segment cluster HSCP2_PEA_1_node_82 according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T20 and HSCP2_PEA_1_T23.
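A similar sketch, again only illustrative, compares the positions in Tables 87 to 89 between HSCP2_PEA_1_T4 and HSCP2_PEA_1_T13. It assumes the same inclusive coordinates. Each of node_75, node_77 and node_79 starts 2211 positions earlier on HSCP2_PEA_1_T13 than on HSCP2_PEA_1_T4, and each has the same length on both transcripts. This is consistent with, though it does not by itself establish, the two transcripts differing only upstream of node_75 within this region. The `offsets` helper is hypothetical.

```python
"""Hypothetical comparison of segment positions between two transcripts."""

# (start, end) per segment, copied from Tables 87-89 (assumed 1-based, inclusive).
T4 = {"node_75": (5883, 5935), "node_77": (6216, 6269), "node_79": (6495, 6548)}
T13 = {"node_75": (3672, 3724), "node_77": (4005, 4058), "node_79": (4284, 4337)}


def offsets(reference, other):
    """Map each segment to (reference start - other start, same_length)."""
    result = {}
    for node, (ref_start, ref_end) in reference.items():
        start, end = other[node]
        same_length = (ref_end - ref_start) == (end - start)
        result[node] = (ref_start - start, same_length)
    return result


if __name__ == "__main__":
    print(offsets(T4, T13))
    # {'node_75': (2211, True), 'node_77': (2211, True), 'node_79': (2211, True)}
```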
Table 90 below describes the starting and ending position of this segment on each transcript.

Table 90 - Segment location on transcripts
Transcript name     Segment starting position   Segment ending position
HSCP2_PEA_1_T20     3443                        3547
HSCP2_PEA_1_T23     3268                        3372

Variant protein alignment to the previously known protein:

Sequence name: CERU_HUMAN
Sequence documentation:
Alignment of: HSCP2_PEA_1_P4 x CERU_HUMAN
Alignment segment 1/1:
Quality: 10630.00
Escore: 0
Matching length: 1060
Total length: 1060
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

   1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50
     ||||||||||||||||||||||||||||||||||||||||||||||||||
   1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50

  51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100

 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150

 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200

 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250

 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300

 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350

 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400

 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450

 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500

 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550

 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600

 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650

 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700

 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750

 751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800

 801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850

 851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900

 901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950

 951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000

1001 HSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGM 1050
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1001 HSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGM 1050

1051 ETTYTVLQNE 1060
     ||||||||||
1051 ETTYTVLQNE 1060

Sequence name: CERU_HUMAN
Sequence documentation:
Alignment of: HSCP2_PEA_1_P8 x CERU_HUMAN
Alignment segment 1/1:
Quality: 10079.00
Escore: 0
Matching length: 1006
Total length: 1006
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

   1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50
     ||||||||||||||||||||||||||||||||||||||||||||||||||
   1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50

  51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100

 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150

 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200

 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250

 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300

 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350

 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400

 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450

 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500

 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550

 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600

 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650

 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700

 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750

 751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800

 801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850

 851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900

 901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950

 951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000

1001 HSFQYK 1006
     ||||||
1001 HSFQYK 1006

Sequence name: CERU_HUMAN
Sequence documentation:
Alignment of: HSCP2_PEA_1_P14 x CERU_HUMAN
WO 2005/116850 PCT/IB2005/002555 959 Alignment segment 1/1: Quality: 9832.00 5 Escore: 0 Matching length: 994 Total length: 1065 Matching Percent Similarity: 99.90 Matching Percent Identity: 99.90 10 Total Percent Similarity: 93.24 Total Percent Identity: 93.24 Gaps: 1 Alignment: 15 1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50 liIIlllllllllIlillllllIIIIllllllllllllillIll 1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50 20 51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100 IlIllIIIIIIIIIIIIIIIIIIllllllllIllIIIlIIlllllII 51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150 2 5 I l l l l l i l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200 I I I I I I I I l l l l l l l l l l I I II l ll l l ll l l l l l l I l l I l0 30 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200 WO 2005/116850 PCT/IB2005/002555 960 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250 II I I111111 1111111I I111111111 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250 5 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300 I I I I I I l l I I l llil l l l l I l l l l l l l I li 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350 1 0 I l l l l l l lIll l l l l l l l l l l l l l l I I 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400 15 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450 I l l l II I l l I I l l I ~I l l I l l l l I l l 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450 20 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500 I l l l l l l l l l I l i l l l l l I l l l l l l l I I I 451 
PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500 25 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600 30 I l l l l l l l l iI III Il l l l l l l I 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600 WO 2005/116850 PCT/IB2005/002555 961 601 TTAPDQVDKEDEDFQESNKMH ............................. 621 I l l l l l l l l l l l l l l l lI I I 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650 5 622 ..........................................WTFNVECL 629 1111111 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700 10 630 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 679 I l I l lI lI I I I l llll l IIIIIIII I I l l l I l I I I 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750 680 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 729 1 5 I l l I l l l II I l l l l li l l I I I I I I l l l l l l l I l l I 751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800 730 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 779 I l I l lI l l l l l l l l l l i l I I l l l l l llll II l l l I l l l l l 20 801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850 780 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 829 I l l l l l l1i l l l I I I I I l l l l l l l II I i i i I I I 1 l II 851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900 25 830 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 879 I I l l i I li l l l l l l l l l l l l l l I I l l l l l I l l l l l l 901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950 30 880 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 929 I I l l l l l l l l l l l l I I Il l l l l l l I l l l I l l i l l l l l l l I l l l WO 2005/116850 PCT/IB2005/002555 962 951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000 930 HSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGM 979 I I I I I lI I I I I I I l l l I I I I I l I I I I I lI I I I I I I I I I I I I I l l l 
lI 5 1001 HSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGM 1050 980 ETTYTVLQNEDTKSG 994 IllllIlllllIII 1051 ETTYTVLQNEDTKSG 1065 10 15 Sequence name: CERU HUMAN Sequence documentation: 20 Alignment of: HSCP2 PEA 1 P15 x CERU HUMAN . Alignment segment 1/1: Quality: 10630.00 25 Escore: 0 Matching length: 1060 Total length: 1060 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 30 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 WO 2005/116850 PCT/IB2005/002555 963 Gaps: 0 Alignment: 5 1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50 I l l l l l l i l l l l l l l I l l l l l l i ll l i l i I 1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50 51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100 10 l IIi l 51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150 I11ll 1ll 1 1l1l 11lli l ll llllllilllllll ll lllllll1I 15 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200 I I 11111 i1111 1I1 i1 111 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200 20 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250 I l l l l l l l l l l l l l l i l l I i l l l I l l l i I l 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250 25 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300 111111I I 111111111 Ii I 111111 i 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350 3 0 I I l l l l l I l i I l l l l l l Il I l 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350 WO 2005/116850 PCT/IB2005/002555 964 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400 I l l l II I III l l l l I I I I l l l l Il l I 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400 5 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450 | | I l I I l l l I l l l l l lI I l l l i l l l l I 401 
PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450 10 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550 15 l l l l Il l ll l l l l l l I l l l l I l I l I 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600 I I I I l i l I I l l l I l l I l l l l I l l l l I l 20 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650 25 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700 30 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750 II I I l l l l l l l l l li l l I WO2005/116850 PCT/IB2005/002555 965 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750 751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800 I I I l l l l l I I i l l l l l I I l l l~l l l 5 751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800 801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850 I I I 1 l l l l l l l l l 1 l l l l l l lIll l l l l l l l l l I l l I ll1l i l l l l l l 801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850 10 851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900 IIII1lllIllllllll1IllllllIIllllllllllllllllllll 851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900 15 901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950 I I l l l I I I I Il l l l l l l l I I I I l l l I l 901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950 951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000 20 Il IlIII 951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000 1001 HSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGM 1050 I l l l l I I I I I I1 l1 l lIIIl l l I l l l l l l l l l llIl l 25 1001 HSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGM 1050 1051 
ETTYTVLQNE 1060 1051 ETTYTVLQNE 1060 30 WO 2005/116850 PCT/IB2005/002555 966 5 Sequence name: CERU HUMAN Sequence documentation: Alignment of: HSCP2 PEA 1 P2 x CERU HUMAN 10 Alignment segment 1/1: Quality: 7636.00 Escore: 0 15 Matching length: 761 Total length: 761 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent 20 Identity: 100.00 Gaps: 0 Alignment: 25 1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50 lil l l l l i l l i l l I l l l li l l li l l l i ll i l l l l l l li l 1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50 51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100 3 0 l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l 51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100 WO 2005/116850 PCT/IB2005/002555 967 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150 I I I111 111I 1 I II1111 lIi 111I 1 I 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150 5 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200 I l Il I lI lIl l I I Il i l I 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200 10 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250 I I I 111 II1 I11 1I 111 I 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300 15 I l l I l i l I I l l I I l l l l l l l ll l l l I 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350 I I 1l l l l l I I lI l l l l l l1 l l l l l l l l l.1 I l l l l l l l l l l l l l I 20 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400 25 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450 30 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500 I l l l I I I l l l l II I l l l 
I I Ill l l l l l l l lI l l III I I l l I I WO 2005/116850 PCT/IB2005/002555 968 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550 5 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600 I l I l l I I I I l l I l l I lI l llII I I I I I l l l l I I I I l I I 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600 10 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650 I l I l I l I I l lI IlI I I I I I I I I I I I I I I I I I l 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650 15 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700 I I III II IIlIll lI IIl lIII I I I I IIl lIl l l l l l l l l l I I 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750 20 I l l l l lI I I l III I l I I I lI III lI I l III I I l l|I I 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750 751 WEKELHHLQEQ 761 I111111II 25 751 WEKELHHLQEQ 761 30 WO 2005/116850 PCT/IB2005/002555 969 Sequence name: CERU HUMAN Sequence documentation: 5 Alignment of: HSCP2 PEA 1 P16 x CERU HUMAN Alignment segment 1/1: Quality: 10092.00 10 Escore: 0 Matching length: 1007 Total length: 1007 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 15 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0 Alignment: 20 1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50 IllIllllIIIIIlIlllIIIIIIIIIIIIIIIIIllllIIIIIlI 1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50 25 51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100 IlllllllllllllllllllllIIIIlllillllllllllllllllllllll 51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150 3 0 I l I l l l l I I I I I I l l l l l l l l l lI I I IIIIl l l I 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150 WO 2005/116850 PCT/IB2005/002555 
970 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200 5 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250 10 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300 I l l l I I l I I l l l l l I I I l l l l l l l l I l 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350 15 IlII IIl l 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400 IIlli llllll llll I ll1 l1 l 1I IIIllll11 lllIl IIIl 20 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450 I I I l l I l i l l I I I I l lI l l l l l I I 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450 25 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500 I I I I l l l I l l l l 1l l I l l1 I I I l l l l l l l l l I l l l l l l l l 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500 30 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550 I l l l I l l l l I I l i l l I I I I l I l l l I l l WO 2005/116850 PCT/IB2005/002555 971 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600 I I I I Ill l l I l l l l l l l lI l l l i l l l l l l l 5 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650 I I l i l I I I l I l Il l l l l l l I I l l l I 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650 10 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700 l l l l I I I l l l I l l llIi l l l I III l 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700 15 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750 751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800 20 I l l l 
l Ill II IIl l ll II I I l l 751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800 801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850 I l I l l l I l l I I l l I I Ill l l l l l I l 25 801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850 851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900 I I I I l l l l l I l l l l I I I l l l l I l l l l i I 851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900 30 901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950 WO 2005/116850 PCT/IB2005/002555 972 I I l l lll l l l lll l l l l l l l l l l l l l l l l l l l l l I I I I I Ill l l 901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950 951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000 5 I I I l I l llI I lIlI I I I l l l l l l I l llI I II I I l1 II I IlI I I l 951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000 1001 HSFQYKH 1007 I1111111 10 1001 HSFQYKH 1007 15 Sequence name: CERU HUMAN Sequence documentation: 20 Alignment of: HSCP2 PEA 1 P6 x CERU HUMAN Alignment segment 1/1: 25 Quality: 10079.00 Escore: 0 Matching length: 1006 Total length: 1006 Matching Percent Similarity: 100.00 Matching Percent 30 Identity: 100.00 WO 2005/116850 PCT/IB2005/002555 973 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0 5 Alignment: 1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50 I Il l l lI IIIIll l l l l l l l I l l i l l l I I I l 1 MKILILGIFLFLCST.AWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50 10 51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100 I l l l l I I I I l l I I I l l l I I I l l I l I l l l l l l 51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100 15 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150 I I I I I I I I I I I l I l l I I i l I 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200 20 I I II I I I II II 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200 201 
KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250 I I I I1 l i l l l l I I I1 l l l IIII Il l l l l l I l l i l l l I l l I II 25 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300 I Il l l l l l l l l I ll I II l l l l l l l l lI I I I Ill I l 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300 30 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350 WO 2005/116850 PCT/IB2005/002555 974 IIIl l l l l l l I l l l l l I I l l l l l l I l l l I l 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400 5 II l l l l l l IIl l Il l l l l ll Il I 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450 10 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500 Illll1lll1ll1llll1lll1l111lll1III1ll1IIIII1lll1III 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500 15 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550 20 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600 I I I l l ~ l II I l l I l l l l l I l l l l l l l l l Il 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650 25 IIII I l l l I l 1 1 l 1 I I l l llIl1l l l I 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700 II30 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700I I 30 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700 WO 2005/116850 PCT/IB2005/002555 975 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750 11111 1111 1t1 i1111 11111 I 111 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750 5 751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800 I l l l l l 1 I l i l l l l I I l l l l l l Il l l l l 
751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800

801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850
801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850

851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900
851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900

901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950
901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950

951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000
951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000

1001 HSFQYK 1006
1001 HSFQYK 1006

Sequence name: CERU_HUMAN

Sequence documentation:

Alignment of: HSCP2_PEA_1_P22 x CERU_HUMAN

Alignment segment 1/1:

Quality: 9277.00
Escore: 0
Matching length: 936
Total length: 1065
Matching Percent Similarity: 100.00
Matching Percent Identity: 99.89
Total Percent Similarity: 87.89
Total Percent Identity: 87.79
Gaps: 1

Alignment:

1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50
1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50

51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100
51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100

101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHE................... 131
101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150

131 .................................................. 131
151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200

131 .................................................. 131
201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250

132 ..........AVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 171
251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300

172 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 221
301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350

222 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 271
351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400

272 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 321
401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450

322 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 371
451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500

372 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 421
501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550

422 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 471
551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600

472 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 521
601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650

522 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 571
651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700

572 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 621
701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750

622 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 671
751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800

672 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 721
801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850

722 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 771
851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900

772 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 821
901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950

822 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 871
951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000

872 HSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGM 921
1001 HSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGM 1050

922 ETTYTVLQNEDTKSG 936
1051 ETTYTVLQNEDTKSG 1065

Sequence name: CERU_HUMAN

Sequence documentation:

Alignment of: HSCP2_PEA_1_P24 x CERU_HUMAN

Alignment segment 1/1:

Quality: 8074.00
Escore: 0
Matching length: 804
Total length: 804
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:

16 VNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFHGQALTNKNYRI 65
262 VNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFHGQALTNKNYRI 311

66 DTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFFQVQECNKSS 115
312 DTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFFQVQECNKSS 361

116 SKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQ 165
362 SKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQ 411

166 GTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTI 215
412 GTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTI 461

216 RVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAP 265
462 RVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAP 511

266 TETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIFTGLIGPMKICK 315
512 TETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIFTGLIGPMKICK 561

316 KGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMFTTAPDQVDKED 365
562 KGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMFTTAPDQVDKED 611

366 EDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAGNEADVHGIYFS 415
612 EDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAGNEADVHGIYFS 661

416 GNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDHYTGGMKQ 465
662 GNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDHYTGGMKQ 711

466 KYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ 515
712 KYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ 761

516 NVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQ 565
762 NVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQ 811

566 LHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKI 615
812 LHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKI 861

616 PERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPR 665
862 PERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPR 911

666 RKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDEEFIESNKMHAI 715
912 RKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDEEFIESNKMHAI 961

716 NGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHGHSFQYKHRGVY 765
962 NGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHGHSFQYKHRGVY 1011

766 SSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGMETTYTVLQNED 815
1012 SSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGMETTYTVLQNED 1061

816 TKSG 819
1062 TKSG 1065

Sequence name: CERU_HUMAN

Sequence documentation:

Alignment of: HSCP2_PEA_1_P25 x CERU_HUMAN

Alignment segment 1/1:

Quality: 6196.00
Escore: 0
Matching length: 621
Total length: 621
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:

1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50
1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50

51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100
51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100

101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150
101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150

151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200
151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200

201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250
201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250

251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300
251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300

301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350
301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350

351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400
351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400

401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450
401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450

451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500
451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500

501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550
501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550

551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600
551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600

601 TTAPDQVDKEDEDFQESNKMH 621
601 TTAPDQVDKEDEDFQESNKMH 621

Sequence name: CERU_HUMAN

Sequence documentation:

Alignment of: HSCP2_PEA_1_P33 x CERU_HUMAN

Alignment segment 1/1:

Quality: 2003.00
Escore: 0
Matching length: 202
Total length: 202
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:

1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50
1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50

51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100
51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100

101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150
101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150

151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200
151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200

201 KK 202
201 KK 202

DESCRIPTION FOR CLUSTER HUMTEN

Cluster HUMTEN features 19 transcript(s) and 57 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.

Table 1 - Transcripts of interest
Transcript Name | SEQ ID NO:
HUMTEN_PEA_1_T4 | 857
HUMTEN_PEA_1_T5 | 858
HUMTEN_PEA_1_T6 | 859
HUMTEN_PEA_1_T7 | 860
HUMTEN_PEA_1_T11 | 861
HUMTEN_PEA_1_T14 | 862
HUMTEN_PEA_1_T16 | 863
HUMTEN_PEA_1_T17 | 864
HUMTEN_PEA_1_T18 | 865
HUMTEN_PEA_1_T19 | 866
HUMTEN_PEA_1_T20 | 867
HUMTEN_PEA_1_T23 | 868
HUMTEN_PEA_1_T32 | 869
HUMTEN_PEA_1_T35 | 870
HUMTEN_PEA_1_T36 | 871
HUMTEN_PEA_1_T37 | 872
HUMTEN_PEA_1_T39 | 873
HUMTEN_PEA_1_T40 | 874
HUMTEN_PEA_1_T41 | 875

Table 2 - Segments of interest
Segment Name | SEQ ID NO:
HUMTEN_PEA_1_node_0 | 876
HUMTEN_PEA_1_node_2 | 877
HUMTEN_PEA_1_node_5 | 878
HUMTEN_PEA_1_node_6 | 879
HUMTEN_PEA_1_node_11 | 880
HUMTEN_PEA_1_node_12 | 881
HUMTEN_PEA_1_node_16 | 882
HUMTEN_PEA_1_node_19 | 883
HUMTEN_PEA_1_node_23 | 884
HUMTEN_PEA_1_node_27 | 885
HUMTEN_PEA_1_node_28 | 886
HUMTEN_PEA_1_node_30 | 887
HUMTEN_PEA_1_node_32 | 888
HUMTEN_PEA_1_node_33 | 889
HUMTEN_PEA_1_node_35 | 890
HUMTEN_PEA_1_node_38 | 891
HUMTEN_PEA_1_node_40 | 892
HUMTEN_PEA_1_node_42 | 893
HUMTEN_PEA_1_node_43 | 894
HUMTEN_PEA_1_node_44 | 895
HUMTEN_PEA_1_node_45 | 896
HUMTEN_PEA_1_node_46 | 897
HUMTEN_PEA_1_node_47 | 898
HUMTEN_PEA_1_node_49 | 899
HUMTEN_PEA_1_node_51 | 900
HUMTEN_PEA_1_node_56 | 901
HUMTEN_PEA_1_node_65 | 902
HUMTEN_PEA_1_node_71 | 903
HUMTEN_PEA_1_node_73 | 904
HUMTEN_PEA_1_node_76 | 905
HUMTEN_PEA_1_node_79 | 906
HUMTEN_PEA_1_node_83 | 907
HUMTEN_PEA_1_node_89 | 908
HUMTEN_PEA_1_node_7 | 909
HUMTEN_PEA_1_node_8 | 910
HUMTEN_PEA_1_node_9 | 911
HUMTEN_PEA_1_node_14 | 912
HUMTEN_PEA_1_node_17 | 913
HUMTEN_PEA_1_node_21 | 914
HUMTEN_PEA_1_node_22 | 915
HUMTEN_PEA_1_node_25 | 916
HUMTEN_PEA_1_node_36 | 917
HUMTEN_PEA_1_node_53 | 918
HUMTEN_PEA_1_node_54 | 919
HUMTEN_PEA_1_node_57 | 920
HUMTEN_PEA_1_node_61 | 921
HUMTEN_PEA_1_node_62 | 922
HUMTEN_PEA_1_node_67 | 923
HUMTEN_PEA_1_node_68 | 924
HUMTEN_PEA_1_node_69 | 925
HUMTEN_PEA_1_node_70 | 926
HUMTEN_PEA_1_node_72 | 927
HUMTEN_PEA_1_node_84 | 928
HUMTEN_PEA_1_node_85 | 929
HUMTEN_PEA_1_node_86 | 930
HUMTEN_PEA_1_node_87 | 931
HUMTEN_PEA_1_node_88 | 932

Table 3 - Proteins of interest
Protein Name | SEQ ID NO: | Corresponding Transcript(s)
HUMTEN_PEA_1_P5 | 934 | HUMTEN_PEA_1_T4
HUMTEN_PEA_1_P6 | 935 | HUMTEN_PEA_1_T5
HUMTEN_PEA_1_P7 | 936 | HUMTEN_PEA_1_T6
HUMTEN_PEA_1_P8 | 937 | HUMTEN_PEA_1_T7
HUMTEN_PEA_1_P10 | 938 | HUMTEN_PEA_1_T11
HUMTEN_PEA_1_P11 | 939 | HUMTEN_PEA_1_T14
HUMTEN_PEA_1_P13 | 940 | HUMTEN_PEA_1_T16
HUMTEN_PEA_1_P14 | 941 | HUMTEN_PEA_1_T17
HUMTEN_PEA_1_P15 | 942 | HUMTEN_PEA_1_T18
HUMTEN_PEA_1_P16 | 943 | HUMTEN_PEA_1_T19
HUMTEN_PEA_1_P17 | 944 | HUMTEN_PEA_1_T20
HUMTEN_PEA_1_P20 | 945 | HUMTEN_PEA_1_T23
HUMTEN_PEA_1_P26 | 946 | HUMTEN_PEA_1_T32
HUMTEN_PEA_1_P27 | 947 | HUMTEN_PEA_1_T35
HUMTEN_PEA_1_P28 | 948 | HUMTEN_PEA_1_T36
HUMTEN_PEA_1_P29 | 949 | HUMTEN_PEA_1_T37
HUMTEN_PEA_1_P30 | 950 | HUMTEN_PEA_1_T39
HUMTEN_PEA_1_P31 | 951 | HUMTEN_PEA_1_T40
HUMTEN_PEA_1_P32 | 952 | HUMTEN_PEA_1_T41

These sequences are variants of the known protein Tenascin precursor (SwissProt accession identifier TENA_HUMAN; known also according to the synonyms TN; Hexabrachion; Cytotactin; Neuronectin; GMEM; JI; Miotendinous antigen; Glioma-associated extracellular matrix antigen; GP 150-225; Tenascin-C; TN-C), SEQ ID NO: 933, referred to herein as the previously known protein.
Protein Tenascin precursor is known or believed to have the following function(s): SAM (substrate-adhesion molecule) that appears to inhibit cell migration. May play a role in supporting the growth of epithelial tumors. Is a ligand for integrins alpha-8/beta-1, alpha-9/beta-1, alpha-v/beta-3 and alpha-v/beta-6. The sequence for protein Tenascin precursor is given at the end of the application, as "Tenascin precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.

Table 4 - Amino acid mutations for Known Protein
SNP position(s) on amino acid sequence | Comment
2008 | Q -> E (in dbSNP:13321). /FTId=VAR_014665.
244 | Missing
370 | L -> V
539 | Q -> R
680 | Q -> R
1066 | R -> H
1600 - 1608 | SGFTQGHQT -> LWLHPRASN
1677 | L -> I
2054 | F -> FLH
2055 | W -> L
2140 - 2143 | YKGA -> TRG

Protein Tenascin precursor localization is believed to be secreted; extracellular matrix. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: DNA antagonist. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anticancer; antibody. The following GO Annotation(s) apply to the previously known protein.
The following annotation(s) were found: cell adhesion, which are annotation(s) related to Biological Process; cell adhesion receptor; ligand binding or carrier; protein binding, which are annotation(s) related to Molecular Function; and extracellular matrix, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or LocusLink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.

Cluster HUMTEN can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 37 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).

Overall, the following results were obtained as shown with regard to the histograms in Figure 37 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: a mixture of malignant tumors from different tissues, ovarian carcinoma, pancreas carcinoma and skin malignancies.

Table 5 - Normal tissue distribution
Name of Tissue | Number
adrenal | 0
bladder | 82
bone | 867
brain | 41
colon | 154
epithelial | 87
general | 83
head and neck | 20
kidney | 123
lung | 97
lymph nodes | 37
breast | 96
muscle | 7
ovary | 0
pancreas | 10
prostate | 38
skin | 32
stomach | 146
Thyroid | 0
uterus | 195

Table 6 - P values and ratios for expression in cancerous tissue
Name of Tissue | P1 | P2 | SP1 | R3 | SP2 | R4
adrenal | 4.2e-01 | 4.6e-01 | 2.1e-01 | 3.4 | 2.9e-01 | 2.7
bladder | 2.8e-01 | 4.2e-01 | 3.5e-01 | 1.6 | 6.0e-01 | 1.1
bone | 4.7e-01 | 7.4e-01 | 3.2e-01 | 0.3 | 9.8e-01 | 0.4
brain | 5.5e-02 | 8.0e-02 | 1.7e-06 | 2.3 | 5.1e-04 | 1.5
colon | 6.5e-01 | 7.6e-01 | 9.4e-01 | 0.5 | 9.8e-01 | 0.4
epithelial | 2.4e-02 | 4.2e-01 | 4.2e-03 | 1.3 | 7.5e-01 | 0.8
general | 8.7e-05 | 3.2e-02 | 1.8e-09 | 1.7 | 2.1e-02 | 1.1
head and neck | 2.3e-01 | 4.0e-01 | 9.9e-02 | 3.5 | 4.2e-01 | 1.6
kidney | 7.0e-01 | 8.2e-01 | 6.2e-01 | 1.0 | 8.8e-01 | 0.6
lung | 5.1e-01 | 6.5e-01 | 1.5e-01 | 1.5 | 3.2e-01 | 1.1
lymph nodes | 3.3e-01 | 7.6e-01 | 3.2e-01 | 2.0 | 7.9e-01 | 0.8
breast | 1.0e-01 | 2.3e-01 | 1.4e-01 | 1.6 | 5.3e-01 | 1.0
muscle | 4.0e-02 | 1.7e-02 | 1.5e-01 | 5.6 | 1.5e-01 | 3.2
ovary | 1.4e-01 | 1.7e-01 | 7.0e-04 | 3.4 | 6.4e-03 | 2.6
pancreas | 7.5e-02 | 2.0e-01 | 5.8e-03 | 5.3 | 2.8e-02 | 3.6
prostate | 8.4e-01 | 8.6e-01 | 3.6e-01 | 1.2 | 4.4e-01 | 1.1
skin | 2.8e-01 | 1.7e-01 | 3.2e-05 | 5.6 | 5.5e-02 | 1.8
stomach | 5.8e-01 | 7.5e-01 | 1 | 0.2 | 1 | 0.3
Thyroid | 3.6e-01 | 3.6e-01 | 1 | 1.2 | 1 | 1.2
uterus | 2.9e-01 | 7.4e-01 | 8.0e-01 | 0.6 | 9.9e-01 | 0.4

As noted above, cluster HUMTEN features 19 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Tenascin precursor. A description of each variant protein according to the present invention is now provided.

Variant protein HUMTEN_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTEN_PEA_1_T4. An alignment is given to the known protein (Tenascin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMTEN_PEA_1_P5 and TENA_HUMAN_V1:

1. An isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA T corresponding to amino acids 1 - 1525 of TENA_HUMAN_V1,
which also corresponds to amino acids 1 - 1525 of HUMTEN_PEA_1_P5, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TEPKPQLGTLIFSNITPKSFNMSWTTQAGLFAKIVINVSDAHSLHESQQFTVSGDAKQAH ITGLVENTGYDVSVAGTTLAGDPTRPLTAFVI corresponding to amino acids 1526 - 1617 of HUMTEN_PEA_1_P5, and a third amino acid sequence being at least 90% homologous to TEALPLLENLTISDINPYGFTVSWMASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLE LRGLITGIGYEVMVSGFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADE GVFDNFVLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQTVSAI ATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVDGTKTQTR LVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALDGPSGLVTANITDSEALARWQPAIATV DSYVISYTGEKVPEITRTVSGNTVEYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDL DSPRDLTATEVQSETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLS PSTHYTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGDK AQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDRREEFWLGLDNLN KITAQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYKLKVEGYSGTAGDSMAYHN GRSFSTFDKDTDSAITNCALSYKGAFWYRNCHRVNLMGRYGDNNHSQGVNWFHWKG HEHSIQFAEMKLRPSNFRNLEGRRKRA corresponding to amino acids 1526 - 2201 of TENA_HUMAN_V1, which also corresponds to amino acids 1618 - 2293 of HUMTEN_PEA_1_P5, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for an edge portion of HUMTEN_PEA_1_P5, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for TEPKPQLGTLIFSNITPKSFNMSWTTQAGLFAKIVINVSDAHSLHESQQFTVSGDAKQAH ITGLVENTGYDVSVAGTTLAGDPTRPLTAFVI, corresponding to HUMTEN_PEA_1_P5.
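The comparison report describes HUMTEN_PEA_1_P5 as three consecutive segments. The short Python sketch below checks that bookkeeping. It assumes the ranges are 1-based and inclusive (which is how they read), copies the boundaries and the 92-residue edge sequence from the report, and verifies that the segments tile the variant without gaps, that the first and third segments have equal lengths in both proteins, and that the edge sequence fits its stated range. The helper names `span` and `check_chimera` are illustrative only, not part of the application.

```python
# Segment boundaries for HUMTEN_PEA_1_P5, copied from the comparison report.
# Assumption: all coordinates are 1-based and inclusive.
EDGE_SEQUENCE = (
    "TEPKPQLGTLIFSNITPKSFNMSWTTQAGLFAKIVINVSDAHSLHESQQFTVSGDAKQAH"
    "ITGLVENTGYDVSVAGTTLAGDPTRPLTAFVI"
)

# (label, start on variant, end on variant, start on TENA_HUMAN_V1, end on TENA_HUMAN_V1)
SEGMENTS = [
    ("first", 1, 1525, 1, 1525),
    ("second", 1526, 1617, None, None),  # unique edge: no counterpart in the known protein
    ("third", 1618, 2293, 1526, 2201),
]


def span(start: int, end: int) -> int:
    """Length of an inclusive 1-based range."""
    return end - start + 1


def check_chimera(segments) -> int:
    """Return the variant length if the segments tile it from residue 1 without gaps."""
    next_start = 1
    for label, v_start, v_end, k_start, k_end in segments:
        if v_start != next_start:
            raise ValueError(f"{label} segment does not start at residue {next_start}")
        if k_start is not None and span(k_start, k_end) != span(v_start, v_end):
            raise ValueError(f"{label} segment has different lengths in the two proteins")
        next_start = v_end + 1
    return next_start - 1


total = check_chimera(SEGMENTS)
print(total, len(EDGE_SEQUENCE), span(1526, 1617))  # prints: 2293 92 92
```

Under these assumptions the variant is 92 residues longer than TENA_HUMAN_V1 (2293 versus 2201), which is exactly the length of the inserted edge segment.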
It should be noted that the known protein sequence (TENA_HUMAN; SEQ ID NO:933) differs at one or more positions from the sequence given at the end of the application and named as being the amino acid sequence for TENA_HUMAN_V1 (SEQ ID NO:934). These changes were previously known to occur and are listed in the table below.
Table 7 - Changes to TENA_HUMAN_V1
SNP position(s) on amino acid sequence | Type of change
371 | conflict
540 | conflict

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because of manual inspection of known protein localization and/or gene structure.

Variant protein HUMTEN_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 8 - Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
149 | Q -> * | No
213 | G -> S | Yes
370 | V -> L | Yes
539 | R -> Q | Yes
605 | V -> I | Yes
680 | Q -> R | Yes
842 | V -> L | No
850 | D -> H | Yes
851 | L -> V | Yes
1066 | R -> H | No
1534 | T -> M | Yes
1769 | L -> I | Yes
1873 | A -> T | Yes
2100 | Q -> E | Yes
2122 | K -> | No
2130 | Q -> | No
2159 | Q -> | No
2265 | K -> | No
2291 | K -> | No
2291 | K -> Q | No

Variant protein HUMTEN_PEA_1_P5 is encoded by the following transcript(s): HUMTEN_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTEN_PEA_1_T4 is shown in bold; this coding portion starts at position 348 and ends at position 7226.
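The coding-portion coordinates can be cross-checked against the protein length and used to relate the nucleotide SNPs of Table 9 to the amino-acid SNPs of Table 8. The Python sketch below assumes 1-based, inclusive transcript coordinates and a single reading frame starting at position 348 with no frameshifts; `cds_length` and `to_codon` are hypothetical helper names. Under those assumptions the coding portion spans 6879 nucleotides, or 2293 codons, which matches the 2293-residue length of HUMTEN_PEA_1_P5 given in the comparison report (the text does not state whether the range includes a stop codon). The same arithmetic pairs, for example, nucleotide 792 (C -> T) with residue 149 (Q -> *) and nucleotide 7218 with residue 2291.

```python
from typing import Optional, Tuple

# Coding portion of HUMTEN_PEA_1_T4 as stated in the text (assumed 1-based, inclusive).
CDS_START = 348
CDS_END = 7226


def cds_length(start: int = CDS_START, end: int = CDS_END) -> int:
    """Number of nucleotides in the inclusive coding range."""
    return end - start + 1


def to_codon(nt_pos: int, start: int = CDS_START, end: int = CDS_END) -> Optional[Tuple[int, int]]:
    """Map a transcript position to (residue number, position within the codon, 1-3).

    Returns None for positions outside the coding portion (5' or 3' untranslated region).
    """
    if not start <= nt_pos <= end:
        return None
    offset = nt_pos - start
    return offset // 3 + 1, offset % 3 + 1


length = cds_length()
print(length, length % 3, length // 3)  # prints: 6879 0 2293

for nt in (792, 984, 1963, 7218, 7233):
    print(nt, to_codon(nt))
```

In this sketch nucleotide 984 falls on the first base of codon 213 (G -> S in Table 8), nucleotide 1963 on the second base of codon 539 (R -> Q), and nucleotide 7233 lies past the stated coding portion, so it returns None.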
The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 9 - Nucleic acid SNPs
SNP position(s) on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
115 | T -> G | Yes
123 | A -> G | Yes
315 | C -> T | Yes
434 | C -> T | Yes
503 | C -> T | Yes
542 | G -> A | Yes
623 | A -> G | Yes
792 | C -> T | No
984 | G -> A | Yes
1043 | A -> G | Yes
1455 | G -> T | Yes
1963 | G -> A | Yes
2156 | A -> G | Yes
2160 | G -> A | Yes
2386 | A -> G | Yes
2396 | A -> G | Yes
2654 | G -> A | No
2871 | G -> T | No
2895 | G -> C | Yes
2898 | C -> G | Yes
3005 | A -> G | No
3512 | C -> T | Yes
3544 | G -> A | No
3635 | A -> G | Yes
4922 | G -> A | No
4948 | C -> T | Yes
5652 | T -> A | Yes
5825 | A -> G | Yes
5964 | G -> A | Yes
6296 | A -> G | Yes
6368 | C -> A | Yes
6645 | C -> G | Yes
6712 | A -> | No
6736 | A -> | No
6824 | G -> | No
6872 | C -> T | Yes
7142 | G -> | No
7218 | A -> | No
7218 | A -> C | No
7233 | C -> G | Yes
7234 | C -> G | Yes
7236 | G -> | No
7344 | G -> A | Yes
7424 | A -> G | No
7632 | A -> C | No
7638 | T -> C | No
7659 | -> T | No
7828 | -> T | No
7839 | A -> C | No
8183 | G -> C | Yes
8745 | G -> T | Yes

Variant protein HUMTEN_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTEN_PEA_1_T5. An alignment is given to the known protein (Tenascin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMTEN_PEA_1_P6 and TENA_HUMAN_V1:

1. An isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG
VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG
Claims (52)
1. An isolated polynucleotide comprising a polynucleotide having a sequence selected from the group consisting of: R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5, or R11723_PEA_1_T6.
2. An isolated polynucleotide comprising a node having a sequence selected from the group consisting of: R11723_PEA_1_node_13, R11723_PEA_1_node_16, R11723_PEA_1_node_19, R11723_PEA_1_node_2, R11723_PEA_1_node_22, R11723_PEA_1_node_31, R11723_PEA_1_node_10, R11723_PEA_1_node_11, R11723_PEA_1_node_15, R11723_PEA_1_node_18, R11723_PEA_1_node_20, R11723_PEA_1_node_21, R11723_PEA_1_node_23, R11723_PEA_1_node_24, R11723_PEA_1_node_25, R11723_PEA_1_node_26, R11723_PEA_1_node_27, R11723_PEA_1_node_28, R11723_PEA_1_node_29, R11723_PEA_1_node_3, R11723_PEA_1_node_30, R11723_PEA_1_node_4, R11723_PEA_1_node_5, R11723_PEA_1_node_6, R11723_PEA_1_node_7 or R11723_PEA_1_node_8.
3. An isolated polypeptide comprising a polypeptide having a sequence selected from the group consisting of: R11723_PEA_1_P2, R11723_PEA_1_P6, R11723_PEA_1_P7, R11723_PEA_1_P13, or R11723_PEA_1_P10.
4. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALHKAGAVGGGVR corresponding to amino acids 1 - 110 of R11723_PEA_1_P6, and a second amino acid sequence being at least 90% homologous to MYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHV RPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 1 - 112 of Q8IXM0, which also corresponds to amino acids 111 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
5. An isolated polypeptide encoding for a head of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALHKAGAVGGGVR of R11723_PEA_1_P6.
6. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 1 - 83 of Q96AC2, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLL RGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQ CHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
7. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLL RGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQ CHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
8. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 1 - 83 of Q8N2G4, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLL RGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQ CHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
9. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
10. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 24 - 106 of BAC85518, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
11. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
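As a cross-check on the residue numbering in claims 6-11, the Python sketch below joins the two segments of R11723_PEA_1_P6 and confirms that each one exactly fills its stated range (1-83 and 84-222) with no gap or overlap. The helper `assemble_segments` is a hypothetical name introduced for illustration, and the sequences are transcribed from the claims. The check covers numbering only, not homology.

```python
from typing import List, Tuple

# Segments of R11723_PEA_1_P6 transcribed from claims 6-11.
P6_HEAD = (
    "MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV"
    "MEQSAGIMYRKSCASSAACLIASAG"
)
P6_TAIL = (
    "SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLL"
    "RGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQ"
    "CHNNQPWADTSRRERQRKEKHSMRTQ"
)

# (sequence, first residue, last residue), 1-based and inclusive.
Segment = Tuple[str, int, int]


def assemble_segments(segments: List[Segment]) -> str:
    """Concatenate segments after checking that each one fills its stated
    residue range exactly and begins right after the previous one."""
    assembled = ""
    expected_start = 1
    for seq, start, end in segments:
        if start != expected_start:
            raise ValueError(f"gap or overlap: expected residue {expected_start}, got {start}")
        if len(seq) != end - start + 1:
            raise ValueError(f"segment {start}-{end} holds {len(seq)} residues, not {end - start + 1}")
        assembled += seq
        expected_start = end + 1
    return assembled


p6 = assemble_segments([(P6_HEAD, 1, 83), (P6_TAIL, 84, 222)])
print(len(p6))  # 222
```

A mismatched range, such as declaring the head as residues 1-82, raises `ValueError` instead of silently producing a shifted sequence.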
12. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 1 - 64 of Q96AC2, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
13. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
14. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 1 - 64 of Q8N2G4, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
15. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
16. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG corresponding to amino acids 1 - 5 of R11723_PEA_1_P7, a second amino acid sequence being at least 90% homologous to IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 22 - 80 of BAC85273, which also corresponds to amino acids 6 - 64 of R11723_PEA_1_P7, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
17. An isolated polypeptide encoding for a head of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG of R11723_PEA_1_P7.
18. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
19. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 24 - 87 of BAC85518, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
20. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
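Claims 12, 14, 16 and 19 map the same stretch of R11723_PEA_1_P7 onto four reference proteins at different offsets. The following sketch, a hypothetical helper rather than anything described in the patent, converts a P7 residue position into the matching reference position using only the ranges stated in those claims, and returns `None` for positions outside the aligned block, such as the unique MWVLG head relative to BAC85273 or the 65-93 tail.

```python
from typing import Dict, Optional, Tuple

# (variant_start, variant_end, reference_start), 1-based and inclusive,
# transcribed from claims 12, 14, 16 and 19 for R11723_PEA_1_P7.
Block = Tuple[int, int, int]
P7_BLOCKS: Dict[str, Block] = {
    "Q96AC2": (1, 64, 1),
    "Q8N2G4": (1, 64, 1),
    "BAC85518": (1, 64, 24),
    "BAC85273": (6, 64, 22),
}


def variant_to_reference(pos: int, block: Block) -> Optional[int]:
    """Map a P7 residue number to the reference residue number, or return
    None if the position falls outside the aligned block."""
    var_start, var_end, ref_start = block
    if not var_start <= pos <= var_end:
        return None
    return ref_start + (pos - var_start)


def reference_span(block: Block) -> Tuple[int, int]:
    """First and last reference residues covered by the aligned block."""
    var_start, var_end, ref_start = block
    return ref_start, ref_start + (var_end - var_start)


for accession, block in P7_BLOCKS.items():
    print(accession, reference_span(block))
```

Recovering the claimed reference ranges (for example 24-87 for BAC85518 and 22-80 for BAC85273) from the variant ranges is a quick consistency check on the claim text.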
21. An isolated chimeric polypeptide encoding for R11723_PEA_1_P13, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DTKRTNTLLFEMRHFAKQLTT corresponding to amino acids 64 - 84 of R11723_PEA_1_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
22. An isolated polypeptide encoding for a tail of R11723_PEA_1_P13, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DTKRTNTLLFEMRHFAKQLTT in R11723_PEA_1_P13.
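The claims grade tails by tiered thresholds (at least 70%, 80%, 85%, 90% or 95% homologous) but do not state, in this section, how homology is computed. The sketch below assumes the simplest possible measure, ungapped position-by-position identity between equal-length sequences, and reports the highest tier met. That assumption is made purely for illustration; a real analysis would normally use a gapped alignment. The two substituted tails are hypothetical examples, not sequences from the patent.

```python
from typing import Optional

P13_TAIL = "DTKRTNTLLFEMRHFAKQLTT"  # residues 64-84 of R11723_PEA_1_P13 (claim 22)

# Threshold tiers named in the claims, highest first.
TIERS = (95, 90, 85, 80, 70)


def percent_identity(query: str, reference: str) -> float:
    """Ungapped identity: share of aligned positions with the same residue.
    Assumption for illustration only; no gaps are allowed."""
    if len(query) != len(reference):
        raise ValueError("this sketch only compares equal-length, ungapped sequences")
    matches = sum(q == r for q, r in zip(query, reference))
    return 100.0 * matches / len(reference)


def highest_tier(pct: float) -> Optional[int]:
    """Return the highest claimed threshold met, or None if below 70%."""
    for threshold in TIERS:
        if pct >= threshold:
            return threshold
    return None


one_change = "E" + P13_TAIL[1:]              # hypothetical D64E substitution
two_changes = one_change[:-1] + "S"          # hypothetical D64E and T84S
for seq in (P13_TAIL, one_change, two_changes):
    pct = percent_identity(seq, P13_TAIL)
    print(f"{pct:.1f}% -> tier {highest_tier(pct)}")
```

On a 21-residue tail each substitution costs about 4.8 percentage points, so one change still meets the 95% tier while two changes drop it to the 90% tier.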
23. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
24. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
25. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q8N2G4, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
26. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
27. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG corresponding to amino acids 1 - 5 of R11723_PEA_1_P10, a second amino acid sequence being at least 90% homologous to IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 22 - 79 of BAC85273, which also corresponds to amino acids 6 - 63 of R11723_PEA_1_P10, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
28. An isolated polypeptide encoding for a head of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG of R11723_PEA_1_P10.
29. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
30. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 24 - 86 of BAC85518, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
31. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
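Claims 12-31 split each variant into a stretch shared with the known protein and a unique tail, and the split point differs between variants (after residue 64 for P7, after residue 63 for P10 and P13). The sketch below rebuilds P7, P10 and P13 from the Q96AC2 residues 1-83 quoted in claim 6 plus each claimed tail, then locates the split by finding the longest shared prefix. This is a simplified prefix comparison assumed for illustration, not the alignment procedure the applicants used, and `split_head_tail` is a hypothetical helper name.

```python
from typing import Dict, Tuple

# Residues 1-83 of Q96AC2 as quoted in claim 6.
KNOWN_1_83 = (
    "MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV"
    "MEQSAGIMYRKSCASSAACLIASAG"
)

# Variants assembled from the claimed shared regions and tails.
VARIANTS: Dict[str, str] = {
    "R11723_PEA_1_P7": KNOWN_1_83[:64] + "SHCVTRLECSGTISAHCNLCLPGSNDHPT",
    "R11723_PEA_1_P10": KNOWN_1_83[:63] + "DRVSLCHEAGVQWNNFSTLQPLPPRLK",
    "R11723_PEA_1_P13": KNOWN_1_83[:63] + "DTKRTNTLLFEMRHFAKQLTT",
}


def shared_prefix_length(a: str, b: str) -> int:
    """Number of leading residues that are identical in a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def split_head_tail(variant: str, known: str) -> Tuple[str, str]:
    """Split a variant into the prefix shared with `known` and the remaining tail."""
    cut = shared_prefix_length(variant, known)
    return variant[:cut], variant[cut:]


for name, seq in VARIANTS.items():
    shared, tail = split_head_tail(seq, KNOWN_1_83)
    print(f"{name}: {len(seq)} aa, shared 1-{len(shared)}, tail {len(shared) + 1}-{len(seq)}")
# R11723_PEA_1_P7: 93 aa, shared 1-64, tail 65-93
# R11723_PEA_1_P10: 90 aa, shared 1-63, tail 64-90
# R11723_PEA_1_P13: 84 aa, shared 1-63, tail 64-84
```

The recovered boundaries match the residue ranges stated in claims 12, 21 and 23, which supports reading each "tail" as the variant-specific C-terminal region.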
32. An isolated oligonucleotide, comprising an amplicon selected from the group consisting of SEQ ID NOs: 975 and 978.
33. A primer pair, comprising a pair of isolated oligonucleotides capable of amplifying said amplicon of claim 32.
34. The primer pair of claim 33, comprising a pair of isolated oligonucleotides selected from the group consisting of: SEQ ID NOs: 972 and 973; or SEQ ID NOs: 976 and 977.
35. An antibody capable of specifically binding to an epitope of an amino acid sequence of any of claims 3-31.
36. The antibody of claim 35, wherein said amino acid sequence comprises said tail of any of claims 4-31.
37. The antibody of claim 35 or 36, wherein said antibody is capable of differentiating between a splice variant having said epitope and a corresponding known protein PSEC.
38. A kit for detecting ovarian cancer, comprising a kit detecting overexpression of a splice variant according to any of the above claims.
39. The kit of claim 38, wherein said kit comprises a NAT-based technology.
40. The kit of claim 39, wherein said kit further comprises at least one primer pair capable of selectively hybridizing to a nucleic acid sequence according to claim 1 or 2.
41. The kit of claim 38, wherein said kit further comprises at least one oligonucleotide capable of selectively hybridizing to a nucleic acid sequence according to claim 1 or 2.
42. The kit of claim 38, wherein said kit comprises an antibody according to any of claims 35-37.
43. The kit of claim 42, wherein said kit further comprises at least one reagent for performing an ELISA or a Western blot.
44. A method for detecting ovarian cancer, comprising detecting overexpression of a splice variant according to any of the above claims.
45. The method of claim 44, wherein said detecting overexpression is performed with a NAT-based technology.
46. The method of claim 44, wherein said detecting overexpression is performed with an immunoassay.
47. The method of claim 46, wherein said immunoassay comprises an antibody according to any of the above claims.
48. A biomarker capable of detecting ovarian cancer, comprising any of the above nucleic acid sequences or a fragment thereof, or any of the above amino acid sequences or a fragment thereof.
49. A method for screening for ovarian cancer, comprising detecting ovarian cancer cells with a biomarker or an antibody or a method or assay according to any of the above claims.
50. A method for diagnosing ovarian cancer, comprising detecting ovarian cancer cells with a biomarker or an antibody or a method or assay according to any of the above claims.
51. A method for monitoring disease progression and/or treatment efficacy and/or relapse of ovarian cancer, comprising detecting ovarian cancer cells with a biomarker or an antibody or a method or assay according to any of the above claims.
52. A method of selecting a therapy for ovarian cancer, comprising detecting ovarian cancer cells with a biomarker or an antibody or a method or assay according to any of the above claims and selecting a therapy according to said detection.
Applications Claiming Priority (60)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US60/260,874 | 2001-01-12 | ||
US60/328,112 | 2001-10-11 | ||
US53912804P | 2004-01-27 | 2004-01-27 | |
US53912904P | 2004-01-27 | 2004-01-27 | |
US60/539,128 | 2004-01-27 | ||
US60/539,129 | 2004-01-27 | ||
US62092404P | 2004-10-22 | 2004-10-22 | |
US62097504P | 2004-10-22 | 2004-10-22 | |
US62091704P | 2004-10-22 | 2004-10-22 | |
US62087404P | 2004-10-22 | 2004-10-22 | |
US62065604P | 2004-10-22 | 2004-10-22 | |
US62091604P | 2004-10-22 | 2004-10-22 | |
US62067704P | 2004-10-22 | 2004-10-22 | |
US62091804P | 2004-10-22 | 2004-10-22 | |
US62100404P | 2004-10-22 | 2004-10-22 | |
US62085304P | 2004-10-22 | 2004-10-22 | |
US62097404P | 2004-10-22 | 2004-10-22 | |
US60/620,916 | 2004-10-22 | ||
US60/620,677 | 2004-10-22 | ||
US60/620,656 | 2004-10-22 | ||
US60/620,975 | 2004-10-22 | ||
US60/620,853 | 2004-10-22 | ||
US60/620,974 | 2004-10-22 | ||
US60/621,004 | 2004-10-22 | ||
US60/920,918 | 2004-10-22 | ||
US60/620,917 | 2004-10-22 | ||
US60/620,924 | 2004-10-22 | ||
US62113104P | 2004-10-25 | 2004-10-25 | |
US60/621,131 | 2004-10-25 | ||
US62201704P | 2004-10-27 | 2004-10-27 | |
US62232004P | 2004-10-27 | 2004-10-27 | |
US60/622,320 | 2004-10-27 | ||
US60/622,017 | 2004-10-27 | ||
US62812304P | 2004-11-17 | 2004-11-17 | |
US62823104P | 2004-11-17 | 2004-11-17 | |
US62819004P | 2004-11-17 | 2004-11-17 | |
US62811204P | 2004-11-17 | 2004-11-17 | |
US62813404P | 2004-11-17 | 2004-11-17 | |
US62816704P | 2004-11-17 | 2004-11-17 | |
US62815604P | 2004-11-17 | 2004-11-17 | |
US62811104P | 2004-11-17 | 2004-11-17 | |
US62817804P | 2004-11-17 | 2004-11-17 | |
US62810104P | 2004-11-17 | 2004-11-17 | |
US62814504P | 2004-11-17 | 2004-11-17 | |
US62825104P | 2004-11-17 | 2004-11-17 | |
US60/628,156 | 2004-11-17 | ||
US60/628,111 | 2004-11-17 | ||
US60/628,134 | 2004-11-17 | ||
US60/628,101 | 2004-11-17 | ||
US60/628,251 | 2004-11-17 | ||
US60/628,178 | 2004-11-17 | ||
US60/628,231 | 2004-11-17 | ||
US60/628,123 | 2004-11-17 | ||
US60/628,145 | 2004-11-17 | ||
US60/628,167 | 2004-11-17 | ||
US60/628,190 | 2004-11-17 | ||
US63055904P | 2004-11-26 | 2004-11-26 | |
US60/630,559 | 2004-11-26 | ||
PCT/IB2005/002555 WO2005116850A2 (en) | 2004-01-27 | 2005-01-27 | Differential expression of markers in ovarian cancer |
US11/043,806 US7368548B2 (en) | 2004-01-27 | 2005-01-27 | Nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of prostate cancer |
Publications (1)
Publication Number | Publication Date |
---|---|
AU2005248530A1 true AU2005248530A1 (en) | 2005-12-08 |
Family
ID=37054357
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
AU2005248530A Abandoned AU2005248530A1 (en) | 2004-01-27 | 2005-01-27 | Differential expression of markers in ovarian cancer |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1721257A2 (en) |
AU (1) | AU2005248530A1 (en) |
CA (1) | CA2554703A1 (en) |
WO (1) | WO2005116850A2 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10059756B2 (en) | 2006-11-02 | 2018-08-28 | Acceleron Pharma Inc. | Compositions comprising ALK1-ECD protein |
EP3181580A1 (en) | 2006-11-02 | 2017-06-21 | Acceleron Pharma Inc. | Alk1 receptor and ligand antagonists and uses thereof |
US8642031B2 (en) | 2006-11-02 | 2014-02-04 | Acceleron Pharma, Inc. | Antagonists of BMP9, BMP10, ALK1 and other ALK1 ligands, and uses thereof |
EP3398966A1 (en) | 2008-05-02 | 2018-11-07 | Acceleron Pharma, Inc. | Methods and compositions for modulating angiogenesis and pericyte composition |
CN102245640B (en) | 2008-12-09 | 2014-12-31 | 霍夫曼-拉罗奇有限公司 (Hoffmann-La Roche) | Anti-PD-L1 antibodies and their use to enhance T-cell function |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7022821B1 (en) * | 1998-02-20 | 2006-04-04 | O'brien Timothy J | Antibody kit for the detection of TADG-15 protein |
2005
- 2005-01-27 WO PCT/IB2005/002555 patent/WO2005116850A2/en active Application Filing
- 2005-01-27 CA CA002554703A patent/CA2554703A1/en not_active Abandoned
- 2005-01-27 EP EP05780004A patent/EP1721257A2/en not_active Ceased
- 2005-01-27 AU AU2005248530A patent/AU2005248530A1/en not_active Abandoned
Also Published As
Publication number | Publication date |
---|---|
CA2554703A1 (en) | 2005-12-08 |
WO2005116850A2 (en) | 2005-12-08 |
EP1721257A2 (en) | 2006-11-15 |
WO2005116850A3 (en) | 2010-06-17 |
WO2005116850A9 (en) | 2006-12-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7368548B2 (en) | Nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of prostate cancer | |
US7553948B2 (en) | Nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of ovarian cancer | |
US20060046257A1 (en) | Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of lung cancer | |
US7345142B2 (en) | Nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of cardiac disease | |
US20060014166A1 (en) | Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of endometriosis | |
US20060147946A1 (en) | Novel calcium channel variants and methods of use thereof | |
US20090215042A1 (en) | Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis | |
US20060263786A1 (en) | Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of colon cancer | |
WO2006054297A2 (en) | Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis | |
US20100068736A1 (en) | Novel Nucleotide and Amino Acid Sequences, and Assays and Methods of Use Thereof for Diagnosis of Lung Cancer | |
US20090215046A1 (en) | Novel nucleotide and amino acid sequences, and assays methods of use thereof for diagnosis of colon cancer | |
AU2005248530A1 (en) | Differential expression of markers in ovarian cancer | |
US7528243B2 (en) | Nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of breast cancer | |
US20050186600A1 (en) | Polynucleotides encoding novel UbcH10 polypeptides and kits and methods using same | |
US7714100B2 (en) | Nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of cardiac disease | |
WO2010061393A1 (en) | He4 variant nucleotide and amino acid sequences, and methods of use thereof | |
CA2555509A1 (en) | Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of lung cancer | |
AU2005207882A1 (en) | Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of breast cancer | |
EP1954826A2 (en) | Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis | |
US7906635B2 (en) | Nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of ovarian cancer | |
WO2006043271A1 (en) | Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis | |
EP1749025A2 (en) | Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of colon cancer | |
AU2005207625A1 (en) | Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of cardiac disease | |
AU2005207883A1 (en) | Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of colon cancer | |
CA2554707A1 (en) | Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of prostate cancer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
MK1 | Application lapsed section 142(2)(a) - no request for examination in relevant period |