EP1721257A2 - Differential expression of markers in ovarian cancer - Google Patents

Differential expression of markers in ovarian cancer

Info

Publication number
EP1721257A2
EP1721257A2 EP05780004A EP05780004A EP1721257A2 EP 1721257 A2 EP1721257 A2 EP 1721257A2 EP 05780004 A EP05780004 A EP 05780004A EP 05780004 A EP05780004 A EP 05780004A EP 1721257 A2 EP1721257 A2 EP 1721257A2
Authority
EP
European Patent Office
Prior art keywords
pea
amino acid
amino acids
homologous
acid sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP05780004A
Other languages
German (de)
French (fr)
Inventor
Gad S. Cojocaru
Sarah Pollock
Zurit Levine
Alexander Diber
Guy Kol
Amir Toporik
Rotem Sorek
Dvir Dahary
Michal Ayalon-Soffer
Pinchas Akiva
Amit Novik
Yossi Cohen
Osnat Sella-Tavor
Shira Walach
Shirley Sameah-Greenwald
Ronen Shemesh
Maxim Shklar
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Compugen Ltd
Original Assignee
Compugen Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Compugen Ltd

Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00Oligonucleotides characterized by their use
    • C12Q2600/112Disease subtyping, staging or classification

Definitions

  • the present invention is related to novel nucleotide and protein sequences that are diagnostic markers for ovarian cancer, and assays and methods of use thereof.
  • Ovarian cancer causes more deaths than any other cancer of the female reproductive system.
  • An estimated 25,580 new cases will be diagnosed during 2004 in the United States, and approximately 16,090 of these women will die of the disease.
  • 70% to 80% of patients will ultimately succumb to disease that is diagnosed in late stages.
  • When ovarian cancer is diagnosed in stage I, more than 90% of patients can be cured with conventional surgery and chemotherapy.
  • Detection of a greater fraction of ovarian cancers at an early stage might significantly affect survival.
  • CA-125 is a member of the epithelial sialomucins markers group and is the most well documented and the best performing single marker from this group.
  • Another name for CA-125 is mucin 16, and although it is a membrane protein, it can be found in the serum. Its greatest sensitivity is achieved for serous and endometrioid ovarian tumors compared to mucinous or clear cell tumors. Other than diagnosis, it can be used for disease monitoring (Eur J Gynaecol Oncol. 2000;21(1):64-9). In about 70% of patients, a rising level of CA-125 may be the first indication of relapse, predating clinical relapse by a median of 4 months. The serum concentration of CA-125 is elevated by the vascular invasion, tissue destruction and inflammation associated with malignant disease and is elevated in over 90% of those women with advanced ovarian cancer.
  • CA-125 is not specific to ovarian cancer. It is elevated in 40% of all patients with advanced intra-abdominal malignancy. Levels can also be elevated during menstruation or pregnancy and in other benign conditions such as endometriosis, peritonitis or cirrhosis, particularly with ascites. Because of its high molecular weight, CA-125 cannot be detected in urine samples. There are other ovarian cancer markers originating from epithelial mucins, but none can replace CA-125, due to poorer specificity and sensitivity. These other markers may prove complementary to CA-125.
  • CA-50, CA 54-61, CA-195 and CA 19-9 all appear to have greater sensitivity for detection of mucinous tumors, while STN and TAG-72 have better sensitivity for detection of clear cell tumors (Dis Markers. 2004;20(2):53-70).
  • Kallikreins, a family of serine proteases, and other protease-related proteins are also potential markers for ovarian cancer. Indeed, the entire family of kallikreins maps to a region on chromosome 19q which is shown to be amplified in ovarian cancers. In particular, kallikrein 6 (protease M) and kallikrein 10 have been reported to have sensitivity up to 75% and specificity up to 100%.
  • MMPs: matrix metalloproteinases
  • Cathepsin L: a cysteine protease
  • Hormones have a role in normal ovarian physiology. Therefore, it is not surprising that hormones, and growth and inhibition factors as well, are suitable for ovarian cancer detection. Measurements of fragments of gonadotropin in the urine were found to have sensitivity up to 83% and specificity up to 92% for detecting ovarian cancer. Inhibins, members of the Transforming Growth Factors (TGF) beta superfamily, have been shown to have a diagnostic value in the detection of granulosa cell tumor, a relatively uncommon type of ovarian cancer, associated with better prognosis overall.
  • TGF Transforming Growth Factors
  • Serum inhibin is an ovarian product which decreases to undetectable levels after menopause; however, certain ovarian cancers (mucinous carcinomas and sex cord stromal tumours such as granulosa cell tumours) continue to produce inhibin. Studies have shown that inhibin assays which detect all inhibin forms (as opposed to tests detecting specific members of the inhibin family) provide the highest sensitivity/specificity characteristics as an ovarian cancer diagnostic test (Mol Cell Endocrinol. 2002 May 31;191(1):97-103). Measurement of serum TGF-alpha itself was found to have sensitivity up to 70% and specificity of 89% in early stage disease.
  • the growth factor Mesothelin was also found to have diagnostic value but only for late stage disease. Immunohistochemistry is frequently used to assess the origin of tumor and staging when a pathological tissue sample is available. A few molecular markers have been shown to have diagnostic value in immunohistochemistry of ovarian cancer, among them Epidermal Growth Factor, p53 and HER-2. p53 expression is much lower at early stage than late stage disease. High p53 expression is more characteristic of invasive serous tumors than of mucinous tumors. No benign tumors are stained with p53. HER-2 is found in less than 25% of newly diagnosed ovarian cancers.
  • Ovarian cancer of type granulosa cell tumor has in general better prognosis with late relapse and/or metastasis formation. However, about 50% of patients still die within 20 years of diagnosis.
  • immunohistochemistry staining of estrogen receptor beta (ERb) and proliferating cell nuclear antigen (PCNA) showed that loss of ERb expression and high PCNA expression, characterized a subgroup of granulosa cell tumors with a worse outcome (Histopathology. 2003 Sep;43(3):254-62).
  • Survivin expression was also shown to be correlated to tumor grade, histologic type and mutant p53 but actual correlation to survival is questionable (Mod Pathol.
  • markers have been tested over the years for ovarian cancer detection. Some markers have shown only limited value while others are still under investigation. Among them are TPA and TPS, two cytokeratins whose inclusion in a panel with CA-125 resulted in diagnoses with sensitivity up to 93% and specificity up to 98%. LPA - lysophosphatidic acid - was a very promising marker with one study demonstrating 98% sensitivity and 90% specificity. However, this marker is very unstable and requires quick processing and freezing of plasma, and therefore has limited usage. As previously described, no single marker has been shown to be sufficiently sensitive or specific to contribute to the diagnosis of ovarian cancer. Therefore combinations of markers in panels are being tested.
  • CA-125 is one of the panel members.
  • the best performing panel combinations so far have been CA-125 with CA 15-3, with sensitivity of 93% and specificity of 93%; CA-125 with CEA (which has very little sensitivity by itself), with sensitivity of 93% and specificity of 93%; and CA-125 with TAG-72 and CA 15-3, where specificity becomes 95% but sensitivity is diminished (Dis Markers. 2004;20(2):53-70).
  • the background art does not teach or suggest markers for ovarian cancer that are sufficiently sensitive and/or accurate, alone or in combination.
  • the present invention overcomes these deficiencies of the background art by providing novel markers for ovarian cancer that are both sensitive and accurate. These markers are differentially expressed and preferably overexpressed in ovarian cancer specifically, as opposed to normal ovarian tissue. The measurement of these markers, alone or in combination, in patient (biological) samples provides information that the diagnostician can correlate with a probable diagnosis of ovarian cancer.
  • the markers of the present invention alone or in combination, show a high degree of differential detection between ovarian cancer and non-cancerous states.
  • suitable biological samples which may optionally be used with preferred embodiments of the present invention include but are not limited to blood, serum, plasma, blood cells, urine, sputum, saliva, stool, spinal fluid or CSF, lymph fluid, the external secretions of the skin, respiratory, intestinal, and genitourinary tracts, tears, milk, neuronal tissue, ovarian tissue, any human organ or tissue, including any tumor or normal tissue, any sample obtained by lavage (for example of the bronchial system or of the female reproductive system), and also samples of in vivo cell culture constituents.
  • the biological sample comprises ovarian tissue and/or a serum sample and/or a urine sample and/or secretions or other samples from the female reproductive system and/or any other tissue or liquid sample.
  • the sample can optionally be diluted with a suitable eluant before contacting the sample to an antibody and/or performing any other diagnostic assay.
  • "signalp_hmm" and "signalp_nn" refer to two modes of operation for the program SignalP: hmm refers to Hidden Markov Model, while nn refers to neural networks. Localization was also determined through manual inspection of known protein localization and/or gene structure, and the use of heuristics by the individual inventor.
  • T -> C means that the SNP results in a change at the position given in the table from T to C.
  • M -> Q means that the SNP has caused a change in the corresponding amino acid sequence, from methionine (M) to glutamine (Q). If, in place of a letter at the right hand side for the nucleotide sequence SNP, there is a space, it indicates that a frameshift has occurred. A frameshift may also be indicated with a hyphen (-).
  • a stop codon is indicated with an asterisk at the right hand side (*).
  • a comment may be found in parentheses after the above description of the SNP itself. This comment may include an FTId, which is an identifier to a SwissProt entry that was created with the indicated SNP.
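The SNP notation described above can be illustrated with a small parser. This is a minimal sketch, not part of the patent: the names `parse_snp` and `SnpChange`, the regular expression, and the sample strings are all assumptions made for illustration. Because single letters such as T or C are valid in both the nucleotide and amino acid alphabets, the sketch does not try to guess which kind of sequence an entry refers to.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for entries such as "T -> C", "M -> Q", "G -> ",
# "G -> -" or "Q -> *", optionally followed by a comment in parentheses.
_SNP_PATTERN = re.compile(
    r"^\s*(?P<ref>[A-Za-z])\s*-\s*>\s*(?P<alt>[A-Za-z*-]?)\s*"
    r"(?:\((?P<comment>.*)\))?\s*$"
)


@dataclass
class SnpChange:
    ref: str                 # letter on the left hand side
    alt: Optional[str]       # letter on the right hand side, None for a frameshift
    effect: str              # "substitution", "stop" or "frameshift"
    comment: Optional[str]   # text found in parentheses, if any


def parse_snp(text: str) -> SnpChange:
    """Parse one SNP entry written in the notation described above."""
    match = _SNP_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a recognised SNP entry: {text!r}")
    alt = match.group("alt")
    if alt in ("", "-"):
        # A blank or a hyphen on the right hand side marks a frameshift.
        effect, alt_value = "frameshift", None
    elif alt == "*":
        # An asterisk on the right hand side marks a stop codon.
        effect, alt_value = "stop", "*"
    else:
        effect, alt_value = "substitution", alt.upper()
    return SnpChange(match.group("ref").upper(), alt_value, effect,
                     match.group("comment"))


for entry in ["T -> C", "M -> Q (example comment)", "G -> ", "Q -> *"]:
    print(entry, "=>", parse_snp(entry))
```

The comment text is returned unparsed, so an FTId, when present, would still have to be extracted from it separately.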
  • the header of the first column is "SNP position(s) on amino acid sequence", representing a position of a known mutation on amino acid sequence.
  • SNPs may optionally be used as diagnostic markers according to the present invention, alone or in combination with one or more other SNPs and/or any other diagnostic marker.
  • Preferred embodiments of the present invention comprise such SNPs, including but not limited to novel SNPs on the known (WT or wild type) protein sequences given below, as well as novel nucleic acid and/or amino acid sequences formed through such SNPs, and/or any SNP on a variant amino acid and/or nucleic acid sequence described herein.
  • the unabbreviated tissue name was used as the reference to the type of chip for which expression was measured.
  • microarray results those from microarrays prepared according to a design by the present inventors, for which the microarray fabrication procedure is described in detail in Materials and Experimental Procedures section herein; and those results from microarrays using Affymetrix technology.
  • the probe name begins with the name of the cluster (gene), followed by an identifying number. These probes are listed below with their respective sequences.
  • Genome U133 Plus 2.0 Array at www.affymetrix.com/products/arrays/specific/hgu133plus.affx).
  • the probe names follow the Affymetrix naming convention.
  • NCBI Gene Expression Omnibus see www.ncbi.nlm.nih.gov/projects/geo/ and Edgar et al, Nucleic Acids Research, 2002, Vol. 30, No. 1 207-210).
  • The following list of abbreviations for tissues was used in the TAA histograms.
  • TAA Tumor Associated Antigen
  • TAA histograms represent the cancerous tissue expression pattern as predicted by the biomarkers selection engine, as described in detail in examples 1-5 below (the first word is the abbreviation while the second word is the full name):
  • nucleic acid sequences of the present invention refer to portions of nucleic acid sequences that were shown to have one or more properties as described below. They are also the building blocks that were used to construct complete nucleic acid sequences as described in greater detail below.
  • oligonucleotides which are embodiments of the present invention, for example as amplicons, hybridization units and/or from which primers and/or complementary oligonucleotides may optionally be derived, and/or for any other use.
  • ovarian cancer refers to cancers of the ovary including but not limited to ovarian epithelial tumors (serous, mucinous, endometrioid, clear cell, and Brenner tumor), ovarian germ-cell tumors (teratoma, dysgerminoma, endodermal sinus tumor, and embryonal carcinoma) and ovarian stromal tumors (originating from granulosa, theca, Sertoli, Leydig, and collagen-producing stromal cells).
  • the term "marker" in the context of the present invention refers to a nucleic acid fragment, a peptide, or a polypeptide, which is differentially present in a sample taken from subjects (patients) having ovarian cancer as compared to a comparable sample taken from subjects who do not have ovarian cancer.
  • the phrase "differentially present” refers to differences in the quantity of a marker present in a sample taken from patients having ovarian cancer as compared to a comparable sample taken from patients who do not have ovarian cancer.
  • a nucleic acid fragment may optionally be differentially present between the two samples if the amount of the nucleic acid fragment in one sample is significantly different from the amount of the nucleic acid fragment in the other sample, for example as measured by hybridization and/or NAT-based assays.
  • a polypeptide is differentially present between the two samples if the amount of the polypeptide in one sample is significantly different from the amount of the polypeptide in the other sample. It should be noted that if the marker is detectable in one sample and not detectable in the other, then such a marker can be considered to be differentially present.
  • diagnosis means identifying the presence or nature of a pathologic condition. Diagnostic methods differ in their sensitivity and specificity.
  • the "sensitivity” of a diagnostic assay is the percentage of diseased individuals who test positive (percent of "true positives”). Diseased individuals not detected by the assay are “false negatives.” Subjects who are not diseased and who test negative in the assay are termed “true negatives.”
  • the "specificity” of a diagnostic assay is 1 minus the false positive rate, where the "false positive” rate is defined as the proportion of those without the disease who test positive. While a particular diagnostic method may not provide a definitive diagnosis of a condition, it suffices if the method provides a positive indication that aids in diagnosis.
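The two definitions above translate directly into arithmetic on the four outcome counts of a diagnostic test. The following is a minimal sketch; the function names and the counts are hypothetical and are not taken from the patent or from any study it cites.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of diseased individuals who test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """1 minus the false positive rate, as defined above."""
    false_positive_rate = false_pos / (false_pos + true_neg)
    return 1.0 - false_positive_rate


# Hypothetical cohort: 100 diseased and 200 non-diseased subjects.
sens = sensitivity(true_pos=93, false_neg=7)
spec = specificity(true_neg=186, false_pos=14)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Note that the two quantities are computed from disjoint groups of subjects (diseased and non-diseased), which is why an assay can score well on one and poorly on the other.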
  • Diagnosing refers to classifying a disease or a symptom, determining a severity of the disease, monitoring disease progression, forecasting an outcome of a disease and/or prospects of recovery.
  • the term “detecting” may also optionally encompass any of the above. Diagnosis of a disease according to the present invention can be effected by determining a level of a polynucleotide or a polypeptide of the present invention in a biological sample obtained from the subject, wherein the level determined can be correlated with predisposition to, or presence or absence of the disease.
  • a "biological sample obtained from the subject” may also optionally comprise a sample that has not been physically removed from the subject, as described in greater detail below.
  • the term "level” refers to expression levels of RNA and/or protein or to DNA copy number of a marker of the present invention.
  • the level of the marker in a biological sample obtained from the subject is different (i.e., increased or decreased) from the level of the same variant in a similar sample obtained from a healthy individual (examples of biological samples are described herein).
  • Numerous well known tissue or fluid collection methods can be utilized to collect the biological sample from the subject in order to determine the level of DNA, RNA and/or polypeptide of the variant of interest in the subject. Examples include, but are not limited to, fine needle biopsy, needle biopsy, core needle biopsy and surgical biopsy (e.g., brain biopsy), and lavage.
  • test amount refers to an amount of a marker in a subject's sample that is consistent with a diagnosis of ovarian cancer.
  • a test amount can be either in absolute amount (e.g., microgram/ml) or a relative amount (e.g., relative intensity of signals).
  • a "control amount" of a marker can be any amount or a range of amounts to be compared against a test amount of a marker.
  • a control amount of a marker can be the amount of a marker in a patient with ovarian cancer or a person without ovarian cancer.
  • a control amount can be either in absolute amount (e.g., microgram/ml) or a relative amount (e.g., relative intensity of signals).
  • Detect refers to identifying the presence, absence or amount of the object to be detected.
  • a "label" includes any moiety or item detectable by spectroscopic, photochemical, biochemical, immunochemical, or chemical means.
  • useful labels include 32P, 35S, fluorescent dyes, electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin-streptavidin, digoxigenin, haptens and proteins for which antisera or monoclonal antibodies are available, or nucleic acid molecules with a sequence complementary to a target.
  • the label often generates a measurable signal, such as a radioactive, chromogenic, or fluorescent signal, that can be used to quantify the amount of bound label in a sample.
  • the label can be incorporated in or attached to a primer or probe either covalently, or through ionic, van der Waals or hydrogen bonds, e.g., incorporation of radioactive nucleotides, or biotinylated nucleotides that are recognized by streptavidin.
  • the label may be directly or indirectly detectable. Indirect detection can involve the binding of a second label to the first label, directly or indirectly.
  • the label can be the ligand of a binding partner, such as biotin, which is a binding partner for streptavidin, or a nucleotide sequence, which is the binding partner for a complementary sequence, to which it can specifically hybridize.
  • the binding partner may itself be directly detectable, for example, an antibody may be itself labeled with a fluorescent molecule.
  • the binding partner also may be indirectly detectable, for example, a nucleic acid having a complementary nucleotide sequence can be a part of a branched DNA molecule that is in turn detectable through hybridization with other labeled nucleic acid molecules (see, e.g., P. D. Fahrlander and A. Klausner, Bio/Technology 6:1165 (1988)). Quantitation of the signal is achieved by, e.g., scintillation counting, densitometry, or flow cytometry.
  • Exemplary detectable labels include but are not limited to magnetic beads, fluorescent dyes, radiolabels, enzymes (e.g., horseradish peroxidase, alkaline phosphatase and others commonly used in an ELISA), and colorimetric labels such as colloidal gold or colored glass or plastic beads.
  • the marker in the sample can be detected using an indirect assay, wherein, for example, a second, labeled antibody is used to detect bound marker-specific antibody, and/or in a competition or inhibition assay wherein, for example, a monoclonal antibody which binds to a distinct epitope of the marker is incubated simultaneously with the mixture.
  • Immunoassay is an assay that uses an antibody to specifically bind an antigen.
  • the immunoassay is characterized by the use of specific binding properties of a particular antibody to isolate, target, and/or quantify the antigen.
  • the specified antibodies bind to a particular protein at least two times greater than the background (non-specific signal) and do not substantially bind in a significant amount to other proteins present in the sample.
  • Specific binding to an antibody under such conditions may require an antibody that is selected for its specificity for a particular protein.
  • polyclonal antibodies raised to seminal basic protein from specific species such as rat, mouse, or human can be selected to obtain only those polyclonal antibodies that are specifically immunoreactive with seminal basic protein and not with other proteins, except for polymorphic variants and alleles of seminal basic protein. This selection may be achieved by subtracting out antibodies that cross-react with seminal basic protein molecules from other species.
  • immunoassay formats may be used to select antibodies specifically immunoreactive with a particular protein.
  • solid-phase ELISA immunoassays are routinely used to select antibodies specifically immunoreactive with a protein (see, e.g., Harlow & Lane, Antibodies, A Laboratory Manual (1988), for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity).
  • a specific or selective reaction will be at least twice background signal or noise and more typically more than 10 to 100 times background.
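The "at least twice background" criterion above amounts to a ratio check. The sketch below is one way to express it, assuming the signal and the background are measured in the same units; the function names, the default threshold parameter and the example readings are illustrative assumptions, not values from the patent.

```python
def signal_to_background(signal: float, background: float) -> float:
    """Ratio of the measured signal to the non-specific background signal."""
    if background <= 0:
        raise ValueError("background must be a positive measurement")
    return signal / background


def is_specific_reaction(signal: float, background: float,
                         min_ratio: float = 2.0) -> bool:
    """True when the signal is at least `min_ratio` times the background."""
    return signal_to_background(signal, background) >= min_ratio


# Hypothetical plate readings in arbitrary units.
print(is_specific_reaction(signal=200.0, background=100.0))   # exactly 2x background
print(is_specific_reaction(signal=150.0, background=100.0))   # below the 2x threshold
print(signal_to_background(signal=5000.0, background=100.0))  # within the 10x to 100x range
```

Raising `min_ratio` to 10 would apply the stricter end of the range the text calls more typical.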
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below amino acid sequence comprising a sequence in the table below
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or
  • nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or
  • HUMEDF_PEA_2_T5, HUMEDF_PEA_2_T10, HUMEDF_PEA_2_T11; a nucleic acid sequence comprising a sequence in the table below:
  • HUMEDF_PEA_2_node_6, HUMEDF_PEA_2_node_11, HUMEDF_PEA_2_node_18, HUMEDF_PEA_2_node_19, HUMEDF_PEA_2_node_22, HUMEDF_PEA_2_node_2, HUMEDF_PEA_2_node_8, HUMEDF_PEA_2_node_20
  • an isolated polypeptide comprising an amino acid sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • HSAPHOL_T9; a nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or
  • nucleic acid sequence comprising a sequence in the table below.
  • an isolated polypeptide comprising an amino acid sequence in the table below
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or
  • HSECADH_node_52, HSECADH_node_53, HSECADH_node_54, HSECADH_node_57, HSECADH_node_60, HSECADH_node_62, HSECADH_node_63, HSECADH node 1 HSECADH_node 1 HSECADH_node 11 HSECADH_node .
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • HUMGRP5E_T4, HUMGRP5E_T5; a nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below: HUMGRP5E_P4, HUMGRP5E_P5
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below: Segment Name: D56406_PEA_1_node_0, D56406_PEA_1_node_13, D56406_PEA_1_node_11, D56406_PEA_1_node_2, D56406_PEA_1_node_3, D56406_PEA_1_node_5, D56406_PEA_1_node_6, D56406_PEA_1_node_7, D56406_PEA_1_node_8, D56406_PEA_1_node_9
  • an isolated polypeptide comprising an amino acid sequence in the table below
  • H53393_PEA_1_T10, H53393_PEA_1_T11, H53393_PEA_1_T3, H53393_PEA_1_T9; a nucleic acid sequence comprising a sequence in the table below
  • H53393_PEA_1_P2, H53393_PEA_1_P3, H53393_PEA_1_P6. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below
  • an isolated polypeptide comprising an amino acid sequence in the table below.
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or
  • nucleic acid sequence comprising a sequence in the table below
  • an isolated polypeptide comprising an amino acid sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • T39971_T10, T39971_T12, T39971_T16, T39971_T5; a nucleic acid sequence comprising a sequence in the table below:
  • T39971_node_29, T39971_node_3, T39971_node_30, T39971_node_34, T39971_node_35, T39971_node_36, T39971_node_4, T39971_node_5, T39971_node_8, T39971_node_9
  • an isolated polypeptide comprising an amino acid sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or
  • nucleic acid sequence comprising a sequence in the table below
  • an isolated polypeptide comprising an amino acid sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • an isolated polypeptide comprising an amino acid sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below:
  • nucleic acid sequence comprising a sequence in the table below:
  • T59832_node_1, T59832_node_7, T59832_node_29, T59832_node_39, T59832_node_2, T59832_node_3, T59832_node_4, T59832_node_5, T59832_node_6, T59832_node_8, T59832_node_9, T59832_node_10, T59832_node_11
  • an isolated polypeptide comprising an amino acid sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • an isolated polypeptide comprising an amino acid sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below:
  • HUMTEN_PEA_1_P32. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or
  • nucleic acid sequence comprising a sequence in the table below
  • an isolated polypeptide comprising an amino acid sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or Transcript Name: T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T40, T46984_PEA_1_T42, T46984_PEA_1_T43, T46984_PEA_1_T46, T46984_PEA_1_T47, T46984_PEA_1_T48, T46984_PEA_1_T51
  • nucleic acid sequence comprising a sequence in the table below.
  • an isolated polypeptide comprising an amino acid sequence in the table below: According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or: Transcript Name: T48119_T2; a nucleic acid sequence comprising a sequence in the table below
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below:
  • an isolated chimeric polypeptide encoding for HSMUC1A_PEA_1_P63 comprising a first amino acid sequence being at least 90% homologous to MTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSV corresponding to amino acids 1 - 45 of MUC1_HUMAN, which also corresponds to amino acids 1 - 45 of HSMUC1A_PEA_1_P63, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EEEVSADQVSVGASGVLGSFKEARNAPSFLSWSFSMGPSK corresponding to amino acids 46 - 85 of HSMUC1A_PEA_1_P63, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of HSMUC1A_PEA_1_P63 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EEEVSADQVSVGASGVLGSFKEARNAPSFLSWSFSMGPSK in HSMUC1A_PEA_1_P63.
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P2 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P10 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of T46984_PEA_1_P10 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LMDQK in T46984_PEA_1_P10.
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P11 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of T46984_PEA_1_P12 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SQDLH in T46984_PEA_1_P12.
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P21 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence M corresponding to amino acids 1 - 1 of T46984_PEA_1_P21, and a second amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P27 comprising a first amino acid sequence being at least 90 % homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNA IFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQ PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDN RY
  • an isolated polypeptide encoding for a tail of T46984_PEA_1_P27 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence FGSGLVPMSPTSLLLLARLYFTWDMLLCWDSCMSTGLSSTCSRP in T46984_PEA_1_P27.
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P32 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of T46984_PEA_1_P32 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GQVRWLTPVIPALWEAKAGGSPEVRSSILAWPT in T46984_PEA_1_P32.
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P34 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of T46984_PEA_1_P35 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GCVv SRQSREQHISSRRJKMEII TECQEKESRTIHSMRRKMEKK FI in T46984_PEA_1_P35.
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P38 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of T46984_PEA_1_P38 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MDPDWCQCLQLHFCS in T46984_PEA_1_P38.
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P39 comprising a first amino acid sequence being at least 90 % homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLA corresponding to amino acids 1 - 160 of RIB2_HUMAN, which also corresponds to amino acids 1 - 160 of T46984_PEA_1_P39.
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P45 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of T46984_PEA_1_P45 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NSPGSADSIPPVPAG in T46984_PEA_1_P45.
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P46 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of T46984_PEA_1_P46 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NSPGSADSIPPVPAG in T46984_PEA_1_P46.
  • an isolated chimeric polypeptide encoding for M78530_PEA_1_P15 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of M78530_PEA_1_P15 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RKSWSSSRPITSMFLSPGSPEPASANTARS in M78530_PEA_1_P15.
  • an isolated chimeric polypeptide encoding for M78530_PEA_1_P15 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLS corresponding to amino acids 1 - 83 of M78530_PEA_1_P15, a second amino acid sequence being at least 90 % homologous to AAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQ VFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCCACGT AKYRLTFYGNWSEKTHPKDYPR
  • an isolated polypeptide encoding for a head of M78530_PEA_1_P15 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLS of M78530_PEA_1_P15.
  • an isolated polypeptide encoding for a tail of M78530_PEA_1_P15 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RKSWSSSRPITSMFLSPGSPEPASANTARS in M78530_PEA_1_P15.
  • an isolated chimeric polypeptide encoding for M78530_PEA_1_P16 comprising a first amino acid sequence being at least 90 % homologous to MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEE ETQFMSNCPVAVTESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKL CEQDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSH SKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLN
  • an isolated chimeric polypeptide encoding for M78530_PEA_1_P16 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for M78530_PEA_1_P16 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLS corresponding to amino acids 1 - 83 of M78530_PEA_1_P16 and a second amino acid sequence being at least 90 % homologous to AAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQ VFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCCACGT AKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGGYASEGVKQVAE LGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLNV corresponding to amino acids 1 - 214 of O94862, which also corresponds to amino acids 84 - 297 of M78530_PEA_1_P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a head of M78530_PEA_1_P16 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLS of M78530_PEA_1_P16.
  • an isolated chimeric polypeptide encoding for M78530_PEA_1_P17 comprising a first amino acid sequence being at least 90 % homologous to
  • M78530_PEA_1_P17 wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of M78530_PEA_1_P17 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRQKNHRMTK in M78530_PEA_1_P17.
  • an isolated chimeric polypeptide encoding for M78530_PEA_1_P17 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of M78530_PEA_1_P17 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRQKNHRMTK in M78530_PEA_1_P17.
  • an isolated chimeric polypeptide encoding for M78530_PEA_1_P17 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLS corresponding to amino acids 1 - 83 of M78530_PEA_1_P17, a second amino acid sequence being at least 90 % homologous to AAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQ VFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCCACGT AKYRLTFYGNWSEKTH
  • an isolated polypeptide encoding for a head of M78530_PEA_1_P17 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
  • an isolated polypeptide encoding for a tail of M78530_PEA_1_P17 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRQKNHRMTK in M78530_PEA_1_P17.
  • an isolated chimeric polypeptide encoding for T48119_P2 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for T48119_P2 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for T39971_P6, comprising a first amino acid sequence being at least 90 % homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAEC KPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPV LKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFR GQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGV LDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKG corresponding to amino acids 1 - 276 of VTNC_HUMAN, which also corresponds to amino acids 1 - 276 of T39971_P6, and a second amino acid sequence being at least 70%
  • an isolated polypeptide encoding for a tail of T39971_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TQGWGD in T39971_P6.
  • an isolated chimeric polypeptide encoding for T39971_P9 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for an edge portion of T39971_P9 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TS, having a structure as follows: a sequence starting from any of amino acid numbers 325-x to 325; and ending at any of amino acid numbers 326 + ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for T39971_P11 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for an edge portion of T39971_P11 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SD, having a structure as follows: a sequence starting from any of amino acid numbers 326-x to 326; and ending at any of amino acid numbers 327 + ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for T39971_P11 comprising a first amino acid sequence being at least 90 % homologous to
  • DKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 442 - 478 of Q9BSH7, which also corresponds to amino acids 327 - 363 of T39971_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for an edge portion of T39971_P11 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SD, having a structure as follows: a sequence starting from any of amino acid numbers 326-x to 326; and ending at any of amino acid numbers 327 + ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for T39971_P12 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of T39971_P12 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPGAVGQGRKHLGRV in T39971_P12.
  • an isolated chimeric polypeptide encoding for T39971_P12 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of T39971_P12 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPGAVGQGRKHLGRV in T39971_P12.
  • an isolated chimeric polypeptide encoding for Z44808_PEA_1_P5 comprising a first amino acid sequence being at least 90 % homologous to MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGR TFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDD GTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKTDDAA APALETQPQGDEEDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKN DNWIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPA KARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSS
  • ELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ corresponding to amino acids 1 - 441 of SM02_HUMAN which also corresponds to amino acids 1 - 441 of Z44808_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DAMVVSSRPKATTHRKSRTLSRR corresponding to amino acids 442 - 464 of Z44808_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of Z44808_PEA_1_P5 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DAMVVSSRPKATTHRKSRTLSRR in Z44808_PEA_1_P5.
  • an isolated chimeric polypeptide encoding for Z44808_PEA_1_P6 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for Z44808_PEA_1_P7 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of Z44808_PEA_1_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLWLRGKVSFYCF in Z44808_PEA_1_P7.
  • an isolated chimeric polypeptide encoding for Z44808_PEA_1_P11 comprising a first amino acid sequence being at least 90 % homologous to
  • DGKADTKKRHTPRGHAESTSNRQPRKQG corresponding to amino acids 188 - 446 of SM02_HUMAN, which also corresponds to amino acids 171 - 429 of Z44808_PEA_1_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for an edge portion of Z44808_PEA_1_P11 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TD, having a structure as follows: a sequence starting from any of amino acid numbers 170-x to 170; and ending at any of amino acid numbers 171 + ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated polypeptide encoding for a tail of S67314_PEA_1_P4 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRWATLELYLIGYYYCSFSQACSKKPSPPLRAVEAGTREWLWVRVVSGGNFLCSGFGL TQAGTQILPYRLHDCGQITFSKCNCKTGINNTNLVGLLGSL in S67314_PEA_1_P4.
  • an isolated chimeric polypeptide encoding for S67314_PEA_1_P5 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTF KNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homolog
  • an isolated polypeptide encoding for a tail of S67314_PEA_1_P5 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSYJ NHGWEELRVG KSIV in S67314_PEA_1_P5.
  • an isolated chimeric polypeptide encoding for S67314_PEA_1_P5 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of S67314_PEA_1_P5 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSY NHG WEELRVG KSIV in S67314_PEA_1_P5.
  • an isolated chimeric polypeptide encoding for S67314_PEA_1_P6, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTF KNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
  • an isolated polypeptide encoding for a tail of S67314_PEA_1_P6 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MEKLQLRNVK in S67314_PEA_1_P6.
  • an isolated chimeric polypeptide encoding for S67314_PEA_1_P7 comprising a first amino acid sequence being at least 90 % homologous to MVDAFLGTWKLVDSKNFDDYMKSL corresponding to amino acids 1 - 24 of FABH_HUMAN, which also corresponds to amino acids 1 - 24 of S67314_PEA_1_P7, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AHILITFPLPS corresponding to amino acids 25 - 35 of S67314_PEA_1_P7, and a third amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for an edge portion of S67314_PEA_1_P7 comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for AHILITFPLPS, corresponding to S67314_PEA_1_P7.
  • an isolated chimeric polypeptide encoding for S67314_PEA_1_P7 comprising a first amino acid sequence being at least 90 % homologous to MVDAFLGTWKLVDSKNFDDYMKSL corresponding to amino acids 1 - 24 of AAP35373, which also corresponds to amino acids 1 - 24 of S67314_PEA_1_P7, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AHILITFPLPS corresponding to amino acids 25 - 35 of S67314_PEA_1_P7, and a third amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for an edge portion of S67314_PEA_1_P7 comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for AHILITFPLPS, corresponding to S67314_PEA_1_P7.
  • an isolated chimeric polypeptide encoding for Z39337_PEA_2_PEA_1_P4 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWLPLSGAA corresponding to amino acids 1 - 9 of Z39337_PEA_2_PEA_1_P4, and a second amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a head of Z39337_PEA_2_PEA_1_P4 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWLPLSGAA of Z39337_PEA_2_PEA_1_P4.
  • an isolated chimeric polypeptide encoding for Z39337_PEA_2_PEA_1_P9 comprising a first amino acid sequence being at least 90 % homologous to
  • Z39337_PEA_2_PEA_1_P9 and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence Q corresponding to amino acids 150 - 150 of Z39337_PEA_2_PEA_1_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P10 comprising a first amino acid sequence being at least 90 % homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGH FYYNISE corresponding to amino acids 1 - 67 of PLTP_HUMAN, which also corresponds to amino acids 1 - 67 of HUMPHOSLIP_PEA_2_P10, and a second amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for an edge portion of HUMPHOSLIP_PEA_2_P10 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EK, having a structure as follows: a sequence starting from any of amino acid numbers 67-x to 67; and ending at any of amino acid numbers 68 + ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P12 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P12 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKAGV in HUMPHOSLIP_PEA_2_P12.
  • an isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P31 comprising a first amino acid sequence being at least 90 % homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGH FYYNISE corresponding to amino acids 1 - 67 of PLTP_HUMAN, which also corresponds to amino acids 1 - 67 of HUMPHOSLIP_PEA_2_P31, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PGLERGADKFPVVGGSSLFLALDLTLRPPVG corresponding to amino acids 68 - 98 of HUMPHOSLIP_PEA_2_P31, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P31 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PGLERGADKFPVVGGSSLFLALDLTLRPPVG in HUMPHOSLIP_PEA_2_P31.
  • an isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P33 comprising a first amino acid sequence being at least 90 % homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGH FYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYPNAS AEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRF LLNQQ corresponding to amino acids 1 - 183 of PLTP_HUMAN, which also corresponds to amino acids 1 - 183 of HUMPHOSLIP_PEA_2_P33, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to
  • an isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P33 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VWAATGRRVARVGMLSL in HUMPHOSLIP_PEA_2_P33.
  • an isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P34 comprising a first amino acid sequence being at least 90 % homologous to
  • HUMPHOSLIP_PEA_2_P34 and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LWTSLLALTIPS corresponding to amino acids 206 - 217 of HUMPHOSLIP_PEA_2_P34, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P34 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LWTSLLALTIPS in HUMPHOSLIP_PEA_2_P34.
  • an isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P35 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for an edge portion of HUMPHOSLIP_PEA_2_P35 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise FLK having a structure as follows (numbering according to HUMPHOSLIP_PEA_2_P35): a sequence starting from any of amino acid numbers 109-x to 109; and ending at any of amino acid numbers 111 + ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P35 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VWAATGRRVARVGMLSL in HUMPHOSLIP_PEA_2_P35.
  • an isolated chimeric polypeptide encoding for T59832_P7 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of T59832_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTL ⁇ VPWSQGWTRQRDQR in T59832_P7.
  • an isolated chimeric polypeptide encoding for T59832_P9 comprising a first amino acid sequence being at least 90 % homologous to
  • GILT_HUMAN which also corresponds to amino acids 1 - 203 of T59832_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204 - 244 of T59832_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of T59832_P9 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.
  • an isolated chimeric polypeptide encoding for T59832_P12 comprising a first amino acid sequence being at least 90 % homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVE corresponding to amino acids 12 - 141 of GILT_HUMAN, which also corresponds to amino acids 1 - 130 of T59832_P12, and a second amino acid sequence being at least 90 % homologous to CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLED QTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 173 - 261 of GILT_HUMAN,
  • an isolated chimeric polypeptide encoding for an edge portion of T59832_P12 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130; and ending at any of amino acid numbers 131 + ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for T59832_P18 comprising a first amino acid sequence being at least 90 % homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 12 - 55 of GILT_HUMAN, which also corresponds to amino acids 1 - 44 of T59832_P18, and a second amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for an edge portion of T59832_P18 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44; and ending at any of amino acid numbers 45+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P4 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of HSCP2_PEA_1_P4 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GGTSM in HSCP2_PEA_1_P4.
  • an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P8 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of HSCP2_PEA_1_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KCFQEHLEFGYSTAM in HSCP2_PEA_1_P8.
  • an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P14 comprising a first amino acid sequence being at least 90 % homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSC
  • an isolated polypeptide encoding for an edge portion of HSCP2_PEA_1_P14 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise HWT having a structure as follows (numbering according to HSCP2_PEA_1_P14): a sequence starting from any of amino acid numbers 621-x to 621; and ending at any of amino acid numbers 623 + ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P15 comprising a first amino acid sequence being at least 90 % homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNL
  • an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P2 comprising a first amino acid sequence being at least 90 % homologous to
  • VHFHGHSFQYKH corresponding to amino acids 1 - 1007 of CERU_HUMAN which also corresponds to amino acids 1 - 1007 of HSCP2_PEA_1_P16
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLRLTGEYGM corresponding to amino acids 1008 - 1017 of HSCP2_PEA_1_P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of HSCP2_PEA_1_P16 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLRLTGEYGM in HSCP2_PEA_1_P16.
  • an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P6 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P22 comprising a first amino acid sequence being at least 90 % homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHE corresponding to amino acids 1 - 131 of CERU_HUMAN, which also corresponds to amino acids 1 - 131 of HSCP2_PEA_1_P22, a second amino acid sequence bridging amino acid sequence comprising of A, and a third amino acid sequence being at least 90 % homologous to VNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFHGQALTNKNYRIDTINLFP ATLFDAYMVAQNPGEWMLSCQ
  • an isolated polypeptide encoding for an edge portion of HSCP2_PEA_1_P22 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EAV having a structure as follows (numbering according to HSCP2_PEA_1_P22): a sequence starting from any of amino acid numbers 131-x to 131; and ending at any of amino acid numbers 133 + ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P24 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MPLTMGKRNLFLLTP corresponding to amino acids 1 - 15 of HSCP2_PEA_1_P24, and a second amino acid sequence being at least 90 % homologous to
  • DIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGMETTYTVLQNEDTKSG corresponding to amino acids 262 - 1065 of CERU_HUMAN, which also corresponds to amino acids 16 - 819 of HSCP2_PEA_1_P24, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a head of HSCP2_PEA_1_P24 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MPLTMGKRNLFLLTP of HSCP2_PEA_1_P24.
  • an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P25 comprising a first amino acid sequence being at least 90 % homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLS
  • CERU_HUMAN which also corresponds to amino acids 1 - 621 of HSCP2_PEA_1_P25, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CKYCIIHQSTKLF corresponding to amino acids 622 - 634 of HSCP2_PEA_1_P25, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of HSCP2_PEA_1_P25 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CKYCIIHQSTKLF in HSCP2_PEA_1_P25.
  • an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P33 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of HSCP2_PEA_1_P33 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GTSSPYCTCYMTKRQGQGSLSFKKKSSLLC in HSCP2_PEA_1_P33.
  • HUMTEN_PEA_1_P5 wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for an edge portion of HUMTEN_PEA_1_P5 comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for
  • HUMTEN_PEA_1_P5
  • an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P6, comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P7 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GISNQVSHLFLFLVPFCVICLPDRHDFNIFVHIPYLIHKCSLLFHLLPTLPLVICT in HUMTEN_PEA_1_P7.
  • an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P8 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for an edge portion of HUMTEN_PEA_1_P8, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TT, having a structure as follows: a sequence starting from any of amino acid numbers 1525-x to 1525; and ending at any of amino acid numbers 1526+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P10 comprising a first amino acid sequence being at least 90 % homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCD
  • an isolated chimeric polypeptide encoding for an edge portion of HUMTEN_PEA_1_P10 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise LT, having a structure as follows: a sequence starting from any of amino acid numbers 1252-x to 1252; and ending at any of amino acid numbers 1253+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P13 comprising a first amino acid sequence being at least 90 % homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCD
  • HUMTEN_PEA_1_P13 wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for an edge portion of HUMTEN_PEA_1_P13 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise VT, having a structure as follows: a sequence starting from any of amino acid numbers 1343-x to 1343; and ending at any of amino acid numbers 1344+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P14 comprising a first amino acid sequence being at least 90 % homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFA
  • an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P14 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence STTRDCRALRPRGRGRGQSRGGEEGDLLLMHSDTPMCEALQDSACHTEALRNSLLNKR MGNTLATF in HUMTEN_PEA_1_P14.
  • an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P15 comprising a first amino acid sequence being at least 90 % homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCD
  • an isolated chimeric polypeptide encoding for an edge portion of HUMTEN_PEA_1_P15 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise ST, having a structure as follows: a sequence starting from any of amino acid numbers 1070-x to 1070; and ending at any of amino acid numbers 1071+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P16 comprising a first amino acid sequence being at least 90 % homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQC
  • an isolated chimeric polypeptide encoding for an edge portion of HUMTEN_PEA_1_P16 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise ST, having a structure as follows: a sequence starting from any of amino acid numbers 1070-x to 1070; and ending at any of amino acid numbers 1071+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P17 comprising a first amino acid sequence being at least 90 % homologous to
  • HUMTEN_PEA_1_P17 and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TPWPTTMADPSPPLTRTQIQPSPTVLCPTKGLSGTGTVTVST corresponding to amino acids 2026 - 2067 of HUMTEN_PEA_1_P17, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P17 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
  • an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P20 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P20 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NAALHVYI in HUMTEN_PEA_1_P20.
  • an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P26 comprising a first amino acid sequence being at least 90 % homologous to
  • HUMTEN_PEA_1_P26 wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P26 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GTVNKQERTEKSHDSGVFFSQG in HUMTEN_PEA_1_P26.
  • an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P27 comprising a first amino acid sequence being at least 90 % homologous to
  • T corresponding to amino acids 1 - 1344 of TENA_HUMAN_V1 which also corresponds to amino acids 1 - 1344 of HUMTEN_PEA_1_P27
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GI corresponding to amino acids 1345 - 1346 of HUMTEN_PEA_1_P27, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P28 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P28 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GILDEFTNSLPPLCLCSGGIKALSCFKLGSAPTTLGKYQ in HUMTEN .
  • an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P29 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P29 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GESALSFLQTLG in HUMTEN_PEA_1_P29.
  • an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P30 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P30 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ELCISASLSQPALEGP in HUMTEN_PEA_1_P30.
  • an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P31 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P31 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EYHL in HUMTEN_PEA_1_P31.
  • an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P32 comprising a first amino acid sequence being at least 90 % homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCD
  • an isolated polypeptide encoding for a tail of HUMOSTRO_PEA_1_PEA_1_P21 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VFLNFS in HUMOSTRO_PEA_1_PEA_1_P21.
  • an isolated chimeric polypeptide encoding for HUMOSTRO_PEA_1_PEA_1_P25 comprising a first amino acid sequence being at least 90 % homologous to
  • HUMOSTRO_PEA_1_PEA_1_P25 and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence H corresponding to amino acids 32 - 32 of HUMOSTRO_PEA_1_PEA_1_P25, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for HUMOSTRO_PEA_1_PEA_1_P30 comprising a first amino acid sequence being at least 90 % homologous to
  • HUMOSTRO_PEA_1_PEA_1_P30 and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSIFYVFI corresponding to amino acids 32 - 39 of HUMOSTRO_PEA_1_PEA_1_P30, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of HUMOSTRO_PEA_1_PEA_1_P30 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSIFYVFI in HUMOSTRO_PEA_1_PEA_1_P30.
  • an isolated chimeric polypeptide encoding for H61775_P16 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of H61775_P16 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCW RSSCSVTLQV in H61775_P16.
  • an isolated chimeric polypeptide encoding for H61775_P16 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of H61775_P16 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCW RSSCSVTLQV in H61775_P16. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for H61775_P17, comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for H61775_P17 comprising a first amino acid sequence being at least 90 % homologous to MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWL
  • an isolated chimeric polypeptide encoding for HSAPHOL_P2 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
  • PATPRPLSWLRAPTRLCLDGPSPVLCA corresponding to amino acids 1 - 27 of AAH21289, which also corresponds to amino acids 23 - 49 of HSAPHOL_P2, and a third amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a head of HSAPHOL_P2 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PHSGPAAAFIRRRGWWPGPRCA of HSAPHOL_P2.
  • an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P2 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AE, having a structure as follows: a sequence starting from any of amino acid numbers 49-x to 50; and ending at any of amino acid numbers 50+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for HSAPHOL_P2 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PHSGPAAAFIRRRGWWPGPRCAPATPRPLSWLRAPTRLCLDGPSPVLCA corresponding to amino acids 1 - 49 of HSAPHOL_P2, a second amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a head of HSAPHOL_P2 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PHSGPAAAFIRRRGWWPGPRCAPATPRPLSWLRAPTRLCLDGPSPVLCA of HSAPHOL_P2.
  • an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P2 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AE, having a structure as follows: a sequence starting from any of amino acid numbers 49-x to 50; and ending at any of amino acid numbers 50+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for HSAPHOL_P3, comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVP corresponding to amino acids 63 - 82 of AAH21289, which also corresponds to amino acids 1 - 20 of HSAPHOL_P3, and a second amino acid sequence being at least 90 % homologous to GMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYL CGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSA AYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTD VEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFE PGDM
  • an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P3, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PG, having a structure as follows: a sequence starting from any of amino acid numbers 20-x to 20; and ending at any of amino acid numbers 21+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for HSAPHOL_P3, comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVP corresponding to amino acids 1 - 20 of PPBT_HUMAN, which also corresponds to amino acids 1 - 20 of HSAPHOL_P3, and a second amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P3, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PG, having a structure as follows: a sequence starting from any of amino acid numbers 20-x to 20; and ending at any of amino acid numbers 21+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for HSAPHOLJM comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for HSAPHOL M comprising a first amino acid sequence being at least 90 % homologous to MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLC GVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAA YAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDV EYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEP GDMQYELNR NVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKP FTAIL
  • an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P5 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise MD, having a structure as follows: a sequence starting from any of amino acid numbers 355-x to 355; and ending at any of amino acid numbers 356+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for HSAPHOL_P5 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P5 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise MD, having a structure as follows: a sequence starting from any of amino acid numbers 355-x to 355; and ending at any of amino acid numbers 356+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for HSAPHOL_P6 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise LG, having a structure as follows: a sequence starting from any of amino acid numbers 287-x to 287; and ending at any of amino acid numbers 288+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for HSAPHOL_P6 comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL corresponding to amino acids 1 - 287 of PPBT_HUMAN, which also corresponds to amino acids 1 - 287 of H
  • an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P6 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise LG, having a structure as follows: a sequence starting from any of amino acid numbers 287-x to 287; and ending at any of amino acid numbers 288+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for HSAPHOL_P7 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of HSAPHOL_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
  • an isolated chimeric polypeptide encoding for HSAPHOL_P7 comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYK corresponding to amino acids 1 -
  • an isolated polypeptide encoding for a tail of HSAPHOL_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP in HSAPHOL_P7.
  • an isolated chimeric polypeptide encoding for HSAPHOL_P8 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of HSAPHOL_P8 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP in HSAPHOL_P8.
  • an isolated chimeric polypeptide encoding for HSAPHOL_P8 comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL G corresponding to amino acids 1 - 288 of PPBT_HUMAN, which also corresponds to amino acids 1 - 288 of HSA
  • an isolated polypeptide encoding for a tail of HSAPHOL_P8 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP in HSAPHOL_P8.
  • an isolated chimeric polypeptide encoding for HSAPHOL_P8 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for T10888_PEA_1_P2 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of T10888_PEA_1_P2 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DWTRP in T10888_PEA_1_P2.
  • an isolated chimeric polypeptide encoding for T10888_PEA_1_P4 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of T10888_PEA_1_P4 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLLSSQLWPPSASRLECWPGWL in T10888_PEA_1_P4.
  • an isolated chimeric polypeptide encoding for T10888_PEA_1_P4 comprising a first amino acid sequence being at least 90 % homologous to MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLP QNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTG FYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYL WWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL corresponding to amino acids 1 - 234 of Q13774, which also corresponds to amino acids 1 - 234 of T10888_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at
  • an isolated polypeptide encoding for a tail of T10888_PEA_1_P4 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLLSSQLWPPSASRLECWPGWL in T10888_PEA_1_P4.
  • an isolated chimeric polypeptide encoding for T10888_PEA_1_P5 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of T10888_PEA_1_P5 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWIHEALASHFQVESGSQRRARKKFSFPTCVQGAHANPKFSPEPSQFTSADSFPLVFLFF VVFCFLISHV in T10888_PEA_1_P5.
  • an isolated chimeric polypeptide encoding for T10888_PEA_1_P6 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of T10888_PEA_1_P6 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence REYFHMTSGCWGSVLLPTYGIVRPGLCLWPSLHYILYQGLD1 in T10888_PEA_1_P6.
  • an isolated chimeric polypeptide encoding for HSECADH_P9 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of HSECADH_P9 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG in HSECADH_P9.
  • an isolated chimeric polypeptide encoding for HSECADH_P9 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of HSECADH_P9 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG in HSECADH_P9.
  • an isolated chimeric polypeptide encoding for HSECADH_P9 comprising a first amino acid sequence being at least 90 % homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEG corresponding to amino acids 1 - 274 of CAD1_HUMAN, which also corresponds to amino acids 1 - 274 of HSECADH_P9, and a
  • an isolated polypeptide encoding for a tail of HSECADH_P9 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG in HSECADH_P9.
  • an isolated chimeric polypeptide encoding for HSECADH_P13 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for HSECADH_P13 comprising a first amino acid sequence being at least 90 % homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNT YNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADLQGE
  • an isolated chimeric polypeptide encoding for HSECADH_P14 comprising a first amino acid sequence being at least 90 % homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNT YNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE corresponding to amino acids 1 -
  • an isolated polypeptide encoding for a tail of HSECADH_P14 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV in HSECADH_P14.
  • an isolated chimeric polypeptide encoding for HSECADH_P14 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of HSECADH_P14 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV in HSECADH_P14.
  • an isolated chimeric polypeptide encoding for HSECADH_P14 comprising a first amino acid sequence being at least 90 % homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNT YNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE corresponding to amino acids
  • an isolated polypeptide encoding for a tail of HSECADH_P14 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV in HSECADH_P14.
  • an isolated chimeric polypeptide encoding for HSECADH_P15 comprising a first amino acid sequence being at least 90 % homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYT corresponding to amino acids 1 - 229 of Q9UII7, which also corresponds to amino acids 1 - 229 of HSECADH_P15, and a second amino acid sequence VSIS corresponding to amino acids 230 - 233 of HSECADH
  • an isolated chimeric polypeptide encoding for HSECADH_P15 comprising a first amino acid sequence being at least 90 % homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYT corresponding to amino acids 1 - 229 of Q9UII8, which also corresponds to amino acids 1 - 229 of HSECADH_P15, and a second amino acid sequence VSIS corresponding to amino acids 230 - 233 of HSECADH_P15, wherein
  • MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 12 - 55 of GILT_HUMAN, which also corresponds to amino acids 1 - 44 of T59832_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
  • an isolated polypeptide encoding for a tail of T59832_P5 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
  • an isolated chimeric polypeptide encoding for T59832_P7 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of T59832_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTLIVPWSQGWTRQRDQR in T59832_P7.
  • an isolated chimeric polypeptide encoding for T59832_P7 comprising a first amino acid sequence being at least 90 % homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 1 - 212 of BAC98466, which also corresponds to amino acids 1 - 212 of T59832_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a
  • an isolated polypeptide encoding for a tail of T59832_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTLIVPWSQGWTRQRDQR in T59832_P7.
  • an isolated chimeric polypeptide encoding for T59832_P7 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV corresponding to amino acids 1 - 90 of T59832_P7, and a second amino acid sequence being at least 90 % homologous to MEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVC MEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYV PWVTVNGVRIFLALSLTLIVPWSQGWTR
  • an isolated polypeptide encoding for a head of T59832_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV of T59832_P7.
  • an isolated chimeric polypeptide encoding for T59832_P7 comprising a first amino acid sequence being at least 90 % homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 1 - 212 of Q8WU77, which also corresponds to amino acids 1 - 212 of T59832_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homo
  • an isolated polypeptide encoding for a tail of T59832_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTLIVPWSQGWTRQRDQR in T59832_P7.
  • an isolated chimeric polypeptide encoding for T59832_P9 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of T59832_P9 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.
  • an isolated chimeric polypeptide encoding for T59832_P9 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of T59832_P9 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.
  • an isolated chimeric polypeptide encoding for T59832_P9 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV corresponding to amino acids 1 - 90 of T59832_P9, second amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a head of T59832_P9 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV of T59832_P9.
  • an isolated polypeptide encoding for a tail of T59832_P9 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.
  • an isolated chimeric polypeptide encoding for T59832_P9 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of T59832_P9 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.
  • an isolated chimeric polypeptide encoding for T59832_P12 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for an edge portion of T59832_P12 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130; and ending at any of amino acid numbers 131+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for T59832_P12 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV corresponding to amino acids 1 - 90 of T59832_P12, second amino acid sequence being at least 90 % homologous to MEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVE corresponding to amino acids 1 - 40 of BAC85622, which also corresponds to amino acids 91 - 130 of T59832_P12, third amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a head of T59832_P12 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV of T59832_P12.
  • an isolated chimeric polypeptide encoding for an edge portion of T59832_P12 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130; and ending at any of amino acid numbers 131+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated polypeptide encoding for a tail of T59832_P12 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK in T59832_P12.
  • an isolated chimeric polypeptide encoding for T59832_P12 comprising a first amino acid sequence being at least 90 % homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVE corresponding to amino acids 1 - 130 of Q8WU77, which also corresponds to amino acids 1 - 130 of T59832_P12, and a second amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for an edge portion of T59832_P12 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130; and ending at any of amino acid numbers 131+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for T59832_P18 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for an edge portion of T59832_P18 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44; and ending at any of amino acid numbers 45+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for T59832_P18 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for an edge portion of T59832_P18 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44; and ending at any of amino acid numbers 45+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for T59832_P18 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for an edge portion of T59832_P18 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44; and ending at any of amino acid numbers 45+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for HUMGRP5E_P4 comprising a first amino acid sequence being at least 90 % homologous to MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLMGKKSTG ESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQPKALGNQQPSWDSED SSNFKDVGSKGK corresponding to amino acids 1 - 127 of GRP_HUMAN, which also corresponds to amino acids 1 - 127 of HUMGRP5E_P4, and a second amino acid sequence being at least 90 % homologous to GSQREGRNPQLNQQ corresponding to amino acids 135 - 148 of GRP_HUMAN, which also corresponds to amino acids 128 - 141 of HUMGRP5E_P4, wherein said first and second amino acid sequences are con
  • an isolated chimeric polypeptide encoding for an edge portion of HUMGRP5E_P4 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KG, having a structure as follows: a sequence starting from any of amino acid numbers 127-x to 127; and ending at any of amino acid numbers 128 + ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for HUMGRP5E_P5 comprising a first amino acid sequence being at least 90 % homologous to MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLMGKKSTG ESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQPKALGNQQPSWDSED SSNFKDVGSKGK corresponding to amino acids 1 - 127 of GRP_HUMAN, which also corresponds to amino acids 1 - 127 of HUMGRP5E_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95%
  • an isolated polypeptide encoding for a head of R11723_PEA_1_P6 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
  • R11723_PEA_1_P6 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of R11723_PEA_1_P6 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLL RGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQ CHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
  • an isolated chimeric polypeptide encoding for R11723_PEA_1_P6 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of R11723_PEA_1_P6 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLL RGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQ CHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
  • an isolated polypeptide encoding for a tail of R11723_PEA_1_P6 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLL RGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQ CHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
  • an isolated chimeric polypeptide encoding for R11723_PEA_1_P7 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of R11723_PEA_1_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
  • an isolated chimeric polypeptide encoding for R11723_PEA_1_P7 comprising a first amino acid sequence being at least 90 % homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSAG corresponding to amino acids 1 - 64 of Q8N2G4, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of R11723_PEA_1_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
  • an isolated chimeric polypeptide encoding for R11723_PEA_1_P7 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
  • R11723_PEA_1_P7 64 of R11723_PEA_1_P7, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least
  • an isolated polypeptide encoding for a head of R11723_PEA_1_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
  • an isolated polypeptide encoding for a tail of R11723_PEA_1_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
  • an isolated chimeric polypeptide encoding for R11723_PEA_1_P7 comprising a first amino acid sequence being at least 90 % homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV
  • an isolated polypeptide encoding for a tail of R11723_PEA_1_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
  • an isolated chimeric polypeptide encoding for R11723_PEA_1_P13 comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DTKRTNTLLFEMRHFAKQLTT corresponding to amino acids 64 - 84 of R11723_PEA_1_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of R11723_PEA_1_P13 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DTKRTNTLLFEMRHFAKQLTT in R11723_PEA_1_P13.
  • an isolated chimeric polypeptide encoding for R11723_PEA_1_P10 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of R11723_PEA_1_P10 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
  • an isolated chimeric polypeptide encoding for R11723_PEA_1_P10 comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q8N2G4, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of R11723_PEA_1_P10 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
  • an isolated chimeric polypeptide encoding for R11723_PEA_1_P10 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG corresponding to amino acids 1 - 5 of R11723_PEA_1_P10, a second amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a head of R11723_PEA_1_P10 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG of R11723_PEA_1_P10.
  • an isolated polypeptide encoding for a tail of R11723_PEA_1_P10 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
  • an isolated chimeric polypeptide encoding for R11723_PEA_1_P10 comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 24 - 86 of BAC85518, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of R11723_PEA_1_P10 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
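The "contiguous and in a sequential order" language in the chimeric polypeptide descriptions above can be illustrated with a minimal Python sketch. It joins a first and a second amino acid sequence and reports the 1-based, inclusive coordinates each occupies in the joined sequence. The function name `segment_ranges` and the variable names are hypothetical and are not taken from the source; the two sequences are copied from the R11723_PEA_1_P10 description above as they appear in this text.

```python
from typing import List, Tuple


def segment_ranges(segments: List[str]) -> Tuple[str, List[Tuple[int, int]]]:
    """Join segments in the given order.

    Returns the joined sequence and, for each segment, its 1-based
    inclusive (start, end) coordinates within the joined sequence.
    """
    full = ""
    ranges = []
    for seg in segments:
        if not seg:
            raise ValueError("segments must be non-empty")
        start = len(full) + 1      # first residue of this segment
        full += seg                # contiguous: no gap between segments
        ranges.append((start, len(full)))
    return full, ranges


# Sequences as they appear in the text above (illustrative input only).
FIRST = "MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA"
TAIL = "DRVSLCHEAGVQWNNFSTLQPLPPRLK"

chimera, ranges = segment_ranges([FIRST, TAIL])
print(ranges)
```

With these inputs the reported ranges agree with the amino acids 1 - 63 and 64 - 90 stated in the description.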
  • an isolated chimeric polypeptide encoding for D56406_PEA_1_P2 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for an edge portion of D56406_PEA_1_P2 comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for ARWLTPVIPALWEAETGGSRGQEMETIPANT, corresponding to D56406_PEA_1_P2.
  • an isolated chimeric polypeptide encoding for D56406_PEA_1_P5 comprising a first amino acid sequence being at least 90% homologous to MMAGMKIQLVCMLLLAFSSWSLC corresponding to amino acids 1 - 23 of NEUT_HUMAN, which also corresponds to amino acids 1 - 23 of D56406_PEA_1_P5, and a second amino acid sequence being at least 90% homologous to
  • an isolated chimeric polypeptide encoding for H53393_PEA_1_P2 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for H53393_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNRSKRSWMWN QFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDLFIINENTGDIQATKRLD REEK VYILRAQAINRRTGRPVEPESEFID HDINDNEPIFTKEVYTATVPEMSDVGTFVV QVTATDADDPTYGNSAKVVYSILQGQPYFSVESETGIIKTALLNMDRENREQYQWIQA KDMGGQMGGLSGTTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADV GENAEIEYSITDGEGLDMFDVITDQETQEGIITVKJOXDFEKKKVYTL
  • an isolated polypeptide encoding for a tail of H53393_PEA_1_P6 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VMPLLKHHTE in H53393_PEA_1_P6.
  • an isolated chimeric polypeptide encoding for HSU40434_PEA_1_P12 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated chimeric polypeptide encoding for HSU40434_PEA_1_P12 comprising a first amino acid sequence being at least 90% homologous to MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQ corresponding to amino acids 1 - 43 of Q9BTR2, which also corresponds to amino acids 1 - 43 of HSU40434_PEA_1_P12, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence E corresponding to amino acids 44 - 44 of HSU40434_PEA_1_P12, and a third amino acid sequence being at least 90% homologous to AAPLDGVLANPPNISSLSPRQLLGFPCAEVSGLSTERVRELAVALAQK ⁇ WKLSTEQLRC LAHRLSEPPEDLDALPLDLLLFLNPDAFSGPQACTRFFSRITKANVDLLPRGAPERQRLL PAALACWGVRGSLLSEADVRALGGLACDLPGRFVAESAEVLLPRLVSCPGPLDQDQQE AARAALQGGGPPYGPPSTWSVSTMDALRGLLPVLGQPIIRSIPQGIVAAWRQRSSRDPS WRQPERTILRPRFRREVEKTACPSGKKAREIDESLIFYKKW
  • an isolated polypeptide encoding for an edge portion of HSU40434_PEA_1_P12 comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for E, corresponding to HSU40434_PEA_1_P12.
  • an isolated chimeric polypeptide encoding for M77904_P2 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of M77904_P2 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NKIYVVDLSNERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCD DLTRLWMNVEK ⁇ SCTDHRYCQRKSYSLQVPSDILHLPVELHDFSWKLLVPKDRLSLVL VPAQKLQQHTHEKPCNTSFSYLVASAIPSQDLYFGSFCPGGSIKQIQVKQNISVTLRTFAP SFQQEASRQGLTVSFIPYFKEEGVFTVTPDTKSKVYLRTPNWDRGLPSLTSVSWNISVPR DQVACLTFFKERSGVVCQTGRAFMIIQEQRTRAEEIFSLDEDVLPK
  • an isolated polypeptide encoding for a tail of M77904_P4 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NKIYWDLSNERAMSLTIEPRPVKQSRKFVPGCFVC LESRTCSSNLTLTSGSKHKISFLCD
  • an isolated chimeric polypeptide encoding for M77904_P4 comprising a first amino acid sequence being at least 90% homologous to MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVIS KRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKJ IDCMSGPCPFGEVQLQPSTSLLPTLNR
  • an isolated polypeptide encoding for a tail of M77904_P4 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
  • an isolated chimeric polypeptide encoding for M77904_P4 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of M77904_P4 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGS LEVRSSRPAWPTW in M77904_P4.
  • an isolated chimeric polypeptide encoding for M77904_P5 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for M77904_P5 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated chimeric polypeptide encoding for M77904_P7 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of M77904_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EKAPPCYLIRLKHTRSSLF in M77904_P7.
  • an isolated chimeric polypeptide encoding for M77904_P7 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of M77904_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EKAPPCYLIRLKHTRSSLF in M77904_P7.
  • an isolated chimeric polypeptide encoding for M77904_P7 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of M77904_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EKAPPCYLIRLKHTRSSLF in M77904_P7.
  • an isolated chimeric polypeptide encoding for Z25299_PEA_2_P2 comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCP GKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLNPPNFCEMDGQCKRDLK CCMGMCGKSCVSPVK corresponding to amino acids 1 - 131 of ALK1_HUMAN, which also corresponds to amino acids 1 - 131 of Z25299_PEA_2_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GKQGMRAH corresponding to amino acids 132 - 139 of Z25299 J
  • an isolated polypeptide encoding for a tail of Z25299_PEA_2_P2 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKQGMRAH in Z25299_PEA_2_P2.
  • an isolated chimeric polypeptide encoding for Z25299_PEA_2_P3, comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCP GKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLNPPNFCEMDGQCKRDLK CCMGMCGKSCVSPVK corresponding to amino acids 1 - 131 of ALK1_HUMAN, which also corresponds to amino acids 1 - 131 of Z25299_PEA_2_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEKRHHKQLRDQEVDPLEMRRHSAG corresponding to amino acids
  • an isolated chimeric polypeptide encoding for Z25299_PEA_2_P7 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of Z25299_PEA_2_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RGSLGSAQ in Z25299_PEA_2_P7.
  • an isolated chimeric polypeptide encoding for Z25299_PEA_2_P10 comprising a first amino acid sequence being at least 90% homologous to
  • an antibody capable of specifically binding to an epitope of an amino acid sequence as described herein.
  • the amino acid sequence corresponds to a bridge, edge portion, tail, head or insertion as described herein.
  • the antibody is capable of differentiating between a splice variant having said epitope and a corresponding known protein.
  • kits for detecting ovarian cancer comprising a kit detecting overexpression of a splice variant as described herein.
  • the kit comprises a NAT-based technology.
  • the kit further comprises at least one primer pair capable of selectively hybridizing to a nucleic acid sequence as described herein.
  • the kit further comprises at least one oligonucleotide capable of selectively hybridizing to a nucleic acid sequence as described herein.
  • the kit comprises an antibody as described herein.
  • the kit further comprises at least one reagent for performing an ELISA or a Western blot.
  • a method for detecting ovarian cancer comprising detecting overexpression of a splice variant as described herein.
  • detecting overexpression is performed with a NAT-based technology.
  • detecting overexpression is performed with an immunoassay.
  • the immunoassay comprises an antibody as described herein.
  • a biomarker capable of detecting ovarian cancer comprising any of the above nucleic acid sequences or a fragment thereof, or any of the above amino acid sequences or a fragment thereof.
  • a method for screening for ovarian cancer comprising detecting ovarian cancer cells with a biomarker or an antibody or a method or assay as described herein.
  • a method for diagnosing ovarian cancer comprising detecting ovarian cancer cells with a biomarker or an antibody or a method or assay as described herein.
  • a method for monitoring disease progression and/or treatment efficacy and/or relapse of ovarian cancer comprising detecting ovarian cancer cells with a biomarker or an antibody or a method or assay as described herein.
  • any of the above nucleic acid and/or amino acid sequences further comprises any sequence having at least about 70%, preferably at least about 80%, more preferably at least about 90%, most preferably at least about 95% homology thereto.
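The tiered thresholds above (70%, 80%, 90%, 95%) can be illustrated with a short Python sketch. This passage does not state how homology is calculated, so the sketch makes a loud simplifying assumption: the two sequences are already aligned and of equal length, and plain position-by-position identity stands in for the homology score. The names `percent_identity` and `highest_threshold_met` are hypothetical.

```python
from typing import Optional

# Thresholds named in the text, from most to least stringent.
THRESHOLDS = (95.0, 90.0, 80.0, 70.0)


def percent_identity(a: str, b: str) -> float:
    """Percent of positions with identical residues.

    Assumption: `a` and `b` are pre-aligned and of equal length; this is
    a stand-in for whatever homology measure is actually intended.
    """
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def highest_threshold_met(pct: float) -> Optional[float]:
    """Return the most stringent threshold satisfied by `pct`, or None."""
    for threshold in THRESHOLDS:
        if pct >= threshold:
            return threshold
    return None
```

A real comparison would normally use an alignment tool that handles gaps and substitutions; the sketch only shows how a single score maps onto the nested "at least about" tiers.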
  • all experimental data relates to variants of the present invention, named according to the segment being tested (as expression was tested through RT-PCR as described).
  • nucleic acid sequences and/or amino acid sequences shown herein as embodiments of the present invention relate to their isolated form, as isolated polynucleotides (including for all transcripts), oligonucleotides (including for all segments, amplicons and primers), peptides (including for all tails, bridges, insertions or heads, optionally including other antibody epitopes as described herein) and/or polypeptides (including for all proteins). It should be noted that oligonucleotide and polynucleotide, or peptide and polypeptide, may optionally be used interchangeably.
  • Figure 1 is a schematic summary of the cancer biomarkers selection engine and the wet validation stages.
  • Figure 2 is a schematic illustration depicting the grouping of transcripts of a given cluster based on the presence or absence of unique sequence regions.
  • Figure 3 is a schematic summary of quantitative real-time PCR analysis.
  • Figure 4 is a schematic presentation of the oligonucleotide-based microarray fabrication.
  • Figure 5 is a schematic summary of the oligonucleotide-based microarray experimental flow.
  • Figure 6 shows cancer and cell-line vs. normal tissue expression for .
  • Figure 7 shows expression of segment ⁇ in H61775 in cancerous vs. non-cancerous tissues.
  • Figure 8 shows expression of segment ⁇ in H61775 in normal tissues.
  • Figure 9 shows cancer and cell-line vs. normal tissue expression.
  • Figure 10 is a histogram showing overexpression of T junc11-17 transcripts in cancerous ovary samples relative to the normal samples.
  • Figure 11 is a histogram showing expression of T junc11-17 transcripts in normal tissues.
  • Figure 12 shows cancer and cell-line vs. normal tissue expression.
  • Figure 13 is a histogram showing overexpression of HUMGRP5Ejunc3-7 transcripts in cancerous ovary samples relative to the normal samples.
  • Figure 14 is a histogram showing expression of HUMGRP5Ejunc3-7 transcripts in normal tissues.
  • Figure 15 shows cancer and cell-line vs. normal tissue expression.
  • Figure 16 is a histogram showing overexpression of R11723 seg13 transcripts in cancerous ovary samples relative to the normal PM samples.
  • Figure 17 is a histogram showing expression of R11723 segl transcripts in normal tissue samples.
  • Figure 18 is a histogram showing overexpression of R11723 junc11-18 transcripts in cancerous ovary samples relative to the normal samples.
  • Figure 19 is a histogram showing expression of R11723 junc11-18 transcripts in normal tissue samples.
  • Figure 20 shows cancer and cell-line vs. normal tissue expression.
  • Figure 21 is a histogram showing overexpression of H53393 seg13 transcripts in cancerous ovary samples relative to the normal samples.
  • Figure 22 is a histogram showing overexpression of H53393 junc21-22 transcripts in cancerous ovary samples relative to the normal samples.
  • Figure 23 shows cancer and cell-line vs. normal tissue expression.
  • Figure 24 shows cancer and cell-line vs. normal tissue expression.
  • Figure 25 shows cancer and cell-line vs. normal tissue expression.
  • Figure 26 is a histogram showing overexpression of Z25299 junc13-14-21 transcripts in cancerous ovary samples relative to the normal samples.
  • Figures 27A and 27B are histograms showing overexpression of Z25299 seg20 transcripts in cancerous ovary samples relative to the normal samples (27A) or expression in normal tissues (27B).
  • Figures 28A and 28B are histograms showing overexpression of Z25299 seg23 transcripts in cancerous ovary samples relative to the normal samples (28A) or expression in normal tissues (28B).
  • Figure 29 shows cancer and cell-line vs. normal tissue expression.
  • Figure 30 is a histogram showing down regulation of T39971 junc23-33R transcripts in cancerous ovary samples relative to the normal samples.
  • Figure 31 is a histogram showing expression of T39971 junc23-33R transcripts in normal tissues.
  • Figure 32 shows cancer and cell-line vs. normal tissue expression.
  • Figures 33A and 33B are histograms showing down regulation of Z44808 junc8-11 transcripts in cancerous ovary samples relative to the normal samples (33A) or expression in normal tissues (33B).
  • Figure 34 shows cancer and cell-line vs. normal tissue expression.
  • Figure 35 shows cancer and cell-line vs. normal tissue expression.
  • Figure 36 shows cancer and cell-line vs. normal tissue expression.
  • Figure 37 shows cancer and cell-line vs. normal tissue expression.
  • Figure 38 shows cancer and cell-line vs. normal tissue expression.
  • Figure 39 shows cancer and cell-line vs. normal tissue expression.
  • Figure 40 shows cancer and cell-line vs. normal tissue expression.
  • Figure 41 shows cancer and cell-line vs. normal tissue expression.
  • Figure 42 shows cancer and cell-line vs. normal tissue expression.
  • Figure 43 is a histogram showing differential expression of a variety of transcripts in cancerous ovary samples relative to the normal samples.
  • Figure 44
  • the present invention is of novel markers for ovarian cancer that are both sensitive and accurate.
  • Biomolecular sequences (amino acid and/or nucleic acid sequences) uncovered using the methodology of the present invention and described herein can be efficiently utilized as tissue or pathological markers and/or as drugs or drug targets for treating or preventing a disease.
  • markers are able to distinguish between various types of ovarian cancer, such as ovarian epithelial tumors (serous, mucinous, endometrioid, clear cell, and Brenner tumor), ovarian germ-cell tumors (teratoma, dysgerminoma, endodermal sinus tumor, and embryonal carcinoma) and ovarian stromal tumors (originating from granulosa, theca, Sertoli, Leydig, and collagen-producing stromal cells), alone or in combination. These markers are differentially expressed, and preferably overexpressed, in ovarian cancer specifically, as opposed to normal ovarian tissue.
  • markers of the present invention show a high degree of differential detection between ovarian cancer and non-cancerous states.
  • the markers of the present invention, alone or in combination can be used for prognosis, prediction, screening, early diagnosis, staging, therapy selection and treatment monitoring of ovarian cancer.
  • these markers may be used for staging ovarian cancer and/or monitoring the progression of the disease.
  • the markers of the present invention, alone or in combination, can be used for detection of the source of metastasis found in anatomical places other than the ovary.
  • one or more of the markers may optionally be used in combination with one or more other ovarian cancer markers (other than those described herein).
  • a combination may be used to differentiate between various types of ovarian cancer, such as ovarian epithelial tumors (serous, mucinous, endometrioid, clear cell, and Brenner tumor), ovarian germ-cell tumors (teratoma, dysgerminoma, endodermal sinus tumor, and embryonal carcinoma) and ovarian stromal tumors (originating from granulosa, theca, Sertoli, Leydig, or collagen-producing stromal cells).
  • markers are specifically released to the bloodstream under conditions of ovarian cancer (or one of the above indicative conditions), and/or are otherwise expressed at a much higher level and/or specifically expressed in ovarian cancer tissue or cells, and/or tissue or cells under one of the above indicative conditions.
  • the measurement of these markers, alone or in combination, in patient samples provides information that the diagnostician can correlate with a probable diagnosis of ovarian cancer and/or a condition that is indicative of a higher risk for ovarian cancer.
  • the present invention therefore also relates to diagnostic assays for ovarian cancer, and methods of use of such markers for detection of ovarian cancer, optionally and preferably in a sample taken from a subject (patient), which is more preferably some type of blood sample.
  • the present invention relates to bridges, tails, heads and/or insertions, and/or analogs, homologs and derivatives of such peptides.
  • a "tail" refers to a peptide sequence at the end of an amino acid sequence that is unique to a splice variant according to the present invention. Therefore, a splice variant having such a tail may optionally be considered as a chimera, in that at least a first portion of the splice variant is typically highly homologous (often 100% identical) to a portion of the corresponding known protein, while at least a second portion of the variant comprises the tail.
  • a "head" refers to a peptide sequence at the beginning of an amino acid sequence that is unique to a splice variant according to the present invention. Therefore, a splice variant having such a head may optionally be considered as a chimera, in that at least a first portion of the splice variant comprises the head, while at least a second portion is typically highly homologous (often 100% identical) to a portion of the corresponding known protein.
  • an edge portion refers to a connection between two portions of a splice variant according to the present invention that were not joined in the wild type or known protein.
  • An edge may optionally arise due to a join between the above "known protein" portion of a variant and the tail, for example, and/or may occur if an internal portion of the wild type sequence is no longer present, such that two portions of the sequence are now contiguous in the splice variant that were not contiguous in the known protein.
  • a "bridge" may optionally be an edge portion as described above, but may also include a join between a head and a "known protein" portion of a variant, or a join between a tail and a "known protein" portion of a variant, or a join between an insertion and a "known protein" portion of a variant.
  • a bridge between a tail or a head or a unique insertion, and a "known protein" portion of a variant comprises at least about 10 amino acids, more preferably at least about 20 amino acids, most preferably at least about 30 amino acids, and even more preferably at least about 40 amino acids, in which at least one amino acid is from the tail/head/insertion and at least one amino acid is from the "known protein" portion of a variant.
  • the bridge may comprise any number of amino acids from about 10 to about 40 amino acids (for example, 10, 11, 12, 13...37, 38, 39, 40 amino acids in length, or any number in between).
  • bridges cannot be extended beyond the length of the sequence in either direction, and it should be assumed that every bridge description is to be read in such manner that the bridge length does not extend beyond the sequence itself. Furthermore, bridges are described with regard to a sliding window in certain contexts below.
  • a bridge between two edges may optionally be described as follows: a bridge portion of CONTIG-NAME_P1 (representing the name of the protein), comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise XX (2 amino acids in the center of the bridge, one from each end of the edge), having a structure as follows (numbering according to the sequence of CONTIG-NAME_P1): a sequence starting from any of amino acid numbers 49-x to 49 (for example); and ending at any of amino acid numbers 50 + ((n-2) - x) (for example), in which x varies from 0 to n-2.
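The sliding-window description of a bridge given above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `bridge_windows` and the parameter `junction` (the 1-based position of the last residue before the join, 49 in the example) are hypothetical. For each x from 0 to n-2 the window starts at `junction - x` and ends at `junction + 1 + ((n - 2) - x)`, so every window has length n and spans both residues at the join. Windows that would run past either end of the sequence are skipped, in line with the statement that a bridge cannot extend beyond the sequence itself.

```python
from typing import List, Tuple


def bridge_windows(sequence: str, junction: int, n: int) -> List[Tuple[int, int, str]]:
    """Enumerate bridge windows of length n across a junction.

    `junction` is the 1-based position of the residue just before the
    join (49 in the text's example); the residue after the join is at
    junction + 1. Returns (start, end, peptide) with 1-based inclusive
    coordinates.
    """
    if n < 2:
        raise ValueError("a bridge needs at least the two junction residues")
    if not 1 <= junction < len(sequence):
        raise ValueError("junction must lie inside the sequence")
    windows = []
    for x in range(0, n - 1):                # x varies from 0 to n-2
        start = junction - x                 # "49 - x"
        end = junction + 1 + ((n - 2) - x)   # "50 + ((n-2) - x)"
        if start < 1 or end > len(sequence):
            continue                         # never extend past the sequence
        windows.append((start, end, sequence[start - 1:end]))
    return windows


# Toy sequence (illustrative only), join between positions 5 and 6.
for start, end, peptide in bridge_windows("ACDEFGHIKL", junction=5, n=4):
    print(start, end, peptide)
```

Each window contains both junction residues because `start <= junction` and `end >= junction + 1` for every permitted x.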
  • this invention provides antibodies specifically recognizing the splice variants and polypeptide fragments thereof of this invention. Preferably such antibodies differentially recognize splice variants of the present invention but do not recognize a corresponding known protein (such known proteins are discussed with regard to their splice variants in the Examples below).
  • this invention provides an isolated nucleic acid molecule encoding for a splice variant according to the present invention, having a nucleotide sequence as set forth in any one of the sequences listed herein, or a sequence complementary thereto.
  • this invention provides an isolated nucleic acid molecule, having a nucleotide sequence as set forth in any one of the sequences listed herein, or a sequence complementary thereto.
  • this invention provides an oligonucleotide of at least about 12 nucleotides, specifically hybridizable with the nucleic acid molecules of this invention.
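As a rough illustration of the oligonucleotide requirement above, the following Python sketch checks a candidate oligonucleotide against a target sequence. It rests on a strong simplifying assumption: "specifically hybridizable" is approximated by perfect Watson-Crick complementarity to some stretch of the target, whereas real hybridization depends on stringency conditions that are not modelled here. The function names and the `min_len` parameter are hypothetical; the default of 12 follows the "at least about 12 nucleotides" in the text.

```python
# Translation table for DNA base complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_perfect_match_probe(oligo: str, target: str, min_len: int = 12) -> bool:
    """True if `oligo` is long enough and perfectly complementary to a
    stretch of `target`. A crude proxy for hybridization, not a model of it.
    """
    if len(oligo) < min_len:
        return False
    return reverse_complement(oligo) in target.upper()
```

A usage example: taking any 12-nucleotide stretch of a target, its reverse complement passes the check, while the same probe shortened to 11 nucleotides fails on the length criterion.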
  • this invention provides vectors, cells, liposomes and compositions comprising the isolated nucleic acids of this invention.
  • this invention provides a method for detecting a splice variant according to the present invention in a biological sample, comprising: contacting a biological sample with an antibody specifically recognizing a splice variant according to the present invention under conditions whereby the antibody specifically interacts with the splice variant in the biological sample but does not recognize known corresponding proteins (wherein the known protein is discussed with regard to its splice variant(s) in the Examples below), and detecting said interaction; wherein the presence of an interaction correlates with the presence of a splice variant in the biological sample.
  • this invention provides a method for detecting a splice variant nucleic acid sequence in a biological sample, comprising: hybridizing the isolated nucleic acid molecules or oligonucleotide fragments of at least about a minimum length to a nucleic acid material of a biological sample and detecting a hybridization complex; wherein the presence of a hybridization complex correlates with the presence of a splice variant nucleic acid sequence in the biological sample.
  • the splice variants described herein are non-limiting examples of markers for diagnosing ovarian cancer.
  • Each splice variant marker of the present invention can be used alone or in combination, for various uses, including but not limited to, prognosis, prediction, screening, early diagnosis, determination of progression, therapy selection and treatment monitoring of ovarian cancer.
  • any marker according to the present invention may optionally be used alone or in combination.
  • Such a combination may optionally comprise a plurality of markers described herein, optionally including any subcombination of markers, and/or a combination featuring at least one other marker, for example a known marker.
  • such a combination may optionally and preferably be used as described above with regard to determining a ratio between a quantitative or semi-quantitative measurement of any marker described herein to any other marker described herein, and/or any other known marker, and/or any other marker.
  • the known marker comprises the "known protein" as described in greater detail below with regard to each cluster or gene.
  • a splice variant protein or a fragment thereof, or a splice variant nucleic acid sequence or a fragment thereof may be featured as a biomarker for detecting ovarian cancer and/or an indicative condition, such that a biomarker may optionally comprise any of the above.
  • the present invention optionally and preferably encompasses any amino acid sequence or fragment thereof encoded by a nucleic acid sequence corresponding to a splice variant protein as described herein.
  • Any oligopeptide or peptide relating to such an amino acid sequence or fragment thereof may optionally also (additionally or alternatively) be used as a biomarker, including but not limited to the unique amino acid sequences of these proteins that are depicted as tails, heads, insertions, edges or bridges.
  • the present invention also optionally encompasses antibodies capable of recognizing, and/or being elicited by, such oligopeptides or peptides.
  • the present invention also optionally and preferably encompasses any nucleic acid sequence or fragment thereof, or amino acid sequence or fragment thereof, corresponding to a splice variant of the present invention as described above, optionally for any application.
  • Non-limiting examples of methods or assays are described below.
  • the present invention also relates to kits based upon such diagnostic methods or assays.
  • Nucleic acid sequences and Oligonucleotides Various embodiments of the present invention encompass nucleic acid sequences described hereinabove; fragments thereof, sequences hybridizable therewith, sequences homologous thereto, sequences encoding similar polypeptides with different codon usage, altered sequences characterized by mutations, such as deletion, insertion or substitution of one or more nucleotides, either naturally occurring or artificially induced, either randomly or in a targeted fashion.
  • the present invention encompasses nucleic acid sequences described herein; fragments thereof, sequences hybridizable therewith, sequences homologous thereto [e.g., at least 50 %, at least 55 %, at least 60 %, at least 65 %, at least 70 %, at least 75 %, at least 80 %, at least 85 %, at least 95 % or more, say 100 %, identical to the nucleic acid sequences set forth below], sequences encoding similar polypeptides with different codon usage, altered sequences characterized by mutations, such as deletion, insertion or substitution of one or more nucleotides, either naturally occurring or man-induced, either randomly or in a targeted fashion.
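The identity thresholds listed above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length (gaps written as "-") and defines identity as matching positions divided by alignment length; alignment tools differ in how they choose the denominator, so this definition and the function names are illustrative assumptions, not a definition taken from the text.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Assumes seq_a and seq_b are pre-aligned and of equal length; a gap ("-")
    never counts as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(
        1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b and a != "-"
    )
    return 100.0 * matches / len(seq_a)


def is_homologous(seq_a: str, seq_b: str, threshold: float = 50.0) -> bool:
    """True when the identity meets a chosen threshold (e.g. 50, 75 or 95 %)."""
    return percent_identity(seq_a, seq_b) >= threshold


if __name__ == "__main__":
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```

The threshold is left as a parameter because the text lists a series of alternatives rather than a single cutoff.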
  • the present invention also encompasses homologous nucleic acid sequences (i.e., which form a part of a polynucleotide sequence of the present invention) which include sequence regions unique to the polynucleotides of the present invention.
  • the present invention also encompasses novel polypeptides or portions thereof, which are encoded by the isolated polynucleotide and respective nucleic acid fragments thereof described hereinabove.
  • a "nucleic acid fragment" or an "oligonucleotide” or a "polynucleotide” are used herein interchangeably to refer to a polymer of nucleic acids.
  • a polynucleotide sequence of the present invention refers to a single or double stranded nucleic acid sequence which is isolated and provided in the form of an RNA sequence, a complementary polynucleotide sequence (cDNA), a genomic polynucleotide sequence and/or a composite polynucleotide sequence (e.g., a combination of the above).
  • the phrase "complementary polynucleotide sequence” refers to a sequence, which results from reverse transcription of messenger RNA using a reverse transcriptase or any other RNA dependent DNA polymerase. Such a sequence can be subsequently amplified in vivo or in vitro using a DNA dependent DNA polymerase.
  • genomic polynucleotide sequence refers to a sequence derived (isolated) from a chromosome and thus it represents a contiguous portion of a chromosome.
  • composite polynucleotide sequence refers to a sequence, which is composed of genomic and cDNA sequences.
  • a composite sequence can include some exonal sequences required to encode the polypeptide of the present invention, as well as some intronic sequences interposing therebetween.
  • the intronic sequences can be of any source, including of other genes, and typically will include conserved splicing signal sequences. Such intronic sequences may further include cis acting expression regulatory elements.
  • Preferred embodiments of the present invention encompass oligonucleotide probes.
  • An example of an oligonucleotide probe which can be utilized by the present invention is a single stranded polynucleotide which includes a sequence complementary to the unique sequence region of any variant according to the present invention, including but not limited to a nucleotide sequence coding for an amino acid sequence of a bridge, tail, head and/or insertion according to the present invention, and/or the equivalent portions of any nucleotide sequence given herein (including but not limited to a nucleotide sequence of a node, segment or amplicon described herein).
  • an oligonucleotide probe of the present invention can be designed to hybridize with a nucleic acid sequence encompassed by any of the above nucleic acid sequences, particularly the portions specified above, including but not limited to a nucleotide sequence coding for an amino acid sequence of a bridge, tail, head and/or insertion according to the present invention, and/or the equivalent portions of any nucleotide sequence given herein (including but not limited to a nucleotide sequence of a node, segment or amplicon described herein).
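As a rough illustration of a probe complementary to a unique sequence region, the sketch below builds a single stranded probe spanning the junction between two segments that are joined only in a variant (a "bridge"). The function names and the flank length of 10 bases per side are hypothetical choices made for the example; the text does not prescribe a design procedure.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.translate(_COMPLEMENT)[::-1]


def junction_probe(upstream: str, downstream: str, flank: int = 10) -> str:
    """Probe complementary to the junction formed by upstream + downstream.

    Takes `flank` bases from each side of the junction and returns the
    reverse complement, i.e. a strand that would base-pair with the target.
    """
    if flank <= 0:
        raise ValueError("flank must be positive")
    if len(upstream) < flank or len(downstream) < flank:
        raise ValueError("segments are shorter than the requested flank")
    target = upstream[-flank:] + downstream[:flank]
    return reverse_complement(target)


if __name__ == "__main__":
    # Toy segments; real probes would use the variant-specific sequences.
    print(junction_probe("ATGCATGA", "CTTGAAAC", flank=4))
```

Centering the probe on the junction means neither flanking segment alone offers a full-length match, which is the usual reason for targeting a bridge rather than either segment.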
  • Oligonucleotides designed according to the teachings of the present invention can be generated according to any oligonucleotide synthesis method known in the art such as enzymatic synthesis or solid phase synthesis.
  • Oligonucleotides used according to this aspect of the present invention are those having a length selected from a range of about 10 to about 200 bases preferably about 15 to about 150 bases, more preferably about 20 to about 100 bases, most preferably about 20 to about 50 bases.
  • the oligonucleotide of the present invention features at least 17, at least 18, at least 19, at least 20, at least 22, at least 25, at least 30 or at least 40, bases specifically hybridizable with the biomarkers of the present invention.
  • the oligonucleotides of the present invention may comprise heterocyclic nucleosides consisting of purine and pyrimidine bases, bonded in a 3' to 5' phosphodiester linkage.
  • oligonucleotides are those modified at one or more of the backbone, internucleoside linkages or bases, as is broadly described hereinunder.
  • Specific examples of preferred oligonucleotides useful according to this aspect of the present invention include oligonucleotides containing modified backbones or non-natural internucleoside linkages.
  • Oligonucleotides having modified backbones include those that retain a phosphorus atom in the backbone, as disclosed in U.S. Pat.
  • Preferred modified oligonucleotide backbones include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkyl phosphotriesters, methyl and other alkyl phosphonates including 3'-alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates including 3'-amino phosphoramidate and aminoalkylphosphoramidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, and boranophosphates having normal 3'-5' linkages, 2'-5' linked analogs of these, and those having inverted polarity wherein the adjacent pairs of nucleoside units are linked 3'-5' to 5'-3' or 2'-5' to 5'-2'.
  • modified oligonucleotide backbones that do not include a phosphorus atom therein have backbones that are formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatomic or heterocyclic internucleoside linkages.
  • morpholino linkages formed in part from the sugar portion of a nucleoside
  • siloxane backbones; sulfide, sulfoxide and sulfone backbones
  • formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones
  • alkene containing backbones; sulfamate backbones
  • sulfonate and sulfonamide backbones; amide backbones
  • others having mixed N, O, S and CH2 component parts, as disclosed in U.S. Pat. Nos.
  • oligonucleotides which can be used according to the present invention are those in which both the sugar and the internucleoside linkage, i.e., the backbone, of the nucleotide units are replaced with novel groups. The base units are maintained for complementation with the appropriate polynucleotide target.
  • An example of such an oligonucleotide mimetic is peptide nucleic acid (PNA).
  • United States patents that teach the preparation of PNA compounds include, but are not limited to, U.S. Pat. Nos. 5,539,082; 5,714,331; and 5,719,262, each of which is herein incorporated by reference.
  • Other backbone modifications, which can be used in the present invention are disclosed in U.S.
  • Oligonucleotides of the present invention may also include base modifications or substitutions.
  • "unmodified” or “natural” bases include the purine bases adenine (A) and guanine (G), and the pyrimidine bases thymine (T), cytosine (C) and uracil (U).
  • Modified bases include but are not limited to other synthetic and natural bases such as 5-methylcytosine (5-me-C), 5-hydroxymethyl cytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl uracil and cytosine, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other 8-substituted adenines and guanines, 5-halo particularly 5-bromo, 5-trifluoromethyl and other 5-substituted uracils and
  • Further bases particularly useful for increasing the binding affinity of the oligomeric compounds of the invention include 5-substituted pyrimidines, 6-azapyrimidines and N-2, N-6 and O-6 substituted purines, including 2-aminopropyladenine, 5-propynyluracil and 5-propynylcytosine.
  • 5-methylcytosine substitutions have been shown to increase nucleic acid duplex stability by 0.6-1.2 °C and are presently preferred base substitutions, even more particularly when combined with 2'-O-methoxyethyl sugar modifications.
  • another modification of the oligonucleotides of the invention involves chemically linking to the oligonucleotide one or more moieties or conjugates, which enhance the activity, cellular distribution or cellular uptake of the oligonucleotide.
  • moieties include but are not limited to lipid moieties such as a cholesterol moiety, cholic acid, a thioether, e.g., hexyl-S-tritylthiol, a thiocholesterol, an aliphatic chain, e.g., dodecandiol or undecyl residues, a phospholipid, e.g., di-hexadecyl-rac-glycerol or triethylammonium 1,2-di-O-hexadecyl-rac-glycero-3-H-phosphonate, a polyamine or a polyethylene glycol chain, or adamantane acetic acid, a
  • oligonucleotides of the present invention may include further modifications for more efficient use as diagnostic agents and/or to increase bioavailability, therapeutic efficacy and reduce cytotoxicity.
  • a nucleic acid construct according to the present invention may be used, which includes at least a coding region of one of the above nucleic acid sequences, and further includes at least one cis acting regulatory element.
  • cis acting regulatory element refers to a polynucleotide sequence, preferably a promoter, which binds a trans acting regulator and regulates the transcription of a coding sequence located downstream thereto. Any suitable promoter sequence can be used by the nucleic acid construct of the present invention.
  • the promoter utilized by the nucleic acid construct of the present invention is active in the specific cell population transformed.
  • cell type-specific and/or tissue-specific promoters include promoters such as albumin that is liver specific, lymphoid specific promoters [Calame et al., (1988) Adv. Immunol. 43:235-275]; in particular promoters of T-cell receptors [Winoto et al., (1989) EMBO J. 8:729-733] and immunoglobulins [Banerji et al. (1983) Cell 33:729-740], neuron-specific promoters such as the neurofilament promoter [Byrne et al. (1989) Proc. Natl. Acad. Sci.
  • the nucleic acid construct of the present invention can further include an enhancer, which can be adjacent or distant to the promoter sequence and can function in up regulating the transcription therefrom.
  • the nucleic acid construct of the present invention preferably further includes an appropriate selectable marker and/or an origin of replication.
  • the nucleic acid construct utilized is a shuttle vector, which can propagate both in E.
  • the construct comprises an appropriate selectable marker and origin of replication
  • the construct according to the present invention can be, for example, a plasmid, a bacmid, a phagemid, a cosmid, a phage, a virus or an artificial chromosome.
  • suitable constructs include, but are not limited to, pcDNA3, pcDNA3.1 (+/-), pGL3, PzeoSV2 (+/-), pDisplay, pEF/myc/cyto, pCMV/myc/cyto each of which is commercially available from Invitrogen Co.
  • examples of retroviral vector and packaging systems include those sold by Clontech, San Diego, Calif., including Retro-X vectors pLNCX and pLXSN, which permit cloning into multiple cloning sites and in which the transgene is transcribed from the CMV promoter.
  • Vectors derived from Mo-MuLV are also included such as pBabe, where the transgene will be transcribed from the 5'LTR promoter.
  • Currently preferred in vivo nucleic acid transfer techniques include transfection with viral or non-viral constructs, such as adenovirus, lentivirus, Herpes simplex I virus, or adeno-associated virus (AAV) and lipid-based systems.
  • Useful lipids for lipid-mediated transfer of the gene are, for example, DOTMA, DOPE, and DC-Chol [Tonkinson et al., Cancer Investigation, 14(1): 54-65 (1996)].
  • the most preferred constructs for use in gene therapy are viruses, most preferably adenoviruses, AAV, lentiviruses, or retroviruses.
  • a viral construct such as a retroviral construct includes at least one transcriptional promoter/enhancer or locus-defining element(s), or other elements that control gene expression by other means such as alternate splicing, nuclear RNA export, or post-translational modification of messenger.
  • Such vector constructs also include a packaging signal, long terminal repeats (LTRs) or portions thereof, and positive and negative strand primer binding sites appropriate to the virus used, unless it is already present in the viral construct.
  • such a construct typically includes a signal sequence for secretion of the peptide from a host cell in which it is placed.
  • the signal sequence for this purpose is a mammalian signal sequence or the signal sequence of the polypeptide variants of the present invention.
  • the construct may also include a signal that directs polyadenylation, as well as one or more restriction sites and a translation termination sequence.
  • constructs will typically include a 5' LTR, a tRNA binding site, a packaging signal, an origin of second-strand DNA synthesis, and a 3' LTR or a portion thereof.
  • Other vectors can be used that are non-viral, such as cationic lipids, polylysine, and dendrimers.
  • Hybridization assays: Detection of a nucleic acid of interest in a biological sample may optionally be effected by hybridization-based assays using an oligonucleotide probe (non-limiting examples of probes according to the present invention were previously described).
  • Traditional hybridization assays include PCR, RT-PCR, Real-time PCR, RNase protection, in-situ hybridization, primer extension, Southern blots (DNA detection), dot or slot blots (DNA, RNA), and Northern blots (RNA detection) (NAT type assays are described in greater detail below). More recently, PNAs have been described (Nielsen et al. 1999, Current Opin. Biotechnol. 10:71-75).
  • Other detection methods include kits containing probes on a dipstick setup and the like.
  • Hybridization based assays which allow the detection of a variant of interest (i.e., DNA or RNA) in a biological sample rely on the use of oligonucleotides which can be 10, 15, 20, or 30 to 100 nucleotides long, preferably from 10 to 50, more preferably from 40 to 50 nucleotides long.
  • the isolated polynucleotides (oligonucleotides) of the present invention are preferably hybridizable with any of the herein described nucleic acid sequences under moderate to stringent hybridization conditions.
  • Moderate to stringent hybridization conditions are characterized by a hybridization solution such as one containing 10 % dextran sulfate, 1 M NaCl, 1 % SDS and 5 x 10^6 cpm 32P-labeled probe, at 65 °C, with a final wash solution of 0.2 x SSC and 0.1 % SDS and final wash at 65 °C; whereas moderate hybridization is effected using a hybridization solution containing 10 % dextran sulfate, 1 M NaCl, 1 % SDS and 5 x 10^6 cpm 32P-labeled probe, at 65 °C, with a final wash solution of 1 x SSC and 0.1 % SDS and final wash at 50 °C.
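The two final-wash conditions above differ in salt concentration and temperature. The sketch below records them as data and estimates the sodium concentration of each wash, assuming the standard composition of 1 x SSC (0.15 M NaCl, 0.015 M trisodium citrate); that composition is not stated in the text and is the working assumption here. The class and helper names are illustrative.

```python
from dataclasses import dataclass

# Assumed standard 1 x SSC composition (not given in the text above).
SSC_1X_NACL_M = 0.150
SSC_1X_CITRATE_M = 0.015  # trisodium citrate: three Na+ per citrate


@dataclass(frozen=True)
class WashCondition:
    ssc_fold: float       # e.g. 0.2 for "0.2 x SSC"
    sds_percent: float
    temperature_c: float

    def sodium_molar(self) -> float:
        """Approximate total Na+ concentration contributed by SSC."""
        return self.ssc_fold * (SSC_1X_NACL_M + 3 * SSC_1X_CITRATE_M)


# Final wash conditions as described in the text.
STRINGENT_WASH = WashCondition(ssc_fold=0.2, sds_percent=0.1, temperature_c=65.0)
MODERATE_WASH = WashCondition(ssc_fold=1.0, sds_percent=0.1, temperature_c=50.0)


def is_more_stringent(a: WashCondition, b: WashCondition) -> bool:
    """True if `a` has no more salt and no lower temperature than `b`, and differs from it."""
    return (
        a.sodium_molar() <= b.sodium_molar()
        and a.temperature_c >= b.temperature_c
        and a != b
    )


if __name__ == "__main__":
    for name, wash in (("stringent", STRINGENT_WASH), ("moderate", MODERATE_WASH)):
        print(name, round(wash.sodium_molar(), 3), "M Na+ at", wash.temperature_c, "C")
```

Lower salt and higher temperature both destabilize imperfect duplexes, which is why the 0.2 x SSC, 65 °C wash is the more stringent of the two.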
  • hybridization of short nucleic acids can be effected using the following exemplary hybridization protocols which can be modified according to the desired stringency;
  • hybridization duplexes are separated from unhybridized nucleic acids and the labels bound to the duplexes are then detected.
  • labels refer to radioactive, fluorescent, biological or enzymatic tags or labels of standard use in the art.
  • a label can be conjugated to either the oligonucleotide probes or the nucleic acids derived from the biological sample.
  • Probes can be labeled according to numerous well known methods.
  • Non-limiting examples of radioactive labels include 3H, 14C, 32P, and 35S.
  • detectable markers include ligands, fluorophores, chemiluminescent agents, enzymes, and antibodies.
  • oligonucleotides of the present invention can be labeled subsequent to synthesis, by incorporating biotinylated dNTPs or rNTP, or some similar means (e.g., photo-cross-linking a psoralen derivative of biotin to RNAs), followed by addition of labeled streptavidin (e.g., phycoerythrin-conjugated streptavidin) or the equivalent.
  • when fluorescently-labeled oligonucleotide probes are used, fluorescein, lissamine, phycoerythrin, rhodamine (Perkin Elmer Cetus), Cy2, Cy3, Cy3.5, Cy5, Cy5.5, Cy7, FluorX (Amersham) and others [e.g., Kricka et al. (1992), Academic Press San Diego, Calif.] can be attached to the oligonucleotides. Those skilled in the art will appreciate that wash steps may be employed to wash away excess target DNA or probe as well as unbound conjugate. Further, standard heterogeneous assay formats are suitable for detecting the hybrids using the labels present on the oligonucleotide primers and probes.
  • radioactive nucleotides can be incorporated into probes of the invention by several methods.
  • Probes of the invention can be utilized with naturally occurring sugar-phosphate backbones as well as modified backbones including phosphorothioates, dithionates, alkyl phosphonates and α-nucleotides and the like.
  • Probes of the invention can be constructed of either ribonucleic acid (RNA) or deoxyribonucleic acid (DNA), and preferably of DNA.
  • NAT Assays Detection of a nucleic acid of interest in a biological sample may also optionally be effected by NAT-based assays, which involve nucleic acid amplification technology, such as PCR for example (or variations thereof such as real-time PCR for example).
  • a "primer” defines an oligonucleotide which is capable of annealing to
  • Amplification of a selected, or target, nucleic acid sequence may be carried out by a number of suitable methods. See generally Kwoh et al., 1990, Am. Biotechnol. Lab. 8:14. Numerous amplification techniques have been described and can be readily adapted to suit particular needs of a person of ordinary skill.
  • Non-limiting examples of amplification techniques include polymerase chain reaction (PCR), ligase chain reaction (LCR), strand displacement amplification (SDA), transcription-based amplification, the Qβ replicase system and NASBA (Kwoh et al., 1989, Proc. Natl. Acad. Sci. USA 86, 1173-1177; Lizardi et al., 1988, BioTechnology 6:1197-1202; Malek et al., 1994, Methods Mol. Biol., 28:253-260; and Sambrook et al., 1989, supra).
  • amplification pair refers herein to a pair of oligonucleotides (oligos) of the present invention, which are selected to be used together in amplifying a selected nucleic acid sequence by one of a number of types of amplification processes, preferably a polymerase chain reaction.
  • amplification processes include ligase chain reaction, strand displacement amplification, or nucleic acid sequence-based amplification, as explained in greater detail below.
  • the oligos are designed to bind to a complementary sequence under selected conditions.
  • a nucleic acid sample from a patient is amplified under conditions which favor the amplification of the most abundant differentially expressed nucleic acid.
  • RT-PCR is carried out on an mRNA sample from a patient under conditions which favor the amplification of the most abundant mRNA.
  • the amplification of the differentially expressed nucleic acids is carried out simultaneously. It will be realized by a person skilled in the art that such methods could be adapted for the detection of differentially expressed proteins instead of differentially expressed nucleic acid sequences.
  • the nucleic acid (i.e., DNA or RNA) for practicing the present invention may be obtained according to well known methods.
  • Oligonucleotide primers of the present invention may be of any suitable length, depending on the particular assay format and the particular needs and targeted genomes employed.
  • the oligonucleotide primers are at least 12 nucleotides in length, preferably between 15 and 24 nucleotides, and they may be adapted to be especially suited to a chosen nucleic acid amplification system.
  • the oligonucleotide primers can be designed by taking into consideration the melting point of hybridization thereof with its targeted sequence (Sambrook et al., 1989, Molecular Cloning - A Laboratory Manual, 2nd Edition, CSH Laboratories; Ausubel et al., 1989, in Current Protocols in Molecular Biology, John Wiley & Sons Inc., N.Y.). It will be appreciated that antisense oligonucleotides may be employed to quantify expression of a splice isoform of interest. Such detection is effected at the pre-mRNA level. Essentially the ability to quantitate transcription from a splice site of interest can be effected based on splice site accessibility.
  • Oligonucleotides may compete with splicing factors for the splice site sequences. Thus, low activity of the antisense oligonucleotide is indicative of splicing activity.
  • the polymerase chain reaction and other nucleic acid amplification reactions are well known in the art (various non-limiting examples of these reactions are described in greater detail below).
  • the pair of oligonucleotides according to this aspect of the present invention are preferably selected to have compatible melting temperatures (Tm), e.g., melting temperatures which differ by less than 7 °C, preferably less than 5 °C, more preferably less than 4 °C, most preferably less than 3 °C, ideally between 3 °C and 0 °C.
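A primer pair can be screened against the Tm difference limits above with a short script. The sketch below estimates Tm with the Wallace rule, Tm = 2(A+T) + 4(G+C) °C, which is a rough approximation for short oligonucleotides and is a substitution made for this example; the text does not say which Tm calculation is intended, and nearest-neighbor methods are generally more accurate.

```python
def wallace_tm(primer: str) -> float:
    """Rough melting temperature (°C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    s = primer.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    if not s or at + gc != len(s):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    return 2.0 * at + 4.0 * gc


def tm_compatible(fwd: str, rev: str, max_delta_c: float = 7.0) -> bool:
    """True when the two estimated Tm values differ by less than max_delta_c."""
    return abs(wallace_tm(fwd) - wallace_tm(rev)) < max_delta_c


if __name__ == "__main__":
    fwd, rev = "ATGCATGCATGCATGCAT", "ATGCATGCATGCATGCAA"
    print(wallace_tm(fwd), wallace_tm(rev), tm_compatible(fwd, rev, max_delta_c=3.0))
```

Passing 7, 5, 4 or 3 as `max_delta_c` corresponds to the successively stricter preferences listed in the text.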
  • Polymerase Chain Reaction (PCR): The polymerase chain reaction (PCR), as described in U.S. Pat. Nos. 4,683,195 and 4,683,202 to Mullis and Mullis et al., is a method of increasing the concentration of a segment of target sequence in a mixture of genomic DNA without cloning or purification.
  • This technology provides one approach to the problems of low target sequence concentration.
  • PCR can be used to directly increase the concentration of the target to an easily detectable level.
  • This process for amplifying the target sequence involves the introduction of a molar excess of two oligonucleotide primers which are complementary to their respective strands of the double-stranded target sequence to the DNA mixture containing the desired target sequence. The mixture is denatured and then allowed to hybridize.
  • the primers are extended with polymerase so as to form complementary strands.
  • the steps of denaturation, hybridization (annealing), and polymerase extension (elongation) can be repeated as often as needed, in order to obtain relatively high concentrations of a segment of the desired target sequence.
  • the length of the segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and, therefore, this length is a controllable parameter.
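The statement that amplicon length is set by the relative positions of the two primers can be shown with a toy in-silico PCR. The sketch assumes exact primer matches on a single linear template given as its top strand, with the reverse primer written 5' to 3' as it would be synthesized; the function names are hypothetical and no mismatch tolerance or thermodynamics is modeled.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon(template: str, fwd_primer: str, rev_primer: str) -> Optional[str]:
    """Top-strand segment bounded by the two primers, or None if either fails to bind.

    The forward primer must occur in the template; the reverse primer binds the
    top strand where its reverse complement occurs downstream of the forward primer.
    """
    start = template.find(fwd_primer)
    if start < 0:
        return None
    rev_site = reverse_complement(rev_primer)
    end = template.find(rev_site, start + len(fwd_primer))
    if end < 0:
        return None
    return template[start:end + len(rev_site)]


if __name__ == "__main__":
    template = "TTTT" + "ACGTAC" + "GGGGGGGG" + "CATTGA" + "AAAA"
    product = amplicon(template, "ACGTAC", "TCAATG")
    print(product, len(product) if product else None)
```

Moving either primer site along the template changes the returned length accordingly, which is the sense in which the text calls the length a controllable parameter.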
  • Ligase Chain Reaction (LCR or LAR): The ligase chain reaction [LCR; sometimes referred to as "Ligase Amplification Reaction" (LAR)] has developed into a well-recognized alternative method of amplifying nucleic acids.
  • In LCR, four oligonucleotides, two adjacent oligonucleotides which uniquely hybridize to one strand of target DNA, and a complementary set of adjacent oligonucleotides, which hybridize to the opposite strand, are mixed and DNA ligase is added to the mixture.
  • ligase will covalently link each set of hybridized molecules.
  • two probes are ligated together only when they base-pair with sequences in the target sample, without gaps or mismatches. Repeated cycles of denaturation and ligation amplify a short segment of DNA.
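The ligation condition described above (both probes paired to the target, adjacent, with no gap or mismatch) can be expressed as a simple string check. This is a deliberately idealized model: it treats any mismatch anywhere in either probe as blocking, whereas a real ligase is most sensitive to mismatches near the junction. The function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def can_ligate(target: str, probe_5p: str, probe_3p: str) -> bool:
    """Idealized LCR check for one pair of adjacent probes on one target strand.

    probe_5p and probe_3p are written 5'->3'; ligation would join the 3' end of
    probe_5p to the 5' end of probe_3p. They are ligatable in this model only if
    the joined probe is perfectly complementary to a contiguous stretch of the
    target, which implies no mismatch and no gap between the probes.
    """
    joined = probe_5p + probe_3p
    return reverse_complement(joined).upper() in target.upper()


if __name__ == "__main__":
    target = "TTTAATTCCGGTGCAACGTAAA"
    print(can_ligate(target, "ACGTTGCA", "CCGGAATT"))  # perfect, adjacent probes
    print(can_ligate(target, "ACGTTGCA", "CCGGAATA"))  # mismatch in one probe
```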
  • LCR has also been used in combination with PCR to achieve enhanced detection of single-base changes: see for example Segev, PCT Publication No. WO9001069 A1 (1990).
  • because the four oligonucleotides used in this assay can pair to form two short ligatable fragments, there is the potential for the generation of target-independent background signal.
  • the use of LCR for mutant screening is limited to the examination of specific nucleic acid positions.
  • Self-Sustained Synthetic Reaction (3SR/NASBA): The self-sustained sequence replication reaction (3SR) is a transcription-based in vitro amplification system that can exponentially amplify RNA sequences at a uniform temperature. The amplified RNA can then be utilized for mutation detection. In this method, an oligonucleotide primer is used to add a phage RNA polymerase promoter to the 5' end of the sequence of interest.
  • the target sequence undergoes repeated rounds of transcription, cDNA synthesis and second-strand synthesis to amplify the area of interest.
  • the use of 3SR to detect mutations is kinetically limited to screening small segments of DNA (e.g., 200-300 base pairs).
  • Q-Beta (Qβ) Replicase: In this method, a probe which recognizes the sequence of interest is attached to the replicatable RNA template for Qβ replicase.
  • thermostable DNA ligases are not effective on this RNA substrate, so the ligation must be performed by T4 DNA ligase at low temperatures (37 °C). This prevents the use of high temperature as a means of achieving specificity as in the LCR; the ligation event can be used to detect a mutation at the junction site, but not elsewhere.
  • a successful diagnostic method must be very specific.
  • a straight-forward method of controlling the specificity of nucleic acid hybridization is by controlling the temperature of the reaction.
  • a PCR running at 85 % efficiency will yield only 21 % as much final product, compared to a reaction running at 100 % efficiency.
  • a reaction that is reduced to 50 % mean efficiency will yield less than 1 % of the possible product.
  • routine polymerase chain reactions rarely achieve the theoretical maximum yield, and PCRs are usually run for more than 20 cycles to compensate for the lower yield.
  • at 50 % mean efficiency, it would take 34 cycles to achieve the million-fold amplification theoretically possible in 20, and at lower efficiencies, the number of cycles required becomes prohibitive.
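The yield figures quoted above are consistent with a constant-efficiency model in which each cycle multiplies the product by (1 + E), where E is the mean per-cycle efficiency (E = 1 for perfect doubling). The sketch below works through that arithmetic for a 20-cycle reaction; the model and function names are assumptions made for illustration, since the text gives only the resulting numbers.

```python
import math


def fold_amplification(efficiency: float, cycles: int) -> float:
    """Fold increase in product after `cycles` at a constant mean efficiency."""
    return (1.0 + efficiency) ** cycles


def relative_yield(efficiency: float, cycles: int = 20) -> float:
    """Yield as a fraction of what a 100 %-efficient reaction would give."""
    return fold_amplification(efficiency, cycles) / fold_amplification(1.0, cycles)


def cycles_needed(target_fold: float, efficiency: float) -> float:
    """Number of cycles (possibly fractional) to reach a given fold amplification."""
    return math.log(target_fold) / math.log(1.0 + efficiency)


if __name__ == "__main__":
    print(f"85 % efficiency: {relative_yield(0.85):.1%} of the ideal 20-cycle yield")
    print(f"50 % efficiency: {relative_yield(0.50):.2%} of the ideal 20-cycle yield")
    print(f"cycles for 2**20-fold at 50 %: {cycles_needed(2 ** 20, 0.50):.1f}")
```

Under this model, 85 % efficiency gives about 21 % of the ideal yield, 50 % efficiency gives well under 1 %, and roughly 34 cycles at 50 % efficiency are needed to match 20 ideal cycles, in line with the figures in the text.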
  • any background products that amplify with a better mean efficiency than the intended target will become the dominant products.
  • PCR has yet to penetrate the clinical market in a significant way.
  • LCR must also be optimized to use different oligonucleotide sequences for each target sequence.
  • both methods require expensive equipment, capable of precise temperature cycling.
  • nucleic acid detection technologies such as in studies of allelic variation, involve not only detection of a specific sequence in a complex background, but also the discrimination between sequences with few, or single, nucleotide differences.
  • One method of the detection of allele-specific variants by PCR is based upon the fact that it is difficult for Taq polymerase to synthesize a DNA strand when there is a mismatch between the template strand and the 3' end of the primer.
  • An allele-specific variant may be detected by the use of a primer that is perfectly matched with only one of the possible alleles; the mismatch to the other allele acts to prevent the extension of the primer, thereby preventing the amplification of that sequence.
  • This method has a substantial limitation in that the base composition of the mismatch influences the ability to prevent extension across the mismatch, and certain mismatches do not prevent extension or have only a minimal effect.
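The allele-specific priming idea can be sketched as a check on the primer's 3' terminal base. The model below is idealized: it assumes any 3' mismatch blocks extension, which, as the text notes, is not true for every mismatch. For simplicity the primer and the allele site are written in the same 5' to 3' sense, so a match means equality rather than complementarity; names and return labels are hypothetical.

```python
def classify_priming(primer: str, allele_site: str) -> str:
    """Classify primer behavior on one allele under an idealized 3'-mismatch rule.

    Returns:
      "extend"    - primer matches the allele along its whole length
      "blocked"   - primer matches except at its 3' terminal base
      "no_anneal" - primer differs elsewhere, so it is not expected to bind
    """
    if not primer or len(primer) != len(allele_site):
        raise ValueError("primer and allele_site must be non-empty and equal length")
    p, a = primer.upper(), allele_site.upper()
    if p[:-1] != a[:-1]:
        return "no_anneal"
    return "extend" if p[-1] == a[-1] else "blocked"


if __name__ == "__main__":
    primer = "ACGTTGCAG"          # 3' base G is specific for allele 1
    print(classify_priming(primer, "ACGTTGCAG"))  # allele 1
    print(classify_priming(primer, "ACGTTGCAA"))  # allele 2, differs at the 3' base
```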
  • a similar 3'-mismatch strategy is used with greater effect to prevent ligation in the LCR.
  • Any mismatch effectively blocks the action of the thermostable ligase, but LCR still has the drawback of target-independent background ligation products initiating the amplification.
  • the direct detection method may be, for example, a cycling probe reaction (CPR) or a branched DNA analysis.
  • a method that does not amplify the signal exponentially is more amenable to quantitative analysis.
  • Branched DNA involves oligonucleotides with branched structures that allow each individual oligonucleotide to carry 35 to 40 labels (e.g., alkaline phosphatase enzymes).
  • the detection of at least one sequence change may be accomplished by, for example, restriction fragment length polymorphism (RFLP) analysis, allele specific oligonucleotide (ASO) analysis, Denaturing/Temperature Gradient Gel Electrophoresis (DGGE/TGGE), Single-Strand Conformation Polymorphism (SSCP) analysis or Dideoxy fingerprinting (ddF).
  • as nucleic acid sequence data for genes from humans and pathogenic organisms accumulates, the demand for fast, cost-effective, and easy-to-use tests for as yet unknown mutations within specific sequences is rapidly increasing.
  • a handful of methods have been devised to scan nucleic acid segments for mutations.
  • One option is to determine the entire gene sequence of each test sample (e.g., a bacterial isolate). For sequences under approximately 600 nucleotides, this may be accomplished using amplified material (e.g., PCR reaction products). This avoids the time and expense associated with cloning the segment of interest.
  • a given segment of nucleic acid may be characterized on several other levels.
  • the size of the molecule can be determined by electrophoresis by comparison to a known standard run on the same gel.
  • a more detailed picture of the molecule may be achieved by cleavage with combinations of restriction enzymes prior to electrophoresis, to allow construction of an ordered map.
  • the presence of specific sequences within the fragment can be detected by hybridization of a labeled probe, or the precise nucleotide sequence can be determined by partial chemical degradation or by primer extension in the presence of chain-terminating nucleotide analogs.
  • Restriction fragment length polymorphism: For detection of single-base differences between like sequences, the requirements of the analysis are often at the highest level of resolution. For cases in which the position of the nucleotide in question is known in advance, several methods have been developed for examining single base changes without direct sequencing. For example, if a mutation of interest happens to fall within a restriction recognition sequence, a change in the pattern of digestion can be used as a diagnostic tool (e.g., restriction fragment length polymorphism [RFLP] analysis). Single point mutations have also been detected by the creation or destruction of RFLPs.
  • RFLP Restriction fragment length polymorphism
  • MCC Mismatch Chemical Cleavage
  • RFLP analysis When RFLP analysis is used for the detection of point mutations, it is, by its nature, limited to the detection of only those single base changes which fall within a restriction sequence of a known restriction endonuclease. Moreover, the majority of the available enzymes have 4 to 6 base-pair recognition sequences, and cleave too frequently for many large-scale DNA manipulations. Thus, it is applicable only in a small fraction of cases, as most mutations do not fall within such sites. A handful of rare-cutting restriction enzymes with 8 base-pair specificities have been isolated and these are widely used in genetic mapping, but these enzymes are few in number, are limited to the recognition of G+C-rich sequences, and cleave at sites that tend to be highly clustered.
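The limitation described above, that RFLP analysis only reveals single-base changes falling within a restriction recognition sequence, can be illustrated with a minimal Python sketch. The recognition sequences for EcoRI and HaeIII are the standard ones; the allele sequences and the function names are hypothetical and chosen only for illustration.

```python
# Recognition sequences of two common restriction enzymes (both are
# palindromic, so scanning one strand is sufficient).
RECOGNITION_SITES = {"EcoRI": "GAATTC", "HaeIII": "GGCC"}


def find_sites(sequence, site):
    """Return the 0-based start positions of every occurrence of site."""
    sequence = sequence.upper()
    return [i for i in range(len(sequence) - len(site) + 1)
            if sequence[i:i + len(site)] == site]


def rflp_detectable(wild_type, mutant, enzymes=RECOGNITION_SITES):
    """List the enzymes whose site positions differ between two alleles."""
    return sorted(name for name, site in enzymes.items()
                  if find_sites(wild_type, site) != find_sites(mutant, site))


# Hypothetical alleles, each differing from the wild type by one base.
wild_type = "ATCGGAATTCGTAACGTTAG"
mutant_in_site = "ATCGGAGTTCGTAACGTTAG"   # A -> G at index 6, inside GAATTC
mutant_outside = "ATCGGAATTCGTAACGTCAG"   # T -> C at index 17, outside any site

print(rflp_detectable(wild_type, mutant_in_site))
print(rflp_detectable(wild_type, mutant_outside))
```

The first mutant destroys the EcoRI site and so changes the digestion pattern; the second leaves every site intact and would go undetected by this approach.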
  • Allele specific oligonucleotide (ASO): If the change is not in a recognition sequence, then allele-specific oligonucleotides (ASOs) can be designed to hybridize in proximity to the mutated nucleotide, such that a primer extension or ligation event can be used as the indicator of a match or a mismatch. Hybridization with radioactively labeled allele-specific oligonucleotides (ASOs) also has been applied to the detection of specific point mutations.
  • the method is based on the differences in the melting temperature of short DNA fragments differing by a single nucleotide. Stringent hybridization and washing conditions can differentiate between mutant and wild-type alleles.
  • the ASO approach applied to PCR products also has been extensively utilized by various researchers to detect and characterize point mutations in ras genes and gsp/gip oncogenes. Because of the presence of various nucleotide changes in multiple positions, the ASO method requires the use of many oligonucleotides to cover all possible oncogenic mutations. With either of the techniques described above (i.e., RFLP and ASO), the precise location of the suspected mutation must be known in advance of the test.
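As a rough illustration of the ASO approach, the sketch below builds a wild-type and a mutant probe centered on a known position and estimates the melting temperature of each perfectly matched duplex with the Wallace rule (Tm = 2(A+T) + 4(G+C) in degrees Celsius, a common approximation for short oligonucleotides). The target sequence, the flank length and the function names are assumptions made for this example; the sketch does not model the destabilizing effect of a mismatch itself.

```python
def wallace_tm(oligo):
    """Wallace-rule melting temperature estimate for a short oligonucleotide."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def aso_probe_pair(wild_type, position, mutant_base, flank=8):
    """Return (wild-type probe, mutant probe) centered on a 0-based position."""
    start = max(0, position - flank)
    end = min(len(wild_type), position + flank + 1)
    wt_probe = wild_type[start:end]
    mut_probe = (wild_type[start:position] + mutant_base
                 + wild_type[position + 1:end])
    return wt_probe, mut_probe


# Hypothetical target with a known G -> A change at index 12.
target = "ACGTTGCAAGCTGGTGACGTAGCT"
wt_probe, mut_probe = aso_probe_pair(target, 12, "A")
print(wt_probe, wallace_tm(wt_probe))
print(mut_probe, wallace_tm(mut_probe))
```

Because a separate probe pair is needed for each possible change, the number of oligonucleotides grows with the number of positions and substitutions to be covered, which is the practical burden noted in the text.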
  • DGGE/TGGE Denaturing/Temperature Gradient Gel Electrophoresis
  • variants can be distinguished because homoduplexes and heteroduplexes differing in a single nucleotide have different melting properties, which produce corresponding changes in their electrophoretic mobilities and thereby reveal mutations in the target sequences.
  • the fragments to be analyzed, usually PCR products, are "clamped" at one end by a long stretch of GC base pairs (30-80) to allow complete denaturation of the sequence of interest without complete dissociation of the strands.
  • the attachment of a GC "clamp" to the DNA fragments increases the fraction of mutations that can be recognized by DGGE. Attaching a GC clamp to one primer is critical to ensure that the amplified sequence has a low dissociation temperature.
  • CDGE (constant denaturant gel electrophoresis) requires that gels be performed under different denaturant conditions in order to reach high efficiency for the detection of mutations.
  • a technique analogous to DGGE termed temperature gradient gel electrophoresis (TGGE)
  • TGGE uses a thermal gradient rather than a chemical denaturant gradient.
  • TGGE requires the use of specialized equipment which can generate a temperature gradient perpendicularly oriented relative to the electrical field.
  • TGGE can detect mutations in relatively small fragments of DNA; therefore, scanning of large gene segments requires the use of multiple PCR products prior to running the gel.
  • SSCP Single-Strand Conformation Polymorphism
  • the SSCP process involves denaturing a DNA segment (e.g., a PCR product) that is labeled on both strands, followed by slow electrophoretic separation on a non-denaturing polyacrylamide gel, so that intramolecular interactions can form and not be disturbed during the run.
  • This technique is extremely sensitive to variations in gel composition and temperature. A serious limitation of this method is the relative difficulty encountered in comparing data generated in different laboratories, under apparently similar conditions.
  • Dideoxy fingerprinting (ddF): Dideoxy fingerprinting (ddF) is another technique developed to scan genes for the presence of mutations. The ddF technique combines components of Sanger dideoxy sequencing with SSCP.
  • a dideoxy sequencing reaction is performed using one dideoxy terminator and then the reaction products are electrophoresed on nondenaturing polyacrylamide gels to detect alterations in mobility of the termination segments as in SSCP analysis.
  • ddF is an improvement over SSCP in terms of increased sensitivity
  • ddF requires the use of expensive dideoxynucleotides and this technique is still limited to the analysis of fragments of the size suitable for SSCP (i.e., fragments of 200-300 bases for optimal detection of mutations).
  • all of these methods are limited as to the size of the nucleic acid fragment that can be analyzed.
  • sequences of greater than 600 base pairs require cloning, with the consequent delays and expense of either deletion sub-cloning or primer walking, in order to cover the entire fragment.
  • SSCP and DGGE have even more severe size limitations. Because of reduced sensitivity to sequence changes, these methods are not considered suitable for larger fragments.
  • although SSCP is reportedly able to detect 90 % of single-base substitutions within a 200 base-pair fragment, detection drops to less than 50 % for 400 base-pair fragments.
  • the sensitivity of DGGE decreases as the length of the fragment reaches 500 base-pairs.
  • the ddF technique, as a combination of direct sequencing and SSCP, is also limited by the relatively small size of the DNA that can be screened.
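Because each of these scanning methods is limited to short fragments, a larger region has to be covered by several amplicons. The following minimal Python sketch tiles a region into overlapping windows. The default maximum size of 300 bases follows the 200-300 base range quoted above for SSCP and ddF; the 50-base overlap and the function name are assumptions made for illustration.

```python
def tile_amplicons(region_length, max_size=300, overlap=50):
    """Split a region into overlapping (start, end) windows, 0-based, end exclusive."""
    if max_size <= overlap:
        raise ValueError("max_size must exceed overlap")
    if region_length <= 0:
        return []
    tiles = []
    start = 0
    step = max_size - overlap
    while True:
        end = min(start + max_size, region_length)
        tiles.append((start, end))
        if end == region_length:
            break
        start += step
    return tiles


# A hypothetical 1000-base region needs four amplicons under these settings.
print(tile_amplicons(1000))
```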
  • the step of searching for any of the nucleic acid sequences described here, in tumor cells or in cells derived from a cancer patient is effected by any suitable technique, including, but not limited to, nucleic acid sequencing, polymerase chain reaction, ligase chain reaction, self-sustained synthetic reaction, Qβ-Replicase, cycling probe reaction, branched DNA, restriction fragment length polymorphism analysis, mismatch chemical cleavage, heteroduplex analysis, allele-specific oligonucleotides, denaturing gradient gel electrophoresis, constant denaturant gel electrophoresis, temperature gradient gel electrophoresis and dideoxy fingerprinting.
  • Detection may also optionally be performed with a chip or other such device.
  • the nucleic acid sample which includes the candidate region to be analyzed is preferably isolated, amplified and labeled with a reporter group.
  • This reporter group can be a fluorescent group such as phycoerythrin.
  • the labeled nucleic acid is then incubated with the probes immobilized on the chip using a fluidics station; the fabrication of fluidics devices, and particularly microcapillary devices, in silicon and glass substrates has been described in the art. Once the reaction is completed, the chip is inserted into a scanner and patterns of hybridization are detected. The hybridization data is collected as a signal emitted from the reporter groups already incorporated into the nucleic acid, which is now bound to the probes attached to the chip.
  • the identity of the nucleic acid hybridized to a given probe can be determined. It will be appreciated that when utilized along with automated equipment, the above described detection methods can be used to screen multiple samples for a disease and/or pathological condition both rapidly and easily.
  • polypeptide amino acid sequences and peptides
  • the terms "polypeptide," "peptide" and "protein" are used interchangeably herein to refer to a polymer of amino acid residues.
  • the terms apply to amino acid polymers in which one or more amino acid residue is an analog or mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers.
  • Polypeptides can be modified, e.g., by the addition of carbohydrate residues to form glycoproteins.
  • Polypeptide products can be biochemically synthesized such as by employing standard solid phase techniques.
  • Such methods include but are not limited to exclusive solid phase synthesis, partial solid phase synthesis methods, fragment condensation, classical solution synthesis. These methods are preferably used when the peptide is relatively short (i.e., 10 kDa) and/or when it cannot be produced by recombinant techniques (i.e., not encoded by a nucleic acid sequence) and therefore involves different chemistry.
  • Solid phase polypeptide synthesis procedures are well known in the art and further described by John Morrow Stewart and Janis Dillaha Young, Solid Phase Peptide Syntheses (2nd Ed., Pierce Chemical Company, 1984). Synthetic polypeptides can optionally be purified by preparative high performance liquid chromatography [Creighton T. (1983) Proteins, structures and molecular principles.
  • the present invention also encompasses polypeptides encoded by the polynucleotide sequences of the present invention, as well as polypeptides according to the amino acid sequences described herein.
  • the present invention also encompasses homologues of these polypeptides; such homologues can be at least 50 %, at least 55 %, at least 60 %, at least 65 %, at least 70 %, at least 75 %, at least 80 %, at least 85 %, at least 95 % or more, say 100 %, homologous to the amino acid sequences set forth below, as can be determined using BlastP software of the National Center of Biotechnology Information (NCBI) using default parameters, optionally and preferably including the following: filtering on (this option filters repetitive or low-complexity sequences from the query using the Seg (protein) program), scoring matrix is BLOSUM62 for proteins, word size is 3, E value is 10, gap costs are 11, 1 (initialization and extension), and number of alignments shown is 50.
  • NCBI National Center of Biotechnology Information
  • Nucleotide (nucleic acid) sequence homology/identity is preferably determined by using the BlastN software of the National Center of Biotechnology Information (NCBI) using default parameters, which preferably include using the DUST filter program, and also preferably include having an E value of 10, filtering low complexity sequences and a word size of 11.
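The percentage thresholds above are applied to alignments produced by BlastP or BlastN. As a simplified illustration only (this is not the BLAST algorithm, and the helper names and example sequences are hypothetical), the following Python sketch computes percent identity over an alignment that has already been produced, with gaps written as "-", and checks it against a chosen threshold.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding identical residues (gaps never match)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold=50.0):
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical aligned peptide fragments: 5 identical columns out of 7.
identity = percent_identity("MKT-LLV", "MKTALIV")
print(round(identity, 1), meets_threshold("MKT-LLV", "MKTALIV", 70.0))
```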
  • the present invention also encompasses fragments of the above described polypeptides and polypeptides having mutations, such as deletions, insertions or substitutions of one or more amino acids, either naturally occurring or artificially induced, either randomly or in a targeted fashion.
  • peptides identified according to the present invention may be degradation products, synthetic peptides or recombinant peptides as well as peptidomimetics (typically, synthetic peptides) and peptoids and semipeptoids which are peptide analogs, which may have, for example, modifications rendering the peptides more stable while in a body or more capable of penetrating into cells.
  • Natural aromatic amino acids, Trp, Tyr and Phe, may be substituted by synthetic non-natural acids such as Phenylglycine, TIC, naphthylalanine (Nal), ring-methylated derivatives of Phe, halogenated derivatives of Phe or o-methyl-Tyr.
  • the peptides of the present invention may also include one or more modified amino acids or one or more non-amino acid monomers (e.g. fatty acids, complex carbohydrates etc).
  • the term "amino acid" or "amino acids" is understood to include the 20 naturally occurring amino acids; those amino acids often modified post-translationally in vivo, including, for example, hydroxyproline, phosphoserine and phosphothreonine; and other unusual amino acids including, but not limited to, 2-aminoadipic acid, hydroxylysine, isodesmosine, nor-valine, nor-leucine and ornithine.
  • the term "amino acid" includes both D- and L-amino acids. Table 1 lists non-conventional or modified amino acids which can be used with the present invention.
  • since the peptides of the present invention are preferably utilized in diagnostics which require the peptides to be in soluble form, they preferably include one or more non-natural or natural polar amino acids, including but not limited to serine and threonine, which are capable of increasing peptide solubility due to their hydroxyl-containing side chain.
  • the peptides of the present invention are preferably utilized in a linear form, although it will be appreciated that in cases where cyclization does not severely interfere with peptide characteristics, cyclic forms of the peptide can also be utilized.
  • the peptides of the present invention can be biochemically synthesized such as by using standard solid phase techniques.
  • "Antibody" refers to a polypeptide ligand that is preferably substantially encoded by an immunoglobulin gene or immunoglobulin genes, or fragments thereof, which specifically binds and recognizes an epitope (e.g., an antigen).
  • the recognized immunoglobulin genes include the kappa and lambda light chain constant region genes, the alpha, gamma, delta, epsilon and mu heavy chain constant region genes, and the myriad immunoglobulin variable region genes.
  • Antibodies exist, e.g., as intact immunoglobulins or as a number of well characterized fragments produced by digestion with various peptidases. This includes, e.g., Fab' and F(ab')2 fragments.
  • antibody also includes antibody fragments either produced by the modification of whole antibodies or those synthesized de novo using recombinant DNA methodologies. It also includes polyclonal antibodies, monoclonal antibodies, chimeric antibodies, humanized antibodies, or single chain antibodies. "Fc" portion of an antibody refers to that portion of an immunoglobulin heavy chain that comprises one or more heavy chain
  • Fab the fragment which contains a monovalent antigen-binding fragment of an antibody molecule
  • Fab' the fragment of an antibody molecule that can be obtained by treating whole antibody with pepsin, followed by reduction, to yield an intact light chain and a portion of the heavy chain
  • two Fab' fragments are obtained per antibody molecule
  • F(ab')2 the fragment of the antibody that can be obtained by treating whole antibody with the enzyme pepsin without subsequent reduction
  • F(ab')2 is a dimer of two Fab' fragments held together by two disulfide bonds
  • Fv defined as a genetically engineered fragment containing the variable region of the light chain
  • Antibody fragments according to the present invention can be prepared by proteolytic hydrolysis of the antibody or by expression in E. coli or mammalian cells (e.g. Chinese hamster ovary cell culture or other protein expression systems) of DNA encoding the fragment.
  • Antibody fragments can be obtained by pepsin or papain digestion of whole antibodies by conventional methods.
  • antibody fragments can be produced by enzymatic cleavage of antibodies with pepsin to provide a 5S fragment denoted F(ab')2.
  • This fragment can be further cleaved using a thiol reducing agent, and optionally a blocking group for the sulfhydryl groups resulting from cleavage of disulfide linkages, to produce 3.5S Fab' monovalent fragments.
  • an enzymatic cleavage using pepsin produces two monovalent Fab' fragments and an Fc fragment directly.
  • Fv fragments comprise an association of VH and VL chains. This association may be noncovalent, as described in Inbar et al. [Proc. Nat'l Acad. Sci. USA 69:2659-62 (1972)].
  • the variable chains can be linked by an intermolecular disulfide bond or cross-linked by chemicals such as glutaraldehyde.
  • the Fv fragments comprise VH and VL chains connected by a peptide linker.
  • These single-chain antigen binding proteins are prepared by constructing a structural gene comprising DNA sequences encoding the VH and VL domains connected by an oligonucleotide.
  • the structural gene is inserted into an expression vector, which is subsequently introduced into a host cell such as E. coli.
  • the recombinant host cells synthesize a single polypeptide chain with a linker peptide bridging the two V domains.
  • Such genes are prepared, for example, by using the polymerase chain reaction to synthesize the variable region from RNA of antibody-producing cells. See, for example, Larrick and Fry [Methods, 2: 106-10 (1991)].
  • Humanized forms of non-human (e.g., murine) antibodies are chimeric molecules of immunoglobulins, immunoglobulin chains or fragments thereof (such as Fv, Fab, Fab', F(ab')2 or other antigen-binding subsequences of antibodies) which contain minimal sequence derived from non-human immunoglobulin.
  • Humanized antibodies include human immunoglobulins (recipient antibody) in which residues from a complementarity determining region (CDR) of the recipient are replaced by residues from a CDR of a non-human species (donor antibody) such as mouse, rat or rabbit having the desired specificity, affinity and capacity.
  • CDR complementarity determining region
  • donor antibody non-human species
  • Fv framework residues of the human immunoglobulin are replaced by corresponding non-human residues.
  • Humanized antibodies may also comprise residues which are found neither in the recipient antibody nor in the imported CDR or framework sequences.
  • the humanized antibody will comprise substantially all of at least one, and typically two, variable domains, in which all or substantially all of the CDR regions correspond to those of a non-human immunoglobulin and all or substantially all of the FR regions are those of a human immunoglobulin consensus sequence.
  • the humanized antibody optimally also will comprise at least a portion of an immunoglobulin constant region (Fc), typically that of a human immunoglobulin [Jones et al., Nature, 321:522-525 (1986); Riechmann et al., Nature, 332:323-329 (1988); and Presta, Curr. Op. Struct. Biol., 2:593-596 (1992)].
  • Fc immunoglobulin constant region
  • a humanized antibody has one or more amino acid residues introduced into it from a source which is non-human. These non-human amino acid residues are often referred to as import residues, which are typically taken from an import variable domain. Humanization can be essentially performed following the method of Winter and co-workers [Jones et al., Nature, 321:522-525 (1986); Riechmann et al., Nature 332:323-327 (1988); Verhoeyen et al., Science, 239:1534-1536 (1988)], by substituting rodent CDRs or CDR sequences for the corresponding sequences of a human antibody.
  • humanized antibodies are chimeric antibodies (U.S. Pat. No. 4,816,567), wherein substantially less than an intact human variable domain has been substituted by the corresponding sequence from a non-human species.
  • humanized antibodies are typically human antibodies in which some CDR residues and possibly some FR residues are substituted by residues from analogous sites in rodent antibodies.
  • Human antibodies can also be produced using various techniques known in the art, including phage display libraries [Hoogenboom and Winter, J. Mol. Biol., 227:381 (1991); Marks et al., J. Mol. Biol., 222:581 (1991)]. The techniques of Cole et al. and Boerner et al.
  • human antibodies can be made by introduction of human immunoglobulin loci into transgenic animals, e.g., mice in which the endogenous immunoglobulin genes have been partially or completely inactivated. Upon challenge, human antibody production is observed, which closely resembles that seen in humans in all respects, including gene rearrangement, assembly, and antibody repertoire. This approach is described, for example, in U.S. Pat. Nos.
  • the antibody of this aspect of the present invention specifically binds at least one epitope of the polypeptide variants of the present invention.
  • epitope refers to any antigenic determinant on an antigen to which the paratope of an antibody binds.
  • Epitopic determinants usually consist of chemically active surface groupings of molecules such as amino acids or carbohydrate side chains and usually have specific three dimensional structural characteristics, as well as specific charge characteristics.
  • a unique epitope may be created in a variant due to a change in one or more post-translational modifications, including but not limited to glycosylation and/or phosphorylation, as described below. Such a change may also cause a new epitope to be created, for example through removal of glycosylation at a particular site.
  • An epitope according to the present invention may also optionally comprise part or all of a unique sequence portion of a variant according to the present invention in combination with at least one other portion of the variant which is not contiguous to the unique sequence portion in the linear polypeptide itself, yet which are able to form an epitope in combination.
  • One or more unique sequence portions may optionally combine with one or more other non-contiguous portions of the variant (including a portion which may have high homology to a portion of the known protein) to form an epitope.
  • an immunoassay can be used to qualitatively or quantitatively detect and analyze markers in a sample.
  • This method comprises: providing an antibody that specifically binds to a marker; contacting a sample with the antibody; and detecting the presence of a complex of the antibody bound to the marker in the sample.
  • purified protein markers can be used.
  • Antibodies that specifically bind to a protein marker can be prepared using any suitable methods known in the art. After the antibody is provided, a marker can be detected and/or quantified using any of a number of well recognized immunological binding assays.
  • Useful assays include, for example, an enzyme immune assay (EIA) such as enzyme-linked immunosorbent assay (ELISA), a radioimmune assay (RIA), a Western blot assay, or a slot blot assay (see, e.g., U.S. Pat. Nos. 4,366,241; 4,376,110; 4,517,288; and 4,837,168).
  • EIA enzyme immune assay
  • ELISA enzyme-linked immunosorbent assay
  • RIA radioimmune assay
  • Western blot assay e.g., U.S. Pat. Nos. 4,366,241; 4,376,110; 4,517,288; and 4,837,168.
  • a sample obtained from a subject can be contacted with the antibody that specifically binds the marker.
  • the antibody can be fixed to a solid support to facilitate washing and subsequent isolation of the complex, prior to contacting the antibody with a sample.
  • solid supports include but are not limited to glass or plastic in the form of, e.g., a microtiter plate, a stick, a bead, or a microbead.
  • Antibodies can also be attached to a solid support. After incubating the sample with antibodies, the mixture is washed and the antibody-marker complex formed can be detected. This can be accomplished by incubating the washed mixture with a detection reagent.
  • the marker in the sample can be detected using an indirect assay, wherein, for example, a second, labeled antibody is used to detect bound marker-specific antibody, and/or in a competition or inhibition assay wherein, for example, a monoclonal antibody which binds to a distinct epitope of the marker is incubated simultaneously with the mixture.
  • incubation and/or washing steps may be required after each combination of reagents. Incubation steps can vary from about 5 seconds to several hours, preferably from about 5 minutes to about 24 hours. However, the incubation time will depend upon the assay format, marker, volume of solution, concentrations and the like.
  • the immunoassay can be used to determine a test amount of a marker in a sample from a subject.
  • a test amount of a marker in a sample can be detected using the immunoassay methods described above. If a marker is present in the sample, it will form an antibody-marker complex with an antibody that specifically binds the marker under suitable incubation conditions described above.
  • the amount of an antibody-marker complex can optionally be determined by comparing to a standard.
  • the test amount of marker need not be measured in absolute units, as long as the unit of measurement can be compared to a control amount and/or signal.
  • RIA Radio-immunoassay
  • the number of counts in the precipitated pellet is proportional to the amount of substrate.
  • a labeled substrate and an unlabelled antibody binding protein are employed in an alternate version of the RIA.
  • a sample containing an unknown amount of substrate is added in varying amounts.
  • the decrease in precipitated counts from the labeled substrate is proportional to the amount of substrate in the added sample.
  • Enzyme linked immunosorbent assay (ELISA): This method involves fixation of a sample (e.g., fixed cells or a proteinaceous solution) containing a protein substrate to a surface such as a well of a microtiter plate.
  • a substrate specific antibody coupled to an enzyme is applied and allowed to bind to the substrate.
  • Presence of the antibody is then detected and quantitated by a colorimetric reaction employing the enzyme coupled to the antibody.
  • Enzymes commonly employed in this method include horseradish peroxidase and alkaline phosphatase. If well calibrated and within the linear range of response, the amount of substrate present in the sample is proportional to the amount of color produced.
  • a substrate standard is generally employed to improve quantitative accuracy.
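The use of a substrate standard within the linear range can be sketched as a least-squares line fitted to known standards and then inverted for an unknown sample. The standard concentrations, absorbance readings, units and function names below are hypothetical values chosen for illustration, not data from the source.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_concentration(absorbance, standards):
    """Interpolate a concentration from (concentration, absorbance) standards."""
    concentrations = [c for c, _ in standards]
    readings = [a for _, a in standards]
    # Refuse to extrapolate outside the calibrated (assumed linear) range.
    if not min(readings) <= absorbance <= max(readings):
        raise ValueError("absorbance outside the range of the standards")
    slope, intercept = fit_line(concentrations, readings)
    return (absorbance - intercept) / slope


# Hypothetical standards: (concentration in ng/ml, absorbance).
standards = [(0, 0.05), (10, 0.25), (20, 0.45), (40, 0.85)]
estimate = estimate_concentration(0.65, standards)
print(round(estimate, 2))
```

Rejecting readings outside the range of the standards reflects the condition in the text that proportionality holds only within the linear range of response.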
  • Western blot: This method involves separation of a substrate from other protein by means of an acrylamide gel followed by transfer of the substrate to a membrane (e.g., nylon or PVDF). Presence of the substrate is then detected by antibodies specific to the substrate, which are in turn detected by antibody binding reagents.
  • Antibody binding reagents may be, for example, protein A, or other antibodies.
  • Antibody binding reagents may be radiolabelled or enzyme linked as described hereinabove. Detection may be by autoradiography, colorimetric reaction or chemiluminescence. This method allows both quantitation of an amount of substrate and determination of its identity by a relative position on the membrane which is indicative of a migration distance in the acrylamide gel during electrophoresis.
  • Immunohistochemical analysis: This method involves detection of a substrate in situ in fixed cells by substrate specific antibodies. The substrate specific antibodies may be enzyme linked or linked to fluorophores. Detection is by microscopy and subjective evaluation. If enzyme linked antibodies are employed, a colorimetric reaction may be required.
  • Fluorescence activated cell sorting (FACS): This method involves detection of a substrate in situ in cells by substrate specific antibodies. The substrate specific antibodies are linked to fluorophores. Detection is by means of a cell sorting machine which reads the wavelength of light emitted from each cell as it passes through a light beam. This method may employ two or more antibodies simultaneously.
  • Radio-imaging: Methods include, but are not limited to, positron emission tomography (PET) and single photon emission computed tomography (SPECT). Both of these techniques are non-invasive, and can be used to detect and/or measure a wide variety of tissue events and/or functions, such as detecting cancerous cells for example. Unlike PET, SPECT can optionally be used with two labels simultaneously. SPECT has some other advantages as well, for example with regard to cost and the types of labels that can be used. For example, US Patent No. 6,696,686 describes the use of SPECT for detection of breast cancer, and is hereby incorporated by reference as if fully set forth herein.
  • Display Libraries: According to still another aspect of the present invention there is provided a display library comprising a plurality of display vehicles (such as phages, viruses or bacteria) each displaying at least 6, at least 7, at least 8, at least 9, at least 10, 10-15, 12-17, 15-20, 15-30 or 20-50 consecutive amino acids derived from the polypeptide sequences of the present invention. Methods of constructing such display libraries are well known in the art.
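To make the notion of "consecutive amino acids derived from the polypeptide sequences" concrete, the following minimal Python sketch enumerates every window of a given length from a sequence. The peptide shown and the function name are hypothetical; an actual display library would be built by the construction methods referred to above, not by this enumeration alone.

```python
def consecutive_peptides(polypeptide, length):
    """Return every run of `length` consecutive residues, in order."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return [polypeptide[i:i + length]
            for i in range(len(polypeptide) - length + 1)]


# Hypothetical 10-residue sequence; windows of 6 consecutive amino acids.
windows = consecutive_peptides("MKTAYIAKQR", 6)
print(windows)
```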
  • GenBank sequences the human EST sequences from the EST (GBEST) section and the human mRNA sequences from the primate (GBPRI) section were used; also the human nucleotide RefSeq mRNA sequences were used (see for example www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html and for a reference to the EST section, see www.ncbi.nlm.nih.gov/dbEST/; a general reference to dbEST, the EST database in GenBank, may be found in Boguski et al, Nat Genet.
  • Novel splice variants were predicted using the LEADS clustering and assembly system as described in Sorek, R., Ast, G. & Graur, D. Alu-containing exons are alternatively spliced. Genome Res 12, 1060-7 (2002); US patent No: 6,625,545; and U.S. Pat. Appl. No. 10/426,002, published as US20040101876 on May 27 2004; all of which are hereby incorporated by reference as if fully set forth herein. Briefly, the software cleans the expressed sequences from repeats, vectors and immunoglobulins.
  • the GeneCarta platform includes a rich pool of annotations, sequence information (particularly of spliced sequences), chromosomal information, alignments, and additional information such as SNPs, gene ontology terms, expression profiles, functional analyses, detailed domain structures, known and predicted proteins and detailed homology reports.
  • the potential markers were identified by a computational process that was designed to find genes and/or their splice variants that are over-expressed in tumor tissues, by using databases of expressed sequences.
  • the detailed description of the selection method is presented in Example 1 below.
  • the cancer biomarkers selection engine and the following wet validation stages are schematically summarized in Figure 1.
  • EXAMPLE 1 Identification of differentially expressed gene products - Algorithm
  • a specific algorithm for identification of transcripts over expressed in cancer is described hereinbelow.
  • Dry analysis Library annotation - EST libraries are manually classified according to: (i) Tissue origin (ii) Biological source - Examples of frequently used biological sources for construction of EST libraries include cancer cell lines; normal tissues; cancer tissues; fetal tissues; and others such as normal cell lines and pools of normal cell lines, cancer cell lines and combinations thereof. A specific description of abbreviations used below with regard to these tissues/cell lines etc is given above.
  • Protocol of library construction - various methods are known in the art for library construction including normalized library construction; non-normalized library construction; subtracted libraries; ORESTES and others. It will be appreciated that at times the protocol of library construction is not indicated. The following rules were followed: EST libraries originating from identical biological samples are considered as a single library. EST libraries which included above-average levels of contamination, such as DNA contamination for example, were eliminated. The presence of such contamination was determined as follows. For each library, the number of unspliced ESTs that are not fully contained within other spliced sequences was counted.
  • the basic algorithm - for each cluster the number of cancer and normal libraries contributing sequences to the cluster was counted. Fisher exact test was used to check if cancer libraries are significantly over-represented in the cluster as compared to the total number of cancer and normal libraries.
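The library-count test described above can be sketched as a one-sided Fisher exact test on a 2x2 table (cancer versus normal libraries, inside versus outside the cluster). This is a minimal illustration using only the Python standard library; the function name, argument names and example counts are hypothetical and are not taken from the selection engine itself.

```python
from math import comb


def fisher_exact_greater(cancer_in, normal_in, cancer_total, normal_total):
    """One-sided Fisher exact P-value.

    Probability of seeing at least `cancer_in` cancer libraries among the
    libraries contributing to a cluster, given the overall numbers of cancer
    and normal libraries (hypergeometric upper tail).
    """
    n_total = cancer_total + normal_total   # all libraries considered
    n_in = cancer_in + normal_in            # libraries contributing to the cluster
    denominator = comb(n_total, n_in)
    upper = min(n_in, cancer_total)
    numerator = sum(
        comb(cancer_total, k) * comb(normal_total, n_in - k)
        for k in range(cancer_in, upper + 1)
    )
    return numerator / denominator


# Hypothetical counts: 8 cancer and 1 normal library contribute to a cluster,
# out of 40 cancer and 60 normal libraries overall.
p_value = fisher_exact_greater(8, 1, 40, 60)
print(f"P = {p_value:.3g}")
```

A small P-value here would suggest that cancer libraries are over-represented in the cluster relative to the library totals; because the totals are adjusted per cluster in the text, the caller is assumed to pass the adjusted totals.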
  • Library counting - Small libraries (e.g., less than 1000 sequences) were excluded from consideration unless they participate in the cluster. For this reason, the total number of libraries is actually adjusted for each cluster. Clones no. score - Generally, when the number of ESTs is much higher in the cancer libraries relative to the normal libraries it might indicate actual over-expression.
  • tissue libraries/sequences were compared to the total number of libraries/sequences in cluster. Similar statistical tools to those described above were employed to identify tissue specific genes. Tissue abbreviations are the same as for cancerous tissues, but are indicated with the header "normal tissue". The algorithm - for each tested tissue T and for each tested cluster the following were examined: 1. Each cluster includes at least 2 libraries from the tissue T. At least 3 clones
  • Clones from the tissue T are at least 40% of all the clones participating in the tested cluster. Fisher exact test P-values were computed both for library and weighted clone counts to check that the counts are statistically significant.
  • EXAMPLE 4 Identification of splice variants over expressed in cancer of clusters which are not over expressed in cancer Cancer-specific splice variants containing a unique region were identified. Identification of unique sequence regions in splice variants A Region is defined as a group of adjacent exons that always appear or do not appear together in each splice variant. A "segment" (sometimes also referred to as "seg" or "node") is defined as the shortest contiguous transcribed region without known splicing inside. Only reliable ESTs were considered for region and segment analysis.
  • An EST was defined as unreliable if: (i) Unspliced; (ii) Not covered by RNA; (iii) Not covered by spliced ESTs; and (iv) Alignment to the genome ends in proximity of long poly-A stretch or starts in proximity of long poly-T stretch. Only reliable regions were selected for further scoring. Unique sequence regions were considered reliable if: (i) Aligned to the genome; and (ii) Regions supported by more than 2 ESTs. The algorithm Each unique sequence region divides the set of transcripts into 2 groups: (i) Transcripts containing this region (group TA). (ii) Transcripts not containing this region (group TB).
  • the set of EST clones of every cluster is divided into 3 groups: (i) Supporting (originating from) transcripts of group TA (S1). (ii) Supporting transcripts of group TB (S2). (iii) Supporting transcripts from both groups (S3). Library and clones number scores described above were given to the S1 group. Fisher Exact Test P-values were used to check if: S1 is significantly enriched by cancer EST clones compared to S2; and S1 is significantly enriched by cancer EST clones compared to cluster background (S1+S2+S3). Identification of unique sequence regions and division of the group of transcripts accordingly is illustrated in Figure 2. Each of these unique sequence regions corresponds to a segment, also termed herein a "node".
  • Region 1: common to all transcripts, thus it is preferably not considered for determining differential expression between variants; Region 2: specific to Transcript 1; Region 3: specific to Transcripts 2+3; Region 4: specific to Transcript 3; Region 5: specific to Transcripts 1 and 2; Region 6: specific to Transcript 1.
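The division of EST clones into the three support groups can be illustrated with a short sketch. It assumes that each clone is already mapped to the set of transcripts it supports; the function name, the data layout and the clone and transcript identifiers are hypothetical.

```python
def partition_clones(clone_support, transcripts_with_region):
    """Split clones into three groups for one unique sequence region.

    clone_support: dict mapping clone id -> set of transcript ids it supports.
    transcripts_with_region: set of transcript ids containing the region (group TA);
    every other transcript is treated as group TB.
    Returns (s1, s2, s3): clones supporting only TA, only TB, or both.
    """
    s1, s2, s3 = set(), set(), set()
    for clone, transcripts in clone_support.items():
        supports_ta = bool(transcripts & transcripts_with_region)
        supports_tb = bool(transcripts - transcripts_with_region)
        if supports_ta and supports_tb:
            s3.add(clone)
        elif supports_ta:
            s1.add(clone)
        elif supports_tb:
            s2.add(clone)
    return s1, s2, s3


# Hypothetical cluster with three transcripts; the region is specific to T1.
support = {
    "clone_a": {"T1"},
    "clone_b": {"T2", "T3"},
    "clone_c": {"T1", "T2"},
}
s1, s2, s3 = partition_clones(support, {"T1"})
print(sorted(s1), sorted(s2), sorted(s3))
```

The cancer versus normal composition of the first group could then be compared with the second group, and with all three groups together, using a Fisher exact test as the text describes.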
  • EXAMPLE 5 Identification of cancer specific splice variants of genes over expressed in cancer
  • EST supported-regions were defined as supported by minimum of one of the following: (i) 3 spliced ESTs; or (ii) 2 spliced ESTs from 2 libraries; (iii) 10 unspliced ESTs from 2 libraries, or (iv) 3 libraries.
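The support criteria above amount to a simple predicate. The sketch below reads each number as a minimum and reads "from 2 libraries" as "from at least 2 distinct libraries"; both readings, as well as the function and parameter names, are assumptions made for illustration.

```python
def is_est_supported(spliced_ests, spliced_libs, unspliced_ests, unspliced_libs, total_libs):
    """Return True if a region meets at least one of the four support criteria.

    spliced_ests / unspliced_ests: number of spliced / unspliced ESTs covering the region.
    spliced_libs / unspliced_libs: number of distinct libraries those ESTs come from.
    total_libs: number of distinct libraries supporting the region overall.
    """
    return (
        spliced_ests >= 3                                 # (i) 3 spliced ESTs
        or (spliced_ests >= 2 and spliced_libs >= 2)      # (ii) 2 spliced ESTs from 2 libraries
        or (unspliced_ests >= 10 and unspliced_libs >= 2) # (iii) 10 unspliced ESTs from 2 libraries
        or total_libs >= 3                                # (iv) 3 libraries
    )


# Hypothetical region covered by 2 spliced ESTs that come from 2 libraries.
print(is_est_supported(2, 2, 0, 0, 2))
```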
  • Actual Marker Examples The following examples relate to specific actual marker examples. It should be noted that Table numbering is restarted within each example related to a particular Cluster, as indicated by the titles below.
  • the markers of the present invention were tested with regard to their expression in various cancerous and non-cancerous tissue samples.
  • a description of the samples used in the panel is provided in Table 1 below.
  • a description of the samples used in the normal tissue panel is provided in Table 2 below. Tests were then performed as described in the "Materials and Experimental Procedures" section below.
  • RNA preparation - RNA was obtained from Clontech (Franklin Lakes, NJ USA 07417, www.clontech.com), BioChain Inst. Inc. (Hayward, CA 94545 USA www.biochain.com), ABS
  • RNA samples were obtained from patients or from postmortem. Total RNA samples were treated with DNaseI (Ambion) and purified using RNeasy columns (Qiagen). RT PCR - Purified RNA (1 µg) was mixed with 150 ng Random Hexamer primers (Invitrogen) and 500 µM dNTP in a total volume of 15.6 µl. The mixture was incubated for 5 min at 65 °C and then quickly chilled on ice.
  • Real-Time RT-PCR analysis - cDNA (5 µl), prepared as described above, was used as a template in Real-Time PCR reactions using the SYBR Green I assay (PE Applied Biosystem) with specific primers and UNG Enzyme (Eurogentech or ABI or Roche).
  • Threshold Cycle point, which is the cycle at which the amplification curve crosses the fluorescence threshold that was set in the experiment. This point is a calculated cycle number at which the PCR product signal is above the background level (passive dye ROX) and still in the Geometric/Exponential phase (as shown, once the level of fluorescence crosses the measurement threshold, it has a geometrically increasing phase, during which measurements are most accurate, followed by a linear phase and a plateau phase; for quantitative measurements, the latter two phases do not provide accurate measurements).
  • the y-axis shows the normalized reporter fluorescence. It should be noted that this type of analysis provides relative quantification.
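As a rough illustration of how a threshold cycle can be calculated from an amplification curve, the sketch below finds the first crossing of the threshold and interpolates linearly between the two neighbouring cycles. Linear interpolation is an assumption chosen for simplicity and is not stated in the text; the function name and the fluorescence values are hypothetical.

```python
def threshold_cycle(fluorescence, threshold):
    """Estimate the fractional cycle at which the curve first crosses the threshold.

    fluorescence[i] is the normalized reporter signal measured at cycle i + 1.
    Returns None if the signal never crosses the threshold from below.
    """
    for i in range(1, len(fluorescence)):
        previous, current = fluorescence[i - 1], fluorescence[i]
        if previous < threshold <= current:
            fraction = (threshold - previous) / (current - previous)
            # `previous` was measured at cycle i (cycles are 1-based).
            return i + fraction
    return None


# Hypothetical readings that double every cycle.
readings = [0.1, 0.2, 0.4, 0.8, 1.6]
print(threshold_cycle(readings, 0.3))
```

Since this type of analysis provides relative quantification, a value obtained this way would only be meaningful when compared with values for reference genes measured on the same samples.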
  • SDHA (GenBank Accession No. NM_004168) SDHA Forward primer: TGGGAACAAGAGGGCATCTG SDHA Reverse primer: CCACCACTGCATCAAATTCATG SDHA-amplicon: TGGGAACAAGAGGGCATCTGCTAAAGTTTCAGATTCCATTTCTGCTCAGTATCCAGTAGTGGATCATGAATTTGATGCAGTGGTGG PBGD (GenBank Accession No. BC019323), PBGD Forward primer: TGAGAGTGATTCGCGTGGG PBGD Reverse primer: CCAGGGTACGAGGCTTTCAAT PBGD-amplicon:
  • HPRT1 GenBank Accession No. NMJ3001944
  • HPRT1 Forward primer TGACACTGGCAAAACAATGCA
  • HPRT1 Reverse primer GGTCCTTTTCACCAGCAAGCT
  • GAPDH Forward primer TGCACCACCACCAACTGCTTAGC
  • GAPDH-amplicon TGCACCACCAACTGCTTAGCACCCCTGGCCAAGGTCATCCATGACAACTTTGGTATC
  • RPL19 (GenBank Accession No. NM_000981), RPL19 Forward primer: TGGCAAGAAGAAGGTCTGGTTAG RPL19 Reverse primer: TGATCAGCCCATCTTTGATGAG RPL19-amplicon:
  • TATA box (GenBank Accession No. NM_003194), TATA box Forward primer : CGGTTTGCTGCGGTAATCAT
  • TATA box Reverse primer TTTCTTGCTGCCAGTCTGGAC
  • Ubiquitin C -amplicon ATTTGGGTCGCGGTTCTTGTTTGTGGATCGCTGTGATCGTCACTTGACAATGCAGAT
  • Microarray fabrication Microarrays were printed by pin deposition using the MicroGrid II MGII 600 robot from BioRobotics Limited (Cambridge, UK). 50-mer oligonucleotides target sequences were designed by Compugen Ltd (Tel-Aviv, IL) as described by A. Shoshan et al, "Optical technologies and informatics", Proceedings of SPIE. Vol 4266, pp. 86-95 (2001).
  • the designed oligonucleotides were synthesized and purified by desalting with the Sigma-Genosys system (The Woodlands, TX, US) and all of the oligonucleotides were joined to a C6 amino-modified linker at the 5' end, or were attached directly to CodeLink slides (Cat #25-6700-01, Amersham Bioscience, Piscataway, NJ, US).
  • the 50-mer oligonucleotides, forming the target sequences, were first suspended in Ultra-pure DDW (Cat # 01-866-1A Kibbutz Beit-Haemek, Israel) to a concentration of 50 µM.
  • the oligonucleotides were resuspended in 300mM sodium phosphate (pH 8.5) to final concentration of 150mM and printed at 35-40% relative humidity at 21°C.
  • Each slide contained a total of 9792 features in 32 subarrays.
  • 4224 features were sequences of interest according to the present invention and negative controls that were printed in duplicate.
  • An additional 288 features (96 target sequences printed in triplicate) contained housekeeping genes from Human Evaluation Library2, Compugen Ltd, Israel.
  • Another 384 features are E. coli spikes 1-6, which are oligos to E. coli genes which are commercially available in the Array Control product (Array control - sense oligo spots, Ambion Inc. Austin, TX. Cat #1781, Lot #112K06).

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Genetics & Genomics (AREA)
  • Wood Science & Technology (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Hospice & Palliative Care (AREA)
  • Biophysics (AREA)
  • Oncology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Peptides Or Proteins (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Novel markers for ovarian cancer that are both sensitive and accurate. These markers are overexpressed and/or differentially expressed in ovarian cancer specifically, as opposed to normal ovarian tissue. The measurement of these markers, alone or in combination, in patient samples provides information that the diagnostician can correlate with a probable diagnosis, in ovarian cancer. The markers of the present invention, alone or in combination, show a high degree of differential detection between ovarian cancer and non-cancerous states.

Description

DIFFERENTIAL EXPRESSION OF MARKERS IN OVARIAN CANCER
FIELD OF THE INVENTION The present invention is related to novel nucleotide and protein sequences that are diagnostic markers for ovarian cancer, and assays and methods of use thereof.
BACKGROUND OF THE INVENTION Ovarian cancer causes more deaths than any other cancer of the female reproductive system. An estimated 25,580 new cases will be diagnosed during 2004 in the United States, and approximately 16,090 of these women will die of the disease. Despite advances in the management of advanced ovarian cancer, 70% to 80% of patients will ultimately succumb to disease that is diagnosed in late stages. When ovarian cancer is diagnosed in stage I, more than 90% of patients can be cured with conventional surgery and chemotherapy. At present, however, only 25% of ovarian cancers are detected in stage I. Detection of a greater fraction of ovarian cancers at an early stage might significantly affect survival. A worldwide research effort, aiming at early detection of ovarian cancer, is currently being performed; finding molecular markers for the disease is one of the major research topics (J Clin Oncol. 2003 May 15;21(10 Suppl):200-5). No single marker has been shown to be sufficiently sensitive or specific to contribute to the diagnosis of ovarian cancer. The marker that is currently most frequently used is CA-125 (Br J Cancer. 2000 May;82(9):1535-8). Its properties do not support its use for screening, but it is a major diagnostic tool. CA-125 is a member of the epithelial sialomucins markers group and is the most well documented and the best performing single marker from this group. Another name for CA-125 is mucin 16, and although it is a membrane protein, it can be found in the serum. Its greatest sensitivity is achieved for serous and endometrioid ovarian tumors compared to mucinous or clear cell tumors. Other than diagnosis, it can be used for disease monitoring (Eur J Gynaecol Oncol. 2000;21(1):64-9). In about 70% of patients, a rising level of CA-125 may be the first indication of relapse, predating clinical relapse by a median of 4 months.
The serum concentration of CA-125 is elevated by the vascular invasion, tissue destruction and inflammation associated with malignant disease and is elevated in over 90% of those women with advanced ovarian cancer. Yet, CA-125 is not specific to ovarian cancer. It is elevated in 40% of all patients with advanced intra-abdominal malignancy. Levels can also be elevated during menstruation or pregnancy and in other benign conditions such as endometriosis, peritonitis or cirrhosis, particularly with ascites. CA-125 is not a marker that can be detected through use of urine samples due to a high molecular weight. There are other ovarian cancer markers originating from epithelial mucins but none can replace CA-125, due to poorer specificity and sensitivity. These other markers may prove complementary to CA-125. CA-50, CA 54-61, CA-195 and CA 19-9 all appear to have greater sensitivity for detection of mucinous tumors while STN and TAG-72 have better sensitivity for detection of clear cell tumors (Dis Markers. 2004;20(2):53-70). Kallikreins, a family of serine proteases, and other protease-related proteins are also potential markers for ovarian cancer. Indeed, the entire family of kallikreins map to a region on chromosome 19q which is shown to be amplified in ovarian cancers. In particular, kallikrein 6 (protease M) and kallikrein 10 have been reported to have sensitivity up to 75% and specificity up to 100%. Matrix metalloproteinases (MMPs) are another family of proteases useful in ovarian cancer screening and prognosis. MMP-2 was reported to have 66% sensitivity and 100% specificity in one study. Cathepsin L, a cysteine protease, was described to have a lower false positive rate compared with CA-125. Based on their biochemical proteolytic role, it would seem likely that these proteases would be active in invasion and metastasis formation and indeed these markers appear to have higher sensitivity for advanced stages of the disease.
Due to their relatively low molecular weight, such proteases are candidates to be urine markers, or markers which can be detected in urine samples (Dis Markers. 2004;20(2):53-70). Hormones have a role in normal ovarian physiology. Therefore, it is not surprising that hormones, and growth and inhibition factors as well, are suitable for ovarian cancer detection. Measurements of fragments of gonadotropin in the urine were found to have sensitivity up to 83% and specificity up to 92% for detecting ovarian cancer. Inhibins, members of the Transforming Growth Factors (TGF) beta superfamily, have been shown to have a diagnostic value in the detection of granulosa cell tumor, a relatively uncommon type of ovarian cancer, associated with better prognosis overall. Serum inhibin is an ovarian product which decreases to non-detectable levels after menopause; however, certain ovarian cancers (mucinous carcinomas and sex cord stromal tumours such as granulosa cell tumours) continue to produce inhibin. Studies have shown that inhibin assays which detect all inhibin forms (as opposed to tests detecting specific members of the inhibins family) provide the highest sensitivity/specificity characteristics as an ovarian cancer diagnostic test (Mol Cell Endocrinol. 2002 May 31;191(1):97-103). Measurement of serum TGF-alpha itself was found to have sensitivity up to 70% and specificity of 89% in early stage disease. The growth factor Mesothelin was also found to have diagnostic value but only for late stage disease. Immunohistochemistry is frequently used to assess the origin of tumor and staging when a pathological tissue sample is available. A few molecular markers have been shown to have diagnostic value in immunohistochemistry of ovarian cancer, among them Epidermal Growth Factor, p53 and HER-2. P53 expression is much lower at early stage than late stage disease. P53 high expression is more typical or characteristic of invasive serous tumors than of mucinous tumors.
No benign tumors are stained with P53. HER-2 is found in less than 25% of newly diagnosed ovarian cancers. Ovarian cancer of type granulosa cell tumor has in general better prognosis with late relapse and/or metastasis formation. However, about 50% of patients still die within 20 years of diagnosis. In this specific tumor type, immunohistochemistry staining of estrogen receptor beta (ERb) and proliferating cell nuclear antigen (PCNA) showed that loss of ERb expression and high PCNA expression characterized a subgroup of granulosa cell tumors with a worse outcome (Histopathology. 2003 Sep;43(3):254-62). Survivin expression was also shown to be correlated to tumor grade, histologic type and mutant p53 but actual correlation to survival is questionable (Mod Pathol. 2004 Feb;17(2):264). Many other markers have been tested over the years for ovarian cancer detection. Some markers have shown only limited value while others are still under investigation. Among them are TPA and TPS, two cytokeratins whose inclusion in a panel with CA-125 resulted in diagnoses with sensitivity up to 93% and specificity up to 98%. LPA - lysophosphatidic acid - was a very promising marker with one study demonstrating 98% sensitivity and 90% specificity. However, this marker is very unstable and requires quick processing and freezing of plasma, and therefore has limited usage. As previously described, no single marker has been shown to be sufficiently sensitive or specific to contribute to the diagnosis of ovarian cancer. Therefore combinations of markers in panels are being tested. Usually CA-125 is one of the panel members. The best performing panel combinations so far have been CA-125 with CA 15-3 with sensitivity of 93% and specificity of 93%, CA-125 with CEA (which has very little sensitivity by itself) with sensitivity of 93% and specificity of 93%, and CA-125 with TAG-72 and CA 15-3 where specificity becomes 95% but sensitivity is diminished (Dis Markers. 2004;20(2):53-70).
SUMMARY OF THE INVENTION The background art does not teach or suggest markers for ovarian cancer that are sufficiently sensitive and/or accurate, alone or in combination. The present invention overcomes these deficiencies of the background art by providing novel markers for ovarian cancer that are both sensitive and accurate. These markers are differentially expressed and preferably overexpressed in ovarian cancer specifically, as opposed to normal ovarian tissue. The measurement of these markers, alone or in combination, in patient (biological) samples provides information that the diagnostician can correlate with a probable diagnosis of ovarian cancer. The markers of the present invention, alone or in combination, show a high degree of differential detection between ovarian cancer and non-cancerous states. According to preferred embodiments of the present invention, examples of suitable biological samples which may optionally be used with preferred embodiments of the present invention include but are not limited to blood, serum, plasma, blood cells, urine, sputum, saliva, stool, spinal fluid or CSF, lymph fluid, the external secretions of the skin, respiratory, intestinal, and genitourinary tracts, tears, milk, neuronal tissue, ovarian tissue, any human organ or tissue, including any tumor or normal tissue, any sample obtained by lavage (for example of the bronchial system or of the female reproductive system), and also samples of in vivo cell culture constituents. In a preferred embodiment, the biological sample comprises ovarian tissue and/or a serum sample and/or a urine sample and/or secretions or other samples from the female reproductive system and/or any other tissue or liquid sample. The sample can optionally be diluted with a suitable eluant before contacting the sample to an antibody and/or performing any other diagnostic assay.
Information given in the text with regard to cellular localization was determined according to four different software programs: (i) tmhmm (from Center for Biological Sequence Analysis, Technical University of Denmark DTU, http://www.cbs.dtu.dk/services/TMHMM/TMHMM2.0b.guide.php) or (ii) tmpred (from EMBnet, maintained by the ISREC Bioinformatics group and the LICR Information Technology Office, Ludwig Institute for Cancer Research, Swiss Institute of Bioinformatics, http://www.ch.embnet.org/software/TMPRED_form.html) for transmembrane region prediction; (iii) signalp_hmm or (iv) signalp_nn (both from Center for Biological Sequence Analysis, Technical University of Denmark DTU, http://www.cbs.dtu.dk/services/SignalP/background/prediction.php) for signal peptide prediction. The terms "signalp_hmm" and "signalp_nn" refer to two modes of operation for the program SignalP: hmm refers to Hidden Markov Model, while nn refers to neural networks. Localization was also determined through manual inspection of known protein localization and/or gene structure, and the use of heuristics by the individual inventor. In some cases for the manual inspection of cellular localization prediction inventors used the ProLoc computational platform [Einat Hazkani-Covo, Erez Levanon, Galit Rotman, Dan Graur and Amit Novik; (2004) "Evolution of multicellularity in metazoa: comparative analysis of the subcellular localization of proteins in Saccharomyces, Drosophila and Caenorhabditis."
Cell Biology International 2004;28(3):171-8.], which predicts protein localization based on various parameters including protein domains (e.g., prediction of trans-membranous regions and localization thereof within the protein), pI, protein length, amino acid composition, homology to pre-annotated proteins, recognition of sequence patterns which direct the protein to a certain organelle (such as nuclear localization signal, NLS, mitochondria localization signal), signal peptide and anchor modeling and using unique domains from Pfam that are specific to a single compartment. Information is given in the text with regard to SNPs (single nucleotide polymorphisms). A description of the abbreviations is as follows. "T -> C", for example, means that the SNP results in a change at the position given in the table from T to C. Similarly, "M -> Q", for example, means that the SNP has caused a change in the corresponding amino acid sequence, from methionine (M) to glutamine (Q). If, in place of a letter at the right hand side for the nucleotide sequence SNP, there is a space, it indicates that a frameshift has occurred. A frameshift may also be indicated with a hyphen (-). A stop codon is indicated with an asterisk at the right hand side (*). As part of the description of an SNP, a comment may be found in parentheses after the above description of the SNP itself. This comment may include an FTId, which is an identifier to a SwissProt entry that was created with the indicated SNP. An FTId is a unique and stable feature identifier, which allows construction of links directly from position-specific annotation in the feature table to specialized protein-related databases. The FTId is always the last component of a feature in the description field, as follows: FTId=XXX_number, in which XXX is the 3-letter code for the specific feature key, separated by an underscore from a 6-digit number.
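The SNP notation and the FTId format described above can be made concrete with a small parser. This is an illustrative sketch only: the function name, the returned labels and the example FTId number are hypothetical, and the regular expressions simply encode the conventions stated in the text (a letter on the left, an arrow, then a letter, an asterisk for a stop codon, or a hyphen or blank for a frameshift).

```python
import re

# Accepts "T -> C" as well as the spaced form "T - > C".
SNP_PATTERN = re.compile(r"^\s*([A-Za-z])\s*-\s*>\s*([A-Za-z*-]?)\s*$")
# FTId=XXX_number: 3-letter feature key, underscore, 6-digit number.
FTID_PATTERN = re.compile(r"FTId=([A-Z]{3})_(\d{6})")


def parse_snp(text):
    """Return (original, changed, kind) for an SNP description string."""
    match = SNP_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not an SNP description: {text!r}")
    original, changed = match.group(1), match.group(2)
    if changed in ("", "-"):
        kind = "frameshift"     # blank or hyphen on the right hand side
    elif changed == "*":
        kind = "stop"           # asterisk marks a stop codon
    else:
        kind = "substitution"
    return original, changed, kind


print(parse_snp("T -> C"))
print(parse_snp("M -> *"))
# The identifier below is a made-up example of the FTId layout.
print(FTID_PATTERN.search("(FTId=VAR_012345)").groups())
```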
In the table of the amino acid mutations of the wild type proteins of the selected splice variants of the invention, the header of the first column is "SNP position(s) on amino acid sequence", representing a position of a known mutation on amino acid sequence. SNPs may optionally be used as diagnostic markers according to the present invention, alone or in combination with one or more other SNPs and/or any other diagnostic marker. Preferred embodiments of the present invention comprise such SNPs, including but not limited to novel SNPs on the known (WT or wild type) protein sequences given below, as well as novel nucleic acid and/or amino acid sequences formed through such SNPs, and/or any SNP on a variant amino acid and/or nucleic acid sequence described herein. Information given in the text with regard to the Homology to the known proteins was determined by Smith-Waterman version 5.1.2 using special (non default) parameters as follows: -model=sw.model -GAPEXT=0 -GAPOP=100.0 -MATRIX=blosum100
Information is given with regard to overexpression of a cluster in cancer based on ESTs. A key to the p values with regard to the analysis of such overexpression is as follows:
- library-based statistics: P-value without including the level of expression in cell lines (P1)
- library-based statistics: P-value including the level of expression in cell lines (P2)
- EST clone statistics: P-value without including the level of expression in cell lines (SP1)
- EST clone statistics: predicted overexpression ratio without including the level of expression in cell lines (R3)
- EST clone statistics: P-value including the level of expression in cell lines (SP2)
- EST clone statistics: predicted overexpression ratio including the level of expression in cell lines (R4)
Library-based statistics refer to statistics over an entire library, while EST clone statistics refer to expression only for ESTs from a particular tissue or cancer.
Information is given with regard to overexpression of a cluster in cancer based on microarrays. As a microarray reference, in the specific segment paragraphs, the unabbreviated tissue name was used as the reference to the type of chip for which expression was measured. There are two types of microarray results: those from microarrays prepared according to a design by the present inventors, for which the microarray fabrication procedure is described in detail in Materials and Experimental Procedures section herein; and those results from microarrays using Affymetrix technology. For microarrays prepared according to a design by the present inventors, the probe name begins with the name of the cluster (gene), followed by an identifying number. These probes are listed below with their respective sequences.
>H61775_0_11_0
CCCCAGCTTTTATAGAGCGGCCCAAGGAAGAATATTTCCAAGAAGTAGGG
>HSAPHOL_0_11_0
GGAACATTCTGGATCTGACCCTCCCAGTCTCATCTCCTGACCCTCCCACT
>HUMGRP5E_0_0_16630
GCTGATATGGAAGTTGGGGAATCTGAATTGCCAGAGAATCTTGGGAAGAG
>HUMGRP5E_0_2_0
TCTCATAGAAGCAAAGGAGAACAGAAACCACCAGCCACCTCAACCCAAGG
>D56406_0_5_0
TCTGACTTTTACGGACTTGGCTTGTTAGAAGGCTGAAAGATGATGGCAGG
>M77904_0_8_0
AGTCTGTGTTTGAGGGTGAAGGCTCAGCAACCCTGATGTCTGCCAACTAC
>Z25299_0_3_0
AACTCTGGCACCTTGGGCTGTGGAAGGCTCTGGAAAGTCCTTCAAAGCTG
>Z44808_0_8_0
AAAAGCATGAGTTTCTGACCAGCGTTCTGGACGCGCTGTCCACGGACATG
>Z44808_0_0_72347
ATGTTCTTAGGAGGCAAGCCAGGAGAAGCCGGGTCTGACTTTTCAGCTCA
>Z44808_0_0_72349
TCCTCCAGACCCAAAGCCACAACCCATCGCAAGTCAAGAACACTTTCCAG
>S67314_0_0_741
CACAGAGCCAGGATGTTCTTCTGACCTCAGTATCTACTCCAGCTCCAGCT
>S67314_0_0_744
TGGCATGCTGGAACATGGACTCTAGCTAGCAAGAAGGGCTCAAGGAGGTG
>Z39337_0_0_66755
GCAGGGGTTAAAAGGACGTTCCAGAAGCATCTGGGGACAGAACCAGCCTC
>Z39337_0_9_0
TAATAAACGCAGCGACGTGAGGGTCCTGATTCTCCCTGGTTTTACCCCAG
>HUMPHOSLIP_0_0_18458
AAGGAAGCAGGACCAGTGGATGTGAGGCGTGGTCGAAGAACAACAGAAAG
>HUMPHOSLIP_0_0_18487
ACAGGGGCCAGATGGTGACCCATGACCCAGCCTAAAAGGCAGCCAGAGGG
>M78530_0_6_0
CTTCCTACACACATCTAGACGTTCAAGTTTGCAAATCAGTTTTTAGCAAG
>HSMUC1A_0_37_0
AAAAGGAGACTTCGGCTACCCAGAGAAGTTCAGTGCCCAGCTCTACTGAG
>HSMUC1A_0_0_11364
AAAGGCTGGCATAGGGGGAGGTTTCCCAGGTAGAAGAAGAAGTGTCAGCA
>HSMUC1A_0_0_11365
AATTAACCCTTTGAGAGCTGGCCAGGACTCTGGACTGATTACCCCAGCCT
Oligonucleotide microarray results taken from Affymetrix data were from chips available from Affymetrix Inc, Santa Clara, CA, USA (see for example data regarding the Human Genome U133 (HG-U133) Set at www.affymetrix.com/products/arrays/specific/hgu133.affx; GeneChip Human Genome U133A 2.0 Array at www.affymetrix.com/products/arrays/specific/hgu133av2.affx; and Human Genome U133 Plus 2.0 Array at www.affymetrix.com/products/arrays/specific/hgu133plus.affx). The probe names follow the Affymetrix naming convention. The data is available from NCBI Gene Expression Omnibus (see www.ncbi.nlm.nih.gov/projects/geo/ and Edgar et al, Nucleic Acids Research, 2002, Vol. 30, No. 1, 207-210). The dataset (including results) is available from www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1133 for the Series GSE1133 database (published in March 2004); a reference to these results is as follows: Su et al (Proc Natl Acad Sci U S A. 2004 Apr 20;101(16):6062-7. Epub 2004 Apr 09).
The following list of abbreviations for tissues was used in the TAA histograms. The term "TAA" stands for "Tumor Associated Antigen", and the TAA histograms, given in the text, represent the cancerous tissue expression pattern as predicted by the biomarkers selection engine, as described in detail in examples 1-5 below (the first word is the abbreviation while the second word is the full name):
("BONE", "bone");
("COL", "colon");
("EPI", "epithelial");
("GEN", "general");
("LIVER", "liver");
("LUN", "lung");
("LYMPH", "lymph nodes");
("MARROW", "bone marrow");
("OVA", "ovary");
("PANCREAS", "pancreas");
("PRO", "prostate");
("STOMACH", "stomach");
("TCELL", "T cells");
("THYROID", "Thyroid");
("MAM", "breast");
("BRAIN", "brain");
("UTERUS", "uterus");
("SKIN", "skin");
("KIDNEY", "kidney");
("MUSCLE", "muscle");
("ADREN", "adrenal");
("HEAD", "head and neck");
("BLADDER", "bladder");
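For readers who process the TAA histograms programmatically, the abbreviation list above maps naturally onto a lookup table. The sketch below is illustrative only; the variable and function names are hypothetical, and the "skin" entry is keyed as SKIN on the assumption that this is the intended abbreviation.

```python
# Tissue abbreviations used in the TAA histograms, as listed in the text.
TISSUE_ABBREVIATIONS = {
    "BONE": "bone", "COL": "colon", "EPI": "epithelial", "GEN": "general",
    "LIVER": "liver", "LUN": "lung", "LYMPH": "lymph nodes",
    "MARROW": "bone marrow", "OVA": "ovary", "PANCREAS": "pancreas",
    "PRO": "prostate", "STOMACH": "stomach", "TCELL": "T cells",
    "THYROID": "Thyroid", "MAM": "breast", "BRAIN": "brain",
    "UTERUS": "uterus", "SKIN": "skin", "KIDNEY": "kidney",
    "MUSCLE": "muscle", "ADREN": "adrenal", "HEAD": "head and neck",
    "BLADDER": "bladder",
}


def tissue_name(abbreviation):
    """Return the full tissue name for an abbreviation (case-insensitive)."""
    return TISSUE_ABBREVIATIONS[abbreviation.strip().upper()]


print(tissue_name("OVA"))
```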
It should be noted that the terms "segment", "seg" and "node" are used interchangeably in reference to nucleic acid sequences of the present invention; they refer to portions of nucleic acid sequences that were shown to have one or more properties as described below. They are also the building blocks that were used to construct complete nucleic acid sequences as described in greater detail below. Optionally and preferably, they are examples of oligonucleotides which are embodiments of the present invention, for example as amplicons, hybridization units and/or from which primers and/or complementary oligonucleotides may optionally be derived, and/or for any other use.
Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). All of these are hereby incorporated by reference as if fully set forth herein. As used herein, the following terms have the meanings ascribed to them unless specified otherwise.
As used herein the phrase "ovarian cancer" refers to cancers of the ovary including but not limited to ovarian epithelial tumors (serous, mucinous, endometrioid, clear cell, and Brenner tumor), ovarian germ-cell tumors (teratoma, dysgerminoma, endodermal sinus tumor, and embryonal carcinoma) and ovarian stromal tumors (originating from granulosa, theca, Sertoli, Leydig, and collagen-producing stromal cells).
The term "marker" in the context of the present invention refers to a nucleic acid fragment, a peptide, or a polypeptide, which is differentially present in a sample taken from subjects (patients) having ovarian cancer as compared to a comparable sample taken from subjects who do not have ovarian cancer.
The phrase "differentially present" refers to differences in the quantity of a marker present in a sample taken from patients having ovarian cancer as compared to a comparable sample taken from patients who do not have ovarian cancer. For example, a nucleic acid fragment may optionally be differentially present between the two samples if the amount of the nucleic acid fragment in one sample is significantly different from the amount of the nucleic acid fragment in the other sample, for example as measured by hybridization and/or NAT-based assays. A polypeptide is differentially present between the two samples if the amount of the polypeptide in one sample is significantly different from the amount of the polypeptide in the other sample.
It should be noted that if the marker is detectable in one sample and not detectable in the other, then such a marker can be considered to be differentially present. As used herein the phrase "diagnostic" means identifying the presence or nature of a pathologic condition. Diagnostic methods differ in their sensitivity and specificity. The "sensitivity" of a diagnostic assay is the percentage of diseased individuals who test positive (percent of "true positives"). Diseased individuals not detected by the assay are "false negatives." Subjects who are not diseased and who test negative in the assay are termed "true negatives." The "specificity" of a diagnostic assay is 1 minus the false positive rate, where the "false positive" rate is defined as the proportion of those without the disease who test positive. While a particular diagnostic method may not provide a definitive diagnosis of a condition, it suffices if the method provides a positive indication that aids in diagnosis. As used herein the phrase "diagnosing" refers to classifying a disease or a symptom, determining a severity of the disease, monitoring disease progression, forecasting an outcome of a disease and/or prospects of recovery. The term "detecting" may also optionally encompass any of the above. Diagnosis of a disease according to the present invention can be effected by determining a level of a polynucleotide or a polypeptide of the present invention in a biological sample obtained from the subject, wherein the level determined can be correlated with predisposition to, or presence or absence of the disease. It should be noted that a "biological sample obtained from the subject" may also optionally comprise a sample that has not been physically removed from the subject, as described in greater detail below. As used herein, the term "level" refers to expression levels of RNA and/or protein or to DNA copy number of a marker of the present invention. 
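The definitions of sensitivity and specificity given above can be illustrated with a short calculation. This is a minimal sketch under stated assumptions: the function names and the patient counts are hypothetical and are used only to show the arithmetic, not to report any result of the present invention.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of diseased individuals who test positive (true positives)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """1 minus the false positive rate, where the false positive rate is the
    proportion of non-diseased individuals who test positive."""
    false_positive_rate = false_pos / (false_pos + true_neg)
    return 1.0 - false_positive_rate


# Hypothetical counts for illustration only.
sens = sensitivity(true_pos=90, false_neg=10)
spec = specificity(true_neg=80, false_pos=20)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```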
Typically the level of the marker in a biological sample obtained from the subject is different (i.e., increased or decreased) from the level of the same variant in a similar sample obtained from a healthy individual (examples of biological samples are described herein). Numerous well known tissue or fluid collection methods can be utilized to collect the biological sample from the subject in order to determine the level of DNA, RNA and/or polypeptide of the variant of interest in the subject. Examples include, but are not limited to, fine needle biopsy, needle biopsy, core needle biopsy and surgical biopsy (e.g., brain biopsy), and lavage. Regardless of the procedure employed, once a biopsy/sample is obtained the level of the variant can be determined and a diagnosis can thus be made. Determining the level of the same variant in normal tissues of the same origin is preferably effected alongside to detect an elevated expression and/or amplification and/or a decreased expression of the variant as opposed to the normal tissues.
A "test amount" of a marker refers to an amount of a marker in a subject's sample that is consistent with a diagnosis of ovarian cancer. A test amount can be either in absolute amount (e.g., microgram/ml) or a relative amount (e.g., relative intensity of signals).
A "control amount" of a marker can be any amount or a range of amounts to be compared against a test amount of a marker. For example, a control amount of a marker can be the amount of a marker in a patient with ovarian cancer or a person without ovarian cancer. A control amount can be either in absolute amount (e.g., microgram/ml) or a relative amount (e.g., relative intensity of signals).
"Detect" refers to identifying the presence, absence or amount of the object to be detected.
A "label" includes any moiety or item detectable by spectroscopic, photochemical, biochemical, immunochemical, or chemical means.
For example, useful labels include 32P, 35S, fluorescent dyes, electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin-streptavidin, digoxigenin, haptens and proteins for which antisera or monoclonal antibodies are available, or nucleic acid molecules with a sequence complementary to a target. The label often generates a measurable signal, such as a radioactive, chromogenic, or fluorescent signal, that can be used to quantify the amount of bound label in a sample. The label can be incorporated in or attached to a primer or probe either covalently, or through ionic, van der Waals or hydrogen bonds, e.g., incorporation of radioactive nucleotides, or biotinylated nucleotides that are recognized by streptavidin. The label may be directly or indirectly detectable. Indirect detection can involve the binding of a second label to the first label, directly or indirectly. For example, the label can be the ligand of a binding partner, such as biotin, which is a binding partner for streptavidin, or a nucleotide sequence, which is the binding partner for a complementary sequence, to which it can specifically hybridize. The binding partner may itself be directly detectable, for example, an antibody may be itself labeled with a fluorescent molecule. The binding partner also may be indirectly detectable, for example, a nucleic acid having a complementary nucleotide sequence can be a part of a branched DNA molecule that is in turn detectable through hybridization with other labeled nucleic acid molecules (see, e.g., P. D. Fahrlander and A. Klausner, Bio/Technology 6:1165 (1988)). Quantitation of the signal is achieved by, e.g., scintillation counting, densitometry, or flow cytometry.
Exemplary detectable labels, optionally and preferably for use with immunoassays, include but are not limited to magnetic beads, fluorescent dyes, radiolabels, enzymes (e.g., horseradish peroxidase, alkaline phosphatase and others commonly used in an ELISA), and colorimetric labels such as colloidal gold or colored glass or plastic beads. Alternatively, the marker in the sample can be detected using an indirect assay, wherein, for example, a second, labeled antibody is used to detect bound marker-specific antibody, and/or in a competition or inhibition assay wherein, for example, a monoclonal antibody which binds to a distinct epitope of the marker is incubated simultaneously with the mixture.
"Immunoassay" is an assay that uses an antibody to specifically bind an antigen. The immunoassay is characterized by the use of specific binding properties of a particular antibody to isolate, target, and/or quantify the antigen.
The phrase "specifically (or selectively) binds" to an antibody or "specifically (or selectively) immunoreactive with," when referring to a protein or peptide (or other epitope), refers to a binding reaction that is determinative of the presence of the protein in a heterogeneous population of proteins and other biologics. Thus, under designated immunoassay conditions, the specified antibodies bind to a particular protein at least two times greater than the background (non-specific signal) and do not substantially bind in a significant amount to other proteins present in the sample. Specific binding to an antibody under such conditions may require an antibody that is selected for its specificity for a particular protein.
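The "at least two times greater than the background" criterion stated above can be expressed as a simple ratio test. The sketch below is illustrative only: the function name, the default threshold parameter and the example signal values are assumptions introduced here, and actual immunoassay readouts would require assay-specific background correction.

```python
def is_specific_binding(signal: float, background: float, min_ratio: float = 2.0) -> bool:
    """Return True when the measured signal is at least `min_ratio` times the
    background (non-specific) signal. The default of 2.0 reflects the
    'at least two times' criterion in the text; both inputs are hypothetical
    readings in the same arbitrary units."""
    if background <= 0:
        raise ValueError("background signal must be positive")
    return signal / background >= min_ratio


# Hypothetical readings for illustration only.
print(is_specific_binding(signal=1.30, background=0.10))  # ratio of 13
print(is_specific_binding(signal=0.15, background=0.10))  # ratio of 1.5
```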
For example, polyclonal antibodies raised to seminal basic protein from specific species such as rat, mouse, or human can be selected to obtain only those polyclonal antibodies that are specifically immunoreactive with seminal basic protein and not with other proteins, except for polymorphic variants and alleles of seminal basic protein. This selection may be achieved by subtracting out antibodies that cross-react with seminal basic protein molecules from other species. A variety of immunoassay formats may be used to select antibodies specifically immunoreactive with a particular protein. For example, solid-phase ELISA immunoassays are routinely used to select antibodies specifically immunoreactive with a protein (see, e.g., Harlow & Lane, Antibodies, A Laboratory Manual (1988), for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity). Typically a specific or selective reaction will be at least twice background signal or noise and more typically more than 10 to 100 times background.
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
a nucleic acid sequence comprising a sequence in the table below:
H61775 node 0
H61775 node 5
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
H61775 P16
H61775 P17
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or
a nucleic acid sequence comprising a sequence in the table below:
HUMCEA PEA_ _nodc _56 HUMCEA PEA_ node . HUMCEA PEA_ node _58 HUMCEA_ PEA_ _node ,60 HUMCEA _PEA_ node .61 HUMCEA _PEA_ _node 62 HUMCEA _PEA_ _node_ 64
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or
HUMEDF_PEA_2_T5
HUMEDF_PEA_2_T10
HUMEDF_PEA_2_T11
a nucleic acid sequence comprising a sequence in the table below:
HUMEDF_PEA_2_node_6
HUMEDF_PEA_2_node_11
HUMEDF_PEA_2_node_18
HUMEDF_PEA_2_node_19
HUMEDF_PEA_2_node_22
HUMEDF_PEA_2_node_2
HUMEDF_PEA_2_node_8
HUMEDF_PEA_2_node_20
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
HSAPHOL T9 a nucleic acid sequence comprising a sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
HSAPHOL P2
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or
a nucleic acid sequence comprising a sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
HSECADH T11
HSECADH T18
HSECADH T19
HSECADH T20
a nucleic acid sequence comprising a sequence in the table below:
HSECADH node 52 HSECADH iode. .53 HSECADH_node_ .54 HSECADH_node_ .57 HSECADH_node_ 60 HSECADH node 62 HSECADH_node_ 63 HSECADH node 1 HSECADH_node 1 HSECADH_node 11 HSECADH_node. 2 HSECADH_node i HSECADH_node_ .18 HSECADH_node 19 HSECADH iode .3 HSECADH_node .4 HSECADH_node_45 HSECADH_node_46 HSECADH_node_ .55 HSECADH_node_ 56 HSECADH_node_58 HSECADH_node_ .5
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
HSECADH P9
HSECADH P13
HSECADH P14
HSECADH P15
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
HUMGRP5E T4 HUMGRP5E T5 a nucleic acid sequence comprising a sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
HUMGRP5E P4
HUMGRP5E P5
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
a nucleic acid sequence comprising a sequence in the table below:
R11723_PEA_1_node_29
R11723_PEA_1_node_3
R11723_PEA_1_node_30
R11723_PEA_1_node_4
R11723_PEA_1_node_5
R11723_PEA_1_node_6
R11723_PEA_1_node_7
R11723_PEA_1_node_8
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
R11723 PEA 1 P2
R11723 PEA 1 P6
R11723 PEA 1 P7
R11723 PEA 1 P13
R11723 PEA 1 P10
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
D56406 PEA 1 node 0
D56406 PEA 1 node 13
D56406 PEA 1 node 11
D56406 PEA 1 node 2
D56406 PEA 1 node 3
D56406 PEA 1 node 5
D56406 PEA 1 node 6
D56406 PEA 1 node 7
D56406 PEA 1 node 8
D56406 PEA 1 node 9
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below
D56406 PEA 1 P2
D56406 PEA 1 P5
D56406 PEA 1 P6
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
H53393 PEA 1 T10
H53393 PEA 1 T11
H53393 PEA 1 T3
H53393 PEA 1 T9
a nucleic acid sequence comprising a sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
H53393 PEA 1 P2
H53393 PEA 1 P3
H53393 PEA 1 P6
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
a nucleic acid sequence comprising a sequence in the table below:
HSU40434_PEA_1_node_39
HSU40434_PEA_1_node_40
HSU40434_PEA_1_node_41
HSU40434_PEA_1_node_42
HSU40434_PEA_1_node_43
HSU40434_PEA_1_node_44
HSU40434_PEA_1_node_47
HSU40434_PEA_1_node_48
HSU40434_PEA_1_node_51
HSU40434_PEA_1_node_52
HSU40434_PEA_1_node_53
HSU40434_PEA_1_node_54
HSU40434_PEA_1_node_56
HSU40434_PEA_1_node_7
HSU40434_PEA_1_node_8
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
M77904 T11
M77904 T3
M77904 T8
M77904 T9
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
M77904 node 0
M77904 node 11
M77904 node 12
M77904 node 14
M77904 node 15
M77904 node 17
M77904 node 2
M77904 node 21
M77904 node 23
M77904 node 24
M77904 node 27
M77904 node 28
M77904 node 4
M77904 node 6
M77904 node 7
M77904 node 8
M77904 node 9
M77904 node 19
M77904 node 22
M77904 node 25
M77904 node 26
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
M77904 P2
M77904 P4
M77904 P5
M77904 P7
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or
a nucleic acid sequence comprising a sequence in the table below
Z25299 PEA 2 node 20
Z25299 PEA 2 node 21
Z25299 PEA 2 node 23
Z25299 PEA 2 node 24
Z25299 PEA 2 node 8
Z25299 PEA 2 node 12
Z25299 PEA 2 node 13
Z25299 PEA 2 node 14
Z25299 PEA 2 node 17
Z25299 PEA 2 node 18
Z25299 PEA 2 node 19
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
T39971 T10 T39971 T12 T39971 T16 T39971 T5 a nucleic acid sequence comprising a sequence in the table below:
T39971_node_0
T39971_node_18
T39971_node_21
T39971_node_22
T39971_node_23
T39971_node_31
T39971_node_33
T39971_node_7
T39971_node_1
T39971_node_10
T39971_node_11
T39971_node_12
T39971_node_15
T39971_node_16
T39971_node_17
T39971_node_26
T39971_node_27
T39971_node_28
T39971_node_29
T39971_node_3
T39971_node_30
T39971_node_34
T39971_node_35
T39971_node_36
T39971_node_4
T39971_node_5
T39971_node_8
T39971_node_9
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Z44808 PEA 1 T11
Z44808 PEA 1 T4
Z44808 PEA 1 T5
Z44808 PEA 1 T8
Z44808 PEA 1 T9
a nucleic acid sequence comprising a sequence in the table below
Z44808 PEA 1 node 0
Z44808 PEA 1 node 16
Z44808 PEA 1 node 2
Z44808 PEA 1 node 24
Z44808 PEA 1 node 32
Z44808 PEA 1 node 33
Z44808 PEA 1 node 36
Z44808 PEA 1 node 37
Z44808 PEA 1 node 41
Z44808 PEA 1 node 11
Z44808 PEA 1 node 13
Z44808 PEA 1 node 18
Z44808 PEA 1 node 22
Z44808 PEA 1 node 26
Z44808 PEA 1 node 30
Z44808 PEA 1 node 34 Z44808._PEA_ 1 node -3 Z44808. PEA_ l_node .39 Z44808_ PEA_ 1_node. .4 Z44808. _PEA_ l_node 6 Z44808 PEA_ l_node .8
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
S67314 PEA 1 T4 S67314 PEA 1 T5 S67314 PEA 1 T6 S67314 PEA 1 T7 a nucleic acid sequence comprising a sequence in the table below:
Segment Name
S67314 PEA 1 node 0
S67314 PEA 1 node 11
S67314 PEA 1 node 13
S67314 PEA 1 node 15
S67314 PEA 1 node 17
S67314 PEA 1 node 4
S67314 PEA 1 node 10
S67314 PEA 1 node 3
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
a nucleic acid sequence comprising a sequence in the table below:
Z39337 PEA 2 PEA 1 node 3
Z39337 PEA 2 PEA 1 node 5
Z39337 PEA 2 PEA 1 node 6
Z39337 PEA 2 PEA 1 node 10
Z39337 PEA 2 PEA 1 node 11
Z39337 PEA 2 PEA 1 node 14
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
a nucleic acid sequence comprising a sequence in the table below:
HUMPHOSLIP. PEA_2 node 9
HUMPHOSLIP. PEA 2. node .3
HUMPHOSLIP. PEA 2. node _68
HUMPHOSLIP. PEA_2 node .70
HUMPHOSLIP. _PEA_2. node .75
HUMPHOSLIP. _PEA_2 node. .2
HUMPHOSLIP. _PEA_2 node. .3
HUMPHOSLIP _PEA_2 node. 4
HUMPHOSLIP. _PEA_2 node 6
HUMPHOSLIP. _PEA_2 node .7
HUMPHOSLIP. _PEA_2 node _8
HUMPHOSLIP. _PEA_2 node 9
HUMPHOSLIP. _PEA_2 node. -1
HUMPHOSLIP. _PEA_2 node -1
HUMPHOSLIP. _PEA_2 node _16
HUMPHOSLIP. _PEA_2. node _17
HUMPHOSLIP_PEA_2. node .23
HUMPHOSLIP. PEA 2 node _24
HUMPHOSLIP. _PEA_2 node .25
HUMPHOSLIP. _PEA_2 node - 6
HUMPHOSLIP. _PEA_2 node 29
HUMPHOSLIP. _PEA_2 node -30
HUMPHOSLIP_PEA_2. node .33
HUMPHOSLIP_PEA_2_node_36
HUMPHOSLIP_PEA_2 node .37
HUMPHOSLIP_PEA_2 node .39
HUMPHOSLIP_PEA_2_ node 40
HUMPHOSLIP_PEA_2 node 41
HUMPHOSLIP_PEA_2. _node_42
HUMPHOSLIP_PEA_2 _node_44 39 HU PHOSLIP_PEΛ_ _2_ node . HUMPHOSLIP PEA. .2. node .47 HUMPHOSLIP. PEA. .2. node 51 HUMPHOSLIP. PEA .2. node _52 HUMPHOSLIP. PEA. .2. node .53 HUMPHOSLIP. PEA .2. node. -54 HUMPHOSLIP PEA .2. node. .55 HUMPHOSLIP. PEA .2. node _58 HUMPHOSLIP PEA. .2. node. - HUMPHOSLIP PEA .2. node 60 HUMPHOSLIP PEA .2. node -61 HUMPHOSLIP. _PEA. .2. node _62 HUMPHOSLIP_PEA. .2. node .63 HUMPHOSLIP. _PEA. .2. node 64 HUMPHOSLIP. _PEA. .2. node .65 HUMPHOSLIP_PEA_ _2_ node -66 HUMPHOSLIP. _PEA. _2. node _67 HUMPHOSLIP. _PEA. _2. _node_69 HUMPHOSLIP_PEA. _2. node _71 HUMPHOSLIP. _PEA. _2. node .72 HUMPHOSLIP. _PEA. _2_ node .73 HUMPHOSLIP_PEA_ _2. node _74
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
HUMPHOSLIP_PEA_2_P30
HUMPHOSLIP_PEA_2_P31
HUMPHOSLIP_PEA_2_P33
HUMPHOSLIP_PEA_2_P34
HUMPHOSLIP_PEA_2_P35
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
a nucleic acid sequence comprising a sequence in the table below:
T59832_node_1
T59832_node_7
T59832_node_29
T59832_node_39
T59832_node_2
T59832_node_3
T59832_node_4
T59832_node_5
T59832_node_6
T59832_node_8
T59832_node_9
T59832_node_10
T59832_node_11
T59832_node_12
T59832. node. -1
T59832_node_16
T59832_node_ -•
T59832_node_20
T59832_node_25
T59832_node_26
T59832_node_27
T59832_node_28
T59832_node_30
T59832_node_31
T59832_node_32
T59832_node_34
T59832_node_35
T59832_node_36
T59832_node_37
T59832_node_38
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
T59832 P5
T59832 P7
T59832 P9
T59832 P12
T59832 P18
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
HSCP2 PEA 1 node 0
HSCP2 PEA 1 node 3
HSCP2 PEA 1 node 6
HSCP2 PEA 1 node 8
HSCP2 PEA 1 node 10
HSCP2 PEA 1 node 14
HSCP2 PEA 1 node 23
HSCP2 PEA 1 node 26
HSCP2 PEA 1 node 29
HSCP2 PEA 1 node 31
HSCP2 PEA 1 node 32
HSCP2 PEA 1 node 34
44 HSCP2. PEA_ 1 node .67 HSCP2_PEA_ 1 node 68 HSCP2_PEA_ l node .69 HSCP2. _PEA_ 1 node 70 HSCP2. _PEA_ 1 node .75 HSCP2. PEA_ l_node. .77 HSCP2. PEA l_node - HSCP2 PEA_ l_node_ .82
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
HSCP2 PEA 1 P4
HSCP2 PEA 1 P8
HSCP2 PEA 1 P14
HSCP2 PEA 1 P1
HSCP2 PEA 1 P2
HSCP2 PEA 1 P16
HSCP2 PEA 1 P6
HSCP2 PEA 1 P22
HSCP2 PEA 1 P24
HSCP2 PEA 1 P25
HSCP2 PEA 1 P33
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
HUMTEN PEA 1 T6
HUMTEN PEA 1 T7
HUMTEN PEA 1 T11
HUMTEN PEA 1 T14
HUMTEN PEA 1 T16
HUMTEN PEA 1 T17
HUMTEN PEA 1 T18
HUMTEN PEA 1 T19
HUMTEN PEA 1 T20
HUMTEN PEA 1 T23
HUMTEN PEA 1 T32
HUMTEN PEA 1 T35
HUMTEN PEA 1 T36
HUMTEN PEA 1 T37
HUMTEN PEA 1 T39
HUMTEN PEA 1 T40
HUMTEN PEA 1 T41
a nucleic acid sequence comprising a sequence in the table below:
HUMTEN PEA 1 node 17 HUMTEN PEA 1 node 21 HUMTEN PEA 1 node 22 HUMTEN PEA 1 node 25 HUMTEN PEA 1 node 36 HUMTEN PEA 1 node 53 HUMTEN PEA 1 node 54 HUMTEN PEA 1 node 57 HUMTEN PEA 1 node 61 HUMTEN PEA 1 node 62 HUMTEN PEA 1 node 67 HUMTEN PEA 1 node 68 HUMTEN PEA 1 node 69 HUMTEN PEA 1 node 70 HUMTEN PEA 1 node 72 HUMTEN PEA 1 node 84 HUMTEN PEA 1 node 85 HUMTEN PEA 1 node 86 HUMTEN PEA 1 node 87 HUMTEN PEA 1 node 88
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
HUMTEN PEA 1 P11
HUMTEN PEA 1 P13
HUMTEN PEA 1 P14
HUMTEN PEA 1 P15
HUMTEN PEA 1 P16
HUMTEN PEA 1 P17
HUMTEN PEA 1 P20
HUMTEN PEA 1 P26
HUMTEN PEA 1 P27
HUMTEN PEA 1 P28
HUMTEN PEA 1 P29
HUMTEN PEA 1 P30
HUMTEN PEA 1 P31
HUMTEN PEA 1 P32
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
HUMOSTRO PEA 1 PEA 1 T14
HUMOSTRO PEA 1 PEA 1 T16
HUMOSTRO PEA 1 PEA 1 T30
a nucleic acid sequence comprising a sequence in the table below
HUMOSTRO PEA 1 PEA 1 node 0
HUMOSTRO PEA 1 PEA 1 node 10
HUMOSTRO PEA 1 PEA 1 node 16
HUMOSTRO PEA 1 PEA 1 node 23
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
T46984 PEA 1 T2
T46984 PEA 1 T3
T46984 PEA 1 T12
T46984 PEA 1 T13
T46984 PEA 1 T14
T46984 PEA 1 T15
T46984 PEA 1 T19
T46984 PEA 1 T23
T46984 PEA 1 T27
T46984 PEA 1 T32
T46984 PEA 1 T34
T46984 PEA 1 T35
T46984 PEA 1 T40
T46984 PEA 1 T42
T46984 PEA 1 T43
T46984 PEA 1 T46
T46984 PEA 1 T47
T46984 PEA 1 T48
T46984 PEA 1 T51
T46984 PEA 1 T52
T46984 PEA 1 T54
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
T46984 PEA node 2
T46984 PEA node 4
T46984 PEA node 6
T46984 PEA node 12
T46984 PEA node 14
T46984 PEA node 25
T46984 PEA node 29
T46984 PEA node 34
T46984 PEA node 46
T46984 PEA node 47
T46984 PEA node 52
T46984 PEA node 65
T46984 PEA node 69
T46984 PEA node 75
T46984 PEA node 86
T46984 PEA node 9
T46984 PEA node 13
T46984 PEA node 19
T46984 PEA node 21
T46984 PEA node 22
T46984 PEA node 26
T46984 PEA node 28
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
a nucleic acid sequence comprising a sequence in the table below:
M78530 PEA 1 node 0 M78530 PEA 1 node 15 M78530 PEA 1 node 16 M78530 PEA 1 node 19 M78530 PEA 1 node 21 M78530 PEA 1 node 23 M78530 PEA 1 node 27 M78530 PEA 1 node 29 M78530 PEA 1 node 36 M78530 PEA 1 node 37 M78530 PEA 1 node 2 M78530 PEA 1 node 4 M78530 PEA 1 node 5 M78530 PEA 1 node 7 M78530 PEA 1 node 9 M78530 PEA 1 node 10 M78530 PEA 1 node 18 M78530 PEA 1 node 25 M78530 PEA 1 node 30 M78530 PEA 1 node 33 M78530 PEA 1 node 34
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
T48119 T2
a nucleic acid sequence comprising a sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
a nucleic acid sequence comprising a sequence in the table below:
HSMUC1A_PEA_1_node_4
HSMUC1A_PEA_1_node_5
HSMUC1A_PEA_1_node_6
HSMUC1A_PEA_1_node_7
HSMUC1A_PEA_1_node_17
HSMUC1A_PEA_1_node_18
HSMUC1A_PEA_1_node_20
HSMUC1A_PEA_1_node_21
HSMUC1A_PEA_1_node_23
HSMUC1A_PEA_1_node_26
HSMUC1A_PEA_1_node_27
HSMUC1A_PEA_1_node_31
HSMUC1A_PEA_1_node_34
HSMUC1A_PEA_1_node_36
HSMUC1A_PEA_1_node_37
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
HSMUC1A_PEA_1_P25
HSMUC1A_PEA_1_P29
HSMUC1A_PEA_1_P30
HSMUC1A_PEA_1_P32
HSMUC1A_PEA_1_P36
HSMUC1A_PEA_1_P39
HSMUC1A_PEA_1_P45
HSMUC1A_PEA_1_P49
HSMUC1A_PEA_1_P52
HSMUC1A_PEA_1_P53
HSMUC1A_PEA_1_P56
HSMUC1A_PEA_1_P58
HSMUC1A_PEA_1_P59
HSMUC1A_PEA_1_P63
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSMUC1A_PEA_1_P63, comprising a first amino acid sequence being at least 90% homologous to MTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSV corresponding to amino acids 1 - 45 of MUC1_HUMAN, which also corresponds to amino acids 1 - 45 of HSMUC1A_PEA_1_P63, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EEEVSADQVSVGASGVLGSFKEARNAPSFLSWSFSMGPSK corresponding to amino acids 46 - 85 of HSMUC1A_PEA_1_P63, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSMUC1A_PEA_1_P63, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EEEVSADQVSVGASGVLGSFKEARNAPSFLSWSFSMGPSK in HSMUC1A_PEA_1_P63.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNA IFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQ PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDN RYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFAL FFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFVAEPDNKNVYKFELDTSERKIEFDS ASGTYTLYLIIGDATLKNPILWNV corresponding to amino acids 1 - 498 of RIB2_HUMAN, which also corresponds to amino acids 1 - 498 of T46984_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VCA corresponding to amino acids 499 - 501 of T46984_PEA_1_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNA IFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQ PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDN RYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFAL FFQLVDVNTGAELTPHQ corresponding to amino acids 1 - 433 of RIB2_HUMAN, which also corresponds to amino acids 1 - 433 of T46984_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ICHIWKLIFLP corresponding to amino acids 434 - 444 of T46984_PEA_1_P3, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ICHIWKLIFLP in T46984_PEA_1_P3.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNA IFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQ PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDN RYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFAL FFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFVAEPDNKNVYKFELDTSERKIEFDS ASGTYTLYLIIGDATLKNPILWNV corresponding to amino acids 1 - 498 of RIB2_HUMAN, which also corresponds to amino acids 1 - 498 of T46984_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LMDQK corresponding to amino acids 499 - 503 of T46984_PEA_1_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LMDQK in T46984_PEA_1_P10.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNA IFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQ PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDN RYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFAL FFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFVAEPDNKNVYKFELDTSERKIEFDS ASGTYTLYLIIGDATLKNPILWNVADVVIKFPEEEAPSTVLSQNLFTPKQEIQHLFREPEK RPPTVVSNTFTALILSPLLLLFALWIRIGANVSNFTFAPSTIIFHLGHAAMLGLMYVYWT QLNMFQTLKYLAILGSVTFLAGNRMLAQQAVKR corresponding to amino acids 1 - 628 of RIB2_HUMAN, which also corresponds to amino acids 1 - 628 of T46984_PEA_1_P11.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P12, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNA IFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQ PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMN corresponding to amino acids 1 - 338 of RIB2_HUMAN, which also corresponds to amino acids 1 - 338 of T46984_PEA_1_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SQDLH corresponding to amino acids 339 - 343 of T46984_PEA_1_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SQDLH in T46984_PEA_1_P12.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P21, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence M corresponding to amino acids 1 - 1 of T46984_PEA_1_P21, and a second amino acid sequence being at least 90% homologous to
KACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSSVTQIYHAV AALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVA RLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFES LSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPLTQATVKL EHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDNRYIANTVEL RVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFALFFQLVDVNT GAELTPHQTFVRLHNQKTGQEVVFVAEPDNKNVYKFELDTSERKIEFDSASGTYTLYLII GDATLKNPILWNVADVVIKFPEEEAPSTVLSQNLFTPKQEIQHLFREPEKRPPTVVSNTF TALILSPLLLLFALWIRIGANVSNFTFAPSTIIFHLGHAAMLGLMYVYWTQLNMFQTLKY LAILGSVTFLAGNRMLAQQAVKRTAH corresponding to amino acids 70 - 631 of RIB2_HUMAN, which also corresponds to amino acids 2 - 563 of T46984_PEA_1_P21, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P27, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNA IFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQ PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDN RYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFA corresponding to amino acids 1 - 415 of RIB2_HUMAN, which also corresponds to amino acids 1 - 415 of T46984_PEA_1_P27, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence FGSGLVPMSPTSLLLLARLYFTWDMLLCWDSCMSTGLSSTCSRP corresponding to amino acids 416 - 459 of T46984_PEA_1_P27, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
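The stepped homology thresholds recited above (70%, 80%, 85%, 90%, 95%) can be illustrated with a small sketch. It assumes, purely for illustration, that homology is scored as ungapped percent identity between equal-length sequences; the specification may instead rely on an alignment program with its own scoring, so this is not the method of the disclosure. The helper names `percent_identity` and `highest_threshold_met` and the alanine-substituted variant are hypothetical.

```python
# Tail of T46984_PEA_1_P27 (amino acids 416 - 459), as recited in the text.
P27_TAIL = "FGSGLVPMSPTSLLLLARLYFTWDMLLCWDSCMSTGLSSTCSRP"

# Thresholds recited in the text, from least to most preferred.
THRESHOLDS = (70, 80, 85, 90, 95)


def percent_identity(candidate: str, reference: str) -> float:
    """Ungapped percent identity; an ASSUMED stand-in for 'homology'."""
    if len(candidate) != len(reference):
        raise ValueError("this sketch only compares equal-length sequences")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def highest_threshold_met(pct: float):
    """Return the highest recited threshold that pct reaches, or None."""
    met = [t for t in THRESHOLDS if pct >= t]
    return met[-1] if met else None


# The tail spans residues 416 - 459 inclusive, i.e. 44 residues.
assert len(P27_TAIL) == 459 - 416 + 1

# Hypothetical variant: the first five residues replaced by alanine.
variant = "AAAAA" + P27_TAIL[5:]
variant_identity = percent_identity(variant, P27_TAIL)
variant_tier = highest_threshold_met(variant_identity)
```

Under this assumed scoring, five substitutions in the 44-residue tail leave 39 matching positions, so the variant clears the 85% tier but not the 90% tier.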
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P27, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence FGSGLVPMSPTSLLLLARLYFTWDMLLCWDSCMSTGLSSTCSRP in T46984_PEA_1_P27.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P32, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNA IFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQ PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDN RYIANTVE corresponding to amino acids 1 - 364 of RIB2_HUMAN, which also corresponds to amino acids 1 - 364 of T46984_PEA_1_P32, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GQVRWLTPVIPALWEAKAGGSPEVRSSILAWPT corresponding to amino acids 365 - 397 of T46984_PEA_1_P32, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P32, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GQVRWLTPVIPALWEAKAGGSPEVRSSILAWPT in T46984_PEA_1_P32.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P34, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNA IFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQ PLTQATVKLEHAKSVASRATVLQKTSFTPVG corresponding to amino acids 1 - 329 of RIB2_HUMAN, which also corresponds to amino acids 1 - 329 of T46984_PEA_1_P34.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P35, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNA IFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAI corresponding to amino acids 1 - 287 of RIB2_HUMAN, which also corresponds to amino acids 1 - 287 of T46984_PEA_1_P35, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
GCWPSRQSREQHISSRRKMEILKTECQEKESRTIHSMRRKMEKKNFI corresponding to amino acids 288 - 334 of T46984_PEA_1_P35, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P35, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GCWPSRQSREQHISSRRKMEILKTECQEKESRTIHSMRRKMEKKNFI in T46984_PEA_1_P35.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P38, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEAL corresponding to amino acids 1 - 145 of RIB2_HUMAN, which also corresponds to amino acids 1 - 145 of T46984_PEA_1_P38, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MDPDWCQCLQLHFCS corresponding to amino acids 146 - 160 of T46984_PEA_1_P38, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P38, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MDPDWCQCLQLHFCS in T46984_PEA_1_P38.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P39, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLA corresponding to amino acids 1 - 160 of RIB2_HUMAN, which also corresponds to amino acids 1 - 160 of T46984_PEA_1_P39.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P45, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCE corresponding to amino acids 1 - 101 of RIB2_HUMAN, which also corresponds to amino acids 1 - 101 of T46984_PEA_1_P45, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NSPGSADSIPPVPAG corresponding to amino acids 102 - 116 of T46984_PEA_1_P45, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P45, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NSPGSADSIPPVPAG in T46984_PEA_1_P45.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P46, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAK corresponding to amino acids 1 - 69 of RIB2_HUMAN, which also corresponds to amino acids 1 - 69 of T46984_PEA_1_P46, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
NSPGSADSIPPVPAG corresponding to amino acids 70 - 84 of T46984_PEA_1_P46, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P46, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NSPGSADSIPPVPAG in T46984_PEA_1_P46.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78530_PEA_1_P15, comprising a first amino acid sequence being at least 90% homologous to
MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEE ETQFMSNCPVAVTESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKL CEQDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSH SKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLN VRAAPSAEFSVDRTRHLMSFLTMMGPSPDWNVGLSAEDLCTKECGWVQKVVQDLIPW DAGTDSGVTYESPNKPTIPQEKIRPLTSLDHPQSPFYDPEGGSITQVARVVIERIARKGEQ CNIVPDNVDDIVADLAPEEKDEDDTPETCIYSNWSPWSACSSSTCDKGKRMRQRMLKA QLDLSVPCPDTQDFQPCMGPGCSDEDGSTCTMSEWITWSPCSISCGMGMRSRERYVKQ FPEDGSVCTLPTEE corresponding to amino acids 1 - 544 of Q9HCB6, which also corresponds to amino acids 1 - 544 of M78530_PEA_1_P15, a bridging amino acid T corresponding to amino acid 545 of M78530_PEA_1_P15, a second amino acid sequence being at least 90% homologous to EKCTVNEECSPSSCLMTEWGEWDECSATCGMGMKKRHRMIKMNPADGSMCKAETSQ AEKCMMPECHTIPCLLSPWSEWSDCSVTCGKGMRTRQRMLKSLAELGDCNEDLEQVE KCMLPEC corresponding to amino acids 546 - 665 of Q9HCB6, which also corresponds to amino acids 546 - 665 of M78530_PEA_1_P15, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RKSWSSSRPITSMFLSPGSPEPASANTARS corresponding to amino acids 666 - 695 of M78530_PEA_1_P15, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M78530_PEA_1_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RKSWSSSRPITSMFLSPGSPEPASANTARS in M78530_PEA_1_P15.
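The "contiguous and in a sequential order" requirement for M78530_PEA_1_P15 amounts to simple coordinate arithmetic, sketched below. The coordinates are those recited above; the function name `check_contiguous` and the segment labels are hypothetical and serve only to show that the first sequence, the bridging amino acid, the second sequence and the tail tile residues 1 - 695 with no gap or overlap.

```python
from typing import List, Tuple

# (label, first residue, last residue), 1-based and inclusive, as recited
# for M78530_PEA_1_P15 in the text. Labels are illustrative only.
P15_SEGMENTS: List[Tuple[str, int, int]] = [
    ("first sequence (Q9HCB6 1 - 544)", 1, 544),
    ("bridging amino acid T", 545, 545),
    ("second sequence (Q9HCB6 546 - 665)", 546, 665),
    ("tail (unique to the variant)", 666, 695),
]

# Tail sequence as recited in the text.
P15_TAIL = "RKSWSSSRPITSMFLSPGSPEPASANTARS"


def check_contiguous(segments: List[Tuple[str, int, int]]) -> int:
    """Return the total length if segments tile 1..N without gap or overlap."""
    expected_start = 1
    for label, start, end in segments:
        if start != expected_start or end < start:
            raise ValueError(f"segment {label!r} is not contiguous")
        expected_start = end + 1
    return expected_start - 1


total_length = check_contiguous(P15_SEGMENTS)

# The recited tail coordinates (666 - 695) agree with the tail's length.
assert len(P15_TAIL) == 695 - 666 + 1
```

The same check applies unchanged to the two-segment variants, for example a head at 1 - 275 followed by a tail at 276 - 285.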
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78530_PEA_1_P15, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLS corresponding to amino acids 1 - 83 of M78530_PEA_1_P15, a second amino acid sequence being at least 90% homologous to AAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQ VFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCCACGT AKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGGYASEGVKQVAE LGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLNVRAAPSAEFSVDRTRHLMSFLTM MGPSPDWNVGLSAEDLCTKECGWVQKVVQDLIPWDAGTDSGVTYESPNKPTIPQEKIR PLTSLDHPQSPFYDPEGGSITQVARVVIERIARKGEQCNIVPDNVDDIVADLAPEEKDED DTPETCIYSNWSPWSACSSSTCDKGKRMRQRMLKAQLDLSVPCPDTQDFQPCMGPGCS DEDGSTCTMSEWITWSPCSISCGMGMRSRERYVKQFPEDGSVCTLPTEETEKCTVNEEC SPSSCLMTEWGEWDECSATCGMGMKKRHRMIKMNPADGSMCKAETSQAEKCMMPE CHTIPCLLSPWSEWSDCSVTCGKGMRTRQRMLKSLAELGDCNEDLEQVEKCMLPEC corresponding to amino acids 1 - 582 of O94862, which also corresponds to amino acids 84 - 665 of M78530_PEA_1_P15, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
RKSWSSSRPITSMFLSPGSPEPASANTARS corresponding to amino acids 666 - 695 of M78530_PEA_1_P15, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of M78530_PEA_1_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLS of M78530_PEA_1_P15.
An isolated polypeptide encoding for a tail of M78530_PEA_1_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RKSWSSSRPITSMFLSPGSPEPASANTARS in M78530_PEA_1_P15.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78530_PEA_1_P16, comprising a first amino acid sequence being at least 90% homologous to MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEE ETQFMSNCPVAVTESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKL CEQDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSH SKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLN
V corresponding to amino acids 1 - 297 of Q8NCD7, which also corresponds to amino acids 1 - 297 of M78530_PEA_1_P16.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78530_PEA_1_P16, comprising a first amino acid sequence being at least 90% homologous to
MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEE ETQFMSNCPVAVTESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKL CEQDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSH SKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLN
V corresponding to amino acids 1 - 297 of Q9HCB6, which also corresponds to amino acids 1 - 297 of M78530_PEA_1_P16.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78530_PEA_1_P16, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLS corresponding to amino acids 1 - 83 of
M78530_PEA_1_P16, and a second amino acid sequence being at least 90% homologous to AAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQ VFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCCACGT AKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGGYASEGVKQVAE LGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLNV corresponding to amino acids 1 - 214 of O94862, which also corresponds to amino acids 84 - 297 of M78530_PEA_1_P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of M78530_PEA_1_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLS of M78530_PEA_1_P16.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78530_PEA_1_P17, comprising a first amino acid sequence being at least 90% homologous to
MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEE ETQFMSNCPVAVTESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKL CEQDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSH SKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQ corresponding to amino acids 1 - 275 of Q8NCD7, which also corresponds to amino acids 1 - 275 of M78530_PEA_1_P17, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRQKNHRMTK corresponding to amino acids 276 - 285 of
M78530_PEA_1_P17, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M78530_PEA_1_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRQKNHRMTK in M78530_PEA_1_P17.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78530_PEA_1_P17, comprising a first amino acid sequence being at least 90% homologous to
MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEE ETQFMSNCPVAVTESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKL CEQDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSH SKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQ corresponding to amino acids 1 - 275 of Q9HCB6, which also corresponds to amino acids 1 - 275 of M78530_PEA_1_P17, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRQKNHRMTK corresponding to amino acids 276 - 285 of M78530_PEA_1_P17, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M78530_PEA_1_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRQKNHRMTK in M78530_PEA_1_P17.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78530_PEA_1_P17, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLS corresponding to amino acids 1 - 83 of M78530_PEA_1_P17, a second amino acid sequence being at least 90% homologous to AAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQ VFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCCACGT AKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGGYASEGVKQVAE LGSPVKMEEEIRQQ corresponding to amino acids 1 - 192 of O94862, which also corresponds to amino acids 84 - 275 of M78530_PEA_1_P17, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRQKNHRMTK corresponding to amino acids 276 - 285 of M78530_PEA_1_P17, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of M78530_PEA_1_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLS of M78530_PEA_1_P17.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M78530_PEA_1_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRQKNHRMTK in M78530_PEA_1_P17.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T48119_P2, comprising a first amino acid sequence being at least 90% homologous to
MTRQMASSGASGGKIDNSVLVLIVGLSTVGAGAYAYKTMKEDEKRYNERISGLGLTPE QKQKKAALSASEGEEVPQDKAPSHVPFLLIGGGTAAFAAARSIRARDPGARVLIVSEDP ELPYMRPPLSKELWFSDDPNVTKTLRFKQWNGKERSIYFQPPSFYVSAQDLPHIENGGV AVLTGKKVVQLDVRDNMVKLNDGSQITYEKCLIATGGTPRSLSAIDRAGAEVKSRTTL FRKIGDFRSLEKISREVKSITIIGGGFLGSELACALGRKARALGTEVIQLFPEKGNMGKILP EYLSNWTMEKVRREGVKVMPNAIVQSVGVSSGKLLIKLKDGRKVETDHIVAAVGLEP NVELAKTGGLEIDSDFGGFRVNAELQARSNIWVAGDAACFYDIKLGRRRVEHHDHAV VSGRLAGENMTGAAKPYWHQSMFWSDLGPDVGYEAIGLVDSSLPTVGVFAKATAQD NPKSATEQSGTGIRSESETESEASEITIPPSTPAVPQAPVQGEDYGKGVIFYLRDKVVVGI VLWNIFNRMPIARKIIKDGEQHEDLNEVAKLFNIHED corresponding to amino acids 50 - 613 of PCD8_HUMAN, which also corresponds to amino acids 1 - 564 of T48119_P2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T48119_P2, comprising a first amino acid sequence being at least 90% homologous to
MTRQMASSGASGGKIDNSVLVLIVGLSTVGAGAYAYKTMKEDEKRYNERISGLGLTPE QKQKKAALSASEGEEVPQDKAPSHVPFLLIGGGTAAFAAARSIRARDPGARVLIVSEDP ELPYMRPPLSKELWFSDDPNVTKTLRFKQWNGKERSIYFQPPSFYVSAQDLPHIENGGV AVLTGKKVVQLDVRDNMVKLNDGSQITYEKCLIATGGTPRSLSAIDRAGAEVKSRTTL FRJΉGDFRSLEKJSREVKSITIIGGGFLGSELACALGRKARALGTEVIQLFPEKGNMGKILP EYLSNWTMEKVRREGVKVMPNAIVQSVGVSSGKLLIKLKDGRKVETDHΓVAAVGLEP NVELAKTGGLEIDSDFGGFRVNAELQARSNIWVAGDAACFYDIKLGRRRVEHHDHAV VSGRLAGENMTGAAKPYWHQSMFWSDLGPDVGYEAIGLVDSSLPTVGVFAKATAQD NPKSATEQSGTGIRSESETESEASEITIPPSTPAVPQAPVQGEDYGKGVIFYLRDKVVVGI VLWNIFNRMPIARK IKDGEQH EDLNEVAKLFNIHED conesponding to amino acids 50 - 613 of PCD8 HUMAN, which also corresponds to amino acids 1 - 564 of T481 19_P2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971 JP6, comprising a first amino acid sequence being at least 90 %> homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAEC KPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPV LKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFR GQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGV LDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKG corresponding to amino acids 1 - 276 of VTNC HUMAN, which also corresponds to amino acids 1 - 276 of T39971 P6, and a second amino acid sequence being at least 70%), optionally at least 80%, preferably at least 85%>, more preferably at least 90%> and most preferably at least 95% homologous to a polypeptide having the sequence TQGVVGD conesponding to amino acids 277 - 283 of T39971 P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T39971 P6, comprising a polypeptide being at least 70%), optionally at least about 80%, preferably at least about 85%, more preferably at least about 90%. 
and most preferably at least about 95% homologous to the sequence TQGVVGD in T39971_P6. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P9, comprising a first amino acid sequence being at least 90% homologous to
MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAEC KPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPV LKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFR GQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGV LDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEE CEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRT corresponding to amino acids 1 - 325 of VTNC_HUMAN, which also corresponds to amino acids 1 - 325 of T39971_P9, and a second amino acid sequence being at least 90% homologous to SGMAPRPSLAKKQRFRHRNRKGYRSQRGHSRGRNQNSRRPSRATWLSLFSSEESNLGA NNYDDYRMDWLVPATCEPIQSVFFFSGDKYYRVNLRTRRVDTVDPPYPRSIAQYWLGC PAPGHL corresponding to amino acids 357 - 478 of VTNC_HUMAN, which also corresponds to amino acids 326 - 447 of T39971_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T39971_P9, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TS, having a structure as follows: a sequence starting from any of amino acid numbers 325-x to 325; and ending at any of amino acid numbers 326 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P11, comprising a first amino acid sequence being at least 90% homologous to
MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAEC KPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPV LKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFR GQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGV LDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEE CEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRTS corresponding to amino acids 1 - 326 of VTNC_HUMAN, which also corresponds to amino acids 1 - 326 of T39971_P11, and a second amino acid sequence being at least 90% homologous to DKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 442 - 478 of VTNC_HUMAN, which also corresponds to amino acids 327 - 363 of T39971_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T39971_P11, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SD, having a structure as follows: a sequence starting from any of amino acid numbers 326-x to 326; and ending at any of amino acid numbers 327 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P11, comprising a first amino acid sequence being at least 90% homologous to
MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAEC KPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPV LKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFR GQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGV LDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEE CEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRTS corresponding to amino acids 1 - 326 of Q9BSH7, which also corresponds to amino acids 1 - 326 of T39971_P11, and a second amino acid sequence being at least 90% homologous to
DKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 442 - 478 of Q9BSH7, which also corresponds to amino acids 327 - 363 of T39971_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T39971_P11, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SD, having a structure as follows: a sequence starting from any of amino acid numbers 326-x to 326; and ending at any of amino acid numbers 327 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P12, comprising a first amino acid sequence being at least 90% homologous to
MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAEC KPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPV LKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFR GQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFK corresponding to amino acids 1 - 223 of VTNC_HUMAN, which also corresponds to amino acids 1 - 223 of T39971_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VPGAVGQGRKHLGRV corresponding to amino acids 224 - 238 of T39971_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T39971_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPGAVGQGRKHLGRV in T39971_P12. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P12, comprising a first amino acid sequence being at least 90% homologous to
MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAEC KPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPV LKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFR GQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFK corresponding to amino acids 1 - 223 of Q9BSH7, which also corresponds to amino acids 1 - 223 of T39971_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VPGAVGQGRKHLGRV corresponding to amino acids 224 - 238 of T39971_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T39971_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPGAVGQGRKHLGRV in T39971_P12. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z44808_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGR TFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDD GTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKTDDAA APALETQPQGDEEDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKN DNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPA KARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEE RVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQ
ELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ corresponding to amino acids 1 - 441 of SM02_HUMAN, which also corresponds to amino acids 1 - 441 of Z44808_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DAMVVSSRPKATTHRKSRTLSRR corresponding to amino acids 442 - 464 of Z44808_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z44808_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DAMVVSSRPKATTHRKSRTLSRR in Z44808_PEA_1_P5. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z44808_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to
MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGR TFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDD GTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKTDDAA APALETQPQGDEEDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKN DNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPA KARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEE RVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQ ELMGCLGVAKEDGKADTKKRH corresponding to amino acids 1 - 428 of SM02_HUMAN, which also corresponds to amino acids 1 - 428 of Z44808_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RSKRNL corresponding to amino acids 429 - 434 of Z44808_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z44808_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RSKRNL in Z44808_PEA_1_P6. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z44808_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to
MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGR TFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDD GTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKTDDAA APALETQPQGDEEDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKN DNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPA KARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEE RVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQ
ELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ corresponding to amino acids 1 - 441 of SM02_HUMAN, which also corresponds to amino acids 1 - 441 of Z44808_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLWLRGKVSFYCF corresponding to amino acids 442 - 454 of Z44808_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z44808_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLWLRGKVSFYCF in Z44808_PEA_1_P7. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z44808_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to
MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGR TFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDD GTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKT corresponding to amino acids 1 - 170 of SM02_HUMAN, which also corresponds to amino acids 1 - 170 of Z44808_PEA_1_P11, and a second amino acid sequence being at least 90% homologous to
DIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGL YKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQLQ GCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEERVVHWYFKLLD KNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQELMGCLGVAKE
DGKADTKKRHTPRGHAESTSNRQPRKQG corresponding to amino acids 188 - 446 of SM02_HUMAN, which also corresponds to amino acids 171 - 429 of Z44808_PEA_1_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of Z44808_PEA_1_P11, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TD, having a structure as follows: a sequence starting from any of amino acid numbers 170-x to 170; and ending at any of amino acid numbers 171 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S67314_PEA_1_P4, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTF KNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRWATLELYLIGYYYCSFSQACSKKPSPPLRAVEAGTREWLWVRVVSGGNFLCSGFGL TQAGTQILPYRLHDCGQITFSKCNCKTGINNTNLVGLLGSL corresponding to amino acids 117 - 215 of S67314_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and
in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of S67314_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
VRWATLELYLIGYYYCSFSQACSKKPSPPLRAVEAGTREWLWVRVVSGGNFLCSGFGL TQAGTQILPYRLHDCGQITFSKCNCKTGINNTNLVGLLGSL in S67314_PEA_1_P4. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S67314_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to
MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTF KNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of AAP35373, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRWATLELYLIGYYYCSFSQACSKKPSPPLRAVEAGTREWLWVRVVSGGNFLCSGFGL TQAGTQILPYRLHDCGQITFSKCNCKTGINNTNLVGLLGSL corresponding to amino acids 117 - 215 of S67314_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of S67314_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRWATLELYLIGYYYCSFSQACSKKPSPPLRAVEAGTREWLWVRVVSGGNFLCSGFGL TQAGTQILPYRLHDCGQITFSKCNCKTGINNTNLVGLLGSL in S67314_PEA_1_P4. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S67314_PEA_1_P5, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTF KNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSYNNHGWEELRVG KSIV corresponding to amino acids 117 - 178 of S67314_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of S67314_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSYNNHGWEELRVG KSIV in S67314_PEA_1_P5. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S67314_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to
MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTF KNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of AAP35373, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSYNNHGWEELRVG KSIV corresponding to amino acids 117 - 178 of S67314_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of S67314_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSYNNHGWEELRVG KSIV in S67314_PEA_1_P5. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S67314_PEA_1_P6, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTF KNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MEKLQLRNVK corresponding to amino acids 117 - 126 of S67314_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
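The graded thresholds recited throughout this section (at least 70%, 80%, 85%, 90% and 95%) can be illustrated with a simple calculation. The Python sketch below assumes, purely for illustration, that homology is scored as ungapped, position-by-position identity between two equal-length sequences; the specification may define homology differently (for example by means of an alignment program), and the names `percent_identity` and `highest_tier` are hypothetical. The altered peptides in the example are invented for demonstration only.

```python
from typing import Optional

# Thresholds recited in the text, highest first.
TIERS = (95, 90, 85, 80, 70)


def percent_identity(candidate: str, reference: str) -> float:
    """Ungapped identity (in percent) between two equal-length sequences.

    Assumption: this is a stand-in for whatever homology measure the
    specification actually uses.
    """
    if not reference or len(candidate) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def highest_tier(candidate: str, reference: str) -> Optional[int]:
    """Return the highest recited threshold the candidate meets, or None."""
    score = percent_identity(candidate, reference)
    for tier in TIERS:
        if score >= tier:
            return tier
    return None


if __name__ == "__main__":
    tail = "MEKLQLRNVK"  # tail sequence recited for amino acids 117 - 126
    for peptide in ("MEKLQLRNVK", "MEKLQLRNVR", "AEKLQLRNVR", "AAALQLRNVR"):
        print(peptide, percent_identity(peptide, tail), highest_tier(peptide, tail))
```

With a ten-residue tail, each substitution costs ten percentage points, so a single mismatch still meets the 90% tier while four mismatches fall below every recited threshold.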
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of S67314_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MEKLQLRNVK in S67314_PEA_1_P6. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S67314_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to
MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTF KNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of AAP35373, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MEKLQLRNVK corresponding to amino acids 117 - 126 of S67314_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of S67314_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MEKLQLRNVK in S67314_PEA_1_P6. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S67314_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MVDAFLGTWKLVDSKNFDDYMKSL corresponding to amino acids 1 - 24 of FABH_HUMAN, which also corresponds to amino acids 1 - 24 of S67314_PEA_1_P7, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AHILITFPLPS corresponding to amino acids 25 - 35 of S67314_PEA_1_P7, and a third amino acid sequence being at least 90% homologous to
GVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSI VTLDGGKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA corresponding to amino acids 25 - 133 of FABH_HUMAN, which also corresponds to amino acids 36 - 144 of S67314_PEA_1_P7, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of S67314_PEA_1_P7, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for AHILITFPLPS, corresponding to S67314_PEA_1_P7. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S67314_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MVDAFLGTWKLVDSKNFDDYMKSL corresponding to amino acids 1 - 24 of AAP35373, which also corresponds to amino acids 1 - 24 of S67314_PEA_1_P7, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AHILITFPLPS corresponding to amino acids 25 - 35 of S67314_PEA_1_P7, and a third amino acid sequence being at least 90% homologous to
GVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSI VTLDGGKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA corresponding to amino acids 25 - 133 of AAP35373, which also corresponds to amino acids 36 - 144 of S67314_PEA_1_P7, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of S67314_PEA_1_P7, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for AHILITFPLPS, corresponding to S67314_PEA_1_P7. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z39337_PEA_2_PEA_1_P4, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWLPLSGAA corresponding to amino acids 1 - 9 of Z39337_PEA_2_PEA_1_P4, and a second amino acid sequence being at least 90% homologous to
MKKLMVVLSLIAAAWAEEQNKLVHGGPCDKTSHPYQAALYTSGHLLCGGVLIHPLWV LTAAHCKKPNLQVFLGKHNLRQRESSQEQSSVVRAVIHPDYDAASHDQDIMLLRLARP AKLSELIQPLPLERDCSANTTSCHILGWGKTADGDFPDTIQCAYIHLVSREECEHAYPGQ ITQNMLCAGDEKYGKDSCQGDSGGPLVCGDHLRGLVSWGNIPCGSKEKPGVYTNVCR YTNWIQKTIQAK corresponding to amino acids 1 - 244 of KLK6_HUMAN, which also corresponds to amino acids 10 - 253 of Z39337_PEA_2_PEA_1_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of Z39337_PEA_2_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWLPLSGAA of Z39337_PEA_2_PEA_1_P4. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z39337_PEA_2_PEA_1_P9, comprising a first amino acid sequence being at least 90% homologous to
MKKLMVVLSLIAAAWAEEQNKLVHGGPCDKTSHPYQAALYTSGHLLCGGVLIHPLWV LTAAHCKKPNLQVFLGKHNLRQRESSQEQSSVVRAVIHPDYDAASHDQDIMLLRLARP AKLSELIQPLPLERDCSANTTSCHILGWGKTADG corresponding to amino acids 1 - 149 of KLK6_HUMAN, which also corresponds to amino acids 1 - 149 of
Z39337_PEA_2_PEA_1_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence Q corresponding to amino acids 150 - 150 of Z39337_PEA_2_PEA_1_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P10, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGH FYYNISE corresponding to amino acids 1 - 67 of PLTP_HUMAN, which also corresponds to amino acids 1 - 67 of HUMPHOSLIP_PEA_2_P10, and a second amino acid sequence being at least 90% homologous to
KVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLLDTVPVRSSVDELVGIDYSLMK DPVASTSNLDMDFRGAFFPLTERNWSLPNRAVEPQLQEEERMVYVAFSEFFFDSAMES YFRAGALQLLLVGDKVPHDLDMLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKP SGTTISVTASVTIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSN HSALESLALIPLQAPLKTMLQIGVMPMLNERTWRGVQIPLPEGINFVHEVVTNHAGFLTI GADLHFAKGLREVIEKNRPADVRASTAPTPSTAAV corresponding to amino acids 163 - 493 of PLTP_HUMAN, which also corresponds to amino acids 68 - 398 of HUMPHOSLIP_PEA_2_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMPHOSLIP_PEA_2_P10, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EK, having a structure as follows: a sequence starting from any of amino acid numbers 67-x to 67; and ending at any of amino acid numbers 68 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P12, comprising a first amino acid sequence being at least 90% homologous to
MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGH FYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYPNAS AEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRF LLNQQICPVLYHAGTVLLNSLLDTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRG AFFPLTERNWSLPNRAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDK VPHDLDMLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASVTIALVP PDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHSALESLALIPLQAPLK TMLQIGVMPMLN corresponding to amino acids 1 - 427 of PLTP_HUMAN, which also corresponds to amino acids 1 - 427 of HUMPHOSLIP_PEA_2_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GKAGV corresponding to amino acids 428 - 432 of HUMPHOSLIP_PEA_2_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKAGV in HUMPHOSLIP_PEA_2_P12.
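The edge-portion definitions used in this section (for example the one given above for HUMPHOSLIP_PEA_2_P10, whose junction lies between residues 67 and 68) can be read as a window of length n that slides along the polypeptide while always spanning the junction. A minimal Python sketch of that arithmetic follows; the function name `edge_windows` and its parameters are illustrative assumptions and do not appear in the specification.

```python
from typing import List, Tuple


def edge_windows(junction: int, n: int) -> List[Tuple[int, int]]:
    """List (start, end) residue numbers, 1-based and inclusive, of every
    length-n window containing both `junction` and `junction + 1`.

    Follows the recited structure: start at junction - x, end at
    (junction + 1) + ((n - 2) - x), with x running from 0 to n - 2.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    windows = []
    for x in range(0, n - 1):              # x = 0 .. n-2
        start = junction - x
        end = (junction + 1) + ((n - 2) - x)
        if start < 1:                      # would run past the N-terminus
            break
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    # Junction between residues 67 and 68, window length n = 10.
    for start, end in edge_windows(67, 10):
        print(start, end, end - start + 1)
```

For a given n the definition yields n - 1 candidate windows, each exactly n residues long. The sketch does not check the C-terminal end, since that would require the full length of the variant.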
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P31, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGH FYYNISE corresponding to amino acids 1 - 67 of PLTP_HUMAN, which also corresponds to amino acids 1 - 67 of HUMPHOSLIP_PEA_2_P31, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PGLERGADKFPVVGGSSLFLALDLTLRPPVG corresponding to amino acids 68 - 98 of HUMPHOSLIP_PEA_2_P31, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P31, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PGLERGADKFPVVGGSSLFLALDLTLRPPVG in HUMPHOSLIP_PEA_2_P31.
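The chimeric polypeptides in this section are described as contiguous segments, each given by a residue range in the variant and, where applicable, a matching range in a known protein. As a hedged illustration, the Python sketch below checks that such ranges tile the variant without gaps or overlaps and that paired ranges have equal length. The `Segment` class and the `check_tiling` function are hypothetical names introduced here; the coordinates are those recited above for HUMPHOSLIP_PEA_2_P10 and HUMPHOSLIP_PEA_2_P31.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Range = Tuple[int, int]  # 1-based, inclusive residue range


@dataclass
class Segment:
    variant: Range                  # position of the segment in the variant
    parent: Optional[Range] = None  # matching range in the known protein, if any


def span(r: Range) -> int:
    """Number of residues covered by an inclusive range."""
    return r[1] - r[0] + 1


def check_tiling(segments: List[Segment]) -> int:
    """Return the total variant length implied by the segments.

    Raises ValueError if the variant ranges leave a gap or overlap, or if a
    paired parent range differs in length from its variant range.
    """
    expected_start = 1
    for seg in segments:
        if seg.variant[0] != expected_start:
            raise ValueError(f"segment {seg.variant} is not contiguous")
        if seg.parent is not None and span(seg.parent) != span(seg.variant):
            raise ValueError(f"length mismatch for segment {seg.variant}")
        expected_start = seg.variant[1] + 1
    return expected_start - 1


# Coordinates as recited in the text.
P10 = [Segment((1, 67), (1, 67)), Segment((68, 398), (163, 493))]
P31 = [Segment((1, 67), (1, 67)), Segment((68, 98))]
TAIL_P31 = "PGLERGADKFPVVGGSSLFLALDLTLRPPVG"

if __name__ == "__main__":
    print(check_tiling(P10), check_tiling(P31), len(TAIL_P31))
```

A check of this kind is useful mainly as a consistency test on transcribed coordinates: a recited tail of 31 residues, for instance, should occupy exactly the range 68 - 98.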
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P33, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGH FYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYPNAS AEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRF LLNQQ corresponding to amino acids 1 - 183 of PLTP_HUMAN, which also corresponds to amino acids 1 - 183 of HUMPHOSLIP_PEA_2_P33, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VWAATGRRVARVGMLSL corresponding to amino acids 184 - 200 of HUMPHOSLIP_PEA_2_P33, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P33, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VWAATGRRVARVGMLSL in HUMPHOSLIP_PEA_2_P33.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P34, comprising a first amino acid sequence being at least 90% homologous to
MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGH FYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYINAS AEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRF LLNQQICPVLYHAGTVLLNSLLDTVPV corresponding to amino acids 1 - 205 of PLTP_HUMAN, which also corresponds to amino acids 1 - 205 of
HUMPHOSLIP_PEA_2_P34, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LWTSLLALTIPS corresponding to amino acids 206 - 217 of HUMPHOSLIP_PEA_2_P34, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P34, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LWTSLLALTIPS in HUMPHOSLIP_PEA_2_P34.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P35, comprising a first amino acid sequence being at least 90% homologous to
MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGH FYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWF corresponding to amino acids 1 - 109 of PLTP_HUMAN, which also corresponds to amino acids 1 - 109 of HUMPHOSLIP_PEA_2_P35, a second amino acid sequence that is a bridging amino acid sequence comprising L, a third amino acid sequence being at least 90% homologous to KVYDFLSTFITSGMRFLLNQQ corresponding to amino acids 163 - 183 of PLTP_HUMAN, which also corresponds to amino acids 111 - 131 of HUMPHOSLIP_PEA_2_P35, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VWAATGRRVARVGMLSL corresponding to amino acids 132 - 148 of HUMPHOSLIP_PEA_2_P35, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of HUMPHOSLIP_PEA_2_P35, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise FLK, having a structure as follows (numbering according to HUMPHOSLIP_PEA_2_P35): a sequence starting from any of amino acid numbers 109-x to 109; and ending at any of amino acid numbers 111 + ((n-2) - x), in which x varies from 0 to n-2.
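The start and end formula for an edge portion can be enumerated directly. The sketch below is one literal reading of that formula, offered as an illustration rather than a definitive interpretation: for each x from 0 to n-2 it takes the start as (left anchor - x) and the end as (right anchor + (n-2) - x), using 1-based inclusive numbering. The function name `edge_windows` and the choice to drop windows that would start before residue 1 are assumptions made here.

```python
from typing import List, Tuple


def edge_windows(left: int, right: int, n: int) -> List[Tuple[int, int]]:
    """Enumerate (start, end) coordinates of edge-portion windows.

    left  -- last residue before the junction (e.g. 109)
    right -- first residue after the junction or bridge (e.g. 111)
    n     -- the length parameter "n" recited in the text (n >= 2)

    Coordinates are 1-based and inclusive. Windows whose start would
    fall before residue 1 are skipped (an assumption of this sketch).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    windows = []
    for x in range(n - 1):  # x = 0 .. n-2
        start = left - x
        end = right + (n - 2) - x
        if start >= 1:
            windows.append((start, end))
    return windows


# Edge portion of HUMPHOSLIP_PEA_2_P35 with n = 10: anchors 109 and 111.
for start, end in edge_windows(109, 111, 10):
    print(start, end)
```

Every window produced this way spans both anchor residues, so it always contains the junction (here F at 109, the bridging L at 110 and K at 111). Note that when the anchors are two positions apart, as with a bridging residue, this literal reading yields windows of n + 1 residues; when the anchors are adjacent, the windows are exactly n residues long.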
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P35, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VWAATGRRVARVGMLSL in HUMPHOSLIP_PEA_2_P35.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P7, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 12 - 223 of GILT_HUMAN, which also corresponds to amino acids 1 - 212 of T59832_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRIFLALSLTLIVPWSQGWTRQRDQR corresponding to amino acids 213 - 238 of T59832_P7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTLIVPWSQGWTRQRDQR in T59832_P7.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P9, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 12 - 214 of
GILT_HUMAN, which also corresponds to amino acids 1 - 203 of T59832_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204 - 244 of T59832_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P12, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVE corresponding to amino acids 12 - 141 of GILT_HUMAN, which also corresponds to amino acids 1 - 130 of T59832_P12, and a second amino acid sequence being at least 90% homologous to CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLED QTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 173 - 261 of GILT_HUMAN, which also corresponds to amino acids 131 - 219 of T59832_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
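The coordinate bookkeeping in a chimeric embodiment such as T59832_P12 (a span of the known protein mapped onto a span of the variant, segment after segment) can be checked mechanically. The following is a minimal sketch under stated assumptions: the `Segment` type and both helper functions are hypothetical names introduced for illustration, every segment is assumed to derive from the known protein (variants with a unique head or tail would need a segment type without parent coordinates), and all numbering is 1-based and inclusive.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    parent_start: int   # first residue in the known protein
    parent_end: int     # last residue in the known protein
    variant_start: int  # first residue in the variant
    variant_end: int    # last residue in the variant


def check_segments(segments: List[Segment]) -> int:
    """Verify equal span lengths and contiguity; return variant length."""
    expected_start = 1
    for seg in segments:
        parent_len = seg.parent_end - seg.parent_start + 1
        variant_len = seg.variant_end - seg.variant_start + 1
        if parent_len != variant_len:
            raise ValueError(f"length mismatch in {seg}")
        if seg.variant_start != expected_start:
            raise ValueError(f"segments not contiguous at {seg}")
        expected_start = seg.variant_end + 1
    return expected_start - 1


def variant_to_parent(segments: List[Segment], pos: int) -> Optional[int]:
    """Map a variant position to the known protein, or None if unmapped."""
    for seg in segments:
        if seg.variant_start <= pos <= seg.variant_end:
            return seg.parent_start + (pos - seg.variant_start)
    return None


# T59832_P12: GILT_HUMAN 12 - 141 -> 1 - 130, and 173 - 261 -> 131 - 219.
t59832_p12 = [Segment(12, 141, 1, 130), Segment(173, 261, 131, 219)]
print(check_segments(t59832_p12))
print(variant_to_parent(t59832_p12, 130), variant_to_parent(t59832_p12, 131))
```

In this example the two variant positions on either side of the junction, 130 and 131, map to non-adjacent positions of the known protein, which is the skipped region that distinguishes the variant.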
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T59832_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130; and ending at any of amino acid numbers 131 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P18, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 12 - 55 of GILT_HUMAN, which also corresponds to amino acids 1 - 44 of T59832_P18, and a second amino acid sequence being at least 90% homologous to
CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLED QTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 173 - 261 of GILT_HUMAN, which also corresponds to amino acids 45 - 133 of T59832_P18, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T59832_P18, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44; and ending at any of amino acid numbers 45 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to
MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF QVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG TTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY PLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES LLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYL FSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDH YTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ NVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDK VKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAY YSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYS DHPEKVNKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHT VHFHGHSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGMETT YTVLQNE corresponding to amino acids 1 - 1060 of CERU_HUMAN, which also corresponds to amino acids 1 - 1060 of HSCP2_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GGTSM corresponding to amino acids 1061 - 1065 of HSCP2_PEA_1_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCP2_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GGTSM in HSCP2_PEA_1_P4.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P8, comprising a first amino acid sequence being at least 90% homologous to
MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF QVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG TTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY PLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES LLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYL FSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDH YTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ NVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDK VKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAY YSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYS DHPEKVNKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHT VHFHGHSFQYK corresponding to amino acids 1 - 1006 of CERU_HUMAN, which also corresponds to amino acids 1 - 1006 of HSCP2_PEA_1_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KCFQEHLEFGYSTAM corresponding to amino acids 1007 - 1021 of HSCP2_PEA_1_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCP2_PEA_1_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KCFQEHLEFGYSTAM in HSCP2_PEA_1_P8.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF QVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG TTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY PLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES LLLEDNIRMFTTAPDQVDKEDEDFQESNKMH corresponding to amino acids 1 - 621 of CERU_HUMAN, which also corresponds to amino acids 1 - 621 of HSCP2_PEA_1_P14, a second amino acid sequence that is a bridging amino acid sequence comprising W, and a third amino acid sequence being at least 90% homologous to
TFNVECLTTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGIL GPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGA GTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENE SWYLDDNIKTYSDHPEKVNKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYL MGMGNEIDLHTVHFHGHSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHV TDHIHAGMETTYTVLQNEDTKSG corresponding to amino acids 694 - 1065 of CERU_HUMAN, which also corresponds to amino acids 623 - 994 of HSCP2_PEA_1_P14, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of HSCP2_PEA_1_P14, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise HWT, having a structure as follows (numbering according to HSCP2_PEA_1_P14): a sequence starting from any of amino acid numbers 621-x to 621; and ending at any of amino acid numbers 623 + ((n-2) - x), in which x varies from 0 to n-2.
According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCP2 PEAJ P15, comprising a first amino acid sequence being at least 90 % homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY HSHIDAPKDIASGLIGPLIICKKDSLDK KEKHIDREFVVMFSVVDENFSWYLEDNIKTY CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH AAFFHGQALU KNΎRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF QVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG TTRIGGSY KLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY PLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES LLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYL FSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDH YTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ NVSNAFLDKGEFYIGSKYKKWYRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDK VKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAY YSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYS DHPEKVNKDDEEFIESNKMH AINGRMFGNLQGLTMHVGDEVN WYLMGMGNEIDLHT VHFHGHSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGMETT YTVLQNE conesponding to amino acids 1 - 1060 of CERUJTUMAN, which also conesponds to amino acids 1 - 1060 of HSCP2 PEAJ P15, and a second amino acid sequence being at least 70%), optionally at least 80%, preferably at least 85%>, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
GEYPASSETHRRIWNVIYPITVSVIILFQISTKE corresponding to amino acids 1061 - 1094 of HSCP2_PEA_1_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCP2_PEA_1_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEYPASSETHRRIWNVIYPITVSVIILFQISTKE in HSCP2_PEA_1_P15.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to
MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD RIGRLYKJ ALYLQYTDETFRTTIEJKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS
HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY HSHIDAPJKDIASGLIGPLΠCKKDSLDKEKEKΉIDP^F\^MFS\^DENFSWYLEDNTKTY
CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF QVQECNKSSSKDNIRGKHVRHYYIAAEEHWNYAPSGIDIFTKENLTAPGSDSAVFFEQG TTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY PLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD PVCLA MYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES LLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYL FSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDH YTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ conesponding to amino acids 1 - 761 of CERU HUMAN, which also conesponds to amino acids 1 - 761 of HSCP2 PEAJ P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%., more preferably at least 90%> and most preferably at least 95% homologous to a polypeptide having the sequence K conesponding to amino acids 762 - 762 of HSCP2 PEAJ P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCP2 PEAJ P16, comprising a first amino acid sequence being at least 90 % homologous to
MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY HSHIDAPKJDIASGLIGPLΠCKKDSLDKEKEKΉIDREFVVMFSVVDENFSWYLEDNIKTY CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH AAFFHGQALTNXNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF QVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG TTRIGGSYK LVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY PLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES LLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYL FSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDH YTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ NVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDK VKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAY YSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYS DHPEKVNKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHT
VHFHGHSFQYKH corresponding to amino acids 1 - 1007 of CERU_HUMAN, which also corresponds to amino acids 1 - 1007 of HSCP2_PEA_1_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLRLTGEYGM corresponding to amino acids 1008 - 1017 of HSCP2_PEA_1_P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCP2_PEA_1_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLRLTGEYGM in HSCP2_PEA_1_P16.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to
MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD GRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY HSHIDAPKDIASGLIGPLΠCKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH AAFFHGQALTNKNYRIDT LFPATLFDAYLVRVAQNPGEWMLSCQNLNHLKAGLQAFF QVQECNKSSSKDNIRGKHVRHYYIAAEEIΓ NYAPSGIDIFTKENLTAPGSDSAVFFEQG TTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY
PLSIEPIGVRFNKJ JNIEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES LLLEDMRJMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYL FSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDH YTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ NVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDK VKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAY YSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYS DHPEKVNKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHT VHFHGHSFQYK conesponding to amino acids 1 - 1006 of CERUJTUMAN, which also corresponds to amino acids 1 - 1006 of HSCP2 PEA J P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GSL conesponding to amino acids 1007 - 1009 of HSCP2 PEAJ P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 
According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCP2 PEAJ P22, comprising a first amino acid sequence being at least 90 % homologous to MK1LILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHE conesponding to amino acids 1 - 131 of CERU_HUMAN, which also conesponds to amino acids 1 - 131 of HSCP2 PEA J P22, a second amino acid sequence bridging amino acid sequence comprising of A, and a third amino acid sequence being at least 90 % homologous to VNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFHGQALTNKNYRIDTINLFP ATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFFQVQECNKSSSKDNIRGKHVRHY YIAAEEΠWNΎAPSGIDIFTKENLTAPGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTN RKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNY NPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIFTGLI GPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMFTTAPDQVDKEDE DFQESNKMHSMNGFMYGNQPGLTMCKGDSWWYLFSAGNEADVHGIYFSGNTYLWR
GERPVDTANLFPQTSLTLHMV PDTEGTFNVECLTTDHYTGGMKQKYTVNQCRRQSEDS
TFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQNVSNAFLDKGEFYIGSKYKKW
YRQYTDS JTRVPVERKAEEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTE SSTVTPTLPGETLTYVWKJPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVCRR
PYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDEEFIESNKMHAI NGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHGHSFQYKHRGVYSSDVF DIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGMETTYTVLQNEDTKSG corresponding to amino acids 262 - 1065 of CERU_HUMAN, which also corresponds to amino acids 133 - 936 of HSCP2_PEA_1_P22, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of HSCP2_PEA_1_P22, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EAV, having a structure as follows (numbering according to HSCP2_PEA_1_P22): a sequence starting from any of amino acid numbers 131-x to 131; and ending at any of amino acid numbers 133 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P24, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MPLTMGKRNLFLLTP corresponding to amino acids 1 - 15 of HSCP2_PEA_1_P24, and a second amino acid sequence being at least 90% homologous to
VNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFHGQALTNKNYRIDTINLFP ATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFFQVQECNKSSSKDNIRGKHVRHY
YIAAEEΠWNYAPSGIDIFTKENLTAPGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTN RKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAYPLSIEPIGVP ^NKNNEGTYYSPNY NPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIFTGLI GPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMFTTAPDQVDKEDE DFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAGNEADVHGIYFSGNTYLWR GERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDHYTGGMKQKYTVNQCRRQSEDS TFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQNVSNAFLDKGEFYIGSKYKKW YRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDKVTQIFKNMATRPYSIHAHGVQTE SSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVCRR PYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDEEFIESNKMHAI NGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHGHSFQYKHRGVYSSDVF
DIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGMETTYTVLQNEDTKSG corresponding to amino acids 262 - 1065 of CERU_HUMAN, which also corresponds to amino acids 16 - 819 of HSCP2_PEA_1_P24, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of HSCP2_PEA_1_P24, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MPLTMGKRNLFLLTP of HSCP2_PEA_1_P24.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P25, comprising a first amino acid sequence being at least 90% homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF QVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG TTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY PLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES LLLEDNIRMFTTAPDQVDKEDEDFQESNKMH corresponding to amino acids 1 - 621 of
CERU_HUMAN, which also corresponds to amino acids 1 - 621 of HSCP2_PEA_1_P25, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CKYCIIHQSTKLF corresponding to amino acids 622 - 634 of HSCP2_PEA_1_P25, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCP2_PEA_1_P25, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CKYCIIHQSTKLF in HSCP2_PEA_1_P25.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCP2_PEA_1_P33, comprising a first amino acid sequence being at least 90% homologous to
MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS
HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY
HSHIDAPKDIASGLIGPLIICKK corresponding to amino acids 1 - 202 of CERU_HUMAN, which also corresponds to amino acids 1 - 202 of HSCP2_PEA_1_P33, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GTSSPYCTCYMTKRQGQGSLSFKKKSSLLC corresponding to amino acids 203 - 232 of HSCP2_PEA_1_P33, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCP2_PEA_1_P33, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GTSSPYCTCYMTKRQGQGSLSFKKKSSLLC in HSCP2_PEA_1_P33.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK
LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRFNIPRRACGCAAAP
DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG
VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCA ELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRA VDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA T conesponding to amino acids 1 - 1525 of TENA_HUMAN_V1, which also conesponds to amino acids 1 - 1525 of HUMTEN_PEA J_P5, a second amino acid sequence being at least 70%., optionally at least 80%., preferably at least 85%., more preferably at least 90%> and most preferably at least 95% homologous to a polypeptide having the sequence
TEPKPQLGTLIFSNITPKSFNMSWTTQAGLFAKIVINVSDAHSLHESQQFTVSGDAKQAH ITGLVENTGYDVSVAGTTLAGDPTRPLTAFVI conesponding to amino acids 1526 - 1617 of HUMTEN PEA J_P5, and a third amino acid sequence being at least 90 % homologous to TEALPLLENLTISDINPYGFTVSWMASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLE LRGLITGIGYEVMVSGFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADE GVFDNFVLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQTVSAI ATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVDGTKTQTR LVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALDGPSGLVTANITDSEALARWQPAIATV DSYVISYTGEKVPEITRTVSGNTVEYALTDLEPATEYTLRJFAEKGPQKSSTITAKFTTDL DSPRDLTATEVQSETALLTWRPPRASVTGYLLVYESVDGTVKEV1VGPDTTSYSLADLS PSTHYTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGDK AQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDRREEFWLGLDNLN KJTAQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYKLKVEGYSGTAGDSMAYHN GRSFSTFDKDTDSAITNCALSYKGAFWYRNCHRVNLMGRYGDNNHSQGVNWFHWKG HEHSIQFAEMKLRPSNFRNLEGRRKRA conesponding to amino acids 1526 - 2201 of TENA_HUMAN_V1, which also conesponds to amino acids 1618 - 2293 of
HUMTEN_PEA_1_P5, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of HUMTEN_PEA_1_P5, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for
TEPKPQLGTLIFSNITPKSFNMSWTTQAGLFAKIVINVSDAHSLHESQQFTVSGDAKQAH ITGLVENTGYDVSVAGTTLAGDPTRPLTAFVI, corresponding to HUMTEN_PEA_1_P5. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to
MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCP GTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWeiFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL T AEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLN WTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA TTE conesponding to amino acids 1 - 1527 of TENA_HUMAN_V1, which also conesponds to amino acids 1 - 1527 of HUMTEN PEA J P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95%> homologous to a polypeptide having the sequence PKPQLGTLIFSNITPKSFNMSWTTQAGLFAKIVINVSDAHSLHESQQFTVSGDAKQAHIT 
GLVENTGYDVSVAGTTLAGDPTRPLTAFVITGTQSEVLTCLTQREKEISHLKGKFNKNTI FTANVYSLIFN corresponding to amino acids 1528 - 1658 of HUMTEN_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
PKPQLGTLIFSNITPKSFNMSWTTQAGLFAKIVINVSDAHSLHESQQFTVSGDAKQAHIT GLVENTGYDVSVAGTTLAGDPTRPLTAFVITGTQSEVLTCLTQREKEISHLKGKFNKNTI FTANVYSLIFN in HUMTEN_PEA_1_P6. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to
MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVWAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR A VDJ GLEAATPYRVSIYGVIRG YRTPVLSAEASTAKEPEIGNLNVSDITPESFNLS WMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA TTEALPLLENLTISDINPYGFTVSWMASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKL ELRGLITGIGYEVMVSGFTQGHQTKPLRAEIVT conesponding to amino acids 1 - 1617 of TENA_HUMAN_V1, which also conesponds to amino acids 1 - 1617 of HUMTEN_PEA_1_P7, and a second amino acid sequence being at least 70%>, optionally at least 80%., preferably at least 85%, more preferably at least 90% and most preferably at least 95%> homologous to a polypeptide having the sequence
GISNQVSHLFLFLVPFCVICLPDRHDFNIFVHIPYLIHKCSLLFHLLPTLPLVICT corresponding to amino acids 1618 - 1673 of HUMTEN_PEA_1_P7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GISNQVSHLFLFLVPFCVICLPDRHDFNIFVHIPYLIHKCSLLFHLLPTLPLVICT in HUMTEN_PEA_1_P7. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P8, comprising a first amino acid sequence being at least 90% homologous to
MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHPJNIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI
LENKKSIPVSARVATYLPAPEGLKFKSIKXTSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPAT1NAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVWAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA T corresponding to amino acids 1 - 1525 of TENA HUM AN_V 1 , which also conesponds to amino acids 1 - 1525 of HUMTEN PEA 1 P8, and a second amino acid sequence being at least 90 %> homologous to
TEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNFVLKIRDTKKQSEPLEITLLAPERTRD LTGLREATEYEIELYGISKGRRSQTVSAIATTAMGSPKEVIFSDITENSATVSWRAPTAQV ESFRITYVPITGGTPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTA LDGPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTVEYALTDLE PATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQSETALLTWRPPRASVTGYL LVYESVDGTVKEVIVGPDTTSYSLADLSPSTHYTAKIQALNGPLRSNMIQTIFTTIGLLYP FPKDCSQAMLNGDTTSGLYTIYLNGDKAQALEVFCDMTSDGGGWIVFLRRKNGRENF YQNWKAYAAGFGDRREEFWLGLDNLNKITAQGQYELRVDLRDHGETAFAVYDKFSV GDAKTRYKLKVEGYSGTAGDSMAYHNGRSFSTFDKDTDSAITNCALSYKGAFWYRNC HRWLMGRYGDNNHSQGVNΛVTHWKGHEHSIQFAEMKLRPSNFRNLEGRRKRA conesponding to amino acids 1617 - 2201 of TENA HUMAN V1, which also conesponds to amino acids 1526 - 2110 of HUMTEN_PEAJ_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMTEN_PEA J_P8, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TT, having a structure as follows: a sequence starting from any of amino acid numbers 1525-x to 1525; and ending at any of amino acid numbers 1526+ ((n-2) - x), in which x varies from 0 to n-2. 
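The edge-portion definition above can be read as a sliding window of length n that must always contain the two residues flanking the junction (positions 1525 and 1526 of HUMTEN_PEA_1_P8). The following is a minimal sketch of that arithmetic only; the function name `edge_windows` is hypothetical and is not part of the original disclosure.

```python
def edge_windows(junction_left, n):
    """List (start, end) pairs, 1-based and inclusive, for every window of
    length n that contains both junction residues.

    junction_left is the last residue of the first segment (e.g. 1525);
    the first residue of the second segment is junction_left + 1.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    junction_right = junction_left + 1
    windows = []
    for x in range(0, n - 1):  # x varies from 0 to n-2
        start = junction_left - x
        end = junction_right + ((n - 2) - x)
        windows.append((start, end))
    return windows


# Example: the shortest recited length (n = 10) at the 1525/1526 junction.
windows = edge_windows(1525, 10)
for start, end in windows:
    print(start, end, end - start + 1)
```

Each window has exactly n residues, and there are n - 1 such windows: from the one that begins at the junction (x = 0) to the one that ends at it (x = n-2).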
According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN PEA J P10, comprising a first amino acid sequence being at least 90 %> homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHPJNIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKΕTSVEVEWDPLDIAFETWEIIFRjNIMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLN TAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EWVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVL conesponding to amino acids 1 - 1252 of TENAJTUMAN V1, which also corresponds to amino acids 1 - 1252 of HUMTEN_PEA_1_P10, and a second amino acid sequence being at least 90 % homologous to TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA TTEALPLLENLTISDINPYGFTVSWMASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKL ELRGLITGIGYEVMVSGFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTAD 
EGVFDNFVLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQTVSA IATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVDGTKTQT RLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALDGPSGLVTANITDSEALARWQPAIAT VDSYVISYTGEKVPEITRTVSGNTVEYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTD LDSPRDLTATEVQSETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADL SPSTHYTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGD KAQALEVFCDMTSDGGGWIVFLRRK GRENFYQNWKAYAAGFGDRREEFWLGLDNL NKITAQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYKLKVEGYSGTAGDSMAYH NGRSFSTFDKDTDSAITNCALSYKGAFWYRNCHRVNLMGRYGDNNHSQGVNWFHWK conesponding to amino acids 1344 - 2201 of TENA_HUMAN_V1, which also conesponds to amino acids 1253 - 2110 of HUMTENJPEAJ PIO, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMTEN_PEA_1_P10, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise LT, having a structure as follows: a sequence starting from any of amino acid numbers 1252-x to 1252; and ending at any of amino acid numbers 1253+ ((n-2) - x), in which x varies from 0 to n-2. 
According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA J P13, comprising a first amino acid sequence being at least 90 % homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTE YEVS LISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVWAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEW conesponding to amino acids 1 - 1343 ofTENA HUMAN Vl, which also conesponds to amino acids 1 - 1343 of HUMTEN PEAJ J 3, and a second amino acid sequence being at least 90 % homologous to
TAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVDGTKTQTRLV KLIPGVEYLVSIIAMKGFEESEPVSGSFTTALDGPSGLVTANITDSEALARWQPAIATVDS YVISYTGEKVPEITRTVSGNTVEYALTDLEPATEYTLPJFAEKGPQKSSTITAKFTTDLDS PRDLTATEVQSETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPS THYTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGDKAQ ALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDRREEFWLGLDNLNKIT AQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYKLKVEGYSGTAGDSMAYHNGRS FSTFDKDTDSAITNCALSYKGAFWYRNCHRVNLMGRYGDNNHSQGVNWFHWKGHEH SIQFAEMKLRPSNFRNLEGRRKRA conesponding to amino acids 1708 - 2201 of TENA_HUMAN_V1, which also conesponds to amino acids 1344 - 1837 of
HUMTEN_PEA_1_P13, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMTEN PEA 1 P13, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise VT, having a structure as follows: a sequence starting from any of amino acid numbers 1343-x to 1343; and ending at any of amino acid numbers 1344+ ((n-2) - x), in which x varies from 0 to n-2. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN PEAJ P14, comprising a first amino acid sequence being at least 90 % homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPWFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRTNIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLWYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTAL1TWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR 
GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA TTEALPLLENLTISDINPYGFTVSWMASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKL ELRGLITGIGYEVMVSGFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTAD EGVFDNFVLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQTVSA IATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVDGTKTQT RLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALDGPSGLVTANITDSEALARWQPAIAT VDSYVISYTGEKVPEITRTVSGNTVEYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTD LDSPRDLTATEVQSETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADL SPSTHYTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGD KAQALEVFCDMTSDGGGWIV conesponding to amino acids 1 - 2025 of TENA HUMAN V1, which also conesponds to amino acids 1 - 2025 of HUMTEN PEAJ P14, and a second amino acid sequence being at least 70%>, optionally at least 80%>, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
STTRDCRALRPRGRGRGQSRGGEEGDLLLMHSDTPMCEALQDSACHTEALRNSLLNKR MGNTLATF conesponding to amino acids 2026 - 2091 of HUMTEN_PEA_1_P14, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMTEN_PEAJ_P 14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90%> and most preferably at least about 95% homologous to the sequence STTRDCRALRPRGRGRGQSRGGEEGDLLLMHSDTPMCEALQDSACHTEALRNSLLNKR MGNTLATF in HUMTEN PEA J P14. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA J P15, comprising a first amino acid sequence being at least 90 % homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLWYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRPJΕTSYRQTGLAPGQEYEISLHIVJKNNTRGPGLKRVTTTRLDAPSQIEVKJDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATP AATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKAS 
corresponding to amino acids 1 - 1070 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1070 of HUMTEN_PEA_1_P15, and a second amino acid sequence being at least 90% homologous to
TEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNFVLKIRDTKKQSEPLEITLLAPERTRD LTGLREATEYEIELYGISKGRRSQTVSAIATTAMGSPKEVIFSDITENSATVSWRAPTAQV ESFRITYVPITGGTPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTA LDGPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTVEYALTDLE PATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQSETALLTWRPPRASVTGYL LVYESVDGTVKEVIVGPDTTSYSLADLSPSTHYTAKIQALNGPLRSNMIQTIFTTIGLLYP FPKDCSQAMLNGDTTSGLYTIYLNGDKAQALEVFCDMTSDGGGWIVFLRRKNGRENF YQNWKAYAAGFGDRREEFWLGLDNLNKITAQGQYELRVDLRDHGETAFAVYDKFSV GDAKTRYKLKVEGYSGTAGDSMAYHNGRSFSTFDKDTDSAITNCALSYKGAFWYRNC HRVNLMGRYGDNNHSQGVNWFHWKGHEHSIQFAEMKLRPSNFRNLEGRRKRA conesponding to amino acids 1617 - 2201 of TENA_HUMAN_V 1 , which also conesponds to amino acids 1071 - 1655 of HUMTEN PEAJ P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMTEN_PEA_1_P15, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise ST, having a structure as follows: a sequence starting from any of amino acid numbers 1070-x to 1070; and ending at any of amino acid numbers 1071+ ((n-2) - x), in which x varies from 0 to n-2. 
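The correspondence stated above for HUMTEN_PEA_1_P15 (variant residues 1 - 1070 match TENA_HUMAN_V1 residues 1 - 1070, and variant residues 1071 - 1655 match TENA_HUMAN_V1 residues 1617 - 2201) amounts to a piecewise offset between the two numbering schemes. The sketch below illustrates that bookkeeping under the coordinates given in the text; the names `SEGMENTS` and `variant_to_reference` are illustrative assumptions and do not come from the original.

```python
# Each entry: (variant_start, variant_end, reference_start), 1-based inclusive.
# Coordinates are the ones recited for HUMTEN_PEA_1_P15 versus TENA_HUMAN_V1.
SEGMENTS = [
    (1, 1070, 1),
    (1071, 1655, 1617),
]


def variant_to_reference(pos, segments=SEGMENTS):
    """Map a variant residue number to the matching reference residue number."""
    for v_start, v_end, r_start in segments:
        if v_start <= pos <= v_end:
            return r_start + (pos - v_start)
    raise ValueError(f"position {pos} is outside the listed segments")


# Residues on either side of the junction, and the last residue of the variant.
print(variant_to_reference(1070))  # last residue of the first segment
print(variant_to_reference(1071))  # first residue of the second segment
print(variant_to_reference(1655))  # C-terminal residue

# Number of reference residues absent from the variant between the segments.
skipped = SEGMENTS[1][2] - SEGMENTS[0][1] - 1
print(skipped)
```

Under these coordinates, both segments have equal lengths in the two numbering schemes, and the junction between variant residues 1070 and 1071 joins reference residues that are not adjacent in TENA_HUMAN_V1.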
According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P16, comprising a first amino acid sequence being at least 90 % homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKN TRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKAS conesponding to amino acids 1 - 1070 of TENA_HUMAN_V1 , which also conesponds to amino acids 1 - 1070 of HUMTEN PEAJ P16, and a second amino acid sequence being at least 90 % homologous to TAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVDGTKTQTRLV KLIPGVEYLVSIIAMKGFEESEPVSGSFTTALDGPSGLVTANITDSEALARWQPAIATVDS YVISYTGEKVPEITRTVSGNTVEYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDS PRDLTATEVQSETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPS THYTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGDKAQ ALEVFCDMTSDGGGWIVFLRRK GRENFYQNWKAYAAGFGDRREEFWLGLDNLNKIT AQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYKLKVEGYSGTAGDSMAYHNGRS FSTFDKDTDSAITNCALSYKGAFWYRNCHRVNLMGRYGDNNHSQGVNWFHWKGHEH SIQFAEMKLRPSNFRNLEGRRKRA 
corresponding to amino acids 1708 - 2201 of TENA_HUMAN_V1, which also corresponds to amino acids 1071 - 1564 of HUMTEN_PEA_1_P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMTEN_PEA_1_P16, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise ST, having a structure as follows: a sequence starting from any of amino acid numbers 1070-x to 1070; and ending at any of amino acid numbers 1071 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P17, comprising a first amino acid sequence being at least 90% homologous to
MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYi IK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRTNIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCTNGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA TTEALPLLENLTISD1NPYGFTVSWMASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKL ELRGLITGIGYEVMVSGFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTAD EGVFDNFVLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQTVSA IATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVDGTKTQT RLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTA LDGPSGLVTANITDSEALARWQPAIAT VDSYVISYTGEKVPEITRTVSGNTVEYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTD LDSPRDLTATEVQSETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADL 
SPSTHYTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGD KAQALEVFCDMTSDGGGWIV corresponding to amino acids 1 - 2025 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 2025 of
HUMTEN_PEA_1_P17, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TPWPTTMADPSPPLTRTQIQPSPTVLCPTKGLSGTGTVTVST corresponding to amino acids 2026 - 2067 of HUMTEN_PEA_1_P17, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
TPWPTTMADPSPPLTRTQIQPSPTVLCPTKGLSGTGTVTVST in HUMTEN_PEA_1_P17.
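The tiered thresholds recited for the tail (70%, 80%, 85%, 90%, 95%) can be illustrated with a simple calculation. This section does not state how homology is to be measured, so the sketch below substitutes plain ungapped percent identity between two equal-length sequences as a stand-in; that substitution, the function name `percent_identity`, and the single-substitution `candidate` sequence are all assumptions made for illustration only.

```python
THRESHOLDS = (70, 80, 85, 90, 95)  # percentages recited in the text

# Tail of HUMTEN_PEA_1_P17 as given in the text (amino acids 2026 - 2067).
TAIL = "TPWPTTMADPSPPLTRTQIQPSPTVLCPTKGLSGTGTVTVST"


def percent_identity(a, b):
    """Ungapped percent identity of two equal-length sequences.

    This is an assumed simplification, not the homology measure of the source.
    """
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Hypothetical candidate: the tail with its first residue replaced by alanine.
candidate = "A" + TAIL[1:]

pid = percent_identity(candidate, TAIL)
met = [t for t in THRESHOLDS if pid >= t]
print(round(pid, 2), met)
```

With one substitution in 42 residues the candidate still clears every recited tier. A gapped alignment tool would be needed for sequences of unequal length.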
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P20, comprising a first amino acid sequence being at least 90% homologous to
MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPWFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA TTEALPLLENLTISDINPYGFTVSWMASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKL ELRGLITGIGYEVMVSGFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTAD EGVFDNFVLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQTVSA IATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRJTYVPITGGTPSMVTVDGTKTQT RLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALDGPSGLVTANITDSEALARWQPAIAT VDSYVISYTGEKVPEITRTVSGNTVEYALTDLEPATEYTLRJTAEKGPQKSSTITAKFTTD LDSPRDLTATEVQSETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADL 
SPSTHYTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGD KAQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDRREEFWLG corresponding to amino acids 1 - 2057 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 2057 of HUMTEN_PEA_1_P20, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NAALHVYI corresponding to amino acids 2058 - 2065 of HUMTEN_PEA_1_P20, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P20, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NAALHVYI in HUMTEN_PEA_1_P20.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P26, comprising a first amino acid sequence being at least 90% homologous to
MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPWFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRTNIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLWYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKIiTSVEVEWDPLDIAFETWEIIFRiNrMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVIOWTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVTKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA TTEALPLLENLTISDINPYGFTVSWMASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKL ELRGLITGIGYEVMVSGFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTAD EGVFDNFVLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQTVSA I ATT conesponding to amino acids 1 - 1708 of TENA_HUMAN_V1, which also conesponds to amino acids 1 - 1708 of HUMTEN PEAJ P26, and a second amino acid sequence being at least 70%), optionally at least 80%, preferably at least 85%, more preferably at least 90%> and 
most preferably at least 95% homologous to a polypeptide having the sequence GTVNKQERTEKSHDSGVFFSQG corresponding to amino acids 1709 - 1730 of
HUMTEN_PEA_1_P26, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P26, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GTVNKQERTEKSHDSGVFFSQG in HUMTEN_PEA_1_P26.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P27, comprising a first amino acid sequence being at least 90% homologous to
MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCP GTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKJΞTSVEVEWDPLDIAFETWEIIFRNIV NKEDEG
EITKSLRRPETSYRQTGLAPGQEYEISLHIVK NTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EV WAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVE AAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV
T corresponding to amino acids 1 - 1344 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1344 of HUMTEN_PEA_1_P27, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GI corresponding to amino acids 1345 - 1346 of HUMTEN_PEA_1_P27, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P28, comprising a first amino acid sequence being at least 90% homologous to
MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIWTHRTNIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EWVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLT conesponding to amino acids 1 - 1253 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1253 of HUMTEN_PEA_1_P28, and a second amino acid sequence being at least 70%>, optionally at least 80%, preferably at least 85%>, more preferably at least 90%> and most preferably at least 95% homologous to a polypeptide having the sequence
GILDEFTNSLPPLCLCSGGIKALSCFKLGSAPTTLGKYQ corresponding to amino acids 1254 - 1292 of HUMTEN_PEA_1_P28, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P28, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GILDEFTNSLPPLCLCSGGIKALSCFKLGSAPTTLGKYQ in HUMTEN_PEA_1_P28.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P29, comprising a first amino acid sequence being at least 90% homologous to
MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRTNIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATP AATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKAST conesponding to amino acids 1 - 1071 of TENA_HUMAN_V 1 , which also conesponds to amino acids 1 - 1071 of HUMTEN PEAJ P29, and a second amino acid sequence being at least 70%, optionally at least 80%>, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GESALSFLQTLG conesponding to amino acids 1072 - 1083 of HUMTEN PEAJ J>29, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMTEN PEAJ P29, comprising a polypeptide being at least 70%, optionally at least about 80%>, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GESALSFLQTLG in HUMTEN_PEA_1_P29. 
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P30, comprising a first amino acid sequence being at least 90% homologous to
MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP
DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTG corresponding to amino acids 1 - 954 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 954 of HUMTEN_PEA_1_P30, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ELCISASLSQPALEGP corresponding to amino acids 955 - 970 of HUMTEN_PEA_1_P30, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P30, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ELCISASLSQPALEGP in HUMTEN_PEA_1_P30.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P31, comprising a first amino acid sequence being at least 90% homologous to
MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTR corresponding to amino acids 1 - 802 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 802 of HUMTEN_PEA_1_P31, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EYHL corresponding to amino acids 803 - 806 of HUMTEN_PEA_1_P31, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P31, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EYHL in HUMTEN_PEA_1_P31.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P32, comprising a first amino acid sequence being at least 90% homologous to
MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVAT corresponding to amino acids 1 - 710 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 710 of HUMTEN_PEA_1_P32, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CE corresponding to amino acids 711 - 712 of HUMTEN_PEA_1_P32, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMOSTRO_PEA_1_PEA_1_P21, comprising a first amino acid sequence being at least 90% homologous to
MRIAVICFCLLGITCAIPVKQADSGSSEEKQLYNKYPDAVATWLNPDPSQKQNLLAPQ corresponding to amino acids 1 - 58 of OSTP_HUMAN, which also corresponds to amino acids 1 - 58 of HUMOSTRO_PEA_1_PEA_1_P21, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VFLNFS corresponding to amino acids 59 - 64 of HUMOSTRO_PEA_1_PEA_1_P21, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMOSTRO_PEA_1_PEA_1_P21, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VFLNFS in HUMOSTRO_PEA_1_PEA_1_P21.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMOSTRO_PEA_1_PEA_1_P25, comprising a first amino acid sequence being at least 90% homologous to
MRIAVICFCLLGITCAIPVKQADSGSSEEKQ corresponding to amino acids 1 - 31 of
OSTP_HUMAN, which also corresponds to amino acids 1 - 31 of
HUMOSTRO_PEA_1_PEA_1_P25, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence H corresponding to amino acids 32 - 32 of HUMOSTRO_PEA_1_PEA_1_P25, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMOSTRO_PEA_1_PEA_1_P30, comprising a first amino acid sequence being at least 90% homologous to
MRIAVICFCLLGITCAIPVKQADSGSSEEKQ corresponding to amino acids 1 - 31 of
OSTP_HUMAN, which also corresponds to amino acids 1 - 31 of
HUMOSTRO_PEA_1_PEA_1_P30, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSIFYVFI corresponding to amino acids 32 - 39 of HUMOSTRO_PEA_1_PEA_1_P30, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMOSTRO_PEA_1_PEA_1_P30, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSIFYVFI in HUMOSTRO_PEA_1_PEA_1_P30.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for H61775_P16, comprising a first amino acid sequence being at least 90% homologous to
MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWL RFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 11 - 93 of Q9P2J2, which also corresponds to amino acids 1 - 83 of H61775_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCW RSSCSVTLQV corresponding to amino acids 84 - 152 of H61775_P16, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of H61775_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCW RSSCSVTLQV in H61775_P16.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for H61775_P16, comprising a first amino acid sequence being at least 90% homologous to
MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWL RFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 1 - 83 of AAQ88495, which also corresponds to amino acids 1 - 83 of H61775_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCW RSSCSVTLQV corresponding to amino acids 84 - 152 of H61775_P16, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of H61775_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCW RSSCSVTLQV in H61775_P16.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for H61775_P17, comprising a first amino acid sequence being at least 90% homologous to
MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWL RFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 11 - 93 of Q9P2J2, which also corresponds to amino acids 1 - 83 of H61775_P17.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for H61775_P17, comprising a first amino acid sequence being at least 90% homologous to MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWL
RFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 1 - 83 of AAQ88495, which also corresponds to amino acids 1 - 83 of H61775_P17.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P2, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
PHSGPAAAFIRRRGWWPGPRCA corresponding to amino acids 1 - 22 of HSAPHOL_P2, a second amino acid sequence being at least 90% homologous to
PATPRPLSWLRAPTRLCLDGPSPVLCA corresponding to amino acids 1 - 27 of AAH21289, which also corresponds to amino acids 23 - 49 of HSAPHOL_P2, and a third amino acid sequence being at least 90% homologous to
EKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQL
HHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAAT
ERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNE MPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLD
GLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVT
DPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAG
SLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYK
VVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQN YVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 83 - 586 of AAH21289, which also corresponds to amino acids 50 - 553 of HSAPHOL_P2, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of HSAPHOL_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PHSGPAAAFIRRRGWWPGPRCA of HSAPHOL_P2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P2, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AE, having a structure as follows: a sequence starting from any of amino acid numbers 49-x to 50; and ending at any of amino acid numbers 50 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P2, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PHSGPAAAFIRRRGWWPGPRCAPATPRPLSWLRAPTRLCLDGPSPVLCA corresponding to amino acids 1 - 49 of HSAPHOL_P2, a second amino acid sequence being at least 90% homologous to
EKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQL HHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAAT ERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNE MPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLD GLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVT DPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAG SLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYK VVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQN YVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 21 - 524 of PPBT_HUMAN, which also corresponds to amino acids 50 - 553 of HSAPHOL_P2, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of HSAPHOL_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PHSGPAAAFIRRRGWWPGPRCAPATPRPLSWLRAPTRLCLDGPSPVLCA of HSAPHOL_P2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P2, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AE, having a structure as follows: a sequence starting from any of amino acid numbers 49-x to 50; and ending at any of amino acid numbers 50 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P3, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVP corresponding to amino acids 63 - 82 of AAH21289, which also corresponds to amino acids 1 - 20 of HSAPHOL_P3, and a second amino acid sequence being at least 90% homologous to
GMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYL CGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSA AYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTD VEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFE PGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQAL HEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKK PFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKG PMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSV LF corresponding to amino acids 123 - 586 of AAH21289, which also corresponds to amino acids 21 - 484 of HSAPHOL_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P3, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PG, having a structure as follows: a sequence starting from any of amino acid numbers 20-x to 20; and ending at any of amino acid numbers 21 + ((n-2) - x), in which x varies from 0 to n-2.
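The edge-portion definition above can be read as a family of windows of length n that all span the junction between residues 20 and 21 (the PG pair). The Python sketch below enumerates those windows under one reading of the text: for each x from 0 to n-2 the window starts at residue 20 - x and ends at residue 21 + ((n-2) - x), using 1-based inclusive numbering. The function name `edge_windows`, its parameters, and the rule of skipping windows that would start before residue 1 are assumptions made for illustration and do not appear in the specification.

```python
def edge_windows(junction_left: int, n: int) -> list[tuple[int, int]]:
    """Enumerate 1-based inclusive (start, end) windows of length n that span
    the junction between residues `junction_left` and `junction_left + 1`.

    Hypothetical helper: one reading of the edge-portion definition, in which
    x runs from 0 to n-2, start = junction_left - x and
    end = (junction_left + 1) + ((n - 2) - x).
    """
    if n < 2:
        raise ValueError("n must be at least 2 to cover both junction residues")
    windows = []
    for x in range(n - 1):  # x = 0, 1, ..., n-2
        start = junction_left - x
        end = (junction_left + 1) + ((n - 2) - x)
        if start >= 1:  # assumption: ignore windows that fall off the N-terminus
            windows.append((start, end))
    return windows


# Junction between residues 20 and 21, with the smallest stated length n = 10.
for start, end in edge_windows(20, 10):
    print(start, end)
```

With `junction_left = 20` and `n = 10` this yields nine windows, from residues 20 - 29 (x = 0) to residues 12 - 21 (x = 8), each exactly 10 residues long and each containing both residue 20 and residue 21.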
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P3, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVP corresponding to amino acids 1 - 20 of PPBT_HUMAN, which also corresponds to amino acids 1 - 20 of HSAPHOL_P3, and a second amino acid sequence being at least 90% homologous to
GMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYL CGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSA AYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTD VEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFE PGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQAL HEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKK PFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKG PMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSV LF corresponding to amino acids 61 - 524 of PPBT_HUMAN, which also corresponds to amino acids 21 - 484 of HSAPHOL_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P3, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PG, having a structure as follows: a sequence starting from any of amino acid numbers 20-x to 20; and ending at any of amino acid numbers 21 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P4, comprising a first amino acid sequence being at least 90% homologous to
MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLC GVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAA YAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDV EYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEP GDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKP FTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGP MAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVL F corresponding to amino acids 124 - 586 of AAH21289, which also corresponds to amino acids 1 - 463 of HSAPHOL_P4.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P4, comprising a first amino acid sequence being at least 90% homologous to MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLC GVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAA YAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDV EYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEP GDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKP FTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGP MAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVL F corresponding to amino acids 62 - 524 of PPBT_HUMAN, which also corresponds to amino acids 1 - 463 of HSAPHOL_P4.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P5, comprising a first amino acid sequence being at least 90% homologous to
MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL GLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKA KQALHEAVEM corresponding to amino acids 63 - 417 of AAH21289, which also corresponds to amino acids 1 - 355 of HSAPHOL_P5, and a second amino acid sequence being at least 90% homologous to
DHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVD YAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIG ANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 440 - 586 of AAH21289, which also corresponds to amino acids 356 - 502 of HSAPHOL_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P5, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise MD, having a structure as follows: a sequence starting from any of amino acid numbers 355-x to 355; and ending at any of amino acid numbers 356 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P5, comprising a first amino acid sequence being at least 90% homologous to
MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL GLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKA KQALHEAVEM conesponding to amino acids 1 - 355 of PPBT HUMAN, which also corresponds to amino acids 1 - 355 of HSAPHOL P5, and a second amino acid sequence being at least 90 % homologous to DHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVD YAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIG ANLGHCAPASSAGSLAAGPLLLALALYPLSVLF conesponding to amino acids 377 - 524 of PPBT_HUMAN, which also conesponds to amino acids 356 - 502 of HSAPHOL P5, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL P5, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise MD, having a structure as follows: a sequence starting from any of amino acid numbers 355-x to 355; and ending at any of amino acid numbers 356+ ((n-2) - x), in which x varies from 0 to n-2. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL P6, comprising a first amino acid sequence being at least 90 %> homologous to
MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL conesponding to amino acids 63 - 349 of AAH21289, which also conesponds to amino acids 1 - 287 of HSAPHOL P6, and a second amino acid sequence being at least 90 % homologous to GGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTWTADHSHVFTFGGYTP RGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAV PLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAG SLAAGPLLLALALYPLSVLF conesponding to amino acids 395 - 586 of AAH21289, which also corresponds to amino acids 288 - 479 of HSAPHOL P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise LG, having a structure as follows: a sequence starting from any of amino acid numbers 287-x to 287; and ending at any of amino acid numbers 288+ ((n-2) - x), in which x varies from 0 to n-2. 
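The edge-portion definition above can be read as a sliding window of length n that always spans both junction residues. The following is a minimal sketch of that reading, not part of the original disclosure: the function name `edge_windows` and the optional bounds check against the variant length are assumptions introduced here for illustration, and positions are 1-based and inclusive as in the text.

```python
def edge_windows(junction, n, seq_len=None):
    """Yield (start, end) pairs, 1-based inclusive, for every length-n window
    that contains both junction residues (positions junction and junction + 1).

    Follows the rule in the text: start = junction - x and
    end = (junction + 1) + ((n - 2) - x), for x = 0 .. n - 2.
    """
    if n < 2:
        raise ValueError("a window must hold both junction residues, so n >= 2")
    for x in range(n - 1):  # x = 0 .. n-2
        start = junction - x
        end = (junction + 1) + ((n - 2) - x)
        # Assumption: windows running off either end of the variant are skipped.
        if start < 1:
            continue
        if seq_len is not None and end > seq_len:
            continue
        yield start, end


# HSAPHOL_P6 edge: junction residues 287 (L) and 288 (G), variant length 479.
windows = list(edge_windows(287, 10, seq_len=479))
```

For n = 10 this gives nine windows, from 287 - 296 (x = 0) to 279 - 288 (x = 8), each exactly ten residues long.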
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P6, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL corresponding to amino acids 1 - 287 of PPBT_HUMAN, which also corresponds to amino acids 1 - 287 of HSAPHOL_P6, and a second amino acid sequence being at least 90% homologous to
GGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTP RGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAV PLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAG SLAAGPLLLALALYPLSVLF corresponding to amino acids 333 - 524 of PPBT_HUMAN, which also corresponds to amino acids 288 - 479 of HSAPHOL_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise LG, having a structure as follows: a sequence starting from any of amino acid numbers 287-x to 287; and ending at any of amino acid numbers 288 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P7, comprising a first amino acid sequence being at least 90% homologous to
MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSA ATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYK conesponding to amino acids 63 - 326 of AAH21289, which also conesponds to amino acids 1 - 264 of HSAPHOL P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP conesponding to amino acids 265 - 306 of HS APHOL_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HS APHOL P7, comprising a polypeptide being at least 70%, optionally at least about 80%>, preferably at least about 85%, more preferably at least about 90%) and most preferably at least about 95% homologous to the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP in HSAPHOL P7. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P7, comprising a first amino acid sequence being at least 90 % homologous to
MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPR conesponding to amino acids 1 - 262 of PPBT HUMAN, which also conesponds to amino acids 1 - 262 of HSAPHOL J>7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%>, more preferably at least 90%> and most preferably at least 95% homologous to a polypeptide having the sequence YKLPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP conesponding to amino acids 263 - 306 of HSAPHOL P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HS APHOL P7, comprising a polypeptide being at least 70%), optionally at least about 80%, preferably at least about 85%>, more preferably at least about 90%) and most preferably at least about 95%> homologous to the sequence
YKLPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP in HSAPHOL_P7. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P7, comprising a first amino acid sequence being at least 90 %> homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYK conesponding to amino acids 1 - 264 of 075090, which also corresponds to amino acids 1 - 264 of HSAPHOL P7, and a second amino acid sequence being at least 70%>, optionally at least 80%, preferably at least 85%, more preferably at least 90%> and most preferably at least 95%> homologous to a polypeptide having the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP conesponding to amino acids 265 - 306 of HSAPHOL P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSAPHOL P7, comprising a polypeptide being at least 70%, optionally at least about 80%>, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95%> homologous to the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP in HSAPHOL J>7. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P8, comprising a first amino acid sequence being at least 90 % homologous to
MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL G corresponding to amino acids 63 - 350 of AAH21289, which also corresponds to amino acids 1 - 288 of HSAPHOL_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
KWRGWRGGCMARSLVAGAACGQHLGTRP conesponding to amino acids 289 - 316 of HSAPHOL P8, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSAPHOL P8, comprising a polypeptide being at least 70%, optionally at least about 80%., preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP in HSAPHOLJP8. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL P8, comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL G conesponding to amino acids 1 - 288 of PPBT HUMAN, which also conesponds to amino acids 1 - 288 of HSAPHOL P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP conesponding to amino acids 289 - 316 of HSAPHOL_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSAPHOL P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%>, more preferably at least about 90% and most preferably at least about 95%> homologous to the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP in HSAPHOL P8. 
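The tiered thresholds recited for tail sequences (at least 70%, 80%, 85%, 90% and 95%) can be illustrated with a simple calculation. The sketch below is an assumption-laden simplification: it scores position-by-position identity between two equal-length sequences, which is not necessarily how homology is determined in this document (no alignment method is stated in this passage). The names `percent_identity` and `highest_tier`, and the two-residue variant, are hypothetical and introduced only for illustration.

```python
# Tail of HSAPHOL_P8 as given in the text (amino acids 289 - 316).
TAIL_P8 = "KWRGWRGGCMARSLVAGAACGQHLGTRP"


def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences agree.

    Assumption: an ungapped, position-wise comparison stands in for the
    homology measure, which this passage does not define.
    """
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def highest_tier(pct, tiers=(95, 90, 85, 80, 70)):
    """Return the highest recited threshold that pct meets, or None."""
    for tier in tiers:
        if pct >= tier:
            return tier
    return None


# Hypothetical candidate differing from the tail at its first two residues.
candidate = "AA" + TAIL_P8[2:]
pct = percent_identity(TAIL_P8, candidate)
tier = highest_tier(pct)
```

With 26 of 28 positions matching, the hypothetical candidate scores about 92.9% and so meets the 90% tier but not the 95% tier.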
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P8, comprising a first amino acid sequence being at least 90% homologous to
MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL G corresponding to amino acids 1 - 288 of O75090, which also corresponds to amino acids 1 - 288 of HSAPHOL_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
KWRGWRGGCMARSLVAGAACGQHLGTRP corresponding to amino acids 289 - 316 of HSAPHOL_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSAPHOL_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP in HSAPHOL_P8.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T10888_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to
MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLP QNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTG FYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYL WWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLY GPDVPTISPSKANYRPGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGS YMCQAHNSATGLNRTTVTMITVS corresponding to amino acids 1 - 319 of CEA6_HUMAN, which also corresponds to amino acids 1 - 319 of T10888_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DWTRP corresponding to amino acids 320 - 324 of T10888_PEA_1_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T10888_PEA_1_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DWTRP in T10888_PEA_1_P2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T10888_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to
MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLP QNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTG FYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYL WWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL corresponding to amino acids 1 - 234 of CEA6_HUMAN, which also corresponds to amino acids 1 - 234 of T10888_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLLSSQLWPPSASRLECWPGWL corresponding to amino acids 235 - 256 of T10888_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T10888_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLLSSQLWPPSASRLECWPGWL in T10888_PEA_1_P4.
According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T10888_PEAJ_P4, comprising a first amino acid sequence being at least 90 % homologous to MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLP QNRIG YS WYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTG FYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYL WWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL conesponding to amino acids 1 - 234 of Q 13774, which also conesponds to amino acids 1 - 234 of T10888 PEAJ P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLLSSQLWPPSASRLECWPGWL conesponding to amino acids 235 - 256 of T10888 PEAJ P4, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T10888_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLLSSQLWPPSASRLECWPGWL in T10888_PEA_1_P4. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T10888 PEAJ P5, comprising a first amino acid sequence being at least 90 % homologous to
MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLP QNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTG FYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYL WWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLY GPDVPTISPSKANYRPGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGS YMCQAHNSATGLNRTTVTMITVSG corresponding to amino acids 1 - 320 of CEA6_HUMAN, which also corresponds to amino acids 1 - 320 of T10888_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
KWIHEALASHFQVESGSQRRARKKFSFPTCVQGAHANPKFSPEPSQFTSADSFPLVFLFF VVFCFLISHV corresponding to amino acids 321 - 390 of T10888_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T10888_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWIHEALASHFQVESGSQRRARKKFSFPTCVQGAHANPKFSPEPSQFTSADSFPLVFLFF VVFCFLISHV in T10888_PEA_1_P5.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T10888_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to
MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLP QNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTG FYTLQVIKSDLVNEEATGQFHVY corresponding to amino acids 1 - 141 of CEA6_HUMAN, which also corresponds to amino acids 1 - 141 of T10888_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence REYFHMTSGCWGSVLLPTYGIVRPGLCLWPSLHYILYQGLDI corresponding to amino acids 142 - 183 of T10888_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T10888_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence REYFHMTSGCWGSVLLPTYGIVRPGLCLWPSLHYILYQGLDI in T10888_PEA_1_P6.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P9, comprising a first amino acid sequence being at least 90% homologous to
MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEG conesponding to amino acids 1 - 274 of Q9UII7, which also conesponds to amino acids 1 - 274 of HSECADH P9, and a second amino acid sequence being at least 70%>, optionally at least 80%>, preferably at least 85%o, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG corresponding to amino acids 275 - 322 of HSECADH P9, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSECADH P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG in HSECADH J>9. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH P9, comprising a first amino acid sequence being at least 90 % homologous to
MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEG conesponding to amino acids 1 - 274 of Q9UII8, which also conesponds to amino acids 1 - 274 of HSECADH P9, and a second amino acid sequence being at least 70%>, optionally at least 80%>, preferably at least 85%), more preferably at least 90% and most preferably at least 95%> homologous to a polypeptide having the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG corresponding to amino acids 275 - 322 of HSECADH P9, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSECADH P9, comprising a polypeptide being at least 70%), optionally at least about 80%>, preferably at least about 85%o, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG in HSECADH P9. 
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P9, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEG corresponding to amino acids 1 - 274 of CAD1_HUMAN, which also corresponds to amino acids 1 - 274 of HSECADH_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG corresponding to amino acids 275 - 322 of HSECADH_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSECADH_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG in HSECADH_P9.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P13, comprising a first amino acid sequence being at least 90% homologous to
MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNT YNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADLQGEGL STTATAVITVTDTNDNPPIFNPTT corresponding to amino acids 1 - 379 of Q9UII7, which also corresponds to amino acids 1 - 379 of HSECADH_P13, and a second amino acid sequence VIL corresponding to amino acids 380 - 382 of HSECADH_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P13, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNT YNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADLQGEGL STTATAVITVTDTNDNPPIFNPTT corresponding to amino acids 1 - 379 of Q9UII8, which also corresponds to amino acids 1 - 379 of HSECADH_P13, and a second amino acid sequence VIL corresponding to amino acids 380 - 382 of HSECADH_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
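Each chimeric polypeptide above is described as contiguous segments, where a segment either maps onto a range of a known protein or is novel to the variant. A short consistency check on the recited coordinates can catch transcription slips. The following is an illustrative sketch only: the `Segment` type and the `check_chimera` function are hypothetical names introduced here, and the check covers only coordinate arithmetic, not sequence content.

```python
from typing import List, NamedTuple, Optional, Tuple

Range = Tuple[int, int]  # 1-based, inclusive (start, end)


class Segment(NamedTuple):
    variant_range: Range           # position within the variant polypeptide
    known_range: Optional[Range]   # position within the known protein; None if novel


def span(r: Range) -> int:
    """Number of residues covered by an inclusive range."""
    return r[1] - r[0] + 1


def check_chimera(segments: List[Segment]) -> List[str]:
    """Return a list of problems; an empty list means the coordinates agree.

    Checks that variant ranges start at 1 and follow one another without gaps
    ("contiguous and in a sequential order"), and that each mapped segment
    covers the same number of residues in both coordinate systems.
    """
    problems = []
    expected_start = 1
    for i, seg in enumerate(segments, start=1):
        if seg.variant_range[0] != expected_start:
            problems.append(f"segment {i}: expected start {expected_start}")
        if seg.known_range is not None and span(seg.known_range) != span(seg.variant_range):
            problems.append(f"segment {i}: mapped lengths differ")
        expected_start = seg.variant_range[1] + 1
    return problems


# HSECADH_P13 as recited: residues 1 - 379 shared with Q9UII7, then VIL at 380 - 382.
hsecadh_p13 = [Segment((1, 379), (1, 379)), Segment((380, 382), None)]
result = check_chimera(hsecadh_p13)
```

For the HSECADH_P13 coordinates the check returns an empty list, and the novel segment spans three residues, matching the length of VIL.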
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P13, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNT YNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADLQGEGL STTATAVITVTDTNDNPPIFNPTT corresponding to amino acids 1 - 379 of CAD1_HUMAN, which also corresponds to amino acids 1 - 379 of HSECADH_P13, and a second amino acid sequence VIL corresponding to amino acids 380 - 382 of HSECADH_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P14, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNT YNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE corresponding to amino acids 1 - 336 of Q9UII7, which also corresponds to amino acids 1 - 336 of HSECADH_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV corresponding to amino acids 337 - 373 of HSECADH_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSECADH_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV in HSECADH_P14.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P14, comprising a first amino acid sequence being at least 90% homologous to
MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNT YNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE corresponding to amino acids 1 - 336 of Q9UII8, which also corresponds to amino acids 1 - 336 of HSECADH_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV corresponding to amino acids 337 - 373 of HSECADH_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSECADH_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV in HSECADH_P14.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P14, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNT YNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE corresponding to amino acids 1 - 336 of CAD1_HUMAN, which also corresponds to amino acids 1 - 336 of HSECADH_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV corresponding to amino acids 337 - 373 of HSECADH_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSECADH_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV in HSECADH_P14.
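The graded homology thresholds recited above (70%, 80%, 85%, 90%, 95%) can be illustrated with a minimal sketch. It assumes, purely for illustration, that homology is approximated by ungapped percent identity between two sequences of equal length; in practice homology would be determined with sequence-alignment software, and that simplification is an assumption here, not the method of the specification. The names `percent_identity` and `highest_threshold_met` are hypothetical, and the substituted variant is invented for the example.

```python
# Illustrative only: ungapped percent identity as a stand-in for "homology".
# The tail sequence is the HSECADH_P14 tail quoted in the text; the variant
# below is a hypothetical sequence made up for this example.
TAIL_P14 = "VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV"
THRESHOLDS = (95, 90, 85, 80, 70)  # checked from strictest to loosest


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def highest_threshold_met(pct: float):
    """Return the strictest recited threshold that pct satisfies, or None."""
    for threshold in THRESHOLDS:
        if pct >= threshold:
            return threshold
    return None


# Hypothetical variant: one substitution (D -> A) at the sixth residue.
variant = TAIL_P14[:5] + "A" + TAIL_P14[6:]
pct = percent_identity(TAIL_P14, variant)
print(f"{pct:.1f}% identical; meets the {highest_threshold_met(pct)}% tier")
```

With a 37-residue tail, a single substitution still leaves the variant above the 95% tier, while four substitutions would drop it below 90%.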
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P15, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYT corresponding to amino acids 1 - 229 of Q9UII7, which also corresponds to amino acids 1 - 229 of HSECADH_P15, and a second amino acid sequence VSIS corresponding to amino acids 230 - 233 of HSECADH_P15, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P15, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYT corresponding to amino acids 1 - 229 of Q9UII8, which also corresponds to amino acids 1 - 229 of HSECADH_P15, and a second amino acid sequence VSIS corresponding to amino acids 230 - 233 of HSECADH_P15, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSECADH_P15, comprising a first amino acid sequence being at least 90% homologous to
MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYT corresponding to amino acids 1 - 229 of CAD1_HUMAN, which also corresponds to amino acids 1 - 229 of HSECADH_P15, and a second amino acid sequence VSIS corresponding to amino acids 230 - 233 of HSECADH_P15, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P5, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 12 - 55 of GILT_HUMAN, which also corresponds to amino acids 1 - 44 of T59832_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
VGTATGRAGWREQAPCRGTRLLLSPQTSQGKTRAPRGRCPCRVPGKTLFSSRRCGHTP SVPFRFRIPHLRGAAASTRLVPPKGSMSAYCVLLGQELGSPFVAQGTSSAAGQGPPACIL AATLDAFIPARAGLACLWDLLGRCPRG corresponding to amino acids 45 - 189 of T59832_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
VGTATGRAGWREQAPCRGTRLLLSPQTSQGKTRAPRGRCPCRVPGKTLFSSRRCGHTP SVPFRFRIPHLRGAAASTRLVPPKGSMSAYCVLLGQELGSPFVAQGTSSAAGQGPPACIL AATLDAFIPARAGLACLWDLLGRCPRG in T59832_P5. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P7, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 12 - 223 of GILT_HUMAN, which also corresponds to amino acids 1 - 212 of T59832_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRIFLALSLTLIVPWSQGWTRQRDQR corresponding to amino acids 213 - 238 of T59832_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTLIVPWSQGWTRQRDQR in T59832_P7.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P7, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 1 - 212 of BAC98466, which also corresponds to amino acids 1 - 212 of T59832_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRIFLALSLTLIVPWSQGWTRQRDQR corresponding to amino acids 213 - 238 of T59832_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTLIVPWSQGWTRQRDQR in T59832_P7.
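Each short tail sequence above is tied to a residue range in its variant, so the number of residues quoted should equal the size of that range. The following minimal sketch checks this for several tails quoted in this section. The helper names `range_length` and `sequence_fits_range` and the table `QUOTED_TAILS` are hypothetical; only the sequences and coordinates are taken from the text.

```python
# Consistency check: does a quoted sequence have exactly as many residues
# as the inclusive coordinate range it is said to correspond to?
# Helper names are hypothetical; data rows are copied from the text.

def range_length(start: int, end: int) -> int:
    """Number of residues in the inclusive, 1-based range start..end."""
    return end - start + 1


def sequence_fits_range(sequence: str, start: int, end: int) -> bool:
    """True when the residue count matches the stated range size."""
    residues = sequence.replace(" ", "")  # ignore line-wrap spaces
    return len(residues) == range_length(start, end)


QUOTED_TAILS = [
    # (variant, tail sequence, first residue, last residue)
    ("HSECADH_P13", "VIL", 380, 382),
    ("HSECADH_P14", "VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV", 337, 373),
    ("HSECADH_P15", "VSIS", 230, 233),
    ("T59832_P7", "VRIFLALSLTLIVPWSQGWTRQRDQR", 213, 238),
]

for variant, tail, start, end in QUOTED_TAILS:
    status = "consistent" if sequence_fits_range(tail, start, end) else "MISMATCH"
    print(f"{variant}: residues {start}-{end}: {status}")
```

A mismatch in such a check would point to a transcription error in either the sequence or the coordinates.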
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P7, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV corresponding to amino acids 1 - 90 of T59832_P7, and a second amino acid sequence being at least 90% homologous to MEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVC MEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYV PWVTVNGVRIFLALSLTLIVPWSQGWTRQRDQR corresponding to amino acids 1 - 148 of BAC85622, which also corresponds to amino acids 91 - 238 of T59832_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of T59832_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV of T59832_P7.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P7, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 1 - 212 of Q8WU77, which also corresponds to amino acids 1 - 212 of T59832_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRIFLALSLTLIVPWSQGWTRQRDQR corresponding to amino acids 213 - 238 of T59832_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTLIVPWSQGWTRQRDQR in T59832_P7. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P9, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 12 - 214 of GILT_HUMAN, which also corresponds to amino acids 1 - 203 of T59832_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204 - 244 of T59832_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P9, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 1 - 203 of BAC98466, which also corresponds to amino acids 1 - 203 of T59832_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204 - 244 of T59832_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P9, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV corresponding to amino acids 1 - 90 of T59832_P9, a second amino acid sequence being at least 90% homologous to
MEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVC MEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 1 - 113 of BAC85622, which also corresponds to amino acids 91 - 203 of T59832_P9, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204 - 244 of T59832_P9, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV of T59832_P9. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P9, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 1 - 203 of Q8WU77, which also corresponds to amino acids 1 - 203 of T59832_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204 - 244 of T59832_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P12, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVE corresponding to amino acids 12 - 141 of GILT_HUMAN, which also corresponds to amino acids 1 - 130 of T59832_P12, and a second amino acid sequence being at least 90% homologous to
CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLED QTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 173 - 261 of GILT_HUMAN, which also corresponds to amino acids 131 - 219 of T59832_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T59832_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130; and ending at any of amino acid numbers 131 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P12, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV corresponding to amino acids 1 - 90 of T59832_P12, a second amino acid sequence being at least 90% homologous to MEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVE corresponding to amino acids 1 - 40 of BAC85622, which also corresponds to amino acids 91 - 130 of T59832_P12, a third amino acid sequence being at least 90% homologous to
CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 72 - 122 of BAC85622, which also corresponds to amino acids 131 - 181 of T59832_P12, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
KPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 182 - 219 of T59832_P12, wherein said first, second, third and fourth amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of T59832_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV of T59832_P12. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T59832_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130; and ending at any of amino acid numbers 131 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK in T59832_P12.
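The edge-portion definition above is a sliding window of length n that must span the junction between residues 130 and 131 of T59832_P12: the window starts at 130 - x and ends at 131 + ((n-2) - x), for x from 0 to n-2. A minimal sketch of that enumeration follows. The function name `edge_windows` and the choice to discard windows that run past the ends of the protein are assumptions made for the example; the junction position and the protein length of 219 come from the text.

```python
# Enumerate the "edge portion" windows defined in the text.
# start = junction_left - x, end = (junction_left + 1) + ((n - 2) - x),
# for x = 0 .. n-2. Coordinates are 1-based and inclusive.
# edge_windows is a hypothetical helper name; dropping windows that fall
# outside the protein is an assumption of this sketch.

def edge_windows(junction_left: int, n: int, protein_length: int):
    """Return (start, end) pairs of length-n windows spanning the junction."""
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    windows = []
    for x in range(0, n - 1):  # x varies from 0 to n-2
        start = junction_left - x
        end = junction_left + 1 + ((n - 2) - x)
        if start >= 1 and end <= protein_length:
            windows.append((start, end))
    return windows


# T59832_P12: junction between residues 130 and 131 ("EC"), length 219.
for start, end in edge_windows(130, 10, 219):
    print(start, end)
```

Every window produced this way has exactly n residues and contains both junction residues, so for n = 10 there are nine candidate windows, from 122-131 through 130-139.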
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P12, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVE corresponding to amino acids 1 - 130 of Q8WU77, which also corresponds to amino acids 1 - 130 of T59832_P12, and a second amino acid sequence being at least 90% homologous to
CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLED QTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 162 - 250 of Q8WU77, which also corresponds to amino acids 131 - 219 of T59832_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T59832_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130; and ending at any of amino acid numbers 131 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P18, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 12 - 55 of GILT_HUMAN, which also corresponds to amino acids 1 - 44 of T59832_P18, and a second amino acid sequence being at least 90% homologous to CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLED QTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 173 - 261 of GILT_HUMAN, which also corresponds to amino acids 45 - 133 of T59832_P18, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T59832_P18, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44; and ending at any of amino acid numbers 45 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P18, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 1 - 44 of Q8WU77, which also corresponds to amino acids 1 - 44 of T59832_P18, and a second amino acid sequence being at least 90% homologous to
CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLED QTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 162 - 250 of Q8WU77, which also corresponds to amino acids 45 - 133 of T59832_P18, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T59832_P18, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44; and ending at any of amino acid numbers 45 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P18, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 1 - 44 of Q8NEI4, which also corresponds to amino acids 1 - 44 of T59832_P18, and a second amino acid sequence being at least 90% homologous to CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLED QTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 162 - 250 of Q8NEI4, which also corresponds to amino acids 45 - 133 of T59832_P18, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T59832_P18, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44; and ending at any of amino acid numbers 45 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMGRP5E_P4, comprising a first amino acid sequence being at least 90% homologous to MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLMGKKSTG ESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQPKALGNQQPSWDSED SSNFKDVGSKGK corresponding to amino acids 1 - 127 of GRP_HUMAN, which also corresponds to amino acids 1 - 127 of HUMGRP5E_P4, and a second amino acid sequence being at least 90% homologous to GSQREGRNPQLNQQ corresponding to amino acids 135 - 148 of GRP_HUMAN, which also corresponds to amino acids 128 - 141 of HUMGRP5E_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
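The chimeric polypeptides above are described as contiguous segments, each mapped from a range in a known protein to a range in the variant. A minimal sketch of a consistency check on such mappings follows: each segment should have the same length in both coordinate systems, and the variant ranges should tile the variant from residue 1 without gaps. The `Segment` type and the `variant_length` function are hypothetical names; the coordinates are those given in the text for HUMGRP5E_P4 and T59832_P18.

```python
# Check that the segments of a chimeric polypeptide are contiguous and
# length-consistent. Segment and variant_length are hypothetical names;
# coordinates are 1-based, inclusive, and copied from the text.
from typing import NamedTuple, Sequence


class Segment(NamedTuple):
    source: str      # known protein the segment corresponds to
    src_start: int   # first residue in the known protein
    src_end: int     # last residue in the known protein
    var_start: int   # first residue in the variant
    var_end: int     # last residue in the variant


def variant_length(segments: Sequence[Segment]) -> int:
    """Validate the segments and return the total variant length."""
    expected_start = 1
    for seg in segments:
        if seg.src_end - seg.src_start != seg.var_end - seg.var_start:
            raise ValueError(f"length mismatch in segment {seg}")
        if seg.var_start != expected_start:
            raise ValueError(f"gap or overlap before segment {seg}")
        expected_start = seg.var_end + 1
    return expected_start - 1


HUMGRP5E_P4 = [
    Segment("GRP_HUMAN", 1, 127, 1, 127),
    Segment("GRP_HUMAN", 135, 148, 128, 141),
]
T59832_P18 = [
    Segment("GILT_HUMAN", 12, 55, 1, 44),
    Segment("GILT_HUMAN", 173, 261, 45, 133),
]

print(variant_length(HUMGRP5E_P4))
print(variant_length(T59832_P18))
```

The jump in source coordinates between consecutive segments (for example from residue 127 to residue 135 of GRP_HUMAN) marks the region of the known protein that the variant skips, which is the junction the edge-portion embodiments are built around.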
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMGRP5E_P4, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KG, having a structure as follows: a sequence starting from any of amino acid numbers 127-x to 127; and ending at any of amino acid numbers 128 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMGRP5E_P5, comprising a first amino acid sequence being at least 90% homologous to MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLMGKKSTG ESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQPKALGNQQPSWDSED SSNFKDVGSKGK corresponding to amino acids 1 - 127 of GRP_HUMAN, which also corresponds to amino acids 1 - 127 of HUMGRP5E_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DSLLQVLNVKEGTPS corresponding to amino acids 128 - 142 of HUMGRP5E_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMGRP5E_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DSLLQVLNVKEGTPS in HUMGRP5E_P5.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALHKAGAVGGGVR corresponding to amino acids 1 - 110 of R11723_PEA_1_P6, and a second amino acid sequence being at least 90% homologous to
MYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHV RPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 1 - 112 of Q8IXM0, which also corresponds to amino acids 111 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALHKAGAVGGGVR of R11723_PEA_1_P6. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 1 - 83 of Q96AC2, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 1 - 83 of Q8N2G4, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
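The embodiments above repeatedly use tiered thresholds (70%, 80%, 85%, 90%, 95%). This passage does not state how percent homology is computed, so the Python sketch below uses a deliberately simple stand-in: position-wise identity between two ungapped sequences of equal length. That metric, the function names, and the example candidate sequence are all assumptions made for illustration only; an alignment-based score could give different values.

```python
TIERS = (70, 80, 85, 90, 95)  # thresholds named in the text


def percent_identity(reference, candidate):
    """Position-wise identity (in percent) of two equal-length sequences.

    Assumption: no gaps and no alignment step; this is only a stand-in for
    whatever homology measure is actually intended.
    """
    if not reference or len(reference) != len(candidate):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(reference, candidate) if a == b)
    return 100.0 * matches / len(reference)


def tiers_met(pct):
    """Return the thresholds from TIERS that the given percentage satisfies."""
    return [t for t in TIERS if pct >= t]


# First 20 residues of the R11723_PEA_1_P6 tail, used as a shortened reference.
reference = "SPCRGLAPGREEQRALHKAG"
# Hypothetical candidate with two substitutions (first and last residue).
candidate = "APCRGLAPGREEQRALHKAA"

pct = percent_identity(reference, candidate)
print(pct, tiers_met(pct))
```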
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 24 - 106 of BAC85518, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 1 - 64 of Q96AC2, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 1 - 64 of Q8N2G4, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of
R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG corresponding to amino acids 1 - 5 of R11723_PEA_1_P7, a second amino acid sequence being at least 90% homologous to IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 22 - 80 of BAC85273, which also corresponds to amino acids 6 - 64 of R11723_PEA_1_P7, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG of R11723_PEA_1_P7. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV
MEQSAG corresponding to amino acids 24 - 87 of BAC85518, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P13, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DTKRTNTLLFEMRHFAKQLTT corresponding to amino acids 64 - 84 of
R11723_PEA_1_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P13, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DTKRTNTLLFEMRHFAKQLTT in R11723_PEA_1_P13. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q8N2G4, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG corresponding to amino acids 1 - 5 of R11723_PEA_1_P10, a second amino acid sequence being at least 90% homologous to
IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 22 - 79 of BAC85273, which also corresponds to amino acids 6 - 63 of R11723_PEA_1_P10, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG of R11723_PEA_1_P10. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
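The three-part description of R11723_PEA_1_P10 above (head, middle segment, tail, "contiguous and in a sequential order") can be checked by simple concatenation. The Python sketch below joins the three stated sequences and reports the 1-based coordinates each one occupies, which should reproduce the ranges 1 - 5, 6 - 63 and 64 - 90 given in the text. The function name `assemble` and the segment labels are illustrative assumptions.

```python
# Segments as stated in the text for R11723_PEA_1_P10, in sequential order.
SEGMENTS = [
    ("head", "MWVLG"),
    ("middle", "IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA"),
    ("tail", "DRVSLCHEAGVQWNNFSTLQPLPPRLK"),
]


def assemble(segments):
    """Concatenate segments and return (full_sequence, spans).

    Each span is (label, start, end) with 1-based inclusive coordinates.
    """
    full = ""
    spans = []
    for label, seq in segments:
        start = len(full) + 1
        full += seq
        spans.append((label, start, len(full)))
    return full, spans


full, spans = assemble(SEGMENTS)
for label, start, end in spans:
    print(label, start, end)
print(len(full))
```

The same check applies to any of the two-part chimeric descriptions in this section by passing two segments instead of three.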
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 24 - 86 of BAC85518, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for D56406_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to
MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSKISKAHVPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEAMLTIYQLHKICHSRAFQHWE corresponding to amino acids 1 - 120 of NEUT_HUMAN, which also corresponds to amino acids 1 - 120 of D56406_PEA_1_P2, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ARWLTPVIPALWEAETGGSRGQEMETIPANT corresponding to amino acids 121 - 151 of D56406_PEA_1_P2, and a third amino acid sequence being at least 90% homologous to LIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY corresponding to amino acids 121 - 170 of NEUT_HUMAN, which also corresponds to amino acids 152 - 201 of D56406_PEA_1_P2, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of D56406_PEA_1_P2, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for ARWLTPVIPALWEAETGGSRGQEMETIPANT, corresponding to D56406_PEA_1_P2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for D56406_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MMAGMKIQLVCMLLLAFSSWSLC corresponding to amino acids 1 - 23 of NEUT_HUMAN, which also corresponds to amino acids 1 - 23 of D56406_PEA_1_P5, and a second amino acid sequence being at least 90% homologous to
SEEEMKALEADFLTNMHTSKISKAHVPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEAMLTIYQLHKICHSRAFQHWELIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY corresponding to amino acids 26 - 170 of NEUT_HUMAN, which also corresponds to amino acids 24 - 168 of D56406_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of D56406_PEA_1_P5, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise CS, having a structure as follows: a sequence starting from any of amino acid numbers 23-x to 24; and ending at any of amino acid numbers 24 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for D56406_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSK corresponding to amino acids 1 - 45 of NEUT_HUMAN, which also corresponds to amino acids 1 - 45 of D56406_PEA_1_P6, and a second amino acid sequence being at least 90% homologous to LIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY corresponding to amino acids 121 - 170 of NEUT_HUMAN, which also corresponds to amino acids 46 - 95 of D56406_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
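The D56406_PEA_1_P5 and D56406_PEA_1_P6 descriptions above each pair a range of NEUT_HUMAN with a range of the variant. The Python sketch below records those pairs as coordinate blocks, maps a NEUT_HUMAN position to its variant position, and locates the junction where two non-adjacent reference blocks meet. The data layout and the function names `ref_to_variant` and `junctions` are illustrative assumptions; only the coordinate ranges come from the text.

```python
# Blocks are (ref_start, ref_end, variant_start, variant_end), 1-based inclusive,
# taken from the coordinate ranges stated in the text.
P5_BLOCKS = [(1, 23, 1, 23), (26, 170, 24, 168)]
P6_BLOCKS = [(1, 45, 1, 45), (121, 170, 46, 95)]


def ref_to_variant(blocks, ref_pos):
    """Map a reference (NEUT_HUMAN) position to the variant position.

    Returns None when the reference residue is absent from the variant.
    """
    for ref_start, ref_end, var_start, _var_end in blocks:
        if ref_start <= ref_pos <= ref_end:
            return var_start + (ref_pos - ref_start)
    return None


def junctions(blocks):
    """Return variant positions (left, right) where consecutive blocks join
    residues that are not adjacent in the reference."""
    found = []
    for left, right in zip(blocks, blocks[1:]):
        if right[0] != left[1] + 1:
            found.append((left[3], right[2]))
    return found


print(ref_to_variant(P5_BLOCKS, 26))   # first residue after the skipped region
print(ref_to_variant(P5_BLOCKS, 24))   # residue skipped in the variant
print(junctions(P5_BLOCKS), junctions(P6_BLOCKS))
```

The junction positions found this way are the residue pairs that the edge-portion embodiments are built around: amino acids 23 and 24 for D56406_PEA_1_P5 and amino acids 45 and 46 for D56406_PEA_1_P6.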
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of D56406_PEA_1_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KL, having a structure as follows: a sequence starting from any of amino acid numbers 45-x to 46; and ending at any of amino acid numbers 46 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for H53393_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to
MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNRSKRSWMWN QFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDLFIINENTGDIQATKRLD REEKPVYILRAQAINRRTGRPVEPESEFIIKIHDINDNEPIFTKEVYTATVPEMSDVGTFVV QVTATDADDPTYGNSAKVVYSILQGQP YFSVESETGIIKTALLNMDRENREQYQ WIQA KDMGGQMGGLSGTTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADV GENAEIEYSITDGEGLDMFDVITDQETQEGIITVKKLLDFEKKKVYTLKVEASNPYVEPR FLYLGPFKDSATVRIVVEDVDEPPVFSKLAYILQIREDAQINTTIGSVTAQDPDAARNPV KYSVDRHTDMDRIFNIDSGNGSIFTSKLLDRETLLWHNITVIATEINNPKQSSRVPLYIKV LDVNDNAPEFAEFYETFVCEKAKADQLIQTLHAVDKDDPYSGHQFSFSLAPEAASGSNF TIQDNK conesponding to amino acids 1 - 543 of CAD6 HUMAN, which also conesponds to amino acids 1 - 543 of H53393 PEAJ P2, and a second amino acid sequence being at least 10%, optionally at least 80%, preferably at least 85%, more preferably at least 90%> and most preferably at least 95% homologous to a polypeptide having the sequence GK conesponding to amino acids 544 - 545 of H53393 PEAJ P2, wherein said first and second amino acid sequences are contiguous and in a sequential order. 
According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for H53393 PEAJ P3, comprising a first amino acid sequence being at least 90 % homologous to MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNRSKRSWMWN QFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDLFIINENTGDIQATKRLD REEK VYILRAQAINRRTGRPVEPESEFID HDINDNEPIFTKEVYTATVPEMSDVGTFVV QVTATDADDPTYGNSAKVVYSILQGQPYFSVESETGIIKTALLNMDRENREQYQWIQA KDMGGQMGGLSGTTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADV GENAEIEYSITDGEGLDMFDVITDQETQEGIITVKJOXDFEKKKVYTLKVEASNPYVEPR FLYLGPFKDSATVMVVEDVDEPPVFSKLAYILQIREDAQINTTIGSVTAQDPDAARNPV KYSVDRHTDMDRIFNIDSGNGSIFTSKLLDRETLLWHN1TVIATELNNPKQSSRVPLYIKV LDVNDNAPEFAEFYETFVCEKAKADQ conesponding to amino acids 1 - 504 of CAD6 HUMAN, which also conesponds to amino acids 1 - 504 of H53393_PEA_1_P3, and a second amino acid sequence being at least 70%., optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RFGFSLS corresponding to amino acids 505 - 511 of H53393 PEAJ P3, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of H53393_PEA_1_P3, comprising a polypeptide being at least 70%>, optionally at least about 80%, preferably at least about 85%>, more preferably at least about 90% and most preferably at least about 95%. homologous to the sequence RFGFSLS in H53393_PEA_1_P3. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for H53393 PEA _1_P6, comprising a first amino acid sequence being at least 90 % homologous to
MRTYRYFLLIJWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNRSKRSWMWN QFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDLFIINENTGDIQATKRLD REEKPVYILRAQAINRRTGRPVEPESEFIIKIHDINDNEPIFTKEVYTATVPEMSDVGTFVV QVTATDADDPTYGNSAKVVYSILQGQPYFSVESETGIIKTALLNMDRENREQYQWIQA KDMGGQMGGLSGTTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADV GENAEIEYSITDGEGLDMFDVITDQETQEGIITVKK conesponding to amino acids 1 - 333 of CAD6 HUMAN, which also conesponds to amino acids 1 - 333 of H53393 JΕAJJP6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VMPLLKHHTE conesponding to amino acids 334 - 343 of H53393_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of H53393_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90%> and most preferably at least about 95%> homologous to the sequence VMPLLKHHTE in H53393 PEA 1 P6. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSU40434 PEAJ JM2, comprising a first amino acid sequence being at least 90 % homologous to
MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQEAAPLDGVLANPPNISS LSPRQLLGFPCAEVSGLSTERVRELAVALAQKNVKLSTEQLRCLAHRLSEPPEDLDALP LDLLLFLNPDAFSGPQACTRFFS ITKANVDLLPRGAPERQRLLPAALACWGVRGSLLS EADVRALGGLACDLPGRFVAESAEVLLPRLVSCPGPLDQDQQEAARAALQGGGPPYGP PSTWSVSTMDALRGLLPVLGQPIIRSIPQGIVAAWRQRSSRDPSWRQPERTILRPRFRRE VEKT ACPSGKKAREIDESLIF YKKWELEAC VD AALLATQMDRVN Al PFT Y EQLD VLKH KLDELYPQGYPESVIQHLGYLFLKMSPEDIRKWNVTSLETLKALLEVNKGHEMSPQVA TLIDRFVKGRGQLDKDTLDTLTAFYPGYLCSLSPEELSSVPPSSIW corresponding to amino acids 1 - 458 of Q14859, which also conesponds to amino acids 1 - 458 of HSU40434_PEAJ_P12. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSU40434 PEA 1 P12, comprising a first amino acid sequence being at least 90 % homologous to MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQ conesponding to amino acids 1 - 43 of Q9BTR2, which also conesponds to amino acids 1 - 43 of
HSU40434_PEA_1_P12, second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%>, more preferably at least 90% and most preferably at least 95%. homologous to a polypeptide having the sequence E conesponding to amino acids 44 - 44 of HSU40434 PEA 1 P12, and a third amino acid sequence being at least 90 % homologous to AAPLDGVLANPPNISSLSPRQLLGFPCAEVSGLSTERVRELAVALAQKΪWKLSTEQLRC LAHRLSEPPEDLDALPLDLLLFLNPDAFSGPQACTRFFSRITKANVDLLPRGAPERQRLL PAALACWGVRGSLLSEADVRALGGLACDLPGRFVAESAEVLLPRLVSCPGPLDQDQQE AARAALQGGGPPYGPPSTWSVSTMDALRGLLPVLGQPIIRSIPQGIVAAWRQRSSRDPS WRQPERTILRPRFRREVEKTACPSGKKAREIDESLIFYKKWELEACVDAALLATQMDRV NAIPFTYEQLDVLKHKLDELYPQGYPESVIQHLGYLFLKMSPEDIRKWNVTSLETLKAL LEVNKGHEMSPQVATLIDRFVKGRGQLDKDTLDTLTAFYPGYLCSLSPEELSSVPPSSIW corresponding to amino acids 44 - 457 of Q9BTR2, which also corresponds to amino acids 45 - 458 of HSU40434JΕAJ JP12, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of HSU40434JΕAJ JM2, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95%. homologous to the sequence encoding for E, conesponding to HSU40434_PEA_1_P12. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M77904_P2, comprising a first amino acid sequence being at least 90 %> homologous to
MLSIKSGERJVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNRTFIWD VKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQ EGVKMALHLPWFHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDEL MTWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGNMAG NFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNES corresponding to amino acids 67 - 341 of Q8WU91, which also corresponds to amino acids 1 - 275 of M77904_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
NKIYVVDLSNERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCD DLTRLWMNVEKTISCTDHRYCQRKSYSLQVPSDILHLPVELHDFSWKLLVPKDRLSLVL VPAQKLQQHTHEKPCNTSFSYLVASAIPSQDLYFGSFCPGGSIKQIQVKQNISVTLRTFAP SFQQEASRQGLTVSFIPYFKEEGVFTVTPDTKSKVYLRTPNWDRGLPSLTSVSWNISVPR DQVACLTFFKERSGWCQTGRAFMIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISN CSPTSGKQLDLLFSVTLTPRTVDLTVILIAAVGGGVLLLSALGLΠCCVKKKJ KKTNKGP AVGIYNGNINTEMPRQPKKFQKGPJ DNDSHVYAVIEDTM GHLLQDSSGSFLQPEVD
TYRPFQGTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEPYTFSHPNNGDVSSKDT DIPLLNTQEPMEPAE conesponding to amino acids 276 - 770 of M77904 P2, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M77904 P2, comprising a polypeptide being at least 70%), optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95%> homologous to the sequence NKIYVVDLSNERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCD DLTRLWMNVEKΗSCTDHRYCQRKSYSLQVPSDILHLPVELHDFSWKLLVPKDRLSLVL VPAQKLQQHTHEKPCNTSFSYLVASAIPSQDLYFGSFCPGGSIKQIQVKQNISVTLRTFAP SFQQEASRQGLTVSFIPYFKEEGVFTVTPDTKSKVYLRTPNWDRGLPSLTSVSWNISVPR DQVACLTFFKERSGVVCQTGRAFMIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISN CSPTSGKQLDLLFSVTLTPRTVDLTVILIAA VGGGVLLLSALGLIICCVKKKKKKTNKGP AVGIYNGNINTEMPRQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVD TYRPFQGTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEPYTFSHPNNGDVSSKDT DIPLLNTQEPMEPAE in M77904 P2. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M77904 P2, comprising a first amino acid sequence being at least 90 % homologous to
MLSIKSGER1VFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNRTFIWD VKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRJKMQ EGVKMALHLPWFHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDEL MTWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGNMAG NFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLSNERAMSLTIEPRPVKQ SRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTISCTDHRYCQR KSYSLQVPSDILHLPVELHDFSWKLLVPKDRLSLVLVPAQKLQQHTHEKPCNTSFSYLV ASAIPSQDLYFGSFCPGGSIKQIQVKQNISVTLRTFAPSFQQEASRQGLTVSFIPYFKEEGV FTVTPDTKSKVYLRTPNWDRGLPSLTSVSWNISVPRDQVACLTFFKERSGVVCQTGRAF MIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLFSVTLTPRTVDL TVILIAAVGGGVLLLSALGLIICCVKKKXKKTNKGPAVGIYNGNP TEMPRQPKKFQKG RKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVDTYRPFQGTMGVCPPSPPTICSRAP TAKLATEEPPPRSPPESESEPYTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE conesponding to amino acids 67 - 836 of Q96QU7, which also conesponds to amino acids 1 - 770 of M77904 P2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M77904 P4, comprising a first amino acid sequence being at least 90 % homologous to
MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVIS KRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNR TFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSR IKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFP EDELMTWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGN MAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNES conesponding to amino acids 1 - 341 of Q8WU91, which also conesponds to amino acids 1 - 341 of M77904_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NKIYWDLSNERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCD DLTRLWMNVEKTISTPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLT PVJ ALWEAKAGGSLEVRSSRPAWPTW conesponding to amino acids 342 - 487 of M77904 P4, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M77904_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95%> homologous to the sequence
NKIYWDLSNERAMSLTIEPRPVKQSRKFVPGCFVC LESRTCSSNLTLTSGSKHKISFLCD DLTRLWMNVEKTISTPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLT PVIPALWEAKAGGSLEVRSSRPAWPTW in M77904_P4. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M77904 P4, comprising a first amino acid sequence being at least 90 % homologous to MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVIS KRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKJ IDCMSGPCPFGEVQLQPSTSLLPTLNR
TFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSR IKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFP EDELMTWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGN MAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLSNERAMSLTIEPRP VKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTIS corresponding to amino acids 1 - 416 of Q9FI5V8, which also corresponds to amino acids 1 - 416 of M77904_P4, and a second amino acid sequence being at least 70%, optionally at least 80%), preferably at least 85%, more preferably at least 90% and most preferably at least 95%> homologous to a polypeptide having the sequence TPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGS LEVRSSRPAWPTW conesponding to amino acids 417 - 487 of M77904_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M77904_P4, comprising a polypeptide being at least 70%), optionally at least about 80%, preferably at least about 85%>, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
TPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGS LEVRSSRPAWPTW in M77904_P4. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M77904_P4, comprising a first amino acid sequence being at least 90% homologous to
MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVIS KRHITMLSIKSGERJVFTFSCQSPENHFVIEIQK IDCMSGPCPFGEVQLQPSTSLLPTLNR TFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSR IKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFP EDELMTWQFWPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGN MAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLSNERAMSLTIEPRP VKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKJSFLCDDLTRLWMNVEKTIS conesponding to amino acids 1 - 416 of Q96QU7, which also conesponds to amino acids 1 - 416 of M77904 P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGS LEVRSSRPAWPTW conesponding to amino acids 417 - 487 of M77904JM, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M77904 P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGS LEVRSSRPAWPTW in M77904JM. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M77904_P5, comprising a first amino acid sequence being at least 90 % homologous to
MIIQEQRTRAEE1FSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLFSVTLTPRTVDL TVILIAAVGGGVLLLSALGLIICCVKJΗ JKKXTNKGPAVGIYNGNINTEMPRQPKKFQKG RKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVDTYRPFQGTMGVCPPSPPTICSRAP TAKLATEEPPPRSPPESESEPYTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE conesponding to amino acids 606 - 836 of Q96QU7, which also conesponds to amino acids 1 - 231 of M77904_P5. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M77904_P5, comprising a first amino acid sequence being at least 90 %> homologous to
MIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLFSVTLTPRTVDL TVILIAAVGGGVLLLSALGLπCC\T XKKKKUSIKGPAVGIYNGN TEMPRQPKKFQKG RKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVDTYRPFQGTMGVCPPSPPTICSRAP TAKLATEEPPPRSPPESESEPYTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE conesponding to amino acids 419 - 649 of Q9H8C2, which also conesponds to amino acids 1 - 231 of M77904_P5. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M77904_P7, comprising a first amino acid sequence being at least 90 % homologous to
MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVIS KRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNR TFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSR IKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKR corresponding to amino acids 1 - 219 of Q8WU91, which also corresponds to amino acids 1 - 219 of M77904_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EKAPPCYLIRLKHTRSSLF corresponding to amino acids 220 - 238 of M77904_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M77904_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EKAPPCYLIRLKHTRSSLF in M77904_P7. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M77904_P7, comprising a first amino acid sequence being at least 90% homologous to
MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVIS KRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNR TFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSR IKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKR corresponding to amino acids 1 - 219 of Q9H5V8, which also corresponds to amino acids 1 - 219 of M77904_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EKAPPCYLIRLKHTRSSLF corresponding to amino acids 220 - 238 of M77904_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M77904_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EKAPPCYLIRLKHTRSSLF in M77904_P7. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M77904_P7, comprising a first amino acid sequence being at least 90% homologous to
MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVIS KRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNR TFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSR IKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKR corresponding to amino acids 1 - 219 of Q96QU7, which also corresponds to amino acids 1 - 219 of M77904_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EKAPPCYLIRLKHTRSSLF corresponding to amino acids 220 - 238 of M77904_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M77904_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EKAPPCYLIRLKHTRSSLF in M77904_P7. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z25299_PEA_2_P2, comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCP GKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLNPPNFCEMDGQCKRDLK CCMGMCGKSCVSPVK corresponding to amino acids 1 - 131 of ALK1_HUMAN, which also corresponds to amino acids 1 - 131 of Z25299_PEA_2_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GKQGMRAH corresponding to amino acids 132 - 139 of Z25299_PEA_2_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z25299_PEA_2_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKQGMRAH in Z25299_PEA_2_P2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z25299_PEA_2_P3, comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCP GKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLNPPNFCEMDGQCKRDLK CCMGMCGKSCVSPVK corresponding to amino acids 1 - 131 of ALK1_HUMAN, which also corresponds to amino acids 1 - 131 of Z25299_PEA_2_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEKRHHKQLRDQEVDPLEMRRHSAG corresponding to amino acids 132 - 156 of
Z25299_PEA_2_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z25299_PEA_2_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEKRHHKQLRDQEVDPLEMRRHSAG in Z25299_PEA_2_P3. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z25299_PEA_2_P7, comprising a first amino acid sequence being at least 90% homologous to
MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCP GKKRCCPDTCGIKCLDPVDTPNP corresponding to amino acids 1 - 81 of ALK1_HUMAN, which also corresponds to amino acids 1 - 81 of Z25299_PEA_2_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RGSLGSAQ corresponding to amino acids 82 - 89 of Z25299_PEA_2_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z25299_PEA_2_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RGSLGSAQ in Z25299_PEA_2_P7. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z25299_PEA_2_P10, comprising a first amino acid sequence being at least 90% homologous to
MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCP GKKRCCPDTCGIKCLDPVDTPNPT corresponding to amino acids 1 - 82 of ALK1_HUMAN, which also corresponds to amino acids 1 - 82 of Z25299_PEA_2_P10. According to preferred embodiments of the present invention, there is provided an antibody capable of specifically binding to an epitope of an amino acid sequence as described herein. Optionally the amino acid sequence corresponds to a bridge, edge portion, tail, head or insertion as described herein. Optionally the antibody is capable of differentiating between a splice variant having said epitope and a corresponding known protein. According to preferred embodiments of the present invention, there is provided a kit for detecting ovarian cancer, comprising a kit detecting overexpression of a splice variant as described herein. Optionally the kit comprises a NAT-based technology. Optionally the kit further comprises at least one primer pair capable of selectively hybridizing to a nucleic acid sequence as described herein. Optionally the kit further comprises at least one oligonucleotide capable of selectively hybridizing to a nucleic acid sequence as described herein. Optionally the kit comprises an antibody as described herein. Optionally the kit further comprises at least one reagent for performing an ELISA or a Western blot. According to preferred embodiments of the present invention, there is provided a method for detecting ovarian cancer, comprising detecting overexpression of a splice variant as described herein. Optionally detecting overexpression is performed with a NAT-based technology. Optionally detecting overexpression is performed with an immunoassay. Optionally the immunoassay comprises an antibody as described herein.
According to preferred embodiments of the present invention, there is provided a biomarker capable of detecting ovarian cancer, comprising any of the above nucleic acid sequences or a fragment thereof, or any of the above amino acid sequences or a fragment thereof. According to preferred embodiments of the present invention, there is provided a method for screening for ovarian cancer, comprising detecting ovarian cancer cells with a biomarker or an antibody or a method or assay as described herein. According to preferred embodiments of the present invention, there is provided a method for diagnosing ovarian cancer, comprising detecting ovarian cancer cells with a biomarker or an antibody or a method or assay as described herein. According to preferred embodiments of the present invention, there is provided a method for monitoring disease progression and/or treatment efficacy and/or relapse of ovarian cancer, comprising detecting ovarian cancer cells with a biomarker or an antibody or a method or assay as described herein. According to preferred embodiments of the present invention, there is provided a method of selecting a therapy for ovarian cancer, comprising detecting ovarian cancer cells with a biomarker or an antibody or a method or assay as described herein and selecting a therapy according to said detection. According to preferred embodiments of the present invention, preferably any of the above nucleic acid and/or amino acid sequences further comprises any sequence having at least about 70%, preferably at least about 80%, more preferably at least about 90%, most preferably at least about 95% homology thereto. Unless otherwise noted, all experimental data relate to variants of the present invention, named according to the segment being tested (as expression was tested through RT-PCR as described).
All nucleic acid sequences and/or amino acid sequences shown herein as embodiments of the present invention relate to their isolated form, as isolated polynucleotides (including for all transcripts), oligonucleotides (including for all segments, amplicons and primers), peptides (including for all tails, bridges, insertions or heads, optionally including other antibody epitopes as described herein) and/or polypeptides (including for all proteins). It should be noted that oligonucleotide and polynucleotide, or peptide and polypeptide, may optionally be used interchangeably.
BRIEF DESCRIPTION OF DRAWINGS
Figure 1 is a schematic summary of the cancer biomarkers selection engine and the wet validation stages.
Figure 2 is a schematic illustration depicting grouping of transcripts of a given cluster based on presence or absence of unique sequence regions.
Figure 3 is a schematic summary of quantitative real-time PCR analysis.
Figure 4 is a schematic presentation of the oligonucleotide-based microarray fabrication.
Figure 5 is a schematic summary of the oligonucleotide-based microarray experimental flow.
Figure 6 shows cancer and cell-line vs. normal tissue expression.
Figure 7 shows expression of segment8 in H61775 in cancerous vs. non-cancerous tissues.
Figure 8 shows expression of segment8 in H61775 in normal tissues.
Figure 9 shows cancer and cell-line vs. normal tissue expression.
Figure 10 is a histogram showing overexpression of T junc11-17 transcripts in cancerous ovary samples relative to the normal samples.
Figure 11 is a histogram showing expression of T junc11-17 transcripts in normal tissues.
Figure 12 shows cancer and cell-line vs. normal tissue expression.
Figure 13 is a histogram showing overexpression of HUMGRP5Ejunc3-7 transcripts in cancerous ovary samples relative to the normal samples.
Figure 14 is a histogram showing expression of HUMGRP5Ejunc3-7 transcripts in normal tissues.
Figure 15 shows cancer and cell-line vs. normal tissue expression.
Figure 16 is a histogram showing overexpression of R11723 seg13 transcripts in cancerous ovary samples relative to the normal PM samples.
Figure 17 is a histogram showing expression of R11723 seg13 transcripts in normal tissue samples.
Figure 18 is a histogram showing overexpression of R11723 junc11-18 transcripts in cancerous ovary samples relative to the normal samples.
Figure 19 is a histogram showing expression of R11723 junc11-18 transcripts in normal tissue samples.
Figure 20 shows cancer and cell-line vs. normal tissue expression.
Figure 21 is a histogram showing overexpression of H53393 seg13 transcripts in cancerous ovary samples relative to the normal samples.
Figure 22 is a histogram showing overexpression of H53393 junc21-22 transcripts in cancerous ovary samples relative to the normal samples.
Figure 23 shows cancer and cell-line vs. normal tissue expression.
Figure 24 shows cancer and cell-line vs. normal tissue expression.
Figure 25 shows cancer and cell-line vs. normal tissue expression.
Figure 26 is a histogram showing overexpression of Z25299 junc13-14-21 transcripts in cancerous ovary samples relative to the normal samples.
Figures 27A and 27B are histograms showing overexpression of Z25299 seg20 transcripts in cancerous ovary samples relative to the normal samples (27A) or expression in normal tissues (27B).
Figures 28A and 28B are histograms showing overexpression of Z25299 seg23 transcripts in cancerous ovary samples relative to the normal samples (28A) or expression in normal tissues (28B).
Figure 29 shows cancer and cell-line vs. normal tissue expression.
Figure 30 is a histogram showing down regulation of T39971 junc23-33R transcripts in cancerous ovary samples relative to the normal samples.
Figure 31 is a histogram showing expression of T39971 junc23-33R transcripts in normal tissues.
Figure 32 shows cancer and cell-line vs. normal tissue expression.
Figures 33A and 33B are histograms showing down regulation of Z44808 junc8-11 transcripts in cancerous ovary samples relative to the normal samples (33A) or expression in normal tissues (33B).
Figure 34 shows cancer and cell-line vs. normal tissue expression.
Figure 35 shows cancer and cell-line vs. normal tissue expression.
Figure 36 shows cancer and cell-line vs. normal tissue expression.
Figure 37 shows cancer and cell-line vs. normal tissue expression.
Figure 38 shows cancer and cell-line vs. normal tissue expression.
Figure 39 shows cancer and cell-line vs. normal tissue expression.
Figure 40 shows cancer and cell-line vs. normal tissue expression.
Figure 41 shows cancer and cell-line vs. normal tissue expression.
Figure 42 shows cancer and cell-line vs. normal tissue expression.
Figure 43 is a histogram showing differential expression of a variety of transcripts in cancerous ovary samples relative to the normal samples.
Figure 44 shows cancer and cell-line vs. normal tissue expression.
DESCRIPTION OF PREFERRED EMBODIMENTS
The present invention is of novel markers for ovarian cancer that are both sensitive and accurate. Biomolecular sequences (amino acid and/or nucleic acid sequences) uncovered using the methodology of the present invention and described herein can be efficiently utilized as tissue or pathological markers and/or as drugs or drug targets for treating or preventing a disease. Furthermore, at least certain of these markers are able to distinguish between various types of ovarian cancer, such as ovarian epithelial tumors (serous, mucinous, endometrioid, clear cell, and Brenner tumor), ovarian germ-cell tumors (teratoma, dysgerminoma, endodermal sinus tumor, and embryonal carcinoma) and ovarian stromal tumors (originating from granulosa, theca, Sertoli, Leydig, and collagen-producing stromal cells), alone or in combination. These markers are differentially expressed, and preferably overexpressed, in ovarian cancer specifically, as opposed to normal ovarian tissue. The measurement of these markers, alone or in combination, in patient samples provides information that the diagnostician can correlate with a probable diagnosis of ovarian cancer. The markers of the present invention, alone or in combination, show a high degree of differential detection between ovarian cancer and non-cancerous states. The markers of the present invention, alone or in combination, can be used for prognosis, prediction, screening, early diagnosis, staging, therapy selection and treatment monitoring of ovarian cancer. For example, optionally and preferably, these markers may be used for staging ovarian cancer and/or monitoring the progression of the disease. Furthermore, the markers of the present invention, alone or in combination, can be used for detection of the source of metastasis found in anatomical places other than the ovary.
Also, one or more of the markers may optionally be used in combination with one or more other ovarian cancer markers (other than those described herein). According to an optional embodiment of the present invention, such a combination may be used to differentiate between various types of ovarian cancer, such as ovarian epithelial tumors (serous, mucinous, endometrioid, clear cell, and Brenner tumor), ovarian germ-cell tumors (teratoma, dysgerminoma, endodermal sinus tumor, and embryonal carcinoma) and ovarian stromal tumors (originating from granulosa, theca, Sertoli, Leydig, and collagen-producing stromal cells). These markers are specifically released to the bloodstream under conditions of ovarian cancer (or one of the above indicative conditions), and/or are otherwise expressed at a much higher level and/or specifically expressed in ovarian cancer tissue or cells, and/or tissue or cells under one of the above indicative conditions. The measurement of these markers, alone or in combination, in patient samples provides information that the diagnostician can correlate with a probable diagnosis of ovarian cancer and/or a condition that is indicative of a higher risk for ovarian cancer. The present invention therefore also relates to diagnostic assays for ovarian cancer, and methods of use of such markers for detection of ovarian cancer, optionally and preferably in a sample taken from a subject (patient), which is more preferably some type of blood sample. In another embodiment, the present invention relates to bridges, tails, heads and/or insertions, and/or analogs, homologs and derivatives of such peptides. Such bridges, tails, heads and/or insertions are described in greater detail below with regard to the Examples. As used herein a "tail" refers to a peptide sequence at the end of an amino acid sequence that is unique to a splice variant according to the present invention.
Therefore, a splice variant having such a tail may optionally be considered as a chimera, in that at least a first portion of the splice variant is typically highly homologous (often 100% identical) to a portion of the corresponding known protein, while at least a second portion of the variant comprises the tail. As used herein a "head" refers to a peptide sequence at the beginning of an amino acid sequence that is unique to a splice variant according to the present invention. Therefore, a splice variant having such a head may optionally be considered as a chimera, in that at least a first portion of the splice variant comprises the head, while at least a second portion is typically highly homologous (often 100% identical) to a portion of the corresponding known protein. As used herein "an edge portion" refers to a connection between two portions of a splice variant according to the present invention that were not joined in the wild type or known protein. An edge may optionally arise due to a join between the above "known protein" portion of a variant and the tail, for example, and/or may occur if an internal portion of the wild type sequence is no longer present, such that two portions of the sequence are now contiguous in the splice variant that were not contiguous in the known protein. A "bridge" may optionally be an edge portion as described above, but may also include a join between a head and a "known protein" portion of a variant, or a join between a tail and a "known protein" portion of a variant, or a join between an insertion and a "known protein" portion of a variant.
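The "known protein portion plus tail" structure defined above can be illustrated with a minimal Python sketch. The function name `split_known_and_tail` is hypothetical, and the exact-match prefix comparison is an assumption made for illustration only; the text allows the shared portion to be merely highly homologous, which would call for a proper alignment instead. The residues used are those printed in this text for Z25299_PEA_2_P7 (81 shared residues followed by the tail RGSLGSAQ) and for the start of ALK1_HUMAN.

```python
def split_known_and_tail(variant: str, known: str) -> tuple[int, str]:
    """Return (length of the shared N-terminal portion, unique tail of the variant).

    Assumption: the shared portion is an exact prefix match. If the first tail
    residue happens to equal the next residue of the known protein, this simple
    comparison would place the boundary one position too late.
    """
    shared = 0
    for v, k in zip(variant, known):
        if v != k:
            break
        shared += 1
    return shared, variant[shared:]


# First 81 residues as printed in this text for ALK1_HUMAN / Z25299_PEA_2_P7.
SHARED_81 = (
    "MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCP"
    "GKKRCCPDTCGIKCLDPVDTPNP"
)
# The known protein continues with TRRKPGKCPV...; the variant ends with its tail.
known_fragment = SHARED_81 + "TRRKPGKCPV"
variant_p7 = SHARED_81 + "RGSLGSAQ"

shared_len, tail = split_known_and_tail(variant_p7, known_fragment)
print(shared_len, tail)
```

A head could be found the same way by comparing the reversed sequences, again only under the exact-match assumption.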
Optionally and preferably, a bridge between a tail or a head or a unique insertion, and a "known protein" portion of a variant, comprises at least about 10 amino acids, more preferably at least about 20 amino acids, most preferably at least about 30 amino acids, and even more preferably at least about 40 amino acids, in which at least one amino acid is from the tail/head/insertion and at least one amino acid is from the "known protein" portion of a variant. Also optionally, the bridge may comprise any number of amino acids from about 10 to about 40 amino acids (for example, 10, 11, 12, 13...37, 38, 39, 40 amino acids in length, or any number in between). It should be noted that a bridge cannot be extended beyond the length of the sequence in either direction, and it should be assumed that every bridge description is to be read in such manner that the bridge length does not extend beyond the sequence itself. Furthermore, bridges are described with regard to a sliding window in certain contexts below. 
For example, certain descriptions of the bridges feature the following format: a bridge between two edges (in which a portion of the known protein is not present in the variant) may optionally be described as follows: a bridge portion of CONTIG-NAME_P1 (representing the name of the protein), comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise XX (2 amino acids in the center of the bridge, one from each end of the edge), having a structure as follows (numbering according to the sequence of CONTIG-NAME_P1): a sequence starting from any of amino acid numbers 49-x to 49 (for example); and ending at any of amino acid numbers 50 + ((n-2) - x) (for example), in which x varies from 0 to n-2. In this example, it should also be read as including bridges in which n is any number of amino acids between 10-50 amino acids in length. Furthermore, the bridge polypeptide cannot extend beyond the sequence, so it should be read such that 49-x (for example) is not less than 1, nor 50 + ((n-2) - x) (for example) greater than the total sequence length. In another embodiment, this invention provides antibodies specifically recognizing the splice variants and polypeptide fragments thereof of this invention. Preferably such antibodies differentially recognize splice variants of the present invention but do not recognize a corresponding known protein (such known proteins are discussed with regard to their splice variants in the Examples below).
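The sliding-window bridge format described above (start at 49-x, end at 50 + ((n-2) - x), with x from 0 to n-2) can be sketched in Python as follows. The function name `bridge_windows` and its parameters are hypothetical names chosen for illustration; positions are 1-based and inclusive, as in the text, and windows that would run past either end of the sequence are dropped.

```python
def bridge_windows(n: int, left_pos: int, seq_len: int) -> list[tuple[int, int]]:
    """Enumerate (start, end) windows of length n spanning a junction.

    The junction lies between residues left_pos and left_pos + 1
    (49 and 50 in the example from the text). Positions are 1-based, inclusive.
    """
    if n < 2:
        raise ValueError("a bridge needs at least one residue on each side")
    right_pos = left_pos + 1
    windows = []
    for x in range(n - 1):                   # x varies from 0 to n-2
        start = left_pos - x                 # "49 - x"
        end = right_pos + ((n - 2) - x)      # "50 + ((n-2) - x)"
        if start >= 1 and end <= seq_len:    # never extend beyond the sequence
            windows.append((start, end))
    return windows


def bridge_peptides(seq: str, n: int, left_pos: int) -> list[str]:
    """Return the peptide for each valid window of the given sequence."""
    return [seq[s - 1:e] for s, e in bridge_windows(n, left_pos, len(seq))]


# Example from the text: junction between residues 49 and 50, n = 10,
# in a hypothetical protein of 500 residues.
for start, end in bridge_windows(10, 49, 500):
    print(start, end)
```

Every window has length n and contains both residues flanking the junction, so each candidate peptide includes at least one residue from each side of the edge.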
In another embodiment, this invention provides an isolated nucleic acid molecule encoding for a splice variant according to the present invention, having a nucleotide sequence as set forth in any one of the sequences listed herein, or a sequence complementary thereto. In another embodiment, this invention provides an isolated nucleic acid molecule, having a nucleotide sequence as set forth in any one of the sequences listed herein, or a sequence complementary thereto. In another embodiment, this invention provides an oligonucleotide of at least about 12 nucleotides, specifically hybridizable with the nucleic acid molecules of this invention. In another embodiment, this invention provides vectors, cells, liposomes and compositions comprising the isolated nucleic acids of this invention. In another embodiment, this invention provides a method for detecting a splice variant according to the present invention in a biological sample, comprising: contacting a biological sample with an antibody specifically recognizing a splice variant according to the present invention under conditions whereby the antibody specifically interacts with the splice variant in the biological sample but does not recognize known corresponding proteins (wherein the known protein is discussed with regard to its splice variant(s) in the Examples below), and detecting said interaction; wherein the presence of an interaction correlates with the presence of a splice variant in the biological sample. In another embodiment, this invention provides a method for detecting a splice variant nucleic acid sequence in a biological sample, comprising: hybridizing the isolated nucleic acid molecules or oligonucleotide fragments of at least about a minimum length to a nucleic acid material of a biological sample and detecting a hybridization complex; wherein the presence of a hybridization complex correlates with the presence of a splice variant nucleic acid sequence in the biological sample.
According to the present invention, the splice variants described herein are non-limiting examples of markers for diagnosing ovarian cancer. Each splice variant marker of the present invention can be used alone or in combination, for various uses, including but not limited to, prognosis, prediction, screening, early diagnosis, determination of progression, therapy selection and treatment monitoring of ovarian cancer. According to optional but preferred embodiments of the present invention, any marker according to the present invention may optionally be used alone or in combination. Such a combination may optionally comprise a plurality of markers described herein, optionally including any subcombination of markers, and/or a combination featuring at least one other marker, for example a known marker. Furthermore, such a combination may optionally and preferably be used as described above with regard to determining a ratio between a quantitative or semi-quantitative measurement of any marker described herein to any other marker described herein, and/or any other known marker, and/or any other marker. With regard to such a ratio between any marker described herein (or a combination thereof) and a known marker, more preferably the known marker comprises the "known protein" as described in greater detail below with regard to each cluster or gene. According to other preferred embodiments of the present invention, a splice variant protein or a fragment thereof, or a splice variant nucleic acid sequence or a fragment thereof, may be featured as a biomarker for detecting ovarian cancer and/or an indicative condition, such that a biomarker may optionally comprise any of the above.
According to still other preferred embodiments, the present invention optionally and preferably encompasses any amino acid sequence or fragment thereof encoded by a nucleic acid sequence corresponding to a splice variant protein as described herein. Any oligopeptide or peptide relating to such an amino acid sequence or fragment thereof may optionally also (additionally or alternatively) be used as a biomarker, including but not limited to the unique amino acid sequences of these proteins that are depicted as tails, heads, insertions, edges or bridges. The present invention also optionally encompasses antibodies capable of recognizing, and/or being elicited by, such oligopeptides or peptides. The present invention also optionally and preferably encompasses any nucleic acid sequence or fragment thereof, or amino acid sequence or fragment thereof, corresponding to a splice variant of the present invention as described above, optionally for any application. Non-limiting examples of methods or assays are described below. The present invention also relates to kits based upon such diagnostic methods or assays.
Nucleic acid sequences and Oligonucleotides
Various embodiments of the present invention encompass nucleic acid sequences described hereinabove; fragments thereof, sequences hybridizable therewith, sequences homologous thereto, sequences encoding similar polypeptides with different codon usage, altered sequences characterized by mutations, such as deletion, insertion or substitution of one or more nucleotides, either naturally occurring or artificially induced, either randomly or in a targeted fashion. The present invention encompasses nucleic acid sequences described herein; fragments thereof, sequences hybridizable therewith, sequences homologous thereto [e.g., at least 50 %, at least 55 %, at least 60 %, at least 65 %, at least 70 %, at least 75 %, at least 80 %, at least 85 %, at least 95 % or more, say 100 %, identical to the nucleic acid sequences set forth below], sequences encoding similar polypeptides with different codon usage, altered sequences characterized by mutations, such as deletion, insertion or substitution of one or more nucleotides, either naturally occurring or man-induced, either randomly or in a targeted fashion. The present invention also encompasses homologous nucleic acid sequences (i.e., which form a part of a polynucleotide sequence of the present invention) which include sequence regions unique to the polynucleotides of the present invention. In cases where the polynucleotide sequences of the present invention encode previously unidentified polypeptides, the present invention also encompasses novel polypeptides or portions thereof, which are encoded by the isolated polynucleotide and respective nucleic acid fragments thereof described hereinabove. The terms "nucleic acid fragment", "oligonucleotide" and "polynucleotide" are used herein interchangeably to refer to a polymer of nucleic acids.
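The percent-identity thresholds listed above can be illustrated with a short sketch. It assumes the two sequences have already been aligned to equal length by some external tool, with "-" marking gaps; the function names and the gap convention are illustrative assumptions and are not taken from the present disclosure, which does not specify how identity is to be computed.

```python
# Minimal sketch: percent identity over a pre-computed pairwise alignment.
# Assumption (not from the source): '-' denotes a gap and both strings
# have the same aligned length.

# Identity thresholds as enumerated in the text above.
THRESHOLDS = (50, 55, 60, 65, 70, 75, 80, 85, 95, 100)


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Return percent identity of two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    a, b = seq_a.upper(), seq_b.upper()
    # A column counts as a match only if both residues are equal and not gaps.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    # Columns that are gaps in both sequences carry no information.
    columns = sum(1 for x, y in zip(a, b) if not (x == "-" and y == "-"))
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


def highest_threshold_met(identity: float):
    """Return the largest listed threshold that the identity reaches, or None."""
    met = [t for t in THRESHOLDS if identity >= t]
    return max(met) if met else None
```

In this sketch gapped columns count against identity; other conventions (for example, ignoring gapped columns) would give different percentages for the same alignment.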
A polynucleotide sequence of the present invention refers to a single- or double-stranded nucleic acid sequence which is isolated and provided in the form of an RNA sequence, a complementary polynucleotide sequence (cDNA), a genomic polynucleotide sequence and/or a composite polynucleotide sequence (e.g., a combination of the above). As used herein the phrase "complementary polynucleotide sequence" refers to a sequence which results from reverse transcription of messenger RNA using a reverse transcriptase or any other RNA-dependent DNA polymerase. Such a sequence can be subsequently amplified in vivo or in vitro using a DNA-dependent DNA polymerase. As used herein the phrase "genomic polynucleotide sequence" refers to a sequence derived (isolated) from a chromosome and thus it represents a contiguous portion of a chromosome. As used herein the phrase "composite polynucleotide sequence" refers to a sequence which is composed of genomic and cDNA sequences. A composite sequence can include some exonal sequences required to encode the polypeptide of the present invention, as well as some intronic sequences interposing therebetween. The intronic sequences can be of any source, including of other genes, and typically will include conserved splicing signal sequences. Such intronic sequences may further include cis-acting expression regulatory elements. Preferred embodiments of the present invention encompass oligonucleotide probes.
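The definition of a "complementary polynucleotide sequence" given above can be sketched at the sequence level: the first-strand cDNA is the reverse complement of the mRNA, written in the DNA alphabet. The function name and the example input are hypothetical, and the sketch models only base pairing, not the enzymology of reverse transcription.

```python
# Minimal sketch: sequence of first-strand cDNA derived from an mRNA.
# Assumption (not from the source): sequences are plain strings written
# 5'->3' using only the letters A, C, G and U.

_RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """Return the 5'->3' DNA sequence complementary to the given mRNA."""
    rna = mrna.upper()
    unknown = set(rna) - set(_RNA_TO_CDNA)
    if unknown:
        raise ValueError(f"unexpected RNA symbols: {sorted(unknown)}")
    # Complement each base, then reverse so the result also reads 5'->3'.
    return "".join(_RNA_TO_CDNA[base] for base in reversed(rna))
```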
An example of an oligonucleotide probe which can be utilized by the present invention is a single-stranded polynucleotide which includes a sequence complementary to the unique sequence region of any variant according to the present invention, including but not limited to a nucleotide sequence coding for an amino acid sequence of a bridge, tail, head and/or insertion according to the present invention, and/or the equivalent portions of any nucleotide sequence given herein (including but not limited to a nucleotide sequence of a node, segment or amplicon described herein). Alternatively, an oligonucleotide probe of the present invention can be designed to hybridize with a nucleic acid sequence encompassed by any of the above nucleic acid sequences, particularly the portions specified above, including but not limited to a nucleotide sequence coding for an amino acid sequence of a bridge, tail, head and/or insertion according to the present invention, and/or the equivalent portions of any nucleotide sequence given herein (including but not limited to a nucleotide sequence of a node, segment or amplicon described herein). Oligonucleotides designed according to the teachings of the present invention can be generated according to any oligonucleotide synthesis method known in the art, such as enzymatic synthesis or solid-phase synthesis. Equipment and reagents for executing solid-phase synthesis are commercially available from, for example, Applied Biosystems. Any other means for such synthesis may also be employed; the actual synthesis of the oligonucleotides is well within the capabilities of one skilled in the art and can be accomplished via established methodologies as detailed in, for example, "Molecular Cloning: A Laboratory Manual" Sambrook et al., (1989); "Current Protocols in Molecular Biology" Volumes I-III Ausubel, R. M., ed.
(1994); Ausubel et al., "Current Protocols in Molecular Biology", John Wiley and Sons, Baltimore, Maryland (1989); Perbal, "A Practical Guide to Molecular Cloning", John Wiley & Sons, New York (1988) and "Oligonucleotide Synthesis" Gait, M. J., ed. (1984) utilizing solid-phase chemistry, e.g. cyanoethyl phosphoramidite followed by deprotection, desalting and purification by, for example, an automated trityl-on method or HPLC. Oligonucleotides used according to this aspect of the present invention are those having a length selected from a range of about 10 to about 200 bases, preferably about 15 to about 150 bases, more preferably about 20 to about 100 bases, most preferably about 20 to about 50 bases. Preferably, the oligonucleotide of the present invention features at least 17, at least 18, at least 19, at least 20, at least 22, at least 25, at least 30 or at least 40 bases specifically hybridizable with the biomarkers of the present invention. The oligonucleotides of the present invention may comprise heterocyclic nucleosides consisting of purine and pyrimidine bases, bonded in a 3' to 5' phosphodiester linkage. Preferably used oligonucleotides are those modified at one or more of the backbone, internucleoside linkages or bases, as is broadly described hereinunder. Specific examples of preferred oligonucleotides useful according to this aspect of the present invention include oligonucleotides containing modified backbones or non-natural internucleoside linkages. Oligonucleotides having modified backbones include those that retain a phosphorus atom in the backbone, as disclosed in U.S. Pat. Nos. 4,469,863; 4,476,301; 5,023,243; 5,177,196; 5,188,897; 5,264,423; 5,276,019; 5,278,302; 5,286,717; 5,321,131; 5,399,676; 5,405,939; 5,453,496; 5,455,233; 5,466,677; 5,476,925; 5,519,126; 5,536,821; 5,541,306; 5,550,111; 5,563,253; 5,571,799; 5,587,361; and 5,625,050.
Preferred modified oligonucleotide backbones include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkyl phosphotriesters, methyl and other alkyl phosphonates including 3'-alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates including 3'-amino phosphoramidate and aminoalkylphosphoramidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, and boranophosphates having normal 3'-5' linkages, 2'-5' linked analogs of these, and those having inverted polarity wherein the adjacent pairs of nucleoside units are linked 3'-5' to 5'-3' or 2'-5' to 5'-2'. Various salts, mixed salts and free acid forms can also be used. Alternatively, modified oligonucleotide backbones that do not include a phosphorus atom therein have backbones that are formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatomic or heterocyclic internucleoside linkages. These include those having morpholino linkages (formed in part from the sugar portion of a nucleoside); siloxane backbones; sulfide, sulfoxide and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; alkene containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S and CH2 component parts, as disclosed in U.S. Pat. Nos. 5,034,506; 5,166,315; 5,185,444; 5,214,134; 5,216,141; 5,235,033; 5,264,562; 5,264,564; 5,405,938; 5,434,257; 5,466,677; 5,470,967; 5,489,677; 5,541,307; 5,561,225; 5,596,086; 5,602,240; 5,610,289; 5,608,046; 5,618,704; 5,623,070; 5,663,312; 5,633,360; 5,677,437; and 5,677,439.
Other oligonucleotides which can be used according to the present invention are those in which both the sugar and the internucleoside linkage, i.e., the backbone, of the nucleotide units are replaced with novel groups. The base units are maintained for complementation with the appropriate polynucleotide target. An example of such an oligonucleotide mimetic is peptide nucleic acid (PNA). United States patents that teach the preparation of PNA compounds include, but are not limited to, U.S. Pat. Nos. 5,539,082; 5,714,331; and 5,719,262, each of which is herein incorporated by reference. Other backbone modifications which can be used in the present invention are disclosed in U.S. Pat. No. 6,303,374. Oligonucleotides of the present invention may also include base modifications or substitutions. As used herein, "unmodified" or "natural" bases include the purine bases adenine (A) and guanine (G), and the pyrimidine bases thymine (T), cytosine (C) and uracil (U). Modified bases include but are not limited to other synthetic and natural bases such as 5-methylcytosine (5-me-C), 5-hydroxymethyl cytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl uracil and cytosine, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other 8-substituted adenines and guanines, 5-halo particularly 5-bromo, 5-trifluoromethyl and other 5-substituted uracils and cytosines, 7-methylguanine and 7-methyladenine, 8-azaguanine and 8-azaadenine, 7-deazaguanine and 7-deazaadenine and 3-deazaguanine and 3-deazaadenine.
Further bases particularly useful for increasing the binding affinity of the oligomeric compounds of the invention include 5-substituted pyrimidines, 6-azapyrimidines and N-2, N-6 and O-6 substituted purines, including 2-aminopropyladenine, 5-propynyluracil and 5-propynylcytosine. 5-methylcytosine substitutions have been shown to increase nucleic acid duplex stability by 0.6-1.2 °C and are presently preferred base substitutions, even more particularly when combined with 2'-O-methoxyethyl sugar modifications. Another modification of the oligonucleotides of the invention involves chemically linking to the oligonucleotide one or more moieties or conjugates which enhance the activity, cellular distribution or cellular uptake of the oligonucleotide. Such moieties include but are not limited to lipid moieties such as a cholesterol moiety, cholic acid, a thioether, e.g., hexyl-S-tritylthiol, a thiocholesterol, an aliphatic chain, e.g., dodecandiol or undecyl residues, a phospholipid, e.g., di-hexadecyl-rac-glycerol or triethylammonium 1,2-di-O-hexadecyl-rac-glycero-3-H-phosphonate, a polyamine or a polyethylene glycol chain, or adamantane acetic acid, a palmityl moiety, or an octadecylamine or hexylamino-carbonyl-oxycholesterol moiety, as disclosed in U.S. Pat. No. 6,303,374. It is not necessary for all positions in a given oligonucleotide molecule to be uniformly modified, and in fact more than one of the aforementioned modifications may be incorporated in a single compound or even at a single nucleoside within an oligonucleotide. It will be appreciated that oligonucleotides of the present invention may include further modifications for more efficient use as diagnostic agents and/or to increase bioavailability and therapeutic efficacy and to reduce cytotoxicity.
To enable cellular expression of the polynucleotides of the present invention, a nucleic acid construct according to the present invention may be used, which includes at least a coding region of one of the above nucleic acid sequences, and further includes at least one cis-acting regulatory element. As used herein, the phrase "cis-acting regulatory element" refers to a polynucleotide sequence, preferably a promoter, which binds a trans-acting regulator and regulates the transcription of a coding sequence located downstream thereto. Any suitable promoter sequence can be used by the nucleic acid construct of the present invention. Preferably, the promoter utilized by the nucleic acid construct of the present invention is active in the specific cell population transformed. Examples of cell type-specific and/or tissue-specific promoters include promoters such as the albumin promoter, which is liver specific; lymphoid-specific promoters [Calame et al., (1988) Adv. Immunol. 43:235-275], in particular promoters of T-cell receptors [Winoto et al., (1989) EMBO J. 8:729-733] and immunoglobulins [Banerji et al. (1983) Cell 33:729-740]; neuron-specific promoters such as the neurofilament promoter [Byrne et al. (1989) Proc. Natl. Acad. Sci. USA 86:5473-5477]; pancreas-specific promoters [Edlund et al. (1985) Science 230:912-916]; or mammary gland-specific promoters such as the milk whey promoter (U.S. Pat. No. 4,873,316 and European Application Publication No. 264,166). The nucleic acid construct of the present invention can further include an enhancer, which can be adjacent or distant to the promoter sequence and can function in up-regulating the transcription therefrom. The nucleic acid construct of the present invention preferably further includes an appropriate selectable marker and/or an origin of replication. Preferably, the nucleic acid construct utilized is a shuttle vector, which can propagate both in E.
coli (wherein the construct comprises an appropriate selectable marker and origin of replication) and be compatible for propagation in cells, or integration in a gene and a tissue of choice. The construct according to the present invention can be, for example, a plasmid, a bacmid, a phagemid, a cosmid, a phage, a virus or an artificial chromosome. Examples of suitable constructs include, but are not limited to, pcDNA3, pcDNA3.1 (+/-), pGL3, pZeoSV2 (+/-), pDisplay, pEF/myc/cyto and pCMV/myc/cyto, each of which is commercially available from Invitrogen Co. (www.invitrogen.com). Examples of retroviral vector and packaging systems are those sold by Clontech, San Diego, Calif., including Retro-X vectors pLNCX and pLXSN, which permit cloning into multiple cloning sites and in which the transgene is transcribed from a CMV promoter. Vectors derived from Mo-MuLV are also included, such as pBabe, where the transgene will be transcribed from the 5' LTR promoter. Currently preferred in vivo nucleic acid transfer techniques include transfection with viral or non-viral constructs, such as adenovirus, lentivirus, Herpes simplex I virus, or adeno-associated virus (AAV) and lipid-based systems. Useful lipids for lipid-mediated transfer of the gene are, for example, DOTMA, DOPE, and DC-Chol [Tonkinson et al., Cancer Investigation, 14(1): 54-65 (1996)]. The most preferred constructs for use in gene therapy are viruses, most preferably adenoviruses, AAV, lentiviruses, or retroviruses. A viral construct such as a retroviral construct includes at least one transcriptional promoter/enhancer or locus-defining element(s), or other elements that control gene expression by other means such as alternate splicing, nuclear RNA export, or post-translational modification of messenger. Such vector constructs also include a packaging signal, long terminal repeats (LTRs) or portions thereof, and positive and negative strand primer binding sites appropriate to the virus used, unless it is already present in the viral construct.
In addition, such a construct typically includes a signal sequence for secretion of the peptide from a host cell in which it is placed. Preferably the signal sequence for this purpose is a mammalian signal sequence or the signal sequence of the polypeptide variants of the present invention. Optionally, the construct may also include a signal that directs polyadenylation, as well as one or more restriction sites and a translation termination sequence. By way of example, such constructs will typically include a 5' LTR, a tRNA binding site, a packaging signal, an origin of second-strand DNA synthesis, and a 3' LTR or a portion thereof. Other vectors can be used that are non-viral, such as cationic lipids, polylysine, and dendrimers.
Hybridization assays
Detection of a nucleic acid of interest in a biological sample may optionally be effected by hybridization-based assays using an oligonucleotide probe (non-limiting examples of probes according to the present invention were previously described). Traditional hybridization assays include PCR, RT-PCR, Real-time PCR, RNase protection, in-situ hybridization, primer extension, Southern blots (DNA detection), dot or slot blots (DNA, RNA), and Northern blots (RNA detection) (NAT type assays are described in greater detail below). More recently, PNAs have been described (Nielsen et al. 1999, Current Opin. Biotechnol. 10:71-75). Other detection methods include kits containing probes on a dipstick setup and the like. Hybridization-based assays which allow the detection of a variant of interest (i.e., DNA or RNA) in a biological sample rely on the use of oligonucleotides which can be 10, 15, 20, or 30 to 100 nucleotides long, preferably from 10 to 50, more preferably from 40 to 50 nucleotides long. Thus, the isolated polynucleotides (oligonucleotides) of the present invention are preferably hybridizable with any of the herein described nucleic acid sequences under moderate to stringent hybridization conditions. Moderate to stringent hybridization conditions are characterized by a hybridization solution containing, for example, 10 % dextran sulfate, 1 M NaCl, 1 % SDS and 5 x 10^6 cpm 32P-labeled probe, at 65 °C, with a final wash solution of 0.2 x SSC and 0.1 % SDS and a final wash at 65 °C, whereas moderate hybridization is effected using a hybridization solution containing 10 % dextran sulfate, 1 M NaCl, 1 % SDS and 5 x 10^6 cpm 32P-labeled probe, at 65 °C, with a final wash solution of 1 x SSC and 0.1 % SDS and a final wash at 50 °C. More generally, hybridization of short nucleic acids (below 200 bp in length, e.g.
17-40 bp in length) can be effected using the following exemplary hybridization protocols, which can be modified according to the desired stringency: (i) hybridization solution of 6 x SSC and 1 % SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS, 100 μg/ml denatured salmon sperm DNA and 0.1 % nonfat dried milk, hybridization temperature of 1 - 1.5 °C below the Tm, final wash solution of 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS at 1 - 1.5 °C below the Tm; (ii) hybridization solution of 6 x SSC and 0.1 % SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS, 100 μg/ml denatured salmon sperm DNA and 0.1 % nonfat dried milk, hybridization temperature of 2 - 2.5 °C below the Tm, final wash solution of 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS at 1 - 1.5 °C below the Tm, final wash solution of 6 x SSC, and final wash at 22 °C; (iii) hybridization solution of 6 x SSC and 1 % SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS, 100 μg/ml denatured salmon sperm DNA and 0.1 % nonfat dried milk, hybridization temperature. The detection of hybrid duplexes can be carried out by a number of methods. Typically, hybridization duplexes are separated from unhybridized nucleic acids and the labels bound to the duplexes are then detected. Such labels refer to radioactive, fluorescent, biological or enzymatic tags or labels of standard use in the art. A label can be conjugated to either the oligonucleotide probes or the nucleic acids derived from the biological sample. Probes can be labeled according to numerous well known methods. Non-limiting examples of radioactive labels include 3H, 14C, 32P, and 35S. Non-limiting examples of detectable markers include ligands, fluorophores, chemiluminescent agents, enzymes, and antibodies.
Other detectable markers for use with probes, which can enable an increase in sensitivity of the method of the invention, include biotin and radio-nucleotides. It will become evident to the person of ordinary skill that the choice of a particular label dictates the manner in which it is bound to the probe. For example, oligonucleotides of the present invention can be labeled subsequent to synthesis, by incorporating biotinylated dNTPs or rNTP, or some similar means (e.g., photo-cross-linking a psoralen derivative of biotin to RNAs), followed by addition of labeled streptavidin (e.g., phycoerythrin-conjugated streptavidin) or the equivalent. Alternatively, when fluorescently-labeled oligonucleotide probes are used, fluorescein, lissamine, phycoerythrin, rhodamine (Perkin Elmer Cetus), Cy2, Cy3, Cy3.5, Cy5, Cy5.5, Cy7, FluorX (Amersham) and others [e.g., Kricka et al. (1992), Academic Press San Diego, Calif.] can be attached to the oligonucleotides. Those skilled in the art will appreciate that wash steps may be employed to wash away excess target DNA or probe as well as unbound conjugate. Further, standard heterogeneous assay formats are suitable for detecting the hybrids using the labels present on the oligonucleotide primers and probes. It will be appreciated that a variety of controls may be usefully employed to improve accuracy of hybridization assays. For instance, samples may be hybridized to an irrelevant probe and treated with RNAse A prior to hybridization, to assess false hybridization. Although the present invention is not specifically dependent on the use of a label for the detection of a particular nucleic acid sequence, such a label might be beneficial, by increasing the sensitivity of the detection. Furthermore, it enables automation. Probes can be labeled according to numerous well known methods. As commonly known, radioactive nucleotides can be incorporated into probes of the invention by several methods.
Non-limiting examples of radioactive labels include 3H, 14C, 32P, and 35S. Probes of the invention can be utilized with naturally occurring sugar-phosphate backbones as well as modified backbones including phosphorothioates, dithionates, alkyl phosphonates and α-nucleotides and the like. Probes of the invention can be constructed of either ribonucleic acid (RNA) or deoxyribonucleic acid (DNA), and preferably of DNA.
NAT Assays
Detection of a nucleic acid of interest in a biological sample may also optionally be effected by NAT-based assays, which involve nucleic acid amplification technology, such as PCR for example (or variations thereof such as real-time PCR for example). As used herein, a "primer" defines an oligonucleotide which is capable of annealing to
(hybridizing with) a target sequence, thereby creating a double-stranded region which can serve as an initiation point for DNA synthesis under suitable conditions. Amplification of a selected, or target, nucleic acid sequence may be carried out by a number of suitable methods. See generally Kwoh et al., 1990, Am. Biotechnol. Lab. 8:14. Numerous amplification techniques have been described and can be readily adapted to suit particular needs of a person of ordinary skill. Non-limiting examples of amplification techniques include polymerase chain reaction (PCR), ligase chain reaction (LCR), strand displacement amplification (SDA), transcription-based amplification, the Qβ replicase system and NASBA (Kwoh et al., 1989, Proc. Natl. Acad. Sci. USA 86:1173-1177; Lizardi et al., 1988, BioTechnology 6:1197-1202; Malek et al., 1994, Methods Mol. Biol., 28:253-260; and Sambrook et al., 1989, supra). The terminology "amplification pair" (or "primer pair") refers herein to a pair of oligonucleotides (oligos) of the present invention, which are selected to be used together in amplifying a selected nucleic acid sequence by one of a number of types of amplification processes, preferably a polymerase chain reaction. Other types of amplification processes include ligase chain reaction, strand displacement amplification, or nucleic acid sequence-based amplification, as explained in greater detail below. As commonly known in the art, the oligos are designed to bind to a complementary sequence under selected conditions. In one particular embodiment, a nucleic acid sample from a patient is amplified under conditions which favor the amplification of the most abundant differentially expressed nucleic acid. In one preferred embodiment, RT-PCR is carried out on an mRNA sample from a patient under conditions which favor the amplification of the most abundant mRNA.
In another preferred embodiment, the amplification of the differentially expressed nucleic acids is carried out simultaneously. It will be realized by a person skilled in the art that such methods could be adapted for the detection of differentially expressed proteins instead of differentially expressed nucleic acid sequences. The nucleic acid (i.e. DNA or RNA) for practicing the present invention may be obtained according to well known methods. Oligonucleotide primers of the present invention may be of any suitable length, depending on the particular assay format and the particular needs and targeted genomes employed. Optionally, the oligonucleotide primers are at least 12 nucleotides in length, preferably between 15 and 24 nucleotides, and they may be adapted to be especially suited to a chosen nucleic acid amplification system. As commonly known in the art, the oligonucleotide primers can be designed by taking into consideration the melting point of hybridization thereof with its targeted sequence (Sambrook et al., 1989, Molecular Cloning - A Laboratory Manual, 2nd Edition, CSH Laboratories; Ausubel et al., 1989, in Current Protocols in Molecular Biology, John Wiley & Sons Inc., N.Y.). It will be appreciated that antisense oligonucleotides may be employed to quantify expression of a splice isoform of interest. Such detection is effected at the pre-mRNA level. Essentially the ability to quantitate transcription from a splice site of interest can be effected based on splice site accessibility. Oligonucleotides may compete with splicing factors for the splice site sequences. Thus, low activity of the antisense oligonucleotide is indicative of splicing activity. The polymerase chain reaction and other nucleic acid amplification reactions are well known in the art (various non-limiting examples of these reactions are described in greater detail below).
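The primer length guidance and the melting-point consideration mentioned above can be sketched as a simple screening step. The sketch below uses the Wallace rule (2 °C per A/T, 4 °C per G/C) as a rough melting temperature estimate for short oligonucleotides; that rule is an assumption introduced here for illustration and is not the method prescribed by the cited manuals, and the function names are hypothetical.

```python
# Minimal sketch: screen a candidate primer by length and estimate its Tm.
# Assumption (not from the source): the Wallace rule is used as a rough
# Tm estimate; real designs typically use nearest-neighbor models.


def wallace_tm(primer: str) -> int:
    """Approximate Tm in degrees C as 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    if not p or set(p) - set("ACGT"):
        raise ValueError("primer must be a non-empty string over A, C, G, T")
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def length_class(primer: str, minimum: int = 12, preferred=(15, 24)) -> str:
    """Classify primer length against the ranges stated in the text."""
    n = len(primer)
    if n < minimum:
        return "too short"
    if preferred[0] <= n <= preferred[1]:
        return "preferred"
    return "acceptable"
```

A caller might, for example, keep only candidates for which `length_class` returns "preferred" and then compare the estimated Tm values of the forward and reverse primers.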
The pair of oligonucleotides according to this aspect of the present invention are preferably selected to have compatible melting temperatures (Tm), e.g., melting temperatures which differ by less than 7 °C, preferably less than 5 °C, more preferably less than 4 °C, most preferably less than 3 °C, ideally between 3 °C and 0 °C.
Polymerase Chain Reaction (PCR): The polymerase chain reaction (PCR), as described in U.S. Pat. Nos. 4,683,195 and 4,683,202 to Mullis and Mullis et al., is a method of increasing the concentration of a segment of target sequence in a mixture of genomic DNA without cloning or purification. This technology provides one approach to the problems of low target sequence concentration. PCR can be used to directly increase the concentration of the target to an easily detectable level. This process for amplifying the target sequence involves the introduction of a molar excess of two oligonucleotide primers, which are complementary to their respective strands of the double-stranded target sequence, to the DNA mixture containing the desired target sequence. The mixture is denatured and then allowed to hybridize. Following hybridization, the primers are extended with polymerase so as to form complementary strands. The steps of denaturation, hybridization (annealing), and polymerase extension (elongation) can be repeated as often as needed, in order to obtain relatively high concentrations of a segment of the desired target sequence. The length of the segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and, therefore, this length is a controllable parameter. Because the desired segments of the target sequence become the dominant sequences (in terms of concentration) in the mixture, they are said to be "PCR-amplified."
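Two points from the passage above lend themselves to a short sketch: the Tm compatibility of a primer pair, and the fact that the amplified segment is defined by the relative positions of the two primers. The code below is an in-silico illustration only, assuming exact primer matches on a single linear template given as its top strand; the function names and default threshold argument are hypothetical.

```python
# Minimal sketch: primer-pair Tm compatibility and in-silico amplicon lookup.
# Assumptions (not from the source): exact matching only, a linear template
# given as its top strand 5'->3', and Tm values supplied by the caller.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tm_compatible(tm_a: float, tm_b: float, max_diff: float = 7.0) -> bool:
    """True if the two melting temperatures differ by less than max_diff."""
    return abs(tm_a - tm_b) < max_diff


def amplicon(template: str, forward: str, reverse: str):
    """Return the segment bounded by the primers, or None if either is absent."""
    top = template.upper()
    start = top.find(forward.upper())
    if start < 0:
        return None
    # The reverse primer is written 5'->3' on the bottom strand, so its
    # reverse complement marks the right-hand end on the top strand.
    right = reverse_complement(reverse)
    end = top.find(right, start + len(forward))
    if end < 0:
        return None
    return top[start:end + len(right)]
```

Moving either primer along the template changes the value returned by `amplicon`, which is the sense in which segment length is a controllable parameter.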
Ligase Chain Reaction (LCR or LAR): The ligase chain reaction [LCR; sometimes referred to as "Ligase Amplification Reaction" (LAR)] has developed into a well-recognized alternative method of amplifying nucleic acids. In LCR, four oligonucleotides, namely two adjacent oligonucleotides which uniquely hybridize to one strand of target DNA and a complementary set of adjacent oligonucleotides which hybridize to the opposite strand, are mixed and DNA ligase is added to the mixture. Provided that there is complete complementarity at the junction, ligase will covalently link each set of hybridized molecules. Importantly, in LCR, two probes are ligated together only when they base-pair with sequences in the target sample, without gaps or mismatches. Repeated cycles of denaturation and ligation amplify a short segment of DNA. LCR has also been used in combination with PCR to achieve enhanced detection of single-base changes: see for example Segev, PCT Publication No. WO9001069 A1 (1990). However, because the four oligonucleotides used in this assay can pair to form two short ligatable fragments, there is the potential for the generation of target-independent background signal. The use of LCR for mutant screening is limited to the examination of specific nucleic acid positions.
Self-Sustained Synthetic Reaction (3SR/NASBA): The self-sustained sequence replication reaction (3SR) is a transcription-based in vitro amplification system that can exponentially amplify RNA sequences at a uniform temperature. The amplified RNA can then be utilized for mutation detection. In this method, an oligonucleotide primer is used to add a phage RNA polymerase promoter to the 5' end of the sequence of interest.
In a cocktail of enzymes and substrates that includes a second primer, reverse transcriptase, RNase H, RNA polymerase and ribo- and deoxyribonucleoside triphosphates, the target sequence undergoes repeated rounds of transcription, cDNA synthesis and second-strand synthesis to amplify the area of interest. The use of 3SR to detect mutations is kinetically limited to screening small segments of DNA (e.g., 200-300 base pairs).
Q-Beta (Qβ) Replicase: In this method, a probe which recognizes the sequence of interest is attached to the replicatable RNA template for Qβ replicase. A previously identified major problem with false positives resulting from the replication of unhybridized probes has been addressed through use of a sequence-specific ligation step. However, available thermostable DNA ligases are not effective on this RNA substrate, so the ligation must be performed by T4 DNA ligase at low temperatures (37 degrees C). This prevents the use of high temperature as a means of achieving specificity as in the LCR. The ligation event can be used to detect a mutation at the junction site, but not elsewhere.
A successful diagnostic method must be very specific. A straightforward method of controlling the specificity of nucleic acid hybridization is by controlling the temperature of the reaction. While the 3SR/NASBA and Qβ systems are all able to generate a large quantity of signal, one or more of the enzymes involved in each cannot be used at high temperature (i.e., > 55 degrees C). Therefore, the reaction temperatures cannot be raised to prevent non-specific hybridization of the probes. If probes are shortened in order to make them melt more easily at low temperatures, the likelihood of having more than one perfect match in a complex genome increases. For these reasons, PCR and LCR currently dominate the research field in detection technologies.
The basis of the amplification procedure in the PCR and LCR is the fact that the products of one cycle become usable templates in all subsequent cycles, consequently doubling the population with each cycle. The final yield of any such doubling system can be expressed as:
$(1+X)^n = y$, where "X" is the mean efficiency (percent copied in each cycle), "n" is the number of cycles, and "y" is the overall efficiency, or yield of the reaction. If every copy of a target DNA is utilized as a template in every cycle of a polymerase chain reaction, then the mean efficiency is 100 %. If 20 cycles of PCR are performed, then the yield will be $2^{20}$, or 1,048,576 copies of the starting material. If the reaction conditions reduce the mean efficiency to 85 %, then the yield in those 20 cycles will be only $1.85^{20}$, or 220,513 copies of the starting material. In other words, a PCR running at 85 % efficiency will yield only 21 % as much final product, compared to a reaction running at 100 % efficiency. A reaction that is reduced to 50 % mean efficiency will yield less than 1 % of the possible product. In practice, routine polymerase chain reactions rarely achieve the theoretical maximum yield, and PCRs are usually run for more than 20 cycles to compensate for the lower yield. At 50 % mean efficiency, it would take 34 cycles to achieve the million-fold amplification theoretically possible in 20, and at lower efficiencies, the number of cycles required becomes prohibitive. In addition, any background products that amplify with a better mean efficiency than the intended target will become the dominant products. Also, many variables can influence the mean efficiency of PCR, including target DNA length and secondary structure, primer length and design, primer and dNTP concentrations, and buffer composition, to name but a few. Contamination of the reaction with exogenous DNA (e.g., DNA spilled onto lab surfaces) or cross-contamination is also a major consideration. Reaction conditions must be carefully optimized for each different primer pair and target sequence, and the process can take days, even for an experienced investigator.
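The arithmetic in the preceding passage can be checked with a short calculation. The sketch below simply evaluates the yield expression (1 + X) raised to the power n for the efficiencies quoted in the text; the function names are hypothetical and chosen only for this illustration.

```python
import math


def pcr_yield(mean_efficiency, cycles):
    """Fold amplification y = (1 + X)**n for mean efficiency X (0..1)."""
    return (1.0 + mean_efficiency) ** cycles


def cycles_needed(mean_efficiency, target_fold):
    """Number of cycles n (possibly fractional) solving (1 + X)**n = target."""
    return math.log(target_fold) / math.log(1.0 + mean_efficiency)


ideal = pcr_yield(1.0, 20)
for efficiency in (1.0, 0.85, 0.5):
    y = pcr_yield(efficiency, 20)
    print(f"X = {efficiency:.2f}: yield {y:,.0f} ({100 * y / ideal:.1f} % of ideal)")

print(f"cycles for 2**20-fold at X = 0.5: {cycles_needed(0.5, 2 ** 20):.1f}")
```

Under these assumptions the fractional cycle count at 50 % efficiency comes out slightly above 34, in line with the approximate figure given in the text.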
The laboriousness of this process, including numerous technical considerations and other factors, presents a significant drawback to using PCR in the clinical setting. Indeed, PCR has yet to penetrate the clinical market in a significant way. The same concerns arise with LCR, as LCR must also be optimized to use different oligonucleotide sequences for each target sequence. In addition, both methods require expensive equipment capable of precise temperature cycling.
Many applications of nucleic acid detection technologies, such as in studies of allelic variation, involve not only detection of a specific sequence in a complex background, but also the discrimination between sequences with few, or single, nucleotide differences. One method for the detection of allele-specific variants by PCR is based upon the fact that it is difficult for Taq polymerase to synthesize a DNA strand when there is a mismatch between the template strand and the 3' end of the primer. An allele-specific variant may be detected by the use of a primer that is perfectly matched with only one of the possible alleles; the mismatch to the other allele acts to prevent the extension of the primer, thereby preventing the amplification of that sequence. This method has a substantial limitation in that the base composition of the mismatch influences the ability to prevent extension across the mismatch, and certain mismatches do not prevent extension or have only a minimal effect. A similar 3'-mismatch strategy is used with greater effect to prevent ligation in the LCR.
Any mismatch effectively blocks the action of the thermostable ligase, but LCR still has the drawback of target-independent background ligation products initiating the amplification.
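The 3'-mismatch principle described above can be pictured with a deliberately idealized sketch. It assumes that a primer is extended only when its 3'-terminal base matches the allele, which, as the text notes, is not always true in practice because some mismatches are still extended. The function names, allele names and sequences are hypothetical, and primer and target site are written in the same 5' to 3' orientation for simplicity.

```python
def three_prime_mismatch(primer, target_site):
    """True if primer and target differ at the 3'-terminal position.

    Both sequences are given 5'->3' in the same orientation and must
    have equal length (an assumption of this simplified model).
    """
    if len(primer) != len(target_site):
        raise ValueError("primer and target site must have equal length")
    return primer[-1].upper() != target_site[-1].upper()


def alleles_amplified(primer, allele_sites):
    """Names of alleles the primer would extend on in the idealized model."""
    return [name for name, site in allele_sites.items()
            if not three_prime_mismatch(primer, site)]


# Hypothetical alleles differing only at the last position of the site.
sites = {"wild_type": "ACGTTGCA", "variant": "ACGTTGCG"}
print(alleles_amplified("ACGTTGCA", sites))
print(alleles_amplified("ACGTTGCG", sites))
```

A more realistic model would weight each mismatch type by how strongly it inhibits extension; that refinement is outside the scope of this sketch.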
Moreover, the combination of PCR with subsequent LCR to identify the nucleotides at individual positions is also a clearly cumbersome proposition for the clinical laboratory.
The direct detection method according to various preferred embodiments of the present invention may be, for example, a cycling probe reaction (CPR) or a branched DNA analysis. When a sufficient amount of a nucleic acid to be detected is available, there are advantages to detecting that sequence directly, instead of making more copies of that target (e.g., as in PCR and LCR). Most notably, a method that does not amplify the signal exponentially is more amenable to quantitative analysis. Even if the signal is enhanced by attaching multiple dyes to a single oligonucleotide, the correlation between the final signal intensity and amount of target is direct. Such a system has an additional advantage that the products of the reaction will not themselves promote further reaction, so contamination of lab surfaces by the products is not as much of a concern. Recently devised techniques have sought to eliminate the use of radioactivity and/or improve the sensitivity in automatable formats. Two examples are the "Cycling Probe Reaction" (CPR) and "Branched DNA" (bDNA).
Cycling probe reaction (CPR): The cycling probe reaction (CPR) uses a long chimeric oligonucleotide in which a central portion is made of RNA while the two termini are made of DNA. Hybridization of the probe to a target DNA and exposure to a thermostable RNase H causes the RNA portion to be digested. This destabilizes the remaining DNA portions of the duplex, releasing the remainder of the probe from the target DNA and allowing another probe molecule to repeat the process. The signal, in the form of cleaved probe molecules, accumulates at a linear rate. While the repeating process increases the signal, the RNA portion of the oligonucleotide is vulnerable to RNases that may be carried through sample preparation.
Branched DNA: Branched DNA (bDNA) involves oligonucleotides with branched structures that allow each individual oligonucleotide to carry 35 to 40 labels (e.g., alkaline phosphatase enzymes). While this enhances the signal from a hybridization event, signal from non-specific binding is similarly increased.
The detection of at least one sequence change according to various preferred embodiments of the present invention may be accomplished by, for example, restriction fragment length polymorphism (RFLP) analysis, allele specific oligonucleotide (ASO) analysis, Denaturing/Temperature Gradient Gel Electrophoresis (DGGE/TGGE), Single-Strand Conformation Polymorphism (SSCP) analysis or Dideoxy fingerprinting (ddF). The demand for tests which allow the detection of specific nucleic acid sequences and sequence changes is growing rapidly in clinical diagnostics. As nucleic acid sequence data for genes from humans and pathogenic organisms accumulates, the demand for fast, cost-effective, and easy-to-use tests for as yet unknown mutations within specific sequences is rapidly increasing. A handful of methods have been devised to scan nucleic acid segments for mutations. One option is to determine the entire gene sequence of each test sample (e.g., a bacterial isolate). For sequences under approximately 600 nucleotides, this may be accomplished using amplified material (e.g., PCR reaction products). This avoids the time and expense associated with cloning the segment of interest. However, specialized equipment and highly trained personnel are required, and the method is too labor-intensive and expensive to be practical and effective in the clinical setting. In view of the difficulties associated with sequencing, a given segment of nucleic acid may be characterized on several other levels. At the lowest resolution, the size of the molecule can be determined by electrophoresis by comparison to a known standard run on the same gel.
A more detailed picture of the molecule may be achieved by cleavage with combinations of restriction enzymes prior to electrophoresis, to allow construction of an ordered map. The presence of specific sequences within the fragment can be detected by hybridization of a labeled probe, or the precise nucleotide sequence can be determined by partial chemical degradation or by primer extension in the presence of chain-terminating nucleotide analogs.
Restriction fragment length polymorphism (RFLP): For detection of single-base differences between like sequences, the requirements of the analysis are often at the highest level of resolution. For cases in which the position of the nucleotide in question is known in advance, several methods have been developed for examining single base changes without direct sequencing. For example, if a mutation of interest happens to fall within a restriction recognition sequence, a change in the pattern of digestion can be used as a diagnostic tool (e.g., restriction fragment length polymorphism [RFLP] analysis). Single point mutations have also been detected by the creation or destruction of RFLPs.
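To make the RFLP principle above concrete, the following sketch digests a linear sequence in silico and compares the fragment lengths before and after a point mutation that destroys the recognition site. It is an illustrative model only: the sequences are invented, the function names are hypothetical, and the enzyme used for the example is EcoRI (recognition sequence GAATTC, cutting after the first G).

```python
def site_positions(seq, site):
    """0-based start positions of every occurrence of the recognition site."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]


def fragment_lengths(seq, site, cut_offset):
    """Fragment lengths after complete digestion of a linear molecule.

    cut_offset is the number of bases of the site left of the cut,
    e.g. 1 for EcoRI (G^AATTC).
    """
    cuts = [p + cut_offset for p in site_positions(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical sequences: the mutant carries a single A->G change
# inside the EcoRI site, so the site is lost.
wild_type = "AAAAAGAATTCTTTTTTTT"
mutant = "AAAAAGAGTTCTTTTTTTT"

print(fragment_lengths(wild_type, "GAATTC", 1))
print(fragment_lengths(mutant, "GAATTC", 1))
```

The change from two fragments to one uncut fragment is the altered digestion pattern that would be read out on a gel.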
Mutations are detected and localized by the presence and size of the RNA fragments generated by cleavage at the mismatches. Single nucleotide mismatches in DNA heteroduplexes are also recognized and cleaved by some chemicals, providing an alternative strategy to detect single base substitutions, generically named the "Mismatch Chemical Cleavage" (MCC). However, this method requires the use of osmium tetroxide and piperidine, two highly noxious chemicals which are not suited for use in a clinical laboratory.
RFLP analysis suffers from low sensitivity and requires a large amount of sample. When RFLP analysis is used for the detection of point mutations, it is, by its nature, limited to the detection of only those single base changes which fall within a restriction sequence of a known restriction endonuclease. Moreover, the majority of the available enzymes have 4 to 6 base-pair recognition sequences, and cleave too frequently for many large-scale DNA manipulations. Thus, it is applicable only in a small fraction of cases, as most mutations do not fall within such sites. A handful of rare-cutting restriction enzymes with 8 base-pair specificities have been isolated and these are widely used in genetic mapping, but these enzymes are few in number, are limited to the recognition of G+C-rich sequences, and cleave at sites that tend to be highly clustered. Recently, endonucleases encoded by group I introns have been discovered that might have greater than 12 base-pair specificity, but again, these are few in number.
Allele specific oligonucleotide (ASO): If the change is not in a recognition sequence, then allele-specific oligonucleotides (ASOs) can be designed to hybridize in proximity to the mutated nucleotide, such that a primer extension or ligation event can be used as the indicator of a match or a mismatch. Hybridization with radioactively labeled allele-specific oligonucleotides (ASO) also has been applied to the detection of specific point mutations.
The method is based on the differences in the melting temperature of short DNA fragments differing by a single nucleotide. Stringent hybridization and washing conditions can differentiate between mutant and wild-type alleles. The ASO approach applied to PCR products also has been extensively utilized by various researchers to detect and characterize point mutations in ras genes and gsp/gip oncogenes. Because of the presence of various nucleotide changes in multiple positions, the ASO method requires the use of many oligonucleotides to cover all possible oncogenic mutations. With either of the techniques described above (i.e., RFLP and ASO), the precise location of the suspected mutation must be known in advance of the test. That is to say, they are inapplicable when one needs to detect the presence of a mutation within a gene or sequence of interest.
Denaturing/Temperature Gradient Gel Electrophoresis (DGGE/TGGE): Two other methods rely on detecting changes in electrophoretic mobility in response to minor sequence changes. One of these methods, termed "Denaturing Gradient Gel Electrophoresis" (DGGE), is based on the observation that slightly different sequences will display different patterns of local melting when electrophoretically resolved on a gradient gel. In this manner, variants can be distinguished, as differences in melting properties of homoduplexes versus heteroduplexes differing in a single nucleotide can detect the presence of mutations in the target sequences because of the corresponding changes in their electrophoretic mobilities. The fragments to be analyzed, usually PCR products, are "clamped" at one end by a long stretch of GC base pairs (30-80) to allow complete denaturation of the sequence of interest without complete dissociation of the strands. The attachment of a GC "clamp" to the DNA fragments increases the fraction of mutations that can be recognized by DGGE.
Attaching a GC clamp to one primer is critical to ensure that the amplified sequence has a low dissociation temperature. Modifications of the technique have been developed, using temperature gradients, and the method can also be applied to RNA:RNA duplexes. Limitations on the utility of DGGE include the requirement that the denaturing conditions must be optimized for each type of DNA to be tested. Furthermore, the method requires specialized equipment to prepare the gels and maintain the needed high temperatures during electrophoresis. The expense associated with the synthesis of the clamping tail on one oligonucleotide for each sequence to be tested is also a major consideration. In addition, long running times are required for DGGE. The long running time of DGGE was shortened in a modification of DGGE called constant denaturant gel electrophoresis (CDGE). CDGE requires that gels be performed under different denaturant conditions in order to reach high efficiency for the detection of mutations. A technique analogous to DGGE, termed temperature gradient gel electrophoresis (TGGE), uses a thermal gradient rather than a chemical denaturant gradient. TGGE requires the use of specialized equipment which can generate a temperature gradient perpendicularly oriented relative to the electrical field. TGGE can detect mutations in relatively small fragments of DNA; therefore, scanning of large gene segments requires the use of multiple PCR products prior to running the gel.
Single-Strand Conformation Polymorphism (SSCP): Another common method, called "Single-Strand Conformation Polymorphism" (SSCP), was developed by Hayashi, Sekiya and colleagues and is based on the observation that single strands of nucleic acid can take on characteristic conformations in non-denaturing conditions, and these conformations influence electrophoretic mobility. The complementary strands assume sufficiently different structures that one strand may be resolved from the other.
Changes in sequences within the fragment will also change the conformation, consequently altering the mobility and allowing this to be used as an assay for sequence variations. The SSCP process involves denaturing a DNA segment (e.g., a PCR product) that is labeled on both strands, followed by slow electrophoretic separation on a non-denaturing polyacrylamide gel, so that intra-molecular interactions can form and not be disturbed during the run. This technique is extremely sensitive to variations in gel composition and temperature. A serious limitation of this method is the relative difficulty encountered in comparing data generated in different laboratories, under apparently similar conditions.
Dideoxy fingerprinting (ddF): Dideoxy fingerprinting (ddF) is another technique developed to scan genes for the presence of mutations. The ddF technique combines components of Sanger dideoxy sequencing with SSCP. A dideoxy sequencing reaction is performed using one dideoxy terminator and then the reaction products are electrophoresed on nondenaturing polyacrylamide gels to detect alterations in mobility of the termination segments as in SSCP analysis. While ddF is an improvement over SSCP in terms of increased sensitivity, ddF requires the use of expensive dideoxynucleotides and this technique is still limited to the analysis of fragments of the size suitable for SSCP (i.e., fragments of 200-300 bases for optimal detection of mutations).
In addition to the above limitations, all of these methods are limited as to the size of the nucleic acid fragment that can be analyzed. For the direct sequencing approach, sequences of greater than 600 base pairs require cloning, with the consequent delays and expense of either deletion sub-cloning or primer walking, in order to cover the entire fragment. SSCP and DGGE have even more severe size limitations. Because of reduced sensitivity to sequence changes, these methods are not considered suitable for larger fragments.
Although SSCP is reportedly able to detect 90 % of single-base substitutions within a 200 base-pair fragment, the detection drops to less than 50 % for 400 base-pair fragments. Similarly, the sensitivity of DGGE decreases as the length of the fragment reaches 500 base pairs. The ddF technique, as a combination of direct sequencing and SSCP, is also limited by the relatively small size of the DNA that can be screened.
According to a presently preferred embodiment of the present invention, the step of searching for any of the nucleic acid sequences described here, in tumor cells or in cells derived from a cancer patient, is effected by any suitable technique, including, but not limited to, nucleic acid sequencing, polymerase chain reaction, ligase chain reaction, self-sustained synthetic reaction, Qβ-Replicase, cycling probe reaction, branched DNA, restriction fragment length polymorphism analysis, mismatch chemical cleavage, heteroduplex analysis, allele-specific oligonucleotides, denaturing gradient gel electrophoresis, constant denaturant gel electrophoresis, temperature gradient gel electrophoresis and dideoxy fingerprinting.
Detection may also optionally be performed with a chip or other such device. The nucleic acid sample which includes the candidate region to be analyzed is preferably isolated, amplified and labeled with a reporter group. This reporter group can be a fluorescent group such as phycoerythrin. The labeled nucleic acid is then incubated with the probes immobilized on the chip using a fluidics station. The fabrication of fluidics devices, and particularly microcapillary devices, in silicon and glass substrates has been described. Once the reaction is completed, the chip is inserted into a scanner and patterns of hybridization are detected. The hybridization data is collected as a signal emitted from the reporter groups already incorporated into the nucleic acid, which is now bound to the probes attached to the chip.
Since the sequence and position of each probe immobilized on the chip is known, the identity of the nucleic acid hybridized to a given probe can be determined. It will be appreciated that when utilized along with automated equipment, the above described detection methods can be used to screen multiple samples for a disease and/or pathological condition both rapidly and easily.
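The position-to-identity lookup described above can be sketched as follows. This is a simplified illustration rather than a description of any particular chip or scanner: the layout, signal values, threshold and function names are all hypothetical, and a bound nucleic acid is assumed to be the exact reverse complement of the probe at that position.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def identify_bound_sequences(probe_layout, signals, threshold):
    """Map each chip position with signal >= threshold to the sequence
    inferred to be bound there (the probe's reverse complement)."""
    bound = {}
    for position, intensity in signals.items():
        probe = probe_layout.get(position)
        if probe is not None and intensity >= threshold:
            bound[position] = reverse_complement(probe)
    return bound


# Hypothetical 1 x 2 chip: (row, column) -> immobilized probe sequence.
layout = {(0, 0): "ACGTAC", (0, 1): "TTGGCC"}
# Hypothetical scanner read-out in arbitrary fluorescence units.
scan = {(0, 0): 950.0, (0, 1): 40.0}

print(identify_bound_sequences(layout, scan, threshold=500.0))
```

In practice the threshold would have to be calibrated against background and control features; here it is an arbitrary illustrative value.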
Amino acid sequences and peptides
The terms "polypeptide," "peptide" and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residue is an analog or mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers. Polypeptides can be modified, e.g., by the addition of carbohydrate residues to form glycoproteins. The terms "polypeptide," "peptide" and "protein" include glycoproteins, as well as non-glycoproteins.
Polypeptide products can be biochemically synthesized such as by employing standard solid phase techniques. Such methods include but are not limited to exclusive solid phase synthesis, partial solid phase synthesis methods, fragment condensation and classical solution synthesis. These methods are preferably used when the peptide is relatively short (i.e., 10 kDa) and/or when it cannot be produced by recombinant techniques (i.e., not encoded by a nucleic acid sequence) and therefore involves different chemistry. Solid phase polypeptide synthesis procedures are well known in the art and further described by John Morrow Stewart and Janis Dillaha Young, Solid Phase Peptide Syntheses (2nd Ed., Pierce Chemical Company, 1984). Synthetic polypeptides can optionally be purified by preparative high performance liquid chromatography [Creighton T. (1983) Proteins, structures and molecular principles. WH Freeman and Co. N.Y.], after which their composition can be confirmed via amino acid sequencing. In cases where large amounts of a polypeptide are desired, it can be generated using recombinant techniques such as described by Bitter et al., (1987) Methods in Enzymol. 153:516-544, Studier et al. (1990) Methods in Enzymol. 185:60-89, Brisson et al. (1984) Nature 310:511-514, Takamatsu et al. (1987) EMBO J. 6:307-311, Coruzzi et al. (1984) EMBO J. 3:1671-1680 and Broglie et al., (1984) Science 224:838-843, Gurley et al.
(1986) Mol. Cell. Biol. 6:559-565 and Weissbach & Weissbach, 1988, Methods for Plant Molecular Biology, Academic Press, NY, Section VIII, pp 421-463.
The present invention also encompasses polypeptides encoded by the polynucleotide sequences of the present invention, as well as polypeptides according to the amino acid sequences described herein. The present invention also encompasses homologues of these polypeptides; such homologues can be at least 50 %, at least 55 %, at least 60 %, at least 65 %, at least 70 %, at least 75 %, at least 80 %, at least 85 %, at least 95 %, or more, say 100 %, homologous to the amino acid sequences set forth below, as can be determined using BlastP software of the National Center of Biotechnology Information (NCBI) using default parameters, optionally and preferably including the following: filtering on (this option filters repetitive or low-complexity sequences from the query using the Seg (protein) program), scoring matrix is BLOSUM62 for proteins, word size is 3, E value is 10, gap costs are 11, 1 (initialization and extension), and number of alignments shown is 50. Nucleotide (nucleic acid) sequence homology/identity is preferably determined by using the BlastN software of the National Center of Biotechnology Information (NCBI) using default parameters, which preferably include using the DUST filter program, and also preferably include having an E value of 10, filtering low complexity sequences and a word size of 11.
Finally, the present invention also encompasses fragments of the above described polypeptides and polypeptides having mutations, such as deletions, insertions or substitutions of one or more amino acids, either naturally occurring or artificially induced, either randomly or in a targeted fashion.
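The percentage thresholds for homologues recited above are determined in the text with BlastP. Purely to illustrate what such a percentage expresses, the sketch below computes percent identity over two sequences that have already been aligned (gaps written as "-"). It is not BlastP and does not reproduce its scoring matrix, gap costs or filtering; the function names and example sequences are hypothetical, and plain identity is used as a simplified stand-in for homology.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding identical residues.

    Inputs must be pre-aligned strings of equal length, with '-' for gaps.
    Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold_percent):
    """True if the percent identity is at least the given threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent


# Hypothetical aligned peptide fragments.
query = "MKT-LLV"
subject = "MKTALIV"
print(round(percent_identity(query, subject), 1))
print(meets_threshold(query, subject, 70.0))
```

Whether gapped columns count toward the denominator is a convention choice; this sketch counts them, which gives a conservative value.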
It will be appreciated that peptides identified according to the present invention may be degradation products, synthetic peptides or recombinant peptides as well as peptidomimetics, typically synthetic peptides, and peptoids and semipeptoids which are peptide analogs, which may have, for example, modifications rendering the peptides more stable while in a body or more capable of penetrating into cells. Such modifications include, but are not limited to, N terminus modification, C terminus modification, peptide bond modification, including, but not limited to, CH2-NH, CH2-S, CH2-S=O, O=C-NH, CH2-O, CH2-CH2, S=C-NH, CH=CH or CF=CH, backbone modifications, and residue modification. Methods for preparing peptidomimetic compounds are well known in the art. Further details in this respect are provided hereinunder.
Peptide bonds (-CO-NH-) within the peptide may be substituted, for example, by N-methylated bonds (-N(CH3)-CO-), ester bonds (-C(R)H-C-O-O-C(R)-N-), ketomethylene bonds (-CO-CH2-), α-aza bonds (-NH-N(R)-CO-), wherein R is any alkyl, e.g., methyl, carba bonds (-CH2-NH-), hydroxyethylene bonds (-CH(OH)-CH2-), thioamide bonds (-CS-NH-), olefinic double bonds (-CH=CH-), retro amide bonds (-NH-CO-), peptide derivatives (-N(R)-CH2-CO-), wherein R is the "normal" side chain, naturally presented on the carbon atom. These modifications can occur at any of the bonds along the peptide chain and even at several (2-3) at the same time.
Natural aromatic amino acids, Trp, Tyr and Phe, may be substituted by synthetic non-natural acids such as phenylglycine, TIC, naphthylalanine (Nol), ring-methylated derivatives of Phe, halogenated derivatives of Phe or o-methyl-Tyr. In addition to the above, the peptides of the present invention may also include one or more modified amino acids or one or more non-amino acid monomers (e.g., fatty acids, complex carbohydrates, etc.).
As used herein in the specification and in the claims section below, the term "amino acid" or "amino acids" is understood to include the 20 naturally occurring amino acids; those amino acids often modified post-translationally in vivo, including, for example, hydroxyproline, phosphoserine and phosphothreonine; and other unusual amino acids including, but not limited to, 2-aminoadipic acid, hydroxylysine, isodesmosine, nor-valine, nor-leucine and ornithine. Furthermore, the term "amino acid" includes both D- and L-amino acids. Table 1 lists non-conventional or modified amino acids which can be used with the present invention.
Table 1
Table 1 Cont.
Since the peptides of the present invention are preferably utilized in diagnostics which require the peptides to be in soluble form, the peptides of the present invention preferably include one or more non-natural or natural polar amino acids, including but not limited to serine and threonine, which are capable of increasing peptide solubility due to their hydroxyl-containing side chain. The peptides of the present invention are preferably utilized in a linear form, although it will be appreciated that in cases where cyclization does not severely interfere with peptide characteristics, cyclic forms of the peptide can also be utilized.
The peptides of the present invention can be biochemically synthesized such as by using standard solid phase techniques. These methods include exclusive solid phase synthesis well known in the art, partial solid phase synthesis methods, fragment condensation and classical solution synthesis. These methods are preferably used when the peptide is relatively short (i.e., 10 kDa) and/or when it cannot be produced by recombinant techniques (i.e., not encoded by a nucleic acid sequence) and therefore involves different chemistry. Synthetic peptides can be purified by preparative high performance liquid chromatography, and their composition can be confirmed via amino acid sequencing. In cases where large amounts of the peptides of the present invention are desired, the peptides of the present invention can be generated using recombinant techniques such as described by Bitter et al., (1987) Methods in Enzymol. 153:516-544, Studier et al. (1990) Methods in Enzymol. 185:60-89, Brisson et al. (1984) Nature 310:511-514, Takamatsu et al. (1987) EMBO J. 6:307-311, Coruzzi et al. (1984) EMBO J. 3:1671-1680 and Broglie et al., (1984) Science 224:838-843, Gurley et al. (1986) Mol. Cell. Biol. 6:559-565 and Weissbach & Weissbach, 1988, Methods for Plant Molecular Biology, Academic Press, NY, Section VIII, pp 421-463 and also as described above.
Antibodies
"Antibody" refers to a polypeptide ligand that is preferably substantially encoded by an immunoglobulin gene or immunoglobulin genes, or fragments thereof, which specifically binds and recognizes an epitope (e.g., an antigen). The recognized immunoglobulin genes include the kappa and lambda light chain constant region genes, the alpha, gamma, delta, epsilon and mu heavy chain constant region genes, and the myriad immunoglobulin variable region genes. Antibodies exist, e.g., as intact immunoglobulins or as a number of well characterized fragments produced by digestion with various peptidases. This includes, e.g., Fab' and F(ab')2 fragments. The term "antibody," as used herein, also includes antibody fragments either produced by the modification of whole antibodies or those synthesized de novo using recombinant DNA methodologies. It also includes polyclonal antibodies, monoclonal antibodies, chimeric antibodies, humanized antibodies, or single chain antibodies. "Fc" portion of an antibody refers to that portion of an immunoglobulin heavy chain that comprises one or more heavy chain
constant region domains, CH1, CH2 and CH3, but does not include the heavy chain variable region.
The functional fragments of antibodies, such as Fab, F(ab')2, and Fv that are capable of binding to macrophages, are described as follows: (1) Fab, the fragment which contains a monovalent antigen-binding fragment of an antibody molecule, can be produced by digestion of whole antibody with the enzyme papain to yield an intact light chain and a portion of one heavy chain; (2) Fab', the fragment of an antibody molecule that can be obtained by treating whole antibody with pepsin, followed by reduction, to yield an intact light chain and a portion of the heavy chain; two Fab' fragments are obtained per antibody molecule; (3) F(ab')2, the fragment of the antibody that can be obtained by treating whole antibody with the enzyme pepsin without subsequent reduction; F(ab')2 is a dimer of two Fab' fragments held together by two disulfide bonds; (4) Fv, defined as a genetically engineered fragment containing the variable region of the light chain and the variable region of the heavy chain expressed as two chains; and (5) Single chain antibody ("SCA"), a genetically engineered molecule containing the variable region of the light chain and the variable region of the heavy chain, linked by a suitable polypeptide linker as a genetically fused single chain molecule.
Methods of producing polyclonal and monoclonal antibodies as well as fragments thereof are well known in the art (see, for example, Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, New York, 1988, incorporated herein by reference). Antibody fragments according to the present invention can be prepared by proteolytic hydrolysis of the antibody or by expression in E. coli or mammalian cells (e.g., Chinese hamster ovary cell culture or other protein expression systems) of DNA encoding the fragment.
Antibody fragments can be obtained by pepsin or papain digestion of whole antibodies by conventional methods. For example, antibody fragments can be produced by enzymatic cleavage of antibodies with pepsin to provide a 5S fragment denoted F(ab')2. This fragment can be further cleaved using a thiol reducing agent, and optionally a blocking group for the sulfhydryl groups resulting from cleavage of disulfide linkages, to produce 3.5S Fab' monovalent fragments. Alternatively, an enzymatic cleavage using pepsin produces two monovalent Fab' fragments and an Fc fragment directly. These methods are described, for example, by Goldenberg, U.S. Pat. Nos. 4,036,945 and 4,331,647, and references contained therein, which patents are hereby incorporated by reference in their entirety. See also Porter, R. R. [Biochem. J. 73:119-126 (1959)]. Other methods of cleaving antibodies, such as separation of heavy chains to form monovalent light-heavy chain fragments, further cleavage of fragments, or other enzymatic, chemical, or genetic techniques may also be used, so long as the fragments bind to the antigen that is recognized by the intact antibody. Fv fragments comprise an association of VH and VL chains. This association may be noncovalent, as described in Inbar et al. [Proc. Nat'l Acad. Sci. USA 69:2659-62 (1972)]. Alternatively, the variable chains can be linked by an intermolecular disulfide bond or cross-linked by chemicals such as glutaraldehyde. Preferably, the Fv fragments comprise VH and VL chains connected by a peptide linker. These single-chain antigen binding proteins (sFv) are prepared by constructing a structural gene comprising DNA sequences encoding the VH and VL domains connected by an oligonucleotide. The structural gene is inserted into an expression vector, which is subsequently introduced into a host cell such as E. coli. The recombinant host cells synthesize a single polypeptide chain with a linker peptide bridging the two V domains.
Methods for producing sFvs are described, for example, by Whitlow and Filpula, Methods 2:97-105 (1991); Bird et al., Science 242:423-426 (1988); Pack et al., Bio/Technology 11:1271-77 (1993); and U.S. Pat. No. 4,946,778, which is hereby incorporated by reference in its entirety. Another form of an antibody fragment is a peptide coding for a single complementarity-determining region (CDR). CDR peptides ("minimal recognition units") can be obtained by constructing genes encoding the CDR of an antibody of interest. Such genes are prepared, for example, by using the polymerase chain reaction to synthesize the variable region from RNA of antibody-producing cells. See, for example, Larrick and Fry [Methods, 2:106-10 (1991)]. Humanized forms of non-human (e.g., murine) antibodies are chimeric molecules of immunoglobulins, immunoglobulin chains or fragments thereof (such as Fv, Fab, Fab', F(ab')2 or other antigen-binding subsequences of antibodies) which contain minimal sequence derived from non-human immunoglobulin. Humanized antibodies include human immunoglobulins (recipient antibody) in which residues from a complementarity determining region (CDR) of the recipient are replaced by residues from a CDR of a non-human species (donor antibody) such as mouse, rat or rabbit having the desired specificity, affinity and capacity. In some instances, Fv framework residues of the human immunoglobulin are replaced by corresponding non-human residues. Humanized antibodies may also comprise residues which are found neither in the recipient antibody nor in the imported CDR or framework sequences. In general, the humanized antibody will comprise substantially all of at least one, and typically two, variable domains, in which all or substantially all of the CDR regions correspond to those of a non-human immunoglobulin and all or substantially all of the FR regions are those of a human immunoglobulin consensus sequence.
The humanized antibody optimally also will comprise at least a portion of an immunoglobulin constant region (Fc), typically that of a human immunoglobulin [Jones et al., Nature, 321:522-525 (1986); Riechmann et al., Nature, 332:323-329 (1988); and Presta, Curr. Op. Struct. Biol., 2:593-596 (1992)]. Methods for humanizing non-human antibodies are well known in the art. Generally, a humanized antibody has one or more amino acid residues introduced into it from a source which is non-human. These non-human amino acid residues are often referred to as import residues, which are typically taken from an import variable domain. Humanization can be essentially performed following the method of Winter and co-workers [Jones et al., Nature, 321:522-525 (1986); Riechmann et al., Nature 332:323-327 (1988); Verhoeyen et al., Science, 239:1534-1536 (1988)], by substituting rodent CDRs or CDR sequences for the corresponding sequences of a human antibody. Accordingly, such humanized antibodies are chimeric antibodies (U.S. Pat. No. 4,816,567), wherein substantially less than an intact human variable domain has been substituted by the corresponding sequence from a non-human species. In practice, humanized antibodies are typically human antibodies in which some CDR residues and possibly some FR residues are substituted by residues from analogous sites in rodent antibodies. Human antibodies can also be produced using various techniques known in the art, including phage display libraries [Hoogenboom and Winter, J. Mol. Biol., 227:381 (1991); Marks et al., J. Mol. Biol., 222:581 (1991)]. The techniques of Cole et al. and Boerner et al. are also available for the preparation of human monoclonal antibodies [Cole et al., Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, p. 77 (1985) and Boerner et al., J. Immunol., 147(1):86-95 (1991)].
Similarly, human antibodies can be made by introduction of human immunoglobulin loci into transgenic animals, e.g., mice in which the endogenous immunoglobulin genes have been partially or completely inactivated. Upon challenge, human antibody production is observed, which closely resembles that seen in humans in all respects, including gene rearrangement, assembly, and antibody repertoire. This approach is described, for example, in U.S. Pat. Nos. 5,545,807; 5,545,806; 5,569,825; 5,625,126; 5,633,425; 5,661,016, and in the following scientific publications: Marks et al., Bio/Technology 10:779-783 (1992); Lonberg et al., Nature 368:856-859 (1994); Morrison, Nature 368:812-13 (1994); Fishwild et al., Nature Biotechnology 14:845-51 (1996); Neuberger, Nature Biotechnology 14:826 (1996); and Lonberg and Huszar, Intern. Rev. Immunol. 13:65-93 (1995). Preferably, the antibody of this aspect of the present invention specifically binds at least one epitope of the polypeptide variants of the present invention. As used herein, the term "epitope" refers to any antigenic determinant on an antigen to which the paratope of an antibody binds. Epitopic determinants usually consist of chemically active surface groupings of molecules such as amino acids or carbohydrate side chains and usually have specific three dimensional structural characteristics, as well as specific charge characteristics. Optionally, a unique epitope may be created in a variant due to a change in one or more post-translational modifications, including but not limited to glycosylation and/or phosphorylation, as described below. Such a change may also cause a new epitope to be created, for example through removal of glycosylation at a particular site.
An epitope according to the present invention may also optionally comprise part or all of a unique sequence portion of a variant according to the present invention in combination with at least one other portion of the variant which is not contiguous to the unique sequence portion in the linear polypeptide itself, yet which are able to form an epitope in combination. One or more unique sequence portions may optionally combine with one or more other non-contiguous portions of the variant (including a portion which may have high homology to a portion of the known protein) to form an epitope.
Immunoassays
In another embodiment of the present invention, an immunoassay can be used to qualitatively or quantitatively detect and analyze markers in a sample. This method comprises: providing an antibody that specifically binds to a marker; contacting a sample with the antibody; and detecting the presence of a complex of the antibody bound to the marker in the sample. To prepare an antibody that specifically binds to a marker, purified protein markers can be used. Antibodies that specifically bind to a protein marker can be prepared using any suitable methods known in the art. After the antibody is provided, a marker can be detected and/or quantified using any of a number of well recognized immunological binding assays. Useful assays include, for example, an enzyme immune assay (EIA) such as enzyme-linked immunosorbent assay (ELISA), a radioimmune assay (RIA), a Western blot assay, or a slot blot assay (see, e.g., U.S. Pat. Nos. 4,366,241; 4,376,110; 4,517,288; and 4,837,168). Generally, a sample obtained from a subject can be contacted with the antibody that specifically binds the marker. Optionally, the antibody can be fixed to a solid support to facilitate washing and subsequent isolation of the complex, prior to contacting the antibody with a sample. Examples of solid supports include but are not limited to glass or plastic in the form of, e.g., a microtiter plate, a stick, a bead, or a microbead. Antibodies can also be attached to a solid support. After incubating the sample with antibodies, the mixture is washed and the antibody-marker complex formed can be detected. This can be accomplished by incubating the washed mixture with a detection reagent.
Alternatively, the marker in the sample can be detected using an indirect assay, wherein, for example, a second, labeled antibody is used to detect bound marker-specific antibody, and/or in a competition or inhibition assay wherein, for example, a monoclonal antibody which binds to a distinct epitope of the marker is incubated simultaneously with the mixture. Throughout the assays, incubation and/or washing steps may be required after each combination of reagents. Incubation steps can vary from about 5 seconds to several hours, preferably from about 5 minutes to about 24 hours. However, the incubation time will depend upon the assay format, marker, volume of solution, concentrations and the like. Usually the assays will be carried out at ambient temperature, although they can be conducted over a range of temperatures, such as 10 °C to 40 °C. The immunoassay can be used to determine a test amount of a marker in a sample from a subject. First, a test amount of a marker in a sample can be detected using the immunoassay methods described above. If a marker is present in the sample, it will form an antibody-marker complex with an antibody that specifically binds the marker under suitable incubation conditions described above. The amount of an antibody-marker complex can optionally be determined by comparing to a standard. As noted above, the test amount of marker need not be measured in absolute units, as long as the unit of measurement can be compared to a control amount and/or signal. Preferably used are antibodies which specifically interact with the polypeptides of the present invention and not with wild type proteins or other isoforms thereof, for example. Such antibodies are directed, for example, to the unique sequence portions of the polypeptide variants of the present invention, including but not limited to bridges, heads, tails and insertions described in greater detail below.
Preferred embodiments of antibodies according to the present invention are described in greater detail with regard to the section entitled "Antibodies". Radio-immunoassay (RIA): In one version, this method involves precipitation of the desired substrate, and in the methods detailed hereinbelow, with a specific antibody and radiolabelled antibody binding protein (e.g., protein A labeled with I-125) immobilized on a precipitable carrier such as agarose beads. The number of counts in the precipitated pellet is proportional to the amount of substrate. In an alternate version of the RIA, a labeled substrate and an unlabelled antibody binding protein are employed. A sample containing an unknown amount of substrate is added in varying amounts. The decrease in precipitated counts from the labeled substrate is proportional to the amount of substrate in the added sample. Enzyme linked immunosorbent assay (ELISA): This method involves fixation of a sample (e.g., fixed cells or a proteinaceous solution) containing a protein substrate to a surface such as a well of a microtiter plate. A substrate specific antibody coupled to an enzyme is applied and allowed to bind to the substrate. Presence of the antibody is then detected and quantitated by a colorimetric reaction employing the enzyme coupled to the antibody. Enzymes commonly employed in this method include horseradish peroxidase and alkaline phosphatase. If well calibrated and within the linear range of response, the amount of substrate present in the sample is proportional to the amount of color produced. A substrate standard is generally employed to improve quantitative accuracy. Western blot: This method involves separation of a substrate from other protein by means of an acrylamide gel followed by transfer of the substrate to a membrane (e.g., nylon or PVDF). Presence of the substrate is then detected by antibodies specific to the substrate, which are in turn detected by antibody binding reagents.
Antibody binding reagents may be, for example, protein A, or other antibodies. Antibody binding reagents may be radiolabelled or enzyme linked as described hereinabove. Detection may be by autoradiography, colorimetric reaction or chemiluminescence. This method allows both quantitation of an amount of substrate and determination of its identity by a relative position on the membrane which is indicative of a migration distance in the acrylamide gel during electrophoresis. Immunohistochemical analysis: This method involves detection of a substrate in situ in fixed cells by substrate specific antibodies. The substrate specific antibodies may be enzyme linked or linked to fluorophores. Detection is by microscopy and subjective evaluation. If enzyme linked antibodies are employed, a colorimetric reaction may be required. Fluorescence activated cell sorting (FACS): This method involves detection of a substrate in situ in cells by substrate specific antibodies. The substrate specific antibodies are linked to fluorophores. Detection is by means of a cell sorting machine which reads the wavelength of light emitted from each cell as it passes through a light beam. This method may employ two or more antibodies simultaneously.
Radio-imaging Methods
These methods include, but are not limited to, positron emission tomography (PET) and single photon emission computed tomography (SPECT). Both of these techniques are non-invasive, and can be used to detect and/or measure a wide variety of tissue events and/or functions, such as detecting cancerous cells for example. Unlike PET, SPECT can optionally be used with two labels simultaneously. SPECT has some other advantages as well, for example with regard to cost and the types of labels that can be used. For example, US Patent No. 6,696,686 describes the use of SPECT for detection of breast cancer, and is hereby incorporated by reference as if fully set forth herein.
Display Libraries
According to still another aspect of the present invention there is provided a display library comprising a plurality of display vehicles (such as phages, viruses or bacteria) each displaying at least 6, at least 7, at least 8, at least 9, at least 10, 10-15, 12-17, 15-20, 15-30 or 20-50 consecutive amino acids derived from the polypeptide sequences of the present invention. Methods of constructing such display libraries are well known in the art. Such methods are described in, for example, Young AC, et al., "The three-dimensional structures of a polysaccharide binding antibody to Cryptococcus neoformans and its complex with a peptide from a phage display library: implications for the identification of peptide mimotopes" J Mol Biol 1997 Dec 12;274(4):622-34; Giebel LB et al. "Screening of cyclic peptide phage libraries identifies ligands that bind streptavidin with high affinities" Biochemistry 1995 Nov 28;34(47):15430-5; Davies EL et al., "Selection of specific phage-display antibodies using libraries derived from chicken immunoglobulin genes" J Immunol Methods 1995 Oct 12;186(1):125-35; Jones C et al. "Current trends in molecular recognition and bioseparation" J Chromatogr A 1995 Jul 14;707(1):3-22; Deng SJ et al. "Basis for selection of improved carbohydrate-binding single-chain antibodies from synthetic gene libraries" Proc Natl Acad Sci U S A 1995 May 23;92(11):4992-6; and Deng SJ et al. "Selection of antibody single-chain variable fragments with improved carbohydrate binding by phage display" J Biol Chem 1994 Apr 1;269(13):9533-8, which are incorporated herein by reference.
The following sections relate to Candidate Marker Examples (first section) and to
Experimental Data for these Marker Examples (second section). It should be noted that Table numbering is restarted within each section.
CANDIDATE MARKER EXAMPLES SECTION
This Section relates to Examples of sequences according to the present invention, including illustrative methods of selection thereof.
Description of the methodology undertaken to uncover the biomolecular sequences of the present invention
Human ESTs and cDNAs were obtained from GenBank version 136 (June 15, 2003, ftp.ncbi.nih.gov/genbank/release.notes/gb136.release.notes); NCBI genome assembly of April 2003; RefSeq sequences from June 2003; GenBank version 139 (December 2003); Human Genome from NCBI (Build 34) (from Oct 2003); and RefSeq sequences from December 2003; and from the LifeSeq library of Incyte Corporation (ESTs only; Wilmington, DE, USA). With regard to GenBank sequences, the human EST sequences from the EST (GBEST) section and the human mRNA sequences from the primate (GBPRI) section were used; also the human nucleotide RefSeq mRNA sequences were used (see for example www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html and for a reference to the EST section, see www.ncbi.nlm.nih.gov/dbEST/; a general reference to dbEST, the EST database in GenBank, may be found in Boguski et al., Nat Genet. 1993 Aug;4(4):332-3; all of which are hereby incorporated by reference as if fully set forth herein). Novel splice variants were predicted using the LEADS clustering and assembly system as described in Sorek, R., Ast, G. & Graur, D. Alu-containing exons are alternatively spliced. Genome Res 12, 1060-7 (2002); US patent No: 6,625,545; and U.S. Pat. Appl. No. 10/426,002, published as US20040101876 on May 27 2004; all of which are hereby incorporated by reference as if fully set forth herein. Briefly, the software cleans the expressed sequences from repeats, vectors and immunoglobulins. It then aligns the expressed sequences to the genome, taking alternative splicing into account, and clusters overlapping expressed sequences into "clusters" that represent genes or partial genes. These were annotated using the GeneCarta (Compugen, Tel-Aviv, Israel) platform. The GeneCarta platform includes a rich pool of annotations, sequence information (particularly of spliced sequences), chromosomal information, alignments, and additional information such as SNPs, gene ontology terms, expression profiles, functional analyses, detailed domain structures, known and predicted proteins and detailed homology reports. A brief explanation is provided with regard to the method of selecting the candidates.
However, it should be noted that this explanation is provided for descriptive purposes only, and is not intended to be limiting in any way. The potential markers were identified by a computational process that was designed to find genes and/or their splice variants that are over-expressed in tumor tissues, by using databases of expressed sequences. Various parameters related to the information in the EST libraries, determined according to a manual classification process, were used to assist in locating genes and/or splice variants thereof that are over-expressed in cancerous tissues. The detailed description of the selection method is presented in Example 1 below. The cancer biomarkers selection engine and the following wet validation stages are schematically summarized in Figure 1.
EXAMPLE 1
Identification of differentially expressed gene products - Algorithm
In order to distinguish between differentially expressed gene products and constitutively expressed genes (i.e., housekeeping genes), an algorithm based on an analysis of frequencies was configured. A specific algorithm for identification of transcripts over-expressed in cancer is described hereinbelow.
Dry analysis
Library annotation - EST libraries are manually classified according to:
(i) Tissue origin
(ii) Biological source - Examples of frequently used biological sources for construction of EST libraries include cancer cell-lines; normal tissues; cancer tissues; fetal tissues; and others such as normal cell lines and pools of normal cell-lines, cancer cell-lines and combinations thereof. A specific description of abbreviations used below with regard to these tissues/cell lines etc. is given above.
(iii) Protocol of library construction - various methods are known in the art for library construction including normalized library construction; non-normalized library construction; subtracted libraries; ORESTES and others. It will be appreciated that at times the protocol of library construction is not indicated.
The following rules were followed: EST libraries originating from identical biological samples are considered as a single library. EST libraries which included above-average levels of contamination, such as DNA contamination for example, were eliminated. The presence of such contamination was determined as follows. For each library, the number of unspliced ESTs that are not fully contained within other spliced sequences was counted. If the percentage of such sequences (as compared to all other sequences) was at least 4 standard deviations above the average for all libraries being analyzed, this library was tagged as being contaminated and was eliminated from further consideration in the below analysis (see also Sorek, R. & Safer, H.M. A novel algorithm for computational identification of contaminated EST libraries. Nucleic Acids Res 31, 1067-74 (2003) for further details). Clusters (genes) having at least five sequences including at least two sequences from the tissue of interest were analyzed. Splice variants were identified by using the LEADS software package as described above.
EXAMPLE 2
Identification of genes over expressed in cancer.
Two different scoring algorithms were developed. Libraries score - candidate sequences which are supported by a number of cancer libraries are more likely to serve as specific and effective diagnostic markers. The basic algorithm - for each cluster the number of cancer and normal libraries contributing sequences to the cluster was counted. Fisher exact test was used to check if cancer libraries are significantly over-represented in the cluster as compared to the total number of cancer and normal libraries.
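The two statistical rules described above (the contamination screen and the libraries score) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the input layout and the use of the population standard deviation are hypothetical choices, the contamination measure is taken here as a simple fraction of each library's sequences, and the Fisher exact test is written as a one-sided hypergeometric tail using only the Python standard library.

```python
from math import comb
from statistics import mean, pstdev


def flag_contaminated(libraries, n_sd=4.0):
    """Return names of libraries whose fraction of unspliced, uncontained ESTs
    is at least n_sd standard deviations above the mean over all libraries.

    libraries: dict mapping library name -> (unspliced_uncontained, total_ests).
    Assumption: population standard deviation; the text does not say which
    estimator was used.
    """
    fractions = {name: u / total for name, (u, total) in libraries.items()}
    mu = mean(fractions.values())
    sd = pstdev(fractions.values())
    if sd == 0:
        return []
    return [name for name, f in fractions.items() if (f - mu) / sd >= n_sd]


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Illustrative reading for the libraries score (an assumption):
      a = cancer libraries in the cluster,  b = cancer libraries not in it,
      c = normal libraries in the cluster,  d = normal libraries not in it.
    Returns the probability of seeing a or more in the top-left cell
    when the row and column totals are held fixed.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(total - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(total, col1)
```

A small p-value from `fisher_exact_greater` would suggest that cancer libraries are over-represented in the cluster; any significance cutoff is left to the user, since none is stated in the text.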
Library counting: Small libraries (e.g., less than 1000 sequences) were excluded from consideration unless they participate in the cluster. For this reason, the total number of libraries is actually adjusted for each cluster. Clones no. score - Generally, when the number of ESTs is much higher in the cancer libraries relative to the normal libraries it might indicate actual over-expression. The algorithm - Clone counting: For counting EST clones each library protocol class was given a weight based on our belief of how much the protocol reflects actual expression levels: (i) non-normalized: 1 (ii) normalized: 0.2 (iii) all other classes: 0.1 Clones number score - The total weighted number of EST clones from cancer libraries was compared to the EST clones from normal libraries. To avoid cases where one library contributes to the majority of the score, the contribution of the library that gives most clones for a given cluster was limited to 2 clones. The score was computed as
where:
c - weighted number of "cancer" clones in the cluster.
C - weighted number of clones in all "cancer" libraries.
n - weighted number of "normal" clones in the cluster.
N - weighted number of clones in all "normal" libraries.
Clones number score significance - Fisher exact test was used to check if EST clones from cancer libraries are significantly over-represented in the cluster as compared to the total number of EST clones from cancer and normal libraries.
Two search approaches were used to find either general cancer-specific candidates or tumor specific candidates.
• Libraries/sequences originating from tumor tissues are counted as well as libraries originating from cancer cell-lines ("normal" cell-lines were ignored).
• Only libraries/sequences originating from tumor tissues are counted.
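The weighted clone counting described above can be illustrated with a short sketch. The protocol weights (1, 0.2 and 0.1) and the limit of 2 clones for the top-contributing library come from the text; the function name, the input layout, and the choice to apply the limit to the weighted (rather than raw) contribution are assumptions made for illustration. The score formula itself is not reproduced here.

```python
# Weights per library protocol class, as listed in the text.
PROTOCOL_WEIGHT = {"non-normalized": 1.0, "normalized": 0.2}
OTHER_WEIGHT = 0.1  # all other protocol classes

# Limit on the contribution of the library giving the most clones.
# Assumption: the limit is applied to the weighted contribution.
MAX_TOP_LIBRARY = 2.0


def weighted_clone_count(libraries):
    """Weighted number of EST clones for one cluster and one class
    (e.g. all cancer libraries, or all normal libraries).

    libraries: list of (protocol, n_clones) pairs, one per library.
    """
    contributions = [PROTOCOL_WEIGHT.get(protocol, OTHER_WEIGHT) * n
                     for protocol, n in libraries]
    if not contributions:
        return 0.0
    top = max(contributions)
    # Replace the largest contribution by its capped value.
    return sum(contributions) - top + min(top, MAX_TOP_LIBRARY)


# Hypothetical cluster: three cancer libraries of different protocol classes.
cancer_libs = [("non-normalized", 5), ("normalized", 10), ("subtracted", 10)]
c = weighted_clone_count(cancer_libs)
```

In this hypothetical case the contributions are 5, 2 and 1 before the limit, and the largest is reduced to 2. Because weighted counts are generally not integers, a Fisher exact test on them would need a rounding convention, which the text does not specify.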
EXAMPLE 3
Identification of tissue specific genes
For detection of tissue specific clusters, tissue libraries/sequences were compared to the total number of libraries/sequences in the cluster. Statistical tools similar to those described above were employed to identify tissue specific genes. Tissue abbreviations are the same as for cancerous tissues, but are indicated with the header "normal tissue". The algorithm - for each tested tissue T and for each tested cluster the following were examined:
1. Each cluster includes at least 2 libraries from the tissue T, and at least 3 clones (weighted, as described above) from tissue T in the cluster; and
2. Clones from the tissue T are at least 40% of all the clones participating in the tested cluster.
Fisher exact test P-values were computed both for library and weighted clone counts to check that the counts are statistically significant.
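The tissue-specificity criteria listed above amount to a simple filter, sketched below. The thresholds (2 libraries, 3 weighted clones, 40%) are taken from the text; the function name and argument names are hypothetical, and the significance check by Fisher exact test is not included in this sketch.

```python
def passes_tissue_filter(libs_from_tissue, clones_from_tissue, clones_total):
    """Check the two count-based criteria for tissue T in one cluster.

    libs_from_tissue:   number of libraries from tissue T in the cluster
    clones_from_tissue: weighted number of clones from tissue T in the cluster
    clones_total:       weighted number of all clones in the cluster
    """
    if clones_total <= 0:
        return False
    return (libs_from_tissue >= 2
            and clones_from_tissue >= 3
            and clones_from_tissue / clones_total >= 0.40)
```

For example, a cluster with 2 libraries and 4 weighted clones from tissue T out of 8 weighted clones in total would pass, while the same cluster with 10 weighted clones in total and only 3 from tissue T would not.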
EXAMPLE 4
Identification of splice variants over expressed in cancer of clusters which are not over expressed in cancer
Cancer-specific splice variants containing a unique region were identified.
Identification of unique sequence regions in splice variants
A Region is defined as a group of adjacent exons that always appear or do not appear together in each splice variant. A "segment" (sometimes also referred to as "seg" or "node") is defined as the shortest contiguous transcribed region without known splicing inside. Only reliable ESTs were considered for region and segment analysis. An EST was defined as unreliable if: (i) Unspliced; (ii) Not covered by RNA; (iii) Not covered by spliced ESTs; and (iv) Alignment to the genome ends in proximity of long poly-A stretch or starts in proximity of long poly-T stretch. Only reliable regions were selected for further scoring. Unique sequence regions were considered reliable if: (i) Aligned to the genome; and (ii) Regions supported by more than 2 ESTs.
The algorithm
Each unique sequence region divides the set of transcripts into 2 groups: (i) Transcripts containing this region (group TA). (ii) Transcripts not containing this region (group TB). The set of EST clones of every cluster is divided into 3 groups: (i) Supporting (originating from) transcripts of group TA (S1). (ii) Supporting transcripts of group TB (S2). (iii) Supporting transcripts from both groups (S3). Library and clones number scores described above were given to S1 group. Fisher Exact Test P-values were used to check if: S1 is significantly enriched by cancer EST clones compared to S2; and S1 is significantly enriched by cancer EST clones compared to cluster background (S1+S2+S3). Identification of unique sequence regions and division of the group of transcripts accordingly is illustrated in Figure 2. Each of these unique sequence regions corresponds to a segment, also termed herein a "node".
Region 1 : common to all transcripts, thus it is preferably not considered for determining differential expression between variants; Region 2: specific to Transcript 1 ; Region 3: specific to Transcripts 2+3; Region 4: specific to Transcript 3; Region 5: specific to Transcripts 1 and 2; Region 6: specific to Transcript 1.
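The division of transcripts into groups TA and TB, and of EST clones into groups S1, S2 and S3, can be sketched as below. The transcript-to-region table follows the region list given for Figure 2; the clone names and their supporting transcripts are hypothetical, as are the function and variable names.

```python
def partition_clones(region, transcript_regions, clone_support):
    """Split transcripts and EST clones with respect to one unique region.

    transcript_regions: dict transcript -> set of regions it contains
    clone_support:      dict clone -> set of transcripts the clone supports
    Returns (TA, TB, S1, S2, S3) as sets.
    """
    ta = {t for t, regions in transcript_regions.items() if region in regions}
    tb = set(transcript_regions) - ta
    s1, s2, s3 = set(), set(), set()
    for clone, transcripts in clone_support.items():
        in_ta = bool(transcripts & ta)
        in_tb = bool(transcripts & tb)
        if in_ta and in_tb:
            s3.add(clone)      # supports transcripts from both groups
        elif in_ta:
            s1.add(clone)      # supports only transcripts containing the region
        elif in_tb:
            s2.add(clone)      # supports only transcripts lacking the region
    return ta, tb, s1, s2, s3


# Regions per transcript, following the Figure 2 region list.
TRANSCRIPT_REGIONS = {
    "Transcript 1": {1, 2, 5, 6},
    "Transcript 2": {1, 3, 5},
    "Transcript 3": {1, 3, 4},
}

# Hypothetical EST clones and the transcripts they support.
CLONE_SUPPORT = {
    "est_a": {"Transcript 1"},
    "est_b": {"Transcript 2", "Transcript 3"},
    "est_c": {"Transcript 1", "Transcript 2"},
    "est_d": {"Transcript 3"},
}

ta, tb, s1, s2, s3 = partition_clones(3, TRANSCRIPT_REGIONS, CLONE_SUPPORT)
```

For Region 3, group TA is Transcripts 2 and 3 and group TB is Transcript 1; the library and clone scores would then be computed on the S1 clones only.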
EXAMPLE 5
Identification of cancer specific splice variants of genes over expressed in cancer
A search for EST supported (no mRNA) regions for genes of: (i) known cancer markers (ii) Genes shown to be over-expressed in cancer in published microarray experiments. Reliable EST supported-regions were defined as supported by minimum of one of the following: (i) 3 spliced ESTs; or (ii) 2 spliced ESTs from 2 libraries; (iii) 10 unspliced ESTs from 2 libraries, or (iv) 3 libraries.
Actual Marker Examples
The following examples relate to specific actual marker examples. It should be noted that Table numbering is restarted within each example related to a particular Cluster, as indicated by the titles below.
EXPERIMENTAL EXAMPLES SECTION This Section relates to Examples describing experiments involving these sequences, and illustrative, non- limiting examples of methods, assays and uses thereof. The materials and experimental procedures are explained first, as all experiments used them as a basis for the work that was performed.
The markers of the present invention were tested with regard to their expression in various cancerous and non-cancerous tissue samples. A description of the samples used in the panel is provided in Table 1 below. A description of the samples used in the normal tissue panel is provided in Table 2 below. Tests were then performed as described in the "Materials and Experimental Procedures" section below.
Table 1: Tissue samples in testing panel
Table 2: Tissue samples in normal panel:
Materials and Experimental Procedures RNA preparation - RNA was obtained from Clontech (Franklin Lakes, NJ USA 07417, www.clontech.com), BioChain Inst. Inc. (Hayward, CA 94545 USA www.biochain.com), ABS
(Wilmington, DE 19801, USA, http://www.absbioreagents.com) or Ambion (Austin, TX 78744
USA, http://www.ambion.com). Alternatively, RNA was generated from tissue samples using
TRI-Reagent (Molecular Research Center), according to the manufacturer's instructions. Tissue and
RNA samples were obtained from patients or from postmortem. Total RNA samples were treated with DNaseI (Ambion) and purified using RNeasy columns (Qiagen). RT PCR - Purified RNA (1 μg) was mixed with 150 ng Random Hexamer primers (Invitrogen) and 500 μM dNTP in a total volume of 15.6 μl. The mixture was incubated for 5 min at 65 °C and then quickly chilled on ice. Thereafter, 5 μl of 5X SuperscriptII first strand buffer (Invitrogen), 2.4 μl 0.1M DTT and 40 units RNasin (Promega) were added, and the mixture was incubated for 10 min at 25 °C, followed by further incubation at 42 °C for 2 min. Then, 1 μl (200 units) of SuperscriptII (Invitrogen) was added and the reaction (final volume of 25 μl) was incubated for 50 min at 42 °C and then inactivated at 70 °C for 15 min. The resulting cDNA was diluted 1:20 in TE buffer (10 mM Tris pH=8, 1 mM EDTA pH=8). Real-Time RT-PCR analysis - cDNA (5 μl), prepared as described above, was used as a template in Real-Time PCR reactions using the SYBR Green I assay (PE Applied Biosystem) with specific primers and UNG Enzyme (Eurogentech or ABI or Roche). The amplification was effected as follows: 50 °C for 2 min, 95 °C for 10 min, and then 40 cycles of 95 °C for 15 sec, followed by 60 °C for 1 min. Detection was performed by using the PE Applied Biosystem SDS 7000. The cycle in which the reactions achieved a threshold level (Ct) of fluorescence was registered and was used to calculate the relative transcript quantity in the RT reactions. The relative quantity was calculated using the equation Q = efficiency^-Ct. The efficiency of the PCR reaction was calculated from a standard curve, created by using serial dilutions of several reverse transcription (RT) reactions. To minimize inherent differences in the RT reaction, the resulting relative quantities were normalized to the geometric mean of the relative quantities of several housekeeping (HSKP) genes. Schematic summary of quantitative real-time PCR analysis is presented in Figure 3.
As shown, the x-axis shows the cycle number. The Ct (Threshold Cycle) point is the cycle at which the amplification curve crosses the fluorescence threshold that was set in the experiment. This point is a calculated cycle number at which the PCR product signal is above the background level (passive dye ROX) and still in the geometric/exponential phase (as shown, once the level of fluorescence crosses the measurement threshold, it has a geometrically increasing phase, during which measurements are most accurate, followed by a linear phase and a plateau phase; for quantitative measurements, the latter two phases do not provide accurate measurements). The y-axis shows the normalized reporter fluorescence. It should be noted that this type of analysis provides relative quantification.
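The relative quantification described above can be sketched as follows. This is a minimal illustration only, not the software used with the SDS 7000; the function names and the example efficiency and Ct values are hypothetical, and the efficiency is assumed to be expressed as the per-cycle amplification factor (2.0 for perfect doubling).

```python
from statistics import geometric_mean


def relative_quantity(efficiency: float, ct: float) -> float:
    """Relative transcript quantity, Q = efficiency ** (-Ct)."""
    return efficiency ** (-ct)


def normalized_quantity(target, housekeeping):
    """Normalize a target quantity to the geometric mean of housekeeping genes.

    target       -- (efficiency, ct) tuple for the amplicon of interest
    housekeeping -- list of (efficiency, ct) tuples, one per housekeeping gene
    """
    q_target = relative_quantity(*target)
    q_refs = [relative_quantity(eff, ct) for eff, ct in housekeeping]
    return q_target / geometric_mean(q_refs)


# Hypothetical values: the target crosses the threshold two cycles earlier
# than each housekeeping gene, with perfect doubling per cycle.
example = normalized_quantity((2.0, 20.0), [(2.0, 22.0), (2.0, 22.0), (2.0, 22.0)])
```

With these assumed inputs the target is about four times as abundant as the housekeeping reference, since two cycles of doubling separate them.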
The sequences of the housekeeping genes measured in all the examples on the ovarian cancer panel were as follows:
SDHA (GenBank Accession No. NM_004168)
SDHA Forward primer: TGGGAACAAGAGGGCATCTG
SDHA Reverse primer: CCACCACTGCATCAAATTCATG
SDHA-amplicon:
TGGGAACAAGAGGGCATCTGCTAAAGTTTCAGATTCCATTTCTGCTCAGTATCCAGTAGTGGATCATGAATTTGATGCAGTGGTGG

PBGD (GenBank Accession No. BC019323)
PBGD Forward primer: TGAGAGTGATTCGCGTGGG
PBGD Reverse primer: CCAGGGTACGAGGCTTTCAAT
PBGD-amplicon:
TGAGAGTGATTCGCGTGGGTACCCGCAAGAGCCAGCTTGCTCGCATACAGACGGACAGTGTGGTGGCAACATTGAAAGCCTCGTACCCTGG

HPRT1 (GenBank Accession No. NM_000194)
HPRT1 Forward primer: TGACACTGGCAAAACAATGCA
HPRT1 Reverse primer: GGTCCTTTTCACCAGCAAGCT
HPRT1-amplicon:
TGACACTGGCAAAACAATGCAGACTTTGCTTTCCTTGGTCAGGCAGTATAATCCAAAGATGGTCAAGGTCGCAAGCTTGCTGGTGAAAAGGACC

GAPDH (GenBank Accession No. BC026907)
GAPDH Forward primer: TGCACCACCAACTGCTTAGC
GAPDH Reverse primer: CCATCACGCCACAGTTTCC
GAPDH-amplicon:
TGCACCACCAACTGCTTAGCACCCCTGGCCAAGGTCATCCATGACAACTTTGGTATCGTGGAAGGACTCATGACCACAGTCCATGCCATCACTGCCACCCAGAAGACTGTGGATGG
The sequences of the housekeeping genes measured in all the examples on the normal tissue samples panel were as follows:

RPL19 (GenBank Accession No. NM_000981)
RPL19 Forward primer: TGGCAAGAAGAAGGTCTGGTTAG
RPL19 Reverse primer: TGATCAGCCCATCTTTGATGAG
RPL19-amplicon:
TGGCAAGAAGAAGGTCTGGTTAGACCCCAATGAGACCAATGAAATCGCCAATGCCAACTCCCGTCAGCAGATCCGGAAGCTCATCAAAGATGGGCTGATCA

TATA box (GenBank Accession No. NM_003194)
TATA box Forward primer: CGGTTTGCTGCGGTAATCAT
TATA box Reverse primer: TTTCTTGCTGCCAGTCTGGAC
TATA box-amplicon:
CGGTTTGCTGCGGTAATCATGAGGATAAGAGAGCCACGAACCACGGCACTGATTTTCAGTTCTGGGAAAATGGTGTGCACAGGAGCCAAGAGTGAAGAACAGTCCAGACTGGCAGCAAGAAA

Ubiquitin (GenBank Accession No. BC000449)
Ubiquitin Forward primer: ATTTGGGTCGCGGTTCTTG
Ubiquitin Reverse primer: TGCCTTGACATTCTCGATGGT
Ubiquitin C-amplicon:
ATTTGGGTCGCGGTTCTTGTTTGTGGATCGCTGTGATCGTCACTTGACAATGCAGATCTTCGTGAAGACTCTGACTGGTAAGACCATCACCCTCGAGGTTGAGCCCAGTGACACCATCGAGAATGTCAAGGCA

SDHA (GenBank Accession No. NM_004168)
SDHA Forward primer: TGGGAACAAGAGGGCATCTG
SDHA Reverse primer: CCACCACTGCATCAAATTCATG
SDHA-amplicon:
TGGGAACAAGAGGGCATCTGCTAAAGTTTCAGATTCCATTTCTGCTCAGTATCCAGTAGTGGATCATGAATTTGATGCAGTGGTGG
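A quick consistency check on primer and amplicon listings of this kind is that the amplicon should begin with the forward primer and end with the reverse complement of the reverse primer. The sketch below applies that check to the SDHA entries quoted above; the helper names are illustrative and are not part of the described method.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def primers_flank_amplicon(forward: str, reverse: str, amplicon: str) -> bool:
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(reverse_complement(reverse))


SDHA_FORWARD = "TGGGAACAAGAGGGCATCTG"
SDHA_REVERSE = "CCACCACTGCATCAAATTCATG"
SDHA_AMPLICON = (
    "TGGGAACAAGAGGGCATCTGCTAAAGTTTCAGATTCCATTTCTGCTCAGTATCCAGT"
    "AGTGGATCATGAATTTGATGCAGTGGTGG"
)

sdha_ok = primers_flank_amplicon(SDHA_FORWARD, SDHA_REVERSE, SDHA_AMPLICON)
```

The same check can be run on any other primer pair and amplicon in the listing to catch transcription errors in the sequences.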
Oligonucleotide-based micro-array experiment protocol-
Microarray fabrication - Microarrays (chips) were printed by pin deposition using the MicroGrid II MGII 600 robot from BioRobotics Limited (Cambridge, UK). 50-mer oligonucleotide target sequences were designed by Compugen Ltd (Tel-Aviv, IL) as described by A. Shoshan et al, "Optical technologies and informatics", Proceedings of SPIE. Vol 4266, pp. 86-95 (2001). The designed oligonucleotides were synthesized and purified by desalting with the Sigma-Genosys system (The Woodlands, TX, US) and all of the oligonucleotides were joined to a C6 amino-modified linker at the 5' end, or were attached directly to CodeLink slides (Cat #25-6700-01, Amersham Bioscience, Piscataway, NJ, US). The 50-mer oligonucleotides, forming the target sequences, were first suspended in Ultra-pure DDW (Cat # 01-866-1A, Kibbutz Beit-Haemek, Israel) to a concentration of 50 μM. Before printing the slides, the oligonucleotides were resuspended in 300 mM sodium phosphate (pH 8.5) to final concentration of 150 mM and printed at 35-40% relative humidity at 21°C. Each slide contained a total of 9792 features in 32 subarrays. Of these features, 4224 features were sequences of interest according to the present invention and negative controls that were printed in duplicate. An additional 288 features (96 target sequences printed in triplicate) contained housekeeping genes from Human Evaluation Library2, Compugen Ltd, Israel. Another 384 features are E. coli spikes 1-6, which are oligos to E. coli genes which are commercially available in the Array Control product (Array control - sense oligo spots, Ambion Inc. Austin, TX. Cat #1781, Lot #112K06).
Post-coupling processing of printed slides - After the spotting of the oligonucleotides to the glass (CodeLink) slides, the slides were incubated for 24 hours in a sealed saturated NaCl humidification chamber (relative humidity 70-75%). Slides were treated for blocking of the residual reactive groups by incubating them in blocking solution at 50°C for 15 minutes (10 ml/slide of buffer containing 0.1 M Tris, 50 mM ethanolamine, 0.1% SDS). The slides were then rinsed twice with Ultra-pure DDW (double distilled water). The slides were then washed with wash solution (10 ml/slide; 4X SSC, 0.1% SDS) at 50°C for 30 minutes on the shaker. The slides were then rinsed twice with Ultra-pure DDW, followed by drying by centrifugation for 3 minutes at 800 rpm. Next, in order to assist in automatic operation of the hybridization protocol, the slides were treated with Ventana Discovery hybridization station barcode adhesives. The printed slides were loaded on a Bio-Optica (Milan, Italy) hematology staining device and were incubated for 10 minutes in 50 ml of 3-Aminopropyl Triethoxysilane (Sigma A3648 lot #122K589). Excess fluid was dried and slides were then incubated for three hours in 20 mm/Hg in a dark vacuum desiccator (Pelco 2251, Ted Pella, Inc. Redding CA).
The following protocol was then followed with the Genisphere 900-RP (random primer), with mini elute columns on the Ventana Discovery HybStation™, to perform the microarray experiments. Briefly, the protocol was performed as described with regard to the instructions and information provided with the device itself. The protocol included cDNA synthesis and labeling. cDNA concentration was measured with the TBS-380 (Turner Biosystems. Sunnyvale, CA.) PicoFluor, which is used with the OliGreen ssDNA Quantitation reagent and kit.
Hybridization was performed with the Ventana Hybridization device, according to the provided protocols (Discovery Hybridization Station, Tucson AZ). The slides were then scanned with the GenePix 4000B dual laser scanner from Axon Instruments Inc, and analyzed by GenePix Pro 5.0 software. A schematic summary of the oligonucleotide based microarray fabrication and the experimental flow is presented in Figures 4 and 5. Briefly, as shown in Figure 4, DNA oligonucleotides at 25 μM were deposited (printed) onto Amersham 'CodeLink' glass slides generating a well defined 'spot'. These slides are covered with a long-chain, hydrophilic polymer chemistry that creates an active 3-D surface that covalently binds the DNA oligonucleotides' 5'-end via the C6-amine modification. This binding ensures that the full length of the DNA oligonucleotides is available for hybridization to the cDNA and also allows lower background, high sensitivity and reproducibility.

Figure 5 shows a schematic method for performing the microarray experiments. It should be noted that stages on the left-hand or right-hand side may optionally be performed in any order, including in parallel, until stage 4 (hybridization). Briefly, on the left-hand side, the target oligonucleotides are spotted on a glass microscope slide (although optionally other materials could be used) to form a spotted slide (stage 1). On the right-hand side, control sample RNA and cancer sample RNA are Cy3 and Cy5 labeled, respectively (stage 2), to form labeled probes. It should be noted that the control and cancer samples come from corresponding tissues (for example, normal prostate tissue and cancerous prostate tissue). Furthermore, the tissue from which the RNA was taken is indicated below in the specific examples of data for particular clusters, with regard to overexpression of an oligonucleotide from a "chip" (microarray), as for example "prostate" for chips in which prostate cancerous tissue and normal tissue were tested as described above. In stage 3, the probes are mixed. In stage 4, hybridization is performed to form a processed slide. In stage 5, the slide is washed and scanned to form an image file, followed by data analysis in stage 6.
DESCRIPTION FOR CLUSTER H61775
Cluster H61775 features 2 transcript(s) and 6 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively, the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
H61775_node_2
H61775_node_4
H61775_node_6
H61775_node_8
H61775_node_0
H61775_node_5
Table 3 - Proteins of interest
Cluster H61775 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the right hand column of the table and the numbers on the y-axis of Figure 6 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
Overall, the following results were obtained as shown with regard to the histograms in Figure 6 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: brain malignant tumors and a mixture of malignant tumors from different tissues.
Table 4 - Normal tissue distribution
Table 5 - P values and ratios for expression in cancerous tissue
As noted above, cluster H61775 features 2 transcript(s), which were listed in Table 1 above. A description of each variant protein according to the present invention is now provided.
Variant protein H61775_P16 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) H61775_T21. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between H61775_P16 and Q9P2J2 (SEQ ID NO:953):
1. An isolated chimeric polypeptide encoding for H61775_P16, comprising a first amino acid sequence being at least 90% homologous to MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 11 - 93 of Q9P2J2, which also corresponds to amino acids 1 - 83 of H61775_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCWRSSCSVTLQV corresponding to amino acids 84 - 152 of H61775_P16, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of H61775_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCWRSSCSVTLQV in H61775_P16.

Comparison report between H61775_P16 and AAQ88495 (SEQ ID NO:954):
1. An isolated chimeric polypeptide encoding for H61775_P16, comprising a first amino acid sequence being at least 90% homologous to MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 1 - 83 of AAQ88495, which also corresponds to amino acids 1 - 83 of H61775_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCWRSSCSVTLQV corresponding to amino acids 84 - 152 of H61775_P16, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of H61775_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCWRSSCSVTLQV in H61775_P16.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein H61775_P16 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H61775_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Amino acid mutations
Variant protein H61775_P16 is encoded by the following transcript(s): H61775_T21, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript H61775_T21 is shown in bold; this coding portion starts at position 261 and ends at position 716. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H61775_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Variant protein H61775_P17 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) H61775_T22. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between H61775_P17 and Q9P2J2:
1. An isolated chimeric polypeptide encoding for H61775_P17, comprising a first amino acid sequence being at least 90% homologous to MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 11 - 93 of Q9P2J2, which also corresponds to amino acids 1 - 83 of H61775_P17.

Comparison report between H61775_P17 and AAQ88495:
1. An isolated chimeric polypeptide encoding for H61775_P17, comprising a first amino acid sequence being at least 90% homologous to MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 1 - 83 of AAQ88495, which also corresponds to amino acids 1 - 83 of H61775_P17.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein H61775_P17 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H61775_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
Variant protein H61775_P17 is encoded by the following transcript(s): H61775_T22, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript H61775_T22 is shown in bold; this coding portion starts at position 261 and ends at position 509. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H61775_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
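The coding-portion coordinates quoted for transcripts H61775_T21 (261 to 716) and H61775_T22 (261 to 509) can be checked against the protein lengths stated in the comparison reports (152 and 83 amino acids). The sketch below assumes the positions are 1-based and inclusive and that the quoted coding portion excludes the stop codon; both are assumptions, although they are consistent with the stated lengths. The function name is illustrative.

```python
def coding_length_in_codons(start: int, end: int) -> int:
    """Number of codons in a coding portion given 1-based inclusive positions."""
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError(f"coding length {length_nt} nt is not a multiple of 3")
    return length_nt // 3


# Coordinates as quoted in the text for the two transcripts.
codons_t21 = coding_length_in_codons(261, 716)  # H61775_T21 -> H61775_P16
codons_t22 = coding_length_in_codons(261, 509)  # H61775_T22 -> H61775_P17
```

Under these assumptions the two coding portions span 456 and 249 nucleotides, matching proteins of 152 and 83 residues respectively.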
As noted above, cluster H61775 features 6 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster H61775_node_2 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H61775_T21 and H61775_T22. Table 10 below describes the starting and ending position of this segment on each transcript.
Table 10 - Segment location on transcripts
Segment cluster H61775_node_4 according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H61775_T21 and H61775_T22. Table 11 below describes the starting and ending position of this segment on each transcript.
Table 11 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
H61775_T21         319                          507
H61775_T22         319                          507
Segment cluster H61775_node_6 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H61775_T22. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
Segment cluster H61775_node_8 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H61775_T21. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster H61775_node_0 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H61775_T21 and H61775_T22. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
Segment cluster H61775_node_5 according to the present invention can be found in the following transcript(s): H61775_T22. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: /tmp/PswORJLCti/aLAXQjXh07 :Q9P2J2
Sequence documentation:
Alignment of: H61775_P16 x Q9P2J2
Alignment segment 1/1:
Quality: 803.00
Escore: 0
Matching length: 83
Total length: 83
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 1 MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRP 50
   ||||||||||||||||||||||||||||||||||||||||||||||||||
11 MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRP 60

51 PLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG 83
   |||||||||||||||||||||||||||||||||
61 PLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG 93
Sequence name: /tmp/Psw0RJLCti/aLAXQjXh07 :AAQ88495
Sequence documentation:
Alignment of: H61775_P16 x AAQ88495
Alignment segment 1/1:
Quality: 803.00
Escore: 0
Matching length: 83
Total length: 83
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 1 MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRP 50
   ||||||||||||||||||||||||||||||||||||||||||||||||||
 1 MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRP 50

51 PLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG 83
   |||||||||||||||||||||||||||||||||
51 PLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG 83
Sequence name: /tmp/naab8yR3GC/pSM412lL5o :Q9P2J2
Sequence documentation:
Alignment of: H61775_P17 x Q9P2J2
Alignment segment 1/1:
Quality: 803.00
Escore: 0
Matching length: 83
Total length: 83
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 1 MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRP 50
   ||||||||||||||||||||||||||||||||||||||||||||||||||
11 MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRP 60

51 PLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG 83
   |||||||||||||||||||||||||||||||||
61 PLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG 93
Sequence name: /tmp/naab8yR3GC/pSM412lL5o : AAQ88495
Sequence documentation:
Alignment of: H61775_P17 x AAQ88495
Alignment segment 1/1:
Quality: 803.00
Escore: 0
Matching length: 83
Total length: 83
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 1 MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRP 50
   ||||||||||||||||||||||||||||||||||||||||||||||||||
 1 MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRP 50

51 PLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG 83
   |||||||||||||||||||||||||||||||||
51 PLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG 83
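The percent-identity figures in the alignment reports above can be illustrated with a minimal ungapped comparison. This sketch is not the alignment program that produced the reports; the function name is illustrative, and because the first 10 residues of Q9P2J2 are not given in this section, the subject sequence below pads them with the placeholder "X", which is purely hypothetical.

```python
def ungapped_identity(query: str, subject: str,
                      query_start: int, subject_start: int, length: int) -> float:
    """Percent identity over an ungapped window, using 1-based start positions."""
    q = query[query_start - 1:query_start - 1 + length]
    s = subject[subject_start - 1:subject_start - 1 + length]
    if len(q) != length or len(s) != length:
        raise ValueError("window extends past the end of a sequence")
    matches = sum(a == b for a, b in zip(q, s))
    return 100.0 * matches / length


# The 83-residue region shared by the variant proteins and the known proteins.
SHARED = (
    "MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRP"
    "PLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG"
)

# Hypothetical stand-in for Q9P2J2: 10 unknown leading residues, then the shared region.
Q9P2J2_LIKE = "X" * 10 + SHARED

identity = ungapped_identity(SHARED, Q9P2J2_LIKE, 1, 11, 83)
```

The window offsets (1 in the variant, 11 in the known protein) mirror the residue numbering shown in the Q9P2J2 alignments.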
Expression of immunoglobulin superfamily, member 9 H61775 transcripts which are detectable by amplicon as depicted in sequence name H61775seg8 in normal and cancerous ovary tissues.
Expression of immunoglobulin superfamily, member 9 transcripts detectable by or according to H61775seg8, H61775seg8 amplicon(s) and H61775seg8F2 and H61775seg8R2 primers was measured by real time PCR. In parallel the expression of four housekeeping genes: PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 7 is a histogram showing overexpression of the above-indicated immunoglobulin superfamily, member 9 transcripts in cancerous ovary samples relative to the normal samples. (Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.) As is evident from Figure 7, the expression of immunoglobulin superfamily, member 9 transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel") and including benign samples (Sample Nos. 56, 62, 64). Notably an overexpression of at least 5 fold was found in 21 out of 43 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below.
The P value for the difference in the expression levels of immunoglobulin superfamily, member 9 transcripts detectable by the above amplicon(s) in ovary cancer samples versus the normal tissue samples was determined by T test as 2.76E-4. The above value demonstrates statistical significance of the results.
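The fold up-regulation step described above (dividing each normalized quantity by the median of the normal post-mortem samples, then counting samples at or above a 5-fold threshold) can be sketched as follows. The sample identifiers and quantities are hypothetical and are not data from Figure 7; the function names are illustrative.

```python
from statistics import median


def fold_upregulation(normalized: dict, normal_ids: list) -> dict:
    """Divide every normalized quantity by the median of the normal samples."""
    baseline = median(normalized[sample_id] for sample_id in normal_ids)
    return {sample_id: value / baseline for sample_id, value in normalized.items()}


def count_overexpressed(folds: dict, sample_ids: list, threshold: float = 5.0) -> int:
    """Count the samples whose fold up-regulation is at least the threshold."""
    return sum(1 for sample_id in sample_ids if folds[sample_id] >= threshold)


# Hypothetical normalized quantities keyed by sample number.
quantities = {45: 1.0, 46: 2.0, 47: 3.0, 1: 15.0, 2: 4.0}
folds = fold_upregulation(quantities, normal_ids=[45, 46, 47])
n_over = count_overexpressed(folds, sample_ids=[1, 2])
```

Using the median of the normal samples as the baseline keeps a single outlying normal sample from shifting every fold value.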
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: H61775seg8F2 forward primer; and H61775seg8R2 reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: H61775seg8.
H61775seg8F2 (SEQ ID NO:955): GAAGGCTCTTGTCACTTACTAGCCAT
H61775seg8R2 (SEQ ID NO:956): TGTCACCATATTTAATCCTCCCAA
Amplicon (SEQ ID NO:957):
GAAGGCTCTTGTCACTTACTAGCCATGTGATTTTGGAAAGAAACTTAACATTAATTCCTTCAGCTACAATGGAATTCTTGGGAGGATTAAATATGGTGACA
Expression of immunoglobulin superfamily, member 9 H61775 transcripts which are detectable by amplicon as depicted in sequence name H61775seg8 in different normal tissues.
Expression of immunoglobulin superfamily, member 9 transcripts detectable by or according to H61775seg8 amplicon(s) and H61775seg8F and H61775seg8R was measured by real time PCR. In parallel the expression of four housekeeping genes - RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 2 "Tissue samples in normal panel", above), to obtain a value of relative expression of each sample relative to the median of the ovary samples. The results are described in Figure 8, presenting the histogram showing the expression of H61775 transcripts which are detectable by amplicon as depicted in sequence name H61775seg8, in different normal tissues. Amplicon and primers are as above.
DESCRIPTION FOR CLUSTER HSAPHOL Cluster HSAPHOL features 7 transcript(s) and 18 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively, the sequences themselves are given at the end of the application. The selected protein variants are given in table 3. Table 1 - Transcripts of interest
Table 2 - Segments of interest Segrneii Name,, i ΕQ , r- HSAPHOL node 11 18 26J
Table 3 - Proteins of interest
These sequences are variants of the known protein Alkaline phosphatase, tissue-nonspecific isozyme precursor (SwissProt accession identifier PPBT_HUMAN; known also according to the synonyms EC 3.1.3.1; AP-TNAP; Liver/bone/kidney isozyme; TNSALP), SEQ ID NO: 36, referred to herein as the previously known protein. The variant proteins according to the present invention are variant(s) of a known diagnostic marker, called Alkaline Phosphatase.
Protein Alkaline phosphatase, tissue-nonspecific isozyme precursor is known or believed to have the following function(s): THIS ISOZYME MAY PLAY A ROLE IN SKELETAL MINERALIZATION. The sequence for protein Alkaline phosphatase, tissue-nonspecific isozyme precursor is given at the end of the application, as "Alkaline phosphatase, tissue-nonspecific isozyme precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Alkaline phosphatase, tissue-nonspecific isozyme precursor localization is believed to be attached to the membrane by a GPI-anchor. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: skeletal development; ossification; metabolism, which are annotation(s) related to Biological Process; magnesium binding; alkaline phosphatase; hydrolase, which are annotation(s) related to Molecular Function; and integral membrane protein, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HSAPHOL features 7 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Alkaline phosphatase, tissue-nonspecific isozyme precursor. A description of each variant protein according to the present invention is now provided.

Variant protein HSAPHOL_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSAPHOL_T4. An alignment is given to the known protein (Alkaline phosphatase, tissue-nonspecific isozyme precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSAPHOL_P2 and AAH21289 (SEQ ID NO: 36):
1. An isolated chimeric polypeptide encoding for HSAPHOL_P2, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PHSGPAAAFIRRRGWWPGPRCA corresponding to amino acids 1 - 22 of HSAPHOL_P2, a second amino acid sequence being at least 90% homologous to PATPRPLSWLRAPTRLCLDGPSPVLCA corresponding to amino acids 1 - 27 of AAH21289, which also corresponds to amino acids 23 - 49 of HSAPHOL_P2, and a third amino acid sequence being at least 90% homologous to EKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 83 - 586 of AAH21289, which also corresponds to amino acids 50 - 553 of HSAPHOL_P2, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of HSAPHOL_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PHSGPAAAFIRRRGWWPGPRCA of HSAPHOL_P2.
3. An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P2, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AE, having a structure as follows: a sequence starting from any of amino acid numbers 49-x to 49; and ending at any of amino acid numbers 50 + ((n-2) - x), in which x varies from 0 to n-2.
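The edge-portion definition above (start at 49-x, end at 50 + ((n-2) - x), with x from 0 to n-2) describes every window of length n that contains both residue 49 and residue 50, that is, the AE junction. The sketch below enumerates those windows for a chosen n; the function name and the junction parameter are illustrative, and positions are taken to be 1-based and inclusive, which is an assumption.

```python
def edge_portion_windows(n: int, junction_left: int = 49) -> list:
    """Enumerate (start, end) windows of length n that span the junction
    between residue `junction_left` and the residue immediately after it."""
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    windows = []
    for x in range(0, n - 1):  # x varies from 0 to n-2 inclusive
        start = junction_left - x
        end = (junction_left + 1) + ((n - 2) - x)
        windows.append((start, end))
    return windows


# All 10-residue windows that include both residue 49 (A) and residue 50 (E).
windows_n10 = edge_portion_windows(10)
```

For n = 10 this gives nine windows, running from residues 49 to 58 (x = 0) through residues 41 to 50 (x = 8), each exactly ten residues long.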
Comparison report between HSAPHOL_P2 and PPBT_HUMAN: 1. An isolated chimeric polypeptide encoding for HSAPHOL_P2, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PHSGPAAAFIRRRGWWPGPRCAPATPRPLSWLRAPTRLCLDGPSPVLCA corresponding to amino acids 1 - 49 of HSAPHOL_P2, and a second amino acid sequence being at least 90% homologous to
EKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARJLKGQL HHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAAT ERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNE MPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLD GLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVT DPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAG SLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYK VVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQN YVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF conesponding to amino acids 21 - 524 of PPBT HUMAN, which also conesponds to amino acids 50 - 553 of HSAPHOL P2, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of HSAPHOL P2, comprising a polypeptide being at least 70%>, optionally at least about 80%, preferably at least about 85%>, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PHSGPAAAFIRRRGWWPGPRCAPATPRPLSWLRAPTRLCLDGPSPVLCA of HSAPHOL_P2. 3. An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL P2, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AE, having a stmcture as follows: a sequence starting from any of amino acid numbers 49-x to 49; and ending at any of amino acid numbers 50+ ((n-2) - x), in which x varies from 0 to n-2. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. 
The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region, and similarity to known proteins suggests a GPI anchor. Variant protein HSAPHOL_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 5 - Amino acid mutations
Variant protein HSAPHOL_P2 is encoded by the following transcript(s): HSAPHOL_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSAPHOL_T4 is shown in bold; this coding portion starts at position 1 and ends at position 1659. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Nucleic acid SNPs
Variant protein HSAPHOL_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSAPHOL_T5. An alignment is given to the known protein (Alkaline phosphatase, tissue- nonspecific isozyme precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSAPHOL P3 and AAH21289: 1.An isolated chimeric polypeptide encoding for HSAPHOL_P3, comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVP conesponding to amino acids 63 - 82 of AAH21289, which also conesponds to amino acids 1 - 20 of HSAPHOL P3, and a second amino acid sequence being at least 90 % homologous to GMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYL CGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSA AYAHSADRDWYSDNEMPPEALSQGCKI IAYQLMHNIRDIDVIMGGGRKYMYPKNKTD VEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFE PGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQAL HEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKK PFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKG PMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSV LF corresponding to amino acids 123 - 586 of AAH21289, which also conesponds to amino acids 21 - 484 of HSAPHOL P3, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. 
An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P3, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PG, having a structure as follows: a sequence starting from any of amino acid numbers 20-x to 20; and ending at any of amino acid numbers 21+ ((n-2) - x), in which x varies from 0 to n-2. Comparison report between HSAPHOL_P3 and PPBT_HUMAN: 1. An isolated chimeric polypeptide encoding for HSAPHOL_P3, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVP corresponding to amino acids 1 - 20 of PPBT_HUMAN, which also corresponds to amino acids 1 - 20 of HSAPHOL_P3, and a second amino acid sequence being at least 90% homologous to
GMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYL CGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSA AYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTD VEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFE PGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQAL HEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKK PFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKG PMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSV LF conesponding to amino acids 61 - 524 of PPBT HUMAN, which also conesponds to amino acids 21 - 484 of HSAPHOL P3, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2.An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL P3, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PG, having a stmcture as follows: a sequence starting from any of amino acid numbers 20-x to 20; and ending at any of amino acid numbers 21+ ((n-2) - x), in which x varies from 0 to n-2. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because of manual inspection of known protein localization and or gene stmcture, and/or similarity to known proteins.. Variant protein HSAPHOL_P3 also has the following non-silent SNPs (Single Nucleotide
Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Amino acid mutations
Variant protein HSAPHOL_P3 is encoded by the following transcript(s): HSAPHOL_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSAPHOL_T5 is shown in bold; this coding portion starts at position 253 and ends at position 1704. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Nucleic acid SNPs
Variant protein HSAPHOL_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSAPHOL_T6. An alignment is given to the known protein (Alkaline phosphatase, tissue-nonspecific isozyme precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSAPHOL_P4 and AAH21289: 1. An isolated chimeric polypeptide encoding for HSAPHOL_P4, comprising a first amino acid sequence being at least 90% homologous to
MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLC GVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAA YAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDV EYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEP GDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH EAVEMDRAIGQAGSLTSSEDTLTWTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKP FTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGP MAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVL F conesponding to amino acids 124 - 586 of AAH21289, which also conesponds to amino acids 1 - 463 of HSAPHOL_P4.
Comparison report between HSAPHOL J and PPBT HUMAN : 1.An isolated chimeric polypeptide encoding for HSAPHOL P4, comprising a first amino acid sequence being at least 90 % homologous to MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLC GVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAA YAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDV EYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEP GDMQYELNRNNVTDPSLSEMVWAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH EAVEMDRAIGQAGSLTSSEDTLTWTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKP FTAILYGNGPGYKWGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGP 283 MAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVL F conesponding to amino acids 62 - 524 of PPBT HUMAN, which also conesponds to amino acids 1 - 463 of HSAPHOL P4. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because only one of the two trans- membrane region prediction programs (Tmpred: 1, Trnhmm: 0) has predicted that this protein has a trans- membrane region, but similarity to known proteins suggests a GPI anchor. In addition both signafpeptide prediction programs predict that this protein is a non-secreted protein.. Variant protein HSAPHOL_P4 also has the following non-silent SNPs (Single Nucleotide Polymoφhisms) as listed in Table 9, (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL JM sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Amino acid mutations
Variant protein HSAPHOL_P4 is encoded by the following transcript(s): HSAPHOL_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSAPHOL_T6 is shown in bold; this coding portion starts at position 215 and ends at position 1603. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
Variant protein HSAPHOL P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSAPHOL_T7. An alignment is given to the known protein (Alkaline phosphatase, tissue- nonspecific isozyme precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSAPHOL_P5 and AAH21289: l.An isolated chimeric polypeptide encoding for HSAPHOL P5, comprising a first amino acid sequence being at least 90 %> homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL GLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKA KQALHEAVEM conesponding to amino acids 63 - 417 of AAH21289, which also conesponds to amino acids 1 - 355 of HSAPHOL P5, and a second amino acid sequence being at least 90 % homologous to DHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVD YAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIG ANLGHCAPASSAGSLAAGPLLLALALYPLSVLF conesponding to amino acids 440 - 586 of AAH21289, which also conesponds to amino acids 356 - 502 of HSAPHOL_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. 
An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P5, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise MD, having a structure as follows: a sequence starting from any of amino acid numbers 355-x to 355; and ending at any of amino acid numbers 356+ ((n-2) - x), in which x varies from 0 to n-2.
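The edge-portion definition above can be read as a sliding window of length n that always contains both junction residues (355 and 356 for HSAPHOL_P5). The following is a minimal sketch of that reading, not part of the application: the function name `edge_windows` is hypothetical, and it assumes that each value of x yields one window starting at residue 355-x and ending at residue 356+((n-2)-x), with 1-based inclusive numbering.

```python
def edge_windows(left_pos, n):
    """List (start, end) residue ranges of length n spanning a junction.

    left_pos is the last residue before the junction (1-based); the window
    must contain both left_pos and left_pos + 1. This is an illustrative
    reading of the edge-portion wording, not a definitive implementation.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to hold both junction residues")
    windows = []
    for x in range(0, n - 1):  # x varies from 0 to n-2
        start = left_pos - x
        end = (left_pos + 1) + ((n - 2) - x)
        windows.append((start, end))
    return windows


# Junction of HSAPHOL_P5 (residues 355 | 356) with the minimum length n = 10
windows = edge_windows(355, 10)
for start, end in windows:
    print(start, end, end - start + 1)
```

Under this reading there are n-1 distinct windows for a given n, each exactly n residues long.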
Comparison report between HSAPHOL_P5 and PPBT_HUMAN: 1. An isolated chimeric polypeptide encoding for HSAPHOL_P5, comprising a first amino acid sequence being at least 90% homologous to
MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKJRYKHSHFIWNRTELLTLDPHNVDYLL GLFEPGDMQYELNRNNVTDPSLSEMVWAIQILRKNPKGFFLLVEGGRIDHGHHEGKA KQALHEAVEM conesponding to amino acids 1 - 355 of PPBT HUMAN, which also conesponds to amino acids 1 - 355 of HSAPHOL P5, and a second amino acid sequence being at least 90 %> homologous to
DHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVD YAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIG ANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 378 - 524 of PPBT_HUMAN, which also corresponds to amino acids 356 - 502 of HSAPHOL_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P5, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise MD, having a structure as follows: a sequence starting from any of amino acid numbers 355-x to 355; and ending at any of amino acid numbers 356+ ((n-2) - x), in which x varies from 0 to n-2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because of manual inspection of known protein localization and/or gene structure, and/or similarity to known proteins. Variant protein HSAPHOL_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Amino acid mutations
Variant protein HSAPHOL_P5 is encoded by the following transcript(s): HSAPHOL_T7, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSAPHOL_T7 is shown in bold; this coding portion starts at position 253 and ends at position 1758. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Nucleic acid SNPs
Variant protein HSAPHOL P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) FISAPHOL T8. An alignment is given to the known protein (Alkaline phosphatase, tissue- nonspecific isozyme precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSAPHOL_P6 and AAH21289: 1.An isolated chimeric polypeptide encoding for HSAPHOL P6, comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVK ANEGTVG VSAATERSRCNTTQGNEVTSILRWAKDAGKS VGI VTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL conesponding to amino acids 63 - 349 of AAH21289, which also conesponds to amino acids 1 - 287 of HSAPHOL P6, and a second amino acid sequence being at least 90 % homologous to GGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTP RGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAV PLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAG SLAAGPLLLALALYPLSVLF conesponding to amino acids 395 - 586 of AAH21289, which also corresponds to amino acids 288 - 479 of HSAPHOL_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. 
2. An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise LG, having a structure as follows: a sequence starting from any of amino acid numbers 287-x to 287; and ending at any of amino acid numbers 288+ ((n-2) - x), in which x varies from 0 to n-2.
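Each comparison report pairs a residue range of the known protein with a residue range of the variant, so the two ranges in a pair should have the same length and the variant ranges should abut. The sketch below checks this for the ranges quoted for HSAPHOL_P6 against AAH21289 (63 - 349 with 1 - 287, and 395 - 586 with 288 - 479). The names `span`, `pairs_consistent` and `skipped_reference_residues` are hypothetical helpers introduced only for illustration.

```python
def span(rng):
    """Length of an inclusive 1-based residue range (start, end)."""
    start, end = rng
    return end - start + 1


# (range in AAH21289, range in HSAPHOL_P6), as quoted in the comparison report
P6_VS_AAH21289 = [((63, 349), (1, 287)), ((395, 586), (288, 479))]


def pairs_consistent(pairs):
    """True if paired ranges have equal length and variant ranges are contiguous."""
    for ref_rng, var_rng in pairs:
        if span(ref_rng) != span(var_rng):
            return False
    for (_, prev_var), (_, next_var) in zip(pairs, pairs[1:]):
        if next_var[0] != prev_var[1] + 1:
            return False
    return True


def skipped_reference_residues(pairs):
    """Number of reference residues absent from the variant between segments."""
    return [nxt[0][0] - prev[0][1] - 1 for prev, nxt in zip(pairs, pairs[1:])]


print(pairs_consistent(P6_VS_AAH21289))
print(skipped_reference_residues(P6_VS_AAH21289))
```

For these ranges the segments are 287 and 192 residues long, and the second function reports how many AAH21289 residues lie between the two aligned segments.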
Comparison report between HSAPHOL_P6 and PPBT HUMAN : l.An isolated chimeric polypeptide encoding for HSAPHOL_P6, comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL conesponding to amino acids 1 - 287 of PPBT HUMAN, which also conesponds to amino acids 1 - 287 of HSAPHOL P6, and a second amino acid sequence being at least 90 % homologous to
GGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTP RGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAV PLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAG SLAAGPLLLALALYPLSVLF conesponding to amino acids 333 - 524 of PPBT_HUMAN, which also conesponds to amino acids 288 - 479 of HSAPHOL P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise LG, having a stmcture as follows: a sequence starting from any of amino acid numbers 287-x to 287; and ending at any of amino acid numbers 288+ ((n-2) - x), in which x varies from 0 to n-2. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both signafpeptide prediction programs predict that this protein has a signal peptide, and at least one of two trans- membrane region prediction programs predicts that this protein has a trans- membrane region, also similarity to known proteins suggests a GPI anchor.. 
Variant protein HSAPHOL_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Amino acid mutations
Variant protein HSAPHOL_P6 is encoded by the following transcript(s): HSAPHOL_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSAPHOL_T8 is shown in bold; this coding portion starts at position 253 and ends at position 1689. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Nucleic acid SNPs
Variant protein HSAPHOL P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSAPHOL_T9. An alignment is given to the known protein (Alkaline phosphatase, tissue- nonspecific isozyme precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSAPHOL_P7 and AAH21289: 1 n isolated chimeric polypeptide encoding for HSAPHOL P7, comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKS VGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYK conesponding to amino acids 63 - 326 of AAH21289, which also conesponds to amino acids 1 - 264 of HSAPHOL_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP conesponding to amino acids 265 - 306 of HSAPHOL P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSAPHOL P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%>, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP in HSAPHOL_P7.
Comparison report between HSAPHOL_P7 and PPBT_HUMAN: 1. An isolated chimeric polypeptide encoding for HSAPHOL_P7, comprising a first amino acid sequence being at least 90% homologous to
MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPR conesponding to amino acids 1 - 262 of PPBTJTUMAN, which also conesponds to amino acids 1 - 262 of HSAPHOL_P7, and a second amino acid sequence being at least 70%, optionally at least 80%>, preferably at least 85%>, more preferably at least 90%> and most preferably at least 95% homologous to a polypeptide having the sequence YKLPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP conesponding to amino acids 263 - 306 of HSAPHOL P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2.An isolated polypeptide encoding for a tail of HSAPHOL P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence YKLPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP in HSAPHOL J>7.
Comparison report between HSAPHOL_P7 and O75090: 1. An isolated chimeric polypeptide encoding for HSAPHOL_P7, comprising a first amino acid sequence being at least 90% homologous to
MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYK conesponding to amino acids 1 - 264 of 075090, which also conesponds to amino acids 1 - 264 of HSAPHOL P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP conesponding to amino acids 265 - 306 of HSAPHOL P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HS APHOL P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95%o homologous to the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP in HSAPHOL J>7.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSAPHOL_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 15 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 15 - Amino acid mutations
Variant protein HSAPHOL_P7 is encoded by the following transcript(s): HSAPHOL_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSAPHOL_T9 is shown in bold; this coding portion starts at position 253 and ends at position 1170. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 16 - Nucleic acid SNPs
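The coding-portion positions quoted for the HSAPHOL transcripts can be cross-checked against the variant protein lengths implied by the comparison reports (the highest residue number given for each variant). The sketch below is illustrative only: the helper `codon_count` is hypothetical, positions are treated as 1-based and inclusive, and the observation that each quoted coding range appears to exclude a stop codon is an inference from the arithmetic rather than a statement in the text.

```python
# transcript: (coding start, coding end, variant protein length in amino acids)
CODING_PORTIONS = {
    "HSAPHOL_T4": (1, 1659, 553),    # HSAPHOL_P2
    "HSAPHOL_T5": (253, 1704, 484),  # HSAPHOL_P3
    "HSAPHOL_T6": (215, 1603, 463),  # HSAPHOL_P4
    "HSAPHOL_T7": (253, 1758, 502),  # HSAPHOL_P5
    "HSAPHOL_T8": (253, 1689, 479),  # HSAPHOL_P6
    "HSAPHOL_T9": (253, 1170, 306),  # HSAPHOL_P7
}


def codon_count(start, end):
    """Number of codons in an inclusive 1-based nucleotide range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("coding range is not a whole number of codons")
    return length // 3


results = {
    name: codon_count(start, end) == protein_len
    for name, (start, end, protein_len) in CODING_PORTIONS.items()
}
for name, ok in results.items():
    print(name, "consistent" if ok else "inconsistent")
```

For example, HSAPHOL_T9 gives (1170 - 253 + 1) / 3 = 306 codons, matching the 306 residues of HSAPHOL_P7.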
Variant protein HSAPHOL_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSAPHOL_T10. An alignment is given to the known protein (Alkaline phosphatase, tissue-nonspecific isozyme precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSAPHOL_P8 and AAH21289: 1. An isolated chimeric polypeptide encoding for HSAPHOL_P8, comprising a first amino acid sequence being at least 90% homologous to
MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL G conesponding to amino acids 63 - 350 of AAH21289, which also conesponds to amino acids 1 - 288 of HSAPHOL P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP conesponding to amino acids 289 - 316 of HSAPHOL P8, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSAPHOL P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%>, more preferably at least about 90% and most preferably at least about 95%> homologous to the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP in HSAPHOL_P8.
Comparison report between HSAPHOL_P8 and PPBT_HUMAN: 1. An isolated chimeric polypeptide encoding for HSAPHOL_P8, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG corresponding to amino acids 1 - 288 of PPBT_HUMAN, which also corresponds to amino acids 1 - 288 of HSAPHOL_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP corresponding to amino acids 289 - 316 of HSAPHOL_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSAPHOL_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP in HSAPHOL_P8.
Comparison report between HSAPHOL_P8 and O75090 (SEQ ID NO:958): 1. An isolated chimeric polypeptide encoding for HSAPHOL_P8, comprising a first amino acid sequence being at least 90% homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG corresponding to amino acids 1 - 288 of O75090, which also corresponds to amino acids 1 - 288 of HSAPHOL_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP corresponding to amino acids 289 - 316 of HSAPHOL_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSAPHOL_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP in HSAPHOL_P8.
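As an illustration of the graded homology thresholds recited in the comparison reports, the following minimal sketch scores a candidate tail against the 28-residue tail sequence quoted above. It assumes a simple ungapped, position-by-position identity as a stand-in for "homologous"; the text does not specify this scoring, and the function names `percent_identity` and `thresholds_met` are hypothetical.

```python
# Tail sequence quoted in the text (amino acids 289 - 316 of HSAPHOL_P8).
TAIL = "KWRGWRGGCMARSLVAGAACGQHLGTRP"

# Percent thresholds recited in the comparison reports.
THRESHOLDS = (70, 80, 85, 90, 95)


def percent_identity(candidate: str, reference: str) -> float:
    """Ungapped identity over the reference length (an assumed scoring)."""
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def thresholds_met(candidate: str, reference: str = TAIL) -> list:
    """Return every recited threshold that the candidate reaches."""
    pid = percent_identity(candidate, reference)
    return [t for t in THRESHOLDS if pid >= t]


if __name__ == "__main__":
    # Hypothetical candidate: the first three residues replaced by alanine.
    print(thresholds_met("AAAGWRGGCMARSLVAGAACGQHLGTRP"))
```

A candidate with three substitutions out of 28 positions is about 89.3% identical under this scoring, so it would reach the 70%, 80% and 85% levels but not the 90% level.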
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a transmembrane region. Variant protein HSAPHOL_P8 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 17 - Amino acid mutations
Variant protein HSAPHOL_P8 is encoded by the following transcript(s): HSAPHOL_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSAPHOL_T10 is shown in bold; this coding portion starts at position 253 and ends at position 1200. The transcript also has the following SNPs as listed in Table 18 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 18 - Nucleic acid SNPs
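The coding-portion coordinates given for the transcripts can be checked against the stated protein length with a short calculation. The sketch below assumes the positions are 1-based and inclusive, which the text does not state explicitly; `coding_span` is a hypothetical helper name.

```python
def coding_span(start: int, end: int):
    """Return (nucleotides, codons) for a 1-based, inclusive coding span."""
    length = end - start + 1
    if length % 3:
        raise ValueError("span is not a whole number of codons")
    return length, length // 3


if __name__ == "__main__":
    # Coordinates quoted in the text for the two transcripts.
    print("HSAPHOL_T10", coding_span(253, 1200))
    print("HSAPHOL_T9", coding_span(253, 1170))
```

Under that assumption, the HSAPHOL_T10 span covers 948 nucleotides, or 316 codons, which agrees with the 316 amino acids described for HSAPHOL_P8 (amino acids 1 - 288 plus 289 - 316).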
As noted above, cluster HSAPHOL features 18 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSAPHOL_node_11 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T5, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster HSAPHOL_node_13 according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T4, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster HSAPHOL_node_15 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T6. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Segment cluster HSAPHOL_node_19 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSAPHOL_T10        550                          724
HSAPHOL_T4         385                          559
HSAPHOL_T5         430                          604
HSAPHOL_T6         329                          503
HSAPHOL_T7         550                          724
HSAPHOL_T8         550                          724
HSAPHOL_T9         550                          724
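A quick consistency check on a segment location table is that the segment should have the same length on every transcript that contains it. The sketch below applies this to the Table 22 values, assuming 1-based inclusive positions; the names `TABLE_22` and `segment_lengths` are hypothetical.

```python
# Rows of Table 22: (transcript name, starting position, ending position).
TABLE_22 = [
    ("HSAPHOL_T10", 550, 724),
    ("HSAPHOL_T4", 385, 559),
    ("HSAPHOL_T5", 430, 604),
    ("HSAPHOL_T6", 329, 503),
    ("HSAPHOL_T7", 550, 724),
    ("HSAPHOL_T8", 550, 724),
    ("HSAPHOL_T9", 550, 724),
]


def segment_lengths(rows):
    """Map each transcript to the segment length (inclusive coordinates assumed)."""
    return {name: end - start + 1 for name, start, end in rows}


if __name__ == "__main__":
    lengths = segment_lengths(TABLE_22)
    print(lengths)
    # One distinct length means the table is internally consistent.
    print(sorted(set(lengths.values())))
```

For these rows every transcript gives a length of 175 nucleotides, so only the offset of the segment differs between transcripts.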
Segment cluster HSAPHOL_node_2 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Segment cluster HSAPHOL_node_21 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Segment cluster HSAPHOL_node_23 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster HSAPHOL_node_26 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster HSAPHOL_node_28 according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6 and HSAPHOL_T7. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster HSAPHOL_node_38 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7 and HSAPHOL_T8. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster HSAPHOL_node_40 according to the present invention is supported by 69 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7 and HSAPHOL_T8. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster HSAPHOL_node_42 according to the present invention is supported by 99 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSAPHOL_node_16 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Segment cluster HSAPHOL_node_25 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7 and HSAPHOL_T8. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Segment cluster HSAPHOL_node_34 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7 and HSAPHOL_T8. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Segment cluster HSAPHOL_node_35 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6 and HSAPHOL_T8. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Segment cluster HSAPHOL_node_36 according to the present invention is supported by 47 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7 and HSAPHOL_T8. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Segment cluster HSAPHOL_node_41 according to the present invention is supported by 60 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7 and HSAPHOL_T8. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Microarray (chip) data is also available for this gene as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (with regard to ovarian cancer), shown in Table 37. Table 37 - Oligonucleotides related to this gene
Variant protein alignment to the previously known protein:
Sequence name: /tmp/rTOip70HMr/xEFXPsrVLD: PPBT_HUMAN
Sequence documentation:
Alignment of: HSAPHOL_P2 x PPBT_HUMAN
Alignment segment 1/1:
Quality: 4926.00
Escore: 0
Matching length: 507
Total length: 507
Matching Percent Similarity: 99.61
Matching Percent Identity: 99.41
Total Percent Similarity: 99.61
Total Percent Identity: 99.41
Gaps: 0
Alignment :
47 LCAEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTV 96 I II II I I I I II II I I I I I I I I I I II I I I II II I I I I I I I I II II II I I 18 LVPEKEKDPKY RDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTV 67
97 TAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAY 146 TAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAY 117
LCGVKANEGTVGVSAATERSRCNTTQGNEVTSILR AKDAGKSVGIVTTT 196
II I I II I II I II I I I II II I I II II I II II I I II I I I I I I I II I II II II LCGVKANEGTVGVSAATERSRCNTTQGNEVTSILR AKDAGKSVGIVTTT 167
RVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDV 246
I I I II I I I I I II I I I I I I I I I II II II II II I I II I I I I I I II I I II I I I RVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQ MHNIRDIDV 217
IMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDT KSFKPRYKHSH 296
I II I II II II I I II I II I I II II II II II I I I I II I II I I I II II : I I II IMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRHKHSH 267
FIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMWVAI 346 I II I I II II II I II I I I I I II I I II I II I II II II I II I I I II II II I I I FI NRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMWVAI 317
QILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTS 396 I I I I M I I I || I M I I I I I I I || I I M I I || I || I || I I I || I I || I I I I QILRKNP GFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTS 367
SEDTLTWTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGN 446 I I II I II I I I I I I II I I I II I I I I I I I II I I I I I I II II II I I I I II I II SEDTLTWTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGN 417
GPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPM 496
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I GPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPM 467 . . . . . AHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALAL 546 II I I I I II I II I II I I II I II II I I I I I I I I I II I I I I II I I II I II II I 468 AHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALAL 517
547 YPLSVLF 553
    |||||||
518 YPLSVLF 524
Sequence name: /tmp/rTOip70HMr/xEFXPsrVLD: AAH21289
Sequence documentation:
Alignment of: HSAPHOL_P2 x AAH21289
Alignment segment 1/1:
Quality: 5108.00
Escore: 0
Matching length: 531
Total length: 586
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 90.61
Total Percent Identity: 90.61
Gaps: 1
Alignment : PATPRPLSWLRAPTRLCLDGPSPVLCA 49
II II I I I I II I I II II I I I I I II II II PATPRPLSWLRAPTRLCLDGPSPVLCAGLEHQLTSDHCQPTPSHPRRLHL 50 . . . . . EKEKDPKY RDQAQETLK 67 I I I II I I I II II I II I I I WAPGIKQVLGCTMISPFLVLAIGTCLTNSLVPEKEKDPKY RDQAQETLK 100
YALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLE 117
I II II I II I II II I II I II II II II I II I II II II I II II I I II II I II I YALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLE 150
MDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSR 167 M M I II I II I I I II I I I I I II I I II I II II I I I I I I I I II I II II II II MDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSR 200
CNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYS 217 I I II I II I I I II II II I II I I I II II I II II I II I I II I I I I I II II II I CNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYS 250
DNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYES 267
I II II I I II II I I II I II I I I I II I I I I I I II I I II I II II I II I I II I I DNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYES 300 . . . . . DEKARGTRLDGLDLVDT KSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG 317
II I I II I I I II I I I I I I II I I II I I I I I I I II I I I II II I I I I II I I I I I DEKARGTRLDGLDLVDT KSFKPRYKHSHFI NRTELLTLDPHNVDYLLG 350
LFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDH 367 II I I II I I I I II I I I I I I II I I I I II II I I I I I II I I I I I II I I II I I I I 351 LFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDH 400
368 GHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGY 417 I II I II II I II II II I II I I I II I I II II II II II I II II I II I I II I II 401 GHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGY 450
418 TPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAH 467 II I I II II I II I I II II II I I I II II I I I I I I I I I I I I II I I I I I I II II 451 TPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAH 500
468 NNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAA 517 I I I I I II I I I I I II II II II I II I I II I I II I II I I II I I I I I II I I II I 501 NNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAA 550 518 CIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF 553 I I I I I I I II I I I I I I I II I I I I I I I I I I I I I I I II I 551 CIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF 586
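The summary statistics reported with each alignment appear to be related in a simple way: the "Total" percentages look like the "Matching" percentages rescaled from the matching length to the total length. This relationship is inferred from the reported figures and is not stated in the text; the sketch below, with the hypothetical helper `total_percent`, checks it against values quoted in this section.

```python
def total_percent(matching_percent: float, matching_length: int,
                  total_length: int) -> float:
    """Rescale a matching-region percentage to the full alignment length.

    Assumes (not confirmed by the source) that the reported percentage is a
    rounded count of matched positions divided by the matching length.
    """
    matched = round(matching_percent * matching_length / 100)
    return round(100 * matched / total_length, 2)


if __name__ == "__main__":
    # HSAPHOL_P2 x AAH21289: matching identity 100.00, lengths 531 and 586.
    print(total_percent(100.00, 531, 586))
    # HSAPHOL_P3 x PPBT_HUMAN: matching identity 99.79, lengths 484 and 524.
    print(total_percent(99.79, 484, 524))
```

Under this assumption the two calls reproduce the reported Total Percent Identity values of 90.61 and 92.18, which suggests the gap positions are counted in the total length but not in the matching length.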
Sequence name: /tmp/pYLJnulFqm/UcqrrsA3UA: PPBT_HUMAN
Sequence documentation:
Alignment of: HSAPHOL_P3 x PPBT_HUMAN
Alignment segment 1/1:
Quality: 4615.00
Escore: 0
Matching length: 484
Total length: 524
Matching Percent Similarity: 100.00
Matching Percent Identity: 99.79
Total Percent Similarity: 92.37
Total Percent Identity: 92.18
Gaps: 1
Alignment :
1 MISPFLVLAIGTCLTNSLVP 20 I I I II I I I I I I I I I I I I I I I 1 MISPFLVLAIGTCLTNSLVPEKEKDPKY RDQAQETLKYALELQKLNTNV 50
21 GMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 60 II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100 . . . . . 61 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 110 I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150 111 LR AKDAGKSVGIVTTTRVNHATPSAAYAHSADRD YSDNEMPPEALSQG 160 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 151 LR AKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
161 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 210 || I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250 211 DLVDT KSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 260 I I I I I I II I I I I : I II II I I I II I I II II I I II I II I II II I I I I I I II I 251 DLVDTWKSFKPRHKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 300 . . . . . 261 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 310 I I I I I I I I I I I I I I II II I I I I I I I I I I I II I II I I I I II II I I I I I I II 301 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 350 311 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 360 I I II II I I I I I I II I I I I II II I I I I I I I I I I I I I I II I II I I I I I I I II 351 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 400
361 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 410 I I || I || I I I I I || I I I I I I I M || || I || I I I I M I I I I I I I I I I I || I 401 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 450
411 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 460 I I I I I II I I I I II I I I I I I I I II I I II II I I II I I I I I I I I I I I I I I I I I 451 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 500
461 SSAGSLAAGPLLLALALYPLSVLF 484
    ||||||||||||||||||||||||
501 SSAGSLAAGPLLLALALYPLSVLF 524
Sequence name: /tmp/pYLJnulFqm/UcqrrsA3UA: AAH21289
Sequence documentation:
Alignment of: HSAPHOL_P3 x AAH21289
Alignment segment 1/1:
Quality: 4626.00
Escore: 0
Matching length: 484
Total length: 524
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 92.37
Total Percent Identity: 92.37
Gaps: 1
Alignment : 1 MISPFLVLAIGTCLTNSLVP 20 I I II I I II II I I II II I I I I 63 ISPFLVLAIGTCLTNSLVPEKEKDPKY RDQAQETLKYALELQKLNTNV 112
21 GMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 60
113 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 162
61 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 110 II II I I I II I II M II I I I II I II I I I I II I I I I I I II II I II I I II I I I 163 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 212 111 LR AKDAGKSVGIVTTTRVNHATPSAAYAHSADRD YSDNEMPPEALSQG 160 II II II I I I I I I I I I I II I I I I II I I I II I I I II I I I I II I I II I I I II I 213 LR AKDAGKSVGIVTTTRVNHATPSAAYAHSADRD YSDNEMPPEALSQG 262
161 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 210 I I I I I I I I I I I I I II I I I I I I I II I I I I II I I I II I I I II I I I II I II II 263 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 312
211 DLVDT KSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 260 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
313 DLVDT KSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 362
261 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 310 I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I I I I I I I I I I I I I I I I I 363 RNNVTDPSLSEMWVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 412
311 EAVEMDRAIGQAGSLTSSEDTLTWTADHSHVFTFGGYTPRGNSIFGLAP 360 I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I II I I I I I II I I I I I I II I
413 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 462 . . . . .
361 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 410 I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I II I I I I I I I I I I I I I I I I
463 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 512
411 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 460 I II I I I I II I I I I I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I 513 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 562
461 SSAGSLAAGPLLLALALYPLSVLF 484
    ||||||||||||||||||||||||
563 SSAGSLAAGPLLLALALYPLSVLF 586
Sequence name: /tmp/iYbOicGuUc/lM HKKvSld: PPBT_HUMAN
Sequence documentation:
Alignment of: HSAPHOL_P4 x PPBT_HUMAN
Alignment segment 1/1:
Quality: 4517.00
Escore: 0
Matching length: 463
Total length: 463
Matching Percent Similarity: 100.00
Matching Percent Identity: 99.78
Total Percent Similarity: 100.00
Total Percent Identity: 99.78
Gaps: 0
Alignment:
1 MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSA 50 I I I I I I II II I II II I II I I I I I I I II I II I I II I I II I I I I I II II I I I 62 MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSA 111 . . . . . 51 GTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILR AKDAGKSV 100 I II I II I I I I I I I I I I I I I I II I I I II II I I I II I II II II II I I I I I II 112 GTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILR AKDAGKSV 161
101 GIVTTTRVNHATPSAAYAHSADRD YSDNEMPPEALSQGCKDIAYQLMHN 150 II I II I I I || I || I I I I I I I I I I I I I I I I I II I I I I I M I II I I II I II I
162 GIVTTTRVNHATPSAAYAHSADRD YSDNEMPPEALSQGCKDIAYQLMHN 211
151 IRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKP 200 I I I I I I I I I II I I I I I I I I I I I II II I I I I I I I I I I I I I I I II I I I I I I I 212 IRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKP 261
201 RYKHSHFI NRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSE 250 I : I I II II I II II II I I I II II II II I I I II II I II II II I II I I I I II I 262 RHKHSHFI NRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSE 311 . . . . .
251 MVWAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQ 300 I I I I I I I I I I I II II I I I I I I I II II I I I I II I I I I I II I I I I I I I I I II
312 MVWAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQ 361
301 AGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFT 350 II I I I I II II II I I I I I I I I I I I I I II II I I I I I I II I I I I I I I I I I I I I
362 AGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFT 411
351 AILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAV 400 I I I I I I I I I II I I II I I I I I I I I I I I I I II I I I I I I I II II I I I I I I I I I
412 AILYGNGPGYKWGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAV 461
401 FSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPL 450 I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I II I I I I I I I I I I I I I II I 462 FSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPL 511 451 LLALALYPLSVLF 463 II I I I I I I I II I I 512 LLALALYPLSVLF 524
Sequence name: /tmp/iYbOicGuUc/lM HKKvSld: AAH21289
Sequence documentation:
Alignment of: HSAPHOL_P4 x AAH21289
Alignment segment 1/1:
Quality: 4528.00
Escore: 0
Matching length: 463
Total length: 463
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment: 1 MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSA 50 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 124 MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSA 173
51 GTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILR AKDAGKSV 100 I I II I I I I II I I II I II II I I II I I I I I II II I I I II I I I II I I I II II I
174 GTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILR AKDAGKSV 223
101 GIVTTTRVNHATPSAAYAHSADRD YSDNEMPPEALSQGCKDIAYQLMHN 150 I I II I I I I II II I I I I I I I I I I II I II I I I II I I I I I I I I I I II I I I I I I 224 GIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHN 273
151 IRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDT KSFKP 200 I I I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I II I I I I I I I II I I I I I 274 IRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDT KSFKP 323
201 RYKHSHFI NRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSE 250 I I I I I II I I I I I I I I I I I I I I II I I II I I I I I I I I I I II I I I I I II II II 324 RYKHSHFI NRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSE 373
251 MVWAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQ 300 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
374 MVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQ 423
301 AGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFT 350 I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I I I 424 AGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFT 473
351 AILYGNGPGYKWGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAV 400 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 474 AILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAV 523
401 FSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPL 450 II I I I I I II I I I II I II I II II I II I I II I II II I I I I II I I I II II II I 524 FSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPL 573
451 LLALALYPLSVLF 463
    |||||||||||||
574 LLALALYPLSVLF 586
Sequence name: /tmp/v0YiupJ4xl/ 6HH5Tm6Ym: PPBT_HUMAN
Sequence documentation:
Alignment of: HSAPHOL_P5 x PPBT_HUMAN
Alignment segment 1/1:
Quality: 4816.00
Escore: 0
Matching length: 502
Total length: 524
Matching Percent Similarity: 100.00
Matching Percent Identity: 99.80
Total Percent Similarity: 95.80
Total Percent Identity: 95.61
Gaps: 1
Alignment : MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
I I I I I I I I I I I II I II I I I II I I II II I I I I I I I I I I I I II I II I I I II I MISPFLVLAIGTCLTNSLVPEKEKDPKY RDQAQETLKYALELQKLNTNV 50 . . . . . AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
I II II II I I II I I I I II I I II I II II I I I II I I I I I I I I I I I I I I I I I I I AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
II II II I I II II II I II II II II I II II I II II II II I I I II II I II I I I YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200 I M I I I I I M I I I I I I I I I I I I I I || || || I I I I I I I I I I I I M I I I I I I LR AKDAGKSVGIVTTTRVNHATPSAAYAHSADRD YSDNEMPPEALSQG 200
CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
DLVDTWKSFKPRYKHSHFI NRTELLTLDPHNVDYLLGLFEPGDMQYELN 300 I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I I DLVDT KSFKPRHKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 300 . . . . . RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 350 I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I II I II I I I I I RNNVTDPSLSEMVWAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 350
EAVEM DHSHVFTFGGYTPRGNSIFGLAP 378 351 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 400
379 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 428 II I I I II II I I II M I I I II I I I I I I I I II I I I I I I I I I I I II I I II I I I 401 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 450
429 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 478 I II II I II I I I I II II I I I I I I I II I I I I I II II I I I I II I I I I I II I II 451 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 500
479 SSAGSLAAGPLLLALALYPLSVLF 502
    ||||||||||||||||||||||||
501 SSAGSLAAGPLLLALALYPLSVLF 524
Sequence name: /tmp/vOYiupJ4xl/ 6HH5Tm6Ym: AAH21289
Sequence documentation:
Alignment of: HSAPHOL_P5 x AAH21289
Alignment segment 1/1:
Quality: 4827.00
Escore: 0
Matching length: 502
Total length: 524
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 95.80
Total Percent Identity: 95.80
Gaps: 1
Alignment :
1 MISPFLVLAIGTCLTNSLVPEKEKDPKY RDQAQETLKYALELQKLNTNV 50 M I || M M II II I II II II I I II I II II II II II II II II I II II II II 63 MISPFLVLAIGTCLTNSLVPEKEKDPKY RDQAQETLKYALELQKLNTNV 112
51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100 I II I I I I I I I I II II I I I I II I I I I I I I I I I I I I II I I I I I I I I II II I I 113 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 162
101 YNTNAQVPDS GTAT YLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
163 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 212 . . . . . 151 LR AKDAGKSVGIVTTTRVNHATPSAAYAHSADRD YSDNEMPPEALSQG 200 II I I II I I I I II I II I I I II I I I I I II II I I I I I I I I I I I I II I I I I I I I 213 LR AKDAGKSVGIVTTTRVNHATPSAAYAHSADRD YSDNEMPPEALSQG 262 201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250 I II I I I II II II I I I II I I II II II I I I I II I I I I I I I I II II I I II I I I 263 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 312
251 DLVDT KSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 300 II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I || || I I I I I I I I I I || I I I 313 DLVDT KSFKPRYKHSHFI NRTELLTLDPHNVDYLLGLFEPGDMQYELN 362 301 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 350 I I I II I I I I I I I I I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 363 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 412 . . . . . 351 EAVEM DHSHVFTFGGYTPRGNSIFGLAP 378 I II I I I I I I I I I I I II I I I I I I II I I I I 413 EAVEMDRAIGQAGSLTSSEDTLTWTADHSHVFTFGGYTPRGNSIFGLAP 462 379 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 428 I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I I 463 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 512
429 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 478 I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I II I I I I I I II I I I I I I I I 513 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 562
479 SSAGSLAAGPLLLALALYPLSVLF 502 I I I I I I I I I I I I I I I I I I I I I I I I 563 SSAGSLAAGPLLLALALYPLSVLF 586
Sequence name: /tmp/LlylqOddii/lFFtdNNCUx : PPBT_HUMAN
Sequence documentation:
Alignment of: HSAPHOL_P6 x PPBT_HUMAN
Alignment segment 1/1:
Quality: 4575.00 Escore: 0
Matching length: 479 Total length: 524
Matching Percent Similarity: 100.00 Matching Percent Identity: 99.79
Total Percent Similarity: 91.41 Total Percent Identity: 91.22
Gaps: 1
Alignment:
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250

251 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL............. 287
    ||||||||||||:||||||||||||||||||||||||
251 DLVDTWKSFKPRHKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 300

288 ................................GGRIDHGHHEGKAKQALH 305
                                    ||||||||||||||||||
301 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 350

306 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 355
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 400

356 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 405
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 450

406 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 455
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 500

456 SSAGSLAAGPLLLALALYPLSVLF 479
    ||||||||||||||||||||||||
501 SSAGSLAAGPLLLALALYPLSVLF 524

Sequence name: /tmp/LlylqOddii/lFFtdNNCUx : AAH21289
Sequence documentation:
Alignment of: HSAPHOL_P6 x AAH21289
Alignment segment 1/1:
Quality: 4586.00 Escore: 0
Matching length: 479 Total length: 524
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 91.41 Total Percent Identity: 91.41
Gaps: 1
Alignment:
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 63 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 112

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
113 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 162

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
163 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 212

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
213 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 262

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
263 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 312

251 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL............. 287
    |||||||||||||||||||||||||||||||||||||
313 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 362

288 ................................GGRIDHGHHEGKAKQALH 305
                                    ||||||||||||||||||
363 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 412

306 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 355
    ||||||||||||||||||||||||||||||||||||||||||||||||||
413 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 462

356 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 405
    ||||||||||||||||||||||||||||||||||||||||||||||||||
463 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 512

406 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 455
    ||||||||||||||||||||||||||||||||||||||||||||||||||
513 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 562

456 SSAGSLAAGPLLLALALYPLSVLF 479
    ||||||||||||||||||||||||
563 SSAGSLAAGPLLLALALYPLSVLF 586
Sequence name: /tmp/K05Xam2Hdo/CV0GTdjKcW : PPBT_HUMAN
Sequence documentation:
Alignment of: HSAPHOL_P7 x PPBT_HUMAN
Alignment segment 1/1:
Quality: 2574.00 Escore: 0
Matching length: 264 Total length: 264
Matching Percent Similarity: 100.00 Matching Percent Identity: 99.62
Total Percent Similarity: 100.00 Total Percent Identity: 99.62
Gaps: 0
Alignment:
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250

251 DLVDTWKSFKPRYK 264
    ||||||||||||:|
251 DLVDTWKSFKPRHK 264
Sequence name: /tmp/K05Xam2Hdo/CV0GTdjKcW : AAH21289
Sequence documentation:
Alignment of: HSAPHOL_P7 x AAH21289
Alignment segment 1/1:
Quality: 2585.00 Escore: 0
Matching length: 264 Total length: 264
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 63 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 112

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
113 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 162

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
163 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 212

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
213 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 262

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
263 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 312

251 DLVDTWKSFKPRYK 264
    ||||||||||||||
313 DLVDTWKSFKPRYK 326
Sequence name: /tmp/K05Xam2Hdo/CV0GTdjKcW : O75090
Sequence documentation:
Alignment of: HSAPHOL_P7 x O75090
Alignment segment 1/1:
Quality: 2585.00 Escore: 0
Matching length: 264 Total length: 264
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250

251 DLVDTWKSFKPRYK 264
    ||||||||||||||
251 DLVDTWKSFKPRYK 264

Sequence name: /tmp/H6G7vkGMmy/rSljwUOCll : PPBT_HUMAN
Sequence documentation:
Alignment of: HSAPHOL_P8 x PPBT_HUMAN
Alignment segment 1/1:
Quality: 2819.00 Escore: 0
Matching length: 288 Total length: 288
Matching Percent Similarity: 100.00 Matching Percent Identity: 99.65
Total Percent Similarity: 100.00 Total Percent Identity: 99.65
Gaps: 0
Alignment:
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250

251 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG 288
    ||||||||||||:|||||||||||||||||||||||||
251 DLVDTWKSFKPRHKHSHFIWNRTELLTLDPHNVDYLLG 288
Sequence name: /tmp/H6G7vkGMmy/rSljwUOCll : AAH21289
Sequence documentation:
Alignment of: HSAPHOL_P8 x AAH21289
Alignment segment 1/1:
Quality: 2830.00 Escore: 0
Matching length: 288 Total length: 288
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 63 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 112

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
113 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 162

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
163 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 212

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
213 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 262

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
263 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 312

251 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG 288
    ||||||||||||||||||||||||||||||||||||||
313 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG 350
Sequence name: /tmp/H6G7vkGMmy/rSljwUOCll : O75090
Sequence documentation:
Alignment of: HSAPHOL_P8 x O75090
Alignment segment 1/1:
Quality: 2830.00 Escore: 0
Matching length: 288 Total length: 288
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250

251 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG 288
    ||||||||||||||||||||||||||||||||||||||
251 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG 288
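The summary figures in the alignment reports above are related by simple arithmetic. The following sketch assumes that "Matching length" counts aligned (non-gap) columns and "Total length" counts all columns including gap positions; under that assumption it reproduces the reported identity percentages. The helper name `percent` and the tuple layout are illustrative, and the identical-column counts are inferred from the reported percentages rather than taken from the alignment program.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# (report, identical columns, matching length, total length)
# Identical-column counts are inferred, not read from the program output.
reports = [
    ("HSAPHOL_P5 x AAH21289", 502, 502, 524),
    ("HSAPHOL_P6 x PPBT_HUMAN", 478, 479, 524),
    ("HSAPHOL_P7 x PPBT_HUMAN", 263, 264, 264),
    ("HSAPHOL_P8 x PPBT_HUMAN", 287, 288, 288),
]

for name, identical, matching, total in reports:
    matching_identity = percent(identical, matching)  # over aligned columns
    total_identity = percent(identical, total)        # over all columns
    print(name, matching_identity, total_identity)
```

Read this way, the single tyrosine/histidine difference against PPBT_HUMAN accounts for the 99.79, 99.62 and 99.65 values, and the gap of 22 or 45 positions accounts for the lower "Total" percentages.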
DESCRIPTION FOR CLUSTER T10888

Cluster T10888 features 4 transcript(s) and 8 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.

Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
T10888_PEA_1_P2    57
T10888_PEA_1_P4    58
T10888_PEA_1_P5    59
T10888_PEA_1_P6    60
These sequences are variants of the known protein Carcinoembryonic antigen-related cell adhesion molecule 6 precursor (SwissProt accession identifier CEA6_HUMAN; known also according to the synonyms Normal cross-reacting antigen, Nonspecific cross-reacting antigen, and CD66c antigen), SEQ ID NO: 56, referred to herein as the previously known protein. The sequence for protein Carcinoembryonic antigen-related cell adhesion molecule 6 precursor is given at the end of the application, as "Carcinoembryonic antigen-related cell adhesion molecule 6 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.

Table 4 - Amino acid mutations for Known Protein
Protein Carcinoembryonic antigen-related cell adhesion molecule 6 precursor is believed to be attached to the membrane by a GPI-anchor. The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Cancer. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Immunostimulant. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Imaging agent; Anticancer; Immunostimulant; Immunoconjugate; Monoclonal antibody, murine; Antisense therapy; antibody.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: signal transduction; cell-cell signaling, which are annotation(s) related to Biological Process; and integral plasma membrane protein, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster T10888 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the right hand column of the table and the numbers on the y-axis of Figure 9 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million). Overall, the following results were obtained as shown with regard to the histograms in Figure 9 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: colorectal cancer, a mixture of malignant tumors from different tissues, pancreas carcinoma and gastric carcinoma.
Table 5 - Normal tissue distribution
Name of Tissue    Number
bladder
colon             107
epithelial        52
general           22
head and neck     40
lung              237
breast
pancreas          32
prostate          12
stomach
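The "parts per million" figure used for the tissue values is a ratio scaled by one million. As a minimal sketch of that calculation (the EST counts below are made-up placeholders, not data from this application, and whatever weighting was applied to the ESTs in the original analysis is not reproduced here), it could be written as:

```python
def ests_per_million(cluster_ests: float, category_ests: float) -> float:
    """Expression of a cluster in one tissue category, in parts per million."""
    if category_ests <= 0:
        raise ValueError("category must contain at least one EST")
    return 1_000_000 * cluster_ests / category_ests


# Hypothetical example: 5 ESTs from a cluster among 200,000 ESTs in a category.
print(ests_per_million(5, 200_000))
```

The function name and its two arguments are assumptions made for illustration; the application itself describes the ratio only in words.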
Table 6 - P values and ratios for expression in cancerous tissue
As noted above, cluster T10888 features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Carcinoembryonic antigen-related cell adhesion molecule 6 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein T10888_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T10888_PEA_1_T1. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T10888_PEA_1_P2 and CEA6_HUMAN:

1. An isolated chimeric polypeptide encoding for T10888_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLYGPDVPTISPSKANYRPGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGSYMCQAHNSATGLNRTTVTMITVS corresponding to amino acids 1 - 319 of CEA6_HUMAN, which also corresponds to amino acids 1 - 319 of T10888_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DWTRP corresponding to amino acids 320 - 324 of T10888_PEA_1_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of T10888_PEA_1_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DWTRP in T10888_PEA_1_P2.
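The coordinates in the comparison report above describe a head shared with the known protein followed by a unique tail, and they can be checked with simple bookkeeping. The sketch below assumes 1-based, inclusive amino acid coordinates. The function names are illustrative, and the ungapped per-position identity is only a stand-in for the "at least 70%" style thresholds, since this section does not state which homology algorithm is intended.

```python
def segment_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive coordinate range."""
    return end - start + 1


def ungapped_identity(a: str, b: str) -> float:
    """Percent of identical positions between two equal-length peptides."""
    if len(a) != len(b):
        raise ValueError("this simple measure needs equal-length sequences")
    same = sum(x == y for x, y in zip(a, b))
    return 100.0 * same / len(a)


TAIL_P2 = "DWTRP"  # amino acids 320 - 324 of T10888_PEA_1_P2

head_len = segment_length(1, 319)    # portion shared with CEA6_HUMAN
tail_len = segment_length(320, 324)  # unique tail
assert tail_len == len(TAIL_P2)
print(head_len + tail_len)           # total length of the variant

# A hypothetical peptide differing from the tail at one position.
candidate = "DWTKP"
print(ungapped_identity(TAIL_P2, candidate) >= 70.0)
```

With a five-residue tail, a single substitution already lowers this simple identity measure to 80%, so only the lower thresholds in the graded list would admit it.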
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T10888_PEA_1_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 7 - Amino acid mutations
Variant protein T10888_PEA_1_P2 is encoded by the following transcript(s): T10888_PEA_1_T1, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T10888_PEA_1_T1 is shown in bold; this coding portion starts at position 151 and ends at position 1122. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 8 - Nucleic acid SNPs
Variant protein T10888_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T10888_PEA_1_T4. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T10888_PEA_1_P4 and CEA6_HUMAN:

1. An isolated chimeric polypeptide encoding for T10888_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL corresponding to amino acids 1 - 234 of CEA6_HUMAN, which also corresponds to amino acids 1 - 234 of T10888_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLLSSQLWPPSASRLECWPGWL corresponding to amino acids 235 - 256 of T10888_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of T10888_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLLSSQLWPPSASRLECWPGWL in T10888_PEA_1_P4.
Comparison report between T10888_PEA_1_P4 and Q13774 (SEQ ID NO: 959):

1. An isolated chimeric polypeptide encoding for T10888_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL corresponding to amino acids 1 - 234 of Q13774, which also corresponds to amino acids 1 - 234 of T10888_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLLSSQLWPPSASRLECWPGWL corresponding to amino acids 235 - 256 of T10888_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of T10888_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLLSSQLWPPSASRLECWPGWL in T10888_PEA_1_P4.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T10888_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 9 - Amino acid mutations
Variant protein T10888_PEA_1_P4 is encoded by the following transcript(s): T10888_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T10888_PEA_1_T4 is shown in bold; this coding portion starts at position 151 and ends at position 918. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 10 - Nucleic acid SNPs
Variant protein T10888_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T10888_PEA_1_T5. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T10888_PEA_1_P5 and CEA6_HUMAN:

1. An isolated chimeric polypeptide encoding for T10888_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLYGPDVPTISPSKANYRPGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGSYMCQAHNSATGLNRTTVTMITVSG corresponding to amino acids 1 - 320 of CEA6_HUMAN, which also corresponds to amino acids 1 - 320 of T10888_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KWIHEALASHFQVESGSQRRARKKFSFPTCVQGAHANPKFSPEPSQFTSADSFPLVFLFFVVFCFLISHV corresponding to amino acids 321 - 390 of T10888_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of T10888_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWIHEALASHFQVESGSQRRARKKFSFPTCVQGAHANPKFSPEPSQFTSADSFPLVFLFFVVFCFLISHV in T10888_PEA_1_P5.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because although both signal-peptide prediction programs agree that this protein has a signal peptide, both trans-membrane region prediction programs predict that this protein has a trans-membrane region downstream of this signal peptide. Variant protein T10888_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 11 - Amino acid mutations
Variant protein T10888_PEA_1_P5 is encoded by the following transcript(s): T10888_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T10888_PEA_1_T5 is shown in bold; this coding portion starts at position 151 and ends at position 1320. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).

Table 12 - Nucleic acid SNPs
Variant protein T10888_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T10888_PEA_1_T6. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.

Comparison report between T10888_PEA_1_P6 and CEA6_HUMAN:

1. An isolated chimeric polypeptide encoding for T10888_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVY corresponding to amino acids 1 - 141 of CEA6_HUMAN, which also corresponds to amino acids 1 - 141 of T10888_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence REYFHMTSGCWGSVLLPTYGIVRPGLCLWPSLHYILYQGLDI corresponding to amino acids 142 - 183 of T10888_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of T10888_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence REYFHMTSGCWGSVLLPTYGIVRPGLCLWPSLHYILYQGLDI in T10888_PEA_1_P6.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T10888_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Amino acid mutations
Variant protein T10888_PEA_1_P6 is encoded by the following transcript(s): T10888_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T10888_PEA_1_T6 is shown in bold; this coding portion starts at position 151 and ends at position 699. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Nucleic acid SNPs
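As a consistency check on the coordinates above, the length of a coding portion can be converted to a codon count. The sketch below assumes the positions are 1-based and inclusive, which the text does not state explicitly; `coding_span` is an illustrative helper, not part of the application.

```python
def coding_span(start, end):
    """Return (nucleotides, codons) for a coding portion given by
    1-based, inclusive start and end positions (an assumption)."""
    nucleotides = end - start + 1
    if nucleotides % 3 != 0:
        raise ValueError("coding portion is not a whole number of codons")
    return nucleotides, nucleotides // 3


# Transcript T10888_PEA_1_T6: coding portion 151..699.
print(coding_span(151, 699))
# Transcript T10888_PEA_1_T5: coding portion 151..1320.
print(coding_span(151, 1320))
```

For T10888_PEA_1_T6 this gives 549 nucleotides, or 183 codons, which equals the 183 amino acids attributed to T10888_PEA_1_P6 in the comparison report. That agreement suggests the stated end position does not include a stop codon, although the text does not say so.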
As noted above, cluster T10888 features 8 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster T10888_PEA_1_node_11 according to the present invention is supported by 57 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T1 and T10888_PEA_1_T5. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Segment cluster T10888_PEA_1_node_12 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T5. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Segment cluster T10888_PEA_1_node_17 according to the present invention is supported by 160 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T1 and T10888_PEA_1_T4. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster T10888_PEA_1_node_4 according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T1, T10888_PEA_1_T4, T10888_PEA_1_T5 and T10888_PEA_1_T6. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Segment cluster T10888_PEA_1_node_6 according to the present invention is supported by 81 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T1, T10888_PEA_1_T4, T10888_PEA_1_T5 and T10888_PEA_1_T6. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster T10888_PEA_1_node_7 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T6. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster T10888_PEA_1_node_9 according to the present invention is supported by 72 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T1, T10888_PEA_1_T4 and T10888_PEA_1_T5. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Transcript name      Segment starting position      Segment ending position
T10888_PEA_1_T1      575                            853
T10888_PEA_1_T4      575                            853
T10888_PEA_1_T5      575                            853
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description. Segment cluster T10888_PEA_1_node_15 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T4. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: /tmp/tM4EgaoKvm/vuztUrlRc7:CEA6_HUMAN
Sequence documentation:
Alignment of: T10888_PEA_1_P2 x CEA6_HUMAN
Alignment segment 1/1:
Quality: 3163.00 Escore: 0 Matching length: 319 Total length: 319 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLYGPDVPTISPSKANYR 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLYGPDVPTISPSKANYR 250
251 PGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGSYMCQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 PGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGSYMCQ 300
301 AHNSATGLNRTTVTMITVS 319
    |||||||||||||||||||
301 AHNSATGLNRTTVTMITVS 319
Sequence name: /tmp/Y llgj 7TCe/PgdufzLOlW:CEA6_HUMAN
Sequence documentation:
Alignment of: T10888_PEA_1_P4 x CEA6_HUMAN
Alignment segment 1/1:
Quality: 2310.00 Escore: 0 Matching length: 234 Total length: 234 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL 234
    ||||||||||||||||||||||||||||||||||
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL 234
Sequence name: /tmp/Yjllgj 7TCe/PgdufzL01W:Q13774
Sequence documentation:
Alignment of: T10888_PEA_1_P4 x Q13774
Alignment segment 1/1:
Quality: 2310.00 Escore: 0 Matching length: 234 Total length: 234 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL 234
    ||||||||||||||||||||||||||||||||||
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL 234
Sequence name: /tmp/x5xDBacdpj/rTXRGepv3y:CEA6_HUMAN
Sequence documentation:
Alignment of: T10888_PEA_1_P5 x CEA6_HUMAN
Alignment segment 1/1:
Quality: 3172.00 Escore: 0 Matching length: 320 Total length: 320 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLYGPDVPTISPSKANYR 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLYGPDVPTISPSKANYR 250
251 PGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGSYMCQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 PGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGSYMCQ 300
301 AHNSATGLNRTTVTMITVSG 320
    ||||||||||||||||||||
301 AHNSATGLNRTTVTMITVSG 320
Sequence name: /tmp/VAhvYFeatq/QNEM573uCo:CEA6_HUMAN
Sequence documentation:
Alignment of: T10888_PEA_1_P6 x CEA6_HUMAN
Alignment segment 1/1:
Quality: 1393.00 Escore: 0 Matching length: 143 Total length: 143 Matching Percent Similarity: 99.30 Matching Percent Identity: 99.30 Total Percent Similarity: 99.30 Total Percent Identity: 99.30 Gaps: 0
Alignment:
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYRE 143
    ||||||||||||||||||||||||||||||||||||||||| |
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPE 143
Alignment of: T10888_PEA_1_P6 x CEA6_HUMAN
Alignment segment 1/1:
Quality: 101.00 Escore: 0 Matching length: 141 Total length: 183 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 77.05 Total Percent Identity: 77.05 Gaps: 1
Alignment:
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYREYFHMTSG 150
    |||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVY 141
151 CWGSVLLPTYGIVRPGLCLWPSLHYILYQGLDI 183
141 141
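The identity figures reported for the two T10888_PEA_1_P6 alignments can be reproduced from the aligned rows themselves. The sketch below is a simple column-by-column comparison, not the alignment program used in the application (which is not named here); `percent_identity` and `mismatches` are illustrative helper names.

```python
def percent_identity(a, b):
    """Percentage of identical columns between two equal-length aligned rows."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return round(100.0 * matches / len(a), 2)


def mismatches(a, b):
    """List (1-based position, residue in a, residue in b) for differing columns."""
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Residues 1-100 are identical in both rows of the ungapped alignment.
COMMON = (
    "MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE"
    "VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET"
)
variant = COMMON + "IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYRE"  # T10888_PEA_1_P6
known = COMMON + "IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPE"    # CEA6_HUMAN

print(percent_identity(variant, known))   # ungapped alignment over 143 columns
print(mismatches(variant, known))         # the single differing column

# Gapped alignment: 141 matching columns out of a total length of 183.
print(round(100.0 * 141 / 183, 2))
```

The first alignment differs only at position 142 (R in the variant, P in CEA6_HUMAN), giving 142/143 = 99.30%. In the second alignment, 141 matching residues over a total length of 183 give the reported total identity of 77.05%.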
Expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 T10888 transcripts which are detectable by amplicon as depicted in sequence name T10888 junc11-17 in normal and cancerous ovary tissues
Expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 transcripts detectable by or according to junc11-17, T10888 junc11-17 amplicon(s) and T10888junc11-17F and T10888junc11-17R primers was measured by real time PCR. In parallel the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel", above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 10 is a histogram showing over-expression of the above-indicated CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 transcripts in cancerous ovary samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained. The number and percentage of samples that exhibit at least 20 fold over-expression, out of the total number of samples tested, is indicated at the bottom. As is evident from Figure 10, the expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel"), including benign samples (Sample Nos. 56-65). Notably an over-expression of at least 20 fold was found in 25 out of 43 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below.
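The threshold comparison (25 of 43 adenocarcinoma samples at or above 20 fold, against the five normal PM samples, Nos. 45-48 and 71) can be examined with a one-sided Fisher exact test. The sketch below is a plain hypergeometric calculation and assumes that none of the five normal PM samples reached the threshold. That count is an assumption: the per-sample values appear only in Figure 10, which is not reproduced here. `fisher_one_sided` is an illustrative helper, not the statistical software used for the application.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact P value, P(X >= a), for the 2x2 table
        [[a, b],   cancer: at/above threshold, below threshold
         [c, d]]   normal: at/above threshold, below threshold
    computed from the hypergeometric distribution with fixed margins."""
    n_total = a + b + c + d
    row1 = a + b              # number of cancer samples
    col1 = a + c              # number of samples at/above the threshold
    denom = comb(n_total, row1)
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(n_total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return tail / denom


# 25 of 43 cancer samples at/above 20 fold; 0 of 5 normal samples (assumed).
p = fisher_one_sided(25, 18, 0, 5)
print(f"{p:.2e}")
```

Under that assumption the result is approximately 0.0197, in agreement with the P value of 1.97E-02 that the text gives for the 20 fold threshold.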
The P value for the difference in the expression levels of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 transcripts detectable by the above amplicon(s) in ovary cancer samples versus the normal tissue samples was determined by T test as 3.79E-02. A threshold of 20 fold over-expression was found to differentiate between cancer and normal samples with a P value of 1.97E-02, as checked by Fisher's exact test. The above values demonstrate statistical significance of the results. Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: T10888junc11-17F forward primer; and T10888junc11-17R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: T10888junc11-17
T10888junc11-17F (SEQ ID NO:960): CCAGCAATCCACACAAGAGCT
T10888junc11-17R (SEQ ID NO:961): CAGGGTCTGGTCCAATCAGAG
T10888junc11-17 (SEQ ID NO:962): CCAGCAATCCACACAAGAGCTCTTTATCCCCAACATCACTGTGAATAATAGCGGATCCTATATGTGCCAAGCCCATAACTCAGCCACTGGCCTCAATAGGACCACAGTCACGATGATCACAGTCTCTGATTGGACCAGACCCTG
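The relationship between the primer pair and the amplicon listed above can be checked directly: the amplicon should begin with the forward primer and end with the reverse complement of the reverse primer. The following is a minimal sketch using only the sequences given in the text; the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def primers_flank(amplicon, forward, reverse):
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


FORWARD = "CCAGCAATCCACACAAGAGCT"   # T10888junc11-17F (SEQ ID NO:960)
REVERSE = "CAGGGTCTGGTCCAATCAGAG"   # T10888junc11-17R (SEQ ID NO:961)
AMPLICON = (                        # T10888junc11-17 (SEQ ID NO:962)
    "CCAGCAATCCACACAAGAGCTCTTTATCCCCAACATCACTGTGAATAATAGCGGAT"
    "CCTATATGTGCCAAGCCCATAACTCAGCCACTGGCCTCAATAGGACCACAGTCACG"
    "ATGATCACAGTCTCTGATTGGACCAGACCCTG"
)

print(reverse_complement(REVERSE))
print(primers_flank(AMPLICON, FORWARD, REVERSE))
```

Both primers match the ends of the listed amplicon exactly, which supports the reading of the three sequences given above.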
Expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 T10888 transcripts which are detectable by amplicon as depicted in sequence name T10888 junc11-17 in different normal tissues. Expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 transcripts detectable by or according to T10888 junc11-17 amplicon(s) and T10888 junc11-17F and T10888 junc11-17R was measured by real time PCR. In parallel the expression of four housekeeping genes - RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 2 above, "Tissue samples in normal panel") to obtain a value of relative expression of each sample relative to the median of the ovary samples. The results are described in Figure 11, presenting the histogram showing the expression of T10888 transcripts which are detectable by amplicon as depicted in sequence name T10888 junc11-17, in different normal tissues. Amplicon and primers are as above.
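The normalization described for both panels (division by the geometric mean of the housekeeping gene quantities, then by the median of the reference samples) can be sketched as follows. All quantities below are hypothetical placeholders chosen for illustration; they are not measurements from the application, and the function names are illustrative.

```python
from statistics import geometric_mean, median


def normalize(target_qty, housekeeping_qtys):
    """Divide the amplicon quantity of one RT sample by the geometric
    mean of its housekeeping gene quantities."""
    return target_qty / geometric_mean(housekeeping_qtys)


def relative_to_reference(normalized_qtys, reference_normalized_qtys):
    """Divide each normalized quantity by the median of the normalized
    quantities of the reference samples (e.g. normal PM or ovary samples)."""
    ref = median(reference_normalized_qtys)
    return [q / ref for q in normalized_qtys]


# Hypothetical values: one test sample and three reference samples,
# each with four housekeeping gene quantities.
housekeeping = [1.0, 4.0, 2.0, 2.0]           # geometric mean is 2
test_sample = normalize(40.0, housekeeping)   # about 20
references = [normalize(q, housekeeping) for q in (2.0, 4.0, 6.0)]  # about 1, 2, 3

fold = relative_to_reference([test_sample], references)
print(fold)
```

The geometric mean is less sensitive than the arithmetic mean to a single housekeeping gene with an unusually high quantity, which is one common reason for preferring it when several reference genes are combined.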
DESCRIPTION FOR CLUSTER HSECADH Cluster HSECADH features 4 transcript(s) and 30 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively, the sequences themselves are given at the end of the application. The selected protein variants are given in table 3. Table 1 - Transcripts of interest
Table 2 - Segments of interest Table 3 - Proteins of interest
These sequences are variants of the known protein Epithelial-cadherin precursor (SwissProt accession identifier CAD1_HUMAN; known also according to the synonyms E-cadherin, Uvomorulin, Cadherin-1, CAM 120/80), SEQ ID NO 95, referred to herein as the previously known protein. The variant proteins according to the present invention are variants of a known diagnostic marker, called E-Cadherin. Protein Epithelial-cadherin is known or believed to have the following function(s): Cadherins are calcium dependent cell adhesion proteins. They preferentially interact with themselves in a homophilic manner in connecting cells; cadherins may thus contribute to the sorting of heterogeneous cell types. E-cadherin has a potent invasive suppressor role. It is also a ligand for integrin alpha-E/beta-7. The sequence for protein Epithelial-cadherin precursor is given at the end of the application, as "Epithelial-cadherin precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Protein Epithelial-cadherin localization is believed to be Type 1 membrane protein. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: cell adhesion; homophilic cell adhesion, which are annotation(s) related to Biological Process; calcium binding; protein binding, which are annotation(s) related to Molecular Function; and membrane; integral membrane protein, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster HSECADH can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the right hand column of the table and the numbers on the y-axis of Figure 12 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
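The "parts per million" figure described above is a ratio of the ESTs assigned to a cluster to all ESTs in the same category, scaled by one million. A minimal sketch follows. The counts are hypothetical, and the weighting that the text mentions is not specified in this section, so the sketch uses unweighted counts as an assumption.

```python
def ests_per_million(cluster_ests, total_ests):
    """Express the share of ESTs belonging to one cluster, within a
    tissue category, as parts per million (unweighted counts assumed)."""
    if total_ests <= 0:
        raise ValueError("total_ests must be positive")
    return 1_000_000 * cluster_ests / total_ests


# Hypothetical example: 12 cluster ESTs among 400,000 ESTs in a category.
print(ests_per_million(12, 400_000))
```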
Overall, the following results were obtained as shown with regard to the histograms in Figure 12 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: a mixture of malignant tumors from different tissues and ovarian carcinoma.
Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
bladder 3.9e-01 3.4e-01 4.1e-01 1.7 3.8e-01 1.7
brain 3.7e-01 4.9e-01 1.4 1.0
colon 6.6e-01 7.4e-01 9.5e-01 0.6 9.3e-01 0.5
epithelial 1.3e-01 6.8e-01 9.5e-01 0.8 0.5
general 1.6e-06 1.5e-03 6.3e-05 1.5 5.6e-01 0.9
head and neck 1.5e-01 2.7e-01 4.6e-01 2.1 7.5e-01 1.2
kidney 8.3e-01 8.7e-01 9.9e-01 o 1 T
liver 4.4e-01 6.9e-01 1 T 6.9e-01 TT
lung 7.2e-01 8.8e-01 7.5e-01 09~ 9.9e-01 oT
breast 7.5e-02 1.1e-01 3.1e-01 TT 5.1e-01 TT
ovary 4.5e-02 3.6e-02 4.7e-03 3.8 1.4e-02 TT
As noted above, cluster HSECADH features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Epithelial-cadherin precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSECADH_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSECADH JT 1. An alignment is given to the known protein (Epithelial-cadherin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSECADH_P9 and Q9UII7 (SEQ ID NO:963): 1. An isolated chimeric polypeptide encoding for HSECADH_P9, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEG corresponding to amino acids 1 - 274 of Q9UII7, which also corresponds to amino acids 1 - 274 of HSECADH_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG corresponding to amino acids 275 - 322 of HSECADH_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSECADH_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG in HSECADH_P9.
Comparison report between HSECADH_P9 and Q9UII8 (SEQ ID NO:964): 1. An isolated chimeric polypeptide encoding for HSECADH_P9, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEG corresponding to amino acids 1 - 274 of Q9UII8, which also corresponds to amino acids 1 - 274 of HSECADH_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG corresponding to amino acids 275 - 322 of HSECADH_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSECADH_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG in HSECADH_P9.
Comparison report between HSECADH_P9 and CAD1_HUMAN: 1. An isolated chimeric polypeptide encoding for HSECADH_P9, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEG corresponding to amino acids 1 - 274 of CAD1_HUMAN, which also corresponds to amino acids 1 - 274 of HSECADH_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG corresponding to amino acids 275 - 322 of HSECADH_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSECADH_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TACRSRIANSCHSGDSWRNSCFANSDSAALAVSSEESGGQRALTAPRG in HSECADH_P9.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSECADH_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSECADH_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
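The localization call stated above for HSECADH_P9 follows a simple consensus rule: every signal-peptide predictor must report a signal peptide, and no trans-membrane predictor may report a trans-membrane region. The sketch below encodes only that rule. The text does not define the outcome for other combinations, so the sketch returns "undetermined" for them, which is an assumption; the function name and the boolean inputs are illustrative and do not represent the actual prediction programs.

```python
def predict_localization(signal_peptide_calls, transmembrane_calls):
    """Apply the consensus rule from the text.

    signal_peptide_calls: one boolean per signal-peptide prediction program.
    transmembrane_calls:  one boolean per trans-membrane prediction program.
    """
    if not signal_peptide_calls or not transmembrane_calls:
        raise ValueError("at least one prediction of each kind is required")
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    # Other combinations are not defined in this section (assumption).
    return "undetermined"


# Two signal-peptide programs agree; neither trans-membrane program fires.
print(predict_localization([True, True], [False, False]))
```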
Variant protein HSECADH_P9 is encoded by the following transcript(s): HSECADH π 1, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSECADH JT 1 is shown in bold; this coding portion starts at position 125 and ends at position 1090. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSECADH_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Nucleic acid SNPs
Variant protein HSECADH_P13 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSECADH_T18. An alignment is given to the known protein (Epithelial-cadherin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSECADH_P13 and Q9UII7: 1. An isolated chimeric polypeptide encoding for HSECADH_P13, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADLQGEGLSTTATAVITVTDTNDNPPIFNPTT corresponding to amino acids 1 - 379 of Q9UII7, which also corresponds to amino acids 1 - 379 of HSECADH_P13, and a second amino acid sequence VIL corresponding to amino acids 380 - 382 of HSECADH_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
Comparison report between HSECADH_P13 and Q9UII8: 1. An isolated chimeric polypeptide encoding for HSECADH_P13, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADLQGEGLSTTATAVITVTDTNDNPPIFNPTT corresponding to amino acids 1 - 379 of Q9UII8, which also corresponds to amino acids 1 - 379 of HSECADH_P13, and a second amino acid sequence VIL corresponding to amino acids 380 - 382 of HSECADH_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
Comparison report between HSECADH_P13 and CAD1_HUMAN: 1. An isolated chimeric polypeptide encoding for HSECADH_P13, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADLQGEGLSTTATAVITVTDTNDNPPIFNPTT corresponding to amino acids 1 - 379 of CAD1_HUMAN, which also corresponds to amino acids 1 - 379 of HSECADH_P13, and a second amino acid sequence VIL corresponding to amino acids 380 - 382 of HSECADH_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSECADH_P13 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSECADH_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Amino acid mutations
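The localization rule stated above can be sketched as a small decision function. This is a minimal sketch and not the applicants' pipeline: the function name, the boolean inputs and the "undetermined" fallback are assumptions introduced for illustration, since the text describes only the secreted case.

```python
def predict_localization(signal_peptide_calls, transmembrane_calls):
    """Apply the rule described in the text to boolean predictor outputs.

    signal_peptide_calls: one bool per signal-peptide prediction program.
    transmembrane_calls:  one bool per trans-membrane region prediction program.
    """
    all_signal = bool(signal_peptide_calls) and all(signal_peptide_calls)
    any_tm = any(transmembrane_calls)
    if all_signal and not any_tm:
        return "secreted"
    # Hypothetical fallback: the text does not describe other outcomes.
    return "undetermined"


# Two signal-peptide programs agree; neither trans-membrane program fires.
print(predict_localization([True, True], [False, False]))
```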
Variant protein HSECADH_P13 is encoded by the following transcript(s): HSECADH_T18, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSECADH_T18 is shown in bold; this coding portion starts at position 125 and ends at position 1270. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSECADH_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
Variant protein HSECADH_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSECADH_T19. An alignment is given to the known protein (Epithelial-cadherin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSECADH_P14 and Q9UII7: 1. An isolated chimeric polypeptide encoding for HSECADH_P14, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNT YNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE corresponding to amino acids 1 - 336 of Q9UII7, which also corresponds to amino acids 1 - 336 of HSECADH_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV corresponding to amino acids 337 - 373 of HSECADH_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSECADH_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV in HSECADH_P14.
Comparison report between HSECADH_P14 and Q9UII8: 1. An isolated chimeric polypeptide encoding for HSECADH_P14, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNT YNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE corresponding to amino acids 1 - 336 of Q9UII8, which also corresponds to amino acids 1 - 336 of HSECADH_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV corresponding to amino acids 337 - 373 of HSECADH_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSECADH_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV in HSECADH_P14.
Comparison report between HSECADH_P14 and CAD1_HUMAN: 1. An isolated chimeric polypeptide encoding for HSECADH_P14, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSN GNAVEDPMEILITVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNT YNAAIAYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE corresponding to amino acids 1 - 336 of CAD1_HUMAN, which also corresponds to amino acids 1 - 336 of HSECADH_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV corresponding to amino acids 337 - 373 of HSECADH_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSECADH_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV in HSECADH_P14.
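The graded thresholds for the HSECADH_P14 tail (70%, 80%, 85%, 90%, 95%) can be illustrated with a simple check. This is a minimal sketch under a stated assumption: it uses ungapped percent identity as a stand-in for "homologous", whereas the application may define homology through an alignment program and scoring parameters that are not given in this section. The function names and the candidate sequence are hypothetical.

```python
TAIL = "VRGQEDPEGVEDKCVLAQSRGQSKILLGQLSVNTVMV"  # residues 337 - 373 of HSECADH_P14
TIERS = (70, 80, 85, 90, 95)  # percentage thresholds named in the text


def percent_identity(a, b):
    """Percentage of positions with identical residues (ungapped, equal length)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def highest_tier(pct):
    """Return the highest threshold met by pct, or None if below the lowest."""
    met = [t for t in TIERS if pct >= t]
    return max(met) if met else None


candidate = "A" + TAIL[1:]  # hypothetical tail with one substitution (V -> A)
pct = percent_identity(TAIL, candidate)
print(round(pct, 2), highest_tier(pct))
```

Simple identity is the strictest reading of the thresholds; a similarity-based definition would admit more candidates at each tier.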
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSECADH_P14 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSECADH_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Amino acid mutations
Variant protein HSECADH_P14 is encoded by the following transcript(s): HSECADH_T19, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSECADH_T19 is shown in bold; this coding portion starts at position 125 and ends at position 1243. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSECADH_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Nucleic acid SNPs
Variant protein HSECADH_P15 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSECADH_T20. An alignment is given to the known protein (Epithelial-cadherin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSECADH_P15 and Q9UII7: 1. An isolated chimeric polypeptide encoding for HSECADH_P15, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYT corresponding to amino acids 1 - 229 of Q9UII7, which also corresponds to amino acids 1 - 229 of HSECADH_P15, and a second amino acid sequence VSIS corresponding to amino acids 230 - 233 of HSECADH_P15, wherein said first and second amino acid sequences are contiguous and in a sequential order.
Comparison report between HSECADH_P15 and Q9UII8: 1. An isolated chimeric polypeptide encoding for HSECADH_P15, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYT corresponding to amino acids 1 - 229 of Q9UII8, which also corresponds to amino acids 1 - 229 of HSECADH_P15, and a second amino acid sequence VSIS corresponding to amino acids 230 - 233 of HSECADH_P15, wherein said first and second amino acid sequences are contiguous and in a sequential order.
Comparison report between HSECADH_P15 and CAD1_HUMAN: 1. An isolated chimeric polypeptide encoding for HSECADH_P15, comprising a first amino acid sequence being at least 90% homologous to MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFED CTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNT VGHHHRPPPHQASVSGIQAELLTFPNSSPGLRRQKRDWVIPPISCPENEKGPFPKNLVQI KSNKDKEGKVFYSITGQGADTPPVGVFIIERETGWLKVTEPLDRERIATYT corresponding to amino acids 1 - 229 of CAD1_HUMAN, which also corresponds to amino acids 1 - 229 of HSECADH_P15, and a second amino acid sequence VSIS corresponding to amino acids 230 - 233 of HSECADH_P15, wherein said first and second amino acid sequences are contiguous and in a sequential order.
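The "contiguous and in a sequential order" construction of HSECADH_P15 can be checked by joining the two parts and reading the residue ranges back. This is a minimal sketch: the helper names are hypothetical, and positions are assumed to be 1-based and inclusive, as the ranges "1 - 229" and "230 - 233" suggest.

```python
# Residues 1 - 229, shared with Q9UII7, Q9UII8 and CAD1_HUMAN per the comparison reports.
HEAD = (
    "MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR"
    "VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV"
    "YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR"
    "RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP"
    "PVGVFIIERETGWLKVTEPLDRERIATYT"
)
TAIL = "VSIS"  # residues 230 - 233, unique to the variant


def residues(sequence, first, last):
    """Return residues first..last using 1-based, inclusive positions."""
    return sequence[first - 1:last]


# Contiguous and in sequential order: head immediately followed by tail.
p15 = HEAD + TAIL

print(len(p15))
print(residues(p15, 230, 233))
```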
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSECADH_P15 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSECADH_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Amino acid mutations
Variant protein HSECADH_P15 is encoded by the following transcript(s): HSECADH_T20, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSECADH_T20 is shown in bold; this coding portion starts at position 125 and ends at position 823. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSECADH_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Nucleic acid SNPs
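The coding-portion coordinates quoted for the three transcripts can be cross-checked against the stated protein lengths. This is a minimal sketch, assuming the positions are 1-based and inclusive and that the quoted span excludes the stop codon; neither convention is stated in the text, and the function name is hypothetical.

```python
def coding_length(start, end):
    """Return (nucleotides, codons) for a 1-based, inclusive coding span."""
    nucleotides = end - start + 1
    if nucleotides % 3 != 0:
        raise ValueError("coding span is not a whole number of codons")
    return nucleotides, nucleotides // 3


# (transcript, coding start, coding end) as quoted in the text.
CODING_SPANS = [
    ("HSECADH_T18", 125, 1270),
    ("HSECADH_T19", 125, 1243),
    ("HSECADH_T20", 125, 823),
]

for name, start, end in CODING_SPANS:
    nucleotides, codons = coding_length(start, end)
    print(name, nucleotides, codons)
```

Under these assumptions the spans give 382, 373 and 233 codons, which agree with the lengths of HSECADH_P13 (1 - 382), HSECADH_P14 (1 - 373) and HSECADH_P15 (1 - 233) in the comparison reports.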
As noted above, cluster HSECADH features 30 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSECADH_node_0 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11, HSECADH_T18, HSECADH_T19 and HSECADH_T20. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Segment cluster HSECADH_node_14 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11, HSECADH_T18, HSECADH_T19 and HSECADH_T20. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Segment cluster HSECADH_node_15 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T20. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster HSECADH_node_21 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T18 and HSECADH_T19. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Segment cluster HSECADH_node_22 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T19. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster HSECADH_node_25 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T18. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster HSECADH_node_26 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T18. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Segment cluster HSECADH_node_48 according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Segment cluster HSECADH_node_52 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Segment cluster HSECADH_node_53 according to the present invention is supported by 59 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Segment cluster HSECADH_node_54 according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster HSECADH_node_57 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster HSECADH_node_60 according to the present invention is supported by 260 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster HSECADH_node_62 according to the present invention is supported by 173 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster HSECADH_node_63 according to the present invention is supported by 162 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster HSECADH_node_7 according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11, HSECADH_T18, HSECADH_T19 and HSECADH_T20. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSECADH_T11        288                          511
HSECADH_T18        288                          511
HSECADH_T19        288                          511
HSECADH_T20        288                          511
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSECADH_node_1 according to the present invention can be found in the following transcript(s): HSECADH_T11, HSECADH_T18, HSECADH_T19 and HSECADH_T20. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSECADH_T11        167                          172
HSECADH_T18        167                          172
HSECADH_T19        167                          172
HSECADH_T20        167                          172
Segment cluster HSECADH_node_11 according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11, HSECADH_T18, HSECADH_T19 and HSECADH_T20. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Segment cluster HSECADH_node_12 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11, HSECADH_T18, HSECADH_T19 and HSECADH_T20. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Segment cluster HSECADH_node_17 according to the present invention can be found in the following transcript(s): HSECADH_T11, HSECADH_T18 and HSECADH_T19. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Segment cluster HSECADH_node_18 according to the present invention is supported by 41 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11, HSECADH_T18 and HSECADH_T19. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Segment cluster HSECADH_node_19 according to the present invention can be found in the following transcript(s): HSECADH_T18 and HSECADH_T19. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Segment cluster HSECADH_node_3 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11, HSECADH_T18, HSECADH_T19 and HSECADH_T20. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Segment cluster HSECADH_node_42 according to the present invention is supported by 43 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Segment cluster HSECADH_node_45 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 39 below describes the starting and ending position of this segment on each transcript.
Table 39 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSECADH_T11        1018                         1051
Segment cluster HSECADH_node_46 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Segment cluster HSECADH_node_55 according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
Segment cluster HSECADH_node_56 according to the present invention is supported by 42 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
Segment cluster HSECADH_node_58 according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSECADH_T11. Table 43 below describes the starting and ending position of this segment on each transcript.
Table 43 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSECADH_T11        2431                         2481
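The start and end positions in the segment location tables determine each segment's length, which in turn decides whether it falls in the short-segment group (up to about 120 bp). This is a minimal sketch, assuming 1-based inclusive coordinates and a hard cutoff of 120 bp; the text says only "about 120 bp", and the helper names are hypothetical.

```python
SHORT_SEGMENT_MAX_BP = 120  # assumed hard cutoff for "up to about 120 bp"

# (segment, start, end) on transcript HSECADH_T11, as read from the tables above.
SEGMENTS = [
    ("HSECADH_node_7", 288, 511),
    ("HSECADH_node_1", 167, 172),
    ("HSECADH_node_45", 1018, 1051),
    ("HSECADH_node_58", 2431, 2481),
]


def segment_length(start, end):
    """Length in bp of a segment given 1-based, inclusive positions."""
    return end - start + 1


def extract_segment(transcript, start, end):
    """Slice a transcript string using 1-based, inclusive positions."""
    return transcript[start - 1:end]


for name, start, end in SEGMENTS:
    length = segment_length(start, end)
    group = "short" if length <= SHORT_SEGMENT_MAX_BP else "long"
    print(name, length, group)
```

With these coordinates, HSECADH_node_7 spans 224 bp and falls in the long group, while HSECADH_node_1, HSECADH_node_45 and HSECADH_node_58 (6, 34 and 51 bp) fall in the short group, consistent with where each is described.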
Segment cluster HSECADH_node_59 according to the present invention can be found in the following transcript(s): HSECADH_T11. Table 44 below describes the starting and ending position of this segment on each transcript. Table 44 - Segment location on transcripts
Variant protein alignment to the previously known protein
Sequence name: /tmp/2x0l2XZlA3/JXvUszCm30:Q9UII7
Sequence documentation:
Alignment of: HSECADH_P9 x Q9UII7
Alignment segment 1/1:
Quality: 2727.00 Escore: 0 Matching length: 274 Total length: 274 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50

 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100

101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150

151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200

201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250

251 TVTDQNDNKPEFTQEVFKGSVMEG 274
    ||||||||||||||||||||||||
251 TVTDQNDNKPEFTQEVFKGSVMEG 274
Sequence name: /tmp/2xOI2XZlA3/JXvUszCm30:Q9UII8
Sequence documentation:
Alignment of: HSECADH_P9 x Q9UII8
Alignment segment 1/1:
Quality: 2727.00 Escore: 0 Matching length: 274 Total length: 274 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50

 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100

101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150

151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200

201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250

251 TVTDQNDNKPEFTQEVFKGSVMEG 274
    ||||||||||||||||||||||||
251 TVTDQNDNKPEFTQEVFKGSVMEG 274
Sequence name: /tmp/2xOI2XZlA3/JXvUszCm30:CAD1_HUMAN
Sequence documentation:
Alignment of: HSECADH_P9 x CAD1_HUMAN
Alignment segment 1/1:
Quality: 2727.00 Escore: 0 Matching length: 274 Total length: 274 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50

 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100

101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150

151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200

201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250

251 TVTDQNDNKPEFTQEVFKGSVMEG 274
    ||||||||||||||||||||||||
251 TVTDQNDNKPEFTQEVFKGSVMEG 274
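The identity and gap figures reported for each alignment can be recomputed from the two aligned rows. This is a minimal sketch: the function name and the returned keys are hypothetical, gaps are counted per column rather than per gap opening, and the "Quality" and "Escore" values are not reproduced because they depend on a scoring scheme that this section does not give.

```python
def alignment_stats(query, subject, gap="-"):
    """Summarize a pairwise alignment given as two equal-length aligned strings."""
    if len(query) != len(subject) or not query:
        raise ValueError("aligned strings must be non-empty and of equal length")
    columns = list(zip(query, subject))
    # A column counts as a gap column if either row holds the gap symbol.
    gap_columns = sum(1 for q, s in columns if q == gap or s == gap)
    # Identical columns must match and must not be gap-versus-gap.
    identical = sum(1 for q, s in columns if q == s and q != gap)
    return {
        "total_length": len(columns),
        "gap_columns": gap_columns,
        "total_percent_identity": 100.0 * identical / len(columns),
    }


# Final row of the HSECADH_P9 alignment (residues 251 - 274 on both sequences).
row = "TVTDQNDNKPEFTQEVFKGSVMEG"
print(alignment_stats(row, row))
```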
Sequence name: /tmp/e5Y8HiBmjB/iwybld8ikl:Q9UII7
Sequence documentation:
Alignment of: HSECADH_P13 x Q9UII7
Alignment segment 1/1:
Quality: 3720.00 Escore: 0 Matching length: 379 Total length: 379 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50

 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100

101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150

151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200

201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250

251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300

301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADL 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADL 350

351 QGEGLSTTATAVITVTDTNDNPPIFNPTT 379
    |||||||||||||||||||||||||||||
351 QGEGLSTTATAVITVTDTNDNPPIFNPTT 379
Sequence name: /tmp/e5Y8HiBmjB/iwybld8ikl:Q9UII8
Sequence documentation:
Alignment of: HSECADH_P13 x Q9UII8
Alignment segment 1/1:
Quality: 3720.00 Escore: 0 Matching length: 379 Total length: 379 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50

 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100

101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150

151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200

201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250

251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300

301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADL 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADL 350

351 QGEGLSTTATAVITVTDTNDNPPIFNPTT 379
    |||||||||||||||||||||||||||||
351 QGEGLSTTATAVITVTDTNDNPPIFNPTT 379
Sequence name: /tmp/e5Y8HiBmjB/iwybld8ikl:CAD1_HUMAN
Sequence documentation:
Alignment of: HSECADH_P13 x CAD1_HUMAN
Alignment segment 1/1:
Quality: 3720.00 Escore: 0 Matching length: 379 Total length: 379 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250
251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300
301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADL 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRESFPTYTLVVQAADL 350
351 QGEGLSTTATAVITVTDTNDNPPIFNPTT 379
    |||||||||||||||||||||||||||||
351 QGEGLSTTATAVITVTDTNDNPPIFNPTT 379
Sequence name: /tmp/RtiX8vFyZe/iovNeRHKWU:Q9UII7
Sequence documentation:
Alignment of: HSECADH_P14 x Q9UII7
Alignment segment 1/1:
Quality: 3313.00 Escore: 0 Matching length: 336 Total length: 336 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250
251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300
301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE 336
    ||||||||||||||||||||||||||||||||||||
301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE 336
Sequence name: /tmp/RtiX8vFyZe/iovNeRHKWU:Q9UII8
Sequence documentation:
Alignment of: HSECADH_P14 x Q9UII8
Alignment segment 1/1:
Quality: 3313.00 Escore: 0 Matching length: 336 Total length: 336 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250
251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300
301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE 336
    ||||||||||||||||||||||||||||||||||||
301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE 336
Sequence name: /tmp/RtiX8vFyZe/iovNeRHKWU:CAD1_HUMAN
Sequence documentation:
Alignment of: HSECADH_P14 x CAD1_HUMAN
Alignment segment 1/1:
Quality: 3313.00 Escore: 0 Matching length: 336 Total length: 336 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYTLFSHAVSSNGNAVEDPMEILI 250
251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TVTDQNDNKPEFTQEVFKGSVMEGALPGTSVMEVTATDADDDVNTYNAAI 300
301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE 336
    ||||||||||||||||||||||||||||||||||||
301 AYTILSQDPELPDKNMFTINRNTGVISVVTTGLDRE 336
Sequence name: /tmp/rMRrwmuokD/lrmk2jOfgw:Q9UII7
Sequence documentation:
Alignment of: HSECADH_P15 x Q9UII7
Alignment segment 1/1:
Quality: 2289.00 Escore: 0 Matching length: 229 Total length: 229 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
201 PVGVFIIERETGWLKVTEPLDRERIATYT 229
    |||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYT 229
Sequence name: /tmp/rMRrwmuokD/lrmk2jOfg:Q9UII8
Sequence documentation:
Alignment of: HSECADH_P15 x Q9UII8
Alignment segment 1/1:
Quality: 2289.00 Escore: 0 Matching length: 229 Total length: 229 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
201 PVGVFIIERETGWLKVTEPLDRERIATYT 229
    |||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYT 229
Sequence name: /tmp/rMRrwmuokD/lrmk2jOfg:CAD1_HUMAN
Sequence documentation:
Alignment of: HSECADH_P15 x CAD1_HUMAN
Alignment segment 1/1:
Quality: 2289.00 Escore: 0 Matching length: 229 Total length: 229 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGR 50
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLV 100
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YAWDSTYRKFSTKVTLNTVGHHHRPPPHQASVSGIQAELLTFPNSSPGLR 150
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RQKRDWVIPPISCPENEKGPFPKNLVQIKSNKDKEGKVFYSITGQGADTP 200
201 PVGVFIIERETGWLKVTEPLDRERIATYT 229
    |||||||||||||||||||||||||||||
201 PVGVFIIERETGWLKVTEPLDRERIATYT 229
DESCRIPTION FOR CLUSTER HUMGRP5E
Cluster HUMGRP5E features 2 transcript(s) and 5 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Gastrin-releasing peptide precursor (SwissProt accession identifier GRP_HUMAN; known also according to the synonyms GRP; GRP-10), SEQ ID NO: 107, referred to herein as the previously known protein. Gastrin-releasing peptide is known or believed to have the following function(s): stimulates the release of gastrin as well as other gastrointestinal hormones. The sequence for protein Gastrin-releasing peptide precursor is given at the end of the application, as "Gastrin-releasing peptide precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Gastrin-releasing peptide localization is believed to be Secreted.
The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Diabetes, Type II. It has been investigated for clinical therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Bombesin antagonist; Insulinotropin agonist. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anorectic/Antiobesity; Releasing hormone; Anticancer; Respiratory; Antidiabetic.
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: signal transduction; neuropeptide signaling pathway, which are annotation(s) related to Biological Process; growth factor, which are annotation(s) related to Molecular Function; and secreted, which are annotation(s) related to Cellular Component.
The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HUMGRP5E features 2 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Gastrin-releasing peptide precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HUMGRP5E_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMGRP5E_T4. An alignment is given to the known protein (Gastrin-releasing peptide precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMGRP5E_P4 and GRP_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMGRP5E_P4, comprising a first amino acid sequence being at least 90% homologous to MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLMGKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQPKALGNQQPSWDSEDSSNFKDVGSKGK corresponding to amino acids 1 - 127 of GRP_HUMAN, which also corresponds to amino acids 1 - 127 of HUMGRP5E_P4, and a second amino acid sequence being at least 90% homologous to GSQREGRNPQLNQQ corresponding to amino acids 135 - 148 of GRP_HUMAN, which also corresponds to amino acids 128 - 141 of HUMGRP5E_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HUMGRP5E_P4, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KG, having a structure as follows: a sequence starting from any of amino acid numbers 127-x to 127; and ending at any of amino acid numbers 128 + ((n-2) - x), in which x varies from 0 to n-2.
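The edge-portion definition above can be illustrated with a short sketch. It rebuilds the 141-residue HUMGRP5E_P4 sequence from the two segments quoted in the comparison report and enumerates every peptide of length n that spans the junction between residues 127 (K) and 128 (G). The function name `edge_portions` and the choice of n = 10 are illustrative assumptions, not part of the application.

```python
# Segments copied from the comparison report above.
HEAD = (
    "MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLM"
    "GKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQ"
    "PKALGNQQPSWDSEDSSNFKDVGSKGK"
)  # amino acids 1-127 of GRP_HUMAN and of HUMGRP5E_P4
TAIL = "GSQREGRNPQLNQQ"  # amino acids 128-141 of HUMGRP5E_P4
VARIANT = HEAD + TAIL
JUNCTION = len(HEAD)  # 127: last residue before the new junction


def edge_portions(seq, junction, n):
    """Hypothetical helper: list (start, end, peptide) for all length-n
    windows that contain residues `junction` and `junction + 1`.
    Positions are 1-based and inclusive, as in the text."""
    portions = []
    for x in range(0, n - 1):  # x varies from 0 to n-2
        start = junction - x
        end = junction + 1 + (n - 2) - x
        if start < 1 or end > len(seq):
            continue  # window would run off the sequence
        portions.append((start, end, seq[start - 1:end]))
    return portions


if __name__ == "__main__":
    for start, end, peptide in edge_portions(VARIANT, JUNCTION, 10):
        print(start, end, peptide)
```

Each window has length n because end - start + 1 = n for every x, and each contains the KG pair that bridges the two joined segments.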
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMGRP5E_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMGRP5E_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Amino acid mutations
Variant protein HUMGRP5E_P4 is encoded by the following transcript(s): HUMGRP5E_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMGRP5E_T4 is shown in bold; this coding portion starts at position 622 and ends at position 1044. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMGRP5E_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Nucleic acid SNPs
Variant protein HUMGRP5E_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMGRP5E_T5. An alignment is given to the known protein (Gastrin-releasing peptide precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMGRP5E_P5 and GRP_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMGRP5E_P5, comprising a first amino acid sequence being at least 90% homologous to MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLMGKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQPKALGNQQPSWDSEDSSNFKDVGSKGK corresponding to amino acids 1 - 127 of GRP_HUMAN, which also corresponds to amino acids 1 - 127 of HUMGRP5E_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DSLLQVLNVKEGTPS corresponding to amino acids 128 - 142 of HUMGRP5E_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMGRP5E_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DSLLQVLNVKEGTPS in HUMGRP5E_P5.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMGRP5E_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMGRP5E_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
Variant protein HUMGRP5E_P5 is encoded by the following transcript(s): HUMGRP5E_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMGRP5E_T5 is shown in bold; this coding portion starts at position 622 and ends at position 1047. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMGRP5E_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
As noted above, cluster HUMGRP5E features 5 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMGRP5E_node_0 according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGRP5E_T4 and HUMGRP5E_T5. Table 9 below describes the starting and ending position of this segment on each transcript.
Table 9 - Segment location on transcripts
Segment cluster HUMGRP5E_node_2 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGRP5E_T4 and HUMGRP5E_T5. Table 10 below describes the starting and ending position of this segment on each transcript.
Table 10 - Segment location on transcripts
Segment cluster HUMGRP5E_node_8 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGRP5E_T4 and HUMGRP5E_T5. Table 11 below describes the starting and ending position of this segment on each transcript.
Table 11 - Segment location on transcripts
Short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMGRP5E_node_3 according to the present invention can be found in the following transcript(s): HUMGRP5E_T4 and HUMGRP5E_T5. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
Segment cluster HUMGRP5E_node_7 according to the present invention can be found in the following transcript(s): HUMGRP5E_T5. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: /tmp/412zs2m yT/B0wjOUAX0d:GRP_HUMAN
Sequence documentation:
Alignment of: HUMGRP5E_P4 x GRP_HUMAN
Alignment segment 1/1:
Quality: 1291.00 Escore: 0 Matching length: 141 Total length: 148 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 95.27 Total Percent Identity: 95.27 Gaps: 1
Alignment:
  1 MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLM 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLM 50
 51 GKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 GKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQ 100
101 PKALGNQQPSWDSEDSSNFKDVGSKGK.......GSQREGRNPQLNQQ 141
    |||||||||||||||||||||||||||       ||||||||||||||
101 PKALGNQQPSWDSEDSSNFKDVGSKGKVGRLSAPGSQREGRNPQLNQQ 148
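The summary figures in the header of the HUMGRP5E_P4 x GRP_HUMAN alignment can be reproduced from the aligned sequences themselves. The sketch below assumes that "Matching length" counts aligned columns without a gap, that "Total length" counts all columns, and that "Gaps" counts runs of consecutive gap columns; these definitions are inferred from the reported numbers and are not stated in the application. The gap symbol `.` and the function name `alignment_stats` are likewise assumptions.

```python
GAP = "."  # assumed gap symbol

HEAD = (
    "MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLM"
    "GKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQ"
    "PKALGNQQPSWDSEDSSNFKDVGSKGK"
)  # residues 1-127, shared by both proteins
TAIL = "GSQREGRNPQLNQQ"

# GRP_HUMAN carries seven residues (VGRLSAP, 128-134) absent from the variant.
KNOWN_ALIGNED = HEAD + "VGRLSAP" + TAIL
VARIANT_ALIGNED = HEAD + GAP * 7 + TAIL


def alignment_stats(a, b, gap=GAP):
    """Hypothetical helper: summarise a pairwise alignment given as two
    equal-length strings in which `gap` marks an unaligned column."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matching = identical = gap_runs = 0
    in_gap = False
    for x, y in zip(a, b):
        if x == gap or y == gap:
            if not in_gap:
                gap_runs += 1  # count each run of gap columns once
            in_gap = True
        else:
            in_gap = False
            matching += 1
            identical += x == y
    total = len(a)
    return {
        "matching_length": matching,
        "total_length": total,
        "matching_percent_identity": 100.0 * identical / matching,
        "total_percent_identity": 100.0 * identical / total,
        "gaps": gap_runs,
    }


if __name__ == "__main__":
    for key, value in alignment_stats(VARIANT_ALIGNED, KNOWN_ALIGNED).items():
        print(key, round(value, 2))
```

Under these assumptions the total percent identity is 141/148, which rounds to the 95.27 reported in the header.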
Sequence name: /tmp/lme91dnvfv/KbP5io8PtU:GRP_HUMAN
Sequence documentation:
Alignment of: HUMGRP5E_P5 x GRP_HUMAN
Alignment segment 1/1:
Quality: 1248.00 Escore: 0 Matching length: 127 Total length: 127 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLM 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLM 50
 51 GKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 GKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQ 100
101 PKALGNQQPSWDSEDSSNFKDVGSKGK 127
    |||||||||||||||||||||||||||
101 PKALGNQQPSWDSEDSSNFKDVGSKGK 127
Expression of GRP_HUMAN - gastrin-releasing peptide HUMGRP5E transcripts which are detectable by amplicon as depicted in sequence name HUMGRP5Ejunc3-7 in normal and cancerous ovary tissues
Expression of GRP_HUMAN - gastrin-releasing peptide transcripts detectable by or according to junc3-7, HUMGRP5Ejunc3-7 amplicon(s) and HUMGRP5Ejunc3-7F and HUMGRP5Ejunc3-7R primers was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, 71, Table 1 above, "Tissue samples in testing panel"), to obtain a value of fold upregulation for each sample relative to the median of the normal PM samples.
Figure 13 is a histogram showing overexpression of the above-indicated GRP_HUMAN - gastrin-releasing peptide transcripts in cancerous ovary samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained. As is evident from Figure 13, the expression of GRP_HUMAN - gastrin-releasing peptide transcripts detectable by the above amplicon(s) was higher in several cancerous samples than in the non-cancerous samples (Sample Nos. 45, 47-48, 71, Table 1 above, "Tissue samples in testing panel"), including benign samples (Sample Nos. 57-62, Table 1 above, "Tissue samples in testing panel"). Notably, an overexpression of at least 5 fold was found in 13 out of 43 adenocarcinoma samples.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HUMGRP5Ejunc3-7F forward primer; and HUMGRP5Ejunc3-7R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HUMGRP5Ejunc3-7.
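The normalization described for the real-time PCR experiment can be sketched as follows: each sample's amplicon quantity is divided by the geometric mean of its housekeeping-gene quantities, and the result is divided by the median normalized quantity of the normal samples. The function name `fold_upregulation`, the sample identifiers, and all numeric values below are hypothetical and serve only to show the arithmetic; they are not data from the application.

```python
from statistics import geometric_mean, median


def fold_upregulation(target, housekeeping, normal_ids):
    """Hypothetical helper.

    target:       {sample_id: quantity of the amplicon of interest}
    housekeeping: {sample_id: [quantities of the housekeeping genes]}
    normal_ids:   sample ids used as the normal reference group
    Returns {sample_id: fold change relative to the median normal sample}.
    """
    # Step 1: normalize to the geometric mean of the housekeeping genes.
    normalized = {
        sample: target[sample] / geometric_mean(housekeeping[sample])
        for sample in target
    }
    # Step 2: divide by the median normalized quantity of the normal samples.
    baseline = median(normalized[sample] for sample in normal_ids)
    return {sample: value / baseline for sample, value in normalized.items()}


if __name__ == "__main__":
    # Made-up quantities for two normal samples and one tumor sample.
    target = {"normal_a": 1.0, "normal_b": 3.0, "tumor_a": 20.0}
    housekeeping = {
        "normal_a": [1.0, 1.0, 1.0, 1.0],
        "normal_b": [1.0, 1.0, 1.0, 1.0],
        "tumor_a": [2.0, 2.0, 2.0, 2.0],
    }
    result = fold_upregulation(target, housekeeping, ["normal_a", "normal_b"])
    for sample, fold in result.items():
        print(sample, round(fold, 2))
```

A geometric mean is used for the reference genes because it dampens the influence of a single housekeeping gene with an outlying quantity.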
HUMGRP5Ejunc3-7F (SEQ ID NO:965) ACCAGCCACCTCAACCCA
HUMGRP5Ejunc3-7R (SEQ ID NO:966) CTGGAGCAGAGAGTCTTTGCCT
HUMGRP5Ejunc3-7 (SEQ ID NO:967) ACCAGCCACCTCAACCCAAGGCCCTGGGCAATCAGCAGCCTTCGTGGGATTCAGAGGATAGCAGCAACTTCAAAGATGTAGGTTCAAAAGGCAAAGACTCTCTGCTCCAG
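As a consistency check on the sequences listed above, the amplicon should begin with the forward primer and end with the reverse complement of the reverse primer. The following minimal sketch performs that check; the helper names are illustrative assumptions.

```python
FORWARD = "ACCAGCCACCTCAACCCA"          # HUMGRP5Ejunc3-7F
REVERSE = "CTGGAGCAGAGAGTCTTTGCCT"      # HUMGRP5Ejunc3-7R
AMPLICON = (                            # HUMGRP5Ejunc3-7
    "ACCAGCCACCTCAACCCAAGGCCCTGGGCAATCAGCAGCCTTCGTGGGATTCAGAG"
    "GATAGCAGCAACTTCAAAGATGTAGGTTCAAAAGGCAAAGACTCTCTGCTCCAG"
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def primers_flank(amplicon, forward, reverse):
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


if __name__ == "__main__":
    print(primers_flank(AMPLICON, FORWARD, REVERSE))
```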
Expression of GRP_HUMAN - gastrin-releasing peptide HUMGRP5E transcripts, which are detectable by amplicon as depicted in sequence name HUMGRP5Ejunc3-7, in different normal tissues
Expression of GRP_HUMAN - gastrin-releasing peptide transcripts detectable by or according to HUMGRP5Ejunc3-7 amplicon(s) and HUMGRP5Ejunc3-7F and HUMGRP5Ejunc3-7R primers was measured by real time PCR. In parallel the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the breast samples (Sample Nos. 33-35 above), to obtain a value of relative expression of each sample relative to the median of the breast samples. The results are described in Figure 14, presenting the histogram showing the expression of HUMGRP5E transcripts, which are detectable by amplicon as depicted in sequence name HUMGRP5Ejunc3-7, in different normal tissues. Primers and amplicons are as above.
DESCRIPTION FOR CLUSTER R11723
Cluster R11723 features 6 transcript(s) and 26 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
Cluster R11723 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the right hand column of the table and the numbers on the y-axis of Figure 15 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
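The "parts per million" figure described above is a ratio of cluster ESTs to all ESTs in a tissue category, scaled by one million. The sketch below shows only that unweighted ratio; the weighting scheme mentioned in the text is not specified in this section, so it is not modeled, and the function name and example counts are hypothetical.

```python
def ests_per_million(cluster_ests, category_ests):
    """Hypothetical helper: ESTs of one cluster per million ESTs in a
    tissue category (unweighted)."""
    if category_ests <= 0:
        raise ValueError("category_ests must be positive")
    return cluster_ests * 1_000_000 / category_ests


if __name__ == "__main__":
    # Made-up counts: 3 cluster ESTs among 200,000 ESTs in the category.
    print(ests_per_million(3, 200_000))
```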
Overall, the following results were obtained as shown with regard to the histograms in Figure 15 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues and kidney malignant tumors.
Table 4 - Normal tissue distribution
Table 5 - P values and ratios for expression in cancerous tissue
adrenal 4.2e-01 4.6e-01 4.6e-01 2.2 5.3e-01 1.9
brain 2.2e-01 2.0e-01 1.2e-02 2.8 5.0e-02 2.0
epithelial 3.0e-05 6.3e-05 1.8e-05 6.3 3.4e-06 6.4
general 7.2e-03 4.0e-02 1.3e-04 2.1 1.1e-03 1.7
head and neck 5.0e-01 1.0 7.5e-01 1.3
kidney 1.5e-01 2.4e-01 4.4e-03 5.4 2.8e-02 3.6
lung 1.2e-01 1.6e-01 1.6 1.3
breast 5.9e-01 4.4e-01 1.1 6.8e-01 1.5
ovary 1.6e-02 1.3e-02 1.0e-01 3.8 7.0e-02 3.5
pancreas 5.5e-01 2.0e-01 3.9e-01 1.9 1.4e-01 2.1
skin 1 4.4e-01 To" 1.9e-02 T
uterus 1.5e-02 5.4e-02 1.9e-01 TT 1.4e-01 TT
It should be noted that the variants of this cluster are variants of the hypothetical protein PSEC0181 (referred to herein as "PSEC"). Furthermore, use of the known protein (WT protein) for detection of ovarian cancer, alone or in combination with one or more variants of this cluster and/or of any other cluster and/or of any known marker, also comprises an embodiment of the present invention. As described in greater detail below, in ovarian cancer, the variants of the present invention show a similar expression pattern to that of PSEC, except that at least one variant shows greater overexpression than PSEC in ovarian cancer. As noted above, cluster R11723 features 6 transcript(s), which were listed in Table 1 above. A description of each variant protein according to the present invention is now provided.
Variant protein R11723_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T6. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein R11723_PEA_1_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Amino acid mutations
Variant protein R11723_PEA_1_P2 is encoded by the following transcript(s): R11723_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T6 is shown in bold; this coding portion starts at position 1716 and ends at position 2051. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Variant protein R11723_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T15. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between R11723_PEA_1_P6 and Q8IXM0 (SEQ ID NO:968): 1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALHKAGAVGGGVR corresponding to amino acids 1 - 110 of R11723_PEA_1_P6, and a second amino acid sequence being at least 90% homologous to MYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 1 - 112 of Q8IXM0, which also corresponds to amino acids 111 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALHKAGAVGGGVR of R11723_PEA_1_P6.
Comparison report between R11723_PEA_1_P6 and Q96AC2 (SEQ ID NO:969): 1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 1 - 83 of Q96AC2, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
Comparison report between R11723_PEA_1_P6 and Q8N2G4 (SEQ ID NO:970): 1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 1 - 83 of Q8N2G4, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
Comparison report between R11723_PEA_1_P6 and BAC85518 (SEQ ID NO:971): 1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 24 - 106 of BAC85518, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein R11723_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
Variant protein R11723_PEA_1_P6 is encoded by the following transcript(s): R11723_PEA_1_T15, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T15 is shown in bold; this coding portion starts at position 434 and ends at position 1099. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
Variant protein R11723_PEA_1_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T17. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between R11723_PEA_1_P7 and Q96AC2: 1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 1 - 64 of Q96AC2, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
Comparison report between R11723_PEA_1_P7 and Q8N2G4: 1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 1 - 64 of Q8N2G4, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
Comparison report between R11723_PEA_1_P7 and BAC85273 (SEQ ID NO:972): 1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG corresponding to amino acids 1 - 5 of R11723_PEA_1_P7, a second amino acid sequence being at least 90% homologous to IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 22 - 80 of BAC85273, which also corresponds to amino acids 6 - 64 of R11723_PEA_1_P7, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG of R11723_PEA_1_P7. 3. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
Comparison report between R11723_PEA_1_P7 and BAC85518: 1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 24 - 87 of BAC85518, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
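As an illustrative cross-check of the comparison reports for R11723_PEA_1_P7, the following minimal sketch joins the quoted portions in sequential order and recomputes their amino acid ranges. The sequences are copied from the reports above; the constant and function names are hypothetical and are introduced only for this example.

```python
# Sequences copied from the comparison reports for R11723_PEA_1_P7.
HEAD_1_64 = (
    "MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV"
    "MEQSAG"
)
TAIL_65_93 = "SHCVTRLECSGTISAHCNLCLPGSNDHPT"
UNIQUE_HEAD_1_5 = "MWVLG"
SHARED_6_64 = "IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG"


def join_segments(*segments):
    """Concatenate contiguous portions in sequential order."""
    return "".join(segments)


def segment_ranges(*segments):
    """Return 1-based inclusive (start, end) positions of each portion."""
    ranges = []
    start = 1
    for segment in segments:
        end = start + len(segment) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


# Two-part description (e.g. versus Q96AC2) and three-part description
# (versus BAC85273) should describe the same 93-residue polypeptide.
two_part = join_segments(HEAD_1_64, TAIL_65_93)
three_part = join_segments(UNIQUE_HEAD_1_5, SHARED_6_64, TAIL_65_93)
```

Under these assumptions both descriptions yield the same sequence, and the computed ranges match the amino acid positions stated in the reports (1 - 64 and 65 - 93, or 1 - 5, 6 - 64 and 65 - 93).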
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein R11723_PEA_1_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Amino acid mutations
Variant protein R11723_PEA_1_P7 is encoded by the following transcript(s): R11723_PEA_1_T17, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T17 is shown in bold; this coding portion starts at position 434 and ends at position 712. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Nucleic acid SNPs
Variant protein R11723_PEA_1_P13 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T19 and R11723_PEA_1_T5. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between R11723_PEA_1_P13 and Q96AC2: 1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P13, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DTKRTNTLLFEMRHFAKQLTT corresponding to amino acids 64 - 84 of R11723_PEA_1_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P13, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DTKRTNTLLFEMRHFAKQLTT in R11723_PEA_1_P13.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
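The localization reasoning stated above (signal peptide predicted by both signal-peptide programs, and a trans-membrane region predicted by neither trans-membrane program) can be written as a small decision rule. This is only a sketch: the function name and the boolean inputs are hypothetical, no actual prediction program is called, and the "undetermined" result for every other combination is an assumption made for this example, since this section does not describe those cases.

```python
def predict_localization(signal_peptide_calls, transmembrane_calls):
    """Apply the 'secreted' rule described in the text.

    signal_peptide_calls: booleans, one per signal-peptide prediction program
        (True means a signal peptide was predicted).
    transmembrane_calls: booleans, one per trans-membrane prediction program
        (True means a trans-membrane region was predicted).
    """
    if not signal_peptide_calls or not transmembrane_calls:
        raise ValueError("at least one call of each kind is required")
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    # Assumption: other combinations are not covered by this section.
    return "undetermined"


# Situation described for the variants of this cluster: both signal-peptide
# programs agree, and neither trans-membrane program predicts a region.
localization = predict_localization([True, True], [False, False])
```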
Variant protein R11723_PEA_1_P13 is encoded by the following transcript(s): R11723_PEA_1_T19, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T19 is shown in bold; this coding portion starts at position 434 and ends at position 685. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Nucleic acid SNPs
Variant protein R11723_PEA_1_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T20. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between R11723_PEA_1_P10 and Q96AC2: 1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
Comparison report between R11723_PEA_1_P10 and Q8N2G4: 1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q8N2G4, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
Comparison report between R11723_PEA_1_P10 and BAC85273: 1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG corresponding to amino acids 1 - 5 of R11723_PEA_1_P10, a second amino acid sequence being at least 90% homologous to IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 22 - 79 of BAC85273, which also corresponds to amino acids 6 - 63 of R11723_PEA_1_P10, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG of R11723_PEA_1_P10. 3. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
Comparison report between R11723_PEA_1_P10 and BAC85518: 1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 24 - 86 of BAC85518, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein R11723_PEA_1_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Amino acid mutations
Variant protein R11723_PEA_1_P10 is encoded by the following transcript(s): R11723_PEA_1_T20, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T20 is shown in bold; this coding portion starts at position 434 and ends at position 703. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Nucleic acid SNPs
As noted above, cluster R11723 features 26 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster R11723_PEA_1_node_13 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T19, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_16 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T17, R11723_PEA_1_T19 and R11723_PEA_1_T20. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_19 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_2 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_22 according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_31 according to the present invention is supported by 70 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 20 below describes the starting and ending position of this segment on each transcript (it should be noted that these transcripts show alternative polyadenylation).
Table 20 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster R11723_PEA_1_node_10 according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_11 according to the present invention is supported by 42 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_15 according to the present invention can be found in the following transcript(s): R11723_PEA_1_T20. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_18 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_20 according to the present invention can be found in the following transcript(s): R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_21 according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
R11723_PEA_1_T5    1020    1082
R11723_PEA_1_T6    1054    1116
Segment cluster R11723_PEA_1_node_23 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_24 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_25 according to the present invention is supported by 54 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_26 according to the present invention is supported by 62 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_27 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
R11723_PEA_1_T15    905    986
R11723_PEA_1_T5    1823    1904
R11723_PEA_1_T6    1857    1938
Segment cluster R11723_PEA_1_node_28 according to the present invention can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 32 below describes the starting and ending position of this segment on each transcript.
Segment cluster R11723_PEA_1_node_29 according to the present invention is supported by 69 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_3 according to the present invention can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_30 according to the present invention can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_4 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_5 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
R11723_PEA_1_T15    372    414
R11723_PEA_1_T17    372    414
R11723_PEA_1_T19    372    414
R11723_PEA_1_T20    372    414
R11723_PEA_1_T5    372    414
R11723_PEA_1_T6    372    414
Segment cluster R11723_PEA_1_node_6 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_7 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_8 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T6. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: /tmp/gp6eQTLWqk/mFtjUpUzhb :Q8IXM0
Sequence documentation:
Alignment of: R11723_PEA_1_P6 x Q8IXM0
Alignment segment 1/1: Quality: 1128.00
Escore: 0 Matching length: 112 Total length: 112 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
111 MYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLRE 160
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLRE 50
161 GEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRE 210
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 GEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRE 100
211 RQRKEKHSMRTQ 222
    ||||||||||||
101 RQRKEKHSMRTQ 112
Sequence name: /tmp/gp6eQTLWqk/mFtjUpUzhb : Q96AC2
Sequence documentation:
Alignment of: R11723_PEA_1_P6 x Q96AC2
Alignment segment 1/1:
Quality: 835.00 Escore: 0 Matching length: 83 Total length: 83 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
 51 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 83
    |||||||||||||||||||||||||||||||||
 51 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 83
Sequence name: /tmp/gp6eQTLWqk/mFtjUpUzhb :Q8N2G4
Sequence documentation:
Alignment of: R11723_PEA_1_P6 x Q8N2G4
Alignment segment 1/1:
Quality: 835.00 Escore: 0 Matching length: 83 Total length: 83 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
 51 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 83
    |||||||||||||||||||||||||||||||||
 51 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 83
Sequence name: /tmp/gp6eQTLWqk/mFtjUpUzhb :BAC85518
Sequence documentation:
Alignment of: R11723_PEA_1_P6 x BAC85518
Alignment segment 1/1: Quality: 835.00
Escore: 0 Matching length: 83 Total length: 83 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 24 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 73
 51 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 83
    |||||||||||||||||||||||||||||||||
 74 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 106
Sequence name: /tmp/VXjdFlzdBX/bexTxThOTh :Q96AC2
Sequence documentation:
Alignment of: R11723_PEA_1_P7 x Q96AC2
Alignment segment 1/1:
Quality: 654.00 Escore: 0 Matching length: 64 Total length: 64 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
 51 QDMCQKEVMEQSAG 64
    ||||||||||||||
 51 QDMCQKEVMEQSAG 64
Sequence name: /tmp/VXjdFlzdBX/bexTxThOTh :Q8N2G4
Sequence documentation:
Alignment of: R11723_PEA_1_P7 x Q8N2G4
Alignment segment 1/1:
Quality: 654.00 Escore: 0 Matching length: 64 Total length: 64 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
 51 QDMCQKEVMEQSAG 64
    ||||||||||||||
 51 QDMCQKEVMEQSAG 64
Sequence name: /tmp/VXjdFlzdBX/bexTxThOTh:BAC85273
Sequence documentation:
Alignment of: R11723_PEA_1_P7 x BAC85273
Alignment segment 1/1:
Quality: 600.00 Escore: 0 Matching length: 59 Total length: 59 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  6 IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQ 55
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 22 IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQ 71
 56 KEVMEQSAG 64
    |||||||||
 72 KEVMEQSAG 80
Sequence name: /tmp/VXjdFlzdBX/bexTxThOTh:BAC85518
Sequence documentation:
Alignment of: R11723_PEA_1_P7 x BAC85518
Alignment segment 1/1:
Quality: 654.00
Escore: 0 Matching length: 64 Total length: 64 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 24 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 73
 51 QDMCQKEVMEQSAG 64
    ||||||||||||||
 74 QDMCQKEVMEQSAG 87
Sequence name: /tmp/OLMSexEmIh/pc7Z7XmlYR:Q96AC2
Sequence documentation:
Alignment of: R11723_PEA_1_P10 x Q96AC2
Alignment segment 1/1: Quality: 645.00
Escore: 0 Matching length: 63 Total length: 63 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
 51 QDMCQKEVMEQSA 63
    |||||||||||||
 51 QDMCQKEVMEQSA 63
Sequence name: /tmp/OLMSexEmIh/pc7Z7XmlYR:Q8N2G4
Sequence documentation:
Alignment of: R11723_PEA_1_P10 x Q8N2G4
Alignment segment 1/1: Quality: 645.00 Escore: 0 Matching length: 63 Total length: 63 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
 51 QDMCQKEVMEQSA 63
    |||||||||||||
 51 QDMCQKEVMEQSA 63
Sequence name: /tmp/OLMSexEmIh/pc7Z7XmlYR:BAC85273
Sequence documentation:
Alignment of: R11723_PEA_1_P10 x BAC85273 Alignment segment 1/1:
Quality : 591.00
Escore : 0 Matching length: 58 Total length : 58 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  6 IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQ 55
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 22 IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQ 71
 56 KEVMEQSA 63
    ||||||||
 72 KEVMEQSA 79
Sequence name: /tmp/OLMSexEmIh/pc7Z7XmlYR:BAC85518
Sequence documentation: Alignment of: R11723_PEA_1_P10 x BAC85518
Alignment segment 1/1: Quality: 645.00
Escore: 0 Matching length: 63 Total length: 63 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 24 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 73
 51 QDMCQKEVMEQSA 63
    |||||||||||||
 74 QDMCQKEVMEQSA 86
Alignment of: R11723_PEA_1_P13 x Q96AC2 Alignment segment 1/1:
Quality: 645.00 Escore: 0 Matching length: 63 Total length: 63 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
 51 QDMCQKEVMEQSA 63
    |||||||||||||
 51 QDMCQKEVMEQSA 63
Expression of R11723 transcripts which are detectable by amplicon as depicted in sequence R11723seg13 in normal and cancerous ovary tissues
Expression of transcripts detectable by or according to seg13, R11723seg13 amplicon(s) and R11723seg13F and R11723seg13R primers was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel", above), to obtain a value of fold upregulation for each sample relative to the median of the normal PM samples. Figure 16 is a histogram showing overexpression of the above-indicated transcripts in cancerous ovary samples relative to the normal PM samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained. As is evident from Figure 16, the expression of transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel"). Notably, an overexpression of at least 5 fold was found in 23 out of 43 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of transcripts detectable by the above amplicon(s) in ovary cancer samples versus the normal tissue samples was determined by T test as 4.76E-04.
A threshold of 5 fold overexpression was found to differentiate between cancer and normal samples with a P value of 2M8E-02, as checked by Fisher's exact test. The above values demonstrate the statistical significance of the results.
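The normalization described above (geometric mean of the housekeeping genes, then division by the median of the normal samples) can be sketched as follows. This is a minimal illustration only: the sample names and quantities are hypothetical and are not taken from the application, and the function names are chosen here for readability.

```python
from statistics import geometric_mean, median

def normalize(target_qty, housekeeping_qtys):
    """Divide the target amplicon quantity by the geometric mean
    of the housekeeping gene quantities measured in the same RT sample."""
    return target_qty / geometric_mean(housekeeping_qtys)

def fold_upregulation(normalized, normal_ids):
    """Express each normalized quantity relative to the median of the normal samples."""
    reference = median(normalized[s] for s in normal_ids)
    return {s: q / reference for s, q in normalized.items()}

# Hypothetical data: (target amplicon quantity, four housekeeping gene quantities).
raw = {
    "normal_A": (2.0, [1.0, 4.0, 2.0, 2.0]),
    "normal_B": (6.0, [2.0, 2.0, 2.0, 2.0]),
    "normal_C": (4.0, [1.0, 1.0, 4.0, 4.0]),
    "tumor_X": (40.0, [2.0, 2.0, 2.0, 2.0]),
    "tumor_Y": (3.0, [1.0, 4.0, 2.0, 2.0]),
}
normals = ["normal_A", "normal_B", "normal_C"]

normalized = {s: normalize(t, hk) for s, (t, hk) in raw.items()}
fold = fold_upregulation(normalized, normals)

# Samples at or above the 5 fold threshold used in the text.
over_threshold = sorted(s for s, f in fold.items() if f >= 5.0)
```

Counting the samples in `over_threshold` per group gives the 2 x 2 table on which an exact test such as the one mentioned above would be run.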
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: R11723seg13F forward primer; and R11723seg13R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: R11723seg13.
R11723seg13F (SEQ ID NO:973) - ACACTAAAAGAACAAACACCTTGCTC
R11723seg13R (SEQ ID NO:974) - TCCTCAGAAGGCACATGAAAGA
R11723seg13 (SEQ ID NO:975) - ACACTAAAAGAACAAACACCTTGCTCTTCGAGATGAGACATTTTGCCAAGCAGTTGACCACTTAGTTCTCAAGAAGCAACTATCTCTTTCATGTGCCTTCTGAGGA
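As a consistency check on the sequences listed above, the amplicon should begin with the forward primer and end with the reverse complement of the reverse primer. The sketch below performs that check; the function names are illustrative and not part of the application.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))

def primers_flank(amplicon, forward, reverse):
    """True if the amplicon starts with the forward primer and ends
    with the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(reverse_complement(reverse))

# Sequences as listed for SEQ ID NOs 973-975 (line-break spaces removed).
SEG13_F = "ACACTAAAAGAACAAACACCTTGCTC"
SEG13_R = "TCCTCAGAAGGCACATGAAAGA"
SEG13_AMPLICON = (
    "ACACTAAAAGAACAAACACCTTGCTCTTCGAGATGAGACATTTTGCCAAGCAGTTG"
    "ACCACTTAGTTCTCAAGAAGCAACTATCTCTTTCATGTGCCTTCTGAGGA"
)

seg13_ok = primers_flank(SEG13_AMPLICON, SEG13_F, SEG13_R)
```

The same check can be applied to any other primer pair and amplicon given in the text.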
Expression of R11723 transcripts which are detectable by amplicon as depicted in sequence name R11723seg13 in different normal tissues
Expression of R11723 transcripts detectable by or according to R11723seg13 amplicon and R11723seg13F, R11723seg13R was measured by real time PCR. In parallel the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 2 above, "Tissue samples in normal panel"), to obtain a value of relative expression of each sample relative to the median of the ovary samples. The results are described in Figure 17, presenting the histogram showing the expression of R11723 transcripts, which are detectable by amplicon as depicted in sequence name R11723seg13, in different normal tissues. Primers and amplicon are as above.
Expression of R11723 transcripts, which are detectable by amplicon as depicted in sequence R11723junc11-18 in normal and cancerous ovary tissues
Expression of transcripts detectable by or according to junc11-18, R11723junc11-18 amplicon and R11723junc11-18F and R11723junc11-18R primers was measured by real time PCR. (It should be noted that the variants of this cluster are variants of the hypothetical protein PSEC0181 (referred to herein as "PSEC"). Furthermore, use of the known protein (WT protein) for detection of ovarian cancer, alone or in combination with one or more variants of this cluster and/or of any other cluster and/or of any known marker, also comprises an embodiment of the present invention.) In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, 71, Table 1, above: "Tissue samples in ovarian cancer testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 18 is a histogram showing overexpression of the above-indicated transcripts in cancerous ovary samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.
As is evident from Figure 18, the expression of transcripts detectable by the above amplicon in cancer samples was higher than in the non-cancerous samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in ovarian cancer testing panel"). Notably, an overexpression of at least 5 fold was found in 23 out of 43 adenocarcinoma samples. Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: R11723junc11-18F forward primer; and R11723junc11-18R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: R11723junc11-18.
R11723junc11-18F (SEQ ID NO:976) - AGTGATGGAGCAAAGTGCCG
R11723junc11-18R (SEQ ID NO:977) - CAGCAGCTGATGCAAACTGAG
R11723junc11-18 (SEQ ID NO:978) - AGTGATGGAGCAAAGTGCCGGGATCATGTACCGCAAGTCCTGTGCATCATCAGCGGCCTGTCTCATCGCCTCTGCCGGGTACCAGTCCTTCTGCTCCCCAGGGAAACTGAACTCAGTTTGCATCAGCTGCTG
Expression of R11723 transcripts, which are detectable by amplicon as depicted in sequence name R11723junc11-18 in different normal tissues
Expression of R11723 transcripts detectable by or according to R11723seg13 amplicon and R11723junc11-18F, R11723junc11-18R was measured by real time PCR. In parallel the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), UBC (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 2 above: "Tissue samples in normal panel"), to obtain a value of relative expression of each sample relative to the median of the ovary samples. The results are described in Figure 19, presenting the histogram showing the expression of R11723 transcripts, which are detectable by amplicon as depicted in sequence name R11723junc11-18, in different normal tissues. Amplicon and primers are as above.
DESCRIPTION FOR CLUSTER D56406
Cluster D56406 features 3 transcript(s) and 10 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
D56406_PEA_1_T3    147
D56406_PEA_1_T6    148
D56406_PEA_1_T7    149
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Neurotensin/neuromedin N precursor [Contains: Large neuromedin N (NmN-125); Neuromedin N (NmN) (NN); Neurotensin (NT); Tail peptide] (SwissProt accession identifier NEUT_HUMAN), SEQ ID NO: 160, referred to herein as the previously known protein. Protein Neurotensin/neuromedin N precursor is known or believed to have the following function(s): Neurotensin may play an endocrine or paracrine role in the regulation of fat metabolism. It causes contraction of smooth muscle. The sequence for protein Neurotensin/neuromedin N precursor is given at the end of the application, as "Neurotensin/neuromedin N precursor [Contains: Large neuromedin N (NmN-125); Neuromedin N (NmN) (NN); Neurotensin (NT); Tail peptide] amino acid sequence". Protein Neurotensin/neuromedin N precursor localization is believed to be secreted; packaged within secretory vesicles. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: signal transduction, which are annotation(s) related to Biological Process; neuropeptide hormone, which are annotation(s) related to Molecular Function; and extracellular; soluble fraction, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster D56406 features 3 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Neurotensin/neuromedin N precursor. A description of each variant protein according to the present invention is now provided. Variant protein D56406_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D56406_PEA_1_T3. An alignment is given to the known protein (Neurotensin/neuromedin N precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between D56406_PEA_1_P2 and NEUT_HUMAN:
1. An isolated chimeric polypeptide encoding for D56406_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSKISKAHVPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEAMLTIYQLHKICHSRAFQHWE corresponding to amino acids 1 - 120 of NEUT_HUMAN, which also corresponds to amino acids 1 - 120 of D56406_PEA_1_P2, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ARWLTPVIPALWEAETGGSRGQEMETIPANT corresponding to amino acids 121 - 151 of D56406_PEA_1_P2, and a third amino acid sequence being at least 90% homologous to LIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY corresponding to amino acids 121 - 170 of NEUT_HUMAN, which also corresponds to amino acids 152 - 201 of D56406_PEA_1_P2, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of D56406_PEA_1_P2, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for ARWLTPVIPALWEAETGGSRGQEMETIPANT, corresponding to D56406_PEA_1_P2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein D56406_PEA_1_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 4 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D56406_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 4 - Amino acid mutations
Variant protein D56406_PEA_1_P2 is encoded by the following transcript(s): D56406_PEA_1_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D56406_PEA_1_T3 is shown in bold; this coding portion starts at position 106 and ends at position 708. The transcript also has the following SNPs as listed in Table 5 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D56406_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 5 - Nucleic acid SNPs
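The three contiguous segments named in the comparison report for D56406_PEA_1_P2 and the stated coding positions can be cross-checked arithmetically. The sketch below is illustrative only: it joins the quoted segments and compares the resulting length with the number of codons implied by positions 106 to 708. It assumes that the "J" printed in the third segment is an extraction error for "I" (as in the parallel report for D56406_PEA_1_P6); whether the stated range includes a stop codon is not stated in the text.

```python
# Segments as quoted in the comparison report (line-break spaces removed).
HEAD = (
    "MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSKISKAHVPSWKMT"
    "LLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEAMLTIYQLHKICHSRAF"
    "QHWE"
)  # amino acids 1-120 of the variant
UNIQUE = "ARWLTPVIPALWEAETGGSRGQEMETIPANT"  # amino acids 121-151
# Assumption: "KRKJPYILKR" in the extracted text is read as "KRKIPYILKR".
TAIL = "LIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY"  # amino acids 152-201

def codon_count(start, end):
    """Number of codons in a coding portion given 1-based inclusive positions."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("coding portion length is not a multiple of 3")
    return length // 3

p2 = HEAD + UNIQUE + TAIL
p2_length = len(p2)                 # residues from the three segments
p2_codons = codon_count(106, 708)   # codons implied by the stated coding portion
```

Under these assumptions the two counts agree, which is the kind of agreement one would expect if the segment boundaries and coding positions were transcribed correctly.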
Variant protein D56406_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D56406_PEA_1_T6. An alignment is given to the known protein (Neurotensin/neuromedin N precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between D56406_PEA_1_P5 and NEUT_HUMAN:
1. An isolated chimeric polypeptide encoding for D56406_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MMAGMKIQLVCMLLLAFSSWSLC corresponding to amino acids 1 - 23 of NEUT_HUMAN, which also corresponds to amino acids 1 - 23 of D56406_PEA_1_P5, and a second amino acid sequence being at least 90% homologous to SEEEMKALEADFLTNMHTSKISKAHVPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEAMLTIYQLHKICHSRAFQHWELIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY corresponding to amino acids 26 - 170 of NEUT_HUMAN, which also corresponds to amino acids 24 - 168 of D56406_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of D56406_PEA_1_P5, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise CS, having a structure as follows: a sequence starting from any of amino acid numbers 23-x to 23; and ending at any of amino acid numbers 24 + ((n-2) - x), in which x varies from 0 to n-2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein D56406_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D56406_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Amino acid mutations
Variant protein D56406_PEA_1_P5 is encoded by the following transcript(s): D56406_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D56406_PEA_1_T6 is shown in bold; this coding portion starts at position 106 and ends at position 609. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D56406_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Nucleic acid SNPs
Variant protein D56406_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D56406_PEA_1_T7. An alignment is given to the known protein (Neurotensin/neuromedin N precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between D56406_PEA_1_P6 and NEUT_HUMAN: 1. An isolated chimeric polypeptide encoding for D56406_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSK corresponding to amino acids 1 - 45 of NEUT_HUMAN, which also corresponds to amino acids 1 - 45 of D56406_PEA_1_P6, and a second amino acid sequence being at least 90% homologous to LIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY corresponding to amino acids 121 - 170 of NEUT_HUMAN, which also corresponds to amino acids 46 - 95 of D56406_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of D56406_PEA_1_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KL, having a structure as follows: a sequence starting from any of amino acid numbers 45-x to 45; and ending at any of amino acid numbers 46 + ((n-2) - x), in which x varies from 0 to n-2.
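As a rough illustration of how the comparison reports describe each variant, the sketch below joins the stated residue ranges of the known protein and checks the resulting length and junction residues. The helper name `splice` is hypothetical, and the NEUT_HUMAN sequence is transcribed from the alignments given in this document rather than taken from an external database, so it should be treated as an assumption.

```python
# NEUT_HUMAN (170 residues), transcribed from the alignment rows in this document.
NEUT_HUMAN = (
    "MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSKISKAH"
    "VPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEA"
    "MLTIYQLHKICHSRAFQHWELIQEDILDTGNDKNGKEEVIKRKIPYILKR"
    "QLYENKPRRPYILKRDSYYY"
)


def splice(known, ranges):
    """Join 1-based inclusive residue ranges of `known` into one contiguous sequence."""
    return "".join(known[start - 1:end] for start, end in ranges)


# P5: amino acids 1-23 followed by 26-170 of NEUT_HUMAN.
P5 = splice(NEUT_HUMAN, [(1, 23), (26, 170)])
# P6: amino acids 1-45 followed by 121-170 of NEUT_HUMAN.
P6 = splice(NEUT_HUMAN, [(1, 45), (121, 170)])

print(len(P5), P5[22:24])  # length and the two residues flanking the P5 junction
print(len(P6), P6[44:46])  # length and the two residues flanking the P6 junction
```

The junction residues recovered this way (positions 23-24 of P5 and 45-46 of P6) are the pairs that the edge-portion definitions require each window to contain.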
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein D56406_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D56406_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Amino acid mutations
Variant protein D56406_PEA_1_P6 is encoded by the following transcript(s): D56406_PEA_1_T7, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D56406_PEA_1_T7 is shown in bold; this coding portion starts at position 106 and ends at position 390. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D56406_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Nucleic acid SNPs
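The stated coding-portion coordinates can be cross-checked against the protein lengths given in the comparison reports. The following is a minimal sketch, assuming the start and end positions are 1-based and inclusive; the function name `codon_count` is hypothetical. Under that reading, the spans for D56406_PEA_1_T6 and D56406_PEA_1_T7 correspond to 168 and 95 codons, matching the lengths of P5 and P6, which suggests (but does not confirm) that the stated span does not include a stop codon.

```python
def codon_count(start, end):
    """Number of whole codons in a coding portion given 1-based inclusive positions."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"span {start}-{end} is not a whole number of codons")
    return length // 3


# Coordinates as stated in the text for each transcript of cluster D56406.
coding_portions = {
    "D56406_PEA_1_T6": (106, 609),
    "D56406_PEA_1_T7": (106, 390),
}

for transcript, (start, end) in coding_portions.items():
    print(transcript, codon_count(start, end))
```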
As noted above, cluster D56406 features 10 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster D56406_PEA_1_node_0 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3, D56406_PEA_1_T6 and D56406_PEA_1_T7. Table 10 below describes the starting and ending position of this segment on each transcript. Table 10 - Segment location on transcripts
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (with regard to ovarian cancer), shown in Table 11. Table 11 - Oligonucleotides related to this segment
Segment cluster D56406_PEA_1_node_13 according to the present invention is supported by 43 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3, D56406_PEA_1_T6 and D56406_PEA_1_T7. Table 12 below describes the starting and ending position of this segment on each transcript.
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster D56406_PEA_1_node_11 according to the present invention is supported by 1 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3. Table 13 below describes the starting and ending position of this segment on each transcript. Table 13 - Segment location on transcripts
Segment cluster D56406_PEA_1_node_2 according to the present invention can be found in the following transcript(s): D56406_PEA_1_T3 and D56406_PEA_1_T7. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
D56406_PEA_1_T3    179    184
D56406_PEA_1_T7    179    184
Segment cluster D56406_PEA_1_node according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3, D56406_PEA_1_T6 and D56406_PEA_1_T7. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Segment cluster D56406_PEA_1_node_5 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3 and D56406_PEA_1_T6. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Segment cluster D56406_PEA_1_node_6 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3 and D56406_PEA_1_T6. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster D56406_PEA_1_node_7 according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3 and D56406_PEA_1_T6. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Segment cluster D56406_PEA_1_node_8 according to the present invention can be found in the following transcript(s): D56406_PEA_1_T3 and D56406_PEA_1_T6. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster D56406_PEA_1_node_9 according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3 and D56406_PEA_1_T6. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Variant protein alignment to the previously known protein: Sequence name: /tmp/jU49325aMA/8F0XuN7La5 :NEUT_HUMAN
Sequence documentation:
Alignment of: D56406_PEA_1_P2 x NEUT_HUMAN
Alignment segment 1/1:
Quality: 1591.00 Escore: 0 Matching length: 170 Total length: 201 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 84.58 Total Percent Identity: 84.58 Gaps : 1
Alignment :
  1 MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSKISKAH 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSKISKAH 50

 51 VPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEA 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEA 100

101 MLTIYQLHKICHSRAFQHWEARWLTPVIPALWEAETGGSRGQEMETIPAN 150
    ||||||||||||||||||||
101 MLTIYQLHKICHSRAFQHWE.............................. 120

151 TLIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYY 200
     |||||||||||||||||||||||||||||||||||||||||||||||||
121 .LIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYY 169

201 Y 201
    |
170 Y 170
Sequence name: /tmp/ Wui8Kd4y9/zbf3ihRwnR: NEUT_HUMAN
Sequence documentation:
Alignment of: D56406_PEA_1_P5 x NEUT_HUMAN
Alignment segment 1/1: Quality: 1572.00 Escore: 0 Matching length: 168 Total length: 170 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 98.82 Total Percent Identity: 98.82 Gaps : 1
Alignment :
  1 MMAGMKIQLVCMLLLAFSSWSLC..SEEEMKALEADFLTNMHTSKISKAH 48
    |||||||||||||||||||||||  |||||||||||||||||||||||||
  1 MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSKISKAH 50

 49 VPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEA 98
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEA 100

 99 MLTIYQLHKICHSRAFQHWELIQEDILDTGNDKNGKEEVIKRKIPYILKR 148
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 MLTIYQLHKICHSRAFQHWELIQEDILDTGNDKNGKEEVIKRKIPYILKR 150

149 QLYENKPRRPYILKRDSYYY 168
    ||||||||||||||||||||
151 QLYENKPRRPYILKRDSYYY 170

Sequence name: /tmp/f5d07fF5D7/E4N5xjUIAN :NEUT_HUMAN
Sequence documentation
Alignment of: D56406_PEA_1_P6 x NEUT_HUMAN
Alignment segment 1/1
Quality: 844.00 Escore: 0 Matching length: 95 Total length: 170 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 55.88 Total Percent Identity: 55.88 Gaps :
Alignment :
  1 MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSK..... 45
    |||||||||||||||||||||||||||||||||||||||||||||
  1 MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSKISKAH 50

 45 .................................................. 45

 51 VPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEA 100

 46 ....................LIQEDILDTGNDKNGKEEVIKRKIPYILKR 75
                        ||||||||||||||||||||||||||||||
101 MLTIYQLHKICHSRAFQHWELIQEDILDTGNDKNGKEEVIKRKIPYILKR 150

 76 QLYENKPRRPYILKRDSYYY 95
    ||||||||||||||||||||
151 QLYENKPRRPYILKRDSYYY 170
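The summary statistics reported with the three NEUT_HUMAN alignments appear to follow a simple relationship: when every matched position is identical (Matching Percent Identity of 100.00), the Total Percent Identity equals the matching length divided by the total alignment length. The sketch below reproduces the reported values under that assumption; the function name is hypothetical and this is not claimed to be the alignment program's own calculation.

```python
def total_percent_identity(identical_positions, total_length):
    """Percent of all alignment columns (including gaps) that are identical matches."""
    return round(100.0 * identical_positions / total_length, 2)


# (matching length, total length) as reported for each alignment to NEUT_HUMAN.
alignments = {
    "D56406_PEA_1_P2": (170, 201),
    "D56406_PEA_1_P5": (168, 170),
    "D56406_PEA_1_P6": (95, 170),
}

for variant, (matching, total) in alignments.items():
    print(variant, total_percent_identity(matching, total))
```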
DESCRIPTION FOR CLUSTER H53393
Cluster H53393 features 4 transcript(s) and 16 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively, the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Cadherin-6 precursor (SwissProt accession identifier CAD6_HUMAN; known also according to the synonyms Kidney-cadherin; K-cadherin), SEQ ID NO: 184, referred to herein as the previously known protein. Protein Cadherin-6 precursor is known or believed to have the following function(s): Cadherins are calcium dependent cell adhesion proteins. They preferentially interact with themselves in a homophilic manner in connecting cells; cadherins may thus contribute to the sorting of heterogeneous cell types. The sequence for protein Cadherin-6 precursor is given at the end of the application, as "Cadherin-6 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Protein Cadherin-6 precursor localization is believed to be Type I membrane protein. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: cell adhesion; homophilic cell adhesion, which are annotation(s) related to Biological Process; calcium binding; protein binding, which are annotation(s) related to Molecular Function; and integral membrane protein, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster H53393 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 20 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
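The "parts per million" measure defined above can be written as a one-line calculation. This is a minimal sketch of that ratio only; the function name and the example counts are hypothetical, and any additional weighting applied by the previously described methods is not reproduced here.

```python
def parts_per_million(cluster_ests, total_ests):
    """Ratio of ESTs for one cluster to all ESTs in a category, scaled to one million."""
    if total_ests <= 0:
        raise ValueError("total_ests must be positive")
    return 1_000_000 * cluster_ests / total_ests


# Hypothetical counts for illustration only: 3 ESTs from the cluster among
# 200,000 ESTs sequenced for one tissue category.
print(parts_per_million(3, 200_000))
```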
Overall, the following results were obtained as shown with regard to the histograms in Figure 20 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues and ovarian carcinoma. Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
As noted above, cluster H53393 features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Cadherin-6 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein H53393_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) H53393_PEA_1_T10. An alignment is given to the known protein (Cadherin-6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between H53393_PEA_1_P2 and CAD6_HUMAN: 1. An isolated chimeric polypeptide encoding for H53393_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNRSKRSWMWNQFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDLFIINENTGDIQATKRLDREEKPVYILRAQAINRRTGRPVEPESEFIIKIHDINDNEPIFTKEVYTATVPEMSDVGTFVVQVTATDADDPTYGNSAKVVYSILQGQPYFSVESETGIIKTALLNMDRENREQYQVVIQAKDMGGQMGGLSGTTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADVGENAEIEYSITDGEGLDMFDVITDQETQEGIITVKKLLDFEKKKVYTLKVEASNPYVEPRFLYLGPFKDSATVRIVVEDVDEPPVFSKLAYILQIREDAQINTTIGSVTAQDPDAARNPVKYSVDRHTDMDRIFNIDSGNGSIFTSKLLDRETLLWHNITVIATEINNPKQSSRVPLYIKVLDVNDNAPEFAEFYETFVCEKAKADQLIQTLHAVDKDDPYSGHQFSFSLAPEAASGSNFTIQDNK corresponding to amino acids 1 - 543 of CAD6_HUMAN, which also corresponds to amino acids 1 - 543 of H53393_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GK corresponding to amino acids 544 - 545 of H53393_PEA_1_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein H53393_PEA_1_P2 is encoded by the following transcript(s): H53393_PEA_1_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript H53393_PEA_1_T10 is shown in bold; this coding portion starts at position 327 and ends at position 1961. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H53393_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Nucleic acid SNPs
Variant protein H53393_PEA_1_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) H53393_PEA_1_T11 and H53393_PEA_1_T3. An alignment is given to the known protein (Cadherin-6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between H53393_PEA_1_P3 and CAD6_HUMAN: 1. An isolated chimeric polypeptide encoding for H53393_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNRSKRSWMWNQFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDLFIINENTGDIQATKRLDREEKPVYILRAQAINRRTGRPVEPESEFIIKIHDINDNEPIFTKEVYTATVPEMSDVGTFVVQVTATDADDPTYGNSAKVVYSILQGQPYFSVESETGIIKTALLNMDRENREQYQVVIQAKDMGGQMGGLSGTTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADVGENAEIEYSITDGEGLDMFDVITDQETQEGIITVKKLLDFEKKKVYTLKVEASNPYVEPRFLYLGPFKDSATVRIVVEDVDEPPVFSKLAYILQIREDAQINTTIGSVTAQDPDAARNPVKYSVDRHTDMDRIFNIDSGNGSIFTSKLLDRETLLWHNITVIATEINNPKQSSRVPLYIKVLDVNDNAPEFAEFYETFVCEKAKADQ corresponding to amino acids 1 - 504 of CAD6_HUMAN, which also corresponds to amino acids 1 - 504 of H53393_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RFGFSLS corresponding to amino acids 505 - 511 of H53393_PEA_1_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of H53393_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RFGFSLS in H53393_PEA_1_P3.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein H53393_PEA_1_P3 is encoded by the following transcript(s): H53393_PEA_1_T11 and H53393_PEA_1_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript H53393_PEA_1_T11 is shown in bold; this coding portion starts at position 327 and ends at position 1859. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H53393_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Nucleic acid SNPs
The coding portion of transcript H53393_PEA_1_T3 is shown in bold; this coding portion starts at position 327 and ends at position 1859. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H53393_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Nucleic acid SNPs
Variant protein H53393_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) H53393_PEA_1_T9. An alignment is given to the known protein (Cadherin-6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between H53393_PEA_1_P6 and CAD6_HUMAN: 1. An isolated chimeric polypeptide encoding for H53393_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNRSKRSWMWNQFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDLFIINENTGDIQATKRLDREEKPVYILRAQAINRRTGRPVEPESEFIIKIHDINDNEPIFTKEVYTATVPEMSDVGTFVVQVTATDADDPTYGNSAKVVYSILQGQPYFSVESETGIIKTALLNMDRENREQYQVVIQAKDMGGQMGGLSGTTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADVGENAEIEYSITDGEGLDMFDVITDQETQEGIITVKK corresponding to amino acids 1 - 333 of CAD6_HUMAN, which also corresponds to amino acids 1 - 333 of H53393_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VMPLLKHHTE corresponding to amino acids 334 - 343 of H53393_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of H53393_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VMPLLKHHTE in H53393_PEA_1_P6.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein H53393_PEA_1_P6 is encoded by the following transcript(s): H53393_PEA_1_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript H53393_PEA_1_T9 is shown in bold; this coding portion starts at position 327 and ends at position 1355. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H53393_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
As noted above, cluster H53393 features 16 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster H53393_PEA_1_node_0 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10, H53393_PEA_1_T11, H53393_PEA_1_T3 and H53393_PEA_1_T9. Table 11 below describes the starting and ending position of this segment on each transcript. Table 11 - Segment location on transcripts
Segment cluster H53393_PEA_1_node_10 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10, H53393_PEA_1_T11, H53393_PEA_1_T3 and H53393_PEA_1_T9. Table 12 below describes the starting and ending position of this segment on each transcript. Table 12 - Segment location on transcripts
Segment cluster H53393_PEA_1_node_12 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10, H53393_PEA_1_T11, H53393_PEA_1_T3 and H53393_PEA_1_T9. Table 13 below describes the starting and ending position of this segment on each transcript. Table 13 - Segment location on transcripts
Segment cluster H53393_PEA_1_node_13 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T9. Table 14 below describes the starting and ending position of this segment on each transcript. Table 14 - Segment location on transcripts
Segment cluster H53393_PEA_1_node_15 according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10, H53393_PEA_1_T11 and H53393_PEA_1_T3. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Segment cluster H53393_PEA_1_node_17 according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10, H53393_PEA_1_T11 and H53393_PEA_1_T3. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Segment cluster H53393_PEA_1_node_19 according to the present invention is supported by 16 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10, H53393_PEA_1_T11 and H53393_PEA_1_T3. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster H53393_PEA_1_node_23 according to the present invention is supported by 1 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10 and H53393_PEA_1_T11. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Segment cluster H53393_PEA_1_node_24 according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10, H53393_PEA_1_T11 and H53393_PEA_1_T3. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster H53393_PEA_1_node_25 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10 and H53393_PEA_1_T11. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster H53393_PEA_1_node_29 according to the present invention is supported by 41 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T3. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Segment cluster H53393_PEA_1_node_4 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10, H53393_PEA_1_T11, H53393_PEA_1_T3 and H53393_PEA_1_T9. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
H53393_PEA_1_T10    199    554
Segment cluster H53393_PEA_1_node_6 according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10, H53393_PEA_1_T11, H53393_PEA_1_T3 and H53393_PEA_1_T9. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Segment cluster H53393_PEA_1_node_8 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10, H53393_PEA_1_T11, H53393_PEA_1_T3 and H53393_PEA_1_T9. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster H53393_PEA_1_node_21 according to the present invention can be found in the following transcript(s): H53393_PEA_1_T11 and H53393_PEA_1_T3. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster H53393_PEA_1_node_22 according to the present invention is supported by 16 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53393_PEA_1_T10, H53393_PEA_1_T11 and H53393_PEA_1_T3. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: /tmp/oAlc9u2qp7/lHgSZJi6al : CAD6_HUMAN
Sequence documentation:
Alignment of: H53393_PEA_1_P2 x CAD6_HUMAN
Alignment segment 1/1:
Quality: 5281.00 Escore: 0 Matching length: 543 Total length: 543 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:

  1 MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNR 50

 51 SKRSWMWNQFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDL 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 SKRSWMWNQFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDL 100

101 FIINENTGDIQATKRLDREEKPVYILRAQAINRRTGRPVEPESEFIIKIH 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 FIINENTGDIQATKRLDREEKPVYILRAQAINRRTGRPVEPESEFIIKIH 150

151 DINDNEPIFTKEVYTATVPEMSDVGTFVVQVTATDADDPTYGNSAKVVYS 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 DINDNEPIFTKEVYTATVPEMSDVGTFVVQVTATDADDPTYGNSAKVVYS 200

201 ILQGQPYFSVESETGIIKTALLNMDRENREQYQVVIQAKDMGGQMGGLSG 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 ILQGQPYFSVESETGIIKTALLNMDRENREQYQVVIQAKDMGGQMGGLSG 250

251 TTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADVGEN 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADVGEN 300

301 AEIEYSITDGEGLDMFDVITDQETQEGIITVKKLLDFEKKKVYTLKVEAS 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 AEIEYSITDGEGLDMFDVITDQETQEGIITVKKLLDFEKKKVYTLKVEAS 350

351 NPYVEPRFLYLGPFKDSATVRIVVEDVDEPPVFSKLAYILQIREDAQINT 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 NPYVEPRFLYLGPFKDSATVRIVVEDVDEPPVFSKLAYILQIREDAQINT 400

401 TIGSVTAQDPDAARNPVKYSVDRHTDMDRIFNIDSGNGSIFTSKLLDRET 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 TIGSVTAQDPDAARNPVKYSVDRHTDMDRIFNIDSGNGSIFTSKLLDRET 450

451 LLWHNITVIATEINNPKQSSRVPLYIKVLDVNDNAPEFAEFYETFVCEKA 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 LLWHNITVIATEINNPKQSSRVPLYIKVLDVNDNAPEFAEFYETFVCEKA 500

501 KADQLIQTLHAVDKDDPYSGHQFSFSLAPEAASGSNFTIQDNK 543
    |||||||||||||||||||||||||||||||||||||||||||
501 KADQLIQTLHAVDKDDPYSGHQFSFSLAPEAASGSNFTIQDNK 543
Sequence name: /tmp/I80QylyXbk/TP0IdLltx5 : CAD6_HUMAN
Sequence documentation:
Alignment of: H53393_PEA_1_P3 x CAD6_HUMAN
Alignment segment 1/1:
Quality: 4900.00 Escore: 0 Matching length: 504 Total length: 504 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:

  1 MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNR 50

 51 SKRSWMWNQFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDL 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 SKRSWMWNQFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDL 100

101 FIINENTGDIQATKRLDREEKPVYILRAQAINRRTGRPVEPESEFIIKIH 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 FIINENTGDIQATKRLDREEKPVYILRAQAINRRTGRPVEPESEFIIKIH 150

151 DINDNEPIFTKEVYTATVPEMSDVGTFVVQVTATDADDPTYGNSAKVVYS 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 DINDNEPIFTKEVYTATVPEMSDVGTFVVQVTATDADDPTYGNSAKVVYS 200

201 ILQGQPYFSVESETGIIKTALLNMDRENREQYQVVIQAKDMGGQMGGLSG 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 ILQGQPYFSVESETGIIKTALLNMDRENREQYQVVIQAKDMGGQMGGLSG 250

251 TTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADVGEN 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADVGEN 300

301 AEIEYSITDGEGLDMFDVITDQETQEGIITVKKLLDFEKKKVYTLKVEAS 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 AEIEYSITDGEGLDMFDVITDQETQEGIITVKKLLDFEKKKVYTLKVEAS 350

351 NPYVEPRFLYLGPFKDSATVRIVVEDVDEPPVFSKLAYILQIREDAQINT 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 NPYVEPRFLYLGPFKDSATVRIVVEDVDEPPVFSKLAYILQIREDAQINT 400

401 TIGSVTAQDPDAARNPVKYSVDRHTDMDRIFNIDSGNGSIFTSKLLDRET 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 TIGSVTAQDPDAARNPVKYSVDRHTDMDRIFNIDSGNGSIFTSKLLDRET 450

451 LLWHNITVIATEINNPKQSSRVPLYIKVLDVNDNAPEFAEFYETFVCEKA 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 LLWHNITVIATEINNPKQSSRVPLYIKVLDVNDNAPEFAEFYETFVCEKA 500

501 KADQ 504
    ||||
501 KADQ 504
Sequence name: /tmp/Ntvj ylOCi/c5Li3091on : CAD6_HUMAN
Sequence documentation:
Alignment of: H53393_PEA_1_P6 x CAD6_HUMAN
Alignment segment 1/1:
Quality: 3247.00 Escore: 0 Matching length: 335 Total length: 335 Matching Percent Similarity: 100.00 Matching Percent Identity: 99.40 Total Percent Similarity: 100.00 Total Percent Identity: 99.40 Gaps: 0
Alignment:

  1 MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRTYRYFLLLFWVGQPYPTLSTPLSKRTSGFPAKKRALELSGNSKNELNR 50

 51 SKRSWMWNQFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDL 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 SKRSWMWNQFFLLEEYTGSDYQYVGKLHSDQDRGDGSLKYILSGDGAGDL 100

101 FIINENTGDIQATKRLDREEKPVYILRAQAINRRTGRPVEPESEFIIKIH 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 FIINENTGDIQATKRLDREEKPVYILRAQAINRRTGRPVEPESEFIIKIH 150

151 DINDNEPIFTKEVYTATVPEMSDVGTFVVQVTATDADDPTYGNSAKVVYS 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 DINDNEPIFTKEVYTATVPEMSDVGTFVVQVTATDADDPTYGNSAKVVYS 200

201 ILQGQPYFSVESETGIIKTALLNMDRENREQYQVVIQAKDMGGQMGGLSG 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 ILQGQPYFSVESETGIIKTALLNMDRENREQYQVVIQAKDMGGQMGGLSG 250

251 TTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADVGEN 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TTTVNITLTDVNDNPPRFPQSTYQFKTPESSPPGTPIGRIKASDADVGEN 300

301 AEIEYSITDGEGLDMFDVITDQETQEGIITVKKVM 335
    |||||||||||||||||||||||||||||||||::
301 AEIEYSITDGEGLDMFDVITDQETQEGIITVKKLL 335
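The identity figure reported for the H53393_PEA_1_P6 alignment can be cross-checked with a short calculation. The following Python sketch is illustrative only: it assumes that percent identity is the number of identical aligned positions divided by the alignment length, and it takes the first 300 positions as identical, as the alignment rows above show. The function and constant names are hypothetical and are not part of the alignment software used in the application.

```python
def count_identical(seq_a: str, seq_b: str) -> int:
    """Count positions at which two equal-length aligned sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(a == b for a, b in zip(seq_a, seq_b))


def percent_identity(matches: int, alignment_length: int) -> float:
    """Identity as a percentage of the alignment length (assumed definition)."""
    return 100.0 * matches / alignment_length


# Last alignment row (positions 301-335) as printed above.
VARIANT_TAIL = "AEIEYSITDGEGLDMFDVITDQETQEGIITVKKVM"
KNOWN_TAIL = "AEIEYSITDGEGLDMFDVITDQETQEGIITVKKLL"

# Positions 1-300 are shown as fully identical in the preceding rows.
IDENTICAL_BEFORE_TAIL = 300
ALIGNMENT_LENGTH = 335

matches = IDENTICAL_BEFORE_TAIL + count_identical(VARIANT_TAIL, KNOWN_TAIL)
identity = percent_identity(matches, ALIGNMENT_LENGTH)

if __name__ == "__main__":
    print(f"{matches} of {ALIGNMENT_LENGTH} positions identical: {identity:.2f}%")
```

Under these assumptions the two mismatches at positions 334 and 335 account for the reported 99.40 percent identity, while the similarity remains 100 percent because both positions are marked as conservative substitutions.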
Expression of CAD6_HUMAN Cadherin-6 [Precursor]; Kidney-cadherin; K-cadherin H53393 transcripts which are detectable by amplicon as depicted in sequence name H53393 seg13 in normal and cancerous ovary tissues
Expression of CAD6_HUMAN Cadherin-6 [Precursor]; Kidney-cadherin; K-cadherin transcripts detectable by or according to seg13, H53393 seg13 amplicon(s) and H53393 seg13F and H53393 seg13R primers was measured by real-time PCR. In this specific example, the real-time PCR reaction efficiency was assumed to be 2 and was not calculated by a standard curve reaction (as detailed above in the section "Real-Time RT-PCR analysis"). In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel", above), to obtain a value of fold upregulation for each sample relative to the median of the normal PM samples.
Figure 21 is a histogram showing overexpression of the above-indicated CAD6_HUMAN Cadherin-6 [Precursor] transcripts in cancerous ovary samples relative to the normal samples. As is evident from Figure 21, the expression of CAD6_HUMAN Cadherin-6 [Precursor] transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel"). Notably, an overexpression of at least 5 fold was found in 19 out of 43 adenocarcinoma samples.
Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of CAD6_HUMAN Cadherin-6 [Precursor] transcripts detectable by the above amplicon(s) in ovary cancer samples versus the normal tissue samples was determined by t-test as 5.5E-03. A threshold of 5 fold overexpression was found to differentiate between cancer and normal samples with a P value of 6.94E-02 as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: H53393 seg13F forward primer; and H53393 seg13R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: H53393 seg13.
H53393 seg13 Forward primer (SEQ ID NO:979): AATGCCGCTTCTTAAACACCA
H53393 seg13 Reverse primer (SEQ ID NO:980): AGAACTGGCATTTTTCTGAAAATAATAA
H53393 seg13 Amplicon (SEQ ID NO:981): AATGCCGCTTCTTAAACACCATACAGAGTGAACCCATTTACTTTTCTCCAGTTCCTAAGTTACCAGGGGCAATTATATCTCACATAAACATTCCTTTAGATTTTTATTTTACTTATTATTTTCAGAAAAATGCCAGTTCT
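As a consistency check on the seg13 primer and amplicon sequences listed above, the forward primer should match the 5' end of the amplicon and the reverse complement of the reverse primer should match its 3' end. The Python sketch below is illustrative only; the helper names are hypothetical, and the sequences are copied from this section with the line-wrap spaces removed.

```python
# Complement table for unambiguous DNA bases only.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def primers_flank_amplicon(forward: str, reverse: str, amplicon: str) -> bool:
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


# Sequences as listed in the text (SEQ ID NO:979, 980 and 981).
SEG13_FORWARD = "AATGCCGCTTCTTAAACACCA"
SEG13_REVERSE = "AGAACTGGCATTTTTCTGAAAATAATAA"
SEG13_AMPLICON = (
    "AATGCCGCTTCTTAAACACCATACAGAGTGAACCCATTTACTTTTCTCCAGTTCCTA"
    "AGTTACCAGGGGCAATTATATCTCACATAAACATTCCTTTAGATTTTTATTTTACTTA"
    "TTATTTTCAGAAAAATGCCAGTTCT"
)

if __name__ == "__main__":
    ok = primers_flank_amplicon(SEG13_FORWARD, SEG13_REVERSE, SEG13_AMPLICON)
    print("seg13 primers flank the amplicon:", ok)
```

The check only confirms that the three listed sequences agree with one another at their ends; it says nothing about primer specificity or the interior of the amplicon.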
Expression of CAD6_HUMAN Cadherin-6 [Precursor] H53393 transcripts which are detectable by amplicon as depicted in sequence name H53393 junc21-22 in normal and cancerous ovary tissues
Expression of CAD6_HUMAN Cadherin-6 [Precursor] transcripts detectable by or according to junc21-22, H53393 junc21-22 amplicon(s) and H53393 junc21-22F and H53393 junc21-22R primers was measured by real-time PCR. In this specific example, the real-time PCR reaction efficiency was assumed to be 2 and was not calculated by a standard curve reaction (as detailed above in the section "Real-Time RT-PCR analysis"). In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel", above), to obtain a value of fold upregulation for each sample relative to the median of the normal PM samples.
Figure 22 is a histogram showing overexpression of the above-indicated CAD6_HUMAN Cadherin-6 [Precursor] transcripts in cancerous ovary samples relative to the normal samples. As is evident from Figure 22, the expression of CAD6_HUMAN Cadherin-6 [Precursor] transcripts detectable by the above amplicon(s) in cancer samples was higher than in the non-cancerous samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel"). Notably, an overexpression of at least 5 fold was found in 23 out of 43 adenocarcinoma samples.
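The normalization used for both amplicons, and the threshold test reported for the seg13 amplicon, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the application: the function names are hypothetical, the Fisher test is taken to be one-sided, and the count of zero normal samples above the 5 fold threshold is an assumption (the text gives only 19 of 43 adenocarcinoma samples and five normal PM samples).

```python
from math import comb
from statistics import geometric_mean, median


def normalized_quantity(target_qty, housekeeping_qtys):
    """Amplicon quantity divided by the geometric mean of the
    housekeeping-gene quantities measured in the same RT sample."""
    return target_qty / geometric_mean(housekeeping_qtys)


def fold_upregulation(sample_norm, normal_pm_norms):
    """Normalized quantity of a sample relative to the median of the
    normalized quantities of the normal post-mortem samples."""
    return sample_norm / median(normal_pm_norms)


def fisher_one_sided(cancer_over, cancer_total, normal_over, normal_total):
    """One-sided Fisher exact P value: the hypergeometric probability of
    seeing at least `cancer_over` over-threshold cancer samples, given
    the row and column totals of the 2x2 table."""
    n_total = cancer_total + normal_total
    k_over = cancer_over + normal_over
    upper = min(cancer_total, k_over)
    tail = sum(
        comb(cancer_total, k) * comb(normal_total, k_over - k)
        for k in range(cancer_over, upper + 1)
    )
    return tail / comb(n_total, k_over)


if __name__ == "__main__":
    # seg13 counts from the text; zero over-threshold normals is assumed.
    p = fisher_one_sided(cancer_over=19, cancer_total=43,
                         normal_over=0, normal_total=5)
    print(f"one-sided Fisher exact P = {p:.4f}")
```

With those assumed counts the one-sided value comes to about 0.0694, which agrees with the 6.94E-02 reported for the seg13 amplicon; no P value is reported in the text for the junc21-22 amplicon, so none is computed for it here.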
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: H53393 junc21-22F forward primer; and H53393 junc21-22R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: H53393 junc21-22.
H53393 junc21-22 Forward primer (SEQ ID NO:982): TGGTTTTTCTCTTAGTTGATTCAGACC
H53393 junc21-22 Reverse primer (SEQ ID NO:983): GAGCCACTGGCTGCTTCAG
H53393 junc21-22 Amplicon (SEQ ID NO:984): TGGTTTTTCTCTTAGTTGATTCAGACCTTGCATGCTGTTGACAAGGATGACCCTTATAGTGGGCACCAATTTTCGTTTTCCTTGGCCCCTGAAGCAGCCAGTGGCTC
DESCRIPTION FOR CLUSTER HSU40434
Cluster HSU40434 features 1 transcript(s) and 36 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Segment name   SEQ ID NO
HSU40434_PEA_1_node_1   189
HSU40434_PEA_1_node_16   190
HSU40434_PEA_1_node_30   191
HSU40434_PEA_1_node_32   192
HSU40434_PEA_1_node_57   193
HSU40434_PEA_1_node_0   194
HSU40434_PEA_1_node_10   195
HSU40434_PEA_1_node_13   196
HSU40434_PEA_1_node_18   197
HSU40434_PEA_1_node_2   198
HSU40434_PEA_1_node_20   199
HSU40434_PEA_1_node_21   200
HSU40434_PEA_1_node_23   201
HSU40434_PEA_1_node_24   202
HSU40434_PEA_1_node_26   203
HSU40434_PEA_1_node_28   204
HSU40434_PEA_1_node_3   205
HSU40434_PEA_1_node_35   206
HSU40434_PEA_1_node_36   207
Table 3 - Proteins of interest
These sequences are variants of the known protein Mesothelin precursor (SwissProt accession identifier MSLN_HUMAN; known also according to the synonym CAK1 antigen), SEQ ID NO: 225, referred to herein as the previously known protein. The variant proteins according to the present invention are variants of a known diagnostic marker, called Mesothelin (CAK-1). Protein Mesothelin precursor is known or believed to have the following function(s): may play a role in cellular adhesion. Antigenic protein reactive with antibody K1. The sequence for protein Mesothelin precursor is given at the end of the application, as "Mesothelin precursor amino acid sequence". Protein Mesothelin precursor localization is believed to be attached to the membrane by a GPI-anchor. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: cell adhesion, which are annotation(s) related to Biological Process; protein binding, which are annotation(s) related to Molecular Function; and membrane, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster HSU40434 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 23 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
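The "parts per million" scale described above can be illustrated with a short Python sketch. It assumes the simplest reading of the text, namely the number of ESTs assigned to the cluster in a tissue category divided by the total number of ESTs in that category, scaled to one million; any additional weighting applied in the application is not reproduced here, and the function name and example counts are hypothetical.

```python
def ests_per_million(cluster_est_count: int, category_est_count: int) -> float:
    """Expression of a cluster in one tissue category, expressed as ESTs
    of that cluster per million ESTs in the category (assumed definition)."""
    if category_est_count <= 0:
        raise ValueError("category must contain at least one EST")
    if cluster_est_count < 0 or cluster_est_count > category_est_count:
        raise ValueError("cluster count must lie between 0 and the category total")
    return 1_000_000 * cluster_est_count / category_est_count


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(ests_per_million(cluster_est_count=5, category_est_count=250_000))
```

Expressing the ratio per million makes categories with very different EST library sizes directly comparable on the y-axis of a histogram.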
Overall, the following results were obtained as shown with regard to the histograms in Figure 23 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues, ovarian carcinoma and pancreas carcinoma.
Table 4 - Normal tissue distribution
Table 5 - P values and ratios for expression in cancerous tissue
As noted above, cluster HSU40434 features 1 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Mesothelin precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSU40434_PEA_1_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSU40434_PEA_1_T13. An alignment is given to the known protein (Mesothelin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSU40434_PEA_1_P12 and Q14859 (SEQ ID NO:985):
1. An isolated chimeric polypeptide encoding for HSU40434_PEA_1_P12, comprising a first amino acid sequence being at least 90% homologous to MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQEAAPLDGVLANPPNISSLSPRQLLGFPCAEVSGLSTERVRELAVALAQKNVKLSTEQLRCLAHRLSEPPEDLDALPLDLLLFLNPDAFSGPQACTRFFSRITKANVDLLPRGAPERQRLLPAALACWGVRGSLLSEADVRALGGLACDLPGRFVAESAEVLLPRLVSCPGPLDQDQQEAARAALQGGGPPYGPPSTWSVSTMDALRGLLPVLGQPIIRSIPQGIVAAWRQRSSRDPSWRQPERTILRPRFRREVEKTACPSGKKAREIDESLIFYKKWELEACVDAALLATQMDRVNAIPFTYEQLDVLKHKLDELYPQGYPESVIQHLGYLFLKMSPEDIRKWNVTSLETLKALLEVNKGHEMSPQVATLIDRFVKGRGQLDKDTLDTLTAFYPGYLCSLSPEELSSVPPSSIW corresponding to amino acids 1 - 458 of Q14859, which also corresponds to amino acids 1 - 458 of HSU40434_PEA_1_P12.
Comparison report between HSU40434_PEA_1_P12 and Q9BTR2 (SEQ ID NO:986):
1. An isolated chimeric polypeptide encoding for HSU40434_PEA_1_P12, comprising a first amino acid sequence being at least 90% homologous to MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQ corresponding to amino acids 1 - 43 of Q9BTR2, which also corresponds to amino acids 1 - 43 of HSU40434_PEA_1_P12, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence E corresponding to amino acids 44 - 44 of HSU40434_PEA_1_P12, and a third amino acid sequence being at least 90% homologous to AAPLDGVLANPPNISSLSPRQLLGFPCAEVSGLSTERVRELAVALAQKNVKLSTEQLRCLAHRLSEPPEDLDALPLDLLLFLNPDAFSGPQACTRFFSRITKANVDLLPRGAPERQRLLPAALACWGVRGSLLSEADVRALGGLACDLPGRFVAESAEVLLPRLVSCPGPLDQDQQEAARAALQGGGPPYGPPSTWSVSTMDALRGLLPVLGQPIIRSIPQGIVAAWRQRSSRDPSWRQPERTILRPRFRREVEKTACPSGKKAREIDESLIFYKKWELEACVDAALLATQMDRVNAIPFTYEQLDVLKHKLDELYPQGYPESVIQHLGYLFLKMSPEDIRKWNVTSLETLKALLEVNKGHEMSPQVATLIDRFVKGRGQLDKDTLDTLTAFYPGYLCSLSPEELSSVPPSSIW corresponding to amino acids 44 - 457 of Q9BTR2, which also corresponds to amino acids 45 - 458 of HSU40434_PEA_1_P12, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of HSU40434_PEA_1_P12, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for E, corresponding to HSU40434_PEA_1_P12.
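The coordinate relationship stated in the Q9BTR2 comparison report (positions 1 - 43 shared, a single inserted residue E at position 44 of the variant, then positions 45 - 458 of the variant matching 44 - 457 of Q9BTR2) can be written as a small mapping. The Python sketch below is illustrative only; the function and constant names are hypothetical and it encodes nothing beyond the ranges given in the text.

```python
from typing import Optional

FIRST_END = 43       # last position shared one-to-one with Q9BTR2
INSERTED_POS = 44    # the single residue (E) present only in the variant
VARIANT_LENGTH = 458  # last variant position covered by the comparison


def variant_to_q9btr2(position: int) -> Optional[int]:
    """Map a 1-based position on HSU40434_PEA_1_P12 to the corresponding
    1-based position on Q9BTR2, or None for the inserted residue."""
    if not 1 <= position <= VARIANT_LENGTH:
        raise ValueError("position outside the compared range 1-458")
    if position <= FIRST_END:
        return position
    if position == INSERTED_POS:
        return None
    return position - 1


if __name__ == "__main__":
    for pos in (1, 43, 44, 45, 458):
        print(pos, "->", variant_to_q9btr2(pos))
```

The three parts account for 43 + 1 + 414 = 458 residues, which matches the range of the variant covered in the comparison report.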
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSU40434_PEA_1_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSU40434_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Amino acid mutations
Variant protein HSU40434_PEA_1_P12 is encoded by the following transcript(s): HSU40434_PEA_1_T13, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSU40434_PEA_1_T13 is shown in bold; this coding portion starts at position 420 and ends at position 1793. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSU40434_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
As noted above, cluster HSU40434 features 36 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSU40434_PEA_1_node_1 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 8 below describes the starting and ending position of this segment on each transcript.
Table 8 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_16 according to the present invention is supported by 30 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 9 below describes the starting and ending position of this segment on each transcript.
Table 9 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_30 according to the present invention is supported by 37 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 10 below describes the starting and ending position of this segment on each transcript.
Table 10 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_32 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 11 below describes the starting and ending position of this segment on each transcript.
Table 11 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_57 according to the present invention is supported by 53 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
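The split between long and short segments can be illustrated from the start and end positions that survive in the segment location tables of this section. The Python sketch below is illustrative only: it assumes positions are 1-based and inclusive, and it treats "up to about 120 bp" as a hard cutoff of 120, which the text does not state. The names are hypothetical.

```python
SHORT_SEGMENT_MAX_BP = 120  # assumed hard cutoff for "up to about 120 bp"


def segment_length(start: int, end: int) -> int:
    """Length in bp of a segment given 1-based inclusive positions (assumed)."""
    if end < start:
        raise ValueError("end position precedes start position")
    return end - start + 1


def is_short_segment(start: int, end: int,
                     cutoff: int = SHORT_SEGMENT_MAX_BP) -> bool:
    """True if the segment would fall in the short-segment description."""
    return segment_length(start, end) <= cutoff


# (transcript, start, end) rows recoverable from tables in this section.
ROWS = [
    ("H53393_PEA_1_T10", 199, 554),
    ("HSU40434_PEA_1_T13", 930, 1043),
    ("HSU40434_PEA_1_T13", 2090, 2113),
]

if __name__ == "__main__":
    for transcript, start, end in ROWS:
        kind = "short" if is_short_segment(start, end) else "long"
        print(transcript, start, end, segment_length(start, end), kind)
```

Under these assumptions the first row is a long segment and the other two are short, which is consistent with where the corresponding segment descriptions appear in the text.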
Segment cluster HSU40434_PEA_1_node_0 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_10 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_13 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_18 according to the present invention is supported by 30 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 16 below describes the starting and ending position of this segment on each transcript.
Segment cluster HSU40434_PEA_1_node_2 according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_20 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_21 according to the present invention can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_23 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Transcript name   Segment starting position   Segment ending position
HSU40434_PEA_1_T13   930   1043
Segment cluster HSU40434_PEA_1_node_24 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_26 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_28 according to the present invention is supported by 30 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_3 according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_35 according to the present invention is supported by 43 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_36 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_37 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_38 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_39 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_40 according to the present invention can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_41 according to the present invention can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_42 according to the present invention can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_43 according to the present invention can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_44 according to the present invention can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_47 according to the present invention is supported by 49 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 35 below describes the starting and ending position of this segment on each transcript.
Table 35 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_48 according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_51 according to the present invention can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU40434_PEA_1_T13 | 2090 | 2113
Segment cluster HSU40434_PEA_1_node_52 according to the present invention is supported by 52 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 38 below describes the starting and ending position of this segment on each transcript.
Table 38 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_53 according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 39 below describes the starting and ending position of this segment on each transcript.
Segment cluster HSU40434_PEA_1_node_54 according to the present invention is supported by 56 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 40 below describes the starting and ending position of this segment on each transcript.
Table 40 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_56 according to the present invention is supported by 49 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 41 below describes the starting and ending position of this segment on each transcript.
Table 41 - Segment location on transcripts
Segment cluster HSU40434_PEA_1_node_7 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 42 below describes the starting and ending position of this segment on each transcript.
Segment cluster HSU40434_PEA_1_node_8 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU40434_PEA_1_T13. Table 43 below describes the starting and ending position of this segment on each transcript.
Table 43 - Segment location on transcripts
Variant protein alignment to the previously known protein.
Sequence name: /tmp/tZTolplA9i/eTMhjqGV2R:Q14859
Sequence documentation:
Alignment of: HSU40434_PEA_1_P12 x Q14859
Alignment segment 1/1
Quality: 4448.00 Escore: 0 Matching length: 458 Total length: 458 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQEAAPLDG 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQEAAPLDG 50

 51 VLANPPNISSLSPRQLLGFPCAEVSGLSTERVRELAVALAQKNVKLSTEQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLANPPNISSLSPRQLLGFPCAEVSGLSTERVRELAVALAQKNVKLSTEQ 100

101 LRCLAHRLSEPPEDLDALPLDLLLFLNPDAFSGPQACTRFFSRITKANVD 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 LRCLAHRLSEPPEDLDALPLDLLLFLNPDAFSGPQACTRFFSRITKANVD 150

151 LLPRGAPERQRLLPAALACWGVRGSLLSEADVRALGGLACDLPGRFVAES 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 LLPRGAPERQRLLPAALACWGVRGSLLSEADVRALGGLACDLPGRFVAES 200

201 AEVLLPRLVSCPGPLDQDQQEAARAALQGGGPPYGPPSTWSVSTMDALRG 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 AEVLLPRLVSCPGPLDQDQQEAARAALQGGGPPYGPPSTWSVSTMDALRG 250

251 LLPVLGQPIIRSIPQGIVAAWRQRSSRDPSWRQPERTILRPRFRREVEKT 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 LLPVLGQPIIRSIPQGIVAAWRQRSSRDPSWRQPERTILRPRFRREVEKT 300

301 ACPSGKKAREIDESLIFYKKWELEACVDAALLATQMDRVNAIPFTYEQLD 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 ACPSGKKAREIDESLIFYKKWELEACVDAALLATQMDRVNAIPFTYEQLD 350

351 VLKHKLDELYPQGYPESVIQHLGYLFLKMSPEDIRKWNVTSLETLKALLE 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 VLKHKLDELYPQGYPESVIQHLGYLFLKMSPEDIRKWNVTSLETLKALLE 400

401 VNKGHEMSPQVATLIDRFVKGRGQLDKDTLDTLTAFYPGYLCSLSPEELS 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 VNKGHEMSPQVATLIDRFVKGRGQLDKDTLDTLTAFYPGYLCSLSPEELS 450

451 SVPPSSIW 458
    ||||||||
451 SVPPSSIW 458
Sequence name: /tmp/tZTolplA9i/eTMhjqGV2R:Q9BTR2
Sequence documentation:
Alignment of: HSU40434_PEA_1_P12 x Q9BTR2
Alignment segment 1/1:
Quality: 4338.00 Escore: 0 Matching length: 457 Total length: 458 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 99.78 Total Percent Identity: 99.78 Gaps : 1
Alignment :
  1 MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQEAAPLDG 50
    ||||||||||||||||||||||||||||||||||||||||||| ||||||
  1 MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQ.AAPLDG 49

 51 VLANPPNISSLSPRQLLGFPCAEVSGLSTERVRELAVALAQKNVKLSTEQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 50 VLANPPNISSLSPRQLLGFPCAEVSGLSTERVRELAVALAQKNVKLSTEQ 99

101 LRCLAHRLSEPPEDLDALPLDLLLFLNPDAFSGPQACTRFFSRITKANVD 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
100 LRCLAHRLSEPPEDLDALPLDLLLFLNPDAFSGPQACTRFFSRITKANVD 149

151 LLPRGAPERQRLLPAALACWGVRGSLLSEADVRALGGLACDLPGRFVAES 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
150 LLPRGAPERQRLLPAALACWGVRGSLLSEADVRALGGLACDLPGRFVAES 199

201 AEVLLPRLVSCPGPLDQDQQEAARAALQGGGPPYGPPSTWSVSTMDALRG 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
200 AEVLLPRLVSCPGPLDQDQQEAARAALQGGGPPYGPPSTWSVSTMDALRG 249

251 LLPVLGQPIIRSIPQGIVAAWRQRSSRDPSWRQPERTILRPRFRREVEKT 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
250 LLPVLGQPIIRSIPQGIVAAWRQRSSRDPSWRQPERTILRPRFRREVEKT 299

301 ACPSGKKAREIDESLIFYKKWELEACVDAALLATQMDRVNAIPFTYEQLD 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
300 ACPSGKKAREIDESLIFYKKWELEACVDAALLATQMDRVNAIPFTYEQLD 349

351 VLKHKLDELYPQGYPESVIQHLGYLFLKMSPEDIRKWNVTSLETLKALLE 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
350 VLKHKLDELYPQGYPESVIQHLGYLFLKMSPEDIRKWNVTSLETLKALLE 399

401 VNKGHEMSPQVATLIDRFVKGRGQLDKDTLDTLTAFYPGYLCSLSPEELS 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
400 VNKGHEMSPQVATLIDRFVKGRGQLDKDTLDTLTAFYPGYLCSLSPEELS 449

451 SVPPSSIW 458
    ||||||||
450 SVPPSSIW 457
Sequence name: /tmp/tZTolplA9i/eTMhjqGV2R:MSLN_HUMAN
Sequence documentation:
Alignment of: HSU40434_PEA_1_P12 x MSLN_HUMAN
Alignment segment 1/1:
Quality: 4074.00 Escore: 0 Matching length: 440 Total length: 448 Matching Percent Similarity: 98.86 Matching Percent Identity: 97.95 Total Percent Similarity: 97.10 Total Percent Identity: 96.21 Gaps: 1
Alignment :
 19 GSLLFLLFSLGWVQPSRTLAGETGQEAAPLDGVLANPPNISSLSPRQLLG 68
    |||||||||||||:|:|||||||| |:||| |||  | ||||||||||||
 17 GSLLFLLFSLGWVHPARTLAGETGTESAPLGGVLTTPHNISSLSPRQLLG 66

 69 FPCAEVSGLSTERVRELAVALAQKNVKLSTEQLRCLAHRLSEPPEDLDAL 118
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 67 FPCAEVSGLSTERVRELAVALAQKNVKLSTEQLRCLAHRLSEPPEDLDAL 116

119 PLDLLLFLNPDAFSGPQACTRFFSRITKANVDLLPRGAPERQRLLPAALA 168
    ||||||||||||||||||||||||||||||||||||||||||||||||||
117 PLDLLLFLNPDAFSGPQACTRFFSRITKANVDLLPRGAPERQRLLPAALA 166

169 CWGVRGSLLSEADVRALGGLACDLPGRFVAESAEVLLPRLVSCPGPLDQD 218
    ||||||||||||||||||||||||||||||||||||||||||||||||||
167 CWGVRGSLLSEADVRALGGLACDLPGRFVAESAEVLLPRLVSCPGPLDQD 216

219 QQEAARAALQGGGPPYGPPSTWSVSTMDALRGLLPVLGQPIIRSIPQGIV 268
    ||||||||||||||||||||||||||||||||||||||||||||||||||
217 QQEAARAALQGGGPPYGPPSTWSVSTMDALRGLLPVLGQPIIRSIPQGIV 266

269 AAWRQRSSRDPSWRQPERTILRPRFRREVEKTACPSGKKAREIDESLIFY 318
    ||||||||||||||||||||||||||||||||||||||||||||||||||
267 AAWRQRSSRDPSWRQPERTILRPRFRREVEKTACPSGKKAREIDESLIFY 316

319 KKWELEACVDAALLATQMDRVNAIPFTYEQLDVLKHKLDELYPQGYPESV 368
    ||||||||||||||||||||||||||||||||||||||||||||||||||
317 KKWELEACVDAALLATQMDRVNAIPFTYEQLDVLKHKLDELYPQGYPESV 366

369 IQHLGYLFLKMSPEDIRKWNVTSLETLKALLEVNKGHEMS........PQ 410
    |||||||||||||||||||||||||||||||||:||||||        ||
367 IQHLGYLFLKMSPEDIRKWNVTSLETLKALLEVDKGHEMSPQAPRRPLPQ 416

411 VATLIDRFVKGRGQLDKDTLDTLTAFYPGYLCSLSPEELSSVPPSSIW 458
    ||||||||||||||||||||||||||||||||||||||||||||||||
417 VATLIDRFVKGRGQLDKDTLDTLTAFYPGYLCSLSPEELSSVPPSSIW 464
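By way of non-limiting illustration, the length and identity figures reported for the alignments above can be approximately reproduced from a pair of aligned strings. The following is a minimal sketch only, assuming that "Matching length" counts aligned columns without a gap, that "Total length" counts all aligned columns, and that gaps are written as "." or "-". These definitions are inferred from the reported numbers and are not stated in the text. The function name alignment_stats is hypothetical. Percent similarity is omitted because the scoring matrix used is not given.

```python
def alignment_stats(query: str, subject: str, gap_chars: str = ".-") -> dict:
    """Summarise a pairwise alignment given as two equal-length aligned strings.

    Assumed definitions (inferred, not stated in the source):
      matching length = columns in which neither sequence has a gap
      total length    = all aligned columns
    """
    if len(query) != len(subject):
        raise ValueError("aligned strings must have equal length")
    total = len(query)
    # Keep only the columns where both sequences carry a residue.
    ungapped = [(q, s) for q, s in zip(query, subject)
                if q not in gap_chars and s not in gap_chars]
    matching = len(ungapped)
    if matching == 0:
        raise ValueError("alignment has no ungapped columns")
    identical = sum(1 for q, s in ungapped if q == s)
    return {
        "matching_length": matching,
        "total_length": total,
        "matching_percent_identity": round(100.0 * identical / matching, 2),
        "total_percent_identity": round(100.0 * identical / total, 2),
    }


# First row of the HSU40434_PEA_1_P12 x Q9BTR2 alignment (one gap in the subject).
row = alignment_stats(
    "MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQEAAPLDG",
    "MALPTARPLLGSCGTPALGSLLFLLFSLGWVQPSRTLAGETGQ.AAPLDG",
)
print(row)
```

Applied to a complete alignment rather than a single row, the same arithmetic is consistent with the reported values, for example 457 identical residues over a total length of 458 giving a total percent identity of 99.78.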
DESCRIPTION FOR CLUSTER M77904
Cluster M77904 features 4 transcript(s) and 21 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
Cluster M77904 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 24 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million). Overall, the following results were obtained as shown with regard to the histograms in Figure 24 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: a mixture of malignant tumors from different tissues.
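By way of non-limiting illustration, the "parts per million" ratio described above can be sketched as follows. This is a simplified sketch under stated assumptions: the function name parts_per_million is hypothetical, the counts in the example are invented for illustration only, and the weighting applied to the EST counts (described elsewhere in the application) is not reproduced here.

```python
def parts_per_million(cluster_ests: float, all_ests: float) -> float:
    """Expression of a cluster in one tissue category, in parts per million.

    cluster_ests: (weighted) EST count for the cluster in the category
    all_ests:     (weighted) EST count for all clusters in the same category
    """
    if all_ests <= 0:
        raise ValueError("all_ests must be positive")
    return 1_000_000 * cluster_ests / all_ests


# Hypothetical counts, not taken from the application.
example = parts_per_million(cluster_ests=25, all_ests=500_000)
print(example)
```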
Table 4 - Normal tissue distribution
Table 5 - P values and ratios for expression in cancerous tissue
above. A description of each variant protein according to the present invention is now provided.
Variant protein M77904_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M77904_T3. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between M77904_P2 and Q8WU91 (SEQ ID NO:987): 1. An isolated chimeric polypeptide encoding for M77904_P2, comprising a first amino acid sequence being at least 90% homologous to
MLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNRTFIWD VKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQ EGVKMALHLPWFHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDEL MTWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGNMAG NFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNES corresponding to amino acids 67 - 341 of Q8WU91, which also corresponds to amino acids 1 - 275 of M77904_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NKIYVVDLSNERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCD DLTRLWMNVEKTISCTDHRYCQRKSYSLQVPSDILHLPVELHDFSWKLLVPKDRLSLVL VPAQKLQQHTHEKPCNTSFSYLVASAIPSQDLYFGSFCPGGSIKQIQVKQNISVTLRTFAP SFQQEASRQGLTVSFIPYFKEEGVFTVTPDTKSKVYLRTPNWDRGLPSLTSVSWNISVPR DQVACLTFFKERSGVVCQTGRAFMIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISN CSPTSGKQLDLLFSVTLTPRTVDLTVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGP AVGIYNGNINTEMPRQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVD TYRPFQGTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEPYTFSHPNNGDVSSKDT DIPLLNTQEPMEPAE corresponding to amino acids 276 - 770 of M77904_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2.
An isolated polypeptide encoding for a tail of M77904_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NKIYVVDLSNERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCD DLTRLWMNVEKTISCTDHRYCQRKSYSLQVPSDILHLPVELHDFSWKLLVPKDRLSLVL VPAQKLQQHTHEKPCNTSFSYLVASAIPSQDLYFGSFCPGGSIKQIQVKQNISVTLRTFAP SFQQEASRQGLTVSFIPYFKEEGVFTVTPDTKSKVYLRTPNWDRGLPSLTSVSWNISVPR DQVACLTFFKERSGVVCQTGRAFMIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISN CSPTSGKQLDLLFSVTLTPRTVDLTVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGP AVGIYNGNINTEMPRQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVD TYRPFQGTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEPYTFSHPNNGDVSSKDT DIPLLNTQEPMEPAE in M77904_P2. Comparison report between M77904_P2 and Q96QU7 (SEQ ID NO:988): 1. An isolated chimeric polypeptide encoding for M77904_P2, comprising a first amino acid sequence being at least 90% homologous to MLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNRTFIWD VKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQ EGVKMALHLPWFHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDEL MTWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGNMAG NFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLSNERAMSLTIEPRPVKQ SRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTISCTDHRYCQR KSYSLQVPSDILHLPVELHDFSWKLLVPKDRLSLVLVPAQKLQQHTHEKPCNTSFSYLV ASAIPSQDLYFGSFCPGGSIKQIQVKQNISVTLRTFAPSFQQEASRQGLTVSFIPYFKEEGV FTVTPDTKSKVYLRTPNWDRGLPSLTSVSWNISVPRDQVACLTFFKERSGVVCQTGRAF MIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLFSVTLTPRTVDL TVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVGIYNGNINTEMPRQPKKFQKG RKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVDTYRPFQGTMGVCPPSPPTICSRAP TAKLATEEPPPRSPPESESEPYTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE corresponding to amino acids 67 - 836 of Q96QU7, which also corresponds to amino acids 1 - 770 of M77904_P2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein M77904_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M77904_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Amino acid mutations
Variant protein M77904_P2 is encoded by the following transcript(s): M77904_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M77904_T3 is shown in bold; this coding portion starts at position 238 and ends at position 2547. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M77904_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
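By way of non-limiting illustration, the stated coding portion can be checked for consistency against the length of the variant protein. The following is a minimal sketch, assuming 1-based inclusive nucleotide positions and assuming that the stated coding portion does not include a stop codon; both assumptions are inferred from the numbers and are not stated in the text. The function name coding_length_matches is hypothetical.

```python
def coding_length_matches(cds_start: int, cds_end: int, protein_length: int,
                          includes_stop: bool = False) -> bool:
    """Check that a coding portion encodes a protein of the given length.

    Positions are assumed to be 1-based and inclusive.
    """
    n_nucleotides = cds_end - cds_start + 1
    if n_nucleotides <= 0 or n_nucleotides % 3 != 0:
        return False  # not a whole number of codons
    codons = n_nucleotides // 3
    if includes_stop:
        codons -= 1  # the stop codon does not encode an amino acid
    return codons == protein_length


# M77904_T3: coding portion 238..2547; M77904_P2 spans amino acids 1 - 770.
ok = coding_length_matches(238, 2547, 770)
print(ok)
```

Here 2547 - 238 + 1 = 2310 nucleotides, that is 770 codons, which agrees with the 770 amino acids given for M77904_P2.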
Variant protein M77904_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M77904_T8. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between M77904_P4 and Q8WU91: 1. An isolated chimeric polypeptide encoding for M77904_P4, comprising a first amino acid sequence being at least 90% homologous to
MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVIS KRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNR TFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSR IKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFP EDELMTWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGN MAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNES corresponding to amino acids 1 - 341 of Q8WU91, which also corresponds to amino acids 1 - 341 of M77904_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
NKIYVVDLSNERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCD DLTRLWMNVEKTISTPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLT PVIPALWEAKAGGSLEVRSSRPAWPTW corresponding to amino acids 342 - 487 of M77904_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of M77904_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
NKIYVVDLSNERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCD DLTRLWMNVEKTISTPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLT PVIPALWEAKAGGSLEVRSSRPAWPTW in M77904_P4. Comparison report between M77904_P4 and Q9H5V8 (SEQ ID NO:989): 1. An isolated chimeric polypeptide encoding for M77904_P4, comprising a first amino acid sequence being at least 90% homologous to
MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVIS KRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNR TFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSR IKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFP EDELMTWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGN MAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLSNERAMSLTIEPRP VKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTIS corresponding to amino acids 1 - 416 of Q9H5V8, which also corresponds to amino acids 1 - 416 of M77904_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
TPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGS LEVRSSRPAWPTW corresponding to amino acids 417 - 487 of M77904_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of M77904_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGS LEVRSSRPAWPTW in M77904_P4. Comparison report between M77904_P4 and Q96QU7: 1. An isolated chimeric polypeptide encoding for M77904_P4, comprising a first amino acid sequence being at least 90% homologous to
MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVIS KRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNR TFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSR IKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFP EDELMTWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGN MAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLSNERAMSLTIEPRP VKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTIS corresponding to amino acids 1 - 416 of Q96QU7, which also corresponds to amino acids 1 - 416 of M77904_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGS LEVRSSRPAWPTW corresponding to amino acids 417 - 487 of M77904_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of M77904_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGS LEVRSSRPAWPTW in M77904_P4.
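By way of non-limiting illustration, the amino acid coordinates of a tail in the comparison reports above follow from the end of the head portion and the length of the tail sequence. The following is a minimal sketch, assuming 1-based inclusive residue numbering as used in the reports; the function name tail_coordinates is hypothetical.

```python
from __future__ import annotations


def tail_coordinates(head_end: int, tail: str) -> tuple[int, int]:
    """Return the 1-based inclusive (start, end) of a tail that directly
    follows a head portion ending at residue head_end."""
    if head_end < 0 or not tail:
        raise ValueError("head_end must be non-negative and tail non-empty")
    return head_end + 1, head_end + len(tail)


# Tail of M77904_P4 relative to Q9H5V8 and Q96QU7 (head = amino acids 1 - 416).
TAIL_P4 = (
    "TPLNQCICPWPWIALLSPPCLSGVPWVGCKSYQKGPSGRARWLTPVIPALWEAKAGGS"
    "LEVRSSRPAWPTW"
)
coords = tail_coordinates(416, TAIL_P4)
print(coords)
```

For this tail of 71 residues the sketch yields amino acids 417 - 487, in agreement with the coordinates stated in the comparison reports.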
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein M77904_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M77904_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
Variant protein M77904_P4 is encoded by the following transcript(s): M77904_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M77904_T8 is shown in bold; this coding portion starts at position 137 and ends at position 1597. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M77904_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
Variant protein M77904_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M77904_T9. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between M77904_P5 and Q96QU7: 1. An isolated chimeric polypeptide encoding for M77904_P5, comprising a first amino acid sequence being at least 90% homologous to MIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLFSVTLTPRTVDL TVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVGIYNGNINTEMPRQPKKFQKG RKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVDTYRPFQGTMGVCPPSPPTICSRAP TAKLATEEPPPRSPPESESEPYTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE corresponding to amino acids 606 - 836 of Q96QU7, which also corresponds to amino acids 1 - 231 of M77904_P5. Comparison report between M77904_P5 and Q9H8C2 (SEQ ID NO:990): 1. An isolated chimeric polypeptide encoding for M77904_P5, comprising a first amino acid sequence being at least 90% homologous to MIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLFSVTLTPRTVDL TVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVGIYNGNINTEMPRQPKKFQKG RKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVDTYRPFQGTMGVCPPSPPTICSRAP TAKLATEEPPPRSPPESESEPYTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE corresponding to amino acids 419 - 649 of Q9H8C2, which also corresponds to amino acids 1 - 231 of M77904_P5.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein M77904_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M77904_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Amino acid mutations
Variant protein M77904_P5 is encoded by the following transcript(s): M77904_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M77904_T9 is shown in bold; this coding portion starts at position 1226 and ends at position 1918. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M77904_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Nucleic acid SNPs
Variant protein M77904_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M77904_T11. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between M77904_P7 and Q8WU91: 1. An isolated chimeric polypeptide encoding for M77904_P7, comprising a first amino acid sequence being at least 90% homologous to
MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVIS KRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNR TFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSR IKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKR corresponding to amino acids 1 - 219 of Q8WU91, which also corresponds to amino acids 1 - 219 of M77904_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EKAPPCYLIRLKHTRSSLF corresponding to amino acids 220 - 238 of M77904_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of M77904_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EKAPPCYLIRLKHTRSSLF in M77904_P7. Comparison report between M77904_P7 and Q9H5V8: 1. An isolated chimeric polypeptide encoding for M77904_P7, comprising a first amino acid sequence being at least 90% homologous to
MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVIS KRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNR TFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSR IKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKR corresponding to amino acids 1 - 219 of Q9H5V8, which also corresponds to amino acids 1 - 219 of M77904_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EKAPPCYLIRLKHTRSSLF corresponding to amino acids 220 - 238 of M77904_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of M77904_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EKAPPCYLIRLKHTRSSLF in M77904_P7. Comparison report between M77904_P7 and Q96QU7: 1. An isolated chimeric polypeptide encoding for M77904_P7, comprising a first amino acid sequence being at least 90% homologous to
MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPTLLAKPCYIVIS KRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTSLLPTLNR TFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSR IKMQEGVKMALHLPWFHPRNVSGFSIANRSSIKR corresponding to amino acids 1 - 219 of Q96QU7, which also corresponds to amino acids 1 - 219 of M77904_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EKAPPCYLIRLKHTRSSLF corresponding to amino acids 220 - 238 of M77904_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of M77904_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EKAPPCYLIRLKHTRSSLF in M77904_P7.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
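By way of non-limiting illustration, the localization reasoning given for the variant proteins of this cluster can be sketched as a simple decision rule. This is a sketch only, assuming each prediction program is reduced to a yes/no call. The function name predict_localization and the "undetermined" outcome are hypothetical: the text describes only the two unanimous cases and does not state how disagreements between programs are resolved.

```python
from typing import Sequence


def predict_localization(signal_peptide_calls: Sequence[bool],
                         transmembrane_calls: Sequence[bool]) -> str:
    """Combine yes/no calls from signal-peptide and trans-membrane predictors.

    Each sequence holds one boolean per prediction program.
    """
    if not signal_peptide_calls or not transmembrane_calls:
        raise ValueError("at least one call of each kind is required")
    # Membrane: every trans-membrane program predicts a region and
    # no signal-peptide program predicts secretion.
    if all(transmembrane_calls) and not any(signal_peptide_calls):
        return "membrane"
    # Secreted: every signal-peptide program predicts a signal peptide and
    # no trans-membrane program predicts a region.
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    # Mixed results are not covered by the text; reported as undetermined here.
    return "undetermined"


# Calls as described for M77904_P7 (two programs of each kind).
p7 = predict_localization([True, True], [False, False])
print(p7)
```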
Variant protein M77904_P7 is encoded by the following transcript(s): M77904_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M77904_T11 is shown in bold; this coding portion starts at position 137 and ends at position 850. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M77904_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Nucleic acid SNPs
above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided. Segment cluster M77904_node_0 according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T11 and M77904_T8. Table 13 below describes the starting and ending position of this segment on each transcript. Table 13 - Segment location on transcripts
Segment cluster M77904_node_11 according to the present invention is supported by 37 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3 and M77904_T8. Table 14 below describes the starting and ending position of this segment on each transcript. Table 14 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
M77904_T3    1064    1285
M77904_T8    1161    1382
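The positions listed above for segment M77904_node_11 can be cross-checked with a few lines of Python: the same segment should have the same length on every transcript that contains it. The dictionary layout is an illustrative choice, and positions are assumed to be 1-based and inclusive.

```python
# Start and end positions of segment M77904_node_11 on each transcript,
# copied from the rows above.
segment_locations = {
    "M77904_T3": (1064, 1285),
    "M77904_T8": (1161, 1382),
}

# Inclusive length of the segment on each transcript.
segment_lengths = {
    name: end - start + 1 for name, (start, end) in segment_locations.items()
}

# Offset between the two transcripts' coordinate systems at this segment.
offset = segment_locations["M77904_T8"][0] - segment_locations["M77904_T3"][0]

print(segment_lengths, offset)
```

Both rows give a length of 222 nucleotides, and the segment sits 97 positions further along on M77904_T8 than on M77904_T3.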
Segment cluster M77904_node_12 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T8. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Segment cluster M77904_node_14 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T9. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Segment cluster M77904_node_15 according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3 and M77904_T9. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster M77904_node_17 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3 and M77904_T9. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Segment cluster M77904_node_2 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster M77904_node_21 according to the present invention is supported by 54 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3 and M77904_T9. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster M77904_node_23 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3 and M77904_T9. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Segment cluster M77904_node_24 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3 and M77904_T9. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Segment cluster M77904_node_27 according to the present invention is supported by 81 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3 and M77904_T9. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Segment cluster M77904_node_28 according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3 and M77904_T9. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Segment cluster M77904_node_4 according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T11, M77904_T3 and M77904_T8. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster M77904_node_6 according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T11, M77904_T3 and M77904_T8. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster M77904_node_7 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T11. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster M77904_node_8 according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T11, M77904_T3 and M77904_T8. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster M77904_node_9 according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T11. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description. Segment cluster M77904_node_19 according to the present invention is supported by 42 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3 and M77904_T9. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Segment cluster M77904_node_22 according to the present invention can be found in the following transcript(s): M77904_T3 and M77904_T9. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Segment cluster M77904_node_25 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3 and M77904_T9. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Segment cluster M77904_node_26 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77904_T3 and M77904_T9. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Microarray (chip) data is also available for this gene as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotide was found to hit this segment (with regard to ovarian cancer), shown in Table 33. Table 33 - Oligonucleotide related to this gene
Variant protein alignment to the previously known protein
Sequence name: /tmp/c2Fe8npYgJ/QPDZHH46Xl:Q8WU91
Sequence documentation:
Alignment of: M77904_P2 x Q8WU91
Alignment segment 1/1:
Quality: 2730.00
Escore: 0
Matching length: 275
Total length: 275
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 67 MLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTS 116

 51 LLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGR 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
117 LLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGR 166

101 IDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
167 IDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSS 216

151 IKRLCIIESVFEGEGSATLMSANYPEGFPEDELMTWQFVVPAHLRASVSF 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
217 IKRLCIIESVFEGEGSATLMSANYPEGFPEDELMTWQFVVPAHLRASVSF 266

201 LNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGNMAGNFNLSLQGC 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
267 LNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGNMAGNFNLSLQGC 316

251 DQDAQSPGILRLQFQVLVQHPQNES 275
    |||||||||||||||||||||||||
317 DQDAQSPGILRLQFQVLVQHPQNES 341
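Because this alignment reports zero gaps, positions in the variant and in the known protein differ by a constant offset. The sketch below illustrates that mapping for the M77904_P2 x Q8WU91 alignment above, using the coordinates shown in it (variant 1 - 275 against known protein 67 - 341). The function name and its default arguments are illustrative assumptions.

```python
def subject_position(query_pos: int,
                     query_start: int = 1,
                     subject_start: int = 67,
                     length: int = 275) -> int:
    """Map a variant (query) position to the known-protein (subject) position.

    Valid only for an ungapped alignment segment, where the offset between
    the two coordinate systems is constant.
    """
    if not query_start <= query_pos < query_start + length:
        raise ValueError("position lies outside the aligned segment")
    return subject_start + (query_pos - query_start)


# First and last aligned residues of the variant.
print(subject_position(1), subject_position(275))
```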
Sequence name: /tmp/c2Fe8npYgJ/QPDZHH46X1:Q96QU7
Sequence documentation:
Alignment of: M77904_P2 x Q96QU7
Alignment segment 1/1:
Quality: 7633.00
Escore: 0
Matching length: 770
Total length: 770
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 67 MLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCMSGPCPFGEVQLQPSTS 116

 51 LLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGR 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
117 LLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIGPGESCPDGVTHSISGR 166

101 IDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
167 IDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPWFHPRNVSGFSIANRSS 216

151 IKRLCIIESVFEGEGSATLMSANYPEGFPEDELMTWQFVVPAHLRASVSF 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
217 IKRLCIIESVFEGEGSATLMSANYPEGFPEDELMTWQFVVPAHLRASVSF 266

201 LNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGNMAGNFNLSLQGC 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
267 LNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDKQPGNMAGNFNLSLQGC 316

251 DQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLSNERAMSLTIEPRPVKQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
317 DQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLSNERAMSLTIEPRPVKQ 366

301 SRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTIS 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
367 SRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFLCDDLTRLWMNVEKTIS 416

351 CTDHRYCQRKSYSLQVPSDILHLPVELHDFSWKLLVPKDRLSLVLVPAQK 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
417 CTDHRYCQRKSYSLQVPSDILHLPVELHDFSWKLLVPKDRLSLVLVPAQK 466

401 LQQHTHEKPCNTSFSYLVASAIPSQDLYFGSFCPGGSIKQIQVKQNISVT 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
467 LQQHTHEKPCNTSFSYLVASAIPSQDLYFGSFCPGGSIKQIQVKQNISVT 516

451 LRTFAPSFQQEASRQGLTVSFIPYFKEEGVFTVTPDTKSKVYLRTPNWDR 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
517 LRTFAPSFQQEASRQGLTVSFIPYFKEEGVFTVTPDTKSKVYLRTPNWDR 566

501 GLPSLTSVSWNISVPRDQVACLTFFKERSGVVCQTGRAFMIIQEQRTRAE 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
567 GLPSLTSVSWNISVPRDQVACLTFFKERSGVVCQTGRAFMIIQEQRTRAE 616

551 EIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLFSVTLTPRTVDL 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
617 EIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLFSVTLTPRTVDL 666

601 TVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVGIYNGNINTEMP 650
    ||||||||||||||||||||||||||||||||||||||||||||||||||
667 TVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVGIYNGNINTEMP 716

651 RQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVDTYRPFQ 700
    ||||||||||||||||||||||||||||||||||||||||||||||||||
717 RQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFLQPEVDTYRPFQ 766

701 GTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEPYTFSHPNNGDV 750
    ||||||||||||||||||||||||||||||||||||||||||||||||||
767 GTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEPYTFSHPNNGDV 816

751 SSKDTDIPLLNTQEPMEPAE 770
    ||||||||||||||||||||
817 SSKDTDIPLLNTQEPMEPAE 836

Sequence name: /tmp/4AUsKD5TnV/TBRg9DoebW:Q8WU91
Sequence documentation:
Alignment of: M77904_P4 x Q8WU91
Alignment segment 1/1:
Quality: 3341.00
Escore: 0
Matching length: 341
Total length: 341
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50

 51 LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100

101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150

151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200

201 FHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDELM 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 FHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDELM 250

251 TWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDK 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDK 300

301 QPGNMAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNES 341
    |||||||||||||||||||||||||||||||||||||||||
301 QPGNMAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNES 341
Sequence name: /tmp/4AUsKD5TnV/TBRg9DoebW:Q9H5V8
Sequence documentation:
Alignment of: M77904_P4 x Q9H5V8
Alignment segment 1/1:
Quality: 4081.00
Escore: 0
Matching length: 416
Total length: 416
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50

 51 LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100

101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150

151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200

201 FHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDELM 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 FHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDELM 250

251 TWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDK 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDK 300

301 QPGNMAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLS 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 QPGNMAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLS 350

351 NERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFL 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 NERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFL 400

401 CDDLTRLWMNVEKTIS 416
    ||||||||||||||||
401 CDDLTRLWMNVEKTIS 416
Sequence name: /tmp/4AUsKD5TnV/TBRg9DoebW:Q96QU7
Sequence documentation:
Alignment of: M77904_P4 x Q96QU7
Alignment segment 1/1:
Quality: 4081.00
Escore: 0
Matching length: 416
Total length: 416
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50

 51 LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100

101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150

151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200

201 FHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDELM 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 FHPRNVSGFSIANRSSIKRLCIIESVFEGEGSATLMSANYPEGFPEDELM 250

251 TWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDK 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TWQFVVPAHLRASVSFLNFNLSNCERKEERVEYYIPGSTTNPEVFKLEDK 300

301 QPGNMAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLS 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 QPGNMAGNFNLSLQGCDQDAQSPGILRLQFQVLVQHPQNESNKIYVVDLS 350

351 NERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFL 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 NERAMSLTIEPRPVKQSRKFVPGCFVCLESRTCSSNLTLTSGSKHKISFL 400

401 CDDLTRLWMNVEKTIS 416
    ||||||||||||||||
401 CDDLTRLWMNVEKTIS 416
Sequence name: /tmp/IChL9nLIus/pmgyBTHuqO:Q96QU7
Sequence documentation:
Alignment of: M77904_P5 x Q96QU7
Alignment segment 1/1:
Quality: 2285.00
Escore: 0
Matching length: 231
Total length: 231
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLF 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
606 MIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLF 655

 51 SVTLTPRTVDLTVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVG 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
656 SVTLTPRTVDLTVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVG 705

101 IYNGNINTEMPRQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFL 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
706 IYNGNINTEMPRQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFL 755

151 QPEVDTYRPFQGTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
756 QPEVDTYRPFQGTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEP 805

201 YTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE 231
    |||||||||||||||||||||||||||||||
806 YTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE 836
Sequence name: /tmp/IChL9nLIus/pmgyBTHuqO:Q9H8C2
Sequence documentation:
Alignment of: M77904_P5 x Q9H8C2
Alignment segment 1/1:
Quality: 2285.00
Escore: 0
Matching length: 231
Total length: 231
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLF 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
419 MIIQEQRTRAEEIFSLDEDVLPKPSFHHHSFWVNISNCSPTSGKQLDLLF 468

 51 SVTLTPRTVDLTVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVG 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
469 SVTLTPRTVDLTVILIAAVGGGVLLLSALGLIICCVKKKKKKTNKGPAVG 518

101 IYNGNINTEMPRQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFL 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
519 IYNGNINTEMPRQPKKFQKGRKDNDSHVYAVIEDTMVYGHLLQDSSGSFL 568

151 QPEVDTYRPFQGTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
569 QPEVDTYRPFQGTMGVCPPSPPTICSRAPTAKLATEEPPPRSPPESESEP 618

201 YTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE 231
    |||||||||||||||||||||||||||||||
619 YTFSHPNNGDVSSKDTDIPLLNTQEPMEPAE 649
Sequence name: /tmp/sQqi6hWOGJ/KjbKmDd57:Q8WU91
Sequence documentation:
Alignment of: M77904_P7 x Q8WU91
Alignment segment 1/1:
Quality: 2124.00
Escore: 0
Matching length: 219
Total length: 219
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50

 51 LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100

101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150

151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200

201 FHPRNVSGFSIANRSSIKR 219
    |||||||||||||||||||
201 FHPRNVSGFSIANRSSIKR 219
Sequence name: /tmp/sQqi6hWOGJ/KjbKmDd574:Q9H5V8
Sequence documentation:
Alignment of: M77904_P7 x Q9H5V8
Alignment segment 1/1:
Quality: 2124.00
Escore: 0
Matching length: 219
Total length: 219
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50

 51 LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100

101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150

151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200

201 FHPRNVSGFSIANRSSIKR 219
    |||||||||||||||||||
201 FHPRNVSGFSIANRSSIKR 219
Sequence name: /tmp/sQqi6hWOGJ/KjbKmDd574:Q96QU7
Sequence documentation:
Alignment of: M77904_P7 x Q96QU7
Alignment segment 1/1:
Quality: 2124.00
Escore: 0
Matching length: 219
Total length: 219
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAGLNCGVSIALLGVLLLGAARLPRGAEAFEIALPRESNITVLIKLGTPT 50

 51 LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LLAKPCYIVISKRHITMLSIKSGERIVFTFSCQSPENHFVIEIQKNIDCM 100

101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 SGPCPFGEVQLQPSTSLLPTLNRTFIWDVKAHKSIGLELQFSIPRLRQIG 150

151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PGESCPDGVTHSISGRIDATVVRIGTFCSNGTVSRIKMQEGVKMALHLPW 200

201 FHPRNVSGFSIANRSSIKR 219
    |||||||||||||||||||
201 FHPRNVSGFSIANRSSIKR 219
DESCRIPTION FOR CLUSTER Z25299
Cluster Z25299 features 5 transcript(s) and 11 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively, the sequences themselves are given at the end of the application. The selected protein variants are given in table 3. Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Antileukoproteinase 1 precursor (SwissProt accession identifier ALK1_HUMAN; known also according to the synonyms ALP; HUSI-1; Seminal proteinase inhibitor; Secretory leukocyte protease inhibitor; BLPI; Mucus proteinase inhibitor; MPI; WAP four-disulfide core domain protein 4; Protease inhibitor WAP4), SEQ ID NO: 272, referred to herein as the previously known protein. Protein Antileukoproteinase 1 precursor is known or believed to have the following function(s): Acid-stable proteinase inhibitor with strong affinities for trypsin, chymotrypsin, elastase, and cathepsin G. May prevent elastase-mediated damage to oral and possibly other mucosal tissues. The sequence for protein Antileukoproteinase 1 precursor is given at the end of the application, as "Antileukoproteinase 1 precursor amino acid sequence". Protein Antileukoproteinase 1 precursor localization is believed to be Secreted.
It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Elastase inhibitor; Tryptase inhibitor. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anti-inflammatory; Antiasthma. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: proteinase inhibitor; serine protease inhibitor, which are annotation(s) related to Molecular Function. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>. Cluster Z25299 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 25 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
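The "parts per million" measure defined above is a ratio scaled by one million. The following sketch shows the arithmetic under the assumption that plain EST counts are used for both the cluster and the category; any additional weighting applied in the application is not reproduced here. The function name and the example counts are hypothetical.

```python
def expression_ppm(cluster_est_count: int, category_est_count: int) -> float:
    """ESTs of one cluster per million ESTs in the same tissue category.

    Assumption: simple counts, with no further weighting.
    """
    if category_est_count <= 0:
        raise ValueError("category must contain at least one EST")
    if cluster_est_count < 0:
        raise ValueError("cluster EST count cannot be negative")
    return 1_000_000 * cluster_est_count / category_est_count


# Made-up example: 25 cluster ESTs among 500,000 ESTs in a category.
print(expression_ppm(25, 500_000))
```

Expressing counts per million makes categories with very different library sizes comparable on one axis.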
Overall, the following results were obtained as shown with regard to the histograms in Figure 25 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: brain malignant tumors, a mixture of malignant tumors from different tissues and ovarian carcinoma.
Table 4 - Normal tissue distribution
Table 5 - P values and ratios for expression in cancerous tissue
As noted above, cluster Z25299 features 5 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Antileukoproteinase 1 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein Z25299_PEA_2_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z25299_PEA_2_T1. An alignment is given to the known protein (Antileukoproteinase 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between Z25299_PEA_2_P2 and ALK1_HUMAN:
1. An isolated chimeric polypeptide encoding for Z25299_PEA_2_P2, comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLNPPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK corresponding to amino acids 1 - 131 of ALK1_HUMAN, which also corresponds to amino acids 1 - 131 of Z25299_PEA_2_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GKQGMRAH corresponding to amino acids 132 - 139 of Z25299_PEA_2_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of Z25299_PEA_2_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKQGMRAH in Z25299_PEA_2_P2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein Z25299_PEA_2_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Amino acid mutations
Variant protein Z25299_PEA_2_P2 is encoded by the following transcript(s): Z25299_PEA_2_T1, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z25299_PEA_2_T1 is shown in bold; this coding portion starts at position 124 and ends at position 540. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Nucleic acid SNPs
Variant protein Z25299_PEA_2_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z25299_PEA_2_T2. An alignment is given to the known protein (Antileukoproteinase 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z25299_PEA_2_P3 and ALK1_HUMAN:
1. An isolated chimeric polypeptide encoding for Z25299_PEA_2_P3, comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLNPPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK corresponding to amino acids 1 - 131 of ALK1_HUMAN, which also corresponds to amino acids 1 - 131 of Z25299_PEA_2_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEKRHHKQLRDQEVDPLEMRRHSAG corresponding to amino acids 132 - 156 of Z25299_PEA_2_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of Z25299_PEA_2_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEKRHHKQLRDQEVDPLEMRRHSAG in Z25299_PEA_2_P3.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein Z25299_PEA_2_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Amino acid mutations
Variant protein Z25299_PEA_2_P3 is encoded by the following transcript(s): Z25299_PEA_2_T2, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z25299_PEA_2_T2 is shown in bold; this coding portion starts at position 124 and ends at position 591. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Nucleic acid SNPs
Variant protein Z25299_PEA_2_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z25299_PEA_2_T6. An alignment is given to the known protein (Antileukoproteinase 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z25299_PEA_2_P7 and ALK1_HUMAN:
1. An isolated chimeric polypeptide encoding for Z25299_PEA_2_P7, comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCPGKKRCCPDTCGIKCLDPVDTPNP corresponding to amino acids 1 - 81 of ALK1_HUMAN, which also corresponds to amino acids 1 - 81 of Z25299_PEA_2_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RGSLGSAQ corresponding to amino acids 82 - 89 of Z25299_PEA_2_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of Z25299_PEA_2_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RGSLGSAQ in Z25299_PEA_2_P7.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein Z25299_PEA_2_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Amino acid mutations
Variant protein Z25299_PEA_2_P7 is encoded by the following transcript(s): Z25299_PEA_2_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z25299_PEA_2_T6 is shown in bold; this coding portion starts at position 124 and ends at position 390. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Nucleic acid SNPs
Variant protein Z25299_PEA_2_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z25299_PEA_2_T9. An alignment is given to the known protein (Antileukoproteinase 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z25299_PEA_2_P10 and ALK1_HUMAN:
1. An isolated chimeric polypeptide encoding for Z25299_PEA_2_P10, comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPT corresponding to amino acids 1 - 82 of ALK1_HUMAN, which also corresponds to amino acids 1 - 82 of Z25299_PEA_2_P10.
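The coding-portion coordinates stated in this section for the four transcripts can be cross-checked against the variant protein lengths given in the comparison reports. The sketch below assumes 1-based, inclusive nucleotide positions (the text does not say so explicitly); the dictionary layout and function name are illustrative only.

```python
# (start, end, protein length) as stated in this section for each transcript.
# Assumption: positions are 1-based and inclusive.
CODING_PORTIONS = {
    "Z25299_PEA_2_T1": (124, 540, 139),  # encodes Z25299_PEA_2_P2
    "Z25299_PEA_2_T2": (124, 591, 156),  # encodes Z25299_PEA_2_P3
    "Z25299_PEA_2_T6": (124, 390, 89),   # encodes Z25299_PEA_2_P7
    "Z25299_PEA_2_T9": (124, 369, 82),   # encodes Z25299_PEA_2_P10
}


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in an inclusive nucleotide range."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a multiple of 3")
    return span // 3


if __name__ == "__main__":
    for transcript, (start, end, protein_len) in CODING_PORTIONS.items():
        codons = codon_count(start, end)
        print(transcript, codons, "codons;", protein_len, "residues")
```

For all four transcripts the codon count equals the protein length, which would suggest that the stated end positions do not include a stop codon, although the text does not state this.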
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein Z25299_PEA_2_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 12 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Amino acid mutations
Variant protein Z25299_PEA_2_P10 is encoded by the following transcript(s): Z25299_PEA_2_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z25299_PEA_2_T9 is shown in bold; this coding portion starts at position 124 and ends at position 369. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Nucleic acid SNPs
448    A -> C    Yes
As noted above, cluster Z25299 features 11 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster Z25299_PEA_2_node_20 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1. Table 14 below describes the starting and ending position of this segment on each transcript. Table 14 - Segment location on transcripts
Segment cluster Z25299_PEA_2_node_21 according to the present invention is supported by 162 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T6 and Z25299_PEA_2_T9. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Segment cluster Z25299_PEA_2_node_23 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T2. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Segment cluster Z25299_PEA_2_node_24 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T2 and Z25299_PEA_2_T3. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster Z25299_PEA_2_node_8 according to the present invention is supported by 218 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T2, Z25299_PEA_2_T3, Z25299_PEA_2_T6 and Z25299_PEA_2_T9. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (with regard to ovarian cancer), shown in Table 19. Table 19 - Oligonucleotides related to this segment
Short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster Z25299_PEA_2_node_12 according to the present invention is supported by 228 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T2, Z25299_PEA_2_T3, Z25299_PEA_2_T6 and Z25299_PEA_2_T9. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to ovarian cancer), shown in Table 21. Table 21 - Oligonucleotides related to this segment
Oligonucleotide name    Overexpressed in cancers    Chip reference
Z25299_0_3_0            ovarian carcinoma           OVA
Segment cluster Z25299_PEA_2_node_13 according to the present invention is supported by 246 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T2, Z25299_PEA_2_T3, Z25299_PEA_2_T6 and Z25299_PEA_2_T9. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Segment cluster Z25299_PEA_2_node_14 according to the present invention can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T2, Z25299_PEA_2_T3, Z25299_PEA_2_T6 and Z25299_PEA_2_T9. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Transcript name     Segment starting position    Segment ending position
Z25299_PEA_2_T1     358                          367
Z25299_PEA_2_T2     358                          367
Z25299_PEA_2_T3     358                          367
Z25299_PEA_2_T6     358                          367
Segment cluster Z25299_PEA_2_node_17 according to the present invention can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T2 and Z25299_PEA_2_T3. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Segment cluster Z25299_PEA_2_node_18 according to the present invention is supported by 221 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T2, Z25299_PEA_2_T3 and Z25299_PEA_2_T6. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Transcript name     Segment starting position    Segment ending position
Z25299_PEA_2_T1     372                          427
Z25299_PEA_2_T2     372                          427
Z25299_PEA_2_T3     372                          427
Z25299_PEA_2_T6     368                          423
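The segment locations that survive in Tables 23 and 25 can be checked for internal consistency: a segment should have the same length on every transcript that contains it, and the short segments are described as being up to about 120 bp long. The following sketch assumes inclusive start and end positions; the data structure and function names are illustrative only.

```python
# Start/end positions as listed in Tables 23 and 25 of this section.
# Assumption: both positions are inclusive.
SEGMENT_POSITIONS = {
    "Z25299_PEA_2_node_14": {
        "Z25299_PEA_2_T1": (358, 367),
        "Z25299_PEA_2_T2": (358, 367),
        "Z25299_PEA_2_T3": (358, 367),
        "Z25299_PEA_2_T6": (358, 367),
    },
    "Z25299_PEA_2_node_18": {
        "Z25299_PEA_2_T1": (372, 427),
        "Z25299_PEA_2_T2": (372, 427),
        "Z25299_PEA_2_T3": (372, 427),
        "Z25299_PEA_2_T6": (368, 423),
    },
}

MAX_SHORT_SEGMENT_BP = 120  # "up to about 120 bp in length"


def segment_length(positions: dict) -> int:
    """Return the segment length, checking it is identical on every transcript."""
    lengths = {end - start + 1 for start, end in positions.values()}
    if len(lengths) != 1:
        raise ValueError(f"inconsistent segment lengths: {sorted(lengths)}")
    return lengths.pop()


SEGMENT_LENGTHS = {name: segment_length(pos) for name, pos in SEGMENT_POSITIONS.items()}

if __name__ == "__main__":
    for name, length in SEGMENT_LENGTHS.items():
        print(name, length, "bp", "short" if length <= MAX_SHORT_SEGMENT_BP else "long")
```

The 4-nucleotide offset of node_18 on transcript Z25299_PEA_2_T6 changes its coordinates but not its length, which is what one would expect if an upstream segment is shorter or absent in that transcript.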
Segment cluster Z25299_PEA_2_node_19 according to the present invention is supported by 197 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T2, Z25299_PEA_2_T3 and Z25299_PEA_2_T6. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: /tmp/oXgeQ4MeyL/K6VqblMQu2:ALK1_HUMAN
Sequence documentation:
Alignment of: Z25299_PEA_2_P2 x ALK1_HUMAN
Alignment segment 1/1:
Quality: 1371.00
Escore: 0
Matching length: 131
Total length: 131
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50

 51 CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLN 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLN 100

101 PPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK 131
    |||||||||||||||||||||||||||||||
101 PPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK 131
Sequence name: /tmp/rbf314VLIm/yR43i4SbP4:ALK1_HUMAN
Sequence documentation:
Alignment of: Z25299_PEA_2_P3 x ALK1_HUMAN
Alignment segment 1/1:
Quality: 1371.00
Escore: 0
Matching length: 131
Total length: 131
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50

 51 CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLN 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLN 100

101 PPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK 131
    |||||||||||||||||||||||||||||||
101 PPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK 131
Sequence name: /tmp/KCtSXACZXe/rK4T6LKeRX:ALK1_HUMAN
Sequence documentation:
Alignment of: Z25299_PEA_2_P7 x ALK1_HUMAN
Alignment segment 1/1:
Quality: 835.00
Escore: 0
Matching length: 81
Total length: 81
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50

 51 CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNP 81
    |||||||||||||||||||||||||||||||
 51 CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNP 81
Sequence name: /tmp/LcBlcAxB6c/NSI9pqfxoU:ALK1_HUMAN
Sequence documentation:
Alignment of: Z25299_PEA_2_P10 x ALK1_HUMAN
Alignment segment 1/1:
Quality: 844.00
Escore: 0
Matching length: 82
Total length: 82
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50

 51 CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPT 82
    ||||||||||||||||||||||||||||||||
 51 CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPT 82
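The summary statistics reported with each alignment (total length, percent identity, gaps) can be recomputed from a pair of aligned sequences. The sketch below is not the alignment program used in the application, whose scoring ("Quality", "Escore") is not reproduced here; `alignment_summary` is a hypothetical helper that assumes the two strings are already aligned, with `-` marking gap positions.

```python
def alignment_summary(query: str, subject: str, gap: str = "-") -> dict:
    """Summarize a pre-computed pairwise alignment of equal-length strings.

    Hypothetical helper: counts gap positions (not gap openings) and
    identical, non-gap positions over the full aligned length.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    total = len(query)
    gaps = sum(1 for q, s in zip(query, subject) if q == gap or s == gap)
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != gap)
    return {
        "total_length": total,
        "gaps": gaps,
        "percent_identity": round(100.0 * identical / total, 2),
    }


# Residues 1-82, as shown in the Z25299_PEA_2_P10 x ALK1_HUMAN alignment.
P10_ALIGNED = (
    "MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE"
    "CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPT"
)
ALK1_ALIGNED = P10_ALIGNED  # the aligned region of ALK1_HUMAN is identical

SUMMARY = alignment_summary(P10_ALIGNED, ALK1_ALIGNED)

if __name__ == "__main__":
    print(SUMMARY)
```

For the P10 alignment this gives a total length of 82, no gaps and 100% identity, in line with the values listed above.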
Expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor Z25299 transcripts which are detectable by amplicon as depicted in sequence name Z25299 junc13-14-21 in normal and cancerous ovary tissues
Expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by or according to junc13-14-21, Z25299 junc13-14-21 amplicon(s) and Z25299 junc13-14-21F and Z25299 junc13-14-21R primers was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel", above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 26 is a histogram showing over-expression of the above-indicated Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts in cancerous ovary samples relative to the normal samples. The number and percentage of samples that exhibit at least 5 fold over-expression, out of the total number of samples tested, is indicated at the bottom. As is evident from Figure 26, the expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel"). Notably an over-expression of at least 5 fold was found in 12 out of 42 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by the above amplicon(s) in ovary cancer samples versus the normal tissue samples was determined by T test as 3.0E-04. The above value demonstrates statistical significance of the results.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Z25299 junc13-14-21F forward primer; and Z25299 junc13-14-21R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Z25299 junc13-14-21.
Z25299 junc13-14-21 Forward primer (SEQ ID NO:991): ACCCCAAACCCAACTTGATTC
Z25299 junc13-14-21 Reverse primer (SEQ ID NO:992): TCAGTGGTGGAGCCAAGTCTC
Z25299 junc13-14-21 Amplicon (SEQ ID NO:993): ACCCCAAACCCAACTTGATTCCTGCCATATGGAGGAGGCTCTGGAGTCCTGCTCTGTGTGGTCCAGGTCCTTTCCACCCTGAGACTTGGCTCCACCACTGA
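The normalization described for this experiment (division by the geometric mean of the housekeeping-gene quantities, then division by the median of the normal post-mortem samples) can be sketched as follows. The sample names and quantities below are hypothetical placeholders, not data from the application, and the function names are illustrative only.

```python
from statistics import geometric_mean, median


def normalized_quantity(target: float, housekeeping: list) -> float:
    """Target amplicon quantity divided by the geometric mean of housekeeping genes."""
    return target / geometric_mean(housekeeping)


def fold_up_regulation(normalized: dict, normal_ids: list) -> dict:
    """Each normalized quantity divided by the median of the normal samples."""
    reference = median(normalized[s] for s in normal_ids)
    return {sample: qty / reference for sample, qty in normalized.items()}


# Hypothetical data: (target quantity, four housekeeping-gene quantities).
RAW = {
    "normal_1": (2.0, [1.0, 1.0, 1.0, 1.0]),
    "normal_2": (4.0, [1.0, 1.0, 1.0, 1.0]),
    "normal_3": (6.0, [1.0, 1.0, 1.0, 1.0]),
    "cancer_1": (80.0, [1.0, 4.0, 1.0, 4.0]),
    "cancer_2": (8.0, [1.0, 1.0, 1.0, 1.0]),
}
NORMAL_IDS = ["normal_1", "normal_2", "normal_3"]

NORMALIZED = {s: normalized_quantity(t, hk) for s, (t, hk) in RAW.items()}
FOLDS = fold_up_regulation(NORMALIZED, NORMAL_IDS)

# Cancer samples reaching the 5-fold threshold used for this amplicon.
OVER_5_FOLD = [s for s, f in FOLDS.items() if s.startswith("cancer") and f >= 5.0]

if __name__ == "__main__":
    for sample, fold in FOLDS.items():
        print(f"{sample}: {fold:.2f}-fold")
    print("at least 5-fold:", OVER_5_FOLD)
```

Using the geometric rather than arithmetic mean keeps a single highly expressed housekeeping gene from dominating the reference, and taking the median of the normal samples limits the influence of an outlying normal sample.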
Expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor Z25299 transcripts, which are detectable by amplicon as depicted in sequence name Z25299 seg20 in normal and cancerous ovary tissues
Expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by or according to seg20, Z25299 seg20 amplicon(s) and Z25299 seg20F and Z25299 seg20R primers was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel" above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 27A is a histogram showing over-expression of the above-indicated Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts in cancerous ovary samples relative to the normal samples. As is evident from Figure 27A, the expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel"). Notably an over-expression of at least 10 fold was found in 30 out of 43 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below.
The P value for the difference in the expression levels of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by the above amplicon(s) in ovary cancer samples versus the normal tissue samples was determined by T test as 9.81E-07. A threshold of 10 fold over-expression was found to differentiate between cancer and normal samples with a P value of 5E-03 as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
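The threshold test mentioned above can be illustrated with a one-sided Fisher's exact test computed directly from the hypergeometric distribution. The text gives 30 of 43 adenocarcinoma samples at or above the 10-fold threshold but does not state how many normal samples exceed it; the sketch assumes that none of the five normal post-mortem samples (Sample Nos. 45-48, 71) do, and that the test is one-sided. Both points are assumptions, and the function name is illustrative.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with all margins fixed, of observing a
    top-left count of `a` or larger.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    upper = min(row1, col1)
    favourable = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return favourable / comb(total, col1)


# Rows: cancer, normal. Columns: >= 10-fold, < 10-fold.
# 30 of 43 cancer samples are from the text; 0 of 5 normal samples is an assumption.
P_SEG20 = fisher_exact_greater(30, 13, 0, 5)

if __name__ == "__main__":
    print(f"one-sided P = {P_SEG20:.2e}")
```

Under these assumptions the computed value is about 5.0E-03, which is consistent with the P value reported for the seg20 amplicon.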
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Z25299 seg20F forward primer; and Z25299 seg20R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Z25299 seg20.
Z25299 seg20 Forward primer (SEQ ID NO:994): CTCCTGAACCCTACTCCAAGCA
Z25299 seg20 Reverse primer (SEQ ID NO:995): CAGGCGATCCTATGGAAATCC
Z25299 seg20 Amplicon (SEQ ID NO:996): CTCCTGAACCCTACTCCAAGCACAGCCTCTGTCTGACTCCCTTGTCCTTCAAGAGAACTGTTCTCCAGGTCTCAGGGCCAGGATTTCCATAGGATCGCCTG
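The relationship between a primer pair and its amplicon can be verified mechanically: the amplicon should begin with the forward primer and end with the reverse complement of the reverse primer. The sketch below applies this check to the Z25299 seg20 sequences listed above; the helper names are illustrative only.

```python
# Complement table for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def primers_flank_amplicon(forward: str, reverse: str, amplicon: str) -> bool:
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(reverse_complement(reverse))


# Z25299 seg20 sequences as listed in the text (SEQ ID NOs: 994-996).
SEG20_FORWARD = "CTCCTGAACCCTACTCCAAGCA"
SEG20_REVERSE = "CAGGCGATCCTATGGAAATCC"
SEG20_AMPLICON = (
    "CTCCTGAACCCTACTCCAAGCACAGCCTCTGTCTGACTCCCTTGTCCTTCAAGAGAA"
    "CTGTTCTCCAGGTCTCAGGGCCAGGATTTCCATAGGATCGCCTG"
)

SEG20_OK = primers_flank_amplicon(SEG20_FORWARD, SEG20_REVERSE, SEG20_AMPLICON)

if __name__ == "__main__":
    print("seg20 primers flank amplicon:", SEG20_OK)
```

The same check can be applied to the other primer pairs and amplicons in this section by substituting their sequences.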
Expression of Secretory leukocyte protease inhibitor (Acid-stable proteinase inhibitor with strong affinities for trypsin, chymotrypsin, elastase, and cathepsin G) Z25299 transcripts which are detectable by amplicon as depicted in sequence name Z25299seg20 in different normal tissues
Expression of Secretory leukocyte protease inhibitor transcripts detectable by or according to Z25299seg20 amplicon(s) and primers Z25299seg20F and Z25299seg20R was measured by real time PCR. In parallel the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 1 above, "Tissue samples in testing panel"), to obtain a value of relative expression of each sample relative to the median of the ovary samples. Primers and amplicon are as above. Results are shown in Figure 27B.
Expression of Secretory leukocyte protease inhibitor Z25299 transcripts, which are detectable by amplicon as depicted in sequence name Z25299 seg23 in normal and cancerous ovary tissues Expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by or according to seg23, Z25299 seg23 amplicon(s) and Z25299 seg23F and Z25299 seg23R primers was measured by real time PCR. In parallel the expression of four housekeeping genes -PBGD (GenBank Accession No. BC019323; amplicon - PBGD- amplicon), HPRTl (GenBank Accession No. NM 000194; amplicon - HPRTl -amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing panel" above), to obtain a value of fold up- regulation for each sample relative to median of the normal PM samples. Figure 28A is a histogram showing over expression of the above -indicated Secretory leukocyte protease inhibitor Acid- stable proteinase inhibitor transcripts in cancerous ovary samples relative to the normal samples. As is evident from Figure 28A, the expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 45-48, 71, Table 1 , "Tissue samples in testing panel"). Notably an over-expression of at least 10 fold was found in 31 out of 43 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. 
The P value for the difference in the expression levels of Secretory leukocyte protease inhibitor Acid -stable proteinase inhibitor transcripts detectable by the above amplicon(s) in ovary cancer samples versus the normal tissue samples was determined by T test as 2.48E-07. Threshold of 10 fold overexpression was found to differentiate between cancer and normal samples with P value of 3.61E-03 as checked by exact fisher test. The above values demonstrate statistical significance of the results. Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non- limiting illustrative example only of a suitable primer pair: Z25299 seg23F forward primer; and Z25299 seg23R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non- limiting illustrative example only of a suitable amplicon: Z25299 seg23. Z25299 seg23 Forward primer (SEQ ID NO:997): CAAGCAATTGAGGGACCAGG Z25299 seg23 Reverse primer (SEQ ID NO:998): CAAAAAACATTGTTAATGAGAGAGATGAC Z25299 seg23 Amplicon (SEQ ID NO:999):
CAAGCAATTGAGGGACCAGGAAGTGGATCCTCTAGAGATGAGGAGGCATTCTGCTG GATGACTTTTAAAAATGTTTTCTCCAGAGTCATCTCTCTCATTAACAATGTTTTTTG
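By way of non-limiting illustration only, the relationship between the above primer pair and amplicon can be checked with a short sketch. It assumes the conventional arrangement in which the amplicon begins with the forward primer and ends with the reverse complement of the reverse primer; the function names are hypothetical and are not part of the application.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence containing only A/C/G/T."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


# Sequences copied from the text above (SEQ ID NOs: 997, 998 and 999).
FORWARD_PRIMER = "CAAGCAATTGAGGGACCAGG"
REVERSE_PRIMER = "CAAAAAACATTGTTAATGAGAGAGATGAC"
AMPLICON = (
    "CAAGCAATTGAGGGACCAGGAAGTGGATCCTCTAGAGATGAGGAGGCATTCTGCTG"
    "GATGACTTTTAAAAATGTTTTCTCCAGAGTCATCTCTCTCATTAACAATGTTTTTTG"
)


def primers_flank_amplicon(forward: str, reverse: str, amplicon: str) -> bool:
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer (assumed convention)."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


if __name__ == "__main__":
    print("amplicon length:", len(AMPLICON))
    print("primers flank amplicon:",
          primers_flank_amplicon(FORWARD_PRIMER, REVERSE_PRIMER, AMPLICON))
```

Such a check is only a consistency test on the listed sequences; it says nothing about primer specificity or amplification efficiency.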
Expression of Secretory leukocyte protease inhibitor Z25299 transcripts which are detectable by amplicon as depicted in sequence name Z25299seg23 in different normal tissues
Expression of Secretory leukocyte protease inhibitor transcripts detectable by or according to Z25299seg23 amplicon(s) and primers (as above) Z25299seg23F and Z25299seg23R was measured by real time PCR. In parallel, the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), Ubiquitin (GenBank Accession No. BC000449; Ubiquitin amplicon) and SDHA (GenBank Accession No. NM_004168; SDHA amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 1 above, "Tissue samples in testing panel"), to obtain a value of relative expression of each sample relative to the median of the ovary samples. Results are shown in Figure 28B.
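As a non-limiting sketch of the normalization described above, the following shows one way the two steps (division by the geometric mean of the housekeeping gene quantities, then division by the median of the reference samples) could be computed. The function names, sample identifiers and quantities are hypothetical and serve for illustration only; they are not data from the application.

```python
from statistics import geometric_mean, median


def normalize_sample(target_quantity, housekeeping_quantities):
    """Divide the amplicon quantity of one RT sample by the geometric mean
    of the housekeeping gene quantities measured for the same sample."""
    return target_quantity / geometric_mean(housekeeping_quantities)


def relative_expression(normalized_by_sample, reference_sample_ids):
    """Divide every normalized quantity by the median of the normalized
    quantities of the reference samples (e.g. normal ovary samples)."""
    reference_median = median(
        normalized_by_sample[sample_id] for sample_id in reference_sample_ids
    )
    return {
        sample_id: quantity / reference_median
        for sample_id, quantity in normalized_by_sample.items()
    }


if __name__ == "__main__":
    # Hypothetical quantities: (target amplicon, four housekeeping genes).
    raw = {
        "normal_a": (1.0, [1.0, 1.0, 1.0, 1.0]),
        "normal_b": (2.0, [1.0, 1.0, 1.0, 1.0]),
        "normal_c": (3.0, [1.0, 1.0, 1.0, 1.0]),
        "tumor_x": (40.0, [2.0, 2.0, 2.0, 2.0]),
    }
    normalized = {s: normalize_sample(t, hk) for s, (t, hk) in raw.items()}
    folds = relative_expression(normalized, ["normal_a", "normal_b", "normal_c"])
    for sample_id, fold in folds.items():
        print(sample_id, round(fold, 2))
```

The geometric mean is used here because the text specifies it; it damps the influence of any single housekeeping gene whose quantity deviates strongly in a given sample.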
DESCRIPTION FOR CLUSTER T39971
Cluster T39971 features 4 transcript(s) and 28 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Vitronectin precursor (SwissProt accession identifier VTNC_HUMAN; known also according to the synonyms Serum spreading factor; S-protein; V75), SEQ ID NO: 602, referred to herein as the previously known protein. Protein Vitronectin precursor is known or believed to have the following function(s): Vitronectin is a cell adhesion and spreading factor found in serum and tissues. Vitronectin interacts with glycosaminoglycans and proteoglycans. Is recognized by certain members of the integrin family and serves as a cell-to-substrate adhesion molecule. Inhibitor of the membrane-damaging effect of the terminal cytolytic complement pathway. The sequence for protein Vitronectin precursor is given at the end of the application, as "Vitronectin precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Vitronectin precursor localization is believed to be Extracellular. The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Cancer, melanoma. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Alphavbeta3 integrin antagonist; Apoptosis agonist. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anticancer. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: immune response; cell adhesion, which are annotation(s) related to Biological Process; protein binding; heparin binding, which are annotation(s) related to Molecular Function; and extracellular space, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster T39971 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the right hand column of the table and the numbers on the y-axis of Figure 29 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
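For illustration only, the "parts per million" ratio described above can be sketched as follows. The text refers to weighted expression, but the weighting itself is not specified in this passage, so the sketch computes the plain, unweighted ratio; the function name and the counts used are hypothetical.

```python
def ests_per_million(cluster_est_count: int, total_est_count: int) -> float:
    """Ratio of the ESTs of one cluster in a tissue category to all ESTs
    in that category, expressed as parts per million (unweighted)."""
    if total_est_count <= 0:
        raise ValueError("total_est_count must be positive")
    return 1_000_000 * cluster_est_count / total_est_count


if __name__ == "__main__":
    # Hypothetical counts: 5 cluster ESTs out of 100,000 ESTs in a category.
    print(ests_per_million(5, 100_000))
```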
Overall, the following results were obtained as shown with regard to the histograms in Figure 29 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: liver cancer, lung malignant tumors and pancreas carcinoma.
Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
As noted above, cluster T39971 features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Vitronectin precursor. A description of each variant protein according to the present invention is now provided.
Variant protein T39971_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T39971_T5. An alignment is given to the known protein (Vitronectin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T39971_P6 and VTNC_HUMAN:
1. An isolated chimeric polypeptide encoding for T39971_P6, comprising a first amino acid sequence being at least 90% homologous to
MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAEC KPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPV LKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFR GQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGV LDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKG corresponding to amino acids 1 - 276 of VTNC_HUMAN, which also corresponds to amino acids 1 - 276 of T39971_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TQGWGD corresponding to amino acids 277 - 283 of T39971_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T39971_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TQGWGD in T39971_P6.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein T39971_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
Variant protein T39971_P6 is encoded by the following transcript(s): T39971_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T39971_T5 is shown in bold; this coding portion starts at position 756 and ends at position 1604. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Variant protein T39971_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T39971_T10. An alignment is given to the known protein (Vitronectin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T39971_P9 and VTNC_HUMAN:
1. An isolated chimeric polypeptide encoding for T39971_P9, comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAEC KPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPV LKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFR GQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGV LDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEE CEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRT corresponding to amino acids 1 - 325 of VTNC_HUMAN, which also corresponds to amino acids 1 - 325 of T39971_P9, and a second amino acid sequence being at least 90% homologous to SGMAPRPSLAKKQRFRHRNRKGYRSQRGHSRGRNQNSRRPSRATWLSLFSSEESNLGA NNYDDYRMDWLVPATCEPIQSVFFFSGDKYYRVNLRTRRVDTVDPPYPRSIAQYWLGC PAPGHL corresponding to amino acids 357 - 478 of VTNC_HUMAN, which also corresponds to amino acids 326 - 447 of T39971_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of T39971_P9, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TS, having a structure as follows: a sequence starting from any of amino acid numbers 325-x to 325; and ending at any of amino acid numbers 326 + ((n-2) - x), in which x varies from 0 to n-2.
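The start and end formula recited above can be illustrated, in a non-limiting manner, by enumerating the windows it defines. The sketch assumes 1-based, inclusive amino acid numbering; the function name is hypothetical. For each x from 0 to n-2 the window starts at (junction - x) and ends at (junction + 1) + ((n-2) - x), so every window has length n and contains both amino acids flanking the junction (325 and 326 for T39971_P9).

```python
def edge_windows(junction_left: int, n: int):
    """Yield (start, end) windows of length n, 1-based and inclusive, that
    span the junction between positions junction_left and junction_left + 1.

    Follows the recited formula: start = junction_left - x and
    end = (junction_left + 1) + ((n - 2) - x), for x = 0 .. n - 2.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    for x in range(0, n - 1):  # x varies from 0 to n-2 inclusive
        start = junction_left - x
        end = (junction_left + 1) + ((n - 2) - x)
        if start >= 1:  # skip windows that would begin before residue 1
            yield start, end


if __name__ == "__main__":
    # Edge portion of T39971_P9: junction between amino acids 325 and 326.
    for start, end in edge_windows(325, 10):
        print(start, end, end - start + 1)
```

The same enumeration applies to the other edge portions recited herein by changing the junction position, for example 326 for a junction between amino acids 326 and 327.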
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein T39971_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Amino acid mutations
Variant protein T39971_P9 is encoded by the following transcript(s): T39971_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T39971_T10 is shown in bold; this coding portion starts at position 756 and ends at position 2096. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Variant protein T39971_P11 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T39971_T12. An alignment is given to the known protein (Vitronectin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T39971_P11 and VTNC_HUMAN:
1. An isolated chimeric polypeptide encoding for T39971_P11, comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAEC KPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPV LKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFR GQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGV LDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEE CEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRTS corresponding to amino acids 1 - 326 of VTNC_HUMAN, which also corresponds to amino acids 1 - 326 of T39971_P11, and a second amino acid sequence being at least 90% homologous to
DKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 442 - 478 of VTNC_HUMAN, which also corresponds to amino acids 327 - 363 of T39971_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of T39971_P11, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SD, having a structure as follows: a sequence starting from any of amino acid numbers 326-x to 326; and ending at any of amino acid numbers 327 + ((n-2) - x), in which x varies from 0 to n-2.
Comparison report between T39971_P11 and Q9BSH7 (SEQ ID NO: 1000):
1. An isolated chimeric polypeptide encoding for T39971_P11, comprising a first amino acid sequence being at least 90% homologous to
MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAEC KPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPV LKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFR GQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGV LDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEE CEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRTS corresponding to amino acids 1 - 326 of Q9BSH7, which also corresponds to amino acids 1 - 326 of T39971_P11, and a second amino acid sequence being at least 90% homologous to
DKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 442 - 478 of Q9BSH7, which also corresponds to amino acids 327 - 363 of T39971_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of T39971_P11, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SD, having a structure as follows: a sequence starting from any of amino acid numbers 326-x to 326; and ending at any of amino acid numbers 327 + ((n-2) - x), in which x varies from 0 to n-2.
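As a non-limiting illustration of the comparison reports above, the correspondence between amino acid positions of variant T39971_P11 and positions of the aligned known protein can be expressed as a small lookup. The block boundaries are taken from the text (amino acids 1 - 326 map to 1 - 326, and amino acids 327 - 363 map to 442 - 478); the class and function names are hypothetical, and 1-based inclusive numbering is assumed.

```python
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    """A contiguous run of variant positions and where it starts in the known protein."""
    variant_start: int
    variant_end: int
    known_start: int


# Blocks recited for T39971_P11 versus VTNC_HUMAN (and Q9BSH7).
T39971_P11_BLOCKS: List[Block] = [
    Block(variant_start=1, variant_end=326, known_start=1),
    Block(variant_start=327, variant_end=363, known_start=442),
]


def variant_to_known(position: int, blocks: List[Block]) -> Optional[int]:
    """Map a variant amino acid position to the known-protein position,
    or return None if the position lies outside every block."""
    for block in blocks:
        if block.variant_start <= position <= block.variant_end:
            return block.known_start + (position - block.variant_start)
    return None


if __name__ == "__main__":
    for pos in (1, 326, 327, 363):
        print(pos, "->", variant_to_known(pos, T39971_P11_BLOCKS))
```

Under these blocks, known-protein amino acids 327 - 441 have no counterpart in the variant, which is why the junction between variant positions 326 and 327 defines the edge portion.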
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein T39971_P11 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
Variant protein T39971_P11 is encoded by the following transcript(s): T39971_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T39971_T12 is shown in bold; this coding portion starts at position 756 and ends at position 1844. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Nucleic acid SNPs
Variant protein T39971_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T39971_T16. An alignment is given to the known protein (Vitronectin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T39971_P12 and VTNC_HUMAN:
1. An isolated chimeric polypeptide encoding for T39971_P12, comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAEC KPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPV LKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFR GQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFK corresponding to amino acids 1 - 223 of VTNC_HUMAN, which also corresponds to amino acids 1 - 223 of T39971_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VPGAVGQGRKHLGRV corresponding to amino acids 224 - 238 of T39971_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T39971_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPGAVGQGRKHLGRV in T39971_P12.
Comparison report between T39971_P12 and Q9BSH7:
1. An isolated chimeric polypeptide encoding for T39971_P12, comprising a first amino acid sequence being at least 90% homologous to
MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAEC KPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPV LKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFR GQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFK corresponding to amino acids 1 - 223 of Q9BSH7, which also corresponds to amino acids 1 - 223 of T39971_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VPGAVGQGRKHLGRV corresponding to amino acids 224 - 238 of T39971_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T39971_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
VPGAVGQGRKHLGRV in T39971_P12. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T39971_P12 also has the following non-silent SNPs (Single Nucleotide
Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Amino acid mutations
Variant protein T39971_P12 is encoded by the following transcript(s): T39971_T16, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T39971_T16 is shown in bold; this coding portion starts at position 756 and ends at position 1469. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Nucleic acid SNPs
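By way of a non-limiting consistency check, the coding portion positions stated above for the four transcripts can be compared with the lengths of the variant proteins given in the comparison reports (283, 447, 363 and 238 amino acids). The sketch assumes that the stated positions are 1-based and inclusive and that the stated span does not include a stop codon; both are assumptions made for illustration, and the function names are hypothetical.

```python
# (transcript, coding start, coding end, encoded variant, variant length in amino acids)
CODING_PORTIONS = [
    ("T39971_T5", 756, 1604, "T39971_P6", 283),
    ("T39971_T10", 756, 2096, "T39971_P9", 447),
    ("T39971_T12", 756, 1844, "T39971_P11", 363),
    ("T39971_T16", 756, 1469, "T39971_P12", 238),
]


def codon_count(start: int, end: int) -> int:
    """Number of codons in an inclusive, 1-based nucleotide span."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("span length is not a multiple of three")
    return length // 3


def check_coding_portions(entries):
    """Return, per transcript, whether the codon count of the coding
    portion equals the stated length of the encoded variant protein."""
    return {
        transcript: codon_count(start, end) == protein_length
        for transcript, start, end, _protein, protein_length in entries
    }


if __name__ == "__main__":
    for transcript, consistent in check_coding_portions(CODING_PORTIONS).items():
        print(transcript, "consistent" if consistent else "inconsistent")
```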
As noted above, cluster T39971 features 28 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster T39971_node_0 according to the present invention is supported by 76 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Segment cluster T39971_node_18 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T16. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Segment cluster T39971_node_21 according to the present invention is supported by 99 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12 and T39971_T5. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Segment cluster T39971_node_22 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T5. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Segment cluster T39971_node_23 according to the present invention is supported by 101 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12 and T39971_T5. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Segment cluster T39971_node_31 according to the present invention is supported by 94 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10 and T39971_T5. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Segment cluster T39971_node_33 according to the present invention is supported by 77 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12 and T39971_T5. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Segment cluster T39971_node_7 according to the present invention is supported by 87 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster T39971_node_1 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Segment cluster T39971_node_10 according to the present invention is supported by 77 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Segment cluster T39971_node_11 according to the present invention is supported by 79 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Segment cluster T39971_node_12 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Segment cluster T39971_node_15 according to the present invention is supported by 79 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Segment cluster T39971_node_16 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Segment cluster T39971_node_17 according to the present invention is supported by 86 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Segment cluster T39971_node_26 according to the present invention is supported by 85 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T5. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Segment cluster T39971_node_27 according to the present invention is supported by 90 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T5. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Segment cluster T39971_node_28 according to the present invention can be found in the following transcript(s): T39971_T10 and T39971_T5. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Segment cluster T39971_node_29 according to the present invention is supported by 99 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10 and T39971_T5. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
Segment cluster T39971_node_3 according to the present invention is supported by 78 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Segment cluster T39971_node_30 according to the present invention can be found in the following transcript(s): T39971_T10 and T39971_T5. Table 35 below describes the starting and ending position of this segment on each transcript.
Table 35 - Segment location on transcripts
Segment cluster T39971_node_34 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12 and T39971_T5. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
Segment cluster T39971_node_35 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12 and T39971_T5. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Segment cluster T39971_node_36 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12 and T39971_T5. Table 38 below describes the starting and ending position of this segment on each transcript.
Table 38 - Segment location on transcripts
Segment cluster T39971_node_4 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 39 below describes the starting and ending position of this segment on each transcript.
Table 39 - Segment location on transcripts
Segment cluster T39971_node_5 according to the present invention is supported by 80 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 40 below describes the starting and ending position of this segment on each transcript.
Table 40 - Segment location on transcripts
Segment cluster T39971_node_8 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 41 below describes the starting and ending position of this segment on each transcript.
Table 41 - Segment location on transcripts
Segment cluster T39971_node_9 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 42 below describes the starting and ending position of this segment on each transcript.
Table 42 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: /tmp/xkraCL20cZ/43L7YcPH7x:VTNC_HUMAN
Sequence documentation:
Alignment of: T39971_P6 x VTNC_HUMAN
Alignment segment 1/1:
Quality: 2774.00 Escore: 0 Matching length: 278 Total length: 278 Matching Percent Similarity: 99.64 Matching Percent Identity: 99.64 Total Percent Similarity: 99.64 Total Percent Identity: 99.64 Gaps: 0
Alignment:
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50

 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100

101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150

151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200

201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250

251 PDNVDAALALPAHSYSGRERVYFFKGTQ 278
    |||||||||||||||||||||||||| |
251 PDNVDAALALPAHSYSGRERVYFFKGKQ 278
Sequence name: /tmp/X4DeeuSlB4/yMubSR5FPs:VTNC_HUMAN
Sequence documentation:
Alignment of: T39971_P9 x VTNC_HUMAN
Alignment segment 1/1:
Quality: 4430.00 Escore: 0 Matching length: 447 Total length: 478 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 93.51 Total Percent Identity: 93.51 Gaps: 1
Alignment:
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50

 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100

101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150

151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200

201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250

251 PDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSA 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 PDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSA 300

301 VFEHFAMMQRDSWEDIFELLFWGRT......................... 325
    |||||||||||||||||||||||||
301 VFEHFAMMQRDSWEDIFELLFWGRTSAGTRQPQFISRDWHGVPGQVDAAM 350

326 ......SGMAPRPSLAKKQRFRHRNRKGYRSQRGHSRGRNQNSRRPSRAT 369
          ||||||||||||||||||||||||||||||||||||||||||||
351 AGRIYISGMAPRPSLAKKQRFRHRNRKGYRSQRGHSRGRNQNSRRPSRAT 400

370 WLSLFSSEESNLGANNYDDYRMDWLVPATCEPIQSVFFFSGDKYYRVNLR 419
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 WLSLFSSEESNLGANNYDDYRMDWLVPATCEPIQSVFFFSGDKYYRVNLR 450

420 TRRVDTVDPPYPRSIAQYWLGCPAPGHL 447
    ||||||||||||||||||||||||||||
451 TRRVDTVDPPYPRSIAQYWLGCPAPGHL 478
Sequence name: /tmp/jvplVtnxNy/wxNSeFVZZ:VTNC_HUMAN
Sequence documentation:
Alignment of: T39971_P11 x VTNC_HUMAN
Alignment segment 1/1:
Quality: 3576.00 Escore: 0 Matching length: 363 Total length: 478 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 75.94 Total Percent Identity: 75.94 Gaps: 1
Alignment:
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50

 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100

101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150

151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200

201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250

251 PDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSA 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 PDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSA 300

301 VFEHFAMMQRDSWEDIFELLFWGRTS........................ 326
    ||||||||||||||||||||||||||
301 VFEHFAMMQRDSWEDIFELLFWGRTSAGTRQPQFISRDWHGVPGQVDAAM 350

326 .................................................. 326

351 AGRIYISGMAPRPSLAKKQRFRHRNRKGYRSQRGHSRGRNQNSRRPSRAT 400

327 .........................................DKYYRVNLR 335
                                             |||||||||
401 WLSLFSSEESNLGANNYDDYRMDWLVPATCEPIQSVFFFSGDKYYRVNLR 450

336 TRRVDTVDPPYPRSIAQYWLGCPAPGHL 363
    ||||||||||||||||||||||||||||
451 TRRVDTVDPPYPRSIAQYWLGCPAPGHL 478
Sequence name: /tmp/jvplVtnxNy/wxNSeFVZZw:Q9BSH7
Sequence documentation:
Alignment of: T39971_P11 x Q9BSH7
Alignment segment 1/1:
Quality: 3576.00 Escore: 0 Matching length: 363 Total length: 478 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 75.94 Total Percent Identity: 75.94 Gaps: 1
Alignment:
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50

 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100

101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150

151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200

201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250

251 PDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSA 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 PDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSA 300

301 VFEHFAMMQRDSWEDIFELLFWGRTS........................ 326
    ||||||||||||||||||||||||||
301 VFEHFAMMQRDSWEDIFELLFWGRTSAGTRQPQFISRDWHGVPGQVDAAM 350

326 .................................................. 326

351 AGRIYISGMAPRPSLAKKQRFRHRNRKGYRSQRGHSRGRNQNSRRPSRAM 400

327 .........................................DKYYRVNLR 335
                                             |||||||||
401 WLSLFSSEESNLGANNYDDYRMDWLVPATCEPIQSVFFFSGDKYYRVNLR 450

336 TRRVDTVDPPYPRSIAQYWLGCPAPGHL 363
    ||||||||||||||||||||||||||||
451 TRRVDTVDPPYPRSIAQYWLGCPAPGHL 478
Sequence name: /tmp/fgebv7ir4i/48bTBMziJ0:VTNC_HUMAN
Sequence documentation:
Alignment of: T39971_P12 x VTNC_HUMAN
Alignment segment 1/1:
Quality: 2237.00 Escore: 0 Matching length: 223 Total length: 223 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50

 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100

101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150

151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200

201 GIEGPIDAAFTRINCQGKTYLFK 223
    |||||||||||||||||||||||
201 GIEGPIDAAFTRINCQGKTYLFK 223
Sequence name: /tmp/fgebv7ir4i/48bTBMziJ0:Q9BSH7
Sequence documentation:
Alignment of: T39971_P12 x Q9BSH7
Alignment segment 1/1:
Quality: 2237.00 Escore: 0 Matching length: 223 Total length: 223 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC 50

 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100

101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150

151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200

201 GIEGPIDAAFTRINCQGKTYLFK 223
    |||||||||||||||||||||||
201 GIEGPIDAAFTRINCQGKTYLFK 223
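The "Total Percent Identity" figures in the alignment reports above are consistent with dividing the number of identical aligned residues by the total alignment length. The following is a minimal sketch of that arithmetic, not the alignment program's own code; the function name and the reading of the statistics are assumptions made for illustration, while the residue counts are taken from the reports.

```python
def total_percent_identity(identical: int, total_length: int) -> float:
    """Identical aligned residues as a percentage of the full alignment length.

    Assumption: gapped positions count toward total_length but never
    toward the number of identical residues.
    """
    if total_length <= 0:
        raise ValueError("total_length must be positive")
    return round(100.0 * identical / total_length, 2)


# (identical residues, total alignment length) read from the reports above.
# For T39971_P6 the 278 aligned positions contain one mismatch (T versus K).
reports = {
    "T39971_P6 x VTNC_HUMAN": (277, 278),
    "T39971_P9 x VTNC_HUMAN": (447, 478),
    "T39971_P11 x VTNC_HUMAN": (363, 478),
    "T39971_P12 x VTNC_HUMAN": (223, 223),
}

for name, (identical, total) in reports.items():
    print(f"{name}: {total_percent_identity(identical, total):.2f}")
```

Under this reading, the single gap in the T39971_P9 and T39971_P11 alignments lowers the total identity only because the unaligned subject residues enlarge the denominator.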
Expression of VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein) T39971 transcripts, which are detectable by amplicon as depicted in sequence name T39971junc23-33, in normal and cancerous ovary tissues
Expression of VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein) transcripts detectable by or according to junc23-33, T39971junc23-33 amplicon(s) and T39971junc23-33F and T39971junc23-33R primers was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold differential expression for each sample relative to the median of the normal PM samples. Figure 30 is a histogram showing down regulation of the above-indicated VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein) transcripts in cancerous ovary samples relative to the normal samples. As is evident from Figure 30, the expression of VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein) transcripts detectable by the above amplicon(s) in most cancer samples was significantly lower than in the non-cancerous samples (Sample Nos. 45-48, Table 1, above, "Tissue samples in testing panel").
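The normalization described above (housekeeping-gene geometric mean, then division by the median of the normal samples) can be sketched as follows. This is an illustrative sketch only: the function name, the sample labels and all quantities are hypothetical and are not data from the experiment.

```python
from statistics import geometric_mean, median


def fold_expression(target, housekeeping, reference_samples):
    """Return fold differential expression per sample.

    target: {sample: quantity of the amplicon of interest}
    housekeeping: {gene: {sample: quantity}}
    reference_samples: samples whose median defines a fold value of 1.0
    """
    # Step 1: normalize each sample to the geometric mean of its
    # housekeeping-gene quantities.
    normalized = {
        sample: quantity
        / geometric_mean([per_gene[sample] for per_gene in housekeeping.values()])
        for sample, quantity in target.items()
    }
    # Step 2: divide by the median normalized quantity of the reference samples.
    reference = median(normalized[s] for s in reference_samples)
    return {sample: value / reference for sample, value in normalized.items()}


# Hypothetical quantities: four normal samples (N45-N48) and one cancer sample (C1).
target = {"N45": 8.0, "N46": 6.0, "N47": 10.0, "N48": 8.0, "C1": 2.0}
hk_levels = {"PBGD": 1.0, "HPRT1": 4.0, "SDHA": 2.0, "GAPDH": 2.0}
housekeeping = {gene: {s: q for s in target} for gene, q in hk_levels.items()}

fold = fold_expression(target, housekeeping, ["N45", "N46", "N47", "N48"])
for sample, value in fold.items():
    print(sample, round(value, 3))
```

A fold value below 1 for a cancer sample corresponds to the down regulation that the histogram is described as showing.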
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: T39971junc23-33F forward primer; and T39971junc23-33R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: T39971junc23-33.
T39971junc23-33 Forward primer (SEQ ID NO: 1001): GGGGCAGAACCTCTGACAAG
T39971junc23-33 Reverse primer (SEQ ID NO: 1002): GGGCAGCCCAGCCAGTA
T39971junc23-33 Amplicon (SEQ ID NO: 1003): GGGGCAGAACCTCTGACAAGTACTACCGAGTCAATCTTCGCACACGGCGAGTGGACACTGTGGACCCTCCCTACCCACGCTCCATCGCTCAGTACTGGCTGGGCTGCCC
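As a consistency check on the sequences listed above, the amplicon should begin with the forward primer and end with the reverse complement of the reverse primer. The sketch below uses only the three sequences given in the text; the helper function is an illustrative assumption, and whitespace in the amplicon string is stripped in code.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


forward_primer = "GGGGCAGAACCTCTGACAAG"
reverse_primer = "GGGCAGCCCAGCCAGTA"

# Amplicon as printed in the text; split/join removes any embedded spaces.
amplicon = "".join(
    "GGGGC AGA ACCTCTG AC AAGTACTACCGAGTC AATCTTCGCACACGGCG AGTG GAC "
    "ACTGTGGACCCTCCCTACCCACGCTCCATCGCTCAGTACTGGCTGGGCTGCCC".split()
)

print("starts with forward primer:", amplicon.startswith(forward_primer))
print(
    "ends with reverse-complemented reverse primer:",
    amplicon.endswith(reverse_complement(reverse_primer)),
)
```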
Expression of VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein) T39971 transcripts, which are detectable by amplicon as depicted in sequence name T39971junc23-33, in different normal tissues
Expression of VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein) transcripts detectable by or according to T39971junc23-33 amplicon and T39971junc23-33F and T39971junc23-33R was measured by real time PCR. In parallel, the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the breast samples (Sample Nos. 33-35, Table 2 "Tissue samples in normal panel" above), to obtain a value of relative expression of each sample relative to the median of the breast samples. The results are described in Figure 31, presenting the histogram showing the expression of T39971 transcripts, which are detectable by amplicon as depicted in sequence name T39971junc23-33, in different normal tissues. Primers and amplicon are as above.
DESCRIPTION FOR CLUSTER Z44808
Cluster Z44808 features 5 transcript(s) and 21 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Transcript Name	SEQ ID NO
Z44808_PEA_1_T11	607
Z44808_PEA_1_T4	608
Z44808_PEA_1_T5	609
Z44808_PEA_1_T8	610
Table 2 - Segments of interest
Table 3 - Proteins of interest
Protein Name	SEQ ID NO
Z44808_PEA_1_P5	634
These sequences are variants of the known protein SPARC related modular calcium-binding protein 2 precursor (SwissProt accession identifier SMO2_HUMAN; known also according to the synonyms Secreted modular calcium-binding protein 2; SMOC-2; Smooth muscle-associated protein 2; SMAP-2; MSTP117), SEQ ID NO: 633, referred to herein as the previously known protein. The sequence for protein SPARC related modular calcium-binding protein 2 precursor is given at the end of the application, as "SPARC related modular calcium-binding protein 2 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein SPARC related modular calcium-binding protein 2 precursor localization is believed to be Secreted. Cluster Z44808 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the right hand column of the table and the numbers on the y-axis of Figure 32 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million). Overall, the following results were obtained as shown with regard to the histograms in Figure 32 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: colorectal cancer, lung cancer and pancreas carcinoma.
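The "parts per million" measure mentioned above is a ratio of cluster ESTs to all ESTs in a tissue category, scaled to one million. A minimal sketch of that ratio follows; the function name and the counts are hypothetical, and the sketch ignores whatever weighting the original analysis applied.

```python
def ests_per_million(cluster_ests: int, total_ests: int) -> float:
    """Cluster ESTs per million ESTs in one tissue category (unweighted)."""
    if total_ests <= 0:
        raise ValueError("total_ests must be positive")
    if cluster_ests < 0 or cluster_ests > total_ests:
        raise ValueError("cluster_ests must lie between 0 and total_ests")
    return 1_000_000 * cluster_ests / total_ests


# Hypothetical counts: 12 cluster ESTs among 400,000 ESTs from one tissue.
print(ests_per_million(12, 400_000))
```

Expressing counts this way lets categories with very different library sizes share one y-axis in a histogram.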
Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
bladder	6.8e-01	7.6e-01	7Je-01	0.8	9.1e-01	0.6
bone	7.0e-01	8.8e-01	9.9e-01	o3"	TT
brain	6.8e-01	7.2e-01	3.0e-02	2.6	lJe-01	TT
colon	9.2e-03	1.3e-02	1.2e-01	3.6	1.6e-01	TT
As noted above, cluster Z44808 features 5 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein SPARC related modular calcium-binding protein 2 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein Z44808_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z44808_PEA_1_T4. An alignment is given to the known protein (SPARC related modular calcium-binding protein 2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z44808_PEA_1_P5 and SMO2_HUMAN:
1. An isolated chimeric polypeptide encoding for Z44808_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGR TFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDD GTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKTDDAA APALETQPQGDEEDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKN DNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPA KARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEE RVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQ ELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ corresponding to amino acids 1 - 441 of SMO2_HUMAN, which also corresponds to amino acids 1 - 441 of Z44808_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DAMVVSSRPKATTHRKSRTLSRR corresponding to amino acids 442 - 464 of Z44808_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of Z44808_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DAMVVSSRPKATTHRKSRTLSRR in Z44808_PEA_1_P5.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein Z44808_PEA_1_P5 is encoded by the following transcript(s): Z44808_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z44808_PEA_1_T4 is shown in bold; this coding portion starts at position 586 and ends at position 1977. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Variant protein Z44808_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z44808_PEA_1_T5. An alignment is given to the known protein (SPARC related modular calcium-binding protein 2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z44808_PEA_1_P6 and SMO2_HUMAN:
1. An isolated chimeric polypeptide encoding for Z44808_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGR TFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDD GTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKTDDAA APALETQPQGDEEDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKN DNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPA KARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEE RVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQ ELMGCLGVAKEDGKADTKKRH corresponding to amino acids 1 - 428 of SMO2_HUMAN, which also corresponds to amino acids 1 - 428 of Z44808_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RSKRNL corresponding to amino acids 429 - 434 of Z44808_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of Z44808_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RSKRNL in Z44808_PEA_1_P6.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein Z44808_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
Variant protein Z44808_PEA_1_P6 is encoded by the following transcript(s): Z44808_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z44808_PEA_1_T5 is shown in bold; this coding portion starts at position 586 and ends at position 1887. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
Variant protein Z44808_PEA_1_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z44808_PEA_1_T9. An alignment is given to the known protein (SPARC related modular calcium-binding protein 2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z44808_PEA_1_P7 and SMO2_HUMAN:
1. An isolated chimeric polypeptide encoding for Z44808_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGR TFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDD GTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKTDDAA APALETQPQGDEEDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKN DNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPA KARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEE RVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQ ELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ corresponding to amino acids 1 - 441 of SMO2_HUMAN, which also corresponds to amino acids 1 - 441 of Z44808_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLWLRGKVSFYCF corresponding to amino acids 442 - 454 of Z44808_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of Z44808_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLWLRGKVSFYCF in Z44808_PEA_1_P7.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein Z44808_PEA_1_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Amino acid mutations
Variant protein Z44808_PEA_1_P7 is encoded by the following transcript(s): Z44808_PEA_1_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z44808_PEA_1_T9 is shown in bold; this coding portion starts at position 586 and ends at position 1947. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Nucleic acid SNPs
Variant protein Z44808_PEA_1_P11 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z44808_PEA_1_T11. The identification of this transcript was performed using a non-EST based method for identification of alternative splicing, described in the following reference: "Sorek R et al., Genome Res. (2004) 14:1617-23." An alignment is given to the known protein (SPARC related modular calcium-binding protein 2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z44808_PEA_1_P11 and SMO2_HUMAN:
1. An isolated chimeric polypeptide encoding for Z44808_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGR TFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDD GTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKT corresponding to amino acids 1 - 170 of SMO2_HUMAN, which also corresponds to amino acids 1 - 170 of Z44808_PEA_1_P11, and a second amino acid sequence being at least 90% homologous to DIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGL YKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQLQ GCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEERVVHWYFKLLD KNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQELMGCLGVAKE DGKADTKKRHTPRGHAESTSNRQPRKQG corresponding to amino acids 188 - 446 of SMO2_HUMAN, which also corresponds to amino acids 171 - 429 of Z44808_PEA_1_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of Z44808_PEA_1_P11, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TD, having a structure as follows: a sequence starting from any of amino acid numbers 170-x to 170; and ending at any of amino acid numbers 171 + ((n-2) - x), in which x varies from 0 to n-2.
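The edge-portion definition above describes every window of length n that spans the junction between residue 170 (T) and residue 171 (D). The sketch below enumerates such windows under one reading of that definition. The function is a hypothetical helper, and the 26-residue excerpt (13 residues on each side of the junction, taken from the two sequences quoted above) is used only for illustration, so the junction sits at position 13 of the excerpt rather than at position 170.

```python
def edge_windows(seq: str, junction: int, n: int) -> list[str]:
    """Return all length-n windows that contain both residues at the junction.

    junction is the 1-based position of the last residue before the junction.
    For x in 0..n-2 a window starts at junction - x and ends at
    junction + 1 + (n - 2) - x, so every window has exactly n residues.
    """
    windows = []
    for x in range(n - 1):  # x = 0 .. n-2
        start = junction - x
        end = junction + 1 + (n - 2) - x
        if start < 1 or end > len(seq):
            continue  # the window would run off the available sequence
        windows.append(seq[start - 1:end])
    return windows


# Excerpt around the junction: ...NEKLPQREGTGKT | DIASRYPTLWTEQ...
excerpt = "NEKLPQREGTGKT" + "DIASRYPTLWTEQ"
for window in edge_windows(excerpt, junction=13, n=10):
    print(window)
```

With the full variant sequence, the same call with junction=170 would use the residue numbering given in the text.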
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein Z44808_PEA_1_P11 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 12 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Amino acid mutations
Variant protein Z44808_PEA_1_P11 is encoded by the following transcript(s): Z44808_PEA_1_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z44808_PEA_1_T11 is shown in bold; this coding portion starts at position 586 and ends at position 1872. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Nucleic acid SNPs
As noted above, cluster Z44808 features 21 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster Z44808_PEA_1_node_0 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
Transcript name      Segment position
Z44808_PEA_1_T11     669
Z44808_PEA_1_T4      669
Z44808_PEA_1_T5      669
Z44808_PEA_1_T8      669
Z44808_PEA_1_T9      669
Segment cluster Z44808_PEA_1_node_16 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Segment cluster Z44808_PEA_1_node_2 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Segment cluster Z44808_PEA_1_node_24 according to the present invention is supported by 52 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Segment cluster Z44808_PEA_1_node_32 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T4 and Z44808_PEA_1_T8. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Segment cluster Z44808_PEA_1_node_33 according to the present invention is supported by 133 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4 and Z44808_PEA_1_T5. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Segment cluster Z44808_PEA_1_node_36 according to the present invention is supported by 117 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4 and Z44808_PEA_1_T5. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Segment cluster Z44808_PEA_1_node_37 according to the present invention is supported by 120 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4 and Z44808_PEA_1_T5. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Segment cluster Z44808_PEA_1_node_41 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T9. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster Z44808_PEA_1_node_11 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Segment cluster Z44808_PEA_1_node_13 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Segment cluster Z44808_PEA_1_node_18 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Segment cluster Z44808_PEA_1_node_22 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Segment cluster Z44808_PEA_1_node_26 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T5. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Segment cluster Z44808_PEA_1_node_30 according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Transcript name      Segment starting position      Segment ending position
Z44808_PEA_1_T11     1820                           1857
Z44808_PEA_1_T4      1871                           1908
Z44808_PEA_1_T5      1966                           2003
Z44808_PEA_1_T8      1871                           1908
Z44808_PEA_1_T9      1871                           1908
Segment cluster Z44808_PEA_1_node_34 according to the present invention is supported by 70 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4 and Z44808_PEA_1_T5. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Segment cluster Z44808_PEA_1_node_35 according to the present invention can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4 and Z44808_PEA_1_T5. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
Segment cluster Z44808_PEA_1_node_39 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T9. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Segment cluster Z44808_PEA_1_node_4 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 35 below describes the starting and ending position of this segment on each transcript.
Table 35 - Segment location on transcripts
Transcript name      Segment starting position      Segment ending position
Z44808_PEA_1_T11     842                            948
Z44808_PEA_1_T4      842                            948
Z44808_PEA_1_T5      842                            948
Z44808_PEA_1_T8      842                            948
Z44808_PEA_1_T9      842                            948
Segment cluster Z44808_PEA_1_node_6 according to the present invention is supported by 30 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 36 below describes the starting and ending position of this segment on each transcript.
Segment cluster Z44808_PEA_1_node_8 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: /tmp/vUqLu6eAVZ/K3JDuPvaLo:SM02_HUMAN
Sequence documentation:
Alignment of: Z44808_PEA_1_P5 x SM02_HUMAN
Alignment segment 1/1:
Quality: 4440.00 Escore: 0 Matching length: 441 Total length: 441
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:

  1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50

 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100

101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150

151 PRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQ 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQ 200

201 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 250

251 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 300

301 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 350

351 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 400

401 DKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ 441
    |||||||||||||||||||||||||||||||||||||||||
401 DKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ 441
Sequence name: /tmp/QSUNfTsJ5y/kLOw5Vb6SD:SM02_HUMAN
Sequence documentation:
Alignment of: Z44808_PEA_1_P6 x SM02_HUMAN
Alignment segment 1/1:
Quality: 4310.00 Escore: 0 Matching length: 428 Total length: 428
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:

  1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50

 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100

101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150

151 PRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQ 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQ 200

201 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 250

251 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 300

301 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 350

351 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 400

401 DKSISVQELMGCLGVAKEDGKADTKKRH 428
    ||||||||||||||||||||||||||||
401 DKSISVQELMGCLGVAKEDGKADTKKRH 428

Sequence name: /tmp/MZVdR4PVdM/5uN8RwViJl:SM02_HUMAN
Sequence documentation:
Alignment of: Z44808_PEA_1_P7 x SM02_HUMAN
Alignment segment 1/1:
Quality: 4440.00 Escore: 0 Matching length: 441 Total length: 441
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:

  1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50

 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100

101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150

151 PRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQ 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQ 200

201 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 250

251 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 300

301 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 350

351 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 400

401 DKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ 441
    |||||||||||||||||||||||||||||||||||||||||
401 DKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ 441

Sequence name: /tmp/3fGVxqLloe/J5mQduAdOF:SM02_HUMAN
Sequence documentation:
Alignment of: Z44808_PEA_1_P11 x SM02_HUMAN
Alignment segment 1/1:
Quality: 4228.00 Escore: 0 Matching length: 429 Total length: 446
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 96.19 Total Percent Identity: 96.19 Gaps: 1
Alignment:

  1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50

 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100

101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150

151 PRCPGSVNEKLPQREGTGKT-----------------DIASRYPTLWTEQ 183
    ||||||||||||||||||||                 |||||||||||||
151 PRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQ 200

184 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 233
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 250

234 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 283
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 300

284 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 333
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 350

334 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 383
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 400
384 DKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQPRKQG 429
    ||||||||||||||||||||||||||||||||||||||||||||||
401 DKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQPRKQG 446

Expression of SM02_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) Z44808 transcripts, which are detectable by amplicon as depicted in sequence name Z44808 junc8-11, in normal and cancerous ovary tissues

Expression of SM02_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) transcripts detectable by or according to junc8-11, Z44808 junc8-11 amplicon(s) and Z44808 junc8-11F and Z44808 junc8-11R primers was measured by real time PCR. In parallel, the expression of four housekeeping genes was measured similarly: PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon). For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, 71, Table 1, "Tissue sample in testing panel", above). The reciprocal of this ratio was then calculated, to obtain a value of fold down-regulation for each sample relative to the median of the normal PM samples. Figure 33A is a histogram showing down regulation of the above-indicated SM02_HUMAN SPARC related modular calcium-binding protein 2 precursor transcripts in cancerous ovary samples relative to the normal samples.
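The normalization described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names and all numeric quantities in the usage lines are hypothetical, and the sketch assumes that each sample carries one quantity for the target amplicon and one quantity per housekeeping gene.

```python
from statistics import geometric_mean, median


def normalized_quantity(target_qty, housekeeping_qtys):
    """Normalize the target amplicon to the geometric mean of the housekeeping genes."""
    return target_qty / geometric_mean(housekeeping_qtys)


def fold_down_regulation(sample_norm, normal_norms):
    """Reciprocal of (sample / median of normal samples)."""
    ratio = sample_norm / median(normal_norms)
    return 1.0 / ratio


if __name__ == "__main__":
    # Hypothetical quantities, for illustration only.
    normals = [normalized_quantity(q, hk) for q, hk in [
        (10.0, [1.0, 1.0, 1.0, 1.0]),
        (12.0, [1.0, 1.0, 1.0, 1.0]),
        (8.0, [1.0, 1.0, 1.0, 1.0]),
    ]]
    tumour = normalized_quantity(2.0, [1.0, 1.0, 1.0, 1.0])
    print(fold_down_regulation(tumour, normals))
```

A value above 1 indicates lower expression than the median normal sample; a sample at or above the 5-fold threshold used in the text would return 5 or more.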
As is evident from Figure 33A, the expression of SM02_HUMAN SPARC related modular calcium-binding protein 2 precursor transcripts detectable by the above amplicon(s) in cancer samples was significantly lower than in the non-cancerous samples (Sample Nos. 45-48, 71, Table 1, "Tissue sample in testing panel"). Notably, down regulation of at least 5-fold was found in 33 out of 43 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of SM02_HUMAN SPARC related modular calcium-binding protein 2 precursor transcripts detectable by the above amplicon(s) in ovary cancer samples versus the normal tissue samples was determined by T test as 4.47E-05. A threshold of 5-fold down regulation was found to differentiate between cancer and normal samples with a P value of 1.75E-03, as checked by exact Fisher test. The above values demonstrate statistical significance of the results. Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Z44808 junc8-11F forward primer; and Z44808 junc8-11R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Z44808 junc8-11.
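As an illustration only, the threshold test can be sketched as a one-sided Fisher exact test on a 2x2 table. The count of 33 out of 43 adenocarcinoma samples is taken from the text above. The assumption that none of the five normal samples (Sample Nos. 45-48, 71) reached 5-fold down regulation is not stated in the application and is made here only to complete the table; the function name is likewise hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Returns the hypergeometric probability of observing a or more counts in
    the top-left cell, with all row and column totals held fixed.
    """
    row1 = a + b
    col1 = a + c
    total = a + b + c + d
    denom = comb(total, row1)
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        p += comb(col1, k) * comb(total - col1, row1 - k) / denom
    return p


if __name__ == "__main__":
    # Rows: cancer, normal. Columns: >= 5-fold down, < 5-fold down.
    # The normal row (0, 5) is an assumption, not a figure from the text.
    print(f"{fisher_one_sided(33, 10, 0, 5):.2e}")
```

Under these assumptions the sketch gives a P value of about 1.75E-03, which appears compatible with the value reported above.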
Z44808 junc8-11 Forward primer (SEQ ID NO: 1004):
GAAGGCACAGGAAAAACAGATATTG
Z44808 junc8-11 Reverse primer (SEQ ID NO: 1005):
TGGTGCTCTTGGTCACAGGAT
Z44808 junc8-11 Amplicon (SEQ ID NO: 1006):
GAAGGCACAGGAAAAACAGATATTGCATCACGTTACCCTACCCTTTGGACTGAACA
GGTTAAAAGTCGGCAGAACAAAACCAATAAGAATTCAGTGTCATCCTGTGACCAAG
AGCACCA

Expression of SM02_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) Z44808 transcripts, which are detectable by amplicon as depicted in sequence name Z44808 junc8-11, in different normal tissues

Expression of SM02_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) transcripts detectable by or according to Z44808 junc8-11 amplicon(s) and primers Z44808 junc8-11F and Z44808 junc8-11R was measured by real time PCR. In parallel, the expression of four housekeeping genes was measured similarly: RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon). For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 2: Tissue samples in normal panel, above), to obtain a value of relative expression of each sample relative to the median of the ovary samples. Results are shown in Figure 33B. Primers and amplicon are as above.
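The relationship between the primer pair and the amplicon listed above can be checked with a short sketch. It assumes the usual convention that the amplicon begins with the forward primer and ends with the reverse complement of the reverse primer. The sequences are copied from SEQ ID NOs: 1004-1006 as printed above; the helper names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


FORWARD = "GAAGGCACAGGAAAAACAGATATTG"      # SEQ ID NO: 1004
REVERSE = "TGGTGCTCTTGGTCACAGGAT"          # SEQ ID NO: 1005
AMPLICON = (                               # SEQ ID NO: 1006
    "GAAGGCACAGGAAAAACAGATATTGCATCACGTTACCCTACCCTTTGGACTGAACA"
    "GGTTAAAAGTCGGCAGAACAAAACCAATAAGAATTCAGTGTCATCCTGTGACCAAG"
    "AGCACCA"
)


def primers_flank(amplicon, forward, reverse):
    """True if the amplicon starts with forward and ends with revcomp(reverse)."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


if __name__ == "__main__":
    print(primers_flank(AMPLICON, FORWARD, REVERSE))
```

A check of this kind is a quick way to catch transcription errors in a primer or amplicon listing before ordering oligonucleotides.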
DESCRIPTION FOR CLUSTER S67314
Cluster S67314 features 4 transcript(s) and 8 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Fatty acid-binding protein, heart (SwissProt accession identifier FABH_HUMAN; known also according to the synonyms H-FABP; Muscle fatty acid-binding protein; M-FABP; Mammary-derived growth inhibitor; MDGI), SEQ ID NO: 650, referred to herein as the previously known protein. Protein Fatty acid-binding protein is known or believed to have the following function(s): FABP are thought to play a role in the intracellular transport of long-chain fatty acids and their acyl-CoA esters. The sequence for protein Fatty acid-binding protein is given at the end of the application, as "Fatty acid-binding protein, heart amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Fatty acid-binding protein localization is believed to be cytoplasmic. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: negative control of cell proliferation, which are annotation(s) related to Biological Process; and lipid binding, which are annotation(s) related to Molecular Function. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>. As noted above, cluster S67314 features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Fatty acid-binding protein. A description of each variant protein according to the present invention is now provided.
Variant protein S67314_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S67314_PEA_1_T4. An alignment is given to the known protein (Fatty acid-binding protein) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between S67314_PEA_1_P4 and FABH_HUMAN:
1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P4, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTF KNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRWATLELYLIGYYYCSFSQACSKKPSPPLRAVEAGTREWLWVRVVSGGNFLCSGFGL TQAGTQILPYRLHDCGQITFSKCNCKTGINNTNLVGLLGSL corresponding to amino acids 117 - 215 of S67314_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of S67314_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRWATLELYLIGYYYCSFSQACSKKPSPPLRAVEAGTREWLWVRVVSGGNFLCSGFGL TQAGTQILPYRLHDCGQITFSKCNCKTGINNTNLVGLLGSL in S67314_PEA_1_P4.
Comparison report between S67314_PEA_1_P4 and AAP35373 (SEQ ID NO: 1007):
1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTF KNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of AAP35373, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRWATLELYLIGYYYCSFSQACSKKPSPPLRAVEAGTREWLWVRVVSGGNFLCSGFGL TQAGTQILPYRLHDCGQITFSKCNCKTGINNTNLVGLLGSL corresponding to amino acids 117 - 215 of S67314_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of S67314_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRWATLELYLIGYYYCSFSQACSKKPSPPLRAVEAGTREWLWVRVVSGGNFLCSGFGL TQAGTQILPYRLHDCGQITFSKCNCKTGINNTNLVGLLGSL in S67314_PEA_1_P4.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein S67314_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Amino acid mutations
Variant protein S67314_PEA_1_P4 is encoded by the following transcript(s): S67314_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S67314_PEA_1_T4 is shown in bold; this coding portion starts at position 925 and ends at position 1569. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Nucleic acid SNPs
Variant protein S67314_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S67314_PEA_1_T5. An alignment is given to the known protein (Fatty acid-binding protein, heart) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between S67314_PEA_1_P5 and FABH_HUMAN:
1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P5, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSYNNHGWEELRVGKSIV corresponding to amino acids 117 - 178 of S67314_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of S67314_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSYNNHGWEELRVGKSIV in S67314_PEA_1_P5.
Comparison report between S67314_PEA_1_P5 and AAP35373:
1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of AAP35373, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSYNNHGWEELRVGKSIV corresponding to amino acids 117 - 178 of S67314_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of S67314_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSYNNHGWEELRVGKSIV in S67314_PEA_1_P5.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein S67314_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
Variant protein S67314_PEA_1_P5 is encoded by the following transcript(s): S67314_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S67314_PEA_1_T5 is shown in bold; this coding portion starts at position 925 and ends at position 1458. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Variant protein S67314_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S67314_PEA_1_T6. An alignment is given to the known protein (Fatty acid-binding protein, heart) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between S67314_PEA_1_P6 and FABH_HUMAN:
1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P6, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MEKLQLRNVK corresponding to amino acids 117 - 126 of S67314_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of S67314_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MEKLQLRNVK in S67314_PEA_1_P6.
Comparison report between S67314_PEA_1_P6 and AAP35373:
1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of AAP35373, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MEKLQLRNVK corresponding to amino acids 117 - 126 of S67314_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of S67314_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MEKLQLRNVK in S67314_PEA_1_P6.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein S67314_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Amino acid mutations
Variant protein S67314_PEA_1_P6 is encoded by the following transcript(s): S67314_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S67314_PEA_1_T6 is shown in bold; this coding portion starts at position 925 and ends at position 1302. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Variant protein S67314_PEA_1_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S67314_PEA_1_T7. An alignment is given to the known protein (Fatty acid-binding protein) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between S67314_PEA_1_P7 and FABH_HUMAN:
1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MVDAFLGTWKLVDSKNFDDYMKSL corresponding to amino acids 1 - 24 of FABH_HUMAN, which also corresponds to amino acids 1 - 24 of S67314_PEA_1_P7, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AHILITFPLPS corresponding to amino acids 25 - 35 of S67314_PEA_1_P7, and a third amino acid sequence being at least 90% homologous to GVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA corresponding to amino acids 25 - 133 of FABH_HUMAN, which also corresponds to amino acids 36 - 144 of S67314_PEA_1_P7, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of S67314_PEA_1_P7, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for AHILITFPLPS, corresponding to S67314_PEA_1_P7.
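The three-part structure recited in the comparison report above can be checked by simple concatenation. The following is a minimal sketch, not part of the original disclosure: the helper name `assemble` is hypothetical, and the three segment strings are copied from the report. It rebuilds the variant from its head, unique insertion and tail, and reports the 1-based inclusive range each segment occupies.

```python
# Segments of variant S67314_PEA_1_P7 as recited in the comparison report.
HEAD = "MVDAFLGTWKLVDSKNFDDYMKSL"   # amino acids 1-24 of the known protein
INSERT = "AHILITFPLPS"              # unique edge portion of the variant
TAIL = (
    "GVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSI"
    "VTLDGGKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA"
)                                   # amino acids 25-133 of the known protein


def assemble(*segments):
    """Concatenate contiguous segments in order.

    Returns the full sequence and the 1-based inclusive (start, end)
    range that each segment occupies in it. Hypothetical helper.
    """
    sequence = ""
    ranges = []
    position = 1
    for segment in segments:
        ranges.append((position, position + len(segment) - 1))
        sequence += segment
        position += len(segment)
    return sequence, ranges


variant, ranges = assemble(HEAD, INSERT, TAIL)
print(len(variant), ranges)
```

The ranges produced this way can be compared directly with the amino acid positions stated in the report (1 - 24, 25 - 35 and 36 - 144).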
Comparison report between S67314_PEA_1_P7 and AAP35373:
1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MVDAFLGTWKLVDSKNFDDYMKSL corresponding to amino acids 1 - 24 of AAP35373, which also corresponds to amino acids 1 - 24 of S67314_PEA_1_P7, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AHILITFPLPS corresponding to amino acids 25 - 35 of S67314_PEA_1_P7, and a third amino acid sequence being at least 90% homologous to GVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA corresponding to amino acids 25 - 133 of AAP35373, which also corresponds to amino acids 36 - 144 of S67314_PEA_1_P7, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of S67314_PEA_1_P7, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for AHILITFPLPS, corresponding to S67314_PEA_1_P7.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein S67314_PEA_1_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
Variant protein S67314_PEA_1_P7 is encoded by the following transcript(s): S67314_PEA_1_T7, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S67314_PEA_1_T7 is shown in bold; this coding portion starts at position 925 and ends at position 1356. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Nucleic acid SNPs
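The coding-portion coordinates given for the four S67314 transcripts can be cross-checked against the lengths of the variant proteins they encode. The sketch below is illustrative only. It assumes that the stated positions are 1-based and inclusive and that the coding portion does not include the stop codon; the text does not state either point, although the numbers are consistent with both. The function name `codon_count` and the table `CODING` are hypothetical.

```python
# transcript -> (coding start, coding end, length of encoded variant protein),
# all values as stated in the text for cluster S67314.
CODING = {
    "S67314_PEA_1_T4": (925, 1569, 215),
    "S67314_PEA_1_T5": (925, 1458, 178),
    "S67314_PEA_1_T6": (925, 1302, 126),
    "S67314_PEA_1_T7": (925, 1356, 144),
}


def codon_count(start, end):
    """Number of codons in a coding portion given 1-based inclusive positions.

    Assumption: the coding portion excludes the stop codon.
    """
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"coding portion {start}-{end} is not a whole number of codons")
    return length // 3


for transcript, (start, end, protein_length) in CODING.items():
    codons = codon_count(start, end)
    status = "consistent" if codons == protein_length else "MISMATCH"
    print(f"{transcript}: {codons} codons, {protein_length} amino acids ({status})")
```

Under these assumptions each transcript yields exactly as many codons as the corresponding variant protein has amino acids.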
As noted above, cluster S67314 features 8 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster S67314_PEA_1_node_0 according to the present invention is supported by 90 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S67314_PEA_1_T4, S67314_PEA_1_T5, S67314_PEA_1_T6 and S67314_PEA_1_T7. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Segment cluster S67314_PEA_1_node_11 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S67314_PEA_1_T4. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
Transcript name   Segment starting position   Segment ending position
Segment cluster S67314_PEA_1_node_13 according to the present invention is supported by 76 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S67314_PEA_1_T7. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Segment cluster S67314_PEA_1_node_15 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S67314_PEA_1_T5. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Segment cluster S67314_PEA_1_node_17 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S67314_PEA_1_T6. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (with regard to ovarian cancer), shown in Table 18.
Table 18 - Oligonucleotides related to this segment
Segment cluster S67314_PEA_1_node_4 according to the present invention is supported by 101 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S67314_PEA_1_T4, S67314_PEA_1_T5, S67314_PEA_1_T6 and S67314_PEA_1_T7. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster S67314_PEA_1_node_10 according to the present invention is supported by 64 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S67314_PEA_1_T4, S67314_PEA_1_T5, S67314_PEA_1_T6 and S67314_PEA_1_T7. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Segment cluster S67314_PEA_1_node_3 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S67314_PEA_1_T7. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: /tmp/EQ0nMn6tqU/R73CUVKUk5:FABH_HUMAN
Sequence documentation:
Alignment of: S67314_PEA_1_P4 x FABH_HUMAN
Alignment segment 1/1:
Quality: 1095.00 Escore: Matching length: 115 Total length: 115 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  2 VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILT 51
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILT 50
 52 LKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQ 101
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQ 100
102 ETTLVRELIDGKLIL 116
    |||||||||||||||
101 ETTLVRELIDGKLIL 115
Sequence name: /tmp/EQ0nMn6tqU/R73CUVKUk5:AAP35373
Sequence documentation:
Alignment of: S67314_PEA_1_P4 x AAP35373
Alignment segment 1/1:
Quality: 1107.00 Escore: 0 Matching length: 116 Total length: 116 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDIL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDIL 50
 51 TLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDG 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 TLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDG 100
101 QETTLVRELIDGKLIL 116
    ||||||||||||||||
101 QETTLVRELIDGKLIL 116
Sequence name: /tmp/ql4YPIBbdQ/SeofJfCmJW:FABH_HUMAN
Sequence documentation:
Alignment of: S67314_PEA_1_P5 x FABH_HUMAN
Alignment segment 1/1:
Quality: 1095.00 Escore: 0 Matching length: 115 Total length: 115 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  2 VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILT 51
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILT 50
 52 LKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQ 101
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQ 100
102 ETTLVRELIDGKLIL 116
    |||||||||||||||
101 ETTLVRELIDGKLIL 115
Sequence name: /tmp/ql4YPIBbdQ/SeofJfCmJW:AAP35373
Sequence documentation:
Alignment of: S67314_PEA_1_P5 x AAP35373
Alignment segment 1/1:
Quality: 1107.00 Escore: 0 Matching length: 116 Total length: 116 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDIL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDIL 50
 51 TLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDG 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 TLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDG 100
101 QETTLVRELIDGKLIL 116
    ||||||||||||||||
101 QETTLVRELIDGKLIL 116
Sequence name: /tmp/PXra2DxLlv/Q8GTrzNMVX:FABH_HUMAN
Sequence documentation:
Alignment of: S67314_PEA_1_P6 x FABH_HUMAN
Alignment segment 1/1:
Quality: 1095.00 Escore: 0 Matching length: 115 Total length: 115 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  2 VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILT 51
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILT 50
 52 LKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQ 101
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQ 100
102 ETTLVRELIDGKLIL 116
    |||||||||||||||
101 ETTLVRELIDGKLIL 115
Sequence name: /tmp/PXra2DxLlv/Q8GTrzNMVX:AAP35373
Sequence documentation:
Alignment of: S67314_PEA_1_P6 x AAP35373
Alignment segment 1/1:
Quality: 1107.00 Escore: 0 Matching length: 116 Total length: 116 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDIL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDIL 50
 51 TLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDG 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 TLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDG 100
101 QETTLVRELIDGKLIL 116
    ||||||||||||||||
101 QETTLVRELIDGKLIL 116
Sequence name: /tmp/xYzWyViDom/twDu3T69pd:FABH_HUMAN
Sequence documentation:
Alignment of: S67314_PEA_1_P7 x FABH_HUMAN
Alignment segment 1/1:
Quality: 1160.00 Escore: 0 Matching length: 132 Total length: 143 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 92.31 Total Percent Identity: 92.31 Gaps: 1
Alignment:
  2 VDAFLGTWKLVDSKNFDDYMKSLAHILITFPLPSGVGFATRQVASMTKPT 51
    |||||||||||||||||||||||           ||||||||||||||||
  1 VDAFLGTWKLVDSKNFDDYMKSL...........GVGFATRQVASMTKPT 39
 52 TIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGG 101
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 40 TIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGG 89
102 KLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA 144
    |||||||||||||||||||||||||||||||||||||||||||
 90 KLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA 132
Sequence name: /tmp/xYzWyViDom/twDu3T69pd:AAP35373
Sequence documentation:
Alignment of: S67314_PEA_1_P7 x AAP35373
Alignment segment 1/1:
Quality: 1172.00 Escore: 0 Matching length: 133 Total length: 144 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 92.36 Total Percent Identity: 92.36 Gaps: 1
Alignment:
  1 MVDAFLGTWKLVDSKNFDDYMKSLAHILITFPLPSGVGFATRQVASMTKP 50
    ||||||||||||||||||||||||           |||||||||||||||
  1 MVDAFLGTWKLVDSKNFDDYMKSL...........GVGFATRQVASMTKP 39
 51 TTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDG 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 40 TTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDG 89
101 GKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA 144
    ||||||||||||||||||||||||||||||||||||||||||||
 90 GKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA 133
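The "Total Percent Identity" values reported for the S67314_PEA_1_P7 alignments can be reproduced from the matching length and total length. The sketch below assumes, as the reported figures suggest but the text does not state, that the total value is the matching percent identity scaled by the ratio of matching length to total length. The function name `total_percent_identity` is hypothetical.

```python
def total_percent_identity(matching_length, total_length, matching_percent_identity=100.0):
    """Scale the identity over aligned (non-gap) columns to the full alignment length.

    Assumption: total = matching percent identity * matching length / total length,
    rounded to two decimals as in the alignment headers.
    """
    if total_length <= 0 or not 0 <= matching_length <= total_length:
        raise ValueError("lengths must satisfy 0 <= matching_length <= total_length, total_length > 0")
    return round(matching_percent_identity * matching_length / total_length, 2)


# Values taken from the alignment headers in this section.
print(total_percent_identity(132, 143))  # S67314_PEA_1_P7 x FABH_HUMAN
print(total_percent_identity(133, 144))  # S67314_PEA_1_P7 x AAP35373
print(total_percent_identity(115, 115))  # gap-free alignments
```

In both gapped alignments the difference between total length and matching length is 11, which equals the length of the unique insertion AHILITFPLPS.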
DESCRIPTION FOR CLUSTER Z39337
Cluster Z39337 features 3 transcript(s) and 12 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Kallikrein 6 precursor (SwissProt accession identifier KLK6_HUMAN; known also according to the synonyms EC 3.4.21.-; Protease M; Neurosin; Zyme; SP59), SEQ ID NO: 670, referred to herein as the previously known protein. The sequence for protein Kallikrein 6 precursor is given at the end of the application, as "Kallikrein 6 precursor amino acid sequence". Protein Kallikrein 6 precursor localization is believed to be secreted. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: central nervous system development; response to wounding; protein autoprocessing, which are annotation(s) related to Biological Process; chymotrypsin; tissue kallikrein; trypsin; protein binding; hydrolase, which are annotation(s) related to Molecular Function; and extracellular; cytoplasm, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster Z39337 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 34 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million). Overall, the following results were obtained as shown with regard to the histograms in Figure 34 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors and gastric carcinoma.
Table 4 - Normal tissue distribution
Table 5 - P values and ratios for expression in cancerous tissue
As noted above, cluster Z39337 features 3 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Kallikrein 6 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein Z39337_PEA_2_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z39337_PEA_2_PEA_1_T3. An alignment is given to the known protein (Kallikrein 6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z39337_PEA_2_PEA_1_P4 and KLK6_HUMAN:
1. An isolated chimeric polypeptide encoding for Z39337_PEA_2_PEA_1_P4, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWLPLSGAA corresponding to amino acids 1 - 9 of Z39337_PEA_2_PEA_1_P4, and a second amino acid sequence being at least 90% homologous to MKKLMVVLSLIAAAWAEEQNKLVHGGPCDKTSHPYQAALYTSGHLLCGGVLIHPLWVLTAAHCKKPNLQVFLGKHNLRQRESSQEQSSVVRAVIHPDYDAASHDQDIMLLRLARPAKLSELIQPLPLERDCSANTTSCHILGWGKTADGDFPDTIQCAYIHLVSREECEHAYPGQITQNMLCAGDEKYGKDSCQGDSGGPLVCGDHLRGLVSWGNIPCGSKEKPGVYTNVCRYTNWIQKTIQAK corresponding to amino acids 1 - 244 of KLK6_HUMAN, which also corresponds to amino acids 10 - 253 of Z39337_PEA_2_PEA_1_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of Z39337_PEA_2_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWLPLSGAA of Z39337_PEA_2_PEA_1_P4.
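The comparison reports relate positions in the known protein to positions in the variant by a fixed shift after the point where unique sequence is added. The following sketch illustrates that arithmetic only; the function name `known_to_variant` and its parameters are hypothetical and are not taken from the application.

```python
def known_to_variant(known_pos, insert_after, insert_len):
    """Map a 1-based position in the known protein to the variant protein.

    insert_after: last known-protein position preceding the unique sequence
                  (0 for a unique head placed before the known protein).
    insert_len:   number of unique residues added at that point.
    """
    if known_pos < 1:
        raise ValueError("positions are 1-based")
    if known_pos <= insert_after:
        return known_pos
    return known_pos + insert_len


# Z39337_PEA_2_PEA_1_P4: 9-residue unique head (MWLPLSGAA) before KLK6_HUMAN 1-244.
print(known_to_variant(1, insert_after=0, insert_len=9),
      known_to_variant(244, insert_after=0, insert_len=9))

# S67314_PEA_1_P7: 11-residue insertion (AHILITFPLPS) after known position 24.
print(known_to_variant(25, insert_after=24, insert_len=11),
      known_to_variant(133, insert_after=24, insert_len=11))
```

The mapped endpoints can be compared with the ranges recited in the reports (10 - 253 for Z39337_PEA_2_PEA_1_P4 and 36 - 144 for S67314_PEA_1_P7).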
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein Z39337_PEA_2_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z39337_PEA_2_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Amino acid mutations
The glycosylation sites of variant protein Z39337_PEA_2_PEA_1_P4, as compared to the known protein Kallikrein 6 precursor, are described in Table 7 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 7 - Glycosylation site(s)
Variant protein Z39337_PEA_2_PEA_1_P4 is encoded by the following transcript(s): Z39337_PEA_2_PEA_1_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z39337_PEA_2_PEA_1_T3 is shown in bold; this coding portion starts at position 87 and ends at position 845. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z39337_PEA_2_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Variant protein Z39337_PEA_2_PEA_1_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z39337_PEA_2_PEA_1_T12. An alignment is given to the known protein (Kallikrein 6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z39337_PEA_2_PEA_1_P9 and KLK6_HUMAN:
1. An isolated chimeric polypeptide encoding for Z39337_PEA_2_PEA_1_P9, comprising a first amino acid sequence being at least 90% homologous to MKKLMVVLSLIAAAWAEEQNKLVHGGPCDKTSHPYQAALYTSGHLLCGGVLIHPLWVLTAAHCKKPNLQVFLGKHNLRQRESSQEQSSVVRAVIHPDYDAASHDQDIMLLRLARPAKLSELIQPLPLERDCSANTTSCHILGWGKTADG corresponding to amino acids 1 - 149 of KLK6_HUMAN, which also corresponds to amino acids 1 - 149 of Z39337_PEA_2_PEA_1_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence Q corresponding to amino acids 150 - 150 of Z39337_PEA_2_PEA_1_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
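The head-plus-tail structure recited in the comparison report can be checked mechanically. The following is a minimal Python sketch, not part of the application, that joins the 149-residue first sequence to the single-residue second sequence and reports the resulting lengths; the names `KLK6_HEAD`, `P9_TAIL` and `assemble_variant` are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; all names here are hypothetical.

# First amino acid sequence: amino acids 1 - 149 of KLK6_HUMAN, as recited in the text.
KLK6_HEAD = (
    "MKKLMVVLSLIAAAWAEEQNKLVHGGPCDKTSHPYQAALYTSGHLLCGGV"
    "LIHPLWVLTAAHCKKPNLQVFLGKHNLRQRESSQEQSSVVRAVIHPDYDA"
    "ASHDQDIMLLRLARPAKLSELIQPLPLERDCSANTTSCHILGWGKTADG"
)
# Second amino acid sequence: amino acid 150 of Z39337_PEA_2_PEA_1_P9.
P9_TAIL = "Q"


def assemble_variant(head: str, tail: str) -> str:
    """Join two sequences so that they are contiguous and in sequential order."""
    return head + tail


variant_p9 = assemble_variant(KLK6_HEAD, P9_TAIL)
print(len(KLK6_HEAD), len(variant_p9))
```

The same check applies to any of the head-plus-tail variants described in this section by substituting the recited sequences.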
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide.
The glycosylation sites of variant protein Z39337_PEA_2_PEA_1_P9, as compared to the known protein Kallikrein 6 precursor, are described in Table 9 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 9 - Glycosylation site(s)
Variant protein Z39337_PEA_2_PEA_1_P9 is encoded by the following transcript(s): Z39337_PEA_2_PEA_1_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z39337_PEA_2_PEA_1_T12 is shown in bold; this coding portion starts at position 298 and ends at position 747. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z39337_PEA_2_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Variant protein Z39337_PEA_2_PEA_1_P13 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z39337_PEA_2_PEA_1_T6. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein Z39337_PEA_2_PEA_1_P13 is encoded by the following transcript(s): Z39337_PEA_2_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z39337_PEA_2_PEA_1_T6 is shown in bold; this coding portion starts at position 298 and ends at position 417. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z39337_PEA_2_PEA_1_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Nucleic acid SNPs
above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster Z39337_PEA_2_PEA_1_node_2 according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T6 and Z39337_PEA_2_PEA_1_T12. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
Segment cluster Z39337_PEA_2_PEA_1_node_15 according to the present invention is supported by 54 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T3, Z39337_PEA_2_PEA_1_T6 and Z39337_PEA_2_PEA_1_T12. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Segment cluster Z39337_PEA_2_PEA_1_node_16 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T12. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
Segment cluster Z39337_PEA_2_PEA_1_node_18 according to the present invention is supported by 53 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T3 and Z39337_PEA_2_PEA_1_T6. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Segment cluster Z39337_PEA_2_PEA_1_node_21 according to the present invention is supported by 81 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T3 and Z39337_PEA_2_PEA_1_T6. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (with regard to ovarian cancer), shown in Table 17.
Table 17 - Oligonucleotides related to this segment
Segment cluster Z39337_PEA_2_PEA_1_node_22 according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T3 and Z39337_PEA_2_PEA_1_T6. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster Z39337_PEA_2_PEA_1_node_3 according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T6 and Z39337_PEA_2_PEA_1_T12. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Segment cluster Z39337_PEA_2_PEA_1_node_5 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T3. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Segment cluster Z39337_PEA_2_PEA_1_node_6 according to the present invention is supported by 56 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T3, Z39337_PEA_2_PEA_1_T6 and Z39337_PEA_2_PEA_1_T12. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Segment cluster Z39337_PEA_2_PEA_1_node_10 according to the present invention is supported by 60 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T3 and Z39337_PEA_2_PEA_1_T12. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Segment cluster Z39337_PEA_2_PEA_1_node_11 according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T3 and Z39337_PEA_2_PEA_1_T12. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Segment cluster Z39337_PEA_2_PEA_1_node_14 according to the present invention is supported by 49 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39337_PEA_2_PEA_1_T3, Z39337_PEA_2_PEA_1_T6 and Z39337_PEA_2_PEA_1_T12. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: KLK6_HUMAN
Sequence documentation:
Alignment of: Z39337_PEA_2_PEA_1_P4 x KLK6_HUMAN
Alignment segment 1/1:
Quality: 2444.00 Escore: Matching length: 244 Total length: 244 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
 10 MKKLMVVLSLIAAAWAEEQNKLVHGGPCDKTSHPYQAALYTSGHLLCGGV 59
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MKKLMVVLSLIAAAWAEEQNKLVHGGPCDKTSHPYQAALYTSGHLLCGGV 50
 60 LIHPLWVLTAAHCKKPNLQVFLGKHNLRQRESSQEQSSVVRAVIHPDYDA 109
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LIHPLWVLTAAHCKKPNLQVFLGKHNLRQRESSQEQSSVVRAVIHPDYDA 100
110 ASHDQDIMLLRLARPAKLSELIQPLPLERDCSANTTSCHILGWGKTADGD 159
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 ASHDQDIMLLRLARPAKLSELIQPLPLERDCSANTTSCHILGWGKTADGD 150
160 FPDTIQCAYIHLVSREECEHAYPGQITQNMLCAGDEKYGKDSCQGDSGGP 209
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 FPDTIQCAYIHLVSREECEHAYPGQITQNMLCAGDEKYGKDSCQGDSGGP 200
210 LVCGDHLRGLVSWGNIPCGSKEKPGVYTNVCRYTNWIQKTIQAK 253
    ||||||||||||||||||||||||||||||||||||||||||||
201 LVCGDHLRGLVSWGNIPCGSKEKPGVYTNVCRYTNWIQKTIQAK 244
Sequence name: KLK6_HUMAN
Sequence documentation:
Alignment of: Z39337_PEA_2_PEA_1_P9 x KLK6_HUMAN
Alignment segment 1/1:
Quality: 1471.00 Escore: 0 Matching length: 149 Total length: 149 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MKKLMVVLSLIAAAWAEEQNKLVHGGPCDKTSHPYQAALYTSGHLLCGGV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MKKLMVVLSLIAAAWAEEQNKLVHGGPCDKTSHPYQAALYTSGHLLCGGV 50
 51 LIHPLWVLTAAHCKKPNLQVFLGKHNLRQRESSQEQSSVVRAVIHPDYDA 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LIHPLWVLTAAHCKKPNLQVFLGKHNLRQRESSQEQSSVVRAVIHPDYDA 100
101 ASHDQDIMLLRLARPAKLSELIQPLPLERDCSANTTSCHILGWGKTADG 149
    |||||||||||||||||||||||||||||||||||||||||||||||||
101 ASHDQDIMLLRLARPAKLSELIQPLPLERDCSANTTSCHILGWGKTADG 149
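The "Matching Percent Identity" figures reported with each alignment can be illustrated with a simple calculation. The application does not state how the alignment program computes its statistics, so the following Python sketch is only an assumption: it treats percent identity as the share of positions holding the same residue in an ungapped, equal-length comparison. The function name `percent_identity` and the variable names are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions with identical residues in an ungapped comparison.

    Assumption: both sequences are already aligned and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length for an ungapped comparison")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Last row of the Z39337_PEA_2_PEA_1_P9 x KLK6_HUMAN alignment (positions 101 - 149).
row_variant = "ASHDQDIMLLRLARPAKLSELIQPLPLERDCSANTTSCHILGWGKTADG"
row_known = "ASHDQDIMLLRLARPAKLSELIQPLPLERDCSANTTSCHILGWGKTADG"

print(f"{percent_identity(row_variant, row_known):.2f}")
```

A gapped alignment would need a convention for counting gap columns, which is why this sketch rejects sequences of unequal length rather than guessing one.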
DESCRIPTION FOR CLUSTER HUMPHOSLIP
Cluster HUMPHOSLIP features 7 transcript(s) and 53 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Phospholipid transfer protein precursor (SwissProt accession identifier PLTP_HUMAN; known also according to the synonyms Lipid transfer protein II), SEQ ID NO: 734, referred to herein as the previously known protein. Protein Phospholipid transfer protein precursor is known or believed to have the following function(s): Converts HDL into larger and smaller particles. May play a key role in extracellular phospholipid transport and modulation of HDL particles. The sequence for protein Phospholipid transfer protein precursor is given at the end of the application, as "Phospholipid transfer protein precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Phospholipid transfer protein precursor localization is believed to be Secreted. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: lipid metabolism; lipid transport, which are annotation(s) related to Biological Process; lipid binding, which are annotation(s) related to Molecular Function; and extracellular, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
For this cluster, at least one oligonucleotide was found to demonstrate overexpression of the cluster, although not of at least one transcript/segment as listed below. Microarray (chip) data is also available for this cluster as follows. Various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer, as previously described. The following oligonucleotides were found to hit this cluster but not other segments/transcripts below (with regard to ovarian cancer), shown in Table 5.
Table 5 - Oligonucleotides related to this cluster
Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Phospholipid transfer protein precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HUMPHOSLIP_PEA_2_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T17. An alignment is given to the known protein (Phospholipid transfer protein precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMPHOSLIP_PEA_2_P10 and PLTP_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P10, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISE corresponding to amino acids 1 - 67 of PLTP_HUMAN, which also corresponds to amino acids 1 - 67 of HUMPHOSLIP_PEA_2_P10, and a second amino acid sequence being at least 90% homologous to KVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLLDTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPNRAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLDMLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASVTIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHSALESLALIPLQAPLKTMLQIGVMPMLNERTWRGVQIPLPEGPNFVHEVVTNHAGFLTIGADLHFAKGLREVIEKNRPADVRASTAPTPSTAAV corresponding to amino acids 163 - 493 of PLTP_HUMAN, which also corresponds to amino acids 68 - 398 of HUMPHOSLIP_PEA_2_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HUMPHOSLIP_PEA_2_P10, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EK, having a structure as follows: a sequence starting from any of amino acid numbers 67-x to 67; and ending at any of amino acid numbers 68 + ((n-2) - x), in which x varies from 0 to n-2.
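The edge-portion formula above defines, for a given length n, every window of n contiguous amino acids that contains both residues flanking the junction (positions 67 and 68, the residues E and K). The following Python sketch enumerates those windows as an illustration only; the function name `edge_windows`, its parameters, and the bounds check against a 398-residue protein (the last position recited for the variant) are assumptions introduced here, not part of the application.

```python
from typing import List, Tuple


def edge_windows(n: int, left: int = 67, protein_length: int = 398) -> List[Tuple[int, int]]:
    """Return (start, end) positions, 1-based and inclusive, of each length-n
    window spanning the junction between residues `left` and `left + 1`."""
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    right = left + 1
    windows = []
    for x in range(n - 1):               # x varies from 0 to n-2
        start = left - x                 # 67 - x
        end = right + ((n - 2) - x)      # 68 + ((n-2) - x)
        # Hypothetical safeguard: keep only windows inside the protein.
        if start >= 1 and end <= protein_length:
            windows.append((start, end))
    return windows


for start, end in edge_windows(10):
    print(start, end)
```

For every x the window length is end - start + 1 = n, so the formula yields n - 1 candidate windows, each containing both junction residues.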
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMPHOSLIP_PEA_2_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Amino acid mutations
The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P10, as compared to the known protein Phospholipid transfer protein precursor, are described in Table 7 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 7 - Glycosylation site(s)
Variant protein HUMPHOSLIP_PEA_2_P10 is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T17, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T17 is shown in bold; this coding portion starts at position 276 and ends at position 1469. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
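The coding-portion coordinates can be cross-checked against the recited protein lengths. The Python sketch below is an illustration under two assumptions that the application does not state explicitly: that the start and end positions are 1-based and inclusive, and that the stated range does not include a stop codon. The figures in this section are consistent with that reading. The function name `coding_length_in_codons` and the dictionary `coding_portions` are hypothetical.

```python
def coding_length_in_codons(start: int, end: int) -> int:
    """Number of codons in a coding portion given 1-based inclusive positions.

    Assumption: the range covers whole codons and excludes any stop codon.
    """
    length_nt = end - start + 1
    if length_nt <= 0:
        raise ValueError("end must not precede start")
    if length_nt % 3 != 0:
        raise ValueError("coding portion length is not a multiple of three")
    return length_nt // 3


# Coordinates as stated in the text for two of the transcripts.
coding_portions = {
    "HUMPHOSLIP_PEA_2_T17": (276, 1469),   # variant recited as amino acids 1 - 398
    "Z39337_PEA_2_PEA_1_T12": (298, 747),  # variant recited as amino acids 1 - 150
}

for name, (start, end) in coding_portions.items():
    print(name, coding_length_in_codons(start, end))
```

A mismatch between the codon count and the recited amino acid range would point to a transcription error in either the coordinates or the sequence.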
Variant protein HUMPHOSLIP_PEA_2_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T19. An alignment is given to the known protein (Phospholipid transfer protein precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMPHOSLIP_PEA_2_P12 and PLTP_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P12, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLLDTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPNRAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLDMLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASVTIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHSALESLALIPLQAPLKTMLQIGVMPMLN corresponding to amino acids 1 - 427 of PLTP_HUMAN, which also corresponds to amino acids 1 - 427 of HUMPHOSLIP_PEA_2_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GKAGV corresponding to amino acids 428 - 432 of HUMPHOSLIP_PEA_2_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKAGV in HUMPHOSLIP_PEA_2_P12.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMPHOSLIP_PEA_2_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Amino acid mutations
The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P12, as compared to the known protein Phospholipid transfer protein precursor, are described in Table 10 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 10 - Glycosylation site(s)
Variant protein HUMPHOSLIP_PEA_2_P12 is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T19, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T19 is shown in bold; this coding portion starts at position 276 and ends at position 1571. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Nucleic acid SNPs
Variant protein HUMPHOSLIP_PEA_2_P30 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T6. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMPHOSLIP_PEA_2_P30 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 12 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P30 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Amino acid mutations
Variant protein HUMPHOSLIP_PEA_2_P30 is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T6 is shown in bold; this coding portion starts at position 276 and ends at position 431. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P30 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Nucleic acid SNPs
Variant protein HUMPHOSLIP_PEA_2_P31 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T7. An alignment is given to the known protein (Phospholipid transfer protein precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMPHOSLIP_PEA_2_P31 and PLTP_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P31, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISE corresponding to amino acids 1 - 67 of PLTP_HUMAN, which also corresponds to amino acids 1 - 67 of HUMPHOSLIP_PEA_2_P31, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PGLERGADKFPWGGSSLFLALDLTLRPPVG corresponding to amino acids 68 - 98 of HUMPHOSLIP_PEA_2_P31, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P31, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PGLERGADKFPWGGSSLFLALDLTLRPPVG in HUMPHOSLIP_PEA_2_P31.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMPHOSLIP_PEA_2_P31 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 14 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P31 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Amino acid mutations
The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P31, as compared to the known protein Phospholipid transfer protein precursor, are described in Table 15 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 15 - Glycosylation site(s)
Variant protein HUMPHOSLIP_PEA_2_P31 is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T7, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T7 is shown in bold; this coding portion starts at position 276 and ends at position 569. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P31 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 16 - Nucleic acid SNPs
Variant protein HUMPHOSLIP_PEA_2_P33 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T14. An alignment is given to the known protein (Phospholipid transfer protein precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMPHOSLIP_PEA_2_P33 and PLTP_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P33, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQ corresponding to amino acids 1 - 183 of PLTP_HUMAN, which also corresponds to amino acids 1 - 183 of HUMPHOSLIP_PEA_2_P33, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VWAATGRRVARVGMLSL corresponding to amino acids 184 - 200 of HUMPHOSLIP_PEA_2_P33, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P33, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VWAATGRRVARVGMLSL in HUMPHOSLIP_PEA_2_P33.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMPHOSLIP_PEA_2_P33 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P33 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 17 - Amino acid mutations
The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P33, as compared to the known protein Phospholipid transfer protein precursor, are described in Table 18 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 18 - Glycosylation site(s)
Variant protein HUMPHOSLIP_PEA_2_P33 is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T14, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T14 is shown in bold; this coding portion starts at position 276 and ends at position 875. The transcript also has the following SNPs as listed in Table 19 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P33 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 19 - Nucleic acid SNPs
Variant protein HUMPHOSLIP_PEA_2_P34 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T16. An alignment is given to the known protein (Phospholipid transfer protein precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMPHOSLIP_PEA_2_P34 and PLTP_HUMAN: 1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P34, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGH FYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYINAS AEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRF LLNQQICPVLYHAGTVLLNSLLDTVPV corresponding to amino acids 1 - 205 of PLTP_HUMAN, which also corresponds to amino acids 1 - 205 of HUMPHOSLIP_PEA_2_P34, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LWTSLLALTIPS corresponding to amino acids 206 - 217 of HUMPHOSLIP_PEA_2_P34, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P34, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LWTSLLALTIPS in HUMPHOSLIP_PEA_2_P34.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMPHOSLIP_PEA_2_P34 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 20 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P34 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 20 - Amino acid mutations
The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P34, as compared to the known protein Phospholipid transfer protein precursor, are described in Table 21 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 21 - Glycosylation site(s)
Variant protein HUMPHOSLIP_PEA_2_P34 is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T16, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T16 is shown in bold; this coding portion starts at position 276 and ends at position 926. The transcript also has the following SNPs as listed in Table 22 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P34 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 22 - Nucleic acid SNPs
Variant protein HUMPHOSLIP_PEA_2_P35 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T18. An alignment is given to the known protein (Phospholipid transfer protein precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMPHOSLIP_PEA_2_P35 and PLTP_HUMAN: 1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P35, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGH FYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWF corresponding to amino acids 1 - 109 of PLTP_HUMAN, which also corresponds to amino acids 1 - 109 of HUMPHOSLIP_PEA_2_P35, a second, bridging amino acid sequence comprising L, a third amino acid sequence being at least 90% homologous to KVYDFLSTFITSGMRFLLNQQ corresponding to amino acids 163 - 183 of PLTP_HUMAN, which also corresponds to amino acids 111 - 131 of HUMPHOSLIP_PEA_2_P35, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VWAATGRRVARVGMLSL corresponding to amino acids 132 - 148 of HUMPHOSLIP_PEA_2_P35, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of HUMPHOSLIP_PEA_2_P35, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise FLK having a structure as follows (numbering according to HUMPHOSLIP_PEA_2_P35): a sequence starting from any of amino acid numbers 109-x to 109; and ending at any of amino acid numbers 111 + ((n-2) - x), in which x varies from 0 to n-2. 3. An isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P35, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VWAATGRRVARVGMLSL in HUMPHOSLIP_PEA_2_P35.
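The four-part structure recited for HUMPHOSLIP_PEA_2_P35 can be checked by concatenating the quoted sequences and comparing the resulting coordinates with the amino acid ranges given in the comparison report. The sketch below assumes that the spaces inside the printed sequences are line-break artifacts and that positions are 1-based and inclusive; the variable and function names are hypothetical.

```python
# Illustrative sketch: assemble the P35 variant from the four recited parts.
# ASSUMPTIONS: spaces in the printed sequences are layout artifacts, and
# amino acid positions are 1-based and inclusive.

HEAD = ("MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGH"
        "FYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWF")  # PLTP_HUMAN 1 - 109
BRIDGE = "L"                                                  # bridging amino acid
MIDDLE = "KVYDFLSTFITSGMRFLLNQQ"                              # PLTP_HUMAN 163 - 183
TAIL = "VWAATGRRVARVGMLSL"                                    # unique tail


def segment_ranges(parts):
    """Return 1-based inclusive (start, end) ranges of contiguous parts."""
    ranges, start = [], 1
    for part in parts:
        end = start + len(part) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


parts = [HEAD, BRIDGE, MIDDLE, TAIL]
protein = "".join(parts)
ranges = segment_ranges(parts)

print(ranges)            # coordinates of each part on the assembled variant
print(len(protein))      # total length of the assembled variant
print(protein[108:111])  # residues 109 - 111, spanning the junction
```

Under these assumptions the computed ranges are 1 - 109, 110, 111 - 131 and 132 - 148, matching the ranges stated in the report, and residues 109 - 111 read FLK, the junction named in the edge-portion description.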
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMPHOSLIP_PEA_2_P35 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 23 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P35 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 23 - Amino acid mutations
The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P35, as compared to the known protein Phospholipid transfer protein precursor, are described in Table 24 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 24 - Glycosylation site(s)
Variant protein HUMPHOSLIP_PEA_2_P35 is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T18, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T18 is shown in bold; this coding portion starts at position 276 and ends at position 719. The transcript also has the following SNPs as listed in Table 25 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P35 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 25 - Nucleic acid SNPs
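The coding-portion coordinates given for each transcript can be cross-checked against the variant protein lengths implied by the comparison reports (the tails end at amino acids 200, 217 and 148 for P33, P34 and P35). The sketch below assumes that nucleotide positions are 1-based and inclusive and that the stated end position does not include a stop codon; both are assumptions, not statements from the text. The helper name is hypothetical.

```python
# Illustrative sketch: codon count implied by each stated coding portion.
# ASSUMPTIONS: positions are 1-based and inclusive, and the stated range
# excludes the stop codon.

# transcript -> (start, end, encoded variant, length from comparison report or None)
CODING_PORTIONS = {
    "HUMPHOSLIP_PEA_2_T7": (276, 569, "HUMPHOSLIP_PEA_2_P31", None),  # length not stated in this passage
    "HUMPHOSLIP_PEA_2_T14": (276, 875, "HUMPHOSLIP_PEA_2_P33", 200),
    "HUMPHOSLIP_PEA_2_T16": (276, 926, "HUMPHOSLIP_PEA_2_P34", 217),
    "HUMPHOSLIP_PEA_2_T18": (276, 719, "HUMPHOSLIP_PEA_2_P35", 148),
}


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in an inclusive nucleotide range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("coding portion length is not a multiple of three")
    return length // 3


for transcript, (start, end, protein, reported) in CODING_PORTIONS.items():
    codons = codon_count(start, end)
    status = "n/a" if reported is None else str(codons == reported)
    print(f"{transcript} -> {protein}: {codons} codons, matches report: {status}")
```

Under these assumptions, the three transcripts with a stated protein length give codon counts of 200, 217 and 148, in agreement with the amino acid ranges in the comparison reports.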
As noted above, cluster HUMPHOSLIP features 53 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMPHOSLIP_PEA_2_node_0 according to the present invention is supported by 150 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_19 according to the present invention is supported by 186 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16 and HUMPHOSLIP_PEA_2_T19. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_34 according to the present invention is supported by 191 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_58 according to the present invention is supported by 131 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_70 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_75 according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
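The distinction between the segments described above and the short segments that follow rests only on length. A minimal sketch of that bookkeeping is given below, assuming that segment locations are recorded as 1-based, inclusive start and end positions on a transcript and treating "up to about 120 bp" as a cutoff of exactly 120. The class name, the segment and transcript labels, and the coordinates are all hypothetical and do not come from the tables of the application.

```python
# Illustrative sketch: record a segment location and classify it by length.
# ASSUMPTIONS: 1-based inclusive coordinates; a hard cutoff of 120 bp stands in
# for "up to about 120 bp". All example labels and coordinates are HYPOTHETICAL.
from dataclasses import dataclass

SHORT_SEGMENT_MAX_BP = 120


@dataclass(frozen=True)
class SegmentLocation:
    segment: str     # segment name
    transcript: str  # transcript on which the segment is located
    start: int       # first nucleotide of the segment on the transcript
    end: int         # last nucleotide of the segment on the transcript

    @property
    def length(self) -> int:
        """Segment length in base pairs (inclusive coordinates)."""
        return self.end - self.start + 1


def is_short(location: SegmentLocation) -> bool:
    """True if the segment would fall under the short-segment description."""
    return location.length <= SHORT_SEGMENT_MAX_BP


# Hypothetical example records, not taken from the application's tables.
examples = [
    SegmentLocation("example_node_A", "example_transcript_1", 1, 250),
    SegmentLocation("example_node_B", "example_transcript_1", 251, 300),
]
for loc in examples:
    print(loc.segment, loc.length, "short" if is_short(loc) else "long")
```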
Segment cluster HUMPHOSLIP_PEA_2_node_2 according to the present invention is supported by 159 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_3 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_4 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_6 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_7 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_8 according to the present invention is supported by 171 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_9 according to the present invention is supported by 168 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_14 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T7. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_15 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_16 according to the present invention is supported by 179 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_17 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_23 according to the present invention is supported by 168 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 43 below describes the starting and ending position of this segment on each transcript. Table 43 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_24 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 44 below describes the starting and ending position of this segment on each transcript. Table 44 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_25 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T14 and HUMPHOSLIP_PEA_2_T18. Table 45 below describes the starting and ending position of this segment on each transcript. Table 45 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_26 according to the present invention is supported by 163 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 46 below describes the starting and ending position of this segment on each transcript. Table 46 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_29 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 47 below describes the starting and ending position of this segment on each transcript. Table 47 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_30 according to the present invention is supported by 181 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 48 below describes the starting and ending position of this segment on each transcript. Table 48 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_33 according to the present invention is supported by 173 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 49 below describes the starting and ending position of this segment on each transcript. Table 49 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_36 according to the present invention is supported by 163 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 50 below describes the starting and ending position of this segment on each transcript. Table 50 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_37 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 51 below describes the starting and ending position of this segment on each transcript. Table 51 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_39 according to the present invention is supported by 166 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 52 below describes the starting and ending position of this segment on each transcript. Table 52 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_40 according to the present invention is supported by 199 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 53 below describes the starting and ending position of this segment on each transcript. Table 53 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_41 according to the present invention is supported by 186 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 54 below describes the starting and ending position of this segment on each transcript. Table 54 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_42 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 55 below describes the starting and ending position of this segment on each transcript. Table 55 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_44 according to the present invention is supported by 185 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 56 below describes the starting and ending position of this segment on each transcript. Table 56 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_45 according to the present invention is supported by 197 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 57 below describes the starting and ending position of this segment on each transcript. Table 57 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_47 according to the present invention is supported by 223 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 58 below describes the starting and ending position of this segment on each transcript. Table 58 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_51 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 59 below describes the starting and ending position of this segment on each transcript. Table 59 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_52 according to the present invention is supported by 235 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 60 below describes the starting and ending position of this segment on each transcript. Table 60 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_53 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T19. Table 61 below describes the starting and ending position of this segment on each transcript. Table 61 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_54 according to the present invention is supported by 236 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 62 below describes the starting and ending position of this segment on each transcript. Table 62 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_55 according to the present invention is supported by 232 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 63 below describes the starting and ending position of this segment on each transcript. Table 63 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_58 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 64 below describes the starting and ending position of this segment on each transcript. Table 64 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_59 according to the present invention is supported by 230 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 65 below describes the starting and ending position of this segment on each transcript. Table 65 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_60 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 66 below describes the starting and ending position of this segment on each transcript. Table 66 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_61 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 67 below describes the starting and ending position of this segment on each transcript. Table 67 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_62 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 68 below describes the starting and ending position of this segment on each transcript. Table 68 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_63 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 69 below describes the starting and ending position of this segment on each transcript. Table 69 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_64 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 70 below describes the starting and ending position of this segment on each transcript. Table 70 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_65 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 71 below describes the starting and ending position of this segment on each transcript. Table 71 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_66 according to the present invention is supported by 180 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 72 below describes the starting and ending position of this segment on each transcript. Table 72 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_67 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 73 below describes the starting and ending position of this segment on each transcript. Table 73 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_69 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 74 below describes the starting and ending position of this segment on each transcript. Table 74 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_71 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 75 below describes the starting and ending position of this segment on each transcript. Table 75 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_72 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 76 below describes the starting and ending position of this segment on each transcript. Table 76 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_73 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 77 below describes the starting and ending position of this segment on each transcript. Table 77 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_74 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 78 below describes the starting and ending position of this segment on each transcript. Table 78 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: PLTP_HUMAN
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P10 x PLTP_HUMAN
Alignment segment 1/1:
Quality: 3716.00
Escore: 0
Matching length: 398
Total length: 493
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 80.73
Total Percent Identity: 80.73
Gaps: 1
Alignment:
  1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50

 51 IPDLRGKEGHFYYNISE................................. 67
    |||||||||||||||||
 51 IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100

 67 .................................................. 67

101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150

 68 ............KVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLL 105
                ||||||||||||||||||||||||||||||||||||||
151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLL 200

106 DTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPN 155
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 DTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPN 250

156 RAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLD 205
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 RAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLD 300

206 MLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASV 255
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 MLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASV 350

256 TIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHS 305
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 TIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHS 400

306 ALESLALIPLQAPLKTMLQIGVMPMLNERTWRGVQIPLPEGINFVHEVVT 355
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 ALESLALIPLQAPLKTMLQIGVMPMLNERTWRGVQIPLPEGINFVHEVVT 450

356 NHAGFLTIGADLHFAKGLREVIEKNRPADVRASTAPTPSTAAV 398
    |||||||||||||||||||||||||||||||||||||||||||
451 NHAGFLTIGADLHFAKGLREVIEKNRPADVRASTAPTPSTAAV 493
Sequence name: PLTP_HUMAN
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P12 x PLTP_HUMAN
Alignment segment 1/1:
Quality: 4101.00
Escore: 0
Matching length: 427
Total length: 427
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50

 51 IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100

101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150

151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLL 200

201 DTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPN 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 DTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPN 250

251 RAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLD 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 RAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLD 300

301 MLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASV 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 MLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASV 350

351 TIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHS 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 TIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHS 400

401 ALESLALIPLQAPLKTMLQIGVMPMLN 427
    |||||||||||||||||||||||||||
401 ALESLALIPLQAPLKTMLQIGVMPMLN 427
Sequence name: PLTP_HUMAN
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P31 x PLTP_HUMAN
Alignment segment 1/1:
Quality: 639.00
Escore: 0
Matching length: 67
Total length: 67
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50

 51 IPDLRGKEGHFYYNISE 67
    |||||||||||||||||
 51 IPDLRGKEGHFYYNISE 67
Sequence name: PLTP_HUMAN
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P33 x PLTP_HUMAN
Alignment segment 1/1:
Quality: 1767.00
Escore: 0
Matching length: 184
Total length: 184
Matching Percent Similarity: 100.00
Matching Percent Identity: 99.46
Total Percent Similarity: 100.00
Total Percent Identity: 99.46
Gaps: 0
Alignment:
  1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50

 51 IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100

101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150

151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQV 184
    |||||||||||||||||||||||||||||||||:
151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQI 184
Sequence name: PLTP_HUMAN
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P34 x PLTP_HUMAN
Alignment segment 1/1:
Quality: 1971.00
Escore: 0
Matching length: 205
Total length: 205
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50

 51 IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100

101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150

151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLL 200

201 DTVPV 205
    |||||
201 DTVPV 205
Sequence name: PLTP_HUMAN
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P35 x PLTP_HUMAN
Alignment segment 1/1:
Quality: 1158.00
Escore: 0
Matching length: 132
Total length: 184
Matching Percent Similarity: 100.00
Matching Percent Identity: 98.48
Total Percent Similarity: 71.74
Total Percent Identity: 70.65
Gaps: 1
Alignment:
  1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50

 51 IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100

101 FRRQLLYWFL........................................ 110
    |||||||||:
101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150

111 ............KVYDFLSTFITSGMRFLLNQQV 132
                |||||||||||||||||||||:
151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQI 184
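The summary figures printed above each alignment can be related to the aligned rows with a short calculation. The Python sketch below is an illustration only, not the alignment program used for this application: it assumes that gaps are written as "." and that "similar" residue pairs are supplied by the caller, and the function name alignment_stats and the toy rows in the test are hypothetical. Under these assumptions the matching percentages are taken over ungapped columns and the total percentages over all columns, which appears consistent with the values reported above (for example, 398 identical positions over a total length of 493 gives 80.73).

```python
GAP = "."  # assumed gap character; the alignment program may use another symbol


def alignment_stats(query_row, subject_row, similar_pairs=frozenset()):
    """Summarise two gapped alignment rows of equal length.

    similar_pairs is a set of frozenset({a, b}) residue pairs to count as
    similar but not identical (an assumption supplied by the caller).
    """
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    total = len(query_row)
    matching = identical = similar = 0
    for q, s in zip(query_row, subject_row):
        if q == GAP or s == GAP:
            continue  # a gapped column counts toward total length only
        matching += 1
        if q == s:
            identical += 1
            similar += 1
        elif frozenset((q, s)) in similar_pairs:
            similar += 1

    def pct(part, whole):
        return round(100.0 * part / whole, 2) if whole else 0.0

    return {
        "matching_length": matching,
        "total_length": total,
        "matching_percent_similarity": pct(similar, matching),
        "matching_percent_identity": pct(identical, matching),
        "total_percent_similarity": pct(similar, total),
        "total_percent_identity": pct(identical, total),
    }
```

Applied to the counts reported for HUMPHOSLIP_PEA_2_P35 (130 identical residues among 132 matched columns, total length 184), the same ratios give 98.48 and 70.65.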
DESCRIPTION FOR CLUSTER T59832
Cluster T59832 features 5 transcript(s) and 30 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Gamma-interferon inducible lysosomal thiol reductase precursor (SwissProt accession identifier GILT_HUMAN; known also according to the synonyms Gamma-interferon-inducible protein IP-30), SEQ ID NO: 777, referred to herein as the previously known protein. Protein Gamma-interferon inducible lysosomal thiol reductase precursor is known or believed to have the following function(s): cleaves disulfide bonds in proteins by reduction. May facilitate the complete unfolding of proteins destined for lysosomal degradation. May be involved in MHC class II-restricted antigen processing. The sequence for protein Gamma-interferon inducible lysosomal thiol reductase precursor is given at the end of the application, as "Gamma-interferon inducible lysosomal thiol reductase precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Protein Gamma-interferon inducible lysosomal thiol reductase precursor localization is believed to be Lysosomal. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: extracellular; lysosome, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>. Cluster T59832 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 35 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
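As a rough illustration of the "parts per million" scale, the ratio of a cluster's ESTs to all ESTs in a tissue category can be computed as in the Python sketch below. The function name and the counts in the test are hypothetical, and the weighting scheme referred to above is not reproduced here because it is defined elsewhere in the application; this is a plain, unweighted ratio.

```python
def ests_per_million(cluster_ests: int, category_ests: int) -> float:
    """Express a cluster's EST count as parts per million of all ESTs in a category.

    Unweighted sketch: any library weighting described elsewhere in the
    application is not applied here.
    """
    if category_ests <= 0:
        raise ValueError("category_ests must be positive")
    if cluster_ests < 0 or cluster_ests > category_ests:
        raise ValueError("cluster_ests must lie between 0 and category_ests")
    return 1_000_000 * cluster_ests / category_ests
```

For example, a hypothetical category of 1,000,000 ESTs in which 459 belong to the cluster would score 459 on this scale.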
Overall, the following results were obtained as shown with regard to the histograms in Figure 35 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: brain malignant tumors, breast malignant tumors, ovarian carcinoma and pancreas carcinoma.
Table 5 - Normal tissue distribution
adrenal 208 bladder 205 bone 200 brain 18 colon 236 epithelial 143 general 280 head and neck 192 kidney 71 liver 53 lung 459 lymph nodes 248 breast bone marrow 94 ovary pancreas 20
Table 6 - P values and ratios for expression in cancerous tissue
Thyroid 2.3e-01 2.3e-01 5.9e-02 2.5 5.9e-02 2.5
uterus 7.4e-02 4Je-02 2.2e-02 2.0 6.2e-02 1J
As noted above, cluster T59832 features 5 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Gamma-interferon inducible lysosomal thiol reductase precursor. A description of each variant protein according to the present invention is now provided.
Variant protein T59832_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T59832_T6. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T59832_P5 is encoded by the following transcript(s): T59832_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T59832_T6 is shown in bold; this coding portion starts at position 149 and ends at position 715. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Nucleic acid SNPs
Variant protein T59832_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T59832_T8. An alignment is given to the known protein (Gamma-interferon inducible lysosomal thiol reductase precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T59832_P7 and GILT_HUMAN:
1. An isolated chimeric polypeptide encoding for T59832_P7, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 12 - 223 of GILT_HUMAN, which also corresponds to amino acids 1 - 212 of T59832_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRIFLALSLTLIVPWSQGWTRQRDQR corresponding to amino acids 213 - 238 of T59832_P7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T59832_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTLIVPWSQGWTRQRDQR in T59832_P7.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide. Variant protein T59832_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Amino acid mutations
The glycosylation sites of variant protein T59832_P7, as compared to the known protein Gamma-interferon inducible lysosomal thiol reductase precursor, are described in Table 9 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 9 - Glycosylation site(s)
Variant protein T59832_P7 is encoded by the following transcript(s): T59832_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T59832_T8 is shown in bold; this coding portion starts at position 149 and ends at position 862. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
Variant protein T59832_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T59832_T11. An alignment is given to the known protein (Gamma-interferon inducible lysosomal thiol reductase precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T59832_P9 and GILT_HUMAN:
1. An isolated chimeric polypeptide encoding for T59832_P9, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 12 - 214 of GILT_HUMAN, which also corresponds to amino acids 1 - 203 of T59832_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204 - 244 of T59832_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.
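The coordinate correspondences in a comparison report can be checked mechanically. The Python sketch below is illustrative only: the Block type and the function names are hypothetical, while the coordinates and the tail sequence are those stated above for T59832_P9. It verifies that the blocks tile the variant protein without gaps and converts a variant position into the corresponding GILT_HUMAN position, returning None for the unique tail.

```python
from typing import NamedTuple, Optional, Sequence


class Block(NamedTuple):
    variant_start: int          # 1-based, inclusive, on the variant protein
    variant_end: int            # 1-based, inclusive, on the variant protein
    known_start: Optional[int]  # 1-based start on the known protein; None if novel


# Coordinates as stated in the comparison report for T59832_P9 vs GILT_HUMAN.
P9_BLOCKS = [
    Block(1, 203, 12),      # amino acids 1-203 correspond to 12-214 of GILT_HUMAN
    Block(204, 244, None),  # unique tail, not present in GILT_HUMAN
]
P9_TAIL = "NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR"


def check_contiguous(blocks: Sequence[Block]) -> int:
    """Confirm the blocks cover the variant from position 1 without gaps or overlaps.

    Returns the total covered length.
    """
    expected = 1
    for block in blocks:
        if block.variant_start != expected or block.variant_end < block.variant_start:
            raise ValueError(f"blocks are not contiguous at position {expected}")
        expected = block.variant_end + 1
    return expected - 1


def to_known_position(blocks: Sequence[Block], variant_pos: int) -> Optional[int]:
    """Map a variant position to the known protein, or None inside a novel block."""
    for block in blocks:
        if block.variant_start <= variant_pos <= block.variant_end:
            if block.known_start is None:
                return None
            return block.known_start + (variant_pos - block.variant_start)
    raise ValueError("position lies outside the variant protein")
```

With these coordinates, position 203 of the variant maps to position 214 of GILT_HUMAN, and the 41-residue tail accounts for positions 204 to 244.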
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T59832_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Amino acid mutations
The glycosylation sites of variant protein T59832_P9, as compared to the known protein Gamma-interferon inducible lysosomal thiol reductase precursor, are described in Table 12 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 12 - Glycosylation site(s)
Variant protein T59832_P9 is encoded by the following transcript(s): T59832_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T59832_T11 is shown in bold; this coding portion starts at position 149 and ends at position 880. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Nucleic acid SNPs
Variant protein T59832_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T59832_T15. An alignment is given to the known protein (Gamma-interferon inducible lysosomal thiol reductase precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T59832_P12 and GILT_HUMAN:
1. An isolated chimeric polypeptide encoding for T59832_P12, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVE corresponding to amino acids 12 - 141 of GILT_HUMAN, which also corresponds to amino acids 1 - 130 of T59832_P12, and a second amino acid sequence being at least 90% homologous to CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 173 - 261 of GILT_HUMAN, which also corresponds to amino acids 131 - 219 of T59832_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of T59832_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130; and ending at any of amino acid numbers 131 + ((n-2) - x), in which x varies from 0 to n-2.
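The edge-portion definition describes every window of length n that spans the junction between amino acids 130 and 131 of T59832_P12. The Python sketch below enumerates those windows under that reading of the definition; the function name and the optional protein_length bound are hypothetical additions for illustration, not part of the definition itself.

```python
from typing import List, Optional, Tuple


def edge_windows(junction_left: int, n: int,
                 protein_length: Optional[int] = None) -> List[Tuple[int, int]]:
    """List the (start, end) windows of length n that span a splice junction.

    junction_left is the last residue before the junction (130 for T59832_P12),
    so each window starts at junction_left - x and ends at
    junction_left + 1 + ((n - 2) - x), for x from 0 to n - 2.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to include both junction residues")
    windows = []
    for x in range(n - 1):  # x = 0 .. n-2 inclusive
        start = junction_left - x
        end = (junction_left + 1) + ((n - 2) - x)
        if start < 1:
            continue  # window would begin before the first residue
        if protein_length is not None and end > protein_length:
            continue  # window would run past the end of the protein
        windows.append((start, end))
    return windows
```

For n = 10 this yields nine windows, from positions 130 to 139 (x = 0) through positions 122 to 131 (x = 8), each containing both residues of the EC junction.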
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T59832_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 14 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Amino acid mutations
The glycosylation sites of variant protein T59832_P12, as compared to the known protein Gamma-interferon inducible lysosomal thiol reductase precursor, are described in Table 15 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 15 - Glycosylation site(s)
Variant protein T59832_P12 is encoded by the following transcript(s): T59832_T15, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T59832_T15 is shown in bold; this coding portion starts at position 149 and ends at position 805. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 16 - Nucleic acid SNPs
Variant protein T59832_P18 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T59832_T22. An alignment is given to the known protein (Gamma-interferon inducible lysosomal thiol reductase precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T59832_P18 and GILT_HUMAN:
1. An isolated chimeric polypeptide encoding for T59832_P18, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 12 - 55 of GILT_HUMAN, which also corresponds to amino acids 1 - 44 of T59832_P18, and a second amino acid sequence being at least 90% homologous to CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 173 - 261 of GILT_HUMAN, which also corresponds to amino acids 45 - 133 of T59832_P18, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of T59832_P18, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44; and ending at any of amino acid numbers 45 + ((n-2) - x), in which x varies from 0 to n-2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans -membrane region prediction program predicts that this protein has a trans -membrane region. Variant protein T59832 P18 also has the following non-silent SNPs (Single Nucleotide Polymoφhisms) as listed in Table 17, (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832JM8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 17 - Amino acid mutations
The glycosylation sites of variant protein T59832_P18, as compared to the known protein Gamma-interferon inducible lysosomal thiol reductase precursor, are described in Table 18 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 18 - Glycosylation site(s)
Variant protein T59832_P18 is encoded by the following transcript(s): T59832_T22, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T59832_T22 is shown in bold; this coding portion starts at position 149 and ends at position 547. The transcript also has the following SNPs as listed in Table 19 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P18 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 19 - Nucleic acid SNPs
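As a consistency check, the stated coding coordinates can be compared with the stated protein length. The sketch below assumes that the coordinates are 1-based and inclusive and that the stop codon is not counted in the coding portion; both are assumptions, and `coding_length_aa` is a hypothetical helper name.

```python
# Sketch: number of codons implied by a coding portion given as start/end
# nucleotide positions. Assumes 1-based inclusive coordinates.

def coding_length_aa(start, end, includes_stop=False):
    """Return the number of amino acids encoded between start and end."""
    nucleotides = end - start + 1
    if nucleotides % 3 != 0:
        raise ValueError("coding portion is not a whole number of codons")
    codons = nucleotides // 3
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    # T59832_T22: positions 149-547 -> 399 nt -> 133 codons,
    # which agrees with the 133 amino acids of T59832_P18.
    print(coding_length_aa(149, 547))
```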
As noted above, cluster T59832 features 30 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster T59832_node_1 according to the present invention is supported by 62 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster T59832_node_7 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Segment cluster T59832_node_29 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T8. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Segment cluster T59832_node_39 according to the present invention is supported by 195 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster T59832_node_2 according to the present invention is supported by 258 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Segment cluster T59832_node_3 according to the present invention can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster T59832_node_4 according to the present invention is supported by 296 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster T59832_node_5 according to the present invention is supported by 305 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster T59832_node_6 according to the present invention can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster T59832_node_8 according to the present invention can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11 and T59832_T15. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster T59832_node_9 according to the present invention is supported by 330 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11 and T59832_T15. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Segment cluster T59832_node_10 according to the present invention is supported by 332 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11 and T59832_T15. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Segment cluster T59832_node_11 according to the present invention is supported by 306 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11 and T59832_T15. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Segment cluster T59832_node_12 according to the present invention is supported by 280 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11 and T59832_T15. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Segment cluster T59832_node_14 according to the present invention is supported by 280 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11 and T59832_T15. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Segment cluster T59832_node_16 according to the present invention is supported by 287 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11 and T59832_T15. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Segment cluster T59832_node_19 according to the present invention is supported by 300 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8 and T59832_T11. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Segment cluster T59832_node_20 according to the present invention is supported by 318 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8 and T59832_T11. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Segment cluster T59832_node_25 according to the present invention can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Segment cluster T59832_node_26 according to the present invention is supported by 342 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Segment cluster T59832_node_27 according to the present invention is supported by 314 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Segment cluster T59832_node_28 according to the present invention is supported by 284 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T15 and T59832_T22. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
Segment cluster T59832_node_30 according to the present invention can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
Segment cluster T59832_node_31 according to the present invention can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 43 below describes the starting and ending position of this segment on each transcript. Table 43 - Segment location on transcripts
Segment cluster T59832_node_32 according to the present invention is supported by 287 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 44 below describes the starting and ending position of this segment on each transcript. Table 44 - Segment location on transcripts
Segment cluster T59832_node_34 according to the present invention can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 45 below describes the starting and ending position of this segment on each transcript. Table 45 - Segment location on transcripts
Segment cluster T59832_node_35 according to the present invention can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 46 below describes the starting and ending position of this segment on each transcript. Table 46 - Segment location on transcripts
Segment cluster T59832_node_36 according to the present invention can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 47 below describes the starting and ending position of this segment on each transcript. Table 47 - Segment location on transcripts
Segment cluster T59832_node_37 according to the present invention is supported by 300 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 48 below describes the starting and ending position of this segment on each transcript. Table 48 - Segment location on transcripts
Segment cluster T59832_node_38 according to the present invention is supported by 247 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6, T59832_T8, T59832_T11, T59832_T15 and T59832_T22. Table 49 below describes the starting and ending position of this segment on each transcript. Table 49 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: GILT_HUMAN
Sequence documentation:
Alignment of: T59832_P7 x GILT_HUMAN
Alignment segment 1/1
Quality: 2110.00
Escore: 0
Matching length: 212
Total length: 212
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 12 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 61

 51 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 62 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 111

101 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
112 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 161

151 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
162 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 211

201 PHEYVPWVTVNG 212
    ||||||||||||
212 PHEYVPWVTVNG 223
Sequence name: GILT_HUMAN
Sequence documentation:
Alignment of: T59832_P9 x GILT_HUMAN
Alignment segment 1/1
Quality: 2016.00
Escore: 0
Matching length: 203
Total length: 203
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 12 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 61

 51 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 62 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 111

101 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
112 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 161

151 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
162 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 211

201 PHE 203
    |||
212 PHE 214
Sequence name: GILT_HUMAN
Sequence documentation:
Alignment of: T59832_P12 x GILT_HUMAN
Alignment segment 1/1
Quality: 2084.00
Escore: 0
Matching length: 219
Total length: 250
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 87.60
Total Percent Identity: 87.60
Gaps: 1
Alignment:

  1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 12 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 61

 51 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 62 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 111

101 YGNAQEQNVSGRWEFKCQHGEEECKFNKVE-------------------- 130
    ||||||||||||||||||||||||||||||
112 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 161

131 -----------CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 169
               |||||||||||||||||||||||||||||||||||||||
162 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 211

170 PHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK 219
    ||||||||||||||||||||||||||||||||||||||||||||||||||
212 PHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK 261
Sequence name: GILT_HUMAN
Sequence documentation:
Alignment of: T59832_P18 x GILT_HUMAN
Alignment segment 1/1
Quality: 1222.00
Escore: 0
Matching length: 133
Total length: 250
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 53.20
Total Percent Identity: 53.20
Gaps: 1
Alignment:

  1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK------ 44
    ||||||||||||||||||||||||||||||||||||||||||||
 12 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 61

 44 -------------------------------------------------- 44

 62 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 111

 44 -------------------------------------------------- 44

112 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 161

 45 -----------CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 83
               |||||||||||||||||||||||||||||||||||||||
162 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 211

 84 PHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK 133
    ||||||||||||||||||||||||||||||||||||||||||||||||||
212 PHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK 261
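The alignment statistics reported above appear to be related in a simple way: the "Total Percent" figures are consistent with the matching percentage scaled by matching length over total alignment length. The sketch below reproduces the reported values under that assumption; the formula is inferred from the numbers, not stated in the application, and the function names are hypothetical.

```python
# Sketch: relate "Matching length", "Total length" and "Total Percent
# Identity" as reported for the alignments. The formula is an inference
# from the reported numbers, not a statement of how the tool computes them.

def total_percent(matching_length, total_length, matching_percent=100.0):
    """Scale the matching percentage by the aligned fraction of the total."""
    return round(matching_length * matching_percent / total_length, 2)


def skipped_residues(last_before_gap, first_after_gap):
    """Residues of the known protein left unaligned between two segments."""
    return first_after_gap - last_before_gap - 1


if __name__ == "__main__":
    print(total_percent(212, 212))    # T59832_P7
    print(total_percent(219, 250))    # T59832_P12
    print(total_percent(133, 250))    # T59832_P18
    # T59832_P18 aligns to GILT_HUMAN 12-55 and 173-261, skipping 56-172.
    print(skipped_residues(55, 173))  # equals 250 - 133
```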
Expression of Homo sapiens interferon, gamma-inducible protein 30 (IFI30) T59832 transcripts which are detectable by amplicon as depicted in sequence name T59832 junc6-25-26 in normal and cancerous ovary tissues
Expression of Homo sapiens interferon, gamma-inducible protein 30 (IFI30) transcripts detectable by or according to junc6-25-26, T59832 junc6-25-26 amplicon(s) and primers T59832 junc6-25-26F and T59832 junc6-25-26R was measured by real time PCR. In parallel, the expression of four housekeeping genes was measured similarly: PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), and GAPDH (GenBank Accession No. BC026907; amplicon - GAPDH-amplicon). For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, 71, Table 1, above), to obtain a value of fold differential expression for each sample relative to the median of the normal PM samples. In one experiment that was carried out, no differential expression in the cancerous samples relative to the normal PM samples was observed, although this may be due to a problem with this specific experiment.
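The normalization described above can be sketched as follows. This is a minimal illustration only: the sample identifiers and quantities are invented for the example, the function names are hypothetical, and it assumes the quantities are already linear-scale values derived from the real time PCR runs.

```python
# Sketch: normalize a target amplicon to the geometric mean of the
# housekeeping genes, then express each sample relative to the median of
# the normal post-mortem (PM) samples. All data below are made up.
from statistics import geometric_mean, median


def normalized_quantity(target_qty, housekeeping_qtys):
    """Target quantity divided by the geometric mean of housekeeping genes."""
    return target_qty / geometric_mean(housekeeping_qtys)


def fold_expression(samples, normal_ids):
    """samples maps sample id -> (target quantity, [housekeeping quantities]).
    Returns sample id -> fold expression relative to the normal median."""
    normalized = {
        sid: normalized_quantity(target, hk)
        for sid, (target, hk) in samples.items()
    }
    reference = median(normalized[sid] for sid in normal_ids)
    return {sid: qty / reference for sid, qty in normalized.items()}


if __name__ == "__main__":
    # Hypothetical quantities: (target, [PBGD, HPRT1, SDHA, GAPDH])
    samples = {
        "normal_45": (2.0, [1.0, 1.0, 1.0, 1.0]),
        "normal_46": (4.0, [1.0, 1.0, 1.0, 1.0]),
        "normal_47": (6.0, [1.0, 1.0, 1.0, 1.0]),
        "tumor_01": (8.0, [2.0, 2.0, 2.0, 2.0]),
    }
    normals = ["normal_45", "normal_46", "normal_47"]
    for sid, fold in fold_expression(samples, normals).items():
        print(sid, round(fold, 3))
```

Using the geometric mean rather than the arithmetic mean keeps one highly expressed housekeeping gene from dominating the normalization factor.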
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: T59832 junc6-25-26F forward primer; and T59832 junc6-25-26R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: T59832 junc6-25-26.
Forward primer T59832 junc6-25-26F (SEQ ID NO: 1008): CCACCAGTTAACTACAAGTGCCTG
Reverse primer T59832 junc6-25-26R (SEQ ID NO: 1009): GCGTGCATGAGCTGCATG
Amplicon T59832 junc6-25-26 (SEQ ID NO: 1010): CCACCAGTTAACTACAAGTGCCTGCAGCTCTACGCCCCAGGGCTGTCGCCAGACACTATCATGGAGTGTGCAATGGGGGACCGCGGCATGCAGCTCATGCACGC
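The relationship between the primer pair and the amplicon listed above can be checked with a short sketch. The sequences are those given in the text (the amplicon joined across the line break); the helper names and the partial codon table, which covers only the codons occurring in the forward primer, are illustrative assumptions.

```python
# Sketch: confirm that the amplicon begins with the forward primer and ends
# with the reverse complement of the reverse primer, and show that the
# forward primer reads across the K/C junction of the variant protein.

FORWARD = "CCACCAGTTAACTACAAGTGCCTG"
REVERSE = "GCGTGCATGAGCTGCATG"
AMPLICON = (
    "CCACCAGTTAACTACAAGTGCCTGCAGCTCTACGCCCCAGGGCTGTCGCCAGACAC"
    "TATCATGGAGTGTGCAATGGGGGACCGCGGCATGCAGCTCATGCACGC"
)

# Partial codon table: only the codons present in the forward primer.
CODONS = {"CCA": "P", "GTT": "V", "AAC": "N", "TAC": "Y",
          "AAG": "K", "TGC": "C", "CTG": "L"}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def primers_flank(amplicon, forward, reverse):
    """True if the amplicon is bounded by the two primers."""
    return (amplicon.startswith(forward)
            and amplicon.endswith(reverse_complement(reverse)))


def translate(dna, table=CODONS):
    """Translate in frame 1 using the (partial) table supplied."""
    return "".join(table[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


if __name__ == "__main__":
    print(primers_flank(AMPLICON, FORWARD, REVERSE))
    print(translate(FORWARD))
```

In frame 1 the forward primer reads PPVNYKCL, so it covers the KC junction between amino acids 44 and 45 of T59832_P18.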
DESCRIPTION FOR CLUSTER HSCP2
Cluster HSCP2 features 12 transcript(s) and 50 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
Protein name        SEQ ID NO    Corresponding transcript(s)
HSCP2_PEA_1_P4      846          HSCP2_PEA_1_T4; HSCP2_PEA_1_T50
HSCP2_PEA_1_P8      847          HSCP2_PEA_1_T13
HSCP2_PEA_1_P14     848          HSCP2_PEA_1_T19
HSCP2_PEA_1_P15     849          HSCP2_PEA_1_T20
HSCP2_PEA_1_P2      850          HSCP2_PEA_1_T22
HSCP2_PEA_1_P16     851          HSCP2_PEA_1_T23
HSCP2_PEA_1_P6      852          HSCP2_PEA_1_T25
HSCP2_PEA_1_P22     853          HSCP2_PEA_1_T31
HSCP2_PEA_1_P24     854          HSCP2_PEA_1_T33
HSCP2_PEA_1_P25     855          HSCP2_PEA_1_T34
HSCP2_PEA_1_P33     856          HSCP2_PEA_1_T45
These sequences are variants of the known protein Ceruloplasmin precursor (SwissProt accession identifier CERU_HUMAN; known also according to the synonyms EC 1.16.3.1; Ferroxidase), SEQ ID NO: 845, referred to herein as the previously known protein. Protein Ceruloplasmin precursor is known or believed to have the following function(s): Ceruloplasmin is a blue, copper-binding (6-7 atoms per molecule) glycoprotein found in plasma. Four possible functions are ferroxidase activity, amine oxidase activity, copper transport and homeostasis, and superoxide dismutase activity. The sequence for protein Ceruloplasmin precursor is given at the end of the application, as "Ceruloplasmin precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: ion transport; copper ion transport; copper homeostasis; iron homeostasis, which are annotation(s) related to Biological Process; ferroxidase; copper ion transporter; copper binding; oxidoreductase, which are annotation(s) related to Molecular Function; and extracellular space, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster HSCP2 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 36 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
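The "parts per million" figure described above is a simple ratio, sketched below. The weighting applied to EST counts is described elsewhere in the application and is not reproduced here; the function name and the example counts are hypothetical.

```python
# Sketch: express the ESTs of one cluster as parts per million of all ESTs
# in a tissue category. Counts are assumed to be already weighted.

def parts_per_million(cluster_est_count, category_est_count):
    """Ratio of cluster ESTs to all ESTs in the category, scaled to 1e6."""
    if category_est_count <= 0:
        raise ValueError("category must contain at least one EST")
    return 1_000_000 * cluster_est_count / category_est_count


if __name__ == "__main__":
    # Hypothetical example: 12 cluster ESTs among 250,000 ESTs in a category.
    print(parts_per_million(12, 250_000))
```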
Overall, the following results were obtained as shown with regard to the histograms in Figure 36 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: kidney malignant tumors and ovarian carcinoma.
Table 5 - Normal tissue distribution
bladder bone brain 48 epithelial 100
Table 6 - P values and ratios for expression in cancerous tissue
pancreas 2.3e-01 4.0e-01 1.2e-03 2.5 9.4e-03 1.8
prostate 9Je-01 9.3e-01 0.8 7.4e-05 1.3
Thyroid 5.0e-01 5.0e-01 6Je-01 1.5 6Je-01 1.5
Uterus 2.4e-01 lJe-01 6.5e-04 2.1 7.2e-02 1.3
As noted above, cluster HSCP2 features 12 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Ceruloplasmin precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSCP2_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCP2_PEA_1_T4 and HSCP2_PEA_1_T50. An alignment is given to the known protein (Ceruloplasmin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCP2_PEA_1_P4 and CERU_HUMAN: 1. An isolated chimeric polypeptide encoding for HSCP2_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to
MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD WGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY HSHIDAPKDIASGLIGPLIICJ KDSLDKΪKΕKHIDREF MFSVVDENFSWYLEDNIKTY CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH AAFFHGQALTNKNYRIDTPNLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF QVQEC>TKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG TTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY PLSIEPIGVPJ^NKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES LLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYL FSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDH YTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ NVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDK VKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKTPERSGAGTEDSACIPWAY YSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYS DFlTEKV^KDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHT VHFHGHSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGMETT YTVLQNE corresponding to amino acids 1 - 1060 of CERU_HUMAN, which also corresponds to amino acids 1 - 1060 of HSCP2_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GGTSM corresponding to amino acids 1061 - 1065 of HSCP2_PEA_1_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSCP2_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GGTSM in HSCP2_PEA_1_P4.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCP2_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Amino acid mutations
The glycosylation sites of variant protein HSCP2_PEA_1_P4, as compared to the known protein Ceruloplasmin precursor, are described in Table 8 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 8 - Glycosylation site(s)
Variant protein HSCP2_PEA_1_P4 is encoded by the following transcript(s): HSCP2_PEA_1_T4 and HSCP2_PEA_1_T50, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCP2_PEA_1_T4 is shown in bold; this coding portion starts at position 250 and ends at position 3444. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Nucleic acid SNPs
The coding portion of transcript HSCP2_PEA_1_T50 is shown in bold; this coding portion starts at position 250 and ends at position 3444. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
Variant protein HSCP2_PEA_1_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCP2_PEA_1_T13. An alignment is given to the known protein (Ceruloplasmin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCP2_PEA_1_P8 and CERU_HUMAN: 1. An isolated chimeric polypeptide encoding for HSCP2_PEA_1_P8, comprising a first amino acid sequence being at least 90% homologous to MJKILILGJTLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD WGRLYJKKALYLQYTDETFRTTIEKPVWLGFLGPΠJ AETGDKVYVHLKNLASRPYTFHS HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFλ^VMFSVVDENFSWYLEDNIKTY CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF QVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG TTWGGSYΩ LVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY PLSIEPIGVRFNiKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD PVCLAKMYYSAVDPTKDlTTGLIGPMKJCKKGSLHANGRQKDVDKEFYLFPTVFDENES LLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYL FSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDH YTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ NVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDK VKTIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKJPERSGAGTEDSACIPWAY YSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYS DHPEK\T«JKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHT VHFHGHSFQYK corresponding to amino acids 1 - 1006 of CERU_HUMAN, which also corresponds to amino acids 1 - 1006 of HSCP2_PEA_1_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KCFQEHLEFGYSTAM corresponding to amino acids 1007 - 1021 of HSCP2_PEA_1_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSCP2_PEA_1_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KCFQEHLEFGYSTAM in HSCP2_PEA_1_P8.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCP2_PEA_1_P8 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Amino acid mutations
The glycosylation sites of variant protein HSCP2_PEA_1_P8, as compared to the known protein Ceruloplasmin precursor, are described in Table 12 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 12 - Glycosylation site(s)
Variant protein HSCP2_PEA_1_P8 is encoded by the following transcript(s): HSCP2_PEA_1_T13, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCP2_PEA_1_T13 is shown in bold; this coding portion starts at position 250 and ends at position 3312. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Nucleic acid SNPs
Variant protein HSCP2_PEA_1_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCP2_PEA_1_T19. An alignment is given to the known protein (Ceruloplasmin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCP2_PEA_1_P14 and CERU_HUMAN:
1. An isolated chimeric polypeptide encoding for HSCP2_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF QVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG TTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY PLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES LLLEDNIRMFTTAPDQVDKEDEDFQESNKMH corresponding to amino acids 1 - 621 of CERU_HUMAN, which also corresponds to amino acids 1 - 621 of HSCP2_PEA_1_P14, a second amino acid sequence, being a bridging amino acid sequence comprising W, and a third amino acid sequence being at least 90% homologous to TFNVECLTTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGIL GPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGA GTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENE SWYLDDNIKTYSDHPEKVNKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYL MGMGNEIDLHTVHFHGHSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHV TDHIHAGMETTYTVLQNEDTKSG corresponding to amino acids 694 - 1065 of CERU_HUMAN, which also corresponds to amino acids 623 - 994 of HSCP2_PEA_1_P14, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of HSCP2_PEA_1_P14, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise HWT having a structure as follows (numbering according to HSCP2_PEA_1_P14): a sequence starting from any of amino acid numbers 621-x to 621; and ending at any of amino acid numbers 623 + ((n-2) - x), in which x varies from 0 to n-2.
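The edge-portion definition above is a sliding-window rule around the junction residues. The following sketch enumerates the residue ranges it describes, under one literal reading: the start is taken as 621 - x and the end as 623 + ((n-2) - x), with x running from 0 to n-2. The function name `edge_windows` and the bounds check against the protein length are assumptions added for illustration; they are not part of the application.

```python
def edge_windows(first_junction: int, last_junction: int, n: int,
                 protein_length: int) -> list[tuple[int, int]]:
    """Enumerate (start, end) residue ranges, 1-based and inclusive.

    Literal reading of the rule: start = first_junction - x and
    end = last_junction + ((n - 2) - x), for x = 0 .. n - 2.
    Ranges that would run off either end of the protein are skipped
    (an assumption; the text does not address that case).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    windows = []
    for x in range(n - 1):  # x = 0 .. n-2
        start = first_junction - x
        end = last_junction + ((n - 2) - x)
        if start >= 1 and end <= protein_length:
            windows.append((start, end))
    return windows


# HSCP2_PEA_1_P14: junction residues 621 - 623 (HWT), protein length 994.
windows = edge_windows(621, 623, n=10, protein_length=994)
for start, end in windows:
    print(start, end)
```

For n = 10 this yields nine ranges, from 621 - 631 down to 613 - 623, and every one of them spans residues 621 through 623, so the bridging residue at 622 is always included.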
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSCP2_PEA_1_P14 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 14 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Amino acid mutations
The glycosylation sites of variant protein HSCP2_PEA_1_P14, as compared to the known protein Ceruloplasmin precursor, are described in Table 15 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 15 - Glycosylation site(s)
Variant protein HSCP2_PEA_1_P14 is encoded by the following transcript(s): HSCP2_PEA_1_T19, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCP2_PEA_1_T19 is shown in bold; this coding portion starts at position 250 and ends at position 3231. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 16 - Nucleic acid SNPs
Variant protein HSCP2_PEA_1_P15 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCP2_PEA_1_T20. An alignment is given to the known protein (Ceruloplasmin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCP2_PEA_1_P15 and CERU_HUMAN:
1. An isolated chimeric polypeptide encoding for HSCP2_PEA_1_P15, comprising a first amino acid sequence being at least 90% homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF QVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG TTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY PLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES LLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYL FSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDH YTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ NVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDK VKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAY YSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYS DHPEKVNKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHT VHFHGHSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGMETT YTVLQNE corresponding to amino acids 1 - 1060 of CERU_HUMAN, which also corresponds to amino acids 1 - 1060 of HSCP2_PEA_1_P15, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEYPASSETHRRIWNVIYPITVSVIILFQISTKE corresponding to amino acids 1061 - 1094 of HSCP2_PEA_1_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSCP2_PEA_1_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEYPASSETHRRIWNVIYPITVSVIILFQISTKE in HSCP2_PEA_1_P15.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSCP2_PEA_1_P15 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 17 - Amino acid mutations
The glycosylation sites of variant protein HSCP2_PEA_1_P15, as compared to the known protein Ceruloplasmin precursor, are described in Table 18 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 18 - Glycosylation site(s)
Variant protein HSCP2_PEA_1_P15 is encoded by the following transcript(s): HSCP2_PEA_1_T20, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCP2_PEA_1_T20 is shown in bold; this coding portion starts at position 250 and ends at position 3531. The transcript also has the following SNPs as listed in Table 19 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 19 - Nucleic acid SNPs
Variant protein HSCP2_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCP2_PEA_1_T22. An alignment is given to the known protein (Ceruloplasmin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCP2_PEA_1_P2 and CERU_HUMAN:
1. An isolated chimeric polypeptide encoding for HSCP2_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF QVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG TTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY PLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES LLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYL FSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDH YTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ corresponding to amino acids 1 - 761 of CERU_HUMAN, which also corresponds to amino acids 1 - 761 of HSCP2_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence K corresponding to amino acids 762 - 762 of HSCP2_PEA_1_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
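Each variant in this section is given both a protein length (the last amino acid number in its comparison report) and a coding-portion range on its transcript, so the two can be cross-checked with simple arithmetic: the number of nucleotides in the range, divided by three, should equal the number of amino acids. The sketch below performs that check using only the positions and lengths stated in this section. The table name `VARIANTS` and the function `codons_in_range` are hypothetical, and the reading that the stated range is inclusive at both ends and excludes the stop codon is an inference from the arithmetic, not something the text states.

```python
# transcript: (coding start, coding end, protein length in amino acids),
# all values as stated in this section.
VARIANTS = {
    "HSCP2_PEA_1_T13": (250, 3312, 1021),  # HSCP2_PEA_1_P8
    "HSCP2_PEA_1_T19": (250, 3231, 994),   # HSCP2_PEA_1_P14
    "HSCP2_PEA_1_T20": (250, 3531, 1094),  # HSCP2_PEA_1_P15
    "HSCP2_PEA_1_T22": (250, 2535, 762),   # HSCP2_PEA_1_P2
    "HSCP2_PEA_1_T23": (250, 3300, 1017),  # HSCP2_PEA_1_P16
    "HSCP2_PEA_1_T25": (250, 3276, 1009),  # HSCP2_PEA_1_P6
    "HSCP2_PEA_1_T31": (250, 3057, 936),   # HSCP2_PEA_1_P22
    "HSCP2_PEA_1_T33": (353, 2809, 819),   # HSCP2_PEA_1_P24
}


def codons_in_range(start: int, end: int) -> int:
    """Number of whole codons in an inclusive, 1-based nucleotide range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a multiple of three")
    return length // 3


for transcript, (start, end, protein_length) in VARIANTS.items():
    codons = codons_in_range(start, end)
    status = "consistent" if codons == protein_length else "MISMATCH"
    print(f"{transcript}: {codons} codons vs {protein_length} aa -> {status}")
```

For HSCP2_PEA_1_T22, for example, positions 250 to 2535 span 2286 nucleotides, or 762 codons, matching the 762 amino acids of HSCP2_PEA_1_P2.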
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSCP2_PEA_1_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 20 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 20 - Amino acid mutations
The glycosylation sites of variant protein HSCP2_PEA_1_P2, as compared to the known protein Ceruloplasmin precursor, are described in Table 21 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 21 - Glycosylation site(s)
Variant protein HSCP2_PEA_1_P2 is encoded by the following transcript(s): HSCP2_PEA_1_T22, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCP2_PEA_1_T22 is shown in bold; this coding portion starts at position 250 and ends at position 2535. The transcript also has the following SNPs as listed in Table 22 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 22 - Nucleic acid SNPs
Variant protein HSCP2_PEA_1_P16 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCP2_PEA_1_T23. An alignment is given to the known protein (Ceruloplasmin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCP2_PEA_1_P16 and CERU_HUMAN:
1. An isolated chimeric polypeptide encoding for HSCP2_PEA_1_P16, comprising a first amino acid sequence being at least 90% homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF QVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG TTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY PLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES LLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYL FSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDH YTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ NVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDK VKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAY YSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYS DHPEKVNKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHT VHFHGHSFQYKH corresponding to amino acids 1 - 1007 of CERU_HUMAN, which also corresponds to amino acids 1 - 1007 of HSCP2_PEA_1_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLRLTGEYGM corresponding to amino acids 1008 - 1017 of HSCP2_PEA_1_P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSCP2_PEA_1_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLRLTGEYGM in HSCP2_PEA_1_P16.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSCP2_PEA_1_P16 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 23 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 23 - Amino acid mutations
The glycosylation sites of variant protein HSCP2_PEA_1_P16, as compared to the known protein Ceruloplasmin precursor, are described in Table 24 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 24 - Glycosylation site(s)
Variant protein HSCP2_PEA_1_P16 is encoded by the following transcript(s): HSCP2_PEA_1_T23, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCP2_PEA_1_T23 is shown in bold; this coding portion starts at position 250 and ends at position 3300. The transcript also has the following SNPs as listed in Table 25 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 25 - Nucleic acid SNPs
Variant protein HSCP2_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCP2_PEA_1_T25. An alignment is given to the known protein (Ceruloplasmin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCP2_PEA_1_P6 and CERU_HUMAN:
1. An isolated chimeric polypeptide encoding for HSCP2_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF QVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG TTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY PLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES LLLEDNIRMFTTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYL FSAGNEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDH YTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ NVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDK VKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAY YSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYS DHPEKVNKDDEEFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHT VHFHGHSFQYK corresponding to amino acids 1 - 1006 of CERU_HUMAN, which also corresponds to amino acids 1 - 1006 of HSCP2_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GSL corresponding to amino acids 1007 - 1009 of HSCP2_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSCP2_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 26 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 26 - Amino acid mutations
The glycosylation sites of variant protein HSCP2_PEA_1_P6, as compared to the known protein Ceruloplasmin precursor, are described in Table 27 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 27 - Glycosylation site(s)
Variant protein HSCP2_PEA_1_P6 is encoded by the following transcript(s): HSCP2_PEA_1_T25, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCP2_PEA_1_T25 is shown in bold; this coding portion starts at position 250 and ends at position 3276. The transcript also has the following SNPs as listed in Table 28 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 28 - Nucleic acid SNPs
Variant protein HSCP2_PEA_1_P22 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCP2_PEA_1_T31. An alignment is given to the known protein (Ceruloplasmin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCP2_PEA_1_P22 and CERU_HUMAN:
1. An isolated chimeric polypeptide encoding for HSCP2_PEA_1_P22, comprising a first amino acid sequence being at least 90% homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD RIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHE corresponding to amino acids 1 - 131 of CERU_HUMAN, which also corresponds to amino acids 1 - 131 of HSCP2_PEA_1_P22, a second amino acid sequence, being a bridging amino acid sequence comprising A, and a third amino acid sequence being at least 90% homologous to VNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFHGQALTNKNYRIDTINLFP ATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFFQVQECNKSSSKDNIRGKHVRHY YIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTN RKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNY NPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIFTGLI GPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMFTTAPDQVDKEDE DFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAGNEADVHGIYFSGNTYLWR GERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDHYTGGMKQKYTVNQCRRQSEDS TFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVV YRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTE SSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVCRR PYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDEEFIESNKMHAI NGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHGHSFQYKHRGVYSSDVF DIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGMETTYTVLQNEDTKSG corresponding to amino acids 262 - 1065 of CERU_HUMAN, which also corresponds to amino acids 133 - 936 of HSCP2_PEA_1_P22, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of HSCP2_PEA_1_P22, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EAV having a structure as follows (numbering according to HSCP2_PEA_1_P22): a sequence starting from any of amino acid numbers 131-x to 131; and ending at any of amino acid numbers 133 + ((n-2) - x), in which x varies from 0 to n-2.
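The comparison report for HSCP2_PEA_1_P22 gives two numbering systems for the same residues: positions in the variant and positions in CERU_HUMAN. The sketch below converts a variant position to its CERU_HUMAN position using only the ranges stated above (1 - 131 maps to 1 - 131, and 133 - 936 maps to 262 - 1065), and returns nothing for the bridging residue at position 132, which has no stated counterpart. The names `SEGMENTS_P22` and `to_reference` are hypothetical, chosen for illustration.

```python
from typing import Optional

# (variant start, variant end, CERU_HUMAN start), 1-based and inclusive,
# taken from the comparison report for HSCP2_PEA_1_P22.
SEGMENTS_P22 = [
    (1, 131, 1),
    (133, 936, 262),
]


def to_reference(position: int) -> Optional[int]:
    """Map a variant residue number to its CERU_HUMAN residue number.

    Returns None for positions outside the aligned segments, such as
    the bridging residue at position 132.
    """
    for variant_start, variant_end, reference_start in SEGMENTS_P22:
        if variant_start <= position <= variant_end:
            return reference_start + (position - variant_start)
    return None


for position in (131, 132, 133, 936):
    print(position, "->", to_reference(position))
```

The last aligned residue of the variant, position 936, maps to position 1065 of CERU_HUMAN, which agrees with the end of the range given in the report.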
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSCP2_PEA_1_P22 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 29 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P22 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 29 - Amino acid mutations
The glycosylation sites of variant protein HSCP2_PEA_1_P22, as compared to the known protein Ceruloplasmin precursor, are described in Table 30 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 30 - Glycosylation site(s)
Variant protein HSCP2_PEA_1_P22 is encoded by the following transcript(s): HSCP2_PEA_1_T31, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCP2_PEA_1_T31 is shown in bold; this coding portion starts at position 250 and ends at position 3057. The transcript also has the following SNPs as listed in Table 31 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P22 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 31 - Nucleic acid SNPs
Variant protein HSCP2_PEA_1_P24 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCP2_PEA_1_T33. An alignment is given to the known protein (Ceruloplasmin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCP2_PEA_1_P24 and CERU_HUMAN:
1. An isolated chimeric polypeptide encoding for HSCP2_PEA_1_P24, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MPLTMGKRNLFLLTP corresponding to amino acids 1 - 15 of HSCP2_PEA_1_P24, and a second amino acid sequence being at least 90% homologous to VNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFHGQALTNKNYRIDTINLFP ATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFFQVQECNKSSSKDNIRGKHVRHY YIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTN RKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNY NPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIFTGLI GPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMFTTAPDQVDKEDE DFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAGNEADVHGIYFSGNTYLWR GERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDHYTGGMKQKYTVNQCRRQSEDS TFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVV YRQYTDSTFRVPVERKAEEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTE SSTVTPTLPGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVCRR PYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDEEFIESNKMHAI NGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHGHSFQYKHRGVYSSDVF DIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGMETTYTVLQNEDTKSG corresponding to amino acids 262 - 1065 of CERU_HUMAN, which also corresponds to amino acids 16 - 819 of HSCP2_PEA_1_P24, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of HSCP2_PEA_1_P24, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MPLTMGKRNLFLLTP of HSCP2_PEA_1_P24.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Non-secretory protein, NN: YES) predicts that this protein has a signal peptide.
Variant protein HSCP2_PEA_1_P24 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 32 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P24 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 32 - Amino acid mutations
The glycosylation sites of variant protein HSCP2_PEA_1_P24, as compared to the known protein Ceruloplasmin precursor, are described in Table 33 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 33 - Glycosylation site(s)
Variant protein HSCP2_PEA_1_P24 is encoded by the following transcript(s): HSCP2_PEA_1_T33, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCP2_PEA_1_T33 is shown in bold; this coding portion starts at position 353 and ends at position 2809. The transcript also has the following SNPs as listed in Table 34 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P24 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 34 - Nucleic acid SNPs
Variant protein HSCP2_PEA_1_P25 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCP2_PEA_1_T34. An alignment is given to the known protein (Ceruloplasmin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCP2_PEA_1_P25 and CERU_HUMAN: 1. An isolated chimeric polypeptide encoding for HSCP2_PEA_1_P25, comprising a first amino acid sequence being at least 90% homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD WGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY HSHIDAPKDIASGLIGPLIICKKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTY CSEPEKVDKDNEDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVH AAFFHGQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFF QVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQG TTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTIRVTFHNKGAY PLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAPTETFTYEWTVPKEVGPTNAD PVCLAKMYYSAVDPTKDIFTGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENES LLLEDNIRMFTTAPDQVDKEDEDFQESNKMH corresponding to amino acids 1 - 621 of CERU_HUMAN, which also corresponds to amino acids 1 - 621 of HSCP2_PEA_1_P25, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CKYCIIHQSTKLF corresponding to amino acids 622 - 634 of HSCP2_PEA_1_P25, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSCP2_PEA_1_P25, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CKYCIIHQSTKLF in HSCP2_PEA_1_P25.
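The residue ranges quoted in the two comparison reports above can be cross-checked arithmetically. The following is a minimal sketch, assuming that all positions are 1-based and inclusive; the `Part` class and the helper names are hypothetical and introduced only for illustration. It verifies that the parts of each variant are contiguous, that each aligned CERU_HUMAN range has the same length as the variant range it maps to, and that the short unique head and tail sequences have the stated lengths.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Part:
    """One contiguous piece of a variant protein (1-based, inclusive positions)."""
    label: str
    variant_range: Tuple[int, int]
    known_range: Optional[Tuple[int, int]] = None  # None for a unique head or tail
    sequence: Optional[str] = None  # only the short unique pieces are spelled out


def span(r: Tuple[int, int]) -> int:
    """Number of residues in an inclusive range."""
    return r[1] - r[0] + 1


def total_length(parts: List[Part]) -> int:
    """Check that the parts tile the variant without gaps and return its length."""
    expected_start = 1
    for part in parts:
        start, end = part.variant_range
        if start != expected_start:
            raise ValueError(f"{part.label}: starts at {start}, expected {expected_start}")
        if part.known_range is not None and span(part.known_range) != span(part.variant_range):
            raise ValueError(f"{part.label}: aligned ranges differ in length")
        if part.sequence is not None and len(part.sequence) != span(part.variant_range):
            raise ValueError(f"{part.label}: sequence length does not match range")
        expected_start = end + 1
    return expected_start - 1


# Ranges as stated in the comparison reports for P24 and P25.
P24 = [
    Part("unique head", (1, 15), sequence="MPLTMGKRNLFLLTP"),
    Part("CERU_HUMAN 262-1065", (16, 819), known_range=(262, 1065)),
]
P25 = [
    Part("CERU_HUMAN 1-621", (1, 621), known_range=(1, 621)),
    Part("unique tail", (622, 634), sequence="CKYCIIHQSTKLF"),
]

print(total_length(P24), total_length(P25))
```

Under these assumptions the stated ranges are internally consistent: the P24 head and its 804-residue ceruloplasmin portion give 819 residues, and the P25 ceruloplasmin portion and its 13-residue tail give 634 residues.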
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCP2_PEA_1_P25 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 35 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P25 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 35 - Amino acid mutations
The glycosylation sites of variant protein HSCP2_PEA_1_P25, as compared to the known protein Ceruloplasmin precursor, are described in Table 36 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 36 - Glycosylation site(s)
Variant protein HSCP2_PEA_1_P25 is encoded by the following transcript(s): HSCP2_PEA_1_T34, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCP2_PEA_1_T34 is shown in bold; this coding portion starts at position 250 and ends at position 2151. The transcript also has the following SNPs as listed in Table 37 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P25 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 37 - Nucleic acid SNPs
Variant protein HSCP2_PEA_1_P33 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCP2_PEA_1_T45. An alignment is given to the known protein (Ceruloplasmin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCP2_PEA_1_P33 and CERU_HUMAN: 1. An isolated chimeric polypeptide encoding for HSCP2_PEA_1_P33, comprising a first amino acid sequence being at least 90% homologous to MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTEHSNIYLQNGPD mGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAETGDKVYVHLKNLASRPYTFHS HGITYYKEHEGAIYPDNTTDFQRADDKVYPGEQYTYMLLATEEQSPGEGDGNCVTRIY HSHIDAPKDIASGLIGPLIICKK corresponding to amino acids 1 - 202 of CERU_HUMAN, which also corresponds to amino acids 1 - 202 of HSCP2_PEA_1_P33, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GTSSPYCTCYMTKRQGQGSLSFKKKSSLLC corresponding to amino acids 203 - 232 of HSCP2_PEA_1_P33, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSCP2_PEA_1_P33, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GTSSPYCTCYMTKRQGQGSLSFKKKSSLLC in HSCP2_PEA_1_P33.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCP2_PEA_1_P33 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 38 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P33 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 38 - Amino acid mutations
The glycosylation sites of variant protein HSCP2_PEA_1_P33, as compared to the known protein Ceruloplasmin precursor, are described in Table 39 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 39 - Glycosylation site(s)
Variant protein HSCP2_PEA_1_P33 is encoded by the following transcript(s): HSCP2_PEA_1_T45, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCP2_PEA_1_T45 is shown in bold; this coding portion starts at position 250 and ends at position 945. The transcript also has the following SNPs as listed in Table 40 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCP2_PEA_1_P33 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 40 - Nucleic acid SNPs
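The coding-portion coordinates given for the three transcripts can be compared with the protein lengths stated in the comparison reports (819, 634 and 232 amino acids). The sketch below assumes that nucleotide positions are 1-based and inclusive; the dictionary and function names are illustrative and not part of the application.

```python
from typing import Dict, Tuple

# transcript -> (coding start, coding end, variant protein length in amino acids),
# all values taken from the descriptions of P24, P25 and P33 above.
CODING_PORTIONS: Dict[str, Tuple[int, int, int]] = {
    "HSCP2_PEA_1_T33": (353, 2809, 819),  # encodes HSCP2_PEA_1_P24
    "HSCP2_PEA_1_T34": (250, 2151, 634),  # encodes HSCP2_PEA_1_P25
    "HSCP2_PEA_1_T45": (250, 945, 232),   # encodes HSCP2_PEA_1_P33
}


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in an inclusive nucleotide range."""
    length = end - start + 1
    codons, remainder = divmod(length, 3)
    if remainder:
        raise ValueError(f"range {start}-{end} is not a multiple of three")
    return codons


for transcript, (start, end, protein_length) in CODING_PORTIONS.items():
    codons = codon_count(start, end)
    status = "matches" if codons == protein_length else "differs from"
    print(f"{transcript}: {codons} codons, {status} protein length {protein_length}")
```

Under this reading each coding portion contains exactly as many codons as the variant protein has residues, which suggests that the stated end position does not include the stop codon.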
As noted above, cluster HSCP2 features 50 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided. Segment cluster HSCP2_PEA_1_node_0 according to the present invention is supported by 53 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34, HSCP2_PEA_1_T45 and HSCP2_PEA_1_T50. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_3 according to the present invention is supported by 53 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T34, HSCP2_PEA_1_T45 and HSCP2_PEA_1_T50. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_6 according to the present invention is supported by 63 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T34, HSCP2_PEA_1_T45 and HSCP2_PEA_1_T50. Table 43 below describes the starting and ending position of this segment on each transcript. Table 43 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_8 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T45. Table 44 below describes the starting and ending position of this segment on each transcript. Table 44 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_10 according to the present invention is supported by 57 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 45 below describes the starting and ending position of this segment on each transcript. Table 45 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_14 according to the present invention is supported by 49 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 46 below describes the starting and ending position of this segment on each transcript. Table 46 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_23 according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 47 below describes the starting and ending position of this segment on each transcript. Table 47 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_26 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 48 below describes the starting and ending position of this segment on each transcript. Table 48 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_29 according to the present invention is supported by 64 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 49 below describes the starting and ending position of this segment on each transcript. Table 49 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_31 according to the present invention is supported by 72 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 50 below describes the starting and ending position of this segment on each transcript. Table 50 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_32 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T34. Table 51 below describes the starting and ending position of this segment on each transcript. Table 51 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_34 according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 52 below describes the starting and ending position of this segment on each transcript. Table 52 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_52 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T22. Table 53 below describes the starting and ending position of this segment on each transcript. Table 53 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_58 according to the present invention is supported by 89 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 54 below describes the starting and ending position of this segment on each transcript. Table 54 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_72 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4 and HSCP2_PEA_1_T50. Table 55 below describes the starting and ending position of this segment on each transcript. Table 55 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_73 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4. Table 56 below describes the starting and ending position of this segment on each transcript. Table 56 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_74 according to the present invention is supported by 86 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31 and HSCP2_PEA_1_T33. Table 57 below describes the starting and ending position of this segment on each transcript. Table 57 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_76 according to the present invention is supported by 69 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T31 and HSCP2_PEA_1_T33. Table 58 below describes the starting and ending position of this segment on each transcript. Table 58 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_78 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T31 and HSCP2_PEA_1_T33. Table 59 below describes the starting and ending position of this segment on each transcript. Table 59 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_80 according to the present invention is supported by 59 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T31 and HSCP2_PEA_1_T33. Table 60 below describes the starting and ending position of this segment on each transcript. Table 60 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_84 according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T20 and HSCP2_PEA_1_T23. Table 61 below describes the starting and ending position of this segment on each transcript. Table 61 - Segment location on transcripts
Short segments that relate to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSCP2_PEA_1_node_4 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T34, HSCP2_PEA_1_T45 and HSCP2_PEA_1_T50. Table 62 below describes the starting and ending position of this segment on each transcript. Table 62 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_7 according to the present invention is supported by 56 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T34, HSCP2_PEA_1_T45 and HSCP2_PEA_1_T50. Table 63 below describes the starting and ending position of this segment on each transcript. Table 63 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_13 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 64 below describes the starting and ending position of this segment on each transcript. Table 64 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_15 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 65 below describes the starting and ending position of this segment on each transcript. Table 65 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_16 according to the present invention can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 66 below describes the starting and ending position of this segment on each transcript. Table 66 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_18 according to the present invention can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 67 below describes the starting and ending position of this segment on each transcript. Table 67 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_20 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 68 below describes the starting and ending position of this segment on each transcript. Table 68 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_21 according to the present invention is supported by 49 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33, HSCP2_PEA_1_T34 and HSCP2_PEA_1_T50. Table 69 below describes the starting and ending position of this segment on each transcript. Table 69 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_37 according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 70 below describes the starting and ending position of this segment on each transcript. Table 70 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_38 according to the present invention is supported by 59 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 71 below describes the starting and ending position of this segment on each transcript. Table 71 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_39 according to the present invention is supported by 57 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 72 below describes the starting and ending position of this segment on each transcript. Table 72 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_41 according to the present invention is supported by 60 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 73 below describes the starting and ending position of this segment on each transcript. Table 73 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_42 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T22. Table 74 below describes the starting and ending position of this segment on each transcript. Table 74 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_46 according to the present invention can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 75 below describes the starting and ending position of this segment on each transcript.
Segment cluster HSCP2_PEA_1_node_47 according to the present invention is supported by 59 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 76 below describes the starting and ending position of this segment on each transcript. Table 76 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_50 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 77 below describes the starting and ending position of this segment on each transcript. Table 77 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_51 according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T22, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 78 below describes the starting and ending position of this segment on each transcript. Table 78 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_55 according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 79 below describes the starting and ending position of this segment on each transcript. Table 79 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_56 according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 80 below describes the starting and ending position of this segment on each transcript. Table 80 - Segment location on transcripts
Segment cluster HSCP2_PEA_l_node_60 according to the present invention is supported by 90 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2 PEAJJM, HSCP2 PEAJ T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 81 below describes the starting and ending position of this segment on each transcript. Table 81 - Segment location on transcripts
Segment cluster HSCP2_PEA_l_node_61 according to the present invention is supported by 81 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2 PEAJ T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T23, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 82 below describes the starting and ending position of this segment on each transcript. Table 82 - Segment location on transcripts
Segment cluster HSCP2_PEA_l_node_67 according to the present invention can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T31 , HSCP2_PEA_1_T33 and HSCP2_PEA_1_T50. Table 83 below describes the starting and ending position of this segment on each transcript. Table 83 - Segment location on transcripts
Segment cluster HSCP2 PEA _l_node_68 according to the present invention is supported by 88 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T19, HSCP2_PEA_1_T20, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31, HSCP2_PEA_1_T33 and HSCP2 PEAJ T50. Table 84 below describes the starting and ending position of this segment on each transcript. Table 84 - Segment location on transcripts
Segment cluster HSCP2 _PEA_l_node_69 according to the present invention is supported by 96 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2 _PEA_1_T20, HSCP2_PEA_1_T25, HSCP2_PEA_1_T31 , HSCP2 PEAJ T33 and HSCP2 PEAJ T50. Table 85 below describes the starting and ending position of this segment on each transcript. Table 85 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_70 according to the present invention can be found in the following transcript(s): HSCP2_PEA_1_T20. Table 86 below describes the starting and ending position of this segment on each transcript.
Table 86 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_75 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T31 and HSCP2_PEA_1_T33. Table 87 below describes the starting and ending position of this segment on each transcript.
Table 87 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_77 according to the present invention is supported by 47 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T31 and HSCP2_PEA_1_T33. Table 88 below describes the starting and ending position of this segment on each transcript.
Table 88 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_79 according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T4, HSCP2_PEA_1_T13, HSCP2_PEA_1_T19, HSCP2_PEA_1_T31 and HSCP2_PEA_1_T33. Table 89 below describes the starting and ending position of this segment on each transcript.
Table 89 - Segment location on transcripts
Segment cluster HSCP2_PEA_1_node_82 according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCP2_PEA_1_T20 and HSCP2_PEA_1_T23. Table 90 below describes the starting and ending position of this segment on each transcript.
Table 90 - Segment location on transcripts
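The segment descriptions above list, for each segment, the transcripts that contain it. As a minimal sketch of how that relationship could be inverted (to ask which segments a given transcript contains, or which segments are specific to a single transcript), the following uses three of the segments above with identifiers normalized to the HSCP2_PEA_1_ form. The function names and the choice of a dictionary are illustrative assumptions, not part of the described method.

```python
from collections import defaultdict

# Segment -> transcripts, copied from the text for three of the segments.
SEGMENT_TRANSCRIPTS = {
    "HSCP2_PEA_1_node_67": [
        "HSCP2_PEA_1_T4", "HSCP2_PEA_1_T19", "HSCP2_PEA_1_T20",
        "HSCP2_PEA_1_T31", "HSCP2_PEA_1_T33", "HSCP2_PEA_1_T50",
    ],
    "HSCP2_PEA_1_node_70": ["HSCP2_PEA_1_T20"],
    "HSCP2_PEA_1_node_82": ["HSCP2_PEA_1_T20", "HSCP2_PEA_1_T23"],
}


def segments_by_transcript(segment_transcripts):
    """Invert a segment->transcripts mapping into transcript->segments."""
    index = defaultdict(list)
    for segment, transcripts in segment_transcripts.items():
        for transcript in transcripts:
            index[transcript].append(segment)
    return dict(index)


def single_transcript_segments(segment_transcripts):
    """Return the segments that are listed for exactly one transcript."""
    return [s for s, ts in segment_transcripts.items() if len(ts) == 1]


by_transcript = segments_by_transcript(SEGMENT_TRANSCRIPTS)
unique = single_transcript_segments(SEGMENT_TRANSCRIPTS)
```

With this subset, HSCP2_PEA_1_T20 maps to all three segments, and HSCP2_PEA_1_node_70 is the only segment listed for a single transcript.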
Variant protein alignment to the previously known protein:
Sequence name: CERU_HUMAN
Sequence documentation:
Alignment of: HSCP2_PEA_1_P4 x CERU_HUMAN
Alignment segment 1/1:
Quality: 10630.00 Escore : Matching length: 1060 Total length: 1060 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:

   1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50
     ||||||||||||||||||||||||||||||||||||||||||||||||||
   1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50

  51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100

 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150

 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200

 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250

 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300

 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350

 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400

 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450

 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500

 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550

 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600

 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650

 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700

 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750

 751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800

 801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850

 851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900

 901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950

 951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000

1001 HSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGM 1050
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1001 HSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGM 1050

1051 ETTYTVLQNE 1060
     ||||||||||
1051 ETTYTVLQNE 1060
Sequence name: CERU_HUMAN
Sequence documentation:
Alignment of: HSCP2_PEA_1_P8 x CERU_HUMAN
Alignment segment 1/1:
Quality: 10079.00 Escore: 0 Matching length: 1006 Total length: 1006 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
   1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50
     ||||||||||||||||||||||||||||||||||||||||||||||||||
   1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50

  51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100

 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150

 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200

 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250

 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300

 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350

 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400

 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450

 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500

 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550

 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600

 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650

 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700

 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750

 751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800

 801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850

 851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900

 901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950

 951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000

1001 HSFQYK 1006
     ||||||
1001 HSFQYK 1006
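Each alignment in this section is preceded by a single statistics line (Quality, Escore, Matching length, and so on). As a minimal sketch of how such a line could be read into named fields, the following uses a regular expression per field name. The field list is taken from the statistics lines in this section; the function name and the choice to return None for a field with no numeric value are illustrative assumptions.

```python
import re

# Field names as they appear in the statistics line of each alignment.
FIELDS = (
    "Quality", "Escore", "Matching length", "Total length",
    "Matching Percent Similarity", "Matching Percent Identity",
    "Total Percent Similarity", "Total Percent Identity", "Gaps",
)


def parse_alignment_stats(line):
    """Return {field: float or None} for one statistics line."""
    stats = {}
    for name in FIELDS:
        # Tolerate optional spaces around the colon ("Gaps : 0", "Escore: 0").
        match = re.search(re.escape(name) + r"\s*:\s*(-?\d+(?:\.\d+)?)", line)
        stats[name] = float(match.group(1)) if match else None
    return stats


# Statistics line of the HSCP2_PEA_1_P8 alignment, copied from the text.
P8_LINE = (
    "Quality: 10079.00 Escore: 0 Matching length: 1006 Total length: 1006 "
    "Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 "
    "Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0"
)
p8_stats = parse_alignment_stats(P8_LINE)
```

A field whose value is absent in the text (for example an Escore with nothing after the colon) is reported as None rather than guessed.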
Sequence name: CERU_HUMAN
Sequence documentation:
Alignment of: HSCP2_PEA_1_P14 x CERU_HUMAN
Alignment segment 1/1:
Quality: 9832.00 Escore: 0 Matching length: 994 Total length: 1065 Matching Percent Similarity: 99.90 Matching Percent Identity: 99.90 Total Percent Similarity: 93.24 Total Percent Identity: 93.24 Gaps : 1
Alignment:

   1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50
     ||||||||||||||||||||||||||||||||||||||||||||||||||
   1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50

  51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100

 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150

 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200

 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250

 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300

 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350

 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400

 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450

 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500

 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550

 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600

 601 TTAPDQVDKEDEDFQESNKMH 621
     |||||||||||||||||||||
 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650

 622                                           WTFNVECL 629
                                                |||||||
 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700

 630 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 679
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750

 680 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 729
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800

 730 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 779
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850

 780 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 829
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900

 830 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 879
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950

 880 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 929
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000

 930 HSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGM 979
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1001 HSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGM 1050

 980 ETTYTVLQNEDTKSG 994
     |||||||||||||||
1051 ETTYTVLQNEDTKSG 1065
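The statistics reported for the HSCP2_PEA_1_P14 alignment (Matching length 994, Total length 1065, Matching Percent Identity 99.90, Total Percent Identity 93.24, Gaps 1) can be checked with simple arithmetic. The sketch below assumes that the "Matching" percentages are taken over the aligned positions and the "Total" percentages over the full alignment length, and that exactly one aligned position differs (the W of the variant opposite the G of CERU_HUMAN at the start of the segment following the gap). These are a reading of the alignment above, not a statement of how the original alignment program defines its output.

```python
def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


matching_length = 994   # reported: aligned (non-gap) positions
total_length = 1065     # reported: full alignment length
identical = 993         # assumption: one mismatch among the aligned positions

matching_pct_identity = percent(identical, matching_length)  # 99.9
total_pct_identity = percent(identical, total_length)        # 93.24
gap_length = total_length - matching_length                  # 71
```

Under these assumptions the computed values agree with the reported 99.90 and 93.24, and the single gap spans 71 residues of CERU_HUMAN, which matches the jump from residue 621 to residue 693 in the reference numbering.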
Sequence name: CERU_HUMAN
Sequence documentation:
Alignment of: HSCP2_PEA_1_P15 x CERU_HUMAN
Alignment segment 1/1:
Quality: 10630.00 Escore: 0 Matching length: 1060 Total length: 1060 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:

   1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50
     ||||||||||||||||||||||||||||||||||||||||||||||||||
   1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50

  51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100

 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150

 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200

 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250

 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300

 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350

 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400

 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450

 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500

 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550

 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600

 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650

 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700

 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750

 751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800

 801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850

 851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900

 901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950

 951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000

1001 HSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGM 1050
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1001 HSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGM 1050

1051 ETTYTVLQNE 1060
     ||||||||||
1051 ETTYTVLQNE 1060
Sequence name: CERU_HUMAN
Sequence documentation:
Alignment of: HSCP2_PEA_1_P2 x CERU_HUMAN
Alignment segment 1/1:
Quality: 7636.00 Escore: 0 Matching length: 761 Total length: 761 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:

   1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50
     ||||||||||||||||||||||||||||||||||||||||||||||||||
   1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50

  51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100

 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150

 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200

 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250

 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300

 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350

 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400

 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450

 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500

 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550

 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600

 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650

 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700

 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750

 751 WEKELHHLQEQ 761
     |||||||||||
 751 WEKELHHLQEQ 761
Sequence name: CERU_HUMAN
Sequence documentation:
Alignment of: HSCP2_PEA_1_P16 x CERU_HUMAN
Alignment segment 1/1:
Quality: 10092.00 Escore: 0 Matching length: 1007 Total length: 1007 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:

   1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50
     ||||||||||||||||||||||||||||||||||||||||||||||||||
   1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50

  51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100

 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150

 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200

 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250

 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300

 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350

 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400

 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450

 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500

 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550

 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600

 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650

 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700

 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750

 751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800

 801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850

 851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900

 901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950

 951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000

1001 HSFQYKH 1007
     |||||||
1001 HSFQYKH 1007
Sequence name: CERU_HUMAN
Sequence documentation:
Alignment of: HSCP2_PEA_1_P6 x CERU_HUMAN
Alignment segment 1/1:
Quality: 10079.00 Escore: 0 Matching length: 1006 Total length: 1006 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
   1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50
     ||||||||||||||||||||||||||||||||||||||||||||||||||
   1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50

  51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100

 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150

 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200

 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250

 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300

 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350

 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400

 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450

 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500

 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550

 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600

 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSVVWYLFSAG 650

 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700

 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750

 751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKVVYRQYTDSTFRVPVERKA 800

 801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850

 851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900

 901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950

 951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000
1001 HSFQYK 1006
     ||||||
1001 HSFQYK 1006
Sequence name: CERU_HUMAN
Sequence documentation:
Alignment of: HSCP2_PEA_1_P22 x CERU_HUMAN
Alignment segment 1/1
Quality: 9277.00 Escore: 0 Matching length: 936 Total length: 1065 Matching Percent Similarity: 100.00 Matching Percent Identity: 99.89 Total Percent Similarity: 87.89 Total Percent Identity: 87.79 Gaps : 1
Alignment:
1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50 I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I I I 1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50
51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100 I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I I I I I I I I I I 51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100
101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHE 131 I I II I I I I I I I I I I I I I I I II I I I I I I I I I I 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150
131 131
151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200
131 131
201 KKDSLDKEKEKHIDREFVVMFSWDENFSWYLEDNIKTYCSEPEKVDKDN 250
132 AVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 171 : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300
172 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 221 I I II II I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I
301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350
222 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 271 I I I I I II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I
351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400
272 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 321 I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450
322 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 371 I I II I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500 372 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 421 I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I II I I I II I II 501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550
422 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 471 I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I 551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600
472 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSWWYLFSAG 521 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I
601 TTAPDQVDKEDEDFQESNKMHSMNGFMYGNQPGLTMCKGDSWWYLFSAG 650
522 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 571 I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 651 NEADVHGIYFSGNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECL 700
572 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 621 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
701 TTDHYTGGMKQKYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQRE 750 . . . . .
622 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKWYRQYTDSTFRVPVERKA 671 I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I
751 WEKELHHLQEQNVSNAFLDKGEFYIGSKYKKWYRQYTDSTFRVPVERKA 800
672 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 721 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I 801 EEEHLGILGPQLHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTL 850
722 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 771 I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I II I I I I I I I I I I I I I I I I
851 PGETLTYVWKIPERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVC 900 772 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 821 II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 901 RRPYLKVFNPRRKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDE 950 . . . . . 822 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 871 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I 951 EFIESNKMHAINGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHG 1000 872 HSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGM 921 I I I I I I I I I I I I I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 1001 HSFQYKHRGVYSSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGM 1050
922 ETTYTVLQNEDTKSG 936 I I I I I I I I I I I I I I I 1051 ETTYTVLQNEDTKSG 1065
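The "Total" percentages reported for the HSCP2_PEA_1_P22 alignment above appear to be the "Matching" percentages rescaled by the ratio of matching length to total length. The source does not state this relationship, so the sketch below is only an assumption that happens to reproduce the reported figures; the function name `total_percent` is hypothetical.

```python
def total_percent(matching_percent: float, matching_length: int, total_length: int) -> float:
    """Rescale a percentage measured over the aligned region to the full length.

    Assumption (not stated in the source): total % = matching % * matching / total.
    """
    if total_length <= 0 or matching_length > total_length:
        raise ValueError("lengths must satisfy 0 < matching_length <= total_length")
    return matching_percent * matching_length / total_length


# Values reported in the statistics line for HSCP2_PEA_1_P22.
MATCHING_LENGTH = 936
TOTAL_LENGTH = 1065

total_similarity = total_percent(100.00, MATCHING_LENGTH, TOTAL_LENGTH)
total_identity = total_percent(99.89, MATCHING_LENGTH, TOTAL_LENGTH)

print(f"Total Percent Similarity: {total_similarity:.2f}")
print(f"Total Percent Identity:   {total_identity:.2f}")
```

Under this assumption, the single gap accounts for the residues of the known protein that are absent from the variant, which is why the total figures fall below the matching figures.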
Sequence name: CERU_HUMAN
Sequence documentation:
Alignment of: HSCP2_PEA_1_P24 x CERU_HUMAN
Alignment segment 1/1: Quality: 8074.00 Escore: 0 Matching length: 804 Total length: 804 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
16 VNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFHGQALTNKNYRI 65 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 262 VNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFHGQALTNKNYRI 311 66 DTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFFQVQECNKSS 115 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II 312 DTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQAFFQVQECNKSS 361 . . . . . 116 SKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQ 165 I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I II 362 SKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTAPGSDSAVFFEQ 411 166 GTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTI 215 I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 412 GTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILGPVIWAEVGDTI 461
216 RVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAP 265 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 462 RVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSRSVPPSASHVAP 511 266 TETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIFTGLIGPMKICK 315 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I 512 TETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIFTGLIGPMKICK 561 . . . . .
316 KGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMFTTAPDQVDKED 365 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I
562 KGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMFTTAPDQVDKED 611
366 EDFQESNKMHSMNGFMYGNQPGLTMCKGDSWWYLFSAGNEADVHGIYFS 415 I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I
612 EDFQESNKMHSMNGFMYGNQPGLTMCKGDSWWYLFSAGNEADVHGIYFS 661
416 GNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDHYTGGMKQ 465 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I II I
662 GNTYLWRGERRDTANLFPQTSLTLHMWPDTEGTFNVECLTTDHYTGGMKQ 711
466 KYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ 515 I I II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 712 KYTVNQCRRQSEDSTFYLGERTYYIAAVEVEWDYSPQREWEKELHHLQEQ 761
516 NVSNAFLDKGEFYIGSKYKKWYRQYTDSTFRVPVERKAEEEHLGILGPQ 565 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i
762 NVSNAFLDKGEFYIGSKYKKWYRQYTDSTFRVPVERKAEEEHLGILGPQ 811 . . . . .
566 LHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKI 615 I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
812 LHADVGDKVKIIFKNMATRPYSIHAHGVQTESSTVTPTLPGETLTYVWKI 861
616 PERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPR 665 I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 862 PERSGAGTEDSACIPWAYYSTVDQVKDLYSGLIGPLIVCRRPYLKVFNPR 911
666 RKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDEEFIESNKMHAI 715 I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 912 RKLEFALLFLVFDENESWYLDDNIKTYSDHPEKVNKDDEEFIESNKMHAI 961
716 NGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHGHSFQYKHRGVY 765 I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 962 NGRMFGNLQGLTMHVGDEVNWYLMGMGNEIDLHTVHFHGHSFQYKHRGVY 1011 . . . . . 766 SSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGMETTYTVLQNED 815 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1012 SSDVFDIFPGTYQTLEMFPRTPGIWLLHCHVTDHIHAGMETTYTVLQNED 1061 816 TKSG 819 I I I I 1062 TKSG 1065
Sequence name: CERU_HUMAN
Sequence documentation:
Alignment of: HSCP2_PEA_1_P25 x CERU_HUMAN
Alignment segment 1/1: Quality: 6196.00 Escore: 0 Matching length: 621 Total length: 621 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I 1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50
51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100 I I I II I II I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I 51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100 . . . . . 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150 I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I II I I I I I I I I II I I I 101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200 I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200
201 KKDSLDKEKEKHIDREFVVMFSVVDENFSWYLEDNIKTYCSEPEKVDKDN 250 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 201 KKDSLDKEKEKHIDREFVVMFSWDENFSWYLEDNIKTYCSEPEKVDKDN 250 251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300 I I I I I I II I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I II II I I I I I I
251 EDFQESNRMYSVNGYTFGSLPGLSMCAEDRVKWYLFGMGNEVDVHAAFFH 300 . . . . .
301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
301 GQALTNKNYRIDTINLFPATLFDAYMVAQNPGEWMLSCQNLNHLKAGLQA 350
351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400 I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 351 FFQVQECNKSSSKDNIRGKHVRHYYIAAEEIIWNYAPSGIDIFTKENLTA 400
401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450 I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I
401 PGSDSAVFFEQGTTRIGGSYKKLVYREYTDASFTNRKERGPEEEHLGILG 450
451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 451 PVIWAEVGDTIRVTFHNKGAYPLSIEPIGVRFNKNNEGTYYSPNYNPQSR 500
501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550 I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I I I I I I
501 SVPPSASHVAPTETFTYEWTVPKEVGPTNADPVCLAKMYYSAVDPTKDIF 550 . . . . .
551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600 II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I
551 TGLIGPMKICKKGSLHANGRQKDVDKEFYLFPTVFDENESLLLEDNIRMF 600
601 TTAPDQVDKEDEDFQESNKMH 621 I I I I I I I I I I I I I I I I I I I II 601 TTAPDQVDKEDEDFQESNKMH 621
Sequence name: CERU_HUMAN
Sequence documentation:
Alignment of: HSCP2_PEA_1_P33 x CERU_HUMAN
Alignment segment 1/1:
Quality: 2003.00 Escore: 0 Matching length: 202 Total length: 202 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
  1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MKILILGIFLFLCSTPAWAKEKHYYIGIIETTWDYASDHGEKKLISVDTE 50
 51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 HSNIYLQNGPDRIGRLYKKALYLQYTDETFRTTIEKPVWLGFLGPIIKAE 100
101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 TGDKVYVHLKNLASRPYTFHSHGITYYKEHEGAIYPDNTTDFQRADDKVY 150
151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PGEQYTYMLLATEEQSPGEGDGNCVTRIYHSHIDAPKDIASGLIGPLIIC 200
201 KK 202
    ||
201 KK 202
DESCRIPTION FOR CLUSTER HUMTEN
Cluster HUMTEN features 19 transcript(s) and 57 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
These sequences are variants of the known protein Tenascin precursor (SwissProt accession identifier TENA_HUMAN; also known by the synonyms TN; Hexabrachion; Cytotactin; Neuronectin; GMEM; JI; Myotendinous antigen; Glioma-associated-extracellular matrix antigen; GP 150-225; Tenascin-C; TN-C), SEQ ID NO: 933, referred to herein as the previously known protein. Protein Tenascin precursor is known or believed to have the following function(s): SAM (substrate-adhesion molecule) that appears to inhibit cell migration. May play a role in supporting the growth of epithelial tumors. Is a ligand for integrins alpha-8/beta-1, alpha-9/beta-1, alpha-v/beta-3 and alpha-v/beta-6. The sequence for protein Tenascin precursor is given at the end of the application, as "Tenascin precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Tenascin precursor localization is believed to be secreted; extracellular matrix.
It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: DNA antagonist. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anticancer; antibody. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: cell adhesion, which are annotation(s) related to Biological Process; cell adhesion receptor; ligand binding or carrier; protein binding, which are annotation(s) related to Molecular Function; and extracellular matrix, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster HUMTEN can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left-hand column of the table and the numbers on the y-axis of Figure 37 refer to weighted expression of ESTs in each category, as "parts per million" (the ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, expressed in parts per million).
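The "parts per million" figure described above can be illustrated with a minimal sketch. The source does not specify how ESTs are weighted, so the example assumes plain, unweighted EST counts; the function name `ests_ppm` and the example counts are hypothetical and are not data from the application.

```python
def ests_ppm(cluster_ests: int, category_ests: int) -> float:
    """Express a cluster's EST count as parts per million of all ESTs in a category.

    Assumption: unweighted counts; the weighting used in the source is not specified.
    """
    if category_ests <= 0:
        raise ValueError("category_ests must be positive")
    if cluster_ests < 0 or cluster_ests > category_ests:
        raise ValueError("cluster_ests must lie between 0 and category_ests")
    # Multiply before dividing to limit floating-point rounding.
    return cluster_ests * 1_000_000 / category_ests


# Hypothetical example: 3 cluster ESTs out of 20,000 ESTs in one tissue category.
example_ppm = ests_ppm(3, 20_000)
print(example_ppm)
```

Normalizing to parts per million makes categories with very different library sizes comparable on the same axis.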
Overall, the following results were obtained as shown with regard to the histograms in Figure 37 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: a mixture of malignant tumors from different tissues, ovarian carcinoma, pancreas carcinoma and skin malignancies.
Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
As noted above, cluster HUMTEN features 19 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Tenascin precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HUMTEN_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTEN_PEA_1_T4. An alignment is given to the known protein (Tenascin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMTEN_PEA_1_P5 and TENA_HUMAN_V1: 1. An isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to
MGAMTQLLAGVFLAFLALATEGGVLKKVIRHK-RQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCP GTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLWYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKXSIPVSARVATYLPAPEGLKFKSIKETSVE\^WDPLDIAFETWEΠFPJ>IΙVFIIKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKTITFTTGLDAPPJ^LRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATPNAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVΈAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EWVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEW TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA T conesponding to amino acids 1 - 1525 of TENA HUMAN Vl, which also conesponds to amino acids 1 - 1525 of HUMTEN PEAJ P5, a second amino acid sequence being at least 10%, optionally at least 80%>, preferably at least 85%>, more preferably at least 90%> and most preferably at least 95%> homologous to a polypeptide having the sequence TEPKPQLGTLIFSNITPKSFNMSWTTQAGLFAKIVPNVSDAHSLHESQQFTVSGDAKQAH ITGLVENTGYDVSVAGTTLAGDPTRPLTAFVI conesponding to 
amino acids 1526 - 1617 of HUMTEN_PEA_1_P5, and a third amino acid sequence being at least 90% homologous to
TEALPLLENLTISDP PYGFTVSWMASENAFDSFLVTWDSGKLLDPQEFTLSGTQRKLE LRGLITGIGYEVMVSGFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADE GVFDNFVLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQTVSAI ATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVDGTKTQTR LVKLIPGVEYLVSIIAMKGFEESEP VSGSFTTALDGPSGLVTANITDSEALARWQPAIATV DSYVISYTGEKVPEITRTVSGNTVEYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDL DSPRDLTATEVQSETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLS PSTHYTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGDK AQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDRREEFWLGLDNLN KITAQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYKLKVEGYSGTAGDSMAYHN GRSFSTFDKDTDSAITNCALSYKGAFWYRNCHRVNLMGRYGDNNHSQGVNWFHWKG HEHSIQFAEMKLRPSNFRNLEGRRKRA corresponding to amino acids 1526 - 2201 of TENA_HUMAN_V1, which also corresponds to amino acids 1618 - 2293 of HUMTEN_PEA_1_P5, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for an edge portion of HUMTEN_PEA_1_P5, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for TEPKPQLGTLIFSNITPKSFNMSWTTQAGLFAKIVPNVSDAHSLHESQQFTVSGDAKQAH ITGLVENTGYDVSVAGTTLAGDPTRPLTAFVI, corresponding to HUMTEN_PEA_1_P5.
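The coordinate bookkeeping in the comparison report for HUMTEN_PEA_1_P5 can be checked with a short sketch. The residue ranges are taken from the report; the `Segment` type and the `check_segments` function are hypothetical names introduced only for illustration, and the check assumes 1-based inclusive residue numbering.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    """One contiguous piece of the variant, optionally mapped to the known protein."""
    variant_start: int
    variant_end: int
    known_start: Optional[int]  # None for the variant-specific (edge) portion
    known_end: Optional[int]


# Ranges stated in the comparison report (assumed 1-based, inclusive).
P5_SEGMENTS: List[Segment] = [
    Segment(1, 1525, 1, 1525),        # first sequence, shared with TENA_HUMAN_V1
    Segment(1526, 1617, None, None),  # second sequence, unique to the variant
    Segment(1618, 2293, 1526, 2201),  # third sequence, shared with TENA_HUMAN_V1
]


def check_segments(segments: List[Segment]) -> int:
    """Verify that segments are contiguous and mapped lengths agree; return variant length."""
    expected_start = 1
    for seg in segments:
        if seg.variant_start != expected_start:
            raise ValueError(f"gap or overlap before residue {seg.variant_start}")
        if seg.known_start is not None and seg.known_end is not None:
            variant_len = seg.variant_end - seg.variant_start
            known_len = seg.known_end - seg.known_start
            if variant_len != known_len:
                raise ValueError("mapped segment lengths differ")
        expected_start = seg.variant_end + 1
    return expected_start - 1


variant_length = check_segments(P5_SEGMENTS)
insert_length = P5_SEGMENTS[1].variant_end - P5_SEGMENTS[1].variant_start + 1
print(variant_length, insert_length)
```

The same check applies to the two-segment variants described later in this section, where a shared head is followed by a unique tail.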
It should be noted that the known protein sequence (TENA_HUMAN; SEQ ID NO:933) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for TENA_HUMAN_V1 (SEQ ID NO:934). These changes were previously known to occur and are listed in the table below.
Table 7 - Changes to TENA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because of manual inspection of known protein localization and/or gene structure. Variant protein HUMTEN_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
Variant protein HUMTEN_PEA_1_P5 is encoded by the following transcript(s): HUMTEN_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTEN_PEA_1_T4 is shown in bold; this coding portion starts at position 348 and ends at position 7226. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
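As a consistency check, the stated coding-portion coordinates can be converted into a codon count and compared with the 2293-residue length given for the variant protein in the comparison report. The sketch assumes the positions are 1-based and inclusive and that the stated end position excludes the stop codon; neither assumption is confirmed by the source, and the function name `coding_length_codons` is hypothetical.

```python
def coding_length_codons(start: int, end: int) -> int:
    """Return the number of codons in a coding portion given inclusive nucleotide positions."""
    if start < 1 or end < start:
        raise ValueError("positions must satisfy 1 <= start <= end")
    nucleotides = end - start + 1
    if nucleotides % 3 != 0:
        raise ValueError("coding portion length is not a multiple of three")
    return nucleotides // 3


# Coordinates stated for the coding portion of transcript HUMTEN_PEA_1_T4.
t4_codons = coding_length_codons(348, 7226)
print(t4_codons)
```

A codon count that matches the protein length suggests the stated coordinates cover the full open reading frame under the assumptions above.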
Variant protein HUMTEN_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTEN_PEA_1_T5. An alignment is given to the known protein (Tenascin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMTEN_PEA_1_P6 and TENA_HUMAN_V1 : l.An isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P6, comprising a first amino acid sequence being at least 90 % homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQΓVTTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLWTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIW^TRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATP AATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK 
AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEW TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA TTE corresponding to amino acids 1 - 1527 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1527 of HUMTEN_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
PKPQLGTLIFSNITPKSFNMSWTTQAGLFAKIVPNVSDAHSLHESQQFTVSGDAKQAHIT GLVENTGYDVSVAGTTLAGDPTRPLTAFVITGTQSEVLTCLTQREKEISHLKGKFNKNTI FTANVYSLIFN corresponding to amino acids 1528 - 1658 of HUMTEN_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PKPQLGTLIFSNITPKSFNMSWTTQAGLFAKIVPNVSDAHSLHESQQFTVSGDAKQAHIT GLVENTGYDVSVAGTTLAGDPTRPLTAFVITGTQSEVLTCLTQREKEISHLKGKFNKNTI FTANVYSLIFN in HUMTEN_PEA_1_P6.
It should be noted that the known protein sequence (TENA_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for TENA_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 10 - Changes to TENA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because of manual inspection of known protein localization and/or gene structure. Variant protein HUMTEN_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
Variant protein HUMTEN_PEA_1_P6 is encoded by the following transcript(s): HUMTEN_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTEN_PEA_1_T5 is shown in bold; this coding portion starts at position 348 and ends at position 5321. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Nucleic acid SNPs
Variant protein HUMTEN_PEA_1_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTEN_PEA_1_T6. An alignment is given to the known protein (Tenascin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMTEN_PEA_1_P7 and TENA HUMAN V1 : l.An isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P7, comprising a first amino acid sequence being at least 90 %> homologous to MGAMTQLLAG LAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLWTEVTE ETVNLAWDNEMRVTEYLWYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLJKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATP AATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATP YTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVE AAQNLTVPGGLRSTDLPGLK 
AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA TTEALPLLENLTISDINPYGFTVSWMASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKL ELRGLITGIGYEVMVSGFTQGHQTKPLRAEIVT corresponding to amino acids 1 - 1617 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1617 of HUMTEN_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
GISNQVSHLFLFLVPFCVICLPDRHDFNIFVHIPYLIHKCSLLFHLLPTLPLVICT corresponding to amino acids 1618 - 1673 of HUMTEN_PEA_1_P7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GISNQVSHLFLFLVPFCVICLPDRHDFNIFVHIPYLIHKCSLLFHLLPTLPLVICT in HUMTEN_PEA_1_P7. It should be noted that the known protein sequence (TENA_HUMAN) has one or more changes than the sequence given at the end of the application and named as being the amino acid sequence for TENA_HUMAN_V1. These changes were previously known to occur and are listed in the table below. Table 13 - Changes to TENA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMTEN_PEA_1_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 14 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Amino acid mutations
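The localization reasoning given for HUMTEN_PEA_1_P7 (a signal peptide predicted by both signal-peptide programs, and no trans-membrane region predicted by either trans-membrane program) can be sketched as a small decision function. This is an illustrative sketch only: the function name, the boolean inputs and the "undetermined" fallback are assumptions, since the application names SignalP but does not specify the other programs or their output formats.

```python
def predict_localization(signal_peptide_calls, transmembrane_calls):
    """Return "secreted" when every signal-peptide predictor reports a signal
    peptide and no trans-membrane predictor reports a trans-membrane region.

    Both arguments are hypothetical lists of booleans, one per prediction
    program (True means the feature was predicted).
    """
    if not signal_peptide_calls or not transmembrane_calls:
        # Without at least one call of each kind the rule cannot be applied.
        return "undetermined"
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    return "undetermined"


# Two signal-peptide programs agree, two trans-membrane programs find nothing.
print(predict_localization([True, True], [False, False]))
```

The fallback value is deliberately non-committal: the text only states when a protein is believed to be secreted, not how other outcomes are classified.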
Variant protein HUMTEN_PEA_1_P7 is encoded by the following transcript(s): HUMTEN_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTEN_PEA_1_T6 is shown in bold; this coding portion starts at position 348 and ends at position 5366. The transcript also has the following SNPs as listed in Table 15 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 15 - Nucleic acid SNPs
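The stated coding-portion coordinates can be cross-checked against the stated protein length with simple arithmetic. The sketch below assumes 1-based inclusive nucleotide positions and that the stated coding portion does not include the stop codon; both are assumptions, chosen because they reproduce the 1673 residues given for HUMTEN_PEA_1_P7 (amino acids 1 - 1673). The function name is hypothetical.

```python
def protein_length_from_cds(cds_start, cds_end):
    """Number of codons in a coding portion given 1-based inclusive positions."""
    n_nucleotides = cds_end - cds_start + 1
    if n_nucleotides <= 0:
        raise ValueError("cds_end must not precede cds_start")
    if n_nucleotides % 3 != 0:
        raise ValueError("coding portion length is not a multiple of three")
    return n_nucleotides // 3


# HUMTEN_PEA_1_T6: coding portion 348 to 5366, variant protein of 1673 residues.
assert protein_length_from_cds(348, 5366) == 1673
```

Under the same assumptions, the other coordinates in this section are also self-consistent, for example 348 to 6677 gives 2110 residues and 348 to 6620 gives 2091 residues.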
Variant protem HUMTEN_PEA_1_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTEN_PEA_1_T7. An alignment is given to the known protein (Tenascin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMTEN_PEA_1_P8 and TENA_HUMAN_V1 : l.An isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P8, comprising a first amino acid sequence being at least 90 % homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHPJNIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDC HGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRC VDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLWYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKTKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATP AATELDTPKDLQVSE TAETSLTLLWKTPLAK DRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRA VDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK 
AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA
T corresponding to amino acids 1 - 1525 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1525 of HUMTEN_PEA_1_P8, and a second amino acid sequence being at least 90 % homologous to TEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNFVLKIRDTKKQSEPLEITLLAPERTRD LTGLREATEYEIELYGISKGRRSQTVSAIATTAMGSPKEVIFSDITENSATVSWRAPTAQV ESFRITYVPITGGTPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTA LDGPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTVEYALTDLE PATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQSETALLTWRPPRASVTGYL LVYESVDGTVKEVIVGPDTTSYSLADLSPSTHYTAKIQALNGPLRSNMIQTIFTTIGLLYP FPKDCSQAMLNGDTTSGLYTIYLNGDKAQALEVFCDMTSDGGGWIVFLRRKNGRENF YQNWKAYAAGFGDRREEFWLGLDNLNKITAQGQYELRVDLRDHGETAFAVYDKFSV GDAKTRYKLKVEGYSGTAGDSMAYHNGRSFSTFDKDTDSAITNCALSYKGAFWYRNC HRVNLMGRYGDNNHSQGVNWFHWKGHEHSIQFAEMKLRPSNFRNLEGRRKRA corresponding to amino acids 1617 - 2201 of TENA_HUMAN_V1, which also corresponds to amino acids 1526 - 2110 of HUMTEN_PEA_1_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of HUMTEN_PEA_1_P8, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TT, having a structure as follows: a sequence starting from any of amino acid numbers 1525-x to 1525; and ending at any of amino acid numbers 1526+ ((n-2) - x), in which x varies from 0 to n-2.
It should be noted that the known protein sequence (TENA_HUMAN) has one or more changes than the sequence given at the end of the application and named as being the amino acid sequence for TENA_HUMAN_V1. These changes were previously known to occur and are listed in the table below. Table 16 - Changes to TENA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because of manual inspection of known protein localization and/or gene structure. Variant protein HUMTEN_PEA_1_P8 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 17 - Amino acid mutations
Variant protein HUMTEN_PEA_1_P8 is encoded by the following transcript(s): HUMTEN_PEA_1_T7, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTEN_PEA_1_T7 is shown in bold; this coding portion starts at position 348 and ends at position 6677. The transcript also has the following SNPs as listed in Table 18 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 18 - Nucleic acid SNPs
Variant protein HUMTEN_PEAJ_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTEN PEAJ Tl 1. An alignment is given to the known protein (Tenascin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMTEN PEAJ PIO and TENA HUMAN Vl: l.An isolated chimeric polypeptide encoding for HUMTEN PEAJ PIO, comprising a first amino acid sequence being at least 90 % homologous to MGAMTQLLAG LAFLALATEGGVLKKVIRHKRQSGWATLPEENQPV NHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRP IPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCP GTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKXSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EWVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK 
AATHYTITIRGVTQDFSTTPLSVEVL corresponding to amino acids 1 - 1252 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1252 of
HUMTEN_PEA_1_P10, and a second amino acid sequence being at least 90 % homologous to TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA TTEALPLLENLTISDINPYGFTVSWMASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKL ELRGLITGIGYEVMVSGFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTAD EGVFDNFVLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQTVSA IATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVDGTKTQT RLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALDGPSGLVTANITDSEALARWQPAIAT VDSYVISYTGEKVPEITRTVSGNTVEYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTD LDSPRDLTATEVQSETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADL SPSTHYTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGD KAQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDRREEFWLGLDNL NKITAQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYKLKVEGYSGTAGDSMAYH NGRSFSTFDKDTDSAITNCALSYKGAFWYRNCHRVNLMGRYGDNNHSQGVNWFHWK GHEHSIQFAEMKLRPSNFRNLEGRRKRA corresponding to amino acids 1344 - 2201 of TENA_HUMAN_V1, which also corresponds to amino acids 1253 - 2110 of HUMTEN_PEA_1_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of
HUMTEN_PEA_1_P10, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise LT, having a structure as follows: a sequence starting from any of amino acid numbers 1252-x to 1252; and ending at any of amino acid numbers 1253+ ((n-2) - x), in which x varies from 0 to n-2.
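The edge-portion definition can be read as the set of all length-n windows that contain both residues flanking the junction. The following sketch enumerates those windows for HUMTEN_PEA_1_P10 (junction between amino acids 1252 and 1253) directly from the stated formula; the function name and the tuple representation are assumptions made for illustration.

```python
def edge_portion_windows(last_of_first_segment, n):
    """Enumerate (start, end) positions, 1-based inclusive, of every length-n
    window spanning the junction after `last_of_first_segment`.

    Follows the stated structure: start = junction - x and
    end = (junction + 1) + ((n - 2) - x), with x from 0 to n - 2.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    windows = []
    for x in range(0, n - 1):  # x = 0 .. n-2 inclusive
        start = last_of_first_segment - x
        end = (last_of_first_segment + 1) + ((n - 2) - x)
        windows.append((start, end))
    return windows


windows = edge_portion_windows(1252, 10)
# Every window has length n and covers both 1252 and 1253.
assert all(end - start + 1 == 10 for start, end in windows)
assert all(start <= 1252 and end >= 1253 for start, end in windows)
```

For n = 10 this yields nine windows, from 1252 - 1261 (x = 0) to 1244 - 1253 (x = 8).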
It should be noted that the known protein sequence (TENA_HUMAN) has one or more changes than the sequence given at the end of the application and named as being the amino acid sequence for TENA_HUMAN_V1. These changes were previously known to occur and are listed in the table below. Table 19 - Changes to TENA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because of manual inspection of known protein localization and/or gene structure. Variant protein HUMTEN_PEA_1_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 20 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 20 - Amino acid mutations
Variant protein HUMTEN_PEA_1_P10 is encoded by the following transcript(s): HUMTEN_PEA_1_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTEN_PEA_1_T11 is shown in bold; this coding portion starts at position 348 and ends at position 6677. The transcript also has the following SNPs as listed in Table 21 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 21 - Nucleic acid SNPs
Variant protein HUMTEN PEAJ P11 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTEN_PEA_1_T14. An alignment is given to the known protein (Tenascin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMTEN_PEA_1_P11 and TENA HUMAN Vl : l.An isolated chimeric polypeptide encoding for HUMTEN PEAJ Pl 1, comprising a first amino acid sequence being at least 90 % homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQΓVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPA EGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLfflVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKLETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATP AATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQ conesponding to amino acids 1 - 1149 of TENA_HUMAN_V1, which also conesponds to amino 
acids 1 - 1149 of HUMTEN_PEA_1_P11, and a second amino acid sequence being at least 90 % homologous to
GYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMATDGIFETFTIEIIDSNRLLETVEY NISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATATTEALPLLENLTISDINPYGFTV SWMASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLELRGLITGIGYEVMVSGFTQGH QTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNFVLKIRDTKKQSEPLEI TLLAPERTRDLTGLREATEYEIELYGISKGRRSQTVSAIATTAMGSPKEVIFSDITENSAT VSWRAPTAQVESFRITYVPITGGTPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEE SEPVSGSFTTALDGPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSG NTVEYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQSETALLTW RPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPSTHYTAKIQALNGPLRSNMI QTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGDKAQALEVFCDMTSDGGGWIVF LRRKNGRENFYQNWKAYAAGFGDRREEFWLGLDNLNKITAQGQYELRVDLRDHGET AFAVYDKFSVGDAKTRYKLKVEGYSGTAGDSMAYHNGRSFSTFDKDTDSAITNCALS YKGAFWYRNCHRVNLMGRYGDNNHSQGVNWFHWKGHEHSIQFAEMKLRPSNFRNLE GRRKRA corresponding to amino acids 1423 - 2201 of TENA_HUMAN_V1, which also corresponds to amino acids 1150 - 1928 of HUMTEN_PEA_1_P11, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of HUMTEN_PEA_1_P11, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise QG, having a structure as follows: a sequence starting from any of amino acid numbers 1149-x to 1149; and ending at any of amino acid numbers 1150+ ((n-2) - x), in which x varies from 0 to n-2.
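The two residues named in each edge-portion claim (here QG) are the last residue of the first amino acid sequence followed by the first residue of the second. A minimal sketch of that check for HUMTEN_PEA_1_P11 is given below; it uses only the final residues of the first sequence and the opening residues of the second as listed in the comparison report, and the function and constant names are hypothetical.

```python
# Final residues of the first sequence (ending at amino acid 1149) and opening
# residues of the second sequence (starting at amino acid 1150), as listed in
# the comparison report for HUMTEN_PEA_1_P11.
FIRST_SEQUENCE_END = "TVSIYGVIQ"
SECOND_SEQUENCE_START = "GYRTPVLSA"


def junction_dipeptide(first_sequence, second_sequence):
    """Return the two residues that flank the junction of two joined segments."""
    if not first_sequence or not second_sequence:
        raise ValueError("both segments must be non-empty")
    return first_sequence[-1] + second_sequence[0]


assert junction_dipeptide(FIRST_SEQUENCE_END, SECOND_SEQUENCE_START) == "QG"
```

Because the dipeptide depends only on the residues at the boundary, the same check applies to full-length sequences without modification.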
It should be noted that the known protein sequence (TENA_HUMAN) has one or more changes than the sequence given at the end of the application and named as being the amino acid sequence for TENA_HUMAN_V1. These changes were previously known to occur and are listed in the table below. Table 22 - Changes to TENA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because of manual inspection of known protein localization and/or gene structure. Variant protein HUMTEN_PEA_1_P11 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 23 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 23 - Amino acid mutations
Variant protein HUMTEN_PEA_1_P11 is encoded by the following transcript(s): HUMTEN_PEA_1_T14, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTEN_PEA_1_T14 is shown in bold; this coding portion starts at position 348 and ends at position 6131. The transcript also has the following SNPs as listed in Table 24 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 24 - Nucleic acid SNPs
Variant protein HUMTEN_PEA_1_P13 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTEN_PEA_1_T16. An alignment is given to the known protein (Tenascin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMTEN_PEA_1_P13 and TENAJTUMANJvT : l.An isolated chimeric polypeptide encoding for HUMTEN_PEAJ_P13, comprising a first amino acid sequence being at least 90 % homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKJΛQSGVNATLPEENQPVVFNHVyi IK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKTKSIKETSVEVEWDPLDIAFETWEΠFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVE AAQNLTVPGGLRSTDLPGLK
AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV corresponding to amino acids 1 - 1343 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1343 of HUMTEN_PEA_1_P13, and a second amino acid sequence being at least 90 % homologous to
TAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVDGTKTQTRLV KLIPGVEYLVSIIAMKGFEESEPVSGSFTTALDGPSGLVTANITDSEALARWQPAIATVDS YVISYTGEKVPEITRTVSGNTVEYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDS PRDLTATEVQSETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPS
THYTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGDKAQ
ALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDRREEFWLGLDNLNKIT
AQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYKLKVEGYSGTAGDSMAYHNGRS FSTFDKDTDSAITNCALSYKGAFWYRNCHRVNLMGRYGDNNHSQGVNWFHWKGHEH
SIQFAEMKLRPSNFRNLEGRRKRA corresponding to amino acids 1708 - 2201 of TENA_HUMAN_V1, which also corresponds to amino acids 1344 - 1837 of HUMTEN_PEA_1_P13, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of
HUMTEN_PEA_1_P13, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise VT, having a structure as follows: a sequence starting from any of amino acid numbers 1343-x to 1343; and ending at any of amino acid numbers 1344+ ((n-2) - x), in which x varies from 0 to n-2.
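The comparison report for HUMTEN_PEA_1_P13 pairs ranges of the variant protein with ranges of TENA_HUMAN_V1 (1 - 1343 with 1 - 1343, and 1344 - 1837 with 1708 - 2201). The sketch below converts a variant position to the corresponding known-protein position and derives the range of the known protein that is absent from the variant. The `Segment` type and function names are assumptions introduced for illustration; only the coordinates come from the report.

```python
from typing import List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    variant_start: int
    variant_end: int
    known_start: int
    known_end: int


# Coordinates as stated in the comparison report for HUMTEN_PEA_1_P13.
P13_SEGMENTS = [
    Segment(1, 1343, 1, 1343),
    Segment(1344, 1837, 1708, 2201),
]


def variant_to_known(position: int, segments: List[Segment]) -> Optional[int]:
    """Map a 1-based variant position to the known protein, or None if outside."""
    for seg in segments:
        if seg.variant_start <= position <= seg.variant_end:
            return seg.known_start + (position - seg.variant_start)
    return None


def skipped_known_range(segments: List[Segment]) -> Optional[Tuple[int, int]]:
    """Range of the known protein lying between two consecutive segments."""
    first, second = segments
    start, end = first.known_end + 1, second.known_start - 1
    return (start, end) if start <= end else None


assert variant_to_known(1344, P13_SEGMENTS) == 1708
assert skipped_known_range(P13_SEGMENTS) == (1344, 1707)
```

On these coordinates the variant lacks amino acids 1344 - 1707 of TENA_HUMAN_V1, a stretch of 364 residues.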
It should be noted that the known protein sequence (TENA_HUMAN) has one or more changes than the sequence given at the end of the application and named as being the amino acid sequence for TENA_HUMAN_V1. These changes were previously known to occur and are listed in the table below. Table 25 - Changes to TENA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because of manual inspection of known protein localization and/or gene structure. Variant protein HUMTEN_PEA_1_P13 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 26 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 26 - Amino acid mutations
Variant protein HUMTEN_PEA_1_P13 is encoded by the following transcript(s): HUMTEN_PEA_1_T16, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTEN_PEA_1_T16 is shown in bold; this coding portion starts at position 348 and ends at position 5858. The transcript also has the following SNPs as listed in Table 27 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 27 - Nucleic acid SNPs
Variant protein HUMTEN_PEA_1_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTEN_PEA_1_T17. An alignment is given to the known protein (Tenascin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMTEN_PEA_1_P14 and TENA HUMAN Vl : 1.An isolated chimeric polypeptide encoding for HUMTEN _PEA_1_P 14, comprising a first amino acid sequence being at least 90 % homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPWFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQTVFTHRP IPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCP GTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLWTEVTE ETVNLAWDNEMRVTEYLWYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKXSJ VSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG
EITKSLRRPETSYRQTGLAPGQEYEISLHIVK-NNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFLKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSΓYGVIQGYRTPVLSAEASTGETPNLG EWVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVE AAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA TTEALPLLEINILTISDJTJPYGFTVSWMASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKL ELRGLITGIGYEVMVSGFTQGHQTKPLRAETVTEAEPEVDNLLVSDATPDGFRLSWTAD EGVFDNFVLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQTVSA IATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVDGTKTQT RLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALDGPSGLVTANITDSEALARWQPAIAT VDSYVISYTGEKVPEITRTVSGNTVEYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTD LDSPRDLTATEVQSETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADL SPSTHYTAKTQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGD
KAQALEVFCDMTSDGGGWIV corresponding to amino acids 1 - 2025 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 2025 of HUMTEN_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence STTRDCRALRPRGRGRGQSRGGEEGDLLLMHSDTPMCEALQDSACHTEALRNSLLNKR MGNTLATF corresponding to amino acids 2026 - 2091 of HUMTEN_PEA_1_P14, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence STTRDCRALRPRGRGRGQSRGGEEGDLLLMHSDTPMCEALQDSACHTEALRNSLLNKR MGNTLATF in HUMTEN_PEA_1_P14.
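The percentage thresholds attached to the tail sequence can be illustrated with a simple position-by-position comparison. This is only a sketch under a loud assumption: it uses ungapped percent identity between equal-length sequences as a stand-in for "homologous", whereas this section does not state how homology is computed, and alignment-based measures would generally differ. The function name and constant are hypothetical; the tail sequence is the one listed for amino acids 2026 - 2091 of HUMTEN_PEA_1_P14.

```python
# Tail of HUMTEN_PEA_1_P14 as listed in the comparison report (66 residues).
P14_TAIL = (
    "STTRDCRALRPRGRGRGQSRGGEEGDLLLMHSDTPMCEALQDSACHTEALRNSLLNKR"
    "MGNTLATF"
)


def percent_identity(a, b):
    """Ungapped percent identity between two equal-length sequences.

    Assumption: a simplified proxy for the homology percentages in the text,
    not the method used in the application.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length for ungapped comparison")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# The listed tail length agrees with positions 2026 to 2091 inclusive.
assert len(P14_TAIL) == 2091 - 2026 + 1

# A candidate differing at three positions still clears a 95% threshold.
candidate = "AAA" + P14_TAIL[3:]
assert percent_identity(P14_TAIL, candidate) >= 95.0
```

With 66 residues, each single substitution lowers the ungapped identity by roughly 1.5 percentage points, so the 70% threshold tolerates up to 19 substitutions under this simplified measure.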
It should be noted that the known protein sequence (TENA_HUMAN) has one or more changes than the sequence given at the end of the application and named as being the amino acid sequence for TENA HUMAN Vl. These changes were previously known to occur and are listed in the table below. Table 28 - Changes to TENA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because of manual inspection of known protein localization and/or gene structure. Variant protein HUMTEN_PEA_1_P14 also has the following non-silent SNPs (Single Nucleotide Polymoφhisms) as listed in Table 29, (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN PEAJJM4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 29 - Amino acid mutations
Variant protein HUMTEN_PEA_1_P14 is encoded by the following transcript(s): HUMTEN PEAJ T17, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTEN_PEA_1_T17 is shown in bold; this coding portion starts at position 348 and ends at position 6620. The transcript also has the following SNPs as listed in Table 30 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN PEAJ P14 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 30 - Nucleic acid SNPs
Variant protein HUMTEN_PEA_1_P15 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTEN_PEA_1_T18. An alignment is given to the known protein (Tenascin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMTEN_PEA_1_P15 and TENA_HUMAN_V1: 1. An isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P15, comprising a first amino acid sequence being at least 90% homologous to
MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG
EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL
TAEKGRHKSKPARVKAS corresponding to amino acids 1 - 1070 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1070 of HUMTEN_PEA_1_P15, and a second amino acid sequence being at least 90% homologous to
TEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNFVLKIRDTKKQSEPLEITLLAPERTRD LTGLREATEYEIELYGISKGRRSQTVSAIATTAMGSPKEVIFSDITENSATVSWRAPTAQV ESFRITYVPITGGTPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTA LDGPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTVEYALTDLE PATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQSETALLTWRPPRASVTGYL LVYESVDGTVKEVIVGPDTTSYSLADLSPSTHYTAKIQALNGPLRSNMIQTIFTTIGLLYP FPKDCSQAMLNGDTTSGLYTIYLNGDKAQALEVFCDMTSDGGGWIVFLRRKNGRENF YQNWKAYAAGFGDRREEFWLGLDNLNKITAQGQYELRVDLRDHGETAFAVYDKFSV GDAKTRYKLKVEGYSGTAGDSMAYHNGRSFSTFDKDTDSAITNCALSYKGAFWYRNC HRVNLMGRYGDNNHSQGVNWFHWKGHEHSIQFAEMKLRPSNFRNLEGRRKRA corresponding to amino acids 1617 - 2201 of TENA_HUMAN_V1, which also corresponds to amino acids 1071 - 1655 of HUMTEN_PEA_1_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of HUMTEN_PEA_1_P15, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise ST, having a structure as follows: a sequence starting from any of amino acid numbers 1070-x to 1070; and ending at any of amino acid numbers 1071+ ((n-2) - x), in which x varies from 0 to n-2.
It should be noted that the known protein sequence (TENA_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for TENA_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 31 - Changes to TENA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMTEN_PEA_1_P15 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 32 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 32 - Amino acid mutations
Variant protein HUMTEN_PEA_1_P15 is encoded by the following transcript(s): HUMTEN_PEA_1_T18, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTEN_PEA_1_T18 is shown in bold; this coding portion starts at position 348 and ends at position 5312. The transcript also has the following SNPs as listed in Table 33 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 33 - Nucleic acid SNPs
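The coordinate arithmetic in the comparison report for HUMTEN_PEA_1_P15 can be checked with a short sketch. It places the stated TENA_HUMAN_V1 segments end to end and derives the matching variant coordinates. The function name `map_segments` and the use of Python are assumptions made here for illustration; they are not part of the application, and the sketch only handles segments taken from the known protein, not unique tails.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # 1-based, inclusive (start, end)


def map_segments(reference_segments: List[Range]) -> List[Range]:
    """Place reference segments end to end and return their variant coordinates."""
    variant_ranges: List[Range] = []
    next_start = 1
    for ref_start, ref_end in reference_segments:
        if ref_end < ref_start:
            raise ValueError("segment end precedes segment start")
        length = ref_end - ref_start + 1
        variant_ranges.append((next_start, next_start + length - 1))
        next_start += length
    return variant_ranges


# Segments of TENA_HUMAN_V1 as stated for HUMTEN_PEA_1_P15.
p15_segments = [(1, 1070), (1617, 2201)]
p15_ranges = map_segments(p15_segments)

# Reference residues lying between the two segments, absent from the variant.
skipped = p15_segments[1][0] - p15_segments[0][1] - 1

print(p15_ranges)  # [(1, 1070), (1071, 1655)]
print(skipped)     # 546
```

The derived second range, 1071 - 1655, agrees with the variant coordinates stated in the comparison report.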
Variant protein HUMTEN_PEA_1_P16 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTEN_PEA_1_T19. An alignment is given to the known protein (Tenascin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMTEN_PEA_1_P16 and TENA_HUMAN_V1: 1. An isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P16, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKAS corresponding to amino acids 1 - 1070 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1070 of HUMTEN_PEA_1_P16, and a second amino acid sequence being at least 90% homologous to
TAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVDGTKTQTRLV KLIPGVEYLVSIIAMKGFEESEPVSGSFTTALDGPSGLVTANITDSEALARWQPAIATVDS YVISYTGEKVPEITRTVSGNTVEYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDS PRDLTATEVQSETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPS THYTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGDKAQ ALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDRREEFWLGLDNLNKIT AQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYKLKVEGYSGTAGDSMAYHNGRS FSTFDKDTDSAITNCALSYKGAFWYRNCHRVNLMGRYGDNNHSQGVNWFHWKGHEH SIQFAEMKLRPSNFRNLEGRRKRA corresponding to amino acids 1708 - 2201 of TENA_HUMAN_V1, which also corresponds to amino acids 1071 - 1564 of HUMTEN_PEA_1_P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of HUMTEN_PEA_1_P16, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise ST, having a structure as follows: a sequence starting from any of amino acid numbers 1070-x to 1070; and ending at any of amino acid numbers 1071+ ((n-2) - x), in which x varies from 0 to n-2. It should be noted that the known protein sequence (TENA_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for TENA_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 34 - Changes to TENA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because of manual inspection of known protein localization and/or gene structure. Variant protein HUMTEN_PEA_1_P16 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 35 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 35 - Amino acid mutations
Variant protein HUMTEN_PEA_1_P16 is encoded by the following transcript(s): HUMTEN_PEA_1_T19, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTEN_PEA_1_T19 is shown in bold; this coding portion starts at position 348 and ends at position 5039. The transcript also has the following SNPs as listed in Table 36 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 36 - Nucleic acid SNPs
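The edge-portion definition given above for HUMTEN_PEA_1_P16 can be illustrated with a short sketch. It reads the definition as follows: for a chosen length n, a window starts at amino acid 1070 - x and ends at amino acid 1071 + ((n-2) - x), with x running from 0 to n-2, so that every window has length n and contains both junction residues (1070 and 1071, the "ST" pair). This reading, the function name `edge_windows`, and the use of Python are assumptions made here for illustration only.

```python
from typing import List, Tuple


def edge_windows(junction: int, n: int) -> List[Tuple[int, int]]:
    """Enumerate (start, end) windows of length n spanning junction and junction + 1.

    Coordinates are 1-based and inclusive, following the numbering in the text.
    """
    if n < 2:
        raise ValueError("n must be at least 2 so the window can span the junction")
    windows = []
    for x in range(n - 1):  # x = 0 .. n-2
        start = junction - x
        end = (junction + 1) + ((n - 2) - x)
        windows.append((start, end))
    return windows


# Junction between amino acids 1070 and 1071, with the minimum length n = 10.
windows = edge_windows(1070, 10)
print(len(windows))             # 9
print(windows[0], windows[-1])  # (1070, 1079) (1062, 1071)
```

For a given n the sketch yields n-1 windows, one for each value of x.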
Variant protein HUMTEN_PEA_1_P17 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTEN_PEA_1_T20. An alignment is given to the known protein (Tenascin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMTEN_PEA_1_P17 and TENA_HUMAN_V1: 1. An isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P17, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK 
AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA TTEALPLLENLTISDINPYGFTVSWMASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKL ELRGLITGIGYEVMVSGFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTAD EGVFDNFVLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQTVSA IATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVDGTKTQT RLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALDGPSGLVTANITDSEALARWQPAIAT VDSYVISYTGEKVPEITRTVSGNTVEYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTD LDSPRDLTATEVQSETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADL SPSTHYTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGD
KAQALEVFCDMTSDGGGWIV corresponding to amino acids 1 - 2025 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 2025 of HUMTEN_PEA_1_P17, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
TPWPTTMADPSPPLTRTQIQPSPTVLCPTKGLSGTGTVTVST corresponding to amino acids 2026 - 2067 of HUMTEN_PEA_1_P17, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TPWPTTMADPSPPLTRTQIQPSPTVLCPTKGLSGTGTVTVST in HUMTEN_PEA_1_P17.
It should be noted that the known protein sequence (TENA_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for TENA_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 37 - Changes to TENA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because of manual inspection of known protein localization and/or gene structure. Variant protein HUMTEN_PEA_1_P17 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 38 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 38 - Amino acid mutations
Variant protein HUMTEN_PEA_1_P17 is encoded by the following transcript(s): HUMTEN_PEA_1_T20, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTEN_PEA_1_T20 is shown in bold; this coding portion starts at position 348 and ends at position 6548. The transcript also has the following SNPs as listed in Table 39 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 39 - Nucleic acid SNPs
Variant protein HUMTEN_PEA_1_P20 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTEN_PEA_1_T23. An alignment is given to the known protein (Tenascin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMTEN_PEA_1_P20 and TENA_HUMAN_V1: 1. An isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P20, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK 
AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA TTEALPLLENLTISDINPYGFTVSWMASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKL ELRGLITGIGYEVMVSGFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTAD EGVFDNFVLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQTVSA IATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVDGTKTQT RLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALDGPSGLVTANITDSEALARWQPAIAT VDSYVISYTGEKVPEITRTVSGNTVEYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTD LDSPRDLTATEVQSETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADL SPSTHYTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGD KAQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDRREEFWLG corresponding to amino acids 1 - 2057 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 2057 of HUMTEN_PEA_1_P20, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NAALHVYI corresponding to amino acids 2058 - 2065 of HUMTEN_PEA_1_P20, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P20, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NAALHVYI in HUMTEN_PEA_1_P20.
It should be noted that the known protein sequence (TENA_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for TENA_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 40 - Changes to TENA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMTEN_PEA_1_P20 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 41 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P20 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 41 - Amino acid mutations
Variant protein HUMTEN_PEA_1_P20 is encoded by the following transcript(s): HUMTEN_PEA_1_T23, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTEN_PEA_1_T23 is shown in bold; this coding portion starts at position 348 and ends at position 6542. The transcript also has the following SNPs as listed in Table 42 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P20 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 42 - Nucleic acid SNPs
Variant protein HUMTEN_PEA_1_P26 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTEN_PEA_1_T32. An alignment is given to the known protein (Tenascin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMTEN_PEA_1_P26 and TENA_HUMAN_V1: 1. An isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P26, comprising a first amino acid sequence being at least 90% homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK 
AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV TEDLPQLGDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDITPESFNLSWMA TDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPPSTDFIVYLSGLAPSIRTKTISATA TTEALPLLENLTISDINPYGFTVSWMASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKL ELRGLITGIGYEVMVSGFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTAD EGVFDNFVLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQTVSA
IATT corresponding to amino acids 1 - 1708 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1708 of HUMTEN_PEA_1_P26, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GTVNKQERTEKSHDSGVFFSQG corresponding to amino acids 1709 - 1730 of HUMTEN_PEA_1_P26, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P26, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GTVNKQERTEKSHDSGVFFSQG in HUMTEN_PEA_1_P26. It should be noted that the known protein sequence (TENA_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for TENA_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 43 - Changes to TENA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMTEN_PEA_1_P26 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 44 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P26 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 44 - Amino acid mutations
Variant protein HUMTEN_PEA_1_P26 is encoded by the following transcript(s): HUMTEN_PEA_1_T32, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTEN_PEA_1_T32 is shown in bold; this coding portion starts at position 348 and ends at position 5537. The transcript also has the following SNPs as listed in Table 45 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P26 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 45 - Nucleic acid SNPs
Variant protein HUMTEN_PEA_1_P27 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTEN_PEA_1_T35. An alignment is given to the known protein (Tenascin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMTEN_PEA_1_P27 and TENA_HUMAN_V1: 1. An isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P27, comprising a first amino acid sequence being at least 90% homologous to
MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVVVAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK AATHYTITIRGVTQDFSTTPLSVEVLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYD QFTIQVQEADQVEEAHNLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV T corresponding to amino acids 1 - 1344 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1344 of HUMTEN_PEA_1_P27, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GI corresponding to amino acids 1345 - 1346 of HUMTEN_PEA_1_P27, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 
It should be noted that the known protein sequence (TENA_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for TENA_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 46 - Changes to TENA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMTEN_PEA_1_P27 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 47 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P27 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 47 - Amino acid mutations
Variant protein HUMTEN_PEA_1_P27 is encoded by the following transcript(s): HUMTEN_PEA_1_T35, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTEN_PEA_1_T35 is shown in bold; this coding portion starts at position 348 and ends at position 4385. The transcript also has the following SNPs as listed in Table 48 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P27 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 48 - Nucleic acid SNPs
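The coding-portion positions stated above for the HUMTEN_PEA_1 transcripts can be cross-checked against the lengths of the variant proteins they encode. The sketch below assumes that the stated positions are 1-based and inclusive, and it takes each protein length from the last amino acid number given in the corresponding comparison report. The names `CODING_PORTIONS` and `codon_count` are hypothetical and are introduced here only for illustration.

```python
# transcript: (coding start, coding end, variant protein length), as stated in the text
CODING_PORTIONS = {
    "HUMTEN_PEA_1_T17": (348, 6620, 2091),  # HUMTEN_PEA_1_P14
    "HUMTEN_PEA_1_T18": (348, 5312, 1655),  # HUMTEN_PEA_1_P15
    "HUMTEN_PEA_1_T19": (348, 5039, 1564),  # HUMTEN_PEA_1_P16
    "HUMTEN_PEA_1_T20": (348, 6548, 2067),  # HUMTEN_PEA_1_P17
    "HUMTEN_PEA_1_T23": (348, 6542, 2065),  # HUMTEN_PEA_1_P20
    "HUMTEN_PEA_1_T32": (348, 5537, 1730),  # HUMTEN_PEA_1_P26
    "HUMTEN_PEA_1_T35": (348, 4385, 1346),  # HUMTEN_PEA_1_P27
}


def codon_count(start: int, end: int) -> int:
    """Number of codons in a coding portion given 1-based inclusive positions."""
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("coding portion is not a positive multiple of three nucleotides")
    return span // 3


for transcript, (start, end, protein_length) in CODING_PORTIONS.items():
    codons = codon_count(start, end)
    status = "consistent" if codons == protein_length else "MISMATCH"
    print(f"{transcript}: {codons} codons, {protein_length} amino acids, {status}")
```

Under these assumptions every stated coding portion has exactly as many codons as the variant protein has amino acids, which suggests that the stated end positions do not include a stop codon.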
Variant protein HUMTEN_PEA_1_P28 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTEN_PEA_1_T36. An alignment is given to the known protein (Tenascin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMTEN_PEA_1_P28 and TENA_HUMAN_V1: l.An isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P28, comprising a first amino acid sequence being at least 90 % homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCP GTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLWYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKXSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFPJ>« NKEDEG . EIj^SLRRPETSYRQTGLAPGQEYEISLHTVKJ mTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDJTA
EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATPNAATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQAYEHFIIQVQE ANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQGYRTPVLSAEASTGETPNLG EVWAEVGWDALKLNWTAPEGAYEYFFIQVQEADTVEAAQNLTVPGGLRSTDLPGLK
AATHYTITIRGVTQDFSTTPLSVEVLT corresponding to amino acids 1 - 1253 of TENA_HUMAN_V1, which also corresponds to amino acids 1 - 1253 of HUMTEN_PEA_1_P28, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
GILDEFTNSLPPLCLCSGGIKALSCFKLGSAPTTLGKYQ corresponding to amino acids 1254 - 1292 of HUMTEN_PEA_1_P28, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P28, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GILDEFTNSLPPLCLCSGGIKALSCFKLGSAPTTLGKYQ in HUMTEN_PEA_1_P28.
It should be noted that the known protein sequence (TENA_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for TENA_HUMAN_V1. These changes were previously known to occur and are listed in the table below. Table 49 - Changes to TENA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMTEN_PEA_1_P28 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 50 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P28 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 50 - Amino acid mutations
Variant protein HUMTEN_PEA_1_P28 is encoded by the following transcript(s): HUMTEN_PEA_1_T36, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTEN_PEA_1_T36 is shown in bold; this coding portion starts at position 348 and ends at position 4223. The transcript also has the following SNPs as listed in Table 51 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P28 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 51 - Nucleic acid SNPs
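As a quick consistency check on the coordinates quoted above, the length of the coding portion can be compared with the length of the encoded protein. The following Python sketch assumes (this is not stated in the text) that the start and end positions are 1-based and inclusive and that the quoted end position excludes the stop codon; the function names are illustrative only.

```python
def coding_length_nt(start: int, end: int) -> int:
    """Length in nucleotides of a coding portion.

    Assumption: 'start' and 'end' are 1-based, inclusive positions.
    """
    if end < start:
        raise ValueError("end position must not precede start position")
    return end - start + 1


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in the coding portion.

    Raises ValueError if the length is not a multiple of three.
    """
    length = coding_length_nt(start, end)
    if length % 3 != 0:
        raise ValueError(f"coding length {length} is not a multiple of 3")
    return length // 3


# Coding portion quoted for transcript HUMTEN_PEA_1_T36: 348 to 4223.
T36_CODONS = codon_count(348, 4223)
print(T36_CODONS)  # 1292
```

Under those assumptions, positions 348 to 4223 span 3876 nucleotides, i.e. 1292 codons, which agrees with the last amino acid position (1292) quoted for HUMTEN_PEA_1_P28.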
Variant protein HUMTEN_PEA_1_P29 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTEN_PEA_1_T37. An alignment is given to the known protein (Tenascin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMTEN_PEA_1_P29 and TENA_HUMAN_V1 : l.An isolated chimeric polypeptide encoding for HUMTEN PEAJ P29, comprising a first amino acid sequence being at least 90 %> homologous to MGAMTQLLAGVFLAFLALATEGG\T.KKVIRHK-RQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINJTRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCP GTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLWYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LEINIKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKJΓNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTGLRPGTEYGIGVSAVKEDKESNPATP AATELDTPKDLQVSE TAETSLTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYNVLL TAEKGRHKSKPARVKAST conesponding to amino acids 1 - 1071 of TENA_HUMAN_V1 , which also conesponds to amino acids 1 - 1071 of HUMTEN_PEA_1_P29, and a second amino acid sequence being at 
least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GESALSFLQTLG corresponding to amino acids 1072 - 1083 of HUMTEN_PEA_1_P29, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P29, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GESALSFLQTLG in HUMTEN_PEA_1_P29.
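The tiered thresholds recited above (70%, 80%, 85%, 90%, 95%) can be illustrated with a small Python sketch. How homology is actually measured is not defined in this passage, so the sketch substitutes a simple ungapped, position-by-position percent identity as a stand-in; that substitution, the function names and the candidate sequence are all assumptions made for illustration.

```python
TAIL_P29 = "GESALSFLQTLG"  # tail of HUMTEN_PEA_1_P29 as given in the text


def percent_identity(candidate: str, reference: str) -> float:
    """Ungapped percent identity between two equal-length sequences.

    Assumption: a stand-in for the homology measure, which this
    passage does not define.
    """
    if len(candidate) != len(reference):
        raise ValueError("sequences must have equal length for this sketch")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def highest_threshold_met(pct: float, thresholds=(95, 90, 85, 80, 70)):
    """Return the highest recited threshold that pct satisfies, or None."""
    for threshold in thresholds:
        if pct >= threshold:
            return threshold
    return None


# Hypothetical candidate differing from the tail at its last residue.
candidate = "GESALSFLQTLA"
pct = percent_identity(candidate, TAIL_P29)
print(round(pct, 1), highest_threshold_met(pct))  # 91.7 90
```

A real comparison would normally use an alignment tool that handles gaps and substitution scoring; the sketch only shows how one percentage maps onto the nested thresholds.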
It should be noted that the known protein sequence (TENA_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for TENA_HUMAN_V1. These changes were previously known to occur and are listed in the table below. Table 52 - Changes to TENA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMTEN_PEA_1_P29 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 53 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P29 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 53 - Amino acid mutations
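The localization reasoning stated above (every signal-peptide predictor reports a signal peptide, and no trans-membrane predictor reports a trans-membrane region) amounts to a simple decision rule. The Python sketch below encodes only that rule; the boolean inputs are hypothetical stand-ins for the outputs of the prediction programs, and returning False when the rule is not met is an assumption, since the text does not say what call is made in that case.

```python
from typing import Sequence


def predicted_secreted(signal_peptide_calls: Sequence[bool],
                       transmembrane_calls: Sequence[bool]) -> bool:
    """Apply the 'secreted' rule described in the text.

    signal_peptide_calls: one boolean per signal-peptide predictor
        (True = signal peptide predicted).
    transmembrane_calls: one boolean per trans-membrane predictor
        (True = trans-membrane region predicted).
    """
    if not signal_peptide_calls or not transmembrane_calls:
        raise ValueError("at least one call of each kind is required")
    return all(signal_peptide_calls) and not any(transmembrane_calls)


# Two predictors of each kind, as in the text ("both" / "neither").
print(predicted_secreted([True, True], [False, False]))   # True
print(predicted_secreted([True, False], [False, False]))  # False
```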
Variant protein HUMTEN_PEA_1_P29 is encoded by the following transcript(s): HUMTEN_PEA_1_T37, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTEN_PEA_1_T37 is shown in bold; this coding portion starts at position 348 and ends at position 3596. The transcript also has the following SNPs as listed in Table 54 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P29 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 54 - Nucleic acid SNPs
Variant protein HUMTEN_PEA_1_P30 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTEN_PEA_1_T39. An alignment is given to the known protein (Tenascin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMTEN_PEA_1_P30 and TENA_HUMAN_V1: l.An isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P30, comprising a first amino acid sequence being at least 90 %> homologous to MGAMTQLLAG LAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPWFNHVΥNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHPJNIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLWYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIffPdvlMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTRLDAPSQIEVKDVT DTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRR GDMSSNPAKETFTTGLDAPRNLRRVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHA EVDVPKSQQATTKTTLTG conesponding to amino acids 1 - 954 of TENA_HUMAN_V1, which also conesponds to amino acids 1 - 954 of HUMTEN PEAJ P30, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% 
homologous to a polypeptide having the sequence ELCISASLSQPALEGP corresponding to amino acids 955 - 970 of HUMTEN_PEA_1_P30, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P30, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ELCISASLSQPALEGP in HUMTEN_PEA_1_P30.
It should be noted that the known protein sequence (TENA_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for TENA_HUMAN_V1. These changes were previously known to occur and are listed in the table below. Table 55 - Changes to TENA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because of manual inspection of known protein localization and/or gene structure. Variant protein HUMTEN_PEA_1_P30 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 56 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P30 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 56 - Amino acid mutations
Variant protein HUMTEN_PEA_1_P30 is encoded by the following transcript(s): HUMTEN_PEA_1_T39, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTEN_PEA_1_T39 is shown in bold; this coding portion starts at position 348 and ends at position 3257. The transcript also has the following SNPs as listed in Table 57 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P30 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 57 - Nucleic acid SNPs
Variant protein HUMTEN_PEA_1_P31 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTEN_PEA_1_T40. An alignment is given to the known protein (Tenascin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMTEN_PEA_1_P31 and TENA_HUMAN_V1: 1. An isolated chimeric polypeptide encoding for HUMTEN_PEA_1_P31, comprising a first amino acid sequence being at least 90% homologous to
MGAMTQLLAGVFLAFLALATEGGVLJ KVII^HKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKJ3LAPPSEPSESFQEHTVDGENQIVFTHRINIPRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCP GTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLVVYTPTHEGGLEMQFRVTGDQTSTIIQELEPGVEYFIRVFAI LENKKSIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMNKEDEG EITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTTTR conesponding to amino acids 1 - 802 of TENA_HUMAN_V1, which also conesponds to amino acids 1 - 802 of HUMTEN PEA _1_P31 , and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EYHL conesponding to amino acids
803 - 806 of HUMTEN_PEA_1_P31, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMTEN_PEA_1_P31, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EYHL in HUMTEN_PEA_1_P31.
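Because each chimeric polypeptide is described as a head shared with TENA_HUMAN_V1 followed contiguously by a unique tail, the quoted tail coordinates should follow directly from the head length and the tail length. The Python sketch below checks this for the four variants whose tails are given above; the data are copied from the text, while the function and variable names are illustrative.

```python
# (variant, head length in amino acids, tail sequence, tail range quoted in the text)
VARIANTS = [
    ("HUMTEN_PEA_1_P28", 1253, "GILDEFTNSLPPLCLCSGGIKALSCFKLGSAPTTLGKYQ", (1254, 1292)),
    ("HUMTEN_PEA_1_P29", 1071, "GESALSFLQTLG", (1072, 1083)),
    ("HUMTEN_PEA_1_P30", 954, "ELCISASLSQPALEGP", (955, 970)),
    ("HUMTEN_PEA_1_P31", 802, "EYHL", (803, 806)),
]


def tail_range(head_length: int, tail: str) -> tuple:
    """1-based inclusive positions of a tail placed directly after the head."""
    return head_length + 1, head_length + len(tail)


def check_all(variants) -> dict:
    """Map each variant name to whether its quoted tail range is consistent."""
    return {name: tail_range(head, tail) == quoted
            for name, head, tail, quoted in variants}


for name, ok in check_all(VARIANTS).items():
    print(name, "consistent" if ok else "inconsistent")
```

All four quoted ranges are consistent with contiguous head-then-tail numbering.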
It should be noted that the known protein sequence (TENA_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for TENA_HUMAN_V1. These changes were previously known to occur and are listed in the table below. Table 58 - Changes to TENA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMTEN_PEA_1_P31 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 59 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P31 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 59 - Amino acid mutations
Variant protein HUMTEN_PEA_1_P31 is encoded by the following transcript(s): HUMTEN_PEA_1_T40, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTEN_PEA_1_T40 is shown in bold; this coding portion starts at position 348 and ends at position 2765. The transcript also has the following SNPs as listed in Table 60 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P31 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 60 - Nucleic acid SNPs
Variant protein HUMTEN_PEA_1_P32 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTEN_PEA_1_T41. An alignment is given to the known protein (Tenascin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMTEN PEAJ J>32 and TENAJTUMANJvT : l.An isolated chimeric polypeptide encoding for HUMTEN PEAJ P32, comprising a first amino acid sequence being at least 90 % homologous to MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVFNHVYNIK LPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVFTHRINJTRRACGCAAAP DVKELLSRLEELENLVSSLREQCTAGAGCCLQPATGRLDTRPFCSGRGNFSTEGCGCVC EPGWKGPNCSEPECPGNCHLRGRCIDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNG VCICFEGYAGADCSREICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRG RCVENECVCDEGFTGEDCSELICPNDCFDRGRCP GTCYCEEGFTGEDCGKPTCPHACH TQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTGADCGELKC PNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCVEGKCVCEQGFKGYDC SDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQCPRDCSNRGLCVDGQCVCEDG FTGPDCAELSCPNDCHGRGRCVNGQCVCHEGFMGKDCKEQRCPSDCHGQGRCVDGQ CICHEGFTGLDCGQHSCPSDCNNLGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTE ETVNLAWDNEMRVTEYLWYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAI LENKKSJTVSARVAT conesponding to amino acids 1 - 710 of TENA HUMAN Vl, which also conesponds to amino acids 1 - 710 of HUMTEN_PEA_1_P32, and a second amino acid sequence being at least 70%>, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CE conesponding to amino acids 711 - 712 of HUMTEN_PEA_1_P32, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
It should be noted that the known protein sequence (TENA_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for TENA_HUMAN_V1. These changes were previously known to occur and are listed in the table below. Table 61 - Changes to TENA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMTEN_PEA_1_P32 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 62 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P32 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 62 - Amino acid mutations
Variant protein HUMTEN_PEA_1_P32 is encoded by the following transcript(s): HUMTEN_PEA_1_T41, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTEN_PEA_1_T41 is shown in bold; this coding portion starts at position 348 and ends at position 2483. The transcript also has the following SNPs as listed in Table 63 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTEN_PEA_1_P32 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 63 - Nucleic acid SNPs
As noted above, cluster HUMTEN features 57 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMTEN_PEA_1_node_0 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20, HUMTEN_PEA_1_T23, HUMTEN_PEA_1_T32, HUMTEN_PEA_1_T35, HUMTEN_PEA_1_T36, HUMTEN_PEA_1_T37, HUMTEN_PEA_1_T39, HUMTEN_PEA_1_T40 and HUMTEN_PEA_1_T41. Table 64 below describes the starting and ending position of this segment on each transcript. Table 64 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_2 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20, HUMTEN_PEA_1_T23, HUMTEN_PEA_1_T32, HUMTEN_PEA_1_T35, HUMTEN_PEA_1_T36, HUMTEN_PEA_1_T37, HUMTEN_PEA_1_T39, HUMTEN_PEA_1_T40 and HUMTEN_PEA_1_T41. Table 65 below describes the starting and ending position of this segment on each transcript. Table 65 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_5 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20, HUMTEN_PEA_1_T23, HUMTEN_PEA_1_T32, HUMTEN_PEA_1_T35, HUMTEN_PEA_1_T36, HUMTEN_PEA_1_T37, HUMTEN_PEA_1_T39, HUMTEN_PEA_1_T40 and HUMTEN_PEA_1_T41. Table 66 below describes the starting and ending position of this segment on each transcript. Table 66 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_6 according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20, HUMTEN_PEA_1_T23, HUMTEN_PEA_1_T32, HUMTEN_PEA_1_T35, HUMTEN_PEA_1_T36, HUMTEN_PEA_1_T37, HUMTEN_PEA_1_T39, HUMTEN_PEA_1_T40 and HUMTEN_PEA_1_T41. Table 67 below describes the starting and ending position of this segment on each transcript. Table 67 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_11 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20, HUMTEN_PEA_1_T23, HUMTEN_PEA_1_T32, HUMTEN_PEA_1_T35, HUMTEN_PEA_1_T36, HUMTEN_PEA_1_T37, HUMTEN_PEA_1_T39, HUMTEN_PEA_1_T40 and HUMTEN_PEA_1_T41. Table 68 below describes the starting and ending position of this segment on each transcript. Table 68 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_12 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T41. Table 69 below describes the starting and ending position of this segment on each transcript. Table 69 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_16 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20, HUMTEN_PEA_1_T23, HUMTEN_PEA_1_T32, HUMTEN_PEA_1_T35, HUMTEN_PEA_1_T36, HUMTEN_PEA_1_T37, HUMTEN_PEA_1_T39 and HUMTEN_PEA_1_T40. Table 70 below describes the starting and ending position of this segment on each transcript. Table 70 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_19 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20, HUMTEN_PEA_1_T23, HUMTEN_PEA_1_T32, HUMTEN_PEA_1_T35, HUMTEN_PEA_1_T36, HUMTEN_PEA_1_T37 and HUMTEN_PEA_1_T39. Table 71 below describes the starting and ending position of this segment on each transcript. Table 71 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_23 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T39. Table 72 below describes the starting and ending position of this segment on each transcript. Table 72 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_27 according to the present invention is supported by 43 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20, HUMTEN_PEA_1_T23, HUMTEN_PEA_1_T32, HUMTEN_PEA_1_T35, HUMTEN_PEA_1_T36 and HUMTEN_PEA_1_T37. Table 73 below describes the starting and ending position of this segment on each transcript. Table 73 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_28 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T37. Table 74 below describes the starting and ending position of this segment on each transcript. Table 74 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_30 according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T20, HUMTEN_PEA_1_T23, HUMTEN_PEA_1_T32, HUMTEN_PEA_1_T35 and HUMTEN_PEA_1_T36. Table 75 below describes the starting and ending position of this segment on each transcript. Table 75 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_32 according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T20, HUMTEN_PEA_1_T23, HUMTEN_PEA_1_T32, HUMTEN_PEA_1_T35 and HUMTEN_PEA_1_T36. Table 76 below describes the starting and ending position of this segment on each transcript. Table 76 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_33 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T36. Table 77 below describes the starting and ending position of this segment on each transcript. Table 77 - Segment location on transcripts
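Several of the segments described above occur in only one transcript, which makes them candidates for distinguishing that transcript from the others. The Python sketch below records the segment-to-transcript membership for a subset of the segments listed above (names normalized to the form HUMTEN_PEA_1_node_N) and extracts the single-transcript segments. The data structure and function name are illustrative assumptions, not part of the described method.

```python
from collections import defaultdict

PREFIX = "HUMTEN_PEA_1_"

# Membership copied from the segment descriptions above (subset only).
SEGMENT_TRANSCRIPTS = {
    PREFIX + "node_12": [PREFIX + "T41"],
    PREFIX + "node_23": [PREFIX + "T39"],
    PREFIX + "node_28": [PREFIX + "T37"],
    PREFIX + "node_32": [PREFIX + "T" + n
                         for n in "4 5 6 7 11 16 17 20 23 32 35 36".split()],
    PREFIX + "node_33": [PREFIX + "T36"],
}


def transcript_specific_segments(segment_transcripts: dict) -> dict:
    """Map each transcript to the segments found in that transcript only."""
    specific = defaultdict(list)
    for segment, transcripts in segment_transcripts.items():
        if len(transcripts) == 1:
            specific[transcripts[0]].append(segment)
    return dict(specific)


for transcript, segments in transcript_specific_segments(SEGMENT_TRANSCRIPTS).items():
    print(transcript, segments)
```

In this subset, node_12, node_23, node_28 and node_33 are each specific to one transcript (T41, T39, T37 and T36 respectively), whereas node_32 is shared by twelve transcripts and is therefore not reported.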
Segment cluster HUMTEN_PEA_1_node_35 according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T20, HUMTEN_PEA_1_T23, HUMTEN_PEA_1_T32 and HUMTEN_PEA_1_T35. Table 78 below describes the starting and ending position of this segment on each transcript. Table 78 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_38 according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T20, HUMTEN_PEA_1_T23 and HUMTEN_PEA_1_T32. Table 79 below describes the starting and ending position of this segment on each transcript. Table 79 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_40 according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T20, HUMTEN_PEA_1_T23 and HUMTEN_PEA_1_T32. Table 80 below describes the starting and ending position of this segment on each transcript. Table 80 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_42 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4 and HUMTEN_PEA_1_T5. Table 81 below describes the starting and ending position of this segment on each transcript. Table 81 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_43 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T5. Table 82 below describes the starting and ending position of this segment on each transcript. Table 82 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_44 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T20, HUMTEN_PEA_1_T23 and HUMTEN_PEA_1_T32. Table 83 below describes the starting and ending position of this segment on each transcript. Table 83 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_45 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T6. Table 84 below describes the starting and ending position of this segment on each transcript. Table 84 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_46 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T20, HUMTEN_PEA_1_T23 and HUMTEN_PEA_1_T32. Table 85 below describes the starting and ending position of this segment on each transcript. Table 85 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_47 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T32. Table 86 below describes the starting and ending position of this segment on each transcript. Table 86 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_49 according to the present invention is supported by 59 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20 and HUMTEN_PEA_1_T23. Table 87 below describes the starting and ending position of this segment on each transcript. Table 87 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_51 according to the present invention is supported by 74 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20 and HUMTEN_PEA_1_T23. Table 88 below describes the starting and ending position of this segment on each transcript. Table 88 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_56 according to the present invention is supported by 84 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20 and HUMTEN_PEA_1_T23. Table 89 below describes the starting and ending position of this segment on each transcript. Table 89 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_65 according to the present invention is supported by 103 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20 and HUMTEN_PEA_1_T23. Table 90 below describes the starting and ending position of this segment on each transcript. Table 90 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_71 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T17. Table 91 below describes the starting and ending position of this segment on each transcript. Table 91 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_73 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T23. Table 92 below describes the starting and ending position of this segment on each transcript. Table 92 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_76 according to the present invention is supported by 124 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18 and HUMTEN_PEA_1_T19. Table 93 below describes the starting and ending position of this segment on each transcript. Table 93 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_79 according to the present invention is supported by 139 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19 and HUMTEN_PEA_1_T20. Table 94 below describes the starting and ending position of this segment on each transcript. Table 94 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_83 according to the present invention is supported by 150 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19 and HUMTEN_PEA_1_T20. Table 95 below describes the starting and ending position of this segment on each transcript. Table 95 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_89 according to the present invention is supported by 196 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19 and HUMTEN_PEA_1_T20. Table 96 below describes the starting and ending position of this segment on each transcript. Table 96 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMTEN_PEA_1_node_7 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20, HUMTEN_PEA_1_T23, HUMTEN_PEA_1_T32, HUMTEN_PEA_1_T35, HUMTEN_PEA_1_T36, HUMTEN_PEA_1_T37, HUMTEN_PEA_1_T39, HUMTEN_PEA_1_T40 and HUMTEN_PEA_1_T41. Table 97 below describes the starting and ending position of this segment on each transcript. Table 97 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_8 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20, HUMTEN_PEA_1_T23, HUMTEN_PEA_1_T32, HUMTEN_PEA_1_T35, HUMTEN_PEA_1_T36, HUMTEN_PEA_1_T37, HUMTEN_PEA_1_T39, HUMTEN_PEA_1_T40 and HUMTEN_PEA_1_T41. Table 98 below describes the starting and ending position of this segment on each transcript. Table 98 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_9 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20, HUMTEN_PEA_1_T23, HUMTEN_PEA_1_T32, HUMTEN_PEA_1_T35, HUMTEN_PEA_1_T36, HUMTEN_PEA_1_T37, HUMTEN_PEA_1_T39, HUMTEN_PEA_1_T40 and HUMTEN_PEA_1_T41. Table 99 below describes the starting and ending position of this segment on each transcript. Table 99 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_14 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20, HUMTEN_PEA_1_T23, HUMTEN_PEA_1_T32, HUMTEN_PEA_1_T35, HUMTEN_PEA_1_T36, HUMTEN_PEA_1_T37, HUMTEN_PEA_1_T39 and HUMTEN_PEA_1_T40. Table 100 below describes the starting and ending position of this segment on each transcript. Table 100 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_17 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T40. Table 101 below describes the starting and ending position of this segment on each transcript. Table 101 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_21 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20, HUMTEN_PEA_1_T23, HUMTEN_PEA_1_T32, HUMTEN_PEA_1_T35, HUMTEN_PEA_1_T36, HUMTEN_PEA_1_T37 and HUMTEN_PEA_1_T39. Table 102 below describes the starting and ending position of this segment on each transcript. Table 102 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_22 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20, HUMTEN_PEA_1_T23, HUMTEN_PEA_1_T32, HUMTEN_PEA_1_T35, HUMTEN_PEA_1_T36, HUMTEN_PEA_1_T37 and HUMTEN_PEA_1_T39. Table 103 below describes the starting and ending position of this segment on each transcript. Table 103 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_25 according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20, HUMTEN_PEA_1_T23, HUMTEN_PEA_1_T32, HUMTEN_PEA_1_T35, HUMTEN_PEA_1_T36 and HUMTEN_PEA_1_T37. Table 104 below describes the starting and ending position of this segment on each transcript. Table 104 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_36 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T35. Table 105 below describes the starting and ending position of this segment on each transcript. Table 105 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_53 according to the present invention is supported by 68 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20 and HUMTEN_PEA_1_T23. Table 106 below describes the starting and ending position of this segment on each transcript. Table 106 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_54 according to the present invention is supported by 72 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20 and HUMTEN_PEA_1_T23. Table 107 below describes the starting and ending position of this segment on each transcript. Table 107 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_57 according to the present invention can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20 and HUMTEN_PEA_1_T23. Table 108 below describes the starting and ending position of this segment on each transcript. Table 108 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_61 according to the present invention is supported by 75 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20 and HUMTEN_PEA_1_T23. Table 109 below describes the starting and ending position of this segment on each transcript. Table 109 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_62 according to the present invention is supported by 75 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20 and HUMTEN_PEA_1_T23. Table 110 below describes the starting and ending position of this segment on each transcript. Table 110 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_67 according to the present invention is supported by 92 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20 and HUMTEN_PEA_1_T23. Table 111 below describes the starting and ending position of this segment on each transcript. Table 111 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_68 according to the present invention is supported by 117 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20 and HUMTEN_PEA_1_T23. Table 112 below describes the starting and ending position of this segment on each transcript. Table 112 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_69 according to the present invention can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20 and HUMTEN_PEA_1_T23. Table 113 below describes the starting and ending position of this segment on each transcript. Table 113 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_70 according to the present invention can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19, HUMTEN_PEA_1_T20 and HUMTEN_PEA_1_T23. Table 114 below describes the starting and ending position of this segment on each transcript. Table 114 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_72 according to the present invention is supported by 121 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19 and HUMTEN_PEA_1_T23. Table 115 below describes the starting and ending position of this segment on each transcript. Table 115 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_84 according to the present invention is supported by 153 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19 and HUMTEN_PEA_1_T20. Table 116 below describes the starting and ending position of this segment on each transcript. Table 116 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_85 according to the present invention is supported by 168 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19 and HUMTEN_PEA_1_T20. Table 117 below describes the starting and ending position of this segment on each transcript. Table 117 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_86 according to the present invention is supported by 179 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19 and HUMTEN_PEA_1_T20. Table 118 below describes the starting and ending position of this segment on each transcript. Table 118 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_87 according to the present invention is supported by 167 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19 and HUMTEN_PEA_1_T20. Table 119 below describes the starting and ending position of this segment on each transcript. Table 119 - Segment location on transcripts
Segment cluster HUMTEN_PEA_1_node_88 according to the present invention is supported by 164 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTEN_PEA_1_T4, HUMTEN_PEA_1_T5, HUMTEN_PEA_1_T6, HUMTEN_PEA_1_T7, HUMTEN_PEA_1_T11, HUMTEN_PEA_1_T14, HUMTEN_PEA_1_T16, HUMTEN_PEA_1_T17, HUMTEN_PEA_1_T18, HUMTEN_PEA_1_T19 and HUMTEN_PEA_1_T20. Table 120 below describes the starting and ending position of this segment on each transcript. Table 120 - Segment location on transcripts
Variant protein alignment to the previously known protein: Sequence name: TENA_HUMAN_V1 Sequence documentation:
Alignment of: HUMTEN_PEA_1_P5 x TENA_HUMAN_V1
Alignment segment 1/1:
Quality: 21611.00 Escore: Matching length: 2201 Total length: 2293 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 95.99 Total Percent Identity: 95.99 Gaps : 1
Alignment:

   1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
     ||||||||||||||||||||||||||||||||||||||||||||||||||
   1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50

  51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100

 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150

 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200

 201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250

 251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300

 301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350

 351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400

 401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450

 451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500

 501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550

 551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600

 601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650

 651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700

 701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750

 751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800

 801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850

 851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900

 901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950

 951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000

1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050

1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100

1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150

1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200

1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250

1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300

1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVTEDLPQL 1350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVTEDLPQL 1350

1351 GDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR 1400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1351 GDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR 1400

1401 AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDIT 1450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1401 AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDIT 1450

1451 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1451 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1500

1501 STDFIVYLSGLAPSIRTKTISATATTEPKPQLGTLIFSNITPKSFNMSWT 1550
     |||||||||||||||||||||||||
1501 STDFIVYLSGLAPSIRTKTISATAT......................... 1525

1551 TQAGLFAKIVINVSDAHSLHESQQFTVSGDAKQAHITGLVENTGYDVSVA 1600
1525 .................................................. 1525

1601 GTTLAGDPTRPLTAFVITEALPLLENLTISDINPYGFTVSWMASENAFDS 1650
                      |||||||||||||||||||||||||||||||||
1526 .................TEALPLLENLTISDINPYGFTVSWMASENAFDS 1558

1651 FLVTVVDSGKLLDPQEFTLSGTQRKLELRGLITGIGYEVMVSGFTQGHQT 1700
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1559 FLVTVVDSGKLLDPQEFTLSGTQRKLELRGLITGIGYEVMVSGFTQGHQT 1608

1701 KPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNFVLKIRDTK 1750
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1609 KPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNFVLKIRDTK 1658

1751 KQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQTVSAIATT 1800
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1659 KQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQTVSAIATT 1708

1801 AMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVD 1850
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1709 AMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGGTPSMVTVD 1758

1851 GTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALDGPSGLVTA 1900
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1759 GTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALDGPSGLVTA 1808

1901 NITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTVEYALTDLE 1950
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1809 NITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTVEYALTDLE 1858

1951 PATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQSETALLTWR 2000
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1859 PATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQSETALLTWR 1908

2001 PPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPSTHYTAKIQAL 2050
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1909 PPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPSTHYTAKIQAL 1958

2051 NGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGDKAQ 2100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1959 NGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTIYLNGDKAQ 2008

2101 ALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDRREEFWLGL 2150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
2009 ALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDRREEFWLGL 2058

2151 DNLNKITAQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYKLKVEGYSG 2200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
2059 DNLNKITAQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYKLKVEGYSG 2108

2201 TAGDSMAYHNGRSFSTFDKDTDSAITNCALSYKGAFWYRNCHRVNLMGRY 2250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
2109 TAGDSMAYHNGRSFSTFDKDTDSAITNCALSYKGAFWYRNCHRVNLMGRY 2158

2251 GDNNHSQGVNWFHWKGHEHSIQFAEMKLRPSNFRNLEGRRKRA 2293
     |||||||||||||||||||||||||||||||||||||||||||
2159 GDNNHSQGVNWFHWKGHEHSIQFAEMKLRPSNFRNLEGRRKRA 2201
Sequence name: TENA_HUMAN_V1
Sequence documentation:
Alignment of: HUMTEN_PEA_1_P6 x TENA_HUMAN_V1
Alignment segment 1/1:
Quality: 15349.00 Escore: 0 Matching length: 1603 Total length: 1603 Matching Percent Similarity: 97.75 Matching Percent Identity: 96.88 Total Percent Similarity: 97.75 Total Percent Identity: 96.88 Gaps: 0 Alignment :
1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50 I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I II II I I I I I I I I 1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100 II I I I I I II I I I II I I I I II I I I I I II I II I I I II II II I I I I II I I I II 51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100 . . . . . 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150 I I I I I I I II I I I I I I I I I II I I I I I II I II I I I I II I I I I I I I I II I II I 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200 I I I I II I I I II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I II I 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200
201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250 I I I I I I I I M I I I I I I I I II I I I I I II I II I I I I II I I I I I I I I I I I I I I 201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250
251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300 II I I I I I I I II I I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350 II I I I I I I I I I I I I I II I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I 301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350
351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400 I I I II I I II I I I I I I I II I I II I I I I I I I I I I II I I I I I I I II I I I I I II 351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400
401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450 I II I I I I I I II I I II I I I I I II I I I I I I I I I I I II II I I I I I I I I I I I I I
401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500 I I I I I I I II II I I I I I I I II I I I I I I I II II I I I II I I I I I I I I I II I I I 451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550 I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I II I I I I I I I I II I 501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550 . . . . .
551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I 551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700 I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700
701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750 I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I 701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750 751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 I I I I I I II I I II I I I I I II I II I I I I I I I I I I I I I II I II I I II I II I II 751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850 I I I I I I I I I II II I I I II I I I I I II I I I I I I I II I I I I I I I I I I I I II I I 801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850
851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900 I I M M I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I II I I II II I II 851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900
901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950 II I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I 901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950
951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000 I I I I I I I I I I I I I II I I I II I I I I I I I I II I II I I I I I I I I I I II II I I I 951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000 . . . . .
1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050 I I I II I II I I I I I I I I I I I I I I I I I II I II I I I I I I I I II I I I II I I I I I
1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050
1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100 I I I I I II I I I I I I I I II I II I I I I I II I I II I I I I I I I I I II I I II II I I 1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100
1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150 I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I
1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150 1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200 I II I I I I I I II I I I II I I I I II II I I I I I I I I I I I II I I I I I I I II I I I I 1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200 . . . . .
1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250 I I I I I I II II I I I I I I I I I I II II II I I II I I I I I II I I I I II I I I I I I I 1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250
1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300 I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300
1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVTEDLPQL 1350 I I I I I I I I I I I I II I I I I I I I I M II I I I I I I I I I I I I I I I I I I I I I I II
1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVTEDLPQL 1350
1351 GDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR 1400 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 1351 GDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR 1400
1401 AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDIT 1450 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
1401 AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDIT 1450 . . . . .
1451 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1500 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
1451 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1500
1501 STDFIVYLSGLAPSIRTKTISATATTEPKPQLGTLIFSNITPKSFNMSWT 1550 I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I : I I 1501 STDFIVYLSGLAPSIRTKTISATATTEALPLLENLTISDINPYGFTVSWM 1550
1551 TQAGLFAKIVINVSDAHSLHESQQFTVSGDAKQAHITGLVENTGYDVSVA 1600 I : : I I : I : I : I I : I I : : : I I : I I : I I : 1551 ASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLELRGLITGIGYEVMVS 1600
1601 GTT 1603 I I 1601 GFT 1603
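As an illustration of how a percent-identity figure such as the one reported for this alignment can be obtained, the following minimal sketch counts identical residues column by column over two aligned strings. It assumes that both strings have equal length and that a gap is written as "-"; the function name `percent_identity` is hypothetical and is not taken from the alignment software used in this document. The example input is the final alignment block shown above (positions 1601 to 1603).

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both rows.

    Assumption: the two rows are already aligned (equal length) and gaps,
    if any, are written as '-'. Gap columns never count as identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return round(100.0 * identical / len(aligned_a), 2)


# Last block of the HUMTEN_PEA_1_P6 x TENA_HUMAN_V1 alignment: GTT vs GFT.
print(percent_identity("GTT", "GFT"))
```

Percent similarity would additionally require a substitution matrix to decide which non-identical pairs count as similar; that matrix is not stated here, so it is left out of the sketch.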
Sequence name: TENA_HUMAN_V1
Sequence documentation:
Alignment of: HUMTEN_PEA_1_P7 x TENA_HUMAN_V1
Alignment segment 1/1:
Quality: 16042.00 Escore: 0 Matching length: 1617 Total length: 1617 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment : 1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50 I II I I II I I II I I I I I II II I II I II I I I I I I II I I I I I II I I I I I I I I I 1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100 I I I I I || I M I I I I I I I I I I I || I I I I I I I I || I I M I I I I II I I M I M 51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100
101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150 I I II I I II I II I II I I I II I I I II I I I I II I I I I I II II I I I I II I I I I I 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150
151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200
151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200 . . . . . 201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250 I I I I I I I II II II I I I I I II II I I II I I I I I I I I I I II I I I I I I I I I I I I 201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250 251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300 I I i i I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350 I I I I I I I I I I I I I I I I I M M || I I I I I I I I I I I I M I I I I II I I I II I I 301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350 351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400 I I I I I I I I II I II II I I I II I I I I I I I I I I I I I I I I I I I I I I I I I II I I I
351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400 . . . . .
401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450 II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II II I I I I
401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500 I II I I I II I I I I I I I I I I I I I I I I I I I I I I II II I I I II I I I I I I I I I I I
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550 M I I I M || I I I II I I || I I I I I I || I I I I || || I I M I I I I I I I II I I I
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550
551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600 II II I II I I I I II II I I I I I I I I I I I I I I II II II I I I II I II I I I I II I 551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650 I I I I I I I I I II I I I I I I I I I I I I I I II I I I I II I I I I II I I I II I I I II I 601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650 . . . . .
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I II 651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700
701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750 I I I I I I I I II I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750
751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 II II I I I I I I I I I I I I I II II I I I I II I I II I I I I I II I I I I I I I I I I II 751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800
801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850 II I I I I I I II I I II II I I I I I I I I I I I I II II I II I I I I I I II II I II I I 801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850
851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900 I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I II I II I I II I 851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900
901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950 I I II I II I I I I I I I I II I II I I I I II II I I I I I I II II I I I I I I I I I I I I 901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950
951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000 I II I I I I I I II I II II I I I I I I I I I I I I I I I I I I I I I II I I II I I I II I I 951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000
1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050 II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050
1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100 I I I I I I I I I I I II I I I I I II I I II I I I I I II I I I I I I I I II I I I I I I I I I
1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100
1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150 I II I I II I I II I I II I I I I I II I I I II II I I I II I I I I II I I II I II I I I 1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150
1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200 I I I I I I I I I I I I I I I I I I || I II I I I II II I I I I II I I I II I I I I I I I II
1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200
1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250 II I II I I I I I II I I I I I I I I I I I I I I I I I II I I II I I I I I I I I I I I I I I I 1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250
1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300 I I I I I I I I I I I II I II I I II I II I I I I I I I I I I I I II I I I II I I I I I I II
1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300 . . . . .
1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVTEDLPQL 1350 I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I II I I I I I I I I I I I I I II I
1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVTEDLPQL 1350
1351 GDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR 1400 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I II I I I II I I
1351 GDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR 1400
1401 AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDIT 1450 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I II I I I I
1401 AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDIT 1450
1451 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1500 II I I I I II I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 1451 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1500 1501 STDFIVYLSGLAPSIRTKTISATATTEALPLLENLTISDINPYGFTVSWM 1550 I I I I I II II I I I I I I I I II II I I I I I I I I I I I I I I I I I I II I I I II I I I I 1501 STDFIVYLSGLAPSIRTKTISATATTEALPLLENLTISDINPYGFTVSWM 1550
1551 ASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLELRGLITGIGYEVMVS 1600 I II I I I I II I I I I I II I I I I I II I II I I I I II I I II I I I I I I II I I II I I 1551 ASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLELRGLITGIGYEVMVS 1600
1601 GFTQGHQTKPLRAEIVT 1617 I I I I I I I I II II II I I I 1601 GFTQGHQTKPLRAEIVT 1617
Sequence name: TENA_HUMAN_V1
Sequence documentation:
Alignment of: HUMTEN_PEA_1_P8 x TENA_HUMAN_V1
Alignment segment 1/1:
Quality: 20743.00 Escore: 0 Matching length: 2110 Total length: 2201 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 95.87 Total Percent Identity: 95.87 Gaps : 1
Alignment :
1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50 II I I I I I I I I II I I I I I I I I I I II II I II I I II I I I I I I I I I I I I I I I I I 1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100 I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150 I I II I I I I I I I I I I I I II I I I I II I I I I I I I I I I I I I I I I I II I I II I II 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150
151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200 || I I I I I I I I I I I I I I || I I I I I I I I I I I I I I I I I || I I M I || I I I I I I 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200
201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250 I I I I II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250
251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350 I I I II II I II I I I I I I I I I I II I I I II I I II I I II II I I II II I I I I II I 301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350
351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400 I I II II II I II I II II I I I I I II II I I II II I I II II I II II II I I I II I
351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400
401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450 I II I I I I I I I I I II II I I I I I II I I I I II I II I II I I II I II II I I I I I I 401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500 II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500 . . . . .
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550 I II I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550
551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600 II II I II I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I I II II I I I II I
551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700 I I I I I I II I I I I I I I I II I I I I I I II II I I II I I I I I I I I I I I I II I I I I 651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700 701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750 I I I I I I I I I I I II I I I I I I I I II I II I II II I I I I I I I I I II I II I I I I I 701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750 751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 I I I I I I I II I I I I I I I I I I I I II II I I II I I I I I I II I I I I II I I I I I I I 751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800
801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850 I I I I I I I I I || M I I I || I I I I I I I I I M M I I I I I I I I I I || I I I I I I I 801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850
851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900 I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900
901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950 I I I I I I I I I I I I I I I II I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I 901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950 . . . . . 951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000 I I I I I I I I I I I I I I I I I I I I I II I II I I II I I I I I I I I I II I I I I I I I I I 951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000
1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050 I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050
1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100 I I || I I I I I I || I I I I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I
1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100 1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150 II I I I I I II I I I I II II I I I I I I I I I I I I I I I II I I I I I I I I II I I II I I
1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150
1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200 I II I I I I I I I I I I I II I II I I II I I I I I I I I I II I I I I I I I I I I I I I I I I
1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200
1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250 I I I II I I I I I I I II II I I I I I I I I I I I I I I II I I I II I I I I I I I I II I II 1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250
1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I || I I M I I I I I II I I I I I
1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300
1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVTEDLPQL 1350 I I I I I I I I I II I I I I I I II I I II II I I I I I I I I I I I I I I I I I I I I I I I I I 1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVTEDLPQL 1350
1351 GDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR 1400 I I I I I I I I I I I I I I I I II I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I
1351 GDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR 1400 . . . . .
1401 AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDIT 1450 I I I II I I I I I I I I I II I I I I I II II I II I II I I I I I II I I I I I I I I I I I I
1401 AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDIT 1450
1451 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1500 I I II I I II I I II I I I I I II I I I I I I I I II I I I I I I I I I II I II I I I II I I 1451 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1500
1501 STDFIVYLSGLAPSIRTKTISATAT 1525 I I I I I II I I I I II II I I I I II I I I I
1501 STDFIVYLSGLAPSIRTKTISATATTEALPLLENLTISDINPYGFTVSWM 1550
1525 1525
1551 ASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLELRGLITGIGYEVMVS 1600
1526 TEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNF 1559 I I I I I I I I I I I I I II II I I I I I I I II II I I I I I I
1601 GFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNF 1650
1560 VLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQ 1609 I I I I I I I I II I I II I I II I I I I I I I I I I I II II I I I I I II I I I I I I I I II 1651 VLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQ 1700
1610 TVSAIATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGG 1659 I I I I I || I I I I I I I I I I I I I I I I I I I I I I || || I I I I || I I I I I I I M II
1701 TVSAIATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGG 1750
1660 TPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALD 1709 I I II I II I I I I I I II I II I I I I I I I II I I I I I I I I I I I I I I I II I I I II I 1751 TPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALD 1800
1710 GPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTV 1759 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1801 GPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTV 1850
1760 EYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQS 1809 II I I I I I I II I I I I I I I I I II I I I I II I I I I I I I I II II I I II I I I II I I 1851 EYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQS 1900
1810 ETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPSTH 1859 I I I I I I I I I I I I I I I I I II I I II I I I II I M I II I I I I I II I I I II I I I I
1901 ETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPSTH 1950
1860 YTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTI 1909 I II I II I I I I I I I I I II I I II I I I I I I I I I I II I I II I I I I I I I I I I I I I 1951 YTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTI 2000
1910 YLNGDKAQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDR 1959 I I I I I I I I I I I I I I I I I I I I I I I I I II I I I II I I I I I I I I I I I I II I I I I 2001 YLNGDKAQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDR 2050 . . . . .
1960 REEFWLGLDNLNKITAQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYK 2009 I I II I I I I I I I I I I I II II II I I I I I I I I I I I I I I I I I I I I I I II I I I II
2051 REEFWLGLDNLNKITAQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYK 2100
2010 LKVEGYSGTAGDSMAYHNGRSFSTFDKDTDSAITNCALSYKGAFWYRNCH 2059 I I I I I I I II I II I I I I I I I I II I I I II I I I I I I I I I I II II I I I I I I I I I 2101 LKVEGYSGTAGDSMAYHNGRSFSTFDKDTDSAITNCALSYKGAFWYRNCH 2150
2060 RVNLMGRYGDNNHSQGVNWFHWKGHEHSIQFAEMKLRPSNFRNLEGRRKR 2109 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
2151 RVNLMGRYGDNNHSQGVNWFHWKGHEHSIQFAEMKLRPSNFRNLEGRRKR 2200
2110 A 2110 I 2201 A 2201
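The summary figures reported for the HUMTEN_PEA_1_P8 alignment above (Matching length 2110, Total length 2201, Matching Percent Identity 100.00, Total Percent Identity 95.87) are consistent with the total percentage being the matching percentage rescaled by the ratio of matching length to total length. The sketch below checks that arithmetic. The formula is an assumption inferred from the reported numbers, not a documented definition from the alignment software, and the function name `total_percent` is hypothetical.

```python
def total_percent(matching_percent: float,
                  matching_length: int,
                  total_length: int) -> float:
    """Rescale a percentage over the matched region to the full alignment.

    Assumption: positions outside the matched region (for example, a gap)
    contribute nothing to the identity or similarity count.
    """
    if total_length <= 0 or not 0 <= matching_length <= total_length:
        raise ValueError("lengths must satisfy 0 <= matching <= total, total > 0")
    return round(matching_percent * matching_length / total_length, 2)


# Figures taken from the HUMTEN_PEA_1_P8 x TENA_HUMAN_V1 alignment header.
print(total_percent(100.00, 2110, 2201))
# Residues of the total length left outside the matched region.
print(2201 - 2110)
```

Under the same assumption, an alignment whose matching length equals its total length, as reported for HUMTEN_PEA_1_P6 and HUMTEN_PEA_1_P7, has identical matching and total percentages.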
Sequence name: TENA_HUMAN_V1
Sequence documentation:
Alignment of: HUMTEN_PEA_1_P10 x TENA_HUMAN_V1
Alignment segment 1/1:
Quality: 20725.00 Escore: 0 Matching length: 2110 Total length: 2201 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 95.87 Total Percent Identity: 95.87 Gaps : 1
Alignment:
   1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
     ||||||||||||||||||||||||||||||||||||||||||||||||||
   1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
  51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100
 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150
 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200
 201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250
 251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300
 301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350
 351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400
 401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450
 451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550 I I I I I I I I M I I II I I || I I I I I I I I I I I I I I I || I || I I || I || I I I II
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550
 551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650 I I I I II I I II I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I II I I I I I I 601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650 . . . . .
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700 I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700
701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750 I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I II I I I I I I I I II 701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750
751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 II I I II I I I I II I M I I I II I I I I I I M I I I I I I I I I I I I I I I I I I I I I I
751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800
801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850 I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I 801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850 851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900 I I I I I II I I I II I I II I I I I I I I I I I I I II I I I I II I I I I I II I II I I I I 851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900 901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950 I I I I I I II I I I II I I I I I I I I I I I II I I I I II I I I I I I I I I I I I I I I I I I 901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950
951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000 M I I M I I || I I I I M I I I I I I I I || I I I I I I II I I I I I I I || I I I I I I I 951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000
1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050 I I I I I II I II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050
1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100 II I I I I I I I I I I II I I I I I I II I I I I I I II I I I I I I I I I I I II I I I I I I I
1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100 . . . . .
1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150 I I I I I I II I I I I II I I I I I I II I I I I I I I I I II I I I II I I I I I I I I I II I 1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150
1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200 I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200
1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250 1251 VL 1252 I I
1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300
1253 TEDLPQL 1259 II II I II
1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVTEDLPQL 1350
1260 GDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR 1309 I I I II I II II I I I II I I II II II I II II II I II I II II I I II I I I I I I II
1351 GDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR 1400
1310 AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDIT 1359 I || I I I I || || I I M I I M II || I || I M I I I I II I I I II I I I M I I I M
1401 AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDIT 1450
1360 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1409 I II I I I I II I II I II I II I II II I II I I I I I I I I I II I I I I I II I II II I 1451 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1500
1410 STDFIVYLSGLAPSIRTKTISATATTEALPLLENLTISDINPYGFTVSWM 1459 I II II I II I I I II II I II I I II I I II I I I I I I II II I I I I I I I I I I I I I I
1501 STDFIVYLSGLAPSIRTKTISATATTEALPLLENLTISDINPYGFTVSWM 1550
1460 ASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLELRGLITGIGYEVMVS 1509
1551 ASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLELRGLITGIGYEVMVS 1600
1510 GFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNF 1559 1601 GFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNF 1650
1560 VLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQ 1609 II II II II I I I I I I I I II I I II II II II I II I II I II I II II II I I I I II 1651 VLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQ 1700
1610 TVSAIATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGG 1659 I II I I I I I II I I II I I I I I II I I I I I I I I I I I I I II I I II I I II I I I I II 1701 TVSAIATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGG 1750
1660 TPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALD 1709 II II I II II I II I I I I I II II I I I II I II I I II I I II I I I I I II I I I I I I 1751 TPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALD 1800
1710 GPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTV 1759 I II I I I I I I I I II I I II II II I I I I I II I II II I II II I II II I II I I II 1801 GPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTV 1850
1760 EYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQS 1809 I I I || || || I M || | | I I M I || I M I I I I I M I I I I I || I M I I I I M I
1851 EYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQS 1900
1810 ETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPSTH 1859 I II I I I II I I II I I I II II II I I II I I I I I I I II II I I I I I I I I I I II I I 1901 ETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPSTH 1950
1860 YTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTI 1909 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I ! I I I I I I I I I I I I 1951 YTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTI 2000
1910 YLNGDKAQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDR 1959 I I I II I II I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I II I I I I 2001 YLNGDKAQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDR 2050
1960 REEFWLGLDNLNKITAQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYK 2009 I I I I I I || I I I I I I I || I I I I I I I I I I I || I I I I II I II I I I I I I I M I I 2051 REEFWLGLDNLNKITAQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYK 2100
2010 LKVEGYSGTAGDSMAYHNGRSFSTFDKDTDSAITNCALSYKGAFWYRNCH 2059 I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 2101 LKVEGYSGTAGDSMAYHNGRSFSTFDKDTDSAITNCALSYKGAFWYRNCH 2150
2060 RVNLMGRYGDNNHSQGVNWFHWKGHEHSIQFAEMKLRPSNFRNLEGRRKR 2109 I I II I I I I I I I II I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I I 2151 RVNLMGRYGDNNHSQGVNWFHWKGHEHSIQFAEMKLRPSNFRNLEGRRKR 2200
2110 A 2110
2201 A 2201
Sequence name: TENA_HUMAN_V1
Sequence documentation:
Alignment of: HUMTEN_PEA_1_P11 x TENA_HUMAN_V1
Alignment segment 1/1:
Quality: 18990.00 Escore: 0 Matching length: 1928 Total length: 2201 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 87.60 Total Percent Identity: 87.60 Gaps: 1
Alignment :
1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50 I I I I I I I I I M I I I I I I I I M I I I I I I I I I II I I I I I I I I I II I I I I I I I 1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100 I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100
101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150 I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I I I II I I I I I 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150 . . . . . 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200 II I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I I I I I I I I I I II 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200 201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250 I I I II I I I I I I I I I II I I I I II I I I I I I II I I I I I I I II I I I I I I I I I I I 201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250
251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300 I I I I I I I I II I I II I I I II I I I I I I II I I I I I II I II II I I I I I I I I I II 251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350 I I II I I I I I I I I I I I I I I I II I I I I I I I I I II I II I i I I I I I I I II I II I
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350
351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400 II I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I II I I II II
351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400
401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450 I I I I I I I I I I I I I I I I I I I II II I II I II II I I I II II I I I I I II I I I I I 401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500 I I I I I I I I I I I I I I I M I I I I I I I I || I I I I I I I I I I I I I I I I I II II I I
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550 I I I I I I I I I I I I I II I I I I I I I I I I I I I I I II I II I I I I I I II I I I I I I I 501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550
551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600 I I I I I I I I I I I I I I II I II I I I I I I II I II I I I I I I I I I I I I I I I I I I I I 551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600 . . . . .
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650 I I II I I I I II I I II I II II II I I I I I I I I I I I I I I I II II I I I I II I I I I 601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700 I M || I || I I || I || I II I I I || I I I I I || II I I I || I I I I I I I I I M I I
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700
701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750 I I I I I I I I I I I I I I I I I I I I II I I I I I I I II I I I I II II I I I I I I I II I I 701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750
751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 I I II II II I I I I I I I II II I I I I I I I I I I I II I I I II II I I I I I II I II I
751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 . . . . .
801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850 I I I I I I II I I I I II I I II I I II I I I I I I I I I I I II I I I I I I I I I II I II I
801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850
851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900 I I I I I I I I I I I I I I I I I I I I M I I I ! I I I I I I I I I I I I I I I I I I I I I I I I 851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900
901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950
951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000 I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000 1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050 II I I I I I I I I I I I II I II I I I I I I I I I I I I I II I I I II I I II I I I I II I I
1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050
1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100 I I II II I I I I I I I I II I I I I I I II I I II I I I I I II II I I I I I I I II I II I
1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100
1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQ. 1149 I I I I I I I I I I I I I I I || I I I || I I M I I M I I || I || I I I I I || M I I I
1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150
1149 1149
1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200
1149 1149
1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250 . . . . .
1149 1149
1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300
1149 1149
1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVTEDLPQL 1350
1149 1149
1351 GDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR 1400
1150 GYRTPVLSAEASTAKEPEIGNLNVSDIT 1177
||||||||||||||||||||||||||||
1401 AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDIT 1450
1178 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1227 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I 1451 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1500
1228 STDFIVYLSGLAPSIRTKTISATATTEALPLLENLTISDINPYGFTVSWM 1277 I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II I I II I I I I II I II I 1501 STDFIVYLSGLAPSIRTKTISATATTEALPLLENLTISDINPYGFTVSWM 1550
1278 ASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLELRGLITGIGYEVMVS 1327 I I I I I I I I I I I I II I I I I I I I I I I I I I I I || I I I I I I I I I II I I I I I I I I
1551 ASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLELRGLITGIGYEVMVS 1600
1328 GFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNF 1377 I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1601 GFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNF 1650
1378 VLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQ 1427 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
1651 VLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQ 1700 . . . . .
1428 TVSAIATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGG 1477 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I
1701 TVSAIATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGG 1750
1478 TPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALD 1527 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1751 TPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALD 1800
1528 GPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTV 1577 I I I II I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II II I II I I I I 1801 GPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTV 1850
1578 EYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQS 1627 I I II I I I I I I II I I I I I II I I I I I II I I I I I I I I I I I I I I I II I II I I II 1851 EYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQS 1900 . . . . .
1628 ETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPSTH 1677 II I I I I I II I I I I I I I I II I I I I I I I I I I I I I I II I I II I I II I I I I I II 1901 ETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPSTH 1950
1678 YTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTI 1727 I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 1951 YTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTI 2000
1728 YLNGDKAQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDR 1777 I I || I I I I I I I I I I I I II I M || I I I M I II I I I I I I I I I II I I II I I II
2001 YLNGDKAQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDR 2050
1778 REEFWLGLDNLNKITAQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYK 1827 I I I I I II I I I II II I I I I I I I I I I I I I I I I I I I I I I II I II I I I II I I II 2051 REEFWLGLDNLNKITAQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYK 2100
1828 LKVEGYSGTAGDSMAYHNGRSFSTFDKDTDSAITNCALSYKGAFWYRNCH 1877 I I I I I I I I I I I I I ! I I M I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I 2101 LKVEGYSGTAGDSMAYHNGRSFSTFDKDTDSAITNCALSYKGAFWYRNCH 2150
1878 RVNLMGRYGDNNHSQGVNWFHWKGHEHSIQFAEMKLRPSNFRNLEGRRKR 1927 2151 RVNLMGRYGDNNHSQGVNWFHWKGHEHSIQFAEMKLRPSNFRNLEGRRKR 2200
1928 A 1928 I 2201 A 2201
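The summary statistics in the alignment headers of this section appear to be related by simple arithmetic: where the matching region is 100.00 percent identical, the reported Total Percent Identity equals the matching length divided by the total length. The following is a minimal sketch of that check. The function name and the scaling formula are assumptions inferred from the reported numbers, not a formula stated in the source.

```python
def total_percent_identity(matching_length, total_length, matching_identity=100.0):
    """Scale the identity of the matched region to the full alignment length.

    Assumption: total percent = matching percent * matching length / total length.
    This is inferred from the reported values, not taken from the source.
    """
    return round(matching_identity * matching_length / total_length, 2)


# (matching length, total length) as reported in the alignment headers
reported = {
    "HUMTEN_PEA_1_P11": (1928, 2201),
    "HUMTEN_PEA_1_P13": (1837, 2201),
    "HUMTEN_PEA_1_P14": (2025, 2025),
    "HUMTEN_PEA_1_P15": (1655, 2201),
}

for name, (matching, total) in reported.items():
    print(name, total_percent_identity(matching, total))
```

Under this assumption the computed values agree with the reported Total Percent Identity figures (87.60, 83.46, 100.00 and 75.19).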
Sequence name: TENA_HUMAN_V1
Sequence documentation:
Alignment of: HUMTEN_PEA_1_P13 x TENA_HUMAN_V1
Alignment segment 1/1:
Quality: 18153.00 Escore: 0 Matching length: 1837 Total length: 2201 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 83.46 Total Percent Identity: 83.46 Gaps: 1
Alignment:
1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
||||||||||||||||||||||||||||||||||||||||||||||||||
1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100
||||||||||||||||||||||||||||||||||||||||||||||||||
51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100
101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150
||||||||||||||||||||||||||||||||||||||||||||||||||
101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150
151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200
||||||||||||||||||||||||||||||||||||||||||||||||||
151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200
201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250
||||||||||||||||||||||||||||||||||||||||||||||||||
201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250
251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300
||||||||||||||||||||||||||||||||||||||||||||||||||
251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350
||||||||||||||||||||||||||||||||||||||||||||||||||
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350
351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400
||||||||||||||||||||||||||||||||||||||||||||||||||
351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400
401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450 I I II II I I I I I I II I I I I I I II II II I I I I I I I I I I I I I II I I I I I II I I 401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500 I I I I I I I I I I I I I I I I I I II I I II II I II I I I II I II I II II II I I II II 451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550 I I I I I II I I II II II I I I I I I II I II I I I II I II I I I I I I I I II I I I II I 501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550
551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600 I I I I I I I I I I I I I I I II I II I I I I I I I II II I II I I I II I I II I I I I I I I 551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650 II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700
||||||||||||||||||||||||||||||||||||||||||||||||||
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700
701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750 I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II 701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750
751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 II I I II I II I I I II I I II I I I II I I I I I I II II II II I I I II I I II I I II 751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800
801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850 M I II I II I I I I I I I I I I I II I I I I I I I I I I I II I I I II II I I II I I I II 801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850
851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900 I I I I I II I I I I I I I II I I I I II I I I I I I I I I I I I I II I I I I II I I I I I II 851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900
901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950
901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950 . . . . . 951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000 I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I I I II I I I I I II I I I I II 951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000
1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050 I II II I II I I I II I II I II I I II I I I I I II I I I I I I II II I I I I I I I I I I
1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050
1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100 I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I
1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100
1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150 I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I II I I II I II I I I I 1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150 1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200 II I I I I I I II II I I I I I I I I II I I I I I I I II I I I I I I I II I I I I I II I II
1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200
1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250 I II I II I II I II I I I I I I I I I I I I I I I I I I I II I I I I I I I II I I I I I I II 1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250
1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300 I I I I I I I || M I I I I I || I I I I I I I I I I I I I I I I I I I || I I II M I I II I
1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300
1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVV 1343 I I II I I I I I I I I I II II I I I I I II I I I I I I I I I I II I I II I I I 1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVTEDLPQL 1350
1343 1343
1351 GDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR 1400 . . . . .
1343 1343
1401 AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDIT 1450
1343 1343
1451 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1500
1343 1343
1501 STDFIVYLSGLAPSIRTKTISATATTEALPLLENLTISDINPYGFTVSWM 1550
1343 1343
1551 ASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLELRGLITGIGYEVMVS 1600 . . . . .
1343 1343
1601 GFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNF 1650
1343 1343
1651 VLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQ 1700
1344 TAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGG 1386 I I I I I || I I I I I I I I I I || I I I I I I I I I I I || I I || I I I I I I I
1701 TVSAIATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGG 1750
1387 TPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALD 1436
||||||||||||||||||||||||||||||||||||||||||||||||||
1751 TPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALD 1800
1437 GPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTV 1486 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I
1801 GPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTV 1850 . . . . .
1487 EYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQS 1536 I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I I I
1851 EYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQS 1900
1537 ETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPSTH 1586 I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1901 ETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPSTH 1950
1587 YTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTI 1636 II II I I I I II I I II I I I I II I I I I I I I M I I II I I I II I I II I I I I I I II
1951 YTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTI 2000
1637 YLNGDKAQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDR 1686 I I I I I I I I I I I I II I II I II II I I I I I I I I I II I I I I I I I I II I I I I I I I 2001 YLNGDKAQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDR 2050
1687 REEFWLGLDNLNKITAQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYK 1736 I I I I I I I I I I I I I I I II I I I I I II I II I I I I I I I I I I I II I I I I I II I I I 2051 REEFWLGLDNLNKITAQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYK 2100
1737 LKVEGYSGTAGDSMAYHNGRSFSTFDKDTDSAITNCALSYKGAFWYRNCH 1786 I I I I I II I I I I II I I I I I II I II I I II I II I I I I I I I I II I I I I II I I I I 2101 LKVEGYSGTAGDSMAYHNGRSFSTFDKDTDSAITNCALSYKGAFWYRNCH 2150
1787 RVNLMGRYGDNNHSQGVNWFHWKGHEHSIQFAEMKLRPSNFRNLEGRRKR 1836 I || I I I I I I I I M I I II I I I II I I I I I I II I I I I I II I I II I I I II II II
2151 RVNLMGRYGDNNHSQGVNWFHWKGHEHSIQFAEMKLRPSNFRNLEGRRKR 2200
1837 A 1837 I 2201 A 2201
Sequence name: TENA_HUMAN_V1
Sequence documentation:
Alignment of: HUMTEN_PEA_1_P14 x TENA_HUMAN_V1
Alignment segment 1/1:
Quality: 19930.00 Escore: 0 Matching length: 2025 Total length: 2025 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
||||||||||||||||||||||||||||||||||||||||||||||||||
1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100
||||||||||||||||||||||||||||||||||||||||||||||||||
51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100
101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150 I M I I I I I I I I I I I I I I I M I I || I I I || I I I I I I I I I I I I I I || I II I I 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200 I II II II I I I I I I II I I I I I I I I I I I II II I I I I I I I I I I II I I I I II II 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200 . . . . . .
201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250 I II II I I I I I II II I I I I I I I I I II I I I I I I II I II II I I I II II I I I II 201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250
251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300 I I I II I I I II I I I I I I I I I I I I I I I I II II I II I I I I I I II I I I I I II I I
251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350 I I I I I I I || I I M M I || I M I I I I I I I || I I II I I I I II || I I I I I I I I
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350
351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400 II I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I II I I I I II I I I I I I I 351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400
401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450 I I I I I I I I I II I I I I II I I I I I II I I II I I I I I I II I I I II I I I II I I II 401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450 . . . . .
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500 I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550 501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550
551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600 I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I I II I I I I I 551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650 I I I I I I I I I I I I I I I I I I I II I I I I I I II I II I I I I II II I I I I I II I I I 601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700 I I I I I I I II I I I II I I I I I I I I I II I I I I I I II I I I II I I I I I I I I II I I 651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700
701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750 I I II I I II I I I I I I I I I I I I I I I I I I I I I II I I II II I I I I I II I I I I I I 701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750
751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I ! I ! I I I I I I I I I I I I I I I
751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800
801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850 I I I II I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850
851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I II I I I I I I I I I 851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900
901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950 901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950
951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000 I I I I I I I I I II II I I I || II I I || || I I I I II I M M I I I || I I I I I I I I 951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000
1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050 I II I I I I I I I I I I I II II I I I I I I I I I I I I I I I II I I II I I I I I I I I II I 1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050
1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100 I I I I I I I I I I I I I I II I I I I II I I I I I I I I I I II II I II II I I I I I I I II
1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100 . . . . .
1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150 I I I I II I I I I I I I I I I I I I I I I I I I II I I I I I I I II II I I I I II I I I II I
1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150
1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200
||||||||||||||||||||||||||||||||||||||||||||||||||
1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200
1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250
1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250
1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300 I I II II II I I I I I I I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300 1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVTEDLPQL 1350 II I I I I I I II I I II I I I I I II II II II I II I I I I I I I I I I II I I II I I I I 1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVTEDLPQL 1350
1351 GDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR 1400 I I I I I I II I I II I I I I I I II I II I I I II I I II I II I I I I II I I I I II I I I 1351 GDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR 1400
1401 AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDIT 1450 I I I I I I I || II I M I I I I I I I II I I II I I I II I I I II I I I I II I II M II
1401 AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDIT 1450
1451 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1500 I I I II I I I I I I I II II II I II II I II I I I I II I II I I I I II II I I II I I I 1451 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1500
1501 STDFIVYLSGLAPSIRTKTISATATTEALPLLENLTISDINPYGFTVSWM 1550
1501 STDFIVYLSGLAPSIRTKTISATATTEALPLLENLTISDINPYGFTVSWM 1550 . . . . .
1551 ASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLELRGLITGIGYEVMVS 1600 II I I I I II I I II I I I I I I I I I I I I I I I II II I I I I I I I I I I II M I I II I
1551 ASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLELRGLITGIGYEVMVS 1600
1601 GFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNF 1650 I I I I I I I I I I I I I I I I I II I I I I I I I I I I II I I I II I II I I I I I I I I I I I
1601 GFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNF 1650
1651 VLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQ 1700 I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I
1651 VLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQ 1700 1701 TVSAIATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGG 1750 II I I I II I I II II I I I I I II II II I I I I I I I I I II II II II I I I I I I I I I
1701 TVSAIATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGG 1750 . . . . .
1751 TPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALD 1800 II I II I I I II I I I II I I I I I I I I I I I I I I I I I I I I I II I I I I I I II II II
1751 TPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALD 1800
1801 GPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTV 1850 I I I I II I II I I I I II I I I I I I I I I I I I I I I I I I I II I I II I I I I I I I I I I 1801 GPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTV 1850
1851 EYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQS 1900
1851 EYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQS 1900
1901 ETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPSTH 1950 I II I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I 1901 ETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPSTH 1950
1951 YTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTI 2000 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1951 YTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTI 2000
2001 YLNGDKAQALEVFCDMTSDGGGWIV 2025 I I II I I I I I I I I I II I I I I I I I I I I
2001 YLNGDKAQALEVFCDMTSDGGGWIV 2025
Sequence name: TENA_HUMAN_V1
Sequence documentation:
Alignment of: HUMTEN_PEA_1_P15 x TENA_HUMAN_V1
Alignment segment 1/1:
Quality: 16391.00 Escore: 0 Matching length: 1655 Total length: 2201 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 75.19 Total Percent Identity: 75.19 Gaps: 1
Alignment :
1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100 I I I I II I II I I I I I I I II I I I I I I I I I I I I II I I I II I I I I I II II II I I 51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150 I II I II I I II I I I I I II I I I I II I I I I I I I I I I I I I I I I I I I I II II I I I 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150
151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200 I I I I I I I I I II I I I I I I I I I II I I II I I I I I II I I I I I I I I I I I I II I I I 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200
201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250 I I I I I I I || M I I I I I I I I I I I II II II I I I I I I I I I I I I I I I I I I I I II
201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250
251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350 . . . . .
351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400
401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450 I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I II I I I II I I I I 401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500 501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550 I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550 . . . . .
551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600 I I I I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650 I I I I II II I I I I I I I I I I I I I I I I I II I I I I I I I I II I I II II I II II I I 601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700 I I I I I I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700
701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750 I I I I I I I II I I I I I II I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I 701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750
751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I II I I I I
751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 . . . . .
801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850 II I I I I I I I I I II I I II I I I II I I I I I I II I I I I I I I I I I I I I I I I II I I
801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850
851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900 II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900
901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I II II I I I 901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950
951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000 I I I I II I I I I I I I II I I I II I I I II I I I I I I I I I II I I I I I I I I I I I I I I 951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000 . . . . .
1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050 I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I I I II I I I I I II I I 1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050
1051 VLLTAEKGRHKSKPARVKAS 1070 I I I I I I I I I I I I I I I I I I 1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100
1070 1070
1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150
1070 1070
1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200
1070 1070
1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250 . . . . .
1070 1070 1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300
1070 1070
1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVTEDLPQL 1350
1070 1070
1351 GDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR 1400
1070 1070
1401 AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDIT 1450 . . . . .
1070 1070
1451 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1500
1070 1070
1501 STDFIVYLSGLAPSIRTKTISATATTEALPLLENLTISDINPYGFTVSWM 1550
1070 1070
1551 ASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLELRGLITGIGYEVMVS 1600
1071 TEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNF 1104 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 1601 GFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNF 1650 1105 VLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQ 1154 I I I II I II I I I I I II I I I I I I I I II I I I II II II I I I I I I I I I I I I I I II 1651 VLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQ 1700
1155 TVSAIATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGG 1204 I II I I II I II I I II II I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I 1701 TVSAIATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGG 1750
1205 TPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALD 1254 M || I I I I M I I I I I I I M I I I I I I I I II I I I I I I I I II I I I I I I I I I I I
1751 TPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALD 1800
1255 GPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTV 1304 I I II II I I I I I I I I II I I I I I I I I I I I II I I I I I I I I II II I I I I I I I II 1801 GPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTV 1850
1305 EYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQS 1354 I I I I II I I I I I I I I II I I I I I I I I I I I I I I I I I I I II I I I II I I I I I I I I 1851 EYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQS 1900 . . . . .
1355 ETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPSTH 1404 I I I I I I II I I I II I I I I II I I I I I I I I I I I I I I I I I I I I I II I II I I I I I
1901 ETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPSTH 1950
1405 YTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTI 1454 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I 1951 YTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTI 2000
1455 YLNGDKAQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDR 1504 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I! I I I I I I I I I I I I i I II I I I I
2001 YLNGDKAQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDR 2050 1505 REEFWLGLDNLNKITAQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYK 1554 I I II I I I II I I I I I I I I I I I II I I I I II I I I II I I I I I I I I I I I I I I I I I 2051 REEFWLGLDNLNKITAQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYK 2100 . . . . . 1555 LKVEGYSGTAGDSMAYHNGRSFSTFDKDTDSAITNCALSYKGAFWYRNCH 1604 I II II I I I I I I I I I I I II I I I I II I II II I II I I II I I I I I I I I I I I I I I 2101 LKVEGYSGTAGDSMAYHNGRSFSTFDKDTDSAITNCALSYKGAFWYRNCH 2150 1605 RVNLMGRYGDNNHSQGVNWFHWKGHEHSIQFAEMKLRPSNFRNLEGRRKR 1654 I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 2151 RVNLMGRYGDNNHSQGVNWFHWKGHEHSIQFAEMKLRPSNFRNLEGRRKR 2200
1655 A 1655
2201 A 2201
Sequence name: TENA_HUMAN_V1
Sequence documentation:
Alignment of: HUMTEN_PEA_1_P16 x TENA_HUMAN_V1
Alignment segment 1/1:
Quality: 15530.00
Escore: 0
Matching length: 1564
Total length: 2201
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 71.06
Total Percent Identity: 71.06
Gaps: 1
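The relationship between the "Matching" and "Total" figures above can be checked with a short calculation. The following is a minimal sketch, not part of the original disclosure: it assumes that Total Percent Identity is the number of identical residues in the matched region divided by the total alignment length, and the function names `parse_alignment_stats` and `total_percent_identity` are hypothetical helpers introduced only for illustration.

```python
import re

# Field names as they appear in the alignment statistics of this listing.
_FIELDS = (
    "Quality|Escore|Matching length|Total length|"
    "Matching Percent Similarity|Matching Percent Identity|"
    "Total Percent Similarity|Total Percent Identity|Gaps"
)


def parse_alignment_stats(text: str) -> dict:
    """Extract 'Name: value' numeric fields from a flattened statistics line."""
    pattern = rf"({_FIELDS})\s*:\s*([0-9.]+)"
    return {name: float(value) for name, value in re.findall(pattern, text)}


def total_percent_identity(stats: dict) -> float:
    """Recompute Total Percent Identity.

    Assumption (not stated in the source): identical residues equal
    matching length * matching percent identity / 100, and the total
    figure divides that count by the total alignment length.
    """
    identical = stats["Matching length"] * stats["Matching Percent Identity"] / 100.0
    return round(100.0 * identical / stats["Total length"], 2)


line = (
    "Alignment segment 1/1: Quality: 15530.00 Escore: 0 Matching length: 1564 "
    "Total length: 2201 Matching Percent Similarity: 100.00 "
    "Matching Percent Identity: 100.00 Total Percent Similarity: 71.06 "
    "Total Percent Identity: 71.06 Gaps : 1"
)
stats = parse_alignment_stats(line)
print(total_percent_identity(stats))  # 71.06
```

Under this assumption, 1564 identical residues over a total length of 2201 give 71.06, which agrees with the reported Total Percent Identity for HUMTEN_PEA_1_P16 against TENA_HUMAN_V1.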
Alignment :
1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
  ||||||||||||||||||||||||||||||||||||||||||||||||||
1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100 I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100 . . . . . 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150 I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200 I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200
201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250 I ! II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250 251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300 I I I II II I I I I I I I I I I I I I I II I I I I I I I I I II I II I II I I II II I I I I 251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300 . . . . .
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350 I I I II II II I II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II II 301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350
351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400 I I I II I II I I II I I II II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I 351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400
401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450 M I I I I I I II I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I I I II II
401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500 I II I I I I I I I I I I I I II I I I I II I II I II I I I I I I I I I I II I I I II I I I I 451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550 II I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I I I II II I I II I I
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550 . . . . .
551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600 I I I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650
    ||||||||||||||||||||||||||||||||||||||||||||||||||
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700 II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I 651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700
701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750 I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I 701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750
751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 I I I I I I I II I II II I I I I I I I II I I I I I I I II I I I I I I II I I I I I I I I II 751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850 I II I I I I I II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850
851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I 851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900
901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950
951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000
1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050 I I I I I I I I I I I II I I I I I I I I I I I I I II II II II I I II II I I I I I I II I I
1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050
1051 VLLTAEKGRHKSKPARVKAS 1070 I I I I I I I I I I I I I I I I I I II
1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100
1070 1070
1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150
1070 1070
1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200 . . . . .
1070 1070
1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250
1070 1070
1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300
1070 1070
1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVTEDLPQL 1350
1070 1070
1351 GDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR 1400 1070 1070
1401 AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDIT 1450
1070 1070
1451 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1500
1070 1070
1501 STDFIVYLSGLAPSIRTKTISATATTEALPLLENLTISDINPYGFTVSWM 1550
1070 1070
1551 ASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLELRGLITGIGYEVMVS 1600
1070 1070
1601 GFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNF 1650 . . . . .
1070 1070
1651 VLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQ 1700
1071 TAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGG 1113 I I I II I II I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I
1701 TVSAIATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGG 1750
1114 TPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALD 1163 I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
1751 TPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALD 1800 1164 GPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTV 1213 I I I I I I I I I I I I I I I II I I I I II I I I I I I I I I II I I I II I I I I I II I I I I 1801 GPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTV 1850 . . . . .
1214 EYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQS 1263 II I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I II I I I I I I I I I 1851 EYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQS 1900
1264 ETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPSTH 1313 I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I II I II I I I II II II I 1901 ETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPSTH 1950
1314 YTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTI 1363 I I || || I I I I I I I I I I I I I I I I I M || I I || I I II I I I I I I II I M M I I
1951 YTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTI 2000
1364 YLNGDKAQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDR 1413 II I I I I I I I I II I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I II 2001 YLNGDKAQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDR 2050
1414 REEFWLGLDNLNKITAQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYK 1463 I I I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I II I I II I I I II I I II 2051 REEFWLGLDNLNKITAQGQYELRVDLRDHGETAFAVYDKFSVGDAKTRYK 2100 . . . . .
1464 LKVEGYSGTAGDSMAYHNGRSFSTFDKDTDSAITNCALSYKGAFWYRNCH 1513 I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I II I II II II I I I I I I I I I 2101 LKVEGYSGTAGDSMAYHNGRSFSTFDKDTDSAITNCALSYKGAFWYRNCH 2150
1514 RVNLMGRYGDNNHSQGVNWFHWKGHEHSIQFAEMKLRPSNFRNLEGRRKR 1563 I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I 2151 RVNLMGRYGDNNHSQGVNWFHWKGHEHSIQFAEMKLRPSNFRNLEGRRKR 2200
1564 A 1564 I 2201 A 2201
Sequence name: TENA_HUMAN_V1
Sequence documentation:
Alignment of: HUMTEN_PEA_1_P17 x TENA_HUMAN_V1
Alignment segment 1/1:
Quality: 19930.00
Escore: 0
Matching length: 2025
Total length: 2025
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
  ||||||||||||||||||||||||||||||||||||||||||||||||||
1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100
   ||||||||||||||||||||||||||||||||||||||||||||||||||
51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100
101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150
151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200
201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250
251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350
351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400
401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500 I I I I I I I II I I I I I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550 II I I I I I I I I II I I I I I I I I I II II II II I I I I I I I I I I I II I I I I I I I I
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550
551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600 I I M I || || I I I I I || I I I I I I I I I I I I I I I I I I I I I I I I I I I II I M I I
551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650
    ||||||||||||||||||||||||||||||||||||||||||||||||||
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700 I I I I I I I I II I I I I I I I I I I I I I I I I I I I I II I I I II I I I I I I I I I I I I I 651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700 . . . . .
701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750 I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I 701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750
751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I 751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800
801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II II I I I 801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850
851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900 I I I I I I I I II I I I I I I I I I I II I I I I I I I I I II I I I II II I I I I I I I I I I 851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900 . . . . . 901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950 I I II I I I I I I I I I II I I I I I I I I I I I I I I II I I I I II I I II I II I I I I I I 901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950 951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000 II I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I I II I I I 951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000
1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050 I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II
1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050
1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100 I I I I II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100
1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150 I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150
1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200 I I I I I I I I I I II I I I I I I I II I I I I I I I I I I I I II II II I I I I I I I I I I I 1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200
1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250 I I I I I I I I I I || I I I I I I I I I I I I I I I I I I I I I I I I || I I II I II I I I ||
1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250
1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300 I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I I I I I I I I I I I I I I I I I 1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300
1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVTEDLPQL 1350 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I II I I I I I I I 1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVTEDLPQL 1350 . . . . .
1351 GDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR 1400 I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1351 GDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR 1400
1401 AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDIT 1450 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1401 AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDIT 1450
1451 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1500 II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
1451 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1500
1501 STDFIVYLSGLAPSIRTKTISATATTEALPLLENLTISDINPYGFTVSWM 1550 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1501 STDFIVYLSGLAPSIRTKTISATATTEALPLLENLTISDINPYGFTVSWM 1550 1551 ASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLELRGLITGIGYEVMVS 1600 I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I II I I I I II I I II I II I I I I 1551 ASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLELRGLITGIGYEVMVS 1600
1601 GFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNF 1650 I I I I I I I I I I I I I I I I I II I II II I I I I II I I I II I I I I I I I I I I I II II 1601 GFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNF 1650
1651 VLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQ 1700 I I I I I I I I I I I I I || I I I I I I I I I I I I M I I I I I I I I I I I I I I I II I I I I
1651 VLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQ 1700
1701 TVSAIATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGG 1750 I I I I II I II I II II I II I I II I I I II I I I II I II I I I II I I II I I I I II I 1701 TVSAIATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGG 1750
1751 TPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALD 1800
1751 TPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALD 1800 . . . . .
1801 GPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTV 1850 I I I I II I I I I I II I I I I I I I I I I II I I I I I I I I I I I I II I I I I I I I I I I I 1801 GPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTV 1850
1851 EYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQS 1900 I I I I I I I I I I I I I I I I I II II I I I I I I I II I I I I I I I I I I I I I I I I I I II 1851 EYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQS 1900
1901 ETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPSTH 1950 I I I I I I I I I I I I I || I I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I
1901 ETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPSTH 1950 1951 YTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTI 2000 I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 1951 YTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTI 2000
2001 YLNGDKAQALEVFCDMTSDGGGWIV 2025 I I I I I I I I I I I I I I I I I I I I I I I I I 2001 YLNGDKAQALEVFCDMTSDGGGWIV 2025
Sequence name: TENA_HUMAN_V1
Sequence documentation:
Alignment of: HUMTEN_PEA_1_P20 x TENA_HUMAN_V1
Alignment segment 1/1:
Quality: 20262.00
Escore: 0
Matching length: 2057
Total length: 2057
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50 I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I M I I I I I I I I I I I I || I I I 1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100 I I I I I I I I I I I I I I I I I II II I I I I I II I I II I I I I I I I I II I I I I I I I I 51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100
101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150 I I I II I II I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150 . . . . . 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200 I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II II II I I I 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200 201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250 I I I I II I I I I I II I I I I I II I I I I I II I I I I I I II I II I I I I I I I II I I I 201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250
251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300
251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350 I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350 351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400 I I I I I I I I I I II I I I I I I I II I I II I II I I I I I I I I II I I I I I I I I I I I I 351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400
401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450 I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I 401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I || I I I I I I I I M
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550
551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600 . . . . .
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650 I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700 II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700
701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750 I I I I I I I I I I I I I I I I I I I I I I I I I II I I I II I I I I I I I I I I I II I II I I
701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750 751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 II II II I I II I I I II I I I II I I I I I I I I I I I I II I II I I I I I I I I I II II 751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 . . . . . 801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850 I I I I I I I I I I I I I I I I I I I I II I I I I I II I II I I I I I I I II I I I I I I II I 801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850 851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900 I II I I I II I I I I I I I I II I I II I I I II I I I I I I I I I I I I I I I II I I I I I I 851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900
901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950 M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M 901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950
951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000
1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I
1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050 . . . . .
1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I
1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100
1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150
1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200 II I II I I I II I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I II I I I I I 1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200
1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250 I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250 . . . . .
1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300 I II I I II I I I I I I I I II I I I I I I I I I II I I I I I II I I I I I I II I I I I II I
1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300
1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVTEDLPQL 1350 I I I I I I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I I II I I 1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVTEDLPQL 1350
1351 GDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR 1400 I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I || I I I I || I I I || || I I
1351 GDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR 1400
1401 AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDIT 1450 II I I I I I I I I I I I I I I I I I I I II I I I I I I I I I II I I I I II I I II I I I I I I 1401 AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDIT 1450
1451 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1500
1451 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1500
1501 STDFIVYLSGLAPSIRTKTISATATTEALPLLENLTISDINPYGFTVSWM 1550 I I II I I I II I I II II I I I I II II I I I I I I I I I I I I I I I I I I I I II I I II I 1501 STDFIVYLSGLAPSIRTKTISATATTEALPLLENLTISDINPYGFTVSWM 1550
1551 ASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLELRGLITGIGYEVMVS 1600 I I I I I I I I I I I I I I || I M I I I I I I II I I I I || I I I || I I I I I I I I I I II
1551 ASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLELRGLITGIGYEVMVS 1600
1601 GFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNF 1650 I I I II I I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I I II I I I I I I I 1601 GFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNF 1650
1651 VLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQ 1700 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I
1651 VLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQ 1700 . . . . .
1701 TVSAIATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGG 1750 I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I
1701 TVSAIATTAMGSPKEVIFSDITENSATVSWRAPTAQVESFRITYVPITGG 1750
1751 TPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALD 1800 I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 1751 TPSMVTVDGTKTQTRLVKLIPGVEYLVSIIAMKGFEESEPVSGSFTTALD 1800
1801 GPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTV 1850 I I I I I i I I I I I I I I I I I I I i I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I
1801 GPSGLVTANITDSEALARWQPAIATVDSYVISYTGEKVPEITRTVSGNTV 1850
1851 EYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQS 1900 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1851 EYALTDLEPATEYTLRIFAEKGPQKSSTITAKFTTDLDSPRDLTATEVQS 1900 1901 ETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPSTH 1950 I I I I I I I I II I I I I II I I I I I I I I I I I II I I I II I I II II I I II' I II I I I 1901 ETALLTWRPPRASVTGYLLVYESVDGTVKEVIVGPDTTSYSLADLSPSTH 1950 1951 YTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTI 2000 I I I I II I I I II I I I I I I II I II II I II I II I I II II I II I II I I I I I I I I 1951 YTAKIQALNGPLRSNMIQTIFTTIGLLYPFPKDCSQAMLNGDTTSGLYTI 2000
2001 YLNGDKAQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDR 2050 I I I || M I I I || I || M I I I I I I I II I I I I I I I II I I I II I I II I I II II 2001 YLNGDKAQALEVFCDMTSDGGGWIVFLRRKNGRENFYQNWKAYAAGFGDR 2050
2051 REEFWLG 2057 I I I I I I I 2051 REEFWLG 2057
Sequence name: TENA_HUMAN_V1
Sequence documentation:
Alignment of: HUMTEN_PEA_1_P26 x TENA_HUMAN_V1
Alignment segment 1/1:
Quality: 16903.00
Escore: 0
Matching length: 1708
Total length: 1708
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
  ||||||||||||||||||||||||||||||||||||||||||||||||||
1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100
   ||||||||||||||||||||||||||||||||||||||||||||||||||
51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100
101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150
101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150
151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200 I I II I I I I I I II II I I I I II I I I I I I I I I I II I I I I I I I I I I II I I I II I 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200
201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250 I I I I I I I I I I I I I II I I II I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250
251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300 II I II I I I I I I II I I I I II I I I I I I I I II I I II II II I I I I I I I I I I I II 251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350 II I I I I I I I II I II I I I II I I I I I I I I II I I I II I II I I I I I I I I I I I II 301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350
351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400 I I I I I I I II I II I I I I I I I I I I I I I I I II I I II II I II I I I I I II II I I I 351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400
401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450 I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II II II I I II I I I II I I I 401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500 I I I I II I I I II I II I II I I II I II I I I I I I I I II I I II I II II I I I II I I 451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550 I I II I I I I II II I II II I I II I II I II I I I I I I I I I I I I I II I I I I I I II 501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550
551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600 I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I
551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650 I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650 651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700 I I II I I I I I I II I I I I I I I I I I I II I I I I II I I I I I I I I I I I II II I I I I 651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700 701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750 I I I I II I I I I II I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750
751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 I I I I I I I I I I I I I I I I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I 751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800
801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850 I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I 801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850
851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900 I I I I I I I I I II I I I I I I I I I I I I I II I I II I I I I I II I II I I I I II I I I I 851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900 . . . . . 901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950 I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950 951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I II I I I 951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000
1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050 I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050
1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100 I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I II II I I I I I I I I II I I I I I 1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100 . . . . .
1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150 I I II I I I I I I I I I I I I I I I I I I I I I II I I I II I II I I I I II I I I I I I I I I 1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150
1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200 I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I II I I I I I I I I I 1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200
1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250 I I I I I I I I I I I I I I I I I I I I I I I I II M I I I I I I I I I I I I I I I I I I I I I I
1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250
1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300 I I I I II I I I I I I I I I I I I I I I II I I I I I I II I I I I I I II I I I I I I I I I I I 1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300
1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVTEDLPQL 1350 I I I I II I I I I I I I I I I I I I I I II I I II I I I I I I I I I I I I I I I I I I I I II I . 1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVTEDLPQL 1350 . . . . .
1351 GDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR 1400 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I 1351 GDLAVSEVGWDGLRLNWTAADNAYEHFVIQVQEVNKVEAAQNLTLPGSLR 1400
1401 AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDIT 1450 I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1401 AVDIPGLEAATPYRVSIYGVIRGYRTPVLSAEASTAKEPEIGNLNVSDIT 1450
1451 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1500 I I II I I I II II I II II I I I II I I I I II I I I I I II I II I II I I I II I I I I I 1451 PESFNLSWMATDGIFETFTIEIIDSNRLLETVEYNISGAERTAHISGLPP 1500
1501 STDFIVYLSGLAPSIRTKTISATATTEALPLLENLTISDINPYGFTVSWM 1550 II I I I I II II I II I I I I I I I I I II I I I I I I I I I I II I I I II II I I I I I I I 1501 STDFIVYLSGLAPSIRTKTISATATTEALPLLENLTISDINPYGFTVSWM 1550 . . . . .
1551 ASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLELRGLITGIGYEVMVS 1600 II I I I I II I I I I I II I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 1551 ASENAFDSFLVTVVDSGKLLDPQEFTLSGTQRKLELRGLITGIGYEVMVS 1600
1601 GFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNF 1650 I II I I I I I I I I I I I I I I I I I II I I I II I I I I I I I I I I I II I I I I I I I I I I 1601 GFTQGHQTKPLRAEIVTEAEPEVDNLLVSDATPDGFRLSWTADEGVFDNF 1650
1651 VLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQ 1700 I I I I I I I I I I I I I I I I I I I I M I I I II I I I I I II II I I I II II I I I I I I I
1651 VLKIRDTKKQSEPLEITLLAPERTRDLTGLREATEYEIELYGISKGRRSQ 1700
1701 TVSAIATT 1708 I I I I I I I I 1701 TVSAIATT 1708
Sequence name: TENA_HUMAN_V1
Sequence documentation:
Alignment of: HUMTEN_PEA_1_P27 x TENA_HUMAN_V1
Alignment segment 1/1:
Quality: 13445.00 Escore: 0 Matching length: 1344 Total length: 1344 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50 |||||||||||||||||||||||||||||||||||||||||||||||||| 1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100 |||||||||||||||||||||||||||||||||||||||||||||||||| 51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100
101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150 I I I I I II I I II I I I I I I I I II I I I I II I I I I I I I I I I II I I I I I I I I I I I 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200 I II I I I II I I I I II I I II II I I I I I I II II I I I I I II I I I I I I I I I I I I I 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200 . . . . .
201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250 I I I I I I I I I I I I II I II I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250
251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300 I I I I I I I I I I I I I II I I I I I I I I I I I I I I I II I I II I I I I I I I I II I I I I 251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350 I I I I II I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I I M I I M I I
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350
351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400 I I I I II II I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400
401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450 I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I I I II I I I I I I I I I I I I 401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450 . . . . .
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500 I I I I I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I II I I I II I 451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550 I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550
551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600 I I I I I I I I I I I II I I I II I I II I I I I II I I II I I II I II I I I I I I II I II 551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650 I II I II I I I I I I I I I I I I II I I I I I I I I I I I I II I I I I I I II I I I I I I I I 601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700 I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700
701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750 I I I I I I I I I I I I I I I I I I I II I I I II II I I I I I I I I I I I I I I I I I I I I I I 701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750
751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800
801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850 I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I I I I I I I I I I I I I I I I I I I 801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850
851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900 I I I I I I II I I I I I I I I II I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I 851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900
901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950
951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000 I I I I I I I I I I I I I I I I I I I I I || I I I I I I II I I I I I II I II I I II I I I I I 951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000
1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050 I I I I I I II I I I II I I I I I I I I I I I I I I r I I I I I I I I I I I I I II I I I I I I I 1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050
1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100 . . . . .
1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150
1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200 I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I 1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200
1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I
1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250
1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1251 VLTEEVPDMGNLTVTEVSWDALRLNWTTPDGTYDQFTIQVQEADQVEEAH 1300 1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVT 1344 I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II I I I II I I I I I 1301 NLTVPGSLRSMEIPGLRAGTPYTVTLHGEVRGHSTRPLAVEVVT 1344
Sequence name: TENA_HUMAN_V1
Sequence documentation:
Alignment of: HUMTEN_PEA_1_P28 x TENA_HUMAN_V1
Alignment segment 1/1:
Quality: 12559.00 Escore: 0 Matching length: 1253 Total length: 1253 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50 |||||||||||||||||||||||||||||||||||||||||||||||||| 1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100 |||||||||||||||||||||||||||||||||||||||||||||||||| 51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100
101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150 |||||||||||||||||||||||||||||||||||||||||||||||||| 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150
151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200 |||||||||||||||||||||||||||||||||||||||||||||||||| 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200
201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250 |||||||||||||||||||||||||||||||||||||||||||||||||| 201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250
251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300 |||||||||||||||||||||||||||||||||||||||||||||||||| 251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350 |||||||||||||||||||||||||||||||||||||||||||||||||| 301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350
351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400 |||||||||||||||||||||||||||||||||||||||||||||||||| 351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400
401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450 |||||||||||||||||||||||||||||||||||||||||||||||||| 401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500 I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I || I I I I I || I I I I I I I I I I
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550 I I I I I I I I I I I I II I I I I I I I I I I I II I I I II I I I I I I I I I II I I I I I I I 501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550
551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600 . . . . .
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I I II I I I I I I 601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700 I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700
701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750
701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750
751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850 I I I I I I I I I II I II II I I I I I I I I II I I I II I I I I I I I I I I I I I I II I I I 801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850 851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900 I I I I I I I I I I I I II II I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I 851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900
901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950 II I I I I I I I I I I I I I I I I || I I I II M I I I I I I I I I I I II I II I I I I I I I 901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950
951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000 I I I I I I I I I I I I I I I I I I II I I I II I I I I I I I I II I II I I I I I I II I I I I 951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000
1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050 I I I I I I I I I II II I I I I I I I I I I II I I I I II II I I I I I I II I I II II I I I 1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050 . . . . .
1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100 I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I II 1051 VLLTAEKGRHKSKPARVKASTEQAPELENLTVTEVGWDGLRLNWTAADQA 1100
1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150 I I I I I I I I I I II I II I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 1101 YEHFIIQVQEANKVEAARNLTVPGSLRAVDIPGLKAATPYTVSIYGVIQG 1150
1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200 I I I I I I I I I I I I I I I I I I I II I I I I I I I II I I I II I I I I I I I I I I I I I I I
1151 YRTPVLSAEASTGETPNLGEVVVAEVGWDALKLNWTAPEGAYEYFFIQVQ 1200 1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250 I I I I I I I I I I I I I I II II II I I I I I I I I I I I II I I I I I II I I I II I I I II 1201 EADTVEAAQNLTVPGGLRSTDLPGLKAATHYTITIRGVTQDFSTTPLSVE 1250
1251 VLT 1253 I I I 1251 VLT 1253
Sequence name: TENA_HUMAN_V1
Sequence documentation:
Alignment of: HUMTEN_PEA_1_P29 x TENA_HUMAN_V1
Alignment segment 1/1:
Quality: 10822.00 Escore: 0 Matching length: 1071 Total length: 1071 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50 I I I I || || I II I I I I I I I I I I I I || I I I I I I I I I I I I M I I I I I I I I M I 1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100 I I I I I I II I II I I I I II I I II I II I I I I I I I I I I I I I I II I I I I I I I I I I 51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100
101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150 I I I I I I I I I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150 . . . . . 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200 I I I I I II I I I I I I I I I I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I I I 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200 201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250 I I I I I I I I I I I I I I II I I I II I I I I I I I I I I I I I I I I I I I I II I I I I I I I 201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250
251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350 I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350 351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400 I I I I II I I I I I I II I I I II I I I II I I I I I I I I I I I I II I I I I I II I I I I I 351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400
401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450 I II I I I I I I I I I II I I I I I I I I I I I II II I I I I I I II I I I I I I I I I I I I I 401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500 || I I || I I I I I I I I I I I I I I I I I I I I I I I I I I I I || || I II I I M I I I I I
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550 I I I II I I I I I I I II I I I I II I I I I I I I I I I I I I I I I I I I I I I II I I II I I 501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550
551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600 II I II I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I
551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600 . . . . .
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650 I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700 I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700
701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750 I I I I I I I I I I I M I I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I
701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750 751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 I I I I II I I I I I I II I I I I I I I I I I II I I I I II I I I I I I I I I II I I I I II I 751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 . . . . . 801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850 I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I II I I 801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850 851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900 I I I I I I II I II II I I I I I I I II I I I II I I I I I I I I I I I I I I I I I I II II I 851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900
901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950 I I I I I M I I M I I I || I I I I I M I I I I I I I I I M I I I I I M I I I I II I I I 901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950
951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000 I II II I I II I I II I II I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I 951 TLTGLRPGTEYGIGVSAVKEDKESNPATINAATELDTPKDLQVSETAETS 1000
1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050 II I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I 1001 LTLLWKTPLAKFDRYRLNYSLPTGQWVGVQLPRNTTSYVLRGLEPGQEYN 1050
1051 VLLTAEKGRHKSKPARVKAST 1071 ||||||||||||||||||||| 1051 VLLTAEKGRHKSKPARVKAST 1071
Sequence name: TENA_HUMAN_V1
Sequence documentation:
Alignment of: HUMTEN_PEA_1_P30 x TENA_HUMAN_V1
Alignment segment 1/1:
Quality: 9694.00 Escore: 0 Matching length: 954 Total length: 954 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50 |||||||||||||||||||||||||||||||||||||||||||||||||| 1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100 I I II I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I II 51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150 I I I I I I I II II I I I I I II I I I I I I II II I II I I I I II I I I I I I II I II I I 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150
151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200 I I I I II I I II I I I I II I I I I I I I I I I I I I I II I I I I I I I I II I I I II I I I 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200
201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250 I II I I I I II I I I I II I I I II I I I I I I I I I I I I I I I I I I II I I I I II I I II
201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250
251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300 II I I I I I I II I II I I I I I I I I II I I I II I I II I I I I II II II I I I I I I I I 251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350 . . . . .
351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400
351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400
401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450
401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500 501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550 I I I II I I II I I II I I I I II I I I I I I II I I I I II I II I II I I I I I I I I I II 501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550 . . . . .
551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600 I I I I II I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I II I I
551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650 II I I I I II I I I I II I I I II I I I I I I I I I I I I II I I II II I I I I I II I I I I
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700 I I I I I I I I I I I M I I I I I M I I I I I I M I I II I I II II I I I I I I I I I I I I
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700
701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750 I I I I I I I I I II I II I I I I I I I I I II I I I I I I II I I I I I I I II I I II II I I 701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750
751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 . . . . .
801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
801 TRLDAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTID 850
851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 851 LTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFTTGLDAPRNLR 900
901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II 901 RVSQTDNSITLEWRNGKAAIDSYRIKYAPISGGDHAEVDVPKSQQATTKT 950
951 TLTG 954 |||| 951 TLTG 954
Sequence name: TENA_HUMAN_V1
Sequence documentation:
Alignment of: HUMTEN_PEA_1_P31 x TENA_HUMAN_V1
Alignment segment 1/1:
Quality: 8236.00 Escore: 0 Matching length: 802 Total length: 802 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment : 1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50 I II II II II I I II I II I I II II II I I II I II I I II II II II II II I I II I 1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100 M M I I I I I M I II II II I I I I I I I I I I II II I II II I I I II I I II II II 51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100
101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150 I I II II II I II II II II I II II II I I II I II I II II II II II II II I II I 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150
151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200 I II I I I II II I II I II I I II II I I I II I I I I II II I I II II II I II I I II 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200 . . . . . 201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250 I II II I I I II I II II I II I I I I II I I I I II I I I II II II II II II I II I I 201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250 251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300 I I II II I I II I I II II II I I II II I I I II II I I II I I I II II I II I I I I I 251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350 I I I II II II I I I I I I I I I I I I II I I I II I I II I I I I I I I I I II I II I I I I 301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350 351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400 II II II II II II II II I I II I II I II I II II I I II II II I I II I II II II 351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400 . . . . .
401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450 II I II I II II II II I I I II I I I I I II II I I II II II II I II I II II II II 401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500 I I II I I II II I II II II II II II II II I I II II II I I II II II II I II II
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550 II I I II II I I I II I II II II I II II I I II I I I I I I II II II I I I II I II I
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550
551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600 II II I II I I II I II I I I I I I I II II I I II I I II II I I I I II I I I I I II I I 551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650 I I I I I II II I I I I I I II I II I II I I I I II I II II II II I I I I I I II I II I
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650 . . . . .
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700 II I I II I I I I II II II II II I I I II I II I I I I I II II I I II I II I II II I
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700
701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750 I I II I I I I I I I I II I I II II I I II I II I I I II I I I I I I I I I I I I I I I II I 701 SIPVSARVATYLPAPEGLKFKSIKETSVEVEWDPLDIAFETWEIIFRNMN 750
751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800 II II I II I I I I II II II II II II I II II II II II I II II II II I I II II I 751 KEDEGEITKSLRRPETSYRQTGLAPGQEYEISLHIVKNNTRGPGLKRVTT 800
801 TR 802 I I 801 TR 802
Sequence name: TENA_HUMAN_V1
Sequence documentation:
Alignment of: HUMTEN_PEA_1_P32 x TENA_HUMAN_V1
Alignment segment 1/1:
Quality: 7332.00 Escore: 0 Matching length: 710 Total length: 710 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50 |||||||||||||||||||||||||||||||||||||||||||||||||| 1 MGAMTQLLAGVFLAFLALATEGGVLKKVIRHKRQSGVNATLPEENQPVVF 50
51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100 |||||||||||||||||||||||||||||||||||||||||||||||||| 51 NHVYNIKLPVGSQCSVDLESASGEKDLAPPSEPSESFQEHTVDGENQIVF 100
101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150 |||||||||||||||||||||||||||||||||||||||||||||||||| 101 THRINIPRRACGCAAAPDVKELLSRLEELENLVSSLREQCTAGAGCCLQP 150
151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200 |||||||||||||||||||||||||||||||||||||||||||||||||| 151 ATGRLDTRPFCSGRGNFSTEGCGCVCEPGWKGPNCSEPECPGNCHLRGRC 200
201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250 |||||||||||||||||||||||||||||||||||||||||||||||||| 201 IDGQCICDDGFTGEDCSQLACPSDCNDQGKCVNGVCICFEGYAGADCSRE 250
251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300 |||||||||||||||||||||||||||||||||||||||||||||||||| 251 ICPVPCSEEHGTCVDGLCVCHDGFAGDDCNKPLCLNNCYNRGRCVENECV 300
301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350 |||||||||||||||||||||||||||||||||||||||||||||||||| 301 CDEGFTGEDCSELICPNDCFDRGRCINGTCYCEEGFTGEDCGKPTCPHAC 350
351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400 |||||||||||||||||||||||||||||||||||||||||||||||||| 351 HTQGRCEEGQCVCDEGFAGVDCSEKRCPADCHNRGRCVDGRCECDDGFTG 400
401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450 |||||||||||||||||||||||||||||||||||||||||||||||||| 401 ADCGELKCPNGCSGHGRCVNGQCVCDEGYTGEDCSQLRCPNDCHSRGRCV 450
451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500 II I II II II II II II I II II II I II I I II II I II II II II II II II I II I 451 EGKCVCEQGFKGYDCSDMSCPNDCHQHGRCVNGMCVCDDGYTGEDCRDRQ 500
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550 M I II I I II II II I II I I I II II I I I II II II II II II II II II I I I I II
501 CPRDCSNRGLCVDGQCVCEDGFTGPDCAELSCPNDCHGRGRCVNGQCVCH 550
551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600 II I I II II II II II II II I I I II I II I I I I II I II II II I II II I II II I 551 EGFMGKDCKEQRCPSDCHGQGRCVDGQCICHEGFTGLDCGQHSCPSDCNN 600
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650 II I I II II I I I II I I I II I I II I I II I I I I II I I II I I I I II I I II II I I
601 LGQCVSGRCICNEGYSGEDCSEVSPPKDLVVTEVTEETVNLAWDNEMRVT 650 . . . . .
651 EYLWYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700 II I II II I II II I II I II II II II I II II II I I I II I I I II I II II I I I I
651 EYLVVYTPTHEGGLEMQFRVPGDQTSTIIQELEPGVEYFIRVFAILENKK 700
701 SIPVSARVAT 710 I I I II I I I II 701 SIPVSARVAT 710 1230 DESCRIPTION FOR CLUSTER HUMOSTRO
Cluster HUMOSTRO features 3 transcript(s) and 30 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Osteopontin precursor (SwissProt accession identifier OSTP_HUMAN; known also according to the synonyms Bone sialoprotein 1; Urinary stone protein; Secreted phosphoprotein 1; SPP-1; Nephropontin; Uropontin), SEQ ID NO: 310, referred to herein as the previously known protein.
Protein Osteopontin precursor is known or believed to have the following function(s): Binds tightly to hydroxyapatite. Appears to form an integral part of the mineralized matrix. Probably important to cell-matrix interaction. Acts as a cytokine involved in enhancing production of interferon-gamma and interleukin-12 and reducing production of interleukin-10 and is essential in the pathway that leads to type I immunity (By similarity). The sequence for protein Osteopontin precursor is given at the end of the application, as "Osteopontin precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Osteopontin precursor localization is believed to be Secreted. The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Regeneration, bone. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Bone formation stimulant. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Musculoskeletal.
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: ossification; anti-apoptosis; inflammatory response; cell-matrix adhesion; cell-cell signaling, which are annotation(s) related to Biological Process; defense/immunity protein; cytokine; integrin ligand; protein binding; growth factor; apoptosis inhibitor, which are annotation(s) related to Molecular Function; and extracellular matrix, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster HUMOSTRO can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 38 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
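The "parts per million" figure described above can be illustrated with a minimal sketch. It assumes a plain ratio of EST counts; the weighting applied by the previously described methods is not restated in this section and is not modeled here. The function name and the counts are hypothetical and are not taken from Table 5.

```python
def expression_ppm(cluster_ests: int, category_ests: int) -> float:
    """ESTs of one cluster in a tissue category, per million ESTs in that category.

    Assumption: an unweighted ratio; any weighting scheme is left out.
    """
    if category_ests <= 0:
        raise ValueError("category_ests must be positive")
    return 1_000_000 * cluster_ests / category_ests


# Hypothetical counts, for illustration only.
example_ppm = expression_ppm(cluster_ests=45, category_ests=90_000)
print(example_ppm)
```

Under this assumption, 45 cluster ESTs among 90,000 ESTs in a category correspond to 500 parts per million.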
Overall, the following results were obtained as shown with regard to the histograms in Figure 38 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues, lung malignant tumors, breast malignant tumors, ovarian carcinoma and skin malignancies.
Table 5 - Normal tissue distribution
adrenal
bladder
bone 897
brain 506
colon 69
epithelial 548
general 484
head and neck 50
kidney 5618
liver
lung 10
Table 6 - P values and ratios for expression in cancerous tissue
Name of Tissue
adrenal 1.5e-01 2.1e-01 2.0e-02 4.6 4.4e-02 3.6
bladder 1.2e-01 9.2e-02 5Je-02 4.1 2.1e-02 4.3
bone 4.9e-01 7.4e-01 4.1e-06 0.6 5.4e-01 0.4
brain 6.6e-01 7.0e-01 3.2e-01 0.6 0.4
colon 2Je-01 4.0e-01 3.1e-01 1.5 5.2e-01 1.1
epithelial 2.0e-07 1.6e-03 9.8e-01 0J 0.5
general 1.2e-06 1.2e-02 7.9e-01 0.8 0.6
head and neck 3.4e-01 5.0e-01 0J 0J
kidney 6.8e-01 7.4e-01 0.2 0.1
liver 3.3e-01 2.5e-01 1.8 2.3e-01 2.6
lung 4.3e-04 4.6e-03 2.1e-30 15.0 2.8e-27 23.5
lymph nodes 6Je-01 8Je-01 8.1e-01 0.7 9.9e-01 0.3
breast 2.3e-01 3.0e-01 1.9e-04 6.2 4.1e-03 4.3
bone marrow 7.5e-01 7.8e-01 0.3 2.0e-02 1.2
muscle 4.0e-02 7.5e-02 1.1e-01 4.6 5.1e-01 TT
As noted above, cluster HUMOSTRO features 3 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Osteopontin precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HUMOSTRO_PEA_1_PEA_1_P21 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMOSTRO_PEA_1_PEA_1_T14. An alignment is given to the known protein (Osteopontin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMOSTRO_PEA_1_PEA_1_P21 and OSTP_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMOSTRO_PEA_1_PEA_1_P21, comprising a first amino acid sequence being at least 90% homologous to MRIAVICFCLLGITCAIPVKQADSGSSEEKQLYNKYPDAVATWLNPDPSQKQNLLAPQ corresponding to amino acids 1 - 58 of OSTP_HUMAN, which also corresponds to amino acids 1 - 58 of HUMOSTRO_PEA_1_PEA_1_P21, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VFLNFS corresponding to amino acids 59 - 64 of HUMOSTRO_PEA_1_PEA_1_P21, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMOSTRO_PEA_1_PEA_1_P21, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VFLNFS in HUMOSTRO_PEA_1_PEA_1_P21.
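The coordinates in the comparison report can be checked with a short sketch. It simply joins the two sequences quoted in the report and reads back the stated 1-based, inclusive ranges; the names HEAD, TAIL, variant and region are hypothetical helpers introduced for illustration, and the full variant sequence given at the end of the application is not reproduced here.

```python
# Sequences as quoted in the comparison report above.
HEAD = "MRIAVICFCLLGITCAIPVKQADSGSSEEKQLYNKYPDAVATWLNPDPSQKQNLLAPQ"  # amino acids 1 - 58
TAIL = "VFLNFS"                                                      # amino acids 59 - 64

# "Contiguous and in a sequential order": the tail directly follows the head.
variant = HEAD + TAIL


def region(sequence: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive coordinates."""
    return sequence[start - 1:end]


print(len(HEAD), len(variant))   # lengths of the shared head and of the chimera
print(region(variant, 59, 64))   # the unique tail
```

The head is 58 residues long and the joined sequence is 64 residues long, which agrees with the ranges 1 - 58 and 59 - 64 stated in the report.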
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because of manual inspection of known protein localization and/or gene structure. Variant protein HUMOSTRO_PEA_1_PEA_1_P21 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOSTRO_PEA_1_PEA_1_P21 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
The glycosylation sites of variant protein HUMOSTRO_PEA_1_PEA_1_P21, as compared to the known protein Osteopontin precursor, are described in Table 8 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 8 - Glycosylation site(s)
Variant protein HUMOSTRO_PEA_1_PEA_1_P21 is encoded by the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMOSTRO_PEA_1_PEA_1_T14 is shown in bold; this coding portion starts at position 199 and ends at position 390. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOSTRO_PEA_1_PEA_1_P21 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
Variant protein HUMOSTRO_PEA_1_PEA_1_P25 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMOSTRO_PEA_1_PEA_1_T16. An alignment is given to the known protein (Osteopontin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMOSTRO_PEA_1_PEA_1_P25 and OSTP_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMOSTRO_PEA_1_PEA_1_P25, comprising a first amino acid sequence being at least 90% homologous to MRIAVICFCLLGITCAIPVKQADSGSSEEKQ corresponding to amino acids 1 - 31 of OSTP_HUMAN, which also corresponds to amino acids 1 - 31 of HUMOSTRO_PEA_1_PEA_1_P25, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence H corresponding to amino acids 32 - 32 of HUMOSTRO_PEA_1_PEA_1_P25, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMOSTRO_PEA_1_PEA_1_P25 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOSTRO_PEA_1_PEA_1_P25 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Amino acid mutations
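The reasoning given for the "secreted" call can be written as a small decision rule. This is a minimal sketch, assuming the outputs of the prediction programs have already been reduced to booleans; the function name and its inputs are hypothetical, and no real SignalP or trans-membrane predictor interface is invoked.

```python
from typing import Sequence


def is_predicted_secreted(signal_peptide_calls: Sequence[bool],
                          transmembrane_calls: Sequence[bool]) -> bool:
    """Secreted if every signal-peptide program reports a signal peptide
    and no trans-membrane program reports a trans-membrane region."""
    if not signal_peptide_calls or not transmembrane_calls:
        raise ValueError("each group needs at least one prediction")
    return all(signal_peptide_calls) and not any(transmembrane_calls)


# Two signal-peptide programs agree; neither trans-membrane program fires.
print(is_predicted_secreted([True, True], [False, False]))
```

With two positive signal-peptide calls and two negative trans-membrane calls the rule returns True; a single trans-membrane prediction or a missing signal-peptide call returns False.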
The glycosylation sites of variant protein HUMOSTRO_PEA_1_PEA_1_P25, as compared to the known protein Osteopontin precursor, are described in Table 11 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 11 - Glycosylation site(s)
Variant protein HUMOSTRO_PEA_1_PEA_1_P25 is encoded by the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T16, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMOSTRO_PEA_1_PEA_1_T16 is shown in bold; this coding portion starts at position 199 and ends at position 294. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOSTRO_PEA_1_PEA_1_P25 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Nucleic acid SNPs
Variant protein HUMOSTRO_PEA_1_PEA_1_P30 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMOSTRO_PEA_1_PEA_1_T30. An alignment is given to the known protein (Osteopontin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMOSTRO_PEA_1_PEA_1_P30 and OSTP_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMOSTRO_PEA_1_PEA_1_P30, comprising a first amino acid sequence being at least 90% homologous to MRIAVICFCLLGITCAIPVKQADSGSSEEKQ corresponding to amino acids 1 - 31 of OSTP_HUMAN, which also corresponds to amino acids 1 - 31 of HUMOSTRO_PEA_1_PEA_1_P30, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSIFYVFI corresponding to amino acids 32 - 39 of HUMOSTRO_PEA_1_PEA_1_P30, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMOSTRO_PEA_1_PEA_1_P30, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSIFYVFI in HUMOSTRO_PEA_1_PEA_1_P30.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMOSTRO_PEA_1_PEA_1_P30 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOSTRO_PEA_1_PEA_1_P30 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Amino acid mutations
The glycosylation sites of variant protein HUMOSTRO_PEA_1_PEA_1_P30, as compared to the known protein Osteopontin precursor, are described in Table 14 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 14 - Glycosylation site(s)
Variant protein HUMOSTRO_PEA_1_PEA_1_P30 is encoded by the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T30, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMOSTRO_PEA_1_PEA_1_T30 is shown in bold; this coding portion starts at position 199 and ends at position 315. The transcript also has the following SNPs as listed in Table 15 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOSTRO_PEA_1_PEA_1_P30 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 15 - Nucleic acid SNPs
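The coding coordinates reported for the three transcripts can be cross-checked against the variant protein lengths stated in the comparison reports (64, 32 and 39 amino acids). The sketch below assumes 1-based, inclusive nucleotide positions; whether the reported span includes the stop codon is not stated in the text, so the comparison is only a consistency check. The helper name and the dictionary layout are hypothetical.

```python
def codons_in_span(start: int, end: int) -> int:
    """Number of whole codons in a 1-based, inclusive nucleotide span."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("span is empty or not a whole number of codons")
    return length // 3


# (coding start, coding end, protein length from the comparison reports)
coding_spans = {
    "HUMOSTRO_PEA_1_PEA_1_T14": (199, 390, 64),  # P21: amino acids 1 - 64
    "HUMOSTRO_PEA_1_PEA_1_T16": (199, 294, 32),  # P25: amino acids 1 - 32
    "HUMOSTRO_PEA_1_PEA_1_T30": (199, 315, 39),  # P30: amino acids 1 - 39
}

for transcript, (start, end, protein_length) in coding_spans.items():
    print(transcript, codons_in_span(start, end), protein_length)
```

In each case the number of codons equals the protein length, which suggests, although the text does not say so, that the reported coding portion ends at the last sense codon rather than at the stop codon.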
As noted above, cluster HUMOSTRO features 30 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_0 according to the present invention is supported by 333 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14, HUMOSTRO_PEA_1_PEA_1_T16 and HUMOSTRO_PEA_1_PEA_1_T30. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_10 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T16. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_16 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_23 according to the present invention is supported by 334 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_31 according to the present invention is supported by 350 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_43 according to the present invention is supported by 192 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_3 according to the present invention is supported by 353 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14, HUMOSTRO_PEA_1_PEA_1_T16 and HUMOSTRO_PEA_1_PEA_1_T30. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_5 according to the present invention is supported by 353 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14, HUMOSTRO_PEA_1_PEA_1_T16 and HUMOSTRO_PEA_1_PEA_1_T30. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_7 according to the present invention is supported by 357 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14, HUMOSTRO_PEA_1_PEA_1_T16 and HUMOSTRO_PEA_1_PEA_1_T30. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_8 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T30. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_15 according to the present invention is supported by 366 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_17 according to the present invention is supported by 261 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_20 according to the present invention can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_21 according to the present invention is supported by 315 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 29 below describes the starting and ending position of this segment on each transcript.
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_22 according to the present invention is supported by 322 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_24 according to the present invention is supported by 270 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_26 according to the present invention can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_27 according to the present invention is supported by 260 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_28 according to the present invention is supported by 273 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_29 according to the present invention is supported by 272 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 35 below describes the starting and ending position of this segment on each transcript.
Table 35 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_30 according to the present invention can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_32 according to the present invention is supported by 293 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_34 according to the present invention is supported by 301 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 38 below describes the starting and ending position of this segment on each transcript.
Table 38 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_36 according to the present invention is supported by 292 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 39 below describes the starting and ending position of this segment on each transcript.
Table 39 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_37 according to the present invention is supported by 295 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 40 below describes the starting and ending position of this segment on each transcript.
Table 40 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_38 according to the present invention can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 41 below describes the starting and ending position of this segment on each transcript.
Table 41 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_39 according to the present invention is supported by 268 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 42 below describes the starting and ending position of this segment on each transcript.
Table 42 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_40 according to the present invention can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 43 below describes the starting and ending position of this segment on each transcript.
Table 43 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_41 according to the present invention can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 44 below describes the starting and ending position of this segment on each transcript.
Table 44 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_42 according to the present invention is supported by 224 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 45 below describes the starting and ending position of this segment on each transcript.
Table 45 - Segment location on transcripts
Variant protein alignment to the previously known protein:

Sequence name: OSTP_HUMAN
Sequence documentation:
Alignment of: HUMOSTRO_PEA_1_PEA_1_P21 x OSTP_HUMAN
Alignment segment 1/1:
Quality: 578.00
Escore:
Matching length: 58
Total length: 58
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:

  1 MRIAVICFCLLGITCAIPVKQADSGSSEEKQLYNKYPDAVATWLNPDPSQ 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRIAVICFCLLGITCAIPVKQADSGSSEEKQLYNKYPDAVATWLNPDPSQ 50

 51 KQNLLAPQ 58
    ||||||||
 51 KQNLLAPQ 58
Sequence name: OSTP_HUMAN
Sequence documentation:
Alignment of: HUMOSTRO_PEA_1_PEA_1_P25 x OSTP_HUMAN
Alignment segment 1/1:
Quality: 301.00
Escore: 0
Matching length: 31
Total length: 31
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:

  1 MRIAVICFCLLGITCAIPVKQADSGSSEEKQ 31
    |||||||||||||||||||||||||||||||
  1 MRIAVICFCLLGITCAIPVKQADSGSSEEKQ 31
Sequence name: OSTP_HUMAN
Sequence documentation:
Alignment of: HUMOSTRO_PEA_1_PEA_1_P30 x OSTP_HUMAN
Alignment segment 1/1:
Quality: 301.00
Escore: 0
Matching length: 31
Total length: 31
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:

  1 MRIAVICFCLLGITCAIPVKQADSGSSEEKQ 31
    |||||||||||||||||||||||||||||||
  1 MRIAVICFCLLGITCAIPVKQADSGSSEEKQ 31
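The length, gap and identity figures reported with each alignment can be reproduced for the simple case shown here. The sketch below is a plain position-by-position comparison of two already aligned strings, assuming "-" as the gap character; it is not the alignment program used to generate the reports, and it does not attempt the "Quality" or "Percent Similarity" values, whose scoring scheme is not given in this section. The function name and the returned keys are hypothetical.

```python
def alignment_summary(query: str, subject: str, gap: str = "-") -> dict:
    """Summarize two aligned sequences of equal length."""
    if len(query) != len(subject) or not query:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    pairs = list(zip(query, subject))
    gaps = sum(1 for q, s in pairs if q == gap or s == gap)
    identical = sum(1 for q, s in pairs if q == s and q != gap)
    return {
        "total_length": len(pairs),
        "gaps": gaps,
        "percent_identity": 100.0 * identical / len(pairs),
    }


# Aligned region shared by the P25 and P30 variants and OSTP_HUMAN (amino acids 1 - 31).
shared = "MRIAVICFCLLGITCAIPVKQADSGSSEEKQ"
print(alignment_summary(shared, shared))
```

For the 31-residue region shared with OSTP_HUMAN this gives a total length of 31, no gaps and 100 percent identity, in agreement with the reported values.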
DESCRIPTION FOR CLUSTER T46984
Cluster T46984 features 21 transcript(s) and 49 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor (SwissProt accession identifier RIB2_HUMAN; known also according to the synonyms EC 2.4.1.119; Ribophorin II; RPN-II; RIBIIR), SEQ ID NO: 384, referred to herein as the previously known protein.
Protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor is known or believed to have the following function(s): Essential subunit of N-oligosaccharyl transferase enzyme which catalyzes the transfer of a high mannose oligosaccharide from a lipid-linked oligosaccharide donor to an asparagine residue within an Asn-X-Ser/Thr consensus motif in nascent polypeptide chains. The sequence for protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor is given at the end of the application, as "Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
The localization of protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor is believed to be as follows: type I membrane protein; endoplasmic reticulum.
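The Asn-X-Ser/Thr consensus motif mentioned above in connection with the function of the known protein can be located in an amino acid sequence by a simple pattern scan. The sketch below is illustrative only: the function name `find_sequons` is hypothetical, and the pattern follows the motif exactly as worded here (any residue at X), whereas many tools additionally exclude proline at that position.

```python
import re

# N, then any residue, then S or T. The lookahead lets overlapping
# motifs (e.g. "NNSS") each be reported.
_SEQUON = re.compile(r"(?=N[A-Z][ST])")


def find_sequons(protein: str) -> list:
    """Return 1-based positions of the Asn in each Asn-X-Ser/Thr motif."""
    return [m.start() + 1 for m in _SEQUON.finditer(protein.upper())]


# A short fragment taken from the known-protein sequence quoted in this
# section; positions are relative to the fragment, not to RIB2_HUMAN.
fragment = "SGCEISISNETKDLLLAAV"
positions = find_sequons(fragment)
```

Positions returned this way are candidate sites only; whether a site is actually glycosylated is not something a pattern match can establish.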
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: protein modification, which are annotation(s) related to Biological Process; oligosaccharyl transferase; dolichyl-diphosphooligosaccharide-protein glycosyltransferase; transferase, which are annotation(s) related to Molecular Function; and oligosaccharyl transferase complex; integral membrane protein, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster T46984 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 39 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
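The "parts per million" figure described above is a ratio scaled to one million. A minimal sketch of that arithmetic follows; the function name `expression_ppm` and the example counts are hypothetical, and any weighting of ESTs (for example by library) applied in the underlying method is not specified in this passage and is not modeled here.

```python
def expression_ppm(cluster_ests: float, category_ests: float) -> float:
    """Expression of a cluster in one tissue category, in parts per million.

    cluster_ests  -- (weighted) EST count for the cluster in the category
    category_ests -- (weighted) EST count for all clusters in the category
    """
    if category_ests <= 0:
        raise ValueError("category must contain at least one EST")
    return 1_000_000 * cluster_ests / category_ests


# Hypothetical counts, for illustration only.
example_ppm = expression_ppm(25, 500_000)
```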
Overall, the following results were obtained as shown with regard to the histograms in Figure 39 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues, breast malignant tumors, ovarian carcinoma and pancreas carcinoma.
Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
above. These transcript(s) encode for protein(s) which are variant(s) of protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor. A description of each variant protein according to the present invention is now provided.
Variant protein T46984_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T2. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T46984_PEA_1_P2 and RIB2_HUMAN:
1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNA IFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQ PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDN RYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFAL FFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFVAEPDNKNVYKFELDTSERKIEFDS ASGTYTLYLIIGDATLKNPILWNV corresponding to amino acids 1 - 498 of RIB2_HUMAN, which also corresponds to amino acids 1 - 498 of T46984_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VCA corresponding to amino acids 499 - 501 of T46984_PEA_1_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
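The reasoning given for localization amounts to a small decision rule over the outputs of two signal-peptide predictors and two trans-membrane region predictors. The sketch below encodes that rule under stated assumptions: the function name `infer_localization` and the boolean inputs are hypothetical, the "membrane" branch follows the reasoning given in this section for the membrane-localized variants of this cluster, and any combination not covered by the stated reasoning is reported as undetermined rather than guessed.

```python
from typing import Sequence


def infer_localization(signal_peptide_calls: Sequence[bool],
                       tm_region_calls: Sequence[bool]) -> str:
    """Combine predictor outputs into a localization call.

    signal_peptide_calls -- one boolean per signal-peptide predictor
    tm_region_calls      -- one boolean per trans-membrane predictor
    """
    # All trans-membrane predictors agree on a TM region: membrane,
    # whether or not a signal peptide is also predicted.
    if tm_region_calls and all(tm_region_calls):
        return "membrane"
    # All signal-peptide predictors agree and no TM region is predicted.
    if (signal_peptide_calls and all(signal_peptide_calls)
            and not any(tm_region_calls)):
        return "secreted"
    # Disagreeing or otherwise uncovered combinations are left open.
    return "undetermined"


call = infer_localization([True, True], [False, False])
```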
The glycosylation sites of variant protein T46984_PEA_1_P2, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 7 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 7 - Glycosylation site(s)
Variant protein T46984_PEA_1_P2 is encoded by the following transcript(s): T46984_PEA_1_T2, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T2 is shown in bold; this coding portion starts at position 316 and ends at position 1818. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T3. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T46984_PEA_1_P3 and RIB2_HUMAN:
1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNA IFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQ PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDN RYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFAL FFQLVDVNTGAELTPHQ corresponding to amino acids 1 - 433 of RIB2_HUMAN, which also corresponds to amino acids 1 - 433 of T46984_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ICHIWKLIFLP corresponding to amino acids 434 - 444 of T46984_PEA_1_P3, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T46984_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ICHIWKLIFLP in T46984_PEA_1_P3.
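The comparison reports describe each variant as a head shared with the known protein followed by a unique tail, with the two parts contiguous. The position ranges quoted for the tails follow directly from the head length and the tail length, as the following sketch shows. The function name `tail_span` is hypothetical; positions are 1-based and inclusive, as in the text.

```python
from typing import Tuple


def tail_span(head_length: int, tail: str) -> Tuple[int, int]:
    """1-based inclusive positions occupied by a tail appended to a head."""
    if head_length < 0 or not tail:
        raise ValueError("head length must be >= 0 and tail non-empty")
    return head_length + 1, head_length + len(tail)


# Head lengths and tails as quoted in the comparison reports above.
p2_tail = tail_span(498, "VCA")           # T46984_PEA_1_P2
p3_tail = tail_span(433, "ICHIWKLIFLP")   # T46984_PEA_1_P3
```

The same check applies to any other head and tail pair given in a comparison report of this form.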
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein T46984_PEA_1_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P3, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 10 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 10 - Glycosylation site(s)
Variant protein T46984_PEA_1_P3 is encoded by the following transcript(s): T46984_PEA_1_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T3 is shown in bold; this coding portion starts at position 316 and ends at position 1647. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Nucleic acid SNPs
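The coding-portion coordinates quoted for each transcript can be cross-checked against the length of the encoded variant protein. The sketch below assumes that the quoted start and end positions are 1-based and inclusive and that the quoted coding portion does not include the stop codon; both are assumptions of this example, chosen because they agree with the figures quoted above for T46984_PEA_1_T2 and T46984_PEA_1_T3. The function name `encoded_length` is hypothetical.

```python
def encoded_length(cds_start: int, cds_end: int) -> int:
    """Number of amino acids encoded by a coding portion.

    Coordinates are taken as 1-based and inclusive, and the coding
    portion is assumed to exclude the stop codon.
    """
    nucleotides = cds_end - cds_start + 1
    if nucleotides <= 0:
        raise ValueError("end position must not precede start position")
    if nucleotides % 3 != 0:
        raise ValueError("coding portion is not a whole number of codons")
    return nucleotides // 3


# Coordinates as quoted above for the two transcripts described so far.
p2_length = encoded_length(316, 1818)  # T46984_PEA_1_T2
p3_length = encoded_length(316, 1647)  # T46984_PEA_1_T3
```

Under these assumptions the results agree with the comparison reports, in which the last tail residue is amino acid 501 for T46984_PEA_1_P2 and amino acid 444 for T46984_PEA_1_P3.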
Variant protein T46984_PEA_1_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T13. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T46984_PEA_1_P10 and RIB2_HUMAN:
1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNA IFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQ PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDN RYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFAL FFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFVAEPDNKNVYKFELDTSERKIEFDS ASGTYTLYLIIGDATLKNPILWNV corresponding to amino acids 1 - 498 of RIB2_HUMAN, which also corresponds to amino acids 1 - 498 of T46984_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LMDQK corresponding to amino acids 499 - 503 of T46984_PEA_1_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T46984_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LMDQK in T46984_PEA_1_P10.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein T46984_PEA_1_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 12 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P10, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 13 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 13 - Glycosylation site(s)
Variant protein T46984_PEA_1_P10 is encoded by the following transcript(s): T46984_PEA_1_T13, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T13 is shown in bold; this coding portion starts at position 316 and ends at position 1824. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P11 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T14. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T46984_PEA_1_P11 and RIB2_HUMAN:
1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNA IFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQ PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDN RYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFAL FFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFVAEPDNKNVYKFELDTSERKIEFDS ASGTYTLYLIIGDATLKNPILWNVADVVIKFPEEEAPSTVLSQNLFTPKQEIQHLFREPEK RPPTVVSNTFTALILSPLLLLFALWIRIGANVSNFTFAPSTIIFHLGHAAMLGLMYVYWT QLNMFQTLKYLAILGSVTFLAGNRMLAQQAVKR corresponding to amino acids 1 - 628 of RIB2_HUMAN, which also corresponds to amino acids 1 - 628 of T46984_PEA_1_P11.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although both signal-peptide prediction programs agree that this protein has a signal peptide, both trans-membrane region prediction programs predict that this protein has a trans-membrane region downstream of this signal peptide.
Variant protein T46984_PEA_1_P11 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 15 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 15 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P11, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 16 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 16 - Glycosylation site(s)
Variant protein T46984_PEA_1_P11 is encoded by the following transcript(s): T46984_PEA_1_T14, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T14 is shown in bold; this coding portion starts at position 316 and ends at position 2199. The transcript also has the following SNPs as listed in Table 17 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 17 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T15. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T46984_PEA_1_P12 and RIB2_HUMAN:
1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P12, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNA IFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQ PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMN corresponding to amino acids 1 - 338 of RIB2_HUMAN, which also corresponds to amino acids 1 - 338 of T46984_PEA_1_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SQDLH corresponding to amino acids 339 - 343 of T46984_PEA_1_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T46984_PEA_1_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SQDLH in T46984_PEA_1_P12.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein T46984_PEA_1_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 18 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 18 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P12, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 19 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 19 - Glycosylation site(s)
Variant protein T46984_PEA_1_P12 is encoded by the following transcript(s): T46984_PEA_1_T15, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T15 is shown in bold; this coding portion starts at position 316 and ends at position 1344. The transcript also has the following SNPs as listed in Table 20 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 20 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P21 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T27. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T46984_PEA_1_P21 and RIB2_HUMAN:
1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P21, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence M corresponding to amino acids 1 - 1 of T46984_PEA_1_P21, and a second amino acid sequence being at least 90% homologous to
KACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSSVTQIYHAV AALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVA RLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFES LSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPLTQATVKL EHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDNRYIANTVEL RVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFALFFQLVDVNT GAELTPHQTFVRLHNQKTGQEVVFVAEPDNKNVYKFELDTSERKIEFDSASGTYTLYLII GDATLKNPILWNVADVVIKFPEEEAPSTVLSQNLFTPKQEIQHLFREPEKRPPTVVSNTF TALILSPLLLLFALWIRIGANVSNFTFAPSTIIFHLGHAAMLGLMYVYWTQLNMFQTLKY LAILGSVTFLAGNRMLAQQAVKRTAH corresponding to amino acids 70 - 631 of RIB2_HUMAN, which also corresponds to amino acids 2 - 563 of T46984_PEA_1_P21, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
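Because the shared portion of T46984_PEA_1_P21 starts at amino acid 70 of RIB2_HUMAN but at amino acid 2 of the variant, positions on the known protein are shifted by a constant offset on the variant. The following sketch makes that mapping explicit; the function name `known_to_variant` and its default arguments are introduced for this example only, with the ranges taken from the comparison report above.

```python
def known_to_variant(known_pos: int,
                     known_start: int = 70,
                     known_end: int = 631,
                     variant_start: int = 2) -> int:
    """Map a 1-based position on the known protein to the variant.

    Defaults describe the shared segment reported for T46984_PEA_1_P21:
    amino acids 70 - 631 of RIB2_HUMAN align to 2 - 563 of the variant.
    """
    if not known_start <= known_pos <= known_end:
        raise ValueError("position lies outside the shared segment")
    return known_pos - known_start + variant_start


first_shared = known_to_variant(70)
last_shared = known_to_variant(631)
```

A mapping of this kind is what allows position-based annotations on the known protein to be compared with their positions on the variant.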
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein T46984_PEA_1_P21 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 21 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P21 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 21 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P21, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 22 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 22 - Glycosylation site(s)
Variant protein T46984_PEA_1_P21 is encoded by the following transcript(s): T46984_PEA_1_T27, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T27 is shown in bold; this coding portion starts at position 338 and ends at position 2026. The transcript also has the following SNPs as listed in Table 23 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P21 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 23 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P27 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T34. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T46984_PEA_1_P27 and RIB2_HUMAN:
1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P27, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNA IFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQ PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDN RYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFA corresponding to amino acids 1 - 415 of RIB2_HUMAN, which also corresponds to amino acids 1 - 415 of T46984_PEA_1_P27, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence FGSGLVPMSPTSLLLLARLYFTWDMLLCWDSCMSTGLSSTCSRP corresponding to amino acids 416 - 459 of T46984_PEA_1_P27, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T46984_PEA_1_P27, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence FGSGLVPMSPTSLLLLARLYFTWDMLLCWDSCMSTGLSSTCSRP in T46984_PEA_1_P27.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein T46984_PEA_1_P27 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 24 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P27 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 24 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P27, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 25 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 25 - Glycosylation site(s)
Variant protein T46984_PEA_1_P27 is encoded by the following transcript(s): T46984_PEA_1_T34, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T34 is shown in bold; this coding portion starts at position 316 and ends at position 1692. The transcript also has the following SNPs as listed in Table 26 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P27 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 26 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P32 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T40. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T46984_PEA_1_P32 and RIB2_HUMAN:
1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P32, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNA IFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQ PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDN RYIANTVE corresponding to amino acids 1 - 364 of RIB2_HUMAN, which also corresponds to amino acids 1 - 364 of T46984_PEA_1_P32, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GQVRWLTPVIPALWEAKAGGSPEVRSSILAWPT corresponding to amino acids 365 - 397 of T46984_PEA_1_P32, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T46984_PEA_1_P32, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GQVRWLTPVIPALWEAKAGGSPEVRSSILAWPT in T46984_PEA_1_P32.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T46984_PEA_1_P32 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 27 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P32 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 27 - Amino acid mutations
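The localization reasoning stated above (secreted when both signal-peptide predictors report a signal peptide and neither trans-membrane predictor reports a trans-membrane region) can be written as a small decision rule. The following is a minimal sketch only: the function name, the boolean inputs and the "undetermined" fallback are illustrative assumptions, and the application does not specify how other combinations of predictions are classified.

```python
from typing import Sequence


def predict_localization(signal_peptide_calls: Sequence[bool],
                         transmembrane_calls: Sequence[bool]) -> str:
    """Apply the 'secreted' rule described in the text.

    signal_peptide_calls: one boolean per signal-peptide prediction program
        (True means the program predicts a signal peptide).
    transmembrane_calls: one boolean per trans-membrane prediction program
        (True means the program predicts a trans-membrane region).

    Only the 'secreted' outcome is taken from the text; every other
    combination is reported as 'undetermined' (an assumption of this sketch).
    """
    if not signal_peptide_calls or not transmembrane_calls:
        raise ValueError("at least one prediction of each kind is required")
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    return "undetermined"


# Both signal-peptide programs agree, neither trans-membrane program fires.
result = predict_localization([True, True], [False, False])
```

Requiring agreement from every program mirrors the "both ... and neither ..." wording of the text; a looser majority vote would be a different rule.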
The glycosylation sites of variant protein T46984_PEA_1_P32, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 28 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 28 - Glycosylation site(s)
Variant protein T46984_PEA_1_P32 is encoded by the following transcript(s): T46984_PEA_1_T40, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T40 is shown in bold; this coding portion starts at position 316 and ends at position 1506. The transcript also has the following SNPs as listed in Table 29 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P32 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 29 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P34 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T42. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T46984_PEA_1_P34 and RIB2_HUMAN: 1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P34, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNA IFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQ PLTQATVKLEHAKSVASRATVLQKTSFTPVG corresponding to amino acids 1 - 329 of RIB2_HUMAN, which also corresponds to amino acids 1 - 329 of T46984_PEA_1_P34.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T46984_PEA_1_P34 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 30 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P34 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 30 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P34, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 31 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 31 - Glycosylation site(s)
Variant protein T46984_PEA_1_P34 is encoded by the following transcript(s): T46984_PEA_1_T42, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T42 is shown in bold; this coding portion starts at position 316 and ends at position 1302. The transcript also has the following SNPs as listed in Table 32 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P34 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 32 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P35 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T43. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T46984_PEA_1_P35 and RIB2_HUMAN: 1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P35, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNA IFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAI corresponding to amino acids 1 - 287 of RIB2_HUMAN, which also corresponds to amino acids 1 - 287 of T46984_PEA_1_P35, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GCWPSRQSREQHISSRRKMEILKTECQEKESRTIHSMRRKMEKKNFI corresponding to amino acids 288 - 334 of T46984_PEA_1_P35, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T46984_PEA_1_P35, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GCWPSRQSREQHISSRRKMEILKTECQEKESRTIHSMRRKMEKKNFI in T46984_PEA_1_P35.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T46984_PEA_1_P35 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 33 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P35 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 33 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P35, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 34 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 34 - Glycosylation site(s)
Variant protein T46984_PEA_1_P35 is encoded by the following transcript(s): T46984_PEA_1_T43, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T43 is shown in bold; this coding portion starts at position 316 and ends at position 1317. The transcript also has the following SNPs as listed in Table 35 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P35 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 35 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P38 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T47. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T46984_PEA_1_P38 and RIB2_HUMAN: 1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P38, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEAL corresponding to amino acids 1 - 145 of RIB2_HUMAN, which also corresponds to amino acids 1 - 145 of T46984_PEA_1_P38, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MDPDWCQCLQLHFCS corresponding to amino acids 146 - 160 of T46984_PEA_1_P38, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T46984_PEA_1_P38, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MDPDWCQCLQLHFCS in T46984_PEA_1_P38.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T46984_PEA_1_P38 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 36 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P38 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 36 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P38, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 37 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 37 - Glycosylation site(s)
Variant protein T46984_PEA_1_P38 is encoded by the following transcript(s): T46984_PEA_1_T47, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T47 is shown in bold; this coding portion starts at position 316 and ends at position 795. The transcript also has the following SNPs as listed in Table 38 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P38 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 38 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P39 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T48. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T46984_PEA_1_P39 and RIB2_HUMAN: 1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P39, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLA corresponding to amino acids 1 - 160 of RIB2_HUMAN, which also corresponds to amino acids 1 - 160 of T46984_PEA_1_P39.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T46984_PEA_1_P39 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 39 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P39 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 39 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P39, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 40 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 40 - Glycosylation site(s)
Variant protein T46984_PEA_1_P39 is encoded by the following transcript(s): T46984_PEA_1_T48, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T48 is shown in bold; this coding portion starts at position 316 and ends at position 795. The transcript also has the following SNPs as listed in Table 41 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P39 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 41 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P45 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T32. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T46984_PEA_1_P45 and RIB2_HUMAN: 1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P45, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCE corresponding to amino acids 1 - 101 of RIB2_HUMAN, which also corresponds to amino acids 1 - 101 of T46984_PEA_1_P45, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NSPGSADSIPPVPAG corresponding to amino acids 102 - 116 of T46984_PEA_1_P45, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T46984_PEA_1_P45, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NSPGSADSIPPVPAG in T46984_PEA_1_P45.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T46984_PEA_1_P45 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 42 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P45 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 42 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P45, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 43 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 43 - Glycosylation site(s)
Variant protein T46984_PEA_1_P45 is encoded by the following transcript(s): T46984_PEA_1_T32, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T32 is shown in bold; this coding portion starts at position 316 and ends at position 663. The transcript also has the following SNPs as listed in Table 44 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P45 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 44 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P46 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T35. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T46984_PEA_1_P46 and RIB2_HUMAN: 1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P46, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAK corresponding to amino acids 1 - 69 of RIB2_HUMAN, which also corresponds to amino acids 1 - 69 of T46984_PEA_1_P46, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NSPGSADSIPPVPAG corresponding to amino acids 70 - 84 of T46984_PEA_1_P46, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T46984_PEA_1_P46, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NSPGSADSIPPVPAG in T46984_PEA_1_P46.
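The head-plus-tail structure of the chimeric polypeptide described above can be illustrated with the T46984_PEA_1_P46 sequences quoted in the comparison report. The sketch below joins the two parts, derives their 1-based positions, and scores a candidate tail against the stated thresholds. It is a minimal illustration under stated assumptions: the function names are hypothetical, and "percent homology" is approximated here by ungapped, position-by-position identity between equal-length sequences, which is a simplification and not necessarily the comparison method used in the application.

```python
# Sequences as quoted in the comparison report for T46984_PEA_1_P46.
HEAD = ("MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL"
        "GAQVPDAK")          # amino acids 1 - 69, shared with RIB2_HUMAN
TAIL = "NSPGSADSIPPVPAG"    # amino acids 70 - 84, unique to the variant


def assemble_chimera(head: str, tail: str) -> str:
    """Join head and tail contiguously and in sequential order."""
    return head + tail


def segment_spans(head: str, tail: str):
    """Return 1-based inclusive (start, end) spans for head and tail."""
    head_span = (1, len(head))
    tail_span = (len(head) + 1, len(head) + len(tail))
    return head_span, tail_span


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity between two equal-length sequences (simplification)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length for this sketch")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


chimera = assemble_chimera(HEAD, TAIL)
head_span, tail_span = segment_spans(HEAD, TAIL)

# Hypothetical candidate tail with one substitution (final G changed to A).
candidate = "NSPGSADSIPPVPAA"
identity = percent_identity(candidate, TAIL)
meets_90 = identity >= 90.0
meets_95 = identity >= 95.0
```

With a 15-residue tail, a single substitution already drops ungapped identity below the 95% tier, which shows how coarse the percentage thresholds are for short tails.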
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T46984_PEA_1_P46 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 45 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P46 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 45 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P46, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 46 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 46 - Glycosylation site(s)
Variant protein T46984_PEA_1_P46 is encoded by the following transcript(s): T46984_PEA_1_T35, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T35 is shown in bold; this coding portion starts at position 316 and ends at position 567. The transcript also has the following SNPs as listed in Table 47 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P46 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 47 - Nucleic acid SNPs
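The coding-portion coordinates quoted for each variant above can be cross-checked against the protein lengths given in the comparison reports. The sketch below computes the length of each coding portion from its start and end positions and compares it with three nucleotides per amino acid. The table of values is transcribed from the text; the function names are hypothetical. In every case listed the span equals exactly three times the protein length, which suggests the stated end position excludes the stop codon; that reading is an inference from the numbers, not something the application states.

```python
# (variant, coding start, coding end, protein length in amino acids),
# all taken from the variant descriptions above.
VARIANTS = [
    ("T46984_PEA_1_P32", 316, 1506, 397),
    ("T46984_PEA_1_P34", 316, 1302, 329),
    ("T46984_PEA_1_P35", 316, 1317, 334),
    ("T46984_PEA_1_P38", 316, 795, 160),
    ("T46984_PEA_1_P39", 316, 795, 160),
    ("T46984_PEA_1_P45", 316, 663, 116),
    ("T46984_PEA_1_P46", 316, 567, 84),
]


def coding_length(start: int, end: int) -> int:
    """Number of nucleotides in a 1-based inclusive coding span."""
    return end - start + 1


def implied_protein_length(start: int, end: int) -> int:
    """Amino acids encoded by the span, assuming no stop codon is included."""
    n = coding_length(start, end)
    if n % 3 != 0:
        raise ValueError(f"coding span of {n} nt is not a whole number of codons")
    return n // 3


def check_variants(variants):
    """Map each variant name to True if span and protein length agree."""
    return {
        name: implied_protein_length(start, end) == aa_len
        for name, start, end, aa_len in variants
    }


consistency = check_variants(VARIANTS)

# T46984_PEA_1_P27 (positions 316 to 1692): its protein length is not quoted
# in this section, so only the implied value is computed here.
p27_implied = implied_protein_length(316, 1692)
```

A mismatch in such a check would point to a transcription or extraction error in one of the quoted positions rather than to a biological difference.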
above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster T46984_PEA_1_node_2 according to the present invention is supported by 240 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T40, T46984_PEA_1_T42, T46984_PEA_1_T43, T46984_PEA_1_T47 and T46984_PEA_1_T48. Table 48 below describes the starting and ending position of this segment on each transcript. Table 48 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_4 according to the present invention is supported by 321 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T40, T46984_PEA_1_T42, T46984_PEA_1_T43, T46984_PEA_1_T47 and T46984_PEA_1_T48. Table 49 below describes the starting and ending position of this segment on each transcript. Table 49 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_6 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T27. Table 50 below describes the starting and ending position of this segment on each transcript. Table 50 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_12 according to the present invention is supported by 262 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T34, T46984_PEA_1_T40, T46984_PEA_1_T42, T46984_PEA_1_T43, T46984_PEA_1_T47 and T46984_PEA_1_T48. Table 51 below describes the starting and ending position of this segment on each transcript. Table 51 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_14 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T48. Table 52 below describes the starting and ending position of this segment on each transcript. Table 52 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_25 according to the present invention is supported by 257 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T40, T46984_PEA_1_T42 and T46984_PEA_1_T43. Table 53 below describes the starting and ending position of this segment on each transcript. Table 53 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_29 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T42. Table 54 below describes the starting and ending position of this segment on each transcript. Table 54 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_34 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T40. Table 55 below describes the starting and ending position of this segment on each transcript. Table 55 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_46 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T46. Table 56 below describes the starting and ending position of this segment on each transcript. Table 56 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_47 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T3, T46984_PEA_1_T19 and T46984_PEA_1_T46. Table 57 below describes the starting and ending position of this segment on each transcript. Table 57 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_52 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T19 and T46984_PEA_1_T23. Table 58 below describes the starting and ending position of this segment on each transcript. Table 58 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_65 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T51. Table 59 below describes the starting and ending position of this segment on each transcript. Table 59 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_69 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T52 and T46984_PEA_1_T54. Table 60 below describes the starting and ending position of this segment on each transcript. Table 60 - Segment location on transcripts
Segment cluster T46984_PEA_l_node_75 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T14. Table 61 below describes the starting and ending position of this segment on each transcript. Table 61 - Segment location on transcripts 1331
Segment cluster T46984_PEA_1_node_86 according to the present invention is supported by 314 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T43, T46984_PEA_1_T46, T46984_PEA_1_T47, T46984_PEA_1_T51, T46984_PEA_1_T52 and T46984_PEA_1_T54. Table 62 below describes the starting and ending position of this segment on each transcript. Table 62 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster T46984_PEA_1_node_9 according to the present invention is supported by 304 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T40, T46984_PEA_1_T42, T46984_PEA_1_T43, T46984_PEA_1_T47 and T46984_PEA_1_T48. Table 63 below describes the starting and ending position of this segment on each transcript. Table 63 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_13 according to the present invention is supported by 232 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T34, T46984_PEA_1_T40, T46984_PEA_1_T42, T46984_PEA_1_T43 and T46984_PEA_1_T48. Table 64 below describes the starting and ending position of this segment on each transcript. Table 64 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_19 according to the present invention is supported by 237 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T40, T46984_PEA_1_T42 and T46984_PEA_1_T43. Table 65 below describes the starting and ending position of this segment on each transcript. Table 65 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_21 according to the present invention is supported by 242 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T40, T46984_PEA_1_T42 and T46984_PEA_1_T43. Table 66 below describes the starting and ending position of this segment on each transcript. Table 66 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_22 according to the present invention is supported by 205 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T40, T46984_PEA_1_T42 and T46984_PEA_1_T43. Table 67 below describes the starting and ending position of this segment on each transcript. Table 67 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_26 according to the present invention can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T40 and T46984_PEA_1_T42. Table 68 below describes the starting and ending position of this segment on each transcript. Table 68 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_28 according to the present invention is supported by 242 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T40 and T46984_PEA_1_T42. Table 69 below describes the starting and ending position of this segment on each transcript.

Table 69 - Segment location on transcripts

Transcript name      Segment starting position   Segment ending position
T46984_PEA_1_T2      1183                        1301
T46984_PEA_1_T3      1183                        1301
T46984_PEA_1_T12     1183                        1301
T46984_PEA_1_T13     1183                        1301
T46984_PEA_1_T14     1183                        1301
T46984_PEA_1_T15     1183                        1301
T46984_PEA_1_T19     1183                        1301
T46984_PEA_1_T23     1183                        1301
T46984_PEA_1_T27     1001                        1119
T46984_PEA_1_T32     1007                        1125
T46984_PEA_1_T34     1183                        1301
T46984_PEA_1_T35     911                         1029
T46984_PEA_1_T40     1183                        1301
T46984_PEA_1_T42     1183                        1301
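As an illustration only, the positions in Table 69 can be used to check that this segment falls within the "up to about 120 bp" limit stated for short segments. The following minimal Python sketch assumes that the starting and ending positions are 1-based and inclusive; that convention is an assumption, since this section does not state it. The names `TABLE_69` and `segment_length` are hypothetical and are not part of the described method.

```python
# Rows transcribed from Table 69: (transcript name, start, end).
# Assumption: positions are 1-based and inclusive.
TABLE_69 = [
    ("T46984_PEA_1_T2", 1183, 1301),
    ("T46984_PEA_1_T3", 1183, 1301),
    ("T46984_PEA_1_T12", 1183, 1301),
    ("T46984_PEA_1_T13", 1183, 1301),
    ("T46984_PEA_1_T14", 1183, 1301),
    ("T46984_PEA_1_T15", 1183, 1301),
    ("T46984_PEA_1_T19", 1183, 1301),
    ("T46984_PEA_1_T23", 1183, 1301),
    ("T46984_PEA_1_T27", 1001, 1119),
    ("T46984_PEA_1_T32", 1007, 1125),
    ("T46984_PEA_1_T34", 1183, 1301),
    ("T46984_PEA_1_T35", 911, 1029),
    ("T46984_PEA_1_T40", 1183, 1301),
    ("T46984_PEA_1_T42", 1183, 1301),
]


def segment_length(start: int, end: int) -> int:
    """Return the segment length in bp for inclusive coordinates."""
    if end < start:
        raise ValueError("end position precedes start position")
    return end - start + 1


# Length of the segment as located on each transcript.
lengths = {name: segment_length(start, end) for name, start, end in TABLE_69}

for name, length in lengths.items():
    print(f"{name}\t{length} bp")
```

Under the inclusive-coordinate assumption, every row gives the same length, which is expected because the same segment is being located on different transcripts; only its offset changes.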
Segment cluster T46984_PEA_1_node_31 according to the present invention is supported by 207 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35 and T46984_PEA_1_T40. Table 70 below describes the starting and ending position of this segment on each transcript. Table 70 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_32 according to the present invention is supported by 226 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35 and T46984_PEA_1_T40. Table 71 below describes the starting and ending position of this segment on each transcript. Table 71 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_38 according to the present invention can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34 and T46984_PEA_1_T35. Table 72 below describes the starting and ending position of this segment on each transcript. Table 72 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_39 according to the present invention can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34 and T46984_PEA_1_T35. Table 73 below describes the starting and ending position of this segment on each transcript. Table 73 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_40 according to the present invention is supported by 227 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34 and T46984_PEA_1_T35. Table 74 below describes the starting and ending position of this segment on each transcript. Table 74 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_42 according to the present invention is supported by 239 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34 and T46984_PEA_1_T35. Table 75 below describes the starting and ending position of this segment on each transcript. Table 75 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_43 according to the present invention is supported by 235 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32 and T46984_PEA_1_T35. Table 76 below describes the starting and ending position of this segment on each transcript. Table 76 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_48 according to the present invention is supported by 282 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T35 and T46984_PEA_1_T46. Table 77 below describes the starting and ending position of this segment on each transcript. Table 77 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_49 according to the present invention is supported by 262 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T35 and T46984_PEA_1_T46. Table 78 below describes the starting and ending position of this segment on each transcript. Table 78 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_50 according to the present invention is supported by 277 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T35 and T46984_PEA_1_T46. Table 79 below describes the starting and ending position of this segment on each transcript. Table 79 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_51 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T12, T46984_PEA_1_T19 and T46984_PEA_1_T23. Table 80 below describes the starting and ending position of this segment on each transcript. Table 80 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_53 according to the present invention is supported by 16 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T13, T46984_PEA_1_T19 and T46984_PEA_1_T23. Table 81 below describes the starting and ending position of this segment on each transcript. Table 81 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_54 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T19 and T46984_PEA_1_T23. Table 82 below describes the starting and ending position of this segment on each transcript. Table 82 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_55 according to the present invention is supported by 335 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T35 and T46984_PEA_1_T46. Table 83 below describes the starting and ending position of this segment on each transcript. Table 83 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_57 according to the present invention can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T35 and T46984_PEA_1_T46. Table 84 below describes the starting and ending position of this segment on each transcript. Table 84 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_60 according to the present invention is supported by 326 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T35 and T46984_PEA_1_T46. Table 85 below describes the starting and ending position of this segment on each transcript. Table 85 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_62 according to the present invention is supported by 335 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T35 and T46984_PEA_1_T46. Table 86 below describes the starting and ending position of this segment on each transcript. Table 86 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_66 according to the present invention is supported by 336 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T46, T46984_PEA_1_T47 and T46984_PEA_1_T51. Table 87 below describes the starting and ending position of this segment on each transcript. Table 87 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_67 according to the present invention is supported by 323 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T46, T46984_PEA_1_T47 and T46984_PEA_1_T51. Table 88 below describes the starting and ending position of this segment on each transcript. Table 88 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_70 according to the present invention is supported by 337 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T46, T46984_PEA_1_T47, T46984_PEA_1_T51, T46984_PEA_1_T52 and T46984_PEA_1_T54. Table 89 below describes the starting and ending position of this segment on each transcript. Table 89 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_71 according to the present invention can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T46, T46984_PEA_1_T47, T46984_PEA_1_T51, T46984_PEA_1_T52 and T46984_PEA_1_T54. Table 90 below describes the starting and ending position of this segment on each transcript. Table 90 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_72 according to the present invention can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T43, T46984_PEA_1_T46, T46984_PEA_1_T47, T46984_PEA_1_T51, T46984_PEA_1_T52 and T46984_PEA_1_T54. Table 91 below describes the starting and ending position of this segment on each transcript. Table 91 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_73 according to the present invention can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T43, T46984_PEA_1_T46, T46984_PEA_1_T47, T46984_PEA_1_T51, T46984_PEA_1_T52 and T46984_PEA_1_T54. Table 92 below describes the starting and ending position of this segment on each transcript. Table 92 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_74 according to the present invention can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T43, T46984_PEA_1_T46, T46984_PEA_1_T47, T46984_PEA_1_T51, T46984_PEA_1_T52 and T46984_PEA_1_T54. Table 93 below describes the starting and ending position of this segment on each transcript. Table 93 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_83 according to the present invention can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T43, T46984_PEA_1_T46, T46984_PEA_1_T47, T46984_PEA_1_T51, T46984_PEA_1_T52 and T46984_PEA_1_T54. Table 94 below describes the starting and ending position of this segment on each transcript. Table 94 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_84 according to the present invention can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T43, T46984_PEA_1_T46, T46984_PEA_1_T47, T46984_PEA_1_T51, T46984_PEA_1_T52 and T46984_PEA_1_T54. Table 95 below describes the starting and ending position of this segment on each transcript. Table 95 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_85 according to the present invention is supported by 295 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T43, T46984_PEA_1_T46, T46984_PEA_1_T47, T46984_PEA_1_T51, T46984_PEA_1_T52 and T46984_PEA_1_T54. Table 96 below describes the starting and ending position of this segment on each transcript. Table 96 - Segment location on transcripts
Variant protein alignment to the previously known protein:

Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P2 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 4716.00
Escore: 0
Matching length: 498
Total length: 498
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:

  1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50

 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100

101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150

151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200

201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250

251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300

301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350

351 EVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPA 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPA 400

401 KAKGTFIADSHQNFALFFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFV 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 KAKGTFIADSHQNFALFFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFV 450

451 AEPDNKNVYKFELDTSERKIEFDSASGTYTLYLIIGDATLKNPILWNV 498
    ||||||||||||||||||||||||||||||||||||||||||||||||
451 AEPDNKNVYKFELDTSERKIEFDSASGTYTLYLIIGDATLKNPILWNV 498
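For illustration, a percent identity figure such as the one reported above can be reproduced for a gap-free alignment by counting positions at which the two aligned sequences carry the same residue. The following Python sketch uses that common definition; this section does not give the exact formula used by the alignment program, so the definition, and the function name `percent_identity`, are assumptions rather than the method actually used.

```python
def percent_identity(query: str, target: str) -> float:
    """Percent of aligned positions with identical residues.

    Assumes a gap-free alignment, so both strings must have equal length.
    """
    if len(query) != len(target):
        raise ValueError("gap-free alignment requires equal-length sequences")
    if not query:
        raise ValueError("cannot score an empty alignment")
    matches = sum(q == t for q, t in zip(query, target))
    return 100.0 * matches / len(query)


# First 50 aligned residues of the variant protein and RIB2_HUMAN,
# taken from the alignment above (both rows are identical).
block = "MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES"
identity = percent_identity(block, block)
print(f"{identity:.2f}")
```

A similarity score would additionally require a substitution matrix, which is not specified in this section, so it is left out of the sketch.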
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P3 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 4085.00
Escore: 0
Matching length: 433
Total length: 433
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:

  1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50

 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100

101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150

151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200

201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250

251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300

301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350

351 EVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPA 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPA 400

401 KAKGTFIADSHQNFALFFQLVDVNTGAELTPHQ 433
    |||||||||||||||||||||||||||||||||
401 KAKGTFIADSHQNFALFFQLVDVNTGAELTPHQ 433
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P10 x RIB2_HUMAN

Alignment segment 1/1:

Quality: 4716.00
Escore: 0
Matching length: 498
Total length: 498
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:

  1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50

 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100

101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150

151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200

201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250

251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300

301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350

351 EVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPA 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPA 400

401 KAKGTFIADSHQNFALFFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFV 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 KAKGTFIADSHQNFALFFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFV 450

451 AEPDNKNVYKFELDTSERKIEFDSASGTYTLYLIIGDATLKNPILWNV 498
    ||||||||||||||||||||||||||||||||||||||||||||||||
451 AEPDNKNVYKFELDTSERKIEFDSASGTYTLYLIIGDATLKNPILWNV 498
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P11 x RIB2_HUMAN
Alignment segment 1/1:

Quality: 5974.00
Escore: 0
Matching length: 628
Total length: 628
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:

  1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50

 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100

101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150

151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200

201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250

251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300

301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350

351 EVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPA 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPA 400

401 KAKGTFIADSHQNFALFFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFV 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 KAKGTFIADSHQNFALFFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFV 450

451 AEPDNKNVYKFELDTSERKIEFDSASGTYTLYLIIGDATLKNPILWNVAD 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 AEPDNKNVYKFELDTSERKIEFDSASGTYTLYLIIGDATLKNPILWNVAD 500

501 VVIKFPEEEAPSTVLSQNLFTPKQEIQHLFREPEKRPPTVVSNTFTALIL 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 VVIKFPEEEAPSTVLSQNLFTPKQEIQHLFREPEKRPPTVVSNTFTALIL 550

551 SPLLLLFALWIRIGANVSNFTFAPSTIIFHLGHAAMLGLMYVYWTQLNMF 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 SPLLLLFALWIRIGANVSNFTFAPSTIIFHLGHAAMLGLMYVYWTQLNMF 600

601 QTLKYLAILGSVTFLAGNRMLAQQAVKR 628
    ||||||||||||||||||||||||||||
601 QTLKYLAILGSVTFLAGNRMLAQQAVKR 628
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P12 x RIB2_HUMAN
Alignment segment 1/1:

Quality: 3179.00
Escore: 0
Matching length: 338
Total length: 338
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0

Alignment:

  1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50

 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100

101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150

151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200

201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250

251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300

301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMN 338
    ||||||||||||||||||||||||||||||||||||||
301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMN 338
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P21 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 5348.00 Escore: 0 Matching length: 562 Total length: 562 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
2 KACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSED 51 I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I i I 70 KACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSED 119
52 SSVTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTAS 101 I I I I I I I I I I I I I I II I II II I I I I I I I I I I I I I I I I I I I I II I I I I I I I 120 SSVTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTAS 169 1372 102 HLSQQADLRSIVEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLM 151 I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 170 HLSQQADLRSIVEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLM 219
152 DHVGTEPSIKEDQVIQLMNAIFSKKNFESLSEAFSVASAAAVLSHNRYHV 201 I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 220 DHVGTEPSIKEDQVIQLMNAIFSKKNFESLSEAFSVASAAAVLSHNRYHV 269
202 PVVVVPEGSASDTHEQAILRLQVTNVLSQPLTQATVKLEHAKSVASRATV 251 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I
270 PVVVVPEGSASDTHEQAILRLQVTNVLSQPLTQATVKLEHAKSVASRATV 319
252 LQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDNRYIANTVELRVKI 301 I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 320 LQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDNRYIANTVELRVKI 369
302 STEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFALFFQ 351 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I
370 STEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFALFFQ 419 . . . . .
352 LVDVNTGAELTPHQTFVRLHNQKTGQEWFVAEPDNKNVYKFELDTSERK 401 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I
420 LVDVNTGAELTPHQTFVRLHNQKTGQEWFVAEPDNKNVYKFELDTSERK 469
402 IEFDSASGTYTLYLIIGDATLKNPIL NVADWIKFPEEEAPSTVLSQNL 451 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 470 IEFDSASGTYTLYLIIGDATLKNPILWNVADWIKFPEEEAPSTVLSQNL 519
452 FTPKQEIQHLFREPEKRPPTWSNTFTALILSPLLLLFAL IRIGANVSN 501 I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
520 FTPKQEIQHLFREPEKRPPTWSNTFTALILSPLLLLFAL IRIGANVSN 569 1373
502 FTFAPSTIIFHLGHAAMLGLMYVYWTQLNMFQTLKYLAILGSVTFLAGNR 551 I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 570 FTFAPSTIIFHLGHAAMLGLMYVYWTQLNMFQTLKYLAILGSVTFLAGNR 619
552 MLAQQAVKRTAH 563
    ||||||||||||
620 MLAQQAVKRTAH 631
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P27 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 3910.00 Escore: 0 Matching length: 415 Total length: 415 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment :
1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50 II I I I II II I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100 I I I I I I II I I I I I I I I I I II I I I I I I I I I I II I I I II I I I I I I I II I I I I 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150 I I I I I I I I I I I I I I I I I II I I I I I I I II I I II I I I I I I I I I I I I I I I I II 101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150 . . . . . 151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200 I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200 201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I 201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250
251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 251 EAFSVASAAAVLSHNRYHVPWVVPEGSASDTHEQAILRLQVTNVLSQPL 300
301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350 1375 351 EVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPA 400 II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 351 EVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPA 400 401 KAKGTFIADSHQNFA 415 I I I I I I I I I I I I II I 401 KAKGTFIADSHQNFA 415
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P32 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 3434.00 Escore: 0 Matching length: 373 Total length: 373 Matching Percent Similarity: 98.93 Matching Percent Identity: 98.39 Total Percent Similarity: 98.93 Total Percent Identity: 98.39 Gaps: 0
Alignment :
1 MAPPGSSTVFLLALTIIAST ALTPTHYLTKHDVERLKASLDRPFTNLES 50 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50 1376
AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
I I I I I I I I I II II I I I I I II II II I I II II I II I I I I I II II I I II I I I I AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100 . . . . . EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
I I I I I I I I I I I II I II I I II I I I I II I I I I I I I I I I I I I I I I I I I I I I I I EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200 I I I I I I I I I I II I II II I I I I I I I I I II I I I II I I I II I I I I I I I II I II RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200
FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250 I I I I I || I I I I I I M I I I I || I || I I I I I I I I || I I I || M || I I I I I I I FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250
EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300 I II I I II I I I I I II I I II II I I I I I I I I I I I I I I II I I I I I I I I I I II II EAFSVASAAAVLSHNRYHVPWVVPEGSASDTHEQAILRLQVTNVLSQPL 300
TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350
I II I I I I I I I I I I I I I I I II I I I II I I II I I I I I I I II I I I II I I II I I I TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350
351 EVEGDNRYIANTVEGQVRWLTPV 373
    |||||||||||||| :|:  | |
351 EVEGDNRYIANTVELRVKISTEV 373
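The similarity and identity percentages reported for an ungapped alignment segment can be reproduced by counting matching positions. The following is a minimal sketch, not the alignment program used in the application: the function name `alignment_percentages`, the treatment of the two positions marked ":" (Q/R and R/K) as "similar", and the use of a placeholder residue for the first 350 positions (which the match bars show as identical) are all assumptions made for illustration.

```python
def alignment_percentages(query, subject, similar_pairs=frozenset()):
    """Return (percent_similarity, percent_identity) for an ungapped alignment."""
    if len(query) != len(subject):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    # Positions where both residues are the same.
    identical = sum(q == s for q, s in zip(query, subject))
    # Positions that differ but are listed as a similar pair.
    similar = sum(
        q != s and frozenset((q, s)) in similar_pairs
        for q, s in zip(query, subject)
    )
    total = len(query)
    return (
        round(100.0 * (identical + similar) / total, 2),
        round(100.0 * identical / total, 2),
    )


# Last 23 aligned residues of T46984_PEA_1_P32 (query) and RIB2_HUMAN (subject).
QUERY_TAIL = "EVEGDNRYIANTVEGQVRWLTPV"
SUBJECT_TAIL = "EVEGDNRYIANTVELRVKISTEV"

# Assumption: the pairs marked ':' in the alignment count as similar.
SIMILAR = frozenset({frozenset("QR"), frozenset("RK")})

# Assumption: positions 1-350 are identical, so a placeholder residue stands in.
query = "A" * 350 + QUERY_TAIL
subject = "A" * 350 + SUBJECT_TAIL

print(alignment_percentages(query, subject, SIMILAR))  # (98.93, 98.39)
```

With 367 identical and 2 similar positions out of 373, the sketch gives the 98.39 and 98.93 figures stated for this segment.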
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P34 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 3087.00 Escore: 0 Matching length: 329 Total length: 329 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
1 MAPPGSSTVFLLALTIIAST ALTPTHYLTKHDVERLKASLDRPFTNLES 50
1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100 I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100 1378 101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150 I II I I I I I I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I I II II I I I 101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150 151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200 I I II I II I I I I I I I I I I I I I I I I II I I I I I I I I I II I I I I I I I I II I I I I 151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200
201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I IJ I I I I I I I I I I I I 201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250
251 EAFSVASAAAVLSHNRYHVPWVVPEGSASDTHEQAILRLQVTNVLSQPL 300 I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300
301 TQATVKLEHAKSVASRATVLQKTSFTPVG 329 I I I I I I I I I I I I I I I I II I I I I I I I I I I I 301 TQATVKLEHAKSVASRATVLQKTSFTPVG 329
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P35 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 2697.00 Escore: 0 Matching length: 287 Total length: 287 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment : 1 MAPPGSSTVFLLALTIIAST ALTPTHYLTKHDVERLKASLDRPFTNLES 50 I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I II I I I 1 MAPPGSSTVFLLALTIIAST ALTPTHYLTKHDVERLKASLDRPFTNLES 50
51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100 I I I II I I I I I I I I M I I I I I I I M M I I I I I I I I M I I I I I I I II I I I I I 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200
201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250 1380 I I I I I I II I I I I I I II I I I I II I I I I I I I I I I I I I I I I I I I II I I I I I I I 201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250
251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAI 287 I I I I || I I I I I I I I I || I I I I I I I I I I I I II || I I I I 251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAI 287
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P38 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 1368.00
Escore: 0 Matching length: 145 Total length: 145 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
1 MAPPGSSTVFLLALTIIAST ALTPTHYLTKHDVERLKASLDRPFTNLES 50 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 MAPPGSSTVFLLALTIIAST ALTPTHYLTKHDVERLKASLDRPFTNLES 50 1381 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100 I I I I I I I I II I I I II II I II I I I I I I I I II I I I I I I I I I I I I I I I I I I II 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100 101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEAL 145 I I I I I I I I I I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEAL 145
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P39 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 1500.00 Escore: 0 Matching length: 160 Total length: 160 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment :
1 MAPPGSSTVFLLALTIIAST ALTPTHYLTKHDVERLKASLDRPFTNLES 50 I I II I I I I II I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 1 MAPPGSSTVFLLALTIIAST ALTPTHYLTKHDVERLKASLDRPFTNLES 50 1382
51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100 II I II I I II I I I I I I II I I I I I I I I I I I I I I I I I I I I I II II I I II I I I I 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150 I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
151 RLSKEETVLA 160
    ||||||||||
151 RLSKEETVLA 160
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P45 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 970.00 Escore: 0 Matching length: 103 Total length: 103 Matching Percent Similarity: 99.03 Matching Percent Identity: 99.03 Total Percent Similarity: 99.03 Total Percent Identity: 99.03 Gaps: 0
Alignment:
1 MAPPGSSTVFLLALTIIAST ALTPTHYLTKHDVERLKASLDRPFTNLES 50 II I I I I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I I II II II II II 1 MAPPGSSTVFLLALTIIAST ALTPTHYLTKHDVERLKASLDRPFTNLES 50
51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100 I II I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I II 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
101 ENS 103
    | |
101 EIS 103
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P46 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 656.00 Escore: 0 Matching length: 69 Total length: 69 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 MAPPGSSTVFLLALTIIAST ALTPTHYLTKHDVERLKASLDRPFTNLES 50
51 AFYSIVGLSSLGAQVPDAK 69
   |||||||||||||||||||
51 AFYSIVGLSSLGAQVPDAK 69
DESCRIPTION FOR CLUSTER M78530
Cluster M78530 features 3 transcript(s) and 21 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
M78530_PEA_1_T11 399
M78530_PEA_1_T12 400
M78530_PEA_1_T13 401
Table 2 - Segments of interest
Table 3 - Proteins of interest
Protein Name
M78530_PEA_1_P15 426 M78530_PEA_1_T11
M78530_PEA_1_P16 427 M78530_PEA_1_T12
M78530_PEA_1_P17 428 M78530_PEA_1_T13
Cluster M78530 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 40 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
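The "parts per million" measure described above is a ratio scaled by one million. The following is a minimal sketch of that arithmetic only; the function name `parts_per_million` and the EST counts in the usage line are hypothetical, and the weighting applied to the EST counts is not specified in this passage, so none is applied here.

```python
def parts_per_million(cluster_est_count, category_est_count):
    """Express the cluster's share of all ESTs in a category as parts per million."""
    if category_est_count <= 0:
        raise ValueError("the category must contain at least one EST")
    if cluster_est_count < 0:
        raise ValueError("EST counts cannot be negative")
    return 1_000_000 * cluster_est_count / category_est_count


# Hypothetical counts, for illustration only: 4 cluster ESTs among 100,000 ESTs.
print(parts_per_million(4, 100_000))  # 40.0
```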
Overall, the following results were obtained as shown with regard to the histograms in Figure 40 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: ovarian carcinoma.
Table 4 - Normal tissue distribution
Name of Tissue Number
adrenal 40
bladder 41
brain 52
colon 126
epithelial 51
general 35
kidney 199
lung 63
breast
ovary
pancreas 20
prostate 28
stomach
uterus 113
Table 5 - P values and ratios for expression in cancerous tissue
For this cluster, at least one oligonucleotide was found to demonstrate overexpression of the cluster, although not of at least one transcript/segment as listed below. Microarray (chip) data is also available for this cluster as follows. Various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer, as previously described. The following oligonucleotides were found to hit this cluster but not other segments/transcripts below (in relation to ovarian cancer), shown in Table 6.
Table 6 - Oligonucleotides related to this cluster
M78530_0_6_0 ovarian carcinoma OVA
As noted above, cluster M78530 features 3 transcript(s), which were listed in Table 1 above. A description of each variant protein according to the present invention is now provided.
Variant protein M78530_PEA_1_P15 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78530_PEA_1_T11. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between M78530_PEA_1_P15 and Q9HCB6 (SEQ ID NO:424):
1. An isolated chimeric polypeptide encoding for M78530_PEA_1_P15, comprising a first amino acid sequence being at least 90% homologous to
MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEE ETQFMSNCPVAVTESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKL CEQDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSH SKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLN VRAAPSAEFSVDRTRHLMSFLTMMGPSPDWNVGLSAEDLCTKECGWVQKVVQDLIPW DAGTDSGVTYESPNKPTIPQEKIRPLTSLDHPQSPFYDPEGGSITQVARVVIERIARKGEQ CNIVPDNVDDIVADLAPEEKDEDDTPETCIYSNWSPWSACSSSTCDKGKRMRQRMLKA QLDLSVPCPDTQDFQPCMGPGCSDEDGSTCTMSEWITWSPCSISCGMGMRSRERYVKQ FPEDGSVCTLPTEE corresponding to amino acids 1 - 544 of Q9HCB6, which also corresponds to amino acids 1 - 544 of M78530_PEA_1_P15, a bridging amino acid T corresponding to amino acid 545 of M78530_PEA_1_P15, a second amino acid sequence being at least 90% homologous to EKCTVNEECSPSSCLMTEWGEWDECSATCGMGMKKRHRMIKMNPADGSMCKAETSQ AEKCMMPECHTIPCLLSPWSEWSDCSVTCGKGMRTRQRMLKSLAELGDCNEDLEQVE KCMLPEC corresponding to amino acids 546 - 665 of Q9HCB6, which also corresponds to amino acids 546 - 665 of M78530_PEA_1_P15, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RKSWSSSRPITSMFLSPGSPEPASANTARS corresponding to amino acids 666 - 695 of M78530_PEA_1_P15, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of M78530_PEA_1_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RKSWSSSRPITSMFLSPGSPEPASANTARS in M78530_PEA_1_P15.
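The requirement that the parts of the chimeric polypeptide be "contiguous and in a sequential order" can be checked against the residue ranges stated in the comparison report. The sketch below is illustrative only: the function name `check_contiguous`, the segment labels, and the assumption that all ranges are 1-based and inclusive are not taken from the application.

```python
def check_contiguous(segments, total_length):
    """Verify that 1-based inclusive segments tile positions 1..total_length in order.

    Returns a list of (label, length) pairs; raises ValueError on a gap or overlap.
    """
    expected_start = 1
    lengths = []
    for label, start, end in segments:
        if end < start:
            raise ValueError(f"{label}: end {end} precedes start {start}")
        if start != expected_start:
            raise ValueError(f"{label}: expected start {expected_start}, got {start}")
        lengths.append((label, end - start + 1))
        expected_start = end + 1
    if expected_start - 1 != total_length:
        raise ValueError(f"segments end at {expected_start - 1}, not {total_length}")
    return lengths


# Ranges on M78530_PEA_1_P15 as stated in the comparison with Q9HCB6.
P15_SEGMENTS = [
    ("first sequence", 1, 544),
    ("bridging amino acid T", 545, 545),
    ("second sequence", 546, 665),
    ("third sequence (tail)", 666, 695),
]

for label, length in check_contiguous(P15_SEGMENTS, 695):
    print(f"{label}: {length} residues")
```

The four parts come to 544, 1, 120 and 30 residues, which together account for all 695 positions of the variant.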
Comparison report between M78530_PEA_1_P15 and 094862 (SEQ ID NO:425):
1. An isolated chimeric polypeptide encoding for M78530_PEA_1_P15, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLS corresponding to amino acids 1 - 83 of M78530_PEA_1_P15, a second amino acid sequence being at least 90% homologous to AAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQ VFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCCACGT AKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGGYASEGVKQVAE LGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLNVRAAPSAEFSVDRTRHLMSFLTM MGPSPDWNVGLSAEDLCTKECGWVQKVVQDLIPWDAGTDSGVTYESPNKPTIPQEKIR PLTSLDHPQSPFYDPEGGSITQVARVVIERIARKGEQCNIVPDNVDDIVADLAPEEKDED DTPETCIYSNWSPWSACSSSTCDKGKRMRQRMLKAQLDLSVPCPDTQDFQPCMGPGCS DEDGSTCTMSEWITWSPCSISCGMGMRSRERYVKQFPEDGSVCTLPTEETEKCTVNEEC SPSSCLMTEWGEWDECSATCGMGMKKRHRMIKMNPADGSMCKAETSQAEKCMMPE CHTIPCLLSPWSEWSDCSVTCGKGMRTRQRMLKSLAELGDCNEDLEQVEKCMLPEC corresponding to amino acids 1 - 582 of 094862, which also corresponds to amino acids 84 - 665 of M78530_PEA_1_P15, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RKSWSSSRPITSMFLSPGSPEPASANTARS corresponding to amino acids 666 - 695 of M78530_PEA_1_P15, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of M78530_PEA_1_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLS of M78530_PEA_1_P15.
3. An isolated polypeptide encoding for a tail of M78530_PEA_1_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RKSWSSSRPITSMFLSPGSPEPASANTARS in M78530_PEA_1_P15.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein M78530_PEA_1_P15 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78530_PEA_1_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
Variant protein M78530_PEA_1_P15 is encoded by the following transcript(s): M78530_PEA_1_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78530_PEA_1_T11 is shown in bold; this coding portion starts at position 629 and ends at position 2713. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78530_PEA_1_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
760 C -> T No
1461 A -> T No
1462 G -> T No
1492 A -> G No
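The coding-portion coordinates given for transcript M78530_PEA_1_T11 (629 to 2713) can be related to codon numbers with simple arithmetic. The sketch below is illustrative and rests on stated assumptions: that the positions are 1-based and inclusive, and that the coding portion is read in frame from its first nucleotide. The function names `cds_codon_count` and `codon_index` are hypothetical. It does not determine whether a given SNP changes the encoded amino acid.

```python
def cds_codon_count(cds_start, cds_end):
    """Number of whole codons in a coding portion given 1-based inclusive ends."""
    length = cds_end - cds_start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"coding length {length} is not a positive multiple of 3")
    return length // 3


def codon_index(nt_position, cds_start, cds_end):
    """1-based codon number containing a nucleotide position of the transcript."""
    if not cds_start <= nt_position <= cds_end:
        raise ValueError(f"position {nt_position} lies outside the coding portion")
    return (nt_position - cds_start) // 3 + 1


CDS_START, CDS_END = 629, 2713  # coding portion of M78530_PEA_1_T11

print(cds_codon_count(CDS_START, CDS_END))  # 695
for position in (760, 1461, 1462, 1492):  # SNP positions listed in Table 8
    print(position, codon_index(position, CDS_START, CDS_END))
```

Under these assumptions the coding portion spans 695 codons, the same number as the 695 amino acids described for M78530_PEA_1_P15, and the four listed positions fall in codons 44, 278, 278 and 288.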
Variant protein M78530_PEA_1_P16 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78530_PEA_1_T12. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between M78530_PEA_1_P16 and Q8NCD7 (SEQ ID NO: 423):
1. An isolated chimeric polypeptide encoding for M78530_PEA_1_P16, comprising a first amino acid sequence being at least 90% homologous to MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEE ETQFMSNCPVAVTESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKL CEQDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSH SKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLN V corresponding to amino acids 1 - 297 of Q8NCD7, which also corresponds to amino acids 1 - 297 of M78530_PEA_1_P16.
Comparison report between M78530_PEA_1_P16 and Q9HCB6 (SEQ ID NO: 424):
1. An isolated chimeric polypeptide encoding for M78530_PEA_1_P16, comprising a first amino acid sequence being at least 90% homologous to MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEE ETQFMSNCPVAVTESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKL CEQDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSH SKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLN V corresponding to amino acids 1 - 297 of Q9HCB6, which also corresponds to amino acids 1 - 297 of M78530_PEA_1_P16.
Comparison report between M78530_PEA_1_P16 and 094862 (SEQ ID NO: 425):
1. An isolated chimeric polypeptide encoding for M78530_PEA_1_P16, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLS corresponding to amino acids 1 - 83 of M78530_PEA_1_P16, and a second amino acid sequence being at least 90% homologous to AAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQ VFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCCACGT AKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGGYASEGVKQVAE LGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLNV corresponding to amino acids 1 - 214 of 094862, which also corresponds to amino acids 84 - 297 of M78530_PEA_1_P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of M78530_PEA_1_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLS of M78530_PEA_1_P16.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein M78530_PEA_1_P16 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78530_PEA_1_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Amino acid mutations
Variant protein M78530_PEA_1_P16 is encoded by the following transcript(s): M78530_PEA_1_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78530_PEA_1_T12 is shown in bold; this coding portion starts at position 629 and ends at position 1519. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78530_PEA_1_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
Variant protein M78530_PEA_1_P17 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78530_PEA_1_T13. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between M78530_PEA_1_P17 and Q8NCD7:
1. An isolated chimeric polypeptide encoding for M78530_PEA_1_P17, comprising a first amino acid sequence being at least 90% homologous to MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEE ETQFMSNCPVAVTESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKL CEQDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSH SKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQ corresponding to amino acids 1 - 275 of Q8NCD7, which also corresponds to amino acids 1 - 275 of M78530_PEA_1_P17, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRQKNHRMTK corresponding to amino acids 276 - 285 of M78530_PEA_1_P17, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of M78530_PEA_1_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRQKNHRMTK in M78530_PEA_1_P17.
Comparison report between M78530_PEA_1_P17 and Q9HCB6:
1. An isolated chimeric polypeptide encoding for M78530_PEA_1_P17, comprising a first amino acid sequence being at least 90% homologous to MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYT EFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEE ETQFMSNCPVAVTESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKL CEQDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSH SKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQ corresponding to amino acids 1 - 275 of Q9HCB6, which also corresponds to amino acids 1 - 275 of M78530_PEA_1_P17, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRQKNHRMTK corresponding to amino acids 276 - 285 of M78530_PEA_1_P17, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of M78530_PEA_1_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRQKNHRMTK in M78530_PEA_1_P17.
Comparison report between M78530_PEA_1_P17 and O94862: 1. An isolated chimeric polypeptide encoding for M78530_PEA_1_P17, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLS corresponding to amino acids 1 - 83 of M78530_PEA_1_P17, a second amino acid sequence being at least 90% homologous to AAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQ corresponding to amino acids 1 - 192 of O94862, which also corresponds to amino acids 84 - 275 of M78530_PEA_1_P17, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRQKNHRMTK corresponding to amino acids 276 - 285 of M78530_PEA_1_P17, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of M78530_PEA_1_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRAQGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLS of M78530_PEA_1_P17. 3. An isolated polypeptide encoding for a tail of M78530_PEA_1_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRQKNHRMTK in M78530_PEA_1_P17.
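The residue ranges quoted in the three-segment comparison report above can be checked with simple arithmetic. The following Python sketch is illustrative only: the names `Segment`, `is_contiguous` and `ref_offset` are hypothetical and do not come from the application. It tests whether the three segments tile the variant protein without gaps and whether the middle segment has the same length on the variant and on the known protein.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """One piece of a chimeric polypeptide, in 1-based inclusive coordinates."""
    start: int
    end: int
    # Matching range on a known protein, if the report gives one.
    ref_range: Optional[Tuple[int, int]] = None

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def is_contiguous(segments: List[Segment]) -> bool:
    """True if each segment starts one residue after the previous one ends."""
    return all(b.start == a.end + 1 for a, b in zip(segments, segments[1:]))


def ref_offset(segment: Segment) -> int:
    """Variant position minus known-protein position for an aligned segment."""
    if segment.ref_range is None:
        raise ValueError("segment has no known-protein range")
    ref_start, ref_end = segment.ref_range
    if ref_end - ref_start + 1 != segment.length:
        raise ValueError("variant and known-protein ranges differ in length")
    return segment.start - ref_start


# Coordinates as quoted in the comparison report (hypothetical variable name).
P17_SEGMENTS = [
    Segment(1, 83),              # head, not aligned to this known protein
    Segment(84, 275, (1, 192)),  # region shared with the known protein
    Segment(276, 285),           # tail VRQKNHRMTK
]

print(is_contiguous(P17_SEGMENTS))         # contiguity of the three segments
print(sum(s.length for s in P17_SEGMENTS)) # total variant length in residues
print(ref_offset(P17_SEGMENTS[1]))         # shift between the two numberings
```

Under these coordinates the segments cover residues 1 to 285 in order, and variant numbering in the shared region is the known-protein numbering shifted by 83.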
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a transmembrane region.
Variant protein M78530_PEA_1_P17 is encoded by the following transcript(s): M78530_PEA_1_T13, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78530_PEA_1_T13 is shown in bold; this coding portion starts at position 629 and ends at position 1483. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78530_PEA_1_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Nucleic acid SNPs
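As a consistency check on the coding coordinates quoted above, the length of the coding portion can be compared with the length of the variant protein. The Python sketch below is a minimal illustration with hypothetical function names. It assumes the positions are 1-based and inclusive, and that the stated end position does not include the stop codon; the text does not state either point explicitly.

```python
def coding_length(start: int, end: int) -> int:
    """Number of nucleotides in a 1-based, inclusive coding range."""
    if end < start:
        raise ValueError("end position precedes start position")
    return end - start + 1


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in the range; rejects a partial codon."""
    n = coding_length(start, end)
    if n % 3 != 0:
        raise ValueError(f"coding length {n} is not a multiple of 3")
    return n // 3


# Transcript M78530_PEA_1_T13: coding portion 629..1483.
print(coding_length(629, 1483))  # nucleotides
print(codon_count(629, 1483))    # codons
```

The range spans 855 nucleotides, or 285 codons, which agrees with the 285 residues implied by the tail of M78530_PEA_1_P17 ending at amino acid 285.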
As noted above, cluster M78530 features 21 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster M78530_PEA_1_node_0 according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78530_PEA_1_T11, M78530_PEA_1_T12 and M78530_PEA_1_T13. Table 12 below describes the starting and ending position of this segment on each transcript. Table 12 - Segment location on transcripts
Segment cluster M78530_PEA_1_node_15 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78530_PEA_1_T11, M78530_PEA_1_T12 and M78530_PEA_1_T13. Table 13 below describes the starting and ending position of this segment on each transcript. Table 13 - Segment location on transcripts
Segment cluster M78530_PEA_1_node_16 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78530_PEA_1_T13. Table 14 below describes the starting and ending position of this segment on each transcript. Table 14 - Segment location on transcripts
Segment cluster M78530_PEA_1_node_19 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78530_PEA_1_T12. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Segment cluster M78530_PEA_1_node_21 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78530_PEA_1_T11. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Segment cluster M78530_PEA_1_node_23 according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78530_PEA_1_T11. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster M78530_PEA_1_node_27 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78530_PEA_1_T11. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Segment cluster M78530_PEA_1_node_29 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78530_PEA_1_T11. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster M78530_PEA_1_node_36 according to the present invention is supported by 41 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78530_PEA_1_T11. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster M78530_PEA_1_node_37 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78530_PEA_1_T11. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster M78530_PEA_1_node_2 according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78530_PEA_1_T11, M78530_PEA_1_T12 and M78530_PEA_1_T13. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Segment cluster M78530_PEA_1_node_4 according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78530_PEA_1_T11, M78530_PEA_1_T12 and M78530_PEA_1_T13. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Segment cluster M78530_PEA_1_node_5 according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78530_PEA_1_T11, M78530_PEA_1_T12 and M78530_PEA_1_T13. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Segment cluster M78530_PEA_1_node_7 according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78530_PEA_1_T11, M78530_PEA_1_T12 and M78530_PEA_1_T13. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster M78530_PEA_1_node_9 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78530_PEA_1_T11, M78530_PEA_1_T12 and M78530_PEA_1_T13. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster M78530_PEA_1_node_10 according to the present invention can be found in the following transcript(s): M78530_PEA_1_T11, M78530_PEA_1_T12 and M78530_PEA_1_T13. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster M78530_PEA_1_node_18 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78530_PEA_1_T11 and M78530_PEA_1_T12. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster M78530_PEA_1_node_25 according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78530_PEA_1_T11. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster M78530_PEA_1_node_30 according to the present invention can be found in the following transcript(s): M78530_PEA_1_T11. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Segment cluster M78530_PEA_1_node_33 according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78530_PEA_1_T11. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Segment cluster M78530_PEA_1_node_34 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78530_PEA_1_T11. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Variant protein alignment to the previously known protein
Sequence name: Q9HCB6
Sequence documentation:
Alignment of: M78530_PEA_1_P15 x Q9HCB6
Alignment segment 1/1:
Quality: 6706.00
Escore: 0
Matching length: 665
Total length: 665
Matching Percent Similarity: 99.85
Matching Percent Identity: 99.85
Total Percent Similarity: 99.85
Total Percent Identity: 99.85
Gaps: 0
Alignment:
  1 MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRA 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRA 50
 51 QGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRE 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 QGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRE 100
101 NREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQVFWIAP 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 NREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQVFWIAP 150
151 PAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCC 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCC 200
201 ACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGG 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 ACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGG 250
251 YASEGVKQVAELGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLNVRAA 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 YASEGVKQVAELGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLNVRAA 300
301 PSAEFSVDRTRHLMSFLTMMGPSPDWNVGLSAEDLCTKECGWVQKVVQDL 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 PSAEFSVDRTRHLMSFLTMMGPSPDWNVGLSAEDLCTKECGWVQKVVQDL 350
351 IPWDAGTDSGVTYESPNKPTIPQEKIRPLTSLDHPQSPFYDPEGGSITQV 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 IPWDAGTDSGVTYESPNKPTIPQEKIRPLTSLDHPQSPFYDPEGGSITQV 400
401 ARVVIERIARKGEQCNIVPDNVDDIVADLAPEEKDEDDTPETCIYSNWSP 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 ARVVIERIARKGEQCNIVPDNVDDIVADLAPEEKDEDDTPETCIYSNWSP 450
451 WSACSSSTCDKGKRMRQRMLKAQLDLSVPCPDTQDFQPCMGPGCSDEDGS 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 WSACSSSTCDKGKRMRQRMLKAQLDLSVPCPDTQDFQPCMGPGCSDEDGS 500
501 TCTMSEWITWSPCSISCGMGMRSRERYVKQFPEDGSVCTLPTEETEKCTV 550
    |||||||||||||||||||||||||||||||||||||||||||| |||||
501 TCTMSEWITWSPCSISCGMGMRSRERYVKQFPEDGSVCTLPTEEMEKCTV 550
551 NEECSPSSCLMTEWGEWDECSATCGMGMKKRHRMIKMNPADGSMCKAETS 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 NEECSPSSCLMTEWGEWDECSATCGMGMKKRHRMIKMNPADGSMCKAETS 600
601 QAEKCMMPECHTIPCLLSPWSEWSDCSVTCGKGMRTRQRMLKSLAELGDC 650
    ||||||||||||||||||||||||||||||||||||||||||||||||||
601 QAEKCMMPECHTIPCLLSPWSEWSDCSVTCGKGMRTRQRMLKSLAELGDC 650
651 NEDLEQVEKCMLPEC 665
    |||||||||||||||
651 NEDLEQVEKCMLPEC 665
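The percent identity reported for the alignment above can be reproduced from a count of matching positions, since the alignment has no gaps. The following Python sketch is a minimal illustration, not the scoring program used to generate the report; `percent_identity` is a hypothetical helper. The two example rows cover residues 501 - 550 and are written with the W residues restored where the extracted text shows blanks, which is an assumption based on the other alignments of the same region.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions with identical residues in a gapless alignment."""
    if len(a) != len(b):
        raise ValueError("gapless alignment requires equal-length sequences")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Residues 501-550 of the variant (top row) and of Q9HCB6 (bottom row).
VARIANT_ROW = "TCTMSEWITWSPCSISCGMGMRSRERYVKQFPEDGSVCTLPTEETEKCTV"
KNOWN_ROW = "TCTMSEWITWSPCSISCGMGMRSRERYVKQFPEDGSVCTLPTEEMEKCTV"

# Identity within this 50-residue row: one mismatch (T versus M).
print(percent_identity(VARIANT_ROW, KNOWN_ROW))  # 98.0

# Identity over the full alignment: 664 matches in 665 aligned positions.
print(round(100.0 * 664 / 665, 2))  # 99.85
```

A single mismatch over the 665 aligned residues accounts for the reported value of 99.85.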
Sequence name: O94862
Sequence documentation:
Alignment of: M78530_PEA_1_P15 x O94862
Alignment segment 1/1:
Quality: 5926.00
Escore: 0
Matching length: 582
Total length: 582
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 84 AAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVT 133
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 AAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVT 50
134 ESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCE 183
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 ESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCE 100
184 QDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSA 233
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 QDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSA 150
234 IIGGSHSKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQSDEVLTVI 283
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 IIGGSHSKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQSDEVLTVI 200
284 KAKAQWPAWQPLNVRAAPSAEFSVDRTRHLMSFLTMMGPSPDWNVGLSAE 333
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 KAKAQWPAWQPLNVRAAPSAEFSVDRTRHLMSFLTMMGPSPDWNVGLSAE 250
334 DLCTKECGWVQKVVQDLIPWDAGTDSGVTYESPNKPTIPQEKIRPLTSLD 383
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 DLCTKECGWVQKVVQDLIPWDAGTDSGVTYESPNKPTIPQEKIRPLTSLD 300
384 HPQSPFYDPEGGSITQVARVVIERIARKGEQCNIVPDNVDDIVADLAPEE 433
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 HPQSPFYDPEGGSITQVARVVIERIARKGEQCNIVPDNVDDIVADLAPEE 350
434 KDEDDTPETCIYSNWSPWSACSSSTCDKGKRMRQRMLKAQLDLSVPCPDT 483
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 KDEDDTPETCIYSNWSPWSACSSSTCDKGKRMRQRMLKAQLDLSVPCPDT 400
484 QDFQPCMGPGCSDEDGSTCTMSEWITWSPCSISCGMGMRSRERYVKQFPE 533
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 QDFQPCMGPGCSDEDGSTCTMSEWITWSPCSISCGMGMRSRERYVKQFPE 450
534 DGSVCTLPTEETEKCTVNEECSPSSCLMTEWGEWDECSATCGMGMKKRHR 583
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 DGSVCTLPTEETEKCTVNEECSPSSCLMTEWGEWDECSATCGMGMKKRHR 500
584 MIKMNPADGSMCKAETSQAEKCMMPECHTIPCLLSPWSEWSDCSVTCGKG 633
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 MIKMNPADGSMCKAETSQAEKCMMPECHTIPCLLSPWSEWSDCSVTCGKG 550
634 MRTRQRMLKSLAELGDCNEDLEQVEKCMLPEC 665
    ||||||||||||||||||||||||||||||||
551 MRTRQRMLKSLAELGDCNEDLEQVEKCMLPEC 582
Sequence name: Q8NCD7
Sequence documentation:
Alignment of: M78530_PEA_1_P16 x Q8NCD7
Alignment segment 1/1:
Quality: 2926.00
Escore: 0
Matching length: 297
Total length: 297
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRA 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRA 50
 51 QGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRE 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 QGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRE 100
101 NREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQVFWIAP 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 NREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQVFWIAP 150
151 PAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCC 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCC 200
201 ACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGG 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 ACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGG 250
251 YASEGVKQVAELGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLNV 297
    |||||||||||||||||||||||||||||||||||||||||||||||
251 YASEGVKQVAELGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLNV 297
Sequence name: Q9HCB6
Sequence documentation:
Alignment of: M78530_PEA_1_P16 x Q9HCB6
Alignment segment 1/1:
Quality: 2926.00
Escore: 0
Matching length: 297
Total length: 297
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRA 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRA 50
 51 QGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRE 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 QGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRE 100
101 NREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQVFWIAP 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 NREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQVFWIAP 150
151 PAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCC 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCC 200
201 ACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGG 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 ACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGG 250
251 YASEGVKQVAELGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLNV 297
    |||||||||||||||||||||||||||||||||||||||||||||||
251 YASEGVKQVAELGSPVKMEEEIRQQSDEVLTVIKAKAQWPAWQPLNV 297
Sequence name: O94862
Sequence documentation:
Alignment of: M78530_PEA_1_P16 x O94862
Alignment segment 1/1:
Quality: 2135.00
Escore: 0
Matching length: 214
Total length: 214
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 84 AAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVT 133
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 AAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVT 50
134 ESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCE 183
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 ESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCE 100
184 QDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSA 233
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 QDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSA 150
234 IIGGSHSKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQSDEVLTVI 283
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 IIGGSHSKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQSDEVLTVI 200
284 KAKAQWPAWQPLNV 297
    ||||||||||||||
201 KAKAQWPAWQPLNV 214
Sequence name: Q8NCD7
Sequence documentation:
Alignment of: M78530_PEA_1_P17 x Q8NCD7
Alignment segment 1/1:
Quality: 2705.00
Escore: 0
Matching length: 275
Total length: 275
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRA 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRA 50
 51 QGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRE 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 QGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRE 100
101 NREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQVFWIAP 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 NREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQVFWIAP 150
151 PAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCC 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCC 200
201 ACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGG 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 ACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGG 250
251 YASEGVKQVAELGSPVKMEEEIRQQ 275
    |||||||||||||||||||||||||
251 YASEGVKQVAELGSPVKMEEEIRQQ 275
Sequence name: Q9HCB6
Sequence documentation:
Alignment of: M78530_PEA_1_P17 x Q9HCB6
Alignment segment 1/1:
Quality: 2705.00
Escore: 0
Matching length: 275
Total length: 275
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRA 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRLSPAPLKLSRTPALLALALPLAAALAFSDETLDKVPKSEGYCSRILRA 50
 51 QGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRE 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 QGTRREGYTEFSLRVEGDPDFYKPGTSYRVTLSAAPPSYFRGFTLIALRE 100
101 NREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQVFWIAP 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 NREGDKEEDHAGTFQIIDEEETQFMSNCPVAVTESTPRRRTRIQVFWIAP 150
151 PAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCC 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PAGTGCVILKASIVQKRIIYFQDEGSLTKKLCEQDSTFDGVTDKPILDCC 200
201 ACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGG 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 ACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSAIIGGSHSKNYVLWEYGG 250
251 YASEGVKQVAELGSPVKMEEEIRQQ 275
    |||||||||||||||||||||||||
251 YASEGVKQVAELGSPVKMEEEIRQQ 275
Sequence name: O94862
Sequence documentation:
Alignment of: M78530_PEA_1_P17 x O94862
Alignment segment 1/1:
Quality: 1914.00
Escore: 0
Matching length: 192
Total length: 192
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 84 AAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVT 133
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 AAPPSYFRGFTLIALRENREGDKEEDHAGTFQIIDEEETQFMSNCPVAVT 50
134 ESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCE 183
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 ESTPRRRTRIQVFWIAPPAGTGCVILKASIVQKRIIYFQDEGSLTKKLCE 100
184 QDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSA 233
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 QDSTFDGVTDKPILDCCACGTAKYRLTFYGNWSEKTHPKDYPRRANHWSA 150
234 IIGGSHSKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQ 275
    ||||||||||||||||||||||||||||||||||||||||||
151 IIGGSHSKNYVLWEYGGYASEGVKQVAELGSPVKMEEEIRQQ 192
DESCRIPTION FOR CLUSTER T48119
Cluster T48119 features 1 transcript(s) and 19 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
T48119_T2 429
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Programmed cell death protein 8, mitochondrial precursor (SwissProt accession identifier PCD8_HUMAN; also known by the synonym Apoptosis-inducing factor), SEQ ID NO: 449, referred to herein as the previously known protein. Protein Programmed cell death protein 8, mitochondrial precursor is known or believed to have the following function(s): Probable oxidoreductase that acts as a caspase-independent mitochondrial effector of apoptotic cell death. Extramitochondrial AIF induces nuclear chromatin condensation and large-scale DNA fragmentation (in vitro). The sequence for protein Programmed cell death protein 8, mitochondrial precursor is given at the end of the application, as "Programmed cell death protein 8, mitochondrial precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Protein Programmed cell death protein 8, mitochondrial precursor localization is believed to be mitochondrial intermembrane space. Translocated to the nucleus upon induction of apoptosis.
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: electron transport; DNA fragmentation; apoptosis; induction of apoptosis by DNA damage, which are annotation(s) related to Biological Process; electron carrier; disulfide oxidoreductase, which are annotation(s) related to Molecular Function; and nucleus; mitochondrion, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster T48119 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 41 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
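The "parts per million" figure described above is a ratio scaled by one million. The Python sketch below shows only that scaling, using a hypothetical function name and made-up example counts; it does not reproduce the weighting of ESTs, which is defined in the previously described methods and is not restated here.

```python
def parts_per_million(cluster_ests: int, category_ests: int) -> float:
    """Unweighted ratio of cluster ESTs to all ESTs in a category, in ppm.

    Assumption: plain counts are used; the weighting scheme of the
    application is not modelled.
    """
    if category_ests <= 0:
        raise ValueError("category must contain at least one EST")
    if cluster_ests < 0 or cluster_ests > category_ests:
        raise ValueError("cluster count must lie between 0 and the category total")
    # Multiply before dividing so that whole-number results stay exact.
    return cluster_ests * 1_000_000 / category_ests


# Illustrative counts only, not data from the application.
print(parts_per_million(5, 100_000))  # 50.0
```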
Overall, the following results were obtained as shown with regard to the histograms in Figure 41 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors and a mixture of malignant tumors from different tissues.
Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
As noted above, cluster T48119 features 1 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Programmed cell death protein 8, mitochondrial precursor. A description of each variant protein according to the present invention is now provided.
Variant protein T48119_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T48119_T2. An alignment is given to the known protein (Programmed cell death protein 8, mitochondrial precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T48119_P2 and PCD8_HUMAN: 1. An isolated chimeric polypeptide encoding for T48119_P2, comprising a first amino acid sequence being at least 90% homologous to MTRQMASSGASGGKIDNSVLVLIVGLSTVGAGAYAYKTMKEDEKRYNERISGLGLTPEQKQKKAALSASEGEEVPQDKAPSHVPFLLIGGGTAAFAAARSIRARDPGARVLIVSEDPELPYMRPPLSKELWFSDDPNVTKTLRFKQWNGKERSIYFQPPSFYVSAQDLPHIENGGVAVLTGKKVVQLDVRDNMVKLNDGSQITYEKCLIATGGTPRSLSAIDRAGAEVKSRTTLFRKIGDFRSLEKTSREVKSITIIGGGFLGSELACALGRKARALGTEVIQLFPEKGNMGKILPEYLSNWTMEKVRREGVKVMPNAIVQSVGVSSGKLLIKLKDGRKVETDHIVAAVGLEPNVELAKTGGLEIDSDFGGFRVNAELQARSNIWVAGDAACFYDIKLGRRRVEHHDHAVVSGRLAGENMTGAAKPYWHQSMFWSDLGPDVGYEAIGLVDSSLPTVGVFAKATAQDNPKSATEQSGTGIRSESETESEASEITIPPSTPAVPQAPVQGEDYGKGVIFYLRDKVVVGIVLWNIFNRMPIARKIIKDGEQHEDLNEVAKLFNIHED corresponding to amino acids 50 - 613 of PCD8_HUMAN, which also corresponds to amino acids 1 - 564 of T48119_P2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM:Signal peptide,NN:NO) predicts that this protein has a signal peptide. Variant protein T48119_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T48119_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Amino acid mutations
Variant protein T48119_P2 is encoded by the following transcript(s): T48119_T2, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T48119_T2 is shown in bold; this coding portion starts at position 227 and ends at position 1918. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T48119_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Nucleic acid SNPs
above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster T48119_node_0 according to the present invention is supported by 81 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T48119_T2. Table 9 below describes the starting and ending position of this segment on each transcript. Table 9 - Segment location on transcripts
Segment cluster T48119_node_11 according to the present invention is supported by 77 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T48119_T2. Table 10 below describes the starting and ending position of this segment on each transcript. Table 10 - Segment location on transcripts
Segment cluster T48119_node_13 according to the present invention is supported by 74 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T48119_T2. Table 11 below describes the starting and ending position of this segment on each transcript. Table 11 - Segment location on transcripts
Segment cluster T48119_node_38 according to the present invention is supported by 119 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T48119_T2. Table 12 below describes the starting and ending position of this segment on each transcript.
Segment cluster T48119_node_41 according to the present invention is supported by 128 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T48119_T2. Table 13 below describes the starting and ending position of this segment on each transcript. Table 13 - Segment location on transcripts
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in regard to ovarian cancer), shown in Table 14. Table 14 - Oligonucleotides related to this segment
Segment cluster T48119_node_45 according to the present invention is supported by 138 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T48119_T2. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Segment cluster T48119_node_47 according to the present invention is supported by 129 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T48119_T2. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster T48119_node_4 according to the present invention is supported by 81 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T48119_T2. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster T48119_node_8 according to the present invention is supported by 79 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T48119_T2. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Segment cluster T48119_node_15 according to the present invention is supported by 64 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T48119_T2. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster T48119_node_17 according to the present invention is supported by 59 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T48119_T2. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster T48119_node_20 according to the present invention is supported by 64 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T48119_T2. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Segment cluster T48119_node_22 according to the present invention is supported by 73 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T48119_T2. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Segment cluster T48119_node_26 according to the present invention is supported by 86 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T48119_T2. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Segment cluster T48119_node_28 according to the present invention is supported by 83 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T48119_T2. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Segment cluster T48119_node_31 according to the present invention is supported by 83 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T48119_T2. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster T48119_node_32 according to the present invention can be found in the following transcript(s): T48119_T2. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster T48119_node_33 according to the present invention is supported by 89 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T48119_T2. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster T48119_node_44 according to the present invention is supported by 140 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T48119_T2. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Variant protein alignment to the previously known protein: Sequence name: PCD8_HUMAN
Sequence documentation:
Alignment of: T48119_P2 x PCD8_HUMAN
Alignment segment 1/1:
Quality: 5416.00
Escore: 0
Matching length: 564
Total length: 564
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MTRQMASSGASGGKIDNSVLVLIVGLSTVGAGAYAYKTMKEDEKRYNERI 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 50 MTRQMASSGASGGKIDNSVLVLIVGLSTVGAGAYAYKTMKEDEKRYNERI 99

 51 SGLGLTPEQKQKKAALSASEGEEVPQDKAPSHVPFLLIGGGTAAFAAARS 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
100 SGLGLTPEQKQKKAALSASEGEEVPQDKAPSHVPFLLIGGGTAAFAAARS 149

101 IRARDPGARVLIVSEDPELPYMRPPLSKELWFSDDPNVTKTLRFKQWNGK 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
150 IRARDPGARVLIVSEDPELPYMRPPLSKELWFSDDPNVTKTLRFKQWNGK 199

151 ERSIYFQPPSFYVSAQDLPHIENGGVAVLTGKKVVQLDVRDNMVKLNDGS 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
200 ERSIYFQPPSFYVSAQDLPHIENGGVAVLTGKKVVQLDVRDNMVKLNDGS 249

201 QITYEKCLIATGGTPRSLSAIDRAGAEVKSRTTLFRKIGDFRSLEKISRE 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
250 QITYEKCLIATGGTPRSLSAIDRAGAEVKSRTTLFRKIGDFRSLEKISRE 299

251 VKSITIIGGGFLGSELACALGRKARALGTEVIQLFPEKGNMGKILPEYLS 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
300 VKSITIIGGGFLGSELACALGRKARALGTEVIQLFPEKGNMGKILPEYLS 349

301 NWTMEKVRREGVKVMPNAIVQSVGVSSGKLLIKLKDGRKVETDHIVAAVG 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
350 NWTMEKVRREGVKVMPNAIVQSVGVSSGKLLIKLKDGRKVETDHIVAAVG 399

351 LEPNVELAKTGGLEIDSDFGGFRVNAELQARSNIWVAGDAACFYDIKLGR 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
400 LEPNVELAKTGGLEIDSDFGGFRVNAELQARSNIWVAGDAACFYDIKLGR 449

401 RRVEHHDHAVVSGRLAGENMTGAAKPYWHQSMFWSDLGPDVGYEAIGLVD 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
450 RRVEHHDHAVVSGRLAGENMTGAAKPYWHQSMFWSDLGPDVGYEAIGLVD 499

451 SSLPTVGVFAKATAQDNPKSATEQSGTGIRSESETESEASEITIPPSTPA 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
500 SSLPTVGVFAKATAQDNPKSATEQSGTGIRSESETESEASEITIPPSTPA 549

501 VPQAPVQGEDYGKGVIFYLRDKVVVGIVLWNIFNRMPIARKIIKDGEQHE 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
550 VPQAPVQGEDYGKGVIFYLRDKVVVGIVLWNIFNRMPIARKIIKDGEQHE 599

551 DLNEVAKLFNIHED 564
    ||||||||||||||
600 DLNEVAKLFNIHED 613
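The percent-identity figures reported for the alignment can be illustrated with a simple position-by-position comparison. The Python sketch below is an assumption about how such a figure may be computed (identical residues divided by alignment length); the application does not specify the formula used by the alignment program, and the function name is hypothetical. The example uses only the final alignment block (residues 551 - 564 of T48119_P2 against residues 600 - 613 of PCD8_HUMAN).

```python
def percent_identity(query: str, subject: str) -> float:
    """Percent of aligned columns holding the same residue in both rows.

    Both rows must be the same length; '-' marks a gap and never counts
    as a match. This formula is an assumed, simplified definition.
    """
    if len(query) != len(subject) or not query:
        raise ValueError("aligned rows must be non-empty and of equal length")
    matches = sum(q == s and q != "-" for q, s in zip(query, subject))
    return 100.0 * matches / len(query)


# Final block of the alignment shown above.
query_block = "DLNEVAKLFNIHED"    # T48119_P2, residues 551-564
subject_block = "DLNEVAKLFNIHED"  # PCD8_HUMAN, residues 600-613
print(percent_identity(query_block, subject_block))  # 100.0
```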
DESCRIPTION FOR CLUSTER HSMUC1A
Cluster HSMUC1A features 14 transcript(s) and 22 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Mucin 1 precursor (SwissProt accession identifier MUC1_HUMAN; known also according to the synonyms MUC-1; Polymorphic epithelial mucin; PEM; PEMT; Episialin; Tumor-associated mucin; Carcinoma-associated mucin; Tumor-associated epithelial membrane antigen; EMA; H23AG; Peanut-reactive urinary mucin; PUM; Breast carcinoma-associated antigen DF3; CD227 antigen), SEQ ID NO: 487, referred to herein as the previously known protein. Protein Mucin 1 precursor is known or believed to have the following function(s): May play a role in adhesive functions and in cell-cell interactions, metastasis and signaling. May provide a protective layer on epithelial surfaces. Direct or indirect interaction with actin cytoskeleton. Isoform 7 behaves as a receptor and binds the secreted isoform 5. The binding induces the phosphorylation of the isoform 7, alters cellular morphology and initiates cell signaling. Can bind to GRB2 adapter protein. The sequence for protein Mucin 1 precursor is given at the end of the application, as "Mucin 1 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Protein Mucin 1 precursor localization is believed to be Type I membrane protein. Two secreted forms (5 and 9) are also produced. The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Cancer, breast; Cancer, lung, non-small cell; Cancer, ovarian; Cancer, prostate; Cancer. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: CD8 agonist; DNA antagonist; Immunostimulant; Interferon gamma agonist; MUC-1 inhibitor. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anticancer; Monoclonal antibody, murine; Immunotoxin; Immunostimulant; Immunoconjugate. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: actin binding, which are annotation(s) related to Molecular Function; and cytoskeleton; integral plasma membrane protein, which are annotation(s) related to Cellular Component.
The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster HSMUC1A can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 42 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
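The "parts per million" measure described above is a ratio of the ESTs belonging to the cluster to all ESTs in a tissue category, scaled by one million. The Python sketch below shows only that scaling; the counts are hypothetical, the function name is not from the application, and the weighting of ESTs mentioned in the text is not modeled because its details are not given here.

```python
def ests_per_million(cluster_ests: int, total_ests: int) -> float:
    """Scale a cluster's EST count to parts per million of a category."""
    if total_ests <= 0:
        raise ValueError("total_ests must be positive")
    if cluster_ests < 0 or cluster_ests > total_ests:
        raise ValueError("cluster_ests must be between 0 and total_ests")
    # Multiply before dividing to limit floating-point rounding.
    return cluster_ests * 1_000_000 / total_ests


# Hypothetical counts, for illustration only.
print(ests_per_million(41, 1_000_000))  # 41.0
```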
Overall, the following results were obtained as shown with regard to the histograms in Figure 42 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: a mixture of malignant tumors from different tissues, breast malignant tumors, pancreas carcinoma and prostate cancer.
Table 5 - Normal tissue distribution
bladder 41
brain
colon 66
epithelial 96
general 36
head and neck 314
kidney 282
lung 200
breast 61
ovary
pancreas 12
prostate 34
Table 6 - P values and ratios for expression in cancerous tissue
the cluster, although not of at least one transcript/segment as listed below. Microarray (chip) data is also available for this cluster as follows. Various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer, as previously described. The following oligonucleotides were found to hit this cluster but not other segments/transcripts below (in regard to ovarian cancer), shown in Table 7. Table 7 - Oligonucleotides related to this cluster
HSMUC1A_0_0_11364 ovarian carcinoma OVA
As noted above, cluster HSMUC1A features 14 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Mucin 1 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSMUC1A_PEA_1_P25 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T26. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide. Variant protein HSMUC1A_PEA_1_P25 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P25 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Amino acid mutations
Variant protein HSMUC1A_PEA_1_P25 is encoded by the following transcript(s): HSMUC1A_PEA_1_T26, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T26 is shown in bold; this coding portion starts at position 507 and ends at position 1115. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P25 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P29 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T33. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSMUC1A_PEA_1_P29 is encoded by the following transcript(s): HSMUC1A_PEA_1_T33, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T33 is shown in bold; this coding portion starts at position 507 and ends at position 953. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P29 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P30 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T34. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide. Variant protein HSMUC1A_PEA_1_P30 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P30 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Amino acid mutations
Variant protein HSMUC1A_PEA_1_P30 is encoded by the following transcript(s): HSMUC1A_PEA_1_T34, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T34 is shown in bold; this coding portion starts at position 507 and ends at position 1004. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P30 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P32 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T36. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide. Variant protein HSMUC1A_PEA_1_P32 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P32 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Amino acid mutations
Variant protein HSMUC1A_PEA_1_P32 is encoded by the following transcript(s): HSMUC1A_PEA_1_T36, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T36 is shown in bold; this coding portion starts at position 507 and ends at position 977. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P32 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P36 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T40. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSMUC1A_PEA_1_P36 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 15 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P36 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 15 - Amino acid mutations
Variant protein HSMUC1A_PEA_1_P36 is encoded by the following transcript(s): HSMUC1A_PEA_1_T40, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T40 is shown in bold; this coding portion starts at position 507 and ends at position 983. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P36 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 16 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P39 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T43. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSMUC1A_PEA_1_P39 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P39 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 17 - Amino acid mutations
Variant protein HSMUC1A_PEA_1_P39 is encoded by the following transcript(s): HSMUC1A_PEA_1_T43, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T43 is shown in bold; this coding portion starts at position 507 and ends at position 914. The transcript also has the following SNPs as listed in Table 18 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P39 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 18 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P45 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T29. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSMUC1A_PEA_1_P45 is encoded by the following transcript(s): HSMUC1A_PEA_1_T29, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T29 is shown in bold; this coding portion starts at position 507 and ends at position 746. The transcript also has the following SNPs as listed in Table 19 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P45 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 19 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P49 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T12. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSMUC1A_PEA_1_P49 is encoded by the following transcript(s): HSMUC1A_PEA_1_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T12 is shown in bold; this coding portion starts at position 507 and ends at position 884. The transcript also has the following SNPs as listed in Table 20 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P49 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 20 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P52 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T30. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSMUC1A_PEA_1_P52 is encoded by the following transcript(s): HSMUC1A_PEA_1_T30, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T30 is shown in bold; this coding portion starts at position 507 and ends at position 719. The transcript also has the following SNPs as listed in Table 21 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P52 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 21 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P53 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T31. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSMUC1A_PEA_1_P53 is encoded by the following transcript(s): HSMUC1A_PEA_1_T31, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T31 is shown in bold; this coding portion starts at position 507 and ends at position 665. The transcript also has the following SNPs as listed in Table 22 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P53 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 22 - Nucleic acid SNPs
1 4 55
Variant protein HSMUC1A_PEA_1_P56 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T42. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSMUC1A_PEA_1_P56 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 23 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P56 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 23 - Amino acid mutations
Variant protein HSMUC1A_PEA_1_P56 is encoded by the following transcript(s): HSMUC1A_PEA_1_T42, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T42 is shown in bold; this coding portion starts at position 507 and ends at position 890. The transcript also has the following SNPs as listed in Table 24 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P56 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 24 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P58 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T35. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSMUC1A_PEA_1_P58 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 25 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P58 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 25 - Amino acid mutations
Variant protein HSMUC1A_PEA_1_P58 is encoded by the following transcript(s): HSMUC1A_PEA_1_T35, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T35 is shown in bold; this coding portion starts at position 507 and ends at position 980. The transcript also has the following SNPs as listed in Table 26 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P58 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 26 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P59 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T28. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSMUC1A_PEA_1_P59 is encoded by the following transcript(s): HSMUC1A_PEA_1_T28, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T28 is shown in bold; this coding portion starts at position 507 and ends at position 794. The transcript also has the following SNPs as listed in Table 27 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P59 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 27 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P63 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T47. An alignment is given to the known protein (Mucin 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSMUC1A_PEA_1_P63 and MUC1_HUMAN:
1. An isolated chimeric polypeptide encoding for HSMUC1A_PEA_1_P63, comprising a first amino acid sequence being at least 90% homologous to MTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSV corresponding to amino acids 1 - 45 of MUC1_HUMAN, which also corresponds to amino acids 1 - 45 of HSMUC1A_PEA_1_P63, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EEEVSADQVSVGASGVLGSFKEARNAPSFLSWSFSMGPSK corresponding to amino acids 46 - 85 of HSMUC1A_PEA_1_P63, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSMUC1A_PEA_1_P63, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EEEVSADQVSVGASGVLGSFKEARNAPSFLSWSFSMGPSK in HSMUC1A_PEA_1_P63.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
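The localization rule stated above (secreted when both signal-peptide predictors report a signal peptide and neither trans-membrane predictor reports a trans-membrane region) can be sketched as a small decision function. This is only an illustration: the function name and the boolean inputs are hypothetical, and because this section does not say how other combinations of predictions are resolved, every other case is returned as "undetermined".

```python
def predict_localization(signal_peptide_calls, transmembrane_calls):
    """Apply the consensus rule described in the text (illustrative sketch).

    signal_peptide_calls: list of bool, one entry per signal-peptide predictor.
    transmembrane_calls: list of bool, one entry per trans-membrane predictor.
    """
    # Without any predictor output there is nothing to decide on.
    if not signal_peptide_calls or not transmembrane_calls:
        return "undetermined"
    # Secreted: every signal-peptide program agrees, and no
    # trans-membrane program predicts a trans-membrane region.
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    # Assumption: the text gives no rule for the remaining combinations.
    return "undetermined"
```

For example, `predict_localization([True, True], [False, False])` returns `"secreted"`.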
The glycosylation sites of variant protein HSMUC1A_PEA_1_P63, as compared to the known protein Mucin 1 precursor, are described in Table 28 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 28 - Glycosylation site(s)
Variant protein HSMUC1A_PEA_1_P63 is encoded by the following transcript(s): HSMUC1A_PEA_1_T47, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T47 is shown in bold; this coding portion starts at position 507 and ends at position 761. The transcript also has the following SNPs as listed in Table 29 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P63 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 29 - Nucleic acid SNPs
above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSMUC1A_PEA_1_node_0 according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42, HSMUC1A_PEA_1_T43 and HSMUC1A_PEA_1_T47. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_14 according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_24 according to the present invention is supported by 135 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSMUC1A_PEA_1_T12 | 953 | 1084
Segment cluster HSMUC1A_PEA_1_node_29 according to the present invention is supported by 156 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42 and HSMUC1A_PEA_1_T43. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_35 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T47. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in regard to ovarian cancer), shown in Table 35.
Table 35 - Oligonucleotides related to this segment
Segment cluster HSMUC1A_PEA_1_node_38 according to the present invention is supported by 140 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42, HSMUC1A_PEA_1_T43 and HSMUC1A_PEA_1_T47. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSMUC1A_PEA_1_node_3 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T40 and HSMUC1A_PEA_1_T43. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_4 according to the present invention can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42, HSMUC1A_PEA_1_T43 and HSMUC1A_PEA_1_T47. Table 38 below describes the starting and ending position of this segment on each transcript.
Table 38 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_5 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42, HSMUC1A_PEA_1_T43 and HSMUC1A_PEA_1_T47. Table 39 below describes the starting and ending position of this segment on each transcript.
Table 39 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_6 according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42, HSMUC1A_PEA_1_T43 and HSMUC1A_PEA_1_T47. Table 40 below describes the starting and ending position of this segment on each transcript.
Table 40 - Segment location on transcripts
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in regard to ovarian cancer), shown in Table 41.
Table 41 - Oligonucleotides related to this segment
Segment cluster HSMUC1A_PEA_1_node_7 according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42 and HSMUC1A_PEA_1_T43. Table 42 below describes the starting and ending position of this segment on each transcript.
Table 42 - Segment location on transcripts
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in regard to ovarian cancer), shown in Table 43.
Table 43 - Oligonucleotides related to this segment
Segment cluster HSMUC1A_PEA_1_node_17 according to the present invention can be found in the following transcript(s): HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T33 and HSMUC1A_PEA_1_T40. Table 44 below describes the starting and ending position of this segment on each transcript.
Table 44 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_18 according to the present invention is supported by 90 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T40 and HSMUC1A_PEA_1_T42. Table 45 below describes the starting and ending position of this segment on each transcript.
Table 45 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_20 according to the present invention can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T35 and HSMUC1A_PEA_1_T42. Table 46 below describes the starting and ending position of this segment on each transcript.
Table 46 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_21 according to the present invention is supported by 97 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T35 and HSMUC1A_PEA_1_T42. Table 47 below describes the starting and ending position of this segment on each transcript.
Table 47 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_23 according to the present invention can be found in the following transcript(s): HSMUC1A_PEA_1_T12. Table 48 below describes the starting and ending position of this segment on each transcript.
Table 48 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_26 according to the present invention is supported by 129 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30 and HSMUC1A_PEA_1_T31. Table 49 below describes the starting and ending position of this segment on each transcript.
Table 49 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_27 according to the present invention is supported by 140 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35 and HSMUC1A_PEA_1_T36. Table 50 below describes the starting and ending position of this segment on each transcript.
Table 50 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_31 according to the present invention can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42 and HSMUC1A_PEA_1_T43. Table 51 below describes the starting and ending position of this segment on each transcript.
Table 51 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_34 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T47. Table 52 below describes the starting and ending position of this segment on each transcript.
Table 52 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSMUC1A_PEA_1_T47 | 639 | 665
Segment cluster HSMUC1A_PEA_1_node_36 according to the present invention is supported by 135 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42, HSMUC1A_PEA_1_T43 and HSMUC1A_PEA_1_T47. Table 53 below describes the starting and ending position of this segment on each transcript.
Table 53 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_37 according to the present invention is supported by 146 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42, HSMUC1A_PEA_1_T43 and HSMUC1A_PEA_1_T47. Table 54 below describes the starting and ending position of this segment on each transcript.
Table 54 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: MUC1_HUMAN
Sequence documentation:
Alignment of: HSMUC1A_PEA_1_P63 x MUC1_HUMAN
Alignment segment 1/1
Quality: 429.00
Escore: 0
Matching length: 59
Total length: 59
Matching Percent Similarity: 86.44
Matching Percent Identity: 81.36
Total Percent Similarity: 86.44
Total Percent Identity: 81.36
Gaps:
Alignment:
         1 MTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSVEEEVS 50
           |||||||||||||||||||||||||||||||||||||||||||||
         1 MTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSVPSSTE 50

        51 ADQVSVGAS 59
            : ||: :|
        51 KNAVSMTSS 59
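The reported identity figure can be checked directly from the two aligned 59-residue sequences. The sketch below assumes the first 45 residues read as in the comparison report for HSMUC1A_PEA_1_P63 (MTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSV) and counts identical positions only. The similarity figure is not reproduced, because it depends on a substitution matrix that this section does not name. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (identical positions, percent identity) for two aligned strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # A position counts as a match only if both residues are equal and not gaps.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return matches, 100.0 * matches / len(aligned_a)


shared = "MTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSV"  # residues 1-45
variant = shared + "EEEVSADQVSVGAS"  # HSMUC1A_PEA_1_P63, residues 1-59
known = shared + "PSSTEKNAVSMTSS"    # MUC1_HUMAN, residues 1-59

matches, identity = percent_identity(variant, known)
print(matches, round(identity, 2))  # 48 81.36
```

The 48 identical positions out of 59 agree with the "Percent Identity: 81.36" value in the alignment header.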
Combined expression of 6 sequences (T10888-junc11-17; R11723-seg13; H61775-seg8-F2R2; Z44808-junc8-11; Z25299-seg20; Z25299-seg23) in normal and cancerous ovary tissues
Expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6; R11723-hypothetical protein PSEC0181 (PSEC); immunoglobulin superfamily, member 9; SMO2_HUMAN SPARC related modular calcium-binding protein 2 precursor; Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor; transcripts detectable by or according to the amplicons: T10888-junc11-17; R11723-seg13; H61775-seg8-F2R2; Z44808-junc8-11; Z25299-seg20; Z25299-seg23 amplicon(s) and the primers: T10888-junc11-17-F and T10888-junc11-17-R; R11723-seg13-F and R11723-seg13-R; H61775-seg8-F2 and H61775-seg8-R2; Z44808-junc8-11-F and Z44808-junc8-11-R; Z25299-seg20-F and Z25299-seg20-R; Z25299-seg23-F and Z25299-seg23-R, was measured by real time PCR. In parallel the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and GAPDH (GenBank Accession No. BC026907; amplicon - GAPDH-amplicon) - was measured similarly. For each RT sample, the expression of the above amplicons was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample of each amplicon was then divided by the median of the quantities of the normal post-mortem (PM) samples detected for the same amplicon (Sample Nos. 45-48, 71, Table 1, "Tissue samples in testing sample", above), to obtain a value of fold up-regulation for each sample relative to median of the normal PM samples. The reciprocal of this ratio was calculated for Z44808-junc8-11, to obtain a value of fold down-regulation for each sample relative to median of the normal PM samples.
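The normalization just described can be sketched as follows. This is an illustration under stated assumptions, not the applicants' analysis code: the function names, the dictionary layout and the sample labels are hypothetical, and quantities are assumed to be positive relative amounts already derived from the real time PCR measurements.

```python
from statistics import geometric_mean, median


def normalize(amplicon_qty, housekeeping_qtys):
    """Divide an amplicon quantity by the geometric mean of the
    housekeeping-gene quantities measured in the same RT sample."""
    return amplicon_qty / geometric_mean(housekeeping_qtys)


def fold_changes(normalized_by_sample, normal_pm_ids, down_regulated=False):
    """Express each sample relative to the median of the normal PM samples.

    normalized_by_sample: dict mapping sample label -> normalized quantity
                          for one amplicon.
    normal_pm_ids: labels of the normal post-mortem samples (the baseline).
    down_regulated: if True, return the reciprocal ratio, as the text
                    describes for Z44808-junc8-11.
    """
    baseline = median(normalized_by_sample[s] for s in normal_pm_ids)
    folds = {}
    for sample, value in normalized_by_sample.items():
        ratio = value / baseline
        folds[sample] = 1.0 / ratio if down_regulated else ratio
    return folds
```

With hypothetical normalized quantities `{"n1": 1.0, "n2": 3.0, "n3": 2.0, "t1": 40.0}` and normal samples `n1` to `n3`, the baseline median is 2.0 and sample `t1` comes out as 20-fold up-regulated.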
Figure 43 is a histogram showing differential expression of the above-indicated transcripts in cancerous ovary samples relative to the normal samples. The number and percentage of samples that exhibit at least 10 fold differential expression of at least one of the sequences, out of the total number of samples tested, is indicated at the bottom. As is evident from Figure 43, differential expression of at least 10 fold in at least one of the sequences was found in 42 out of 43 cancerous samples.
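The summary statistic quoted for Figure 43 (samples showing at least 10 fold differential expression in at least one sequence) amounts to a simple count. The sketch below assumes fold values have already been computed per sample and per amplicon, with down-regulated amplicons expressed as reciprocals so that every value is compared against the same threshold; the function name and data layout are hypothetical.

```python
def count_differential(fold_by_sample, threshold=10.0):
    """Count samples in which at least one amplicon reaches the threshold.

    fold_by_sample: dict mapping sample label -> dict of amplicon -> fold value.
    Returns (number of qualifying samples, total samples, percentage).
    """
    total = len(fold_by_sample)
    if total == 0:
        raise ValueError("no samples supplied")
    hits = sum(
        1
        for folds in fold_by_sample.values()
        if any(value >= threshold for value in folds.values())
    )
    return hits, total, 100.0 * hits / total
```

For the figures reported in the text, 42 qualifying samples out of 43 correspond to roughly 97.7%.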
DESCRIPTION FOR CLUSTER HUMCEA
Cluster HUMCEA features 5 transcript(s) and 42 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor (SwissProt accession identifier CEA5_HUMAN; known also according to the synonyms Carcinoembryonic antigen; CEA; Meconium antigen 100; CD66e antigen), SEQ ID NO: 549, referred to herein as the previously known protein. The sequence for protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor is given at the end of the application, as "Carcinoembryonic antigen-related cell adhesion molecule 5 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor localization is believed to be attached to the membrane by a GPI-anchor.
The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Cancer. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Immunostimulant. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Imaging agent; Anticancer; Immunostimulant; Immunoconjugate; Monoclonal antibody, murine; Antisense therapy; antibody.
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: integral plasma membrane protein; membrane, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster HUMCEA can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 44 below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
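The "parts per million" figure defined above is a ratio scaled by one million. The sketch below illustrates only that ratio, assuming plain EST counts for one tissue category; the weighting applied to the counts is described elsewhere in the application and is not modeled here, and the function name is hypothetical.

```python
def ests_per_million(cluster_ests, total_ests):
    """Expression of a cluster in one category, in parts per million.

    cluster_ests: number of ESTs in the category assigned to the cluster.
    total_ests: number of all ESTs in that category.
    """
    if total_ests <= 0:
        raise ValueError("total_ests must be positive")
    return 1_000_000 * cluster_ests / total_ests
```

For instance, 5 cluster ESTs among 100,000 ESTs in a category give 50 parts per million.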
Overall, the following results were obtained as shown with regard to the histograms in Figure 44 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues and pancreas carcinoma.
Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
As noted above, cluster HUMCEA features 5 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HUMCEA_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCEA_PEA_1_T8. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 5 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMCEA_PEA_1_P4 and CEA5_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMCEA_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDSVILNVL corresponding to amino acids 1 - 234 of CEA5_HUMAN, which also corresponds to amino acids 1 - 234 of HUMCEA_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CEYICSSLAQAASPNPQGQRQDFSVPLRFKYTDPQPWTSRLSVTFCPRKTWADQVLTKNRRGGAASVLGGSGSTPYDGRNR corresponding to amino acids 235 - 315 of HUMCEA_PEA_1_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMCEA_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CEYICSSLAQAASPNPQGQRQDFSVPLRFKYTDPQPWTSRLSVTFCPRKTWADQVLTKNRRGGAASVLGGSGSTPYDGRNR in HUMCEA_PEA_1_P4.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMCEA_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
The glycosylation sites of variant protein HUMCEA_PEA_1_P4, as compared to the known protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor, are described in Table 9 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 9 - Glycosylation site(s)
Variant protein HUMCEA_PEA_1_P4 is encoded by the following transcript(s): HUMCEA_PEA_1_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCEA_PEA_1_T8 is shown in bold; this coding portion starts at position 115 and ends at position 1059. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Variant protein HUMCEA_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCEA_PEA_1_T9. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 5 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMCEA_PEA_1_P5 and CEA5_HUMAN: 1. An isolated chimeric polypeptide encoding for HUMCEA_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDSVILNVLYGPDAPTISPLNTSYRSGENLNLSCHAASNPPAQYSWFVNGTFQQSTQELFIPNITVNNSGSYTCQAHNSDTGLNRTTVTTITVYAEPPKPFITSNNSNPVEDEDAVALTCEPEIQNTTYLWWVNNQSLPVSPRLQLSNDNRTLTLLSVTRNDVGPYECGIQNELSVDHSDPVILNVLYGPDDPTISPSYTYYRPGVNLSLSCHAASNPPAQYSWLIDGNIQQHTQELFISNITEKNSGLYTCQANNSASGHSRTTVKTITVSAELPKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLSNGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISPPDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNNGTYACFVSNLATGRNNSIVKSITVS corresponding to amino acids 1 - 675 of CEA5_HUMAN, which also corresponds to amino acids 1 - 675 of HUMCEA_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GKWLPGASASYSGVESIWFSPKSQEDIFFPSLCSMGTRKSQILS corresponding to amino acids 676 - 719 of HUMCEA_PEA_1_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMCEA_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKWLPGASASYSGVESIWFSPKSQEDIFFPSLCSMGTRKSQILS in HUMCEA_PEA_1_P5.
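The graded homology thresholds recited for the tail can be illustrated with a short calculation. The following is a minimal sketch, assuming that "homologous" is approximated by ungapped percent identity between sequences of equal length; this passage does not define the measure that way, so the approximation, the function names `percent_identity` and `thresholds_met`, and the single-substitution candidate are all hypothetical.

```python
# Hypothetical illustration: ungapped percent identity against the P5 tail.
# Assumption: identity over equal-length sequences stands in for "homologous".

# Tail sequence of HUMCEA_PEA_1_P5 as recited in the text (amino acids 676 - 719).
P5_TAIL = "GKWLPGASASYSGVESIWFSPKSQEDIFFPSLCSMGTRKSQILS"

# Thresholds recited in the text, from broadest to narrowest (percent).
THRESHOLDS = (70, 80, 85, 90, 95)


def percent_identity(a: str, b: str) -> float:
    """Return the percentage of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares sequences of equal length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def thresholds_met(identity: float) -> list[int]:
    """List every recited threshold that the given identity satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


if __name__ == "__main__":
    # Made-up candidate: the tail with its first residue changed from G to A.
    candidate = "A" + P5_TAIL[1:]
    identity = percent_identity(P5_TAIL, candidate)
    print(f"{identity:.1f}% identity; thresholds met: {thresholds_met(identity)}")
```

A gapped alignment tool would be needed to compare sequences of different lengths; the equal-length restriction here only keeps the example small.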
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMCEA_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
The glycosylation sites of variant protein HUMCEA_PEA_1_P5, as compared to the known protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor, are described in Table 12 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 12 - Glycosylation site(s)
Variant protein HUMCEA_PEA_1_P5 is encoded by the following transcript(s): HUMCEA_PEA_1_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCEA_PEA_1_T9 is shown in bold; this coding portion starts at position 115 and ends at position 2271. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Nucleic acid SNPs
Variant protein HUMCEA_PEA_1_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCEA_PEA_1_T20. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMCEA_PEA_1_P14 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 14 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Amino acid mutations
Variant protein HUMCEA_PEA_1_P14 is encoded by the following transcript(s): HUMCEA_PEA_1_T20, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCEA_PEA_1_T20 is shown in bold; this coding portion starts at position 115 and ends at position 1821. The transcript also has the following SNPs as listed in Table 15 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 15 - Nucleic acid SNPs
Variant protein HUMCEA_PEA_1_P19 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCEA_PEA_1_T25. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 5 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMCEA_PEA_1_P19 and CEA5_HUMAN: 1. An isolated chimeric polypeptide encoding for HUMCEA_PEA_1_P19, comprising a first amino acid sequence being at least 90% homologous to MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDSVILN corresponding to amino acids 1 - 232 of CEA5_HUMAN, which also corresponds to amino acids 1 - 232 of HUMCEA_PEA_1_P19, and a second amino acid sequence being at least 90% homologous to VLYGPDTPIISPPDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNNGTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVALI corresponding to amino acids 589 - 702 of CEA5_HUMAN, which also corresponds to amino acids 233 - 346 of HUMCEA_PEA_1_P19, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HUMCEA_PEA_1_P19, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise NV, having a structure as follows: a sequence starting from any of amino acid numbers 232-x to 232; and ending at any of amino acid numbers 233+ ((n-2) - x), in which x varies from 0 to n-2.
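The start and end rule for the edge portion can be enumerated directly. The sketch below is an illustration only, using the numbering given above (junction between amino acids 232 and 233, positions 1-based and inclusive); the function name `edge_windows` is hypothetical, and n is treated as an exact window length for the purpose of the example.

```python
def edge_windows(n: int, left: int = 232, right: int = 233) -> list[tuple[int, int]]:
    """Enumerate (start, end) positions of length-n windows bridging the junction.

    For each x from 0 to n-2, the window starts at left - x and ends at
    right + ((n - 2) - x), so every window contains both junction residues.
    """
    if n < 2:
        raise ValueError("a window must hold at least the two junction residues")
    windows = []
    for x in range(0, n - 1):  # x = 0, 1, ..., n-2
        start = left - x
        end = right + ((n - 2) - x)
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    # Example with n = 10, the smallest length recited in the text.
    for start, end in edge_windows(10):
        print(f"amino acids {start} - {end}")
```

For a given n the rule yields n-1 windows, each of length n, sliding from the one that begins at the junction residue 232 to the one that ends at residue 233.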
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because of manual inspection of known protein localization and/or gene structure. Variant protein HUMCEA_PEA_1_P19 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 16 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P19 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 16 - Amino acid mutations
The glycosylation sites of variant protein HUMCEA_PEA_1_P19, as compared to the known protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor, are described in Table 17 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 17 - Glycosylation site(s)
Variant protein HUMCEA_PEA_1_P19 is encoded by the following transcript(s): HUMCEA_PEA_1_T25, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCEA_PEA_1_T25 is shown in bold; this coding portion starts at position 115 and ends at position 1152. The transcript also has the following SNPs as listed in Table 18 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P19 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 18 - Nucleic acid SNPs
Variant protein HUMCEA_PEA_1_P20 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCEA_PEA_1_T26. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 5 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMCEA_PEA_1_P20 and CEA5_HUMAN: 1. An isolated chimeric polypeptide encoding for HUMCEA_PEA_1_P20, comprising a first amino acid sequence being at least 90% homologous to MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYP corresponding to amino acids 1 - 142 of CEA5_HUMAN, which also corresponds to amino acids 1 - 142 of HUMCEA_PEA_1_P20, and a second amino acid sequence being at least 90% homologous to ELPKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLSNGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISPPDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNNGTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVALI corresponding to amino acids 499 - 702 of CEA5_HUMAN, which also corresponds to amino acids 143 - 346 of HUMCEA_PEA_1_P20, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HUMCEA_PEA_1_P20, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PE, having a structure as follows: a sequence starting from any of amino acid numbers 142-x to 142; and ending at any of amino acid numbers 143+ ((n-2) - x), in which x varies from 0 to n-2.
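The two-part composition of HUMCEA_PEA_1_P20 implies a simple coordinate mapping back to CEA5_HUMAN. The following is a minimal sketch that uses only the ranges stated above (variant amino acids 1 - 142 map to CEA5_HUMAN 1 - 142, and variant amino acids 143 - 346 map to CEA5_HUMAN 499 - 702); the names `P20_BLOCKS` and `p20_to_cea5` are hypothetical and introduced only for illustration.

```python
# Each block: (variant_start, variant_end, parent_start), 1-based and inclusive.
# Ranges are taken from the comparison report; the names are illustrative only.
P20_BLOCKS = (
    (1, 142, 1),      # first amino acid sequence
    (143, 346, 499),  # second amino acid sequence
)


def p20_to_cea5(pos: int) -> int:
    """Map a 1-based position on the variant to the matching CEA5_HUMAN position."""
    for v_start, v_end, p_start in P20_BLOCKS:
        if v_start <= pos <= v_end:
            return p_start + (pos - v_start)
    raise ValueError(f"position {pos} is outside the variant (1 - 346)")


if __name__ == "__main__":
    # The junction residues 142 and 143 map to non-adjacent parent positions.
    for pos in (1, 142, 143, 346):
        print(f"variant {pos} -> CEA5_HUMAN {p20_to_cea5(pos)}")
```

Under these ranges, the junction between variant residues 142 and 143 joins CEA5_HUMAN residues 142 and 499, so residues 143 - 498 of the known protein (356 residues) are absent from the variant.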
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because of manual inspection of known protein localization and/or gene structure. Variant protein HUMCEA_PEA_1_P20 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 19 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P20 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 19 - Amino acid mutations
The glycosylation sites of variant protein HUMCEA_PEA_1_P20, as compared to the known protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor, are described in Table 20 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 20 - Glycosylation site(s)
Variant protein HUMCEA_PEA_1_P20 is encoded by the following transcript(s): HUMCEA_PEA_1_T26, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCEA_PEA_1_T26 is shown in bold; this coding portion starts at position 115 and ends at position 1152. The transcript also has the following SNPs as listed in Table 21 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P20 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 21 - Nucleic acid SNPs
As noted above, cluster HUMCEA features 42 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMCEA_PEA_1_node_0 according to the present invention is supported by 56 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T20, HUMCEA_PEA_1_T25 and HUMCEA_PEA_1_T26. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_2 according to the present invention is supported by 83 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T20, HUMCEA_PEA_1_T25 and HUMCEA_PEA_1_T26. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_11 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_12 according to the present invention is supported by 83 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9 and HUMCEA_PEA_1_T20. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_31 according to the present invention is supported by 87 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9 and HUMCEA_PEA_1_T20. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_36 according to the present invention is supported by 94 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9 and HUMCEA_PEA_1_T26. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_44 according to the present invention is supported by 112 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T25 and HUMCEA_PEA_1_T26. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_46 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T9. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_63 according to the present invention is supported by 68 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T25 and HUMCEA_PEA_1_T26. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_65 according to the present invention is supported by 54 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T25 and HUMCEA_PEA_1_T26. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_67 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T20. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMCEA_PEA_1_node_3 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T20, HUMCEA_PEA_1_T25 and HUMCEA_PEA_1_T26. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_7 according to the present invention is supported by 73 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T20 and HUMCEA_PEA_1_T25. Table 35 below describes the starting and ending position of this segment on each transcript.
Table 35 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_8 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T20 and HUMCEA_PEA_1_T25. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_9 according to the present invention is supported by 71 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T20 and HUMCEA_PEA_1_T25. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_10 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T20 and HUMCEA_PEA_1_T25. Table 38 below describes the starting and ending position of this segment on each transcript.
Table 38 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_15 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9 and HUMCEA_PEA_1_T20. Table 39 below describes the starting and ending position of this segment on each transcript.
Table 39 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_16 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9 and HUMCEA_PEA_1_T20. Table 40 below describes the starting and ending position of this segment on each transcript.
Table 40 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_17 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9 and HUMCEA_PEA_1_T20. Table 41 below describes the starting and ending position of this segment on each transcript.
Table 41 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_18 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9 and HUMCEA_PEA_1_T20. Table 42 below describes the starting and ending position of this segment on each transcript.
Table 42 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_19 according to the present invention is supported by 69 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9 and HUMCEA_PEA_1_T20. Table 43 below describes the starting and ending position of this segment on each transcript.
Table 43 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_20 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9 and HUMCEA_PEA_1_T20. Table 44 below describes the starting and ending position of this segment on each transcript.
Table 44 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_21 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9 and HUMCEA_PEA_1_T20. Table 45 below describes the starting and ending position of this segment on each transcript.
Table 45 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_22 according to the present invention is supported by 77 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9 and HUMCEA_PEA_1_T20. Table 46 below describes the starting and ending position of this segment on each transcript.
Table 46 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_23 according to the present invention is supported by 72 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9 and HUMCEA_PEA_1_T20. Table 47 below describes the starting and ending position of this segment on each transcript.
Table 47 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_24 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9 and HUMCEA_PEA_1_T20. Table 48 below describes the starting and ending position of this segment on each transcript.
Table 48 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_27 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9 and HUMCEA_PEA_1_T20. Table 49 below describes the starting and ending position of this segment on each transcript.
Table 49 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_29 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9 and HUMCEA_PEA_1_T20. Table 50 below describes the starting and ending position of this segment on each transcript.
Table 50 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_30 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9 and HUMCEA_PEA_1_T20. Table 51 below describes the starting and ending position of this segment on each transcript.
Table 51 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_33 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9 and HUMCEA_PEA_1_T26. Table 52 below describes the starting and ending position of this segment on each transcript.
Table 52 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_34 according to the present invention is supported by 80 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9 and HUMCEA_PEA_1_T26. Table 53 below describes the starting and ending position of this segment on each transcript.
Table 53 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_35 according to the present invention is supported by 75 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9 and HUMCEA_PEA_1_T26. Table 54 below describes the starting and ending position of this segment on each transcript.
Table 54 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_45 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T9. Table 55 below describes the starting and ending position of this segment on each transcript.
Table 55 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_50 according to the present invention is supported by 64 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T25 and HUMCEA_PEA_1_T26. Table 56 below describes the starting and ending position of this segment on each transcript.
Table 56 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_51 according to the present invention is supported by 88 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T25 and HUMCEA_PEA_1_T26. Table 57 below describes the starting and ending position of this segment on each transcript.
Table 57 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_56 according to the present invention is supported by 75 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T25 and HUMCEA_PEA_1_T26. Table 58 below describes the starting and ending position of this segment on each transcript.
Table 58 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_57 according to the present invention is supported by 82 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T25 and HUMCEA_PEA_1_T26. Table 59 below describes the starting and ending position of this segment on each transcript.
Table 59 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_58 according to the present invention is supported by 63 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T25 and HUMCEA_PEA_1_T26. Table 60 below describes the starting and ending position of this segment on each transcript.
Table 60 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_60 according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T25 and HUMCEA_PEA_1_T26. Table 61 below describes the starting and ending position of this segment on each transcript.
Table 61 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_61 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T25 and HUMCEA_PEA_1_T26. Table 62 below describes the starting and ending position of this segment on each transcript.
Table 62 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_62 according to the present invention is supported by 60 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T25 and HUMCEA_PEA_1_T26. Table 63 below describes the starting and ending position of this segment on each transcript.
Table 63 - Segment location on transcripts
Segment cluster HUMCEA_PEA_1_node_64 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T25 and HUMCEA_PEA_1_T26. Table 64 below describes the starting and ending position of this segment on each transcript.
Table 64 - Segment location on transcripts
1522
Variant protein alignment to the previously known protein:
Sequence name: CEA5_HUMAN
Sequence documentation:
Alignment of: HUMCEA_PEA_1_P4 x CEA5_HUMAN
Alignment segment 1/1:
Quality: 2320.00
Escore: 0
Matching length: 234
Total length: 234
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50

 51 VLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREI 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREI 100

101 IYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSIS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSIS 150

151 SNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTL 200

201 TLFNVTRNDTASYKCETQNPVSARRSDSVILNVL 234
    ||||||||||||||||||||||||||||||||||
201 TLFNVTRNDTASYKCETQNPVSARRSDSVILNVL 234
Sequence name: CEA5_HUMAN
Sequence documentation:
Alignment of: HUMCEA_PEA_1_P5 x CEA5_HUMAN
Alignment segment 1/1:
Quality: 6692.00
Escore:
Matching length: 675
Total length: 675
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50

 51 VLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREI 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREI 100

101 IYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSIS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSIS 150

151 SNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTL 200

201 TLFNVTRNDTASYKCETQNPVSARRSDSVILNVLYGPDAPTISPLNTSYR 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 TLFNVTRNDTASYKCETQNPVSARRSDSVILNVLYGPDAPTISPLNTSYR 250

251 SGENLNLSCHAASNPPAQYSWFVNGTFQQSTQELFIPNITVNNSGSYTCQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 SGENLNLSCHAASNPPAQYSWFVNGTFQQSTQELFIPNITVNNSGSYTCQ 300

301 AHNSDTGLNRTTVTTITVYAEPPKPFITSNNSNPVEDEDAVALTCEPEIQ 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 AHNSDTGLNRTTVTTITVYAEPPKPFITSNNSNPVEDEDAVALTCEPEIQ 350

351 NTTYLWWVNNQSLPVSPRLQLSNDNRTLTLLSVTRNDVGPYECGIQNELS 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 NTTYLWWVNNQSLPVSPRLQLSNDNRTLTLLSVTRNDVGPYECGIQNELS 400

401 VDHSDPVILNVLYGPDDPTISPSYTYYRPGVNLSLSCHAASNPPAQYSWL 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 VDHSDPVILNVLYGPDDPTISPSYTYYRPGVNLSLSCHAASNPPAQYSWL 450

451 IDGNIQQHTQELFISNITEKNSGLYTCQANNSASGHSRTTVKTITVSAEL 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 IDGNIQQHTQELFISNITEKNSGLYTCQANNSASGHSRTTVKTITVSAEL 500

501 PKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLS 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 PKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLS 550

551 NGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISP 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 NGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISP 600

601 PDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNN 650
    ||||||||||||||||||||||||||||||||||||||||||||||||||
601 PDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNN 650

651 GTYACFVSNLATGRNNSIVKSITVS 675
    |||||||||||||||||||||||||
651 GTYACFVSNLATGRNNSIVKSITVS 675
Sequence name: CEA5_HUMAN
Sequence documentation:
Alignment of: HUMCEA_PEA_1_P19 x CEA5_HUMAN
Alignment segment 1/1:
Quality: 3298.00
Escore: 0
Matching length: 346
Total length: 702
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 49.29
Total Percent Identity: 49.29
Gaps: 1
Alignment:
  1 MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50

 51 VLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREI 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREI 100

101 IYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSIS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSIS 150

151 SNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTL 200

201 TLFNVTRNDTASYKCETQNPVSARRSDSVILN.................. 232
    ||||||||||||||||||||||||||||||||
201 TLFNVTRNDTASYKCETQNPVSARRSDSVILNVLYGPDAPTISPLNTSYR 250

232 .................................................. 232

251 SGENLNLSCHAASNPPAQYSWFVNGTFQQSTQELFIPNITVNNSGSYTCQ 300

232 .................................................. 232

301 AHNSDTGLNRTTVTTITVYAEPPKPFITSNNSNPVEDEDAVALTCEPEIQ 350

232 .................................................. 232

351 NTTYLWWVNNQSLPVSPRLQLSNDNRTLTLLSVTRNDVGPYECGIQNELS 400

232 .................................................. 232

401 VDHSDPVILNVLYGPDDPTISPSYTYYRPGVNLSLSCHAASNPPAQYSWL 450

232 .................................................. 232

451 IDGNIQQHTQELFISNITEKNSGLYTCQANNSASGHSRTTVKTITVSAEL 500

232 .................................................. 232

501 PKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLS 550

233 ......................................VLYGPDTPIISP 244
                                          ||||||||||||
551 NGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISP 600

245 PDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNN 294
    ||||||||||||||||||||||||||||||||||||||||||||||||||
601 PDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNN 650

295 GTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVA 344
    ||||||||||||||||||||||||||||||||||||||||||||||||||
651 GTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVA 700

345 LI 346
    ||
701 LI 702
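The summary statistics reported for the HUMCEA_PEA_1_P19 x CEA5_HUMAN alignment can be checked arithmetically. The following Python sketch is illustrative only. It assumes, which the text does not state explicitly, that the "Matching" percentages are computed over the matching length and the "Total" percentages over the total length of the alignment including the gapped region. The function name percent is hypothetical.

```python
def percent(count: int, length: int) -> float:
    """Return count/length as a percentage rounded to two decimals."""
    if length <= 0:
        raise ValueError("length must be positive")
    return round(100.0 * count / length, 2)


# Values taken from the alignment header above.
matching_length = 346   # aligned (ungapped) positions
total_length = 702      # full length of the alignment, gap included
identical = 346         # every aligned position is identical

# Assumption: "Matching" uses the matching length as denominator,
# "Total" uses the total length.
matching_identity = percent(identical, matching_length)
total_identity = percent(identical, total_length)

print(matching_identity, total_identity)
```

Under this assumption the sketch reproduces the reported values of 100.00 and 49.29, which is consistent with a variant that matches the known protein exactly on both sides of a single 356-residue gap (702 - 346).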
Sequence name: CEA5_HUMAN
Sequence documentation:
Alignment of: HUMCEA_PEA_1_P20 x CEA5_HUMAN
Alignment segment 1/1:
Quality: 3294.00
Escore: 0
Matching length: 346
Total length: 702
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 49.29
Total Percent Identity: 49.29
Gaps: 1
Alignment:
  1 MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50

 51 VLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREI 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREI 100

101 IYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYP........ 142
    ||||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSIS 150

142 .................................................. 142

151 SNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTL 200

142 .................................................. 142

201 TLFNVTRNDTASYKCETQNPVSARRSDSVILNVLYGPDAPTISPLNTSYR 250

142 .................................................. 142

251 SGENLNLSCHAASNPPAQYSWFVNGTFQQSTQELFIPNITVNNSGSYTCQ 300

142 .................................................. 142

301 AHNSDTGLNRTTVTTITVYAEPPKPFITSNNSNPVEDEDAVALTCEPEIQ 350

142 .................................................. 142

351 NTTYLWWVNNQSLPVSPRLQLSNDNRTLTLLSVTRNDVGPYECGIQNELS 400

142 .................................................. 142

401 VDHSDPVILNVLYGPDDPTISPSYTYYRPGVNLSLSCHAASNPPAQYSWL 450

143 ................................................EL 144
                                                    ||
451 IDGNIQQHTQELFISNITEKNSGLYTCQANNSASGHSRTTVKTITVSAEL 500

145 PKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLS 194
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 PKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLS 550

195 NGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISP 244
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 NGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISP 600

245 PDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNN 294
    ||||||||||||||||||||||||||||||||||||||||||||||||||
601 PDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNN 650

295 GTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVA 344
    ||||||||||||||||||||||||||||||||||||||||||||||||||
651 GTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVA 700

345 LI 346
    ||
701 LI 702
DESCRIPTION FOR CLUSTER HUMEDF
Cluster HUMEDF features 3 transcript(s) and 8 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively, the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Transcript Name      SEQ ID NO:
HUMEDF_PEA_2_T5      555
HUMEDF_PEA_2_T10     556
HUMEDF_PEA_2_T11     557
Table 2 - Segments of interest
Table 3 - Proteins of interest
Protein Name         SEQ ID NO:    Corresponding Transcript(s)
HUMEDF_PEA_2_P5      567           HUMEDF_PEA_2_T10
HUMEDF_PEA_2_P6      568           HUMEDF_PEA_2_T11
HUMEDF_PEA_2_P8      569           HUMEDF_PEA_2_T5
These sequences are variants of the known protein Inhibin beta A chain precursor (SwissProt accession identifier IHBA_HUMAN; known also according to the synonyms Activin beta-A chain; Erythroid differentiation protein; EDF), SEQ ID NO: 566, referred to herein as the previously known protein. Protein Inhibin beta A chain precursor is known or believed to have the following function(s): inhibins and activins inhibit and activate, respectively, the secretion of follitropin by the pituitary gland. Inhibins/activins are involved in regulating a number of diverse functions such as hypothalamic and pituitary hormone secretion, gonadal hormone secretion, germ cell development and maturation, erythroid differentiation, insulin secretion, nerve cell survival, embryonic axial development or bone growth, depending on their subunit composition. Inhibins appear to oppose the functions of activins. The sequence for protein Inhibin beta A chain precursor is given at the end of the application, as "Inhibin beta A chain precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Cancer; Osteoporosis; Contraceptive, female; Contraceptive, male; Diagnosis, cancer. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Erythroid differentiation factor agonist; Follicle-stimulating hormone agonist; Growth factor agonist; Inhibin agonist; Interleukin 6 antagonist; Osteoblast stimulant. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Haematological; Female contraceptive; Male contraceptive; Antianaemic; Osteoporosis treatment; Fertility enhancer; Anticancer; Diagnostic; Antisickling; Neurological; Alimentary/Metabolic. The following GO Annotation(s) apply to the previously known protein.
The following annotation(s) were found: skeletal development; ovarian follicle development; induction of apoptosis; defense response; cell cycle arrest; cell surface receptor linked signal transduction; cell-cell signaling; neurogenesis; mesoderm development; cell growth and/or maintenance; response to external stimulus; cell differentiation; erythrocyte differentiation; growth, which are annotation(s) related to Biological Process; defense/immunity protein; cytokine; transforming growth factor beta receptor ligand; hormone; protein binding; growth factor; activin inhibitor, which are annotation(s) related to Molecular Function; and extracellular, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HUMEDF features 3 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Inhibin beta A chain precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HUMEDF_PEA_2_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMEDF_PEA_2_T10. An alignment is given to the known protein (Inhibin beta A chain precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMEDF_PEA_2_P5 and IHBA_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMEDF_PEA_2_P5, comprising a first amino acid sequence being at least 90% homologous to MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNSQPEMVEAVKKHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGYVEIEDDIGRRAEMNELMEQTSEIITFAESGT corresponding to amino acids 1 - 131 of IHBA_HUMAN, which also corresponds to amino acids 1 - 131 of HUMEDF_PEA_2_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VKS corresponding to amino acids 132 - 134 of HUMEDF_PEA_2_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMEDF_PEA_2_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VKS in HUMEDF_PEA_2_P5.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
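The localization reasoning stated above (signal peptide predicted by both signal-peptide programs, trans-membrane region predicted by neither trans-membrane program) can be written as a simple decision rule. The Python sketch below is illustrative only: the function name predict_localization and the "undetermined" fallback are hypothetical, since the text describes only the case that leads to the call "secreted" and does not say how other combinations of predictions are handled.

```python
from typing import Sequence


def predict_localization(
    signal_peptide_calls: Sequence[bool],
    transmembrane_calls: Sequence[bool],
) -> str:
    """Apply the rule described in the text for a 'secreted' call.

    signal_peptide_calls: one boolean per signal-peptide prediction program.
    transmembrane_calls: one boolean per trans-membrane prediction program.
    """
    if not signal_peptide_calls or not transmembrane_calls:
        raise ValueError("at least one prediction of each kind is required")
    # Rule from the text: every signal-peptide program predicts a signal
    # peptide AND no trans-membrane program predicts a trans-membrane region.
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    # Assumption: any other combination is left undecided in this sketch.
    return "undetermined"


# The situation described for HUMEDF_PEA_2_P5: two programs of each kind.
print(predict_localization([True, True], [False, False]))
```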
The glycosylation sites of variant protein HUMEDF_PEA_2_P5, as compared to the known protein Inhibin beta A chain precursor, are described in Table 5 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 5 - Glycosylation site(s)
Variant protein HUMEDF_PEA_2_P5 is encoded by the following transcript(s): HUMEDF_PEA_2_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMEDF_PEA_2_T10 is shown in bold; this coding portion starts at position 246 and ends at position 647. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMEDF_PEA_2_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Nucleic acid SNPs
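As a consistency check, the stated coding portion of transcript HUMEDF_PEA_2_T10 can be compared with the length of variant protein HUMEDF_PEA_2_P5 as described in the comparison report (131 amino acids shared with IHBA_HUMAN followed by the tail VKS). The Python sketch below is illustrative only and assumes that the start and end positions are 1-based and inclusive and that the stated coding portion does not include the stop codon; neither point is stated in the text. The function name coding_length is hypothetical.

```python
def coding_length(start: int, end: int) -> int:
    """Length in nucleotides of a coding portion given 1-based inclusive ends."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


# Positions stated in the text for HUMEDF_PEA_2_T10.
nucleotides = coding_length(246, 647)
codons, remainder = divmod(nucleotides, 3)

# Protein length from the comparison report: residues 1-131 plus the tail VKS.
head_length = 131
tail = "VKS"
protein_length = head_length + len(tail)

print(nucleotides, codons, remainder, protein_length)
```

Under these assumptions the coding portion spans 402 nucleotides, that is 134 complete codons, which agrees with the 134 amino acids of the variant.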
Variant protein HUMEDF_PEA_2_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMEDF_PEA_2_T11. An alignment is given to the known protein (Inhibin beta A chain precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMEDF_PEA_2_P6 and IHBA_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMEDF_PEA_2_P6, comprising a first amino acid sequence being at least 90% homologous to MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNSQPEMVEAVKKHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGYVEIEDDIGRRAEMNELMEQTSEIITFAESG corresponding to amino acids 1 - 130 of IHBA_HUMAN, which also corresponds to amino acids 1 - 130 of HUMEDF_PEA_2_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence HSEA corresponding to amino acids 131 - 134 of HUMEDF_PEA_2_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMEDF_PEA_2_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence HSEA in HUMEDF_PEA_2_P6.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
The glycosylation sites of variant protein HUMEDF_PEA_2_P6, as compared to the known protein Inhibin beta A chain precursor, are described in Table 7 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 7 - Glycosylation site(s)
Variant protein HUMEDF_PEA_2_P6 is encoded by the following transcript(s): HUMEDF_PEA_2_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMEDF_PEA_2_T11 is shown in bold; this coding portion starts at position 246 and ends at position 647. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMEDF_PEA_2_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Variant protein HUMEDF_PEA_2_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMEDF_PEA_2_T5. An alignment is given to the known protein (Inhibin beta A chain precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMEDF_PEA_2_P8 and IHBA_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMEDF_PEA_2_P8, comprising a first amino acid sequence being at least 90% homologous to MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNSQPEMVEAVKKHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGYVEIEDDIGRRAEMNELMEQTSEIITFAESGT corresponding to amino acids 1 - 131 of IHBA_HUMAN, which also corresponds to amino acids 1 - 131 of HUMEDF_PEA_2_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VKS corresponding to amino acids 132 - 134 of HUMEDF_PEA_2_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMEDF_PEA_2_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VKS in HUMEDF_PEA_2_P8.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
The glycosylation sites of variant protein HUMEDF_PEA_2_P8, as compared to the known protein Inhibin beta A chain precursor, are described in Table 9 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 9 - Glycosylation site(s)
Variant protein HUMEDF_PEA_2_P8 is encoded by the following transcript(s): HUMEDF_PEA_2_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMEDF_PEA_2_T5 is shown in bold; this coding portion starts at position 246 and ends at position 647. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMEDF_PEA_2_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
As noted above, cluster HUMEDF features 8 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMEDF_PEA_2_node_6 according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMEDF_PEA_2_T5, HUMEDF_PEA_2_T10 and HUMEDF_PEA_2_T11. Table 11 below describes the starting and ending position of this segment on each transcript.
Table 11 - Segment location on transcripts
Segment cluster HUMEDF_PEA_2_node_11 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMEDF_PEA_2_T10 and HUMEDF_PEA_2_T11. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
Segment cluster HUMEDF_PEA_2_node_18 according to the present invention is supported by 90 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMEDF_PEA_2_T5. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Segment cluster HUMEDF_PEA_2_node_19 according to the present invention is supported by 86 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMEDF_PEA_2_T5. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
Segment cluster HUMEDF_PEA_2_node_22 according to the present invention is supported by 89 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMEDF_PEA_2_T5. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMEDF_PEA_2_node_2 according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMEDF_PEA_2_T5, HUMEDF_PEA_2_T10 and HUMEDF_PEA_2_T11. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Segment cluster HUMEDF_PEA_2_node_8 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMEDF_PEA_2_T5 and HUMEDF_PEA_2_T10. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Segment cluster HUMEDF_PEA_2_node_20 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMEDF_PEA_2_T5. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: IHBA_HUMAN
Sequence documentation:
Alignment of: HUMEDF_PEA_2_P5 x IHBA_HUMAN
Alignment segment 1/1:
Quality: 1285.00
Escore: 0
Matching length: 133
Total length: 133
Matching Percent Similarity: 99.25
Matching Percent Identity: 98.50
Total Percent Similarity: 99.25
Total Percent Identity: 98.50
Gaps: 0
Alignment:
  1 MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNS 50

 51 QPEMVEAVKKHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGY 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 QPEMVEAVKKHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGY 100

101 VEIEDDIGRRAEMNELMEQTSEIITFAESGTVK 133
    ||||||||||||||||||||||||||||||| :
101 VEIEDDIGRRAEMNELMEQTSEIITFAESGTAR 133
Sequence name: IHBA_HUMAN
Sequence documentation:
Alignment of: HUMEDF_PEA_2_P6 x IHBA_HUMAN
Alignment segment 1/1:
Quality: 1275.00
Escore: 0
Matching length: 130
Total length: 130
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNS 50

 51 QPEMVEAVKKHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGY 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 QPEMVEAVKKHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGY 100

101 VEIEDDIGRRAEMNELMEQTSEIITFAESG 130
    ||||||||||||||||||||||||||||||
101 VEIEDDIGRRAEMNELMEQTSEIITFAESG 130
Sequence name: IHBA_HUMAN
Sequence documentation:
Alignment of: HUMEDF_PEA_2_P8 x IHBA_HUMAN
Alignment segment 1/1:
Quality: 1285.00
Escore: 0
Matching length: 133
Total length: 133
Matching Percent Similarity: 99.25
Matching Percent Identity: 98.50
Total Percent Similarity: 99.25
Total Percent Identity: 98.50
Gaps: 0
Alignment:
  1 MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNS 50

 51 QPEMVEAVKKHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGY 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 QPEMVEAVKKHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGY 100

101 VEIEDDIGRRAEMNELMEQTSEIITFAESGTVK 133
    ||||||||||||||||||||||||||||||| :
101 VEIEDDIGRRAEMNELMEQTSEIITFAESGTAR 133
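The identity and similarity percentages reported for the HUMEDF_PEA_2_P5 and HUMEDF_PEA_2_P8 alignments to IHBA_HUMAN can be recomputed from the aligned residues. The Python sketch below is illustrative only. It assumes that a position counts as "similar" when the residues are identical or form the lysine/arginine (K/R) pair, which is the only non-identical pair marked with ":" in the alignment above; the scoring matrix actually used is not stated in the text. The names SIMILAR_PAIRS and alignment_percentages are hypothetical.

```python
# Residues 1-131, shared by the variant and the known protein.
SHARED = (
    "MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNS"
    "QPEMVEAVKKHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGY"
    "VEIEDDIGRRAEMNELMEQTSEIITFAESGT"
)
VARIANT = SHARED + "VK"  # variant protein, positions 1-133 as aligned
KNOWN = SHARED + "AR"    # IHBA_HUMAN, positions 1-133 as aligned

# Assumption: K/R is the only non-identical pair treated as similar,
# taken from the ":" marker in the alignment.
SIMILAR_PAIRS = {frozenset("KR")}


def alignment_percentages(a: str, b: str) -> tuple[float, float]:
    """Return (percent identity, percent similarity) for an ungapped alignment."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    identical = sum(x == y for x, y in zip(a, b))
    similar = identical + sum(
        x != y and frozenset((x, y)) in SIMILAR_PAIRS for x, y in zip(a, b)
    )
    n = len(a)
    return round(100.0 * identical / n, 2), round(100.0 * similar / n, 2)


identity, similarity = alignment_percentages(VARIANT, KNOWN)
print(identity, similarity)
```

Under these assumptions, 131 of 133 positions are identical and 132 of 133 are identical or similar, which reproduces the reported 98.50 and 99.25.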
Therapeutic applications of splice variants of the present invention
Splice variants described herein (including any polynucleotide, oligonucleotide, polypeptide, peptide or fragments thereof) or antibodies that specifically bind thereto may optionally be used for therapeutic applications, for example to treat the diseases described herein with regard to diagnostic applications thereof. A "variant-treatable" disease refers to any disease that is treatable by using a splice variant of any of the therapeutic proteins according to the present invention. "Treatment" also encompasses prevention, amelioration, elimination and control of the disease and/or pathological condition. The diseases for which such variants may be useful therapeutic agents are described in greater detail below for each of the variants. The variants themselves are described by "cluster" or by gene, as these variants are splice variants of known proteins. Therefore, a "cluster-related disease" or a "variant-related disease" refers to a disease that may be treated by a particular protein, with regard to the description of such diseases below a therapeutic protein variant according to the present invention. The term "biologically active", as used herein, refers to a protein having structural, regulatory, or biochemical functions of a naturally occurring molecule. Likewise, "immunologically active" refers to the capability of the natural, recombinant, or synthetic ligand, or any oligopeptide thereof, to induce a specific immune response in appropriate animals or cells and to bind with specific antibodies. The term "modulate", as used herein, refers to a change in the activity of at least one receptor mediated activity. For example, modulation may cause an increase or a decrease in protein activity, binding characteristics, or any other biological, functional or immunological properties of a ligand.
METHODS OF TREATMENT
As mentioned hereinabove, the novel therapeutic protein variants of the present invention and compositions derived therefrom (i.e., peptides, oligonucleotides) can be used to treat cluster-related diseases. Thus, according to an additional aspect of the present invention there is provided a method of treating cluster-related disease in a subject. The subject according to the present invention is a mammal, preferably a human, which has at least one type of the cluster-related diseases described hereinabove. As mentioned hereinabove, the biomolecular sequences of the present invention can be used to treat subjects with the above-described diseases. The subject according to the present invention is a mammal, preferably a human, which is diagnosed with one of the diseases described hereinabove, or alternatively is predisposed to having one of the diseases described hereinabove. As used herein the term "treating" refers to preventing, curing, reversing, attenuating, alleviating, minimizing, suppressing or halting the deleterious effects of the above-described diseases.
Treating, according to the present invention, can be effected by specifically upregulating or alternatively downregulating the expression of at least one of the polypeptides of the present invention in the subject. Optionally, upregulation may be effected by administering to the subject at least one of the polypeptides of the present invention (e.g., recombinant or synthetic) or an active portion thereof, as described herein. However, since the bioavailability of large polypeptides may potentially be relatively small due to high degradation rate and low penetration rate, administration of polypeptides is preferably confined to small peptide fragments (e.g., about 100 amino acids). The polypeptide or peptide may optionally be administered in a pharmaceutical composition, described in more detail below.
It will be appreciated that treatment of the above-described diseases according to the present invention may be combined with other treatment methods known in the art (i.e., combination therapy). Thus, treatment of malignancies using the agents of the present invention may be combined with, for example, radiation therapy, antibody therapy and/or chemotherapy. Alternatively or additionally, an upregulating method may optionally be effected by specifically upregulating the amount (optionally expression) in the subject of at least one of the polypeptides of the present invention or active portions thereof.
As is mentioned hereinabove and in the Examples section which follows, the biomolecular sequences of this aspect of the present invention may be used as valuable therapeutic tools in the treatment of diseases in which altered activity or expression of the wild-type gene product is known to contribute to disease onset or progression. For example, where a disease is caused by overexpression of a membrane-bound receptor, a soluble variant thereof may be used as an antagonist which competes with the receptor for binding the ligand, to thereby terminate signaling from the receptor. Examples of such diseases are listed in the Examples section which follows.
It will be appreciated that the polypeptides of the present invention may also have agonistic properties. These include increasing the stability of the ligand (e.g., IL-4), protection from proteolysis and modification of the pharmacokinetic properties of the ligand (i.e., increasing the half-life of the ligand, while decreasing the clearance thereof). As such, the biomolecular sequences of this aspect of the present invention may be used to treat conditions or diseases in which the wild-type gene product plays a favorable role, for example, increasing angiogenesis in cases of diabetes or ischemia.
Upregulating expression of the therapeutic protein variants of the present invention may be effected via the administration of at least one of the exogenous polynucleotide sequences of the present invention, ligated into a nucleic acid expression construct designed for expression of coding sequences in eukaryotic cells (e.g., mammalian cells), as described above. Accordingly, the exogenous polynucleotide sequence may be a DNA or RNA sequence encoding the variants of the present invention or active portions thereof. It will be appreciated that the nucleic acid construct can be administered to the individual employing any suitable mode of administration, described hereinbelow (i.e., in vivo gene therapy). Alternatively, the nucleic acid construct is introduced into a suitable cell via an appropriate gene delivery vehicle/method (transfection, transduction, homologous recombination, etc.) and an expression system as needed, and then the modified cells are expanded in culture and returned to the individual (i.e., ex vivo gene therapy). Nucleic acid constructs are described in greater detail above.
It will be appreciated that the present methodology may also be effected by specifically upregulating the expression of the variants of the present invention endogenously in the subject. Agents for upregulating endogenous expression of specific splice variants of a given gene include antisense oligonucleotides, which are directed at splice sites of interest, thereby altering the splicing pattern of the gene. This approach has been successfully used for shifting the balance of expression of the two isoforms of Bcl-x [Taylor (1999) Nat. Biotechnol. 17:1097-1100; and Mercatante (2001) J. Biol. Chem. 276:16411-16417]; IL-5R [Karras (2000) Mol. Pharmacol. 58:380-387]; and c-myc [Giles (1999) Antisense Nucleic Acid Drug Dev. 9:213-220].
For example, interleukin 5 and its receptor play a critical role as regulators of hematopoiesis and as mediators in some inflammatory diseases such as allergy and asthma.
Two alternatively spliced isoforms are generated from the IL-5R gene, which include (i.e., long form) or exclude (i.e., short form) exon 9. The long form encodes the intact membrane-bound receptor, while the shorter form encodes a secreted soluble non-functional receptor. Using 2'-O-MOE-oligonucleotides specific to regions of exon 9, Karras and co-workers (supra) were able to significantly decrease the expression of the wild-type receptor and increase the expression of the shorter isoforms. Design and synthesis of oligonucleotides which can be used according to the present invention are described hereinbelow and by Sazani and Kole (2003) Progress in Molecular and Subcellular Biology 31:217-239.
Upregulating expression of the polypeptides of the present invention in a subject may be effected via the administration of at least one of the exogenous polynucleotide sequences of the present invention (e.g., SEQ ID NOs: 3, 7, 11, 15, 19, 23, 27, 31, 35, 39 or 43) ligated into a nucleic acid expression construct designed for expression of coding sequences in eukaryotic cells (e.g., mammalian cells). Accordingly, the exogenous polynucleotide sequence may be a DNA or RNA sequence encoding the variants of the present invention or active portions thereof. It will be appreciated that the nucleic acid construct can be administered to the individual employing any suitable mode of administration, described hereinbelow (i.e., in vivo gene therapy). Alternatively, the nucleic acid construct is introduced into a suitable cell via an appropriate gene delivery vehicle/method (transfection, transduction, homologous recombination, etc.) and an expression system as needed, and then the modified cells are expanded in culture and returned to the individual (i.e., ex vivo gene therapy). Preferably, the promoter utilized by the nucleic acid construct of the present invention is active in the specific cell population transformed.
Examples of cell type-specific and/or tissue-specific promoters include promoters such as the albumin promoter, which is liver specific [Pinkert et al., (1987) Genes Dev. 1:268-277]; lymphoid-specific promoters [Calame et al., (1988) Adv. Immunol. 43:235-275], in particular promoters of T-cell receptors [Winoto et al., (1989) EMBO J. 8:729-733] and immunoglobulins [Banerji et al. (1983) Cell 33:729-740]; neuron-specific promoters such as the neurofilament promoter [Byrne et al. (1989) Proc. Natl. Acad. Sci. USA 86:5473-5477]; pancreas-specific promoters [Edlund et al. (1985) Science 230:912-916]; or mammary gland-specific promoters such as the milk whey promoter (U.S. Pat. No. 4,873,316 and European Patent Application No. EP 264,166). Examples of suitable constructs include, but are not limited to, pcDNA3, pcDNA3.1 (+/-), pGL3, pZeoSV2 (+/-), pDisplay, pEF/myc/cyto and pCMV/myc/cyto, each of which is commercially available from Invitrogen Co. (www.invitrogen.com). Examples of retroviral vector and packaging systems are those sold by Clontech, San Diego, Calif., including Retro-X vectors pLNCX and pLXSN, which permit cloning into multiple cloning sites and in which the transgene is transcribed from the CMV promoter. Vectors derived from Mo-MuLV are also included, such as pBabe, where the transgene will be transcribed from the 5' LTR promoter.
Currently preferred in vivo nucleic acid transfer techniques include transfection with viral or non-viral constructs, such as adenovirus, lentivirus, Herpes simplex I virus, or adeno-associated virus (AAV) and lipid-based systems. Useful lipids for lipid-mediated transfer of the gene are, for example, DOTMA, DOPE, and DC-Chol [Tonkinson et al., Cancer Investigation, 14(1): 54-65 (1996)]. The most preferred constructs for use in gene therapy are viruses, most preferably adenoviruses, AAV, lentiviruses, or retroviruses. A viral construct such as a retroviral construct includes at least one transcriptional promoter/enhancer or locus-defining element(s), or other elements that control gene expression by other means such as alternate splicing, nuclear RNA export, or post-translational modification of messenger. Such vector constructs also include a packaging signal, long terminal repeats (LTRs) or portions thereof, and positive and negative strand primer binding sites appropriate to the virus used, unless it is already present in the viral construct. In addition, such a construct typically includes a signal sequence for secretion of the peptide from a host cell in which it is placed. Preferably the signal sequence for this purpose is a mammalian signal sequence or the signal sequence of the polypeptide variants of the present invention.
Optionally, the construct may also include a signal that directs polyadenylation, as well as one or more restriction sites and a translation termination sequence. By way of example, such constructs will typically include a 5' LTR, a tRNA binding site, a packaging signal, an origin of second-strand DNA synthesis, and a 3' LTR or a portion thereof. Other vectors can be used that are non-viral, such as cationic lipids, polylysine, and dendrimers.
It will be appreciated that the present methodology may also be performed by specifically upregulating the expression of the splice variants of the present invention endogenously in the subject. Agents for upregulating endogenous expression of specific splice variants of a given gene include antisense oligonucleotides, which are directed at splice sites of interest, thereby altering the splicing pattern of the gene. This approach has been successfully used for shifting the balance of expression of the two isoforms of Bcl-x [Taylor (1999) Nat. Biotechnol. 17:1097-1100; and Mercatante (2001) J. Biol. Chem. 276:16411-16417]; IL-5R [Karras (2000) Mol. Pharmacol. 58:380-387]; and c-myc [Giles (1999) Antisense Nucleic Acid Drug Dev. 9:213-220]. For example, interleukin 5 and its receptor play a critical role as regulators of hematopoiesis and as mediators in some inflammatory diseases such as allergy and asthma.
Two alternatively spliced isoforms are generated from the IL-5R gene, which include (i.e., long form) or exclude (i.e., short form) exon 9. The long form encodes the intact membrane-bound receptor, while the shorter form encodes a secreted soluble non-functional receptor. Using 2'-O-MOE-oligonucleotides specific to regions of exon 9, Karras and co-workers (supra) were able to significantly decrease the expression of the wild-type receptor and increase the expression of the shorter isoforms. Design and synthesis of oligonucleotides which can be used according to the present invention are described hereinbelow and by Sazani and Kole (2003) Progress in Molecular and Subcellular Biology 31:217-239.
Treatment can preferably be effected by agents which are capable of specifically downregulating expression (or activity) of at least one of the polypeptide variants of the present invention. Downregulating the expression of the therapeutic protein variants of the present invention may be achieved using oligonucleotide agents such as those described in greater detail below.
SiRNA molecules - Small interfering RNA (siRNA) molecules can be used to downregulate expression of the therapeutic protein variants of the present invention. RNA interference is a two-step process. In the first step, termed the initiation step, input dsRNA is digested into 21-23 nucleotide (nt) small interfering RNAs (siRNA), probably by the action of Dicer, a member of the RNase III family of dsRNA-specific ribonucleases, which processes (cleaves) dsRNA (introduced directly or via a transgene or a virus) in an ATP-dependent manner. Successive cleavage events degrade the RNA to 19-21 bp duplexes (siRNA), each with 2-nucleotide 3' overhangs [Hutvagner and Zamore Curr. Opin. Genetics and Development 12:225-232 (2002); and Bernstein Nature 409:363-366 (2001)]. In the effector step, the siRNA duplexes bind to a nuclease complex to form the RNA-induced silencing complex (RISC).
An ATP-dependent unwinding of the siRNA duplex is required for activation of the RISC. The active RISC then targets the homologous transcript by base pairing interactions and cleaves the mRNA into 12 nucleotide fragments from the 3' terminus of the siRNA [Hutvagner and Zamore Curr. Opin. Genetics and Development 12:225-232 (2002); Hammond et al. Nat. Rev. Gen. 2:110-119 (2001); and Sharp Genes. Dev. 15:485-90 (2001)]. Although the mechanism of cleavage is still to be elucidated, research indicates that each RISC contains a single siRNA and an RNase [Hutvagner and Zamore Curr. Opin. Genetics and Development 12:225-232 (2002)].
Because of the remarkable potency of RNAi, an amplification step within the RNAi pathway has been suggested. Amplification could occur by copying of the input dsRNAs, which would generate more siRNAs, or by replication of the siRNAs formed. Alternatively or additionally, amplification could be effected by multiple turnover events of the RISC [Hammond et al. Nat. Rev. Gen. 2:110-119 (2001); Sharp Genes. Dev. 15:485-90 (2001); Hutvagner and Zamore Curr. Opin. Genetics and Development 12:225-232 (2002)]. For more information on RNAi see the following reviews: Tuschl ChemBiochem. 2:239-245 (2001); Cullen Nat. Immunol. 3:597-599 (2002); and Brand Biochem. Biophys. Acta 1575:15-25 (2002).
Synthesis of RNAi molecules suitable for use with the present invention can be effected as follows. First, the mRNA sequence is scanned downstream of the AUG start codon for AA dinucleotide sequences. Occurrence of each AA and the 3' adjacent 19 nucleotides is recorded as a potential siRNA target site. Preferably, siRNA target sites are selected from the open reading frame, as untranslated regions (UTRs) are richer in regulatory protein binding sites. UTR-binding proteins and/or translation initiation complexes may interfere with binding of the siRNA endonuclease complex [Tuschl ChemBiochem. 2:239-245].
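The scanning step just described can be illustrated with a short Python sketch. It is offered only as a minimal illustration under stated assumptions, not as the procedure of the present invention: the function name `find_sirna_candidates`, its parameters, and the convention that the caller supplies the open reading frame coordinates are all hypothetical.

```python
def find_sirna_candidates(mrna, orf_start, orf_end, flank=19):
    """Return (position, site) pairs for each AA dinucleotide plus its 3' flank.

    mrna      -- transcript sequence (DNA or RNA letters; T is read as U)
    orf_start -- 0-based index of the A of the AUG start codon (assumed known)
    orf_end   -- index one past the last base of the open reading frame
    flank     -- number of nucleotides recorded 3' of the AA (19 in the text)
    """
    seq = mrna.upper().replace("T", "U")
    site_len = 2 + flank  # AA dinucleotide plus the adjacent nucleotides
    candidates = []
    # Scan downstream of the start codon, keeping each site inside the ORF.
    for i in range(orf_start + 3, orf_end - site_len + 1):
        if seq[i:i + 2] == "AA":
            candidates.append((i, seq[i:i + site_len]))
    return candidates


# Toy transcript: 5' leader, AUG, spacer, one AA site, stop codon.
toy = "GG" + "AUG" + "CC" + "AA" + "GC" * 9 + "G" + "UAA"
for position, site in find_sirna_candidates(toy, orf_start=2, orf_end=len(toy)):
    print(position, site)
```

Each returned site is 21 nucleotides long. Candidate sites would still need to be screened further before use, which this sketch does not attempt.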
It will be appreciated, though, that siRNAs directed at untranslated regions may also be effective, as demonstrated for GAPDH, wherein siRNA directed at the 5' UTR mediated about 90 % decrease in cellular GAPDH mRNA and completely abolished protein level (www.ambion.com/techlib/tn/911912.html).
Second, potential target sites are compared to an appropriate genomic database (e.g., human, mouse, rat etc.) using any sequence alignment software, such as the BLAST software available from the NCBI server (www.ncbi.nlm.nih.gov/BLAST/). Putative target sites which exhibit significant homology to other coding sequences are filtered out.
Qualifying target sequences are selected as templates for siRNA synthesis. Preferred sequences are those including low G/C content, as these have proven to be more effective in mediating gene silencing as compared to those with G/C content higher than 55 %. Several target sites are preferably selected along the length of the target gene for evaluation. Target sites are selected from the unique nucleotide sequences of each of the polynucleotides of the present invention, such that each polynucleotide is specifically downregulated. For better evaluation of the selected siRNAs, a negative control is preferably used in conjunction. Negative control siRNAs preferably include the same nucleotide composition as the siRNAs but lack significant homology to the genome. Thus, a scrambled nucleotide sequence of the siRNA is preferably used, provided it does not display any significant homology to any other gene.
DNAzyme molecules - Another agent capable of downregulating expression of the polypeptides of the present invention is a DNAzyme molecule capable of specifically cleaving an mRNA transcript or DNA sequence of the polynucleotides of the present invention. DNAzymes are single-stranded polynucleotides which are capable of cleaving both single- and double-stranded target sequences (Breaker, R.R. and Joyce, G. Chemistry and Biology 1995;2:655; Santoro, S.W. & Joyce, G.F. Proc. Natl. Acad. Sci. USA 1997;94:4262). A general model (the "10-23" model) for the DNAzyme has been proposed. "10-23" DNAzymes have a catalytic domain of 15 deoxyribonucleotides, flanked by two substrate-recognition domains of seven to nine deoxyribonucleotides each. This type of DNAzyme can effectively cleave its substrate RNA at purine:pyrimidine junctions (Santoro, S.W. & Joyce, G.F. Proc. Natl. Acad. Sci. USA 199; for a review of DNAzymes see Khachigian, LM [Curr Opin Mol Ther 4:119-21 (2002)]). Target sites for DNAzymes are selected from the unique nucleotide sequences of each of the polynucleotides of the present invention, such that each polynucleotide is specifically downregulated.
Examples of construction and amplification of synthetic, engineered DNAzymes recognizing single- and double-stranded target cleavage sites have been disclosed in U.S. Pat. No. 6,326,174 to Joyce et al. DNAzymes of similar design directed against the human Urokinase receptor were recently observed to inhibit Urokinase receptor expression, and successfully inhibit colon cancer cell metastasis in vivo (Itoh et al., 2002, Abstract 409, Ann Meeting Am Soc Gen Ther www.asgt.org). In another application, DNAzymes complementary to bcr-abl oncogenes were successful in inhibiting expression of the oncogenes in leukemia cells, and in lessening relapse rates in autologous bone marrow transplant in cases of CML and ALL.
Antisense molecules - Downregulation of the polynucleotides of the present invention can also be effected by using an antisense polynucleotide capable of specifically hybridizing with an mRNA transcript encoding the polypeptide variants of the present invention. The term "antisense", as used herein, refers to any composition containing nucleotide sequences which are complementary to a specific DNA or RNA sequence.
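Complementarity in the sense of this definition can be illustrated with a short Python sketch that derives the RNA strand complementary to a given sense sequence by Watson-Crick pairing. It is a minimal illustration only: the function name `antisense_rna` is hypothetical, only the four standard bases are handled, and nothing here reflects the design considerations (delivery, binding affinity, target structure) that govern a practical antisense oligonucleotide.

```python
# Watson-Crick pairing for RNA bases.
_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_rna(sense):
    """Return the reverse complement of a sense sequence, written 5' to 3'.

    DNA input is accepted (T is read as U). Any base other than A, C, G,
    T or U raises KeyError, since ambiguity codes are not handled here.
    """
    seq = sense.upper().replace("T", "U")
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


print(antisense_rna("AUGGCC"))  # prints GGCCAU
```

Applying the function twice returns the original sense sequence, which is a quick consistency check on the pairing table.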
The term "antisense strand" is used in reference to a nucleic acid strand that is complementary to the "sense" strand. Antisense molecules also include peptide nucleic acids and may be produced by any method including synthesis or transcription. Once introduced into a cell, the complementary nucleotides combine with natural sequences produced by the cell to form duplexes and block either transcription or translation. The designation "negative" is sometimes used in reference to the antisense strand, and "positive" is sometimes used in reference to the sense strand. Antisense oligonucleotides are also used for modulation of alternative splicing in vivo and for diagnostics in vivo and in vitro (Khelifi C. et al., 2002, Current Pharmaceutical Design 8:1451-1466; Sazani, P., and Kole, R. Progress in Molecular and Subcellular Biology, 2003, 31:217-239).
Design of antisense molecules which can be used to efficiently downregulate expression of the polypeptides of the present invention must be effected while considering two aspects important to the antisense approach. The first aspect is delivery of the oligonucleotide into the cytoplasm of the appropriate cells, while the second aspect is design of an oligonucleotide which specifically binds the designated mRNA within cells in a way which inhibits translation thereof.
The prior art teaches a number of delivery strategies which can be used to efficiently deliver oligonucleotides into a wide variety of cell types [see, for example, Luft J Mol Med 76: 75-6 (1998); Kronenwett et al. Blood 91: 852-62 (1998); Rajur et al. Bioconjug Chem 8: 935-40 (1997); Lavigne et al. Biochem Biophys Res Commun 237: 566-71 (1997) and Aoki et al. Biochem Biophys Res Commun 231: 540-5 (1997)].
In addition, algorithms for identifying those sequences with the highest predicted binding affinity for their target mRNA, based on a thermodynamic cycle that accounts for the energetics of structural alterations in both the target mRNA and the oligonucleotide, are also available [see, for example, Walton et al. Biotechnol Bioeng 65: 1-9 (1999)]. Such algorithms have been successfully used to implement an antisense approach in cells. For example, the algorithm developed by Walton et al. enabled scientists to successfully design antisense oligonucleotides for rabbit beta-globin (RBG) and mouse tumor necrosis factor-alpha (TNF alpha) transcripts.
The same research group has more recently reported that the antisense activity of rationally selected oligonucleotides against three model target mRNAs (human lactate dehydrogenase A and B and rat gp130) in cell culture, as evaluated by a kinetic PCR technique, proved effective in almost all cases, including tests against three different targets in two cell types with phosphodiester and phosphorothioate oligonucleotide chemistries. In addition, several approaches for designing and predicting efficiency of specific oligonucleotides using an in vitro system were also published [Matveeva et al., Nature Biotechnology 16: 1374-1375 (1998)].
Several clinical trials have demonstrated safety, feasibility and activity of antisense oligonucleotides. For example, antisense oligonucleotides suitable for the treatment of cancer have been successfully used [Holmlund et al., Curr Opin Mol Ther 1:372-85 (1999)], while treatment of hematological malignancies via antisense oligonucleotides targeting the c-myb gene, p53 and Bcl-2 had entered clinical trials and had been shown to be tolerated by patients [Gewirtz Curr Opin Mol Ther 1:297-306 (1999)]. More recently, antisense-mediated suppression of human heparanase gene expression has been reported to inhibit pleural dissemination of human cancer cells in a mouse model [Uno et al., Cancer Res 61:7855-60 (2001)].
Thus, the current consensus is that recent developments in the field of antisense technology which, as described above, have led to the generation of highly accurate antisense design algorithms and a wide variety of oligonucleotide delivery systems, enable an ordinarily skilled artisan to design and implement antisense approaches suitable for downregulating expression of known sequences without having to resort to undue trial and error experimentation.
Target sites for antisense molecules are selected from the unique nucleotide sequences of each of the polynucleotides of the present invention, such that each polynucleotide is specifically downregulated.
Ribozymes - Another agent capable of downregulating expression of the polypeptides of the present invention is a ribozyme molecule capable of specifically cleaving an mRNA transcript encoding the polypeptide variants of the present invention. Ribozymes are being increasingly used for the sequence-specific inhibition of gene expression by the cleavage of mRNAs encoding proteins of interest [Welch et al., Curr Opin Biotechnol. 9:486-96 (1998)]. The possibility of designing ribozymes to cleave any specific target RNA has rendered them valuable tools in both basic research and therapeutic applications. In the therapeutics area, ribozymes have been exploited to target viral RNAs in infectious diseases, dominant oncogenes in cancers and specific somatic mutations in genetic disorders [Welch et al., Clin Diagn Virol. 10:163-71 (1998)]. Most notably, several ribozyme gene therapy protocols for HIV patients are already in Phase 1 trials. More recently, ribozymes have been used for transgenic animal research, gene target validation and pathway elucidation. Several ribozymes are in various stages of clinical trials. ANGIOZYME was the first chemically synthesized ribozyme to be studied in human clinical trials. ANGIOZYME specifically inhibits formation of the VEGF-r (Vascular Endothelial Growth Factor receptor), a key component in the angiogenesis pathway. Ribozyme Pharmaceuticals, Inc., as well as other firms, have demonstrated the importance of anti-angiogenesis therapeutics in animal models. HEPTAZYME, a ribozyme designed to selectively destroy Hepatitis C Virus (HCV) RNA, was found effective in decreasing Hepatitis C viral RNA in cell culture assays (Ribozyme Pharmaceuticals, Incorporated - WEB home page).
Alternatively, downregulation of the polypeptide variants of the present invention may be achieved at the polypeptide level using downregulating agents such as antibodies or antibody fragments capable of specifically binding the polypeptides of the present invention and inhibiting the activity thereof (i.e., neutralizing antibodies). Such antibodies can be directed, for example, to the heterodimerizing domain on the variant, or to a putative ligand binding domain. Further description of antibodies and methods of generating same is provided below.
PHARMACEUTICAL COMPOSITIONS AND DELIVERY THEREOF
The present invention features a pharmaceutical composition comprising a therapeutically effective amount of a therapeutic agent according to the present invention, which is preferably a therapeutic protein variant as described herein. Optionally and alternatively, the therapeutic agent could be an antibody or an oligonucleotide that specifically recognizes and binds to the therapeutic protein variant, but not to the corresponding full length known protein. Alternatively, the pharmaceutical composition of the present invention includes a therapeutically effective amount of at least an active portion of a therapeutic protein variant polypeptide.
The pharmaceutical composition according to the present invention is preferably used for the treatment of cluster-related diseases. "Treatment" refers to both therapeutic treatment and prophylactic or preventative measures. Those in need of treatment include those already with the disorder as well as those in which the disorder is to be prevented. Hence, the mammal to be treated herein may have been diagnosed as having the disorder or may be predisposed or susceptible to the disorder.
"Mammal" for purposes of treatment refers to any animal classified as a mammal, including humans, domestic and farm animals, and zoo, sports, or pet animals, such as dogs, horses, cats, cows, etc. Preferably, the mammal is human. A "disorder" is any condition that would benefit from treatment with the agent according to the present invention. This includes chronic and acute disorders or diseases including those pathological conditions which predispose the mammal to the disorder in question. Non-limiting examples of disorders to be treated herein are described with regard to specific examples given herein.
The term "therapeutically effective amount" refers to an amount of agent according to the present invention that is effective to treat a disease or disorder in a mammal. In the case of cancer, the therapeutically effective amount of the agent may reduce the number of cancer cells; reduce the tumor size; inhibit (i.e., slow to some extent and preferably stop) cancer cell infiltration into peripheral organs; inhibit (i.e., slow to some extent and preferably stop) tumor metastasis; inhibit, to some extent, tumor growth; and/or relieve to some extent one or more of the symptoms associated with the cancer. To the extent the agent may prevent growth and/or kill existing cancer cells, it may be cytostatic and/or cytotoxic. For cancer therapy, efficacy can, for example, be measured by assessing the time to disease progression (TTP) and/or determining the response rate (RR).
The therapeutic agents of the present invention can be provided to the subject per se, or as part of a pharmaceutical composition where they are mixed with a pharmaceutically acceptable carrier. As used herein a "pharmaceutical composition" refers to a preparation of one or more of the active ingredients described herein with other chemical components such as physiologically suitable carriers and excipients.
The purpose of a pharmaceutical composition is to facilitate administration of a compound to an organism.
Herein the term "active ingredient" refers to the preparation accountable for the biological effect. Hereinafter, the phrases "physiologically acceptable carrier" and "pharmaceutically acceptable carrier", which may be interchangeably used, refer to a carrier or a diluent that does not cause significant irritation to an organism and does not abrogate the biological activity and properties of the administered compound. An adjuvant is included under these phrases. One of the ingredients included in the pharmaceutically acceptable carrier can be, for example, polyethylene glycol (PEG), a biocompatible polymer with a wide range of solubility in both organic and aqueous media (Mutter et al. (1979)).
Herein the term "excipient" refers to an inert substance added to a pharmaceutical composition to further facilitate administration of an active ingredient. Examples, without limitation, of excipients include calcium carbonate, calcium phosphate, various sugars and types of starch, cellulose derivatives, gelatin, vegetable oils and polyethylene glycols. Techniques for formulation and administration of drugs may be found in "Remington's Pharmaceutical Sciences," Mack Publishing Co., Easton, PA, latest edition, which is incorporated herein by reference.
Suitable routes of administration may, for example, include oral, rectal, transmucosal, especially transnasal, intestinal or parenteral delivery, including intramuscular, subcutaneous and intramedullary injections as well as intrathecal, direct intraventricular, intravenous, intraperitoneal, intranasal, or intraocular injections. Alternately, one may administer a preparation in a local rather than systemic manner, for example, via injection of the preparation directly into a specific region of a patient's body.
Pharmaceutical compositions of the present invention may be manufactured by processes well known in the art, e.g., by means of conventional mixing, dissolving, granulating, dragee-making, levigating, emulsifying, encapsulating, entrapping or lyophilizing processes. Pharmaceutical compositions for use in accordance with the present invention may be formulated in conventional manner using one or more physiologically acceptable carriers comprising excipients and auxiliaries, which facilitate processing of the active ingredients into preparations which can be used pharmaceutically. Proper formulation is dependent upon the route of administration chosen.
For injection, the active ingredients of the invention may be formulated in aqueous solutions, preferably in physiologically compatible buffers such as Hank's solution, Ringer's solution, or physiological salt buffer. For transmucosal administration, penetrants appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally known in the art.
For oral administration, the compounds can be formulated readily by combining the active compounds with pharmaceutically acceptable carriers well known in the art. Such carriers enable the compounds of the invention to be formulated as tablets, pills, dragees, capsules, liquids, gels, syrups, slurries, suspensions, and the like, for oral ingestion by a patient. Pharmacological preparations for oral use can be made using a solid excipient, optionally grinding the resulting mixture, and processing the mixture of granules, after adding suitable auxiliaries if desired, to obtain tablets or dragee cores.
Suitable excipients are, in particular, fillers such as sugars, including lactose, sucrose, mannitol, or sorbitol; cellulose preparations such as, for example, maize starch, wheat starch, rice starch, potato starch, gelatin, gum tragacanth, methyl cellulose, hydroxypropylmethyl-cellulose, sodium carboxymethylcellulose; and/or physiologically acceptable polymers such as polyvinylpyrrolidone (PVP). If desired, disintegrating agents may be added, such as cross-linked polyvinylpyrrolidone, agar, or alginic acid or a salt thereof such as sodium alginate. Dragee cores are provided with suitable coatings. For this purpose, concentrated sugar solutions may be used which may optionally contain gum arabic, talc, polyvinylpyrrolidone, carbopol gel, polyethylene glycol, titanium dioxide, lacquer solutions and suitable organic solvents or solvent mixtures. Dyestuffs or pigments may be added to the tablets or dragee coatings for identification or to characterize different combinations of active compound doses. Pharmaceutical compositions which can be used orally include push-fit capsules made of gelatin as well as soft, sealed capsules made of gelatin and a plasticizer, such as glycerol or sorbitol. The push-fit capsules may contain the active ingredients in admixture with filler such as lactose, binders such as starches, lubricants such as talc or magnesium stearate and, optionally, stabilizers. In soft capsules, the active ingredients may be dissolved or suspended in suitable liquids, such as fatty oils, liquid paraffin, or liquid polyethylene glycols. In addition, stabilizers may be added. All formulations for oral administration should be in dosages suitable for the chosen route of administration. For buccal administration, the compositions may take the form of tablets or lozenges formulated in conventional manner.
For administration by nasal inhalation, the active ingredients for use according to the present invention are conveniently delivered in the form of an aerosol spray presentation from a pressurized pack or a nebulizer with the use of a suitable propellant, e.g., dichlorodifluoromethane, trichlorofluoromethane, dichloro-tetrafluoroethane or carbon dioxide. In the case of a pressurized aerosol, the dosage unit may be determined by providing a valve to deliver a metered amount. Capsules and cartridges of, e.g., gelatin for use in a dispenser may be formulated containing a powder mix of the compound and a suitable powder base such as lactose or starch. The preparations described herein may be formulated for parenteral administration, e.g., by bolus injection or continuous infusion. Formulations for injection may be presented in unit dosage form, e.g., in ampoules or in multidose containers with, optionally, an added preservative. The compositions may be suspensions, solutions or emulsions in oily or aqueous vehicles, and may contain formulatory agents such as suspending, stabilizing and/or dispersing agents. Pharmaceutical compositions for parenteral administration include aqueous solutions of the active preparation in water-soluble form. Additionally, suspensions of the active ingredients may be prepared as appropriate oily or water-based injection suspensions. Suitable lipophilic solvents or vehicles include fatty oils such as sesame oil, or synthetic fatty acid esters such as ethyl oleate, triglycerides or liposomes. Aqueous injection suspensions may contain substances which increase the viscosity of the suspension, such as sodium carboxymethyl cellulose, sorbitol or dextran. Optionally, the suspension may also contain suitable stabilizers or agents which increase the solubility of the active ingredients to allow for the preparation of highly concentrated solutions.
Alternatively, the active ingredient may be in powder form for constitution with a suitable vehicle, e.g., sterile, pyrogen-free water-based solution, before use. The preparation of the present invention may also be formulated in rectal compositions such as suppositories or retention enemas, using, e.g., conventional suppository bases such as cocoa butter or other glycerides. Pharmaceutical compositions suitable for use in context of the present invention include compositions wherein the active ingredients are contained in an amount effective to achieve the intended purpose. More specifically, a therapeutically effective amount means an amount of active ingredients effective to prevent, alleviate or ameliorate symptoms of disease or prolong the survival of the subject being treated. Determination of a therapeutically effective amount is well within the capability of those skilled in the art. For any preparation used in the methods of the invention, the therapeutically effective amount or dose can be estimated initially from in vitro assays. For example, a dose can be formulated in animal models and such information can be used to more accurately determine useful doses in humans. Toxicity and therapeutic efficacy of the active ingredients described herein can be determined by standard pharmaceutical procedures in vitro, in cell cultures or experimental animals. The data obtained from these in vitro and cell culture assays and animal studies can be used in formulating a range of dosage for use in humans. The dosage may vary depending upon the dosage form employed and the route of administration utilized. The exact formulation, route of administration and dosage can be chosen by the individual physician in view of the patient's condition. (See e.g., Fingl, et al., 1975, in "The Pharmacological Basis of Therapeutics", Ch. 1 p. 1).
Depending on the severity and responsiveness of the condition to be treated, dosing can be of a single or a plurality of administrations, with course of treatment lasting from several days to several weeks or until cure is effected or diminution of the disease state is achieved. The amount of a composition to be administered will, of course, be dependent on the subject being treated, the severity of the affliction, the manner of administration, the judgment of the prescribing physician, etc. Compositions including the preparation of the present invention formulated in a compatible pharmaceutical carrier may also be prepared, placed in an appropriate container, and labeled for treatment of an indicated condition. Pharmaceutical compositions of the present invention may, if desired, be presented in a pack or dispenser device, such as an FDA approved kit, which may contain one or more unit dosage forms containing the active ingredient. The pack may, for example, comprise metal or plastic foil, such as a blister pack. The pack or dispenser device may be accompanied by instructions for administration. The pack or dispenser may also be accompanied by a notice associated with the container in a form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals, which notice is reflective of approval by the agency of the form of the compositions or human or veterinary administration. Such notice, for example, may be of labeling approved by the U.S. Food and Drug Administration for prescription drugs or of an approved product insert.
IMMUNOGENIC COMPOSITIONS
A therapeutic agent according to the present invention may optionally be a molecule which promotes a specific immunogenic response against at least one of the polypeptides of the present invention in the subject. The molecule can be polypeptide variants of the present invention, a fragment derived therefrom or a nucleic acid sequence encoding thereof. Although such a molecule can be provided to the subject per se, the agent is preferably administered with an immunostimulant in an immunogenic composition. An immunostimulant may be any substance that enhances or potentiates an immune response (antibody and/or cell-mediated) to an exogenous antigen. Examples of immunostimulants include adjuvants, biodegradable microspheres (e.g., polylactic galactide) and liposomes into which the compound is incorporated (see e.g., U.S. Pat. No. 4,235,877). Vaccine preparation is generally described in, for example, M. F. Powell and M. J. Newman, eds., "Vaccine Design (the subunit and adjuvant approach)," Plenum Press (NY, 1995). Illustrative immunogenic compositions may contain DNA encoding one or more of the polypeptides as described above, such that the polypeptide is generated in situ. The DNA may be present within any of a variety of delivery systems known to those of ordinary skill in the art, including nucleic acid expression systems (see below), bacteria and viral expression systems. Numerous gene delivery techniques are well known in the art, such as those described by Rolland, Crit. Rev. Therap. Drug Carrier Systems 15:143-198, 1998, and references cited therein. Appropriate nucleic acid expression systems contain the necessary DNA sequences for expression in the subject (such as a suitable promoter and terminating signal). Bacterial delivery systems involve the administration of a bacterium (such as Bacillus Calmette-Guerin) that expresses an immunogenic portion of the polypeptide on its cell surface or secretes such an epitope.
In a preferred embodiment, the DNA may be introduced using a viral expression system (e.g., vaccinia or other pox virus, retrovirus, or adenovirus), which may involve the use of a non-pathogenic (defective), replication competent virus. Suitable systems are disclosed, for example, in Fisher-Hoch et al., Proc. Natl. Acad. Sci. USA 86:317-321, 1989; Flexner et al., Ann. N.Y. Acad. Sci. 569:86-103, 1989; Flexner et al., Vaccine 8:17-21, 1990; U.S. Pat. Nos. 4,603,112, 4,769,330, and 5,017,487; WO 89/01973; U.S. Pat. No. 4,777,127; GB 2,200,651; EP 0,345,242; WO 91/02805; Berkner, Biotechniques 6:616-627, 1988; Rosenfeld et al., Science 252:431-434, 1991; Kolls et al., Proc. Natl. Acad. Sci. USA 91:215-219, 1994; Kass-Eisler et al., Proc. Natl. Acad. Sci. USA 90:11498-11502, 1993; Guzman et al., Circulation 88:2838-2848, 1993; and Guzman et al., Cir. Res. 73:1202-1207, 1993. Techniques for incorporating DNA into such expression systems are well known to those of ordinary skill in the art. The DNA may also be "naked," as described, for example, in Ulmer et al., Science 259:1745-1749, 1993 and reviewed by Cohen, Science 259:1691-1692, 1993. The uptake of naked DNA may be increased by coating the DNA onto biodegradable beads, which are efficiently transported into the cells. It will be appreciated that an immunogenic composition may comprise both a polynucleotide and a polypeptide component. Such immunogenic compositions may provide for an enhanced immune response. Any of a variety of immunostimulants may be employed in the immunogenic compositions of this invention. For example, an adjuvant may be included. Most adjuvants contain a substance designed to protect the antigen from rapid catabolism, such as aluminum hydroxide or mineral oil, and a stimulator of immune responses, such as lipid A, Bordetella pertussis or Mycobacterium tuberculosis derived proteins.
Suitable adjuvants are commercially available as, for example, Freund's Incomplete Adjuvant and Complete Adjuvant (Difco Laboratories, Detroit, Mich.); Merck Adjuvant 65 (Merck and Company, Inc., Rahway, N.J.); AS-2 (SmithKline Beecham, Philadelphia, Pa.); aluminum salts such as aluminum hydroxide gel (alum) or aluminum phosphate; salts of calcium, iron or zinc; an insoluble suspension of acylated tyrosine; acylated sugars; cationically or anionically derivatized polysaccharides; polyphosphazenes; biodegradable microspheres; monophosphoryl lipid A and quil A. Cytokines, such as GM-CSF or interleukin-2, -7, or -12, may also be used as adjuvants. The adjuvant composition may be designed to induce an immune response predominantly of the Th1 type. High levels of Th1-type cytokines (e.g., IFN-γ, TNF-α, IL-2 and IL-12) tend to favor the induction of cell mediated immune responses to an administered antigen. In contrast, high levels of Th2-type cytokines (e.g., IL-4, IL-5, IL-6 and IL-10) tend to favor the induction of humoral immune responses. Following application of an immunogenic composition as provided herein, the subject will support an immune response that includes Th1- and Th2-type responses. The levels of these cytokines may be readily assessed using standard assays. For a review of the families of cytokines, see Mosmann and Coffman, Ann. Rev. Immunol. 7:145-173, 1989. Preferred adjuvants for use in eliciting a predominantly Th1-type response include, for example, a combination of monophosphoryl lipid A, preferably 3-de-O-acylated monophosphoryl lipid A (3D-MPL), together with an aluminum salt. MPL adjuvants are available from Corixa Corporation (Seattle, Wash.; see U.S. Pat. Nos. 4,436,727; 4,877,611; 4,866,034 and 4,912,094). CpG-containing oligonucleotides (in which the CpG dinucleotide is unmethylated) also induce a predominantly Th1 response.
Such oligonucleotides are well known and are described, for example, in WO 96/02555, WO 99/33488 and U.S. Pat. Nos. 6,008,200 and 5,856,462. Immunostimulatory DNA sequences are also described, for example, by Sato et al., Science 273:352, 1996. Another preferred adjuvant is a saponin, preferably QS21 (Aquila Biopharmaceuticals Inc., Framingham, Mass.), which may be used alone or in combination with other adjuvants. For example, an enhanced system involves the combination of a monophosphoryl lipid A and saponin derivative, such as the combination of QS21 and 3D-MPL as described in WO 94/00153, or a less reactogenic composition where the QS21 is quenched with cholesterol, as described in WO 96/33739. Other preferred formulations comprise an oil-in-water emulsion and tocopherol. A particularly potent adjuvant formulation involving QS21, 3D-MPL and tocopherol in an oil-in-water emulsion is described in WO 95/17210. Other preferred adjuvants include Montanide ISA 720 (Seppic, France), SAF (Chiron, Calif., United States), ISCOMS (CSL), MF-59 (Chiron), the SBAS series of adjuvants (e.g., SBAS-2 or SBAS-4, available from SmithKline Beecham, Rixensart, Belgium), Detox (Corixa, Hamilton, Mont.), RC-529 (Corixa, Hamilton, Mont.) and other aminoalkyl glucosaminide 4-phosphates (AGPs), such as those described in pending U.S. patent application Ser. Nos. 08/853,826 and 09/074,720. A delivery vehicle may be employed within the immunogenic composition of the present invention to facilitate production of an antigen-specific immune response that targets tumor cells. Delivery vehicles include antigen presenting cells (APCs), such as dendritic cells, macrophages, B cells, monocytes and other cells that may be engineered to be efficient APCs.
Such cells may be genetically modified to increase the capacity for presenting the antigen, to improve activation and/or maintenance of the T cell response, to have anti-tumor effects per se and/or to be immunologically compatible with the receiver (i.e., matched HLA haplotype). APCs may generally be isolated from any of a variety of biological fluids and organs, including tumor and peritumoral tissues, and may be autologous, allogeneic, syngeneic or xenogeneic cells. Dendritic cells are highly potent APCs (Banchereau and Steinman, Nature 392:245-251, 1998) and have been shown to be effective as a physiological adjuvant for eliciting prophylactic or therapeutic antitumor immunity (see Timmerman and Levy, Ann. Rev. Med. 50:507-529, 1999). In general, dendritic cells may be identified based on their typical shape (stellate in situ, with marked cytoplasmic processes (dendrites) visible in vitro), their ability to take up, process and present antigens with high efficiency and their ability to activate naive T cell responses. Dendritic cells may, of course, be engineered to express specific cell-surface receptors or ligands that are not commonly found on dendritic cells in vivo or ex vivo, and such modified dendritic cells are contemplated by the present invention. As an alternative to dendritic cells, vesicles secreted by antigen-loaded dendritic cells (called exosomes) may be used within an immunogenic composition (see Zitvogel et al., Nature Med. 4:594-600, 1998). Dendritic cells and progenitors may be obtained from peripheral blood, bone marrow, tumor-infiltrating cells, peritumoral tissue-infiltrating cells, lymph nodes, spleen, skin, umbilical cord blood or any other suitable tissue or fluid. For example, dendritic cells may be differentiated ex vivo by adding a combination of cytokines such as GM-CSF, IL-4, IL-13 and/or TNF-α to cultures of monocytes harvested from peripheral blood.
Alternatively, CD34 positive cells harvested from peripheral blood, umbilical cord blood or bone marrow may be differentiated into dendritic cells by adding to the culture medium combinations of GM-CSF, IL-3, TNF-α, CD40 ligand, LPS, flt3 ligand and/or other compound(s) that induce differentiation, maturation and proliferation of dendritic cells. Dendritic cells are categorized as "immature" and "mature" cells, which allows a simple way to discriminate between two well characterized phenotypes. Immature dendritic cells are characterized as APCs with a high capacity for antigen uptake and processing, which correlates with the high expression of Fcγ receptor and mannose receptor. The mature phenotype is typically characterized by a lower expression of these markers, but a high expression of cell surface molecules responsible for T cell activation such as class I and class II MHC, adhesion molecules (e.g., CD54 and CD11) and costimulatory molecules (e.g., CD40, CD80, CD86 and 4-1BB). APCs may generally be transfected with at least one polynucleotide encoding a polypeptide of the present invention, such that variant II, or an immunogenic portion thereof, is expressed on the cell surface. Such transfection may take place ex vivo, and a composition comprising such transfected cells may then be used for therapeutic purposes, as described herein. Alternatively, a gene delivery vehicle that targets a dendritic or other antigen presenting cell may be administered to the subject, resulting in transfection that occurs in vivo. In vivo and ex vivo transfection of dendritic cells, for example, may generally be performed using any methods known in the art, such as those described in WO 97/24447, or the gene gun approach described by Mahvi et al., Immunology and Cell Biology 75:456-460, 1997.
Antigen loading of dendritic cells may be achieved by incubating dendritic cells or progenitor cells with a polypeptide of the present invention, DNA (naked or within a plasmid vector) or RNA; or with antigen-expressing recombinant bacteria or viruses (e.g., vaccinia, fowlpox, adenovirus or lentivirus vectors). Prior to loading, the polypeptide may be covalently conjugated to an immunological partner that provides T cell help (e.g., a carrier molecule) such as described above. Alternatively, a dendritic cell may be pulsed with a non-conjugated immunological partner, separately or in the presence of the polypeptide.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination. Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims. All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention.

Claims

WHAT IS CLAIMED IS:
1. An isolated polynucleotide comprising a polynucleotide having a sequence selected from the group consisting of: R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5, or R11723_PEA_1_T6.
2. An isolated polynucleotide comprising a node having a sequence selected from the group consisting of: R11723_PEA_1_node_13, R11723_PEA_1_node_16, R11723_PEA_1_node_19, R11723_PEA_1_node_2, R11723_PEA_1_node_22, R11723_PEA_1_node_31, R11723_PEA_1_node_10, R11723_PEA_1_node_11, R11723_PEA_1_node_15, R11723_PEA_1_node_18, R11723_PEA_1_node_20, R11723_PEA_1_node_21, R11723_PEA_1_node_23, R11723_PEA_1_node_24, R11723_PEA_1_node_25, R11723_PEA_1_node_26, R11723_PEA_1_node_27, R11723_PEA_1_node_28, R11723_PEA_1_node_29, R11723_PEA_1_node_3, R11723_PEA_1_node_30, R11723_PEA_1_node_4, R11723_PEA_1_node_5, R11723_PEA_1_node_6, R11723_PEA_1_node_7 or R11723_PEA_1_node_8.
3. An isolated polypeptide comprising a polypeptide having a sequence selected from the group consisting of: R11723_PEA_1_P2, R11723_PEA_1_P6, R11723_PEA_1_P7, R11723_PEA_1_P13, or R11723_PEA_1_P10.
4. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALHKAGAVGGGVR corresponding to amino acids 1 - 110 of R11723_PEA_1_P6, and a second amino acid sequence being at least 90% homologous to
MYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHV RPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 1 - 112 of Q8IXM0, which also corresponds to amino acids 111 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
5. An isolated polypeptide encoding for a head of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALHKAGAVGGGVR of R11723_PEA_1_P6.
6. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 1 - 83 of Q96AC2, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLL RGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQ CHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of
R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
7. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLL RGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQ CHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
8. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 1 - 83 of Q8N2G4, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLL RGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQ CHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of
R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
9. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLL RGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQ CHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
10. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 24 - 106 of BAC85518, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLL
RGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQ
CHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of
R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
11. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLL RGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQ CHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
12. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSAG corresponding to amino acids 1 - 64 of Q96AC2, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of
R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
13. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
14. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSAG corresponding to amino acids 1 - 64 of Q8N2G4, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of
R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
15. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
16. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG corresponding to amino acids 1 - 5 of R11723_PEA_1_P7, a second amino acid sequence being at least 90% homologous to
IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 22 - 80 of BAC85273, which also corresponds to amino acids 6 - 64 of R11723_PEA_1_P7, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of
R11723_PEA_1_P7, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
17. An isolated polypeptide encoding for a head of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG of R11723_PEA_1_P7.
18. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
19. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSAG corresponding to amino acids 24 - 87 of BAC85518, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of
R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
20. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
21. An isolated chimeric polypeptide encoding for R11723_PEA_1_P13, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSA corresponding to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DTKRTNTLLFEMRHFAKQLTT corresponding to amino acids 64 - 84 of R11723_PEA_1_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
22. An isolated polypeptide encoding for a tail of R11723_PEA_1_P13, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DTKRTNTLLFEMRHFAKQLTT in R11723_PEA_1_P13.
23. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
24. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
25. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q8N2G4, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
26. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
27. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG corresponding to amino acids 1 - 5 of R11723_PEA_1_P10, a second amino acid sequence being at least 90% homologous to
IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 22 - 79 of BAC85273, which also corresponds to amino acids 6 - 63 of R11723_PEA_1_P10, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
28. An isolated polypeptide encoding for a head of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG of R11723_PEA_1_P10.
29. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
30. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 24 - 86 of BAC85518, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
31. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
32. An isolated oligonucleotide, comprising an amplicon selected from the group consisting of SEQ ID NOs: 975 or 978.
33. A primer pair, comprising a pair of isolated oligonucleotides capable of amplifying said amplicon of claim 32.
34. The primer pair of claim 33, comprising a pair of isolated oligonucleotides selected from the group consisting of: SEQ NOs 972 and 973; or 976 and 977.
35. An antibody capable of specifically binding to an epitope of an amino acid sequence of any of claims 3-31.
36. The antibody of claim 35, wherein said amino acid sequence comprises said tail of claims 4-31.
37. The antibody of claims 35 or 36, wherein said antibody is capable of differentiating between a splice variant having said epitope and a corresponding known protein PSEC.
38. A kit for detecting ovarian cancer, comprising a kit detecting overexpression of a splice variant according to any of the above claims.
39. The kit of claim 38, wherein said kit comprises a NAT-based technology.
40. The kit of claim 39, wherein said kit further comprises at least one primer pair capable of selectively hybridizing to a nucleic acid sequence according to claims 1 or 2.
41. The kit of claim 38, wherein said kit further comprises at least one oligonucleotide capable of selectively hybridizing to a nucleic acid sequence according to claims 1 or 2.
42. The kit of claim 38, wherein said kit comprises an antibody according to any of claims 35-37.
43. The kit of claim 42, wherein said kit further comprises at least one reagent for performing an ELISA or a Western blot.
44. A method for detecting ovarian cancer, comprising detecting overexpression of a splice variant according to any of the above claims.
45. The method of claim 44, wherein said detecting overexpression is performed with a NAT-based technology.
46. The method of claim 44, wherein said detecting overexpression is performed with an immunoassay.
47. The method of claim 46, wherein said immunoassay comprises an antibody according to any of the above claims.
48. A biomarker capable of detecting ovarian cancer, comprising any of the above nucleic acid sequences or a fragment thereof, or any of the above amino acid sequences or a fragment thereof.
49. A method for screening for ovarian cancer, comprising detecting ovarian cancer cells with a biomarker or an antibody or a method or assay according to any of the above claims.
50. A method for diagnosing ovarian cancer, comprising detecting ovarian cancer cells with a biomarker or an antibody or a method or assay according to any of the above claims.
51. A method for monitoring disease progression and/or treatment efficacy and/or relapse of ovarian cancer, comprising detecting ovarian cancer cells with a biomarker or an antibody or a method or assay according to any of the above claims.
52. A method of selecting a therapy for ovarian cancer, comprising detecting ovarian cancer cells with a biomarker or an antibody or a method or assay according to any of the above claims and selecting a therapy according to said detection.
EP05780004A 2004-01-27 2005-01-27 Differential expression of markers in ovarian cancer Ceased EP1721257A2 (en)

Applications Claiming Priority (30)

Application Number Priority Date Filing Date Title
US53912904P 2004-01-27 2004-01-27
US53912804P 2004-01-27 2004-01-27
US62085304P 2004-10-22 2004-10-22
US62087404P 2004-10-22 2004-10-22
US62097404P 2004-10-22 2004-10-22
US62097504P 2004-10-22 2004-10-22
US62065604P 2004-10-22 2004-10-22
US62091604P 2004-10-22 2004-10-22
US62091804P 2004-10-22 2004-10-22
US62092404P 2004-10-22 2004-10-22
US62091704P 2004-10-22 2004-10-22
US62067704P 2004-10-22 2004-10-22
US62100404P 2004-10-22 2004-10-22
US62113104P 2004-10-25 2004-10-25
US62201704P 2004-10-27 2004-10-27
US62232004P 2004-10-27 2004-10-27
US62815604P 2004-11-17 2004-11-17
US62816704P 2004-11-17 2004-11-17
US62819004P 2004-11-17 2004-11-17
US62811104P 2004-11-17 2004-11-17
US62812304P 2004-11-17 2004-11-17
US62810104P 2004-11-17 2004-11-17
US62811204P 2004-11-17 2004-11-17
US62814504P 2004-11-17 2004-11-17
US62825104P 2004-11-17 2004-11-17
US62813404P 2004-11-17 2004-11-17
US62817804P 2004-11-17 2004-11-17
US62823104P 2004-11-17 2004-11-17
US63055904P 2004-11-26 2004-11-26
PCT/IB2005/002555 WO2005116850A2 (en) 2004-01-27 2005-01-27 Differential expression of markers in ovarian cancer

Publications (1)

Publication Number Publication Date
EP1721257A2 (en) 2006-11-15

Family

ID=37054357

Family Applications (1)

Application Number Title Priority Date Filing Date
EP05780004A Ceased EP1721257A2 (en) 2004-01-27 2005-01-27 Differential expression of markers in ovarian cancer

Country Status (4)

Country Link
EP (1) EP1721257A2 (en)
AU (1) AU2005248530A1 (en)
CA (1) CA2554703A1 (en)
WO (1) WO2005116850A2 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8642031B2 (en) 2006-11-02 2014-02-04 Acceleron Pharma, Inc. Antagonists of BMP9, BMP10, ALK1 and other ALK1 ligands, and uses thereof
US10059756B2 (en) 2006-11-02 2018-08-28 Acceleron Pharma Inc. Compositions comprising ALK1-ECD protein
RU2559532C2 (en) 2006-11-02 2015-08-10 Acceleron Pharma, Inc. Antagonists of ALK1 receptor and ligands and their application
BRPI0911853A8 (en) 2008-05-02 2018-03-06 Acceleron Pharma Inc compositions and methods for angiogenesis modulation and pericyte composition
CN104479018B (en) 2008-12-09 2018-09-21 F. Hoffmann-La Roche AG Anti-PD-L1 antibodies and their use to enhance T-cell function

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7022821B1 (en) * 1998-02-20 2006-04-04 O'brien Timothy J Antibody kit for the detection of TADG-15 protein

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2005116850A2 *

Also Published As

Publication number Publication date
AU2005248530A1 (en) 2005-12-08
WO2005116850A3 (en) 2010-06-17
CA2554703A1 (en) 2005-12-08
WO2005116850A2 (en) 2005-12-08
WO2005116850A9 (en) 2006-12-07

Similar Documents

Publication Publication Date Title
US7553948B2 (en) Nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of ovarian cancer
US7488813B2 (en) Diagnostic markers, especially for in vivo imaging, and assays and methods of use thereof
US20060147946A1 (en) Novel calcium channel variants and methods of use thereof
WO2007039903A2 (en) Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis
KR20190087106A (en) Biomarkers for predicting the response of anticancer drugs to gastric cancer and their uses
WO2006054297A2 (en) Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis
US20060263786A1 (en) Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of colon cancer
EP1721257A2 (en) Differential expression of markers in ovarian cancer
WO2006131783A2 (en) Polynucleotides, polypeptides, and diagnosing lung cancer
WO2005072050A2 (en) Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of breast cancer
WO2010061393A1 (en) He4 variant nucleotide and amino acid sequences, and methods of use thereof
US7528243B2 (en) Nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of breast cancer
EP1749025A2 (en) Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of colon cancer
US8216792B2 (en) Compositions and methods for detection and treatment of proliferative abnormalities associated with overexpression of human transketolase like-1 gene
WO2006043271A1 (en) Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis
US7906635B2 (en) Nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of ovarian cancer
WO2005107364A9 (en) Polynucleotide, polypeptides, and diagnostic methods
WO2006021874A2 (en) Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of prostate cancer
EP1735468A2 (en) Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of prostate cancer
EP1732943A2 (en) Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of breast cancer
EP1545566A2 (en) Sim2 polypeptides and polynucleotides and uses of each in diagnosis and treatment of ovarian, breast and lung cancers
JP2007520217A (en) Novel nucleotide and amino acid sequences, and assays and methods of use for breast cancer diagnosis using the same
AU2005276208A1 (en) Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of prostate cancer

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20060825

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU MC NL PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL BA HR LV MK YU

RIN1 Information on inventor provided before grant (corrected)

Inventor name: SHKLAR, MAXIM

Inventor name: SHEMESH, RONEN

Inventor name: SAMEAH-GREENWALD, SHIRLEY

Inventor name: WALACH, SHIRA

Inventor name: SELLA-TAVOR, OSNAT

Inventor name: COHEN, YOSSI

Inventor name: NOVIK, AMIT

Inventor name: AKIVA, PINCHAS

Inventor name: AYALON-SOFFER, MICHAL

Inventor name: DAHARY, DVIR

Inventor name: SOREK, ROTEM

Inventor name: TOPORIK, AMIR

Inventor name: KOL, GUY

Inventor name: DIBER, ALEXANDER

Inventor name: LEVINE, ZURIT

Inventor name: POLLOCK, SARAH

Inventor name: COJOCARU, GAD, S.

DAX Request for extension of the european patent (deleted)
PUAK Availability of information related to the publication of the international search report

Free format text: ORIGINAL CODE: 0009015

RIC1 Information provided on ipc code assigned before grant

Ipc: C07H 21/02 20060101AFI20101006BHEP

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED

18R Application refused

Effective date: 20101227