EP1732943A2 - Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of breast cancer - Google Patents

Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of breast cancer

Info

Publication number
EP1732943A2
EP1732943A2 EP05718182A EP05718182A EP1732943A2 EP 1732943 A2 EP1732943 A2 EP 1732943A2 EP 05718182 A EP05718182 A EP 05718182A EP 05718182 A EP05718182 A EP 05718182A EP 1732943 A2 EP1732943 A2 EP 1732943A2
Authority
EP
European Patent Office
Prior art keywords
amino acid
amino acids
homologous
pea
acid sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP05718182A
Other languages
German (de)
French (fr)
Other versions
EP1732943A4 (en)
Inventor
Amir Toporik
Dvir Dahary
Rotem Sorek
Sarah Pollock
Zurit Levine
Pinchas Akiva
Alexander Diber
Amit Novik
Osnat Sella-Tavor
Michal Ayalon-Sofer
Shira Walach
Shirley Sameah-Greenwald
Ronen Shemesh
Naomi Keren
Maxim Shklar
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Compugen USA Inc
Original Assignee
Compugen USA Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Compugen USA Inc
Priority claimed from PCT/IB2005/000433 (WO2005072050A2)
Publication of EP1732943A2
Publication of EP1732943A4
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/574 Immunoassay; Biospecific binding assay; Materials therefor for cancer
    • G01N33/57407 Specifically defined cancers
    • G01N33/57415 Specifically defined cancers of breast
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers

Definitions

  • the present invention is related to novel nucleotide and protein sequences that are diagnostic markers for breast cancer, and assays and methods of use thereof.
  • Breast cancer is the most commonly occurring cancer in women, comprising almost a third of all malignancies in females. It is the leading cause of death for women between the ages of 40 and 55 in the United States, and one out of 8 females in the United States will develop breast cancer at some point in her life. The death rate from breast cancer has been slowly declining over the past decade, partially due to the use of molecular markers that facilitate the discovery, tumor typing (and therefore choice of treatment), response to treatment and recurrence.
  • the most widely used serum markers for breast cancers are Mucin 1 (measured as CA 15-3) and CEA (CarcinoEmbryonic Antigen). Mucin 1 (MUC1) is present on the apical surface of normal epithelial cells.
  • VNTR Variable Number Tandem Repeat
  • CA 15-3 is a broadly used marker for breast cancer; a combination of CA 15-3 and CEA is more sensitive than using a single marker.
  • CA 15-3, CEA and ESR (Erythrocyte Sedimentation Rate)
  • Serum markers used to monitor therapeutic response in patients with metastatic breast cancer are associated with the "spike phenomenon". It is an initial transient rise of tumor marker levels which can be seen in up to 30% of responders in the first 3 months of commencing a therapy. It is important not to interpret this as a sign of disease progression leading to premature change of an effective therapy.
  • CA 27.29 is a monoclonal antibody directed against a different part of MUC1, and it is a newer marker than CA 15-3. It detects a different glycosylation pattern of MUC1, as compared with CA 15-3.
  • CA 27.29 is the first FDA-approved blood test for breast cancer recurrence. Because of superior sensitivity and specificity, CA 27.29 has supplanted CA 15-3 as the preferred tumor marker in breast cancer.
  • the CA 27.29 level is elevated in approximately one third of women with early-stage breast cancer (stage I or II) and in two thirds of women with late-stage disease (stage III or IV).
  • CA 27.29 lacks predictive value in the earliest stages of breast cancer and thus has no role in screening for or diagnosing the malignancy.
  • CA 27.29 also can be found in patients with benign disorders of the breast, liver, and kidney, and in patients with ovarian cysts. However, CA 27.29 levels higher than 100 units per mL are rare in benign conditions.
  • Estrogen 2 (beta) was shown to have a diagnostic role in breast cancer. It has been shown that the expression of the 'ex' variant of Estrogen 2 is correlated with response to hormone adjuvant therapy. In addition, it has been shown that it may assist in better characterization of ER-1 positive breast cancers (together with progesterone receptor).
  • HER-2 also known as c-erbB2 is a membrane proto-oncogene with intrinsic tyrosine kinase activity.
  • Tumors expressing HER-2 are associated with shorter survival, shorter time-to-relapse and an overall worse prognosis. Tumors expressing HER-2 can be targeted with Trastuzumab, a biological adjuvant therapy which blocks the growth-promoting action of HER-2.
  • the ImmunoHistoChemistry (IHC) and Fluorescence In Situ Hybridization (FISH) tests are used to detect HER2: 1. IHC: The most common test used to check HER2 status is an ImmunoHistoChemistry (IHC) test. The IHC test measures the protein made by the HER2 gene. 2. FISH: This test measures the number of copies of the HER2 gene present in the tumor cell.
  • HER-2 has been reported to show a better assessment of response to chemotherapy than a biochemical index score based on measurement of CA 15-3, CEA and ESR in a small series of patients. That finding is yet to be confirmed in a larger group of patients with HER-2 expressing tumors.
  • Other molecular markers mainly used for the diagnosis for cancers other than breast cancer were shown to have a diagnostic potential in breast cancer.
  • CA125 which is a major marker for ovarian cancer is also associated with breast cancer.
  • High levels of CA 19-9, a major marker for colorectal and pancreatic cancers, can be found in breast cancer. Overall, these markers are not frequently used for the detection of breast cancer due to their inferiority compared with other markers already described.
  • markers for the diagnosis and typing of breast cancer are being used by pathologists, including both the markers described above and additional markers, such as immunohistochemistry markers that have been shown to have a beneficial value for the diagnosis of breast cancer. PCNA and Ki-67 are perhaps the most important and most widely used immunohistochemistry markers for breast cancer.
  • Other markers such as E-Cadherin, Cathepsin D and TFF1 are also used for that purpose.
  • Cyclin E expression level in breast cancer was found to be a very strong indicator for prognosis, stronger than any other biological marker.
  • There are pathological conditions which represent an increased risk factor for the development of breast cancer.
  • Non-limiting examples of these conditions include:
      - Ductal hyperplasia without atypia. It is the most frequently encountered breast biopsy result that is associated with increased risk of future development of breast cancer (2-fold increased risk). The loss of expression of transforming growth factor beta receptor II in the affected epithelial cells is associated with an increased risk of invasive breast cancer.
      - Atypical hyperplasia. Women having atypical hyperplasia with over-expression of HER-2 have a greater than 7-fold increased risk of developing invasive breast carcinoma, as compared with women with non-proliferative benign breast lesions and no evidence of HER-2 amplification.
  • These pathological conditions should be effectively diagnosed and monitored in order to facilitate early detection of breast cancer.
  • the background art does not teach or suggest markers for breast cancer that are sufficiently sensitive and/or accurate, alone or in combination.
  • the present invention overcomes these deficiencies of the background art by providing novel markers for breast cancer that are both sensitive and accurate. These markers are overexpressed in breast cancer specifically, as opposed to normal breast tissue.
  • the measurement of these markers, alone or in combination, in patient (biological) samples provides information that the diagnostician can correlate with a probable diagnosis of breast cancer.
  • the markers of the present invention alone or in combination, show a high degree of differential detection between breast cancer and non-cancerous states.
  • suitable biological samples which may optionally be used with preferred embodiments of the present invention include but are not limited to blood, serum, plasma, blood cells, urine, sputum, saliva, stool, spinal fluid or CSF, lymph fluid, the external secretions of the skin, respiratory, intestinal, and genitourinary tracts, tears, milk, neuronal tissue, breast tissue, any human organ or tissue, including any tumor or normal tissue, any sample obtained by lavage (for example of the bronchial system or of the breast ductal system), and also samples of in vivo cell culture constituents.
  • the biological sample comprises breast tissue and/or a serum sample and/or a urine sample and/or a milk sample and/or any other tissue or liquid sample.
  • the sample can optionally be diluted with a suitable eluant before contacting the sample to an antibody and/or performing any other diagnostic assay.
  • Information given in the text with regard to cellular localization was determined according to four different software programs: (i) tmhmm (from Center for Biological Sequence Analysis, Technical University of Denmark DTU, http://www.cbs.dtu.dk/services/TMHMM/TMHMM2.0b.guide.php) or (ii) tmpred (from EMBnet, maintained by the ISREC Bioinformatics group and the LICR Information Technology Office, Ludwig Institute for Cancer Research, Swiss Institute of Bioinformatics, http://www.ch.embnet.org/software/TMPRED_form.html) for transmembrane region prediction; (iii) signalp_hmm or (iv) signalp_nn (both from Center for Biological Sequence Analysis, Technical University of Denmark DTU, http://www.cbs.dtu.dk/services/S
  • "signalp_hmm" and "signalp_nn" refer to two modes of operation for the program SignalP: hmm refers to Hidden Markov Model, while nn refers to neural networks. Localization was also determined through manual inspection of known protein localization and/or gene structure, and the use of heuristics by the individual inventor.
  • T -> C means that the SNP results in a change at the position given in the table from T to C.
  • M -> Q means that the SNP has caused a change in the corresponding amino acid sequence, from methionine (M) to glutamine (Q). If, in place of a letter at the right hand side for the nucleotide sequence SNP, there is a space, it indicates that a frameshift has occurred. A frameshift may also be indicated with a hyphen (-). A stop codon is indicated with an asterisk (*) at the right hand side.
  • a comment may be found in parentheses after the above description of the SNP itself.
  • This comment may include an FTId, which is an identifier to a SwissProt entry that was created with the indicated SNP.
  • An FTId is a unique and stable feature identifier, which allows construction of links directly from position-specific annotation in the feature table to specialized protein-related databases.
  • the header of the first column is "SNP position(s) on amino acid sequence", representing a position of a known mutation on amino acid sequence.
  • SNPs may optionally be used as diagnostic markers according to the present invention, alone or in combination with one or more other SNPs and/or any other diagnostic marker.
  • Preferred embodiments of the present invention comprise such SNPs, including but not limited to novel SNPs on the known (WT or wild type) protein sequences given below, as well as novel nucleic acid and/or amino acid sequences formed through such SNPs, and/or any SNP on a variant amino acid and/or nucleic acid sequence described herein.
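The SNP notation described above (a change written as "X -> Y", with a blank or hyphen on the right-hand side for a frameshift and an asterisk for a stop codon) can be illustrated with a minimal Python sketch. The function name `classify_snp` and the returned dictionary keys are hypothetical and chosen only for illustration; the sketch does not handle the optional parenthesized comment (such as an FTId) and is not the procedure used to build the tables.

```python
def classify_snp(notation: str) -> dict:
    """Classify one SNP entry written as 'X -> Y' (hypothetical helper).

    Rules taken from the notation key in the text:
      - a letter on the right-hand side: substitution (e.g. 'T -> C', 'M -> Q')
      - a blank or a hyphen on the right-hand side: frameshift
      - an asterisk on the right-hand side: stop codon
    """
    left, arrow, right = notation.partition("->")
    if not arrow:
        raise ValueError(f"not an SNP entry: {notation!r}")
    original = left.strip()
    changed = right.strip()
    if changed in ("", "-"):
        effect = "frameshift"
    elif changed == "*":
        effect = "stop codon"
    else:
        effect = "substitution"
    return {"from": original, "to": changed, "effect": effect}


if __name__ == "__main__":
    for entry in ("T -> C", "M -> Q", "G -> ", "G -> -", "W -> *"):
        print(entry, classify_snp(entry))
```

Splitting on the first "->" keeps the hyphen that marks a frameshift ("G -> -") from being confused with the arrow itself.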
  • a key to the p values with regard to the analysis of such overexpression is as follows:
      - library-based statistics: P-value without including the level of expression in cell-lines (P1)
      - library-based statistics: P-value including the level of expression in cell-lines (P2)
      - EST clone statistics: P-value without including the level of expression in cell-lines (SP1)
      - EST clone statistics: predicted overexpression ratio without including the level of expression in cell-lines (R3)
      - EST clone statistics: P-value including the level of expression in cell-lines (SP2)
      - EST clone statistics: predicted overexpression ratio including the level of expression in cell-lines (R4)
  • Library-based statistics refer to statistics over an entire library, while EST clone statistics refer to expression only for ESTs from a particular tissue or cancer.
  • As a microarray reference, in the specific segment paragraphs, the unabbreviated tissue name was used as the reference to the type of chip for which expression was measured. There are two types of microarray results: those from microarrays prepared according to a design by the present inventors, for which the microarray fabrication procedure is described in detail in the Materials and Experimental Procedures section herein; and those from microarrays using Affymetrix technology.
  • the probe name begins with the name of the cluster (gene), followed by an identifying number.
  • Oligonucleotide microarray results taken from Affymetrix data were from chips available from Affymetrix Inc, Santa Clara, CA, USA (see for example data regarding the Human Genome U133 (HG-U133) Set at www.affymetrix.com/products/arrays/specific/hgu133.affx; GeneChip Human Genome U133A 2.0 Array at www.affymetrix.com/products/arrays/specific/hgu133av2.affx; and Human Genome U133 Plus 2.0 Array at www.affymetrix.com/products/arrays/specific/hgu133plus.affx).
  • the probe names follow the Affymetrix naming convention.
  • the data is available from NCBI Gene Expression Omnibus (see www.ncbi.nlm.nih.gov/projects/geo/ and Edgar et al, Nucleic Acids Research, 2002, Vol. 30, No. 1, 207-210).
  • the probes designed by the present inventors are listed below.
  • TAA histograms: The following list of abbreviations for tissues was used in the TAA histograms.
  • TAA: Tumor Associated Antigen
  • TAA histograms represent the cancerous tissue expression pattern as predicted by the biomarkers selection engine, as described in detail in examples 1-5 below.
  • nucleic acid sequences of the present invention refer to portions of nucleic acid sequences that were shown to have one or more properties as described below. They are also the building blocks that were used to construct complete nucleic acid sequences as described in greater detail below.
  • oligonucleotides which are embodiments of the present invention, for example as amplicons, hybridization units and/or from which primers and/or complementary oligonucleotides may optionally be derived, and/or for any other use.
  • breast cancer refers to cancers of the breast or surrounding tissue, including but not limited to ductal carcinoma (in situ or invasive), lobular carcinoma (in situ or invasive), inflammatory breast cancer, mucinous carcinoma, tubular carcinoma, or Paget's disease of the nipple, as well as conditions that are indicative of a higher risk factor for later development of breast cancer, including but not limited to ductal hyperplasia without atypia and atypical hyperplasia, referred to herein collectively as "indicative conditions".
  • the term "marker” in the context of the present invention refers to a nucleic acid fragment, a peptide, or a polypeptide, which is differentially present in a sample taken from subjects (patients) having breast cancer (or one of the above indicative conditions) as compared to a comparable sample taken from subjects who do not have breast cancer (or one of the above indicative conditions).
  • the phrase “differentially present” refers to differences in the quantity of a marker present in a sample taken from patients having breast cancer (or one of the above indicative conditions) as compared to a comparable sample taken from patients who do not have breast cancer (or one of the above indicative conditions).
  • a nucleic acid fragment may optionally be differentially present between the two samples if the amount of the nucleic acid fragment in one sample is significantly different from the amount of the nucleic acid fragment in the other sample, for example as measured by hybridization and/or NAT-based assays.
  • a polypeptide is differentially present between the two samples if the amount of the polypeptide in one sample is significantly different from the amount of the polypeptide in the other sample. It should be noted that if the marker is detectable in one sample and not detectable in the other, then such a marker can be considered to be differentially present.
  • diagnosis means identifying the presence or nature of a pathologic condition. Diagnostic methods differ in their sensitivity and specificity.
  • the "sensitivity” of a diagnostic assay is the percentage of diseased individuals who test positive (percent of "true positives”). Diseased individuals not detected by the assay are “false negatives.” Subjects who are not diseased and who test negative in the assay are termed “true negatives.”
  • the "specificity” of a diagnostic assay is 1 minus the false positive rate, where the "false positive” rate is defined as the proportion of those without the disease who test positive. While a particular diagnostic method may not provide a definitive diagnosis of a condition, it suffices if the method provides a positive indication that aids in diagnosis.
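The definitions of sensitivity and specificity above can be written out as a short Python sketch. The function names and the counts in the usage example are hypothetical; they only illustrate the arithmetic and do not reflect any result reported for the markers.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of diseased individuals who test positive (true positives)."""
    diseased = true_pos + false_neg
    if diseased == 0:
        raise ValueError("no diseased individuals in the sample")
    return true_pos / diseased


def specificity(true_neg: int, false_pos: int) -> float:
    """1 minus the false positive rate, where the false positive rate is the
    proportion of those without the disease who test positive."""
    non_diseased = true_neg + false_pos
    if non_diseased == 0:
        raise ValueError("no non-diseased individuals in the sample")
    false_positive_rate = false_pos / non_diseased
    return 1.0 - false_positive_rate


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(f"sensitivity = {sensitivity(true_pos=80, false_neg=20):.2f}")
    print(f"specificity = {specificity(true_neg=90, false_pos=10):.2f}")
```

Multiplying either value by 100 gives the percentage form used in the text.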
  • Diagnosing refers to classifying a disease or a symptom, determining a severity of the disease, monitoring disease progression, forecasting an outcome of a disease and/or prospects of recovery.
  • the term “detecting” may also optionally encompass any of the above. Diagnosis of a disease according to the present invention can be effected by determining a level of a polynucleotide or a polypeptide of the present invention in a biological sample obtained from the subject, wherein the level determined can be correlated with predisposition to, or presence or absence of the disease.
  • a "biological sample obtained from the subject” may also optionally comprise a sample that has not been physically removed from the subject, as described in greater detail below.
  • the term "level" refers to expression levels of RNA and/or protein or to DNA copy number of a marker of the present invention. Typically the level of the marker in a biological sample obtained from the subject is different (i.e., increased or decreased) from the level of the same variant in a similar sample obtained from a healthy individual (examples of biological samples are described herein). Numerous well known tissue or fluid collection methods can be utilized to collect the biological sample from the subject in order to determine the level of DNA, RNA and/or polypeptide of the variant of interest in the subject. Examples include, but are not limited to, fine needle biopsy, needle biopsy, core needle biopsy and surgical biopsy (e.g., brain biopsy), and lavage.
  • test amount refers to an amount of a marker in a subject's sample that is consistent with a diagnosis of breast cancer (or one of the above indicative conditions).
  • a test amount can be either in absolute amount (e.g., microgram/ml) or a relative amount (e.g., relative intensity of signals).
  • a “control amount” of a marker can be any amount or a range of amounts to be compared against a test amount of a marker.
  • a control amount of a marker can be the amount of a marker in a patient with breast cancer (or one of the above indicative conditions) or a person without breast cancer (or one of the above indicative conditions).
  • a control amount can be either in absolute amount (e.g., microgram/ml) or a relative amount (e.g., relative intensity of signals).
  • Detect refers to identifying the presence, absence or amount of the object to be detected.
  • a “label” includes any moiety or item detectable by spectroscopic, photo chemical, biochemical, immunochemical, or chemical means.
  • useful labels include 32P, 35S, fluorescent dyes, electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin-streptavidin, digoxigenin, haptens and proteins for which antisera or monoclonal antibodies are available, or nucleic acid molecules with a sequence complementary to a target.
  • the label often generates a measurable signal, such as a radioactive, chromogenic, or fluorescent signal, that can be used to quantify the amount of bound label in a sample.
  • the label can be incorporated in or attached to a primer or probe either covalently, or through ionic, van der Waals or hydrogen bonds, e.g., incorporation of radioactive nucleotides, or biotinylated nucleotides that are recognized by streptavidin.
  • the label may be directly or indirectly detectable. Indirect detection can involve the binding of a second label to the first label, directly or indirectly.
  • the label can be the ligand of a binding partner, such as biotin, which is a binding partner for streptavidin, or a nucleotide sequence, which is the binding partner for a complementary sequence, to which it can specifically hybridize.
  • the binding partner may itself be directly detectable, for example, an antibody may be itself labeled with a fluorescent molecule.
  • the binding partner also may be indirectly detectable, for example, a nucleic acid having a complementary nucleotide sequence can be a part of a branched DNA molecule that is in turn detectable through hybridization with other labeled nucleic acid molecules (see, e.g., P. D. Fahrlander and A. Klausner, Bio/Technology 6: 1165 (1988)). Quantitation of the signal is achieved by, e.g., scintillation counting, densitometry, or flow cytometry.
  • Exemplary detectable labels include but are not limited to magnetic beads, fluorescent dyes, radiolabels, enzymes (e.g., horseradish peroxidase, alkaline phosphatase and others commonly used in an ELISA), and colorimetric labels such as colloidal gold or colored glass or plastic beads.
  • the marker in the sample can be detected using an indirect assay, wherein, for example, a second, labeled antibody is used to detect bound marker-specific antibody, and/or in a competition or inhibition assay wherein, for example, a monoclonal antibody which binds to a distinct epitope of the marker is incubated simultaneously with the mixture.
  • Immunoassay is an assay that uses an antibody to specifically bind an antigen.
  • the immunoassay is characterized by the use of specific binding properties of a particular antibody to isolate, target, and/or quantify the antigen.
  • the specified antibodies bind to a particular protein at least two times greater than the background (non-specific signal) and do not substantially bind in a significant amount to other proteins present in the sample.
  • Specific binding to an antibody under such conditions may require an antibody that is selected for its specificity for a particular protein.
  • polyclonal antibodies raised to seminal basic protein from specific species such as rat, mouse, or human can be selected to obtain only those polyclonal antibodies that are specifically immunoreactive with seminal basic protein and not with other proteins, except for polymorphic variants and alleles of seminal basic protein. This selection may be achieved by subtracting out antibodies that cross-react with seminal basic protein molecules from other species.
  • immunoassay formats may be used to select antibodies specifically immunoreactive with a particular protein.
  • solid-phase ELISA immunoassays are routinely used to select antibodies specifically immunoreactive with a protein (see, e.g., Harlow & Lane, Antibodies, A Laboratory Manual (1988), for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity).
  • a specific or selective reaction will be at least twice background signal or noise and more typically more than 10 to 100 times background.
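The threshold stated above for a specific or selective reaction (a signal at least twice background) can be expressed as a small check. This is a minimal sketch under assumptions: the function name `is_specific_binding`, the `min_ratio` parameter and the example readings are hypothetical, and signal and background are assumed to be in the same arbitrary units.

```python
def is_specific_binding(signal: float, background: float,
                        min_ratio: float = 2.0) -> bool:
    """Return True when the signal is at least `min_ratio` times background.

    The default ratio of 2.0 follows the "at least twice background" criterion
    in the text; readings are assumed to share the same (arbitrary) units.
    """
    if background <= 0:
        raise ValueError("background must be a positive reading")
    return signal / background >= min_ratio


if __name__ == "__main__":
    # Hypothetical readings for illustration only.
    print(is_specific_binding(signal=0.9, background=0.3))   # ratio 3.0
    print(is_specific_binding(signal=0.4, background=0.3))   # ratio ~1.3
```

Raising `min_ratio` to 10 or 100 corresponds to the "more typically more than 10 to 100 times background" range mentioned in the same sentence.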
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below amino acid sequence comprising a sequence in the table below:
  • T10888_PEA_1_P6. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • T39971_T5; a nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below: Protein Name: Z21368_PEA_1_P2, Z21368_PEA_1_P5, Z21368_PEA_1_P15, Z21368_PEA_1_P16
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • T59832 a nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below
  • _P18. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or: Transcript Name: Z41644_PEA_1_T5; a nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below: Protein Name: Z41644_PEA_1_P10
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or: Transcript Name: HUMGRP5E_T4, HUMGRP5E_T5; a nucleic acid sequence comprising a sequence in the table below: Segment Name: HUMGRP5E_node_0, HUMGRP5E_node_2, HUMGRP5E_node_8, HUMGRP5E_node_3, HUMGRP5E_node_7
  • an isolated polypeptide comprising an amino acid sequence in the table below
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or: Transcript Name: AA155578_PEA_1_T10, AA155578_PEA_1_T12, AA155578_PEA_1_T8
  • a nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below
  • T94936_PEA_1_P3. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • HSSTROL3_T12; a nucleic acid sequence comprising a sequence in the table below:
  • Segment Name: HSSTROL3_node_6, HSSTROL3_node_10, HSSTROL3_node_13, HSSTROL3_node_15, HSSTROL3_node_19, HSSTROL3_node_21, HSSTROL3_node_24, HSSTROL3_node_25, HSSTROL3_node_26, HSSTROL3_node_28, HSSTROL3_node_29, HSSTROL3_node_11, HSSTROL3_node_17, HSSTROL3_node_18, HSSTROL3_node_20, HSSTROL3_node_27
  • an isolated polypeptide comprising an amino acid sequence in the table below
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or: Transcript Name: AY180924_PEA_1_T1; a nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below: Protein Name: R75793_PEA_1_P2, R75793_PEA_1_P5, R75793_PEA_1_P6
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or: Transcript Name: R20779_T7; a nucleic acid sequence comprising a sequence in the table below: Segment Name: R20779_node_0, R20779_node_2, R20779_node_7, R20779_node_9, R20779_node_18, R20779_node_21, R20779_node_24
  • an isolated polypeptide comprising an amino acid sequence according to R20779_P2.
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence according to HSS100PCB_P3.
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or: Transcript Name HUMOSTRO_PEA_1_PEA_1_T14 HUMOSTRO_PEA_1_PEA_1_T16 HUMOSTRO_PEA_1_PEA_1_T30 a nucleic acid sequence comprising a sequence in the table below: Segment Name HUMOSTRO_PEA_1_PEA_1_node_0 HUMOSTRO_PEA_1_PEA_1_node_10 HUMOSTRO_PEA_1_PEA_1_node_15 HUMOSTRO_PEA_1_PEA_1_node_17 HUMOSTRO_PEA_1_PEA_1_node_20 HUMOSTRO_PEA_1_PEA_1_node_21 HUMOSTRO_PEA_1
  • an isolated polynucleotide comprising a polynucleotide having a sequence selected from the group consisting of: R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5, or R11723_PEA_1_T6.
  • an isolated polynucleotide comprising a node having a sequence selected from the group consisting of: R11723_PEA_1_node_13, R11723_PEA_1_node_16, R11723_PEA_1_node_19, R11723_PEA_1_node_2, R11723_PEA_1_node_22, R11723_PEA_1_node_31, R11723_PEA_1_node_10, R11723_PEA_1_node_11, R11723_PEA_1_node_15, R11723_PEA_1_node_18, R11723_PEA_1_node_20, R11723_PEA_1_node_21, R11723_PEA_1_node_23, R11723_PEA_1_node_23, R11723_
  • an isolated polypeptide comprising a polypeptide having a sequence selected from the group consisting of: R11723_PEA_1_P2, R11723_PEA_1_P6, R11723_PEA_1_P7, R11723_PEA_1_P13, or R11723_PEA_1_P10.
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • T46984_PEA_1_T54 a nucleic acid sequence comprising a sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • M78076_PEA_1_T28 a nucleic acid sequence comprising a sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • HSMUC1A_PEA_1_T47 a nucleic acid sequence comprising a sequence in the table below:
  • HSMUC1A_PEA_1_P63 According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSMUC1A_PEA_1_P63, comprising a first amino acid sequence being at least 90 % homologous to
  • MTPGTQSPFFLLLLLTNLT TGSGHASSTPGGEKETSATQRSSN corresponding to amino acids 1 - 45 of MUC1_HUMAN, which also corresponds to amino acids 1 - 45 of HSMUC1A_PEA_1_P63
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EEEVSADQVSVGASGVLGSFKEARNAPSFLSWSFSMGPSK corresponding to amino acids 46 - 85 of HSMUC1A_PEA_1_P63, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of HSMUC1A_PEA_1_P63 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EEEVSADQVSVGASGVLGSFKEARNAPSFLSWSFSMGPSK in HSMUC1A_PEA_1_P63.
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P2 comprising a first amino acid sequence being at least 90 % homologous to
  • FFQLVDV ⁇ TGAELTPHQ corresponding to amino acids 1 - 433 of RIB2_HUMAN, which also corresponds to amino acids 1 - 433 of T46984_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ICHIWKLIFLP corresponding to amino acids 434 - 444 of T46984_PEA_1_P3, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P10 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of T46984_PEA_1_P10 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LMDQK in T46984_PEA_1_P10.
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P11 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P12 comprising a first amino acid sequence being at least 90 % homologous to MAPPGSST LLALTIIASTWALTPTHYLTKHD ⁇ RLKASLDRPFTNLESAFYSINGLSSL GAQWDAKKACT ⁇ IRS ⁇ LDPS ⁇ VDSLFYAAQASQALSGCEISIS ⁇ ET DLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATNQALQTASHLSQQADLRSI VEEEDLVA ⁇ DELGGVYLQFEEGLETT ⁇ VAAnT MDHVGTLPSIKEDQVIQLMNA
  • PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMN corresponding to amino acids 1 - 338 of RIB2_HUMAN, which also corresponds to amino acids 1 - 338 of T46984_PEA_1_P12
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SQDLH corresponding to amino acids 339 - 343 of T46984_PEA_1_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of T46984_PEA_1_P12 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SQDLH in T46984_PEA_1_P12.
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P21 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence M corresponding to amino acids 1 - 1 of T46984_PEA_1_P21, and a second amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of T46984_PEA_1_P27 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence FGSGL ⁇ MSPTSLLLLARLYFTWDMLLCWDSCMSTGLSSTCSRP in T46984_PEA_1_P27.
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P32 comprising a first amino acid sequence being at least 90 % homologous to MA PGSSTVFLLALTIIASTWALTPT TKJTD ⁇ TSP KASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRS ⁇ DPS ⁇ SLFYAAQASQALSGCEISISbffiTKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEffiDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHNGTEPSIKEDQVIQLMNA ITSKIs FESLSEAFSVASAAAVLSHNRYHWVNV EGSASDTHEQAIL PLTQATNKLEHAKSVASRATNLQKTSFTPVGD ELOTM ⁇ NKFSSGYYY
  • an isolated polypeptide encoding for a tail of T46984_PEA_1_P32 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GQVRWLTPVIPALWEAKAGGSPEVRSSILAWPT in T46984_PEA_1_P32.
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P34 comprising a first amino acid sequence being at least 90 % homologous to
  • PLTQATVKLEHAKSVASRATVLQKTSFTPVG corresponding to amino acids 1 - 329 of RIB2_HUMAN, which also corresponds to amino acids 1 - 329 of T46984_PEA_1_P34.
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P35 comprising a first amino acid sequence being at least 90 % homologous to MAPPGSSTVFLLALTIIASTWALTPTmXTi aD ⁇ ERLKASLDRPFTNLESAFYSINGLSSL GAQWDA KACTYIRS ⁇ LDPS ⁇ VDSLFYAAQASQALSGCEISIS ⁇ ETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGV TQFEEGLETTALFVAATNKLMDHVGTEPSIKEDQVIQLMNA ITSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAI corresponding to amino acids 1 - 287 of RIB2_HUMAN, which also corresponds
  • an isolated polypeptide encoding for a tail of T46984_PEA_1_P35 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GCWPSRQSREQHISSRRKMEILKTECQEK ⁇ SRT SMRRKMEKKNFI in T46984_PEA_1_P35.
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P38 comprising a first amino acid sequence being at least 90 % homologous to MAPPGSSTWLLALTIIASTWALTPTHYI.TK ⁇ DVERLKASLDRPFTNLESAFYSIVGLSSL GAQWDAKKACTYIRSNLDPSN ⁇ SLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEAL corresponding to amino acids 1 - 145 of RIB2_HUMAN, which also corresponds to amino acids 1 - 145 of T46984_PEA_1_P38, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MDPDWCQCLQL
  • an isolated polypeptide encoding for a tail of T46984_PEA_1_P38 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MDPDWCQCLQLHFCS in T46984_PEA_1_P38.
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P39 comprising a first amino acid sequence being at least 90 % homologous to
  • VTQ ⁇ YHAVAALSGFGLPLASQEALSALTARLSKEETVLA corresponding to amino acids 1 - 160 of RIB2_HUMAN, which also corresponds to amino acids 1 - 160 of T46984_PEA_1_P39.
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P45 comprising a first amino acid sequence being at least 90 % homologous to MAPPGSST LLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL
  • T46984_PEA_1_P45 and a second amino acid sequence being at least 70%, optionally at least
  • an isolated polypeptide encoding for a tail of T46984_PEA_1_P45 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NSPGSADSIPPVPAG in T46984_PEA_1_P45.
  • an isolated chimeric polypeptide encoding for T46984_PEA_1_P46, comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of T46984_PEA_1_P46 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NSPGSADSIPPVPAG in T46984_PEA_1_P46.
  • an isolated chimeric polypeptide encoding for T11628_PEA_1_P2 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIP FKGHPETLEKJDK i LKSEDE corresponding to amino acids 1 - 55 of T11628_PEA_1_P2, and a second amino acid sequence being at least 90 % homologous to MKASEDLKK ⁇ GATVLTALGGILKI ⁇ &GHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQV LQSKXTPGDFGADAQGAMNKALELFRKDMASNYKELGFQG corresponding to amino acids 1 - 99 of Q8WVH6, which also corresponds
  • an isolated polypeptide encoding for a head of T11628_PEA_1_P2 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGLSDGEWQL ⁇ T.
  • an isolated chimeric polypeptide encoding for T11628_PEA_1_P5 comprising a first amino acid sequence being at least 90 % homologous to IVlKASEDLKKHGATVLTALGGELKKKGH LQSKHPGDFGAJ AQGAMNKALELFRKDMASNYKELGFQG corresponding to amino acids 56 - 154 of MYG_HUMAN_V1, which also corresponds to amino acids 1 - 99 of T11628_PEA_1_P5.
  • an isolated chimeric polypeptide encoding for T11628_PEA_1_P7 comprising a first amino acid sequence being at least 90 % homologous to
  • SKHPGDFGADAQGAMNK corresponding to amino acids 1 - 134 of MYG_HUMAN_V1, which also corresponds to amino acids 1 - 134 of T11628_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence G corresponding to amino acids 135 - 135 of T11628_PEA_1_P7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for T11628_PEA_1_P10 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGLSDGEWQL ⁇ NVWGKVEADIPGHGQEVLIPJ.FKGFfPETLEKJDKFKHLKSEDE corresponding to amino acids 1 - 55 of T11628_PEA_1_P10, and a second amino acid sequence being at least 90 % homologous to MKASEDLKKHGATVLTA GGILKKKGHffi
  • LQSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG corresponding to amino acids 1 - 99 of Q8WVH6, which also corresponds to amino acids 56 - 154 of T11628_PEA_1_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a head of T11628_PEA_1_P10 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGLSDGEWQLVLNNWGKNEAI IPGHGQEVLIRLFKGHPETXEK.FDKFK ⁇ LKSEDE of T11628_PEA_1_P10.
  • an isolated polypeptide encoding for a tail of M78076_PEA_1_P4 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ECLTVNPSLQFPLNP in M78076_PEA_1_P4.
  • an isolated chimeric polypeptide encoding for M78076_PEA_1_P12 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of M78076_PEA_1_P12 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ECVCSKGFPFPLIGDSEG in M78076_PEA_1_P12.
  • an isolated chimeric polypeptide encoding for M78076_PEA_1_P14 comprising a first amino acid sequence being at least 90 % homologous to MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGL CGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLE YCRQM YPELQIARVEQATQAIPME RWCGGSRSGSCAHPHHQWPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQ EAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPG SRVEGAEDEEEEESFPQP ⁇ T)DYF ⁇ /EPPQAEEEEET PPSSHTLA GKVTPTPRPTDGV DIYFGMPGEISEHEGFLRAKMDLEERR ⁇ RQI ⁇ EVMREWA ⁇ 1A
  • an isolated polypeptide encoding for a tail of M78076_PEA_1_P14 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGGTAGYLGEETRGQRPGCDSQSHTGPSKKPSAPSPLPAGTSWDRGVP in M78076_PEA_1_P14.
  • an isolated chimeric polypeptide encoding for M78076_PEA_1_P21 comprising a first amino acid sequence being at least 90 % homologous to MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSL AGGSPG A AE APGS AQ V AGL CGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPME RWCGGSRSGSCAHPHHQWPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQ EAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPG SRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAWGKVTPTPRPTDGV DIYFGMPGEISEHEGFLP ⁇ ⁇ lDLEERRMRQINEVMREWAMAX)N
  • NPTYRFLEERP corresponding to amino acids 406 - 650 of APP1_HUMAN, which also corresponds to amino acids 353 - 597 of M78076_PEA_1_P21, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
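  • By way of non-limiting illustration only, the correspondence recited above between residue numbering in a known protein and residue numbering in a variant (for example, amino acids 406 - 650 of the known protein corresponding to amino acids 353 - 597 of the variant) can be checked with a short sketch. The function name `map_position` and its parameters are hypothetical and do not appear in the specification; positions are assumed to be 1-based and ranges inclusive.

```python
def map_position(pos, src_start, src_end, dst_start, dst_end):
    """Map a 1-based residue position from a source range to a target range.

    Both ranges are inclusive and are assumed to describe the same stretch
    of residues, so they must have the same length.
    """
    src_len = src_end - src_start + 1
    dst_len = dst_end - dst_start + 1
    if src_len != dst_len:
        raise ValueError("ranges differ in length")
    if not src_start <= pos <= src_end:
        raise ValueError("position lies outside the source range")
    # The offset within the range is preserved by the mapping.
    return dst_start + (pos - src_start)


# Known-protein residues 406 - 650 versus variant residues 353 - 597.
print(map_position(406, 406, 650, 353, 597))  # first residue of the segment
print(map_position(650, 406, 650, 353, 597))  # last residue of the segment
```

  Both ranges in this example span 245 residues, which is the consistency condition the sketch enforces before mapping.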
  • an isolated chimeric polypeptide encoding for an edge portion of M78076_PEA_1_P21 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EA, having a structure as follows: a sequence starting from any of amino acid numbers 352-x to 352; and ending at any of amino acid numbers 353+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for M78076_PEA_1_P24 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of M78076_PEA_1_P24 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RECLLPWLPLQISEGRS in M78076_PEA_1_P24.
  • ALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQV corresponding to amino acids 1 - 449 of APP1_HUMAN, which also corresponds to amino acids 1 - 449 of M78076_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
  • an isolated chimeric polypeptide encoding for M78076_PEA_1_P25 comprising a first amino acid sequence being at least 90 % homologous to
  • ALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQ corresponding to amino acids 1 - 448 of APP1_HUMAN, which also corresponds to amino acids 1 - 448 of M78076_PEA_1_P25, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
  • PQNPNSQPP ⁇ . ⁇ GSLEVIISHPFVPJlLE ⁇ LISPFQFQNSIPKNSQlVPAASPRGTSSP corresponding to amino acids 449 - 505 of M78076_PEA_1_P25, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of M78076_PEA_1_P25 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PQNPNSQPPvAAGSLEV ⁇ SHPF ⁇ RRLEILISPFQFQNSIPKNSQiVPAASPRGTSSP in M78076_PEA_1_P25.
  • an isolated chimeric polypeptide encoding for M85491_PEA_1_P13 comprising a first amino acid sequence being at least 90 % homologous to MALRRLGAALLLLPLLAAVEETLMDSTTATAELGWMVHPPSGWEEVSGYDENMNTIR TYQVC ⁇ FVFESSQNNWLRTK ⁇ IRRRGAHRIHNE ⁇ I FSNRDCSSIPSWGSCKETF LYYY EADFDSATKTFPNWMENPWVKVDTIAADESFSQVDLGGRVMKTNTEVRSFGPVSRSGF YLAFQDYGGCMSLIAVRVFYRKCPRIIQNGAIFQETLSGAESTSLVAARGSCIANAEEVD VPIKLYCNGDGEWLVPIGRCMCKAGFEANENGTVCRGCPSGTFKANQGDEACTHCPLN SRTTSEGATNCVCRNGYYRADLDPLDMPCTTIPSAPQAVISSVNETSLMLEWTPPR
  • an isolated polypeptide encoding for a tail of M85491_PEA_1_P13 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPIGWVLSPSPTSLRAPLPG in M85491_PEA_1_P13.
  • an isolated chimeric polypeptide encoding for M85491_PEA_1_P14 comprising a first amino acid sequence being at least 90 % homologous to
  • ERQDLTMLSRLVLNSWPQMILPPQPPKVLEL corresponding to amino acids 271 - 301 of M85491_PEA_1_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of M85491_PEA_1_P14 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ERQDLTMLSRLVLNSWPQMILPPQPPKVLEL in M85491_PEA_1_P14.
  • an isolated chimeric polypeptide encoding for HSSTROL3_P4 comprising a first amino acid sequence being at least 90 % homologous to
  • WQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 1 - 163 of MM11_HUMAN, which also corresponds to amino acids 1 - 163 of HSSTROL3_P4, a bridging amino acid H corresponding to amino acid 164 of HSSTROL3_P4, a second amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of HSSTROL3_P4 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ALGVRQLVGGGHSSRFSHLVVAGLPHACHRKSGSSSQVLCPEPSALLSVAG in HSSTROL3_P4.
  • an isolated chimeric polypeptide encoding for HSSTROL3_P5 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of HSSTROL3_P5 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ELGFPSSTGRDESLEHCRCQGLHK in HSSTROL3_P5.
  • an isolated chimeric polypeptide encoding for HSSTROL3_P7 comprising a first amino acid sequence being at least 90 % homologous to ⁇ LAPAAWLRSAAARALLPPMLLLLLQPPPLLAPvALPPDVHHLHAERRGPQPWHAALPSS PAPAPATQEAPRPASSLPPPRCGVPDPSDGLSARMlQKRFVLSGGRWEKTDLTYRILRFP WQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 1 - 163 of MM11_HUMAN, which also corresponds to amino acids 1 - 163 of HSSTROL3_P7, a bridging amino acid H corresponding to amino acid 164 of HSSTROL3_P7, a second amino acid sequence being at least 90 % homologous to GDDLPFDGPGGILAILAFFPKTHR
  • an isolated polypeptide encoding for a tail of HSSTROL3_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TTGVSTPAPGV in HSSTROL3_P7.
  • an isolated chimeric polypeptide encoding for HSSTROL3_P8 comprising a first amino acid sequence being at least 90 % homologous to
  • HSSTROL3_P8, a bridging amino acid H corresponding to amino acid 164 of HSSTROL3_P8, a second amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of HSSTROL3_P8 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRPCLPVPLLLCWPL in HSSTROL3_P8.
  • an isolated chimeric polypeptide encoding for HSSTROL3_P9 comprising a first amino acid sequence being at least 90 % homologous to LAPAAWLRSAAARA LPPMLLLLLQPPPLLAPxALPPDVHHLHAERRGPQPWHAALPSS PAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQK corresponding to amino acids 1 - 96 of MM11_HUMAN, which also corresponds to amino acids 1 - 96 of HSSTROL3_P9, a second amino acid sequence being at least 90 % homologous to PJLRFPWQLVQEQVRQTMAEAX.KVWSDVTPLTFTEVHEGPvADOvlLDFARYW corresponding to amino acids 113 - 163 of MM11_HUMAN, which also corresponds to amino acids 97 - 147 of HSSTROL3_P
  • an isolated chimeric polypeptide encoding for an edge portion of HSSTROL3_P9 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KR, having a structure as follows: a sequence starting from any of amino acid numbers 96-x to 96; and ending at any of amino acid numbers 97+ ((n-2) - x), in which x varies from 0 to n-2.
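  • By way of non-limiting illustration only, the edge portion structure recited above can be enumerated as a set of windows of length n that each span the junction between two adjacent residues (here, residues 96 and 97). The sketch below assumes 1-based, inclusive residue numbering; the function name `edge_windows` and its parameter `junction_left` (the number of the last residue before the junction) are hypothetical and are not taken from the specification. The sketch does not check that a window stays within the bounds of the full-length sequence.

```python
def edge_windows(junction_left, n):
    """List (start, end) coordinates of every length-n window over a junction.

    For each x from 0 to n-2 the window starts at junction_left - x and
    ends at (junction_left + 1) + ((n - 2) - x), so every window contains
    both residues that flank the junction.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    windows = []
    for x in range(0, n - 1):  # x varies from 0 to n-2 inclusive
        start = junction_left - x
        end = (junction_left + 1) + ((n - 2) - x)
        windows.append((start, end))
    return windows


# Junction between residues 96 and 97, window length n = 10.
for start, end in edge_windows(96, 10):
    print(start, end)
```

  Each window has length end - start + 1 = n, and there are n - 1 such windows, one for each permitted value of x.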
  • an isolated polypeptide encoding for a tail of HSSTROL3_P9 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TTGVSTPAPGV in HSSTROL3_P9.
  • an isolated chimeric polypeptide encoding for AY180924_PEA_1_P3 comprising a first amino acid sequence being at least 90 % homologous to
  • MLNNSGLFVLLCGLLVSSSAQEVLAGVSSQLLN corresponding to amino acids 1 - 33 of LATH_HUMAN, which also corresponds to amino acids 1 - 33 of AY180924_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GETVLLWVMQNPEPMPVKFSLAKYLGHNEHY corresponding to amino acids 34 - 64 of AY180924_PEA_1_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for R75793_PEA_1_P2 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for HUMCA1XIA_P14 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of HUMCA1XIA_P14 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSMMIINSQTIMVNNYSSSFrTLML in HUMCA1XIA_P14.
  • an isolated chimeric polypeptide encoding for HUMCA1XIA_P15 comprising a first amino acid sequence being at least 90 % homologous to
  • PIGPPGEK corresponding to amino acids 1 - 714 of CA1B_HUMAN, which also corresponds to amino acids 1 - 714 of HUMCA1XIA_P15
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MCCNLSFGILIPLQK corresponding to amino acids 715 - 729 of HUMCA1XIA_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of HUMCA1XIA_P15 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MCCNLSFGILIPLQK in HUMCA1XIA_P15.
  • an isolated chimeric polypeptide encoding for HUMCA1XIA_P16 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for an edge portion of HUMCA1XIA_P16 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AG, having a structure as follows: a sequence starting from any of amino acid numbers 648-x to 648; and ending at any of amino acid numbers 649+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated polypeptide encoding for a tail of HUMCA1XIA_P16 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSFSFSLFYKXVKFACDKRFNGRHDERKVVKLSLPLYLrYE in HUMCA1XIA_P16.
  • an isolated chimeric polypeptide encoding for HUMCA1XIA_P17, comprising a first amino acid sequence being at least 90 % homologous to IEPWSSRWKTKPWLWDFTVTTLALTFLFQAJIEVRGAAPVDVLKALDFH ⁇ SPEGISKTT GFCT ⁇ T K ⁇ SKGSDTAYRVSKQAQLS.
  • an isolated polypeptide encoding for a tail of HUMCA1XIA_P17 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRSTRPEK ⁇ FVFQ in HUMCA1XIA_P17.
  • an isolated chimeric polypeptide encoding for R20779_P2 comprising a first amino acid sequence being at least 90 % homologous to MCAERLGQFMTLALVLATFDPARGTDATNPPEGPQDRSSQQKGRLSLQNTAEIQHCLV NAGDVGCG ECFENNSCEIRGLHGICMTTH.HNAGKFDAQGKSFIKDALKCKAHALRH RFGCISRKCPA EMVSQLQRECYLKIHDLCAAAQENTRVIVEMIHFKDLLLHE corresponding to amino acids 1 - 169 of STC2_HUMAN, which also corresponds to amino acids 1 - 169 of R20779_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CYKIEITTV
  • an isolated polypeptide encoding for a tail of R20779_P2 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CYKIEITMPKRRKVKLRD in R20779_P2.
  • HSCOC4_PEA_1_P5 and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DVTLSGPQVTLLPFPCTPAPCSLCS corresponding to amino acids 819 - 843 of HSCOC4_PEA_1_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P5 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DVTLSGPQVTLLPFPCTPAPCSLCS in HSCOC4_PEA_1_P5.
  • an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P6 comprising a first amino acid sequence being at least 90 % homologous to
  • HSCOC4_PEA_1_P12 an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P12, comprising a first amino acid sequence being at least 90 % homologous to MRLLWGL ⁇ WASSFFTLSLQKPRLLLFSPS WHLG VPLSVGVQLQDVPRGQWKGSVFLR
  • an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P12 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RAREGVGPGTGGGEGVE in HSCOC4_PEA_1_P12.
  • an isolated chimeric polypeptide encoding for HSC0C4JPEA_1JP15 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of HSCOC4JPEA_l JP15 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence V ⁇ HSLV ⁇ HSLAWVARTPGPRGQARSRPQPPTRG ⁇ PAALLPGVFGGRLTSWLRDLEL in HSCOC4_PEA_l_P15.
  • an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P16 comprising a first amino acid sequence being at least 90% homologous to MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQWKGSVFLR ⁇ PSR ⁇ NPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTNIQGINLLFSSRRGHLFLQTDQPIYNPGQRVRYR AI.DQKMPVPSTDTITVMV ENSHGLRVRKI ⁇ TVRPSSIFQDDFVIPDISEPGTWKJSARFSDGLESNSSTQF PNFEVKITPGKPYILT GHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFR GLESQTKLVNGQSFFLSLSKAEFQDALEKL ⁇ U G
  • YTMEA ffiDYEDYEYDELPAJ ⁇ DPDA LQPVTPLQLFEGRRMlRR EAPK corresponding to amino acids 1 - 1457 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1457 of HSCOC4_PEA_1_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
  • an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P16 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence AERQGGANWHGHRGRHPPEWPRPAC in HSCOC4_PEA_1_P16.
  • an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P20 comprising a first amino acid sequence being at least 90% homologous to
  • QGSFQGGFRSTQ corresponding to amino acids 1 - 1303 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1303 of HSCOC4_PEA_1_P20
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VGAVPGLWRGW ⁇ NLRPRACLSPGSTSLGHGDCPGCPVCLLDCLPHH corresponding to amino acids 1304 - 1349 of HSCOC4_PEA_1_P20, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P20 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VGAVPGLWRGWWLRPRACLSPGSTSLGHGDCPGCPVCLLDCLPHH in HSCOC4_PEA_1_P20.
  • an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P9 comprising a first amino acid sequence being at least 90% homologous to
  • YFDSV corresponding to amino acids 1 - 1529 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1529 of HSCOC4_PEA_1_P9
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SGER corresponding to amino acids 1530 - 1533 of HSCOC4_PEA_1_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P9 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SGER in HSCOC4_PEA_1_P9.
  • an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P22 comprising a first amino acid sequence being at least 90% homologous to MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQWKGSVFLR NPSRNN ⁇ PCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTNIQGINLLFSSRRGHLFLQTDQPIYNPGQRVRYRVFALDQKMRPSTDTITVMV ENSHGLRVRKKEVYMPSSIFQDDFVIPDISEPGTWKISAJRFSDGLESNSSTQFEVKKYVL PNFEVKITPGKPYILTVPGHLDEMQLDIQARYIYGKPVQGVAY FGLLDEDGKKTFFR GLESQTKLVNGQSHISLSKAEFQDALEKL
  • RLFETKITQVLHF corresponding to amino acids 1 - 1653 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1653 of HSCOC4_PEA_1_P22
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SMKQTGEAGRAGGRQGG corresponding to amino acids 1654 - 1670 of HSCOC4_PEA_1_P22, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P22 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SMKQTGEAGRAGGRQGG in HSCOC4_PEA_1_P22.
  • an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P23 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P23 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence QSSHRGPGLTLPRGPAVLVSLGVACSSYRSCTQPVCSDTNFLPSQPQSNSPFPLLLTPS in HSCOC4_PEA_1_P23.
  • an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P24 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P24 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SADVLCFTGHQVRADSWPPCVLLKSASVLRGSALASVAPWSGVCRTRMATG in HSCOC4_PEA_1_P24.
  • an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P25 comprising a first amino acid sequence being at least 90% homologous to MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQWKGSVFLR NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTMQG ⁇ NLLFSSRRGHLFLQT ⁇ QPIYNPGQRVRYRWALDQKMRPSTDTITVMV ENSHGLRVRI K ⁇ VYMPSSIFQDDFVIPDISEPGT KISARFSDGLESNSSTQFEVKKYNL
  • SAEVCQCAEG corresponding to amino acids 1 - 1593 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1593 of HSCOC4_PEA_1_P25, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ETEGLGRGSGGGMAGAPPTLSDGFPNFREVPSPASRPGAGSAGRGWLQDEVCLLLPPC GVRLPG corresponding to amino acids 1594 - 1657 of HSCOC4_PEA_1_P25, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P25 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ETEGLGRGSGGGMAGAPPTLSDGFPNFREVPSPASRPGAGSAGRGWLQDEVCLLLPPC GVRLPG in HSCOC4_PEA_1_P25.
  • an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P26 comprising a first amino acid sequence being at least 90% homologous to
  • SAEVCQCAEG corresponding to amino acids 1 - 1593 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1593 of HSCOC4_PEA_1_P26, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ETEGLGRGSGGGMAGAPPTLSDGFPNFREVPSPASRPGAGSAGRGWLQDEVCLLLPPC GVRSVFPPRPWPDPPSGTGCFGLSGCSLLLLQVMHAACLL corresponding to amino acids 1594 - 1691 of HSCOC4_PEA_1_P26, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P26 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ETEGLGRGSGGGMAGAPPTLSDGFPNFREVPSPASRPGAGSAGRGWLQDEVCLLLPPC GVRSVFPPRPWPDPPSGTGCFGLSGCSLLLLQVMHAACLL in HSCOC4_PEA_1_P26.
  • an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P30 comprising a first amino acid sequence being at least 90% homologous to MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQWKGSVFLR NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTMQG ⁇ NLLFSSRRGHLFLQTDQPIYNPGQRVRYRVFALDQKMRPSTDTITVMV ENSHGLRVRKKEVY ⁇ 4PSSIPQDDFVIPDISEPGTN SAPJ ⁇ SDGLESNSSTQFEVKK ⁇ NL PNFEVKITPGKPYILTVPGFFL.DEMQLDIQARYLYGKPVQGVAYVRFGLLDEDGKKTFFR GLESQTKLVNGQSMSLSKAEFQDALEKL
  • an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P30 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RNPVRLLQPRAQMFCVLRGTK in HSCOC4_PEA_1_P30.
  • an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P38 comprising a first amino acid sequence being at least 90% homologous to MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPL8VGVQLQDVPRGQWKGSVFLR NPSRNNNPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTMQGINLLFSSRRGHLFLQTDQPIYNPGQRVRYRVFALDQKMRPSTDTITNMN E ⁇ SHGLRVPJ ⁇ H EW / MPSSIFQDDFVIPDISEPGT ⁇ T SAILFSDGLES ⁇ SSTQFE ⁇ KYVL PNFEV TPGKP TLTNPGHLDEMQLDIQAI ⁇ YIYGKPVQGVAYVRFGLLDEDGKKTFFR GLESQTKLV ⁇ GQSFFLSLSKAEF
  • an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P38 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DVTLSGPQVTLLPFPCTPAPCSLCS in HSCOC4_PEA_1_P38.
  • an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P39 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P39 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSSRGEG in HSCOC4_PEA_1_P39.
  • an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P40 comprising a first amino acid sequence being at least 90% homologous to
  • HSCOC4_PEA_1_P40 wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P40 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence AGEWTEPHFPLKGRVPGRPGEAEYGHY in HSCOC4_PEA_1_P40.
  • an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P41 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P41 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SGER in HSCOC4_PEA_1_P41.
  • an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P42 comprising a first amino acid sequence being at least 90% homologous to MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ WKGSVFLR ⁇ PSR ⁇ NPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWT.
  • an isolated polypeptide encoding for an edge portion of HSCOC4_PEA_1_P42 comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for WAPGAALGQGREGRTQAGAGLLEPAQAEPGRQLTRLHR, corresponding to HSCOC4_PEA_1_P42.
  • an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P42 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VWSATQGNPLCPRY in HSCOC4_PEA_1_P42.
  • an isolated chimeric polypeptide encoding for HUMTREFAC_PEA_2_P8 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of HUMTREFAC_PEA_2_P8 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence WKVHLPKGEGFSSG in HUMTREFAC_PEA_2_P8.
  • an isolated chimeric polypeptide encoding for HUMOSTRO_PEA_1_PEA_1_P21 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of HUMOSTRO_PEA_1_PEA_1_P21 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VFLNFS in HUMOSTRO_PEA_1_PEA_1_P21.
  • an isolated chimeric polypeptide encoding for HUMOSTRO_PEA_1_PEA_1_P25 comprising a first amino acid sequence being at least 90% homologous to
  • HUMOSTRO_PEA_1_PEA_1_P25 and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence H corresponding to amino acids 32 - 32 of HUMOSTRO_PEA_1_PEA_1_P25, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • HUMOSTRO_PEA_1_PEA_1_P30 and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSIFYVFI corresponding to amino acids 32 - 39 of HUMOSTRO_PEA_1_PEA_1_P30, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of HUMOSTRO_PEA_1_PEA_1_P30 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSIFYVFI in HUMOSTRO_PEA_1_PEA_1_P30.
  • YMCQAHNSATGLNRTTVTMITNS corresponding to amino acids 1 - 319 of CEA6_HUMAN, which also corresponds to amino acids 1 - 319 of T10888_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DWTRP corresponding to amino acids 320 - 324 of T10888_PEA_1_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of T10888_PEA_1_P2 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DWTRP in T10888_PEA_1_P2.
  • an isolated chimeric polypeptide encoding for T10888_PEA_1_P4 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of T10888_PEA_1_P4 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLLSSQLWPPSASRLECWPGWL in T10888_PEA_1_P4.
  • an isolated chimeric polypeptide encoding for T10888_PEA_1_P4 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of T10888_PEA_1_P4 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLLSSQLWPPSASRLECWPGWL in T10888_PEA_1_P4.
  • an isolated chimeric polypeptide encoding for T10888_PEA_1_P5 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of T10888_PEA_1_P5 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWfflEALASHFQVESGSQRRAl ⁇ KJvFSFPTCVQGAHANPKFSPEPSQFTSADSFPLVFLFF WFCFLISHV in T10888_PEA_1_P5.
  • T39971_P6 an isolated chimeric polypeptide encoding for T39971_P9, comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWNALADQESCKGRCTEGFNNDKKCQCDELCSY ⁇ QSCCTDYTAEC KPQVTRGDVFTMPEDEYTVYDDGEEK ⁇ ATVHEQVGGPSLTSDLQAQSKG ⁇ PEQTPV LKPEEEAPAPEVGASKPEG ⁇ DSRPETLHPGRPQPPAEEELCSGKPFDAFTDLK ⁇ GSLFAFR
  • PAPGHL corresponding to amino acids 357 - 478 of VTNC_HUMAN, which also corresponds to amino acids 326 - 447 of T39971_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for an edge portion of T39971_P9 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TS, having a structure as follows: a sequence starting from any of amino acid numbers 325-x to 325; and ending at any of amino acid numbers 326 + ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for T39971_P11 comprising a first amino acid sequence being at least 90% homologous to
  • CEGSSLSAVFEHFAMMQRDSWEDLFELLFWGRTS corresponding to amino acids 1 - 326 of VTNC_HUMAN, which also corresponds to amino acids 1 - 326 of T39971_P11, and a second amino acid sequence being at least 90% homologous to
  • an isolated chimeric polypeptide encoding for an edge portion of T39971_P11 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SD, having a structure as follows: a sequence starting from any of amino acid numbers 326-x to 326; and ending at any of amino acid numbers 327 + ((n-2) - x), in which x varies from 0 to n-2.
  • DKY TIVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 442 - 478 of Q9BSH7, which also corresponds to amino acids 327 - 363 of T39971_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for an edge portion of T39971_P11 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SD, having a structure as follows: a sequence starting from any of amino acid numbers 326-x to 326; and ending at any of amino acid numbers 327 + ((n-2) - x), in which x varies from 0 to n-2.
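The window arithmetic used in the edge-portion definitions above (start at any of positions j-x to j, end at (j+1) + ((n-2) - x), with x from 0 to n-2) can be illustrated with a short sketch. This is an explanatory aid only and not part of the described subject matter: the function name `edge_windows` and its parameters `junction` and `n` are hypothetical, positions are assumed to be 1-based and inclusive, and no check is made against the actual length of the polypeptide.

```python
def edge_windows(junction: int, n: int) -> list[tuple[int, int]]:
    """Enumerate (start, end) positions of every length-n window that spans
    the two junction residues at positions `junction` and `junction + 1`.

    Positions are 1-based and inclusive (an assumption of this sketch).
    """
    if n < 2:
        raise ValueError("n must be at least 2 to cover both junction residues")
    windows = []
    for x in range(n - 1):                      # x varies from 0 to n-2
        start = junction - x                    # "starting from any of j-x to j"
        end = (junction + 1) + ((n - 2) - x)    # "ending at (j+1) + ((n-2) - x)"
        windows.append((start, end))
    return windows


# Example: the T39971_P9 edge portion (junction residues 325 and 326), n = 10.
for start, end in edge_windows(325, 10):
    print(start, end)
```

Each window has length exactly n and always contains both junction residues, which is why x is limited to the range 0 to n-2.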
  • an isolated chimeric polypeptide encoding for T39971_P12 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of T39971_P12 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPGAVGQGRKHLGRV in T39971_P12.
  • an isolated chimeric polypeptide encoding for T39971_P12 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of T39971_P12 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPGAVGQGRKHLGRV in T39971_P12.
  • an isolated chimeric polypeptide encoding for Z21368_PEA_1_P2 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of Z21368_PEA_1_P2 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PHKYSAHGRTRHFESATRTTNGAQKLSRI in Z21368_PEA_1_P2.
  • an isolated chimeric polypeptide encoding for Z21368_PEA_1_P5 comprising a first amino acid sequence being at least 90% homologous to
  • MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVEL corresponding to amino acids 1 - 57 of Q7Z2W2, which also corresponds to amino acids 1 - 57 of Z21368_PEA_1_P5, a second bridging amino acid sequence comprising A, and a third amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for an edge portion of Z21368_PEA_1_P5 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least three amino acids comprise LAF having a structure as follows (numbering according to Z21368_PEA_1_P5): a sequence starting from any of amino acid numbers 57-x to 57; and ending at any of amino acid numbers 59 + ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for Z21368_PEA_1_P5 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MKYSCCALVLAVLGTELLGSLCSTVRSPPJHlGRIQQERKN PNIILVLTDDQDVELAFF GKYLNEYNGSYPPGWP ⁇ WLGLKNSRFYNYTVCRNGIK ⁇ INYFKMSKPJvIYPHRPVMMVTSHAAPHGPEDSAPQFSKLYPNASQmTPSYNYAPNMDK HW QYTGPMLPIH ffiFTT ⁇ LQPJOlLQTLMSVDDSVERLYNMLVETGELENTYIIYTAJ3 l o HGY ⁇ IGQFGLVKGKSMP Y
  • an isolated polypeptide encoding for a head of Z21368_PEA_1_P5 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MKYSCCALVLAVLGTELLGSLCST ⁇ SPPJRGRIQQERKNIRPNlILVLTDDQDVELAFF GKYL ffiYNGSYIPPGWREWLGLIKNSRFYNYTNCRNGIJ ⁇ KHGFDYAKDYFTDLITNES0 IN KMSKPsMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYNY ⁇ APNMDK HWIMQYTGPMLPIHMEFTMLQRKRLQTLMSVDDSVERLYNML ⁇ TGELENTYHIYTAD HGYHIGQFGLVKGKS
  • an isolated chimeric polypeptide encoding for Z21368_PEA_1_P5 comprising a first amino acid sequence being at least 90% homologous to MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVEL corresponding to amino acids 1 - 57 of SUL1_HUMAN, which also corresponds to amino acids 1 - 57 of Z21368_PEA_1_P5, and a second amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of Z21368_PEA_1_P16 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CVTVPPLSQPQIH in Z21368_PEA_1_P16.
  • an isolated chimeric polypeptide encoding for Z21368_PEA_1_P22 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of Z21368_PEA_1_P22 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ARYDGDQPRCAPRPRGLSPTVF in Z21368_PEA_1_P22.
  • an isolated chimeric polypeptide encoding for Z21368_PEA_1_P23 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of Z21368_PEA_1_P23 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GLLHRLNH in Z21368_PEA_1_P23.
  • an isolated chimeric polypeptide encoding for Z21368_PEA_1_P23 comprising a first amino acid sequence being at least 90% homologous to MKYSCCALNLA ⁇ T.GTELLGSLCST ⁇ H ⁇ SPRFRGRIQQERK ⁇ IRP ⁇ IILNLTDDQDVELGSL QV ⁇ I ⁇ KTROMEHGGATFI ⁇ AFVTTPMCCPSRSSMLTGKYVFf ⁇ H ⁇ W.
  • an isolated polypeptide encoding for a tail of Z21368_PEA_1_P23 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GLLHRLNH in Z21368_PEA_1_P23.
  • an isolated chimeric polypeptide encoding for T59832_P5 comprising a first amino acid sequence being at least 90% homologous to
  • MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 12 - 55 of GILT_HUMAN, which also corresponds to amino acids 1 - 44 of T59832_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
  • an isolated chimeric amino acids 45 - 189 of T59832_P5 wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of T59832_P5 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least
  • ECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 12 - 223 of GILT_HUMAN, which also corresponds to amino acids 1 - 212 of T59832_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRIFLALSLTLIVPWSQGWTRQRDQR corresponding to amino acids 213 - 238 of T59832_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of T59832_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTLIVPWSQGWTRQRDQR in T59832_P7.
  • an isolated chimeric polypeptide encoding for T59832_P7 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of T59832_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTLIVPWSQGWTRQRDQR in T59832_P7.
  • an isolated chimeric polypeptide encoding for T59832_P7 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV corresponding to amino acids 1 - 90 of T59832_P7, and a second amino acid sequence being at least 90% homologous to MEI NVTL YGNAQEQNVSGRWEFXCQHGEEECK-FNKVEACVLDELDMELAFLT C MEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYV PWVTNNGVRIFLALSLTLr
  • an isolated polypeptide encoding for a head of T59832_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV of T59832_P7.
  • an isolated chimeric polypeptide encoding for T59832_P7 comprising a first amino acid sequence being at least 90% homologous to
  • ECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 1 - 212 of Q8WU77, which also corresponds to amino acids 1 - 212 of T59832_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRIFLALSLTLIVPWSQGWTRQRDQR corresponding to amino acids 213 - 238 of T59832_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of T59832_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTLIVPWSQGWTRQRDQR in T59832_P7.
  • an isolated chimeric polypeptide encoding for T59832_P9 comprising a first amino acid sequence being at least 90% homologous to
  • ECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 12 - 214 of GILT_HUMAN, which also corresponds to amino acids 1 - 203 of T59832_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204 - 244 of T59832_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of T59832_P9 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.
  • an isolated chimeric polypeptide encoding for T59832_P9 comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 1 - 203 of BAC98466, which also corresponds to amino acids 1 - 203 of T59832_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous
  • an isolated polypeptide encoding for a tail of T59832_P9 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.
  • an isolated chimeric polypeptide encoding for T59832_P9 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV corresponding to amino acids 1 - 90 of T59832_P9, second amino acid sequence being at least 90% homologous to MEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVC MEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 1 - 90 of T59832_P9, second amino acid sequence being at least
  • a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
  • an isolated polypeptide encoding for a head of T59832_P9 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV of T59832_P9.
  • an isolated polypeptide encoding for a tail of T59832_P9 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.
  • an isolated chimeric polypeptide encoding for T59832_P9 comprising a first amino acid sequence being at least 90 % homologous to
  • Q8WU77 which also corresponds to amino acids 1 - 203 of T59832_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204 - 244 of T59832_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of T59832_P9 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
  • an isolated chimeric polypeptide encoding for T59832_P12 comprising a first amino acid sequence being at least 90 % homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCP ⁇ FL ⁇ v ⁇ LFPTWLLVMEILN ⁇ TLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVE corresponding to amino acids 12 - 141 of GILT_HUMAN, which also corresponds to amino acids 1 - 130 of T59832_P12, and a second amino acid sequence being at least 90 % homologous to CLQLYAPGLSPDTLMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKP
  • an isolated chimeric polypeptide encoding for an edge portion of T59832_P12 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130; and ending at any of amino acid numbers 131+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for T59832_P12 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MTLSPLLLFLPPLLLLLDWTAAVQASPLQALDFFGNGPPVN T TGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV corresponding to amino acids 1 - 90 of T59832_P12, second amino acid sequence being at least 90 % homologous to MEILNVTLNPYGNAQEQNVSGRWEFKCQHGEEECKFNKVE corresponding to amino acids 1 - 40 of BAC85622, which also corresponds to amino acids 91 - 130 of T59832_P12, third amino acid sequence being at least 90 %
  • CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 72 - 122 of BAC85622, which also corresponds to amino acids 131 - 181 of T59832_P12, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
  • an isolated polypeptide encoding for a head of T59832_P12 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
  • an isolated chimeric polypeptide encoding for an edge portion of T59832_P12 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130; and ending at any of amino acid numbers 131+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated polypeptide encoding for a tail of T59832_P12 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
  • an isolated chimeric polypeptide encoding for T59832_P12 comprising a first amino acid sequence being at least 90 % homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVN ⁇ T TGNLYLRGPLKKSNA PLVNVTLY ⁇ ALCGGCRAFLE ELFPTWLLVMEILNVTLNPYGNAQEQNVSGRWEFKC QHGEEECKFNKVE corresponding to amino acids 1 - 130 of Q8WU77, which also corresponds to amino acids 1 - 130 of T59832_P12, and a second amino acid sequence being at least 90 % homologous to CLQLYAPGLSPDT ⁇ MECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLED
  • QTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 162 - 250 of Q8WU77, which also corresponds to amino acids 131 - 219 of T59832_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for an edge portion of T59832_P12 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130; and ending at any of amino acid numbers 131+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for T59832_P18, comprising a first amino acid sequence being at least 90 % homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 12 - 55 of GILT_HUMAN, which also corresponds to amino acids 1 - 44 of T59832_P18, and a second amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for an edge portion of T59832_P18 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44; and ending at any of amino acid numbers 45+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for T59832_P18 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for an edge portion of T59832_P18 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44; and ending at any of amino acid numbers 45+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for T59832_P18 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for an edge portion of T59832_P18 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44; and ending at any of amino acid numbers 45+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for HUMGRP5E_P4 comprising a first amino acid sequence being at least 90 % homologous to MRGSELPLVLLALVLCLAPRGPAVPLPAGGGTNLTKMYPRGNIHWAVGHLMGKKSTG ESSSVSERGSLKQQLREYIRWEEA, RNLLGLEAKENRNHQPPQPKALGNQQPSWDSED SSNFKDVGSKGK corresponding to amino acids 1 - 127 of GRP_HUMAN, which also corresponds to amino acids 1 - 127 of HUMGRP5E_P4, and a second amino acid sequence being at least 90 % homologous to GSQREGRNPQLNQQ corresponding to amino acids 135 - 148 of GRP_HUMAN, which also corresponds to amino acids 128 - 141 of HUMGRP5E
  • an isolated chimeric polypeptide encoding for HUMGRP5E_P5 comprising a first amino acid sequence being at least 90 % homologous to MRGSELPLVLLALVLCLAPRGRA LPAGGGTVLTKMYPRGNHWAVGHLMGKKSTG ESSSVSERGSLKQQLREYIRWEEAARNLLGLEEAKENRNHQPPQPKALGNQQPSWDSED SSNFKDVGSKGK corresponding to amino acids 1 - 127 of GRP_HUMAN, which also corresponds to amino acids 1 - 127 of HUMGRP5E_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DSLLQVLNVKEGTPS corresponding to amino acids 128 - 142
  • an isolated polypeptide encoding for a tail of HUMGRP5E_P5 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DSLLQVLNVKEGTPS in HUMGRP5E_P5.
  • an isolated chimeric polypeptide encoding for AA155578_PEA_1_P4 comprising a first amino acid sequence being at least 90 % homologous to
  • KLKA_HUMAN which also corresponds to amino acids 1 - 29 of AA155578_PEA_1_P6, and a second amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for AA155578_PEA_1_P8, comprising a first amino acid sequence being at least 90 % homologous to MRAPHLHLSAASGARALAKLLPLLMAQLW corresponding to amino acids 1 - 29 of KLKA_HUMAN, which also corresponds to amino acids 1 - 29 of AA155578_PEA_1_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GHCGLE corresponding to amino acids 30 - 35 of AA155578_PEA_1_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for AA155578_PEA_1_P9 comprising a first amino acid sequence being at least 90 % homologous to MRAPHLHLSAASGARALAKLLPLLMAQLWAAEAALLPQNDTRLDPEAYGAPCARGSQ PWQVSLFNGLSFHCAGVLVDQSWVLTAAHCGNK corresponding to amino acids 1 - 90 of KLKA_HUMAN, which also corresponds to amino acids 1 - 90 of AA155578_PEA_1_P9.
  • an isolated chimeric polypeptide encoding for HSENA78_P2 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for T94936_PEA_1_P2 comprising a first amino acid sequence being at least 90 % homologous to MMLHSALGLCLLLVTVSSNLAIAIKKEKRPPQTLSRGWGDDITWNQTYEEGLFYAQKS KLKPLMVIHHLEDCQYSQALKXWAQ ⁇ EEIQEMAQ ⁇ I ⁇ IML
  • VPRIMFVDPSLTVRADIAGRYS ⁇ RLYTYEPRDLPL corresponding to amino acids 1 - 150 of Q8TD06, which also corresponds to amino acids 1 - 150 of T94936_PEA_1_P2.
  • an isolated chimeric polypeptide encoding for T94936_PEA_1_P3 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for Z41644_PEA_1_P10 comprising a first amino acid sequence being at least 90 % homologous to
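The "edge portion" items above define a family of peptides of length n that all span a splice junction (for example, residues 130 and 131 of T59832_P12, or 44 and 45 of T59832_P18). A minimal Python sketch of that enumeration follows. The function names and the toy sequence in the usage note are illustrative assumptions and are not taken from the application.

```python
def edge_portion_windows(n, junction):
    """Return 1-based, inclusive (start, end) pairs for edge portions of length n.

    Mirrors the claim wording: a window starts at junction - x and ends at
    (junction + 1) + ((n - 2) - x), with x running from 0 to n - 2.
    """
    if n < 2:
        raise ValueError("n must be at least 2 so the window spans the junction")
    windows = []
    for x in range(n - 1):  # x = 0 .. n-2
        start = junction - x
        end = (junction + 1) + ((n - 2) - x)
        windows.append((start, end))
    return windows


def extract_edge_portions(sequence, n, junction):
    """Slice every in-range window out of a protein sequence (1-based positions)."""
    portions = []
    for start, end in edge_portion_windows(n, junction):
        if start < 1 or end > len(sequence):
            continue  # window runs off the sequence; skipped in this sketch
        portions.append(sequence[start - 1:end])
    return portions
```

For n = 10 and a junction at residue 130 this yields nine windows, from residues 122-131 up to 130-139, and each of them contains both junction residues.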

Abstract

Description

NOVEL NUCLEOTIDE AND AMINO ACID SEQUENCES, AND ASSAYS AND METHODS OF USE THEREOF FOR DIAGNOSIS OF BREAST CANCER
FIELD OF THE INVENTION
The present invention is related to novel nucleotide and protein sequences that are diagnostic markers for breast cancer, and assays and methods of use thereof.
BACKGROUND OF THE INVENTION
Breast cancer is the most commonly occurring cancer in women, comprising almost a third of all malignancies in females. It is the leading cause of death for women between the ages of 40 and 55 in the United States, and one out of 8 females in the United States will develop breast cancer at some point in her life. The death rate from breast cancer has been slowly declining over the past decade, partially due to the use of molecular markers that facilitate discovery, tumor typing (and therefore choice of treatment), and assessment of response to treatment and recurrence. The most widely used serum markers for breast cancer are Mucin 1 (measured as CA 15-3) and CEA (CarcinoEmbryonic Antigen). Mucin 1 (MUC1) is present on the apical surface of normal epithelial cells. Its extracellular domain consists of a heavily O-linked glycosylated peptide core made up of a variable number of multiple repeats of a 20 amino acid sequence referred to as VNTR (Variable Number Tandem Repeat). This variability results in natural polymorphism of MUC1. Each VNTR has five potential O-linkage sites. The breast cancer disease state alters the enzymes which glycosylate Mucin 1, and therefore the polysaccharide side chains of tumor-associated MUC1 are generally shorter than those on the normally expressed molecule. Both aberrant and up-regulated expression of MUC1 are features of malignancy, and MUC1-related markers are based on them. Though CA 15-3 is a broadly used marker for breast cancer, a combination of CA 15-3 and CEA is more sensitive than a single marker. For the purpose of monitoring therapeutic response, CA 15-3, CEA and ESR (Erythrocyte Sedimentation Rate) are used as a panel, making over 90% of patients biochemically assessable. Serum markers used to monitor therapeutic response in patients with metastatic breast cancer are associated with the "spike phenomenon". 
It is an initial transient rise of tumor marker levels which can be seen in up to 30% of responders in the first 3 months after commencing a therapy. It is important not to interpret this as a sign of disease progression, which could lead to a premature change of an effective therapy. CA 27.29 is a monoclonal antibody directed against a different part of MUC1, and it is a newer marker than CA 15-3. It detects a different glycosylation pattern of MUC1, as compared with CA 15-3. CA 27.29 is the first FDA-approved blood test for breast cancer recurrence. Because of superior sensitivity and specificity, CA 27.29 has supplanted CA 15-3 as the preferred tumor marker in breast cancer. The CA 27.29 level is elevated in approximately one third of women with early-stage breast cancer (stage I or II) and in two thirds of women with late-stage disease (stage III or IV). CA 27.29 lacks predictive value in the earliest stages of breast cancer and thus has no role in screening for or diagnosing the malignancy. CA 27.29 also can be found in patients with benign disorders of the breast, liver, and kidney, and in patients with ovarian cysts. However, CA 27.29 levels higher than 100 units per mL are rare in benign conditions. Recently Estrogen 2 (beta) was shown to have a diagnostic role in breast cancer. It has been shown that the expression of the 'ex' variant of Estrogen 2 is correlated with response to hormone adjuvant therapy. In addition, it has been shown that it may assist in better characterization of ER-1 positive breast cancers (together with progesterone receptor). HER-2 (also known as c-erbB2) is a membrane proto-oncogene with intrinsic tyrosine kinase activity. Tumors expressing HER-2 are associated with shorter survival, shorter time-to-relapse and an overall worse prognosis. Tumors expressing HER-2 can be targeted with Trastuzumab - a biological adjuvant therapy which blocks the growth promoting action of HER-2. 
The ImmunoHistoChemistry (IHC) and Fluorescence In Situ Hybridization (FISH) tests are used to detect HER2: 1. IHC: The most common test used to check HER2 status is an ImmunoHistoChemistry (IHC) test. The IHC test measures the protein made by the HER2 gene. 2. FISH: This test measures the number of copies of the HER2 gene present in the tumor cell. Measurement of the extracellular domain of HER-2 has been reported to give a better assessment of response to chemotherapy than a biochemical index score based on measurement of CA 15-3, CEA and ESR in a small series of patients. That finding is yet to be confirmed in a larger group of patients with HER-2 expressing tumors. Other molecular markers, mainly used for the diagnosis of cancers other than breast cancer, were shown to have diagnostic potential in breast cancer. For example, CA125, which is a major marker for ovarian cancer, is also associated with breast cancer. High levels of CA 19-9, a major marker for colorectal and pancreatic cancers, can be found in breast cancer. Overall, these markers are not frequently used for the detection of breast cancer due to their inferiority compared with other markers already described. Panels of markers for the diagnosis and typing of breast cancer are being used by pathologists, including both the markers described above and additional immunohistochemistry markers that have been shown to be of value for the diagnosis of breast cancer. PCNA and Ki-67 are perhaps the most important and most widely used immunohistochemistry markers for breast cancer. Other markers, such as E-Cadherin, Cathepsin D and TFF1, are also used for that purpose. Despite relevant research efforts and the identification of many putative good prognosticators, few of them are proving clinically useful for identifying patients at minimal risk of relapse, patients with a worse prognosis, or patients likely to benefit from specific treatments.
Most of them, such as epidermal growth factor receptor, cyclin E, p53 (this mutation is present in approximately 40% of human breast cancers as an acquired defect), bcl-2, vascular endothelial growth factor, urokinase-type plasminogen activator-1 and the anti-apoptosis protein survivin, are suggested for possible inclusion in the category of biomarkers with a high level of clinico-laboratory effectiveness. However, no single biomarker was able to identify those patients with the best (or worst) prognosis or those patients who would be responsive to a given therapy. High level cyclin E expression has been associated with the initiation or progression of different human cancers, in particular breast cancer but also leukemia, lymphoma and others.
Cyclin E expression level in breast cancer was found to be a very strong indicator for prognosis, stronger than any other biological marker. There are some non-cancerous pathological conditions which represent an increased risk factor for development of breast cancer. Non-limiting examples of these conditions include:
- Ductal hyperplasia without atypia. It is the most frequently encountered breast biopsy result that is associated with increased risk of future development of breast cancer (2-fold increased risk). In particular, the loss of expression of transforming growth factor beta receptor II in the affected epithelial cells is associated with an increased risk of invasive breast cancer.
- Atypical hyperplasia. Women having atypical hyperplasia with over-expression of HER-2 have a greater than 7-fold increased risk of developing invasive breast carcinoma, as compared with women with non-proliferative benign breast lesions and no evidence of HER-2 amplification.
These pathological conditions should be effectively diagnosed and monitored in order to facilitate early detection of breast cancer.
SUMMARY OF THE INVENTION
The background art does not teach or suggest markers for breast cancer that are sufficiently sensitive and/or accurate, alone or in combination. The present invention overcomes these deficiencies of the background art by providing novel markers for breast cancer that are both sensitive and accurate. These markers are overexpressed in breast cancer specifically, as opposed to normal breast tissue. The measurement of these markers, alone or in combination, in patient (biological) samples provides information that the diagnostician can correlate with a probable diagnosis of breast cancer. The markers of the present invention, alone or in combination, show a high degree of differential detection between breast cancer and non-cancerous states. According to preferred embodiments of the present invention, examples of suitable biological samples which may optionally be used with preferred embodiments of the present invention include but are not limited to blood, serum, plasma, blood cells, urine, sputum, saliva, stool, spinal fluid or CSF, lymph fluid, the external secretions of the skin, respiratory, intestinal, and genitourinary tracts, tears, milk, neuronal tissue, breast tissue, any human organ or tissue, including any tumor or normal tissue, any sample obtained by lavage (for example of the bronchial system or of the breast ductal system), and also samples of in vivo cell culture constituents. In a preferred embodiment, the biological sample comprises breast tissue and/or a serum sample and/or a urine sample and/or a milk sample and/or any other tissue or liquid sample. The sample can optionally be diluted with a suitable eluant before contacting the sample to an antibody and/or performing any other diagnostic assay. 
Information given in the text with regard to cellular localization was determined according to four different software programs: (i) tmhmm (from Center for Biological Sequence Analysis, Technical University of Denmark DTU, http://www.cbs.dtu.dk/services/TMHMM/TMHMM2.0b.guide.php) or (ii) tmpred (from EMBnet, maintained by the ISREC Bioinformatics group and the LICR Information Technology Office, Ludwig Institute for Cancer Research, Swiss Institute of Bioinformatics, http://www.ch.embnet.org/software/TMPRED_form.html) for transmembrane region prediction; (iii) signalp_hmm or (iv) signalp_nn (both from Center for Biological Sequence Analysis, Technical University of Denmark DTU, http://www.cbs.dtu.dk/services/SignalP/background/prediction.php) for signal peptide prediction. The terms "signalp_hmm" and "signalp_nn" refer to two modes of operation for the program SignalP: hmm refers to Hidden Markov Model, while nn refers to neural networks. Localization was also determined through manual inspection of known protein localization and/or gene structure, and the use of heuristics by the individual inventor. In some cases, for the manual inspection of cellular localization prediction, the inventors used the ProLoc computational platform [Einat Hazkani-Covo, Erez Levanon, Galit Rotman, Dan Graur and Amit Novik; (2004) "Evolution of multicellularity in metazoa: comparative analysis of the subcellular localization of proteins in Saccharomyces, Drosophila and Caenorhabditis." 
Cell Biology International 2004;28(3): 171-8.], which predicts protein localization based on various parameters including protein domains (e.g., prediction of trans-membranous regions and localization thereof within the protein), pI, protein length, amino acid composition, homology to pre-annotated proteins, recognition of sequence patterns which direct the protein to a certain organelle (such as nuclear localization signal, NLS, or mitochondria localization signal), signal peptide and anchor modeling, and using unique domains from Pfam that are specific to a single compartment. Information is given in the text with regard to SNPs (single nucleotide polymorphisms). A description of the abbreviations is as follows. "T -> C", for example, means that the SNP results in a change at the position given in the table from T to C. Similarly, "M -> Q", for example, means that the SNP has caused a change in the corresponding amino acid sequence, from methionine (M) to glutamine (Q). If, in place of a letter at the right hand side for the nucleotide sequence SNP, there is a space, it indicates that a frameshift has occurred. A frameshift may also be indicated with a hyphen (-). A stop codon is indicated with an asterisk at the right hand side (*). As part of the description of an SNP, a comment may be found in parentheses after the above description of the SNP itself. This comment may include an FTId, which is an identifier to a SwissProt entry that was created with the indicated SNP. An FTId is a unique and stable feature identifier, which allows construction of links directly from position-specific annotation in the feature table to specialized protein-related databases. The FTId is always the last component of a feature in the description field, as follows: FTId=XXX_number, in which XXX is the 3-letter code for the specific feature key, separated by an underscore from a 6-digit number. 
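The SNP notation and the FTId layout described above can be read mechanically. The Python sketch below is one possible reading of those rules. The function names, the returned dictionary layout, and the sample comment string in the usage note are illustrative assumptions, not part of the application.

```python
import re

# FTId=XXX_number: a 3-letter feature key, an underscore, then a 6-digit number.
FTID_PATTERN = re.compile(r"FTId=([A-Z]{3})_(\d{6})")


def parse_snp(notation):
    """Classify an SNP written as 'LEFT -> RIGHT', with an optional '(comment)'."""
    core = notation.split("(", 1)[0]           # drop any trailing comment
    core = core.replace("- >", "->")           # tolerate the spaced arrow variant
    left, arrow, right = core.partition("->")
    if not arrow:
        raise ValueError("expected '->' in SNP notation: %r" % notation)
    left, right = left.strip(), right.strip()
    if right == "*":
        kind = "stop"                          # asterisk marks a stop codon
    elif right in ("", "-"):
        kind = "frameshift"                    # blank or hyphen marks a frameshift
    else:
        kind = "substitution"
    return {"from": left, "to": right, "kind": kind}


def extract_ftid(comment):
    """Return (feature_key, number) from a comment, or None if no FTId is present."""
    match = FTID_PATTERN.search(comment)
    return (match.group(1), match.group(2)) if match else None
```

As a usage example, `parse_snp("M -> Q")` is classified as a substitution and `parse_snp("Q -> *")` as a stop codon. A hypothetical comment such as `"(FTId=VAR_000001)"` would give `("VAR", "000001")` from `extract_ftid`.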
In the table of the amino acid mutations of the wild type proteins of the selected splice variants of the invention, the header of the first column is "SNP position(s) on amino acid sequence", representing a position of a known mutation on the amino acid sequence. SNPs may optionally be used as diagnostic markers according to the present invention, alone or in combination with one or more other SNPs and/or any other diagnostic marker. Preferred embodiments of the present invention comprise such SNPs, including but not limited to novel SNPs on the known (WT or wild type) protein sequences given below, as well as novel nucleic acid and/or amino acid sequences formed through such SNPs, and/or any SNP on a variant amino acid and/or nucleic acid sequence described herein. Information given in the text with regard to the homology to the known proteins was determined by Smith-Waterman version 5.1.2 using special (non-default) parameters as follows: -model=sw.model -GAPEXT=0 -GAPOP=100.0 -MATRIX=blosum100. Information is given with regard to overexpression of a cluster in cancer based on ESTs.
A key to the p values with regard to the analysis of such overexpression is as follows:
- library-based statistics: P-value without including the level of expression in cell-lines (P1)
- library-based statistics: P-value including the level of expression in cell-lines (P2)
- EST clone statistics: P-value without including the level of expression in cell-lines (SP1)
- EST clone statistics: predicted overexpression ratio without including the level of expression in cell-lines (R3)
- EST clone statistics: P-value including the level of expression in cell-lines (SP2)
- EST clone statistics: predicted overexpression ratio including the level of expression in cell-lines (R4)
Library-based statistics refer to statistics over an entire library, while EST clone statistics refer to expression only for ESTs from a particular tissue or cancer.
Information is given with regard to overexpression of a cluster in cancer based on microarrays. As a microarray reference, in the specific segment paragraphs, the unabbreviated tissue name was used as the reference to the type of chip for which expression was measured. There are two types of microarray results: those from microarrays prepared according to a design by the present inventors, for which the microarray fabrication procedure is described in detail in the Materials and Experimental Procedures section herein; and those results from microarrays using Affymetrix technology. For microarrays prepared according to a design by the present inventors, the probe name begins with the name of the cluster (gene), followed by an identifying number. Oligonucleotide microarray results taken from Affymetrix data were from chips available from Affymetrix Inc, Santa Clara, CA, USA (see for example data regarding the Human Genome U133 (HG-U133) Set at www.affymetrix.com/products/arrays/specific/hgu133.affx; GeneChip Human Genome U133A 2.0 Array at www.affymetrix.com/products/arrays/specific/hgu133av2.affx; and Human Genome U133 Plus 2.0 Array at www.affymetrix.com/products/arrays/specific/hgu133plus.affx). The probe names follow the Affymetrix naming convention. The data is available from NCBI Gene Expression Omnibus (see www.ncbi.nlm.nih.gov/projects/geo/ and Edgar et al, Nucleic Acids Research, 2002, Vol. 30, No. 1 207-210). The dataset (including results) is available from www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1133 for the Series GSE1133 database (published in March 2004); a reference to these results is as follows: Su et al (Proc Natl Acad Sci U S A. 2004 Apr 20;101(16):6062-7. Epub 2004 Apr 09). The probes designed according to the present inventors are listed below.
>Z21368_0_0_61857 (SEQ ID NO:895)
AGTTCATCCTTCTTCAGTGTGACCAGTAAATTCTTCCCATACTCTTGAAG
>HUMGRP5E_0_0_16630 (SEQ ID NO:896)
GCTGATATGGAAGTTGGGGAATCTGAATTGCCAGAGAATCTTGGGAAGAG
>HUMGRP5E_0_2_0 (SEQ ID NO:897)
TCTCATAGAAGCAAAGGAGAACAGAAACCACCAGCCACCTCAACCCAAGG
>HSENA78_0_1_0 (SEQ ID NO:898)
TGAAGAGTGTGAGGAAAACCTATGTTTGCCGCTTAAGCTTTCAGCTCAGC
>M85491_0_0_25999 (SEQ ID NO:899)
GACATCTTTGCATATCATGTCAGAGCTATAACATCATTGTGGAGAAGCTC
>M85491_0_14_0 (SEQ ID NO:900)
GTCATGAAAATCAACACCGAGGTGCGGAGCTTCGGACCTGTGTCCCGCAG
>HSSTROL3_0_0_12 18 (SEQ ID NO:901)
ATGAGAGTAACCTCACCCGTGCACTAGTTTACAGAGCATTCACTGCCCCA
>HSSTROL3_0_0_12517 (SEQ ID NO:902)
CAGAGATGAGAGCCTGGAGCATTGCAGATGCCAGGGACTTCACAAATGAA
>HUMCA1XIA_0_0_14909 (SEQ ID NO:903)
GCTGCAATCTAAGTTTCGGAATACTTATACCACTCCAGAAATAATCCTCG
>HUMCA1XIA_0_18_0 (SEQ ID NO:904)
TTCAGAACTGTTAACATCGCTGACGGGAAGTGGCATCGGGTAGCAATCAG
>R20779_0_0_30670 (SEQ ID NO:905)
CCGCGTTGCTTCTAGAGGCTGAATGCCTTTCAAATGGAGAAGGCTTCCAT
>HSS100PCB_0_0_12280 (SEQ ID NO:906)
CTCAAAATGAAACTCCCTCTCGCAGAGCACAATTCCAATTCGCTCTAAAA
>HSCOC4_0_0_9892 (SEQ ID NO:907)
AAGGACCAGAGTCCATGCCAAGACCACCCTTCAGCTTCCAAGGCCCTCCA
>HSCOC4_0_39_0 (SEQ ID NO:908)
ATCCTCCAGCCATGAGGCTGCTCTGGGGGCTGATCTGGGCATCCAGCTTC
>HSCOC4_0_0_9883 (SEQ ID NO:909)
CCTGTTTGCTCTGACACCAACTTCCTACCCTCTCAGCCTCAAAGTAACTC
>HSCOC4_0_0_9885 (SEQ ID NO:910)
GCTGAGGTGTGGCCGAGGACCTGACCATCTGGAAGTGTGAAAATCCCCTT
>T11628_0_9_0 (SEQ ID NO:911)
ACAAGATCCCCGTGAAGTACCTGGAGTTCATCTCGGAATGCATCATCCAG
>T11628_0_0_45174 (SEQ ID NO:912)
TAAACAATCAAAGAGCATGTTGGCCTGGTCCTTTGCTAGGTACTGTAGAG
>T11628_0_0_45161 (SEQ ID NO:913)
TGCCTCGCCACAATGGCACCTGCCCTAAAATAGCTTCCCATGTGAGGGCT
>M78076_0_7_0 (SEQ ID NO:914)
GAGAAGATGAACCCGCTGGAACAGTATGAGCGAAAGGTGAATGCGTCTGT
>HSMUC1A_0_37_0 (SEQ ID NO:915)
AAAAGGAGACTTCGGCTACCCAGAGAAGTTCAGTGCCCAGCTCTACTGAG
>HSMUC1A_0_0_11364 (SEQ ID NO:916)
AAAGGCTGGCATAGGGGGAGGTTTCCCAGGTAGAAGAAGAAGTGTCAGCA
>HSMUC1A_0_0_11365 (SEQ ID NO:917)
AATTAACCCTTTGAGAGCTGGCCAGGACTCTGGACTGATTACCCCAGCCT
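The probe listing above follows a FASTA-like layout: a ">" header carrying the probe name and SEQ ID NO, followed by the oligonucleotide sequence. A minimal Python sketch for reading such a listing is given below. The function name and dictionary keys are illustrative. Taking the cluster name as the text before the first underscore is an assumption based on the statement that the probe name begins with the cluster name.

```python
import re

# Header form assumed here: NAME (SEQ ID NO:123), with optional spaces.
HEADER = re.compile(r"^(\S+)\s*\(SEQ ID NO:\s*(\d+)\s*\)")


def parse_probes(text):
    """Parse '>NAME (SEQ ID NO:n)' records followed by a nucleotide sequence."""
    probes = []
    for chunk in text.split(">")[1:]:          # one chunk per '>' header
        record = chunk.strip()
        match = HEADER.match(record)
        if match is None:
            continue                           # header not in the expected form
        name = match.group(1)
        # Everything after the header is sequence; drop whitespace and line breaks.
        sequence = "".join(record[match.end():].split()).upper()
        probes.append({
            "name": name,
            "cluster": name.split("_", 1)[0],  # assumed: cluster precedes first "_"
            "seq_id": int(match.group(2)),
            "sequence": sequence,
        })
    return probes
```

Because each record is cut at the ">" character rather than at line breaks, the sketch also accepts listings in which a header shares a line with the preceding sequence.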
The following list of abbreviations for tissues was used in the TAA histograms. The term "TAA" stands for "Tumor Associated Antigen", and the TAA histograms, given in the text, represent the cancerous tissue expression pattern as predicted by the biomarkers selection engine, as described in detail in examples 1-5 below. "BONE" for "bone"; "COL" for "colon"; "EPI" for "epithelial"; "GEN" for "general"; "LIVER" for "liver"; "LUN" for "lung"; "LYMPH" for "lymph nodes"; "MARROW" for "bone marrow"; "OVA" for "ovary"; "PANCREAS" for "pancreas"; "PRO" for "prostate"; "STOMACH" for "stomach"; "TCELL" for "T cells"; "THYROID" for "Thyroid"; "MAM" for "breast"; "BRAIN" for "brain"; "UTERUS" for "uterus"; "SKIN" for "skin"; "KIDNEY" for "kidney"; "MUSCLE" for "muscle"; "ADREN" for "adrenal"; "HEAD" for "head and neck"; "BLADDER" for "bladder".
It should be noted that the terms "segment", "seg" and "node" are used interchangeably in reference to nucleic acid sequences of the present invention; they refer to portions of nucleic acid sequences that were shown to have one or more properties as described below. They are also the building blocks that were used to construct complete nucleic acid sequences as described in greater detail below. Optionally and preferably, they are examples of oligonucleotides which are embodiments of the present invention, for example as amplicons, hybridization units and/or from which primers and/or complementary oligonucleotides may optionally be derived, and/or for any other use.
As used herein the phrase "breast cancer" refers to cancers of the breast or surrounding tissue, including but not limited to ductal carcinoma (in situ or invasive), lobular carcinoma (in situ or invasive), inflammatory breast cancer, mucinous carcinoma, tubular carcinoma, or Paget's disease of the nipple, as well as conditions that are indicative of a higher risk factor for later development of breast cancer, including but not limited to ductal hyperplasia without atypia and atypical hyperplasia, referred to herein collectively as "indicative conditions".
The term "marker" in the context of the present invention refers to a nucleic acid fragment, a peptide, or a polypeptide, which is differentially present in a sample taken from subjects (patients) having breast cancer (or one of the above indicative conditions) as compared to a comparable sample taken from subjects who do not have breast cancer (or one of the above indicative conditions).
The phrase "differentially present" refers to differences in the quantity of a marker present in a sample taken from patients having breast cancer (or one of the above indicative conditions) as compared to a comparable sample taken from patients who do not have breast cancer (or one of the above indicative conditions).
For example, a nucleic acid fragment may optionally be differentially present between the two samples if the amount of the nucleic acid fragment in one sample is significantly different from the amount of the nucleic acid fragment in the other sample, for example as measured by hybridization and/or NAT-based assays. A polypeptide is differentially present between the two samples if the amount of the polypeptide in one sample is significantly different from the amount of the polypeptide in the other sample. It should be noted that if the marker is detectable in one sample and not detectable in the other, then such a marker can be considered to be differentially present.
As used herein the phrase "diagnostic" means identifying the presence or nature of a pathologic condition. Diagnostic methods differ in their sensitivity and specificity. The "sensitivity" of a diagnostic assay is the percentage of diseased individuals who test positive (percent of "true positives"). Diseased individuals not detected by the assay are "false negatives." Subjects who are not diseased and who test negative in the assay are termed "true negatives." The "specificity" of a diagnostic assay is 1 minus the false positive rate, where the "false positive" rate is defined as the proportion of those without the disease who test positive. While a particular diagnostic method may not provide a definitive diagnosis of a condition, it suffices if the method provides a positive indication that aids in diagnosis.
As used herein the phrase "diagnosing" refers to classifying a disease or a symptom, determining a severity of the disease, monitoring disease progression, forecasting an outcome of a disease and/or prospects of recovery. The term "detecting" may also optionally encompass any of the above.
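By way of non-limiting illustration only, the definitions of "sensitivity" and "specificity" given above may be expressed as the following minimal sketch. The class and function names, and the example counts, are hypothetical and are chosen only to show the arithmetic; they are not data from the present disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssayCounts:
    """Outcome counts for a diagnostic assay (illustrative values only)."""
    true_pos: int   # diseased individuals who test positive
    false_neg: int  # diseased individuals not detected by the assay
    true_neg: int   # non-diseased individuals who test negative
    false_pos: int  # non-diseased individuals who test positive


def sensitivity(c: AssayCounts) -> float:
    """Proportion of diseased individuals who test positive."""
    return c.true_pos / (c.true_pos + c.false_neg)


def false_positive_rate(c: AssayCounts) -> float:
    """Proportion of those without the disease who test positive."""
    return c.false_pos / (c.false_pos + c.true_neg)


def specificity(c: AssayCounts) -> float:
    """Specificity defined as 1 minus the false positive rate."""
    return 1.0 - false_positive_rate(c)


# Hypothetical counts: 50 diseased and 100 non-diseased subjects.
counts = AssayCounts(true_pos=45, false_neg=5, true_neg=90, false_pos=10)
print(f"sensitivity = {sensitivity(counts):.2f}")
print(f"specificity = {specificity(counts):.2f}")
```

Sensitivity is computed only from diseased subjects and specificity only from non-diseased subjects, so neither value depends on the proportion of diseased subjects in the tested group.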
Diagnosis of a disease according to the present invention can be effected by determining a level of a polynucleotide or a polypeptide of the present invention in a biological sample obtained from the subject, wherein the level determined can be correlated with predisposition to, or presence or absence of the disease. It should be noted that a "biological sample obtained from the subject" may also optionally comprise a sample that has not been physically removed from the subject, as described in greater detail below.
As used herein, the term "level" refers to expression levels of RNA and/or protein or to DNA copy number of a marker of the present invention. Typically the level of the marker in a biological sample obtained from the subject is different (i.e., increased or decreased) from the level of the same variant in a similar sample obtained from a healthy individual (examples of biological samples are described herein).
Numerous well known tissue or fluid collection methods can be utilized to collect the biological sample from the subject in order to determine the level of DNA, RNA and/or polypeptide of the variant of interest in the subject. Examples include, but are not limited to, fine needle biopsy, needle biopsy, core needle biopsy and surgical biopsy (e.g., brain biopsy), and lavage. Regardless of the procedure employed, once a biopsy/sample is obtained the level of the variant can be determined and a diagnosis can thus be made. Determining the level of the same variant in normal tissues of the same origin is preferably effected alongside, to detect an elevated expression and/or amplification and/or a decreased expression of the variant as opposed to the normal tissues.
A "test amount" of a marker refers to an amount of a marker in a subject's sample that is consistent with a diagnosis of breast cancer (or one of the above indicative conditions).
A test amount can be either in absolute amount (e.g., microgram/ml) or a relative amount (e.g., relative intensity of signals). A "control amount" of a marker can be any amount or a range of amounts to be compared against a test amount of a marker. For example, a control amount of a marker can be the amount of a marker in a patient with breast cancer (or one of the above indicative conditions) or a person without breast cancer (or one of the above indicative conditions). A control amount can be either in absolute amount (e.g., microgram/ml) or a relative amount (e.g., relative intensity of signals).
"Detect" refers to identifying the presence, absence or amount of the object to be detected.
A "label" includes any moiety or item detectable by spectroscopic, photochemical, biochemical, immunochemical, or chemical means. For example, useful labels include 32P, 35S, fluorescent dyes, electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin-streptavidin, digoxigenin, haptens and proteins for which antisera or monoclonal antibodies are available, or nucleic acid molecules with a sequence complementary to a target. The label often generates a measurable signal, such as a radioactive, chromogenic, or fluorescent signal, that can be used to quantify the amount of bound label in a sample. The label can be incorporated in or attached to a primer or probe either covalently, or through ionic, van der Waals or hydrogen bonds, e.g., incorporation of radioactive nucleotides, or biotinylated nucleotides that are recognized by streptavidin. The label may be directly or indirectly detectable. Indirect detection can involve the binding of a second label to the first label, directly or indirectly. For example, the label can be the ligand of a binding partner, such as biotin, which is a binding partner for streptavidin, or a nucleotide sequence, which is the binding partner for a complementary sequence, to which it can specifically hybridize.
The binding partner may itself be directly detectable, for example, an antibody may be itself labeled with a fluorescent molecule. The binding partner also may be indirectly detectable, for example, a nucleic acid having a complementary nucleotide sequence can be a part of a branched DNA molecule that is in turn detectable through hybridization with other labeled nucleic acid molecules (see, e.g., P. D. Fahrlander and A. Klausner, Bio/Technology 6: 1165 (1988)). Quantitation of the signal is achieved by, e.g., scintillation counting, densitometry, or flow cytometry.
Exemplary detectable labels, optionally and preferably for use with immunoassays, include but are not limited to magnetic beads, fluorescent dyes, radiolabels, enzymes (e.g., horseradish peroxidase, alkaline phosphatase and others commonly used in an ELISA), and colorimetric labels such as colloidal gold or colored glass or plastic beads. Alternatively, the marker in the sample can be detected using an indirect assay, wherein, for example, a second, labeled antibody is used to detect bound marker-specific antibody, and/or in a competition or inhibition assay wherein, for example, a monoclonal antibody which binds to a distinct epitope of the marker is incubated simultaneously with the mixture.
"Immunoassay" is an assay that uses an antibody to specifically bind an antigen. The immunoassay is characterized by the use of specific binding properties of a particular antibody to isolate, target, and/or quantify the antigen.
The phrase "specifically (or selectively) binds" to an antibody or "specifically (or selectively) immunoreactive with," when referring to a protein or peptide (or other epitope), refers to a binding reaction that is determinative of the presence of the protein in a heterogeneous population of proteins and other biologics.
Thus, under designated immunoassay conditions, the specified antibodies bind to a particular protein at least two times greater than the background (non-specific signal) and do not substantially bind in a significant amount to other proteins present in the sample. Specific binding to an antibody under such conditions may require an antibody that is selected for its specificity for a particular protein. For example, polyclonal antibodies raised to seminal basic protein from specific species such as rat, mouse, or human can be selected to obtain only those polyclonal antibodies that are specifically immunoreactive with seminal basic protein and not with other proteins, except for polymorphic variants and alleles of seminal basic protein. This selection may be achieved by subtracting out antibodies that cross-react with seminal basic protein molecules from other species. A variety of immunoassay formats may be used to select antibodies specifically immunoreactive with a particular protein. For example, solid-phase ELISA immunoassays are routinely used to select antibodies specifically immunoreactive with a protein (see, e.g., Harlow & Lane, Antibodies, A Laboratory Manual (1988), for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity). Typically a specific or selective reaction will be at least twice background signal or noise and more typically more than 10 to 100 times background.
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
T10888_PEA_1_T1
T10888_PEA_1_T4
T10888_PEA_1_T5
T10888_PEA_1_T6
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
T10888_PEA_1_node_11
T10888_PEA_1_node_12
T10888_PEA_1_node_17
T10888_PEA_1_node_4
T10888_PEA_1_node_6
T10888_PEA_1_node_7
T10888_PEA_1_node_9
T10888_PEA_1_node_15
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
T10888_PEA_1_P2
T10888_PEA_1_P4
T10888_PEA_1_P5
T10888_PEA_1_P6
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
T39971_T10
T39971_T12
T39971_T16
T39971_T5
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
T39971_node_0
T39971_node_18
T39971_node_21
T39971_node_22
T39971_node_23
T39971_node_31
T39971_node_33
T39971_node_7
T39971_node_1
T39971_node_10
T39971_node_11
T39971_node_12
T39971_node_15
T39971_node_16
T39971_node_17
T39971_node_26
T39971_node_27
T39971_node_28
T39971_node_29
T39971_node_3
T39971_node_30
T39971_node_34
T39971_node_35
T39971_node_36
T39971_node_4
T39971_node_5
T39971_node_8
T39971_node_9
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
T39971_P6
T39971_P9
T39971_P11
T39971_P12
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
Z21368_PEA_1_T10
Z21368_PEA_1_T11
Z21368_PEA_1_T23
Z21368_PEA_1_T24
Z21368_PEA_1_T5
Z21368_PEA_1_T6
Z21368_PEA_1_T9
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
Z21368_PEA_1_node_0
Z21368_PEA_1_node_15
Z21368_PEA_1_node_19
Z21368_PEA_1_node_2
Z21368_PEA_1_node_21
Z21368_PEA_1_node_33
Z21368_PEA_1_node_36
Z21368_PEA_1_node_37
Z21368_PEA_1_node_39
Z21368_PEA_1_node_4
Z21368_PEA_1_node_41
Z21368_PEA_1_node_43
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
Z21368_PEA_1_P2
Z21368_PEA_1_P5
Z21368_PEA_1_P15
Z21368_PEA_1_P16
Z21368_PEA_1_P22
Z21368_PEA_1_P23
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
T59832_T11
T59832_T15
T59832_T22
T59832_T28
T59832_T6
T59832_T8
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
T59832_node_1
T59832_node_22
T59832_node_23
T59832_node_24
T59832_node_29
T59832_node_39
T59832_node_7
T59832_node_10
T59832_node_11
T59832_node_12
T59832_node_14
T59832_node_16
T59832_node_19
T59832_node_2
T59832_node_20
T59832_node_25
T59832_node_26
T59832_node_27
T59832_node_28
T59832_node_3
T59832_node_30
T59832_node_31
T59832_node_32
T59832_node_34
T59832_node_35
T59832_node_36
T59832_node_37
T59832_node_38
T59832_node_4
T59832_node_5
T59832_node_6
T59832_node_8
T59832_node_9
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
T59832_P5
T59832_P7
T59832_P9
T59832_P12
T59832_P18
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
Z41644_PEA_1_T5
a nucleic acid sequence comprising a sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
Z41644_PEA_1_P10
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
HUMGRP5E_T4
HUMGRP5E_T5
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
HUMGRP5E_node_0
HUMGRP5E_node_2
HUMGRP5E_node_8
HUMGRP5E_node_3
HUMGRP5E_node_7
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
AA155578_PEA_1_T10
AA155578_PEA_1_T12
AA155578_PEA_1_T13
AA155578_PEA_1_T8
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
AA155578_PEA_1_node_11
AA155578_PEA_1_node_12
AA155578_PEA_1_node_14
AA155578_PEA_1_node_19
AA155578_PEA_1_node_21
AA155578_PEA_1_node_23
AA155578_PEA_1_node_24
AA155578_PEA_1_node_25
AA155578_PEA_1_node_4
AA155578_PEA_1_node_7
AA155578_PEA_1_node_15
AA155578_PEA_1_node_18
AA155578_PEA_1_node_22
AA155578_PEA_1_node_6
AA155578_PEA_1_node_8
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
AA155578_PEA_1_P4
AA155578_PEA_1_P6
AA155578_PEA_1_P8
AA155578_PEA_1_P9
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
HSENA78_T5
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
HSENA78_node_0
HSENA78_node_2
HSENA78_node_6
HSENA78_node_9
HSENA78_node_3
HSENA78_node_4
HSENA78_node_8
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
HSENA78_P2
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
T94936_PEA_1_T1
T94936_PEA_1_T2
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
T94936_PEA_1_node_14
T94936_PEA_1_node_16
T94936_PEA_1_node_2
T94936_PEA_1_node_20
T94936_PEA_1_node_23
T94936_PEA_1_node_0
T94936_PEA_1_node_11
T94936_PEA_1_node_13
T94936_PEA_1_node_17
T94936_PEA_1_node_6
T94936_PEA_1_node_8
T94936_PEA_1_node_9
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
T94936_PEA_1_P2
T94936_PEA_1_P3
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
M85491_PEA_1_T16
M85491_PEA_1_T20
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
M85491_PEA_1_node_0
M85491_PEA_1_node_13
M85491_PEA_1_node_21
M85491_PEA_1_node_23
M85491_PEA_1_node_24
M85491_PEA_1_node_8
M85491_PEA_1_node_9
M85491_PEA_1_node_10
M85491_PEA_1_node_18
M85491_PEA_1_node_19
M85491_PEA_1_node_6
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
M85491_PEA_1_P13
M85491_PEA_1_P14
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
HSSTROL3_T5
HSSTROL3_T8
HSSTROL3_T9
HSSTROL3_T10
HSSTROL3_T11
HSSTROL3_T12
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
HSSTROL3_node_6
HSSTROL3_node_10
HSSTROL3_node_13
HSSTROL3_node_15
HSSTROL3_node_19
HSSTROL3_node_21
HSSTROL3_node_24
HSSTROL3_node_25
HSSTROL3_node_26
HSSTROL3_node_28
HSSTROL3_node_29
HSSTROL3_node_11
HSSTROL3_node_17
HSSTROL3_node_18
HSSTROL3_node_20
HSSTROL3_node_27
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
AY180924_PEA_1_T1
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
AY180924_PEA_1_node_3
AY180924_PEA_1_node_0
AY180924_PEA_1_node_2
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
AY180924_PEA_1_P3
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
R75793_PEA_1_T1
R75793_PEA_1_T3
R75793_PEA_1_T5
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
R75793_PEA_1_node_0
R75793_PEA_1_node_9
R75793_PEA_1_node_11
R75793_PEA_1_node_14
R75793_PEA_1_node_4
R75793_PEA_1_node_5
R75793_PEA_1_node_6
R75793_PEA_1_node_8
R75793_PEA_1_node_13
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
R75793_PEA_1_P2
R75793_PEA_1_P5
R75793_PEA_1_P6
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
HUMCA1XIA_node_92
HUMCA1XIA_node_11
HUMCA1XIA_node_15
HUMCA1XIA_node_19
HUMCA1XIA_node_21
HUMCA1XIA_node_23
HUMCA1XIA_node_25
HUMCA1XIA_node_27
HUMCA1XIA_node_29
HUMCA1XIA_node_31
HUMCA1XIA_node_33
HUMCA1XIA_node_35
HUMCA1XIA_node_37
HUMCA1XIA_node_39
HUMCA1XIA_node_41
HUMCA1XIA_node_43
HUMCA1XIA_node_45
HUMCA1XIA_node_47
HUMCA1XIA_node_49
HUMCAlXIA_node. . 1
HUMCA1XIA_node_57
HUMCA1XIA_node_59
HUMCA1XIA_node_62
HUMCA1XIA_node_64
HUMCA1XIA_node_66
HUMCA1XIA_node_68
HUMCA1XIA_node_70
HUMCA1XIA_node_72
HUMCA1XIA_node_74
HUMCA1XIA_node_76
HUMCA1XIA_node_78
HUMCA1XIA_node_81
HUMCA1XIA_node_83
HUMCA1XIA_node_85
HUMCA1XIA_node_87
HUMCA1XIA_node_89
HUMCA1XIA_node_91
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
R20779_T7
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
R20779_node_0
R20779_node_2
R20779_node_7
R20779_node_9
R20779_node_18
R20779_node_21
R20779_node_24
R20779_node_27
R20779_node_28
R20779_node_30
R20779_node_31
R20779_node_32
R20779_node_1
R20779_node_3
R20779_node_10
R20779_node_11
R20779_node_14
R20779_node_17
R20779_node_19
R20779_node_20
R20779_node_22
R20779_node_23
R20779_node_25
R20779_node_29
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence according to R20779_P2. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
HSS100PCB_T1
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
HSS100PCB_node_3
HSS100PCB_node_4
HSS100PCB_node_5
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence according to HSS100PCB_P3. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
HSCOC4_PEA_1_T1
HSCOC4_PEA_1_T2
HSCOC4_PEA_1_T3
HSCOC4_PEA_1_T4
HSCOC4_PEA_1_T5
HSCOC4_PEA_1_T7
HSCOC4_PEA_1_T8
HSCOC4_PEA_1_T11
HSCOC4_PEA_1_T12
HSCOC4_PEA_1_T14
HSCOC4_PEA_1_T15
HSCOC4_PEA_1_T20
HSCOC4_PEA_1_T21
HSCOC4_PEA_1_T25
HSCOC4_PEA_1_T28
HSCOC4_PEA_1_T30
HSCOC4_PEA_1_T31
HSCOC4_PEA_1_T32
HSCOC4_PEA_1_T40
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
HSCOC4_PEA_1_node_1
HSCOC4_PEA_1_node_5
HSCOC4_PEA_1_node_7
HSCOC4_PEA_1_node_30
HSCOC4_PEA_1_node_33
HSCOC4_PEA_1_node_35
HSCOC4_PEA_1_node_37
HSCOC4_PEA_1_node_39
HSCOC4_PEA_1_node_43
HSCOC4_PEA_1_node_48
HSCOC4_PEA_1_node_49
HSCOC4_PEA_1_node_51
HSCOC4_PEA_1_node_58
HSCOC4_PEA_1_node_59
HSCOC4_PEA_1_node_62
HSCOC4_PEA_1_node_66
HSCOC4_PEA_1_node_72
HSCOC4_PEA_1_node_77
HSCOC4_PEA_1_node_79
HSCOC4_PEA_1_node_93
HSCOC4_PEA_1_node_100
HSCOC4_PEA_1_node_105
HSCOC4_PEA_1_node_107
HSCOC4_PEA_1_node_108
HSCOC4_PEA_1_node_109
HSCOC4_PEA_1_node_110
HSCOC4_PEA_1_node_112
HSCOC4_PEA_1_node_113
HSCOC4_PEA_1_node_2
HSCOC4_PEA_1_node_84
HSCOC4_PEA_1_node_85
HSCOC4_PEA_1_node_86
HSCOC4_PEA_1_node_87
HSCOC4_PEA_1_node_
HSCOC4_PEA_1_node_89
HSCOC4_PEA_1_node_90
HSCOC4_PEA_1_node_91
HSCOC4_PEA_1_node_92
HSCOC4_PEA_1_node_94
HSCOC4_PEA_1_node_96
HSCOC4_PEA_1_node_97
HSCOC4_PEA_1_node_98
HSCOC4_PEA_1_node_99
HSCOC4_PEA_1_node_101
HSCOC4_PEA_1_node_102
HSCOC4_PEA_1_node_103
HSCOC4_PEA_1_node_104
HSCOC4_PEA_1_node_106
HSCOC4_PEA_1_node_111
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
HSCOC4_PEA_1_P3
HSCOC4_PEA_1_P5
HSCOC4_PEA_1_P6
HSCOC4_PEA_1_P12
HSCOC4_PEA_1_P15
HSCOC4_PEA_1_P16
HSCOC4_PEA_1_P20
HSCOC4_PEA_1_P9
HSCOC4_PEA_1_P22
HSCOC4_PEA_1_P23
HSCOC4_PEA_1_P24
HSCOC4_PEA_1_P25
HSCOC4_PEA_1_P26
HSCOC4_PEA_1_P30
HSCOC4_PEA_1_P38
HSCOC4_PEA_1_P39
HSCOC4_PEA_1_P40
HSCOC4_PEA_1_P41
HSCOC4_PEA_1_P42
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
HUMTREFAC_PEA_2_T4
HUMTREFAC_PEA_2_T5
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
HUMTREFAC_PEA_2_node_0
HUMTREFAC_PEA_2_node_9
HUMTREFAC_PEA_2_node_2
HUMTREFAC_PEA_2_node_3
HUMTREFAC_PEA_2_node_4
HUMTREFAC_PEA_2_node_5
HUMTREFAC_PEA_2_node_8
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
HUMOSTRO_PEA_1_PEA_1_T14
HUMOSTRO_PEA_1_PEA_1_T16
HUMOSTRO_PEA_1_PEA_1_T30
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
HUMOSTRO_PEA_1_PEA_1_node_0
HUMOSTRO_PEA_1_PEA_1_node_10
HUMOSTRO_PEA_1_PEA_1_node_16
HUMOSTRO_PEA_1_PEA_1_node_23
HUMOSTRO_PEA_1_PEA_1_node_31
HUMOSTRO_PEA_1_PEA_1_node_43
HUMOSTRO_PEA_1_PEA_1_node_3
HUMOSTRO_PEA_1_PEA_1_node_5
HUMOSTRO_PEA_1_PEA_1_node_7
HUMOSTRO_PEA_1_PEA_1_node_8
HUMOSTRO_PEA_1_PEA_1_node_15
HUMOSTRO_PEA_1_PEA_1_node_17
HUMOSTRO_PEA_1_PEA_1_node_20
HUMOSTRO_PEA_1_PEA_1_node_21
HUMOSTRO_PEA_1_PEA_1_node_22
HUMOSTRO_PEA_1_PEA_1_node_24
HUMOSTRO_PEA_1_PEA_1_node_26
HUMOSTRO_PEA_1_PEA_1_node_27
HUMOSTRO_PEA_1_PEA_1_node_28
HUMOSTRO_PEA_1_PEA_1_node_29
HUMOSTRO_PEA_1_PEA_1_node_30
HUMOSTRO_PEA_1_PEA_1_node_32
HUMOSTRO_PEA_1_PEA_1_node_34
HUMOSTRO_PEA_1_PEA_1_node_36
HUMOSTRO_PEA_1_PEA_1_node_37
HUMOSTRO_PEA_1_PEA_1_node_38
HUMOSTRO_PEA_1_PEA_1_node_39
HUMOSTRO_PEA_1_PEA_1_node_40
HUMOSTRO_PEA_1_PEA_1_node_41
HUMOSTRO_PEA_1_PEA_1_node_42
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
HUMOSTRO_PEA_1_PEA_1_P21
HUMOSTRO_PEA_1_PEA_1_P25
HUMOSTRO_PEA_1_PEA_1_P30
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a polynucleotide having a sequence selected from the group consisting of: R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5, or R11723_PEA_1_T6.
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a node having a sequence selected from the group consisting of: R11723_PEA_1_node_13, R11723_PEA_1_node_16, R11723_PEA_1_node_19, R11723_PEA_1_node_2, R11723_PEA_1_node_22, R11723_PEA_1_node_31, R11723_PEA_1_node_10, R11723_PEA_1_node_11, R11723_PEA_1_node_15, R11723_PEA_1_node_18, R11723_PEA_1_node_20, R11723_PEA_1_node_21, R11723_PEA_1_node_23, R11723_PEA_1_node_24, R11723_PEA_1_node_25, R11723_PEA_1_node_26, R11723_PEA_1_node_27, R11723_PEA_1_node_28, R11723_PEA_1_node_29, R11723_PEA_1_node_3, R11723_PEA_1_node_30, R11723_PEA_1_node_4, R11723_PEA_1_node_5, R11723_PEA_1_node_6, R11723_PEA_1_node_7 or R11723_PEA_1_node_8.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising a polypeptide having a sequence selected from the group consisting of: R11723_PEA_1_P2, R11723_PEA_1_P6, R11723_PEA_1_P7, R11723_PEA_1_P13, or R11723_PEA_1_P10.
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
T46984_PEA_1_T2
T46984_PEA_1_T3
T46984_PEA_1_T12
T46984_PEA_1_T13
T46984_PEA_1_T14
T46984_PEA_1_T15
T46984_PEA_1_T19
T46984_PEA_1_T23
T46984_PEA_1_T27
T46984_PEA_1_T32
T46984_PEA_1_T34
T46984_PEA_1_T35
T46984_PEA_1_T40
T46984_PEA_1_T42
T46984_PEA_1_T43
T46984_PEA_1_T46
T46984_PEA_1_T47
T46984_PEA_1_T48
T46984_PEA_1_T51
T46984_PEA_1_T52
T46984_PEA_1_T54
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
T46984_PEA_1_node_2
T46984_PEA_1_node_4
T46984_PEA_1_node_6
T46984_PEA_1_node_12
T46984_PEA_1_node_14
T46984_PEA_1_node_25
T46984_PEA_1_node_29
T46984_PEA_1_node_34
T46984_PEA_1_node_46
T46984_PEA_1_node_47
T46984_PEA_1_node_52
T46984_PEA_1_node_65
T46984_PEA_1_node_69
T46984_PEA_1_node_75
T46984_PEA_1_node_86
T46984_PEA_1_node_74
T46984_PEA_1_node_83
T46984_PEA_1_node_84
T46984_PEA_1_node_85
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
T46984_PEA_1_P2
T46984_PEA_1_P3
T46984_PEA_1_P10
T46984_PEA_1_P11
T46984_PEA_1_P12
T46984_PEA_1_P21
T46984_PEA_1_P27
T46984_PEA_1_P32
T46984_PEA_1_P34
T46984_PEA_1_P35
T46984_PEA_1_P38
T46984_PEA_1_P39
T46984_PEA_1_P45
T46984_PEA_1_P46
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
T11628_PEA_1_T3
T11628_PEA_1_T4
T11628_PEA_1_T5
T11628_PEA_1_T7
T11628_PEA_1_T9
T11628_PEA_1_T11
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
T11628_PEA_1_node_7
T11628_PEA_1_node_11
T11628_PEA_1_node_16
T11628_PEA_1_node_22
T11628_PEA_1_node_25
T11628_PEA_1_node_31
T11628_PEA_1_node_37
T11628_PEA_1_node_0
T11628_PEA_1_node_4
T11628_PEA_1_node_9
T11628_PEA_1_node_13
T11628_PEA_1_node_14
T11628_PEA_1_node_17
T11628_PEA_1_node_18
T11628_PEA_1_node_19
T11628_PEA_1_node_24
T11628_PEA_1_node_27
T11628_PEA_1_node_28
T11628_PEA_1_node_29
T11628_PEA_1_node_30
T11628_PEA_1_node_32
T11628_PEA_1_node_33
T11628_PEA_1_node_34
T11628_PEA_1_node_35
T11628_PEA_1_node_36
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
T11628_PEA_1_P2
T11628_PEA_1_P5
T11628_PEA_1_P7
T11628_PEA_1_P10
According to prefeπed embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
M78076. -PEA_1. _T2
M78076_PEA_1. _T3
M78076. PEA_1. _T5
M78076. PEA_1. _T13
M78076. -PEA_1. _T15
M78076. _PEA_1. _T23
M78076. -PEA_1. _T26
M78076J?EA_1. -T27
M78076_PEA_1. _T28 a nucleic acid sequence comprising a sequence in the table below:
Segment Name
M78076_PEA_1_node_49
M78076_PEA_1_node_50
M78076_PEA_1_node_51
M78076_PEA_1_node_52
M78076_PEA_1_node_53
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
M78076_PEA_1_P3
M78076_PEA_1_P4
M78076_PEA_1_P12
M78076_PEA_1_P14
M78076_PEA_1_P21
M78076_PEA_1_P24
M78076_PEA_1_P2
M78076_PEA_1_P25
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
HSMUC1A_PEA_1_T12
HSMUC1A_PEA_1_T26
HSMUC1A_PEA_1_T28
HSMUC1A_PEA_1_T29
HSMUC1A_PEA_1_T30
HSMUC1A_PEA_1_T31
HSMUC1A_PEA_1_T33
HSMUC1A_PEA_1_T34
HSMUC1A_PEA_1_T35
HSMUC1A_PEA_1_T36
HSMUC1A_PEA_1_T40
HSMUC1A_PEA_1_T42
HSMUC1A_PEA_1_T43
HSMUC1A_PEA_1_T47
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
HSMUC1A_PEA_1_node_0
HSMUC1A_PEA_1_node_14
HSMUC1A_PEA_1_node_24
HSMUC1A_PEA_1_node_29
HSMUC1A_PEA_1_node_35
HSMUC1A_PEA_1_node_38
HSMUC1A_PEA_1_node_3
HSMUC1A_PEA_1_node_4
HSMUC1A_PEA_1_node_5
HSMUC1A_PEA_1_node_6
HSMUC1A_PEA_1_node_7
HSMUC1A_PEA_1_node_17
HSMUC1A_PEA_1_node_18
HSMUC1A_PEA_1_node_20
HSMUC1A_PEA_1_node_21
HSMUC1A_PEA_1_node_23
HSMUC1A_PEA_1_node_26
HSMUC1A_PEA_1_node_27
HSMUC1A_PEA_1_node_31
HSMUC1A_PEA_1_node_34
HSMUC1A_PEA_1_node_36
HSMUC1A_PEA_1_node_37
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
HSMUC1A_PEA_1_P25
HSMUC1A_PEA_1_P29
HSMUC1A_PEA_1_P30
HSMUC1A_PEA_1_P32
HSMUC1A_PEA_1_P36
HSMUC1A_PEA_1_P39
HSMUC1A_PEA_1_P45
HSMUC1A_PEA_1_P49
HSMUC1A_PEA_1_P52
HSMUC1A_PEA_1_P53
HSMUC1A_PEA_1_P56
HSMUC1A_PEA_1_P58
HSMUC1A_PEA_1_P59
HSMUC1A_PEA_1_P63
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSMUC1A_PEA_1_P63, comprising a first amino acid sequence being at least 90% homologous to MTPGTQSPFFLLLLLTNLT TGSGHASSTPGGEKETSATQRSSN corresponding to amino acids 1 - 45 of MUC1_HUMAN, which also corresponds to amino acids 1 - 45 of HSMUC1A_PEA_1_P63, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EEEVSADQVSVGASGVLGSFKEARNAPSFLSWSFSMGPSK corresponding to amino acids 46 - 85 of HSMUC1A_PEA_1_P63, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSMUC1A_PEA_1_P63, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EEEVSADQVSVGASGVLGSFKEARNAPSFLSWSFSMGPSK in HSMUC1A_PEA_1_P63.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to
MA PGSSTVFLLALTIIASTWALTPTHYLTIOTDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRS^DPSNNDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQ VIQLMNA IFSKJ<^ESLSEAFSVASAAAVLSHNR\ΗVPVVV EGSASDTHEQAILP ^ PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDN RYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFAL FFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFVAEPDNKNVYKFELDTSERKXEFDS ASGTYTLYLIIGDATLKNPDLWNV corresponding to amino acids 1 - 498 of RIB2_HUMAN, which also corresponds to amino acids 1 - 498 of T46984_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VCA corresponding to amino acids 499 - 501 of T46984_PEA_1_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
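The coordinate bookkeeping in a chimeric definition of this kind can be checked mechanically. The following is a minimal Python sketch, not part of the specification: the function name `contiguous_ranges` is hypothetical, and the only inputs taken from the text are the portion lengths (498 amino acids shared with RIB2_HUMAN, followed by the 3-residue portion VCA). It returns the 1-based inclusive range that each portion occupies when the portions are joined contiguously and in sequential order.

```python
def contiguous_ranges(lengths):
    """Return 1-based inclusive (start, end) ranges for portions joined in order."""
    ranges = []
    start = 1
    for length in lengths:
        if length < 1:
            raise ValueError("each portion must contain at least one amino acid")
        end = start + length - 1
        ranges.append((start, end))
        start = end + 1  # contiguous: the next portion begins right after this one
    return ranges


# Portion lengths from the text: 498 residues shared with RIB2_HUMAN, then "VCA".
print(contiguous_ranges([498, len("VCA")]))
```

Under these assumptions the two ranges are 1 - 498 and 499 - 501, matching the positions recited for T46984_PEA_1_P2.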
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTFT LTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQWDAKKACTYTRSNLDPSNNDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQΠΉAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEEDLVAX<LDELGGVYLQFEEGLETTALFVAATΎKLMDHVGTEPSIKEDQVIQLMNA ITSK_ NFESLSEAFSVASAAAVLSHNRYHWVV PEGSASDTΉ^ PLTQATVKLEHAKSVASPVATVLQKTSFTPVGD ELNFMNVKFSSGYYOFLVEVEGDN R TANTNELRV _STEVGITNVDLSTNDKDQSLAPKTTRVTYPAI^^
FFQLVDVΝTGAELTPHQ corresponding to amino acids 1 - 433 of RIB2_HUMAN, which also corresponds to amino acids 1 - 433 of T46984_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ICHIWKLIFLP corresponding to amino acids 434 - 444 of T46984_PEA_1_P3, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ICHIWKLIFLP in T46984_PEA_1_P3.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSSTWLLALTIIASTWALTPTHYLTIOIDVERLKASLDRPFTNLESAFYSIVGLSSL GAQWDAJ KACT TRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGλ^YLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNA IFSKXNFESLSEAFSVASAAAVLSHNRYHVPVVVWEGSASDTHEQ.AILRLQVTNVLSQ PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDN RYIANTVELRVKISTEVGITNVDLSTVDKOQSIAPKTTRVTYPAJ AKGTFIADSHQNFAL FFQLVD\^TGAELTPHQTFVRLHNQKTGQEVVFVAEPDNKNVΛTKTELDTSEPJKIEFDS ASGTYTLYLIIGDATLK PILWNV corresponding to amino acids 1 - 498 of RIB2_HUMAN, which also corresponds to amino acids 1 - 498 of T46984_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LMDQK corresponding to amino acids 499 - 503 of T46984_PEA_1_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LMDQK in T46984_PEA_1_P10.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSSTWLLALTIIASTWALTPTmXTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTAPJ.SK_ETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTAJLFVAAT ^KLMDHVGTEPSIKEDQVIQLMNA IFSKKNFESLSEAJSVASAAAVLSHNRYH VV PEGSASDTHEQAILRLQVTNVLSQ PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDN RYIANTVELRVKISTEVGITTWDLSTVDΪ DQSIAPKTTRVTYPAK.AKGTFIADSHQNFAL FFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFVAEPDNKNVYKFELDTSERKIEFDS ASGTYTLYLIIGDATLKNPILWNVADVVTKFPEEEAPSTΛ^SQNLFTPKQEIQHLFREPEK RPPT SNTFTALILSPLLLLFALWIPJGANNSOTTFAPSTIIFHLGHAAMLGLMYVYWT QLΝMFQTLKYLAILGSVTFLAGΝRMLAQQAVKR corresponding to amino acids 1 - 628 of RIB2_HUMAN, which also corresponds to amino acids 1 - 628 of T46984_PEA_1_P11.
According to prefeπed embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984J?EA_1 JP12, comprising a first amino acid sequence being at least 90 % homologous to MAPPGSST LLALTIIASTWALTPTHYLTKHD ΕRLKASLDRPFTNLESAFYSINGLSSL GAQWDAKKACTΥIRSΝLDPSΝVDSLFYAAQASQALSGCEISISΝET DLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATNQALQTASHLSQQADLRSI VEEEDLVA^DELGGVYLQFEEGLETT ^VAAnT MDHVGTLPSIKEDQVIQLMNA
PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMN corresponding to amino acids 1 - 338 of RIB2_HUMAN, which also corresponds to amino acids 1 - 338 of T46984_PEA_1_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SQDLH corresponding to amino acids 339 - 343 of T46984_PEA_1_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SQDLH in T46984_PEA_1_P12.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P21, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence M corresponding to amino acids 1 - 1 of T46984_PEA_1_P21, and a second amino acid sequence being at least 90% homologous to
KACTYIRSNLDPSNNDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSSVTQΠΗAV AALSGFGLPLASQEALSALTARLSKEET ^LATVQALQTASHLSQQADLRSIVEEIEDLVA RLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFES LSEAFSVASAAAVLSHNRYHNP\ ^VVPEGSASDTHEQAILP QVTNNLSQPLTQATVKL EHAKSVASRATVLQKTSFTPVGD ELΝFMΝNKFSSGYYDFLVEVEGDΝRYIAΝTVEL RΛΕJSTΕVGITΝVDLSTVDKDQSLAPKTTT VTYPAKAKGTFIADSHQΝFALFFQLVDVΝT GAELTPHQTFVPJ.HΝQKTGQEVVTVAEPDΝKΝNYKFELDTSERKXEFDSASGTYTLYLII GDATLKΝPILW VADVVIKFPEEE PSTVLSQ^ΠL-FTPKQEIQHLFREPEKRPPTVVSΝTF TALILSPLLLLFAL WGANΛ^SNFTFAPSTIIFFΠ.GHAAMLGLMYV ^WTQLNMFQTLKY LAILGSVTFLAGNRMLAQQAVKRTAH coπesponding to amino acids 70 - 631 of RIB2_HUMAN, which also coπesponds to amino acids 2 - 563 of T46984JPEA_1JP21, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefeπed embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984JPEA_1_P27, comprising a first amino acid sequence being at least 90 % homologous to
MAPPGSSTWLLALTIIASTWALTPTHYLTKΗDVEPJ.KASLDRPFTNLESAFYSINGLSSL GAQWDAJ KACTYmSΝLDPSΝNDSLFYAAQASQALSGCEISISΝETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTAPX.SKΕETNLATVQALQTASHLSQQADLRSI VEEffiDLVAJ_DELGGVYLQFEEGLETTALFVAATYKlMDHVGTEPSIKEDQVIQLMΝA IFSKKNFESLSE AFSV AS AAA VLSHNRYHVPV VPEGS ASDTHEQ AXLRLQ VTNVLSQ
PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDN RYIANTVELRλ IϋSTEVGITNNDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFA corresponding to amino acids 1 - 415 of RIB2_HUMAN, which also corresponds to amino acids 1 - 415 of T46984_PEA_1_P27, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence FGSGLVPMSPTSLLLLARLYFTWDMLLCWDSCMSTGLSSTCSRP corresponding to amino acids 416 - 459 of T46984_PEA_1_P27, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P27, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence FGSGLVPMSPTSLLLLARLYFTWDMLLCWDSCMSTGLSSTCSRP in T46984_PEA_1_P27.
According to prefeπed embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984JPEA_1 JP32, comprising a first amino acid sequence being at least 90 % homologous to MA PGSSTVFLLALTIIASTWALTPT TKJTDλTSP KASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRS^DPS ωSLFYAAQASQALSGCEISISbffiTKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEffiDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHNGTEPSIKEDQVIQLMNA ITSKIs FESLSEAFSVASAAAVLSHNRYHWVNV EGSASDTHEQAIL PLTQATNKLEHAKSVASRATNLQKTSFTPVGD ELOTMΝNKFSSGYYDFLNEVEGDΝ RYIANTVE coπesponding to amino acids 1 - 364 of RTB2_HUMAN, which also coπesponds to amino acids 1 - 364 of T46984J?EA_1J?32, and a second amino acid sequence being at least 70%), optionally at least 80%, preferably at least 85%, more preferably at least 90%) and most preferably at least 95% homologous to a polypeptide having the sequence GQVRWLTPVΓPALWEAKAGGSPEVRSSILAWPT coπesponding to amino acids 365 - 397 of T46984_PEA_1 JP32, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T469S4JPEA_1 JP32, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GQVRWLTPVIPALWEAKAGGSPEVRSSILAWPT in T46984_PEA_1 JP32. According to prefeπed embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T469S4JPEA_J_P34, comprising a first amino acid sequence being at least 90 % homologous to
MAPPGSST LLALTIIASTWALTPTHYLTKHD ERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQΛ QLMNA IFSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAΓLRLQVTNVLSQ
PLTQATVKLEHAKSVASRATVLQKTSFTPVG corresponding to amino acids 1 - 329 of RIB2_HUMAN, which also corresponds to amino acids 1 - 329 of T46984_PEA_1_P34.
According to prefeπed embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984JPEA_1 JP35, comprising a first amino acid sequence being at least 90 % homologous to MAPPGSSTVFLLALTIIASTWALTPTmXTi aDλ^ERLKASLDRPFTNLESAFYSINGLSSL GAQWDA KACTYIRSΝLDPSΝVDSLFYAAQASQALSGCEISISΝETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGV TQFEEGLETTALFVAATNKLMDHVGTEPSIKEDQVIQLMNA ITSKKNFESLSEAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAI coπesponding to amino acids 1 - 287 of PJB2 JHUMAN, which also coπesponds to amino acids 1 - 287 of T46984JPEA_1 JP35, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
GCWPSRQSREQHISSRRKMEILKTECQEKESRTHISMRRKMEKKNFI corresponding to amino acids 288 - 334 of T46984_PEA_1_P35, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P35, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GCWPSRQSREQHISSRRKMEILKTECQEKESRTHISMRRKMEKKNFI in T46984_PEA_1_P35.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P38, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTWLLALTIIASTWALTPTHYI.TKΗDVERLKASLDRPFTNLESAFYSIVGLSSL GAQWDAKKACTYIRSNLDPSNΛ SLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAVAALSGFGLPLASQEAL corresponding to amino acids 1 - 145 of RIB2_HUMAN, which also corresponds to amino acids 1 - 145 of T46984_PEA_1_P38, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MDPDWCQCLQLHFCS corresponding to amino acids 146 - 160 of T46984_PEA_1_P38, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P38, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MDPDWCQCLQLHFCS in T46984_PEA_1_P38.
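The tiered homology thresholds recited for tail sequences (70%, 80%, 85%, 90%, 95%) can be illustrated with a deliberately simplified calculation. The sketch below is an assumption-laden illustration, not the method of the specification: homology is presumably determined with alignment software as defined elsewhere in the document, whereas here it is approximated by ungapped, position-by-position identity between two sequences of equal length. The function names and the one-substitution candidate sequence are hypothetical; only the tail MDPDWCQCLQLHFCS and the threshold values come from the text.

```python
TIERS = (70, 80, 85, 90, 95)  # percentage thresholds recited in the text


def percent_identity(candidate, reference):
    """Ungapped identity in percent; a simplification of 'homologous'."""
    if not reference or len(candidate) != len(reference):
        raise ValueError("this sketch only compares non-empty, equal-length sequences")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def highest_tier(candidate, reference):
    """Return the highest recited threshold that the candidate meets, or None."""
    identity = percent_identity(candidate, reference)
    met = [tier for tier in TIERS if identity >= tier]
    return max(met) if met else None


tail = "MDPDWCQCLQLHFCS"       # tail of T46984_PEA_1_P38 as given in the text
candidate = "MDPDWCQCLQLHFCA"  # hypothetical variant with one substitution
print(round(percent_identity(candidate, tail), 1), highest_tier(candidate, tail))
```

With a 15-residue tail, a single substitution leaves 14 of 15 positions identical, so under this simplified measure the candidate meets the 90% tier but not the 95% tier.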
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P39, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSSTWLLALTIIASTWALTPTHYLTKIXDVERLKASLDRPFTNLESAFYSΓVGLSSL GAQWDAJKXACTYMSNLDPSNNDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS
VTQΓYHAVAALSGFGLPLASQEALSALTARLSKEETVLA corresponding to amino acids 1 - 160 of RIB2_HUMAN, which also corresponds to amino acids 1 - 160 of T46984_PEA_1_P39.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P45, comprising a first amino acid sequence being at least 90% homologous to MAPPGSST LLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQWDAKIv-ACTYIRSNLDPSNNDSLFYAAQASQALSGCE corresponding to amino acids 1 - 101 of RIB2_HUMAN, which also corresponds to amino acids 1 - 101 of T46984_PEA_1_P45, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NSPGSADSIPPVPAG corresponding to amino acids 102 - 116 of T46984_PEA_1_P45, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P45, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NSPGSADSIPPVPAG in T46984_PEA_1_P45.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T46984_PEA_1_P46, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSST LLALTIIASTWALTTTHYLTKHDNERLKASLDRPFT LESAFYSIVGLSSL GAQVPDAK corresponding to amino acids 1 - 69 of RIB2_HUMAN, which also corresponds to amino acids 1 - 69 of T46984_PEA_1_P46, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NSPGSADSIPPVPAG corresponding to amino acids 70 - 84 of T46984_PEA_1_P46, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T46984_PEA_1_P46, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NSPGSADSIPPVPAG in T46984_PEA_1_P46.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T11628_PEA_1_P2, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIP FKGHPETLEKJDK i LKSEDE corresponding to amino acids 1 - 55 of T11628_PEA_1_P2, and a second amino acid sequence being at least 90% homologous to MKASEDLKKΗGATVLTALGGILKI<ϋ&GHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQV LQSKXTPGDFGADAQGAMNKALELFRKDMASNYKELGFQG corresponding to amino acids 1 - 99 of Q8WVH6, which also corresponds to amino acids 56 - 154 of T11628_PEA_1_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
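Statements of the form "amino acids 1 - 99 of Q8WVH6, which also corresponds to amino acids 56 - 154 of T11628_PEA_1_P2" describe a constant positional offset between a known protein and the variant. A minimal Python sketch of that mapping follows; the function name `map_range` is hypothetical, and the sketch assumes an ungapped, one-to-one correspondence between the two ranges, which is how the text presents them.

```python
def map_range(known_range, variant_start):
    """Map a 1-based inclusive range in the known protein onto the variant.

    known_range   -- (start, end) positions in the known protein
    variant_start -- position in the variant where the shared portion begins
    """
    known_start, known_end = known_range
    if known_start < 1 or known_end < known_start or variant_start < 1:
        raise ValueError("ranges are 1-based and must not be empty")
    offset = variant_start - known_start  # constant shift between the numberings
    return (known_start + offset, known_end + offset)


print(map_range((1, 99), 56))   # Q8WVH6 1 - 99 placed after the 55-residue head
print(map_range((70, 631), 2))  # RIB2_HUMAN 70 - 631 placed after the initial M
```

The first call reproduces the range 56 - 154 recited for T11628_PEA_1_P2, and the second reproduces the range 2 - 563 recited for T46984_PEA_1_P21.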
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of T11628_PEA_1_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGLSDGEWQL\T. INWGK\ΕADIPGHGQEVLIRLFKGFfPETLEKJFDKFKHLKSEDE of T11628_PEA_1_P2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T11628_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to IVlKASEDLKKHGATVLTALGGELKKKGH LQSKHPGDFGAJ AQGAMNKALELFRKDMASNYKELGFQG corresponding to amino acids 56 - 154 of MYG_HUMAN_V1, which also corresponds to amino acids 1 - 99 of T11628_PEA_1_P5.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T11628_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to
MGLSDGEWQLVLNVWGKVEADΓPGHGQEVLΓRLFKGHPETLEK^DKFKHLKSEDEMK ASEDLKKHGATVLTALGGΓLKKXGHHEAEIKPLAQSHATKHKIPVKYLEFISECΠQVLQ
SKHPGDFGADAQGAMNK corresponding to amino acids 1 - 134 of MYG_HUMAN_V1, which also corresponds to amino acids 1 - 134 of T11628_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence G corresponding to amino acids 135 - 135 of T11628_PEA_1_P7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T11628_PEA_1_P10, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGLSDGEWQL\^NVWGKVEADIPGHGQEVLIPJ.FKGFfPETLEKJDKFKHLKSEDE corresponding to amino acids 1 - 55 of T11628_PEA_1_P10, and a second amino acid sequence being at least 90% homologous to MKASEDLKKHGATVLTA GGILKKKGHffi
LQSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG coπesponding to amino acids 1 - 99 of Q8WVH6, which also coπesponds to amino acids 56 - 154 of TI 1628_PEA_1_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefeπed embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of TI 1628_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGLSDGEWQLVLNNWGKNEAI IPGHGQEVLIRLFKGHPETXEK.FDKFKΗLKSEDE of T1 Ϊ628JPEA_J_P10. According to prefeπed embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78076J?EA_1 _P3, comprising a first amino acid sequence being at least 90 % homologous to MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGL CGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAXPME RWCGGSRSGSCAHPHHQVWFRCLPGEFVSEALLWEGCPXFLHQERMDQCESSTRRHQ EAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPG SRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAWGKVTPTPRPTDGV DΠTGMPGEISEHEGFLRAKMDLEERRMRQIΝE\^IREWAMADΝQSKΝLPKADRQALΝ EHFQSILQTLEEQVSGERQRLΛ^THATRVIALINDQRRAALEGFLAALQADPPQAERVLL ALRRYLRAEQKEQRHTLRHYQHΛ^AAVDPEKAQQMRFQVHTHLQVΓEERVNQSLGLLD
QNPHLAQELRPQIQELLHSEHLGPSELEAPAPGGSSEDKGGLQPPDSKD corresponding to amino acids 1 - 517 of APP1_HUMAN, which also corresponds to amino acids 1 - 517 of M78076_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GE corresponding to amino acids 518 - 519 of M78076_PEA_1_P3, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78076_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to
MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGL CGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPME RWCGGSRSGSCAHPHHQWPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQ EAQEACSSQGLILHGSGMLLPCGSDRFRGVEYNCCPPPGTPDPSGTAVGDPSTRSWPPG SRVEGAEDEEEEESFPQPVDDYFVEPPQA£EEEETVPPPSSHTLA\NGKVTPTPRPTDGV DIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALN EHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQΛDPPQAERVLL ALRRYLPAEQK£QRHTLRHYQHVAAVDPEKAQQMRJQVHTHLQVIEERVNQSLGLLD QNPHLAQELRPQIQELLHSEHLGPSELEAPAPGGSSEDKGGLQPPDSKDDTPMTLPKG coπesponding to amino acids 1 - 526 of APP1JHPUMAN, which also coπesponds to amino acids 1 - 526 of M78076JPEA_1 JP4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ECLTNNPSLQIPLNP coπesponding to amino acids 527 - 541 of M78076_PEA_J JP4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefeπed embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M78076JPEA_1 JP4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ECLTVNPSLQFPLNP in M78076_PEA_1_P4. According to prefeπed embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78076JPEA_1 JP12, comprising a first amino acid sequence being at least 90 % homologous to
MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGL CGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPME RWCGGSRSGSCAHPHHQVWFRCLPGEFNSEALLWEGCRFLHQERMDQCESSTRRHQ EAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPG SRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAWGKVTPTPRPTDGV DIYFGMPGEISEHEGFLRAKMDLEERPMRQIΝE IREWAΛ'IADΝQSKΝLPKADRQALΝ EHFQSILQTLEEQVSGERQRL\ΠETHATRVIALIΝDQRRAALEGFLAALQADPPQAERVLL ALRRYLRAEQKΕQRHTLRHYQHNAAVDPEKAQQMRFQVHTHLQVIEERVNQSLGLLD QNPHLAQELRPQIQELLHSEHLGPSELEAPAPGGSSEDKGGLQPPDSKDDTPMTLPKG coπesponding to amino acids 1 - 526 of APP1_HUMAN, which also coπesponds to amino acids 1 - 526 of M78076J?EA_1JP12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95%o homologous to a polypeptide having the sequence ECVCSKGFPFPLIGDSEG coπesponding to amino acids 527 - 544 of M78076JPEA_1_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefeπed embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M78076jPEA_l _P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ECVCSKGFPFPLIGDSEG in M78076JPEA_1JP12. 
According to prefeπed embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78076JPEA_1 JP14, comprising a first amino acid sequence being at least 90 % homologous to MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGL CGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLE YCRQM YPELQIARVEQATQAIPME RWCGGSRSGSCAHPHHQWPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQ EAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPG SRVEGAEDEEEEESFPQP\T)DYF\/EPPQAEEEEET PPSSHTLA GKVTPTPRPTDGV DIYFGMPGEISEHEGFLRAKMDLEERR^^RQIΝEVMREWAΛ1A ΝQSKΝLPKADRQALΝ EHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQADPPQAERVLL ALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVHTHLQλTEERVNQSLGLLD QNPHLAQELRPQIQELLHSEHLGPSELEAPAPGGSSEDKGGLQPPDSKDDTPMTLPKGST EQDAASPEKEKMNPLEQYERKVNASVPRGFPFHSSEIQRDEL coπesponding to amino acids 1 - 570 of APP1 _HUMAN, which also coπesponds to amino acids 1 - 570 of M78076JPEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
VRGGTAGYLGEETRGQRPGCDSQSHTGPSKKPSAPSPLPAGTSWDRGVP coπesponding to amino acids 571 - 619 of M78076JPEA_J JP14, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefeπed embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M78076JPEA_1 JP14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGGTAGYLGEETRGQRPGCDSQSHTGPSKKPSAPSPLPAGTSWDRGVP in M78076JPEA_1J314. According to prefeπed embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78076JPEA_1 JP21, comprising a first amino acid sequence being at least 90 % homologous to MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSL AGGSPG A AE APGS AQ V AGL CGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPME RWCGGSRSGSCAHPHHQWPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQ EAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPG SRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAWGKVTPTPRPTDGV DIYFGMPGEISEHEGFLP^ θ lDLEERRMRQINEVMREWAMAX)NQSKNLPKADRQALN E coπesponding to amino acids 1 - 352 of APPl JHUMAN, which also coπesponds to amino acids 1 - 352 of M78076JPEA_1_P21, and a second amino acid sequence being at least 90 % homologous to
AERVLLALRRYLRAEQKΈQRHTLRH ^QHVAAVDPEKAQQMRFQVHTHLQVIEERVNQ SLGLLDQNPHLAQELRPQIQELLHSEHLGPSELEAPAPGGSSEDKGGLQPPDSKDDTPMT LPKGSTEQDAASPEKEKMNPLEQYERKVNASVPRGFPFHSSEIQRDELAPAGTGVSREA VSGLL GAGGGSLΓVXSMLLLRRK PYGAISHGVVEVDPMLTLEEQQLRELQRHGYE
NPTYRFLEERP corresponding to amino acids 406 - 650 of APP1_HUMAN, which also corresponds to amino acids 353 - 597 of M78076_PEA_1_P21, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of M78076_PEA_1_P21, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EA, having a structure as follows: a sequence starting from any of amino acid numbers 352-x to 352; and ending at any of amino acid numbers 353 + ((n-2) - x), in which x varies from 0 to n-2.
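The edge portion definition above amounts to enumerating every window of length n that spans the junction between amino acids 352 and 353 (the residues E and A). The following minimal Python sketch, in which the function name `edge_portion_windows` is hypothetical, lists the start and end positions for each permitted value of x, using start = 352 - x and end = 353 + ((n-2) - x) exactly as recited.

```python
def edge_portion_windows(junction, n):
    """List (start, end) windows of length n that span the junction.

    junction -- last amino acid position before the join (352 in the text)
    n        -- window length in amino acids (at least 2)
    """
    if n < 2:
        raise ValueError("a window must contain both residues of the junction")
    windows = []
    for x in range(n - 1):  # x varies from 0 to n-2
        start = junction - x
        end = junction + 1 + (n - 2) - x
        if start < 1:
            raise ValueError("window would start before the first amino acid")
        windows.append((start, end))
    return windows


windows = edge_portion_windows(352, 10)
print(len(windows), windows[0], windows[-1])
```

For n = 10 this yields nine windows, from 352 - 361 (x = 0) to 344 - 353 (x = 8); each window has length n and contains both position 352 and position 353.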
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78076_PEA_1_P24, comprising a first amino acid sequence being at least 90% homologous to
MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGL CGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPME RWCGGSRSGSCAHPHHQWPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQ EAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPG SRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAWGKVTPTPRPTDGV DIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALN
EHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQADPPQAERVLL ALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVHTHLQVIEERVNQSLGLLD QNPHLAQELRPQI corresponding to amino acids 1 - 481 of APP1_HUMAN, which also corresponds to amino acids 1 - 481 of M78076_PEA_1_P24, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RECLLPWLPLQISEGRS corresponding to amino acids 482 - 498 of M78076_PEA_1_P24, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M78076_PEA_1_P24, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RECLLPWLPLQISEGRS in M78076_PEA_1_P24. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78076_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGL CGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPME RWCGGSRSGSCAHPHHQWPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQ EAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPG SRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAWGKVTPTPRPTDGV DIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALN EHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQADPPQAERVLL
ALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQV corresponding to amino acids 1 - 449 of APP1_HUMAN, which also corresponds to amino acids 1 - 449 of M78076_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
LTSFQLPNAPLFLRRPRLRLFSCPLDPLSVSWTPSYPLNTASLPLPSLSAQLPDPETWTLT CCVFDPCFLALGFLLPPPSILCSVPWIFTAFPRIVFFFFFFLRQVLALSPRQESSVRSWLIAT
STSWVQAILLPQPLE corresponding to amino acids 450 - 588 of M78076_PEA_1_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M78076_PEA_1_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LTSFQLPNAPLFLRRPRLRLFSCPLDPLSVSWTPSYPLNTASLPLPSLSAQLPDPETWTLT
CCVFDPCFLALGFLLPPPSILCSVPWIFTAFPRIVFFFFFFLRQVLALSPRQESSVRSWLIAT STSWVQAILLPQPLE in M78076_PEA_1_P2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78076_PEA_1_P25, comprising a first amino acid sequence being at least 90% homologous to
MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGL CGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPME
RWCGGSRSGSCAHPHHQWPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQ EAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPG
SRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAWGKVTPTPRPTDGV DIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALN EHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQADPPQAERVLL
ALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQ corresponding to amino acids 1 - 448 of APP1_HUMAN, which also corresponds to amino acids 1 - 448 of M78076_PEA_1_P25, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
PQNPNSQPP^ .λGSLEVIISHPFVPJlLEιLISPFQFQNSIPKNSQlVPAASPRGTSSP corresponding to amino acids 449 - 505 of M78076_PEA_1_P25, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M78076_PEA_1_P25, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PQNPNSQPPvAAGSLEVπSHPF ^RRLEILISPFQFQNSIPKNSQiVPAASPRGTSSP in M78076_PEA_1_P25. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M85491_PEA_1_P13, comprising a first amino acid sequence being at least 90% homologous to MALRRLGAALLLLPLLAAVEETLMDSTTATAELGWMVHPPSGWEEVSGYDENMNTIR TYQVC^FVFESSQNNWLRTK^IRRRGAHRIHNE^I FSNRDCSSIPSWGSCKETF LYYY EADFDSATKTFPNWMENPWVKVDTIAADESFSQVDLGGRVMKTNTEVRSFGPVSRSGF YLAFQDYGGCMSLIAVRVFYRKCPRIIQNGAIFQETLSGAESTSLVAARGSCIANAEEVD VPIKLYCNGDGEWLVPIGRCMCKAGFEANENGTVCRGCPSGTFKANQGDEACTHCPLN SRTTSEGATNCVCRNGYYRADLDPLDMPCTTIPSAPQAVISSVNETSLMLEWTPPRDSG GREDLWNIICKSCGSGRGACTRCGDNNQYAPRQLGLTEPRIYISDLLAHTQYTFEIQAV NGVTDQSPFSPQFASVNITTNQAAPSAVSIMHQVSRTVDSITLSWSQPDQPNGVILDYEL
QYYEK corresponding to amino acids 1 - 476 of EPB2_HUMAN, which also corresponds to amino acids 1 - 476 of M85491_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VPIGWVLSPSPTSLRAPLPG corresponding to amino acids 477 - 496 of M85491_PEA_1_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M85491_PEA_1_P13, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPIGWVLSPSPTSLRAPLPG in M85491_PEA_1_P13. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M85491_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to
MALRRLGAALLLLPLLAAVEETLMDSTTATAELGWMNHPPSGWEEVSGYDENMNTIR TYQVCNVFESSQNNWLRTKPIRPRGAHRIHVEMKFSVPJDCSSIPSVPGSCKΈTFNLY Ύ
EADFDSATKTFPNWMENPWVKVDTIAADESFSQVDLGGRVMKINTEVRSFGPVSRSGF YLAFQDYGGCMSLIAVRVFYRKCPRIIQNGAIFQETLSGAESTSLVAARGSCIANAEEVD VPIKLYCNGDGEWLVPIGRCMCKAGFEAVENGTVCR corresponding to amino acids 1 - 270 of EPB2_HUMAN, which also corresponds to amino acids 1 - 270 of M85491_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
ERQDLTMLSRLVLNSWPQMILPPQPPKVLEL corresponding to amino acids 271 - 301 of M85491_PEA_1_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M85491_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ERQDLTMLSRLVLNSWPQMILPPQPPKVLEL in M85491_PEA_1_P14. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSSTROL3_P4, comprising a first amino acid sequence being at least 90% homologous to
MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSS PAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVLSGGRWEKTDLTYRILRFP
WQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 1 - 163 of MM11_HUMAN, which also corresponds to amino acids 1 - 163 of
HSSTROL3_P4, a bridging amino acid H corresponding to amino acid 164 of HSSTROL3_P4, a second amino acid sequence being at least 90% homologous to
GDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLG LQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTN EIAPLEPDAPPDACEASFDAVSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGL PSPVDAAFEDAQGHIWFFQGAQYWVYDGEKPVLGPAPLTELGLVRFPVHAALVWGPE KNKIYFFRGRDYWRFILPSTRRVDSPVPRRATDWRGVPSEIDAAFQDADG corresponding to amino acids 165 - 445 of MM11_HUMAN, which also corresponds to amino acids 165 - 445 of HSSTROL3_P4, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
ALGVRQLVGGGHSSRFSHLVVAGLPHACHRKSGSSSQVLCPEPSALLSVAG corresponding to amino acids 446 - 496 of HSSTROL3_P4, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSSTROL3_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ALGVRQLVGGGHSSRFSHLVVAGLPHACHRKSGSSSQVLCPEPSALLSVAG in HSSTROL3_P4.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSSTROL3_P5, comprising a first amino acid sequence being at least 90% homologous to
MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSS PAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVLSGGRWEKTDLTYRILRFP WQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 1 - 163 of MM11_HUMAN, which also corresponds to amino acids 1 - 163 of HSSTROL3_P5, a bridging amino acid H corresponding to amino acid 164 of HSSTROL3_P5, a second amino acid sequence being at least 90% homologous to
GDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLG LQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTN EIAPLEPDAPPDACEASFDAVSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGL PSPVDAAFEDAQGHIWFFQ corresponding to amino acids 165 - 358 of MM11_HUMAN, which also corresponds to amino acids 165 - 358 of HSSTROL3_P5, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ELGFPSSTGRDESLEHCRCQGLHK corresponding to amino acids 359 - 382 of HSSTROL3_P5, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSSTROL3_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ELGFPSSTGRDESLEHCRCQGLHK in HSSTROL3_P5.
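The tiered homology thresholds recited for each tail (70%, 80%, 85%, 90%, 95%) can be illustrated with a simple comparison. The sketch below assumes, purely for illustration, that homology is measured as percent identity over an ungapped, equal-length alignment; this passage does not state how homology is computed, and an alignment tool may be intended. The function names `percent_identity` and `highest_tier` are hypothetical.

```python
TIERS = (95, 90, 85, 80, 70)  # thresholds recited in the text, highest first


def percent_identity(reference, candidate):
    """Percent of positions with identical residues (ungapped, equal length).

    Assumption for illustration only: no gaps and no substitution scoring.
    """
    if len(reference) != len(candidate):
        raise ValueError("this sketch compares equal-length sequences only")
    if not reference:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for a, b in zip(reference, candidate) if a == b)
    return 100.0 * matches / len(reference)


def highest_tier(reference, candidate):
    """Return the highest recited threshold the candidate meets, or None."""
    identity = percent_identity(reference, candidate)
    for tier in TIERS:
        if identity >= tier:
            return tier
    return None


if __name__ == "__main__":
    tail = "ELGFPSSTGRDESLEHCRCQGLHK"  # tail of HSSTROL3_P5, from the text
    candidate = "A" + tail[1:]         # hypothetical single substitution
    print(percent_identity(tail, candidate), highest_tier(tail, candidate))
```

Under this simplified measure, a candidate with one substitution in the 24-residue tail falls in the 95% tier, and one with three substitutions falls in the 85% tier.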
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSSTROL3_P7, comprising a first amino acid sequence being at least 90% homologous to MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSS PAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVLSGGRWEKTDLTYRILRFP WQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 1 - 163 of MM11_HUMAN, which also corresponds to amino acids 1 - 163 of HSSTROL3_P7, a bridging amino acid H corresponding to amino acid 164 of HSSTROL3_P7, a second amino acid sequence being at least 90% homologous to GDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLG LQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTN EIAPLEPDAPPDACEASFDAVSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGL PSPVDAAFEDAQGHIWFFQG corresponding to amino acids 165 - 359 of MM11_HUMAN, which also corresponds to amino acids 165 - 359 of HSSTROL3_P7, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TTGVSTPAPGV corresponding to amino acids 360 - 370 of HSSTROL3_P7, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSSTROL3_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TTGVSTPAPGV in HSSTROL3_P7. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSSTROL3_P8, comprising a first amino acid sequence being at least 90% homologous to
MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSS PAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVLSGGRWEKTDLTYRILRFP WQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 1 - 163 of MM11_HUMAN, which also corresponds to amino acids 1 - 163 of
HSSTROL3_P8, a bridging amino acid H corresponding to amino acid 164 of HSSTROL3_P8, a second amino acid sequence being at least 90% homologous to
GDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLG LQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTN EIAPLE corresponding to amino acids 165 - 286 of MM11_HUMAN, which also corresponds to amino acids 165 - 286 of HSSTROL3_P8, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRPCLPVPLLLCWPL corresponding to amino acids 287 - 301 of HSSTROL3_P8, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSSTROL3_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRPCLPVPLLLCWPL in HSSTROL3_P8.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSSTROL3_P9, comprising a first amino acid sequence being at least 90% homologous to MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSS PAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQK corresponding to amino acids 1 - 96 of MM11_HUMAN, which also corresponds to amino acids 1 - 96 of HSSTROL3_P9, a second amino acid sequence being at least 90% homologous to RILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 113 - 163 of MM11_HUMAN, which also corresponds to amino acids 97 - 147 of HSSTROL3_P9, a bridging amino acid H corresponding to amino acid 148 of HSSTROL3_P9, a third amino acid sequence being at least 90% homologous to GDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLG LQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTN EIAPLEPDAPPDACEASFDAVSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGL PSPVDAAFEDAQGHIWFFQG corresponding to amino acids 165 - 359 of MM11_HUMAN, which also corresponds to amino acids 149 - 343 of HSSTROL3_P9, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TTGVSTPAPGV corresponding to amino acids 344 - 354 of HSSTROL3_P9, wherein said first amino acid sequence, second amino acid sequence, bridging amino acid, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
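The segment bookkeeping recited for HSSTROL3_P9 can be checked mechanically. The following is a minimal sketch, assuming only the coordinate ranges stated in the paragraph above; the `Segment` class and the helper names are hypothetical and are introduced solely to illustrate that the segments are contiguous, in sequential order, and equal in length to the MM11_HUMAN ranges they correspond to.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    v_start: int                    # first residue in the variant (1-based)
    v_end: int                      # last residue in the variant (inclusive)
    label: str
    k_start: Optional[int] = None   # matching range in MM11_HUMAN, if any
    k_end: Optional[int] = None


# Coordinates as recited for HSSTROL3_P9 in the text.
P9_SEGMENTS: List[Segment] = [
    Segment(1, 96, "first", 1, 96),
    Segment(97, 147, "second", 113, 163),
    Segment(148, 148, "bridging H"),
    Segment(149, 343, "third", 165, 359),
    Segment(344, 354, "fourth (unique tail)"),
]


def is_contiguous(segments):
    """True if each segment starts right after the previous one ends."""
    return all(b.v_start == a.v_end + 1 for a, b in zip(segments, segments[1:]))


def lengths_match(segments):
    """True if every mapped segment spans as many residues as its known range."""
    return all(
        s.k_start is None or (s.v_end - s.v_start) == (s.k_end - s.k_start)
        for s in segments
    )


def to_known(segments, pos):
    """Map a variant position to its MM11_HUMAN position, or None if unmapped."""
    for s in segments:
        if s.v_start <= pos <= s.v_end:
            return None if s.k_start is None else s.k_start + (pos - s.v_start)
    raise ValueError("position lies outside the variant")


if __name__ == "__main__":
    print(is_contiguous(P9_SEGMENTS), lengths_match(P9_SEGMENTS))
    print(to_known(P9_SEGMENTS, 96), to_known(P9_SEGMENTS, 97))
```

In this mapping, variant positions 96 and 97 correspond to positions 96 and 113 of MM11_HUMAN, which reflects the junction that residues 97 - 112 of the known protein are skipped in the variant.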
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSSTROL3_P9, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KR, having a structure as follows: a sequence starting from any of amino acid numbers 96-x to 96; and ending at any of amino acid numbers 97 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSSTROL3_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TTGVSTPAPGV in HSSTROL3_P9. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for AY180924_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to
MLNNSGLFVLLCGLLVSSSAQEVLAGVSSQLLN corresponding to amino acids 1 - 33 of LATH_HUMAN, which also corresponds to amino acids 1 - 33 of AY180924_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GETVLLWVMQNPEPMPVKFSLAKYLGHNEHY corresponding to amino acids 34 - 64 of AY180924_PEA_1_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of AY180924_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GETVLLWVMQNPEPMPVKFSLAKYLGHNEHY in AY180924_PEA_1_P3. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R75793_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to
MKTH.AVLVLLGVSIFLVSAQNPTTAAPADTYPATGPADDEAPDAETTAAATTATTAAPT TATTAASTTARKDIP corresponding to amino acids 1 - 74 of Q96DR8, which also corresponds to amino acids 1 - 74 of R75793_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AP corresponding to amino acids 75 - 76 of R75793_PEA_1_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMCA1XIA_P14, comprising a first amino acid sequence being at least 90% homologous to
MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNSPEGISKTT GFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFSILFTVKPKKGIQSFLLSIY NEHGIQQIGVEVGRSPVFLFEDHTGKPAPEDYPLFRTVNIADGKWHRVAISVEKKTVTM IVDCKKKTTKPLDRSERAIVDTNGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEH YSPDCDSSAPKAAQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQT EANIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDSQRKNSED TLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEEFGPGVPAETDITETSIN GHGAYGEKGQKGEPAVVEPGMLVEGPPGPAGPAGLMGPPGLQGPTGPPGDPGDRGPPG RPGLPGADGLPGPPGTMLMLPFRYGGDGSKGPTISAQEAQAQAILQQARIALRGPPGPM GLTGRPGPVGGPGSSGAKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMP GEPGAKGDRGFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEAG PRGLLGPRGTPGAPGQPGMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQGLPGPQG PIGPPGEKGPQGKPGLAGLPGADGPPGHPGKEGQSGEKGALGPPGPQGPIGYPGPRGVK GADGVRGLKGSKGEKGEDGFPGFKGDMGLKGDRGEVGQIGPRGEDGPEGPKGRAGPT GDPGPSGQAGEKGKLGVPGLPGYPGRQGPKGSTGFPGFPGANGEKGARGVAGKPGPR GQRGPTGPRGSRGARGPTGKPGPKGTSGGDGPPGPPGERGPQGPQGPVGFPGPKGPPGP PGKDGLPGHPGQRGETGFQGKTGPPGPGGWGPQGPTGETGPIGERGHPGPPGPPGEQG LPGAAGKEGAKGDPGPQGISGKDGPAGLRGFPGERGLPGAQGAPGLKGGEGPQGPPGP
V corresponding to amino acids 1 - 1056 of CA1B_HUMAN_V5, which also corresponds to amino acids 1 - 1056 of HUMCA1XIA_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSMMIINSQTIMVVNYSSSFITLML corresponding to amino acids 1057 - 1081 of HUMCA1XIA_P14, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMCA1XIA_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSMMIINSQTIMVVNYSSSFITLML in HUMCA1XIA_P14. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMCA1XIA_P15, comprising a first amino acid sequence being at least 90% homologous to
MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNSPEGISKTT GFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFSILFTVKPKKGIQSFLLSIY NEHGIQQIGVEVGRSPVFLFEDHTGKPAPEDYPLFRTVNIADGKWHRVAISVEKKTVTM IVDCKKKTTKPLDRSERAIVDTNGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEH YSPDCDSSAPKAAQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQT EANIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDSQRKNSED TLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEEFGPGVPAETDITETSIN GHGAYGEKGQKGEPAVNEPGMLVEGPPGPAGPAGIMGPPGLQGPTGPPGDPGDRGPPG RPGLPGADGLPGPPGTMLMLPFRYGGDGSKGPTISAQEAQAQAILQQARIALRGPPGPM GLTGRPGPVGGPGSSGAKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMP GEPGAKGDRGFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEAG PRGLLGPRGTPGAPGQPGMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQGLPGPQG
PIGPPGEK corresponding to amino acids 1 - 714 of CA1B_HUMAN, which also corresponds to amino acids 1 - 714 of HUMCA1XIA_P15, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MCCNLSFGILIPLQK corresponding to amino acids 715 - 729 of HUMCA1XIA_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMCA1XIA_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MCCNLSFGILIPLQK in HUMCA1XIA_P15. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMCA1XIA_P16, comprising a first amino acid sequence being at least 90% homologous to
MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNSPEGISKTT GFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFSILFTVKPKKGIQSFLLSIY NEHGIQQIGVEVGRSPVFLFEDHTGKPAPEDYPLFRTVNIADGKWHRVAISVEKKTVTM IVDCKKKTTKPLDRSERAIVDTNGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEH YSPDCDSSAPKAAQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQT EANIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDSQRKNSED TLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEEFGPGVPAETDITETSIN GHGAYGEKGQKGEPAVNEPGMLVEGPPGPAGPAGIMGPPGLQGPTGPPGDPGDRGPPG RPGLPGADGLPGPPGTMLMLPFRYGGDGSKGPTISAQEAQAQAILQQARIALRGPPGPM GLTGRPGPVGGPGSSGAKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMP GEPGAKGDRGFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEA corresponding to amino acids 1 - 648 of CA1B_HUMAN, which also corresponds to amino acids 1 - 648 of HUMCA1XIA_P16, a second amino acid sequence being at least 90% homologous to GMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQGLPGPQGPIGPPGEK corresponding to amino acids 667 - 714 of CA1B_HUMAN, which also corresponds to amino acids 649 - 696 of HUMCA1XIA_P16, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSFSFSLFYKXVIKFACDKRFVGRHDERKVVKLSLPLYLIYE corresponding to amino acids 697 - 738 of HUMCA1XIA_P16, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMCA1XIA_P16, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AG, having a structure as follows: a sequence starting from any of amino acid numbers 648-x to 648; and ending at any of amino acid numbers 649 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMCA1XIA_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSFSFSLFYKXVIKFACDKRFVGRHDERKVVKLSLPLYLIYE in HUMCA1XIA_P16.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMCA1XIA_P17, comprising a first amino acid sequence being at least 90% homologous to MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNSPEGISKTT GFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFSILFTVKPKKGIQSFLLSIY NEHGIQQIGVEVGRSPVFLFEDHTGKPAPEDYPLFRTVNIADGKWHRVAISVEKKTVTM IVDCKKKTTKPLDRSERAIVDTNGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEH YSPDCDSSAPKAAQAQEPQIDE corresponding to amino acids 1 - 260 of CA1B_HUMAN, which also corresponds to amino acids 1 - 260 of HUMCA1XIA_P17, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRSTRPEKVFVFQ corresponding to amino acids 261 - 273 of HUMCA1XIA_P17, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMCA1XIA_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRSTRPEKVFVFQ in HUMCA1XIA_P17.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R20779_P2, comprising a first amino acid sequence being at least 90% homologous to MCAERLGQFMTLALVLATFDPARGTDATNPPEGPQDRSSQQKGRLSLQNTAEIQHCLV NAGDVGCG ECFENNSCEIRGLHGICMTTH.HNAGKFDAQGKSFIKDALKCKAHALRH RFGCISRKCPA EMVSQLQRECYLKIHDLCAAAQENTRVIVEMIHFKDLLLHE corresponding to amino acids 1 - 169 of STC2_HUMAN, which also corresponds to amino acids 1 - 169 of R20779_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CYKIEITMPKRRKVKLRD corresponding to amino acids 170 - 187 of R20779_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R20779_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CYKIEITMPKRRKVKLRD in R20779_P2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQVVKGSVFLR
NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTNIQGINLLFSSRRGHLFLQTDQPIYNPGQRVRYRVFALDQKMRPSTDTITVMV ENSHGLRVRKKEVYMPSSIFQDDFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVL PNFEVKITPGKPYILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFR GLESQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGIPVKVSATVSSPGSVP EVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD SRPPRVGDTLNLNLRAVGSGATFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLA PSFYFVAFYYHGDHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAKRCCQDGVTR LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEILQEEDLID EDDIPVRSFFPENWLWRVETVDRFQILTLWLPDSLTTWEIHGLSLSKTKGLCVATPVQL RVFREFHLHLRLPMSVRRFEQLELRPVLYNYLDKNLTN corresponding to amino acids 1 - 865 of CO4_HUMAN, which also corresponds to amino acids 1 - 865 of HSCOC4_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RPHRSLSIQELGEPGPSEGWGG corresponding to amino acids 866 - 887 of HSCOC4_PEA_1_P3, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
RPHRSLSIQELGEPGPSEGWGG in HSCOC4_PEA_1_P3. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to
MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQWKGSVFLR NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTNIQGΠSILLFSSRRGHLFLQTDQPIYOTGQRVRYR ALDQKMRPSTDTΓΓVMV ENSHGLRVRKKEVYMPSSIFQDD IPDISEPGTWIΑSARFSDGLESNSSTQFEVKKYVL PNFEVKITPGKPYILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFR GLESQTK1VNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGIPVKVSATVSSPGSVP EVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD SRPPRVGDTLNLNLRAVGSGATFS ΥYMILSRGQIVFMNILEPKRTLTSVS VDHHLA PSFYFVAFYYHGDHPVANSLRVDVQAGACEGK ELSVDGAKQYR GESVKLHLETDS LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSRKRLSCPKΕKTTRKKPVNVNFQKAINEKLGQYASPTAKJICCQDGVTR LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLPJKXSRDKGQAGLQRALERLQEEDLID
EDDIPVRSFFPENWLWRVETVDRFQILTLWLPDSLTTWEIHGLSLSKTKG corresponding to amino acids 1 - 818 of CO4_HUMAN, which also corresponds to amino acids 1 - 818 of
HSCOC4_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DVTLSGPQVTLLPFPCTPAPCSLCS corresponding to amino acids 819 - 843 of HSCOC4_PEA_1_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DVTLSGPQVTLLPFPCTPAPCSLCS in HSCOC4_PEA_1_P5. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to
MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQVVKGSVFLR NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTNIQGINLLFSSRRGHLFLQTDQPIYNPGQRVRYRVFALDQKMRPSTDTITVMV ENSHGLRVRKKEVYMPSSIFQDDFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVL PNFEVKITPGKPYILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFR GLESQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGIPVKVSATVSSPGSVP EVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD SRPPRVGDTLNLNLRAVGSGATFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLA PSFYFVAFYYHGDHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG
LAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAKRCCQDGVTR LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEILQEEDLID EDDIPVRSFFPENWLWRVETVDRFQILTLWLPDSLTTWEIHGLSLSKTKGLCVATPVQL
RVFREFHLHLRLPMSVRRFEQLELRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQ QVLVPAGSARPVAFSVVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREEL VYELNPLDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQKG corresponding to amino acids 1 - 1052 of CO4_HUMAN, which also corresponds to amino acids 1 - 1052 of HSCOC4_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least
80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
SGCKGKQEGGQERTVTGRWTAQEATEGKKGGP corresponding to amino acids 1053 - 1084 of HSCOC4_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
SGCKGKQEGGQERTVTGRWTAQEATEGKKGGP in HSCOC4_PEA_1_P6. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P12, comprising a first amino acid sequence being at least 90% homologous to MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQVVKGSVFLR
NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTNIQGINLLFSSRRGHLFLQTDQPIYNPGQRVRYRVFALDQKMRPSTDTITVMV ENSHGLRVRKKEVYMPSSIFQDDFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVL PNFEVKITPGKPYILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFR GLESQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGIPVKVSATVSSPGSVP EVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD SRPPRVGDTLNLNLRAVGSGATFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLA
PSFYFVAFYYHGDHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG
LAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAKRCCQDGVTR LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEILQEEDLID EDDIPVRSFFPENWLWRVETVDRFQILTLWLPDSLTTWEIHGLSLSKTKGLCVATPVQL RVFREFHLHLRLPMSVRRFEQLELRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQ QVLVPAGSARPVAFSVVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREEL VYELNPLDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQKGYMRIQQFRK ADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSNWLLSQQQADGSFQ DPCPVLDRSMQGGLVGNDETVALTAFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS FLGEKASAGLLGAHAAAITAYALTLTKAPADLRGVAHNNLMAMAQETGDNLYWGSV TGSQSNAVSPTPAPRNPSDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ IRGLEEELQFSLGSKINVKVGGNSKGTLKV corresponding to amino acids 1 - 1380 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1380 of HSCOC4_PEA_1_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RAREGVGPGTGGGEGVE corresponding to amino acids 1381 - 1397 of HSCOC4_PEA_1_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RAREGVGPGTGGGEGVE in HSCOC4_PEA_1_P12. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P15, comprising a first amino acid sequence being at least 90% homologous to
MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQVVKGSVFLR NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTNIQGINLLFSSRRGHLFLQTDQPIYNPGQRVRYRVFALDQKMRPSTDTITVMV ENSHGLRVRKKEVYMPSSIFQDDFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVL PNFEVKITPGKPYILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFR GLESQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGIPVKVSATVSSPGSVP EVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD SRPPRVGDTLNLNLRAVGSGATFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLA PSFYFVAFYYHGDHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAKRCCQDGVTR
LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEILQEEDLID EDDIPVRSFFPENWLWRVETVDRFQILTLWLPDSLTTWEIHGLSLSKTKGLCVATPVQL RVFREFHLHLRLPMSVRRFEQLELRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQ QVLVPAGSARPVAFSVVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREEL VYELNPLDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQKGYMRIQQFRK ADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSNWLLSQQQADGSFQ DPCPVLDRSMQGGLVGNDETVALTAFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS FLGEKASAGLLGAHAAAITAYALTLTKAPADLRGVAHNNLMAMAQETGDNLYWGSV TGSQSNAVSPTPAPRNPSDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ IRGLEEELQ corresponding to amino acids 1 - 1359 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1359 of HSCOC4_PEA_1_P15, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VNHSLVNHSLAWVARTPGPRGQARSRPQPPTRGIPAALLPGVFGGRLTSWLRDLEL corresponding to amino acids 1360 - 1415 of HSCOC4_PEA_1_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VNHSLVNHSLAWVARTPGPRGQARSRPQPPTRGIPAALLPGVFGGRLTSWLRDLEL in HSCOC4_PEA_1_P15.
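The chimeric definitions above each pair a head (amino acids 1 to N of the known protein) with a tail whose residue range is stated explicitly. The following is a minimal sketch, not part of the described invention, of how those stated coordinates could be checked for internal consistency: the tail should begin at residue N + 1 and its length should equal the width of the stated range. The dictionary layout and function names are hypothetical, introduced only for illustration; the head lengths, tail sequences and ranges are those quoted in the text for three of the variants.

```python
# Illustrative consistency check (hypothetical helper, not from the source).
# Each entry: (last residue of the head, tail sequence, tail start, tail end),
# with values taken from the passages above.
VARIANTS = {
    "HSCOC4_PEA_1_P3": (865, "RPHRSLSIQELGEPGPSEGWGG", 866, 887),
    "HSCOC4_PEA_1_P6": (1052, "SGCKGKQEGGQERTVTGRWTAQEATEGKKGGP", 1053, 1084),
    "HSCOC4_PEA_1_P12": (1380, "RAREGVGPGTGGGEGVE", 1381, 1397),
}


def is_consistent(head_end, tail, tail_start, tail_end):
    """Return True if the tail is contiguous with the head and its length
    matches the stated inclusive residue range."""
    contiguous = tail_start == head_end + 1
    length_ok = len(tail) == tail_end - tail_start + 1
    return contiguous and length_ok


def check_all(variants):
    """Map each variant name to the result of the consistency check."""
    return {name: is_consistent(*fields) for name, fields in variants.items()}


if __name__ == "__main__":
    for name, ok in check_all(VARIANTS).items():
        print(name, "consistent" if ok else "INCONSISTENT")
```

A check of this kind is useful mainly for catching transcription errors in a tail sequence or in its stated coordinates; it says nothing about homology.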
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P16, comprising a first amino acid sequence being at least 90% homologous to MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQVVKGSVFLR NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTNIQGINLLFSSRRGHLFLQTDQPIYNPGQRVRYRVFALDQKMRPSTDTITVMV ENSHGLRVRKKEVYMPSSIFQDDFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVL PNFEVKITPGKPYILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFR GLESQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGIPVKVSATVSSPGSVP EVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD SRPPRVGDTLNLNLRAVGSGATFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLA PSFYFVAFYYHGDHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAKRCCQDGVTR LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEILQEEDLID EDDIPVRSFFPENWLWRVETVDRFQILTLWLPDSLTTWEIHGLSLSKTKGLCVATPVQL RVFREFHLHLRLPMSVRRFEQLELRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQ QVLVPAGSARPVAFSVVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREEL VYELNPLDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQKGYMRIQQFRK ADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSNWLLSQQQADGSFQ DPCPVLDRSMQGGLVGNDETVALTAFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS FLGEKASAGLLGAHAAAITAYALTLTKAPADLRGVAHNNLMAMAQETGDNLYWGSV TGSQSNAVSPTPAPRNPSDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIEVTVKGHVE
YTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNRRRREAPK corresponding to amino acids 1 - 1457 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1457 of HSCOC4_PEA_1_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
AERQGGAVWHGHRGRHPPEWIPRPAC corresponding to amino acids 1458 - 1483 of HSCOC4_PEA_1_P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence AERQGGAVWHGHRGRHPPEWIPRPAC in HSCOC4_PEA_1_P16. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P20, comprising a first amino acid sequence being at least 90% homologous to
MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQVVKGSVFLR NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTNIQGINLLFSSRRGHLFLQTDQPIYNPGQRVRYRVFALDQKMRPSTDTITVMV ENSHGLRVRKKEVYMPSSIFQDDFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVL PNFEVKITPGKPYILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFR GLESQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGIPVKVSATVSSPGSVP EVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD SRPPRVGDTLNLNLRAVGSGATFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLA PSFYFVAFYYHGDHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAKRCCQDGVTR
LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEILQEEDLID EDDIPVRSFFPENWLWRVETVDRFQILTLWLPDSLTTWEIHGLSLSKTKGLCVATPVQL RVFREFHLHLRLPMSVRRFEQLELRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQ QVLVPAGSARPVAFSVVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREEL VYELNPLDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQKGYMRIQQFRK ADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSNWLLSQQQADGSFQ
DPCPVLDRSMQGGLVGNDETVALTAFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS
FLGEKASAGLLGAHAAAITAYALTLTKAPADLRGVAHNNLMAMAQETGDNLYWGSV TGSQSNAVSPTPAPRNPSDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTR
QGSFQGGFRSTQ corresponding to amino acids 1 - 1303 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1303 of HSCOC4_PEA_1_P20, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VGAVPGLWRGWVVLRPRACLSPGSTSLGHGDCPGCPVCLLDCLPHH corresponding to amino acids 1304 - 1349 of HSCOC4_PEA_1_P20, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P20, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VGAVPGLWRGWVVLRPRACLSPGSTSLGHGDCPGCPVCLLDCLPHH in HSCOC4_PEA_1_P20. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P9, comprising a first amino acid sequence being at least 90% homologous to
MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQVVKGSVFLR NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTNIQGINLLFSSRRGHLFLQTDQPIYNPGQRVRYRVFALDQKMRPSTDTITVMV
ENSHGLRVRKKEVYMPSSIFQDDFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVL PNFEVKITPGKPYILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFR GLESQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGIPVKVSATVSSPGSVP EVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD SRPPRVGDTLNLNLRAVGSGATFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLA PSFYFVAFYYHGDHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG
LAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAKRCCQDGVTR LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEILQEEDLID
EDDIPVRSFFPENWLWRVETVDRFQILTLWLPDSLTTWEIHGLSLSKTKGLCVATPVQL RVFREFHLHLRLPMSVRRFEQLELRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQ QVLVPAGSARPVAFSVVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREEL VYELNPLDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQKGYMRIQQFRK ADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSNWLLSQQQADGSFQ DPCPVLDRSMQGGLVGNDETVALTAFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS FLGEKASAGLLGAHAAAITAYALTLTKAPADLRGVAHNNLMAMAQETGDNLYWGSV TGSQSNAVSPTPAPRNPSDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIEVTVKGHVE YTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNRRRREAPKVVEEQESRV HYTVCIWRNGKVGLSGMAIADVTLLSGFHALRADLEKLTSLSDRYVSHFETEGPHVLL
YFDSV corresponding to amino acids 1 - 1529 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1529 of HSCOC4_PEA_1_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SGER corresponding to amino acids 1530 - 1533 of HSCOC4_PEA_1_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SGER in HSCOC4_PEA_1_P9. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P22, comprising a first amino acid sequence being at least 90% homologous to MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQVVKGSVFLR NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTNIQGINLLFSSRRGHLFLQTDQPIYNPGQRVRYRVFALDQKMRPSTDTITVMV ENSHGLRVRKKEVYMPSSIFQDDFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVL PNFEVKITPGKPYILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFR GLESQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGIPVKVSATVSSPGSVP EVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD
SRPPRVGDTLNLNLRAVGSGATFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLA
PSFYFVAFYYHGDHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAKRCCQDGVTR
LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEILQEEDLID EDDIPVRSFFPENWLWRVETVDRFQILTLWLPDSLTTWEIHGLSLSKTKGLCVATPVQL RVFREFHLHLRLPMSVRRFEQLELRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQ QVLVPAGSARPVAFSVVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREEL VYELNPLDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQKGYMRIQQFRK ADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSNWLLSQQQADGSFQ DPCPVLDRSMQGGLVGNDETVALTAFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS FLGEKASAGLLGAHAAAITAYALTLTKAPADLRGVAHNNLMAMAQETGDNLYWGSV TGSQSNAVSPTPAPRNPSDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIEVTVKGHVE YTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNRRRREAPKVVEEQESRV HYTVCIWRNGKVGLSGMAIADVTLLSGFHALRADLEKLTSLSDRYVSHFETEGPHVLL YFDSVPTSRECVGFEAVQEVPVGLVQPASATLYDYYNPERRCSVFYGAPSKSRLLATLC SAEVCQCAEGKCPRQRRALERGLQDEDGYRMKFACYYPRVEYGFQVKVLREDSRAAF
RLFETKITQVLHF corresponding to amino acids 1 - 1653 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1653 of HSCOC4_PEA_1_P22, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SMKQTGEAGRAGGRQGG corresponding to amino acids 1654 - 1670 of HSCOC4_PEA_1_P22, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P22, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SMKQTGEAGRAGGRQGG in HSCOC4_PEA_1_P22. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P23, comprising a first amino acid sequence being at least 90% homologous to
MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQVVKGSVFLR NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTNIQGINLLFSSRRGHLFLQTDQPIYNPGQRVRYRVFALDQKMRPSTDTITVMV ENSHGLRVRKKEVYMPSSIFQDDFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVL PNFEVKITPGKPYILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFR GLESQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGIPVKVSATVSSPGSVP EVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD SRPPRVGDTLNLNLRAVGSGATFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLA PSFYFVAFYYHGDHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAKRCCQDGVTR LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEILQEEDLID EDDIPVRSFFPENWLWRVETVDRFQILTLWLPDSLTTWEIHGLSLSKTKGLCVATPVQL RVFREFHLHLRLPMSVRRFEQLELRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQ QVLVPAGSARPVAFSVVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREEL VYELNPLDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQKGYMRIQQFRK ADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSNWLLSQQQADGSFQ DPCPVLDRSMQGGLVGNDETVALTAFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS FLGEKASAGLLGAHAAAITAYALTLTKAPADLRGVAHNNLMAMAQETGDNLYWGSV TGSQSNAVSPTPAPRNPSDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIEVTVKGHVE YTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNRRRREAPKVVEEQESRV HYTVCIWRNGKVGLSGMAIADVTLLSGFHALRADLEKLTSLSDRYVSHFETEGPHVLL YFDSVPTSRECVGFEAVQEVPVGLVQPASATLYDYYNPERRCSVFYGAPSKSRLLATLC SAEVCQCAEGKCPRQRRALERGLQDEDGYRMKFACYYPRVEYG corresponding to amino acids 1 - 1626 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1626 of HSCOC4_PEA_1_P23, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
QSSHRGPGLTLPRGPAVLVSLGVACSSYRSCTQPVCSDTNFLPSQPQSNSPFPLLLTPS corresponding to amino acids 1627 - 1685 of HSCOC4_PEA_1_P23, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P23, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence QSSHRGPGLTLPRGPAVLVSLGVACSSYRSCTQPVCSDTNFLPSQPQSNSPFPLLLTPS in HSCOC4_PEA_1_P23. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P24, comprising a first amino acid sequence being at least 90% homologous to
MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQVVKGSVFLR NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTNIQGINLLFSSRRGHLFLQTDQPIYNPGQRVRYRVFALDQKMRPSTDTITVMV ENSHGLRVRKKEVYMPSSIFQDDFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVL PNFEVKITPGKPYILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFR GLESQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGIPVKVSATVSSPGSVP EVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD SRPPRVGDTLNLNLRAVGSGATFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLA PSFYFVAFYYHGDHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAKRCCQDGVTR
LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEILQEEDLID EDDIPVRSFFPENWLWRVETVDRFQILTLWLPDSLTTWEIHGLSLSKTKGLCVATPVQL RVFREFHLHLRLPMSVRRFEQLELRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQ QVLVPAGSARPVAFSVVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREEL VYELNPLDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQKGYMRIQQFRK ADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSNWLLSQQQADGSFQ DPCPVLDRSMQGGLVGNDETVALTAFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS FLGEKASAGLLGAHAAAITAYALTLTKAPADLRGVAHNNLMAMAQETGDNLYWGSV TGSQSNAVSPTPAPRNPSDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIEVTVKGHVE YTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNRRRREAPKVVEEQESRV HYTVCIWRNGKVGLSGMAIADVTLLSGFHALRADLEKLTSLSDRYVSHFETEGPHVLL YFDS corresponding to amino acids 1 - 1528 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1528 of HSCOC4_PEA_1_P24, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SADVLCFTGHQVRADSWPPCVLLKSASVLRGSALASVAPWSGVCRTRMATG corresponding to amino acids 1529 - 1579 of HSCOC4_PEA_1_P24, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P24, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SADVLCFTGHQVRADSWPPCVLLKSASVLRGSALASVAPWSGVCRTRMATG in HSCOC4_PEA_1_P24.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P25, comprising a first amino acid sequence being at least 90% homologous to MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQVVKGSVFLR NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTNIQGINLLFSSRRGHLFLQTDQPIYNPGQRVRYRVFALDQKMRPSTDTITVMV ENSHGLRVRKKEVYMPSSIFQDDFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVL
PNFEVKITPGKPYILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFR
GLESQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGIPVKVSATVSSPGSVP EVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD SRPPRVGDTLNLNLRAVGSGATFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLA PSFYFVAFYYHGDHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAKRCCQDGVTR LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEILQEEDLID EDDIPVRSFFPENWLWRVETVDRFQILTLWLPDSLTTWEIHGLSLSKTKGLCVATPVQL RVFREFHLHLRLPMSVRRFEQLELRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQ QVLVPAGSARPVAFSVVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREEL VYELNPLDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQKGYMRIQQFRK ADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSNWLLSQQQADGSFQ DPCPVLDRSMQGGLVGNDETVALTAFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS FLGEKASAGLLGAHAAAITAYALTLTKAPADLRGVAHNNLMAMAQETGDNLYWGSV TGSQSNAVSPTPAPRNPSDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIEVTVKGHVE YTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNRRRREAPKVVEEQESRV HYTVCIWRNGKVGLSGMAIADVTLLSGFHALRADLEKLTSLSDRYVSHFETEGPHVLL YFDSVPTSRECVGFEAVQEVPVGLVQPASATLYDYYNPERRCSVFYGAPSKSRLLATLC
SAEVCQCAEG corresponding to amino acids 1 - 1593 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1593 of HSCOC4_PEA_1_P25, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ETEGLGRGSGGGMAGAPPTLSDGFPNFREVPSPASRPGAGSAGRGWLQDEVCLLLPPC GVRLPG corresponding to amino acids 1594 - 1657 of HSCOC4_PEA_1_P25, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P25, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ETEGLGRGSGGGMAGAPPTLSDGFPNFREVPSPASRPGAGSAGRGWLQDEVCLLLPPC GVRLPG in HSCOC4_PEA_1_P25. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P26, comprising a first amino acid sequence being at least 90% homologous to
MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQVVKGSVFLR NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTNIQGINLLFSSRRGHLFLQTDQPIYNPGQRVRYRVFALDQKMRPSTDTITVMV ENSHGLRVRKKEVYMPSSIFQDDFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVL PNFEVKITPGKPYILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFR GLESQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGIPVKVSATVSSPGSVP EVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD SRPPRVGDTLNLNLRAVGSGATFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLA
PSFYFVAFYYHGDHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDS
LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG
LAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAKRCCQDGVTR
LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEILQEEDLID EDDIPVRSFFPENWLWRVETVDRFQILTLWLPDSLTTWEIHGLSLSKTKGLCVATPVQL RVFREFHLHLRLPMSVRRFEQLELRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQ QVLVPAGSARPVAFSVVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREEL VYELNPLDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQKGYMRIQQFRK ADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSNWLLSQQQADGSFQ DPCPVLDRSMQGGLVGNDETVALTAFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS FLGEKASAGLLGAHAAAITAYALTLTKAPADLRGVAHNNLMAMAQETGDNLYWGSV TGSQSNAVSPTPAPRNPSDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIEVTVKGHVE YTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNRRRREAPKVVEEQESRV HYTVCIWRNGKVGLSGMAIADVTLLSGFHALRADLEKLTSLSDRYVSHFETEGPHVLL YFDSVPTSRECVGFEAVQEVPVGLVQPASATLYDYYNPERRCSVFYGAPSKSRLLATLC
SAEVCQCAEG corresponding to amino acids 1 - 1593 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1593 of HSCOC4_PEA_1_P26, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ETEGLGRGSGGGMAGAPPTLSDGFPNFREVPSPASRPGAGSAGRGWLQDEVCLLLPPC GVRSVFPPRPWPDPPSGTGCFGLSGCSLLLLQVMHAACLL corresponding to amino acids 1594 - 1691 of HSCOC4_PEA_1_P26, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P26, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ETEGLGRGSGGGMAGAPPTLSDGFPNFREVPSPASRPGAGSAGRGWLQDEVCLLLPPC GVRSVFPPRPWPDPPSGTGCFGLSGCSLLLLQVMHAACLL in HSCOC4_PEA_1_P26.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P30, comprising a first amino acid sequence being at least 90% homologous to MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQVVKGSVFLR NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTNIQGINLLFSSRRGHLFLQTDQPIYNPGQRVRYRVFALDQKMRPSTDTITVMV ENSHGLRVRKKEVYMPSSIFQDDFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVL PNFEVKITPGKPYILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFR GLESQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGIPVKVSATVSSPGSVP EVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD SRPPRVGDTLNLNLRAVGSGATFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLA PSFYFVAFYYHGDHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAKRCCQDGVTR
LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEILQEEDLID EDDIPVRSFFPENWLWRVETVDRFQILTLWLPDSLTTWEIHGLSLSKTKGLCVATPVQL RVFREFHLHLRLPMSVRRFEQLELRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQ QVLVPAGSARPVAFSVVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREEL VYELNPLDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQKGYMRIQQFRK ADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSNWLLSQQQADGSFQ
DPCPVLDRSMQGGLVGNDETVALTAFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS FLGEKASAGLLGAHAAAITAYALTLTKAPADLRGVAHNNLMAMAQETGDNLYWGS corresponding to amino acids 1 - 1232 of CO4_HUMAN_V3, which also corresponds to amino acids 1 - 1232 of HSCOC4_PEA_1_P30, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RNPVRLLQPRAQMFCVLRGTK corresponding to amino acids 1233 - 1253 of HSCOC4_PEA_1_P30, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P30, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RNPVRLLQPRAQMFCVLRGTK in HSCOC4_PEA_1_P30.
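The graded thresholds recited above (at least 70%, 80%, 85%, 90% and 95% homologous) can be illustrated with a simple calculation. The sketch below is an assumption made for illustration only: it uses ungapped percent identity between two equal-length sequences as a crude stand-in for homology, which is not necessarily how homology is determined for the purposes of this document (alignment-based tools are normally used). The candidate sequence and all function names are hypothetical; only the 21-residue tail of HSCOC4_PEA_1_P30 is taken from the text.

```python
# Illustrative only: ungapped percent identity as a rough proxy for the
# homology thresholds recited in the text. Not the document's own method.
THRESHOLDS = (70, 80, 85, 90, 95)


def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("ungapped comparison requires equal-length sequences")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def thresholds_met(a, b):
    """Return the recited thresholds that the pair meets or exceeds."""
    pid = percent_identity(a, b)
    return [t for t in THRESHOLDS if pid >= t]


# Tail of HSCOC4_PEA_1_P30 as quoted in the text (amino acids 1233 - 1253).
P30_TAIL = "RNPVRLLQPRAQMFCVLRGTK"
# Hypothetical candidate with a single substitution at the last position.
CANDIDATE = "RNPVRLLQPRAQMFCVLRGTR"

if __name__ == "__main__":
    print(round(percent_identity(P30_TAIL, CANDIDATE), 2))
    print(thresholds_met(P30_TAIL, CANDIDATE))
```

With one substitution in 21 residues, the hypothetical candidate shares 20 of 21 positions with the quoted tail, so under this simplified measure it would fall within every recited threshold, including the 95% one.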
According to prefeπed embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCOC4_PEA_l_P38, comprising a first amino acid sequence being at least 90 % homologous to MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPL8VGVQLQDVPRGQWKGSVFLR NPSRNNNPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTMQGINLLFSSRRGHLFLQTDQPIYNPGQRVRYRVFALDQKMRPSTDTITNMN EΝSHGLRVPJΣH EW/MPSSIFQDDFVIPDISEPGT\\T SAILFSDGLESΝSSTQFE\^ KYVL PNFEV TPGKP TLTNPGHLDEMQLDIQAI^YIYGKPVQGVAYVRFGLLDEDGKKTFFR GLESQTKLVΝGQSFFLSLSKAEFQDALEKXΝMGITDLQGLRLWAAALTESPGGEMEEAE LTSWYFVSSPFSLDLSKTKPHLVPGAPFLLQALVREMSGSPASGIPVKVSATNSSPGSVP EVQDIQQΝTDGSGQVSIPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD SRPPRVGDTXΝLΝLRAVGSGATTSHYYYMΓLSRGQIVFMΝREPKRTLTSVSVFVDHHLA PSFYFVAFYΛΉGDHPVAΝSLRVDVQAGACEGKLELSVDGAKQYRΝGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLΝMGKVFEAMΝSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLS RKI LSCPKEKTTPJXRΝNΝFQKAIΝEKLGQYASPTAKP.CCQDGVTR LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEΓLQEEDLID
EDDIPVRSFFPENWLWRVETVDRFQE TLWLPDSLTTWEIHGLSLSKTKG corresponding to amino acids 1 - 818 of C04_HUMAN, which also corresponds to amino acids 1 - 818 of HSCOC4_PEA_1_P38, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DVTLSGPQVTLLPFPCTPAPCSLCS corresponding to amino acids 819 - 843 of HSCOC4_PEA_1_P38, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P38, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DVTLSGPQVTLLPFPCTPAPCSLCS in HSCOC4_PEA_1_P38.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P39, comprising a first amino acid sequence being at least 90% homologous to
MRLLWGLIWASSFFTLSLQKPRLLLFSPSVΛ HLGVPLSVGVQLQDVPRGQW^KGSVFLR NPSPJ^NVPCSPKΛHDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTMQGINLLFSSRPGHLFLQTDQPIYNPGQR\ RYRWALDQKΛ/[RPSTDTIT IV ENSHGLRVRKKEW^PSSIFQDDFVIPDISEPGTWKISAPFSDGLESNSSTQFEVKKYVL PNFEVKITPGKPYTLTVPGHLDEMQLDIQARYIYGKPVQGVA YVRFGLLDEDGKKTFFR GLESQTKXVNGQSfflSLSKAEFQDALEKLNMGITDLQGLPJ.YVAAATIESPGGEMEEAE LTSWΥFVSSPFSLDLSKTKPLDLVPGAPFLLQ corresponding to amino acids 1 - 387 of C04_HUMAN, which also corresponds to amino acids 1 - 387 of HSCOC4_PEA_1_P39, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSSRGEG corresponding to amino acids 388 - 394 of HSCOC4_PEA_1_P39, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P39, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSSRGEG in HSCOC4_PEA_1_P39.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P40, comprising a first amino acid sequence being at least 90% homologous to
MRLLWGLΓWASSFFTLSLQKPRLLLFSPS\ VHLGVPLSVGVQLQDVPRGQVVKGSVFLR NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTMQGINLLFSSRPGHLFLQTDQPIYNPGQRVRYRVFALDQKMRPSTDTITVMV ENSHGLRVRIO EW^MPSSIFQDDFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKY corresponding to amino acids 1 - 236 of C04_HUMAN, which also corresponds to amino acids 1 - 236 of HSCOC4_PEA_1_P40, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AGEWTEPHFPLKGRVPGRPGEAEYGHY corresponding to amino acids 237 - 263 of
HSCOC4_PEA_1_P40, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P40, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence AGEWTEPHFPLKGRVPGRPGEAEYGHY in HSCOC4_PEA_1_P40.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P41, comprising a first amino acid sequence being at least 90% homologous to
MRLLWGLIWASSFFTLSLQKPRLLLFSPSVNHLGWLSVGVQLQDVPRGQWKGSVFLR IPSRΝ CSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTMQGΓΝLLFSSRRGITLFLQTDQPIYΝPGQRVRYR ALDQKM^ EΝSHGLRVPKKE\^YMPSSIFQDDFVIPDISEPGTWKISAPVFSDGLESΝSSTQFEVKLKYNL PΝFE VKITPGKP YILT VPGHLDEMQLDIQ ARYTYGKP VQG VA YVRFGLLDEDGKKTFFR GLESQTKLVΝGQSFFLSLSKAEFQDALEKLΝMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKPXLLVPGAPFLLQALΛ IEMSGSPASGIPVKVSATVSSPGSVP EVQDIQQΝTDGSGQVSIPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD SRPPRVGDTLNLNLRAVGSGATFSHYYYIVIΓLSRGQIVFMNREPKRTLTSVSVFVDHHLA PSFYFVAFYYHGDHPVANSLRVDVQAGACEGKLELS VDGAKQYRNGES VKLHLETDS LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG
LAPSDGDQWTLSPx^d^SCPKΕKTTPJKKPNΛ NFQKAR ffiKLGQYASPT^
LPM^LRSCEQRAARVQQPDCREPFLSCCQFA£SLPJ KSRDKGQAGLQRALEILQEEDLII)
EDDIPVRSFFPENWLWRVETVDRFQILTLWLPDSLTTWEEHGLSLSKTKGLCVATPVQL RVFREFFfLHLRLPMSVRRFEQLELRPVLYNYLDKNLTVSλΗVSPVEGLCLAGGGGLAQ QVLWAGSAPJ>VAFSWPTAAAAVSLKWARGSFEFPVGDAVSKVLQffiKEGAIHREEL VYELNPLDHRGRTLEIPGNSDPNMIPDGDFNSWRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQKGYMRIQQFRK ADGSYAAWLSRDSSTWLTAF\ .KVLSLAQEQVGGSPEKLQETSNWLLSQQQADGSFQ DPCPVLDRSMQGGLVGNDETVALTAFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS FLGEKASAGLLGAHAAAITAYALTLTKAPADLRGVAHNNLMAMAQETGDNLYWGSV TGSQSNAVSPTPAPRNPSDPMPQAPALWIETTA YALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQDTNIALDALSAYWIASHTTEERGLNNTLSSTGRNGFKSHALQLNNRQ IRGLEEELQFSLGSKINNKVGGNSKGTLKVLRT 'NVLDMKNTTCQDLQIEVTVKGHNE YTlVlEAlSnEDYEDYEYDELPAKODPDAPLQPVTPLQLFEGRRNRRRREAPKNVEEQESRV ^TVCIWRNGKVGLSGMAIADVTLLSGFFLALRAJDLEKLTSLSDRYVSHFETEGPHVLL YFDSV coπesponding to amino acids 1 - 1529 of C04JrIUMAN_Nl, which also coπesponds to amino acids 1 - 1529 of HSC0C4JPEA_1J?41, and a second amino acid sequence being at least 70%), optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SGER coπesponding to amino acids 1530 - 1533 of HSCOC4J?EA_l JP41, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefeπed embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCOC4JPEA_l JP41, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90%> and most preferably at least about 95% homologous to the sequence SGER in HSCOC4JPEA_JJP41. 
According to prefeπed embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCOC4JPEA_l JP42, comprising a first amino acid sequence being at least 90 % homologous to MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ WKGSVFLR ΝPSRΝΝNPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWT.K DSLSRTTΝIQGIΝLLFSSRRGHLFLQTDQPLYΝPGQRVRYRVFALDQKMRPSTDTITVMV EΝSHGLRVRKJ<^VYMPSSIFQDDFVIPDISEPGTWKISA^SDGLESΝSSTQFEVKKYNL PΝFEVKITPGKPYILTVPGHLDEMQLDIQARYIYGKPVQGVA YVRFGLLDEDGKKTFFR GLESQTKLVΝGQSHISLSKAEFQDALEKLΝMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGIPVKVSATVSSPGSVP EVQDIQQΝTDGSGQVSIPIIΓPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD SRPPRVGDTLΝLΝLRAVGSGATFSHYYYMΓLSRGQIVFMΝREPKRTLTSVSVFVDHHLA PSFYFVAFYYHGDHPVAΝSLRVDVQAGACEGKLELSVDGAKQYRΝGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLΝMGKVFEAMΝSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSPJGILSCPKEKTTRKJKT^ΝVΝFQKAIΝEKLGQYASPTAKRCCQDGVTR LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEILQEEDLID EDDΓPVRSFFPEΝWLWRVETVDRFQILTLWLPDSLTTWEIHGLSLSKTKGLCVATPVQL RVFREFHLHLRLPMSVRRFEQLELRPVLYΝYLDK LTVSVHVSPVEGLCLAGGGGLAQ QVLVPAGSAT^PVAFSW TAAAAVSLKVΛ^ARGSFEFPVGDAVSKVLQFFIKEGAIHREEL \^YELΝPLDHRGRTLEΓPGΝSDPΝMΓPDGDFΝSYVRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQTMTYLAPTLAASRYLDKTEQWSTLPPETKI HAVDLIQKGYMWQQFRK ADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSNWLLSQQQADGSFQ DPCPVLDRSMQGGLVGNDETNALTAFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS FLGEKASAGLLGAHAAAITAYALTLTKAPADLRGVAHΝΝLMAMAQETGDΝLYWGSV TGSQSΝAVSPTPAPRΝPSDPMPQAPALWffiTTAYALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQDTNIALDALSAYWIASHTTEERGLΝVTLSSTGRΝGFKSHALQLΝΝRQ IRGLEEELQFSLGSKJΝVKVGGΝSKGTLKVLRTYΝVLDMKΝTTCQDLQffiVTVKGHVE YTMEANEDYEDYEYDELPA DDPDAPLQPVTPLQLF^GRRNRP lREAPKVVEEQESRV HYTVCIW coπesponding to amino acids 1 - 1473 of C04JHUMANJV1, which also coπesponds to amino acids 1 - 1473 of HSCOC4JPEA_l _P42, a second amino acid sequence being at least 70%, optionally at least 80%o, preferably at least 85%, more preferably at least 90%) and most preferably at 
least 95% homologous to a polypeptide having the sequence WAPGAALGQGREGRTQAGAGLLEPAQAEPGRQLTRLHR corresponding to amino acids 1474 - 1511 of HSCOC4_PEA_1_P42, a third amino acid sequence being at least 90% homologous to RNGKVGLSGMAIADVTLLSGFHALRADLEK corresponding to amino acids 1474 - 1503 of C04_HUMAN_V1, which also corresponds to amino acids 1512 - 1541 of HSCOC4_PEA_1_P42, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VWSATQGNPLCPRY corresponding to amino acids 1542 - 1555 of HSCOC4_PEA_1_P42, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of HSCOC4_PEA_1_P42, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for WAPGAALGQGREGRTQAGAGLLEPAQAEPGRQLTRLHR, corresponding to HSCOC4_PEA_1_P42.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P42, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VWSATQGNPLCPRY in HSCOC4_PEA_1_P42.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTREFAC_PEA_2_P8, comprising a first amino acid sequence being at least 90% homologous to
MAARALCMLGLVLALLSSSSAEEYVGL corresponding to amino acids 1 - 27 of TFF3_HUMAN, which also corresponds to amino acids 1 - 27 of HUMTREFAC_PEA_2_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence WKVHLPKGEGFSSG corresponding to amino acids 28 - 41 of HUMTREFAC_PEA_2_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMTREFAC_PEA_2_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence WKVHLPKGEGFSSG in HUMTREFAC_PEA_2_P8.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMOSTRO_PEA_1_PEA_1_P21, comprising a first amino acid sequence being at least 90% homologous to
MRIAVICFCLLGITCAIPVKQADSGSSEEKQLYNKYPDAVATWLNPDPSQKQNLLAPQ corresponding to amino acids 1 - 58 of OSTP_HUMAN, which also corresponds to amino acids 1 - 58 of HUMOSTRO_PEA_1_PEA_1_P21, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VFLNFS corresponding to amino acids 59 - 64 of HUMOSTRO_PEA_1_PEA_1_P21, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMOSTRO_PEA_1_PEA_1_P21, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VFLNFS in HUMOSTRO_PEA_1_PEA_1_P21.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMOSTRO_PEA_1_PEA_1_P25, comprising a first amino acid sequence being at least 90% homologous to
MRIAVICFCLLGITCAIPVKQADSGSSEEKQ corresponding to amino acids 1 - 31 of OSTP_HUMAN, which also corresponds to amino acids 1 - 31 of HUMOSTRO_PEA_1_PEA_1_P25, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence H corresponding to amino acids 32 - 32 of HUMOSTRO_PEA_1_PEA_1_P25, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMOSTRO_PEA_1_PEA_1_P30, comprising a first amino acid sequence being at least 90% homologous to
MRIAVICFCLLGITCAIPVKQADSGSSEEKQ corresponding to amino acids 1 - 31 of OSTP_HUMAN, which also corresponds to amino acids 1 - 31 of HUMOSTRO_PEA_1_PEA_1_P30, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSIFYVFI corresponding to amino acids 32 - 39 of HUMOSTRO_PEA_1_PEA_1_P30, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMOSTRO_PEA_1_PEA_1_P30, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSIFYVFI in HUMOSTRO_PEA_1_PEA_1_P30.
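The head-plus-tail layout recited for the chimeric polypeptides can be checked arithmetically. The following is a minimal sketch only, assuming the 31-residue head shared with OSTP_HUMAN and the 8-residue tail VSIFYVFI given above for the P30 variant; the function name `assemble_chimera` is hypothetical and introduced purely for illustration.

```python
def assemble_chimera(head: str, tail: str):
    """Join a head and a unique tail contiguously, in sequential order.

    Returns the full sequence and the 1-based, inclusive coordinates
    of the tail within that sequence.
    """
    full = head + tail
    tail_start = len(head) + 1  # first residue after the head
    tail_end = len(full)        # last residue of the chimera
    return full, (tail_start, tail_end)


# Head: amino acids 1 - 31 as recited above; tail: unique C-terminal segment.
head = "MRIAVICFCLLGITCAIPVKQADSGSSEEKQ"
tail = "VSIFYVFI"

full, tail_span = assemble_chimera(head, tail)
print(len(head), tail_span, len(full))
```

With these inputs the tail coordinates agree with the range 32 - 39 stated in the text.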
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T10888_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to
MGPPSAPPCPXIIWWKEVLLTASLLTFN^NPPTTAKLTIESTPR^NNAEGKENLLLAHN^ QNRIGYSWYKGERVDGNSLRVGYNIGTQQATPGPAYSGRETRYPNASLLIQNNTQNDTG FYTLQVIKSDLVNEEATGQFFFVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYL WWNNGQSLPVSPP QLSNGNMTLTLLSVKPVJTOAGSYECEIQNPASANRSDPVTLNVLY GPDVPTISPSKANYRPGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFΓPNITVNNSGS
YMCQAHNSATGLNRTTVTMITNS corresponding to amino acids 1 - 319 of CEA6_HUMAN, which also corresponds to amino acids 1 - 319 of T10888_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DWTRP corresponding to amino acids 320 - 324 of T10888_PEA_1_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T10888_PEA_1_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DWTRP in T10888_PEA_1_P2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T10888_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to
MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLP QNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTG FYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYL WWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL corresponding to amino acids 1 - 234 of CEA6_HUMAN, which also corresponds to amino acids 1 - 234 of T10888_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLLSSQLWPPSASRLECWPGWL corresponding to amino acids 235 - 256 of T10888_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T10888_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLLSSQLWPPSASRLECWPGWL in T10888_PEA_1_P4.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T10888_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to
MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLP QNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTG FYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYL WWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL corresponding to amino acids 1 - 234 of Q13774, which also corresponds to amino acids 1 - 234 of T10888_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLLSSQLWPPSASRLECWPGWL corresponding to amino acids 235 - 256 of T10888_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T10888_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLLSSQLWPPSASRLECWPGWL in T10888_PEA_1_P4.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T10888_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to
MGPPSAPPCP WKEVLLTASLLTFWNPPTTAKLTIESTPFNNAEGKEVLLLAHNLP QNRIGYSWYKGERNDGNSLINGYNIGTQQATPGPAYSGRETIYPNASLLIQNNTQNDTG FYTLQVΓKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYL WWNNGQSLPVSPRLQLSNGNMTLTLLSVKPNDAGSYECEIQNPASANRSDPVTLNNLY GPDVPTISPSKAΝYRPGE JLΝLSCHAASΝPPAQYSWFIΝGTFQQSTQELFLPΝITVΝ SGS YMCQAHNSATGLNRTTVTMITNSG coπesponding to amino acids 1 - 320 of CEA6 JHUMAN, which also coπesponds to amino acids 1 - 320 of T108S8_PEA_1 JP5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%>, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
KWIΉEALASHFQVESGSQPVΛAPVKXFSFPTCVQGAHANPKFSPEPSQFTSADSFPLVFLFF WFCFLISHV corresponding to amino acids 321 - 390 of T10888_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T10888_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWfflEALASHFQVESGSQRRAl^KJvFSFPTCVQGAHANPKFSPEPSQFTSADSFPLVFLFF WFCFLISHV in T10888_PEA_1_P5.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T10888_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLA HNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQ NDTGFYTLQVIKSDLVNEEATGQFHVY corresponding to amino acids 1 - 141 of CEA6_HUMAN, which also corresponds to amino acids 1 - 141 of T10888_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence REYFHMTSGCWGSVLLPTYGIVRPGLCLWPSLHYILYQGLDI corresponding to amino acids 142 - 183 of T10888_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T10888_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence REYFHMTSGCWGSVLLPTYGIVRPGLCLWPSLHYILYQGLDI in T10888_PEA_1_P6.
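The edge-portion embodiments recited for the T39971 variants that follow describe a peptide of length n spanning a junction between two residues (for example residues 325 and 326 of T39971_P9), starting at any position from 325-x to 325 and ending at 326 + ((n-2) - x), where x runs from 0 to n-2. The sketch below enumerates those windows; the function name `edge_windows` and the choice n = 10 are assumptions made for illustration.

```python
def edge_windows(junction_left: int, n: int):
    """List (start, end) pairs, 1-based and inclusive, for every window
    of length n that contains both junction_left and junction_left + 1.

    Follows the recited formula: start = junction_left - x and
    end = (junction_left + 1) + ((n - 2) - x), for x = 0 .. n-2.
    """
    if n < 2:
        raise ValueError("a junction-spanning window needs at least 2 residues")
    windows = []
    for x in range(0, n - 1):  # x = 0 .. n-2
        start = junction_left - x
        end = (junction_left + 1) + ((n - 2) - x)
        windows.append((start, end))
    return windows


# Junction between residues 325 and 326, minimum recited length n = 10.
for start, end in edge_windows(325, 10):
    print(start, end, end - start + 1)
```

Every window has length n, and there are n-1 of them, because the window can slide only while both junction residues remain inside it.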
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P6, comprising a first amino acid sequence being at least 90% homologous to
MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAEC KPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPV LKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFR GQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGV LDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKG corresponding to amino acids 1 - 276 of VTNC_HUMAN, which also corresponds to amino acids 1 - 276 of T39971_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TQGWGD corresponding to amino acids 277 - 283 of T39971_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T39971_P6, comprising a polypeptide being at least
70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TQGWGD in
T39971_P6.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P9, comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAEC KPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPV LKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFR
GQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGV LDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEE CEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRT corresponding to amino acids 1 - 325 of VTNC_HUMAN, which also corresponds to amino acids 1 - 325 of T39971_P9, and a second amino acid sequence being at least 90% homologous to
SGMAPRPSLAJKKQRFRHRNRKGYRSQRGHSRGRNQNSRRPSPvATWLSLFSSEESNLGA >u\TYDDYRMDWLWATCEPIQSWFFSGDKYYRVNLRTRRVDTVDPPYP PAPGHL corresponding to amino acids 357 - 478 of VTNC_HUMAN, which also corresponds to amino acids 326 - 447 of T39971_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T39971_P9, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TS, having a structure as follows: a sequence starting from any of amino acid numbers 325-x to 325; and ending at any of amino acid numbers 326 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P11, comprising a first amino acid sequence being at least 90% homologous to
MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAEC KPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPV LKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFR GQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRΓNCQGKTΎLFKGSQYWRFEDGV LDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEE
CEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRTS corresponding to amino acids 1 - 326 of VTNC_HUMAN, which also corresponds to amino acids 1 - 326 of T39971_P11, and a second amino acid sequence being at least 90% homologous to
DKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 442 - 478 of VTNC_HUMAN, which also corresponds to amino acids 327 - 363 of T39971_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T39971_P11, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SD, having a structure as follows: a sequence starting from any of amino acid numbers 326-x to 326; and ending at any of amino acid numbers 327 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P11, comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAEC KPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPV LKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFR GQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGV LDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEE CEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRTS corresponding to amino acids 1 - 326 of Q9BSH7, which also corresponds to amino acids 1 - 326 of T39971_P11, and a second amino acid sequence being at least 90% homologous to
DKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 442 - 478 of Q9BSH7, which also corresponds to amino acids 327 - 363 of T39971_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T39971_P11, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SD, having a structure as follows: a sequence starting from any of amino acid numbers 326-x to 326; and ending at any of amino acid numbers 327 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P12, comprising a first amino acid sequence being at least 90% homologous to
MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAEC KPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPV LKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFR GQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFK corresponding to amino acids 1 - 223 of VTNC_HUMAN, which also corresponds to amino acids 1 - 223 of T39971_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VPGAVGQGRKHLGRV corresponding to amino acids 224 - 238 of T39971_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T39971_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPGAVGQGRKHLGRV in T39971_P12.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P12, comprising a first amino acid sequence being at least 90% homologous to
MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAEC KPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPV LKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFR GQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFK corresponding to amino acids 1 - 223 of Q9BSH7, which also corresponds to amino acids 1 - 223 of T39971_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VPGAVGQGRKHLGRV corresponding to amino acids 224 - 238 of T39971_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T39971_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPGAVGQGRKHLGRV in T39971_P12.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z21368_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to
MKYSCCALVLAVLGTELLGSLCSTNRSPRFRGMQQERKNRRPNILLVLTDDQDVELGSL QVMNKTRKJMEHGGATFΓNAFVTTPMCCPSRSSMLTGKYV^^ QAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGSY PGWL^WLGLIKNSRFYNYTNCR ΝGΠ EKHGFDYAKDYFTDLITΝESIΝYFKMSKPMYPFΓRPVMMVISHAAPHGPEDSA^ FSKL YPΝASQHITPS YΝ YAPΝMDKHWΓMQ YTGPMLPIHMEFTΝILQRKRLQTLMS VDD SVERLYNMLVETGELENTYNYTADHGYFFLGQFGLVKGKSMPYDFDIRVPFFIRGPSVEP GSIVPQIVLNMLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNRFRTMKA VERGKFLRKL^ESSIG^QQSNIHLPKYERVKELCQQARYQTACEQPGQKWQCIEDTSGK LRIHKCKGPSDLLTVRQSTRNLYARGFHDKDKECSCRESGYRASRSQRKSQRQFLRNQ GTPKYKPRFVHTRQTRSLS VEFEGEIYDINLEEEEELQ VLQPPJ^IAKRHDEGHKGPRDLQ ASSGGNRGRMLADSSNAVGPPTTVRVTHKCFILPNDSIHCERELYQSARAWKDHKAYI DKΕIEALQDKIKNLREVRGHLKPJX^EECSCSKQSYYNL EKGVKKQEKLKSHLHPFKE AAQEVDSKXQLFKENNRRRKKERKEKJPJIQRKGEECSLP coπesponding to amino acids 1 - 761 of SULl JHUMAN, which also coπesponds to amino acids 1 - 761 of Z21368JPEA_1 JP2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%>, more preferably at least 90% and most preferably at least 95%> homologous to a polypeptide having the sequence PHKYSAHGRTRHFESATRTTNGAQKLSRI coπesponding to amino acids 762 - 790 of Z21368J?EA_1 JP2, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z21368JPEA_1 JP2, comprising a polypeptide being at least 70%, optionally at least about 80%>, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PHKYSAHGRTRHFESATRTTNGAQKLSRI in Z21368_PEA_1_P2. According to prefeπed embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z21368JPEA_1 JP5, comprising a first amino acid sequence being at least 90 % homologous to
MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVEL corresponding to amino acids 1 - 57 of Q7Z2W2, which also corresponds to amino acids 1 - 57 of Z21368_PEA_1_P5, a second bridging amino acid sequence comprising A, and a third amino acid sequence being at least 90% homologous to
FFGKYLNEYNGSYIPPGWPJEWLGLKNSRFYNYTVCRNGIl^KHGFDYA ΩYFTDLITN ESlNYFKMSKPJviYPHRPVM IVISHAAPHGPEDSAPQFSKLYPNASQmTTSYNYA^ DKHW QYTGPMLPIIHMEFTNILQPJKH^LQTXMSV
ADHGY GQFGLVKGKSMPYDFDIRWFFIRGPSVEPGSIVPQIVLNLDLAPTILDIAGLDT PPDVDGKSVLKLLDPEKPGNRFRTN -AKIWRDTFLV^
PKYERVKELCQQARYQTACEQPGQKWQCIEDTSGKLPJHKCKGPSDLLTVRQSTRNLY ARGFHDKDKECSCRESGYRASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSVEFE GEIYDINLEEEEELQVLQPRNIAJsdlHDEGHKGPRDLQASSGGNRGRMLADSSNAVGPPT TVRVTHKCFILP TOSfflCEPJELYQSARAWKDHKAYmKEIEALQDIOKNLREVRGHLKR RKPEECSCSKQSYYϊN EKGVKKQEiπ-KSHLHPF^
KJ3KRRQRKGEECSLPGLTCFTHDNNHWQTAPFWNLGSFCACTSSNNNTYWCLRTVNE THNFLFCEFATGFLEYFDMNTDPYQLTNTVHTVERGILNQLHVQLMELRSCQGYKQCN PRPKNLDVGNKDGGSYDLHRGQLWDGWEG conesponding to amino acids 139 - 871 of Q7Z2W2, which also coπesponds to amino acids 59 - 791 of Z2136S_PEA_1JP5, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. According to prefeπed embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of Z21368_PEA_1_P5, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least three amino acids comprise LAF having a structure as follows (numbering according to Z21368J?EA_1 JP5): a sequence starting from any of amino acid numbers 57-x to 57; and ending at any of amino acid numbers 59 + ((n-2) - x), in which x varies from 0 to n-2. According to prefeπed embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z2136SJPEA_J_P5, comprising a first amino acid sequence being at least 70%, optionally at least 80%>, preferably at least 85%, more preferably at 5 least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MKYSCCALVLAVLGTELLGSLCSTVRSPPJHlGRIQQERKN PNIILVLTDDQDVELAFF GKYLNEYNGSYPPGWP^WLGLKNSRFYNYTVCRNGIKΕ^ INYFKMSKPJvIYPHRPVMMVTSHAAPHGPEDSAPQFSKLYPNASQmTPSYNYAPNMDK HW QYTGPMLPIH ffiFTT^LQPJOlLQTLMSVDDSVERLYNMLVETGELENTYIIYTAJ3 l o HGYΉIGQFGLVKGKSMP YDFD FFΓRGPS VEPGSIVPQΓVLNTDLAPTILDIAGLDTPP DVDGKSVLKLLDPEKPGNRFRTNKKAKTwT TFLVERGK^^ K H3RVKELCQQARYQTACEQPGQKWQCΓEDTSGKLRIHKCKGPSDLLTVRQSTRNLYA RGFHDK KΈCSCRESGYRASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSVEFEGE IYDINLEEEEELQVLQPPNIAXRHDEGHKGPRDLQASSGGNRGRMLADSSNAVGPPTTV
15 RVTHKCFILPNDSIHCERELYQS ARAWKDHKA YIDKEIEALQDKIKNLREVRGHLKTIRK PEECSCSKQSYYNKEKGVKKQEKLKSHLHPFKEAAQEVDSKLQLFKENNPJIRKKERKE KPRQRKGEECSLPGLTCFTHDNNHWQTAPFWNLGSFCACTSSNNNTYWCLRTVNETH NFLFCEFATGFLEYFDMNTDPYQLTNTVHTVERGILNQLHVQLME coπesponding to amino acids 1 - 751 of Z21368_PEA_J_P5, and a second amino acid sequence being at least 90 0 % homolo gous to LRSCQGYKQCNPRPKNLD VGNKDGGS YDLHRGQL WDG WEG coπesponding to amino acids 1 - 40 of AAH12997, which also coπesponds to amino acids 752 - 791 of Z21368_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to prefeπed embodiments of the present invention, there is provided an 5 isolated polypeptide encoding for a head of Z21368JPEA_1 JP5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%>, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MKYSCCALVLAVLGTELLGSLCSTλ^SPPJRGRIQQERKNIRPNlILVLTDDQDVELAFF GKYL ffiYNGSYIPPGWREWLGLIKNSRFYNYTNCRNGIJ ΕKHGFDYAKDYFTDLITNES0 IN KMSKPsMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYNY^APNMDK HWIMQYTGPMLPIHMEFTMLQRKRLQTLMSVDDSVERLYNMLλΕTGELENTYHIYTAD HGYHIGQFGLVKGKS IP\TDFDIRVPFFπiGPSVEPGSIWQlNLNIDLAPT DIAGLDTPP DVDGKSVLKLLDPEKPGNRFRTNKKAKIWRDTTLVER
KYERVKELCQQARYQTACEQPGQKWQCffiDTSGKLPJHKCKGPSDLLTVRQSTRNLYA RGFFωKDKECSCRESGYRASRSQRKSQRQFLRNQGTPKYKPRFNHTRQTRSLSVEFEGE H JIΝLEEEEELQNLQPRΝIAKl^HDEGHKGPPJDLQASSGGΝPGPMLAJ3SSΝAVGPPTT^ RVTHKCFILPNDSfflCERELYQSARAWKDHKAYroKEffiALQDK^ PEECSCSKQSYYNKEKGVKKQEKLKSHLHPFKEA.AQEVDSKLQLFKENNRRPΩ ER1^ KPJ1QRKGEECSLPGLTCFTHDNNHWQTAPFWNLGSFCACTSSNNNTYWCLRTVNETH NFLFCEFATGFLEYFDMNTDPYQLTNTNHTVERGILNQLHVQLME of Z21368_PEA_1_P5. According to prefeπed embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z21368 JPE A_1_P5, comprising a first amino acid sequence being at least 90 %> homologous to MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERIs-NTRPNIILVLTDDQDVEL coπesponding to amino acids 1 - 57 of SUL1 JHUMAN, which also coπesponds to amino acids 1 - 57 of Z21368JPEA_1 JP5, and a second amino acid sequence being at least 90 % homologous to
AFFGKYLNEYNGSYIPPGWREWLGLIKNSPJFYNYTVCRNGIKEKHGFDYAKDYFTDLIT NESr WKMSKRNIYPFlRPVMMVISHAAPHGPEDSAPQFSKLYPNASQFilTPSYNYAPN MDKJHWIMQYTGPMLPi EFTNILQRKRLQTLM8VDDSVERLYNMLVETGELENTYII YTADHGY GQFGL\^GKSMPYDFDIR FFIT GPSVEPGSIWQlVLNIDLAPTILDIAGL DTPPDVDGKSVLKLLDPEKPGNP RT KI AKIWRDTFLVERGKFLRKI^ESSKNIQQSN HLPKYERVKELCQQARYQTACEQPGQKWQCffiDTSGlSLRIHKCKGPSDLLTNRQSTRN LYARGFHDKDKECSCRESGYR.ASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSλ E FEGEIYDINLEEEEELQVLQPRNIAKΛIIDEGHKGPPJ^LQASSGGNRGRMLADSSNAVGP PTTVRVTHKCFrLPNDSIHCERELYQSAJλAWKDHKAYmKEIEALQDKIKNLREVRGHL KRRKPEECSCSKQSYΛNKEKGVKXQEKLKSHLHPFKEAAQEλωSKLQLF KERJ^KRRQRKGEECSLPGLTCFTHDNNHWQTAPFWNLGSFCACTSSNNNTYWCLRT \/NETHNFLFCEFATGFLE\TDMNTDPYQLTNTVHTVERGrLNQLHVQLMELRSCQG\T QCNPRPKNLDVGNKDGGSYDLHRGQLWDGWEG corresponding to amino acids 138 - 871 of SUL1_HUMAN, which also corresponds to amino acids 58 - 791 of Z21368_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of Z21368_PEA_1_P5, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise LA, having a structure as follows: a sequence starting from any of amino acid numbers 57-x to 57; and ending at any of amino acid numbers 58 + ((n-2) - x), in which x varies from 0 to n-2.
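The start and end rule for an edge portion can be checked by enumeration. The following is a minimal Python sketch and is not part of the original disclosure: the function name `edge_windows` and its parameters are hypothetical, and positions are taken to be 1-based and inclusive, as in the text. For a junction between residues 57 and 58 and a chosen length n, it lists every window that starts at 57-x and ends at 58 + ((n-2) - x) for x from 0 to n-2.

```python
def edge_windows(left_pos, n, seq_len=None):
    """Return (start, end) pairs, 1-based inclusive, for edge portions of length n.

    left_pos is the last residue before the junction (57 in the text);
    the first residue after the junction is left_pos + 1 (58 in the text).
    seq_len is an optional, assumed total length used to drop windows
    that would extend past the end of the polypeptide.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    right_pos = left_pos + 1
    windows = []
    for x in range(n - 1):  # x varies from 0 to n-2
        start = left_pos - x
        end = right_pos + (n - 2) - x
        # Skip windows that would run off either end of the polypeptide.
        if start < 1 or (seq_len is not None and end > seq_len):
            continue
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    # Illustrative values: junction 57/58, n = 10, variant length 791.
    for start, end in edge_windows(57, 10, seq_len=791):
        print(start, end, end - start + 1)
```

Every window produced this way has length n and contains both residue 57 and residue 58, which is consistent with the requirement that the junction pair LA be present.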
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z21368_PEA_1_P15, comprising a first amino acid sequence being at least 90% homologous to MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIFLVLTDDQDVELGSL QVMNKTPKTMEHGG ATFIN AFVTTPMCCPSRSSMLTGKY VHNHNVYTNNENCSSPS W QAMHEPRTFAWXNNTGYRTAFFGK\T.NEYNGSYIPPGWREWLGLIKNSRFYN VCR NGK£KHGFDYAKDYFTDLITNESINYFKMSKRMYPHPJ>VMMVISHAAPHGPEDSAPQ FSKLYPNASQfflTPSYNYAPNMDIOrW QYTGPMLPIHMEFTNILQPJ ^ SVERLYNMLVETGELENTYflYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVEP GSIVPQIΛT.NIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNP ^RTNKXAKIWRDTFL VERG corresponding to amino acids 1 - 416 of SUL1_HUMAN, which also corresponds to amino acids 1 - 416 of Z21368_PEA_1_P15.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z21368_PEA_1_P16, comprising a first amino acid sequence being at least 90% homologous to
MKYSCCALVLAVLGTELLGSLCSWRSPP FRGRIQQERKNIRPNIILVLTDDQDVELGSL QWTNKTTKXMEHGGATF AFVTTPMCCPSRSSMLTGKYVHNHNN^ TNNENCSSPSW QAIVLHEPRTTAWXNNTGYRTAFFGKYLNEYNGSYIPPGWREWLGLIKNSRFYNYTVCR NGIKEKHGFDYAKDYFTDLITNESINYFKMSKJ^ YPHRPVMMVISHAAPHGPEDSAPQ FSKLYPNASQHITPSYNYAPNMDKHW QYTGPMLPIHKLEFTNILQPJ<TLLQTLMSVDD SVERLYNMLVETGELENTYIIYTADHGYHIGQFGL\^KGKSMPYDFDIRVPFFΓRGPSVEP GSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNR corresponding to amino acids 1 - 397 of SUL1_HUMAN, which also corresponds to amino acids 1 - 397 of Z21368_PEA_1_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CVTVPPLSQPQIH corresponding to amino acids 398 - 410 of Z21368_PEA_1_P16, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z21368_PEA_1_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CVTVPPLSQPQIH in Z21368_PEA_1_P16.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z21368_PEA_1_P22, comprising a first amino acid sequence being at least 90% homologous to
MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELGSL QVMNKTRKIMEHGGATFINAPVTTPMCCPSRSSMLTGKYVHNHNNYTNNENCSSPSW QAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGSYIPPGWREWLGLIKNSRFYNYTVCR NGIKEKHGFDYAK corresponding to amino acids 1 - 188 of SUL1_HUMAN, which also corresponds to amino acids 1 - 188 of Z21368_PEA_1_P22, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ARYDGDQPRCAPRPRGLSPTVF corresponding to amino acids 189 - 210 of Z21368_PEA_1_P22, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z21368_PEA_1_P22, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ARYDGDQPRCAPRPRGLSPTVF in Z21368_PEA_1_P22.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z21368_PEA_1_P23, comprising a first amino acid sequence being at least 90% homologous to
MKYSCCALVLAVLGTΕLLGSLCSTVRSP^RGWQQERiα^PJ'NIILVLTOD QVMNKTPKIMEHGGATT1NAFNTTPMCCPSRSSMLTGK
QAMHEPRTFAVYLNNTGYRT corresponding to amino acids 1 - 137 of Q7Z2W2, which also corresponds to amino acids 1 - 137 of Z21368_PEA_1_P23, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GLLHRLNH corresponding to amino acids 138 - 145 of Z21368_PEA_1_P23, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z21368_PEA_1_P23, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GLLHRLNH in Z21368_PEA_1_P23.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z21368_PEA_1_P23, comprising a first amino acid sequence being at least 90% homologous to MKYSCCALNLA\T.GTELLGSLCST\H^SPRFRGRIQQERKNIRPNIILNLTDDQDVELGSL QV^INKTROMEHGGATFINAFVTTPMCCPSRSSMLTGKYVFfNHNW.λTNNENCSSPSW QAMHEPRTFAVYLNNTGYRT corresponding to amino acids 1 - 137 of SUL1_HUMAN, which also corresponds to amino acids 1 - 137 of Z21368_PEA_1_P23, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GLLHRLNH corresponding to amino acids 138 - 145 of Z21368_PEA_1_P23, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z21368_PEA_1_P23, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GLLHRLNH in Z21368_PEA_1_P23.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P5, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 12 - 55 of GILT_HUMAN, which also corresponds to amino acids 1 - 44 of T59832_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
VGTATGRAGWREQAPCRGTRLLLSPQTSQGKTRAPRGRCPCRVPGKTLFSSRRCGHTP SVPFRFRIPHLRGAAASTRLVPPKGSMSAYCVLLGQELGSPFVAQGTSSAAGQGPPACIL AATLDAFIPARAGLACLWDLLGRCPRG corresponding to amino acids 45 - 189 of T59832_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VGTATGRAGWREQAPCRGTRLLLSPQTSQGKTRAPRGRCPCRVPGKTLFSSRRCGHTP SVPFRFRIPHLRGAAASTRLVPPKGSMSAYCVLLGQELGSPFVAQGTSSAAGQGPPACIL AATLDAFIPARAGLACLWDLLGRCPRG in T59832_P5.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P7, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 12 - 223 of GILT_HUMAN, which also corresponds to amino acids 1 - 212 of T59832_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRIFLALSLTLIVPWSQGWTRQRDQR corresponding to amino acids 213 - 238 of T59832_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTLIVPWSQGWTRQRDQR in T59832_P7.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P7, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 1 - 212 of BAC98466, which also corresponds to amino acids 1 - 212 of T59832_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRIFLALSLTLIVPWSQGWTRQRDQR corresponding to amino acids 213 - 238 of T59832_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTLIVPWSQGWTRQRDQR in T59832_P7.
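The graded thresholds (70%, 80%, 85%, 90%, 95%) can be illustrated with a short Python sketch. It is an illustration only, under a loud assumption: it uses ungapped, position-by-position identity between two equal-length sequences as a stand-in for homology, whereas the method actually used to determine homology is defined elsewhere in this document and may differ. The names `percent_identity` and `highest_tier` are hypothetical.

```python
# Thresholds named in the text, highest first.
TIERS = (95, 90, 85, 80, 70)


def percent_identity(candidate, reference):
    """Percentage of positions at which two equal-length sequences agree.

    Assumption: simple ungapped identity, used here only as a stand-in
    for the homology determination described in the document.
    """
    if not reference:
        raise ValueError("reference sequence must not be empty")
    if len(candidate) != len(reference):
        raise ValueError("this sketch only compares sequences of equal length")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def highest_tier(pct):
    """Return the highest threshold met by pct, or None if below 70%."""
    for tier in TIERS:
        if pct >= tier:
            return tier
    return None


if __name__ == "__main__":
    tail = "VRIFLALSLTLIVPWSQGWTRQRDQR"  # tail of T59832_P7 as given in the text
    candidate = "A" + tail[1:]            # hypothetical single substitution
    pct = percent_identity(candidate, tail)
    print(round(pct, 2), highest_tier(pct))
```

With a 26-residue tail, a single substitution still satisfies the 95% tier, while two substitutions (24 of 26 identical, about 92.3%) fall to the 90% tier under this simplified measure.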
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P7, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV corresponding to amino acids 1 - 90 of T59832_P7, and a second amino acid sequence being at least 90% homologous to MEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVC MEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYV PWVTVNGVRIFLALSLTLIVPWSQGWTRQRDQR corresponding to amino acids 1 - 148 of BAC85622, which also corresponds to amino acids 91 - 238 of T59832_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of T59832_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV of T59832_P7.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P7, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 1 - 212 of Q8WU77, which also corresponds to amino acids 1 - 212 of T59832_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRIFLALSLTLIVPWSQGWTRQRDQR corresponding to amino acids 213 - 238 of T59832_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTLIVPWSQGWTRQRDQR in T59832_P7.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P9, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 12 - 214 of GILT_HUMAN, which also corresponds to amino acids 1 - 203 of T59832_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204 - 244 of T59832_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P9, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 1 - 203 of BAC98466, which also corresponds to amino acids 1 - 203 of T59832_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204 - 244 of T59832_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P9, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV corresponding to amino acids 1 - 90 of T59832_P9, a second amino acid sequence being at least 90% homologous to MEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVC MEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 1 - 113 of BAC85622, which also corresponds to amino acids 91 - 203 of T59832_P9, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204 - 244 of T59832_P9, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV of T59832_P9.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P9, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 1 - 203 of Q8WU77, which also corresponds to amino acids 1 - 203 of T59832_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204 - 244 of T59832_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P12, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVE corresponding to amino acids 12 - 141 of GILT_HUMAN, which also corresponds to amino acids 1 - 130 of T59832_P12, and a second amino acid sequence being at least 90% homologous to CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLED QTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 173 - 261 of GILT_HUMAN, which also corresponds to amino acids 131 - 219 of T59832_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T59832_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130; and ending at any of amino acid numbers 131 + ((n-2) - x), in which x varies from 0 to n-2.
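The junction EC arises because the two segments of T59832_P12 come from non-adjacent stretches of GILT_HUMAN. The following Python sketch, offered only as an illustration, maps a variant position back to the known protein using the coordinate ranges stated in the text (1 - 130 from 12 - 141, and 131 - 219 from 173 - 261). The names `SEGMENTS` and `variant_to_parent` are hypothetical.

```python
# (variant_start, variant_end, parent_start, parent_end), 1-based inclusive,
# as stated in the text for T59832_P12 relative to GILT_HUMAN.
SEGMENTS = [
    (1, 130, 12, 141),
    (131, 219, 173, 261),
]


def variant_to_parent(pos, segments=SEGMENTS):
    """Map a 1-based position in the variant to the matching parent position."""
    for v_start, v_end, p_start, _p_end in segments:
        if v_start <= pos <= v_end:
            return p_start + (pos - v_start)
    raise ValueError(f"position {pos} is not covered by any segment")


if __name__ == "__main__":
    left = variant_to_parent(130)   # last residue before the junction
    right = variant_to_parent(131)  # first residue after the junction
    print(left, right, right - left - 1)
```

Under these coordinates, residues 130 and 131 of the variant map to positions 141 and 173 of the known protein, so 31 residues (142 - 172) are absent from the variant, and any edge portion spanning 130/131 is specific to the variant.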
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P12, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV corresponding to amino acids 1 - 90 of T59832_P12, a second amino acid sequence being at least 90% homologous to MEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVE corresponding to amino acids 1 - 40 of BAC85622, which also corresponds to amino acids 91 - 130 of T59832_P12, a third amino acid sequence being at least 90% homologous to CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 72 - 122 of BAC85622, which also corresponds to amino acids 131 - 181 of T59832_P12, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 182 - 219 of T59832_P12, wherein said first, second, third and fourth amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of T59832_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV of T59832_P12.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T59832_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130; and ending at any of amino acid numbers 131 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T59832_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
KPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK in T59832_P12.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P12, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVE corresponding to amino acids 1 - 130 of Q8WU77, which also corresponds to amino acids 1 - 130 of T59832_P12, and a second amino acid sequence being at least 90% homologous to CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLED QTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 162 - 250 of Q8WU77, which also corresponds to amino acids 131 - 219 of T59832_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T59832_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130; and ending at any of amino acid numbers 131 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P18, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 12 - 55 of GILT_HUMAN, which also corresponds to amino acids 1 - 44 of T59832_P18, and a second amino acid sequence being at least 90% homologous to
CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLED QTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 173 - 261 of GILT_HUMAN, which also corresponds to amino acids 45 - 133 of T59832_P18, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T59832_P18, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44; and ending at any of amino acid numbers 45 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P18, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 1 - 44 of Q8WU77, which also corresponds to amino acids 1 - 44 of T59832_P18, and a second amino acid sequence being at least 90% homologous to CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLED QTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 162 - 250 of Q8WU77, which also corresponds to amino acids 45 - 133 of T59832_P18, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T59832_P18, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44; and ending at any of amino acid numbers 45 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T59832_P18, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 1 - 44 of Q8NEI4, which also corresponds to amino acids 1 - 44 of T59832_P18, and a second amino acid sequence being at least 90% homologous to
CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLED QTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 162 - 250 of Q8NEI4, which also corresponds to amino acids 45 - 133 of T59832_P18, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T59832_P18, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44; and ending at any of amino acid numbers 45 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMGRP5E_P4, comprising a first amino acid sequence being at least 90% homologous to MRGSELPLVLLALVLCLAPRGPAVPLPAGGGTNLTKMYPRGNIHWAVGHLMGKKSTG ESSSVSERGSLKQQLREYIRWEEA, RNLLGLEAKENRNHQPPQPKALGNQQPSWDSED SSNFKDVGSKGK corresponding to amino acids 1 - 127 of GRP_HUMAN, which also corresponds to amino acids 1 - 127 of HUMGRP5E_P4, and a second amino acid sequence being at least 90% homologous to GSQREGRNPQLNQQ corresponding to amino acids 135 - 148 of GRP_HUMAN, which also corresponds to amino acids 128 - 141 of HUMGRP5E_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
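The coordinate bookkeeping in these chimeric-polypeptide descriptions (segments of a known protein joined contiguously and renumbered on the variant) can be checked with a short sketch. This is an illustration only, not part of the described embodiments; the function names `variant_coordinates` and `junction_positions` are hypothetical, and coordinates are assumed to be 1-based and inclusive, as in the text.

```python
from typing import List, Tuple

# (start, end) on the known protein, 1-based and inclusive (assumed convention)
Segment = Tuple[int, int]


def variant_coordinates(segments: List[Segment]) -> List[Segment]:
    """Renumber known-protein segments, joined in order, onto the variant."""
    mapped = []
    next_start = 1
    for start, end in segments:
        if start < 1 or end < start:
            raise ValueError(f"invalid segment: {(start, end)}")
        length = end - start + 1
        mapped.append((next_start, next_start + length - 1))
        next_start += length
    return mapped


def junction_positions(segments: List[Segment]) -> List[int]:
    """Variant position of the last residue before each join."""
    return [end for _, end in variant_coordinates(segments)[:-1]]


# HUMGRP5E_P4 as described: residues 1-127 and 135-148 of GRP_HUMAN
print(variant_coordinates([(1, 127), (135, 148)]))  # [(1, 127), (128, 141)]
# T59832_P18 as described relative to GILT_HUMAN: residues 12-55 and 173-261
print(variant_coordinates([(12, 55), (173, 261)]))  # [(1, 44), (45, 133)]
```

The junction position returned for each variant (127 and 44 in these two cases) is the residue number that the corresponding edge-portion definition counts back from.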
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMGRP5E_P4, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KG, having a structure as follows: a sequence starting from any of amino acid numbers 127-x to 127; and ending at any of amino acid numbers 128 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMGRP5E_P5, comprising a first amino acid sequence being at least 90% homologous to MRGSELPLVLLALVLCLAPRGRA LPAGGGTVLTKMYPRGNHWAVGHLMGKKSTG ESSSVSERGSLKQQLREYIRWEEAARNLLGLEEAKENRNHQPPQPKALGNQQPSWDSED SSNFKDVGSKGK corresponding to amino acids 1 - 127 of GRP_HUMAN, which also corresponds to amino acids 1 - 127 of HUMGRP5E_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DSLLQVLNVKEGTPS corresponding to amino acids 128 - 142 of HUMGRP5E_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMGRP5E_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DSLLQVLNVKEGTPS in HUMGRP5E_P5.
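As a rough illustration of the tiered homology thresholds recited for a tail, the sketch below scores a candidate peptide against the HUMGRP5E_P5 tail by simple ungapped percent identity. This is an assumption made for illustration: this passage does not specify how homology is to be computed, and an alignment program with its own scoring parameters may well be intended. The names `percent_identity` and `thresholds_met` are hypothetical.

```python
TAIL = "DSLLQVLNVKEGTPS"          # tail of HUMGRP5E_P5 as given in the text
THRESHOLDS = (70, 80, 85, 90, 95)  # percentages recited in the text


def percent_identity(candidate: str, reference: str) -> float:
    """Ungapped, position-by-position identity (a simplified stand-in for homology)."""
    if not reference or len(candidate) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(candidate.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def thresholds_met(candidate: str, reference: str = TAIL) -> list:
    """Return the recited thresholds that the candidate reaches."""
    identity = percent_identity(candidate, reference)
    return [t for t in THRESHOLDS if identity >= t]


# One substitution in 15 residues is 14/15 identity, about 93.3%
print(thresholds_met("DSLLQVLNVKEGTPA"))  # [70, 80, 85, 90]
```

For a tail this short, each substitution costs several percentage points, so only a single mismatch is tolerated at the 90% tier and none at the 95% tier under this simplified measure.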
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for AA155578_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to
MRAPHLHLSAASGARALAKLLPLLMAQLWAAEAALLPQNDTRLDPEAYGAPCARGSQ PWQVSLFNGLSFHCAGVLVDQSWVLTAAHCGNKPLWARVGDDHLLLLQGEQLRRTT RSWΗPK ΗQGSGPΓLPRRTDEHDLMLLKLARP corresponding to amino acids 1 - 146 of KLKA_HUMAN, which also corresponds to amino acids 1 - 146 of AA155578_PEA_1_P4, and a second amino acid sequence being at least 90% homologous to
YNKGLTCSSITELSPKECEVFYPGWTNNMICAGLDRGQDPCQSDSGGPLVCDETLQGEL SWG WTCGS AQHPA TQICKYMSWINKVIRSN corresponding to amino acids 184 - 276 of KLKA_HUMAN, which also corresponds to amino acids 147 - 239 of AA155578_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of AA155578_PEA_1_P4, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PY, having a structure as follows: a sequence starting from any of amino acid numbers 146-x to 146; and ending at any of amino acid numbers 147 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for AA155578_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MPAPHLHLSAASGARALAKLLPLL LAQLW corresponding to amino acids 1 - 29 of
KLKA_HUMAN, which also corresponds to amino acids 1 - 29 of AA155578_PEA_1_P6, and a second amino acid sequence being at least 90% homologous to
VKYNKGLTCSSITILSPKECE YPGWTNNMICAGLDRGQDPCQSDSGGPLVCDETLQ GILSWGVYPCGSAQHPAVYTQICKYMSWINKVIRSN corresponding to amino acids 182 - 276 of KLKA_HUMAN, which also corresponds to amino acids 30 - 124 of AA155578_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of AA155578_PEA_1_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise WN, having a structure as follows: a sequence starting from any of amino acid numbers 29-x to 29; and ending at any of amino acid numbers 30 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for AA155578_PEA_1_P8, comprising a first amino acid sequence being at least 90% homologous to MRAPHLHLSAASGARALAKLLPLLMAQLW corresponding to amino acids 1 - 29 of KLKA_HUMAN, which also corresponds to amino acids 1 - 29 of AA155578_PEA_1_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GHCGLE corresponding to amino acids 30 - 35 of AA155578_PEA_1_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of AA155578_PEA_1_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GHCGLE in AA155578_PEA_1_P8. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for AA155578_PEA_1_P9, comprising a first amino acid sequence being at least 90% homologous to MRAPHLHLSAASGARALAKLLPLLMAQLWAAEAALLPQNDTRLDPEAYGAPCARGSQ PWQVSLFNGLSFHCAGVLVDQSWVLTAAHCGNK corresponding to amino acids 1 - 90 of KLKA_HUMAN, which also corresponds to amino acids 1 - 90 of AA155578_PEA_1_P9. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSENA78_P2, comprising a first amino acid sequence being at least 90% homologous to
MSLLSSRAARVPGPSSSLCALLVLLLLLTQPGPIASAGPAAAVLRELRCVCLQTTQGVHP KJVπSNLQVFAIGPQCSKVE W corresponding to amino acids 1 - 81 of SZ05_HUMAN, which also corresponds to amino acids 1 - 81 of HSENA78_P2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T94936_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to MMLHSALGLCLLLVTVSSNLAIAIKKEKRPPQTLSRGWGDDITWNQTYEEGLFYAQKS KLKPLMVIHHLEDCQYSQALKXWAQNEEIQEMAQNI^IML
VPRIMFVDPSLTVRADIAGRYSNRLYTYEPRDLPL corresponding to amino acids 1 - 150 of Q8TD06, which also corresponds to amino acids 1 - 150 of T94936_PEA_1_P2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T94936_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to
MMLHSALGLCLLLVTVSSNLAIAIKKEKRPPQTLSRGWGDDITWVQTYEEGLFYAQKS Kl^LMVIHHLEDCQYSQALKKWAQNEEIQEMAQN PlMLNLMHETTDKNLSPDGQY VPRIMFV corresponding to amino acids 1 - 122 of Q8TD06, which also corresponds to amino acids 1 - 122 of T94936_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GMYNISFHQIYKISRNQHSCFYF corresponding to amino acids 123 - 145 of T94936_PEA_1_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T94936_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GMYNISFHQIYKISRNQHSCFYF in T94936_PEA_1_P3. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z41644_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to
MPJXAAALLLLLLALYTARVDGSKCKCSRKGPKmYSDVKKLEMKPKYPHCEEKM TTKSVSRYRGQEHCLHPKLQSTT PJ^IXWYNAWNEKRR corresponding to amino acids 1 -
95 of SZ14_HUMAN, which also corresponds to amino acids 1 - 95 of Z41644_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI corresponding to amino acids 96 - 123 of Z41644_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z41644_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI in Z41644_PEA_1_P10. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z41644_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MRLLAAA LLLLLALYTARVDGSKCKCSRKGPKmYSDVKKLEMKPKYPHCEEKMVII TTKSVSRYRGQEHCLHPKLQSTKPvFKWYNAWNEKRR corresponding to amino acids 13 - 107 of Q9NS21, which also corresponds to amino acids 1 - 95 of Z41644_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI corresponding to amino acids
96 - 123 of Z41644_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z41644_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI in Z41644_PEA_1_P10. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z41644_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to
MPvLLAAAI.LLLLLALYTARVDGSKCKCSRKGPKmYSDVKJ<XEMl PKY?HCEEKMVII TTKSVSRYRGQEHCLHPKLQSTKrølKWYNAWNEKRR corresponding to amino acids 13 - 107 of AAQ89265, which also corresponds to amino acids 1 - 95 of Z41644_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI corresponding to amino acids 96 - 123 of Z41644_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z41644_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI in Z41644_PEA_1_P10. According to preferred embodiments of the present invention, there is provided an isolated oligonucleotide, comprising an amplicon selected from the group consisting of SEQ ID NOs: 891 or 894. According to preferred embodiments of the present invention, there is provided a primer pair, comprising a pair of isolated oligonucleotides capable of amplifying the above. Optionally, the pair of isolated oligonucleotides is selected from the group consisting of: SEQ NOs 889 and 890; or 892 and 893. According to preferred embodiments of the present invention, there is provided an antibody capable of specifically binding to an epitope of an amino acid sequence as described herein. Optionally, the epitope may comprise a tail, head, or edge portion as described herein. According to preferred embodiments of the present invention, the antibody is capable of differentiating between a splice variant having said epitope and a corresponding known protein.
According to preferred embodiments of the present invention, there is provided a kit for detecting breast cancer, comprising a kit detecting overexpression of a splice variant as described herein. Optionally, the kit comprises a NAT-based technology. Preferably, the kit further comprises at least one primer pair capable of selectively hybridizing to a nucleic acid sequence as described herein. Optionally, the kit further comprises at least one oligonucleotide capable of selectively hybridizing to a nucleic acid sequence as described herein. Optionally, the kit comprises an antibody as described herein. Preferably, the kit further comprises at least one reagent for performing an ELISA or a Western blot. According to preferred embodiments of the present invention, there is provided a method for detecting breast cancer, comprising detecting overexpression of a splice variant as described herein. Optionally, detecting overexpression is performed with a NAT-based technology. Preferably, detecting overexpression is performed with an immunoassay. More preferably, the immunoassay comprises an antibody as described herein. According to preferred embodiments of the present invention, there is provided a biomarker capable of detecting breast cancer, comprising any of the above nucleic acid sequences or a fragment thereof, or any of the above amino acid sequences or a fragment thereof. According to preferred embodiments of the present invention, preferably any of the above nucleic acid and/or amino acid sequences further comprises any sequence having at least about 70%, preferably at least about 80%, more preferably at least about 90%, most preferably at least about 95% homology thereto. Unless otherwise noted, all experimental data relates to variants of the present invention, named according to the segment being tested (as expression was tested through RT-PCR as described).
All nucleic acid sequences and/or amino acid sequences shown herein as embodiments of the present invention relate to their isolated form, as isolated polynucleotides (including for all transcripts), oligonucleotides (including for all segments, amplicons and primers), peptides (including for all tails, bridges, insertions or heads, optionally including other antibody epitopes as described herein) and/or polypeptides (including for all proteins). It should be noted that oligonucleotide and polynucleotide, or peptide and polypeptide, may optionally be used interchangeably. Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). All of these are hereby incorporated by reference as if fully set forth herein. As used herein, the following terms have the meanings ascribed to them unless specified otherwise.
BRIEF DESCRIPTION OF DRAWINGS
Figure 1 is a schematic summary of the cancer biomarkers selection engine and the wet validation stages.
Figure 2 is a schematic illustration depicting grouping of transcripts of a given cluster based on presence or absence of unique sequence regions.
Figure 3 is a schematic summary of quantitative real-time PCR analysis.
Figure 4 is a schematic presentation of the oligonucleotide based microarray fabrication.
Figure 5 is a schematic summary of the oligonucleotide based microarray experimental flow.
Figure 6 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster T10888, demonstrating overexpression in colorectal cancer, a mixture of malignant tumors from different tissues, pancreas carcinoma and gastric carcinoma.
Figure 7 is a histogram showing expression of the CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 (T10888) transcripts, which are detectable by amplicon as depicted in sequence name T10888junc11-17, in normal and cancerous breast tissues.
Figure 8 is a histogram showing the expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 T10888 transcripts, which are detectable by amplicon as depicted in sequence name T10888junc11-17, in different normal tissues.
Figure 9 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster T39971, demonstrating overexpression in liver cancer, lung malignant tumors and pancreas carcinoma.
Figure 10 is a histogram showing the expression of VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein) T39971 transcripts, which are detectable by amplicon as depicted in sequence name T39971junc23-33, in normal and cancerous breast tissues.
Figure 11 is a histogram showing the expression of VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein), antisense to SARM1 (T23434), T39971 transcripts, which are detectable by amplicon as depicted in sequence name T39971junc23-33, in different normal tissues.
Figure 12 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster Z21368, demonstrating overexpression in epithelial malignant tumors, a mixture of malignant tumors from different tissues and pancreas carcinoma.
Figure 13 is a histogram showing the expression of SUL1_HUMAN - Extracellular sulfatase Sulf-1 Z21368 transcripts, which are detectable by amplicon as depicted in sequence name Z21368seg39, in normal and cancerous breast tissues.
Figure 14 is a histogram showing the expression of SUL1_HUMAN - Extracellular sulfatase Sulf-1 Z21368 transcripts, which are detectable by amplicon as depicted in sequence name Z21368seg39, in different normal tissues.
Figure 15 is a histogram showing the expression of SUL1_HUMAN - Extracellular sulfatase Sulf-1 Z21368 transcripts, which are detectable by amplicon as depicted in sequence name Z21368junc17-21, in normal and cancerous breast tissues.
Figure 16 is a histogram showing the expression of SUL1_HUMAN - Extracellular sulfatase Sulf-1 Z21368 transcripts, which are detectable by amplicon as depicted in sequence name Z21368junc17-21, in different normal tissues.
Figure 17 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster T59832, demonstrating overexpression in brain malignant tumors, breast malignant tumors, ovarian carcinoma and pancreas carcinoma.
Figure 18 is a histogram showing low overexpression observed for cluster T59832, amplicon name T59832 junc6-25-26, in one experiment carried out with a breast cancer samples panel.
Figure 19 is a histogram showing the expression of GRP_HUMAN - gastrin-releasing peptide (HUMGRP5E) transcripts, which are detectable by amplicon as depicted in sequence name HUMGRP5Ejunc3-7, in normal and cancerous breast tissues.
Figure 20 is a histogram showing the expression of GRP_HUMAN - gastrin-releasing peptide (HUMGRP5E) transcripts, which are detectable by amplicon as depicted in sequence name HUMGRP5Ejunc3-7, in different normal tissues.
Figure 21 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster AA155578, demonstrating overexpression in epithelial malignant tumors, a mixture of malignant tumors from different tissues and pancreas carcinoma.
Figure 22 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster HSENA78, demonstrating overexpression in epithelial malignant tumors and lung malignant tumors.
Figure 23 is a histogram showing the expression of Homo sapiens breast cancer membrane protein 11 (BCMP11) T94936 transcripts, which are detectable by amplicon as depicted in sequence name T94936 seg14, in normal and cancerous breast tissues.
Figure 24 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster Z41644, demonstrating overexpression in lung malignant tumors, breast malignant tumors and pancreas carcinoma.
Figure 25 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster M85491, demonstrating overexpression in epithelial malignant tumors and a mixture of malignant tumors from different tissues.
Figure 26 is a histogram showing the expression of Ephrin type-B receptor 2 precursor (EC 2.7.1.112) (Tyrosine-protein kinase receptor EPH3) M85491 transcripts, which are detectable by amplicon as depicted in sequence name M85491seg24, in normal and cancerous breast tissues.
Figure 27 is a histogram showing the expression of Ephrin type-B receptor 2 precursor M85491 transcripts, which are detectable by amplicon as depicted in sequence name M85491seg24, in different normal tissues.
Figure 28 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster HSSTROL3, demonstrating overexpression in transitional cell carcinoma, epithelial malignant tumors, a mixture of malignant tumors from different tissues and pancreas carcinoma.
Figure 29A is a histogram showing the expression of Stromelysin-3 precursor (EC 3.4.24.-) (Matrix metalloproteinase-11) (MMP-11) (ST3) (SL-3) HSSTROL3 transcripts, which are detectable by amplicon as depicted in sequence name HSSTROL3 seg24, in normal and cancerous breast tissues.
Figure 29B is a histogram showing the expression of Stromelysin-3 precursor (EC 3.4.24.-) (Matrix metalloproteinase-11) (MMP-11) (ST3) (SL-3) HSSTROL3 transcripts, which are detectable by amplicon as depicted in sequence name HSSTROL3 seg24, in different normal tissues.
Figures 30A-30C show histograms showing overexpression of various Stromelysin-3 precursor transcripts in cancerous breast samples relative to the normal samples.
Figure 31 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster R75793, demonstrating overexpression in epithelial malignant tumors and a mixture of malignant tumors from different tissues.
Figure 32 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster HUMCAIXIA, demonstrating overexpression in bone malignant tumors, epithelial malignant tumors, a mixture of malignant tumors from different tissues and lung malignant tumors.
Figure 33 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster R20779, demonstrating overexpression in epithelial malignant tumors, a mixture of malignant tumors from different tissues and lung malignant tumors.
Figure 34 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster HSS100PCB, demonstrating overexpression in a mixture of malignant tumors from different tissues.
Figure 35 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster HSCOC4, demonstrating overexpression in brain malignant tumors, a mixture of malignant tumors from different tissues, breast malignant tumors, pancreas carcinoma and prostate cancer.
Figure 36 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster HUMTREFAC, demonstrating overexpression in a mixture of malignant tumors from different tissues, breast malignant tumors, pancreas carcinoma and prostate cancer.
Figure 37 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster HUMOSTRO, demonstrating overexpression in epithelial malignant tumors, a mixture of malignant tumors from different tissues, lung malignant tumors, breast malignant tumors, ovarian carcinoma and skin malignancies.
Figure 38 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster R11723, demonstrating overexpression in epithelial malignant tumors, a mixture of malignant tumors from different tissues and kidney malignant tumors.
Figure 39 is a histogram showing the expression of R11723 transcripts, which are detectable by amplicon as depicted in sequence name R11723seg13, in normal and cancerous breast tissues.
Figure 40 is a histogram showing the expression of R11723 transcripts, which are detectable by amplicon as depicted in sequence name R11723seg13, in different normal tissues.
Figures 41A and B are histograms showing the expression of R11723 transcripts, which are detectable by amplicon as depicted in sequence name R11723junc11-18, in normal and cancerous breast tissues (Figure 41A) or on a panel of normal tissues (Figure 41B).
Figure 42 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster T46984, demonstrating overexpression in epithelial malignant tumors, a mixture of malignant tumors from different tissues, breast malignant tumors, ovarian carcinoma and pancreas carcinoma.
Figure 43 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster HSMUC1A, demonstrating overexpression in a mixture of malignant tumors from different tissues, breast malignant tumors, pancreas carcinoma and prostate cancer.
Figures 44-47 are histograms showing the combined expression of 8 sequences (T10888seg11-17, HUMGR5E junc3-7, HSSTROL3seg24, T94936 Seg 14, Z21368 seg39, Z21368 junc17-21, T59832 jun6-25-26 and M85491seg24) in normal and cancerous breast tissues.
Figure 48 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster HSU33147, demonstrating overexpression in a mixture of malignant tumors from different tissues.
DESCRIPTION OF PREFERRED EMBODIMENTS
The present invention is of novel markers for breast cancer that are both sensitive and accurate. Furthermore, at least certain of these markers are able to distinguish between different stages of breast cancer, such as (1) ductal carcinoma (in situ, invasive); (2) lobular carcinoma (in situ, invasive); (3) inflammatory breast cancer; (4) mucinous carcinoma; (5) tubular carcinoma; and (6) Paget's disease of the nipple, alone or in combination; or one of the indicative conditions described above. The markers of the present invention, alone or in combination, can be used for prognosis, prediction, screening, early diagnosis, staging, therapy selection and treatment monitoring of breast cancer.
For example, optionally and preferably, these markers may be used for staging breast cancer and/or monitoring the progression of the disease. Furthermore, the markers of the present invention, alone or in combination, can be used for detection of the source of metastasis found in anatomical places other than the breast. Also, one or more of the markers may optionally be used in combination with one or more other breast cancer markers (other than those described herein). Biomolecular sequences (amino acid and/or nucleic acid sequences) uncovered using the methodology of the present invention and described herein can be efficiently utilized as tissue or pathological markers and/or as drugs or drug targets for treating or preventing a disease. These markers are specifically released to the bloodstream under conditions of breast cancer (or one of the above indicative conditions), and/or are otherwise expressed at a much higher level and/or specifically expressed in breast cancer tissue or cells, and/or tissue or cells under one of the above indicative conditions. The measurement of these markers, alone or in combination, in patient samples provides information that the diagnostician can correlate with a probable diagnosis of breast cancer and/or a condition that is indicative of a higher risk for breast cancer. The present invention therefore also relates to diagnostic assays for breast cancer and/or an indicative condition, and methods of use of such markers for detection of breast cancer and/or an indicative condition, optionally and preferably in a sample taken from a subject (patient), which is more preferably some type of blood sample. According to a preferred embodiment of the present invention, use of the marker optionally and preferably permits a non-cancerous breast disease state to be distinguished from breast cancer and/or an indicative condition. A non-limiting example of a non-cancerous breast disease state is breast fibrosis and/or cysts.
According to another prefeπed embodiment of the present invention, use of the marker optionally and preferably permits an indicative condition to be distinguished from breast cancer. In another embodiment, the present invention relates to bridges, tails, heads and/or insertions, and/or analogs, homologs and derivatives of such peptides. Such bridges, tails, heads and/or insertions are described in greater detail below with regard to the Examples. As used herein a "tail" refers to a peptide sequence at the end of an amino acid sequence that is unique to a splice variant according to the present invention. Therefore, a splice variant having such a tail may optionally be considered as a chimera, in that at least a first portion of the splice variant is typically highly homologous (ofien 100%> identical) to a portion of the coπesponding known protein, while at least a second portion of the variant comprises the tail. As used herein a "head" refers to a peptide sequence at the beginning of an amino acid sequence that is unique to a splice variant according to the present invention. Therefore, a splice variant having such a head may optionally be considered as a chimera, in that at least a first portion of the splice variant comprises the head, while at least a second portion is typically highly homologous (often 100%) identical) to a portion of the coπesponding known protein. As used herein "an edge portion" refers to a connection between two portions of a splice variant according to the present invention that were not joined in the wild type or known protein. An edge may optionally arise due to a join between the above "known protein" portion of a variant and the tail, for example, and/or may occur if an internal portion of the wild type sequence is no longer present, such that two portions of the sequence are now contiguous in the splice variant that were not contiguous in the known protein. 
A "bridge" may optionally be an edge portion as described above, but may also include a join between a head and a "known protein" portion of a variant, or a join between a tail and a "known protein" portion of a variant, or a join between an insertion and a "known protein" portion of a variant. Optionally and preferably, a bridge between a tail or a head or a unique insertion, and a "known protein" portion of a variant, comprises at least about 10 amino acids, more preferably at least about 20 amino acids, most preferably at least about 30 amino acids, and even more preferably at least about 40 amino acids, in which at least one amino acid is from the tail/head/insertion and at least one amino acid is from the "known protein" portion of a variant. Also optionally, the bridge may comprise any number of amino acids from about 10 to about 40 amino acids (for example, 10, 11, 12, 13...37, 38, 39, 40 amino acids in length, or any number in between). It should be noted that a bridge cannot be extended beyond the length of the sequence in either direction, and it should be assumed that every bridge description is to be read in such manner that the bridge length does not extend beyond the sequence itself. Furthermore, bridges are described with regard to a sliding window in certain contexts below. 
For example, certain descriptions of the bridges feature the following format: a bridge between two edges (in which a portion of the known protein is not present in the variant) may optionally be described as follows: a bridge portion of CONTIG-NAME_P1 (representing the name of the protein), comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise XX (2 amino acids in the center of the bridge, one from each end of the edge), having a structure as follows (numbering according to the sequence of CONTIG-NAME_P1): a sequence starting from any of amino acid numbers 49-x to 49 (for example); and ending at any of amino acid numbers 50 + ((n-2) - x) (for example), in which x varies from 0 to n-2. In this example, it should also be read as including bridges in which n is any number of amino acids between 10-50 amino acids in length. Furthermore, the bridge polypeptide cannot extend beyond the sequence, so it should be read such that 49-x (for example) is not less than 1, nor 50 + ((n-2) - x) (for example) greater than the total sequence length.

In another embodiment, this invention provides antibodies specifically recognizing the splice variants and polypeptide fragments thereof of this invention. Preferably such antibodies differentially recognize splice variants of the present invention but do not recognize a corresponding known protein (such known proteins are discussed with regard to their splice variants in the Examples below).
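The sliding-window bridge numbering described above (a start position anywhere from 49-x to 49 and an end position of 50 + ((n-2) - x), with x from 0 to n-2) can be illustrated with a minimal sketch. The function name `bridge_windows`, the parameter `left_edge`, and the placeholder sequence are hypothetical and serve only to illustrate the arithmetic; positions are 1-based, as in the text, and windows that would run past either end of the sequence are skipped.

```python
def bridge_windows(sequence, left_edge, n):
    """Enumerate every length-n window that spans the edge between
    residues `left_edge` and `left_edge + 1` (1-based numbering).

    Follows the format in the text: start = left_edge - x and
    end = (left_edge + 1) + ((n - 2) - x), for x = 0 .. n-2.
    Windows extending beyond the sequence are not returned.
    """
    if n < 2:
        raise ValueError("a bridge needs at least one residue from each side")
    windows = []
    for x in range(n - 1):  # x = 0 .. n-2
        start = left_edge - x
        end = (left_edge + 1) + ((n - 2) - x)
        if start < 1 or end > len(sequence):
            continue  # the bridge cannot extend beyond the sequence
        # Convert 1-based inclusive coordinates to a Python slice.
        windows.append((start, end, sequence[start - 1:end]))
    return windows


if __name__ == "__main__":
    # Hypothetical 100-residue placeholder sequence; edge between 49 and 50.
    protein = "A" * 49 + "G" * 51
    for start, end, peptide in bridge_windows(protein, left_edge=49, n=10):
        print(start, end, peptide)
```

Each returned window has exactly n residues and always contains both residues flanking the edge, which is the property the text relies on when it requires at least one amino acid from each side of the join.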
In another embodiment, this invention provides an isolated nucleic acid molecule encoding for a splice variant according to the present invention, having a nucleotide sequence as set forth in any one of the sequences listed herein, or a sequence complementary thereto. In another embodiment, this invention provides an isolated nucleic acid molecule, having a nucleotide sequence as set forth in any one of the sequences listed herein, or a sequence complementary thereto. In another embodiment, this invention provides an oligonucleotide of at least about 12 nucleotides, specifically hybridizable with the nucleic acid molecules of this invention. In another embodiment, this invention provides vectors, cells, liposomes and compositions comprising the isolated nucleic acids of this invention. In another embodiment, this invention provides a method for detecting a splice variant according to the present invention in a biological sample, comprising: contacting a biological sample with an antibody specifically recognizing a splice variant according to the present invention under conditions whereby the antibody specifically interacts with the splice variant in the biological sample but does not recognize known corresponding proteins (wherein the known protein is discussed with regard to its splice variant(s) in the Examples below), and detecting said interaction; wherein the presence of an interaction correlates with the presence of a splice variant in the biological sample. In another embodiment, this invention provides a method for detecting a splice variant nucleic acid sequence in a biological sample, comprising: hybridizing the isolated nucleic acid molecules or oligonucleotide fragments of at least about a minimum length to a nucleic acid material of a biological sample and detecting a hybridization complex; wherein the presence of a hybridization complex correlates with the presence of a splice variant nucleic acid sequence in the biological sample.
According to the present invention, the splice variants described herein are non-limiting examples of markers for diagnosing breast cancer and/or an indicative condition. Each splice variant marker of the present invention can be used alone or in combination, for various uses, including but not limited to, prognosis, prediction, screening, early diagnosis, determination of progression, therapy selection and treatment monitoring of breast cancer and/or an indicative condition, including a transition from an indicative condition to breast cancer. According to optional but preferred embodiments of the present invention, any marker according to the present invention may optionally be used alone or in combination. Such a combination may optionally comprise a plurality of markers described herein, optionally including any subcombination of markers, and/or a combination featuring at least one other marker, for example a known marker. Furthermore, such a combination may optionally and preferably be used as described above with regard to determining a ratio between a quantitative or semi-quantitative measurement of any marker described herein to any other marker described herein, and/or any other known marker, and/or any other marker. With regard to such a ratio between any marker described herein (or a combination thereof) and a known marker, more preferably the known marker comprises the "known protein" as described in greater detail below with regard to each cluster or gene. According to other preferred embodiments of the present invention, a splice variant protein or a fragment thereof, or a splice variant nucleic acid sequence or a fragment thereof, may be featured as a biomarker for detecting breast cancer and/or an indicative condition, such that a biomarker may optionally comprise any of the above.
According to still other preferred embodiments, the present invention optionally and preferably encompasses any amino acid sequence or fragment thereof encoded by a nucleic acid sequence corresponding to a splice variant protein as described herein. Any oligopeptide or peptide relating to such an amino acid sequence or fragment thereof may optionally also (additionally or alternatively) be used as a biomarker, including but not limited to the unique amino acid sequences of these proteins that are depicted as tails, heads, insertions, edges or bridges. The present invention also optionally encompasses antibodies capable of recognizing, and/or being elicited by, such oligopeptides or peptides. The present invention also optionally and preferably encompasses any nucleic acid sequence or fragment thereof, or amino acid sequence or fragment thereof, corresponding to a splice variant of the present invention as described above, optionally for any application. Non-limiting examples of methods or assays are described below. The present invention also relates to kits based upon such diagnostic methods or assays.
Nucleic acid sequences and Oligonucleotides

Various embodiments of the present invention encompass nucleic acid sequences described hereinabove; fragments thereof, sequences hybridizable therewith, sequences homologous thereto, sequences encoding similar polypeptides with different codon usage, altered sequences characterized by mutations, such as deletion, insertion or substitution of one or more nucleotides, either naturally occurring or artificially induced, either randomly or in a targeted fashion. The present invention encompasses nucleic acid sequences described herein; fragments thereof, sequences hybridizable therewith, sequences homologous thereto [e.g., at least 50 %, at least 55 %, at least 60 %, at least 65 %, at least 70 %, at least 75 %, at least 80 %, at least 85 %, at least 95 % or more, say 100 %, identical to the nucleic acid sequences set forth below], sequences encoding similar polypeptides with different codon usage, altered sequences characterized by mutations, such as deletion, insertion or substitution of one or more nucleotides, either naturally occurring or man-induced, either randomly or in a targeted fashion. The present invention also encompasses homologous nucleic acid sequences (i.e., which form a part of a polynucleotide sequence of the present invention) which include sequence regions unique to the polynucleotides of the present invention. In cases where the polynucleotide sequences of the present invention encode previously unidentified polypeptides, the present invention also encompasses novel polypeptides or portions thereof, which are encoded by the isolated polynucleotide and respective nucleic acid fragments thereof described hereinabove. The terms "nucleic acid fragment", "oligonucleotide" and "polynucleotide" are used herein interchangeably to refer to a polymer of nucleic acids.
A polynucleotide sequence of the present invention refers to a single or double stranded nucleic acid sequence which is isolated and provided in the form of an RNA sequence, a complementary polynucleotide sequence (cDNA), a genomic polynucleotide sequence and/or a composite polynucleotide sequence (e.g., a combination of the above). As used herein the phrase "complementary polynucleotide sequence" refers to a sequence which results from reverse transcription of messenger RNA using a reverse transcriptase or any other RNA dependent DNA polymerase. Such a sequence can be subsequently amplified in vivo or in vitro using a DNA dependent DNA polymerase. As used herein the phrase "genomic polynucleotide sequence" refers to a sequence derived (isolated) from a chromosome and thus it represents a contiguous portion of a chromosome. As used herein the phrase "composite polynucleotide sequence" refers to a sequence which is composed of genomic and cDNA sequences. A composite sequence can include some exonal sequences required to encode the polypeptide of the present invention, as well as some intronic sequences interposing therebetween. The intronic sequences can be of any source, including of other genes, and typically will include conserved splicing signal sequences. Such intronic sequences may further include cis acting expression regulatory elements. Preferred embodiments of the present invention encompass oligonucleotide probes.
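As a rough illustration of the identity thresholds listed above (at least 50 % up to 100 % identical), the following minimal sketch computes percent identity over two sequences that have already been aligned to equal length. This passage does not state how identity is to be computed, so the function name `percent_identity`, the use of "-" as a gap symbol, and the choice to count gap columns as mismatches are all assumptions made for illustration only.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Assumptions (not taken from the source text): gaps are written as '-',
    columns where both sequences have a gap are ignored, and a gap opposite
    a base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == "-" and base_b == "-":
            continue  # column carries no information
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Placeholder sequences, one mismatch in ten positions.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```

A sequence pair would then be compared against whichever threshold from the list is being applied, for example `percent_identity(a, b) >= 95`.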
An example of an oligonucleotide probe which can be utilized by the present invention is a single stranded polynucleotide which includes a sequence complementary to the unique sequence region of any variant according to the present invention, including but not limited to a nucleotide sequence coding for an amino acid sequence of a bridge, tail, head and/or insertion according to the present invention, and/or the equivalent portions of any nucleotide sequence given herein (including but not limited to a nucleotide sequence of a node, segment or amplicon described herein). Alternatively, an oligonucleotide probe of the present invention can be designed to hybridize with a nucleic acid sequence encompassed by any of the above nucleic acid sequences, particularly the portions specified above, including but not limited to a nucleotide sequence coding for an amino acid sequence of a bridge, tail, head and/or insertion according to the present invention, and/or the equivalent portions of any nucleotide sequence given herein (including but not limited to a nucleotide sequence of a node, segment or amplicon described herein). Oligonucleotides designed according to the teachings of the present invention can be generated according to any oligonucleotide synthesis method known in the art such as enzymatic synthesis or solid phase synthesis. Equipment and reagents for executing solid-phase synthesis are commercially available from, for example, Applied Biosystems. Any other means for such synthesis may also be employed; the actual synthesis of the oligonucleotides is well within the capabilities of one skilled in the art and can be accomplished via established methodologies as detailed in, for example, "Molecular Cloning: A Laboratory Manual" Sambrook et al., (1989); "Current Protocols in Molecular Biology" Volumes I-III Ausubel, R. M., ed.
(1994); Ausubel et al., "Current Protocols in Molecular Biology", John Wiley and Sons, Baltimore, Maryland (1989); Perbal, "A Practical Guide to Molecular Cloning", John Wiley & Sons, New York (1988) and "Oligonucleotide Synthesis" Gait, M. J., ed. (1984) utilizing solid phase chemistry, e.g. cyanoethyl phosphoramidite followed by deprotection, desalting and purification by, for example, an automated trityl-on method or HPLC. Oligonucleotides used according to this aspect of the present invention are those having a length selected from a range of about 10 to about 200 bases, preferably about 15 to about 150 bases, more preferably about 20 to about 100 bases, most preferably about 20 to about 50 bases. Preferably, the oligonucleotide of the present invention features at least 17, at least 18, at least 19, at least 20, at least 22, at least 25, at least 30 or at least 40 bases specifically hybridizable with the biomarkers of the present invention. The oligonucleotides of the present invention may comprise heterocyclic nucleosides consisting of purine and pyrimidine bases, bonded in a 3' to 5' phosphodiester linkage. Preferably used oligonucleotides are those modified at one or more of the backbone, internucleoside linkages or bases, as is broadly described hereinunder. Specific examples of preferred oligonucleotides useful according to this aspect of the present invention include oligonucleotides containing modified backbones or non-natural internucleoside linkages. Oligonucleotides having modified backbones include those that retain a phosphorus atom in the backbone, as disclosed in U.S. Pat. Nos. 4,469,863; 4,476,301; 5,023,243; 5,177,196; 5,188,897; 5,264,423; 5,276,019; 5,278,302; 5,286,717; 5,321,131; 5,399,676; 5,405,939; 5,453,496; 5,455,233; 5,466,677; 5,476,925; 5,519,126; 5,536,821; 5,541,306; 5,550,111; 5,563,253; 5,571,799; 5,587,361; and 5,625,050.
Preferred modified oligonucleotide backbones include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkyl phosphotriesters, methyl and other alkyl phosphonates including 3'-alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates including 3'-amino phosphoramidate and aminoalkylphosphoramidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, and boranophosphates having normal 3'-5' linkages, 2'-5' linked analogs of these, and those having inverted polarity wherein the adjacent pairs of nucleoside units are linked 3'-5' to 5'-3' or 2'-5' to 5'-2'. Various salts, mixed salts and free acid forms can also be used. Alternatively, modified oligonucleotide backbones that do not include a phosphorus atom therein have backbones that are formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatomic or heterocyclic internucleoside linkages. These include those having morpholino linkages (formed in part from the sugar portion of a nucleoside); siloxane backbones; sulfide, sulfoxide and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; alkene containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S and CH component parts, as disclosed in U.S. Pat. Nos. 5,034,506; 5,166,315; 5,185,444; 5,214,134; 5,216,141; 5,235,033; 5,264,562; 5,264,564; 5,405,938; 5,434,257; 5,466,677; 5,470,967; 5,489,677; 5,541,307; 5,561,225; 5,596,086; 5,602,240; 5,610,289; 5,602,240; 5,608,046; 5,610,289; 5,618,704; 5,623,070; 5,663,312; 5,633,360; 5,677,437; and 5,677,439.
Other oligonucleotides which can be used according to the present invention are those in which both the sugar and the internucleoside linkage, i.e., the backbone, of the nucleotide units are replaced with novel groups. The base units are maintained for complementation with the appropriate polynucleotide target. An example of such an oligonucleotide mimetic is peptide nucleic acid (PNA). United States patents that teach the preparation of PNA compounds include, but are not limited to, U.S. Pat. Nos. 5,539,082; 5,714,331; and 5,719,262, each of which is herein incorporated by reference. Other backbone modifications which can be used in the present invention are disclosed in U.S. Pat. No. 6,303,374. Oligonucleotides of the present invention may also include base modifications or substitutions. As used herein, "unmodified" or "natural" bases include the purine bases adenine (A) and guanine (G), and the pyrimidine bases thymine (T), cytosine (C) and uracil (U). Modified bases include but are not limited to other synthetic and natural bases such as 5-methylcytosine (5-me-C), 5-hydroxymethyl cytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl uracil and cytosine, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other 8-substituted adenines and guanines, 5-halo particularly 5-bromo, 5-trifluoromethyl and other 5-substituted uracils and cytosines, 7-methylguanine and 7-methyladenine, 8-azaguanine and 8-azaadenine, 7-deazaguanine and 7-deazaadenine and 3-deazaguanine and 3-deazaadenine.
Further bases particularly useful for increasing the binding affinity of the oligomeric compounds of the invention include 5-substituted pyrimidines, 6-azapyrimidines and N-2, N-6 and O-6 substituted purines, including 2-aminopropyladenine, 5-propynyluracil and 5-propynylcytosine. 5-methylcytosine substitutions have been shown to increase nucleic acid duplex stability by 0.6-1.2 °C and are presently preferred base substitutions, even more particularly when combined with 2-O-methoxyethyl sugar modifications. Another modification of the oligonucleotides of the invention involves chemically linking to the oligonucleotide one or more moieties or conjugates which enhance the activity, cellular distribution or cellular uptake of the oligonucleotide. Such moieties include but are not limited to lipid moieties such as a cholesterol moiety, cholic acid, a thioether, e.g., hexyl-S-tritylthiol, a thiocholesterol, an aliphatic chain, e.g., dodecandiol or undecyl residues, a phospholipid, e.g., di-hexadecyl-rac-glycerol or triethylammonium 1,2-di-O-hexadecyl-rac-glycero-3-H-phosphonate, a polyamine or a polyethylene glycol chain, or adamantane acetic acid, a palmityl moiety, or an octadecylamine or hexylamino-carbonyl-oxycholesterol moiety, as disclosed in U.S. Pat. No. 6,303,374. It is not necessary for all positions in a given oligonucleotide molecule to be uniformly modified, and in fact more than one of the aforementioned modifications may be incorporated in a single compound or even at a single nucleoside within an oligonucleotide. It will be appreciated that oligonucleotides of the present invention may include further modifications for more efficient use as diagnostic agents and/or to increase bioavailability and therapeutic efficacy and to reduce cytotoxicity.
To enable cellular expression of the polynucleotides of the present invention, a nucleic acid construct according to the present invention may be used, which includes at least a coding region of one of the above nucleic acid sequences, and further includes at least one cis acting regulatory element. As used herein, the phrase "cis acting regulatory element" refers to a polynucleotide sequence, preferably a promoter, which binds a trans acting regulator and regulates the transcription of a coding sequence located downstream thereto. Any suitable promoter sequence can be used by the nucleic acid construct of the present invention. Preferably, the promoter utilized by the nucleic acid construct of the present invention is active in the specific cell population transformed. Examples of cell type-specific and/or tissue-specific promoters include promoters such as albumin that is liver specific, lymphoid specific promoters [Calame et al., (1988) Adv. Immunol. 43:235-275]; in particular promoters of T-cell receptors [Winoto et al., (1989) EMBO J. 8:729-733] and immunoglobulins [Banerji et al. (1983) Cell 33:729-740], neuron-specific promoters such as the neurofilament promoter [Byrne et al. (1989) Proc. Natl. Acad. Sci. USA 86:5473-5477], pancreas-specific promoters [Edlund et al. (1985) Science 230:912-916] or mammary gland-specific promoters such as the milk whey promoter (U.S. Pat. No. 4,873,316 and European Application Publication No. 264,166). The nucleic acid construct of the present invention can further include an enhancer, which can be adjacent or distant to the promoter sequence and can function in up regulating the transcription therefrom. The nucleic acid construct of the present invention preferably further includes an appropriate selectable marker and/or an origin of replication. Preferably, the nucleic acid construct utilized is a shuttle vector, which can propagate both in E.
coli (wherein the construct comprises an appropriate selectable marker and origin of replication) and be compatible for propagation in cells, or integration in a gene and a tissue of choice. The construct according to the present invention can be, for example, a plasmid, a bacmid, a phagemid, a cosmid, a phage, a virus or an artificial chromosome. Examples of suitable constructs include, but are not limited to, pcDNA3, pcDNA3.1
(+/-), pGL3, pZeoSV2 (+/-), pDisplay, pEF/myc/cyto, pCMV/myc/cyto, each of which is commercially available from Invitrogen Co. (www.invitrogen.com). Examples of retroviral vector and packaging systems are those sold by Clontech, San Diego, Calif., including Retro-X vectors pLNCX and pLXSN, which permit cloning into multiple cloning sites and in which the transgene is transcribed from the CMV promoter. Vectors derived from Mo-MuLV are also included, such as pBabe, where the transgene will be transcribed from the 5' LTR promoter. Currently preferred in vivo nucleic acid transfer techniques include transfection with viral or non-viral constructs, such as adenovirus, lentivirus, Herpes simplex I virus, or adeno-associated virus (AAV) and lipid-based systems. Useful lipids for lipid-mediated transfer of the gene are, for example, DOTMA, DOPE, and DC-Chol [Tonkinson et al., Cancer Investigation, 14(1): 54-65 (1996)]. The most preferred constructs for use in gene therapy are viruses, most preferably adenoviruses, AAV, lentiviruses, or retroviruses. A viral construct such as a retroviral construct includes at least one transcriptional promoter/enhancer or locus-defining element(s), or other elements that control gene expression by other means such as alternate splicing, nuclear RNA export, or post-translational modification of messenger. Such vector constructs also include a packaging signal, long terminal repeats (LTRs) or portions thereof, and positive and negative strand primer binding sites appropriate to the virus used, unless it is already present in the viral construct. In addition, such a construct typically includes a signal sequence for secretion of the peptide from a host cell in which it is placed. Preferably the signal sequence for this purpose is a mammalian signal sequence or the signal sequence of the polypeptide variants of the present invention.
Optionally, the construct may also include a signal that directs polyadenylation, as well as one or more restriction sites and a translation termination sequence. By way of example, such constructs will typically include a 5' LTR, a tRNA binding site, a packaging signal, an origin of second-strand DNA synthesis, and a 3' LTR or a portion thereof. Other vectors can be used that are non-viral, such as cationic lipids, polylysine, and dendrimers.
Hybridization assays

Detection of a nucleic acid of interest in a biological sample may optionally be effected by hybridization-based assays using an oligonucleotide probe (non-limiting examples of probes according to the present invention were previously described). Traditional hybridization assays include PCR, RT-PCR, Real-time PCR, RNase protection, in-situ hybridization, primer extension, Southern blots (DNA detection), dot or slot blots (DNA, RNA), and Northern blots (RNA detection) (NAT type assays are described in greater detail below). More recently, PNAs have been described (Nielsen et al. 1999, Current Opin. Biotechnol. 10:71-75). Other detection methods include kits containing probes on a dipstick setup and the like. Hybridization based assays which allow the detection of a variant of interest (i.e., DNA or RNA) in a biological sample rely on the use of oligonucleotides which can be 10, 15, 20, or 30 to 100 nucleotides long, preferably from 10 to 50, more preferably from 40 to 50 nucleotides long. Thus, the isolated polynucleotides (oligonucleotides) of the present invention are preferably hybridizable with any of the herein described nucleic acid sequences under moderate to stringent hybridization conditions. Moderate to stringent hybridization conditions are characterized by a hybridization solution such as one containing 10 % dextran sulfate, 1 M NaCl, 1 % SDS and 5 x 10^6 cpm 32P labeled probe, at 65 °C, with a final wash solution of 0.2 x SSC and 0.1 % SDS and a final wash at 65 °C, whereas moderate hybridization is effected using a hybridization solution containing 10 % dextran sulfate, 1 M NaCl, 1 % SDS and 5 x 10^6 cpm 32P labeled probe, at 65 °C, with a final wash solution of 1 x SSC and 0.1 % SDS and a final wash at 50 °C. More generally, hybridization of short nucleic acids (below 200 bp in length, e.g.
17-40 bp in length) can be effected using the following exemplary hybridization protocols, which can be modified according to the desired stringency: (i) hybridization solution of 6 x SSC and 1 % SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS, 100 μg/ml denatured salmon sperm DNA and 0.1 % nonfat dried milk, hybridization temperature of 1-1.5 °C below the Tm, final wash solution of 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS at 1-1.5 °C below the Tm; (ii) hybridization solution of 6 x SSC and 0.1 % SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS, 100 μg/ml denatured salmon sperm DNA and 0.1 % nonfat dried milk, hybridization temperature of 2-2.5 °C below the Tm, final wash solution of 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS at 1-1.5 °C below the Tm, final wash solution of 6 x SSC, and final wash at 22 °C; (iii) hybridization solution of 6 x SSC and 1 % SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS, 100 μg/ml denatured salmon sperm DNA and 0.1 % nonfat dried milk, hybridization temperature. The detection of hybrid duplexes can be carried out by a number of methods. Typically, hybridization duplexes are separated from unhybridized nucleic acids and the labels bound to the duplexes are then detected. Such labels refer to radioactive, fluorescent, biological or enzymatic tags or labels of standard use in the art. A label can be conjugated to either the oligonucleotide probes or the nucleic acids derived from the biological sample. Probes can be labeled according to numerous well known methods. Non-limiting examples of radioactive labels include 3H, 14C, 32P, and 35S. Non-limiting examples of detectable markers include ligands, fluorophores, chemiluminescent agents, enzymes, and antibodies.
Other detectable markers for use with probes, which can enable an increase in sensitivity of the method of the invention, include biotin and radio-nucleotides. It will become evident to the person of ordinary skill that the choice of a particular label dictates the manner in which it is bound to the probe. For example, oligonucleotides of the present invention can be labeled subsequent to synthesis, by incorporating biotinylated dNTPs or rNTP, or some similar means (e.g., photo-cross-linking a psoralen derivative of biotin to RNAs), followed by addition of labeled streptavidin (e.g., phycoerythrin-conjugated streptavidin) or the equivalent. Alternatively, when fluorescently-labeled oligonucleotide probes are used, fluorescein, lissamine, phycoerythrin, rhodamine (Perkin Elmer Cetus), Cy2, Cy3, Cy3.5, Cy5, Cy5.5, Cy7, FluorX (Amersham) and others [e.g., Kricka et al. (1992), Academic Press San Diego, Calif] can be attached to the oligonucleotides. Those skilled in the art will appreciate that wash steps may be employed to wash away excess target DNA or probe as well as unbound conjugate. Further, standard heterogeneous assay formats are suitable for detecting the hybrids using the labels present on the oligonucleotide primers and probes. It will be appreciated that a variety of controls may be usefully employed to improve accuracy of hybridization assays. For instance, samples may be hybridized to an irrelevant probe and treated with RNAse A prior to hybridization, to assess false hybridization. Although the present invention is not specifically dependent on the use of a label for the detection of a particular nucleic acid sequence, such a label might be beneficial, by increasing the sensitivity of the detection. Furthermore, it enables automation. As commonly known, radioactive nucleotides can be incorporated into probes of the invention by several methods.
Non-limiting examples of radioactive labels include 3H, 14C, 32P, and 35S. Probes of the invention can be utilized with naturally occurring sugar-phosphate backbones as well as modified backbones including phosphorothioates, dithionates, alkyl phosphonates and α-nucleotides and the like. Probes of the invention can be constructed of either ribonucleic acid (RNA) or deoxyribonucleic acid (DNA), and preferably of DNA.
NAT Assays
Detection of a nucleic acid of interest in a biological sample may also optionally be effected by NAT-based assays, which involve nucleic acid amplification technology, such as PCR for example (or variations thereof such as real-time PCR for example). As used herein, a "primer" defines an oligonucleotide which is capable of annealing to (hybridizing with) a target sequence, thereby creating a double stranded region which can serve as an initiation point for DNA synthesis under suitable conditions. Amplification of a selected, or target, nucleic acid sequence may be carried out by a number of suitable methods. See generally Kwoh et al., 1990, Am. Biotechnol. Lab. 8:14. Numerous amplification techniques have been described and can be readily adapted to suit particular needs of a person of ordinary skill. Non-limiting examples of amplification techniques include polymerase chain reaction (PCR), ligase chain reaction (LCR), strand displacement amplification (SDA), transcription-based amplification, the Qβ replicase system and NASBA (Kwoh et al., 1989, Proc. Natl. Acad. Sci. USA 86, 1173-1177; Lizardi et al., 1988, BioTechnology 6:1197-1202; Malek et al., 1994, Methods Mol. Biol., 28:253-260; and Sambrook et al., 1989, supra). The terminology "amplification pair" (or "primer pair") refers herein to a pair of oligonucleotides (oligos) of the present invention, which are selected to be used together in amplifying a selected nucleic acid sequence by one of a number of types of amplification processes, preferably a polymerase chain reaction. Other types of amplification processes include ligase chain reaction, strand displacement amplification, or nucleic acid sequence-based amplification, as explained in greater detail below. As commonly known in the art, the oligos are designed to bind to a complementary sequence under selected conditions.
In one particular embodiment, a nucleic acid sample from a patient is amplified under conditions which favor the amplification of the most abundant differentially expressed nucleic acid. In one preferred embodiment, RT-PCR is carried out on an mRNA sample from a patient under conditions which favor the amplification of the most abundant mRNA. In another preferred embodiment, the amplification of the differentially expressed nucleic acids is carried out simultaneously. It will be realized by a person skilled in the art that such methods could be adapted for the detection of differentially expressed proteins instead of differentially expressed nucleic acid sequences. The nucleic acid (i.e., DNA or RNA) for practicing the present invention may be obtained according to well known methods. Oligonucleotide primers of the present invention may be of any suitable length, depending on the particular assay format and the particular needs and targeted genomes employed. Optionally, the oligonucleotide primers are at least 12 nucleotides in length, preferably between 15 and 24 nucleotides, and they may be adapted to be especially suited to a chosen nucleic acid amplification system. As commonly known in the art, the oligonucleotide primers can be designed by taking into consideration the melting point of hybridization thereof with its targeted sequence (Sambrook et al., 1989, Molecular Cloning - A Laboratory Manual, 2nd Edition, CSH Laboratories; Ausubel et al., 1989, in Current Protocols in Molecular Biology, John Wiley & Sons Inc., N.Y.). It will be appreciated that antisense oligonucleotides may be employed to quantify expression of a splice isoform of interest. Such detection is effected at the pre-mRNA level. Essentially the ability to quantitate transcription from a splice site of interest can be effected based on splice site accessibility. Oligonucleotides may compete with splicing factors for the splice site sequences.
Thus, low activity of the antisense oligonucleotide is indicative of splicing activity. The polymerase chain reaction and other nucleic acid amplification reactions are well known in the art (various non-limiting examples of these reactions are described in greater detail below). The pair of oligonucleotides according to this aspect of the present invention are preferably selected to have compatible melting temperatures (Tm), e.g., melting temperatures which differ by less than 7 °C, preferably less than 5 °C, more preferably less than 4 °C, most preferably less than 3 °C, ideally between 3 °C and 0 °C.
Polymerase Chain Reaction (PCR): The polymerase chain reaction (PCR), as described in U.S. Pat. Nos. 4,683,195 and 4,683,202 to Mullis and Mullis et al., is a method of increasing the concentration of a segment of target sequence in a mixture of genomic DNA without cloning or purification. This technology provides one approach to the problems of low target sequence concentration. PCR can be used to directly increase the concentration of the target to an easily detectable level. This process for amplifying the target sequence involves the introduction of a molar excess of two oligonucleotide primers, which are complementary to their respective strands of the double-stranded target sequence, to the DNA mixture containing the desired target sequence. The mixture is denatured and then allowed to hybridize. Following hybridization, the primers are extended with polymerase so as to form complementary strands. The steps of denaturation, hybridization (annealing), and polymerase extension (elongation) can be repeated as often as needed, in order to obtain relatively high concentrations of a segment of the desired target sequence. The length of the segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and, therefore, this length is a controllable parameter.
Because the desired segments of the target sequence become the dominant sequences (in terms of concentration) in the mixture, they are said to be "PCR-amplified."
Ligase Chain Reaction (LCR or LAR): The ligase chain reaction [LCR; sometimes referred to as "Ligase Amplification Reaction" (LAR)] has developed into a well-recognized alternative method of amplifying nucleic acids. In LCR, four oligonucleotides (two adjacent oligonucleotides which uniquely hybridize to one strand of target DNA, and a complementary set of adjacent oligonucleotides which hybridize to the opposite strand) are mixed and DNA ligase is added to the mixture. Provided that there is complete complementarity at the junction, ligase will covalently link each set of hybridized molecules. Importantly, in LCR, two probes are ligated together only when they base-pair with sequences in the target sample, without gaps or mismatches. Repeated cycles of denaturation and ligation amplify a short segment of DNA. LCR has also been used in combination with PCR to achieve enhanced detection of single-base changes: see for example Segev, PCT Publication No. WO9001069 A1 (1990). However, because the four oligonucleotides used in this assay can pair to form two short ligatable fragments, there is the potential for the generation of target-independent background signal. The use of LCR for mutant screening is limited to the examination of specific nucleic acid positions.
Self-Sustained Synthetic Reaction (3SR/NASBA): The self-sustained sequence replication reaction (3SR) is a transcription-based in vitro amplification system that can exponentially amplify RNA sequences at a uniform temperature. The amplified RNA can then be utilized for mutation detection. In this method, an oligonucleotide primer is used to add a phage RNA polymerase promoter to the 5' end of the sequence of interest.
In a cocktail of enzymes and substrates that includes a second primer, reverse transcriptase, RNase H, RNA polymerase and ribo- and deoxyribonucleoside triphosphates, the target sequence undergoes repeated rounds of transcription, cDNA synthesis and second-strand synthesis to amplify the area of interest. The use of 3SR to detect mutations is kinetically limited to screening small segments of DNA (e.g., 200-300 base pairs).
Q-Beta (Qβ) Replicase: In this method, a probe which recognizes the sequence of interest is attached to the replicatable RNA template for Qβ replicase. A previously identified major problem with false positives resulting from the replication of unhybridized probes has been addressed through use of a sequence-specific ligation step. However, available thermostable DNA ligases are not effective on this RNA substrate, so the ligation must be performed by T4 DNA ligase at low temperatures (37 °C). This prevents the use of high temperature as a means of achieving specificity, as in the LCR; the ligation event can be used to detect a mutation at the junction site, but not elsewhere.
A successful diagnostic method must be very specific. A straightforward method of controlling the specificity of nucleic acid hybridization is by controlling the temperature of the reaction. While the 3SR/NASBA and Qβ systems are all able to generate a large quantity of signal, one or more of the enzymes involved in each cannot be used at high temperature (i.e., > 55 °C). Therefore the reaction temperatures cannot be raised to prevent non-specific hybridization of the probes. If probes are shortened in order to make them melt more easily at low temperatures, the likelihood of having more than one perfect match in a complex genome increases. For these reasons, PCR and LCR currently dominate the research field in detection technologies.
The basis of the amplification procedure in the PCR and LCR is the fact that the products of one cycle become usable templates in all subsequent cycles, consequently doubling the population with each cycle. The final yield of any such doubling system can be expressed as:
$(1 + X)^n = y$, where "X" is the mean efficiency (percent copied in each cycle), "n" is the number of cycles, and "y" is the overall efficiency, or yield of the reaction. If every copy of a target DNA is utilized as a template in every cycle of a polymerase chain reaction, then the mean efficiency is 100 %. If 20 cycles of PCR are performed, then the yield will be $2^{20}$, or 1,048,576 copies of the starting material. If the reaction conditions reduce the mean efficiency to 85 %, then the yield in those 20 cycles will be only $1.85^{20}$, or 220,513 copies of the starting material. In other words, a PCR running at 85 % efficiency will yield only 21 % as much final product, compared to a reaction running at 100 % efficiency. A reaction that is reduced to 50 % mean efficiency will yield less than 1 % of the possible product. In practice, routine polymerase chain reactions rarely achieve the theoretical maximum yield, and PCRs are usually run for more than 20 cycles to compensate for the lower yield. At 50 % mean efficiency, it would take 34 cycles to achieve the million-fold amplification theoretically possible in 20, and at lower efficiencies, the number of cycles required becomes prohibitive. In addition, any background products that amplify with a better mean efficiency than the intended target will become the dominant products. Also, many variables can influence the mean efficiency of PCR, including target DNA length and secondary structure, primer length and design, primer and dNTP concentrations, and buffer composition, to name but a few. Contamination of the reaction with exogenous DNA (e.g., DNA spilled onto lab surfaces) or cross-contamination is also a major consideration. Reaction conditions must be carefully optimized for each different primer pair and target sequence, and the process can take days, even for an experienced investigator.
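The yield relation (1 + X)^n = y given above can be checked numerically. The following is a minimal Python sketch, not part of the original disclosure; the function names `pcr_yield` and `cycles_needed` are hypothetical and chosen only for illustration.

```python
def pcr_yield(mean_efficiency: float, cycles: int) -> float:
    """Fold amplification y = (1 + X)^n for mean efficiency X (0..1) and n cycles."""
    return (1.0 + mean_efficiency) ** cycles


def cycles_needed(target_fold: float, mean_efficiency: float) -> int:
    """First whole cycle count n at which (1 + X)^n reaches target_fold."""
    if mean_efficiency <= 0:
        raise ValueError("mean efficiency must be positive")
    cycles = 0
    product = 1.0
    while product < target_fold:
        product *= 1.0 + mean_efficiency
        cycles += 1
    return cycles


if __name__ == "__main__":
    ideal = pcr_yield(1.0, 20)      # 100 % efficiency, 20 cycles
    reduced = pcr_yield(0.85, 20)   # 85 % efficiency, 20 cycles
    print(f"ideal yield:   {ideal:,.0f}")
    print(f"reduced yield: {reduced:,.0f}")
    print(f"relative yield at 85 %: {reduced / ideal:.1%}")
    print(f"relative yield at 50 %: {pcr_yield(0.5, 20) / ideal:.2%}")
    print(f"cycles for 1e6-fold at 50 %: {cycles_needed(1_000_000, 0.5)}")
```

Because `cycles_needed` returns the first whole cycle at which the target is strictly met, its result may differ by one cycle from a rounded figure quoted in prose.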
The laboriousness of this process, including numerous technical considerations and other factors, presents a significant drawback to using PCR in the clinical setting. Indeed, PCR has yet to penetrate the clinical market in a significant way. The same concerns arise with LCR, as LCR must also be optimized to use different oligonucleotide sequences for each target sequence. In addition, both methods require expensive equipment, capable of precise temperature cycling. Many applications of nucleic acid detection technologies, such as in studies of allelic variation, involve not only detection of a specific sequence in a complex background, but also the discrimination between sequences with few, or single, nucleotide differences. One method of the detection of allele-specific variants by PCR is based upon the fact that it is difficult for Taq polymerase to synthesize a DNA strand when there is a mismatch between the template strand and the 3' end of the primer. An allele-specific variant may be detected by the use of a primer that is perfectly matched with only one of the possible alleles; the mismatch to the other allele acts to prevent the extension of the primer, thereby preventing the amplification of that sequence. This method has a substantial limitation in that the base composition of the mismatch influences the ability to prevent extension across the mismatch, and certain mismatches do not prevent extension or have only a minimal effect. A similar 3'-mismatch strategy is used with greater effect to prevent ligation in the LCR. Any mismatch effectively blocks the action of the thermostable ligase, but LCR still has the drawback of target-independent background ligation products initiating the amplification. Moreover, the combination of PCR with subsequent LCR to identify the nucleotides at individual positions is also a clearly cumbersome proposition for the clinical laboratory. 
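The allele-specific priming principle described above (a primer whose 3' end matches only one allele) can be illustrated with a short Python sketch. It is an idealized illustration only: it assumes that any 3'-terminal mismatch blocks extension, which, as noted above, is not true for every mismatch. The sequences, the function name `three_prime_matched`, and the convention of writing the primer in the same orientation as the allele sequence are all assumptions made for the example.

```python
def three_prime_matched(primer: str, allele_seq: str, variant_index: int) -> bool:
    """Return True if the primer pairs perfectly with allele_seq when its
    3'-terminal base is placed on variant_index.

    Simplification: primer and allele are written in the same orientation,
    so a 'match' means the bases are identical rather than complementary.
    """
    start = variant_index - len(primer) + 1
    if start < 0 or variant_index >= len(allele_seq):
        raise ValueError("primer does not fit on the allele sequence")
    window = allele_seq[start:variant_index + 1]
    return primer.upper() == window.upper()


if __name__ == "__main__":
    # Hypothetical alleles differing by a single base at index 9 (G vs A).
    wild_type = "ACGTTGCAAGGCT"
    mutant = "ACGTTGCAAAGCT"
    wild_type_primer = wild_type[2:10]  # 3' end sits on the variant position

    for name, allele in (("wild type", wild_type), ("mutant", mutant)):
        ok = three_prime_matched(wild_type_primer, allele, 9)
        print(f"{name}: extension expected = {ok}")
```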
The direct detection method according to various preferred embodiments of the present invention may be, for example, a cycling probe reaction (CPR) or a branched DNA analysis. When a sufficient amount of a nucleic acid to be detected is available, there are advantages to detecting that sequence directly, instead of making more copies of that target (e.g., as in PCR and LCR). Most notably, a method that does not amplify the signal exponentially is more amenable to quantitative analysis. Even if the signal is enhanced by attaching multiple dyes to a single oligonucleotide, the correlation between the final signal intensity and amount of target is direct. Such a system has an additional advantage that the products of the reaction will not themselves promote further reaction, so contamination of lab surfaces by the products is not as much of a concern. Recently devised techniques have sought to eliminate the use of radioactivity and/or improve the sensitivity in automatable formats. Two examples are the "Cycling Probe Reaction" (CPR), and "Branched DNA" (bDNA).
Cycling probe reaction (CPR): The cycling probe reaction (CPR) uses a long chimeric oligonucleotide in which a central portion is made of RNA while the two termini are made of DNA. Hybridization of the probe to a target DNA and exposure to a thermostable RNase H causes the RNA portion to be digested. This destabilizes the remaining DNA portions of the duplex, releasing the remainder of the probe from the target DNA and allowing another probe molecule to repeat the process. The signal, in the form of cleaved probe molecules, accumulates at a linear rate. While the repeating process increases the signal, the RNA portion of the oligonucleotide is vulnerable to RNases that may be carried through sample preparation.
Branched DNA: Branched DNA (bDNA) involves oligonucleotides with branched structures that allow each individual oligonucleotide to carry 35 to 40 labels (e.g., alkaline phosphatase enzymes).
While this enhances the signal from a hybridization event, signal from non-specific binding is similarly increased. The detection of at least one sequence change according to various preferred embodiments of the present invention may be accomplished by, for example, restriction fragment length polymorphism (RFLP) analysis, allele specific oligonucleotide (ASO) analysis, Denaturing/Temperature Gradient Gel Electrophoresis (DGGE/TGGE), Single-Strand Conformation Polymorphism (SSCP) analysis or Dideoxy fingerprinting (ddF). The demand for tests which allow the detection of specific nucleic acid sequences and sequence changes is growing rapidly in clinical diagnostics. As nucleic acid sequence data for genes from humans and pathogenic organisms accumulates, the demand for fast, cost-effective, and easy-to-use tests for as yet unknown mutations within specific sequences is rapidly increasing. A handful of methods have been devised to scan nucleic acid segments for mutations. One option is to determine the entire gene sequence of each test sample (e.g., a bacterial isolate). For sequences under approximately 600 nucleotides, this may be accomplished using amplified material (e.g., PCR reaction products). This avoids the time and expense associated with cloning the segment of interest. However, specialized equipment and highly trained personnel are required, and the method is too labor-intensive and expensive to be practical and effective in the clinical setting. In view of the difficulties associated with sequencing, a given segment of nucleic acid may be characterized on several other levels. At the lowest resolution, the size of the molecule can be determined by electrophoresis by comparison to a known standard run on the same gel. A more detailed picture of the molecule may be achieved by cleavage with combinations of restriction enzymes prior to electrophoresis, to allow construction of an ordered map. The presence of specific sequences within the fragment can be detected by hybridization of a labeled probe, or the precise nucleotide sequence can be determined by partial chemical degradation or by primer extension in the presence of chain-terminating nucleotide analogs.
Restriction fragment length polymorphism (RFLP): For detection of single-base differences between like sequences, the requirements of the analysis are often at the highest level of resolution.
For cases in which the position of the nucleotide in question is known in advance, several methods have been developed for examining single base changes without direct sequencing. For example, if a mutation of interest happens to fall within a restriction recognition sequence, a change in the pattern of digestion can be used as a diagnostic tool (e.g., restriction fragment length polymorphism [RFLP] analysis). Single point mutations have been also detected by the creation or destruction of RFLPs. Mutations are detected and localized by the presence and size of the RNA fragments generated by cleavage at the mismatches. Single nucleotide mismatches in DNA heteroduplexes are also recognized and cleaved by some chemicals, providing an alternative strategy to detect single base substitutions, generically named the "Mismatch Chemical Cleavage" (MCC). However, this method requires the use of osmium tetroxide and piperidine, two highly noxious chemicals which are not suited for use in a clinical laboratory. RFLP analysis suffers from low sensitivity and requires a large amount of sample. When RFLP analysis is used for the detection of point mutations, it is, by its nature, limited to the detection of only those single base changes which fall within a restriction sequence of a known restriction endonuclease. Moreover, the majority of the available enzymes have 4 to 6 base-pair recognition sequences, and cleave too frequently for many large-scale DNA manipulations. Thus, it is applicable only in a small fraction of cases, as most mutations do not fall within such sites. A handful of rare-cutting restriction enzymes with 8 base-pair specificities have been isolated and these are widely used in genetic mapping, but these enzymes are few in number, are limited to the recognition of G+C-rich sequences, and cleave at sites that tend to be highly clustered. Recently, endonucleases encoded by group I introns have been discovered that might have greater than 12 base-pair specificity, but again, these are few in number.
Allele specific oligonucleotide (ASO): If the change is not in a recognition sequence, then allele-specific oligonucleotides (ASOs) can be designed to hybridize in proximity to the mutated nucleotide, such that a primer extension or ligation event can be used as the indicator of a match or a mismatch. Hybridization with radioactively labeled allele specific oligonucleotides (ASO) also has been applied to the detection of specific point mutations. The method is based on the differences in the melting temperature of short DNA fragments differing by a single nucleotide. Stringent hybridization and washing conditions can differentiate between mutant and wild-type alleles. The ASO approach applied to PCR products also has been extensively utilized by various researchers to detect and characterize point mutations in ras genes and gsp/gip oncogenes. Because of the presence of various nucleotide changes in multiple positions, the ASO method requires the use of many oligonucleotides to cover all possible oncogenic mutations.
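The RFLP principle described above, in which a point mutation creates or destroys a restriction site and thereby changes the fragment pattern, can be sketched in Python as follows. This is an illustrative sketch only. It uses the EcoRI recognition sequence GAATTC (cut after the first G) as an example enzyme, which is a choice made here and not taken from the text; the sequences and the function name `digest` are hypothetical, and only one strand of a linear molecule is modeled.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return the fragment lengths obtained by cutting a linear sequence
    at every occurrence of `site`, cutting `cut_offset` bases into the site."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Hypothetical sequences: the mutant carries a single A->G change
    # inside the recognition site, which destroys it.
    wild_type = "AAAA" + "GAATTC" + "CCCCCCCC"
    mutant = "AAAA" + "GAGTTC" + "CCCCCCCC"

    print("wild type fragments:", digest(wild_type))
    print("mutant fragments:   ", digest(mutant))
    print("pattern changed:    ", digest(wild_type) != digest(mutant))
```

A mutation outside the recognition site would leave both fragment lists identical, which is the limitation of RFLP noted above.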
With either of the techniques described above (i.e., RFLP and ASO), the precise location of the suspected mutation must be known in advance of the test. That is to say, they are inapplicable when one needs to detect the presence of a mutation within a gene or sequence of interest.
Denaturing/Temperature Gradient Gel Electrophoresis (DGGE/TGGE): Two other methods rely on detecting changes in electrophoretic mobility in response to minor sequence changes. One of these methods, termed "Denaturing Gradient Gel Electrophoresis" (DGGE), is based on the observation that slightly different sequences will display different patterns of local melting when electrophoretically resolved on a gradient gel. In this manner, variants can be distinguished, as differences in melting properties of homoduplexes versus heteroduplexes differing in a single nucleotide can detect the presence of mutations in the target sequences because of the corresponding changes in their electrophoretic mobilities. The fragments to be analyzed, usually PCR products, are "clamped" at one end by a long stretch of GC base pairs (30-80) to allow complete denaturation of the sequence of interest without complete dissociation of the strands. The attachment of a GC "clamp" to the DNA fragments increases the fraction of mutations that can be recognized by DGGE. Attaching a GC clamp to one primer is critical to ensure that the amplified sequence has a low dissociation temperature. Modifications of the technique have been developed, using temperature gradients, and the method can be also applied to RNA:RNA duplexes. Limitations on the utility of DGGE include the requirement that the denaturing conditions must be optimized for each type of DNA to be tested. Furthermore, the method requires specialized equipment to prepare the gels and maintain the needed high temperatures during electrophoresis.
The expense associated with the synthesis of the clamping tail on one oligonucleotide for each sequence to be tested is also a major consideration. In addition, long running times are required for DGGE. The long running time of DGGE was shortened in a modification of DGGE called constant denaturant gel electrophoresis (CDGE). CDGE requires that gels be performed under different denaturant conditions in order to reach high efficiency for the detection of mutations. A technique analogous to DGGE, termed temperature gradient gel electrophoresis (TGGE), uses a thermal gradient rather than a chemical denaturant gradient. TGGE requires the use of specialized equipment which can generate a temperature gradient perpendicularly oriented relative to the electrical field. TGGE can detect mutations in relatively small fragments of DNA; therefore, scanning of large gene segments requires the use of multiple PCR products prior to running the gel.
Single-Strand Conformation Polymorphism (SSCP): Another common method, called "Single-Strand Conformation Polymorphism" (SSCP), was developed by Hayashi, Sekiya and colleagues and is based on the observation that single strands of nucleic acid can take on characteristic conformations in non-denaturing conditions, and these conformations influence electrophoretic mobility. The complementary strands assume sufficiently different structures that one strand may be resolved from the other. Changes in sequences within the fragment will also change the conformation, consequently altering the mobility and allowing this to be used as an assay for sequence variations. The SSCP process involves denaturing a DNA segment (e.g., a PCR product) that is labeled on both strands, followed by slow electrophoretic separation on a non-denaturing polyacrylamide gel, so that intra-molecular interactions can form and not be disturbed during the run. This technique is extremely sensitive to variations in gel composition and temperature. A serious limitation of this method is the relative difficulty encountered in comparing data generated in different laboratories, under apparently similar conditions.
Dideoxy fingerprinting (ddF): Dideoxy fingerprinting (ddF) is another technique developed to scan genes for the presence of mutations. The ddF technique combines components of Sanger dideoxy sequencing with SSCP.
A dideoxy sequencing reaction is performed using one dideoxy terminator and then the reaction products are electrophoresed on nondenaturing polyacrylamide gels to detect alterations in mobility of the termination segments as in SSCP analysis. While ddF is an improvement over SSCP in terms of increased sensitivity, ddF requires the use of expensive dideoxynucleotides and this technique is still limited to the analysis of fragments of the size suitable for SSCP (i.e., fragments of 200-300 bases for optimal detection of mutations). In addition to the above limitations, all of these methods are limited as to the size of the nucleic acid fragment that can be analyzed. For the direct sequencing approach, sequences of greater than 600 base pairs require cloning, with the consequent delays and expense of either deletion sub-cloning or primer walking, in order to cover the entire fragment. SSCP and DGGE have even more severe size limitations. Because of reduced sensitivity to sequence changes, these methods are not considered suitable for larger fragments. Although SSCP is reportedly able to detect 90 % of single-base substitutions within a 200 base-pair fragment, the detection drops to less than 50 % for 400 base pair fragments. Similarly, the sensitivity of DGGE decreases as the length of the fragment reaches 500 base-pairs. The ddF technique, as a combination of direct sequencing and SSCP, is also limited by the relatively small size of the DNA that can be screened. 
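Because the scanning methods above lose sensitivity as fragment length grows, a larger region is typically covered by several shorter amplicons. The following Python sketch shows one simple way to tile a region into overlapping fragments no longer than a chosen maximum. The 200-base default echoes the SSCP figure quoted above; the overlap value, the function name `tile_region`, and the tiling scheme itself are assumptions made for illustration and are not described in the text.

```python
def tile_region(region_length: int, max_fragment: int = 200,
                overlap: int = 20) -> list[tuple[int, int]]:
    """Split positions 0..region_length into overlapping (start, end) windows,
    each at most max_fragment long, with `overlap` shared bases between
    consecutive windows. End coordinates are exclusive."""
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    if not 0 <= overlap < max_fragment:
        raise ValueError("overlap must be non-negative and smaller than max_fragment")
    step = max_fragment - overlap
    tiles = []
    start = 0
    while True:
        end = min(start + max_fragment, region_length)
        tiles.append((start, end))
        if end == region_length:
            break
        start += step
    return tiles


if __name__ == "__main__":
    for start, end in tile_region(500):
        print(f"fragment {start}-{end} ({end - start} bases)")
```

In practice the fragment boundaries would also depend on where suitable primers can be placed, which this sketch does not model.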
According to a presently preferred embodiment of the present invention the step of searching for any of the nucleic acid sequences described here, in tumor cells or in cells derived from a cancer patient, is effected by any suitable technique, including, but not limited to, nucleic acid sequencing, polymerase chain reaction, ligase chain reaction, self-sustained synthetic reaction, Qβ-Replicase, cycling probe reaction, branched DNA, restriction fragment length polymorphism analysis, mismatch chemical cleavage, heteroduplex analysis, allele-specific oligonucleotides, denaturing gradient gel electrophoresis, constant denaturant gel electrophoresis, temperature gradient gel electrophoresis and dideoxy fingerprinting. Detection may also optionally be performed with a chip or other such device. The nucleic acid sample which includes the candidate region to be analyzed is preferably isolated, amplified and labeled with a reporter group. This reporter group can be a fluorescent group such as phycoerythrin. The labeled nucleic acid is then incubated with the probes immobilized on the chip using a fluidics station; the fabrication of fluidics devices, and particularly microcapillary devices, in silicon and glass substrates has been described. Once the reaction is completed, the chip is inserted into a scanner and patterns of hybridization are detected. The hybridization data is collected as a signal emitted from the reporter groups already incorporated into the nucleic acid, which is now bound to the probes attached to the chip. Since the sequence and position of each probe immobilized on the chip is known, the identity of the nucleic acid hybridized to a given probe can be determined. It will be appreciated that when utilized along with automated equipment, the above described detection methods can be used to screen multiple samples for a disease and/or pathological condition both rapidly and easily.
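The chip read-out described above relies on the fact that the sequence at each probe position is known, so a signal at a position identifies the hybridized nucleic acid. A minimal Python sketch of that lookup follows. The layout, the signal values, the threshold and the function name `call_hybridized` are all hypothetical; real scanners and analysis software apply background correction and normalization that are not modeled here.

```python
def call_hybridized(probe_layout: dict[tuple[int, int], str],
                    signals: dict[tuple[int, int], float],
                    threshold: float) -> dict[str, float]:
    """Map chip positions whose signal meets the threshold back to the
    probe sequence known to be immobilized at that position."""
    calls = {}
    for position, signal in signals.items():
        probe = probe_layout.get(position)
        if probe is not None and signal >= threshold:
            calls[probe] = signal
    return calls


if __name__ == "__main__":
    # Hypothetical 2 x 2 chip: (row, column) -> immobilized probe sequence.
    layout = {
        (0, 0): "ACGTACGTACGTACGTACGT",
        (0, 1): "TTGACCGTAAGGCTTAACGG",
        (1, 0): "GGCATTCGATCCGATTAGCA",
        (1, 1): "CATGGTTACCGGATATCCGA",
    }
    # Hypothetical scanner intensities in arbitrary units.
    scan = {(0, 0): 1520.0, (0, 1): 85.0, (1, 0): 60.0, (1, 1): 990.0}

    for probe, signal in call_hybridized(layout, scan, threshold=500.0).items():
        print(f"{probe}: signal {signal}")
```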
Amino acid sequences and peptides
The terms "polypeptide," "peptide" and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residue is an analog or mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers. Polypeptides can be modified, e.g., by the addition of carbohydrate residues to form glycoproteins. The terms "polypeptide," "peptide" and "protein" include glycoproteins, as well as non-glycoproteins. Polypeptide products can be biochemically synthesized such as by employing standard solid phase techniques. Such methods include but are not limited to exclusive solid phase synthesis, partial solid phase synthesis methods, fragment condensation and classical solution synthesis. These methods are preferably used when the peptide is relatively short (i.e., 10 kDa) and/or when it cannot be produced by recombinant techniques (i.e., not encoded by a nucleic acid sequence) and therefore involves different chemistry. Solid phase polypeptide synthesis procedures are well known in the art and further described by John Morrow Stewart and Janis Dillaha Young, Solid Phase Peptide Syntheses (2nd Ed., Pierce Chemical Company, 1984). Synthetic polypeptides can optionally be purified by preparative high performance liquid chromatography [Creighton T. (1983) Proteins, structures and molecular principles. WH Freeman and Co. N.Y.], after which their composition can be confirmed via amino acid sequencing. In cases where large amounts of a polypeptide are desired, it can be generated using recombinant techniques such as described by Bitter et al. (1987) Methods in Enzymol. 153:516-544, Studier et al. (1990) Methods in Enzymol. 185:60-89, Brisson et al. (1984) Nature 310:511-514, Takamatsu et al. (1987) EMBO J. 6:307-311, Coruzzi et al. (1984) EMBO J. 3:1671-1680 and Broglie et al. (1984) Science 224:838-843, Gurley et al. (1986) Mol. Cell. Biol. 6:559-565 and Weissbach & Weissbach, 1988, Methods for Plant Molecular Biology, Academic Press, NY, Section VIII, pp 421-463. The present invention also encompasses polypeptides encoded by the polynucleotide sequences of the present invention, as well as polypeptides according to the amino acid sequences described herein. The present invention also encompasses homologues of these polypeptides; such homologues can be at least 50 %, at least 55 %, at least 60 %, at least 65 %, at least 70 %, at least 75 %, at least 80 %, at least 85 %, at least 95 % or more, say 100 %, homologous to the amino acid sequences set forth below, as can be determined using BlastP software of the National Center of Biotechnology Information (NCBI) using default parameters, optionally and preferably including the following: filtering on (this option filters repetitive or low-complexity sequences from the query using the Seg (protein) program), scoring matrix is BLOSUM62 for proteins, word size is 3, E value is 10, gap costs are 11, 1 (initialization and extension), and number of alignments shown is 50. Finally, the present invention also encompasses fragments of the above described polypeptides and polypeptides having mutations, such as deletions, insertions or substitutions of one or more amino acids, either naturally occurring or artificially induced, either randomly or in a targeted fashion. Homology/identity of nucleic acid sequences is preferably determined by using BlastN software of the National Center of Biotechnology Information (NCBI) using default parameters, which preferably include using the DUST filter program, and also preferably include having an E value of 10, filtering low complexity sequences and a word size of 11.
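The percentage thresholds listed above can be illustrated with a simple calculation over a pair of sequences that have already been aligned. The Python sketch below is a simplification and is not BlastP or BlastN: it does not perform an alignment, apply a scoring matrix, or filter low-complexity regions, and it counts identical residues only. The example sequences and the function names `percent_identity` and `meets_threshold` are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same
    residue. Inputs must be pre-aligned (equal length, gaps as `gap`)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the pre-aligned pair reaches the given percent threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Hypothetical pre-aligned peptide fragments differing at one position.
    reference = "MKTAYIAK"
    candidate = "MKTAHIAK"
    print(f"identity: {percent_identity(reference, candidate):.1f} %")
    for cutoff in (50, 85, 95):
        print(f"at least {cutoff} %: {meets_threshold(reference, candidate, cutoff)}")
```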
It will be appreciated that peptides identified according to the present invention may be degradation products, synthetic peptides or recombinant peptides, as well as peptidomimetics (typically synthetic peptides) and peptoids and semipeptoids, which are peptide analogs that may have, for example, modifications rendering the peptides more stable while in a body or more capable of penetrating into cells. Such modifications include, but are not limited to, N terminus modification, C terminus modification, peptide bond modification, including, but not limited to, CH2-NH, CH2-S, CH2-S=O, O=C-NH, CH2-O, CH2-CH2, S=C-NH, CH=CH or CF=CH, backbone modifications, and residue modification. Methods for preparing peptidomimetic compounds are well known in the art and are specified. Further details in this respect are provided hereinunder.

Peptide bonds (-CO-NH-) within the peptide may be substituted, for example, by N-methylated bonds (-N(CH3)-CO-), ester bonds (-C(R)H-C-O-O-C(R)-N-), ketomethylene bonds (-CO-CH2-), α-aza bonds (-NH-N(R)-CO-), wherein R is any alkyl, e.g., methyl, carba bonds (-CH2-NH-), hydroxyethylene bonds (-CH(OH)-CH2-), thioamide bonds (-CS-NH-), olefinic double bonds (-CH=CH-), retro amide bonds (-NH-CO-), peptide derivatives (-N(R)-CH2-CO-), wherein R is the "normal" side chain, naturally presented on the carbon atom. These modifications can occur at any of the bonds along the peptide chain and even at several (2-3) at the same time.

Natural aromatic amino acids, Trp, Tyr and Phe, may be substituted by synthetic non-natural acids such as phenylglycine, TIC, naphthylalanine (Nol), ring-methylated derivatives of Phe, halogenated derivatives of Phe or o-methyl-Tyr. In addition to the above, the peptides of the present invention may also include one or more modified amino acids or one or more non-amino acid monomers (e.g., fatty acids, complex carbohydrates, etc.).
As used herein in the specification and in the claims section below, the term "amino acid" or "amino acids" is understood to include the 20 naturally occurring amino acids; those amino acids often modified post-translationally in vivo, including, for example, hydroxyproline, phosphoserine and phosphothreonine; and other unusual amino acids including, but not limited to, 2-aminoadipic acid, hydroxylysine, isodesmosine, nor-valine, nor-leucine and ornithine. Furthermore, the term "amino acid" includes both D- and L-amino acids. Table 1 lists non-conventional or modified amino acids which can be used with the present invention.
Table 1
Since the peptides of the present invention are preferably utilized in diagnostics which require the peptides to be in soluble form, the peptides of the present invention preferably include one or more non-natural or natural polar amino acids, including but not limited to serine and threonine, which are capable of increasing peptide solubility due to their hydroxyl-containing side chain. The peptides of the present invention are preferably utilized in a linear form, although it will be appreciated that in cases where cyclization does not severely interfere with peptide characteristics, cyclic forms of the peptide can also be utilized.

The peptides of the present invention can be biochemically synthesized, such as by using standard solid phase techniques. These methods include exclusive solid phase synthesis well known in the art, partial solid phase synthesis methods, fragment condensation and classical solution synthesis. These methods are preferably used when the peptide is relatively short (i.e., 10 kDa) and/or when it cannot be produced by recombinant techniques (i.e., not encoded by a nucleic acid sequence) and therefore involves different chemistry. Synthetic peptides can be purified by preparative high performance liquid chromatography, and their composition can be confirmed via amino acid sequencing.

In cases where large amounts of the peptides of the present invention are desired, the peptides of the present invention can be generated using recombinant techniques such as described by Bitter et al. (1987) Methods in Enzymol. 153:516-544, Studier et al. (1990) Methods in Enzymol. 185:60-89, Brisson et al. (1984) Nature 310:511-514, Takamatsu et al. (1987) EMBO J. 6:307-311, Coruzzi et al. (1984) EMBO J. 3:1671-1680 and Broglie et al. (1984) Science 224:838-843, Gurley et al. (1986) Mol. Cell. Biol.
6:559-565 and Weissbach & Weissbach, 1988, Methods for Plant Molecular Biology, Academic Press, NY, Section VIII, pp 421-463 and also as described above.
Antibodies

"Antibody" refers to a polypeptide ligand that is preferably substantially encoded by an immunoglobulin gene or immunoglobulin genes, or fragments thereof, which specifically binds and recognizes an epitope (e.g., an antigen). The recognized immunoglobulin genes include the kappa and lambda light chain constant region genes, the alpha, gamma, delta, epsilon and mu heavy chain constant region genes, and the myriad immunoglobulin variable region genes. Antibodies exist, e.g., as intact immunoglobulins or as a number of well characterized fragments produced by digestion with various peptidases. This includes, e.g., Fab' and F(ab')2 fragments. The term "antibody," as used herein, also includes antibody fragments either produced by the modification of whole antibodies or those synthesized de novo using recombinant DNA methodologies. It also includes polyclonal antibodies, monoclonal antibodies, chimeric antibodies, humanized antibodies, or single chain antibodies. "Fc" portion of an antibody refers to that portion of an immunoglobulin heavy chain that comprises one or more heavy chain constant region domains, CH1, CH2 and CH3, but does not include the heavy chain variable region.
The functional fragments of antibodies, such as Fab, F(ab')2, and Fv that are capable of binding to macrophages, are described as follows:
(1) Fab, the fragment which contains a monovalent antigen-binding fragment of an antibody molecule, can be produced by digestion of whole antibody with the enzyme papain to yield an intact light chain and a portion of one heavy chain;
(2) Fab', the fragment of an antibody molecule that can be obtained by treating whole antibody with pepsin, followed by reduction, to yield an intact light chain and a portion of the heavy chain; two Fab' fragments are obtained per antibody molecule;
(3) F(ab')2, the fragment of the antibody that can be obtained by treating whole antibody with the enzyme pepsin without subsequent reduction; F(ab')2 is a dimer of two Fab' fragments held together by two disulfide bonds;
(4) Fv, defined as a genetically engineered fragment containing the variable region of the light chain and the variable region of the heavy chain expressed as two chains; and
(5) Single chain antibody ("SCA"), a genetically engineered molecule containing the variable region of the light chain and the variable region of the heavy chain, linked by a suitable polypeptide linker as a genetically fused single chain molecule.

Methods of producing polyclonal and monoclonal antibodies as well as fragments thereof are well known in the art (see, for example, Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, New York, 1988, incorporated herein by reference). Antibody fragments according to the present invention can be prepared by proteolytic hydrolysis of the antibody or by expression in E. coli or mammalian cells (e.g., Chinese hamster ovary cell culture or other protein expression systems) of DNA encoding the fragment. Antibody fragments can be obtained by pepsin or papain digestion of whole antibodies by conventional methods.
For example, antibody fragments can be produced by enzymatic cleavage of antibodies with pepsin to provide a 5S fragment denoted F(ab')2. This fragment can be further cleaved using a thiol reducing agent, and optionally a blocking group for the sulfhydryl groups resulting from cleavage of disulfide linkages, to produce 3.5S Fab' monovalent fragments. Alternatively, an enzymatic cleavage using pepsin produces two monovalent Fab' fragments and an Fc fragment directly. These methods are described, for example, by Goldenberg, U.S. Pat. Nos. 4,036,945 and 4,331,647, and references contained therein, which patents are hereby incorporated by reference in their entirety. See also Porter, R. R. [Biochem. J. 73:119-126 (1959)]. Other methods of cleaving antibodies, such as separation of heavy chains to form monovalent light-heavy chain fragments, further cleavage of fragments, or other enzymatic, chemical, or genetic techniques may also be used, so long as the fragments bind to the antigen that is recognized by the intact antibody.

Fv fragments comprise an association of VH and VL chains. This association may be noncovalent, as described in Inbar et al. [Proc. Nat'l Acad. Sci. USA 69:2659-62 (1972)]. Alternatively, the variable chains can be linked by an intermolecular disulfide bond or cross-linked by chemicals such as glutaraldehyde. Preferably, the Fv fragments comprise VH and VL chains connected by a peptide linker. These single-chain antigen binding proteins (sFv) are prepared by constructing a structural gene comprising DNA sequences encoding the VH and VL domains connected by an oligonucleotide. The structural gene is inserted into an expression vector, which is subsequently introduced into a host cell such as E. coli. The recombinant host cells synthesize a single polypeptide chain with a linker peptide bridging the two V domains. Methods for producing sFvs are described, for example, by [Whitlow and Filpula, Methods 2:97-105 (1991); Bird et al.
Science 242:423-426 (1988); Pack et al., Bio/Technology 11:1271-77 (1993); and U.S. Pat. No. 4,946,778], which is hereby incorporated by reference in its entirety.

Another form of an antibody fragment is a peptide coding for a single complementarity-determining region (CDR). CDR peptides ("minimal recognition units") can be obtained by constructing genes encoding the CDR of an antibody of interest. Such genes are prepared, for example, by using the polymerase chain reaction to synthesize the variable region from RNA of antibody-producing cells. See, for example, Larrick and Fry [Methods, 2:106-10 (1991)].

Humanized forms of non-human (e.g., murine) antibodies are chimeric molecules of immunoglobulins, immunoglobulin chains or fragments thereof (such as Fv, Fab, Fab', F(ab')2 or other antigen-binding subsequences of antibodies) which contain minimal sequence derived from non-human immunoglobulin. Humanized antibodies include human immunoglobulins (recipient antibody) in which residues from a complementarity determining region (CDR) of the recipient are replaced by residues from a CDR of a non-human species (donor antibody) such as mouse, rat or rabbit having the desired specificity, affinity and capacity. In some instances, Fv framework residues of the human immunoglobulin are replaced by corresponding non-human residues. Humanized antibodies may also comprise residues which are found neither in the recipient antibody nor in the imported CDR or framework sequences. In general, the humanized antibody will comprise substantially all of at least one, and typically two, variable domains, in which all or substantially all of the CDR regions correspond to those of a non-human immunoglobulin and all or substantially all of the FR regions are those of a human immunoglobulin consensus sequence. The humanized antibody optimally also will comprise at least a portion of an immunoglobulin constant region (Fc), typically that of a human immunoglobulin [Jones et al.
Nature, 321:522-525 (1986); Riechmann et al. Nature, 332:323-329 (1988); and Presta, Curr. Op. Struct. Biol., 2:593-596 (1992)].

Methods for humanizing non-human antibodies are well known in the art. Generally, a humanized antibody has one or more amino acid residues introduced into it from a source which is non-human. These non-human amino acid residues are often referred to as import residues, which are typically taken from an import variable domain. Humanization can be essentially performed following the method of Winter and co-workers [Jones et al. Nature, 321:522-525 (1986); Riechmann et al. Nature 332:323-327 (1988); Verhoeyen et al. Science, 239:1534-1536 (1988)], by substituting rodent CDRs or CDR sequences for the corresponding sequences of a human antibody. Accordingly, such humanized antibodies are chimeric antibodies (U.S. Pat. No. 4,816,567), wherein substantially less than an intact human variable domain has been substituted by the corresponding sequence from a non-human species. In practice, humanized antibodies are typically human antibodies in which some CDR residues and possibly some FR residues are substituted by residues from analogous sites in rodent antibodies.

Human antibodies can also be produced using various techniques known in the art, including phage display libraries [Hoogenboom and Winter, J. Mol. Biol., 227:381 (1991); Marks et al., J. Mol. Biol., 222:581 (1991)]. The techniques of Cole et al. and Boerner et al. are also available for the preparation of human monoclonal antibodies [Cole et al. Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, p. 77 (1985) and Boerner et al., J. Immunol., 147(1):86-95 (1991)]. Similarly, human antibodies can be made by introduction of human immunoglobulin loci into transgenic animals, e.g., mice in which the endogenous immunoglobulin genes have been partially or completely inactivated.
Upon challenge, human antibody production is observed, which closely resembles that seen in humans in all respects, including gene rearrangement, assembly, and antibody repertoire. This approach is described, for example, in U.S. Pat. Nos. 5,545,807; 5,545,806; 5,569,825; 5,625,126; 5,633,425; 5,661,016, and in the following scientific publications: Marks et al., Bio/Technology 10: 779-783 (1992); Lonberg et al. Nature 368: 856-859 (1994); Morrison, Nature 368: 812-13 (1994); Fishwild et al. Nature Biotechnology 14: 845-51 (1996); Neuberger, Nature Biotechnology 14: 826 (1996); and Lonberg and Huszar, Intern. Rev. Immunol. 13: 65-93 (1995).

Preferably, the antibody of this aspect of the present invention specifically binds at least one epitope of the polypeptide variants of the present invention. As used herein, the term "epitope" refers to any antigenic determinant on an antigen to which the paratope of an antibody binds. Epitopic determinants usually consist of chemically active surface groupings of molecules such as amino acids or carbohydrate side chains and usually have specific three dimensional structural characteristics, as well as specific charge characteristics.

Optionally, a unique epitope may be created in a variant due to a change in one or more post-translational modifications, including but not limited to glycosylation and/or phosphorylation, as described below. Such a change may also cause a new epitope to be created, for example through removal of glycosylation at a particular site. An epitope according to the present invention may also optionally comprise part or all of a unique sequence portion of a variant according to the present invention in combination with at least one other portion of the variant which is not contiguous to the unique sequence portion in the linear polypeptide itself, yet which are able to form an epitope in combination.
One or more unique sequence portions may optionally combine with one or more other non-contiguous portions of the variant (including a portion which may have high homology to a portion of the known protein) to form an epitope.
Immunoassays

In another embodiment of the present invention, an immunoassay can be used to qualitatively or quantitatively detect and analyze markers in a sample. This method comprises: providing an antibody that specifically binds to a marker; contacting a sample with the antibody; and detecting the presence of a complex of the antibody bound to the marker in the sample.

To prepare an antibody that specifically binds to a marker, purified protein markers can be used. Antibodies that specifically bind to a protein marker can be prepared using any suitable methods known in the art. After the antibody is provided, a marker can be detected and/or quantified using any of a number of well recognized immunological binding assays. Useful assays include, for example, an enzyme immune assay (EIA) such as enzyme-linked immunosorbent assay (ELISA), a radioimmune assay (RIA), a Western blot assay, or a slot blot assay (see, e.g., U.S. Pat. Nos. 4,366,241; 4,376,110; 4,517,288; and 4,837,168). Generally, a sample obtained from a subject can be contacted with the antibody that specifically binds the marker.

Optionally, the antibody can be fixed to a solid support to facilitate washing and subsequent isolation of the complex, prior to contacting the antibody with a sample. Examples of solid supports include but are not limited to glass or plastic in the form of, e.g., a microtiter plate, a stick, a bead, or a microbead. Antibodies can also be attached to a solid support. After incubating the sample with antibodies, the mixture is washed and the antibody-marker complex formed can be detected. This can be accomplished by incubating the washed mixture with a detection reagent.
Alternatively, the marker in the sample can be detected using an indirect assay, wherein, for example, a second, labeled antibody is used to detect bound marker-specific antibody, and/or in a competition or inhibition assay wherein, for example, a monoclonal antibody which binds to a distinct epitope of the marker is incubated simultaneously with the mixture.

Throughout the assays, incubation and/or washing steps may be required after each combination of reagents. Incubation steps can vary from about 5 seconds to several hours, preferably from about 5 minutes to about 24 hours. However, the incubation time will depend upon the assay format, marker, volume of solution, concentrations and the like. Usually the assays will be carried out at ambient temperature, although they can be conducted over a range of temperatures, such as 10 °C to 40 °C.

The immunoassay can be used to determine a test amount of a marker in a sample from a subject. First, a test amount of a marker in a sample can be detected using the immunoassay methods described above. If a marker is present in the sample, it will form an antibody-marker complex with an antibody that specifically binds the marker under suitable incubation conditions described above. The amount of an antibody-marker complex can optionally be determined by comparing to a standard. As noted above, the test amount of marker need not be measured in absolute units, as long as the unit of measurement can be compared to a control amount and/or signal.

Preferably used are antibodies which specifically interact with the polypeptides of the present invention and not with wild type proteins or other isoforms thereof, for example. Such antibodies are directed, for example, to the unique sequence portions of the polypeptide variants of the present invention, including but not limited to bridges, heads, tails and insertions described in greater detail below.
Preferred embodiments of antibodies according to the present invention are described in greater detail with regard to the section entitled "Antibodies".

Radio-immunoassay (RIA): In one version, this method involves precipitation of the desired substrate and in the methods detailed hereinbelow, with a specific antibody and radiolabelled antibody binding protein (e.g., protein A labeled with I-125) immobilized on a precipitable carrier such as agarose beads. The number of counts in the precipitated pellet is proportional to the amount of substrate. In an alternate version of the RIA, a labeled substrate and an unlabelled antibody binding protein are employed. A sample containing an unknown amount of substrate is added in varying amounts. The decrease in precipitated counts from the labeled substrate is proportional to the amount of substrate in the added sample.

Enzyme linked immunosorbent assay (ELISA): This method involves fixation of a sample (e.g., fixed cells or a proteinaceous solution) containing a protein substrate to a surface such as a well of a microtiter plate. A substrate specific antibody coupled to an enzyme is applied and allowed to bind to the substrate. Presence of the antibody is then detected and quantitated by a colorimetric reaction employing the enzyme coupled to the antibody. Enzymes commonly employed in this method include horseradish peroxidase and alkaline phosphatase. If well calibrated and within the linear range of response, the amount of substrate present in the sample is proportional to the amount of color produced. A substrate standard is generally employed to improve quantitative accuracy.

Western blot: This method involves separation of a substrate from other protein by means of an acrylamide gel followed by transfer of the substrate to a membrane (e.g., nylon or PVDF). Presence of the substrate is then detected by antibodies specific to the substrate, which are in turn detected by antibody binding reagents.
Antibody binding reagents may be, for example, protein A, or other antibodies. Antibody binding reagents may be radiolabelled or enzyme linked as described hereinabove. Detection may be by autoradiography, colorimetric reaction or chemiluminescence. This method allows both quantitation of an amount of substrate and determination of its identity by a relative position on the membrane which is indicative of a migration distance in the acrylamide gel during electrophoresis. Immunohistochemical analysis: This method involves detection of a substrate in situ in fixed cells by substrate specific antibodies. The substrate specific antibodies may be enzyme linked or linked to fluorophores. Detection is by microscopy and subjective evaluation. If enzyme linked antibodies are employed, a colorimetric reaction may be required. Fluorescence activated cell sorting (FACS): This method involves detection of a substrate in situ in cells by substrate specific antibodies. The substrate specific antibodies are linked to fluorophores. Detection is by means of a cell sorting machine which reads the wavelength of light emitted from each cell as it passes through a light beam. This method may employ two or more antibodies simultaneously.
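The ELISA quantitation described above, in which color is proportional to substrate within the linear range and a substrate standard is used for calibration, can be illustrated with a minimal sketch. The function names, the least-squares line fit and the numeric standards are illustrative assumptions only and are not taken from the present specification.

```python
def fit_standard_curve(concentrations, absorbances):
    """Least-squares line through substrate standards (assumed linear response).

    Returns (slope, intercept) such that absorbance ~ slope * concentration + intercept.
    """
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(absorbances) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, absorbances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(absorbance, slope, intercept, linear_range):
    """Convert a sample absorbance to a concentration using the fitted line.

    linear_range is a (low, high) pair of absorbances covered by the standards;
    readings outside it are rejected rather than extrapolated.
    """
    low, high = linear_range
    if not (low <= absorbance <= high):
        raise ValueError("absorbance outside the calibrated linear range")
    return (absorbance - intercept) / slope


# Hypothetical standards: concentration (arbitrary units) vs. measured color.
standards_conc = [0.0, 1.0, 2.0, 4.0]
standards_abs = [0.05, 0.25, 0.45, 0.85]
slope, intercept = fit_standard_curve(standards_conc, standards_abs)
sample_conc = estimate_concentration(0.65, slope, intercept, (0.05, 0.85))
```

Rejecting readings outside the calibrated range reflects the condition stated above that proportionality holds only within the linear range of response.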
Radio-imaging Methods

These methods include, but are not limited to, positron emission tomography (PET) and single photon emission computed tomography (SPECT). Both of these techniques are non-invasive, and can be used to detect and/or measure a wide variety of tissue events and/or functions, such as detecting cancerous cells for example. Unlike PET, SPECT can optionally be used with two labels simultaneously. SPECT has some other advantages as well, for example with regard to cost and the types of labels that can be used. For example, US Patent No. 6,696,686 describes the use of SPECT for detection of breast cancer, and is hereby incorporated by reference as if fully set forth herein.

Display Libraries

According to still another aspect of the present invention there is provided a display library comprising a plurality of display vehicles (such as phages, viruses or bacteria) each displaying at least 6, at least 7, at least 8, at least 9, at least 10, 10-15, 12-17, 15-20, 15-30 or 20-50 consecutive amino acids derived from the polypeptide sequences of the present invention.

Methods of constructing such display libraries are well known in the art. Such methods are described in, for example, Young AC, et al., "The three-dimensional structures of a polysaccharide binding antibody to Cryptococcus neoformans and its complex with a peptide from a phage display library: implications for the identification of peptide mimotopes" J Mol Biol 1997 Dec 12;274(4):622-34; Giebel LB et al. "Screening of cyclic peptide phage libraries identifies ligands that bind streptavidin with high affinities" Biochemistry 1995 Nov 28;34(47):15430-5; Davies EL et al., "Selection of specific phage-display antibodies using libraries derived from chicken immunoglobulin genes" J Immunol Methods 1995 Oct 12;186(1):125-35; Jones C et al. "Current trends in molecular recognition and bioseparation" J Chromatogr A 1995 Jul 14;707(1):3-22; Deng SJ et al. "Basis for selection of improved carbohydrate-binding single-chain antibodies from synthetic gene libraries" Proc Natl Acad Sci U S A 1995 May 23;92(11):4992-6; and Deng SJ et al. "Selection of antibody single-chain variable fragments with improved carbohydrate binding by phage display" J Biol Chem 1994 Apr 1;269(13):9533-8, which are incorporated herein by reference.
The following sections relate to Candidate Marker Examples (first section) and to Experimental Data for these Marker Examples (second section).
CANDIDATE MARKER EXAMPLES SECTION

This Section relates to Examples of sequences according to the present invention, including illustrative methods of selection thereof.

Description of the methodology undertaken to uncover the biomolecular sequences of the present invention

Human ESTs and cDNAs were obtained from GenBank versions 136 (June 15, 2003 ftp.ncbi.nih.gov/genbank/release.notes/gb136.release.notes); NCBI genome assembly of April 2003; RefSeq sequences from June 2003; GenBank version 139 (December 2003); Human Genome from NCBI (Build 34) (from Oct 2003); RefSeq sequences from December 2003; and from LifeSeq library of Incyte Corp (Wilmington, DE, USA; ESTs only). With regard to GenBank sequences, the human EST sequences from the EST (GBEST) section and the human mRNA sequences from the primate (GBPRI) section were used; also the human nucleotide RefSeq mRNA sequences were used (see for example www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html and for a reference to the EST section, see www.ncbi.nlm.nih.gov/dbEST/; a general reference to dbEST, the EST database in GenBank, may be found in Boguski et al., Nat Genet. 1993 Aug;4(4):332-3; all of which are hereby incorporated by reference as if fully set forth herein).

Novel splice variants were predicted using the LEADS clustering and assembly system as described in Sorek, R., Ast, G. & Graur, D. Alu-containing exons are alternatively spliced. Genome Res 12, 1060-7 (2002); US patent No: 6,625,545; and U.S. Pat. Appl. No. 10/426,002, published as US20040101876 on May 27 2004; all of which are hereby incorporated by reference as if fully set forth herein. Briefly, the software cleans the expressed sequences from repeats, vectors and immunoglobulins. It then aligns the expressed sequences to the genome taking alternative splicing into account and clusters overlapping expressed sequences into "clusters" that represent genes or partial genes.
These were annotated using the GeneCarta (Compugen, Tel-Aviv, Israel) platform. The GeneCarta platform includes a rich pool of annotations, sequence information (particularly of spliced sequences), chromosomal information, alignments, and additional information such as SNPs, gene ontology terms, expression profiles, functional analyses, detailed domain structures, known and predicted proteins and detailed homology reports.

A brief explanation is provided with regard to the method of selecting the candidates. However, it should be noted that this explanation is provided for descriptive purposes only, and is not intended to be limiting in any way. The potential markers were identified by a computational process that was designed to find genes and/or their splice variants that are over-expressed in tumor tissues, by using databases of expressed sequences. Various parameters related to the information in the EST libraries, determined according to a manual classification process, were used to assist in locating genes and/or splice variants thereof that are over-expressed in cancerous tissues. The detailed description of the selection method is presented in Example 1 below. The cancer biomarkers selection engine and the following wet validation stages are schematically summarized in Figure 1.
EXAMPLE 1
Identification of differentially expressed gene products - Algorithm

In order to distinguish between differentially expressed gene products and constitutively expressed genes (i.e., housekeeping genes), an algorithm based on an analysis of frequencies was configured. A specific algorithm for identification of transcripts over expressed in cancer is described hereinbelow.

Dry analysis

Library annotation - EST libraries are manually classified according to:
(i) Tissue origin
(ii) Biological source - Examples of frequently used biological sources for construction of EST libraries include cancer cell-lines; normal tissues; cancer tissues; fetal tissues; and others such as normal cell lines and pools of normal cell-lines, cancer cell-lines and combinations thereof. A specific description of abbreviations used below with regard to these tissues/cell lines etc. is given above.
(iii) Protocol of library construction - various methods are known in the art for library construction including normalized library construction; non-normalized library construction; subtracted libraries; ORESTES and others. It will be appreciated that at times the protocol of library construction is not indicated.

The following rules were followed:
EST libraries originating from identical biological samples are considered as a single library.
EST libraries which included above-average levels of contamination, such as DNA contamination for example, were eliminated. The presence of such contamination was determined as follows. For each library, the number of unspliced ESTs that are not fully contained within other spliced sequences was counted. If the percentage of such sequences (as compared to all other sequences) was at least 4 standard deviations above the average for all libraries being analyzed, this library was tagged as being contaminated and was eliminated from further consideration in the below analysis (see also Sorek, R. & Safer, H.M.
A novel algorithm for computational identification of contaminated EST libraries. Nucleic Acids Res 31, 1067-74 (2003) for further details).
Clusters (genes) having at least five sequences including at least two sequences from the tissue of interest were analyzed. Splice variants were identified by using the LEADS software package as described above.

EXAMPLE 2
Identification of genes over expressed in cancer

Two different scoring algorithms were developed.

Libraries score - candidate sequences which are supported by a number of cancer libraries are more likely to serve as specific and effective diagnostic markers.

The basic algorithm - for each cluster the number of cancer and normal libraries contributing sequences to the cluster was counted. Fisher exact test was used to check if cancer libraries are significantly over-represented in the cluster as compared to the total number of cancer and normal libraries.

Library counting: Small libraries (e.g., less than 1000 sequences) were excluded from consideration unless they participate in the cluster. For this reason, the total number of libraries is actually adjusted for each cluster.

Clones no. score - Generally, when the number of ESTs is much higher in the cancer libraries relative to the normal libraries it might indicate actual over-expression.

The algorithm - Clone counting: For counting EST clones each library protocol class was given a weight based on our belief of how much the protocol reflects actual expression levels:
(i) non-normalized: 1
(ii) normalized: 0.2
(iii) all other classes: 0.1

Clones number score - The total weighted number of EST clones from cancer libraries was compared to the EST clones from normal libraries. To avoid cases where one library contributes to the majority of the score, the contribution of the library that gives most clones for a given cluster was limited to 2 clones. The score was computed as
where:
c - weighted number of "cancer" clones in the cluster.
C - weighted number of clones in all "cancer" libraries.
n - weighted number of "normal" clones in the cluster.
N - weighted number of clones in all "normal" libraries.

Clones number score significance - Fisher exact test was used to check if EST clones from cancer libraries are significantly over-represented in the cluster as compared to the total number of EST clones from cancer and normal libraries.

Two search approaches were used to find either general cancer-specific candidates or tumor specific candidates.
• Libraries/sequences originating from tumor tissues are counted as well as libraries originating from cancer cell-lines ("normal" cell-lines were ignored).
• Only libraries/sequences originating from tumor tissues are counted.
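The protocol weighting, the cap on the largest contributing library and the Fisher exact test on library counts described in Example 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the cap of 2 clones is assumed to apply to the weighted contribution of the top library, and the test is assumed to be one-sided. The score formula itself is not reproduced in this text and is not reconstructed here.

```python
from math import comb

# Protocol weights as listed in Example 2; any other class gets 0.1.
PROTOCOL_WEIGHTS = {"non-normalized": 1.0, "normalized": 0.2}
OTHER_WEIGHT = 0.1
MAX_TOP_LIBRARY_CLONES = 2.0  # cap on the largest single-library contribution


def weighted_clone_count(libraries):
    """Weighted clone total for one cluster and one class (cancer or normal).

    libraries: list of (protocol, clone_count) pairs, one per library.
    Assumption: the 2-clone cap is applied after protocol weighting.
    """
    weighted = [PROTOCOL_WEIGHTS.get(protocol, OTHER_WEIGHT) * count
                for protocol, count in libraries]
    if not weighted:
        return 0.0
    top = max(weighted)
    return sum(weighted) - top + min(top, MAX_TOP_LIBRARY_CLONES)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]].

    a: cancer libraries in the cluster     b: normal libraries in the cluster
    c: cancer libraries not in the cluster d: normal libraries not in the cluster
    Returns the probability of seeing at least `a` cancer libraries in the
    cluster if cluster membership were independent of library class.
    """
    in_cluster = a + b
    cancer_total = a + c
    total = a + b + c + d
    denominator = comb(total, in_cluster)
    upper = min(in_cluster, cancer_total)
    numerator = sum(comb(cancer_total, k) * comb(total - cancer_total, in_cluster - k)
                    for k in range(a, upper + 1))
    return numerator / denominator
```

The exact test above takes integer counts, so it applies directly to library counts; applying it to weighted clone counts would require a rounding convention that the text does not specify.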
EXAMPLE 3 Identification of tissue specific genes
For detection of tissue specific clusters, tissue libraries/sequences were compared to the total number of libraries/sequences in the cluster. Statistical tools similar to those described above were employed to identify tissue specific genes. Tissue abbreviations are the same as for cancerous tissues, but are indicated with the header "normal tissue". The algorithm - for each tested tissue T and for each tested cluster the following were examined: 1. Each cluster includes at least 2 libraries from the tissue T, and at least 3 clones (weighted, as described above) from tissue T in the cluster; and 2. Clones from the tissue T are at least 40% of all the clones participating in the tested cluster. Fisher exact test P-values were computed both for library and weighted clone counts to check that the counts are statistically significant.
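The two threshold criteria above can be expressed as a single filter. The following is a sketch under stated assumptions: the function and parameter names are hypothetical, and the significance check by Fisher exact test is left out.

```python
def is_tissue_specific_candidate(
    libraries_from_tissue,
    weighted_clones_from_tissue,
    weighted_clones_in_cluster,
    min_libraries=2,
    min_clones=3.0,
    min_fraction=0.40,
):
    """Apply the library, clone and 40% thresholds stated in the text.

    All names are illustrative. Clone counts are the weighted counts
    described for the clones number score.
    """
    if weighted_clones_in_cluster <= 0:
        return False
    fraction = weighted_clones_from_tissue / weighted_clones_in_cluster
    return (
        libraries_from_tissue >= min_libraries
        and weighted_clones_from_tissue >= min_clones
        and fraction >= min_fraction
    )
```

A cluster passing this filter would still need significant Fisher exact test P-values for both the library and the weighted clone counts.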
EXAMPLE 4 Identification of splice variants over expressed in cancer of clusters which are not over expressed in cancer
Cancer-specific splice variants containing a unique region were identified.
Identification of unique sequence regions in splice variants
A region is defined as a group of adjacent exons that always appear or do not appear together in each splice variant. A "segment" (sometimes also referred to as "seg" or "node") is defined as the shortest contiguous transcribed region without known splicing inside. Only reliable ESTs were considered for region and segment analysis. An EST was defined as unreliable if: (i) Unspliced; (ii) Not covered by RNA; (iii) Not covered by spliced ESTs; and (iv) Alignment to the genome ends in proximity of long poly-A stretch or starts in proximity of long poly-T stretch. Only reliable regions were selected for further scoring. Unique sequence regions were considered reliable if: (i) Aligned to the genome; and (ii) Regions supported by more than 2 ESTs.
The algorithm
Each unique sequence region divides the set of transcripts into 2 groups: (i) Transcripts containing this region (group TA). (ii) Transcripts not containing this region (group TB). The set of EST clones of every cluster is divided into 3 groups: (i) Supporting (originating from) transcripts of group TA (S1). (ii) Supporting transcripts of group TB (S2). (iii) Supporting transcripts from both groups (S3). Library and clones number scores described above were given to the S1 group. Fisher Exact Test P-values were used to check if: S1 is significantly enriched by cancer EST clones compared to S2; and S1 is significantly enriched by cancer EST clones compared to cluster background (S1+S2+S3).
Identification of unique sequence regions and division of the group of transcripts accordingly is illustrated in Figure 2. Each of these unique sequence regions corresponds to a segment, also termed herein a "node".
Region 1 : common to all transcripts, thus it is preferably not considered for determining differential expression between variants; Region 2: specific to Transcript 1; Region 3: specific to Transcripts 2+3; Region 4: specific to Transcript 3; Region 5: specific to Transcripts 1 and 2; Region 6: specific to Transcript 1.
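The division of transcripts into groups TA and TB, and of EST clones into groups S1, S2 and S3, can be sketched as below. The transcript-to-region mapping follows the six regions listed above; the clone names and the clone-to-transcript support map are hypothetical, as are all function names.

```python
# Regions contained in each transcript, derived from the region list above.
TRANSCRIPT_REGIONS = {
    "Transcript 1": {1, 2, 5, 6},
    "Transcript 2": {1, 3, 5},
    "Transcript 3": {1, 3, 4},
}

# Hypothetical example: which transcripts each EST clone is compatible with.
CLONE_SUPPORT = {
    "est_a": {"Transcript 2"},
    "est_b": {"Transcript 1"},
    "est_c": {"Transcript 1", "Transcript 2", "Transcript 3"},
    "est_d": {"Transcript 2", "Transcript 3"},
}


def split_transcripts(transcript_regions, region):
    """Return (TA, TB): transcripts containing / not containing the region."""
    ta = {name for name, regions in transcript_regions.items() if region in regions}
    tb = set(transcript_regions) - ta
    return ta, tb


def split_clones(clone_support, ta, tb):
    """Return (S1, S2, S3): clones supporting only TA, only TB, or both."""
    s1, s2, s3 = set(), set(), set()
    for clone, transcripts in clone_support.items():
        supports_ta = bool(transcripts & ta)
        supports_tb = bool(transcripts & tb)
        if supports_ta and supports_tb:
            s3.add(clone)
        elif supports_ta:
            s1.add(clone)
        elif supports_tb:
            s2.add(clone)
    return s1, s2, s3
```

Splitting on Region 1 places every transcript in TA, so S2 and S3 are empty and no comparison between variants is possible, which is consistent with the remark that Region 1 is preferably not considered.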
EXAMPLE 5 Identification of cancer specific splice variants of genes over expressed in cancer
A search was performed for EST supported (no mRNA) regions for genes of: (i) known cancer markers; (ii) genes shown to be over-expressed in cancer in published micro-array experiments. Reliable EST supported-regions were defined as supported by a minimum of one of the following: (i) 3 spliced ESTs; or (ii) 2 spliced ESTs from 2 libraries; (iii) 10 unspliced ESTs from 2 libraries; or (iv) 3 libraries.
Actual Marker Examples
The following examples relate to specific actual marker examples.
EXPERIMENTAL EXAMPLES SECTION This Section relates to Examples describing experiments involving these sequences, and illustrative, non- limiting examples of methods, assays and uses thereof. The materials and experimental procedures are explained first, as all experiments used them as a basis for the work that was performed.
The markers of the present invention were tested with regard to their expression in various cancerous and non-cancerous tissue samples. A description of the samples used in the panel is provided in Table 1 below. A description of the samples used in the normal tissue panel is provided in Table 2 below. Tests were then performed as described in the "Materials and Experimental Procedures" section below.
Table 1: Tissue samples in testing panel
Table 2: Tissue samples in normal panel:
Materials and Experimental Procedures
RNA preparation - RNA was obtained from Clontech (Franklin Lakes, NJ 07417, USA, www.clontech.com), BioChain Inst. Inc. (Hayward, CA 94545 USA www.biochain.com), ABS (Wilmington, DE 19801, USA, http://www.absbioreagents.com) or Ambion (Austin, TX 78744 USA, http://www.ambion.com). Alternatively, RNA was generated from tissue samples using TRI-Reagent (Molecular Research Center), according to the manufacturer's instructions. Tissue and RNA samples were obtained from patients or from postmortem. Total RNA samples were treated with DNase I (Ambion) and purified using RNeasy columns (Qiagen).
RT PCR - Purified RNA (1 μg) was mixed with 150 ng Random Hexamer primers (Invitrogen) and 500 μM dNTP in a total volume of 15.6 μl. The mixture was incubated for 5 min at 65 °C and then quickly chilled on ice. Thereafter, 5 μl of 5X SuperScript II first strand buffer (Invitrogen), 2.4 μl 0.1M DTT and 40 units RNasin (Promega) were added, and the mixture was incubated for 10 min at 25 °C, followed by further incubation at 42 °C for 2 min. Then, 1 μl (200 units) of SuperScript II (Invitrogen) was added and the reaction (final volume of 25 μl) was incubated for 50 min at 42 °C and then inactivated at 70 °C for 15 min. The resulting cDNA was diluted 1:20 in TE buffer (10 mM Tris pH=8, 1 mM EDTA pH=8).
Real-Time RT-PCR analysis - cDNA (5 μl), prepared as described above, was used as a template in Real-Time PCR reactions using the SYBR Green I assay (PE Applied Biosystem) with specific primers and UNG Enzyme (Eurogentec or ABI or Roche). The amplification was effected as follows: 50 °C for 2 min, 95 °C for 10 min, and then 40 cycles of 95 °C for 15 sec, followed by 60 °C for 1 min. Detection was performed by using the PE Applied Biosystem SDS 7000. The cycle in which the reactions achieved a threshold level (Ct) of fluorescence was registered and was used to calculate the relative transcript quantity in the RT reactions. The relative quantity was calculated using the equation Q = efficiency^-Ct. The efficiency of the PCR reaction was calculated from a standard curve, created by using serial dilutions of several reverse transcription (RT) reactions.
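The relative-quantity calculation, together with the housekeeping-gene normalization described in the next paragraph, can be sketched as follows. The function names are hypothetical. The conversion from standard-curve slope to efficiency uses the commonly applied relation efficiency = 10^(-1/slope), where the slope is that of Ct plotted against log10 of the input quantity; the text does not state which relation was actually used, so this is an assumption.

```python
from math import prod


def efficiency_from_slope(slope):
    """Per-cycle amplification factor from a standard-curve slope.

    Assumes the slope comes from Ct plotted against log10(input quantity);
    a slope near -3.32 corresponds to doubling in every cycle.
    """
    return 10 ** (-1.0 / slope)


def relative_quantity(efficiency, ct):
    """Q = efficiency^-Ct, as given in the text."""
    return efficiency ** (-ct)


def geometric_mean(values):
    """Geometric mean of a non-empty sequence of positive numbers."""
    return prod(values) ** (1.0 / len(values))


def normalized_quantity(target_quantity, housekeeping_quantities):
    """Divide the target quantity by the geometric mean of the
    housekeeping-gene quantities measured in the same RT reaction."""
    return target_quantity / geometric_mean(housekeeping_quantities)
```

Because each quantity is expressed relative to the housekeeping genes of the same sample, the result is a relative rather than an absolute measure of transcript abundance.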
To minimize inherent differences in the RT reaction, the resulting relative quantities were normalized to the geometric mean of the relative quantities of several housekeeping (HSKP) genes. Schematic summary of quantitative real-time PCR analysis is presented in Figure 3. As shown, the x-axis shows the cycle number. CT = Threshold Cycle point, which is the cycle at which the amplification curve crosses the fluorescence threshold that was set in the experiment. This point is a calculated cycle number in which PCR product signal is above the background level (passive dye ROX) and still in the Geometric/Exponential phase (as shown, once the level of fluorescence crosses the measurement threshold, it has a geometrically increasing phase, during which measurements are most accurate, followed by a linear phase and a plateau phase; for quantitative measurements, the latter two phases do not provide accurate measurements). The y-axis shows the normalized reporter fluorescence. It should be noted that this type of analysis provides relative quantification. The sequences of the housekeeping genes measured in all the examples on breast cancer panel were as follows:
G6PD (GenBank Accession No. NM_000402)
G6PD Forward primer: gaggccgtcaccaagaacat
G6PD Reverse primer: ggacagccggtcagagctc
G6PD-amplicon:
gaggccgtcaccaagaacattcacgagtcctgcatgagccagataggctggaaccgcatcatcgtggagaagcccttcgggagggacct
gcagagctctgaccggctgtcc
SDHA (GenBank Accession No. NM_004168)
SDHA Forward primer: TGGGAACAAGAGGGCATCTG
SDHA Reverse primer: CCACCACTGCATCAAATTCATG
SDHA-amplicon:
TGGGAACAAGAGGGCATCTGCTAAAGTTTCAGATTCCATTTCTGCTCAGTATCCAGT
AGTGGATCATGAATTTGATGCAGTGGTGG
PBGD (GenBank Accession No. BC019323)
PBGD Forward primer: TGAGAGTGATTCGCGTGGG
PBGD Reverse primer: CCAGGGTACGAGGCTTTCAAT
PBGD-amplicon:
TGAGAGTGATTCGCGTGGGTACCCGCAAGAGCCAGCTTGCTCGCATACAGACGGAC
AGTGTGGTGGCAACATTGAAAGCCTCGTACCCTGG
HPRT1 (GenBank Accession No. NM_000194)
HPRT1 Forward primer: TGACACTGGCAAAACAATGCA
HPRT1 Reverse primer: GGTCCTTTTCACCAGCAAGCT
HPRT1-amplicon:
TGACACTGGCAAAACAATGCAGACTTTGCTTTCCTTGGTCAGGCAGTATAATCCAA
AGATGGTCAAGGTCGCAAGCTTGCTGGTGAAAAGGACC
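Each amplicon listed above is expected to begin with its forward primer and to end with the reverse complement of its reverse primer. The following sketch checks this for the G6PD entry; the function names are hypothetical, and whitespace inside the amplicon is treated as a line-wrap artifact.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# G6PD primer and amplicon sequences as listed in the text.
G6PD_FORWARD = "gaggccgtcaccaagaacat"
G6PD_REVERSE = "ggacagccggtcagagctc"
G6PD_AMPLICON = (
    "gaggccgtcaccaagaacattcacgagtcctgcatgagccagataggctggaaccgcatcatcgtggag"
    "aagcccttcgggagggacct gcagagctctgaccggctgtcc"
)


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return sequence.translate(_COMPLEMENT)[::-1]


def primers_match_amplicon(forward, reverse, amplicon):
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer (case-insensitive)."""
    cleaned = "".join(amplicon.split()).upper()
    return cleaned.startswith(forward.upper()) and cleaned.endswith(
        reverse_complement(reverse.upper())
    )
```

The same check can be applied to the other primer pairs as a guard against transcription errors in the listed sequences.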
The sequences of the housekeeping genes measured in all the examples on normal tissue samples panel were as follows:
RPL19 (GenBank Accession No. NM_000981)
RPL19 Forward primer: TGGCAAGAAGAAGGTCTGGTTAG
RPL19 Reverse primer: TGATCAGCCCATCTTTGATGAG
RPL19-amplicon:
TGGCAAGAAGAAGGTCTGGTTAGACCCCAATGAGACCAATGAAATCGCCAATGCCA
ACTCCCGTCAGCAGATCCGGAAGCTCATCAAAGATGGGCTGATCA
TATA box (GenBank Accession No. NM_003194)
TATA box Forward primer: CGGTTTGCTGCGGTAATCAT
TATA box Reverse primer: TTTCTTGCTGCCAGTCTGGAC
TATA box-amplicon:
CGGTTTGCTGCGGTAATCATGAGGATAAGAGAGCCACGAACCACGGCACTGATTTT
CAGTTCTGGGAAAATGGTGTGCACAGGAGCCAAGAGTGAAGAACAGTCCAGACTG
GCAGCAAGAAA
UBC (GenBank Accession No. BC000449)
UBC Forward primer: ATTTGGGTCGCGGTTCTTG
UBC Reverse primer: TGCCTTGACATTCTCGATGGT
UBC-amplicon:
ATTTGGGTCGCGGTTCTTGTTTGTGGATCGCTGTGATCGTCACTTGACAATGCAGAT
CTTCGTGAAGACTCTGACTGGTAAGACCATCACCCTCGAGG
TTGAGCCCAGTGACACCATCGAGAATGTCAAGGCA
SDHA (GenBank Accession No. NM_004168)
SDHA Forward primer: TGGGAACAAGAGGGCATCTG
SDHA Reverse primer: CCACCACTGCATCAAATTCATG
SDHA-amplicon:
TGGGAACAAGAGGGCATCTGCTAAAGTTTCAGATTCCATTTCTGCTCAGTATCCAGT
AGTGGATCATGAATTTGATGCAGTGGTGG
Oligonucleotide-based micro-array experiment protocol-
Microarray fabrication
Microarrays (chips) were printed by pin deposition using the MicroGrid II MGII 600 robot from BioRobotics Limited (Cambridge, UK). 50-mer oligonucleotide target sequences were designed by Compugen Ltd (Tel-Aviv, IL) as described by A. Shoshan et al, "Optical technologies and informatics", Proceedings of SPIE. Vol 4266, pp. 86-95 (2001). The designed oligonucleotides were synthesized and purified by desalting with the Sigma-Genosys system (The Woodlands, TX, US) and all of the oligonucleotides were joined to a C6 amino-modified linker at the 5' end, or being attached directly to CodeLink slides (Cat #25-6700-01. Amersham Bioscience, Piscataway, NJ, US). The 50-mer oligonucleotides, forming the target sequences, were first suspended in Ultra-pure DDW (Cat # 01-866-1 A Kibbutz Beit-Haemek, Israel) to a concentration of 50μM. Before printing the slides, the oligonucleotides were resuspended in 300mM sodium phosphate (pH 8.5) to final concentration of 150mM and printed at 35-40% relative humidity at 21°C. Each slide contained a total of 9792 features in 32 subarrays. Of these features, 4224 features were sequences of interest according to the present invention and negative controls that were printed in duplicate. An additional 288 features (96 target sequences printed in triplicate) contained housekeeping genes from Human Evaluation Library2, Compugen Ltd, Israel. Another 384 features are E. coli spikes 1-6, which are oligos to E. coli genes which are commercially available in the Array Control product (Array control- sense oligo spots, Ambion Inc. Austin, TX. Cat #1781, Lot #112K06).
Post-coupling processing of printed slides
After the spotting of the oligonucleotides to the glass (CodeLink) slides, the slides were incubated for 24 hours in a sealed saturated NaCl humidification chamber (relative humidity 70-75%). Slides were treated for blocking of the residual reactive groups by incubating them in blocking solution at 50°C for 15 minutes (10 ml/slide of buffer containing 0.1M Tris, 50mM ethanolamine, 0.1% SDS). The slides were then rinsed twice with Ultra-pure DDW (double distilled water). The slides were then washed with wash solution (10 ml/slide; 4X SSC, 0.1% SDS) at 50°C for 30 minutes on the shaker. The slides were then rinsed twice with Ultra-pure DDW, followed by drying by centrifugation for 3 minutes at 800 rpm. Next, in order to assist in automatic operation of the hybridization protocol, the slides were treated with Ventana Discovery hybridization station barcode adhesives. The printed slides were loaded on a Bio-Optica (Milan, Italy) hematology staining device and were incubated for 10 minutes in 50ml of 3-Aminopropyl Triethoxysilane (Sigma A3648 lot #122K589). Excess fluid was dried and slides were then incubated for three hours in 20 mm Hg in a dark vacuum desiccator (Pelco 2251, Ted Pella, Inc. Redding CA).
The following protocol was then followed with the Genisphere 900-RP (random primer), with mini elute columns on the Ventana Discovery HybStation™, to perform the microarray experiments. Briefly, the protocol was performed as described with regard to the instructions and information provided with the device itself. The protocol included cDNA synthesis and labeling. cDNA concentration was measured with the TBS-380 (Turner Biosystems. Sunnyvale, CA.) PicoFluor, which is used with the OliGreen ssDNA Quantitation reagent and kit.
Hybridization was performed with the Ventana Hybridization device, according to the provided protocols (Discovery Hybridization Station, Tucson, AZ). The slides were then scanned with GenePix 4000B dual laser scanner from Axon Instruments Inc, and analyzed by GenePix Pro 5.0 software. Schematic summary of the oligonucleotide based microarray fabrication and the experimental flow is presented in Figures 4 and 5. Briefly, as shown in Figure 4, DNA oligonucleotides at 25uM were deposited (printed) onto Amersham 'CodeLink' glass slides generating a well defined 'spot'. These slides are covered with a long-chain, hydrophilic polymer chemistry that creates an active 3-D surface that covalently binds the DNA oligonucleotides' 5'-end via the C6-amine modification. This binding ensures that the full length of the DNA oligonucleotides is available for hybridization to the cDNA and also allows lower background, high sensitivity and reproducibility. Figure 5 shows a schematic method for performing the microarray experiments. It should be noted that stages on the left-hand or right-hand side may optionally be performed in any order, including in parallel, until stage 4 (hybridization). Briefly, on the left-hand side, the target oligonucleotides are being spotted on a glass microscope slide (although optionally other materials could be used) to form a spotted slide (stage 1). On the right-hand side, control sample RNA and cancer sample RNA are Cy3 and Cy5 labeled, respectively (stage 2), to form labeled probes. It should be noted that the control and cancer samples come from corresponding tissues (for example, normal prostate tissue and cancerous prostate tissue).
Furthermore, the tissue from which the RNA was taken is indicated below in the specific examples of data for particular clusters, with regard to overexpression of an oligonucleotide from a "chip" (microarray), as for example "prostate" for chips in which prostate cancerous tissue and normal tissue were tested as described above. In stage 3, the probes are mixed. In stage 4, hybridization is performed to form a processed slide. In stage 5, the slide is washed and scanned to form an image file, followed by data analysis in stage 6.
DESCRIPTION FOR CLUSTER T10888 Cluster T10888 features 4 transcript(s) and 8 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively, the sequences themselves are given at the end of the application. The selected protein variants are given in table 3. Table 1 - Transcripts of interest
Table 2 - Segments of interest
These sequences are variants of the known protein Carcinoembryonic antigen-related cell adhesion molecule 6 precursor (SwissProt accession identifier CEA6_HUMAN; known also according to the synonyms Normal cross-reacting antigen; Nonspecific crossreacting antigen; CD66c antigen), SEQ ID NO: 13, referred to herein as the previously known protein. The sequence for protein Carcinoembryonic antigen-related cell adhesion molecule 6 precursor is given at the end of the application, as "Carcinoembryonic antigen-related cell adhesion molecule 6 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Protein Carcinoembryonic antigen-related cell adhesion molecule 6 precursor localization is believed to be Attached to the membrane by a GPI-anchor. The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Cancer. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Immunostimulant. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Imaging agent; Anticancer; Immunostimulant; Immunoconjugate; Monoclonal antibody, murine; Antisense therapy; antibody. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: signal transduction; cell-cell signaling, which are annotation(s) related to Biological Process; and integral plasma membrane protein, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster T10888 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the right hand column of the table and the numbers on the y-axis of Figure 6 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
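The "parts per million" measure defined above is a simple ratio. A minimal sketch, with hypothetical function and parameter names, is:

```python
def parts_per_million(cluster_ests, category_ests):
    """Weighted ESTs of a cluster in one category, per million weighted
    ESTs of all clusters in that category. Names are illustrative."""
    if category_ests <= 0:
        raise ValueError("category_ests must be positive")
    return 1_000_000 * cluster_ests / category_ests
```

For instance, 5 weighted ESTs from a cluster in a tissue category holding 200,000 weighted ESTs in total corresponds to 25 parts per million; these figures are illustrative and are not taken from Table 5 or Figure 6.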
Overall, the following results were obtained as shown with regard to the histograms in Figure 6 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: colorectal cancer, a mixture of malignant tumors from different tissues, pancreas carcinoma and gastric carcinoma. Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
As noted above, cluster T10888 features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Carcinoembryonic antigen-related cell adhesion molecule 6 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein T10888_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T10888_PEA_1_T1. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T10888_PEA_1_P2 and CEA6_HUMAN: 1. An isolated chimeric polypeptide encoding for T10888_PEA_1_P2, comprising a first amino acid sequence being at least 90 % homologous to
MGPPSAPPCRLHWWKEVLLTASLLTFWNPPTTAKLTffiSTPFNVAEGKEVLLLAJrINLP QNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGrøTIYPNASLLIQNNTQNDTG FYTLQVIKSDLVNEEATGQFHV TELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYL WWVNGQSLPVSPP QLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLY GPDVPTISPSKANYRPGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITNNNSGS YMCQAHNSATGLNRTTVTMITVS corresponding to amino acids 1 - 319 of CEA6_HUMAN, which also corresponds to amino acids 1 - 319 of T10888_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DWTRP corresponding to amino acids 320 - 324 of T10888_PEA_1_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T10888_PEA_1_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DWTRP in T10888_PEA_1_P2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T10888_PEA_1_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Amino acid mutations
Variant protein T10888_PEA_1_P2 is encoded by the following transcript(s): T10888_PEA_1_T1, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T10888_PEA_1_T1 is shown in bold; this coding portion starts at position 151 and ends at position 1122. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Nucleic acid SNPs
Variant protein T10888_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T10888_PEA_1_T4. An alignment is given to the known protein (Carcinoembryonic antigen- related cell adhesion molecule 6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T10888_PEA_1_P4 and CEA6_HUMAN: 1. An isolated chimeric polypeptide encoding for T10888_PEA_1_P4, comprising a first amino acid sequence being at least 90 % homologous to MGPPSAPPCPJ.ITΛ WKEVLLTASLLTTW u PTTAKLTffiSTPF ^AEGKEVLLLAHNLP QNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTG FYTLQVIKSDLVNEEATGQFHNYPELPKPSISSNNSNP\EDI^AVAFTCEPEVQNTTYL WWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNNL corresponding to amino acids 1 - 234 of CEA6_HUMAN, which also corresponds to amino acids 1 - 234 of T10888_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLLSSQLWPPSASRLECWPGWL corresponding to amino acids 235 - 256 of T10888_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T10888_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLLSSQLWPPSASRLECWPGWL in T10888_PEA_1_P4.
Comparison report between T10888_PEA_1_P4 and Q13774 (SEQ ID NO:829): 1. An isolated chimeric polypeptide encoding for T10888_PEA_1_P4, comprising a first amino acid sequence being at least 90 % homologous to
MGPPSAPPCPJ,HNPWI EVLLTASLLTFWNPPTTAKLTffiSTPFNNAEGKEVLLLAHNLP QNRIGYSWYKGERNDGNSLJNGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTG FYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYL \VWVNGQSLPVSPPJLQLSNGNMTLTLLSV1 . NDAGSYECEIQNPASANRSDPVTLNVL corresponding to amino acids 1 - 234 of Q13774, which also corresponds to amino acids 1 - 234 of T10888_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLLSSQLWPPSASRLECWPGWL corresponding to amino acids 235 - 256 of T10888_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T10888_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLLSSQLWPPSASRLECWPGWL in T10888_PEA_1_P4.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T10888_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Amino acid mutations
Variant protein T10888_PEA_1_P4 is encoded by the following transcript(s): T10888_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T10888_PEA_1_T4 is shown in bold; this coding portion starts at position 151 and ends at position 918. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
Variant protein T10888_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T10888_PEA_1_T5. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T10888_PEA_1_P5 and CEA6_HUMAN: 1. An isolated chimeric polypeptide encoding for T10888_PEA_1_P5, comprising a first amino acid sequence being at least 90 % homologous to MGPPSAPPCI<LH WKE ,LTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLP QNRIGYSWYKGERVDGNSLIVGYNIGTQQATPGPAYSGRETIYPNASLLIQNNTQNDTG FYTLQVIKSDLVNEE ATGQFHNYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYL WWVNGQSLPVSPRLQLSNGNMTLTLLS\^KRNDAGSYECEIQNPASANRSDPVTLNVLY GPDWTISPSKANYRPGENLNLSCHAASNPPAQYSWFΓNGTFQQSTQELFIPNIT\ NNSGS YMCQAHNSATGLNPTTVTMITVSG corresponding to amino acids 1 - 320 of CEA6_HUMAN, which also corresponds to amino acids 1 - 320 of T10888_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KWIHEALASHFQVESGSQRJ ARKKFSFPTCVQGAHANPKFSPEPSQFTSADSFPLVFLFF WFCFLISHV corresponding to amino acids 321 - 390 of T10888_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T10888_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWIHEALASHFQVESGSQI^ARKKPSFPTCVQGAHA PKFSPEPSQFTSADSFPLNFLFF WFCFLISHV in T10888_PEA_1_P5.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because although both signal-peptide prediction programs agree that this protein has a signal peptide, both trans-membrane region prediction programs predict that this protein has a trans-membrane region downstream of this signal peptide. Variant protein T10888_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Amino acid mutations
Variant protein T10888_PEA_1_P5 is encoded by the following transcript(s): T10888_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T10888_PEA_1_T5 is shown in bold; this coding portion starts at position 151 and ends at position 1320. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Nucleic acid SNPs
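The coding-portion coordinates given for each transcript can be checked against the stated protein lengths by simple arithmetic. The sketch below assumes that positions are 1-based and inclusive and that the stated range does not include the stop codon; neither assumption is stated in the text, and the function name is hypothetical.

```python
def protein_length_from_cds(start, end):
    """Number of codons spanned by a coding portion.

    Assumes 1-based, inclusive nucleotide positions and a range that
    excludes the stop codon.
    """
    nucleotides = end - start + 1
    if nucleotides <= 0 or nucleotides % 3 != 0:
        raise ValueError("coding portion is not a positive multiple of 3")
    return nucleotides // 3
```

Under these assumptions, positions 151 to 1122 of T10888_PEA_1_T1 give 324 codons, positions 151 to 918 of T10888_PEA_1_T4 give 256 codons, and positions 151 to 1320 of T10888_PEA_1_T5 give 390 codons, which agrees with the last amino acid positions (324, 256 and 390) cited in the comparison reports for variants P2, P4 and P5.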
Variant protein T10888_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T10888_PEA_1_T6. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
Comparison report between T10888_PEA_1_P6 and CEA6_HUMAN:
1. An isolated chimeric polypeptide encoding for T10888_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVY corresponding to amino acids 1 - 141 of CEA6_HUMAN, which also corresponds to amino acids 1 - 141 of T10888_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence REYFHMTSGCWGSVLLPTYGIVRPGLCLWPSLHYILYQGLDI corresponding to amino acids 142 - 183 of T10888_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T10888_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence REYFHMTSGCWGSVLLPTYGIVRPGLCLWPSLHYILYQGLDI in T10888_PEA_1_P6.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T10888_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Amino acid mutations
Variant protein T10888_PEA_1_P6 is encoded by the following transcript(s): T10888_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T10888_PEA_1_T6 is shown in bold; this coding portion starts at position 151 and ends at position 699. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Nucleic acid SNPs
As noted above, cluster T10888 features 8 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster T10888_PEA_1_node_11 according to the present invention is supported by 57 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T1 and T10888_PEA_1_T5. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Segment cluster T10888_PEA_1_node_12 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T5. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Segment cluster T10888_PEA_1_node_17 according to the present invention is supported by 160 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T1 and T10888_PEA_1_T4. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster T10888_PEA_1_node_4 according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T1, T10888_PEA_1_T4, T10888_PEA_1_T5 and T10888_PEA_1_T6. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Segment cluster T10888_PEA_1_node_6 according to the present invention is supported by 81 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T1, T10888_PEA_1_T4, T10888_PEA_1_T5 and T10888_PEA_1_T6. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster T10888_PEA_1_node_7 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T6. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster T10888_PEA_1_node_9 according to the present invention is supported by 72 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T1, T10888_PEA_1_T4 and T10888_PEA_1_T5. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description. Segment cluster T10888_PEA_1_node_15 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T4. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: /tmp/tM4EgaoKvm/vuztUrlRc7:CEA6_HUMAN
Sequence documentation:
Alignment of: T10888_PEA_1_P2 x CEA6_HUMAN
Alignment segment 1/1:
Quality: 3163.00 Escore: 0 Matching length: 319 Total length: 319 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLYGPDVPTISPSKANYR 250
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLYGPDVPTISPSKANYR 250
251 PGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGSYMCQ 300
251 PGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGSYMCQ 300
301 AHNSATGLNRTTVTMITVS 319
301 AHNSATGLNRTTVTMITVS 319
Sequence name: /tmp/Yjllgj7TCe/PgdufzLOlW:CEA6_HUMAN
Sequence documentation:
Alignment of: T10888_PEA_1_P4 x CEA6_HUMAN
Alignment segment 1/1:
Quality: 2310.00 Escore: 0 Matching length: 234 Total length: 234 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL 234
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL 234
Sequence name: /tmp/Yjllgj7TCe/PgdufzL01:Q13774
Sequence documentation:
Alignment of: T10888_PEA_1_P4 x Q13774
Alignment segment 1/1:
Quality: 2310.00 Escore: 0 Matching length: 234 Total length: 234 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL 234
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL 234
Sequence name: /tmp/x5xDBacdpj/rTXRGepv3y:CEA6_HUMAN
Sequence documentation:
Alignment of: T10888_PEA_1_P5 x CEA6_HUMAN
Alignment segment 1/1:
Quality: 3172.00 Escore: 0 Matching length: 320 Total length: 320 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLYGPDVPTISPSKANYR 250
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLYGPDVPTISPSKANYR 250
251 PGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGSYMCQ 300
251 PGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGSYMCQ 300
301 AHNSATGLNRTTVTMITVSG 320
301 AHNSATGLNRTTVTMITVSG 320
Sequence name: /tmp/VAhvYFeatq/QNEM573uCo:CEA6_HUMAN
Sequence documentation:
Alignment of: T10888_PEA_1_P6 x CEA6_HUMAN
Alignment segment 1/1:
Quality: 1393.00 Escore: 0 Matching length: 143 Total length: 143 Matching Percent Similarity: 99.30 Matching Percent Identity: 99.30 Total Percent Similarity: 99.30 Total Percent Identity: 99.30 Gaps: 0
Alignment:
1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYRE 143
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPE 143
Alignment of: T10888_PEA_1_P6 x CEA6_HUMAN
Alignment segment 1/1:
Quality: 101.00 Escore: 0 Matching length: 141 Total length: 183 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 77.05 Total Percent Identity: 77.05 Gaps: 1
Alignment:
1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYREYFHMTSG 150
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVY......... 141
151 CWGSVLLPTYGIVRPGLCLWPSLHYILYQGLDI 183
141 ................................. 141
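The identity percentages reported for the two T10888_PEA_1_P6 alignments can be reproduced with a short calculation. The following is a minimal sketch only, written under the assumption that "Matching Percent Identity" is the share of identical residues among aligned (non-gap) columns and "Total Percent Identity" is the share among all columns; the alignment program's actual scoring is not described in this section. The function name `percent_identity` and the use of `.` as the gap symbol are illustrative choices, not part of the original.

```python
def percent_identity(query, subject, gap="."):
    """Return (matching %, total %) identity for two equal-length aligned rows.

    Assumed definitions (not taken from the alignment tool itself):
      matching % = identical columns / columns where neither row has a gap
      total %    = identical columns / all columns
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    pairs = list(zip(query, subject))
    aligned = [(q, s) for q, s in pairs if q != gap and s != gap]
    identical = sum(1 for q, s in aligned if q == s)
    return 100.0 * identical / len(aligned), 100.0 * identical / len(pairs)


# Amino acids 1 - 141 of CEA6_HUMAN, as shown in the alignments above.
CEA6_1_141 = (
    "MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE"
    "VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET"
    "IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVY"
)
# Unique tail of T10888_PEA_1_P6 (amino acids 142 - 183).
P6_TAIL = "REYFHMTSGCWGSVLLPTYGIVRPGLCLWPSLHYILYQGLDI"
P6 = CEA6_1_141 + P6_TAIL

# Second alignment: the tail is aligned against a gap in CEA6_HUMAN.
matching, total = percent_identity(P6, CEA6_1_141 + "." * len(P6_TAIL))
print(f"gapped alignment: matching {matching:.2f}%, total {total:.2f}%")

# First alignment: 143 ungapped columns, residues 142 - 143 of CEA6_HUMAN are "PE".
ungapped, _ = percent_identity(P6[:143], CEA6_1_141 + "PE")
print(f"ungapped alignment: {ungapped:.2f}%")
```

Under these assumed definitions the gapped alignment gives 141/141 matching and 141/183 total, and the ungapped alignment gives 142/143, in line with the figures reported above.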
Expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 (T10888) transcripts which are detectable by amplicon as depicted in sequence name T10888junc11-17 in normal and cancerous breast tissues
Expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 transcripts detectable by or according to junc11-17, T10888junc11-17 amplicon(s) and T10888junc11-17F and T10888junc11-17R primers was measured by real time PCR. In parallel the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and G6PD (GenBank Accession No. NM_000402; G6PD amplicon) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 56-60, 63-67, Table 1, "Tissue samples in testing panel", above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 7 is a histogram showing over-expression of the above-indicated CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 transcripts in cancerous breast samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained. The number and percentage of samples that exhibit at least 5 fold over-expression, out of the total number of samples tested, is indicated at the bottom.
As is evident from Figure 7, the expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 56-60, 63-67, Table 1, "Tissue samples in testing panel"). Notably an over-expression of at least 5 fold was found in 19 out of 28 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 transcripts detectable by the above amplicon(s) in breast cancer samples versus the normal tissue samples was determined by T test as 2.00E-03. A threshold of 5 fold over-expression was found to differentiate between cancer and normal samples with a P value of 8.44E-03 as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: T10888junc11-17F forward primer; and T10888junc11-17R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: T10888junc11-17.
T10888junc11-17F (SEQ ID NO:830) CCAGCAATCCACACAAGAGCT
T10888junc11-17R (SEQ ID NO:831) CAGGGTCTGGTCCAATCAGAG
T10888junc11-17 (SEQ ID NO:832) CCAGCAATCCACACAAGAGCTCTTTATCCCCAACATCACTGTGAATAATAGCGGATCCTATATGTGCCAAGCCCATAACTCAGCCACTGGCCTCAATAGGACCACAGTCACGATGATCACAGTCTCTGATTGGACCAGACCCTG
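The normalization described for the real time PCR data (division by the geometric mean of the housekeeping genes, then by the median of the normal samples) can be illustrated as follows. This is a minimal sketch under stated assumptions: the function name `fold_upregulation`, the data layout, the sample labels and every numeric quantity are hypothetical placeholders chosen for illustration, and none of them are measurements from the experiment described above.

```python
from statistics import geometric_mean, median


def fold_upregulation(target, housekeeping, normal_samples):
    """Normalize target quantities and express them relative to normal samples.

    target:         {sample: quantity of the amplicon of interest}
    housekeeping:   {gene: {sample: quantity}} for the housekeeping genes
    normal_samples: sample names used as the normal reference group
    """
    # Step 1: divide by the geometric mean of the housekeeping-gene quantities.
    normalized = {
        sample: qty / geometric_mean([hk[sample] for hk in housekeeping.values()])
        for sample, qty in target.items()
    }
    # Step 2: divide by the median normalized quantity of the normal samples.
    reference = median(normalized[s] for s in normal_samples)
    return {sample: value / reference for sample, value in normalized.items()}


# Hypothetical quantities (arbitrary units); not data from the application.
target = {"N1": 2.0, "N2": 4.0, "N3": 8.0, "T1": 80.0}
housekeeping = {
    "HK_A": {"N1": 2.0, "N2": 2.0, "N3": 2.0, "T1": 2.0},
    "HK_B": {"N1": 8.0, "N2": 8.0, "N3": 8.0, "T1": 8.0},
}
fold = fold_upregulation(target, housekeeping, normal_samples=["N1", "N2", "N3"])

# Samples meeting the 5 fold over-expression threshold used in the text.
over_expressed = [s for s, f in fold.items() if f >= 5.0]
print(fold, over_expressed)
```

The geometric mean is used here because the text specifies it; it keeps a single highly expressed housekeeping gene from dominating the normalization factor.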
Expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 T10888 transcripts which are detectable by amplicon as depicted in sequence name T10888junc11-17 in different normal tissues
Expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 transcripts detectable by or according to T10888junc11-17 amplicon(s) and T10888junc11-17F and T10888junc11-17R primers was measured by real time PCR. In parallel the expression of four housekeeping genes - RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), UBC (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 2, "Tissue samples in normal panel", above), to obtain a value of relative expression of each sample relative to the median of the ovary samples. Primers and amplicon are as above. The results are presented in Figure 8, demonstrating the expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 T10888 transcripts, which are detectable by amplicon as depicted in sequence name T10888junc11-17, in different normal tissues.
DESCRIPTION FOR CLUSTER T39971
Cluster T39971 features 4 transcript(s) and 28 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Vitronectin precursor (SwissProt accession identifier VTNC_HUMAN; known also according to the synonyms Serum spreading factor; S-protein; V75), SEQ ID NO: 50, referred to herein as the previously known protein. Protein Vitronectin precursor is known or believed to have the following function(s): Vitronectin is a cell adhesion and spreading factor found in serum and tissues. Vitronectin interacts with glycosaminoglycans and proteoglycans. It is recognized by certain members of the integrin family and serves as a cell-to-substrate adhesion molecule. It is an inhibitor of the membrane-damaging effect of the terminal cytolytic complement pathway. The sequence for protein Vitronectin precursor is given at the end of the application, as "Vitronectin precursor amino acid sequence" (SEQ ID NO:50). Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Protein Vitronectin precursor localization is believed to be Extracellular. The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Cancer, melanoma. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Alphavbeta3 integrin antagonist; Apoptosis agonist. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anticancer. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: immune response; cell adhesion, which are annotation(s) related to Biological Process; protein binding; heparin binding, which are annotation(s) related to Molecular Function; and extracellular space, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink >.
Cluster T39971 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the right hand column of the table and the numbers on the y-axis of Figure 9 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
Overall, the following results were obtained as shown with regard to the histograms in Figure 9 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: liver cancer, lung malignant tumors and pancreas carcinoma.
Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
As noted above, cluster T39971 features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Vitronectin precursor. A description of each variant protein according to the present invention is now provided.
Variant protein T39971_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T39971_T5. An alignment is given to the known protein (Vitronectin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T39971_P6 and VTNC_HUMAN:
1. An isolated chimeric polypeptide encoding for T39971_P6, comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKG corresponding to amino acids 1 - 276 of VTNC_HUMAN, which also corresponds to amino acids 1 - 276 of T39971_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TQGWGD corresponding to amino acids 277 - 283 of T39971_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T39971_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TQGWGD in T39971_P6.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T39971_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Amino acid mutations
Variant protein T39971_P6 is encoded by the following transcript(s): T39971_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T39971_T5 is shown in bold; this coding portion starts at position 756 and ends at position 1604. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Nucleic acid SNPs
Variant protein T39971_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T39971_T10. An alignment is given to the known protein (Vitronectin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T39971_P9 and VTNC_HUMAN:
1. An isolated chimeric polypeptide encoding for T39971_P9, comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRT corresponding to amino acids 1 - 325 of VTNC_HUMAN, which also corresponds to amino acids 1 - 325 of T39971_P9, and a second amino acid sequence being at least 90% homologous to SGMAPRPSLAKKQRFRHRNRKGYRSQRGHSRGRNQNSRRPSRATWLSLFSSEESNLGANNYDDYRMDWLVPATCEPIQSVFFFSGDKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 357 - 478 of VTNC_HUMAN, which also corresponds to amino acids 326 - 447 of T39971_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of T39971_P9, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TS, having a structure as follows: a sequence starting from any of amino acid numbers 325-x to 325; and ending at any of amino acid numbers 326 + ((n-2) - x), in which x varies from 0 to n-2.
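The edge-portion definition can be read as a sliding window of length n that always contains the two junction residues (325 and 326 for T39971_P9). The following is a minimal sketch of that reading; the function name `edge_windows` is hypothetical, and positions are assumed to be 1-based and inclusive, as in the numbering used above.

```python
def edge_windows(left_junction, n):
    """Yield (start, end) positions of every length-n window spanning the junction.

    left_junction is the last residue before the junction (325 for T39971_P9);
    the residue after the junction is left_junction + 1. Following the formula
    in the text, x runs from 0 to n-2, the window starts at left_junction - x
    and ends at (left_junction + 1) + ((n - 2) - x).
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    for x in range(0, n - 1):  # x = 0 .. n-2
        start = left_junction - x
        end = (left_junction + 1) + ((n - 2) - x)
        yield start, end


# Windows of length 10 around the T-S junction of T39971_P9.
windows = list(edge_windows(325, 10))
for start, end in windows:
    print(start, end)
```

For n = 10 this gives nine windows, from positions 325-334 (x = 0) to 317-326 (x = 8), each containing both residue 325 and residue 326.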
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T39971_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Amino acid mutations
Variant protein T39971_P9 is encoded by the following transcript(s): T39971_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T39971_T10 is shown in bold; this coding portion starts at position 756 and ends at position 2096. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
Variant protein T39971 JP11 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by rranscript(s) T39971JT12. An alignment is given to the known protein (Vitronectin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T39971JP1 1 and VTNCJHUMAN: l.An isolated chimeric polypeptide encoding for T39971 JP11, comprising a first amino acid sequence being at least 90 % homologous to
MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRTS corresponding to amino acids 1 - 326 of VTNC_HUMAN, which also corresponds to amino acids 1 - 326 of T39971_P11, and a second amino acid sequence being at least 90 % homologous to DKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 442 - 478 of VTNC_HUMAN, which also corresponds to amino acids 327 - 363 of T39971_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of T39971_P11, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SD, having a structure as follows: a sequence starting from any of amino acid numbers 326-x to 326; and ending at any of amino acid numbers 327 + ((n-2) - x), in which x varies from 0 to n-2.
Comparison report between T39971_P11 and Q9BSH7 (SEQ ID NO:833):
1. An isolated chimeric polypeptide encoding for T39971_P11, comprising a first amino acid sequence being at least 90 % homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRTS corresponding to amino acids 1 - 326 of Q9BSH7, which also corresponds to amino acids 1 - 326 of T39971_P11, and a second amino acid sequence being at least 90 % homologous to DKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 442 - 478 of Q9BSH7, which also corresponds to amino acids 327 - 363 of T39971_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of T39971_P11, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SD, having a structure as follows: a sequence starting from any of amino acid numbers 326-x to 326; and ending at any of amino acid numbers 327 + ((n-2) - x), in which x varies from 0 to n-2.
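The edge-portion definition above can be read as a sliding window of length n that always contains residues 326 and 327 (the "SD" pair at the junction). The following is a minimal sketch of that reading, not part of the application; the function names are hypothetical, positions are taken to be 1-based and inclusive, and windows that would run past the end of the 363-residue sequence are simply skipped.

```python
# Illustrative sketch only: enumerates the "edge portion" windows described above.
# Function names are hypothetical; positions are 1-based and inclusive.

HEAD = (  # residues 1-326, shared with VTNC_HUMAN
    "MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC"
    "CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS"
    "DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP"
    "AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW"
    "GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI"
    "PDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSA"
    "VFEHFAMMQRDSWEDIFELLFWGRTS"
)
TAIL = "DKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL"  # residues 327-363
P11 = HEAD + TAIL
JUNCTION = 326  # last residue of the first segment


def edge_windows(n, junction=JUNCTION):
    """Return (start, end) pairs: start = junction - x, end = junction + 1 + (n - 2) - x."""
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    return [(junction - x, junction + 1 + (n - 2) - x) for x in range(n - 1)]


def edge_peptides(seq, n, junction=JUNCTION):
    """Extract the windows that fit entirely inside seq."""
    peptides = []
    for start, end in edge_windows(n, junction):
        if start >= 1 and end <= len(seq):
            peptides.append(seq[start - 1:end])
    return peptides


if __name__ == "__main__":
    for start, end in edge_windows(10):
        print(start, end, P11[start - 1:end])
```

Every window has length n by construction, since end - start + 1 = n for any x.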
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein T39971_P11 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Amino acid mutations
Variant protein T39971_P11 is encoded by the following transcript(s): T39971_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T39971_T12 is shown in bold; this coding portion starts at position 756 and ends at position 1844. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Nucleic acid SNPs
Variant protein T39971_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T39971_T16. An alignment is given to the known protein (Vitronectin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T39971_P12 and VTNC_HUMAN:
1. An isolated chimeric polypeptide encoding for T39971_P12, comprising a first amino acid sequence being at least 90 % homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFK corresponding to amino acids 1 - 223 of VTNC_HUMAN, which also corresponds to amino acids 1 - 223 of T39971_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VPGAVGQGRKHLGRV corresponding to amino acids 224 - 238 of T39971_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T39971_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPGAVGQGRKHLGRV in T39971_P12.
Comparison report between T39971_P12 and Q9BSH7:
1. An isolated chimeric polypeptide encoding for T39971_P12, comprising a first amino acid sequence being at least 90 % homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFK corresponding to amino acids 1 - 223 of Q9BSH7, which also corresponds to amino acids 1 - 223 of T39971_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VPGAVGQGRKHLGRV corresponding to amino acids 224 - 238 of T39971_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T39971_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPGAVGQGRKHLGRV in T39971_P12.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein T39971_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Amino acid mutations
Variant protein T39971_P12 is encoded by the following transcript(s): T39971_T16, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T39971_T16 is shown in bold; this coding portion starts at position 756 and ends at position 1469. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Nucleic acid SNPs
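As a consistency check on the coding-portion coordinates quoted for the two transcripts (756 to 1844, and 756 to 1469), the short sketch below converts a coordinate range into a codon count. It assumes the positions are 1-based and inclusive; the function name is hypothetical. Under that assumption the ranges give 363 and 238 codons, matching the stated lengths of the two variant proteins, which suggests (but does not prove) that the quoted end positions exclude the stop codon.

```python
# Illustrative sketch only; assumes 1-based, inclusive transcript coordinates.

def codon_count(cds_start, cds_end):
    """Number of complete codons in an inclusive coordinate range."""
    length = cds_end - cds_start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("coding portion length must be a positive multiple of 3")
    return length // 3


if __name__ == "__main__":
    # Coordinates as quoted in the text for the two transcripts.
    print(codon_count(756, 1844))
    print(codon_count(756, 1469))
```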
As noted above, cluster T39971 features 28 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster T39971_node_0 according to the present invention is supported by 76 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Segment cluster T39971_node_18 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T16. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Segment cluster T39971_node_21 according to the present invention is supported by 99 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12 and T39971_T5. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster T39971_node_22 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T5. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Segment cluster T39971_node_23 according to the present invention is supported by 101 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12 and T39971_T5. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster T39971_node_31 according to the present invention is supported by 94 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10 and T39971_T5. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster T39971_node_33 according to the present invention is supported by 77 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12 and T39971_T5. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Segment cluster T39971_node_7 according to the present invention is supported by 87 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster T39971_node_1 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Segment cluster T39971_node_10 according to the present invention is supported by 77 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Segment cluster T39971_node_11 according to the present invention is supported by 79 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster T39971_node_12 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster T39971_node_15 according to the present invention is supported by 79 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster T39971_node_16 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster T39971_node_17 according to the present invention is supported by 86 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster T39971_node_26 according to the present invention is supported by 85 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T5. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Segment cluster T39971_node_27 according to the present invention is supported by 90 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T5. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Segment cluster T39971_node_28 according to the present invention can be found in the following transcript(s): T39971_T10 and T39971_T5. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Segment cluster T39971_node_29 according to the present invention is supported by 99 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10 and T39971_T5. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Segment cluster T39971_node_3 according to the present invention is supported by 78 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Segment cluster T39971_node_30 according to the present invention can be found in the following transcript(s): T39971_T10 and T39971_T5. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Segment cluster T39971_node_34 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12 and T39971_T5. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Segment cluster T39971_node_35 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12 and T39971_T5. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Segment cluster T39971_node_36 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12 and T39971_T5. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Segment cluster T39971_node_4 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Segment cluster T39971_node_5 according to the present invention is supported by 80 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Segment cluster T39971_node_8 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
Segment cluster T39971_node_9 according to the present invention can be found in the following transcript(s): T39971_T10, T39971_T12, T39971_T16 and T39971_T5. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: /tmp/xkraCL20cZ/43L7YcPH7x: VTNC_HUMAN
Sequence documentation:
Alignment of: T39971_P6 x VTNC_HUMAN
Alignment segment 1/1:
Quality: 2774.00 Escore: 0 Matching length: 278 Total length: 278 Matching Percent Similarity: 99.64 Matching Percent Identity: 99.64 Total Percent Similarity: 99.64 Total Percent Identity: 99.64 Gaps: 0
Alignment:
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC  50
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC  50

 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100
 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100

101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150
101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150

151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200
151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200

201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250
201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250

251 PDNVDAALALPAHSYSGRERVYFFKGTQ 278
251 PDNVDAALALPAHSYSGRERVYFFKGKQ 278
Sequence name: /tmp/X4DeeuSlB4/yMubSR5FPs: VTNC_HUMAN
Sequence documentation:
Alignment of: T39971_P9 x VTNC_HUMAN
Alignment segment 1/1:
Quality: 4430.00 Escore: 0 Matching length: 447 Total length: 478 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 93.51 Total Percent Identity: 93.51 Gaps: 1
Alignment:
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC  50
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC  50

 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100
 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100

101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150
101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150

151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200
151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200

201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250
201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250

251 PDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSA 300
251 PDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSA 300

301 VFEHFAMMQRDSWEDIFELLFWGRT......................... 325
301 VFEHFAMMQRDSWEDIFELLFWGRTSAGTRQPQFISRDWHGVPGQVDAAM 350

326 ......SGMAPRPSLAKKQRFRHRNRKGYRSQRGHSRGRNQNSRRPSRAT 369
351 AGRIYISGMAPRPSLAKKQRFRHRNRKGYRSQRGHSRGRNQNSRRPSRAT 400

370 WLSLFSSEESNLGANNYDDYRMDWLVPATCEPIQSVFFFSGDKYYRVNLR 419
401 WLSLFSSEESNLGANNYDDYRMDWLVPATCEPIQSVFFFSGDKYYRVNLR 450

420 TRRVDTVDPPYPRSIAQYWLGCPAPGHL 447
451 TRRVDTVDPPYPRSIAQYWLGCPAPGHL 478
Sequence name: /tmp/jvplVtnxNy/wxNSeFVZZw: VTNC_HUMAN
Sequence documentation:
Alignment of: T39971_P11 x VTNC_HUMAN
Alignment segment 1/1:
Quality: 3576.00 Escore: 0 Matching length: 363 Total length: 478 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 75.94 Total Percent Identity: 75.94 Gaps: 1
Alignment:
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC  50
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC  50

 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100
 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100

101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150
101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150

151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200
151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200

201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250
201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250

251 PDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSA 300
251 PDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSA 300

301 VFEHFAMMQRDSWEDIFELLFWGRTS........................ 326
301 VFEHFAMMQRDSWEDIFELLFWGRTSAGTRQPQFISRDWHGVPGQVDAAM 350

326 .................................................. 326
351 AGRIYISGMAPRPSLAKKQRFRHRNRKGYRSQRGHSRGRNQNSRRPSRAT 400

327 .........................................DKYYRVNLR 335
401 WLSLFSSEESNLGANNYDDYRMDWLVPATCEPIQSVFFFSGDKYYRVNLR 450

336 TRRVDTVDPPYPRSIAQYWLGCPAPGHL 363
451 TRRVDTVDPPYPRSIAQYWLGCPAPGHL 478
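The percentage figures in the alignment headers above appear to follow from simple ratios: matching percent identity is the number of identical positions over the matching length, and total percent identity is the same count over the total length. The sketch below reproduces the reported values under that assumption. The function name is hypothetical, and the identical-position counts (277, 447, 363) are inferred from the alignments rather than stated in the headers.

```python
# Illustrative sketch only; assumes percent identity = identical / length * 100,
# rounded to two decimals. Identical-position counts are inferred from the
# alignments shown above (one mismatch in the T39971_P6 alignment, none elsewhere).

def percent_identity(identical, length):
    """Percentage of identical positions over a given alignment length."""
    if length <= 0:
        raise ValueError("length must be positive")
    return round(100.0 * identical / length, 2)


# (name, identical positions, matching length, total length)
ALIGNMENTS = [
    ("T39971_P6 x VTNC_HUMAN", 277, 278, 278),
    ("T39971_P9 x VTNC_HUMAN", 447, 447, 478),
    ("T39971_P11 x VTNC_HUMAN", 363, 363, 478),
]

if __name__ == "__main__":
    for name, identical, matching, total in ALIGNMENTS:
        print(name,
              percent_identity(identical, matching),
              percent_identity(identical, total))
```

The drop from 100.00 to 75.94 for T39971_P11 reflects the 115 residues of the known protein that fall inside the single gap.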
Sequence name: /tmp/jvplVtnxNy/wxNSeFVZZw:Q9BSH7
Sequence documentation:
Alignment of: T39971_P11 x Q9BSH7
Alignment segment 1/1:
Quality: 3576.00 Escore: 0 Matching length: 363 Total length: 478 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 75.94 Total Percent Identity: 75.94 Gaps: 1
Alignment:
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC  50
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC  50

 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100
 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100

101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150
101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150

151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200
151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200

201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250
201 GIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGI 250

251 PDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSA 300
251 PDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSA 300

301 VFEHFAMMQRDSWEDIFELLFWGRTS........................ 326
301 VFEHFAMMQRDSWEDIFELLFWGRTSAGTRQPQFISRDWHGVPGQVDAAM 350

326 .................................................. 326
351 AGRIYISGMAPRPSLAKKQRFRHRNRKGYRSQRGHSRGRNQNSRRPSRAM 400

327 .........................................DKYYRVNLR 335
401 WLSLFSSEESNLGANNYDDYRMDWLVPATCEPIQSVFFFSGDKYYRVNLR 450

336 TRRVDTVDPPYPRSIAQYWLGCPAPGHL 363
451 TRRVDTVDPPYPRSIAQYWLGCPAPGHL 478
Sequence name: /tmp/fgebv7ir4i/48bTBMziJ0: VTNC_HUMAN
Sequence documentation:
Alignment of: T39971_P12 x VTNC_HUMAN
Alignment segment 1/1:
Quality: 2237.00 Escore: 0 Matching length: 223 Total length: 223 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC  50
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC  50

 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100
 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100

101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150
101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150

151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200
151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200

201 GIEGPIDAAFTRINCQGKTYLFK 223
201 GIEGPIDAAFTRINCQGKTYLFK 223
Sequence name: /tmp/fgebv7ir4i/48bTBMziJ0 :Q9BSH7
Sequence documentation:
Alignment of: T39971_P12 x Q9BSH7
Alignment segment 1/1:
Quality: 2237.00 Escore: 0 Matching length: 223 Total length: 223 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC  50
  1 MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSC  50

 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100
 51 CTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTS 100

101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150
101 DLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPP 150

151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200
151 AEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVW 200

201 GIEGPIDAAFTRINCQGKTYLFK 223
201 GIEGPIDAAFTRINCQGKTYLFK 223
Expression of VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein) T39971 transcripts, which are detectable by amplicon as depicted in sequence name T39971 junc23-33, in normal and cancerous breast tissues
Expression of VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein) transcripts detectable by or according to junc23-33, T39971 junc23-33 amplicon and T39971 junc23-33F and T39971 junc23-33R primers was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), and G6PD (GenBank Accession No. NM_000402; G6PD amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 56-60, 63-67, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold differential expression for each sample relative to the median of the normal PM samples. Figure 10 is a histogram showing down-regulation of the above-indicated VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein) transcripts in cancerous breast samples relative to the normal samples. As is evident from Figure 10, the expression of VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein) transcripts detectable by the above amplicon in cancer samples was significantly lower than in the non-cancerous samples (Sample Nos. 56-60, 63-67, Table 1, "Tissue samples in testing panel"). Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: T39971 junc23-33F forward primer; and T39971 junc23-33R reverse primer.
The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: T39971 junc23-33.
T39971 junc23-33F (SEQ ID NO:834): GGGGCAGAACCTCTGACAAG
T39971 junc23-33R (SEQ ID NO:835): GGGCAGCCCAGCCAGTA
T39971 junc23-33 amplicon (SEQ ID NO:836): GGGGCAGAACCTCTGACAAGTACTACCGAGTCAATCTTCGCACACGGCGAGTGGACACTGTGGACCCTCCCTACCCACGCTCCATCGCTCAGTACTGGCTGGGCTGCCC
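A quick way to sanity-check a listed primer pair against its amplicon is to confirm that the amplicon begins with the forward primer and ends with the reverse complement of the reverse primer. The sketch below does this for SEQ ID NOs: 834-836 as printed above; the function names are hypothetical and only unambiguous A/C/G/T bases are handled.

```python
# Illustrative sketch only; function names are hypothetical.
# Sequences are copied from SEQ ID NOs: 834-836 above (spaces removed).

FORWARD_PRIMER = "GGGGCAGAACCTCTGACAAG"   # SEQ ID NO:834
REVERSE_PRIMER = "GGGCAGCCCAGCCAGTA"      # SEQ ID NO:835
AMPLICON = (                              # SEQ ID NO:836
    "GGGGCAGAACCTCTGACAAGTACTACCGAGTCAATCTTCGCACACGGCGAGTGGAC"
    "ACTGTGGACCCTCCCTACCCACGCTCCATCGCTCAGTACTGGCTGGGCTGCCC"
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string containing only A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def primers_flank_amplicon(forward, reverse, amplicon):
    """True if the amplicon starts with the forward primer and ends with the
    reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(reverse_complement(reverse))


if __name__ == "__main__":
    print(primers_flank_amplicon(FORWARD_PRIMER, REVERSE_PRIMER, AMPLICON))
```

The same check can be applied to any other primer pair and amplicon listed in the application.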
Expression of VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein), antisense to SARM1 (T23434), T39971 transcripts which are detectable by amplicon as depicted in sequence name T39971 junc23-33 in different normal tissues
Expression of VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein) transcripts detectable by or according to T39971 junc23-33 amplicon(s) and T39971 junc23-33F and T39971 junc23-33R primers was measured by real time PCR. In parallel the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), UBC (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the breast samples (Sample Nos. 33-35, Table 2, "Tissue samples in normal panel" above), to obtain a value of relative expression of each sample relative to the median of the breast samples. Primers and amplicon are as above. The results are presented in Figure 11, demonstrating the expression of VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein), antisense to SARM1 (T23434), T39971 transcripts, which are detectable by amplicon as depicted in sequence name T39971 junc23-33, in different normal tissues.
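The two-step normalization described above (division by the geometric mean of the housekeeping-gene quantities, then division by the median of a reference group) can be sketched as follows. This is an illustration under stated assumptions, not the software used in the experiments: the function names, sample labels and quantities are all hypothetical, and quantities are assumed to be positive linear-scale values.

```python
# Illustrative sketch only; names and numbers are hypothetical.
from statistics import geometric_mean, median


def normalized_quantity(target_qty, housekeeping_qtys):
    """Target amplicon quantity divided by the geometric mean of the
    housekeeping-gene quantities measured in the same RT sample."""
    return target_qty / geometric_mean(housekeeping_qtys)


def relative_expression(normalized, reference_samples):
    """Divide every sample's normalized quantity by the median of the
    normalized quantities of the reference samples."""
    reference_median = median(normalized[s] for s in reference_samples)
    return {sample: qty / reference_median for sample, qty in normalized.items()}


if __name__ == "__main__":
    # Hypothetical raw quantities: (target amplicon, four housekeeping genes).
    raw = {
        "ref_1": (8.0, [2.0, 8.0, 4.0, 4.0]),
        "ref_2": (16.0, [4.0, 4.0, 4.0, 4.0]),
        "ref_3": (24.0, [4.0, 4.0, 2.0, 8.0]),
        "test_1": (4.0, [4.0, 4.0, 4.0, 4.0]),
    }
    normalized = {s: normalized_quantity(t, hk) for s, (t, hk) in raw.items()}
    for sample, value in relative_expression(normalized, ["ref_1", "ref_2", "ref_3"]).items():
        print(sample, round(value, 3))
```

Using the geometric mean rather than the arithmetic mean keeps a single highly expressed housekeeping gene from dominating the normalization factor.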
Expression of VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein) T39971 transcripts which are detectable by amplicon as depicted in sequence name T39971 seg22 in normal and cancerous breast tissues
Expression of VTNC_HUMAN vitronectin (serum spreading factor, somatomedin B, complement S-protein) transcripts detectable by or according to seg22, T39971 seg22 amplicon(s) and primers T39971 seg22F and T39971 seg22R was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), and G6PD (GenBank Accession No. NM_000402; G6PD amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 56-60, 63-67, Table 1: Tissue samples in testing panel, above), to obtain a value of fold differential expression for each sample relative to the median of the normal PM samples. In the one experiment that was carried out, no differential expression in the cancerous samples relative to the normal PM samples was observed. However, this may be due to a problem that is specific to this particular experiment.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: T39971 seg22F forward primer; and T39971 seg22R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: T39971 seg22.
Forward primer T39971 seg22F (SEQ ID NO:837): GCAGTCTTGGATTCCTTTCACATT
Reverse primer T39971 seg22R (SEQ ID NO:838): GAGGCTGTTGAAGTTAGGATCTCC
Amplicon T39971 seg22 (SEQ ID NO:839): GCAGTCTTGGATTCCTTTCACATTTCACTGGGGACAGGCCTCAGCATGTGCCCACCCCTGACCCCCACCTCATGCTGGGAGATCCTAACTTCAACAGCCTC
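As a consistency check on the sequences listed above, the following sketch tests whether the amplicon begins with the forward primer and ends with the reverse complement of the reverse primer. The helper names are hypothetical, and the amplicon is written as one continuous string on the assumption that the blank inside it in the printed text is a line-wrap artifact.

```python
# Map each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primers_flank(amplicon, forward, reverse):
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


FORWARD = "GCAGTCTTGGATTCCTTTCACATT"   # T39971 seg22F
REVERSE = "GAGGCTGTTGAAGTTAGGATCTCC"   # T39971 seg22R
AMPLICON = (                           # T39971 seg22, joined across the wrap
    "GCAGTCTTGGATTCCTTTCACATTTCACTGGGGACAGGCCTCAGCATGTGCCCACCC"
    "CTGACCCCCACCTCATGCTGGGAGATCCTAACTTCAACAGCCTC"
)

consistent = primers_flank(AMPLICON, FORWARD, REVERSE)
```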
DESCRIPTION FOR CLUSTER Z21368
Cluster Z21368 features 7 transcript(s) and 34 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
These sequences are variants of the known protein Extracellular sulfatase Sulf-1 precursor (SwissProt accession identifier SUL1_HUMAN; known also according to the synonyms EC 3.1.6.-; HSulf-1), SEQ ID NO: 96, referred to herein as the previously known protein. Protein Extracellular sulfatase Sulf-1 precursor is known or believed to have the following function(s): Exhibits arylsulfatase activity and highly specific endoglucosamine-6-sulfatase activity. It can remove sulfate from the C-6 position of glucosamine within specific subregions of intact heparin. Diminishes HSPG (heparan sulfate proteoglycans) sulfation, inhibits signaling by heparin-dependent growth factors, diminishes proliferation, and facilitates apoptosis in response to exogenous stimulation. The sequence for protein Extracellular sulfatase Sulf-1 precursor is given at the end of the application, as "Extracellular sulfatase Sulf-1 precursor amino acid sequence" (SEQ ID NO:96). Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Extracellular sulfatase Sulf-1 precursor localization is believed to be Endoplasmic reticulum and Golgi stack; also localized on the cell surface (By similarity). The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: apoptosis; metabolism; heparan sulfate proteoglycan metabolism, which are annotation(s) related to Biological Process; arylsulfatase; hydrolase, which are annotation(s) related to Molecular Function; and extracellular space; endoplasmic reticulum; Golgi apparatus, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster Z21368 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the right hand column of the table and the numbers on the y-axis of Figure 12 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
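The "parts per million" ratio described above can be illustrated with a minimal sketch. It assumes plain EST counts per tissue category and does not model whatever weighting the previously described methods apply; the function name and the counts in the example are hypothetical.

```python
def expression_ppm(cluster_est_count, category_est_count):
    """Ratio of the ESTs of one cluster to all ESTs in a tissue category,
    scaled to parts per million. Any weighting of ESTs is not modeled here."""
    if category_est_count <= 0:
        raise ValueError("category_est_count must be positive")
    return 1_000_000 * cluster_est_count / category_est_count


# Hypothetical counts: 5 cluster ESTs among 100,000 ESTs in a category.
ppm = expression_ppm(5, 100_000)
```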
Overall, the following results were obtained as shown with regard to the histograms in Figure 12 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues and pancreas carcinoma.
Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
As noted above, cluster Z21368 features 7 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Extracellular sulfatase Sulf-1 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein Z21368_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z21368_PEA_1_T5. An alignment is given to the known protein (Extracellular sulfatase Sulf-1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z21368_PEA_1_P2 and SUL1_HUMAN: 1. An isolated chimeric polypeptide encoding for Z21368_PEA_1_P2, comprising a first amino acid sequence being at least 90 % homologous to
MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELGSL QVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYVHNHNVYTNNENCSSPSW
QAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGSYIPPGWREWLGLIKNSRFYNYTVCR
NGIKEKHGFDYAKDYFTDLITNESINYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQ
FSKLYPNASQHITPSYNYAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDD
SVERLYNMLVETGELENTYΠYTADHG ΉIGQFGLVKGKSMPYDFDIRVPFFLRGPSVEP GSIVPQIVLNIDL APTILDI AGLDTPPDVDGKS VLKXLDPEKPGNPJ RTNKKAKX VRDTFL VERGKPLRIΫ ΕESSKMQQSNHLPKYΕRVKELCQQARYQTACEQPGQKWQCIEDTSGK LRIHKCKGPSDLLTVRQSTRNLYARGFHDKDKECSCRESGYRASRSQRKSQRQFLRNQ GTPKYΕPRFΛTTTRQTRSLSVEFEGEIYDINLEEEEELQVLQPRNIAXPJRTOEGHKGPRDLQ ASSGGNRGRMLAI SSNAVGPPTTVRVTHKCFILPNDSIHCERELYQSARA\VIA)HKAYI DKΕEALQD KNLREVRGHLIGIRI^EECSCSKQSYYNKEKGVKKQEKLKSHLHPFKE AAQEVDSKLQLFKΕN IRRKKERKEKJIRQRKGEECSLPGLTCFTHDNNH QTAPFWN corresponding to amino acids 1 - 761 of SUL1_HUMAN, which also corresponds to amino acids 1 - 761 of Z21368_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
PHKYSAHGRTRHFESATRTTNGAQKLSRI corresponding to amino acids 762 - 790 of Z21368_PEA_1_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of Z21368_PEA_1_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PHKYSAHGRTRHFESATRTTNGAQKLSRI in Z21368_PEA_1_P2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein Z21368_PEA_1_P2 is encoded by the following transcript(s): Z21368_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z21368_PEA_1_T5 is shown in bold; this coding portion starts at position 529 and ends at position 2898.
Variant protein Z21368_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z21368_PEA_1_T9. An alignment is given to the known protein (Extracellular sulfatase Sulf-1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z21368_PEA_1_P5 and Q7Z2W2 (SEQ ID NO:840): 1. An isolated chimeric polypeptide encoding for Z21368_PEA_1_P5, comprising a first amino acid sequence being at least 90 % homologous to
MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVEL corresponding to amino acids 1 - 57 of Q7Z2W2, which also corresponds to amino acids 1 - 57 of Z21368_PEA_1_P5, a second bridging amino acid sequence comprising A, and a third amino acid sequence being at least 90 % homologous to
FFGKYLNΕ ^GSYIPPGWPVEWLGLIKNSRFYNYTVCRNGIL^ ΗGFDYAKDYFTOLITN ESINYFKMSK IYPITRPV^IMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYNYAPNM DIMWIMQYTGPMLPIITMEFTNILQRKRLQTLMSVDDSΛΗ
ADHGY GQFGLVKGKSMPYDFDIRVPFFIRGPSVEPGSINPQIVLNTDLAPTILDIAGLDT PPDVDGKSVLKLLDPEKPG FT^RTOKJ AKIWRDTFLVERGKFLR^ PK\ΕRVKELCQQARYQTACEQPGQKWQCIEDTSGKLRIHKCKGPSDLLTVRQSTRNLY ARGFHDKDKECSCRESGYRASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSVΈFE GEIYDINLEEEEELQ\T.QPRNIAKI^HDEGHKGPRDLQASSGGNRGRMLADSSNAVGPPT TVRVTHKCFILPNDSMCERELYQSARAWKDHKAYIDKEIEALQDKIKNLREVRGHLKR RKPEECSCSKQSλΥNKEKGVKKQEKLKSHLHPFKEAAQEVDSKLQLF KEKRRQRKGEECSLPGLTCFTΗDN TNQTAPFWM.GSFCACTSSNNNTYWCLRTVNE TΗNFLFCEFATGFLEYFDMNTDPYQLTNTVHTVERGILNQLHVQLMELRSCQGYKQCN PP PK.NLDVGNKDGGSYDLHRGQLWDGWΕG corresponding to amino acids 139 - 871 of Q7Z2W2, which also corresponds to amino acids 59 - 791 of Z21368_PEA_1_P5, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for an edge portion of Z21368_PEA_1_P5, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least three amino acids comprise LAP having a structure as follows (numbering according to Z21368_PEA_1_P5): a sequence starting from any of amino acid numbers 57-x to 57; and ending at any of amino acid numbers 59 + ((n-2) - x), in which x varies from 0 to n-2.
Comparison report between Z21368_PEA_1_P5 and AAH12997 (SEQ ID NO:841): 1. An isolated chimeric polypeptide encoding for Z21368_PEA_1_P5, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELAFF GKYLNEYNGSYIPPGWTEWLGLIKNSRPYNYTNCRNGI EKHGFDYAΙ^ INYFKMSIGIMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYNYAPNMDK HWIMQYTGPMLPIHMEFTNILQRKTILQTLMSΛ DSVERLYNMLVETGELENTYIIYTAD HGYHIGQFGLVKGKSMPYDFDIRWFFIRGPSVEPGSIWQIVLNIDLAPTILDIAGLDTPP DVDGKSVLKLLDPEKPGNPFRT^JKKAIΑWL^TFLVERGKFLRKKEESS
KYERVKELCQQARYQTACEQPGQKWQCFFIDTSGKLRIHKCKGPSDLLTVRQSTRNLYA RGFHDKΌKECSCRESGYRASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSVEFEGE IYDINLEEEEELQVLQPPJ^IAKRHDEGHKGPRDLQASSGGNRGRMLADSSNAVGPPTTV
RVTHKCFILPNDSmCErøLYQSARAWΕ Ff^ PEECSCSKQSYYNKEKGVKKQE-U SHLHPFKEAAQE ^
KRRQRKGEECSLPGLTCFTHDNNHWQTAPFWNLGSFCACTSSNNNTYWCLRTVNETH NFLFCEFATGFLEYFDMNTDPYQLTNTVHTVERGTLNQLHVQLME corresponding to amino acids 1 - 751 of Z21368_PEA_1_P5, and a second amino acid sequence being at least 90 % homologous to LRSCQGYKQCNPRPKNLDVGNKDGGSYDLHRGQLWDGWEG corresponding to amino acids 1 - 40 of AAH12997, which also corresponds to amino acids 752 - 791 of Z21368_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of Z21368_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
MKYSCCALVLAN LGTELLGSLCSTVRSPP^RGRIQQERK^IRPNITLVLTDDQDVELAFF GK NE ^GSY PGWP^WLGLIKNSRFYNY TVCRNGIKEKHGFDYAKDYFTDLITNES INYFKMSK-RMYPHPJ>VMMVISFLAAPHGPEDSAPQFSKLYPNASQHITPSYNYAPNMDK HW QYTGPMLPIHMEFTMLQPJO .QTLMSVDDSVERLYN1 ILVETGELENTΥIIYTAD HG YHIGQFGLVKGKSMPYDFDIRVPFFIRGPS VEPGSIVPQrVLNIDLAPTILDIAGLDTPP DVDGKSVLKLLDPEKPGNT FRTNKKAIO\VRDTFLVERGKFLRlsLKIE KYERVKELCQQARYQTACEQPGQKWQCIEDTSGlOiUHKCKGPSDLLTVRQSTRNLYA RGFHDKDKECSCRESGYRASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSVEFEGE IYDINLEEEEELQVLQPRNIAKRHDEGHKGPRDLQASSGGNRGRMLADSSNAVGPPTTV RVTHKCFILPNDSmCErøLYQSAIlAWKDHKAYIDKEffiALQDKIXNLREVRG PEECSCSKQSYYNKEKGVKKQEKLKSHLITPFKΕAAQEVDSKLQLFK^^ PJ QRKGEECSLPGLTCFTHDNNH QTAPF\\^NLGSFCACTSSNNNT 'WCLRTVNETH NFLFCEFATGFLEYFDMNTDPYQLTNTVHTVERGILNQLHVQLME of Z21368_PEA_1_P5.
Comparison report between Z21368_PEA_1_P5 and SUL1_HUMAN: 1. An isolated chimeric polypeptide encoding for Z21368_PEA_1_P5, comprising a first amino acid sequence being at least 90 % homologous to
MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVEL corresponding to amino acids 1 - 57 of SUL1_HUMAN, which also corresponds to amino acids 1 - 57 of Z21368_PEA_1_P5, and a second amino acid sequence being at least 90 % homologous to APFGK\ NE ^GSYIPIHτWREWLGLIKNSRFY 7TVCRNGIKEKHGFDYAiα)YFTO NESINYFKMSKRM TlffiPVMMVISFϊAAPHGPEDSAPQFSKLYPNASQfflTPSYNYAPN
MDKH IMQYTGPMLPH MEFTNILQPJ U.QTXMSVDDSVERLYNMLVETGELENTYΠ YTADHGYFFLGQFGLVKGKSMPYDFDLT FFIRGPSVEPGSINPQIVLNTDLAPTILDIAGL DTPPDVDGKSVLKLLDPEKPGNPFRTNKKAKTWTTDTIT.VE^^
HLPKYERVKELCQQARYQTACEQPGQKWQCFFIDTSGKLRIHKCKGPSDLLTVRQSTRN LYARGFHDKDKECSCRESG ΈASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSVE FEGEIYDINLEEEEELQVLQPRNIAKRHODEGHKGPRDLQASSGGNRGRMLADSSNAVGP
PTTVRVTHKCFlLPNDSfflCERELYQSAI^Wi )HKAYIDKEIEALQDKIKNLPvE KPJIKPEECSCSKQSYYNKEKGVKKQEKLKSHLHPFKE AAQEVDSKLQLFKENNRRPK KΕRKEKRRQRKGEECSLPGLTCFTHDNNHN'QTAPFWNLGSFCACTSSNNNTYWCLRT VNETFINFLFCEFATGFLEYFDMNTDPYQLTN'rNHTVERGILNQLHVQLMELRSCQGYK QCNPRPKNLDVGNKDGGSYDLHRGQLWDGWEG corresponding to amino acids 138 - 871 of SUL1_HUMAN, which also corresponds to amino acids 58 - 791 of Z21368_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of Z21368_PEA_1_P5, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise LA, having a structure as follows: a sequence starting from any of amino acid numbers 57-x to 57; and ending at any of amino acid numbers 58 + ((n-2) - x), in which x varies from 0 to n-2.
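The edge-portion definition in the comparison with the known protein (start at 57-x, end at 58 + ((n-2) - x), with x from 0 to n-2) can be enumerated with a short sketch. The function name and the default junction positions are illustrative assumptions; the arithmetic follows the wording of the paragraph above, and each resulting window has length n and spans the junction between residues 57 and 58.

```python
def edge_windows(n, left=57, right=58):
    """List (start, end) residue positions, 1-based and inclusive, of every
    window of length n that contains both junction residues left and right."""
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    windows = []
    for x in range(0, n - 1):          # x varies from 0 to n-2
        start = left - x               # any of 57-x ... 57
        end = right + ((n - 2) - x)    # 58 + ((n-2) - x)
        windows.append((start, end))
    return windows


# Example with the smallest length named in the text, n = 10.
ten_mers = edge_windows(10)
```

For n = 10 this gives nine windows, from residues 57-66 down to residues 49-58. The sketch does not check that a start position stays at or above residue 1, which matters only for n greater than 58.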
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein Z21368_PEA_1_P5 is encoded by the following transcript(s): Z21368_PEA_1_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z21368_PEA_1_T9 is shown in bold; this coding portion starts at position 556 and ends at position 2928.
Variant protein Z21368_PEA_1_P15 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z21368_PEA_1_T23. An alignment is given to the known protein (Extracellular sulfatase Sulf-1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z21368_PEA_1_P15 and SUL1_HUMAN: 1. An isolated chimeric polypeptide encoding for Z21368_PEA_1_P15, comprising a first amino acid sequence being at least 90 % homologous to
MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELGSL QVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYVHNHNVYTNNENCSSPSW QAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGSYIPPGWPEWLGLIKNSRFYNYTVCR NGIKEKHGFDYAKDYFTDLITNESINYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQ FSKLYPNASQHITPSYNYAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDD SVERLYNMLVETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVEP GSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNRFRTNKKAKIWPDTFL VERG corresponding to amino acids 1 - 416 of SUL1_HUMAN, which also corresponds to amino acids 1 - 416 of Z21368_PEA_1_P15.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein Z21368_PEA_1_P15 is encoded by the following transcript(s): Z21368_PEA_1_T23, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z21368_PEA_1_T23 is shown in bold; this coding portion starts at position 691 and ends at position 1938.
Variant protein Z21368_PEA_1_P16 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z21368_PEA_1_T24. An alignment is given to the known protein (Extracellular sulfatase Sulf-1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z21368_PEA_1_P16 and SUL1_HUMAN: 1. An isolated chimeric polypeptide encoding for Z21368_PEA_1_P16, comprising a first amino acid sequence being at least 90 % homologous to
MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELGSL QVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYVHNHNVYTNNENCSSPSW QAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGSYIPPGWREWLGLIKNSRFYNYTVCR NGIKEKHGFDYAKDYFTDLITNESINYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQ FSKLYPNASQHITPSYNYAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDD SVERLYNMLVETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVEP GSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNP corresponding to amino acids 1 - 397 of SUL1_HUMAN, which also corresponds to amino acids 1 - 397 of Z21368_PEA_1_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CVTVPPLSQPQIH corresponding to amino acids 398 - 410 of Z21368_PEA_1_P16, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of Z21368_PEA_1_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CVIVPPLSQPQIH in Z21368_PEA_1_P16.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein Z21368_PEA_1_P16 is encoded by the following transcript(s): Z21368_PEA_1_T24, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z21368_PEA_1_T24 is shown in bold; this coding portion starts at position 691 and ends at position 1920.
Variant protein Z21368_PEA_1_P22 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z21368_PEA_1_T10. An alignment is given to the known protein (Extracellular sulfatase Sulf-1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z21368_PEA_1_P22 and SUL1_HUMAN: 1. An isolated chimeric polypeptide encoding for Z21368_PEA_1_P22, comprising a first amino acid sequence being at least 90 % homologous to MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELGSL QVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYVHNHNVYTNNENCSSPSW QAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGSYIPPGWREWLGLIKNSRFYNYTVCR NGIKEKHGFDYAK corresponding to amino acids 1 - 188 of SUL1_HUMAN, which also corresponds to amino acids 1 - 188 of Z21368_PEA_1_P22, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ARYDGDQPRCAPRPRGLSPTVF corresponding to amino acids 189 - 210 of Z21368_PEA_1_P22, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of Z21368_PEA_1_P22, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ARYDGDQPRCAPRPRGLSPTVF in Z21368_PEA_1_P22.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein Z21368_PEA_1_P22 is encoded by the following transcript(s): Z21368_PEA_1_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z21368_PEA_1_T10 is shown in bold; this coding portion starts at position 691 and ends at position 1320.
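The coding-portion positions quoted for the variants in this cluster can be cross-checked against the stated protein lengths. The sketch below assumes that positions are 1-based and inclusive and that the quoted span does not include a stop codon; both are inferences from the numbers and are not stated in the text. The dictionary layout and function name are hypothetical.

```python
# (coding start, coding end, protein length in amino acids) as quoted
# in this section for each variant protein of cluster Z21368.
CODING_PORTIONS = {
    "Z21368_PEA_1_P2": (529, 2898, 790),
    "Z21368_PEA_1_P5": (556, 2928, 791),
    "Z21368_PEA_1_P15": (691, 1938, 416),
    "Z21368_PEA_1_P16": (691, 1920, 410),
    "Z21368_PEA_1_P22": (691, 1320, 210),
    "Z21368_PEA_1_P23": (691, 1125, 145),
}


def codons_in_span(start, end):
    """Number of whole codons in a 1-based, inclusive nucleotide span."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("span length is not a multiple of three")
    return length // 3


# True for a variant when the span encodes exactly the stated length.
checks = {
    name: codons_in_span(start, end) == aa_len
    for name, (start, end, aa_len) in CODING_PORTIONS.items()
}
```

Under these assumptions every quoted span is a whole number of codons equal to the stated protein length, for example (2898 - 529 + 1) / 3 = 790 for Z21368_PEA_1_P2.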
Variant protein Z21368_PEA_1_P23 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z21368_PEA_1_T11. An alignment is given to the known protein (Extracellular sulfatase Sulf-1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z21368_PEA_1_P23 and Q7Z2W2: 1. An isolated chimeric polypeptide encoding for Z21368_PEA_1_P23, comprising a first amino acid sequence being at least 90 % homologous to
MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELGSL
QVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYVHNHNVYTNNENCSSPSW
QAMHEPRTFAVYLNNTGYRT corresponding to amino acids 1 - 137 of Q7Z2W2, which also corresponds to amino acids 1 - 137 of Z21368_PEA_1_P23, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GLLHRLNH corresponding to amino acids 138 - 145 of Z21368_PEA_1_P23, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of Z21368_PEA_1_P23, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GLLHRLNH in Z21368_PEA_1_P23.
Comparison report between Z21368_PEA_1_P23 and SUL1_HUMAN: 1. An isolated chimeric polypeptide encoding for Z21368_PEA_1_P23, comprising a first amino acid sequence being at least 90 % homologous to
MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELGSL QVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYVHNHNVYTNNENCSSPSW QAMHEPRTFAVYLNNTGYRT corresponding to amino acids 1 - 137 of SUL1_HUMAN, which also corresponds to amino acids 1 - 137 of Z21368_PEA_1_P23, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GLLHRLNH corresponding to amino acids 138 - 145 of Z21368_PEA_1_P23, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of Z21368_PEA_1_P23, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GLLHRLNH in Z21368_PEA_1_P23.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein Z21368_PEA_1_P23 is encoded by the following transcript(s): Z21368_PEA_1_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z21368_PEA_1_T11 is shown in bold; this coding portion starts at position 691 and ends at position 1125.
As noted above, cluster Z21368 features 34 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster Z21368_PEA_1_node_0 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T9. Table 7 below describes the starting and ending position of this segment on each transcript. Table 7 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_15 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T23, Z21368_PEA_1_T24, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 8 below describes the starting and ending position of this segment on each transcript. Table 8 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_19 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T23, Z21368_PEA_1_T24, Z21368_PEA_1_T5 and Z21368_PEA_1_T6. Table 9 below describes the starting and ending position of this segment on each transcript. Table 9 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_2 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T23, Z21368_PEA_1_T24, Z21368_PEA_1_T5 and Z21368_PEA_1_T6. Table 10 below describes the starting and ending position of this segment on each transcript. Table 10 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_21 according to the present invention is supported by 37 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T23, Z21368_PEA_1_T24, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 11 below describes the starting and ending position of this segment on each transcript. Table 11 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_33 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T23, Z21368_PEA_1_T24, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 12 below describes the starting and ending position of this segment on each transcript. Table 12 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_36 according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T23, Z21368_PEA_1_T24, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 13 below describes the starting and ending position of this segment on each transcript. Table 13 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_37 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T24. Table 14 below describes the starting and ending position of this segment on each transcript. Table 14 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_39 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T23 and Z21368_PEA_1_T24. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Segment cluster Z21368 JPE A_l_node_4 according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368J?EA_1_T10, Z21368JPEA_1 JT11, Z21368JPEA_1_T23 and Z21368JPEA_1JT24. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Segment cluster Z21368JPEA_l_node_41 according to the present invention is supported by 49 libraries. The number of libraries was determined as previously described. This segment can be found in the following franscript(s): Z21368JPEA_1_T10, Z21368JPEA_1_T11, Z2136SJPEA_1 _T5, Z2136SJPEA_1_T6 and Z21368JPEA_1 _T9. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster Z21368JPEA_l_node_43 according to the present invention is supported by 52 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368JPEA_1JT11, Z21368JPEA_1_T5, Z21368J?EA_1JT6 and Z21368J?EA_1_T9. Table 18 below describes the starting and ending position of this segment on each franscript. Table 18 - Segment location on transcripts
Segment cluster Z21368JPEA_l_node_45 according to the present invention is supported by 64 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z2136SJPEA_1 _T10, Z21368JPEA_1 _T11, Z21368JPEA_1_T5, Z21368J?EA_1 JT6 and Z21368_PEA_1_T9. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster Z21368JPEA_l_node_53 according to the present invention is supported by 60 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368JPEA_1_T10, Z21368JPEA_1 JT1 1, Z21368J?EA_1_T5, Z21368 JPE A_1_T6 and Z21368JPEA_1_T9. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_56 according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11 and Z21368_PEA_1_T9. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_58 according to the present invention is supported by 71 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_66 according to the present invention is supported by 142 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_67 according to the present invention is supported by 181 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_69 according to the present invention is supported by 150 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster Z21368_PEA_1_node_11 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T23, Z21368_PEA_1_T24, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_12 according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T23, Z21368_PEA_1_T24, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_16 according to the present invention can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T23, Z21368_PEA_1_T24, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_17 according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T23, Z21368_PEA_1_T24, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_23 according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T11, Z21368_PEA_1_T23, Z21368_PEA_1_T24, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_24 according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T23, Z21368_PEA_1_T24, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 31 below describes the starting and ending position of this segment on each transcript.
Segment cluster Z21368_PEA_1_node_30 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T23, Z21368_PEA_1_T24, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_31 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T23, Z21368_PEA_1_T24, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_38 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T23, Z21368_PEA_1_T24, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_47 according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_49 according to the present invention is supported by 57 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_51 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_61 according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_68 according to the present invention is supported by S7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Segment cluster Z21368_PEA_1_node_7 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10, Z21368_PEA_1_T11, Z21368_PEA_1_T23, Z21368_PEA_1_T24, Z21368_PEA_1_T5, Z21368_PEA_1_T6 and Z21368_PEA_1_T9. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Overexpression of at least a portion of this cluster was determined according to oligonucleotides and one or more chips. The results were as follows: Oligonucleotide Z21368_0_0_61857 was on the TAA chip and was found to be overexpressed in breast cancer.
Variant protein alignment to the previously known protein:
Sequence name: /tmp/5ER3vIMKE2/9L0Y7lDlTQ:SUL1_HUMAN
Sequence documentation:
Alignment of: Z21368_PEA_1_P2 x SUL1_HUMAN
Alignment segment 1/1:
Quality: 7664.00
Escore: 0
Matching length: 761
Total length: 761
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLT 50
1 MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLT 50

51 DDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYV 100
51 DDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYV 100

101 HNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGS 150
101 HNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGS 150

151 YIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESI 200
151 YIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESI 200

201 NYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYN 250
201 NYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYN 250

251 YAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNML 300
251 YAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNML 300

301 VETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVE 350
301 VETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVE 350

351 PGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNRFRT 400
351 PGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNRFRT 400

401 NKKAKIWRDTFLVERGKFLRKKEESSKNIQQSNHLPKYERVKELCQQARY 450
401 NKKAKIWRDTFLVERGKFLRKKEESSKNIQQSNHLPKYERVKELCQQARY 450

451 QTACEQPGQKWQCIEDTSGKLRIHKCKGPSDLLTVRQSTRNLYARGFHDK 500
451 QTACEQPGQKWQCIEDTSGKLRIHKCKGPSDLLTVRQSTRNLYARGFHDK 500

501 DKECSCRESGYRASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSVEF 550
501 DKECSCRESGYRASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSVEF 550

551 EGEIYDINLEEEEELQVLQPRNIAKRHDEGHKGPRDLQASSGGNRGRMLA 600
551 EGEIYDINLEEEEELQVLQPRNIAKRHDEGHKGPRDLQASSGGNRGRMLA 600

601 DSSNAVGPPTTVRVTHKCFILPNDSIHCERELYQSARAWKDHKAYIDKEI 650
601 DSSNAVGPPTTVRVTHKCFILPNDSIHCERELYQSARAWKDHKAYIDKEI 650

651 EALQDKIKNLREVRGHLKRRKPEECSCSKQSYYNKEKGVKKQEKLKSHLH 700
651 EALQDKIKNLREVRGHLKRRKPEECSCSKQSYYNKEKGVKKQEKLKSHLH 700

701 PFKEAAQEVDSKLQLFKENNRRRKKERKEKRRQRKGEECSLPGLTCFTHD 750
701 PFKEAAQEVDSKLQLFKENNRRRKKERKEKRRQRKGEECSLPGLTCFTHD 750

751 NNHWQTAPFWN 761
751 NNHWQTAPFWN 761
Sequence name: /tmp/tt3yfXIUKV/YxSTFWr66h:Q7Z2W2
Sequence documentation:
Alignment of: Z21368_PEA_1_P5 x Q7Z2W2
Alignment segment 1/1:
Quality: 7869.00
Escore: 0
Matching length: 791
Total length: 871
Matching Percent Similarity: 99.87
Matching Percent Identity: 99.87
Total Percent Similarity: 90.70
Total Percent Identity: 90.70
Gaps: 1
Alignment:
1 MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLT 50
1 MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLT 50

51 DDQDVELA 58
51 DDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYV 100

59 FFGKYLNEYNGS 70
101 HNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRTVFFGKYLNEYNGS 150

71 YIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESI 120
151 YIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESI 200

121 NYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYN 170
201 NYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYN 250

171 YAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNML 220
251 YAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNML 300

221 VETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVE 270
301 VETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVE 350

271 PGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNRFRT 320
351 PGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNRFRT 400

321 NKKAKIWRDTFLVERGKFLRKKEESSKNIQQSNHLPKYERVKELCQQARY 370
401 NKKAKIWRDTFLVERGKFLRKKEESSKNIQQSNHLPKYERVKELCQQARY 450

371 QTACEQPGQKWQCIEDTSGKLRIHKCKGPSDLLTVRQSTRNLYARGFHDK 420
451 QTACEQPGQKWQCIEDTSGKLRIHKCKGPSDLLTVRQSTRNLYARGFHDK 500

421 DKECSCRESGYRASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSVEF 470
501 DKECSCRESGYRASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSVEF 550

471 EGEIYDINLEEEEELQVLQPRNIAKRHDEGHKGPRDLQASSGGNRGRMLA 520
551 EGEIYDINLEEEEELQVLQPRNIAKRHDEGHKGPRDLQASSGGNRGRMLA 600

521 DSSNAVGPPTTVRVTHKCFILPNDSIHCERELYQSARAWKDHKAYIDKEI 570
601 DSSNAVGPPTTVRVTHKCFILPNDSIHCERELYQSARAWKDHKAYIDKEI 650

571 EALQDKIKNLREVRGHLKRRKPEECSCSKQSYYNKEKGVKKQEKLKSHLH 620
651 EALQDKIKNLREVRGHLKRRKPEECSCSKQSYYNKEKGVKKQEKLKSHLH 700

621 PFKEAAQEVDSKLQLFKENNRRRKKERKEKRRQRKGEECSLPGLTCFTHD 670
701 PFKEAAQEVDSKLQLFKENNRRRKKERKEKRRQRKGEECSLPGLTCFTHD 750

671 NNHWQTAPFWNLGSFCACTSSNNNTYWCLRTVNETHNFLFCEFATGFLEY 720
751 NNHWQTAPFWNLGSFCACTSSNNNTYWCLRTVNETHNFLFCEFATGFLEY 800

721 FDMNTDPYQLTNTVHTVERGILNQLHVQLMELRSCQGYKQCNPRPKNLDV 770
801 FDMNTDPYQLTNTVHTVERGILNQLHVQLMELRSCQGYKQCNPRPKNLDV 850

771 GNKDGGSYDLHRGQLWDGWEG 791
851 GNKDGGSYDLHRGQLWDGWEG 871
Sequence name: /tmp/tt3yfXIUKV/YxSTFWr66h:AAH12997
Sequence documentation:
Alignment of: Z21368_PEA_1_P5 x AAH12997
Alignment segment 1/1:
Quality: 420.00
Escore: 0
Matching length: 40
Total length: 40
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
752 LRSCQGYKQCNPRPKNLDVGNKDGGSYDLHRGQLWDGWEG 791
1 LRSCQGYKQCNPRPKNLDVGNKDGGSYDLHRGQLWDGWEG 40
Sequence name: /tmp/tt3yfXIUKV/YxSTFWr66h:SUL1_HUMAN
Sequence documentation:
Alignment of: Z21368_PEA_1_P5 x SUL1_HUMAN
Alignment segment 1/1:
Quality: 7878.00
Escore: 0
Matching length: 791
Total length: 871
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 90.82
Total Percent Identity: 90.82
Gaps: 1
Alignment:
1 MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLT 50
1 MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLT 50

51 DDQDVEL 57
51 DDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYV 100

58 AFFGKYLNEYNGS 70
101 HNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGS 150

71 YIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESI 120
151 YIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESI 200

121 NYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYN 170
201 NYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYN 250

171 YAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNML 220
251 YAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNML 300

221 VETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVE 270
301 VETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVE 350

271 PGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNRFRT 320
351 PGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNRFRT 400

321 NKKAKIWRDTFLVERGKFLRKKEESSKNIQQSNHLPKYERVKELCQQARY 370
401 NKKAKIWRDTFLVERGKFLRKKEESSKNIQQSNHLPKYERVKELCQQARY 450

371 QTACEQPGQKWQCIEDTSGKLRIHKCKGPSDLLTVRQSTRNLYARGFHDK 420
451 QTACEQPGQKWQCIEDTSGKLRIHKCKGPSDLLTVRQSTRNLYARGFHDK 500

421 DKECSCRESGYRASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSVEF 470
501 DKECSCRESGYRASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSVEF 550

471 EGEIYDINLEEEEELQVLQPRNIAKRHDEGHKGPRDLQASSGGNRGRMLA 520
551 EGEIYDINLEEEEELQVLQPRNIAKRHDEGHKGPRDLQASSGGNRGRMLA 600

521 DSSNAVGPPTTVRVTHKCFILPNDSIHCERELYQSARAWKDHKAYIDKEI 570
601 DSSNAVGPPTTVRVTHKCFILPNDSIHCERELYQSARAWKDHKAYIDKEI 650

571 EALQDKIKNLREVRGHLKRRKPEECSCSKQSYYNKEKGVKKQEKLKSHLH 620
651 EALQDKIKNLREVRGHLKRRKPEECSCSKQSYYNKEKGVKKQEKLKSHLH 700

621 PFKEAAQEVDSKLQLFKENNRRRKKERKEKRRQRKGEECSLPGLTCFTHD 670
701 PFKEAAQEVDSKLQLFKENNRRRKKERKEKRRQRKGEECSLPGLTCFTHD 750

671 NNHWQTAPFWNLGSFCACTSSNNNTYWCLRTVNETHNFLFCEFATGFLEY 720
751 NNHWQTAPFWNLGSFCACTSSNNNTYWCLRTVNETHNFLFCEFATGFLEY 800

721 FDMNTDPYQLTNTVHTVERGILNQLHVQLMELRSCQGYKQCNPRPKNLDV 770
801 FDMNTDPYQLTNTVHTVERGILNQLHVQLMELRSCQGYKQCNPRPKNLDV 850

771 GNKDGGSYDLHRGQLWDGWEG 791
851 GNKDGGSYDLHRGQLWDGWEG 871
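The summary statistics reported for the two full-length Z21368_PEA_1_P5 alignments above can be cross-checked with simple arithmetic. The sketch below assumes that "Matching Percent Identity" is the number of identical positions divided by the matching length, and that "Total Percent Identity" is the same count divided by the total length. These definitions, the function name, and the count of 790 identical positions against Q7Z2W2 (a single mismatch) are inferences from the reported figures, not statements taken from the alignment program.

```python
def percent(numerator, denominator):
    """Percentage rounded to two decimals, as in the alignment summaries."""
    return round(100.0 * numerator / denominator, 2)

# Lengths reported for both Z21368_PEA_1_P5 alignments.
matching_length = 791  # aligned positions, excluding the gap
total_length = 871     # aligned positions, including the gap

# Assumed identical-position counts (inferred, see the lead-in).
identical_vs_sul1_human = 791  # no mismatch
identical_vs_q7z2w2 = 790      # one mismatch

print(percent(identical_vs_sul1_human, matching_length))  # matching identity, SUL1_HUMAN
print(percent(identical_vs_sul1_human, total_length))     # total identity, SUL1_HUMAN
print(percent(identical_vs_q7z2w2, matching_length))      # matching identity, Q7Z2W2
print(percent(identical_vs_q7z2w2, total_length))         # total identity, Q7Z2W2
```

Under these assumptions the four values come out as 100.0, 90.82, 99.87 and 90.7, which agree with the percentages listed in the two alignment summaries.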
Sequence name: /tmp/AVAZGWHuF0/RzHFOnHIsT:SUL1_HUMAN
Sequence documentation:
Alignment of: Z21368_PEA_1_P15 x SUL1_HUMAN
Alignment segment 1/1:
Quality: 4174.00
Escore: 0
Matching length: 416
Total length: 416
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLT 50
1 MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLT 50

51 DDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYV 100
51 DDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYV 100

101 HNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGS 150
101 HNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGS 150

151 YIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESI 200
151 YIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESI 200

201 NYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYN 250
201 NYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYN 250

251 YAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNML 300
251 YAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNML 300

301 VETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVE 350
301 VETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVE 350

351 PGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNRFRT 400
351 PGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNRFRT 400

401 NKKAKIWRDTFLVERG 416
401 NKKAKIWRDTFLVERG 416
Sequence name: /tmp/Jh gRdKqmt/kqSmj kWWk:SUL1_HUMAN
Sequence documentation:
Alignment of: Z21368_PEA_1_P16 x SUL1_HUMAN
Alignment segment 1/1:
Quality: 3985.00
Escore: 0
Matching length: 397
Total length: 397
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLT 50
1 MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLT 50

51 DDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYV 100
51 DDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYV 100

101 HNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGS 150
101 HNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGS 150

151 YIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESI 200
151 YIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESI 200

201 NYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYN 250
201 NYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYN 250

251 YAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNML 300
251 YAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNML 300

301 VETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVE 350
301 VETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVE 350

351 PGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNR 397
351 PGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNR 397
Sequence name: /tmp/GPlnIw3BOg/zXFdxqG4ow:SUL1_HUMAN
Sequence documentation:
Alignment of: Z21368_PEA_1_P22 x SUL1_HUMAN
Alignment segment 1/1:
Quality: 1897.00
Escore: 0
Matching length: 188
Total length: 188
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLT 50
1 MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLT 50

51 DDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYV 100
51 DDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYV 100

101 HNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGS 150
101 HNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGS 150

151 YIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAK 188
151 YIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAK 188
Sequence name: /tmp/oji5Fs74fB/8xeB9KrGjp:Q7Z2W2
Sequence documentation:
Alignment of: Z21368_PEA_1_P23 x Q7Z2W2
Alignment segment 1/1:
Quality: 1368.00
Escore: 0.000511
Matching length: 137
Total length: 137
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLT 50
1 MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLT 50

51 DDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYV 100
51 DDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYV 100

101 HNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRT 137
101 HNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRT 137
Sequence name: /tmp/oji5Fs74fB/8xeB9KrGjp:SUL1_HUMAN
Sequence documentation:
Alignment of: Z21368_PEA_1_P23 x SUL1_HUMAN
Alignment segment 1/1:
Quality: 1368.00
Escore: 0.000511
Matching length: 137
Total length: 137
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLT 50
1 MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLT 50

51 DDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYV 100
51 DDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYV 100

101 HNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRT 137
101 HNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRT 137
Expression of SUL1_HUMAN - Extracellular sulfatase Sulf-1 Z21368 transcripts which are detectable by amplicon as depicted in sequence name Z21368seg39 in normal and cancerous breast tissues
Expression of SUL1_HUMAN - Extracellular sulfatase Sulf-1 transcripts detectable by or according to seg39, the Z21368seg39 amplicon and the Z21368seg39F and Z21368seg39R primers was measured by real time PCR. In parallel, the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), and G6PD (GenBank Accession No. NM_000402; G6PD amplicon) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 56-60, 63-67, Table 1 above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.
Figure 13 is a histogram showing over-expression of the above-indicated SUL1_HUMAN - Extracellular sulfatase Sulf-1 transcripts in cancerous breast samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained. The number and percentage of samples that exhibit at least 5-fold over-expression, out of the total number of samples tested, is indicated at the bottom. As is evident from Figure 13, the expression of SUL1_HUMAN - Extracellular sulfatase Sulf-1 transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 56-60, 63-67, Table 1 above, "Tissue samples in testing panel"). Notably, an over-expression of at least 5-fold was found in 13 out of 28 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below.
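The normalization described above can be sketched in a few lines. This is a minimal illustration only: the sample identifiers, the quantities and the function names are hypothetical and are not taken from the application. It assumes that each sample has one quantity for the target amplicon and one quantity for each of the four housekeeping genes.

```python
from statistics import geometric_mean, median

def normalized_quantity(target_qty, housekeeping_qtys):
    """Target amplicon quantity divided by the geometric mean of the housekeeping quantities."""
    return target_qty / geometric_mean(housekeeping_qtys)

def fold_upregulation(samples, normal_ids):
    """Divide every normalized quantity by the median normalized quantity of the normal samples."""
    normalized = {sid: normalized_quantity(t, hk) for sid, (t, hk) in samples.items()}
    baseline = median([normalized[sid] for sid in normal_ids])
    return {sid: value / baseline for sid, value in normalized.items()}

# Hypothetical quantities: (target amplicon, [PBGD, HPRT1, SDHA, G6PD])
samples = {
    "normal_a": (2.0, [1.0, 1.0, 1.0, 1.0]),
    "normal_b": (4.0, [2.0, 2.0, 2.0, 2.0]),
    "normal_c": (3.0, [1.0, 1.0, 1.0, 1.0]),
    "tumor_a": (24.0, [2.0, 2.0, 2.0, 2.0]),
    "tumor_b": (3.0, [1.0, 1.0, 1.0, 1.0]),
}

folds = fold_upregulation(samples, ["normal_a", "normal_b", "normal_c"])

# Samples that reach the 5-fold over-expression threshold used in the text.
over_5_fold = [sid for sid, fold in folds.items()
               if sid.startswith("tumor") and fold >= 5.0]
print(over_5_fold)
```

The geometric mean is used here because the text specifies it; it keeps a single highly expressed housekeeping gene from dominating the reference value.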
The P value for the difference in the expression levels of SUL1_HUMAN - Extracellular sulfatase Sulf-1 transcripts detectable by the above amplicon(s) in breast cancer samples versus the normal tissue samples was determined by T test as 2.14E-03. A threshold of 5-fold over-expression was found to differentiate between cancer and normal samples with a P value of 6.91E-03, as checked by Fisher's exact test. The above values demonstrate the statistical significance of the results. Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Z21368seg39F forward primer; Z21368seg39R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Z21368seg39.
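The threshold test can be illustrated with a small one-sided Fisher's exact test written against the standard library. The sketch rests on assumptions that the text does not state explicitly: that the comparison is between the 28 adenocarcinoma samples and the ten normal samples (Sample Nos. 56-60 and 63-67), that none of the normal samples reached the 5-fold threshold, and that a one-sided test was used. The function name is hypothetical.

```python
from math import comb

def fisher_one_sided(cancer_over, cancer_total, normal_over, normal_total):
    """One-sided Fisher's exact test on a 2x2 table.

    Returns the hypergeometric probability of seeing at least `cancer_over`
    over-expressing samples in the cancer group, with all margins fixed.
    """
    total = cancer_total + normal_total
    over = cancer_over + normal_over
    upper = min(over, cancer_total)
    favorable = sum(
        comb(cancer_total, k) * comb(normal_total, over - k)
        for k in range(cancer_over, upper + 1)
    )
    return favorable / comb(total, over)

# 13 of 28 adenocarcinoma samples over the threshold (from the text);
# 0 of 10 normal samples over the threshold (assumption).
p_value = fisher_one_sided(13, 28, 0, 10)
print(f"{p_value:.2e}")
```

Under these assumptions the computed value is about 6.9E-03, in line with the figure reported above.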
Z21368seg39F (SEQ ID NO:842) - GTTGCATTTCTCAGTGCTGGTTT
Z21368seg39R (SEQ ID NO:843) - AGGGTGCCGGGTGAGG
Z21368seg39 (SEQ ID NO:844) - GTTGCATTTCTCAGTGCTGGTTTCTAATCAGACCAGTGGATTGAGTTTCTCTACCATCCTCCCCACGTTCTTCTCTAAGCTGCCTCCAAGCCTCACCCGGCACCCT
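As a consistency check on the sequences listed above, the amplicon should begin with the forward primer and end with the reverse complement of the reverse primer. The short sketch below tests this for the Z21368seg39 sequences; the helper name is hypothetical, and the amplicon is written as one string with the line break in the listing removed.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

forward_primer = "GTTGCATTTCTCAGTGCTGGTTT"   # Z21368seg39F (SEQ ID NO:842)
reverse_primer = "AGGGTGCCGGGTGAGG"          # Z21368seg39R (SEQ ID NO:843)
amplicon = (                                  # Z21368seg39 (SEQ ID NO:844)
    "GTTGCATTTCTCAGTGCTGGTTTCTAATCAGACCAGTGGATTGAGTTTCTCTACCATC"
    "CTCCCCACGTTCTTCTCTAAGCTGCCTCCAAGCCTCACCCGGCACCCT"
)

starts_with_forward = amplicon.startswith(forward_primer)
ends_with_reverse = amplicon.endswith(reverse_complement(reverse_primer))
print(starts_with_forward, ends_with_reverse)
```

Both checks hold for the sequences as listed, which suggests that the primer and amplicon listings are mutually consistent.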
Expression of SUL1_HUMAN - Extracellular sulfatase Sulf-1 Z21368 transcripts which are detectable by amplicon as depicted in sequence name Z21368seg39 in different normal tissues
Expression of SUL1_HUMAN - Extracellular sulfatase Sulf-1 transcripts detectable by or according to the Z21368seg39 amplicon and the Z21368seg39F and Z21368seg39R primers was measured by real time PCR. In parallel, the expression of four housekeeping genes - RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), UBC (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the breast samples (Sample Nos. 33-35 in Table 2, "Tissue samples in normal panel") to obtain a value of relative expression of each sample relative to the median of the normal breast samples. Primers and amplicon are as above.
The results are presented in Figure 14, demonstrating the expression of SUL1_HUMAN - Extracellular sulfatase Sulf-1 Z21368 transcripts, which are detectable by amplicon as depicted in sequence name Z21368seg39, in different normal tissues.
Expression of SUL1_HUMAN - Extracellular sulfatase Sulf-1 Z21368 transcripts which are detectable by amplicon as depicted in sequence name Z21368junc17-21 in normal and cancerous breast tissues
Expression of SUL1_HUMAN - Extracellular sulfatase Sulf-1 transcripts detectable by or according to the Z21368junc17-21 amplicon and the Z21368junc17-21F and Z21368junc17-21R primers was measured by real time PCR. In parallel, the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), and G6PD (GenBank Accession No. NM_000402; G6PD amplicon) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 56-60, 63-67, Table 1 above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.
Figure 15 is a histogram showing over-expression of the above-indicated SUL1_HUMAN - Extracellular sulfatase Sulf-1 transcripts in cancerous breast samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained. The number and percentage of samples that exhibit at least 5-fold over-expression, out of the total number of samples tested, is indicated at the bottom. As is evident from Figure 15, the expression of SUL1_HUMAN - Extracellular sulfatase Sulf-1 transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 56-60, 63-67, Table 1 above, "Tissue samples in testing panel"). Notably, an over-expression of at least 5-fold was found in 11 out of 28 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below.
The P value for the difference in the expression levels of SUL1_HUMAN - Extracellular sulfatase Sulf-1 transcripts detectable by the above amplicon(s) in breast cancer samples versus the normal tissue samples was determined by T test as 4.6E-03. A threshold of 5-fold over-expression was found to differentiate between cancer and normal samples with a P value of 1.78E-02, as checked by Fisher's exact test. The above values demonstrate the statistical significance of the results. Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Z21368junc17-21F forward primer; Z21368junc17-21R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Z21368junc17-21.
Z21368junc17-21F (SEQ ID NO:845) - GGACGGATACAGCAGGAACG
Z21368junc17-21R (SEQ ID NO:846) - TATTTTCCAAAAAAGGCCAGCTC
Z21368junc17-21 (SEQ ID NO:847) - GGACGGATACAGCAGGAACGAAAAAACATCCGACCCAACATTATTCTTGTGCTTACCGATGATCAAGATGTGGAGCTGGCCTTTTTTGGAAAATA
Expression of SUL1_HUMAN - Extracellular sulfatase Sulf-1 Z21368 transcripts which are detectable by amplicon as depicted in sequence name Z21368junc17-21 in different normal tissues
Expression of SULl JHUMAN - Extracellular sulfatase Sulf-1 Z21368 transcripts detectable by or according to amplicon Z21368juncl7-21 was measured by real time PCR. In parallel the expression of four housekeeping genes -RPL 19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NMJ003194; TATA amplicon), UBC (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NMJ00416S; amplicon - SDHA-amplicon) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes, as above. The normalized quantity of each RT sample λvas then divided by the median of the quantities of the breast samples (Sample Nos. - 33-35 Table 2 above, 'Tissue samples on nonnal panel"), to obtain a value of relative expression of each sample relative to median of the breast samples. Primers and amplicon are as above.
The results are presented in Figure 16, demonstrating the expression of SUL1_HUMAN - Extracellular sulfatase Sulf-1 Z21368 transcripts, which are detectable by amplicon as depicted in sequence name Z21368junc17-21, in different normal tissues.
DESCRIPTION FOR CLUSTER T59832
Cluster T59832 features 6 transcript(s) and 33 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
These sequences are variants of the known protein Gamma-interferon inducible lysosomal thiol reductase precursor (SwissProt accession identifier GILT_HUMAN; known also according to the synonyms Gamma-interferon-inducible protein IP-30), SEQ ID NO: 142, referred to herein as the previously known protein. Protein Gamma-interferon inducible lysosomal thiol reductase precursor is known or believed to have the following function(s): Cleaves disulfide bonds in proteins by reduction. May facilitate the complete unfolding of proteins destined for lysosomal degradation. May be involved in MHC class II-restricted antigen processing. The sequence for protein Gamma-interferon inducible lysosomal thiol reductase precursor is given at the end of the application, as "Gamma-interferon inducible lysosomal thiol reductase precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Protein Gamma-interferon inducible lysosomal thiol reductase precursor localization is believed to be lysosomal. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: extracellular; lysosome, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or LocusLink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster T59832 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 17 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
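As a rough illustration of the "parts per million" ratio defined above, the following Python sketch scales a cluster's EST count in a tissue category by the total EST count of that category. The function name and the counts are hypothetical, and the weighting scheme mentioned in the text is not specified here, so the sketch uses unweighted counts.

```python
def ests_per_million(cluster_est_count, category_est_count):
    """Express a cluster's EST count in one category as parts per million.

    cluster_est_count:  ESTs of the cluster observed in the category
    category_est_count: all ESTs observed in that category
    """
    if category_est_count <= 0:
        raise ValueError("category_est_count must be positive")
    # Ratio of cluster ESTs to all ESTs, scaled to one million.
    return 1_000_000 * cluster_est_count / category_est_count


# Hypothetical counts, for illustration only (not data from the application).
ppm = ests_per_million(5, 100_000)
```

Expressing counts per million makes categories with very different library sizes comparable on the same axis.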
Overall, the following results were obtained as shown with regard to the histograms in Figure 17 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: brain malignant tumors, breast malignant tumors, ovarian carcinoma and pancreas carcinoma. Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
above. These transcript(s) encode for protein(s) which are variant(s) of protein Gamma- interferon inducible lysosomal thiol reductase precursor. A description of each variant protein according to the present invention is now provided.
Variant protein T59832_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T59832_T6. An alignment is given to the known protein (Gamma-interferon inducible lysosomal thiol reductase precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
Comparison report between T59832_P5 and GILT_HUMAN: 1. An isolated chimeric polypeptide encoding for T59832_P5, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 12 - 55 of GILT_HUMAN, which also corresponds to amino acids 1 - 44 of T59832_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VGTATGRAGWREQAPCRGTRLLLSPQTSQGKTRAPRGRCPCRVPGKTLFSSRRCGHTP SVPFRFRIPHLRGAAASTRLVPPKGSMSAYCVLLGQELGSPFVAQGTSSAAGQGPPACIL AATLDAFIPARAGLACLWDLLGRCPRG corresponding to amino acids 45 - 189 of T59832_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T59832_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VGTATGRAGWREQAPCRGTRLLLSPQTSQGKTRAPRGRCPCRVPGKTLFSSRRCGHTP SVPFRFRIPHLRGAAASTRLVPPKGSMSAYCVLLGQELGSPFVAQGTSSAAGQGPPACIL AATLDAFIPARAGLACLWDLLGRCPRG in T59832_P5.
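The coordinate correspondence recited in the comparison report above (amino acids 1 - 44 of the variant matching amino acids 12 - 55 of the known protein, followed by a unique tail at 45 - 189) can be expressed as a simple segment lookup. The Python sketch below is illustrative only; the segment table layout and the function name are assumptions, not part of the application.

```python
# Head of T59832_P5 as recited in the comparison report (44 residues).
HEAD = "MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK"

# Each entry: (variant_start, variant_end, known_start).
# known_start is None where the variant has no counterpart in the known protein.
SEGMENTS_P5 = [
    (1, 44, 12),      # head: amino acids 12 - 55 of the known protein
    (45, 189, None),  # unique tail of the variant
]


def to_known_position(variant_pos, segments):
    """Map a 1-based variant position to the known protein, or None if unique."""
    for v_start, v_end, k_start in segments:
        if v_start <= variant_pos <= v_end:
            if k_start is None:
                return None
            # Positions inside a shared segment differ by a constant offset.
            return k_start + (variant_pos - v_start)
    raise ValueError("position outside the variant protein")


first = to_known_position(1, SEGMENTS_P5)
last_shared = to_known_position(44, SEGMENTS_P5)
first_unique = to_known_position(45, SEGMENTS_P5)
```

The same lookup would apply to the other comparison reports by swapping in their segment boundaries.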
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
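The decision rule stated above (a signal peptide predicted by both signal-peptide programs, and no trans-membrane region predicted by either trans-membrane program) can be written as a small predicate. This Python sketch is an assumption-laden illustration: the function name and the boolean inputs are hypothetical, and it does not call SignalP or any other prediction program.

```python
def is_predicted_secreted(signal_peptide_calls, transmembrane_calls):
    """Apply the 'secreted' rule to the outputs of the prediction programs.

    signal_peptide_calls: one boolean per signal-peptide predictor
    transmembrane_calls:  one boolean per trans-membrane region predictor
    """
    # Secreted only if every program predicts a signal peptide
    # and no program predicts a trans-membrane region.
    return all(signal_peptide_calls) and not any(transmembrane_calls)


# Hypothetical predictor outputs, for illustration only.
secreted = is_predicted_secreted([True, True], [False, False])
membrane_like = is_predicted_secreted([True, True], [True, False])
```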
Variant protein T59832_P5 is encoded by the following transcript(s): T59832_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T59832_T6 is shown in bold; this coding portion starts at position 149 and ends at position 715. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Nucleic acid SNPs
Variant protein T59832_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T59832_T8. An alignment is given to the known protein (Gamma-interferon inducible lysosomal thiol reductase precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T59832_P7 and GILT_HUMAN: 1. An isolated chimeric polypeptide encoding for T59832_P7, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 12 - 223 of GILT_HUMAN, which also corresponds to amino acids 1 - 212 of T59832_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRIFLALSLTLIVPWSQGWTRQRDQR corresponding to amino acids 213 - 238 of T59832_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T59832_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTLIVPWSQGWTRQRDQR in T59832_P7.
Comparison report between T59832_P7 and BAC98466 (SEQ ID NO:848): 1. An isolated chimeric polypeptide encoding for T59832_P7, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 1 - 212 of BAC98466, which also corresponds to amino acids 1 - 212 of T59832_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRIFLALSLTLIVPWSQGWTRQRDQR corresponding to amino acids 213 - 238 of T59832_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T59832_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTLIVPWSQGWTRQRDQR in T59832_P7.
Comparison report between T59832_P7 and BAC85622 (SEQ ID NO:849): 1. An isolated chimeric polypeptide encoding for T59832_P7, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV corresponding to amino acids 1 - 90 of T59832_P7, and a second amino acid sequence being at least 90% homologous to MEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVC MEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYV PWVTVNGVRIFLALSLTLIVPWSQGWTRQRDQR corresponding to amino acids 1 - 148 of BAC85622, which also corresponds to amino acids 91 - 238 of T59832_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of T59832_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV of T59832_P7.
Comparison report between T59832_P7 and Q8WU77 (SEQ ID NO:850): 1. An isolated chimeric polypeptide encoding for T59832_P7, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 1 - 212 of Q8WU77, which also corresponds to amino acids 1 - 212 of T59832_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRIFLALSLTLIVPWSQGWTRQRDQR corresponding to amino acids 213 - 238 of T59832_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T59832_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRIFLALSLTLIVPWSQGWTRQRDQR in T59832_P7.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide.
Variant protein T59832_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Amino acid mutations
Variant protein T59832_P7 is encoded by the following transcript(s): T59832_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T59832_T8 is shown in bold; this coding portion starts at position 149 and ends at position 862. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Nucleic acid SNPs
Variant protein T59832_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T59832_T11. An alignment is given to the known protein (Gamma-interferon inducible lysosomal thiol reductase precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T59832_P9 and GILT_HUMAN: 1. An isolated chimeric polypeptide encoding for T59832_P9, comprising a first amino acid sequence being at least 90% homologous to MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 12 - 214 of GILT_HUMAN, which also corresponds to amino acids 1 - 203 of T59832_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204 - 244 of T59832_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.
Comparison report between T59832_P9 and BAC98466 (SEQ ID NO:848): 1. An isolated chimeric polypeptide encoding for T59832_P9, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 1 - 203 of BAC98466, which also corresponds to amino acids 1 - 203 of T59832_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204 - 244 of T59832_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.
Comparison report between T59832_P9 and BAC85622: 1. An isolated chimeric polypeptide encoding for T59832_P9, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV corresponding to amino acids 1 - 90 of T59832_P9, a second amino acid sequence being at least 90% homologous to MEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVC MEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 1 - 113 of BAC85622, which also corresponds to amino acids 91 - 203 of T59832_P9, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204 - 244 of T59832_P9, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV of T59832_P9. 3. An isolated polypeptide encoding for a tail of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.
Comparison report between T59832_P9 and Q8WU77 (SEQ ID NO:850): 1. An isolated chimeric polypeptide encoding for T59832_P9, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVEACVLDELDMELAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIM ECAMGDRGMQLMHANAQRTDALQPPHE corresponding to amino acids 1 - 203 of Q8WU77, which also corresponds to amino acids 1 - 203 of T59832_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR corresponding to amino acids 204 - 244 of T59832_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T59832_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPWKIRPSSLPLSASCTRARSRMSALPQPAPSGVFASSDGR in T59832_P9.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T59832_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Amino acid mutations
Variant protein T59832_P9 is encoded by the following transcript(s): T59832_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T59832_T11 is shown in bold; this coding portion starts at position 149 and ends at position 880. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Nucleic acid SNPs
Variant protein T59832_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T59832_T15. An alignment is given to the known protein (Gamma-interferon inducible lysosomal thiol reductase precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T59832_P12 and GILT_HUMAN: 1. An isolated chimeric polypeptide encoding for T59832_P12, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVE corresponding to amino acids 12 - 141 of GILT_HUMAN, which also corresponds to amino acids 1 - 130 of T59832_P12, and a second amino acid sequence being at least 90% homologous to
CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLED QTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 173 - 261 of GILT_HUMAN, which also corresponds to amino acids 131 - 219 of T59832_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of T59832_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130; and ending at any of amino acid numbers 131 + ((n-2) - x), in which x varies from 0 to n-2.
Comparison report between T59832_P12 and BAC85622: 1. An isolated chimeric polypeptide encoding for T59832_P12, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV corresponding to amino acids 1 - 90 of T59832_P12, a second amino acid sequence being at least 90% homologous to MEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVE corresponding to amino acids 1 - 40 of BAC85622, which also corresponds to amino acids 91 - 130 of T59832_P12, a third amino acid sequence being at least 90% homologous to
CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNG corresponding to amino acids 72 - 122 of BAC85622, which also corresponds to amino acids 131 - 181 of T59832_P12, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 182 - 219 of T59832_P12, wherein said first, second, third and fourth amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of T59832_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLV of T59832_P12. 3. An isolated chimeric polypeptide encoding for an edge portion of T59832_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130; and ending at any of amino acid numbers 131 + ((n-2) - x), in which x varies from 0 to n-2. 4. An isolated polypeptide encoding for a tail of T59832_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK in T59832_P12.
Comparison report between T59832_P12 and Q8WU77: 1. An isolated chimeric polypeptide encoding for T59832_P12, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYLRGPLKKSNA PLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVPYGNAQEQNVSGRWEFKC QHGEEECKFNKVE corresponding to amino acids 1 - 130 of Q8WU77, which also corresponds to amino acids 1 - 130 of T59832_P12, and a second amino acid sequence being at least 90% homologous to CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLED QTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 162 - 250 of Q8WU77, which also corresponds to amino acids 131 - 219 of T59832_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of T59832_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EC, having a structure as follows: a sequence starting from any of amino acid numbers 130-x to 130; and ending at any of amino acid numbers 131 + ((n-2) - x), in which x varies from 0 to n-2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein T59832_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 12 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Amino acid mutations
Variant protein T59832_P12 is encoded by the following transcript(s): T59832_T15, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T59832_T15 is shown in bold; this coding portion starts at position 149 and ends at position 805. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Nucleic acid SNPs
Variant protein T59832_P18 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T59832_T22. An alignment is given to the known protein (Gamma-interferon inducible lysosomal thiol reductase precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T59832_P18 and GILT_HUMAN: 1. An isolated chimeric polypeptide encoding for T59832_P18, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 12 - 55 of GILT_HUMAN, which also corresponds to amino acids 1 - 44 of T59832_P18, and a second amino acid sequence being at least 90% homologous to
CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLED QTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 173 - 261 of GILT_HUMAN, which also corresponds to amino acids 45 - 133 of T59832_P18, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of T59832_P18, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44; and ending at any of amino acid numbers 45 + ((n-2) - x), in which x varies from 0 to n-2.
Comparison report between T59832_P18 and Q8WU77: 1. An isolated chimeric polypeptide encoding for T59832_P18, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK corresponding to amino acids 1 - 44 of Q8WU77, which also corresponds to amino acids 1 - 44 of T59832_P18, and a second amino acid sequence being at least 90% homologous to
CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLED QTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK corresponding to amino acids 162 - 250 of Q8WU77, which also corresponds to amino acids 45 - 133 of T59832_P18, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of T59832_P18, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44; and ending at any of amino acid numbers 45 + ((n-2) - x), in which x varies from 0 to n-2.
Comparison report between T59832_P18 and Q8NEI4 (SEQ ID NO:851): 1. An isolated chimeric polypeptide encoding for T59832_P18, comprising a first amino acid sequence being at least 90% homologous to
MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK coπesponding to amino acids 1 - 44 of Q8NEI4, which also coπesponds to amino acids 1 - 44 of T59832JP18, and a second amino acid sequence being at least 90 % homologous to
CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQPPHEYVPWVTVNGKPLED QTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK coπesponding to amino acids 162 - 250 of Q8NEI4, which also coπesponds to amino acids 45 - 133 of T59832JP18, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2.An isolated chimeric polypeptide encoding for an edge portion of T59832JP18, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KC, having a structure as follows: a sequence starting from any of amino acid numbers 44-x to 44; and ending at any of amino acid numbers 45+ ((n-2) - x), in which x varies from 0 to n-2.
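The edge-portion definition above can be read as a sliding window of length n that always spans the junction residues K44 and C45 of T59832_P18. The following Python sketch illustrates that reading only; the function name `edge_portions` and the choice n = 10 are assumptions made for illustration, and the polypeptide is assembled from the two fragments quoted in the comparison reports.

```python
# Illustrative sketch: enumerate "edge portion" windows across the K44/C45 junction.
# The two fragments are copied from the comparison reports above.
FIRST = "MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK"  # residues 1-44
SECOND = (
    "CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP"
    "PHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK"
)  # residues 45-133
P18 = FIRST + SECOND


def edge_portions(seq, junction, n):
    """Yield (start, end, peptide) for every length-n window that contains
    residues `junction` and `junction + 1` (1-based, inclusive positions)."""
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    for x in range(0, n - 1):  # x varies from 0 to n-2
        start = junction - x
        end = junction + 1 + (n - 2) - x
        if start < 1 or end > len(seq):
            continue  # skip windows that would run off the polypeptide
        yield start, end, seq[start - 1:end]


windows = list(edge_portions(P18, junction=44, n=10))
for start, end, peptide in windows:
    print(start, end, peptide)
```

Each window has length n because end - start + 1 = (45 + (n-2) - x) - (44 - x) + 1 = n, so varying x only shifts the window across the junction.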
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T59832_P18 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 14 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P18 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Amino acid mutations
Variant protein T59832_P18 is encoded by the following transcript(s): T59832_T22, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T59832_T22 is shown in bold; this coding portion starts at position 149 and ends at position 547. The transcript also has the following SNPs as listed in Table 15 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T59832_P18 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 15 - Nucleic acid SNPs
As noted above, cluster T59832 features 33 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster T59832_node_1 according to the present invention is supported by 62 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T22, T59832_T6 and T59832_T8. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Segment cluster T59832_node_22 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T28. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster T59832_node_23 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T28. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Segment cluster T59832_node_24 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T28. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster T59832_node_29 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T28 and T59832_T8. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster T59832_node_39 according to the present invention is supported by 195 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T22, T59832_T28, T59832_T6 and T59832_T8. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Segment cluster T59832_node_7 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T6. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster T59832_node_10 according to the present invention is supported by 332 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T6 and T59832_T8. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Segment cluster T59832_node_11 according to the present invention is supported by 306 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T6 and T59832_T8. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Segment cluster T59832_node_12 according to the present invention is supported by 280 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T6 and T59832_T8. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster T59832_node_14 according to the present invention is supported by 280 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T6 and T59832_T8. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster T59832_node_16 according to the present invention is supported by 287 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T6 and T59832_T8. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster T59832_node_19 according to the present invention is supported by 300 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T11, T59832_T6 and T59832_T8. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster T59832_node_2 according to the present invention is supported by 258 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T22, T59832_T6 and T59832_T8. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster T59832_node_20 according to the present invention is supported by 318 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T11, T59832_T6 and T59832_T8. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Segment cluster T59832_node_25 according to the present invention can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T22, T59832_T28, T59832_T6 and T59832_T8. Table 31 below describes the starting and ending position of this segment on each transcript.
Segment cluster T59832_node_26 according to the present invention is supported by 342 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T22, T59832_T28, T59832_T6 and T59832_T8. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Segment cluster T59832_node_27 according to the present invention is supported by 314 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T22, T59832_T28, T59832_T6 and T59832_T8. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Segment cluster T59832_node_28 according to the present invention is supported by 284 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T15, T59832_T22, T59832_T28, T59832_T6 and T59832_T8. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Segment cluster T59832_node_3 according to the present invention can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T22, T59832_T6 and T59832_T8. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Segment cluster T59832_node_30 according to the present invention can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T22, T59832_T28, T59832_T6 and T59832_T8. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Segment cluster T59832_node_31 according to the present invention can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T22, T59832_T28, T59832_T6 and T59832_T8. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Segment cluster T59832_node_32 according to the present invention is supported by 287 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T22, T59832_T28, T59832_T6 and T59832_T8. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Segment cluster T59832_node_34 according to the present invention can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T22, T59832_T28, T59832_T6 and T59832_T8. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Segment cluster T59832_node_35 according to the present invention can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T22, T59832_T28, T59832_T6 and T59832_T8. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Segment cluster T59832_node_36 according to the present invention can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T22, T59832_T28, T59832_T6 and T59832_T8. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
Segment cluster T59832_node_37 according to the present invention is supported by 300 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T22, T59832_T28, T59832_T6 and T59832_T8. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
Segment cluster T59832_node_38 according to the present invention is supported by 247 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T22, T59832_T28, T59832_T6 and T59832_T8. Table 43 below describes the starting and ending position of this segment on each transcript. Table 43 - Segment location on transcripts
Segment cluster T59832_node_4 according to the present invention is supported by 296 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T22, T59832_T6 and T59832_T8. Table 44 below describes the starting and ending position of this segment on each transcript. Table 44 - Segment location on transcripts
Segment cluster T59832_node_5 according to the present invention is supported by 305 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T22, T59832_T6 and T59832_T8. Table 45 below describes the starting and ending position of this segment on each transcript. Table 45 - Segment location on transcripts
Segment cluster T59832_node_6 according to the present invention can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T22, T59832_T6 and T59832_T8. Table 46 below describes the starting and ending position of this segment on each transcript. Table 46 - Segment location on transcripts
Segment cluster T59832_node_8 according to the present invention can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T6 and T59832_T8. Table 47 below describes the starting and ending position of this segment on each transcript. Table 47 - Segment location on transcripts
Segment cluster T59832_node_9 according to the present invention is supported by 330 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T59832_T11, T59832_T15, T59832_T6 and T59832_T8. Table 48 below describes the starting and ending position of this segment on each transcript. Table 48 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: /tmp/YQPBtaxsLQ/JxSZR3ZR2p:GILT_HUMAN
Sequence documentation:
Alignment of: T59832_P5 x GILT_HUMAN
Alignment segment 1/1:
Quality: 429.00 Escore: 0 Matching length: 46 Total length: 46 Matching Percent Similarity: 97.83 Matching Percent Identity: 97.83 Total Percent Similarity: 97.83 Total Percent Identity: 97.83 Gaps: 0
Alignment:
1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKVG 46
12 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTG 57
Sequence name: /tmp/9HrQ57oZG0/ugNVzp0l7X:GILT_HUMAN
Sequence documentation:
Alignment of: T59832_P7 x GILT_HUMAN
Alignment segment 1/1:
Quality: 2110.00 Escore: 0 Matching length: 212 Total length: 212 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 50
12 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 61

51 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 100
62 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 111

101 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 150
112 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 161

151 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 200
162 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 211

201 PHEYVPWVTVNG 212
212 PHEYVPWVTVNG 223
Sequence name: /tmp/9HrQ57oZG0/ugNVzp0l7X:BAC98466
Sequence documentation:
Alignment of: T59832_P7 x BAC98466
Alignment segment 1/1:
Quality: 2110.00 Escore: 0 Matching length: 212 Total length: 212 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 50
1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 50

51 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 100
51 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 100

101 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 150
101 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 150

151 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 200
151 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 200

201 PHEYVPWVTVNG 212
201 PHEYVPWVTVNG 212
Sequence name: /tmp/9HrQ57oZG0/ugNVzp0l7X:BAC85622
Sequence documentation:
Alignment of: T59832_P7 x BAC85622
Alignment segment 1/1:
Quality: 1496.00 Escore: 0 Matching length: 148 Total length: 148 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
91 MEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDME 140
1 MEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDME 50

141 LAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHA 190
51 LAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHA 100

191 NAQRTDALQPPHEYVPWVTVNGVRIFLALSLTLIVPWSQGWTRQRDQR 238
101 NAQRTDALQPPHEYVPWVTVNGVRIFLALSLTLIVPWSQGWTRQRDQR 148
Sequence name: /tmp/9HrQ57oZG0/ugNVzp0l7X:Q8WU77
Sequence documentation:
Alignment of: T59832_P7 x Q8WU77
Alignment segment 1/1:
Quality: 2110.00 Escore: 0 Matching length: 212 Total length: 212 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 50
1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 50

51 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 100
51 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 100

101 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 150
101 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 150

151 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 200
151 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 200

201 PHEYVPWVTVNG 212
201 PHEYVPWVTVNG 212
Sequence name: /tmp/lttCiW30od/feIXLDs4rU:GILT_HUMAN
Sequence documentation:
Alignment of: T59832_P9 x GILT_HUMAN
Alignment segment 1/1:
Quality: 2016.00 Escore: 0 Matching length: 203 Total length: 203 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 50
12 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 61

51 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 100
62 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 111

101 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 150
112 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 161

151 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 200
162 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 211

201 PHE 203
212 PHE 214
Sequence name: /tmp/lttCiW30od/feIXLDs4rU:BAC98466
Sequence documentation:
Alignment of: T59832_P9 x BAC98466
Alignment segment 1/1:
Quality: 2016.00 Escore: 0 Matching length: 203 Total length: 203 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 50
1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 50

51 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 100
51 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 100

101 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 150
101 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 150

151 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 200
151 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 200

201 PHE 203
201 PHE 203
Sequence name: /tmp/lttCiW30od/feIXLDs4rU:BAC85622
Sequence documentation:
Alignment of: T59832_P9 x BAC85622
Alignment segment 1/1:
Quality: 1145.00 Escore: 0 Matching length: 113 Total length: 113 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
91 MEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDME 140
1 MEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDME 50

141 LAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHA 190
51 LAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHA 100

191 NAQRTDALQPPHE 203
101 NAQRTDALQPPHE 113
Sequence name: /tmp/lttCiW30od/feIXLDs4rU:Q8WU77
Sequence documentation:
Alignment of: T59832_P9 x Q8WU77
Alignment segment 1/1:
Quality: 2016.00 Escore: 0 Matching length: 203 Total length: 203 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps:
Alignment:
1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 50
1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 50

51 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 100
51 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 100

101 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 150
101 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 150

151 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 200
151 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 200

201 PHE 203
201 PHE 203
Sequence name: /tmp/sIHTwdduiK/ToMKmEJiZc:GILT_HUMAN
Sequence documentation:
Alignment of: T59832_P12 x GILT_HUMAN
Alignment segment 1/1:
Quality: 2084.00 Escore: 0 Matching length: 219 Total length: 250 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 87.60 Total Percent Identity: 87.60 Gaps: 1
Alignment:
1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 50
12 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 61

51 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 100
62 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 111

101 YGNAQEQNVSGRWEFKCQHGEEECKFNKVE.................... 130
112 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 161

131 ...........CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 169
162 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 211

170 PHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK 219
212 PHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK 261
Sequence name: /tmp/sIHTwdduiK/ToMKmEJiZc:BAC85622
Sequence documentation:
Alignment of: T59832_P12 x BAC85622
Alignment segment 1/1:
Quality: 835.00 Escore: 0 Matching length: 91 Total length: 122 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 74.59 Total Percent Identity: 74.59 Gaps: 1
Alignment:
91 MEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVE.......... 130
1 MEILNVTLVPYGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDME 50

131 .....................CLQLYAPGLSPDTIMECAMGDRGMQLMHA 159
51 LAFLTIVCMEEFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHA 100

160 NAQRTDALQPPHEYVPWVTVNG 181
101 NAQRTDALQPPHEYVPWVTVNG 122
Sequence name: /tmp/sIHTwdduiK/ToMKmEJiZc:Q8WU77
Sequence documentation:
Alignment of: T59832_P12 x Q8WU77
Alignment segment 1/1:
Quality: 2084.00 Escore: 0 Matching length: 219 Total length: 250 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 87.60 Total Percent Identity: 87.60 Gaps: 1
Alignment:
1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 50
1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 50

51 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 100
51 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 100

101 YGNAQEQNVSGRWEFKCQHGEEECKFNKVE.................... 130
101 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 150

131 ...........CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 169
151 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 200

170 PHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK 219
201 PHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK 250
Sequence name: /tmp/LH4xf8J65f/a95JQoTfNB:GILT_HUMAN
Sequence documentation:
Alignment of: T59832_P18 x GILT_HUMAN
Alignment segment 1/1:
Quality: 1222.00 Escore: 0 Matching length: 133 Total length: 250 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 53.20 Total Percent Identity: 53.20 Gaps: 1
Alignment:
1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK...... 44
12 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 61

44 .................................................. 44
62 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 111

44 .................................................. 44
112 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 161

45 ...........CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 83
162 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 211

84 PHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK 133
212 PHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK 261
Sequence name: /tmp/LH4xf8J65f/a95JQoTfNB:Q8WU77
Sequence documentation:
Alignment of: T59832_P18 x Q8WU77
Alignment segment 1/1:
Quality: 1222.00 Escore: 0 Matching length: 133 Total length: 250 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 53.20 Total Percent Identity: 53.20 Gaps: 1
Alignment:
1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK...... 44
1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 50

44 .................................................. 44
51 RGPLKKSNAPLVNVTLYYEALCGGCRAFLIRELFPTWLLVMEILNVTLVP 100

44 .................................................. 44
101 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 150

45 ...........CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 83
151 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 200

84 PHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK 133
201 PHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK 250
Sequence name: /tmp/LH4xf8J65f/a95JQoTfNB:Q8NEI4
Sequence documentation:
Alignment of: T59832_P18 x Q8NEI4
Alignment segment 1/1:
Quality: 1222.00 Escore: 0 Matching length: 133 Total length: 250 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 53.20 Total Percent Identity: 53.20 Gaps: 1
Alignment:
1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYK...... 44
1 MTLSPLLLFLPPLLLLLDVPTAAVQASPLQALDFFGNGPPVNYKTGNLYL 50

44 .................................................. 44
51 RGPLKKSNAPLV VTLYYEALCGGCQAFLIRELFPTWLLVMEILNVTLVP 100

44 .................................................. 44
101 YGNAQEQNVSGRWEFKCQHGEEECKFNKVEACVLDELDMELAFLTIVCME 150

45 ...........CLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 83
151 EFEDMERSLPLCLQLYAPGLSPDTIMECAMGDRGMQLMHANAQRTDALQP 200

84 PHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK 133
201 PHEYVPWVTVNGKPLEDQTQLLTLVCQLYQGKKPDVCPSSTSSLRSVCFK 250
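The "Total Percent Identity" values reported in the gapped alignments above appear to equal the number of identical residues divided by the total alignment length. The short Python sketch below checks that reading against three of the reported values. It assumes, since each of these alignments reports a Matching Percent Identity of 100.00, that the count of identical residues equals the matching length; the helper name `total_percent_identity` is hypothetical.

```python
def total_percent_identity(identical, total_length):
    """Percent identity over the whole alignment, gap columns included."""
    return round(100.0 * identical / total_length, 2)


# (matching length, total length) copied from the alignment headers above
reports = {
    "T59832_P12 x GILT_HUMAN": (219, 250),
    "T59832_P12 x BAC85622": (91, 122),
    "T59832_P18 x Q8WU77": (133, 250),
}

computed = {
    name: total_percent_identity(matching, total)
    for name, (matching, total) in reports.items()
}
for name, value in computed.items():
    print(f"{name}: {value:.2f}")
```

Under the same reading, the single gap in the T59832_P18 alignments spans 250 - 133 = 117 residues of the known protein.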
Expression of gamma-interferon inducible lysosomal thiol reductase (GILT) T59832 transcripts which are detectable by amplicon as depicted in sequence name T59832junc6-25-26 in normal and cancerous breast tissues
Expression of gamma-interferon inducible lysosomal thiol reductase (GILT) transcripts detectable by or according to junc6-25-26, T59832junc6-25-26 amplicon and primers T59832junc6-25-26F and T59832junc6-25-26R, was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and G6PD (GenBank Accession No. NM_000402; G6PD amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 56-60, 63-67, Table 1 above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.
Figure 18 is a histogram showing over-expression of the above-indicated gamma-interferon inducible lysosomal thiol reductase (GILT) transcripts in cancerous breast samples relative to the normal samples. As is evident from Figure 18, the expression of gamma-interferon inducible lysosomal thiol reductase (GILT) transcripts detectable by the above amplicon(s) in cancer samples was higher in a few samples than in the non-cancerous samples (Sample Nos. 56-60, 63-67, Table 1 above, "Tissue samples in testing panel"). Notably, an over-expression of at least 7 fold was found in 3 out of 28 adenocarcinoma samples.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: T59832junc6-25-26F forward primer; and T59832junc6-25-26R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: T59832junc6-25-26.
Forward primer T59832junc6-25-26F (SEQ ID NO:852): CCACCAGTTAACTACAAGTGCCTG
Reverse primer T59832junc6-25-26R (SEQ ID NO:853): GCGTGCATGAGCTGCATG
Amplicon T59832junc6-25-26 (SEQ ID NO:854): CCACCAGTTAACTACAAGTGCCTGCAGCTCTACGCCCCAGGGCTGTCGCCAGACACTATCATGGAGTGTGCAATGGGGGACCGCGGCATGCAGCTCATGCACGC
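The normalization described in this section (dividing the amplicon quantity by the geometric mean of the four housekeeping-gene quantities, then dividing by the median of the normal samples) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `fold_changes`, the sample identifiers and all quantities are hypothetical and are not data from the application.

```python
from statistics import geometric_mean, median


def fold_changes(target, housekeeping, normal_ids):
    """Return fold up-regulation per sample relative to the normal median.

    target:       {sample_id: amplicon quantity}
    housekeeping: {sample_id: [quantities of the housekeeping genes]}
    normal_ids:   sample ids treated as the normal reference group
    """
    # Step 1: normalize each sample to the geometric mean of its housekeeping genes.
    normalized = {
        s: target[s] / geometric_mean(housekeeping[s]) for s in target
    }
    # Step 2: divide by the median normalized quantity of the normal samples.
    baseline = median(normalized[s] for s in normal_ids)
    return {s: normalized[s] / baseline for s in normalized}


# Hypothetical quantities, for illustration only.
target = {"N1": 2.0, "N2": 4.0, "N3": 1.0, "T1": 8.0}
housekeeping = {
    "N1": [1, 1, 1, 1],
    "N2": [2, 2, 2, 2],
    "N3": [1, 1, 1, 1],
    "T1": [1, 4, 1, 4],
}
result = fold_changes(target, housekeeping, normal_ids=["N1", "N2", "N3"])
for sample, fold in result.items():
    print(sample, round(fold, 3))
```

The geometric mean is less sensitive than the arithmetic mean to a single housekeeping gene with an unusually high quantity, which is a common reason for using it when combining reference genes.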
DESCRIPTION FOR CLUSTER HUMGRP5E
Cluster HUMGRP5E features 2 transcript(s) and 5 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Table 2 - Segments of interest
These sequences are variants of the known protein Gastrin-releasing peptide precursor (SwissProt accession identifier GRP_HUMAN; known also according to the synonyms GRP; GRP-10), SEQ ID NO: 155, referred to herein as the previously known protein. Gastrin-releasing peptide is known or believed to have the following function(s): stimulates gastrin release as well as other gastrointestinal hormones. The sequence for protein Gastrin-releasing peptide precursor is given at the end of the application, as "Gastrin-releasing peptide precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Protein Gastrin-releasing peptide localization is believed to be Secreted. The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Diabetes, Type II. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Bombesin antagonist; Insulinotropin agonist. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anorectic/Antiobesity; Releasing hormone; Anticancer; Respiratory; Antidiabetic. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: signal transduction; neuropeptide signaling pathway, which are annotation(s) related to Biological Process; growth factor, which are annotation(s) related to Molecular Function; and secreted, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HUMGRP5E features 2 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Gastrin-releasing peptide precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HUMGRP5E_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMGRP5E_T4. An alignment is given to the known protein (Gastrin-releasing peptide precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMGRP5E_P4 and GRP_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMGRP5E_P4, comprising a first amino acid sequence being at least 90 % homologous to MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLMGKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQPKALGNQQPSWDSEDSSNFKDVGSKGK corresponding to amino acids 1 - 127 of GRP_HUMAN, which also corresponds to amino acids 1 - 127 of HUMGRP5E_P4, and a second amino acid sequence being at least 90 % homologous to GSQREGRNPQLNQQ corresponding to amino acids 135 - 148 of GRP_HUMAN, which also corresponds to amino acids 128 - 141 of HUMGRP5E_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HUMGRP5E_P4, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KG, having a structure as follows: a sequence starting from any of amino acid numbers 127-x to 127; and ending at any of amino acid numbers 128 + ((n-2) - x), in which x varies from 0 to n-2.
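The edge-portion definition above amounts to enumerating every window of length n that contains both residues flanking the junction (positions 127 and 128, i.e. "KG"). The following is a minimal illustrative sketch only, not part of the claimed subject matter: the function name `edge_windows` and the choice to skip windows that would run past the ends of the variant are assumptions, and the sequence is assembled from the alignment of HUMGRP5E_P4 to GRP_HUMAN given in this description.

```python
# Residues 1-127 and 135-148 of GRP_HUMAN, joined as in the comparison report.
HEAD = (
    "MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLM"
    "GKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQ"
    "PKALGNQQPSWDSEDSSNFKDVGSKGK"
)
TAIL = "GSQREGRNPQLNQQ"
VARIANT = HEAD + TAIL


def edge_windows(seq, junction, n):
    """Yield (start, end, peptide) for each length-n window spanning the junction.

    `junction` is the 1-based position of the last residue before the splice
    point, so every window contains residues `junction` and `junction + 1`.
    Windows that would run past either end of `seq` are skipped (an assumption
    of this sketch; the text does not say how such cases are handled).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    for x in range(0, n - 1):  # x varies from 0 to n-2
        start = junction - x
        end = junction + 1 + ((n - 2) - x)
        if start < 1 or end > len(seq):
            continue
        yield start, end, seq[start - 1:end]


windows = list(edge_windows(VARIANT, junction=127, n=10))
for start, end, peptide in windows:
    print(start, end, peptide)
```

Each window has length n by construction, since end - start + 1 = (128 + (n-2) - x) - (127 - x) + 1 = n.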
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMGRP5E_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMGRP5E_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Amino acid mutations
Variant protein HUMGRP5E_P4 is encoded by the following transcript(s): HUMGRP5E_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMGRP5E_T4 is shown in bold; this coding portion starts at position 622 and ends at position 1044. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMGRP5E_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Nucleic acid SNPs
Variant protein HUMGRP5E_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMGRP5E_T5. An alignment is given to the known protein (Gastrin-releasing peptide precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMGRP5E_P5 and GRP_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMGRP5E_P5, comprising a first amino acid sequence being at least 90 % homologous to MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLMGKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQPKALGNQQPSWDSEDSSNFKDVGSKGK corresponding to amino acids 1 - 127 of GRP_HUMAN, which also corresponds to amino acids 1 - 127 of HUMGRP5E_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DSLLQVLNVKEGTPS corresponding to amino acids 128 - 142 of HUMGRP5E_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMGRP5E_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DSLLQVLNVKEGTPS in HUMGRP5E_P5.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMGRP5E_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMGRP5E_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
Variant protein HUMGRP5E_P5 is encoded by the following transcript(s): HUMGRP5E_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMGRP5E_T5 is shown in bold; this coding portion starts at position 622 and ends at position 1047. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMGRP5E_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
As noted above, cluster HUMGRP5E features 5 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMGRP5E_node_0 according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGRP5E_T4 and HUMGRP5E_T5. Table 9 below describes the starting and ending position of this segment on each transcript.
Table 9 - Segment location on transcripts
Segment cluster HUMGRP5E_node_2 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGRP5E_T4 and HUMGRP5E_T5. Table 10 below describes the starting and ending position of this segment on each transcript.
Table 10 - Segment location on transcripts
Segment cluster HUMGRP5E_node_8 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGRP5E_T4 and HUMGRP5E_T5. Table 11 below describes the starting and ending position of this segment on each transcript.
Table 11 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMGRP5E_node_3 according to the present invention can be found in the following transcript(s): HUMGRP5E_T4 and HUMGRP5E_T5. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
Segment cluster HUMGRP5E_node_7 according to the present invention can be found in the following transcript(s): HUMGRP5E_T5. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: /tmp/412zs2mwyT/B0wjOUAX0d:GRP_HUMAN
Sequence documentation:
Alignment of: HUMGRP5E_P4 x GRP_HUMAN
Alignment segment 1/1:
Quality: 1291.00
Escore: 0
Matching length: 141
Total length: 148
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 95.27
Total Percent Identity: 95.27
Gaps: 1
Alignment:
  1 MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLM 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLM 50
 51 GKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 GKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQ 100
101 PKALGNQQPSWDSEDSSNFKDVGSKGK.......GSQREGRNPQLNQQ 141
    |||||||||||||||||||||||||||       ||||||||||||||
101 PKALGNQQPSWDSEDSSNFKDVGSKGKVGRLSAPGSQREGRNPQLNQQ 148
Sequence name: /tmp/lme9ldnvfv/KbP5io8PtU:GRP_HUMAN
Sequence documentation:
Alignment of: HUMGRP5E_P5 x GRP_HUMAN
Alignment segment 1/1:
Quality: 1248.00
Escore: 0
Matching length: 127
Total length: 127
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLM 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLM 50
 51 GKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 GKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQ 100
101 PKALGNQQPSWDSEDSSNFKDVGSKGK 127
    |||||||||||||||||||||||||||
101 PKALGNQQPSWDSEDSSNFKDVGSKGK 127
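The figures reported for the two HUMGRP5E variants can be cross-checked with simple arithmetic. The sketch below is illustrative only and rests on two assumptions that the text does not state: that "Total Percent Identity" is the matching length divided by the total length, and that the reported coding portion is an inclusive nucleotide range. The function names are hypothetical. The numbers themselves (alignment lengths, coding positions 622-1044 and 622-1047, protein lengths 141 and 142) are taken from this description.

```python
def percent(matching, total):
    """Percentage of aligned positions, rounded to two decimals."""
    return round(100.0 * matching / total, 2)


def coding_length_aa(start, end):
    """Number of codons in an inclusive 1-based nucleotide range."""
    nucleotides = end - start + 1
    if nucleotides % 3 != 0:
        raise ValueError("coding span is not a whole number of codons")
    return nucleotides // 3


# (matching length, total length) from the alignment summaries
ALIGNMENTS = {"HUMGRP5E_P4": (141, 148), "HUMGRP5E_P5": (127, 127)}

# (coding start, coding end, variant protein length in amino acids)
CODING = {"HUMGRP5E_T4": (622, 1044, 141), "HUMGRP5E_T5": (622, 1047, 142)}

for name, (matching, total) in ALIGNMENTS.items():
    print(name, "total percent identity:", percent(matching, total))

for name, (start, end, protein_len) in CODING.items():
    codons = coding_length_aa(start, end)
    print(name, "codons:", codons, "matches protein length:", codons == protein_len)
```

Under these assumptions each coding span divides into exactly as many codons as the variant protein has residues, which suggests the reported span does not include the stop codon.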
Expression of GRP_HUMAN - gastrin-releasing peptide (HUMGRP5E) transcripts, which are detectable by amplicon, as depicted in sequence name HUMGRP5Ejunc3-7, in normal and cancerous breast tissues.
Expression of GRP_HUMAN - gastrin-releasing peptide transcripts detectable by or according to junc3-7, HUMGRP5Ejunc3-7 amplicon(s) and HUMGRP5Ejunc3-7F and HUMGRP5Ejunc3-7R primers was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and G6PD (GenBank Accession No. NM_000402; G6PD amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 56-60, 63-67, Table 1, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 19 is a histogram showing overexpression of the above-indicated GRP_HUMAN - gastrin-releasing peptide transcripts in cancerous breast samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained. As is evident from Figure 19, the expression of GRP_HUMAN - gastrin-releasing peptide transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 56-60, 63-67, Table 1, "Tissue samples in testing panel"). Notably, an overexpression of at least 5 fold was found in 12 out of 28 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below.
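The normalization described above (division by the geometric mean of the housekeeping-gene quantities, then by the median of the normal PM samples) can be sketched as follows. This is a minimal illustration only: the function names, the sample labels and every quantity in the example are hypothetical and are not data from the experiment.

```python
from statistics import geometric_mean, median


def normalized_quantity(target_qty, housekeeping_qtys):
    """Normalize a target amplicon quantity to the geometric mean of the
    housekeeping-gene quantities measured in the same RT sample."""
    return target_qty / geometric_mean(housekeeping_qtys)


def fold_upregulation(normalized, normal_sample_ids):
    """Divide each normalized quantity by the median of the normal samples."""
    reference = median(normalized[s] for s in normal_sample_ids)
    return {sample: value / reference for sample, value in normalized.items()}


# Hypothetical raw quantities: (target amplicon, four housekeeping genes).
raw = {
    "N1": (4.0, [2.0, 8.0, 4.0, 4.0]),
    "N2": (8.0, [2.0, 8.0, 4.0, 4.0]),
    "N3": (12.0, [2.0, 8.0, 4.0, 4.0]),
    "T1": (56.0, [2.0, 8.0, 4.0, 4.0]),
}

normalized = {s: normalized_quantity(t, hk) for s, (t, hk) in raw.items()}
fold = fold_upregulation(normalized, normal_sample_ids=["N1", "N2", "N3"])

for sample, value in fold.items():
    print(sample, round(value, 2))
```

The geometric mean is used here, as in the text, so that no single highly expressed housekeeping gene dominates the normalization factor.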
The P value for the difference in the expression levels of GRP_HUMAN - gastrin-releasing peptide transcripts detectable by the above amplicon(s) in breast cancer samples versus the normal tissue samples was determined by T test as 7.22E-04. A threshold of 5 fold overexpression was found to differentiate between cancer and normal samples with a P value of 1.12E-02, as checked by Fisher's exact test. The above values demonstrate statistical significance of the results. Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HUMGRP5Ejunc3-7F forward primer; and HUMGRP5Ejunc3-7R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HUMGRP5Ejunc3-7.
HUMGRP5Ejunc3-7F (SEQ ID NO:855)
ACCAGCCACCTCAACCCA
HUMGRP5Ejunc3-7R (SEQ ID NO:856)
CTGGAGCAGAGAGTCTTTGCCT
HUMGRP5Ejunc3-7 (SEQ ID NO:857)
ACCAGCCACCTCAACCCAAGGCCCTGGGCAATCAGCAGCCTTCGTGGGATTCAGAGGATAGCAGCAACTTCAAAGATGTAGGTTCAAAAGGCAAAGACTCTCTGCTCCAG
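As a consistency check on the sequences listed above, an amplicon is expected to begin with the forward primer and to end with the reverse complement of the reverse primer. The short sketch below illustrates that check for the HUMGRP5Ejunc3-7 primer pair; the function names are hypothetical and the check is an illustration, not a description of how the primers were designed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primers_flank_amplicon(forward, reverse, amplicon):
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    amplicon = "".join(amplicon.split()).upper()  # drop line-wrap whitespace
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


FORWARD = "ACCAGCCACCTCAACCCA"          # HUMGRP5Ejunc3-7F (SEQ ID NO:855)
REVERSE = "CTGGAGCAGAGAGTCTTTGCCT"      # HUMGRP5Ejunc3-7R (SEQ ID NO:856)
AMPLICON = (                            # HUMGRP5Ejunc3-7 (SEQ ID NO:857)
    "ACCAGCCACCTCAACCCAAGGCCCTGGGCAATCAGCAGCCTTCGTGGGATTCAGAG"
    "GATAGCAGCAACTTCAAAGATGTAGGTTCAAAAGGCAAAGACTCTCTGCTCCAG"
)

print(primers_flank_amplicon(FORWARD, REVERSE, AMPLICON))
```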
Expression of GRP_HUMAN - gastrin-releasing peptide (HUMGRP5E) transcripts, which are detectable by amplicon, as depicted in sequence name HUMGRP5Ejunc3-7, in different normal tissues.
Expression of GRP_HUMAN - gastrin-releasing peptide transcripts detectable by or according to HUMGRP5Ejunc3-7 amplicon(s) and HUMGRP5Ejunc3-7F and HUMGRP5Ejunc3-7R primers was measured by real time PCR. In parallel, the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), UBC (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the breast samples (Sample Nos. 33-35 above, Table 2, "Tissue samples in normal panel"), to obtain a value of relative expression of each sample relative to the median of the breast samples. Primers and amplicon are as above. The results are presented in Figure 20, demonstrating the expression of GRP_HUMAN - gastrin-releasing peptide (HUMGRP5E) transcripts, which are detectable by amplicon, as depicted in sequence name HUMGRP5Ejunc3-7, in different normal tissues.
DESCRIPTION FOR CLUSTER AA155578
Cluster AA155578 features 4 transcript(s) and 15 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
These sequences are variants of the known protein Kallikrein 10 precursor (SwissProt accession identifier KLKA_HUMAN; known also according to the synonyms EC 3.4.21.-; Protease serine-like 1; Normal epithelial cell-specific 1), SEQ ID NO: 177, referred to herein as the previously known protein. Protein Kallikrein 10 precursor is known or believed to have the following function(s): Has a tumor-suppressor role for NES1 in breast and prostate cancer. The sequence for protein Kallikrein 10 precursor is given at the end of the application, as "Kallikrein 10 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Kallikrein 10 precursor localization is believed to be Secreted (Probable). The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: proteolysis and peptidolysis, which are annotation(s) related to Biological Process; chymotrypsin; trypsin; serine-type peptidase; hydrolase, which are annotation(s) related to Molecular Function; and extracellular, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster AA155578 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 21 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million). Overall, the following results were obtained as shown with regard to the histograms in Figure 21 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues and pancreas carcinoma.
Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
As noted above, cluster AA155578 features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Kallikrein 10 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein AA155578_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AA155578_PEA_1_T10. An alignment is given to the known protein (Kallikrein 10 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between AA155578_PEA_1_P4 and KLKA_HUMAN:
1. An isolated chimeric polypeptide encoding for AA155578_PEA_1_P4, comprising a first amino acid sequence being at least 90 % homologous to MRAPHLHLSAASGARALAKLLPLLMAQLWAAEAALLPQNDTRLDPEAYGAPCARGSQPWQVSLFNGLSFHCAGVLVDQSWVLTAAHCGNKPLWARVGDDHLLLLQGEQLRRTTRSVVHPKYHQGSGPILPRRTDEHDLMLLKLARP corresponding to amino acids 1 - 146 of KLKA_HUMAN, which also corresponds to amino acids 1 - 146 of AA155578_PEA_1_P4, and a second amino acid sequence being at least 90 % homologous to YNKGLTCSSITILSPKECEVFYPGVVTNNMICAGLDRGQDPCQSDSGGPLVCDETLQGILSWGVYPCGSAQHPAVYTQICKYMSWTNKVIRSN corresponding to amino acids 184 - 276 of KLKA_HUMAN, which also corresponds to amino acids 147 - 239 of AA155578_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of AA155578_PEA_1_P4, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PY, having a structure as follows: a sequence starting from any of amino acid numbers 146-x to 146; and ending at any of amino acid numbers 147 + ((n-2) - x), in which x varies from 0 to n-2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein AA155578_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA155578_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
Variant protein AA155578_PEA_1_P4 is encoded by the following transcript(s): AA155578_PEA_1_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AA155578_PEA_1_T10 is shown in bold; this coding portion starts at position 148 and ends at position 864. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA155578_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Variant protein AA155578_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AA155578_PEA_1_T12. An alignment is given to the known protein (Kallikrein 10 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between AA155578_PEA_1_P6 and KLKA_HUMAN:
1. An isolated chimeric polypeptide encoding for AA155578_PEA_1_P6, comprising a first amino acid sequence being at least 90 % homologous to MRAPHLHLSAASGARALAKLLPLLMAQLW corresponding to amino acids 1 - 29 of KLKA_HUMAN, which also corresponds to amino acids 1 - 29 of AA155578_PEA_1_P6, and a second amino acid sequence being at least 90 % homologous to VKYNKGLTCSSITILSPKECEVFYPGVVTNNMICAGLDRGQDPCQSDSGGPLVCDETLQGILSWGVYPCGSAQHPAVYTQICKYMSWTNKVIRSN corresponding to amino acids 182 - 276 of KLKA_HUMAN, which also corresponds to amino acids 30 - 124 of AA155578_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of AA155578_PEA_1_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise WV, having a structure as follows: a sequence starting from any of amino acid numbers 29-x to 29; and ending at any of amino acid numbers 30 + ((n-2) - x), in which x varies from 0 to n-2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein AA155578_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA155578_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Amino acid mutations
Variant protein AA155578_PEA_1_P6 is encoded by the following transcript(s): AA155578_PEA_1_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AA155578_PEA_1_T12 is shown in bold; this coding portion starts at position 148 and ends at position 519. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA155578_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Variant protein AA155578_PEA_1_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AA155578_PEA_1_T8. An alignment is given to the known protein (Kallikrein 10 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between AA155578_PEA_1_P8 and KLKA_HUMAN:
1. An isolated chimeric polypeptide encoding for AA155578_PEA_1_P8, comprising a first amino acid sequence being at least 90 % homologous to MRAPHLHLSAASGARALAKLLPLLMAQLW corresponding to amino acids 1 - 29 of KLKA_HUMAN, which also corresponds to amino acids 1 - 29 of AA155578_PEA_1_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GHCGLE corresponding to amino acids 30 - 35 of AA155578_PEA_1_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of AA155578_PEA_1_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GHCGLE in AA155578_PEA_1_P8.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein AA155578_PEA_1_P8 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA155578_PEA_1_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
Variant protein AA155578_PEA_1_P8 is encoded by the following transcript(s): AA155578_PEA_1_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AA155578_PEA_1_T8 is shown in bold; this coding portion starts at position 285 and ends at position 389. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA155578_PEA_1_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Nucleic acid SNPs
Variant protein AA155578_PEA_1_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AA155578_PEA_1_T13. An alignment is given to the known protein (Kallikrein 10 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between AA155578_PEA_1_P9 and KLKA_HUMAN:
1. An isolated chimeric polypeptide encoding for AA155578_PEA_1_P9, comprising a first amino acid sequence being at least 90 % homologous to MRAPHLHLSAASGARALAKLLPLLMAQLWAAEAALLPQNDTRLDPEAYGAPCARGSQPWQVSLFNGLSFHCAGVLVDQSWVLTAAHCGNK corresponding to amino acids 1 - 90 of KLKA_HUMAN, which also corresponds to amino acids 1 - 90 of AA155578_PEA_1_P9.
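The comparison reports for this cluster map ranges of the known protein KLKA_HUMAN onto ranges of each variant. The sketch below is an illustrative consistency check, not part of the described method: it confirms that each mapped pair of ranges has the same length, that the variant ranges are contiguous, and it lists the residues of the known protein that are skipped at each junction. The function name and the data layout are hypothetical; the coordinates are those stated in the reports for the P4, P6 and P9 variants.

```python
def skipped_known_ranges(segments):
    """Validate a known-to-variant mapping and return skipped known ranges.

    `segments` is a list of ((known_start, known_end), (var_start, var_end))
    pairs in variant order, using 1-based inclusive coordinates.
    """
    skipped = []
    prev_known_end = 0
    prev_var_end = 0
    for (k_start, k_end), (v_start, v_end) in segments:
        if k_end - k_start != v_end - v_start:
            raise ValueError("mapped ranges differ in length")
        if v_start != prev_var_end + 1:
            raise ValueError("variant ranges are not contiguous")
        if k_start > prev_known_end + 1:
            skipped.append((prev_known_end + 1, k_start - 1))
        prev_known_end, prev_var_end = k_end, v_end
    return skipped


REPORTS = {
    "AA155578_PEA_1_P4": [((1, 146), (1, 146)), ((184, 276), (147, 239))],
    "AA155578_PEA_1_P6": [((1, 29), (1, 29)), ((182, 276), (30, 124))],
    "AA155578_PEA_1_P9": [((1, 90), (1, 90))],
}

for variant, segments in REPORTS.items():
    print(variant, "skips known residues:", skipped_known_ranges(segments))
```

Only residues skipped between mapped ranges are reported; a variant that simply ends early, such as the P9 variant, yields an empty list.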
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a transmembrane region. Variant protein AA155578_PEA_1_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA155578_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Amino acid mutations
Variant protein AA155578_PEA_1_P9 is encoded by the following transcript(s): AA155578_PEA_1_T13, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AA155578_PEA_1_T13 is shown in bold; this coding portion starts at position 148 and ends at position 417. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA155578_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Nucleic acid SNPs
As noted above, cluster AA155578 features 15 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster AA155578_PEA_1_node_11 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA155578_PEA_1_T10 and AA155578_PEA_1_T13. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Segment cluster AA155578_PEA_1_node_12 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA155578_PEA_1_T13. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Segment cluster AA155578_PEA_1_node_14 according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA155578_PEA_1_T10 and AA155578_PEA_1_T8. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster AA155578_PEA_1_node_19 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA155578_PEA_1_T10, AA155578_PEA_1_T12 and AA155578_PEA_1_T8. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Segment cluster AA155578_PEA_1_node_21 according to the present invention is supported by 53 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA155578_PEA_1_T10, AA155578_PEA_1_T12 and AA155578_PEA_1_T8. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster AA155578_PEA_1_node_23 according to the present invention is supported by 71 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA155578_PEA_1_T10, AA155578_PEA_1_T12 and AA155578_PEA_1_T8. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster AA155578_PEA_1_node_24 according to the present invention is supported by 52 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA155578_PEA_1_T10, AA155578_PEA_1_T12 and AA155578_PEA_1_T8. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Segment cluster AA155578_PEA_1_node_25 according to the present invention is supported by 53 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA155578_PEA_1_T10, AA155578_PEA_1_T12 and AA155578_PEA_1_T8. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Segment cluster AA155578_PEA_1_node_4 according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA155578_PEA_1_T10, AA155578_PEA_1_T12 and AA155578_PEA_1_T13. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Segment cluster AA155578_PEA_1_node_7 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA155578_PEA_1_T8. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster AA155578_PEA_1_node_15 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA155578_PEA_1_T8. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster AA155578_PEA_1_node_18 according to the present invention can be found in the following transcript(s): AA155578_PEA_1_T12 and AA155578_PEA_1_T8. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster AA155578_PEA_1_node_22 according to the present invention can be found in the following transcript(s): AA155578_PEA_1_T10, AA155578_PEA_1_T12 and AA155578_PEA_1_T8. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster AA155578_PEA_1_node_6 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA155578_PEA_1_T8. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster AA155578_PEA_1_node_8 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA155578_PEA_1_T10, AA155578_PEA_1_T12, AA155578_PEA_1_T13 and AA155578_PEA_1_T8. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: /tmp/4gXdRV0Clz/cQ4LqHmh5A:KLKA_HUMAN
Sequence documentation:
Alignment of: AA155578_PEA_1_P4 x KLKA_HUMAN
Alignment segment 1/1:
Quality: 2283.00 Escore: 0 Matching length: 239 Total length: 276 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 86.59 Total Percent Identity: 86.59 Gaps : 1
Alignment :
  1 MRAPHLHLSAASGARALAKLLPLLMAQLWAAEAALLPQNDTRLDPEAYGA 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRAPHLHLSAASGARALAKLLPLLMAQLWAAEAALLPQNDTRLDPEAYGA 50
 51 PCARGSQPWQVSLFNGLSFHCAGVLVDQSWVLTAAHCGNKPLWARVGDDH 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PCARGSQPWQVSLFNGLSFHCAGVLVDQSWVLTAAHCGNKPLWARVGDDH 100
101 LLLLQGEQLRRTTRSVVHPKYHQGSGPILPRRTDEHDLMLLKLARP.... 146
    ||||||||||||||||||||||||||||||||||||||||||||||
101 LLLLQGEQLRRTTRSVVHPKYHQGSGPILPRRTDEHDLMLLKLARPVVPG 150
147 .................................YNKGLTCSSITILSPKE 163
                                     |||||||||||||||||
151 PRVRALQLPYRCAQPGDQCQVAGWGTTAARRVKYNKGLTCSSITILSPKE 200
164 CEVFYPGVVTNNMICAGLDRGQDPCQSDSGGPLVCDETLQGILSWGVYPC 213
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CEVFYPGVVTNNMICAGLDRGQDPCQSDSGGPLVCDETLQGILSWGVYPC 250
214 GSAQHPAVYTQICKYMSWINKVIRSN 239
    ||||||||||||||||||||||||||
251 GSAQHPAVYTQICKYMSWINKVIRSN 276
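The percentages in the alignment header above appear to follow directly from the two lengths it reports. The sketch below is a minimal check under one assumption that is inferred from the figures rather than stated in the application: that the "Total Percent Identity" is the identity over the matched region scaled by matching length divided by total length. The function name is hypothetical.

```python
def total_percent_identity(matching_length: int, total_length: int,
                           matching_percent_identity: float = 100.0) -> float:
    """Scale the identity over the matched region to the full alignment length.

    Assumption (not stated in the source): total % = matching % * matching / total.
    """
    if total_length <= 0 or not 0 <= matching_length <= total_length:
        raise ValueError("lengths must satisfy 0 <= matching_length <= total_length")
    return round(matching_percent_identity * matching_length / total_length, 2)


# Values reported for AA155578_PEA_1_P4 x KLKA_HUMAN
p4_total_identity = total_percent_identity(239, 276)
# Residues of the known protein with no counterpart in the variant
p4_unmatched = 276 - 239

print(p4_total_identity, p4_unmatched)
```

Under this assumption the result agrees with the reported 86.59, and the 37 unmatched residues correspond to the single gap shown in the alignment.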
Sequence name: /tmp/3VxcRS97HN/X9ncdxjYQx: KLKA_HUMAN
Sequence documentation:
Alignment of: AA155578_PEA_1_P6 x KLKA_HUMAN
Alignment segment 1/1:
Quality: 1140.00 Escore: 0 Matching length: 124 Total length: 276 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 44.93 Total Percent Identity: 44.93 Gaps : 1
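Alignment :
  1 MRAPHLHLSAASGARALAKLLPLLMAQLW..................... 29
    |||||||||||||||||||||||||||||
  1 MRAPHLHLSAASGARALAKLLPLLMAQLWAAEAALLPQNDTRLDPEAYGA 50
 29 .................................................. 29
 51 PCARGSQPWQVSLFNGLSFHCAGVLVDQSWVLTAAHCGNKPLWARVGDDH 100
 29 .................................................. 29
101 LLLLQGEQLRRTTRSVVHPKYHQGSGPILPRRTDEHDLMLLKLARPVVPG 150
 30 ...............................VKYNKGLTCSSITILSPKE 48
                                   |||||||||||||||||||
151 PRVRALQLPYRCAQPGDQCQVAGWGTTAARRVKYNKGLTCSSITILSPKE 200
 49 CEVFYPGVVTNNMICAGLDRGQDPCQSDSGGPLVCDETLQGILSWGVYPC 98
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CEVFYPGVVTNNMICAGLDRGQDPCQSDSGGPLVCDETLQGILSWGVYPC 250
 99 GSAQHPAVYTQICKYMSWINKVIRSN 124
    ||||||||||||||||||||||||||
251 GSAQHPAVYTQICKYMSWINKVIRSN 276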
Sequence name: /tmp/LsSdTeu0qX/6luiCMKTi9 : KLKA_HUMAN
Sequence documentation:
Alignment of: AA155578_PEA_1_P8 x KLKA_HUMAN
Alignment segment 1/1:
Quality: 279.00 Escore : 0 Matching length: 29 Total length: 29 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
  1 MRAPHLHLSAASGARALAKLLPLLMAQLW 29
    |||||||||||||||||||||||||||||
  1 MRAPHLHLSAASGARALAKLLPLLMAQLW 29
Sequence name: /tmp/kcfKGMcF7s/YnKnMy8Dlq:KLKA_HUMAN
Sequence documentation:
Alignment of: AA155578_PEA_1_P9 x KLKA_HUMAN
Alignment segment 1/1:
Quality: 887.00 Escore: 0 Matching length: 90 Total length: 90 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
  1 MRAPHLHLSAASGARALAKLLPLLMAQLWAAEAALLPQNDTRLDPEAYGA 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRAPHLHLSAASGARALAKLLPLLMAQLWAAEAALLPQNDTRLDPEAYGA 50
 51 PCARGSQPWQVSLFNGLSFHCAGVLVDQSWVLTAAHCGNK 90
    ||||||||||||||||||||||||||||||||||||||||
 51 PCARGSQPWQVSLFNGLSFHCAGVLVDQSWVLTAAHCGNK 90
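The 90-residue length of this alignment can be cross-checked against the coding portion stated earlier for transcript AA155578_PEA_1_T13 (positions 148 to 417). The sketch below is illustrative only and rests on two assumptions that the application does not spell out: that the positions are 1-based and inclusive, and that the stated range does not include a stop codon. The function name is hypothetical.

```python
def coding_codons(start: int, end: int) -> int:
    """Number of codons in a coding portion given 1-based inclusive positions.

    Assumptions (not stated in the source): inclusive coordinates, no stop codon
    inside the stated range.
    """
    length_nt = end - start + 1
    if length_nt <= 0 or length_nt % 3 != 0:
        raise ValueError("coding portion must be a positive multiple of 3 nt")
    return length_nt // 3


# Coding portion quoted for transcript AA155578_PEA_1_T13
t13_codons = coding_codons(148, 417)
print(t13_codons)
```

Under these assumptions the 270-nucleotide coding portion yields 90 codons, matching the 90 amino acids aligned for AA155578_PEA_1_P9.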
DESCRIPTION FOR CLUSTER HSENA78
Cluster HSENA78 features 1 transcript(s) and 7 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
These sequences are variants of the known protein Small inducible cytokine B5 precursor (SwissProt accession identifier SZ05_HUMAN; known also according to the synonyms CXCL5; Epithelial-derived neutrophil activating protein 78; Neutrophil-activating peptide ENA-78), SEQ ID NO: 190, referred to herein as the previously known protein. Protein Small inducible cytokine B5 precursor is known or believed to have the following function(s): Involved in neutrophil activation. The sequence for protein Small inducible cytokine B5 precursor is given at the end of the application, as "Small inducible cytokine B5 precursor amino acid sequence". Protein Small inducible cytokine B5 precursor localization is believed to be Secreted. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: chemotaxis; signal transduction; cell-cell signaling; positive control of cell proliferation, which are annotation(s) related to Biological Process; and chemokine, which are annotation(s) related to Molecular Function. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster HSENA78 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 22 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
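The "parts per million" figure described above is a simple ratio. The sketch below illustrates it with made-up counts; the function name and the example numbers are hypothetical, and the weighting of ESTs mentioned in the text is described elsewhere in the application and is not reproduced here.

```python
def ests_per_million(cluster_ests: float, category_ests: float) -> float:
    """Expression of a cluster's ESTs relative to all ESTs in a tissue category, in ppm."""
    if category_ests <= 0:
        raise ValueError("category must contain at least one EST")
    return 1_000_000 * cluster_ests / category_ests


# Hypothetical counts, for illustration only: 12 (weighted) ESTs from the cluster
# out of 400,000 ESTs sequenced from one tissue category.
example_ppm = ests_per_million(12, 400_000)
print(example_ppm)
```

Expressing each category as a ratio to its own EST total allows tissues sampled at very different sequencing depths to be compared on the same axis.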
Overall, the following results were obtained as shown with regard to the histograms in Figure 22 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors and lung malignant tumors. Table 4 - Normal tissue distribution
Table 5 - P values and ratios for expression in cancerous tissue
As noted above, cluster HSENA78 features 1 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Small inducible cytokine B5 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSENA78_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSENA78_T5. An alignment is given to the known protein (Small inducible cytokine B5 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSENA78_P2 and SZ05_HUMAN: 1. An isolated chimeric polypeptide encoding for HSENA78_P2, comprising a first amino acid sequence being at least 90% homologous to MSLLSSRAARVPGPSSSLCALLVLLLLLTQPGPIASAGPAAAVLRELRCVCLQTTQGVHPKMISNLQVFAIGPQCSKVEVV corresponding to amino acids 1 - 81 of SZ05_HUMAN, which also corresponds to amino acids 1 - 81 of HSENA78_P2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a transmembrane region. Variant protein HSENA78_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSENA78_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Amino acid mutations
Variant protein HSENA78_P2 is encoded by the following transcript(s): HSENA78_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSENA78_T5 is shown in bold; this coding portion starts at position 149 and ends at position 391. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSENA78_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Nucleic acid SNPs
As noted above, cluster HSENA78 features 7 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSENA78_node_0 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSENA78_T5. Table 8 below describes the starting and ending position of this segment on each transcript. Table 8 - Segment location on transcripts
Segment cluster HSENA78_node_2 according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSENA78_T5. Table 9 below describes the starting and ending position of this segment on each transcript. Table 9 - Segment location on transcripts
Segment cluster HSENA78_node_6 according to the present invention is supported by 68 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSENA78_T5. Table 10 below describes the starting and ending position of this segment on each transcript. Table 10 - Segment location on transcripts
Segment cluster HSENA78_node_9 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSENA78_T5. Table 11 below describes the starting and ending position of this segment on each transcript. Table 11 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSENA78_node_3 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSENA78_T5. Table 12 below describes the starting and ending position of this segment on each transcript.
Segment cluster HSENA78_node_4 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSENA78_T5. Table 13 below describes the starting and ending position of this segment on each transcript. Table 13 - Segment location on transcripts
Segment cluster HSENA78_node_8 according to the present invention can be found in the following transcript(s): HSENA78_T5. Table 14 below describes the starting and ending position of this segment on each transcript. Table 14 - Segment location on transcripts
Microarray (chip) data is also available for this gene as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to breast cancer), shown in Table 15. Table 15 - Oligonucleotides related to this gene
Variant protein alignment to the previously known protein:
Sequence name: /tmp/5kiQY6MxWx/pLnTrxsCqk: SZ05_HUMAN
Sequence documentation:
Alignment of: HSENA78_P2 x SZ05_HUMAN
Alignment segment 1/1:
Quality: 767.00 Escore: 0 Matching length: 81 Total length: 81 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
  1 MSLLSSRAARVPGPSSSLCALLVLLLLLTQPGPIASAGPAAAVLRELRCV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MSLLSSRAARVPGPSSSLCALLVLLLLLTQPGPIASAGPAAAVLRELRCV 50
 51 CLQTTQGVHPKMISNLQVFAIGPQCSKVEVV 81
    |||||||||||||||||||||||||||||||
 51 CLQTTQGVHPKMISNLQVFAIGPQCSKVEVV 81
DESCRIPTION FOR CLUSTER T94936
Cluster T94936 features 2 transcript(s) and 12 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Table 2 - Segments of interest
As noted above, cluster T94936 features 2 transcript(s), which were listed in Table 1 above. A description of each variant protein according to the present invention is now provided.
Variant protein T94936_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T94936_PEA_1_T1. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T94936_PEA_1_P2 and Q8TD06 (SEQ ID NO: 858): 1. An isolated chimeric polypeptide encoding for T94936_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to MMLHSALGLCLLLVTVSSNLAIAIKKEKRPPQTLSRGWGDDITWVQTYEEGLFYAQKSKKPLMVIHHLEDCQYSQALKKVFAQNEEIQEMAQNKFIMLNLMHETTDKNLSPDGQYVPRIMFVDPSLTVRADIAGRYSNRLYTYEPRDLPL corresponding to amino acids 1 - 150 of Q8TD06, which also corresponds to amino acids 1 - 150 of T94936_PEA_1_P2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a transmembrane region. Variant protein T94936_PEA_1_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 4 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T94936_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 4 - Amino acid mutations
Variant protein T94936_PEA_1_P2 is encoded by the following transcript(s): T94936_PEA_1_T1, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T94936_PEA_1_T1 is shown in bold; this coding portion starts at position 76 and ends at position 525. The transcript also has the following SNPs as listed in Table 5 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T94936_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 5 - Nucleic acid SNPs
Variant protein T94936_PEA_1_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T94936_PEA_1_T2. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T94936_PEA_1_P3 and Q8TD06: 1. An isolated chimeric polypeptide encoding for T94936_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to MMLHSALGLCLLLVTVSSNLAIAIKKEKRPPQTLSRGWGDDITWVQTYEEGLFYAQKSKKPLMVIHHLEDCQYSQALKKVFAQNEEIQEMAQNKFIMLNLMHETTDKNLSPDGQYVPRIMFV corresponding to amino acids 1 - 122 of Q8TD06, which also corresponds to amino acids 1 - 122 of T94936_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GMYVISFHQIYKISRNQHSCFYF corresponding to amino acids 123 - 145 of T94936_PEA_1_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T94936_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GMYVISFHQIYKISRNQHSCFYF in T94936_PEA_1_P3. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted.
The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a transmembrane region. Variant protein T94936_PEA_1_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T94936_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Amino acid mutations
Variant protein T94936_PEA_1_P3 is encoded by the following transcript(s): T94936_PEA_1_T2, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T94936_PEA_1_T2 is shown in bold; this coding portion starts at position 76 and ends at position 510. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T94936_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Nucleic acid SNPs
As noted above, cluster T94936 features 12 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster T94936_PEA_1_node_14 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T94936_PEA_1_T2. Table 8 below describes the starting and ending position of this segment on each transcript. Table 8 - Segment location on transcripts
Segment cluster T94936_PEA_1_node_16 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T94936_PEA_1_T2. Table 9 below describes the starting and ending position of this segment on each transcript. Table 9 - Segment location on transcripts
Segment cluster T94936_PEA_1_node_2 according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T94936_PEA_1_T1 and T94936_PEA_1_T2. Table 10 below describes the starting and ending position of this segment on each transcript. Table 10 - Segment location on transcripts
Segment cluster T94936_PEA_1_node_20 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T94936_PEA_1_T2. Table 11 below describes the starting and ending position of this segment on each transcript. Table 11 - Segment location on transcripts
Segment cluster T94936_PEA_1_node_23 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T94936_PEA_1_T1. Table 12 below describes the starting and ending position of this segment on each transcript. Table 12 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster T94936_PEA_1_node_0 according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T94936_PEA_1_T1 and T94936_PEA_1_T2. Table 13 below describes the starting and ending position of this segment on each transcript. Table 13 - Segment location on transcripts
Segment cluster T94936_PEA_1_node_11 according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T94936_PEA_1_T1 and T94936_PEA_1_T2. Table 14 below describes the starting and ending position of this segment on each transcript. Table 14 - Segment location on transcripts
Segment cluster T94936_PEA_1_node_13 according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T94936_PEA_1_T1 and T94936_PEA_1_T2. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Segment cluster T94936_PEA_1_node_17 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T94936_PEA_1_T1 and T94936_PEA_1_T2. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Segment cluster T94936_PEA_1_node_6 according to the present invention is supported by 74 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T94936_PEA_1_T1 and T94936_PEA_1_T2. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster T94936_PEA_1_node_8 according to the present invention can be found in the following transcript(s): T94936_PEA_1_T1 and T94936_PEA_1_T2. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Segment cluster T94936_PEA_1_node_9 according to the present invention is supported by 68 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T94936_PEA_1_T1 and T94936_PEA_1_T2. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Variant protein alignment to the previously known protein: Sequence name: /tmp/lR8BEXWut:_/cdFRKHIcZR: Q8TD06
Sequence documentation:
Alignment of: T94936_PEA_1_P2 x Q8TD06
Alignment segment 1/1: Quality: 1486.00 Escore: 0 Matching length: 150 Total length: 150 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
1 MMLHSALGLCLLLVTVSSNLAIAIKKEKRPPQTLSRGWGDDITWVQTYEE 50
1 MMLHSALGLCLLLVTVSSNLAIAIKKEKRPPQTLSRGWGDDITWVQTYEE 50
51 GLFYAQKSKKPLMVIHHLEDCQYSQALKKVFAQNEEIQEMAQNKFIMLNL 100
51 GLFYAQKSKKPLMVIHHLEDCQYSQALKKVFAQNEEIQEMAQNKFIMLNL 100
101 MHETTDKNLSPDGQYVPRIMFVDPSLTVRADIAGRYSNRLYTYEPRDLPL 150
101 MHETTDKNLSPDGQYVPRIMFVDPSLTVRADIAGRYSNRLYTYEPRDLPL 150
Sequence name : /tmp/AG3unO0N3y/kjgGehygST : Q8TD06
Sequence documentation:
Alignment of: T94936_PEA_1_P3 x Q8TD06
Alignment segment 1/1: Quality: 1214.00
Escore: 0 Matching length: 122 Total length: 122 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
1 MMLHSALGLCLLLVTVSSNLAIAIKKEKRPPQTLSRGWGDDITWVQTYEE 50
1 MMLHSALGLCLLLVTVSSNLAIAIKKEKRPPQTLSRGWGDDITWVQTYEE 50
51 GLFYAQKSKKPLMVIHHLEDCQYSQALKKVFAQNEEIQEMAQNKFIMLNL 100
51 GLFYAQKSKKPLMVIHHLEDCQYSQALKKVFAQNEEIQEMAQNKFIMLNL 100
101 MHETTDKNLSPDGQYVPRIMFV 122
101 MHETTDKNLSPDGQYVPRIMFV 122
Expression of Homo sapiens breast cancer membrane protein 11 (BCMP11) T94936 transcripts which are detectable by amplicon as depicted in sequence name T94936 seg14 in normal and cancerous breast tissues
Expression of Homo sapiens breast cancer membrane protein 11 (BCMP11) transcripts detectable by or according to seg14, T94936 seg14 amplicon(s) and T94936 seg14F and T94936 seg14R primers was measured by real time PCR. In this specific example, the real-time PCR reaction efficiency was assumed to be 2 and was not calculated by a standard curve reaction (as detailed above in the section of "Real-Time RT-PCR analysis"). In parallel the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), and G6PD (GenBank Accession No. NM_000402; G6PD amplicon) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 56-60, 63-67, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold upregulation for each sample relative to the median of the normal PM samples. Figure 23 is a histogram showing overexpression of the above-indicated Homo sapiens breast cancer membrane protein 11 (BCMP11) transcripts in cancerous breast samples relative to the normal samples. As is evident from Figure 23, the expression of Homo sapiens breast cancer membrane protein 11 (BCMP11) transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 56-60, 63-67, Table 1, above, "Tissue samples in testing panel").
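The normalization described above (geometric mean of the housekeeping genes, then division by the median of the normal samples) can be sketched as follows. This is only an illustrative sketch: the function name, the dictionary layout, the sample labels and all quantities are hypothetical and are not taken from the experiment.

```python
from statistics import geometric_mean, median


def fold_upregulation(target, housekeeping, normal_ids):
    """Return fold upregulation per sample relative to the median normal sample.

    target:       {sample_id: quantity of the amplicon of interest}
    housekeeping: {gene_name: {sample_id: quantity}}
    normal_ids:   sample ids used as the normal reference group
    """
    # Step 1: normalize each sample to the geometric mean of its
    # housekeeping-gene quantities.
    normalized = {
        sample: quantity / geometric_mean([hk[sample] for hk in housekeeping.values()])
        for sample, quantity in target.items()
    }
    # Step 2: divide by the median normalized quantity of the normal samples.
    baseline = median(normalized[s] for s in normal_ids)
    return {sample: value / baseline for sample, value in normalized.items()}


# Hypothetical quantities for three normal samples and one tumor sample.
target = {"N1": 2.0, "N2": 4.0, "N3": 8.0, "T1": 40.0}
housekeeping = {
    "PBGD": {"N1": 1.0, "N2": 1.0, "N3": 1.0, "T1": 4.0},
    "HPRT1": {"N1": 1.0, "N2": 1.0, "N3": 1.0, "T1": 4.0},
    "SDHA": {"N1": 1.0, "N2": 1.0, "N3": 1.0, "T1": 1.0},
    "G6PD": {"N1": 1.0, "N2": 1.0, "N3": 1.0, "T1": 1.0},
}
folds = fold_upregulation(target, housekeeping, ["N1", "N2", "N3"])
```

With these made-up numbers the tumor sample comes out at about 5 fold relative to the median normal sample, which is the kind of value plotted per sample in a histogram such as Figure 23.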
Notably an overexpression of at least 5 fold was found in 17 out of 28 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of Homo sapiens breast cancer membrane protein 11 (BCMP11) transcripts detectable by the above amplicon(s) in breast cancer samples versus the normal tissue samples was determined by t-test as 7.94E-02. A threshold of 5 fold overexpression was found to differentiate between cancer and normal samples with a P value of 6.74E-03 as checked by Fisher's exact test. The above values demonstrate statistical significance of the results. Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: T94936 seg14F forward primer; and T94936 seg14R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: T94936 seg14.
T94936 seg14 Forward primer (SEQ ID NO:859): TACAAAATTAGTAGAAATCAGCATTCTTGC
T94936 seg14 Reverse primer (SEQ ID NO:860): TGTAGAACTAACAAGAGCTGATATTATTGGAT
T94936 seg14 Amplicon (SEQ ID NO:861): TACAAAATTAGTAGAAATCAGCATTCTTGCTTTTATTTTTAAATGCTAGTTCAAGTACTATTCTTTTTAAAGAGAAGTCATTTCTAATCCAATAATATCAGCTCTTGTTAGTTCTACA
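As a consistency check on the sequences listed above, the amplicon should begin with the forward primer and end with the reverse complement of the reverse primer. The sketch below uses the three sequences exactly as given (with line-break spaces removed); the function names are illustrative only.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence (A, C, G, T)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def primers_flank_amplicon(forward: str, reverse: str, amplicon: str) -> bool:
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(reverse_complement(reverse))


FORWARD = "TACAAAATTAGTAGAAATCAGCATTCTTGC"       # SEQ ID NO:859
REVERSE = "TGTAGAACTAACAAGAGCTGATATTATTGGAT"     # SEQ ID NO:860
AMPLICON = (                                     # SEQ ID NO:861
    "TACAAAATTAGTAGAAATCAGCATTCTTGCTTTTATTTTTAAATGCTAGTTCAAGTA"
    "CTATTCTTTTTAAAGAGAAGTCATTTCTAATCCAATAATATCAGCTCTTGTTAGTTCT"
    "ACA"
)

consistent = primers_flank_amplicon(FORWARD, REVERSE, AMPLICON)
```

The check only confirms that the primers and amplicon agree with each other as printed; it says nothing about primer specificity or amplification efficiency.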
DESCRIPTION FOR CLUSTER Z41644
Cluster Z41644 features 1 transcript(s) and 21 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
These sequences are variants of the known protein Small inducible cytokine B14 precursor (SwissProt accession identifier SZ14_HUMAN; known also according to the synonyms CXCL14; Chemokine BRAK), SEQ ID NO: 230, referred to herein as the previously known protein. Protein Small inducible cytokine B14 precursor is known or believed to have the following function(s): Not chemotactive for T-cells, B-cells, monocytes, natural killer cells or granulocytes. Does not inhibit proliferation of myeloid progenitors in colony formation assays. The sequence for protein Small inducible cytokine B14 precursor is given at the end of the application, as "Small inducible cytokine B14 precursor amino acid sequence". Protein Small inducible cytokine B14 precursor localization is believed to be Secreted. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: chemotaxis; signal transduction; cell-cell signaling, which are annotation(s) related to Biological Process; and chemokine, which are annotation(s) related to Molecular Function. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink>.
Cluster Z41644 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 24 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
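The "parts per million" ratio defined above can be illustrated with a short sketch. The function name and the counts are hypothetical, and the weighting scheme applied to the EST counts is described elsewhere in the application and is not reproduced here; the sketch shows only the final ratio.

```python
def ests_per_million(cluster_est_count: float, category_est_count: float) -> float:
    """Expression of a cluster in one tissue category, in parts per million.

    cluster_est_count:  (weighted) number of ESTs from the cluster in the category
    category_est_count: (weighted) number of all ESTs in the same category
    """
    if category_est_count <= 0:
        raise ValueError("category_est_count must be positive")
    return cluster_est_count * 1_000_000 / category_est_count


# Hypothetical example: 5 cluster ESTs among 100,000 ESTs in a tissue category.
ppm = ests_per_million(5, 100_000)
```

Under these made-up counts the cluster would be plotted at 50 parts per million for that category.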
Overall, the following results were obtained as shown with regard to the histograms in Figure 24 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: lung malignant tumors, breast malignant tumors and pancreas carcinoma. Table 4 - Normal tissue distribution
Table 5 - P values and ratios for expression in cancerous tissue
As noted above, cluster Z41644 features 1 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Small inducible cytokine B14 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein Z41644_PEA_1_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z41644_PEA_1_T5. An alignment is given to the known protein (Small inducible cytokine B14 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z41644_PEA_1_P10 and SZ14_HUMAN:
1. An isolated chimeric polypeptide encoding for Z41644_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MRLLAAALLLLLLALYTARVDGSKCKCSRKGPKIRYSDVKKLEMKPKYPHCEEKMVIITTKSVSRYRGQEHCLHPKLQSTKRFIKWYNAWNEKRR corresponding to amino acids 1 - 95 of SZ14_HUMAN, which also corresponds to amino acids 1 - 95 of Z41644_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI corresponding to amino acids 96 - 123 of Z41644_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of Z41644_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI in Z41644_PEA_1_P10.
Comparison report between Z41644_PEA_1_P10 and Q9NS21 (SEQ ID NO:862):
1. An isolated chimeric polypeptide encoding for Z41644_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MRLLAAALLLLLLALYTARVDGSKCKCSRKGPKIRYSDVKKLEMKPKYPHCEEKMVIITTKSVSRYRGQEHCLHPKLQSTKRFIKWYNAWNEKRR corresponding to amino acids 13 - 107 of Q9NS21, which also corresponds to amino acids 1 - 95 of Z41644_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI corresponding to amino acids 96 - 123 of Z41644_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of Z41644_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI in Z41644_PEA_1_P10.
Comparison report between Z41644_PEA_1_P10 and AAQ89265 (SEQ ID NO:863):
1. An isolated chimeric polypeptide encoding for Z41644_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MRLLAAALLLLLLALYTARVDGSKCKCSRKGPKIRYSDVKKLEMKPKYPHCEEKMVIITTKSVSRYRGQEHCLHPKLQSTKRFIKWYNAWNEKRR corresponding to amino acids 13 - 107 of AAQ89265, which also corresponds to amino acids 1 - 95 of Z41644_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI corresponding to amino acids 96 - 123 of Z41644_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of Z41644_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI in Z41644_PEA_1_P10.
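The head-plus-tail structure described in the comparison reports above can be checked with a few lines of code: joining the 95-residue first sequence to the 28-residue tail should give a 123-residue variant with the stated coordinates. The helper name is illustrative; the two sequences are those quoted in the reports.

```python
# First amino acid sequence (amino acids 1 - 95 of the variant, shared with SZ14_HUMAN).
HEAD = (
    "MRLLAAALLLLLLALYTARVDGSKCKCSRKGPKIRYSDVKKLEMKPKYPH"
    "CEEKMVIITTKSVSRYRGQEHCLHPKLQSTKRFIKWYNAWNEKRR"
)
# Second amino acid sequence (the unique tail, amino acids 96 - 123 of the variant).
TAIL = "YAPPLLTFLPTRPSCGSQDGKGPPHQVI"


def segment_coordinates(head: str, tail: str):
    """Return 1-based inclusive (start, end) coordinates of head and tail
    when the two are contiguous and in sequential order."""
    head_range = (1, len(head))
    tail_range = (len(head) + 1, len(head) + len(tail))
    return head_range, tail_range


variant = HEAD + TAIL
head_range, tail_range = segment_coordinates(HEAD, TAIL)
```

The resulting ranges, 1 - 95 and 96 - 123, agree with the coordinates recited for Z41644_PEA_1_P10.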
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein Z41644_PEA_1_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z41644_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Amino acid mutations
Variant protein Z41644_PEA_1_P10 is encoded by the following transcript(s): Z41644_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z41644_PEA_1_T5 is shown in bold; this coding portion starts at position 744 and ends at position 1112. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z41644_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Nucleic acid SNPs
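The coding-portion coordinates given above can be cross-checked against the length of the variant protein. The sketch below assumes that the positions are 1-based and inclusive, and that the stated coding portion covers the amino acid codons only (no stop codon); both are assumptions inferred from the arithmetic rather than statements in the text. The function name is illustrative.

```python
def codon_count(start: int, end: int) -> int:
    """Number of complete codons in a coding portion given by
    1-based inclusive start and end positions on the transcript."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"coding portion of {length} nt is not a whole number of codons")
    return length // 3


# Coding portion of transcript Z41644_PEA_1_T5 as stated: positions 744 to 1112.
z41644_codons = codon_count(744, 1112)
```

Positions 744 to 1112 span 369 nucleotides, that is 123 codons, which matches the 123 amino acids (1 - 95 plus 96 - 123) described for variant protein Z41644_PEA_1_P10.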
As noted above, cluster Z41644 features 21 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster Z41644_PEA_1_node_0 according to the present invention is supported by 53 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5. Table 8 below describes the starting and ending position of this segment on each transcript.
Segment cluster Z41644_PEA_1_node_11 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5. Table 9 below describes the starting and ending position of this segment on each transcript. Table 9 - Segment location on transcripts
Segment cluster Z41644_PEA_1_node_12 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5. Table 10 below describes the starting and ending position of this segment on each transcript. Table 10 - Segment location on transcripts
Segment cluster Z41644_PEA_1_node_15 according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5. Table 11 below describes the starting and ending position of this segment on each transcript. Table 11 - Segment location on transcripts
Segment cluster Z41644_PEA_1_node_20 according to the present invention is supported by 260 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5. Table 12 below describes the starting and ending position of this segment on each transcript. Table 12 - Segment location on transcripts
Segment cluster Z41644_PEA_1_node_24 according to the present invention is supported by 185 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5. Table 13 below describes the starting and ending position of this segment on each transcript. Table 13 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster Z41644_PEA_1_node_1 according to the present invention is supported by 53 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
Transcript name      Segment starting position      Segment ending position
Z41644_PEA_1_T5      617                            697
Segment cluster Z41644_PEA_1_node_10 according to the present invention is supported by 138 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Segment cluster Z41644_PEA_1_node_13 according to the present invention can be found in the following transcript(s): Z41644_PEA_1_T5. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Segment cluster Z41644_PEA_1_node_16 according to the present invention is supported by 152 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster Z41644_PEA_1_node_17 according to the present invention can be found in the following transcript(s): Z41644_PEA_1_T5. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Segment cluster Z41644_PEA_1_node_19 according to the present invention is supported by 112 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5. Table 19 below describes the starting and ending position of this segment on each transcript.
Segment cluster Z41644_PEA_1_node_2 according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster Z41644_PEA_1_node_21 according to the present invention can be found in the following transcript(s): Z41644_PEA_1_T5. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Segment cluster Z41644_PEA_1_node_22 according to the present invention is supported by 164 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Segment cluster Z41644_PEA_1_node_23 according to the present invention is supported by 169 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Segment cluster Z41644_PEA_1_node_25 according to the present invention is supported by 138 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Segment cluster Z41644_PEA_1_node_3 according to the present invention is supported by 75 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster Z41644_PEA_1_node_4 according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster Z41644_PEA_1_node_6 according to the present invention is supported by 101 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster Z41644_PEA_1_node_9 according to the present invention is supported by 134 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Variant protein alignment to the previously known protein: Sequence name: /tmp/p5SSvhT9Xp/HQeIMsUrfm: SZ14_HUMAN
Sequence documentation: Alignment of: Z41644_PEA_1_P10 x SZ14_HUMAN
Alignment segment 1/1: Quality: 953.00
Escore: 0 Matching length: 95 Total length: 95 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
1 MRLLAAALLLLLLALYTARVDGSKCKCSRKGPKIRYSDVKKLEMKPKYPH 50
1 MRLLAAALLLLLLALYTARVDGSKCKCSRKGPKIRYSDVKKLEMKPKYPH 50
51 CEEKMVIITTKSVSRYRGQEHCLHPKLQSTKRFIKWYNAWNEKRR 95
51 CEEKMVIITTKSVSRYRGQEHCLHPKLQSTKRFIKWYNAWNEKRR 95
Sequence name: /tmp/p5SSvhT9Xp/HQeIMsUrfm:Q9NS21 Sequence documentation:
Alignment of: Z41644_PEA_1_P10 x Q9NS21
Alignment segment 1/1:
Quality: 957.00 Escore: 0 Matching length: 96 Total length: 96 Matching Percent Similarity: 100.00 Matching Percent Identity: 98.96 Total Percent Similarity: 100.00 Total Percent Identity: 98.96 Gaps : 0
Alignment:
1 MRLLAAALLLLLLALYTARVDGSKCKCSRKGPKIRYSDVKKLEMKPKYPH 50
13 MRLLAAALLLLLLALYTARVDGSKCKCSRKGPKIRYSDVKKLEMKPKYPH 62
51 CEEKMVIITTKSVSRYRGQEHCLHPKLQSTKRFIKWYNAWNEKRRY 96
63 CEEKMVIITTKSVSRYRGQEHCLHPKLQSTKRFIKWYNAWNEKRRF 108
Sequence name: /tmp/p5SSvhT9Xp/HQeIMsUrfm:AAQ89265
Sequence documentation:
Alignment of: Z41644_PEA_1_P10 x AAQ89265
Alignment segment 1/1:
Quality: 953.00 Escore: 0 Matching length: 95 Total length: 95 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
1 MRLLAAALLLLLLALYTARVDGSKCKCSRKGPKIRYSDVKKLEMKPKYPH 50
13 MRLLAAALLLLLLALYTARVDGSKCKCSRKGPKIRYSDVKKLEMKPKYPH 62
51 CEEKMVIITTKSVSRYRGQEHCLHPKLQSTKRFIKWYNAWNEKRR 95
63 CEEKMVIITTKSVSRYRGQEHCLHPKLQSTKRFIKWYNAWNEKRR 107
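The percent identity figures reported in the alignments above can be reproduced by counting matching positions. The sketch below handles only ungapped alignments of equal length, which is sufficient here because all three alignments report zero gaps; it is not the alignment program used to generate the reports, and the function name is illustrative. The two sequences are the aligned rows of the Z41644_PEA_1_P10 x Q9NS21 alignment (variant amino acids 1 - 96 and Q9NS21 amino acids 13 - 108).

```python
def percent_identity(query: str, subject: str) -> float:
    """Percent of identical positions in an ungapped, equal-length alignment."""
    if len(query) != len(subject) or not query:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(q == s for q, s in zip(query, subject))
    return 100.0 * matches / len(query)


VARIANT_1_96 = (
    "MRLLAAALLLLLLALYTARVDGSKCKCSRKGPKIRYSDVKKLEMKPKYPH"
    "CEEKMVIITTKSVSRYRGQEHCLHPKLQSTKRFIKWYNAWNEKRRY"
)
Q9NS21_13_108 = (
    "MRLLAAALLLLLLALYTARVDGSKCKCSRKGPKIRYSDVKKLEMKPKYPH"
    "CEEKMVIITTKSVSRYRGQEHCLHPKLQSTKRFIKWYNAWNEKRRF"
)

identity = percent_identity(VARIANT_1_96, Q9NS21_13_108)
```

The two rows differ only at the last position (Y versus F), so 95 of 96 positions match, which rounds to the reported 98.96.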
DESCRIPTION FOR CLUSTER M85491
Cluster M85491 features 2 transcript(s) and 11 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
These sequences are variants of the known protein Ephrin type-B receptor 2 [precursor] (SwissProt accession identifier EPB2_HUMAN; known also according to the synonyms EC 2.7.1.112; Tyrosine-protein kinase receptor EPH-3; DRT; Receptor protein-tyrosine kinase HEK5; ERK), SEQ ID NO: 245, referred to herein as the previously known protein. Protein Ephrin type-B receptor 2 [precursor] is known to have the following function(s): Receptor for members of the ephrin-B family. The sequence for protein Ephrin type-B receptor 2 [precursor] is given at the end of the application, as "Ephrin type-B receptor 2 [precursor] amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Protein Ephrin type-B receptor 2 [precursor] localization is believed to be Type I membrane protein.
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: protein amino acid phosphorylation; transmembrane receptor protein tyrosine kinase signaling pathway; neurogenesis, which are annotation(s) related to Biological Process; protein tyrosine kinase; receptor; transmembrane-ephrin receptor; ATP binding; transferase, which are annotation(s) related to Molecular Function; and integral membrane protein, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster M85491 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 25 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million). Overall, the following results were obtained as shown with regard to the histograms in Figure 25 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors and a mixture of malignant tumors from different tissues. Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
As noted above, cluster M85491 features 2 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Ephrin type-B receptor 2 [precursor]. A description of each variant protein according to the present invention is now provided.
Variant protein M85491_PEA_1_P13 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M85491_PEA_1_T16. An alignment is given to the known protein (Ephrin type-B receptor 2 [precursor]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between M85491_PEA_1_P13 and EPB2_HUMAN:
1. An isolated chimeric polypeptide encoding for M85491_PEA_1_P13, comprising a first amino acid sequence being at least 90% homologous to MALRRLGAALLLLPLLAAVEETLMDSTTATAELGWMVHPPSGWEEVSGYDENMNTIR TΥQVCNNFESSQNNWLRTΕTIRRRGAITRIHNEMKFSNRDCSSIPSNPGSCKETF .^ EADFDSATKTFPΝ MEΝPWVKNDTIAADESFSQVDLGGRVMKTΝTEVRSFGPVSRSGF YLAFQDYGGCMSLIAVRVFYRKCPRIIQΝGAIFQETLSGAESTSLVAARGSCIAΝAEEVD VPIKLYCΝGDGEWLVPIGRCMCKAGFEANEΝGTVCRGCPSGTFKAΝQGDEACTHCPIΝ SRTTSEGATΝCVCRΝGYYRADLDPLDMPCTTIPSAPQAVISSVΝETSLMLEWTPPRDSG GREDLVYΝIICKSCGSGRGACTRCGDΝVQYAPRQLGLTEPRIYISDLLAHTQYTFEIQAV ΝGVTDQSPFSPQFASVΝITTΝQAAPSAVSIMHQVSRTVDSITLSWSQPDQPΝGVILDYEL QYYEK corresponding to amino acids 1 - 476 of EPB2_HUMAN, which also corresponds to amino acids 1 - 476 of M85491_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VPIGWVLSPSPTSLRAPLPG corresponding to amino acids 477 - 496 of M85491_PEA_1_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of M85491_PEA_1_P13, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPIGWVLSPSPTSLRAPLPG in M85491_PEA_1_P13.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein M85491_PEA_1_P13 is encoded by the following transcript(s): M85491_PEA_1_T16, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M85491_PEA_1_T16 is shown in bold; this coding portion starts at position 143 and ends at position 1630. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M85491_PEA_1_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Nucleic acid SNPs
Variant protein M85491_PEA_1_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M85491_PEA_1_T20. An alignment is given to the known protein (Ephrin type-B receptor 2 [precursor]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between M85491_PEA_1_P14 and EPB2_HUMAN:
1. An isolated chimeric polypeptide encoding for M85491_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to MALRRLGAALLLLPLLAAVEETLMDSTTATAELGWMVHPPSGWEEVSGYDENMNTIRTYQVCNVFESSQNNWLRTKFIRRRGAHRIHVEMKFSVRDCSSIPSVPGSCKETFNLYYYEADFDSATKTFPNWMENPWVKVDTIAADESFSQVDLGGRVMKINTEVRSFGPVSRSGFYLAFQDYGGCMSLIAVRVFYRKCPRIIQNGAIFQETLSGAESTSLVAARGSCIANAEEVDVPIKLYCNGDGEWLVPIGRCMCKAGFEAVENGTVCR corresponding to amino acids 1 - 270 of EPB2_HUMAN, which also corresponds to amino acids 1 - 270 of M85491_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ERQDLTMLSRLVLNSWPQMILPPQPPKVLEL corresponding to amino acids 271 - 301 of M85491_PEA_1_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of M85491_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ERQDLTMLSRLVLNSWPQMILPPQPPKVLEL in M85491_PEA_1_P14.
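The residue ranges in the comparison report above can be checked mechanically. The following is a minimal Python sketch, not part of the application: it tiles the two segments described for the P14 variant (the 270-residue head as it appears in the alignment of this variant to the known protein, and the 31-residue tail) and reports their 1-based ranges. The helper names `tile_segments` and `percent_identity` are hypothetical, and the identity measure is a simple ungapped, position-by-position comparison assumed here for illustration only; the application does not state that homology is computed this way.

```python
# Head segment (residues 1-270), copied from the alignment of the P14 variant
# to the known protein; written in blocks of 50 residues for readability.
HEAD = (
    "MALRRLGAALLLLPLLAAVEETLMDSTTATAELGWMVHPPSGWEEVSGYD"
    "ENMNTIRTYQVCNVFESSQNNWLRTKFIRRRGAHRIHVEMKFSVRDCSSI"
    "PSVPGSCKETFNLYYYEADFDSATKTFPNWMENPWVKVDTIAADESFSQV"
    "DLGGRVMKINTEVRSFGPVSRSGFYLAFQDYGGCMSLIAVRVFYRKCPRI"
    "IQNGAIFQETLSGAESTSLVAARGSCIANAEEVDVPIKLYCNGDGEWLVP"
    "IGRCMCKAGFEAVENGTVCR"
)
# Unique tail of the variant, as given in the comparison report.
TAIL = "ERQDLTMLSRLVLNSWPQMILPPQPPKVLEL"


def tile_segments(segments):
    """Return 1-based inclusive (start, end) ranges for contiguous segments."""
    ranges = []
    start = 1
    for seg in segments:
        end = start + len(seg) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def percent_identity(a, b):
    """Ungapped percent identity between two equal-length sequences (assumption)."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    for (start, end), name in zip(tile_segments([HEAD, TAIL]), ("head", "tail")):
        print(f"{name}: amino acids {start} - {end}")
```

Under these assumptions the two ranges come out as 1 - 270 and 271 - 301, which agrees with the positions stated in the report.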
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein M85491_PEA_1_P14 is encoded by the following transcript(s): M85491_PEA_1_T20, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M85491_PEA_1_T20 is shown in bold; this coding portion starts at position 143 and ends at position 1045. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M85491_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Nucleic acid SNPs
As noted above, cluster M85491 features 11 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster M85491_PEA_1_node_0 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16 and M85491_PEA_1_T20. Table 9 below describes the starting and ending position of this segment on each transcript. Table 9 - Segment location on transcripts
Segment cluster M85491_PEA_1_node_13 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T20. Table 10 below describes the starting and ending position of this segment on each transcript. Table 10 - Segment location on transcripts
Segment cluster M85491_PEA_1_node_21 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16. Table 12 below describes the starting and ending position of this segment on each transcript. Table 12 - Segment location on transcripts
Segment cluster M85491_PEA_1_node_23 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16. Table 13 below describes the starting and ending position of this segment on each transcript. Table 13 - Segment location on transcripts
Segment cluster M85491_PEA_1_node_24 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16. Table 14 below describes the starting and ending position of this segment on each transcript. Table 14 - Segment location on transcripts
Segment cluster M85491_PEA_1_node_8 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16 and M85491_PEA_1_T20. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Segment cluster M85491_PEA_1_node_9 according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16 and M85491_PEA_1_T20. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description. Segment cluster M85491_PEA_1_node_10 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16 and M85491_PEA_1_T20. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Segment cluster M85491_PEA_1_node_18 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster M85491_PEA_1_node_19 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster M85491_PEA_1_node_6 according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16 and M85491_PEA_1_T20. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: /tmp/qfmsU9VtxS/DylcLC9j 8v: EPB2_HUMAN
Sequence documentation:
Alignment of: M85491_PEA_1_P13 x EPB2_HUMAN
Alignment segment 1/1:
Quality: 4726.00 Escore: 0 Matching length: 476 Total length: 476
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MALRRLGAALLLLPLLAAVEETLMDSTTATAELGWMVHPPSGWEEVSGYD 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MALRRLGAALLLLPLLAAVEETLMDSTTATAELGWMVHPPSGWEEVSGYD 50
 51 ENMNTIRTYQVCNVFESSQNNWLRTKFIRRRGAHRIHVEMKFSVRDCSSI 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 ENMNTIRTYQVCNVFESSQNNWLRTKFIRRRGAHRIHVEMKFSVRDCSSI 100
101 PSVPGSCKETFNLYYYEADFDSATKTFPNWMENPWVKVDTIAADESFSQV 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 PSVPGSCKETFNLYYYEADFDSATKTFPNWMENPWVKVDTIAADESFSQV 150
151 DLGGRVMKINTEVRSFGPVSRSGFYLAFQDYGGCMSLIAVRVFYRKCPRI 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 DLGGRVMKINTEVRSFGPVSRSGFYLAFQDYGGCMSLIAVRVFYRKCPRI 200
201 IQNGAIFQETLSGAESTSLVAARGSCIANAEEVDVPIKLYCNGDGEWLVP 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 IQNGAIFQETLSGAESTSLVAARGSCIANAEEVDVPIKLYCNGDGEWLVP 250
251 IGRCMCKAGFEAVENGTVCRGCPSGTFKANQGDEACTHCPINSRTTSEGA 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 IGRCMCKAGFEAVENGTVCRGCPSGTFKANQGDEACTHCPINSRTTSEGA 300
301 TNCVCRNGYYRADLDPLDMPCTTIPSAPQAVISSVNETSLMLEWTPPRDS 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 TNCVCRNGYYRADLDPLDMPCTTIPSAPQAVISSVNETSLMLEWTPPRDS 350
351 GGREDLVYNIICKSCGSGRGACTRCGDNVQYAPRQLGLTEPRIYISDLLA 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 GGREDLVYNIICKSCGSGRGACTRCGDNVQYAPRQLGLTEPRIYISDLLA 400
401 HTQYTFEIQAVNGVTDQSPFSPQFASVNITTNQAAPSAVSIMHQVSRTVD 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 HTQYTFEIQAVNGVTDQSPFSPQFASVNITTNQAAPSAVSIMHQVSRTVD 450
451 SITLSWSQPDQPNGVILDYELQYYEK 476
    ||||||||||||||||||||||||||
451 SITLSWSQPDQPNGVILDYELQYYEK 476
Sequence name: /tmp/rmnzuDbot6 /GiHbj elI8 iR: EPB2_HUMAN
Sequence documentation:
Alignment of: M85491_PEA_1_P14 x EPB2_HUMAN
Alignment segment 1/1:
Quality: 2673.00 Escore: 0 Matching length: 270 Total length: 270
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MALRRLGAALLLLPLLAAVEETLMDSTTATAELGWMVHPPSGWEEVSGYD 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MALRRLGAALLLLPLLAAVEETLMDSTTATAELGWMVHPPSGWEEVSGYD 50
 51 ENMNTIRTYQVCNVFESSQNNWLRTKFIRRRGAHRIHVEMKFSVRDCSSI 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 ENMNTIRTYQVCNVFESSQNNWLRTKFIRRRGAHRIHVEMKFSVRDCSSI 100
101 PSVPGSCKETFNLYYYEADFDSATKTFPNWMENPWVKVDTIAADESFSQV 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 PSVPGSCKETFNLYYYEADFDSATKTFPNWMENPWVKVDTIAADESFSQV 150
151 DLGGRVMKINTEVRSFGPVSRSGFYLAFQDYGGCMSLIAVRVFYRKCPRI 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 DLGGRVMKINTEVRSFGPVSRSGFYLAFQDYGGCMSLIAVRVFYRKCPRI 200
201 IQNGAIFQETLSGAESTSLVAARGSCIANAEEVDVPIKLYCNGDGEWLVP 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 IQNGAIFQETLSGAESTSLVAARGSCIANAEEVDVPIKLYCNGDGEWLVP 250
251 IGRCMCKAGFEAVENGTVCR 270
    ||||||||||||||||||||
251 IGRCMCKAGFEAVENGTVCR 270
Expression of Ephrin type-B receptor 2 precursor (EC 2.7.1.112) (Tyrosine-protein kinase receptor EPH-3) M85491 transcripts which are detectable by amplicon as depicted in sequence name M85491seg24 in normal and cancerous breast tissues
Expression of Ephrin type-B receptor 2 precursor (EC 2.7.1.112) (Tyrosine-protein kinase receptor EPH-3) transcripts detectable by or according to seg24, the M85491seg24 amplicon and the M85491seg24F and M85491seg24R primers was measured by real time PCR. In parallel the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), and G6PD (GenBank Accession No. NM_000402; G6PD amplicon) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 56-60, 63-67, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold upregulation for each sample relative to the median of the normal PM samples. Figure 26 is a histogram showing overexpression of the above-indicated Ephrin type-B receptor 2 precursor (EC 2.7.1.112) (Tyrosine-protein kinase receptor EPH-3) transcripts in cancerous breast samples relative to the normal samples. As is evident from Figure 26, the expression of Ephrin type-B receptor 2 precursor (EC 2.7.1.112) (Tyrosine-protein kinase receptor EPH-3) transcripts detectable by the above amplicon in a few cancer samples was higher than in the non-cancerous samples (Sample Nos. 56-60, 63-67, Table 1, above, "Tissue samples in testing panel").
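The normalization described above (division by the geometric mean of the housekeeping-gene quantities, then division by the median of the normal samples) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the analysis code used for Figure 26: the function names and all sample identifiers and quantities in the usage example are hypothetical.

```python
import math
from statistics import median


def geometric_mean(values):
    """Geometric mean of strictly positive quantities."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("quantities must be non-empty and positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalize(target_quantity, housekeeping_quantities):
    """Normalize an amplicon quantity to the housekeeping-gene geometric mean."""
    return target_quantity / geometric_mean(housekeeping_quantities)


def fold_upregulation(normalized_by_sample, normal_sample_ids):
    """Divide each normalized quantity by the median of the normal samples."""
    reference = median(normalized_by_sample[s] for s in normal_sample_ids)
    return {s: q / reference for s, q in normalized_by_sample.items()}


if __name__ == "__main__":
    # Hypothetical quantities: (amplicon, [four housekeeping genes]) per sample.
    raw = {
        "normal_1": (2.0, [1.0, 4.0, 2.0, 2.0]),
        "normal_2": (4.0, [1.0, 4.0, 2.0, 2.0]),
        "normal_3": (6.0, [1.0, 4.0, 2.0, 2.0]),
        "tumor_1": (16.0, [1.0, 4.0, 2.0, 2.0]),
    }
    normalized = {s: normalize(t, hk) for s, (t, hk) in raw.items()}
    folds = fold_upregulation(normalized, ["normal_1", "normal_2", "normal_3"])
    for sample, fold in folds.items():
        print(sample, round(fold, 2))
```

Using the median of the normal samples as the reference, as the text does, keeps a single outlying normal sample from shifting the baseline.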
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: M85491seg24F forward primer; and M85491seg24R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: M85491seg24.
M85491seg24 Forward primer (SEQ ID NO:864): GGCGTCTTTCTCCCTCTGAAC
M85491seg24 Reverse primer (SEQ ID NO:865): GTCCCATTCTGGGTGCTGTG
M85491seg24 Amplicon (SEQ ID NO:866): GGCGTCTTTCTCCCTCTGAACCTCAGTTTCCACCTGTGTCGAGTGTGGGTGAGACCCCTCGCGGGGAGCTATGCAGGTTACGGAGAAAAGGCAGCACAGCACCCAGAATGGGAC
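As a consistency check on the sequences listed above, the short Python sketch below verifies that the amplicon begins with the forward primer and ends with the reverse complement of the reverse primer. The sequences are copied from the listing (with line-wrap spaces removed); the function names are hypothetical and the check assumes exact, mismatch-free primer matches.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

FORWARD_PRIMER = "GGCGTCTTTCTCCCTCTGAAC"   # SEQ ID NO:864
REVERSE_PRIMER = "GTCCCATTCTGGGTGCTGTG"    # SEQ ID NO:865
AMPLICON = (                               # SEQ ID NO:866
    "GGCGTCTTTCTCCCTCTGAACCTCAGTTTCCACCTGTGTCGAGTGTGGGTGAGACCC"
    "CTCGCGGGGAGCTATGCAGGTTACGGAGAAAAGGCAGCACAGCACCCAGAATGGG"
    "AC"
)


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def primers_flank_amplicon(amplicon, forward, reverse):
    """True if the amplicon starts with the forward primer and ends with the
    reverse complement of the reverse primer (exact matches assumed)."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


if __name__ == "__main__":
    print(primers_flank_amplicon(AMPLICON, FORWARD_PRIMER, REVERSE_PRIMER))
```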
Expression of Ephrin type-B receptor 2 precursor M85491 transcripts, which are detectable by amplicon as depicted in sequence name M85491seg24 in different normal tissues
Expression of Ephrin type-B receptor 2 precursor transcripts detectable by or according to M85491seg24 amplicon(s) and M85491seg24F and M85491seg24R primers was measured by real time PCR. In parallel the expression of four housekeeping genes - RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), UBC (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the colon samples (Sample Nos. 1-3, Table 2, "Tissue samples on normal panel", above), to obtain a value of relative expression of each sample relative to the median of the colon samples. Primers and amplicon are as above. The results are presented in Figure 27, demonstrating the expression of Ephrin type-B receptor 2 precursor M85491 transcripts, which are detectable by amplicon as depicted in sequence name M85491seg24, in different normal tissues.
DESCRIPTION FOR CLUSTER HSSTROL3
Cluster HSSTROL3 features 6 transcript(s) and 16 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Stromelysin-3 precursor (SwissProt accession identifier MM11_HUMAN; known also according to the synonyms EC 3.4.24.-; Matrix metalloproteinase-11; MMP-11; ST3; SL-3), SEQ ID NO: 270, referred to herein as the previously known protein. Protein Stromelysin-3 precursor is known or believed to have the following function(s): May play an important role in the progression of epithelial malignancies. The sequence for protein Stromelysin-3 precursor is given at the end of the application, as "Stromelysin-3 precursor amino acid sequence". The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: proteolysis and peptidolysis; developmental processes; morphogenesis, which are annotation(s) related to Biological Process; stromelysin 3; calcium binding; zinc binding; hydrolase, which are annotation(s) related to Molecular Function; and extracellular matrix, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster HSSTROL3 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 28 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
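The "parts per million" figure described above is a ratio scaled by one million. The Python sketch below illustrates only that arithmetic; the function name and the counts in the usage example are hypothetical, and the sketch uses plain EST counts because the weighting scheme applied in the application is not specified in this passage.

```python
def expression_ppm(cluster_est_count, category_est_count):
    """ESTs of one cluster per million ESTs in a tissue category.

    Assumes unweighted counts; any weighting would have to be applied to the
    counts before calling this function.
    """
    if category_est_count <= 0:
        raise ValueError("category must contain at least one EST")
    if cluster_est_count < 0:
        raise ValueError("cluster EST count cannot be negative")
    return 1_000_000 * cluster_est_count / category_est_count


if __name__ == "__main__":
    # Hypothetical example: 3 cluster ESTs among 200,000 ESTs in a category.
    print(expression_ppm(3, 200_000))
```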
Overall, the following results were obtained as shown with regard to the histograms in Figure 28 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: transitional cell carcinoma, epithelial malignant tumors, a mixture of malignant tumors from different tissues and pancreas carcinoma.
Table 4 - Normal tissue distribution
Table 5 - P values and ratios for expression in cancerous tissue
above. These transcript(s) encode for protein(s) which are variant(s) of protein Stromelysin-3 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSSTROL3_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSSTROL3_T5. An alignment is given to the known protein (Stromelysin-3 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSSTROL3_P4 and MM11_HUMAN:
1. An isolated chimeric polypeptide encoding for HSSTROL3_P4, comprising a first amino acid sequence being at least 90% homologous to MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVLSGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 1 - 163 of MM11_HUMAN, which also corresponds to amino acids 1 - 163 of HSSTROL3_P4, a bridging amino acid H corresponding to amino acid 164 of HSSTROL3_P4, a second amino acid sequence being at least 90% homologous to GDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDAVSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDAQGHIWFFQGAQYWVYDGEKPVLGPAPLTELGLVRFPVHAALVWGPEKNKTYFFRGRDYWRFHPSTRRVDSPVPRRATDWRGλ SEIDAAFQDADG corresponding to amino acids 165 - 445 of MM11_HUMAN, which also corresponds to amino acids 165 - 445 of HSSTROL3_P4, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ALGVRQLVGGGHSSRFSHLVNAGLPHACHRKSGSSSQVLCPEPSALLSVAG corresponding to amino acids 446 - 496 of HSSTROL3_P4, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSSTROL3_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ALGVRQLVGGGHSSRFSHLVNAGLPHACHRKSGSSSQVLCPEPSALLSVAG in HSSTROL3_P4.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSSTROL3_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Amino acid mutations
Variant protein HSSTROL3_P4 is encoded by the following transcript(s): HSSTROL3_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSSTROL3_T5 is shown in bold; this coding portion starts at position 24 and ends at position 1511. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Nucleic acid SNPs
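The coding-portion coordinates can be related to the protein length with simple arithmetic. The Python sketch below is illustrative only; it assumes that the stated start and end positions are 1-based and inclusive and that the stated end position does not include a stop codon, neither of which is stated explicitly in the text. The function names are hypothetical.

```python
def coding_length_nt(start, end):
    """Number of nucleotides in a 1-based, inclusive coding interval."""
    if end < start:
        raise ValueError("end position precedes start position")
    return end - start + 1


def residues_encoded(start, end):
    """Number of codons (residues) in the interval, assuming no stop codon
    is included and the interval is in frame."""
    length = coding_length_nt(start, end)
    if length % 3 != 0:
        raise ValueError("coding interval is not a multiple of three")
    return length // 3


if __name__ == "__main__":
    # Coding portion stated for the transcript above: positions 24 to 1511.
    print(coding_length_nt(24, 1511), residues_encoded(24, 1511))
```

For positions 24 to 1511 this gives 1488 nucleotides and 496 residues, which agrees with the final residue position (496) given for this variant in the comparison report.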
Variant protein HSSTROL3_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSSTROL3_T8 and HSSTROL3_T9. An alignment is given to the known protein (Stromelysin-3 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSSTROL3_P5 and MM11_HUMAN:
1. An isolated chimeric polypeptide encoding for HSSTROL3_P5, comprising a first amino acid sequence being at least 90% homologous to MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVLSGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 1 - 163 of MM11_HUMAN, which also corresponds to amino acids 1 - 163 of HSSTROL3_P5, a bridging amino acid H corresponding to amino acid 164 of HSSTROL3_P5, a second amino acid sequence being at least 90% homologous to GDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDAVSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDAQGHIWFFQ corresponding to amino acids 165 - 358 of MM11_HUMAN, which also corresponds to amino acids 165 - 358 of HSSTROL3_P5, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ELGFPSSTGRDESLEHCRCQGLHK corresponding to amino acids 359 - 382 of HSSTROL3_P5, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSSTROL3_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ELGFPSSTGRDESLEHCRCQGLHK in HSSTROL3_P5.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSSTROL3_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Amino acid mutations
Variant protein HSSTROL3_P5 is encoded by the following transcript(s): HSSTROL3_T8 and HSSTROL3_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSSTROL3_T8 is shown in bold; this coding portion starts at position 24 and ends at position 1169. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Nucleic acid SNPs
The coding portion of transcript HSSTROL3_T9 is shown in bold; this coding portion starts at position 24 and ends at position 1169. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
Variant protein HSSTROL3_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSSTROL3_T10. An alignment is given to the known protein (Stromelysin-3 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSSTROL3_P7 and MM11_HUMAN:
1. An isolated chimeric polypeptide encoding for HSSTROL3_P7, comprising a first amino acid sequence being at least 90% homologous to MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVLSGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 1 - 163 of MM11_HUMAN, which also corresponds to amino acids 1 - 163 of HSSTROL3_P7, a bridging amino acid H corresponding to amino acid 164 of HSSTROL3_P7, a second amino acid sequence being at least 90% homologous to GDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDAVSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDAQGHIWFFQG corresponding to amino acids 165 - 359 of MM11_HUMAN, which also corresponds to amino acids 165 - 359 of HSSTROL3_P7, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TTGVSTPAPGV corresponding to amino acids 360 - 370 of HSSTROL3_P7, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSSTROL3_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TTGVSTPAPGV in HSSTROL3_P7.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted.
The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSSTROL3_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Amino acid mutations
Variant protein HSSTROL3_P7 is encoded by the following transcript(s): HSSTROL3_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSSTROL3_T10 is shown in bold; this coding portion starts at position 24 and ends at position 1133. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Nucleic acid SNPs
Variant protein HSSTROL3_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSSTROL3_T11. An alignment is given to the known protein (Stromelysin-3 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSSTROL3_P8 and MM11_HUMAN:
1. An isolated chimeric polypeptide encoding for HSSTROL3_P8, comprising a first amino acid sequence being at least 90% homologous to MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVLSGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 1 - 163 of MM11_HUMAN, which also corresponds to amino acids 1 - 163 of HSSTROL3_P8, a bridging amino acid H corresponding to amino acid 164 of HSSTROL3_P8, a second amino acid sequence being at least 90% homologous to GDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLE corresponding to amino acids 165 - 286 of MM11_HUMAN, which also corresponds to amino acids 165 - 286 of HSSTROL3_P8, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRPCLPVPLLLCWPL corresponding to amino acids 287 - 301 of HSSTROL3_P8, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSSTROL3_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRPCLPVPLLLCWPL in HSSTROL3_P8.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSSTROL3_P8 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Amino acid mutations
Variant protein HSSTROL3_P8 is encoded by the following transcript(s): HSSTROL3_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSSTROL3_T11 is shown in bold; this coding portion starts at position 24 and ends at position 926. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Nucleic acid SNPs
Variant protein HSSTROL3_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSSTROL3_T12. An alignment is given to the known protein (Stromelysin-3 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSSTROL3_P9 and MM11_HUMAN:
1. An isolated chimeric polypeptide encoding for HSSTROL3_P9, comprising a first amino acid sequence being at least 90 % homologous to MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQK corresponding to amino acids 1 - 96 of MM11_HUMAN, which also corresponds to amino acids 1 - 96 of HSSTROL3_P9, a second amino acid sequence being at least 90 % homologous to RILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 113 - 163 of MM11_HUMAN, which also corresponds to amino acids 97 - 147 of HSSTROL3_P9, a bridging amino acid H corresponding to amino acid 148 of HSSTROL3_P9, a third amino acid sequence being at least 90 % homologous to GDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDAVSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDAQGHIWFFQG corresponding to amino acids 165 - 359 of MM11_HUMAN, which also corresponds to amino acids 149 - 343 of HSSTROL3_P9, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TTGVSTPAPGV corresponding to amino acids 344 - 354 of HSSTROL3_P9, wherein said first amino acid sequence, second amino acid sequence, bridging amino acid, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HSSTROL3_P9, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KR, having a structure as follows: a sequence starting from any of amino acid numbers 96-x to 96; and ending at any of amino acid numbers 97 + ((n-2) - x), in which x varies from 0 to n-2.
3. An isolated polypeptide encoding for a tail of HSSTROL3_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TTGVSTPAPGV in HSSTROL3_P9.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
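The edge-portion definition for HSSTROL3_P9 (a length-n window that must contain the junction residues 96 and 97) can be illustrated with a short enumeration. This is only a sketch of the arithmetic in the text; the function name `edge_windows` and its parameters are hypothetical, and the protein length of 354 is taken from the stated end of the fourth amino acid sequence.

```python
def edge_windows(n, junction=96, protein_length=354):
    """Return 1-based inclusive (start, end) pairs for every length-n window
    that spans the junction between residues `junction` and `junction + 1`.

    Follows the formula in the text: start = 96 - x, end = 97 + ((n - 2) - x),
    with x running from 0 to n - 2.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    windows = []
    for x in range(n - 1):  # x = 0 .. n-2
        start = junction - x
        end = (junction + 1) + (n - 2) - x
        # Keep only windows that lie inside the protein (assumed length 354).
        if start >= 1 and end <= protein_length:
            windows.append((start, end))
    return windows


# Example: the shortest window length named in the text (n = 10).
for start, end in edge_windows(10):
    print(start, end)
```

Every window returned has exactly n residues, because end - start + 1 = n for any x, and each one includes both residue 96 and residue 97.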
Variant protein HSSTROL3_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 15 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 15 - Amino acid mutations
Variant protein HSSTROL3_P9 is encoded by the following transcript(s): HSSTROL3_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSSTROL3_T12 is shown in bold; this coding portion starts at position 24 and ends at position 1085. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 16 - Nucleic acid SNPs
As noted above, cluster HSSTROL3 features 16 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSSTROL3_node_6 according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9, HSSTROL3_T10, HSSTROL3_T11 and HSSTROL3_T12. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster HSSTROL3_node_10 according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9, HSSTROL3_T10, HSSTROL3_T11 and HSSTROL3_T12. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Segment cluster HSSTROL3_node_13 according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9, HSSTROL3_T10, HSSTROL3_T11 and HSSTROL3_T12. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster HSSTROL3_node_15 according to the present invention is supported by 47 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9, HSSTROL3_T10, HSSTROL3_T11 and HSSTROL3_T12. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster HSSTROL3_node_19 according to the present invention is supported by 63 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9, HSSTROL3_T10, HSSTROL3_T11 and HSSTROL3_T12. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Segment cluster HSSTROL3_node_21 according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9, HSSTROL3_T10, HSSTROL3_T11 and HSSTROL3_T12. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Segment cluster HSSTROL3_node_24 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T8 and HSSTROL3_T9. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Segment cluster HSSTROL3_node_25 according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T8. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Segment cluster HSSTROL3_node_26 according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9 and HSSTROL3_T11. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster HSSTROL3_node_28 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T9 and HSSTROL3_T10. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster HSSTROL3_node_29 according to the present invention is supported by 109 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9, HSSTROL3_T10, HSSTROL3_T11 and HSSTROL3_T12. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description. Segment cluster HSSTROL3_node_11 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9, HSSTROL3_T10 and HSSTROL3_T11. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster HSSTROL3_node_17 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9, HSSTROL3_T10, HSSTROL3_T11 and HSSTROL3_T12. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster HSSTROL3_node_18 according to the present invention can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9, HSSTROL3_T10, HSSTROL3_T11 and HSSTROL3_T12. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Segment cluster HSSTROL3_node_20 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T11. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Segment cluster HSSTROL3_node_27 according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9, HSSTROL3_T10, HSSTROL3_T11 and HSSTROL3_T12. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: MM11_HUMAN
Sequence documentation:
Alignment of: HSSTROL3_P4 x MM11_HUMAN
Alignment segment 1/1:
Quality: 4444.00
Escore:
Matching length: 445
Total length: 445
Matching Percent Similarity: 99.78
Matching Percent Identity: 99.78
Total Percent Similarity: 99.78
Total Percent Identity: 99.78
Gaps: 0
Alignment:
  1 MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQP 50
  1 MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQP 50

 51 WHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVL 100
 51 WHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVL 100

101 SGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHE 150
101 SGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHE 150

151 GRADIMIDFARYWHGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT 200
151 GRADIMIDFARYWDGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT 200

201 IGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDC 250
201 IGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDC 250

251 RGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDA 300
251 RGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDA 300

301 VSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDA 350
301 VSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDA 350

351 QGHIWFFQGAQYWVYDGEKPVLGPAPLTELGLVRFPVHAALVWGPEKNKI 400
351 QGHIWFFQGAQYWVYDGEKPVLGPAPLTELGLVRFPVHAALVWGPEKNKI 400

401 YFFRGRDYWRFHPSTRRVDSPVPRRATDWRGVPSEIDAAFQDADG 445
401 YFFRGRDYWRFHPSTRRVDSPVPRRATDWRGVPSEIDAAFQDADG 445
Sequence name: MM11_HUMAN
Sequence documentation:
Alignment of: HSSTROL3_P5 x MM11_HUMAN
Alignment segment 1/1:
Quality: 3566.00
Escore: 0
Matching length: 358
Total length: 358
Matching Percent Similarity: 99.72
Matching Percent Identity: 99.72
Total Percent Similarity: 99.72
Total Percent Identity: 99.72
Gaps: 0
Alignment:
  1 MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQP 50
  1 MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQP 50

 51 WHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVL 100
 51 WHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVL 100

101 SGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHE 150
101 SGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHE 150

151 GRADIMIDFARYWHGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT 200
151 GRADIMIDFARYWDGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT 200

201 IGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDC 250
201 IGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDC 250

251 RGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDA 300
251 RGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDA 300

301 VSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDA 350
301 VSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDA 350

351 QGHIWFFQ 358
351 QGHIWFFQ 358
Sequence name: MM11_HUMAN
Sequence documentation:
Alignment of: HSSTROL3_P7 x MM11_HUMAN
Alignment segment 1/1:
Quality: 3575.00
Escore: 0
Matching length: 359
Total length: 359
Matching Percent Similarity: 99.72
Matching Percent Identity: 99.72
Total Percent Similarity: 99.72
Total Percent Identity: 99.72
Gaps: 0
Alignment:
  1 MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQP 50
  1 MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQP 50

 51 WHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVL 100
 51 WHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVL 100

101 SGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHE 150
101 SGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHE 150

151 GRADIMIDFARYWHGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT 200
151 GRADIMIDFARYWDGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT 200

201 IGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDC 250
201 IGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDC 250

251 RGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDA 300
251 RGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDA 300

301 VSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDA 350
301 VSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDA 350

351 QGHIWFFQG 359
351 QGHIWFFQG 359
Sequence name: MM11_HUMAN
Sequence documentation:
Alignment of: HSSTROL3_P8 x MM11_HUMAN
Alignment segment 1/1:
Quality: 2838.00
Escore: 0
Matching length: 286
Total length: 286
Matching Percent Similarity: 99.65
Matching Percent Identity: 99.65
Total Percent Similarity: 99.65
Total Percent Identity: 99.65
Gaps: 0
Alignment:
  1 MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQP 50
  1 MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQP 50

 51 WHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVL 100
 51 WHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVL 100

101 SGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHE 150
101 SGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHE 150

151 GRADIMIDFARYWHGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT 200
151 GRADIMIDFARYWDGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT 200

201 IGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDC 250
201 IGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDC 250

251 RGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLE 286
251 RGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLE 286
Sequence name: MM11_HUMAN
Sequence documentation:
Alignment of: HSSTROL3_P9 x MM11_HUMAN
Alignment segment 1/1:
Quality: 3316.00
Escore: 0
Matching length: 343
Total length: 359
Matching Percent Similarity: 99.71
Matching Percent Identity: 99.71
Total Percent Similarity: 95.26
Total Percent Identity: 95.26
Gaps: 1
Alignment:
  1 MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQP 50
  1 MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQP 50

 51 WHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQK.... 96
 51 WHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVL 100

 97 ............RILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHE 134
101 SGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHE 150

135 GRADIMIDFARYWHGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT 184
151 GRADIMIDFARYWDGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT 200

185 IGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDC 234
201 IGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDC 250

235 RGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDA 284
251 RGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDA 300

285 VSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDA 334
301 VSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDA 350

335 QGHIWFFQG 343
351 QGHIWFFQG 359
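The summary statistics reported with each alignment can be related to the aligned sequences with a small calculation. The sketch below assumes (this is an inference from the reported numbers, not a definition given in the text) that "Matching Percent Identity" is the number of identical positions divided by the matching (ungapped) length, and "Total Percent Identity" is the same count divided by the total alignment length. The function names are hypothetical.

```python
def alignment_stats(top, bottom, gap="."):
    """Count matching (ungapped) columns, total columns and identical residues
    for two equal-length aligned strings in which `gap` marks a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = len(top)
    matching = sum(1 for a, b in zip(top, bottom) if a != gap and b != gap)
    identical = sum(1 for a, b in zip(top, bottom) if a == b and a != gap)
    return matching, total, identical


def percent(part, whole):
    """Percentage rounded to two decimals, as in the alignment reports."""
    return round(100.0 * part / whole, 2)


# One 50-residue block (variant positions 135-184 vs. known protein 151-200);
# it contains the single H/D difference seen in these alignments.
top = "GRADIMIDFARYWHGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT"
bottom = "GRADIMIDFARYWDGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT"
print(alignment_stats(top, bottom))

# Whole HSSTROL3_P9 alignment: matching length 343, total length 359,
# and one mismatch, i.e. 342 identical positions (assumed from the report).
print(percent(342, 343), percent(342, 359))
```

Under the same assumption, one mismatch over a gap-free alignment of 445, 358, 359 or 286 positions gives the 99.78, 99.72, 99.72 and 99.65 percent figures reported for the other four alignments.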
Expression of Stromelysin-3 precursor (EC 3.4.24.-) (Matrix metalloproteinase-11) (MMP-11) (ST3) (SL-3) HSSTROL3 transcripts which are detectable by amplicon as depicted in sequence name HSSTROL3 seg24 in normal and cancerous breast tissues
Expression of Stromelysin-3 precursor (EC 3.4.24.-) (Matrix metalloproteinase-11) (MMP-11) (ST3) (SL-3) transcripts detectable by or according to seg24, HSSTROL3 seg24 amplicon(s) and HSSTROL3 seg24F and HSSTROL3 seg24R primers was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and G6PD (GenBank Accession No. NM_000402; G6PD amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 56-60, 63-67, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 29A is a histogram showing over-expression of the above-indicated Stromelysin-3 precursor (EC 3.4.24.-) (Matrix metalloproteinase-11) (MMP-11) (ST3) (SL-3) transcripts in cancerous breast samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained. As is evident from Figure 29A, the expression of Stromelysin-3 precursor (EC 3.4.24.-) (Matrix metalloproteinase-11) (MMP-11) (ST3) (SL-3) transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 56-60, 63-67, Table 1, "Tissue samples in testing panel").
Notably an over-expression of at least 5 fold was found in 20 out of 28 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of Stromelysin-3 precursor (EC 3.4.24.-) (Matrix metalloproteinase-11) (MMP-11) (ST3) (SL-3) transcripts detectable by the above amplicon(s) in breast cancer samples versus the normal tissue samples was determined by T test as 6.46E-03.
A threshold of 5 fold over-expression was found to differentiate between cancer and normal samples with a P value of 1.12E-03 as checked by exact Fisher test. The above values demonstrate statistical significance of the results. Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HSSTROL3 seg24F forward primer; and HSSTROL3 seg24R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HSSTROL3 seg24.
HSSTROL3 seg24 Forward Primer (SEQ ID NO:867): ATTTCCATCCTCAACTGGCAGA
HSSTROL3 seg24 Reverse Primer (SEQ ID NO:868): TGCCCTGGAACCCACG
HSSTROL3 seg24 Amplicon (SEQ ID NO:869): ATTTCCATCCTCAACTGGCAGAGATGAGAGCCTGGAGCATTGCAGATGCCAGGGACTTCACAAATGAAGGCACAGCATGGGAAACCTGCGTGGGTTCCAGGGCA
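The relationship between the seg24 primer pair and its amplicon can be checked mechanically: the amplicon is expected to begin with the forward primer and to end with the reverse complement of the reverse primer. The following is a minimal sketch using the three sequences listed above (SEQ ID NOs: 867-869); the helper names are hypothetical.

```python
# Map each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primers_flank_amplicon(forward, reverse, amplicon):
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


forward = "ATTTCCATCCTCAACTGGCAGA"   # SEQ ID NO:867
reverse = "TGCCCTGGAACCCACG"         # SEQ ID NO:868
amplicon = (                         # SEQ ID NO:869, line break removed
    "ATTTCCATCCTCAACTGGCAGAGATGAGAGCCTGGAGCATTGCAGATGCCAGGGAC"
    "TTCACAAATGAAGGCACAGCATGGGAAACCTGCGTGGGTTCCAGGGCA"
)

print(primers_flank_amplicon(forward, reverse, amplicon))
```

The same check can be applied to the other primer and amplicon sets listed for this cluster by substituting their sequences.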
Expression of Stromelysin-3 precursor (EC 3.4.24.-) (Matrix metalloproteinase-11) (MMP-11) (ST3) (SL-3) HSSTROL3 transcripts which are detectable by amplicon as depicted in sequence name HSSTROL3 seg24 in different normal tissues
Expression of Stromelysin-3 precursor (EC 3.4.24.-) (Matrix metalloproteinase-11) (MMP-11) (ST3) (SL-3) transcripts detectable by or according to HSSTROL3 seg24 amplicon(s) and HSSTROL3 seg24F and HSSTROL3 seg24R was measured by real time PCR. In parallel the expression of four housekeeping genes, UBC (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon) and TATA box (GenBank Accession No. NM_003194; TATA amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the lung samples (Sample Nos. 15-17, Table 2, "Tissue samples on normal panel", above), to obtain a value of relative expression of each sample relative to the median of the lung samples. Primers and amplicon are as above.
The results are presented in Figure 29B, demonstrating the expression of Stromelysin-3 precursor (EC 3.4.24.-) (Matrix metalloproteinase-11) (MMP-11) (ST3) (SL-3) HSSTROL3 transcripts, which are detectable by amplicon as depicted in sequence name HSSTROL3 seg24, in different normal tissues.
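The two-step normalization described above (division by the geometric mean of the housekeeping-gene quantities, then division by the median of a reference group) can be sketched as follows. All sample names and quantities below are hypothetical placeholders chosen for illustration; they are not data from the application, and the function names are likewise assumptions.

```python
from statistics import geometric_mean, median


def normalize(target_quantity, housekeeping_quantities):
    """Normalize a target amplicon quantity to the geometric mean of the
    housekeeping-gene quantities measured in the same RT sample."""
    return target_quantity / geometric_mean(housekeeping_quantities)


def relative_expression(normalized, reference_samples):
    """Divide every normalized quantity by the median of the reference
    samples (e.g. the normal post-mortem or lung samples)."""
    reference = median(normalized[name] for name in reference_samples)
    return {name: value / reference for name, value in normalized.items()}


# Hypothetical raw quantities: (target amplicon, four housekeeping genes).
raw = {
    "normal_1": (1.0, [2.0, 2.0, 2.0, 2.0]),
    "normal_2": (2.0, [1.0, 4.0, 2.0, 2.0]),
    "normal_3": (3.0, [2.0, 2.0, 1.0, 4.0]),
    "tumor_1": (40.0, [2.0, 2.0, 2.0, 2.0]),
}

normalized = {name: normalize(t, hk) for name, (t, hk) in raw.items()}
fold = relative_expression(normalized, ["normal_1", "normal_2", "normal_3"])

for name, value in fold.items():
    print(name, round(value, 2))
```

Using the geometric rather than the arithmetic mean keeps a single highly expressed housekeeping gene from dominating the normalization factor.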
Expression of Stromelysin-3 precursor (EC 3.4.24.-) (Matrix metalloproteinase-11) (MMP-11) (ST3) (SL-3) HSSTROL3 transcripts which are detectable by amplicon as depicted in sequence name HSSTROL3 junc20-21 in normal and cancerous breast tissues
Expression of Stromelysin-3 precursor transcripts detectable by or according to junc20-21, HSSTROL3 junc20-21 amplicon(s) and primers HSSTROL3 junc20-21F and HSSTROL3 junc20-21R was measured by real time PCR. It should be noted that for this experiment, RNA was obtained from Clontech (Franklin Lakes, NJ USA 07417, www.clontech.com), BioChain Inst. Inc. (Hayward, CA 94545 USA, www.biochain.com), ABS (Wilmington, DE 19801, USA, www.absbioreagents.com), GOG for ovary samples - Pediatric Cooperative Human Tissue Network, Gynecologic Oncology Group Tissue Bank, Children Hospital of Columbus (Columbus OH 43205 USA) or Ambion (Austin, TX 78744 USA, www.ambion.com). Alternatively, RNA was generated from tissue samples using TRI-Reagent (Molecular Research Center), according to the manufacturer's instructions. Tissue and RNA samples were obtained from patients or from postmortem. Total RNA samples were treated with DNaseI (Ambion). In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and G6PD (GenBank Accession No. NM_000402; G6PD amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 56-60, 63-67, Table 1: Tissue samples in testing panel, above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.
Figure 30A is a histogram showing over-expression of the above-indicated Stromelysin-3 precursor transcripts in cancerous breast samples relative to the normal samples. As is evident from Figure 30A, the expression of Stromelysin-3 precursor transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 56-60, 63-67, Table 1: Tissue samples in testing panel, above). Notably an over-expression of at least 5 fold was found in 13 out of 28 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of Stromelysin-3 precursor transcripts detectable by the above amplicon(s) in breast cancer samples versus the normal tissue samples was determined by T test as 1.28E-02. A threshold of 5 fold over-expression was found to differentiate between cancer and normal samples with a P value of 4.26E-02 as checked by exact Fisher test. The above values demonstrate statistical significance of the results. Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HSSTROL junc20-21F forward primer; and HSSTROL junc20-21R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HSSTROL junc20-21.
Forward primer HSSTROL junc20-21F (SEQ ID NO:870): TCTGCTGGCCACTGTGACTG
Reverse primer HSSTROL junc20-21R (SEQ ID NO:871): GAAGAAAAAGAGCTCGCCTCG
Amplicon HSSTROL junc20-21 (SEQ ID NO:872): TCTGCTGGCCACTGTGACTGCAGCATATGCCCTCAGCATGTGTCCCTCTCTCCCACCCCAGCCAGACGCCCCGCCAGATGCCTGTGAGGCCTCCTTTGACGCGGTCTCCACCATCCGAGGCGAGCTCTTTTTCTTC
Expression of Stromelysin-3 precursor (EC 3.4.24.-) (Matrix metalloproteinase-11) (MMP-11) (ST3) (SL-3) HSSTROL3 transcripts which are detectable by amplicon as depicted in sequence name HSSTROL3 junc21-27 in normal and cancerous breast tissues
Expression of Stromelysin-3 precursor transcripts detectable by or according to junc21-27, HSSTROL3 junc21-27 amplicon(s) and primers HSSTROL3 junc21-27F and HSSTROL3 junc21-27R was measured by real time PCR (RNA was as for the experiment above). In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and G6PD (GenBank Accession No. NM_000402; G6PD amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 56-60, 63-67, Table 1: Tissue samples in testing panel, above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 30B is a histogram showing over-expression of the above-indicated Stromelysin-3 precursor transcripts in cancerous breast samples relative to the normal samples. As is evident from Figure 30B, the expression of Stromelysin-3 precursor transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 56-60, 63-67, Table 1: Tissue samples in testing panel, above). Notably an over-expression of at least 20 fold was found in 20 out of 28 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below.
The P value for the difference in the expression levels of Stromelysin-3 precursor transcripts detectable by the above amplicon(s) in breast cancer samples versus the normal tissue samples was determined by T test as 5.98E-03. A threshold of 20 fold over-expression was found to differentiate between cancer and normal samples with a P value of 3.66E-03 as checked by exact Fisher test. The above values demonstrate statistical significance of the results. Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HSSTROL junc21-27F forward primer; and HSSTROL junc21-27R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HSSTROL junc21-27.
Forward primer HSSTROL junc21-27F (SEQ ID NO:873): ACATTTGGTTCTTCCAAGGGACTAC
Reverse primer HSSTROL junc21-27R (SEQ ID NO:874): TCGATCTCAGAGGGCACCC
Amplicon HSSTROL junc21-27 (SEQ ID NO:875): ACATTTGGTTCTTCCAAGGGACTACTGGCGTTTCCACCCCAGCACCCGGCGTGTAGACAGTCCCGTGCCCCGCAGGGCCACTGACTGGAGAGGGGTGCCCTCTGAGATCGA
Expression of Sfromelysin-3 precursor (EC 3.4.24.-) (Matrix metalloproteinase- 11) (MMP-11) (ST3) (SL-3) HSSTROL3 franscripts which are detectable by amplicon as depicted in sequence name HSSTROL3 seg25 in normal and cancerous breast tissues Expression of Stromelysin-3 precursor transcripts detectable by or according to seg25, HSSTROL3 junc21-27 amplicon(s) and primers HSSTROL3junc21-27F and HSSTROL3junc21 - 27 R was measured by real time PCR (RNA was as for the experiment above). In parallel the expression of four housekeeping genes -PBGD (GenBank Accession No. BCO 19323; amplicon - PBGD-amplicon), HPRTl (GenBank Accession No. NM_000194; amplicon - HPRT1- amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), G6PD (GenBank Accession No. N J000402; G6PD amplicon) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 56-60, 63-67, Table 1: Tissue samples in testing panel, above), to obtain a value of fold upregulation for each sample relative to median of the normal PM samples. Figure 30C is a histogram showing over expression of the above- indicated Sfromelysin-3 precursor transcripts in cancerous breast samples relative to the normal samples . As is evident from Figure 30C, the expression of Sfromelysin-3 precursor transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non- cancerous samples (Sample Nos. 56-60, 63-67 Table 1: Tissue samples in testing panel, above). Notably an over-expression of at least 5 fold was found in 20 out of 28 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. 
The P value for the difference in the expression levels of Stromelysin-3 precursor transcripts detectable by the above amplicon(s) in breast cancer samples versus the normal tissue samples was determined by T test as 5.79E-02. A threshold of 5 fold overexpression was found to differentiate between cancer and normal samples with a P value of 6.75E-03 as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HSSTROL seg25F forward primer; and HSSTROL seg25R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HSSTROL seg25.
Forward primer HSSTROL seg25F (SEQ ID NO:876): CACTGCCCCAGCTTATCCC
Reverse primer HSSTROL seg25R (SEQ ID NO:877): CTCTCCCAGCCTCAGTTTCCT
Amplicon HSSTROL seg25 (SEQ ID NO:878): CACTGCCCCAGCTTATCCCAGGCCTCCCGCTTCCCTCTGCGGGTGGGGTGCTGAGCAGGCATTATTGGCCTGCATGTTTTACTGATGAGGAAACTGAGGCTGGGAGAG
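The threshold test referred to above can be illustrated with a minimal sketch of a two-sided Fisher's exact test on a 2x2 table (cancer or normal versus above or below the fold threshold). The application does not state whether the test was one-sided or two-sided, nor the per-group counts used, so the counts below are hypothetical and the result is not expected to reproduce the P value reported above.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: cancer / normal samples. Columns: above / below the fold threshold.
    The P value sums the hypergeometric probabilities of all tables with the
    same margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the col1 "above threshold" samples are cancers.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)  # tolerance for floating-point ties
    )


# Hypothetical counts: 20 of 28 cancers and 1 of 10 normals above threshold.
p_value = fisher_exact_two_sided(20, 8, 1, 9)
print(p_value)
```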
DESCRIPTION FOR CLUSTER AY180924
Cluster AY180924 features 1 transcript(s) and 3 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Latherin precursor (SwissProt accession identifier LATH_HUMAN; known also according to the synonyms Breast cancer and salivary gland expressed protein), SEQ ID NO: 280, referred to herein as the previously known protein. Protein Latherin precursor is known or believed to have the following function(s): surfactant properties. The sequence for protein Latherin precursor is given at the end of the application, as "Latherin precursor amino acid sequence". The protein Latherin localization is believed to be Secreted. As noted above, cluster AY180924 features 1 transcript, which was listed in Table 1 above. This transcript encodes a protein which is a variant of protein Latherin precursor. A description of the variant protein according to the present invention is now provided. Variant protein AY180924_PEA_1_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AY180924_PEA_1_T1. An alignment is given to the known protein (Latherin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between AY180924_PEA_1_P3 and LATH_HUMAN:
1. An isolated chimeric polypeptide encoding for AY180924_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to MLNVSGLFVLLCGLLVSSSAQEVLAGVSSQLLN corresponding to amino acids 1 - 33 of LATH_HUMAN, which also corresponds to amino acids 1 - 33 of AY180924_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GETVLLWNMQNPEPMPVKFSLAKYLGHNEHY corresponding to amino acids 34 - 64 of AY180924_PEA_1_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of AY180924_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GETVLLWNMQNPEPMPVKFSLAKYLGHNEHY in AY180924_PEA_1_P3.
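By way of illustration only, the composition of the chimeric polypeptide described in the comparison report can be checked with a short sketch. It assumes that the head sequence reads as in the alignment given for this variant (beginning MLNV) and that the tail reads as in the tail polypeptide recited in the report; the helper name is hypothetical. The sketch verifies that each sequence has the length implied by its stated amino acid range and that the two, joined contiguously and in sequential order, give a 64 amino acid polypeptide.

```python
def span_length(first: int, last: int) -> int:
    """Number of residues in an inclusive, 1-based amino acid range."""
    return last - first + 1


# Assumed readings of the sequences (see the lead-in above).
head = "MLNVSGLFVLLCGLLVSSSAQEVLAGVSSQLLN"  # amino acids 1 - 33
tail = "GETVLLWNMQNPEPMPVKFSLAKYLGHNEHY"    # amino acids 34 - 64

# Contiguous and in sequential order: head first, then tail.
variant = head + tail

print(len(head) == span_length(1, 33))
print(len(tail) == span_length(34, 64))
print(len(variant))
```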
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein AY180924_PEA_1_P3 is encoded by the following transcript(s): AY180924_PEA_1_T1, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AY180924_PEA_1_T1 is shown in bold; this coding portion starts at position 73 and ends at position 264. The transcript also has the following SNPs as listed in Table 4 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AY180924_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 4 - Nucleic acid SNPs
As noted above, cluster AY 180924 features 3 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster AY180924_PEA_1_node_3 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AY180924_PEA_1_T1. Table 5 below describes the starting and ending position of this segment on each transcript.
Table 5 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster AY180924_PEA_1_node_0 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AY180924_PEA_1_T1. Table 6 below describes the starting and ending position of this segment on each transcript.
Table 6 - Segment location on transcripts
Segment cluster AY180924_PEA_1_node_2 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AY180924_PEA_1_T1. Table 7 below describes the starting and ending position of this segment on each transcript.
Table 7 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: /tmp/FepOCusBjG/YVh7Evl27H:LATH_HUMAN
Sequence documentation:
Alignment of: AY180924_PEA_1_P3 x LATH_HUMAN
Alignment segment 1/1:
Quality: 300.00
Escore: 0
Matching length: 33
Total length: 33
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MLNVSGLFVLLCGLLVSSSAQEVLAGVSSQLLN 33
    |||||||||||||||||||||||||||||||||
  1 MLNVSGLFVLLCGLLVSSSAQEVLAGVSSQLLN 33
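As a non-limiting illustration, a percent identity figure of the kind reported above can be computed from two aligned sequences of equal length as sketched below. The exact definitions used by the alignment program for "Matching" and "Total" percentages are not stated here, so this sketch assumes the simplest definition: identical, non-gap positions divided by the aligned length. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned positions holding the same residue.

    Both inputs must already be aligned (equal length, '-' for gaps).
    Positions where either sequence has a gap never count as identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# The two aligned segments shown above (amino acids 1 - 33 of each protein).
variant_segment = "MLNVSGLFVLLCGLLVSSSAQEVLAGVSSQLLN"
known_segment = "MLNVSGLFVLLCGLLVSSSAQEVLAGVSSQLLN"

print(percent_identity(variant_segment, known_segment))
```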
DESCRIPTION FOR CLUSTER R75793
Cluster R75793 features 3 transcript(s) and 9 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
Cluster R75793 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 31 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
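The "parts per million" ratio defined above can be sketched as follows. This is an illustration only: the weighting applied to the EST counts is described elsewhere in the application and is not reproduced here, so the sketch assumes plain counts; the function name and the counts are hypothetical.

```python
def ests_parts_per_million(cluster_est_count: int, category_est_count: int) -> float:
    """ESTs of one cluster per million ESTs of the whole tissue category."""
    if category_est_count <= 0:
        raise ValueError("category must contain at least one EST")
    return cluster_est_count * 1_000_000 / category_est_count


# Hypothetical counts: 5 cluster ESTs among 250,000 ESTs in the category.
print(ests_parts_per_million(5, 250_000))
```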
Overall, the following results were obtained as shown with regard to the histograms in Figure 31 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors and a mixture of malignant tumors from different tissues.
Table 4 - Normal tissue distribution
Table 5 - P values and ratios for expression in cancerous tissue
As noted above, cluster R75793 features 3 transcript(s), which were listed in Table 1 above. A description of each variant protein according to the present invention is now provided.
Variant protein R75793_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R75793_PEA_1_T1. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between R75793_PEA_1_P2 and Q96DR8 (SEQ ID NO: 294):
1. An isolated chimeric polypeptide encoding for R75793_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to MKFLAVLVLLGVSIFLVSAQNPTTAAPADTYPATGPADDEAPDAETTAAATTATTAAPTTATTAASTTARKDIP corresponding to amino acids 1 - 74 of Q96DR8, which also corresponds to amino acids 1 - 74 of R75793_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AP corresponding to amino acids 75 - 76 of R75793_PEA_1_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein R75793_PEA_1_P2 is encoded by the following transcript(s): R75793_PEA_1_T1, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R75793_PEA_1_T1 is shown in bold; this coding portion starts at position 69 and ends at position 296. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R75793_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Nucleic acid SNPs
Variant protein R75793_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R75793_PEA_1_T5. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein R75793_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R75793_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
Variant protein R75793_PEA_1_P5 is encoded by the following transcript(s): R75793_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R75793_PEA_1_T5 is shown in bold; this coding portion starts at position 69 and ends at position 383. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R75793_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Variant protein R75793_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R75793_PEA_1_T3. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein R75793_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R75793_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Amino acid mutations
Variant protein R75793_PEA_1_P6 is encoded by the following transcript(s): R75793_PEA_1_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R75793_PEA_1_T3 is shown in bold; this coding portion starts at position 329 and ends at position 502. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R75793_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
As noted above, cluster R75793 features 9 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster R75793_PEA_1_node_0 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R75793_PEA_1_T3. Table 11 below describes the starting and ending position of this segment on each transcript.
Table 11 - Segment location on transcripts
Segment cluster R75793_PEA_1_node_9 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R75793_PEA_1_T5. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
Segment cluster R75793_PEA_1_node_11 according to the present invention is supported by 59 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R75793_PEA_1_T1 and R75793_PEA_1_T3. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Segment cluster R75793_PEA_1_node_14 according to the present invention is supported by 41 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R75793_PEA_1_T1 and R75793_PEA_1_T3. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster R75793_PEA_1_node_4 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R75793_PEA_1_T1 and R75793_PEA_1_T5. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Segment cluster R75793_PEA_1_node_5 according to the present invention is supported by 52 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R75793_PEA_1_T1 and R75793_PEA_1_T5. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Segment cluster R75793_PEA_1_node_6 according to the present invention is supported by 54 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R75793_PEA_1_T1 and R75793_PEA_1_T5. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Segment cluster R75793_PEA_1_node_8 according to the present invention is supported by 57 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R75793_PEA_1_T1, R75793_PEA_1_T3 and R75793_PEA_1_T5. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Segment cluster R75793_PEA_1_node_13 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R75793_PEA_1_T1. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: Q96DR8
Sequence documentation:
Alignment of: R75793_PEA_1_P2 x Q96DR8
Alignment segment 1/1:
Quality: 681.00
Escore: 0
Matching length: 74
Total length: 74
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MKFLAVLVLLGVSIFLVSAQNPTTAAPADTYPATGPADDEAPDAETTAAA 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MKFLAVLVLLGVSIFLVSAQNPTTAAPADTYPATGPADDEAPDAETTAAA 50
 51 TTATTAAPTTATTAASTTARKDIP 74
    ||||||||||||||||||||||||
 51 TTATTAAPTTATTAASTTARKDIP 74
Expression of Homo sapiens small breast epithelial mucin (LOC118430) R75793 transcripts which are detectable by amplicon as depicted in sequence name R75793 junc11-13 in normal and cancerous breast tissues
Expression of Homo sapiens small breast epithelial mucin (LOC118430) transcripts detectable by or according to junc11-13, R75793 junc11-13 amplicon(s) and primers R75793 junc11-13F and R75793 junc11-13R was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and G6PD (GenBank Accession No. NM_000402; G6PD amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 56-60, 63-67, Table 1: Tissue samples in testing panel, above), to obtain a value of fold differential expression for each sample relative to the median of the normal PM samples. In one experiment that was carried out, no differential expression in the cancerous samples relative to the normal PM samples was observed. However, this may be due to a failure of this particular experiment.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: R75793 junc11-13F forward primer; and R75793 junc11-13R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: R75793 junc11-13.
Forward primer R75793 junc11-13F (SEQ ID NO:879): TGATGATGAAGCCCCTGATG
Reverse primer R75793 junc11-13R (SEQ ID NO:880): TATTGTCAAGGGGCTGGAATGT
Amplicon R75793 junc11-13 (SEQ ID NO:881): TGATGATGAAGCCCCTGATGCTGAAACCACTGCTGCTGCAACCACTGCGACCACTGCTGCTCCTACCACTGCAACCACCGCTGCTTCTACCACTGCTCGTAAAGACATTCCAGCCCCTTGACAATA
Expression of Homo sapiens small breast epithelial mucin (LOC118430) R75793 transcripts which are detectable by amplicon as depicted in sequence name R75793 seg9 in normal and cancerous breast tissues
Expression of Homo sapiens small breast epithelial mucin (LOC118430) transcripts detectable by or according to seg9, R75793 seg9 amplicon(s) and primers R75793 seg9F and R75793 seg9R was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and G6PD (GenBank Accession No. NM_000402; G6PD amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 56-60, 63-67, Table 1: Tissue samples in testing panel, above), to obtain a value of fold differential expression for each sample relative to the median of the normal PM samples. In one experiment that was carried out, no differential expression in the cancerous samples relative to the normal PM samples was observed. However, this may be due to a failure of this particular experiment.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: R75793 seg9F forward primer; and R75793 seg9R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: R75793 seg9.
Forward primer R75793 seg9F (SEQ ID NO:882): TCCAGCAATAACCATTTTTCACTTC
Reverse primer R75793 seg9R (SEQ ID NO:883): GCTTTCACAGACTTTTGCTTAGGATT
Amplicon R75793 seg9 (SEQ ID NO:884): TCCAGCAATAACCATTTTTCACTTCCAGCCTCATGTCAAACAGCCAGTTTCCATGTGGATAGTCTTTGTTATAAGGAATCCTAAGCAAAAGTCTGTGAAAGC
DESCRIPTION FOR CLUSTER HUMCA1XIA
Cluster HUMCA1XIA features 4 transcript(s) and 46 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Collagen alpha 1 (SwissProt accession identifier CA1B_HUMAN; known also according to the synonyms XI), SEQ ID NO:348, referred to herein as the previously known protein. Protein Collagen alpha 1 is known or believed to have the following function(s): May play an important role in fibrillogenesis by controlling lateral growth of collagen II fibrils. The sequence for protein Collagen alpha 1 is given at the end of the application, as "Collagen alpha 1 amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: cartilage condensation; vision; hearing; cell-cell adhesion; extracellular matrix organization and biogenesis, which are annotation(s) related to Biological Process; extracellular matrix structural protein; extracellular matrix protein, adhesive, which are annotation(s) related to Molecular Function; and extracellular matrix; collagen; collagen type XI, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink>.
Cluster HUMCA1XIA can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 32 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million). Overall, the following results were obtained as shown with regard to the histograms in Figure 32 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: bone malignant tumors, epithelial malignant tumors, a mixture of malignant tumors from different tissues and lung malignant tumors.
Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
As noted above, cluster HUMCA1XIA features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Collagen alpha 1. A description of each variant protein according to the present invention is now provided.
Variant protein HUMCA 1XIAJP 14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by franscript(s) HUMCA lXIAJT 16. An alignment is given to the known protein (Collagen alpha 1) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMCAIXIA JP14 and CA1BJΪUMANJV5 (SEQ JD NO: 349): l.An isolated chimeric polypeptide encoding for HUMCA lXIAJP 14, comprising a first amino acid sequence being at least 90 % homologous to MEPWSSRWXTKRWLWDFTVTTLALTFLFQAREVRGAA PVDVLKALDFHNSPEGISKTT GFCTΉRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTTPEDFSILFTVI^KKGIQSFLLSΓY NEHGIQQIGVEVGRSP LIΕDHTGKPAPEDYPLRATVNIA^ IVDCKKXTTKPLDRSERAIVDTNGITVFGTRILDEE EGDIQQFLITGDPKAAYDYCEH YSPDCDSSAPKAAQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQT EAMVDDFQEYNYGTMESYQTEAPPJWSGTT^PNPVEEIFTΕEYLTGEDYDSQRKNSED TLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEEFGPGVPAETDITETSΓN GHGAYGEKGQKGEPA WEPGMLVEGPPGPAGPAGIMGPPGLQGPTGPPGDPGDRGPPG RPGLPGADGLPGPPGTMLMLPFRYGGDGSKGPTISAQEAQAQAILQQARIALRGPPGPM GLTGRPGPVGGPGSSGAKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMP GEPGAKGDRGFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEAG PRGLLGPRGTPGAPGQPGMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQGLPGPQG PIGPPGEKGPQGKPGLAGLPGADGPPGHPGKEGQSGEKGALGPPGPQGPIGYPGPRGVK GADGVRGLKGSKGEKGEDGFPGFKGDMGLKGDRGEVGQIGPRGEDGPEGPKGRAGPT GDPGPSGQAGEKGKLGVPGLPGYPGRQGPKGSTGFPGFPGANGEKGARGVAGKPGPR GQRGPTGPRGSRGARGPTGKPGPKGTSGGDGPPGPPGERGPQGPQGPVGFPGPKGPPGP PGKDGLPGHPGQRGETGFQGKTGPPGPGGWGPQGPTGETGPIGERGHPGPPGPPGEQG LPGAAGKEGAKGDPGPQGISGKDGPAGLRGFPGERGLPGAQGAPGLKGGEGPQGPPGP
V corresponding to amino acids 1 - 1056 of CA1B_HUMAN_V5, which also corresponds to amino acids 1 - 1056 of HUMCA1XIA_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSMMIINSQTIMWNYSSSFITLML corresponding to amino acids 1057 - 1081 of HUMCA1XIA_P14, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMCA1XIA_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSMMINSQTIMVVNYSSSFITLML in HUMCA1XIA_P14.
It should be noted that the known protein sequence (CA1B_HUMAN; SEQ ID NO:348) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for CA1B_HUMAN_V5 (SEQ ID NO:349). These changes were previously known to occur and are listed in the table below.
Table 7 - Changes to CA1B_HUMAN_V5
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMCA1XIA_P14 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
Variant protein HUMCA1XIA_P14 is encoded by the following transcript(s): HUMCA1XIA_T16, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCA1XIA_T16 is shown in bold; this coding portion starts at position 319 and ends at position 3561. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
Variant protein HUMCA1XIA_P15 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCA1XIA_T17. An alignment is given to the known protein (Collagen alpha 1) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMCA1XIA_P15 and CA1B_HUMAN: 1. An isolated chimeric polypeptide encoding for HUMCA1XIA_P15, comprising a first amino acid sequence being at least 90% homologous to MEPWSSRWKTKT^WLWDFTVTTLALTFLFQAJTEVRGAAPVDVLKALDFHNSPEGISKTT GFCTNT E^SKGSDTAYRVSKQAQLSAPT QLFPGGTFPEDFSILFTVKPKKGIQSFLLSIY NEHGIQQIGVEVGRSPVFLFEDHTGKPAPEDYPLFRTVNIADGKWHRVAISVEKKTVTM IVΌCKKKTTKPLDRSERAΓVDTNGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEH YSPDCDSSAPKAAQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQT EANIVDDFQEYNYGTMES YQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDSQRKNSED TLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEEFGPGVPAETDITETSTN GHGAYGEKGQKGEPA WEPGMLVEGPPGPAGPAGIMGPPGLQGPTGPPGDPGDRGPPG RPGLPGADGLPGPPGTMLMLPFRYGGDGSKGPTISAQEAQAQAILQQARIALRGPPGPM GLTGRPGPVGGPGSSGAKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMP GEPGAKGDRGFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEAG PRGLLGPRGTPGAPGQPGMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQGLPGPQG
PIGPPGEK corresponding to amino acids 1 - 714 of CA1B_HUMAN, which also corresponds to amino acids 1 - 714 of HUMCA1XIA_P15, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
MCCNLSFGILIPLQK corresponding to amino acids 715 - 729 of HUMCA1XIA_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMCA1XIA_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MCCNLSFGILIPLQK in HUMCA1XIA_P15.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMCA1XIA_P15 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Amino acid mutations
The glycosylation sites of variant protein HUMCA1XIA_P15, as compared to the known protein Collagen alpha 1, are described in Table 11 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 11 - Glycosylation site(s)
Variant protein HUMCA1XIA_P15 is encoded by the following transcript(s): HUMCA1XIA_T17, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCA1XIA_T17 is shown in bold; this coding portion starts at position 319 and ends at position 2505. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Nucleic acid SNPs
Variant protein HUMCA1XIA_P16 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCA1XIA_T19. An alignment is given to the known protein (Collagen alpha 1) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMCA1XIA_P16 and CA1B_HUMAN: 1. An isolated chimeric polypeptide encoding for HUMCA1XIA_P16, comprising a first amino acid sequence being at least 90% homologous to
MEPWSSRWKTKRWLWDFTNTTLALTFLFQAREVRGAAPVDVLKALDFHNSPEGISKTT GFCTKRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTTPEDFSILFTNK KKGIQSFLLSIY ΝEHGIQQIGVEVGRSPVFLFEDFTTGKPAPEDYPLFRTVrøADGKWHRVAISVEKKTNTM IVDCKIO TTKPLDRSEl^IVDTΝGITNFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEH YSPDCDSSAPKAAQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPINTEETIAQT EAΝIVDDFQEYΝYGTMESYQTΕAPRF1NSGTΝEPΝPVEEIFTEEYLTGEDYDSQRKΝSED TLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEEFGPGVPAETDITETSIN GHG A YGEKGQKGEPA WEPGMLVEGPPGP AGP AGIMGPPGLQGPTGPPGDPGDRGPPG RPGLPGADGLPGPPGTMLMLPFRYGGDGSKGPTISAQEAQAQAILQQARIALRGPPGPM GLTGRPGPVGGPGSSGAKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMP GEPGAKGDRGFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEA corresponding to amino acids 1 - 648 of CA1B_HUMAN, which also corresponds to amino acids 1 - 648 of HUMCA1XIA_P16, a second amino acid sequence being at least 90% homologous to GMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQGLPGPQGPIGPPGEK corresponding to amino acids 667 - 714 of CA1B_HUMAN, which also corresponds to amino acids 649 - 696 of HUMCA1XIA_P16, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
VSFSFSLFYKKVIKFACDKJtFVGRFmERKVVKLSLPLYLIYE corresponding to amino acids 697 - 738 of HUMCA1XIA_P16, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of HUMCA1XIA_P16, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AG, having a structure as follows: a sequence starting from any of amino acid numbers 648-x to 648; and ending at any of amino acid numbers 649 + ((n-2) - x), in which x varies from 0 to n-2. 3. An isolated polypeptide encoding for a tail of HUMCA1XIA_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSFSFSLFYKKVKFACDKP NGRHDERKVVKLSLPLYLIYE in HUMCA1XIA_P16.
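The block structure of HUMCA1XIA_P16 and the "edge portion" rule stated above can be made concrete with a short sketch. The Python below is illustrative only and is not part of the application: the names `P16_BLOCKS`, `to_known_position` and `edge_windows` are hypothetical, positions are assumed to be 1-based and inclusive, and the edge-portion rule is read as start = 648 - x and end = 649 + ((n - 2) - x) for x from 0 to n - 2, so that every window has length n and spans the junction between residues 648 and 649.

```python
# Hypothetical helper names; positions are assumed 1-based and inclusive.

# Blocks of HUMCA1XIA_P16 as stated in the text:
# (variant_start, variant_end, known_start), where known_start is the matching
# start position on CA1B_HUMAN, or None for the variant-specific tail.
P16_BLOCKS = [
    (1, 648, 1),
    (649, 696, 667),
    (697, 738, None),
]


def to_known_position(pos, blocks=P16_BLOCKS):
    """Map a variant position to the known-protein position, or None for the tail."""
    for v_start, v_end, k_start in blocks:
        if v_start <= pos <= v_end:
            return None if k_start is None else k_start + (pos - v_start)
    raise ValueError(f"position {pos} is outside the variant protein")


def edge_windows(junction, n):
    """List (start, end) windows of length n that span junction and junction + 1."""
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    windows = []
    for x in range(0, n - 1):  # x = 0 .. n-2
        start = junction - x
        end = (junction + 1) + ((n - 2) - x)
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    print(to_known_position(649))
    for start, end in edge_windows(648, 10):
        print(start, end)
```

Under this reading, n = 10 gives nine candidate windows, running from 648-657 (x = 0) to 640-649 (x = 8), each containing both junction residues.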
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMCA1XIA_P16 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Amino acid mutations
The glycosylation sites of variant protein HUMCA1XIA_P16, as compared to the known protein Collagen alpha 1, are described in Table 14 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 14 - Glycosylation site(s)
Variant protein HUMCA1XIA_P16 is encoded by the following transcript(s): HUMCA1XIA_T19, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCA1XIA_T19 is shown in bold; this coding portion starts at position 319 and ends at position 2532. The transcript also has the following SNPs as listed in Table 15 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 15 - Nucleic acid SNPs
Variant protein HUMCA1XIA_P17 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCA1XIA_T20. An alignment is given to the known protein (Collagen alpha 1) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMCA1XIA_P17 and CA1B_HUMAN: 1. An isolated chimeric polypeptide encoding for HUMCA1XIA_P17, comprising a first amino acid sequence being at least 90% homologous to
MEPWSSRWKTKRWLWDFTNTTLALTFLFQAREVRGAAPVDVLKALDFFFNSPEGISKTT GFCTTVU^KNSKGSDTAYRVSKQAQLSAPTKQLFPGGTTPEDFSILFTVKPKKGIQSFLLSΓY NEHGIQQIGVEVGRSPVFLFEDHTGKPAPEDYPLFRTVNIAI3GKWΗRVAISVEKKTVTM IVDCK.KKTTKPLDRSERA1VDTNGITVFGTR1LDEEVFEGDIQQFLITGDPKAAYDYCEH YSPDCDSSAPKAAQAQEPQIDE corresponding to amino acids 1 - 260 of CA1B_HUMAN, which also corresponds to amino acids 1 - 260 of HUMCA1XIA_P17, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRSTRPEKVFVFQ corresponding to amino acids 261 - 273 of HUMCA1XIA_P17, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMCA1XIA_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRSTRPEKVFVFQ in HUMCA1XIA_P17.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMCA1XIA_P17 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 16 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 16 - Amino acid mutations
The glycosylation sites of variant protein HUMCA1XIA_P17, as compared to the known protein Collagen alpha 1, are described in Table 17 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 17 - Glycosylation site(s)
Variant protein HUMCA1XIA_P17 is encoded by the following transcript(s): HUMCA1XIA_T20, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCA1XIA_T20 is shown in bold; this coding portion starts at position 319 and ends at position 1137. The transcript also has the following SNPs as listed in Table 18 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 18 - Nucleic acid SNPs
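The coding-portion coordinates quoted for the four transcripts can be cross-checked against the variant protein lengths implied by the comparison reports (1081, 729, 738 and 273 amino acids). The following Python sketch is offered only as a consistency check and is not part of the application: the function name `codon_count` is hypothetical, and it assumes that positions are 1-based and inclusive and that the stated coding portion does not include a stop codon.

```python
# Assumptions (not stated in the application): coordinates are 1-based and
# inclusive, and the stated coding portion excludes the stop codon.


def codon_count(cds_start, cds_end):
    """Number of whole codons in an inclusive coding range."""
    length_nt = cds_end - cds_start + 1
    if length_nt % 3 != 0:
        raise ValueError("coding range is not a multiple of three nucleotides")
    return length_nt // 3


# transcript: (coding start, coding end, encoded variant, last residue number
# quoted for that variant in its comparison report)
CODING_PORTIONS = {
    "HUMCA1XIA_T16": (319, 3561, "HUMCA1XIA_P14", 1081),
    "HUMCA1XIA_T17": (319, 2505, "HUMCA1XIA_P15", 729),
    "HUMCA1XIA_T19": (319, 2532, "HUMCA1XIA_P16", 738),
    "HUMCA1XIA_T20": (319, 1137, "HUMCA1XIA_P17", 273),
}

if __name__ == "__main__":
    for transcript, (start, end, protein, expected) in CODING_PORTIONS.items():
        codons = codon_count(start, end)
        status = "consistent" if codons == expected else "MISMATCH"
        print(f"{transcript} -> {protein}: {codons} codons, expected {expected} ({status})")
```

Under these assumptions each coding range divides evenly into codons and the codon count equals the quoted protein length, for example (3561 - 319 + 1) / 3 = 1081 for HUMCA1XIA_T16.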
As noted above, cluster HUMCA1XIA features 46 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMCA1XIA_node_0 according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17, HUMCA1XIA_T19 and HUMCA1XIA_T20. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_2 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17, HUMCA1XIA_T19 and HUMCA1XIA_T20. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_4 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17, HUMCA1XIA_T19 and HUMCA1XIA_T20. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to breast cancer), shown in Table 22. Table 22 - Oligonucleotides related to this segment
Segment cluster HUMCA1XIA_node_6 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17, HUMCA1XIA_T19 and HUMCA1XIA_T20. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to breast cancer), shown in Table 24. Table 24 - Oligonucleotides related to this segment
Segment cluster HUMCA1XIA_node_8 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17, HUMCA1XIA_T19 and HUMCA1XIA_T20. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_9 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T20. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_18 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_54 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T19. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_55 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to breast cancer), shown in Table 30. Table 30 - Oligonucleotides related to this segment
Segment cluster HUMCA1XIA_node_92 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
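The segment-to-transcript relationships described above lend themselves to a simple lookup structure. The Python sketch below is illustrative only and is not part of the application: it encodes a small subset of the segments listed above (the dictionary `SEGMENT_TRANSCRIPTS` and the function `transcript_specific_segments` are hypothetical names) and reports which of those segments are stated to occur in a single transcript.

```python
from collections import defaultdict

# Subset of the segment descriptions above; the names are the cluster's own
# identifiers, the data structure itself is an assumption made for illustration.
SEGMENT_TRANSCRIPTS = {
    "HUMCA1XIA_node_0": ["HUMCA1XIA_T16", "HUMCA1XIA_T17", "HUMCA1XIA_T19", "HUMCA1XIA_T20"],
    "HUMCA1XIA_node_9": ["HUMCA1XIA_T20"],
    "HUMCA1XIA_node_18": ["HUMCA1XIA_T16", "HUMCA1XIA_T17", "HUMCA1XIA_T19"],
    "HUMCA1XIA_node_54": ["HUMCA1XIA_T19"],
    "HUMCA1XIA_node_55": ["HUMCA1XIA_T17", "HUMCA1XIA_T19"],
}


def transcript_specific_segments(segment_transcripts):
    """Group segments that are listed for exactly one transcript, by transcript."""
    specific = defaultdict(list)
    for segment, transcripts in segment_transcripts.items():
        if len(transcripts) == 1:
            specific[transcripts[0]].append(segment)
    return dict(specific)


if __name__ == "__main__":
    for transcript, segments in transcript_specific_segments(SEGMENT_TRANSCRIPTS).items():
        print(transcript, segments)
```

Within this subset, HUMCA1XIA_node_9 is reported only for HUMCA1XIA_T20 and HUMCA1XIA_node_54 only for HUMCA1XIA_T19; such single-transcript segments are natural candidates when a probe or primer is meant to distinguish one splice variant from the others.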
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description. Segment cluster HUMCA1XIA_node_11 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_15 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_19 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_21 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_23 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_25 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_27 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_29 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_31 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_33 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_35 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_37 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 43 below describes the starting and ending position of this segment on each transcript. Table 43 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_39 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 44 below describes the starting and ending position of this segment on each transcript. Table 44 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_41 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 45 below describes the starting and ending position of this segment on each transcript. Table 45 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_43 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 46 below describes the starting and ending position of this segment on each transcript. Table 46 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_45 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 and HUMCA1XIA_T17. Table 47 below describes the starting and ending position of this segment on each transcript. Table 47 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_47 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 48 below describes the starting and ending position of this segment on each transcript. Table 48 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_49 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 49 below describes the starting and ending position of this segment on each transcript. Table 49 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_51 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 50 below describes the starting and ending position of this segment on each transcript. Table 50 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_57 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 51 below describes the starting and ending position of this segment on each transcript. Table 51 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_59 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 52 below describes the starting and ending position of this segment on each transcript. Table 52 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_62 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 53 below describes the starting and ending position of this segment on each transcript. Table 53 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_64 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 54 below describes the starting and ending position of this segment on each transcript. Table 54 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_66 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 55 below describes the starting and ending position of this segment on each transcript. Table 55 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_68 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 56 below describes the starting and ending position of this segment on each transcript. Table 56 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_70 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 57 below describes the starting and ending position of this segment on each transcript. Table 57 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_72 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 58 below describes the starting and ending position of this segment on each transcript. Table 58 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_74 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 59 below describes the starting and ending position of this segment on each transcript. Table 59 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_76 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 60 below describes the starting and ending position of this segment on each transcript. Table 60 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_78 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 61 below describes the starting and ending position of this segment on each transcript. Table 61 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_81 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 62 below describes the starting and ending position of this segment on each transcript. Table 62 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_83 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 63 below describes the starting and ending position of this segment on each transcript. Table 63 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_85 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 64 below describes the starting and ending position of this segment on each transcript. Table 64 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_87 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 65 below describes the starting and ending position of this segment on each transcript. Table 65 - Segment location on transcripts
Segment cluster HUMCA1XIA_node_89 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 66 below describes the starting and ending position of this segment on each transcript. Table 66 - Segment location on transcripts
Segment cluster HUMCA 1 XIA_node_91 according to the present invention is supported by 1 1 libraries. The number of libraries was determined as previously described. This segment can be found in the following franscript(s): HUMCA lXIAJT 16. Table 67 below describes the starting and ending position of this segment on each franscript. Table 67 - Segment location on transcripts
Transcript nucleic acid sequences: Variant protein alignment to the previously known protein: Sequence name: CA1B_HUMAN_V5
Sequence documentation:
Alignment of: HUMCA1XIA_P14 x CA1B_HUMAN_V5
Alignment segment 1/1:
Quality: 1045S.00 Escore : 0 Matching length: 1058 Total length: 1058 Matching Percent Similarity: 99.91 Matching Percent Identity: 99.91 Total Percent Similarity: 99.91 Total Percent Identity: 99.91 Gaps : 0
Alignment:
1 MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNS 50
1 MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNS 50
51 PEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFS 100
51 PEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFS 100
101 ILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPED 150
101 ILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPED 150
151 YPLFRTVNIADGKWHRVAISVEKKTNTMIVDCKKKTTKPLDRSERAIVDT 200
151 YPLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDT 200
201 NGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKA 250
201 NGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKA 250
251 AQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQTEA 300
251 AQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQTEA 300
301 NIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDS 350
301 NIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDS 350
351 QRKNSEDTLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEE 400
351 QRKNSEDTLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEE 400
401 FGPGVPAETDITETSINGHGAYGEKGQKGEPAVVEPGMLVEGPPGPAGPA 450
401 FGPGVPAETDITETSINGHGAYGEKGQKGEPAVVEPGMLVEGPPGPAGPA 450
451 GIMGPPGLQGPTGPPGDPGDRGPPGRPGLPGADGLPGPPGTMLMLPFRYG 500
451 GIMGPPGLQGPTGPPGDPGDRGPPGRPGLPGADGLPGPPGTMLMLPFRYG 500
501 GDGSKGPTISAQEAQAQAILQQARIALRGPPGPMGLTGRPGPVGGPGSSG 550
501 GDGSKGPTISAQEAQAQAILQQARIALRGPPGPMGLTGRPGPVGGPGSSG 550
551 AKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMPGEPGAKGDR 600
551 AKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMPGEPGAKGDR 600
601 GFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEAGP 650
601 GFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEAGP 650
651 RGLLGPRGTPGAPGQPGMAGVDGPPGPKG MGPQGEPGPPGQQGNPGPQG 700
651 RGLLGPRGTPGAPGQPGMAGVDGPPGPKG MGPQGEPGPPGQQGNPGPQG 700
701 LPGPQGPIGPPGEKGPQGKPGLAGLPGADGPPGHPGKEGQSGEKGALGPP 750
701 LPGPQGPIGPPGEKGPQGKPGLAGLPGADGPPGHPGKEGQSGEKGALGPP 750
751 GPQGPIGYPGPRGVKGADGVRGLKGSKGEKGEDGFPGFKGDMGLKGDRGE 800
751 GPQGPIGYPGPRGVKGADGVRGLKGSKGEKGEDGFPGFKGDMGLKGDRGE 800
801 VGQIGPRGEDGPEGPKGRAGPTGDPGPSGQAGEKGKLGVPGLPGYPGRQG 850
801 VGQIGPRGEDGPEGPKGRAGPTGDPGPSGQAGEKGKLGVPGLPGYPGRQG 850
851 PKGSTGFPGFPGA GEKGARGVAGKPGPRGQRGPTGPRGSRGARGPTGKP 900
851 PKGSTGFPGFPGA GEKGARGVAGKPGPRGQRGPTGPRGSRGARGPTGKP 900
901 GPKGTSGGDGPPGPPGERGPQGPQGPVGFPGPKGPPGPPGKDGLPGHPGQ 950
901 GPKGTSGGDGPPGPPGERGPQGPQGPVGFPGPKGPPGPPGKDGLPGHPGQ 950
951 RGETGFQGKTGPPGPGGVVGPQGPTGETGPIGERGHPGPPGPPGEQGLPG 1000
951 RGETGFQGKTGPPGPGGVVGPQGPTGETGPIGERGHPGPPGPPGEQGLPG 1000
1001 AAGKEGAKGDPGPQGISGKDGPAGLRGFPGERGLPGAQGAPGLKGGEGPQ 1050
1001 AAGKEGAKGDPGPQGISGKDGPAGLRGFPGERGLPGAQGAPGLKGGEGPQ 1050
1051 GPPGPVVS 1058
1051 GPPGPVGS 1058
Sequence name: CA1B_HUMAN
Sequence documentation:
Alignment of: HUMCA1XIA_P15 x CA1B_HUMAN
Alignment segment 1/1:
Quality: 7073.00 Escore : 0 Matching length: 714 Total length: 714 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
1 MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNS 50
1 MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNS 50
51 PEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFS 100
51 PEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFS 100
101 ILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPED 150
101 ILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPED 150
151 YPLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDT 200
151 YPLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDT 200
201 NGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKA 250
201 NGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKA 250
251 AQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQTEA 300
251 AQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQTEA 300
301 NIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDS 350
301 NIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDS 350
351 QRKNSEDTLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEE 400
351 QRKNSEDTLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEE 400
401 FGPGVPAETDITETSINGHGAYGEKGQKGEPAVVEPGMLVEGPPGPAGPA 450
401 FGPGVPAETDITETSINGHGAYGEKGQKGEPAVVEPGMLVEGPPGPAGPA 450
451 GIMGPPGLQGPTGPPGDPGDRGPPGRPGLPGADGLPGPPGTMLMLPFRYG 500
451 GIMGPPGLQGPTGPPGDPGDRGPPGRPGLPGADGLPGPPGTMLMLPFRYG 500
501 GDGSKGPTISAQEAQAQAILQQARIALRGPPGPMGLTGRPGPVGGPGSSG 550
501 GDGSKGPTISAQEAQAQAILQQARIALRGPPGPMGLTGRPGPVGGPGSSG 550
551 AKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMPGEPGAKGDR 600
551 AKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMPGEPGAKGDR 600
601 GFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEAGP 650
601 GFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEAGP 650
651 RGLLGPRGTPGAPGQPGMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQG 700
651 RGLLGPRGTPGAPGQPGMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQG 700
701 LPGPQGPIGPPGEK 714
701 LPGPQGPIGPPGEK 714
Sequence name : CA1B_HUMAN
Sequence documentation:
Alignment of: HUMCA1XIA_P16 x CA1B_HUMAN
Alignment segment 1/1:
Quality: 6795.00 Escore: 0 Matching length: 696 Total length: 714 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 97.48 Total Percent Identity: 97.48 Gaps: 1
Alignment:
1 MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNS 50
1 MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNS 50
51 PEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFS 100
51 PEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFS 100
101 ILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPED 150
101 ILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPED 150
151 YPLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDT 200
151 YPLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDT 200
201 NGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKA 250
201 NGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKA 250
251 AQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQTEA 300
251 AQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQTEA 300
301 NIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDS 350
301 NIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDS 350
351 QRKNSEDTLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEE 400
351 QRKNSEDTLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEE 400
401 FGPGVPAETDITETSINGHGAYGEKGQKGEPAVVEPGMLVEGPPGPAGPA 450
401 FGPGVPAETDITETSINGHGAYGEKGQKGEPAVVEPGMLVEGPPGPAGPA 450
451 GIMGPPGLQGPTGPPGDPGDRGPPGRPGLPGADGLPGPPGTMLMLPFRYG 500
451 GIMGPPGLQGPTGPPGDPGDRGPPGRPGLPGADGLPGPPGTMLMLPFRYG 500
501 GDGSKGPTISAQEAQAQAILQQARIALRGPPGPMGLTGRPGPVGGPGSSG 550
501 GDGSKGPTISAQEAQAQAILQQARIALRGPPGPMGLTGRPGPVGGPGSSG 550
551 AKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMPGEPGAKGDR 600
551 AKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMPGEPGAKGDR 600
601 GFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEA.. 648
601 GFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEAGP 650
649 GMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQG 682
651 RGLLGPRGTPGAPGQPGMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQG 700
683 LPGPQGPIGPPGEK 696
701 LPGPQGPIGPPGEK 714
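The two pairs of percentages in the alignment header above can be reproduced from the reported lengths. The following is a minimal sketch, assuming that "Matching" percentages are taken over the aligned (ungapped) positions and "Total" percentages over the full alignment length including the gap; that reading is inferred from the reported numbers and is not stated in the text. The function name is hypothetical.

```python
def alignment_percentages(identical, matching_length, total_length):
    """Return (matching percent identity, total percent identity).

    Assumption: 'matching' is computed over aligned positions only and
    'total' over the whole alignment length, gap positions included.
    """
    if matching_length <= 0 or total_length < matching_length:
        raise ValueError("lengths must satisfy 0 < matching_length <= total_length")
    matching_pct = round(100.0 * identical / matching_length, 2)
    total_pct = round(100.0 * identical / total_length, 2)
    return matching_pct, total_pct


# HUMCA1XIA_P16 x CA1B_HUMAN: 696 aligned positions, all identical, total length 714
print(alignment_percentages(696, 696, 714))
```

With these inputs the sketch gives 100.00 and 97.48, in line with the header values; the 18 unmatched positions correspond to the single gap.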
Sequence name: CA1B_HUMAN
Sequence documentation:
Alignment of: HUMCA1XIA_P17 x CA1B_HUMAN
Alignment segment 1/1:
Quality: 2561.00 Escore: 0 Matching length: 260 Total length: 260 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
1 MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNS 50
1 MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNS 50
51 PEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFS 100
51 PEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFS 100
101 ILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPED 150
101 ILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPED 150
151 YPLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDT 200
151 YPLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDT 200
201 NGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKA 250
201 NGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKA 250
251 AQAQEPQIDE 260
251 AQAQEPQIDE 260
DESCRIPTION FOR CLUSTER R20779
Cluster R20779 features 1 transcript(s) and 24 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Stanniocalcin 2 precursor (SwissProt accession identifier STC2_HUMAN; known also according to the synonyms STC-2; Stanniocalcin-related protein; STCRP; STC-related protein), SEQ ID NO: 379, referred to herein as the previously known protein. Protein Stanniocalcin 2 precursor is known or believed to have the following function(s): Has an anti-hypocalcemic action on calcium and phosphate homeostasis. The sequence for protein Stanniocalcin 2 precursor is given at the end of the application, as "Stanniocalcin 2 precursor amino acid sequence". Protein Stanniocalcin 2 precursor localization is believed to be Secreted (Potential).
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: cell surface receptor linked signal transduction; cell-cell signaling; nutritional response pathway, which are annotation(s) related to Biological Process; hormone, which are annotation(s) related to Molecular Function; and extracellular, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster R20779 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 33 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
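The "parts per million" ratio described above can be illustrated with a short calculation. This is only a sketch under stated assumptions: the function name and the example counts are hypothetical, and the weighting applied to EST expression in the application (described elsewhere as "previously described methods") is not reproduced here, so plain counts are used.

```python
def ests_per_million(cluster_est_count, total_est_count):
    """Cluster ESTs per million ESTs in one tissue category.

    Assumption: unweighted counts; the weighting scheme used in the
    application is not modelled in this sketch.
    """
    if total_est_count <= 0:
        raise ValueError("total_est_count must be positive")
    if cluster_est_count < 0:
        raise ValueError("cluster_est_count must be non-negative")
    # Multiply before dividing to limit floating-point rounding error.
    return cluster_est_count * 1_000_000 / total_est_count


# Hypothetical example: 5 ESTs from the cluster among 100,000 ESTs in a category
print(ests_per_million(5, 100_000))
```

Expressing each category on the same per-million scale is what allows the normal and cancerous tissue categories in the histogram to be compared even when their EST library sizes differ.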
Overall, the following results were obtained as shown with regard to the histograms in Figure 33 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues and lung malignant tumors.
Table 4 - Normal tissue distribution
Table 5 - P values and ratios for expression in cancerous tissue
For this cluster, at least one oligonucleotide was found to demonstrate overexpression of the cluster, although not of at least one transcript/segment as listed below. Microarray (chip) data is also available for this cluster as follows. Various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer, as previously described. The following oligonucleotides were found to hit this cluster but not other segments/transcripts below, shown in Table 6. Table 6 - Oligonucleotides related to this cluster
As noted above, cluster R20779 features 1 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Stanniocalcin 2 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein R20779_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R20779_T7. An alignment is given to the known protein (Stanniocalcin 2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between R20779_P2 and STC2_HUMAN: 1. An isolated chimeric polypeptide encoding for R20779_P2, comprising a first amino acid sequence being at least 90% homologous to
MCAERLGQFMTLALVLATFDPARGTDATNPPEGPQDRSSQQKGRLSLQNTAEIQHCLVNAGDVGCGVFECFENNSCEIRGLHGICMTFLHNAGKFDAQGKSFIKDALKCKAHALRHRFGCISRKCPAIREMVSQLQRECYLKHDLCAAAQENTRVIVEMIHFKDLLLHE corresponding to amino acids 1 - 169 of STC2_HUMAN, which also corresponds to amino acids 1 - 169 of R20779_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CYKIEITMPKRRKVKLRD corresponding to amino acids 170 - 187 of R20779_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of R20779_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CYKIEITMPKRRKVKLRD in R20779_P2.
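The graded homology thresholds recited above (70%, 80%, 85%, 90%, 95%) can be illustrated for the 18-residue tail. This is a minimal sketch, not the method of the application: it substitutes simple position-by-position identity between equal-length sequences for "homology", which the application defines elsewhere, and the tail sequence is the cleaned reading CYKIEITMPKRRKVKLRD assumed for amino acids 170 - 187 of R20779_P2. The function names are hypothetical.

```python
# Assumed cleaned reading of the unique tail (amino acids 170-187 of R20779_P2)
TAIL = "CYKIEITMPKRRKVKLRD"

# Thresholds recited in the comparison report, highest first
THRESHOLDS = (95, 90, 85, 80, 70)


def percent_identity(a, b):
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def highest_threshold_met(pct):
    """Return the highest recited threshold that pct satisfies, or None."""
    for threshold in THRESHOLDS:
        if pct >= threshold:
            return threshold
    return None


# Hypothetical candidate differing from the tail at the last residue (D -> E)
candidate = "CYKIEITMPKRRKVKLRE"
pct = percent_identity(TAIL, candidate)
print(round(pct, 2), highest_threshold_met(pct))
```

Because the tail is only 18 residues long, a single substitution already lowers identity to 17/18 (about 94.4%), so such a candidate would meet the 90% tier but not the 95% tier under this simplified measure.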
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein R20779_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R20779_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
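The localization reasoning stated above (secreted when both signal-peptide predictors are positive and neither trans-membrane predictor is positive) amounts to a simple decision rule. The sketch below encodes only that one stated case; the function name is hypothetical, and the "undetermined" return value is a placeholder, since the text here does not say how other combinations of predictions are resolved.

```python
def predict_localization(signal_peptide_calls, transmembrane_calls):
    """Combine boolean predictor outputs into a localization call.

    signal_peptide_calls: one bool per signal-peptide prediction program
    transmembrane_calls:  one bool per trans-membrane prediction program
    Only the 'secreted' case is taken from the text; any other outcome is
    reported as 'undetermined' (a placeholder, not from the source).
    """
    if not signal_peptide_calls or not transmembrane_calls:
        raise ValueError("at least one call of each kind is required")
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    return "undetermined"


# Case described for variant protein R20779_P2: two positive signal-peptide
# calls and two negative trans-membrane calls
print(predict_localization([True, True], [False, False]))
```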
The glycosylation sites of variant protein R20779_P2, as compared to the known protein Stanniocalcin 2 precursor, are described in Table 8 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 8 - Glycosylation site(s)
Variant protein R20779_P2 is encoded by the following transcript(s): R20779_T7, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R20779_T7 is shown in bold; this coding portion starts at position 1397 and ends at position 1957. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R20779_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Nucleic acid SNPs
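As a quick consistency check, the coding portion coordinates given above (positions 1397 to 1957) can be converted to a length in nucleotides and codons. This is a sketch assuming 1-based, inclusive coordinates; the function name is hypothetical, and whether the reported span includes a stop codon is not stated in the text.

```python
def coding_span(start, end):
    """Return (length in nt, whole codons, leftover nt) for a 1-based inclusive span."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    length = end - start + 1
    codons, remainder = divmod(length, 3)
    return length, codons, remainder


# Coding portion of the transcript encoding variant protein R20779_P2
print(coding_span(1397, 1957))
```

Under these assumptions the span is 561 nucleotides, i.e. 187 whole codons with no remainder, which agrees with the variant protein being described as running from amino acid 1 to amino acid 187.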
As noted above, cluster R20779 features 24 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster R20779_node_0 according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7. Table 10 below describes the starting and ending position of this segment on each transcript. Table 10 - Segment location on transcripts
Segment cluster R20779_node_2 according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7. Table 11 below describes the starting and ending position of this segment on each transcript. Table 11 - Segment location on transcripts
Segment cluster R20779_node_7 according to the present invention is supported by 63 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7. Table 12 below describes the starting and ending position of this segment on each transcript. Table 12 - Segment location on transcripts
Segment cluster R20779_node_9 according to the present invention is supported by 66 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7. Table 13 below describes the starting and ending position of this segment on each transcript. Table 13 - Segment location on transcripts
Segment cluster R20779_node_18 according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7. Table 14 below describes the starting and ending position of this segment on each transcript. Table 14 - Segment location on transcripts
Segment cluster R20779_node_21 according to the present invention is supported by 106 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Segment cluster R20779_node_24 according to the present invention is supported by 100 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Segment cluster R20779_node_27 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster R20779_node_28 according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Segment cluster R20779_node_30 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster R20779_node_31 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster R20779_node_32 according to the present invention is supported by 88 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster R20779_node_1 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Segment cluster R20779_node_3 according to the present invention is supported by 52 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Segment cluster R20779_node_10 according to the present invention can be found in the following transcript(s): R20779_T7. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Segment cluster R20779_node_11 according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster R20779_node_14 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster R20779_node_17 according to the present invention is supported by 54 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster R20779_node_19 according to the present invention can be found in the following transcript(s): R20779_T7. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster R20779_node_20 according to the present invention is supported by 53 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster R20779_node_22 according to the present invention is supported by 76 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Segment cluster R20779_node_23 according to the present invention is supported by 81 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Segment cluster R20779_node_25 according to the present invention can be found in the following transcript(s): R20779_T7. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Segment cluster R20779_node_29 according to the present invention can be found in the following transcript(s): R20779_T7. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Variant protein alignment to the previously known protein: Sequence name: STC2_HUMAN
Sequence documentation:
Alignment of: R20779_P2 x STC2_HUMAN
Alignment segment 1/1:
Quality: 1688.00 Escore: 0 Matching length: 171 Total length: 171 Matching Percent Similarity: 99.42 Matching Percent Identity: 99.42 Total Percent Similarity: 99.42 Total Percent Identity: 99.42 Gaps : 0
Alignment :
1 MCAERLGQFMTLALVLATFDPARGTDATNPPEGPQDRSSQQKGRLSLQNT 50
1 MCAERLGQFMTLALVLATFDPARGTDATNPPEGPQDRSSQQKGRLSLQNT 50
51 AEIQHCLVNAGDVGCGVFECFENNSCEIRGLHGICMTFLHNAGKFDAQGK 100
51 AEIQHCLVNAGDVGCGVFECFENNSCEIRGLHGICMTFLHNAGKFDAQGK 100
101 SFIKDALKCKAHALRHRFGCISRKCPAIREMVSQLQRECYLKHDLCAAAQ 150
101 SFIKDALKCKAHALRHRFGCISRKCPAIREMVSQLQRECYLKHDLCAAAQ 150
151 ENTRVIVEMIHFKDLLLHECY 171
151 ENTRVIVEMIHFKDLLLHEPY 171
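The position at which the variant diverges from the known protein can be read off the final rows of the alignment above by comparing them residue by residue. The following is a small sketch with a hypothetical helper name; it assumes the two rows are ungapped and start at the same residue number (151), as printed.

```python
def find_mismatches(query_row, subject_row, start=1):
    """List (position, query residue, subject residue) where two ungapped rows differ."""
    if len(query_row) != len(subject_row):
        raise ValueError("rows must have the same length")
    return [
        (start + offset, q, s)
        for offset, (q, s) in enumerate(zip(query_row, subject_row))
        if q != s
    ]


# Last alignment rows of R20779_P2 (query) and STC2_HUMAN (subject), residues 151-171
query = "ENTRVIVEMIHFKDLLLHECY"
subject = "ENTRVIVEMIHFKDLLLHEPY"
print(find_mismatches(query, subject, start=151))
```

The single difference falls at position 170, the first residue of the variant tail, and one mismatch in 171 aligned positions corresponds to the 99.42% identity reported in the alignment header.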
DESCRIPTION FOR CLUSTER HSS100PCB
Cluster HSS100PCB features 1 transcript(s) and 3 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein S-100P protein (SwissProt accession identifier S10P_HUMAN), SEQ ID NO: 385, referred to herein as the previously known protein. The sequence for protein S-100P protein is given at the end of the application, as "S-100P protein amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: calcium binding; protein binding, which are annotation(s) related to Molecular Function. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster HSS100PCB can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 34 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
Overall, the following results were obtained as shown with regard to the histograms in Figure 34 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: a mixture of malignant tumors from different tissues. Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
As noted above, cluster HSSIOOPCB features 1 transcript(s), which were listed in Table 1 above. These franscript(s) encode for protein(s) which are variant(s) of protein S- 100P protein. A description of each variant protein according to the present invention is now provided.
Variant protein HSS100PCBJP3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSSIOOPCB JTl. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans- membrane region prediction program predicts that this protein has a trans- membrane region. Variant protein HSSIOOPCB JP3 also has the following non-silent SNPs (Single Nucleotide Polymoφhisms) as listed in Table 7, (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSS100PCBJP3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Amino acid mutations
Variant protein HSS100PCBJP3 is encoded by the following transcript(s): HSSIOOPCB JTl, for which the sequence(s) is/are given at the end of the application. The coding portion of franscript HSSIOOPCB JTl is shown in bold; this coding portion starts at position 1057 and ends at position 1533. The franscript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSS100PCBJP3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Nucleic acid SNPs
As noted above, cluster HSS100PCB features 3 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSS100PCB_node_3 according to the present invention is supported by 16 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSS100PCB_T1. Table 9 below describes the starting and ending position of this segment on each transcript.
Table 9 - Segment location on transcripts
Segment cluster HSS100PCB_node_4 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSS100PCB_T1. Table 10 below describes the starting and ending position of this segment on each transcript.
Table 10 - Segment location on transcripts
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to breast cancer), shown in Table 11.
Table 11 - Oligonucleotides related to this segment
Segment cluster HSS100PCB_node_5 according to the present invention is supported by 141 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSS100PCB_T1. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
DESCRIPTION FOR CLUSTER HSCOC4
Cluster HSCOC4 features 19 transcript(s) and 79 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Complement C4 precursor [Contains: C4a anaphylatoxin] (SwissProt accession identifier C04_HUMAN; SEQ ID NO: 485), referred to herein as the previously known protein. Protein Complement C4 precursor [Contains: C4a anaphylatoxin] is known or believed to have the following function(s): C4 plays a central role in the activation of the classical pathway of the complement system. It is processed by activated C1, which removes the C4a anaphylatoxin from the alpha chain. Derived from proteolytic degradation of complement C4, C4a anaphylatoxin is a mediator of local inflammatory process. It induces the contraction of smooth muscle, increases vascular permeability and causes histamine release from mast cells and basophilic leukocytes. The sequence for protein Complement C4 precursor [Contains: C4a anaphylatoxin] is given at the end of the application, as "Complement C4 precursor [Contains: C4a anaphylatoxin] amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: muscle contraction regulation; inflammatory response; complement activation; complement activation, classical pathway, which are annotation(s) related to Biological Process; complement component; proteinase inhibitor, which are annotation(s) related to Molecular Function; and extracellular; extracellular space, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster HSCOC4 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left-hand column of the table and the numbers on the y-axis of Figure 35 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
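The "parts per million" figure described above can be illustrated with a minimal sketch. It assumes a plain ratio of EST counts; the weighting applied in the application is described elsewhere and is not reproduced here, and the function name and the counts in the usage note are hypothetical.

```python
def ests_per_million(cluster_ests: int, category_ests: int) -> float:
    """Express a cluster's EST count as parts per million of all ESTs in a tissue category.

    Illustrative only: this is an unweighted ratio, not the weighted
    expression value plotted in the application's figures.
    """
    if category_ests <= 0:
        raise ValueError("category_ests must be positive")
    if cluster_ests < 0 or cluster_ests > category_ests:
        raise ValueError("cluster_ests must lie between 0 and category_ests")
    # Multiply before dividing so that exact integer ratios stay exact.
    return cluster_ests * 1_000_000 / category_ests
```

For instance, with hypothetical counts, 3 cluster ESTs out of 20,000 ESTs in a category correspond to 150 parts per million.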
Overall, the following results were obtained as shown with regard to the histograms in Figure 35 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: brain malignant tumors, a mixture of malignant tumors from different tissues, breast malignant tumors, pancreas carcinoma and prostate cancer.
Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
As noted above, cluster HSCOC4 features 19 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Complement C4 precursor [Contains: C4a anaphylatoxin]. A description of each variant protein according to the present invention is now provided.
Variant protein HSCOC4JPEA_l JP3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCOC4JPEA_l JTl. An alignment is given to the known protein (Complement C4 precursor [Contains: C4a anaphylatoxin]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSC0C4JPEA_1JP3 and C04 JHUMAN : l .An isolated chimeric polypeptide encoding for HSC0C4JPEA_1JP3, comprising a first amino acid sequence being at least 90 % homologous to MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQ KG8VFLR ΝPSRΝΝVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTΝIQGLΝLLFSSRRGHLFLQTDQPN^ΝPGQRVRYRWALDQKMRPSTOTITVMV EΝSHGLRVRKKEVYMPSSIFQDDFVΓPDISEPGTWKISARFSDGLESΝSSTQFEVKKYVL PΝFEVKITPGKPYΓLTVPGHLDEMQLDIQARYIYGKPVQGV A YVRFGLLDEDGKKTFFR GLESQTKLVΝGQSHISLSKAEFQDALEKLΝMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDI^KTKRHLVPGAPFLLQALΛOTEMSGSPASGIPVKVSATVSSPGSVP EVQDIQQΝTDGSGQVSΓPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD SRPPRVGDTLΝLΝLRAVGSGATFSHYYYMILSRGQIWMΝREPKRTLTSVSVFVDHHLA PSFYFVAFYYHGDHPVAΝSLRVDVQAGACEGKLELSVDGAKQYRΝGESVKLHLETDS LA VA GALDTALYAAGSKSHKPLNMGKVFΈAMNSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSRKRLSCPKEKTTRKEI^NNNFQKAINEKLGQYASPTAK^ LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEΓLQEEDLΓD EDDIPVRSFFPEN LWRVETVDRFQILTLV^PDSLTTWEMGLSLSKTKGLCVATPVQL RVFREFHLHLRLPMSVPJIFEQLELRPVLYNYLDKNLTN coπesponding to amino acids 1 - 865 of C04JHLuVIAΝ, which also coπesponds to amino acids 1 - 865 of HSC0C4JPEA_1JP3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RPHRSLSIQELGEPGPSEGWGG 
corresponding to amino acids 866 - 887 of HSCOC4_PEA_1_P3, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RPHRSLSIQELGEPGPSEGWGG in HSCOC4_PEA_1_P3.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCOC4_PEA_1_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
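The tiered homology thresholds (70%, 80%, 85%, 90%, 95%) recited for the tail sequence can be illustrated with a simplified sketch. It assumes ungapped, position-by-position identity between two peptides of equal length, which is a stand-in for, not a reproduction of, the alignment software that would determine homology in practice. The function names and the candidate peptide in the usage note are hypothetical.

```python
# Tail of the variant as recited in the text above (amino acids 866 - 887).
TAIL_P3 = "RPHRSLSIQELGEPGPSEGWGG"

# Thresholds recited in the text, in ascending order.
THRESHOLDS = (70, 80, 85, 90, 95)


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length peptides (simplifying assumption)."""
    if not a or len(a) != len(b):
        raise ValueError("peptides must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def highest_threshold_met(identity: float):
    """Return the highest recited threshold that the identity reaches, or None."""
    met = [t for t in THRESHOLDS if identity >= t]
    return met[-1] if met else None
```

A hypothetical candidate differing from the tail at one of its 22 positions, such as `"RPHRSLSIQELGEPGPSEGWGA"`, has 21 matching residues and therefore still meets the 95% tier under this simplified measure.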
Table 7 - Amino acid mutations
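The localization reasoning stated above (secreted when both signal-peptide predictors report a signal peptide and neither trans-membrane predictor reports a trans-membrane region) can be written as a small decision rule. This is a sketch under stated assumptions: the function name and the "undetermined" fallback are hypothetical, and the boolean inputs stand for the outputs of prediction programs that are not run here.

```python
from typing import Sequence


def predicted_localization(
    signal_peptide_calls: Sequence[bool],
    transmembrane_calls: Sequence[bool],
) -> str:
    """Combine predictor calls into a localization label.

    signal_peptide_calls: one boolean per signal-peptide predictor.
    transmembrane_calls: one boolean per trans-membrane region predictor.
    """
    if not signal_peptide_calls or not transmembrane_calls:
        raise ValueError("need at least one call from each kind of predictor")
    # Secreted only if every signal-peptide predictor agrees and
    # no trans-membrane predictor reports a trans-membrane region.
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    # Hypothetical fallback; the text resolves other cases by different means,
    # such as manual inspection.
    return "undetermined"
```

With two predictors of each kind, `predicted_localization([True, True], [False, False])` yields `"secreted"`, matching the reasoning given for this variant.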
The glycosylation sites of variant protein HSCOC4_PEA_1_P3, as compared to the known protein Complement C4 precursor [Contains: C4a anaphylatoxin], are described in Table 8 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 8 - Glycosylation site(s)
The phosphorylation sites of variant protein HSCOC4_PEA_1_P3, as compared to the known protein Complement C4 precursor [Contains: C4a anaphylatoxin], are described in Table 9 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 9 - Phosphorylation site(s)
Variant protein HSCOC4_PEA_1_P3 is encoded by the following transcript(s): HSCOC4_PEA_1_T1, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCOC4_PEA_1_T1 is shown in bold; this coding portion starts at position 501 and ends at position 3161. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Variant protein HSCOC4_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCOC4_PEA_1_T3. An alignment is given to the known protein (Complement C4 precursor [Contains: C4a anaphylatoxin]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCOC4_PEA_1_P5 and C04_HUMAN: 1. An isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to
MPJ^LWGLRVASSFFTLSLQKPP LLFSPSVVHLGVPLSVGVQLQDVPRGQVVKGSVFLR ^SR TNVPCSPKVDFTLSSEPODFALLSLQ LKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTTsJIQGINLLFSSRRGHLFLQTDQPIYTWGQRVRYR ALDQKMRPSTDTITV TV ENSHGLRVRIG EVYMPSSIFQDDFVrPDISEPGTWiαSARFSDGLESNSSTQFEVKI NL PNFEVKITPGKPYILTVPGHLDEMQLDIQ ARYI YGKPVQG V A YNRFGLLDEDGKKTFFR GLESQTKLVΝGQSHISLSKAEFQDALEKLΝMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGTPVKVSATVSSPGSVP EVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD SRPPRVGDTLNLNLI^VGSGATFSHY ΥMILSRGQIVFM nREPKRTLTSVSVFVDHHLA PSFYFVAFYYHGDHPV ANSLRVDVQAG ACEGKLELS VDGAKQYRNGES VKLHLETDS LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSRKRLSCPKΕKTTPJ KRNNNFQKATNEKLGQYASPTAKRCCQDGVTR LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALETLQEEDLID EDDIPVRSFFPENWLWRVETVDRFQTLTLWLPDSLTTWEIHGLSLSKTKG coπesponding to amino acids 1 - 818 of C04 JHUMAN, which also coπesponds to amino acids 1 - 818 of
HSCOC4_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DVTLSGPQVTLLPFPCTPAPCSLCS corresponding to amino acids 819 - 843 of HSCOC4_PEA_1_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DVTLSGPQVTLLPFPCTPAPCSLCS in HSCOC4_PEA_1_P5.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCOC4_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
The glycosylation sites of variant protein HSCOC4_PEA_1_P5, as compared to the known protein Complement C4 precursor [Contains: C4a anaphylatoxin], are described in Table 12 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 12 - Glycosylation site(s)
The phosphorylation sites of variant protein HSCOC4_PEA_1_P5, as compared to the known protein Complement C4 precursor [Contains: C4a anaphylatoxin], are described in Table 13 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 13 - Phosphorylation site(s)
Variant protein HSCOC4_PEA_1_P5 is encoded by the following transcript(s): HSCOC4_PEA_1_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCOC4_PEA_1_T3 is shown in bold; this coding portion starts at position 501 and ends at position 3029. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Nucleic acid SNPs
Variant protein HSCOC4_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCOC4_PEA_1_T4. An alignment is given to the known protein (Complement C4 precursor [Contains: C4a anaphylatoxin]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCOC4_PEA_1_P6 and C04_HUMAN: 1. An isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to
MRLLWGLIWASSFFTLSLQKPPJXLFSPSVVHLGλΦLSVGVQLQDWRGQVNKGSVFLR ΝPSRΝΝNPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTMQGIΝLLFSSRRGITLFLQTOQPIYWGQRVR TIWALDQKMRPSTO^ ENSHGLRVRKKEVYMPSSIFQDDFVIPDISEPGTV^
PNFEVi TPGKPYILTWGHLDEMQLDIQARYTYGKPVQGVA YVRFGLLDEDGKKTFFR GLESQTi .VNGQSfflSLSKAEFQDALEKLNlVlGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWΥFVSSPFSLDLSKTKRHLWGAPFXLQALVREMSGSPASG1PVKVSATVSSPGSVP EVQDIQQNTDGSGQVSIPHff QTISELQLSVS AGSPHPAIARLTVAAPPSGGPGFLSEERPD SRPPRVGDTLNLNLRAVGSGATFSHYYYMILSRGQTWMNREPKRTLTSVS VDHHLA PSFYFVAFYYHGDHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSTKRLSCPKEKTTPJ<ϋ INVNFQKA EKXGQYASPTAKIλCCQDGVTR LPMMRSCEQI^AARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALELLQEEDLID EDDIPVRSFFPEN VXWRVETVDRFQTLTLWLPDSLTTWEIHGLSLSKTKGLCVATPVQL R REFHLHLRLPMSVRRFEQLELRPVLYNYLDKNLTVSWVSPVEGLCLAGGGGLAQ QVL AGSARPVAFSVVPTAAAAVSLK ARGSFEFPVGDAVSKVLQffiKEGAIHREEL NYELNPLDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQTMI\ APTLAASRYLDKTEQWSTLPPETKDHAVDLIQKG coπesponding to amino acids 1 - 1052 of C04 JHUMAN, which also coπesponds to amino acids 1 - 1052 of HSCOC4JPEA_l JP6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SGCKGKQEGGQERTVTGRWTAQEATEGKKGGP coπesponding to amino acids 1053 - 1084 of HSCOC4 JPEA_1 JP6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2.An isolated polypeptide encoding for a tail of HSCOC4JPEA_l JP6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SGCKGKQEGGQERTVTGRWTAQEATEGKKGGP in HSCOC4_PEA_l_P6. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses frπm SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. 
The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCOC4_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 15 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 15 - Amino acid mutations
The glycosylation sites of variant protein HSCOC4_PEA_1_P6, as compared to the known protein Complement C4 precursor [Contains: C4a anaphylatoxin], are described in Table 16 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 16 - Glycosylation site(s)
The phosphorylation sites of variant protein HSCOC4_PEA_1_P6, as compared to the known protein Complement C4 precursor [Contains: C4a anaphylatoxin], are described in Table 17 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 17 - Phosphorylation site(s)
Variant protein HSCOC4_PEA_1_P6 is encoded by the following transcript(s): HSCOC4_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCOC4_PEA_1_T4 is shown in bold; this coding portion starts at position 501 and ends at position 3752. The transcript also has the following SNPs as listed in Table 18 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 18 - Nucleic acid SNPs
Variant protein HSC0C4JPEA_1JP12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCOC4JPEA_l_Tl 1. An alignment is given to the known protein (Complement C4 precursor [Contains: C4a anaphylatoxin]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCOC4JPEA_l JP12 and C04_HUMANJV1 (SEQ ID NO: 486): l .An isolated chimeric polypeptide encoding for HSCOC4JPEA_l JP12, comprising a first amino acid sequence being at least 90 % homologous to MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ WKGSVFLR NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTT IIQGlNLLFSSRRGHLFLQTDQPIYNPGQRVRYRVFALDQKMRPSTDTITVMV ENSHGLRVRK EVYMPSSIFQDDFVIPDISEPGTWiαSARFSDGLESNSSTQFEVKKYVL PNFEVKITPGKPYTLTVPGHLDEMQLDIQARYIYGKPVQGV A YVRFGLLDEDGKKTFFR GLESQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWΎFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGΓPVKVSATVSSPGSVP EVQDIQQNTOGSGQVSIPIIIPQTISELQLSVSAGSPLIPAIARLTNAAPPSGGPGFLSIERPD SRPPRVGDTUΝLΝLRAVGSGATFSHY ΎMILSRGQIVI ΝREPKRTLTSVSVFVDHHLA PSFYFVAFYYHGDHPVAΝSLRVDVQAGACEGKLELSVDGAKQYRΝGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLΝMGKVFEAMΝSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSRKRLSCPKEKTTRKKI ΝNΝFQKAIΝEKLGQYASPTAKRCCQDGNTR LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEΓLQEEDLΓD EDDIPVRSFFPEΝ LWRVETVDRFQΓLTLWLPDSLTTWEΓHGLSLSKTKGLCVATPVQL R REFFILHLRLPMSVP IFEQLELRPVLYΝYLDKΝLTVSVΉVSPVEGLCLAGGGGLAQ QVL VPAGSARPVAFS WPTAAAA VSLK WARGSFEFPVGDAVSKVLQEEKEGAIHREEL VYELΝPLDHRGRTLEIPGΝSDPΝMIPDGDFΝSYVRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQKGYMRIQQFRK ADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSΝ LLSQQQADGSFQ DPCPVLDRSMQGGLVGΝDETVALTAFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS 
FLGEKASAGLLGAHAAAITAYALTLTKAPADLRGVAHΝΝLMAMAQETGDΝLYWGSV
TGSQSΝAVSPTPAPRΝPSDPMPQAPALWIETTA YALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQDTVIALDALSAYWIASHTTEERGLΝVTLSSTGRΝGFKSHALQLΝΝRQ
IRGLEEELQFSLGSKINVKVGGNSKGTLKV corresponding to amino acids 1 - 1380 of C04_HUMAN_V1, which also corresponds to amino acids 1 - 1380 of HSCOC4_PEA_1_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RAREGVGPGTGGGEGVE corresponding to amino acids 1381 - 1397 of HSCOC4_PEA_1_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RAREGVGPGTGGGEGVE in HSCOC4_PEA_1_P12.
It should be noted that the known protein sequence (C04_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for C04_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 19 - Changes to C04_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCOC4_PEA_1_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 20 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 20 - Amino acid mutations
Variant protein HSCOC4_PEA_1_P12 is encoded by the following transcript(s): HSCOC4_PEA_1_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCOC4_PEA_1_T11 is shown in bold; this coding portion starts at position 501 and ends at position 4691. The transcript also has the following SNPs as listed in Table 21 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 21 - Nucleic acid SNPs
Variant protein H8COC4JPEA_l JP15 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCOC4JPEA_l JT14. An alignment is given to the known protein (Complement C4 precursor [Contains: C4a anaphylatoxin]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCOC4JPEA_l JP 15 and C04 JHUMAN JV 1 : l.An isolated chimeric polypeptide encoding for HSC0C4JPEA_1JP15, comprising a first amino acid sequence being at least 90 % homologous to MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQVVKGSVFLR NPSRNNNPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTΝIQGIΝLLFSSRRGHLFLQTDQPΓYΝPGQRVRYRVFALDQKMRPSTDTIT VMV EΝSHGLRVRK EV TVIPSSIFQDDFVIPDISEPGTN^ΑSARFSDGLESΝSSTQFEVKKYNL P ΓFEVKITPGKJPYILIVPGHLDEMQLDIQARYIYGKPVQGVA YVRFGLLDEDGKKTFFR GLESQTΕXVΝGQSHISLSKAEFQDALEKLΝMGITDLQGLRLYNAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGIPVKVSATVSSPGSVP EVQDIQQΝTDGSGQVSΓPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD SRPPRVGDTLΝLΝLRAVGSGATFSHYYYMILSRGQR FMΝREPKRTLTSVSVFVDHHLA PSFYFVAFYYHGDHPVAΝSLRVDVQAGACEGKLELSΛTDGAKQYRΝGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLΝMGKVFEAMΝSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSRKRLSCPKEKTTRKTCRNΛ^NFQKAINEKLGQYASPTAKIICCQDGVT R LPMMRSCEQP^ARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALETLQEEDLIE' EDDIPVRSFFPENΛVXWRVETVDRFQΓLTLWLPDSLTTWEEHGLSLSKTKGLCVATPVQL RVFREFHLHLRLPMSVRRFEQLELRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQ QVLVPAGSAPJ'VAFSVVPTAAAAVSLKVNARGSFEFPVGDAVSKVLQFFIKEGAJRIIILEEL VYELΝPLDHRGRTLEΓPGΝSDPΝMΓPDGDFΝSYVRVTASDPLDTLGSEGALSPGGVASL LP PRGCGEQTMIYLAPTLAASRΛ DKTEQWSTLPPETKDHAVDLIQKGYMRIQQFRK ADGSYAAWLSRDSSTWLTAFΛOJKNLSLAQEQVGGSPEKLQETSΝWLLSQQQADGSFQ DPCPVLDRSMQGGLVGNDETVALTAFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS 
FLGEKAS AGLLGAHAAAITAYALTLTKAPADLRGVAHNNLMAMAQETGDNLΛ^WGSV TGSQSNAVSPTPAPRNPSDPMPQAPALWTETTA YALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ
IRGLEEELQ corresponding to amino acids 1 - 1359 of C04_HUMAN_V1, which also corresponds to amino acids 1 - 1359 of HSCOC4_PEA_1_P15, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VNHSLVNHSLAWVARTPGPRGQARSRPQPPTRGTPAALLPGVFGGRLTSWLRDLEL corresponding to amino acids 1360 - 1415 of HSCOC4_PEA_1_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VNHSLVNHSLAWVARTPGPRGQARSRPQPPTRGTPAALLPGVFGGRLTSWLRDLEL in HSCOC4_PEA_1_P15.
It should be noted that the known protein sequence (C04_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for C04_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 22 - Changes to C04_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCOC4_PEA_1_P15 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 23 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 23 - Amino acid mutations
Variant protein HSCOC4_PEA_1_P15 is encoded by the following transcript(s): HSCOC4_PEA_1_T14, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCOC4_PEA_1_T14 is shown in bold; this coding portion starts at position 501 and ends at position 4745. The transcript also has the following SNPs as listed in Table 24 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 24 - Nucleic acid SNPs
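The coding-portion coordinates stated so far for the HSCOC4 transcripts can be cross-checked against the variant protein lengths implied by the comparison reports (for example, amino acids 1 - 865 plus 866 - 887 give 887 residues). The sketch below assumes that the stated positions are 1-based and inclusive and that the coding portion excludes the stop codon; both are inferences from the arithmetic, not statements in the application. The table and function names are hypothetical.

```python
# (coding start, coding end, last amino acid position in the comparison report)
# Coordinates and lengths are taken from the text; the layout is illustrative.
CODING_PORTIONS = {
    "HSCOC4_PEA_1_T1": (501, 3161, 887),
    "HSCOC4_PEA_1_T3": (501, 3029, 843),
    "HSCOC4_PEA_1_T4": (501, 3752, 1084),
    "HSCOC4_PEA_1_T11": (501, 4691, 1397),
    "HSCOC4_PEA_1_T14": (501, 4745, 1415),
}


def codon_count(start: int, end: int) -> int:
    """Number of codons in a coding portion given 1-based inclusive coordinates (assumed)."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"coding portion of {length} nt is not a whole number of codons")
    return length // 3


def check_all() -> dict:
    """Map each transcript to True if its codon count equals the reported protein length."""
    return {
        name: codon_count(start, end) == protein_length
        for name, (start, end, protein_length) in CODING_PORTIONS.items()
    }
```

Under these assumptions every listed transcript is consistent: for HSCOC4_PEA_1_T1, positions 501 to 3161 span 2661 nucleotides, or 887 codons, matching the 887-residue variant.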
Variant protein HSCOC4JPEA_l JP16 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSC0C4JPEA_1JT15. An alignment is given to the known protein (Complement C4 precursor [Contains: C4a anaphylatoxin]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCOC4 J?EA_1 JP 16 and C04 JHUMAN JV 1 : l.An isolated chimeric polypeptide encoding for HSCOC4JPEA l JP16, comprising a first amino acid sequence being at least 90 % homologous to MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQWKGSVFLR NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTNIQGI .LFSSRRGHLFLQTDQPΠT^GQRVRYR ALDQI<MRPSTDTITVMV ENSHGLRVT KI EVYMPSSIFQDDFVIPDISEPGTWTΑSARFSDGLESNSSTQFEVKKYVL PNFE VKITPGKPYILTVPGHLDEMQLDIQARYΓYGKPVQGV A YVRFGLLDEDGKKTFFR GLESQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAHESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGΓPVKVSATVSSPGSVP EVQDIQQNTDGSGQVSΓPΠΓPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSΓERPD SRPPRVGDTLNLNLRAVGSGATFSMNYMILSRGQIVFMNREPLS^TLTSVSVFVDHHLA PSFYFVAFYYHGDHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSRKRLSCPKLKTTRKKRJSΓVNTQKAINEKLGQYASPTAKRCCQDGVTR LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEILQEEDLED EDDIPVRSFFPENWLWRVETVDRFQILTLWLPDSLTTWEΓHGLSLSKTKGLCVATPVQL R REFHLHLRLPMSVRRFEQLELRPVLYNYLDK-NL SVHVSPVEGLCLAGGGGLAQ QVLWAGSARPVAFSVVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQFFIKEGAIHREEL VYELΝPLDHRGRTLEΓPGΝSDPΝMIPDGDFΝSYVRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQTMIYLAPTLA SRYLDKTEQWSTLPPETKDHAVDLIQKGYMRIQQFRK ADGSYAAWLSRDSSTWXTAFVLKVLSLAQEQVGGSPEKLQETSΝWLLSQQQADGSFQ DPCPVLDRSMQGGLVGΝDETVALTAFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS 
FLGEKASAGLLGAHAAAITAYALTLTKAPADLRGVAJHΝΝLMAMAQETGDΝLYWGSV TGSQSΝAVSPTPAPRΝPSDPMPQAPALWTETTA YALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQDTVIALDALS AYWIASHTTEERGLΝNTLSSTGRΝGFKSHALQLΝΝRQ IRGLEEELQFSLGSKIΝVKVGGΝSKGTLKVLRTYΝNLDMKΝTTCQDLQFFIVTVKGHVE
YTMEAΝEDYED ΕYDELPAKDDPDAPLQPVTPLQLFEGRRΝRRRREAPK corresponding to amino acids 1 - 1457 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1457 of HSCOC4_PEA_1_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
AERQGGAVWHGHRGRHPPEWIPRPAC corresponding to amino acids 1458 - 1483 of HSCOC4_PEA_1_P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence AERQGGAVWHGHRGRHPPEWIPRPAC in HSCOC4_PEA_1_P16.
It should be noted that the known protein sequence (CO4_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for CO4_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 25 - Changes to CO4_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because of manual inspection of known protein localization and/or gene structure. Variant protein HSCOC4_PEA_1_P16 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 26 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 26 - Amino acid mutations
Variant protein HSCOC4_PEA_1_P16 is encoded by the following transcript(s): HSCOC4_PEA_1_T15, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCOC4_PEA_1_T15 is shown in bold; this coding portion starts at position 501 and ends at position 4949. The transcript also has the following SNPs as listed in Table 27 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 27 - Nucleic acid SNPs
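The coding coordinates quoted for the T15 transcript can be cross-checked against the length of the P16 variant with simple arithmetic. The sketch below is illustrative only and is not part of the application. It assumes that transcript positions are 1-based and inclusive, and that the quoted span does not include the stop codon; both points are inferred from the numbers rather than stated in the text. The function name `codons_in_span` is hypothetical.

```python
def codons_in_span(start: int, end: int) -> int:
    """Return the number of complete codons in a 1-based, inclusive span.

    Assumption (not stated in the source): the span holds only
    amino-acid-coding codons, so its length must be a multiple of 3.
    """
    length = end - start + 1
    if length <= 0:
        raise ValueError("end must not precede start")
    if length % 3 != 0:
        raise ValueError("span length is not a whole number of codons")
    return length // 3


# Coordinates quoted in the text for the T15 transcript.
t15_codons = codons_in_span(501, 4949)

# The last residue cited for the P16 variant is amino acid 1483.
p16_length = 1483
consistent = t15_codons == p16_length
```

Under those assumptions the span 501 to 4949 covers 4449 nucleotides, that is 1483 codons, which agrees with the final residue position (1483) cited for the P16 variant.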
Variant protein HSCOC4_PEA_1_P20 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCOC4_PEA_1_T20. An alignment is given to the known protein (Complement C4 precursor [Contains: C4a anaphylatoxin]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCOC4_PEA_1_P20 and CO4_HUMAN_V1: 1. An isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P20, comprising a first amino acid sequence being at least 90% homologous to
MRLLWGLTWASSFFTLSLQKPRLLLFSPSNVHLGVPLSVGVQLQDVPRGQNVKGSVFLR ΝPSP^WVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTMQGIΝLLFSSRRGHLFLQTDQPIYΝPGQRVRYRVFALDQKMRPSTDTITVMV EΝSHGLRVRK.KEVYMPSSIFQDDFVTPDISEPGTWKISARFSDGLESΝSSTQFEVKKYVL PNFEVKITTGKPYILT GIILDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFR GLESQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGIPVKVSATVSSPGSVP EVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSTERPD SRPPRVGDTLNLNLRAVGSGATFSITYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLA PSFYFVAFYYHGDHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSPJKRLSCPKEKTTRKKRNΛTNFQKAINEKLGQYASPTAKRCCQDGVTR LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEILQEEDLLD EDDIPVRSFFPENWLWRVETVDRFQΓLTLWLPDSLTTWEIHGLSLSKTKGLCVATPVQL RVFREFFLLHLRLPMSVRRFEQLELRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQ QVLVPAGSARPVAFSVΛ TAAAAVSLKVVARGSFEFPVGDAVSKVLQFFIKEGAIHREEL VYELNPLDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQTMΓYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQKGYMRIQQFRK AIDGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSNWLLSQQQADGSFQ DPCPVLDRSMQGGLVGNDETVALTAFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS FLGEKASAGLLGAIIAAAITAYALTLTKAPADLRGVAITNNLMAMAQETGDNLN VGSV TGSQSNAVSPTPAPRNPSDPMPQAPALWTETTA YALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQ corresponding to amino acids 1 - 1303 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1303 of HSCOC4_PEA_1_P20, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VGAVPGLWRGWWLRPRACLSPGSTSLGHGDCPGCPVCLLDCLPHH corresponding to amino acids 1304 - 1349 of HSCOC4_PEA_1_P20, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P20, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VGAVPGLWRGWWLRPRACLSPGSTSLGHGDCPGCPVCLLDCLPHH in HSCOC4_PEA_1_P20.
It should be noted that the known protein sequence (CO4_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for CO4_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 28 - Changes to CO4_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCOC4_PEA_1_P20 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 29 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P20 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 29 - Amino acid mutations
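The rationale given for the "secreted" call (both signal-peptide predictors positive, neither trans-membrane predictor positive) amounts to a simple decision rule. The following is a minimal sketch of that rule, assuming each program's output has already been reduced to a yes/no value. The class and function names are hypothetical, and the identity of the second signal-peptide program and of the trans-membrane programs, as well as their thresholds, is not specified in this section.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LocalizationCalls:
    """Boolean calls from the prediction programs (hypothetical container)."""
    signal_peptide: Tuple[bool, bool]   # one entry per signal-peptide program
    transmembrane: Tuple[bool, bool]    # one entry per trans-membrane program


def is_predicted_secreted(calls: LocalizationCalls) -> bool:
    """Apply the rule described in the text.

    Secreted only if every signal-peptide program reports a signal peptide
    and no trans-membrane program reports a trans-membrane region.
    """
    return all(calls.signal_peptide) and not any(calls.transmembrane)


# The situation described for this variant: both positive, neither TM.
described_case = LocalizationCalls(signal_peptide=(True, True),
                                   transmembrane=(False, False))
secreted = is_predicted_secreted(described_case)
```

A variant for which the programs disagree would fall outside this rule; the text handles some such variants by manual inspection instead, which this sketch does not model.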
Variant protein HSCOC4_PEA_1_P20 is encoded by the following transcript(s): HSCOC4_PEA_1_T20, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCOC4_PEA_1_T20 is shown in bold; this coding portion starts at position 501 and ends at position 4547. The transcript also has the following SNPs as listed in Table 30 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P20 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 30 - Nucleic acid SNPs
Variant protein HSCOC4_PEA_1_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCOC4_PEA_1_T21. An alignment is given to the known protein (Complement C4 precursor [Contains: C4a anaphylatoxin]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCOC4_PEA_1_P9 and CO4_HUMAN_V1: 1. An isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P9, comprising a first amino acid sequence being at least 90% homologous to
MRLLWGLI ASSFFTLSLQKP^LLFSPSVVHLGVPLSVGVQLQDVPRGQVVKGSVFLR NPSP^NVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTMQGINLLFSSRRGHLFLQTDQPTYNPGQRVRYR ALDQKMRPSTDTITVMV ENSHGLRVRKKΕW^PSSIFQDDFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYNL PΝFEVKITPGKPYILTVPGHLDEMQLDIQARYIYGKPVQGVA YVRFGLLDEDGKKTFFR GLESQTKLVΝGQSFFLSLSKAEFQDALEKLΝ VIGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLWGAPFLLQALVRΕMSGSPASGIPVKVSATVSSPGSVP EVQDIQQΝTDGSGQVSIPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSLERPD SRPPRVGDTLΝLΝLRAVGSGATFSHYYYMILSRGQI MΝREPKRTLTSVS WVDIIHLA PSFYFVAFYYHGDHPV AΝSLRVDVQAGACEGKLELSVDGAKQYRΝGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLΝMGKVFEAMΝSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSPXRLSCPKEKTTR I^I^ΝNΝFQL LΝ^KLGQY LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEILQEEDLED EDDIPVRSFFPENWLWRVETVDRFQΓLTLWLPDSLTTWELHGLSLSKTKGLCVATPVQL RVFREFFILHLRLPMSVRRFEQLELRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQ QVLVPAGSARPVAFSWPTAAAAVSLKWARGSFEFPVGDAVSKVLQIEKEGAIHREEL VYELNPLDHRGRTLELPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGVASL
LP PRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQKG TVTRIQQFRK ADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSNWLLSQQQADGSFQ DPCPVLDRSMQGGLVGNDE1NALTAFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS FLGEKASAGLLGAHAAAITAYALTLTKAPADLRGVAHΝΝLMAMAQETGDΝLYWGSV TGSQSΝAVSPTPAPRΝPSDPMPQAPALWTETTA YALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQDTVLALDALSAYWIASHTTEERGLΝNTLSSTGRΝGFKSHALQLΝΝRQ IIlGLEEELQFSLGSKINVKVGGNSKGTLKVLRT 'NVLDMKNTTCQDLQrEVTVKGHVE YTMEANEDYED ^YDELPAKDDPDAPLQPVTPLQLFEGPI^N RRREAPKVNEEQESRV m^TVCIWPJ^GKVGLSGMAIADVTLLSGFHALRADLEKLTSLSDRYVSHFETEGPHVLL YFDSV corresponding to amino acids 1 - 1529 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1529 of HSCOC4_PEA_1_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SGER corresponding to amino acids 1530 - 1533 of HSCOC4_PEA_1_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SGER in HSCOC4_PEA_1_P9.
It should be noted that the known protein sequence (CO4_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for CO4_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 31 - Changes to CO4_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCOC4_PEA_1_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 32 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 32 - Amino acid mutations
Variant protein HSCOC4_PEA_1_P9 is encoded by the following transcript(s): HSCOC4_PEA_1_T21, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCOC4_PEA_1_T21 is shown in bold; this coding portion starts at position 501 and ends at position 5099. The transcript also has the following SNPs as listed in Table 33 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 33 - Nucleic acid SNPs
Variant protein HSCOC4_PEA_1_P22 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCOC4_PEA_1_T25. An alignment is given to the known protein (Complement C4 precursor [Contains: C4a anaphylatoxin]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCOC4_PEA_1_P22 and CO4_HUMAN_V1: 1. An isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P22, comprising a first amino acid sequence being at least 90% homologous to
MPULWGLRWASSFFTLSLQKPRLLLFSPSVVFFLGVPLSVGVQLQDVPRGQVNKGSVFLR ΝPSRΝ^VPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTMQGIΝLLFSSRRGHLFLQTDQPNTWGQRVRYR ALDQKMRPSTDTITVMV EΝSHGLRVRK.KEW^MPSSITQDDFVIPDISEPGTWIOSARFSDGLESΝSSTQFEVKKYNL PΝFEVIGTPGK^YILT GIILDEMQLDIQARYIYGKPVQGVAYNRFGLLDEDGKKTFFR GLESQTKLVΝGQSHISLSKAEFQDALEKLΝMGITDLQGLRLYVAAAIΓESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHL GAPFLLQALVREMSGSPASGΓPVKVSATVSSPGSVP EVQDIQQΝTDGSGQVSΓPIIΓPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSΓERPD SRPPRVGDTLΝLΝLRAVGSGATFSHYYYMILSRGQIVFMΝREPKRTLTSVSVFVDHHLA PSFYFVAFYYHGDHPVAΝSLRVDVQAGACEGKLELSVDGAKQYRΝGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLΝMGKVFEAMΝSYDLGCGPGGGDSALQVFQAAG L AFSDGDQ WTLSPI I .SCPKEKTTPJXRΝVΝFQKAL EKLGQ Y ASPTAKRCCQDG VTR LPMMRSCEQRAARVQQPDCPJ5PFLSCCQFAESLRKKSRDKGQAGLQRALEILQEEDLID EDDIPVRSFFPENWLWRVETVDRFQΓLTLWLPDSLTTWEΓHGLSLSKTKGLCVATPVQL RVFREFHLHLRLPMSVRRFEQLELRPVLYNYΓLDKNLTVSVHVSPVEGLCLAGGGGLAQ QVLWAGSAPJ>VAFSV TAAAAVSLKWARGSFEFPVGDAVSKVLQFFIKEGAIHREEL VYELNPLDHRGRTLEIPGNSDPNMTPDGDFNSYVRVTASDPLDTLGSEGALSPGGVASL LP PRGCGEQTMΓΠAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQKGYMRIQQFRK ADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSN LLSQQQADGSFQ DPCPVLDRSMQGGLVGNDETVALTAFVTIALHHGLA QDEGAEPLKQRVEASISKASS FLGEKASAGLLGAHAAAITAYALTLTKAPADLRGVAHNNLMAMAQETGDNLYWGSV TGSQSNAVSPTPAPRNPSDPMPQAPALWIETTA YALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQDTVIALDALSAYWIASHTTEERGLNNTLSSTGRNGFKSHALQLNNRQ IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIEVTVKGHVE YTMEARØDYEDYE\TJELPAKDDPDAPLQPVTPLQLFEGRR U^RRREAPKWEEQESRV HYTVCIWRΝGKVGLSGMAIADVTLLSGFHALRADLEKLTSLSDRYVSHFETEGPFTVLL YFDSVPTSRECVGFEAVQEWVGLVQPASATLYD\ YΝPERRCSVFYGAPSKSRLLATLC SAEVCQCAEGKCPRQRRALERGLQDEDGYRMKFACYYPRVEYGFQVKVLREDSRAAF RLFETKITQVLHF corresponding to amino acids 1 - 1653 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1653 of HSCOC4_PEA_1_P22, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a
polypeptide having the sequence SMKQTGEAGRAGGRQGG corresponding to amino acids 1654 - 1670 of HSCOC4_PEA_1_P22, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P22, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SMKQTGEAGRAGGRQGG in HSCOC4_PEA_1_P22.
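The "contiguous and in a sequential order" requirement means the unique tail begins at the residue immediately after the shared head, so the tail coordinates follow directly from the head length and the tail sequence. The sketch below illustrates this for the P22 variant using only the numbers and the tail sequence quoted in the text. The function name `tail_coordinates` is hypothetical, and positions are assumed to be 1-based and inclusive.

```python
from typing import Tuple


def tail_coordinates(head_length: int, tail: str) -> Tuple[int, int]:
    """Return (first, last) 1-based positions of a tail appended to a head.

    The tail is assumed to start immediately after the head, with no gap
    or overlap, as the "contiguous and in a sequential order" wording implies.
    """
    if head_length < 0 or not tail:
        raise ValueError("head_length must be >= 0 and tail must be non-empty")
    first = head_length + 1
    last = head_length + len(tail)
    return first, last


# Values quoted in the text for the P22 variant.
P22_HEAD_LENGTH = 1653                 # residues shared with the known protein
P22_TAIL = "SMKQTGEAGRAGGRQGG"         # unique tail sequence

p22_span = tail_coordinates(P22_HEAD_LENGTH, P22_TAIL)
```

With a 1653-residue head and this 17-residue tail, the computed span is 1654 to 1670, the same range the comparison report gives.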
It should be noted that the known protein sequence (CO4_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for CO4_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 34 - Changes to CO4_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCOC4_PEA_1_P22 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 35 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P22 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 35 - Amino acid mutations
Variant protein HSCOC4_PEA_1_P22 is encoded by the following transcript(s): HSCOC4_PEA_1_T25, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCOC4_PEA_1_T25 is shown in bold; this coding portion starts at position 501 and ends at position 5510. The transcript also has the following SNPs as listed in Table 36 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P22 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 36 - Nucleic acid SNPs
Variant protein HSCOC4_PEA_1_P23 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCOC4_PEA_1_T28. An alignment is given to the known protein (Complement C4 precursor [Contains: C4a anaphylatoxin]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCOC4_PEA_1_P23 and CO4_HUMAN_V1: 1. An isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P23, comprising a first amino acid sequence being at least 90% homologous to
MP LWGLIWASSFFTLSLQKPRLLLFSPSVNTTLGVPLSVGVQLQDVPRGQNVKGSVFLR ΝPSPJ^ΝΛΦCSPKVDFTLSSERDF.ΛLLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTT^IQGINLLFSSRRGHLFLQTOQPIYNPGQR\TIYRWALDQKMRPSTDTITVMV ENSHGLRVRKI^VYMPSSIFQDDFVIPDISEPGTWT SARFSDGLESNSSTQFEVKKYNL PΝFEVKITPGKPYILT VPGHLDEMQLDIQARYTYGKPVQGVANVRFGLLDEDGKKTFFR GLESQTKLVΝGQSHISLSKAEFQDALEKLΝMGITDLQGLRLYVAAAIΓESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLWGAPFLLQALVREMSGSPASGIPVKVSATVSSPGSVP EVQDIQQΝTDGSGQVSIPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSΓERPD SI^PRVGDTLΝLΝLRAVGSGATFSM^YYTVIILSRGQIVFMΝREPKRTLTSVS VDHHLA PSFYFVAFYYHGDHPVAΝSLRVDVQAGACEGKLELSVDGAKQYRΝGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLΝMGKVFEAMΝSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSPJ RLSCPKΕKTTRKKRNNNFQKAINE H.GQYASPTAKRCCQDGV LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALETLQEEDLID EDDIPVRSFFPENΛVLWRVETVDRFQTLTLWLPDSLTTWEIHGLSLSKTKGLCVATPVQL RWREFHLHLRLPMSVRRFEQLELRPVLYNYLDKJ LTVSVHVSPVEGLCLAGGGGLAQ QVLWAGSARPVAFSVΛ TAAAAVSLKVVARGSFEFPVGDAVSKVLQFFIKEGAIHREEL VYELNPLDHRGRTLETPGNSDPMIRPDGDFNSYNRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKOHAVDLIQKGYMRIQQFRK ADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSΝWLLSQQQADGSFQ DPCPVLDRSMQGGLVGΝDETVALTAFVTIALHHGLAVFQDEGAEPLKQRVΈASISKASS FLGEKASAGLLGAHAAAITAYALTLTKAPADLRGVAHΝΝLMAMAQETGDΝLYWGSV TGSQSNA VSPTPAPRNPSDPMPQ APALWIETTA YALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ I GLEEELQFSLGSKINVKVGGNSKGTLKVLR^Y^^^^DMKNTTCQDLQffiVTVKGL^NE Y^ EAΝEDYEDYEYDELPAK^DPDAPLQPVTPLQLFEGPI^ RRREAPKVVEEQESRV HYTNCIWRNGKNGLSGMAIADVTLLSGFIIALRADLEKLTSLSDRYNSHFETEGPHVLL YFDSVPTSRECVGFEAVQEVPVGLVQPASATLYDYYΝPERRCSVFYGAPSKSRLLATLC SAEVCQCAEGKCPRQRRALERGLQDEDGYRMKFACYYPRVEYG corresponding to amino acids 1 - 1626 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1626 of HSCOC4_PEA_1_P23, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the
sequence QSSHRGPGLTLPRGPAVLVSLGVACSSYRSCTQPVCSDTNFLPSQPQSNSPFPLLLTPS corresponding to amino acids 1627 - 1685 of HSCOC4_PEA_1_P23, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P23, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence QSSHRGPGLTLPRGPAVLVSLGVACSSYRSCTQPVCSDTNFLPSQPQSNSPFPLLLTPS in HSCOC4_PEA_1_P23.
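The graded thresholds ("at least 70%" up to "at least about 95% homologous") can be made concrete with a rough numerical illustration. The sketch below computes ungapped percent identity between a candidate peptide and the P23 tail quoted in the text. This is a deliberate simplification: how the application defines and measures homology (for example, which alignment program and scoring parameters) is not stated in this section, so plain identity over equal-length sequences is an assumption made for illustration. The function name and the candidate sequence are hypothetical.

```python
def percent_identity(candidate: str, reference: str) -> float:
    """Percentage of positions at which two equal-length peptides match.

    No gaps and no substitution matrix are used; this is a stand-in for
    whatever homology measure the application actually relies on.
    """
    if not reference or len(candidate) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


# Tail sequence quoted in the text for the P23 variant.
P23_TAIL = "QSSHRGPGLTLPRGPAVLVSLGVACSSYRSCTQPVCSDTNFLPSQPQSNSPFPLLLTPS"

# Hypothetical candidate: the same tail with its first residue changed to A.
candidate = "A" + P23_TAIL[1:]

identity = percent_identity(candidate, P23_TAIL)
meets_95_percent = identity >= 95.0
```

On a tail of this length a single substitution still leaves the candidate above the 95% level, while three substitutions would bring it below 95% under this simplified measure.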
It should be noted that the known protein sequence (CO4_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for CO4_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 37 - Changes to CO4_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because of manual inspection of known protein localization and/or gene structure. Variant protein HSCOC4_PEA_1_P23 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 38 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P23 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 38 - Amino acid mutations
Variant protein HSCOC4_PEA_1_P23 is encoded by the following transcript(s): HSCOC4_PEA_1_T28, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCOC4_PEA_1_T28 is shown in bold; this coding portion starts at position 501 and ends at position 5555. The transcript also has the following SNPs as listed in Table 39 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P23 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 39 - Nucleic acid SNPs
Variant protein HSCOC4_PEA_1_P24 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCOC4_PEA_1_T30. An alignment is given to the known protein (Complement C4 precursor [Contains: C4a anaphylatoxin]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCOC4_PEA_1_P24 and CO4_HUMAN_V1: 1. An isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P24, comprising a first amino acid sequence being at least 90% homologous to MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVLILG LSVGVQLQD RGQVNKGSVFLR ΝPSRΝΝVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTMQGIΝLLFSSRRGHLFLQTDQPΓV ^GQRVRYR ALDQKMRPSTDTITVMV EΝSHGLRVRKI^WTviPSSIFQDDFVlPDISEPGTWKISARFSDGLESΝSSTQFEVKKYVL PΝFEVKITPGKPYΓLTVPGHLDEMQLDIQARYTYGKPVQGV A YVRFGLLDEDGKKTFFR GLESQTKLVΝGQSHISLSKAEFQDALEKLΝMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSW VSSPFSLDLSKTKPVHL GAPFLLQALVREMSGSPASGTPVKVSATVSSPGSVP EVQDIQQΝTDGSGQVSΓPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSTERPD SRPPRVGDTLΝLΝLRAVGSGATFSHY^YYΛIILSRGQIVFM U^PKRTLTSVSVFVDHHLA PSFYFVAFYYHGDHPVAΝSLRVDVQAGACEGKLELSVDGAKQYRΝGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLΝMGKVFEAMΝSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWIXSRKRLSCPKEKTTRKKIWVOT^
LPMMRSCEQPVAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALETLQEEDLTD EDDIPVRSFFPENWLWRVETNDRFQILTLWLPDSLTTWERHGLSLSKTKGLCVATPVQL RWREFFTLHLRI.PMSVT RFEQLELRPVLYT^LDKΝLTVSV VSPVEGLCLAGGGGLAQ QVL AGSARPVAFSVΛ TAAAAVSLKVVARGSFEFPVGDAVSKVLQFFIKEGAIHREEL VYELNPLDHRGRTTEΓPGNSDPNMΓPDGDFNSYVRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQKGYMRIQQFRK ADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSNΛVXLSQQQADGSFQ DPCPVLDRSMQGGLVGNDETVAUTAFVTIALHHGLAVTQDEGAEPLKQRVEASISKASS FLGEKAS AGLLG AHAAAITA YALTLTKAPADLRG VAHNNLMAMAQETGDNLY GS V TGSQSNAVSPTPAPRNPSDPMPQAPALWΓETTAYALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQDTNIALDALSAY IASHTTEERGLNNTLSSTGRNGFKSHALQLNNRQ IRGLEEELQFSLGSK^ ^VKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQFFIVTVKGH ^ YTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGPJWRRRREAPKVVEEQESRV HYTVC1 WRNGKVGLSGMAIAD VTLLSGFLLΛLRADLEKLTSLSDRWSHFETEGPHVLL
YFDS corresponding to amino acids 1 - 1528 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1528 of HSCOC4_PEA_1_P24, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SADVLCFTGHQVRADSWPPCVLLKSASVLRGSALASVAPWSGVCRTRMATG corresponding to amino acids 1529 - 1579 of HSCOC4_PEA_1_P24, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P24, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SADVLCFTGHQVRADSWPPCVLLKSASVLRGSALASVAPWSGVCRTRMATG in HSCOC4_PEA_1_P24.
It should be noted that the known protein sequence (CO4_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for CO4_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 40 - Changes to CO4_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCOC4_PEA_1_P24 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 41 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P24 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 41 - Amino acid mutations
Variant protein HSCOC4_PEA_1_P24 is encoded by the following transcript(s): HSCOC4_PEA_1_T30, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCOC4_PEA_1_T30 is shown in bold; this coding portion starts at position 501 and ends at position 5237. The transcript also has the following SNPs as listed in Table 42 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P24 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 42 - Nucleic acid SNPs
Variant protein HSCOC4_PEA_1_P25 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCOC4_PEA_1_T31. An alignment is given to the known protein (Complement C4 precursor [Contains: C4a anaphylatoxin]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCOC4_PEA_1_P25 and CO4_HUMAN_V1: 1. An isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P25, comprising a first amino acid sequence being at least 90% homologous to MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ WKGSVFLR NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTΠJIQGΠ^LFSSRRGHLFLQTDQPIYNPGQRVRYRVFALDQKMRPSTDTITVMV ENSHGLRVRKI Ε\^YMPSSIFQDDFVLPDISEPGTWKISARFSDGLESNSSTQFE ^KKYVL PNFEVKITPGKPYILTVPGHLDEMQLDIQARYTΪ^GKPVQGV A YVRFGLLDEDGKKTFFR GLESQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIΓESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGΓPVKVSATVSSPGSVP EVQDIQQNTDGSGQVSΓPΠIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD SRPPRVGDTLNLNLRAVGSGATFS ΥYMILSRGQRVFMNREPKRTLTSVS VDHHLA PSF TVAFYYHGDHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSRKI^SCPKEKTTRKKIWVNFQKAINEKLGQYASPTAKRCCQDGVTR LPMMRSCEQRAARVQQPDCPJEPFLSCCQFAESLRKKSRDKGQAGLQRALEΓLQEEDLΓD EDDIPVRSFFPENWLWRVETVDRFQILTLWLPDSLTTWETHGLSLSKTKGLCVATPVQL RWREFHLHLRLPMSVRRFEQLELRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQ QVLVPAGSARPVAFSVNPTAAAAVSLKVVARGSFEFPVGDAVSKVLQFFIKEGAIHREEL VYELΝPLDHRGRTLEΓPGΝSDPΝMΓPDGDFΝSYVRVTASDPLDTLGSEGALSPGGVASL LPXLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTUPPETKDHAVDLIQKGYMRIQQFRK ADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSΝ LLSQQQADGSFQ DPCPVLDRSMQGGLVGΝDETVALTAFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS
FLGEKASAGLLGAHAAAITAYALTLTKAPADLRGVAHΝΝLMAMAQETGDΝLYWGSV TGSQSNA VSPTPAPRNPSDPMPQAPALWIETTA YALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQDWIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ LT<GLEEELQFSLGSKI ^KNGGNSKGTLKVLRT "NNLD IK TTCQDLQFFIVTVKGHVE YTMEANED\ΕDYEYDELPAKDDPDAPLQPVTTLQLFEGPΧIWRRRREAPKVVEEQESRV HYWCTWRNGKVGLSGMAIADVTLLSGFHALRADLEKLTSLSDRYVSHFETEGPHVLL YFDSVPTSRECVGFEAVQEVPVGLVQPASATLYDYYNPERRCSVFYGAPSKSRLLATLC
SAEVCQCAEG corresponding to amino acids 1 - 1593 of C04_HUMAN_V1, which also corresponds to amino acids 1 - 1593 of HSCOC4_PEA_1_P25, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ETEGLGRGSGGGMAGAPPTLSDGFPNFREVPSPASRPGAGSAGRGWLQDEVCLLLPPC GVRLPG corresponding to amino acids 1594 - 1657 of HSCOC4_PEA_1_P25, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P25, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
ETEGLGRGSGGGMAGAPPTLSDGFPNFREVPSPASRPGAGSAGRGWLQDEVCLLLPPC GVRLPG in HSCOC4_PEA_1_P25.
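The residue ranges stated for the head and tail of variant P25 can be cross-checked against the printed tail sequence. The following is a minimal sketch in Python; the helper name `range_length` and the variable names are illustrative assumptions, not part of the application, and the tail string is the sequence printed above with the line-wrap space removed.

```python
def range_length(first: int, last: int) -> int:
    """Number of residues in an inclusive, 1-based residue range."""
    return last - first + 1


# Tail of variant P25 as printed in the text (wrap space removed).
TAIL_P25 = "ETEGLGRGSGGGMAGAPPTLSDGFPNFREVPSPASRPGAGSAGRGWLQDEVCLLLPPCGVRLPG"

head_len = range_length(1, 1593)      # head shared with the known protein
tail_len = range_length(1594, 1657)   # unique tail of the variant

# The printed tail should contain exactly as many residues as its range.
tail_matches_range = len(TAIL_P25) == tail_len

# Head and tail are contiguous, so their lengths sum to the variant length.
total_len = head_len + tail_len
```

The same check can be repeated for any other variant by substituting its stated ranges and tail sequence.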
It should be noted that the known protein sequence (C04_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for C04_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 43 - Changes to C04_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCOC4_PEA_1_P25 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 44 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P25 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 44 - Amino acid mutations
Variant protein HSCOC4_PEA_1_P25 is encoded by the following transcript(s): HSCOC4_PEA_1_T31, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCOC4_PEA_1_T31 is shown in bold; this coding portion starts at position 501 and ends at position 5471. The transcript also has the following SNPs as listed in Table 45 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P25 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 45 - Nucleic acid SNPs
Variant protein HSCOC4JPEA_l JP26 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCOC4JPEA_l JT32. An alignment is given to the known protein (Complement C4 precursor [Contains: C4a anaphylatoxin]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCOC4_PEA_l_P26 and C04 JHUMAN JV1 : l.An isolated chimeric polypeptide encoding for HSC0C4JPEA_1JP26, comprising a first amino acid sequence being at least 90 % homologous to MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQWKGSVFLR NPSRNNVPCSPKVDFTLSSERDFALLSLQ LKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTT^IQGINLLFSSI^GHLFLQTDQPIYHFFGQRVRYRVFALDQKMRPSTDTITVMV ENSHGLRVRKI ΕVYMPSSIFQDDFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVL PNFEVKITPGKPYILTVPGHLDEMQLDIQARYIYGKPVQGVAYΛTVJGLLDEDGKKTFFR GLESQTKLVNGQSFFLSLSKAEFQDALEKLNMGITDLQGLPJLWAAALTESPGGEMEEAE LTSW\TVSSPFSLDLSKTKIVIILVPGAPFLLQALVREMSGSPASGIPVKVSATVSSPGSVP EVQDIQQNTDGSGQVSTPIIΓPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD SRPPRVGDTLNLNLRAVGSGATFSHY ^MILSRGQIWMNTEPKRTLTSVSVFVDHHLA PSFYFVAFYYHGDHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSRKRLSCPKEKTTPJ KR2 NFQKAI LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEΓLQEEDLΓD EDDIPVRSFFPENWLWRVETVDRFQILTLWLPDSLTTWEIHGLSLSKTKGLCVATPVQL R REFHLHLRLPMS\Tvi^EQLELRPVLY T.DKNLTNSVHVSPVEGLCLAGGGGLAQ QVLVPAGSARPVAFSVWTAAAAVSLKVVARGSraFPVGDAVSKVLQffiKEGAlΗREEL
VYELΝPLDHRGRTLEΓPGΝSDPΝMΓPDGDFΝSYVRVTASDPLDTLGSEGALSPGGVASL LPVLPRGCGEQTMIYLAPTLAASRYLDKTΕQWSTLPPETKDHAVDLIQKGYMRIQQFRK ADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSΝWLLSQQQADGSFQ DPCPΛ^DRSMQGGLVGΝDETVALTAFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS FLGEKASAGLLGAHAA TAYALTLTKAPADLRGVAHΝΝLMAMAQETGDΝLYWGSV TGSQSΝAVSPTPAPRΝPSDPMPQ APALWIETTA YALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQDTNIALDALSANWTASHTTEERGLΝNTLSSTGRΝGFKSHALQLΝΝRQ IRGLEEELQFSLGSKIΝVKVGGΝSKGTLKVLRTYΝΛ ,D <-ΝTTCQDLQIEVTVKGHVE YTMEANEDYEDYEYDELPAKDDPDAPLQPVTTLQLFEGRRNPPXRREAPKVNEEQESRV FIYTVCIWRΝGKVGLSGMAIADVTLLSGFHALRADLEKLTSLSDRYVSHFETEGPHVLL YFDSVPTSRECVGFEAVQEVPVGLVQPASATLYDYYΝPERRCSVFYGAPSKSRLLATLC
SAEVCQCAEG corresponding to amino acids 1 - 1593 of C04_HUMAN_V1, which also corresponds to amino acids 1 - 1593 of HSCOC4_PEA_1_P26, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ETEGLGRGSGGGMAGAPPTLSDGFPNFREVPSPASRPGAGSAGRGWLQDEVCLLLPPC GVRSVFPPRPWPDPPSGTGCFGLSGCSLLLLQVMHAACLL corresponding to amino acids 1594 - 1691 of HSCOC4_PEA_1_P26, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P26, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
ETEGLGRGSGGGMAGAPPTLSDGFPNFREVPSPASRPGAGSAGRGWLQDEVCLLLPPC GVRSVFPPRPWPDPPSGTGCFGLSGCSLLLLQVMHAACLL in HSCOC4_PEA_1_P26.
It should be noted that the known protein sequence (C04_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for C04_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 46 - Changes to C04_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCOC4_PEA_1_P26 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 47 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P26 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 47 - Amino acid mutations
Variant protein HSCOC4_PEA_1_P26 is encoded by the following transcript(s): HSCOC4_PEA_1_T32, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCOC4_PEA_1_T32 is shown in bold; this coding portion starts at position 501 and ends at position 5573. The transcript also has the following SNPs as listed in Table 48 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P26 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 48 - Nucleic acid SNPs
Variant protein HSCOC4_PEA_1_P30 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCOC4_PEA_1_T40. An alignment is given to the known protein (Complement C4 precursor [Contains: C4a anaphylatoxin]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCOC4_PEA_1_P30 and C04_HUMAN_V3 (SEQ ID NO: 487): 1. An isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P30, comprising a first amino acid sequence being at least 90% homologous to MRLLWGLTWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ WKGSVFLR NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTT QGi LFSSPΛGFΪLFLQTOQPrYMJGQR YR ENSHGLRVRKKE^TvlPSSrFQDDFVIPDISEPGTWi SAPxFSDGLESNSSTQFEV KYΛ
PNFEVKITPGKPYILTNPGHLDEMQLDIQARYTYGKPVQGVA YVRFGLLDEDGKKTFFR GLESQTKLVΝGQSFFLSLSKAJEFQDALEKXΝMGITDLQGLRLYVA.AAIIESPGGEMEEAE LTSWYTΛ^SSPFSLDLSKTTV^ IILVPGAPFLLQALVREMSGSPASGLPVKVSATNSSPGSVP EVQDIQQΝTDGSGQVSRPIIRPQTISELQLSVSAGSPHPAIARLTNAAPPSGGPGFLSIERPD SRPPRVGDTLΝLΝLRAVGSGATFSFΓ ΎΎMILSRGQΓWMΝREPKRTLTSVSVFVDHHLA PSFYFVAFYΎHGDHPVAΝSLRVDVQAGACEGKLELSVDGAKQYRΝGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLΝMGKVFEAMΝSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSRKRLSCPKEKTTP <I<-IWVΝFQKAJΝΕKLGQYASPTAKIICCQDGVTR LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLPJΑ SRDKGQAGLQRALELLQEEDLID EDDTPVRSFFPENΛVXWRVETVDRFQTLTLWLPDSLTTWEIHGLSLSKTKGLCVATPVQL RWREFHLHLRLPMSVRRFEQLELRPVLYN DKNLTVSVHVSPVEGLCLAGGGGLAQ QVLWAGSARPVAFSVWTAAAAVSLKΛNARGSFEFPVGDAVSKVLQFFIKEGAIHREEL VYELNPLDHRGRTLEIPGNSDPNMIPDGDFNS YVRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQTMΓTAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQKGYMRIQQFRK ADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSNWLLSQQQADGSFQ DPCPVLDRSMQGGLVGNDETVALT.AFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS FLGEKASAGLLGAHAAAITAYALTLTKAPADLRGVAHNNLMAMAQETGDNLY GS coπesponding to amino acids 1 - 1232 of C04 JHUMAN JV3, which also coπesponds to amino acids 1 - 1232 of HSCOC4JPEA_l JP30, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RNPVRLLQPRAQMFCVLRGTK coπesponding to amino acids 1233 - 1253 of HSCOC4JPEA_l JP30, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2,An isolated polypeptide encoding for a tail of HSCOC4JPEA_1JP30, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RNPVRLLQPRAQMFCVLRGTK in HSCOC4JPEA_l JP30. 
It should be noted that the known protein sequence has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for C04_HUMAN_V3. These changes were previously known to occur and are listed in the table below.
Table 49 - Changes to C04_HUMAN_V3
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCOC4_PEA_1_P30 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 50 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P30 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 50 - Amino acid mutations
Variant protein HSCOC4_PEA_1_P30 is encoded by the following transcript(s): HSCOC4_PEA_1_T40, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCOC4_PEA_1_T40 is shown in bold; this coding portion starts at position 501 and ends at position 4259. The transcript also has the following SNPs as listed in Table 51 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P30 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 51 - Nucleic acid SNPs
Variant protein HSCOC4_PEA_1_P38 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCOC4_PEA_1_T2. An alignment is given to the known protein (Complement C4 precursor [Contains: C4a anaphylatoxin]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCOC4_PEA_1_P38 and C04_HUMAN: 1. An isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P38, comprising a first amino acid sequence being at least 90% homologous to
MPxLLWGLIΛVASSFFTUSLQKFPJ^LLFSPSVVHLGVPLSVGVQLQDVPRGQVNKGSVFLR
ΝPSRΝΝVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK
DSLSRTTΝIQGIΝLLFSSRRGHLFLQTDQPΓ/ΝPGQRVR\TI ALDQKMRPSTDTITVMV ENSHGLRVT KI I3VYMPSS1TQDDFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYNL PNFEVKITPGKPYILTVPGHLDEMQLDIQ ARYIYGKP VQGVAYVRFGLLDEDGKKTFFR GLESQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGTPVKVSATVSSPGSVP EVQDIQQNTDGSGQVSIPIΠPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSΓERPD SRPPRVGDTLNLNLRAVGSGATFSHYΎYMILSRGQI MNREPKRTLTSVS VDIIHLA PSFYFVAFYYHGDHPVANSLRVDVQAGACEGKLELS VDGAKQYRNGES VKLHLETDS
LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSRKRI^CPKEKTTRlva IlNNNFQKA EKLGQYASPTAKI CCQDGVTR LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEILQEEDLID EDDIPVRSFFPENWLWRVETVDRFQILTLWLPDSLTTWEIHGLSLSKTKG corresponding to amino acids 1 - 818 of C04_HUMAN, which also corresponds to amino acids 1 - 818 of HSCOC4_PEA_1_P38, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DVTLSGPQVTLLPFPCTPAPCSLCS corresponding to amino acids 819 - 843 of HSCOC4_PEA_1_P38, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P38, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DVTLSGPQVTLLPFPCTPAPCSLCS in HSCOC4_PEA_1_P38.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
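The localization reasoning stated above (every signal-peptide predictor reports a signal peptide, and no trans-membrane predictor reports a trans-membrane region) can be written as a simple decision rule. The sketch below is illustrative only: the function name `predict_localization` and the "undetermined" fallback are assumptions, since the text describes only the secreted case and does not name the individual prediction programs beyond SignalP.

```python
from typing import Sequence


def predict_localization(
    signal_peptide_calls: Sequence[bool],
    transmembrane_calls: Sequence[bool],
) -> str:
    """Combine boolean predictor outputs into a localization call.

    Returns "secreted" only when every signal-peptide predictor is positive
    and every trans-membrane predictor is negative; any other combination is
    reported as "undetermined" (a hypothetical fallback label).
    """
    if not signal_peptide_calls or not transmembrane_calls:
        return "undetermined"
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    return "undetermined"


# Two signal-peptide programs agree; neither trans-membrane program fires.
call = predict_localization([True, True], [False, False])
```

Requiring agreement between the predictors keeps the rule conservative: a single dissenting program is enough to withhold the "secreted" call.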
Variant protein HSCOC4_PEA_1_P38 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 52 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P38 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 52 - Amino acid mutations
The glycosylation sites of variant protein HSCOC4_PEA_1_P38, as compared to the known protein Complement C4 precursor [Contains: C4a anaphylatoxin], are described in Table 53 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 53 - Glycosylation site(s)
The phosphorylation sites of variant protein HSCOC4_PEA_1_P38, as compared to the known protein Complement C4 precursor [Contains: C4a anaphylatoxin], are described in Table 54 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 54 - Phosphorylation site(s)
Variant protein HSCOC4_PEA_1_P38 is encoded by the following transcript(s): HSCOC4_PEA_1_T2, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCOC4_PEA_1_T2 is shown in bold; this coding portion starts at position 501 and ends at position 3029. The transcript also has the following SNPs as listed in Table 55 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P38 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 55 - Nucleic acid SNPs
Variant protein HSCOC4_PEA_1_P39 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCOC4_PEA_1_T5. An alignment is given to the known protein (Complement C4 precursor [Contains: C4a anaphylatoxin]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCOC4_PEA_1_P39 and C04_HUMAN: 1. An isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P39, comprising a first amino acid sequence being at least 90% homologous to MRLLWGLI ASSFFTLSLQKPPLLLFSPSVNHLGVPLSVGVQLQDVPRGQNVKGSVFLR NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTMQGINLLFSSRRGHLFLQTDQPIYNPGQRVRYR ALDQKMI^^ ENSHGLRVRKTs£VYMPSSIFQDDFVIPDISEPGT \TαSAPvFSDGLESNSSTQFEVKKY^ PNFEVTαTPGKP TLTNPGHLDEMQLDIQARYTYGKPVQGVA YVRFGLLDEDGKKTFFR GLESQTKLVNGQSfflSLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQ corresponding to amino acids 1 - 387 of C04_HUMAN, which also corresponds to amino acids 1 - 387 of HSCOC4_PEA_1_P39, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSSRGEG corresponding to amino acids 388 - 394 of HSCOC4_PEA_1_P39, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P39, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSSRGEG in HSCOC4_PEA_1_P39.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCOC4_PEA_1_P39 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 56 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P39 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 56 - Amino acid mutations
The glycosylation sites of variant protein HSCOC4_PEA_1_P39, as compared to the known protein Complement C4 precursor [Contains: C4a anaphylatoxin], are described in Table 57 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 57 - Glycosylation site(s)
The phosphorylation sites of variant protein HSCOC4_PEA_1_P39, as compared to the known protein Complement C4 precursor [Contains: C4a anaphylatoxin], are described in Table 58 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 58 - Phosphorylation site(s)
Variant protein HSCOC4_PEA_1_P39 is encoded by the following transcript(s): HSCOC4_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCOC4_PEA_1_T5 is shown in bold; this coding portion starts at position 501 and ends at position 1682. The transcript also has the following SNPs as listed in Table 59 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P39 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 59 - Nucleic acid SNPs
Variant protein HSCOC4_PEA_1_P40 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCOC4_PEA_1_T7. An alignment is given to the known protein (Complement C4 precursor [Contains: C4a anaphylatoxin]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCOC4_PEA_1_P40 and C04_HUMAN: 1. An isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P40, comprising a first amino acid sequence being at least 90% homologous to MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQWKGSVFLR NPSRNNNPCSPKNDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTMQGIKLLFSSRRGHLFLQTDQPΓYT^GQRVRYR ALDQKMRPSTDTITVMV ENSHGLRVRKKE\ΥMPSSLFQDDFNRPDISEPGTWKISARFSDGLESNSSTQFEVKKY corresponding to amino acids 1 - 236 of C04_HUMAN, which also corresponds to amino acids 1 - 236 of HSCOC4_PEA_1_P40, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AGEWTEPHFPLKGRVPGRPGEAEYGHY corresponding to amino acids 237 - 263 of HSCOC4_PEA_1_P40, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P40, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence AGEWTEPHFPLKGRVPGRPGEAEYGHY in HSCOC4_PEA_1_P40.
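The graded thresholds above (at least 70%, 80%, 85%, 90%, 95% homologous) can be illustrated with a simple percent-identity calculation over the 27-residue tail given for amino acids 237 - 263. This is a minimal sketch under stated assumptions: it uses ungapped, position-by-position identity between equal-length sequences, which is a simplification and not necessarily how homology is determined in the application; the function name `percent_identity` and the candidate sequence (the tail with its last residue changed) are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions with identical residues (ungapped comparison)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length for this comparison")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Tail of variant P40 (amino acids 237 - 263) as given in the text.
TAIL_P40 = "AGEWTEPHFPLKGRVPGRPGEAEYGHY"

# Hypothetical candidate: identical except for the final residue (Y -> F).
candidate = "AGEWTEPHFPLKGRVPGRPGEAEYGHF"

identity = percent_identity(TAIL_P40, candidate)

# Which of the thresholds named in the text does the candidate satisfy?
thresholds_met = [t for t in (70, 80, 85, 90, 95) if identity >= t]
```

With one mismatch in 27 positions, the candidate shares 26 of 27 residues with the tail, so it clears every threshold listed in the text.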
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCOC4_PEA_1_P40 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 60 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P40 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 60 - Amino acid mutations
The glycosylation sites of variant protein HSCOC4_PEA_1_P40, as compared to the known protein Complement C4 precursor [Contains: C4a anaphylatoxin], are described in Table 61 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 61 - Glycosylation site(s)
The phosphorylation sites of variant protein HSCOC4_PEA_1_P40, as compared to the known protein Complement C4 precursor [Contains: C4a anaphylatoxin], are described in Table 62 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 62 - Phosphorylation site(s)
Variant protein HSCOC4_PEA_1_P40 is encoded by the following transcript(s): HSCOC4_PEA_1_T7, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCOC4_PEA_1_T7 is shown in bold; this coding portion starts at position 501 and ends at position 1289. The transcript also has the following SNPs as listed in Table 63 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P40 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 63 - Nucleic acid SNPs
Variant protein HSCOC4JPEA_l JP41 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCOC4JPEA_l JT8. An alignment is given to the known protein (Complement C4 precursor [Contains: C4a anaphylatoxin]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCOC4JPEA_l JP41 and C04 JHUMAN JV1 : l .An isolated chimeric polypeptide encoding for HSCOC4JPEA_l JP41, comprising a first amino acid sequence being at least 90 % homologous to MRLLWGLI ASSFFTLSLQKPRLLLFSPSVNHLGVPLSVGVQLQDVPRGQVVKGSVFLR ΝPSRΝΝNPCSPKNDFTLSSERDFALLSLQNPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTΝIQGIΝLLFSSRRGHLFLQTDQPI YNPGQRVRYRVFALDQKMRPSTDTITVMV ENSHGLRVRI^KEVYMPSSIFQDDFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVL PNFEVKITPGKPYTLTVPGHLDEMQLDIQARYIYGKPVQGV A YVRFGLLDEDGKKTFFR GLESQTKLλ^NGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGTPVKVSATVSSPGSVP EVQDIQQNTDGSGQVSIPIITPQTISELQLSVSAGSPHPAIARLTNAAPPSGGP GFLSIERPD SRPPRVGDTLΝL^π:RAVGSGATFSmn(^NIILSRGQIVFMΝREPK TLTSVSVF\ HHLA PSFYFVAFY ΗGDHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSR10LLSCPKEKTTPΧKI<^NVNFQKAINEKLGQYASPTAKRCCQDGVTR LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEΓLQEEDLΓD EDDIPVRSFFPENΛVXWRVET TDRFQILTLWLPDSLTT ΈIHGLSLSKTKGLCVATPVQL RWREFHLHLRLPMSVRRFEQLELRPVLY TDKNL SVITVSPVEGLCLAGGGGLAQ QVLWAGSAPJ>VAFSVWTAAAAVSLKWA GSFEFPVGDAVSKVLQIEKEGAIHREEL VYELΝPLDHRGRTLEΓPGΝSDPΝMIPDGDFΝSYVRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQT ΠYLAPTLAASR XDKTEQWSTLPPETKDHAVDLIQKGYMRIQQFRK ADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSΝ LLSQQQADGSFQ DPCPVLDRSMQGGLVGΝDETVALTAFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS FLGEKASAGLLG 
AHA AAIT AY ALTLTKAP ADLRG V AHΝΝLM AMAQETGDNL Y WGS V TGSQSNAVSPTPAPRNPSDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQDTVIALDALSAY IASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ IRGLEEELQFSLGSKTNVKVGGNSKGTLKVLRTYNΛT.DMKNTTCQDLQFFIVTVKGFTVΕ YTMEANEDYEDYEYDELPAKDDPDAFLQPVTPLQLFEGRIWRRRREAPKVVEEQESRV FT TVCIWRNGKVGLSGMAIADVTLLSGFHALRADLEKLTSLSDRYNSHFETEGPHVLL
YFDSV corresponding to amino acids 1 - 1529 of C04_HUMAN_V1, which also corresponds to amino acids 1 - 1529 of HSCOC4_PEA_1_P41, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SGER corresponding to amino acids 1530 - 1533 of HSCOC4_PEA_1_P41, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P41, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SGER in HSCOC4_PEA_1_P41.
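The protein lengths and the coding-portion positions reported in this section for variants P25 through P41 can be cross-checked against one another: a coding portion of N codons should span 3 x N nucleotides. The sketch below is an illustrative consistency check; the dictionary layout and the function name `coding_span_matches` are assumptions, while the numbers are the residue counts and start/end positions stated in the text.

```python
# variant -> (protein length in residues, coding start, coding end)
# Positions are 1-based and inclusive, as stated for each transcript.
VARIANTS = {
    "HSCOC4_PEA_1_P25": (1657, 501, 5471),
    "HSCOC4_PEA_1_P26": (1691, 501, 5573),
    "HSCOC4_PEA_1_P30": (1253, 501, 4259),
    "HSCOC4_PEA_1_P38": (843, 501, 3029),
    "HSCOC4_PEA_1_P39": (394, 501, 1682),
    "HSCOC4_PEA_1_P40": (263, 501, 1289),
    "HSCOC4_PEA_1_P41": (1533, 501, 5099),
}


def coding_span_matches(residues: int, start: int, end: int) -> bool:
    """True when the inclusive span start..end holds exactly 3 nt per residue."""
    return end - start + 1 == 3 * residues


results = {
    name: coding_span_matches(*values) for name, values in VARIANTS.items()
}
inconsistent = [name for name, ok in results.items() if not ok]
```

An empty `inconsistent` list means every stated span equals exactly three nucleotides per residue, which would suggest that the reported end positions do not count a stop codon; that reading is an inference from the arithmetic, not a statement made in the application.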
It should be noted that the known protein sequence (C04_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for C04_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 64 - Changes to C04_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because of manual inspection of known protein localization and/or gene structure. Variant protein HSCOC4_PEA_1_P41 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 65 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P41 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 65 - Amino acid mutations
Variant protein HSCOC4_PEA_1_P41 is encoded by the following transcript(s): HSCOC4_PEA_1_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCOC4_PEA_1_T8 is shown in bold; this coding portion starts at position 501 and ends at position 5099. The transcript also has the following SNPs as listed in Table 66 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P41 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 66 - Nucleic acid SNPs
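As a quick consistency check on the stated coordinates, the sketch below converts the coding range 501 to 5099 into a codon count, assuming 1-based inclusive positions (an assumption; the text does not state the convention). The helper name `codon_count` is hypothetical.

```python
# Illustrative sketch: number of codons spanned by a coding range,
# assuming 1-based, inclusive nucleotide positions.
def codon_count(start: int, end: int) -> int:
    """Return the number of whole codons between start and end, inclusive."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("range is empty or not a multiple of three")
    return length // 3


# Coding portion of HSCOC4_PEA_1_T8 as stated in the text.
print(codon_count(501, 5099))  # 1533
```

Under that convention the range spans 4599 nucleotides, or 1533 codons, which agrees with the 1533 residues implied by the comparison report for HSCOC4_PEA_1_P41 (amino acids 1 - 1529 followed by 1530 - 1533).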
Variant protein HSCOC4_PEA_1_P42 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCOC4_PEA_1_T12. An alignment is given to the known protein (Complement C4 precursor [Contains: C4a anaphylatoxin]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCOC4_PEA_1_P42 and CO4_HUMAN_V1: 1. An isolated chimeric polypeptide encoding for HSCOC4_PEA_1_P42, comprising a first amino acid sequence being at least 90% homologous to MRLLWGLIWAS8FFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ WKGSVFLR NPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLK DSLSRTTMQGINLLFSSRRGHLFLQTDQPIYNPGQRVR TIWALDQKMRPSTDTITVMV ENSHGLRVRKKIIVYMPSSIFQDDFVIPDISEPGTWKISATFSDGLESNSSTQFEVKXYNL POTEVKITPGKPYILTWGHLDEMQLDIQARYIYGKPVQGVAYVPJFGLLDEDGKKTFFR GLESQTKLVΝGQSHISLSKAEFQDALEKLΝMGITDLQGLRLYNAAAIIESPGGEMEEAE LTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGIPVKVSATNSSPGSVP EVQDIQQΝTDGSGQVSIPIITPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPD SRPPRVGDTLΝLΝLRAVGSGATFS N ^LILSRGQIVFMΝT EPKRTLTSVS VDHHLA PSFYFVAFY ΉGDHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDS LALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAG LAFSDGDQWTLSRKI LSCPKEKTTPJKKRNNNFQKAINEKLGQYASPTAKRCCQDGVTR LPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKGQAGLQRALEΓLQEEDLΓD EDDTPVRSFI EN LWRVETVDRFQILTLWLPDS TTWEMGLSLSKTKGLCVATPVQL RVFREFHLHLRLPMSVRI 7EQLELRPVLYNYLDI NLR^NSVHVSPVEGLCLAGGGGLAQ QVL AGSARPVAFSVΛΦTAAAAVSLKVVARCSFEFPVGDAVSKVLQFFIKEGAIHREEL WΈLΝPLDHRGRTLEΓPGΝSDPΝMΓPDGDFΝSYVRVTASDPLDTLGSEGALSPGGVASL LRLPRGCGEQTMNT.APTLAASRYLDKTEQWSTTPPETTΦHAVDLIQKG^MRIQQFRK ADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKLQETSΝWLLSQQQADGSFQ DPCPVLDRSMQGGLVGΝDETVALTAFVTIALHHGLAVFQDEGAEPLKQRVEASISKASS 
FLGEKASAGLLGAHAAAITAYALTLTKAPADLRGVAHΝΝLMAMAQETGDΝLYWGSV TGSQSNAVSPTPAPRNPSDPMPQAPALWΓETTAYALLHLLLHEGKAEMADQAAAWLTR QGSFQGGFRSTQDTVIALDALS A YWI ASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ IRGLEEELQFSLGSKINNKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQFFIVTNKGHVE YTMEAΝEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGPJV ^RRRREAPKVNEEQESRV
HYTNCIW corresponding to amino acids 1 - 1473 of CO4_HUMAN_V1, which also corresponds to amino acids 1 - 1473 of HSCOC4_PEA_1_P42, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence WAPGAALGQGREGRTQAGAGLLEPAQAEPGRQLTRLHR corresponding to amino acids 1474 - 1511 of HSCOC4_PEA_1_P42, a third amino acid sequence being at least 90% homologous to RNGKVGLSGMAIADVTLLSGFHALRADLEK corresponding to amino acids 1474 - 1503 of CO4_HUMAN_V1, which also corresponds to amino acids 1512 - 1541 of
HSCOC4_PEA_1_P42, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VWSATQGNPLCPRY corresponding to amino acids 1542 - 1555 of HSCOC4_PEA_1_P42, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for an edge portion of HSCOC4_PEA_1_P42, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for WAPGAALGQGREGRTQAGAGLLEPAQAEPGRQLTRLHR, corresponding to HSCOC4_PEA_1_P42. 3. An isolated polypeptide encoding for a tail of HSCOC4_PEA_1_P42, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VWSATQGNPLCPRY in HSCOC4_PEA_1_P42.
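The four-part composition recited above can be checked mechanically. The sketch below, a minimal illustration rather than part of the application, verifies that the stated residue ranges on HSCOC4_PEA_1_P42 are contiguous and that each explicitly recited peptide has the length its range implies. The first segment's sequence is omitted because it is too long to reproduce reliably here; the names `SEGMENTS` and `check_tiling` are hypothetical.

```python
# Illustrative sketch: check that the recited segments tile the variant
# protein without gaps or overlaps (1-based, inclusive coordinates assumed).
SEGMENTS = [
    (1, 1473, None),  # first sequence, shared with the known protein
    (1474, 1511, "WAPGAALGQGREGRTQAGAGLLEPAQAEPGRQLTRLHR"),  # edge portion
    (1512, 1541, "RNGKVGLSGMAIADVTLLSGFHALRADLEK"),  # shared with known protein
    (1542, 1555, "VWSATQGNPLCPRY"),  # tail
]


def check_tiling(segments) -> int:
    """Validate contiguity and peptide lengths; return the total length."""
    expected_start = 1
    for start, end, peptide in segments:
        if start != expected_start:
            raise ValueError(f"gap or overlap before position {start}")
        if peptide is not None and len(peptide) != end - start + 1:
            raise ValueError(f"peptide length mismatch for {start}-{end}")
        expected_start = end + 1
    return expected_start - 1


print(check_tiling(SEGMENTS))  # 1555
```

The recited peptides are 38, 30 and 14 residues long, matching their ranges, and the segments together span 1555 residues.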
It should be noted that the known protein sequence (CO4_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for CO4_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 67 - Changes to CO4_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCOC4_PEA_1_P42 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 68 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P42 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 68 - Amino acid mutations
Variant protein HSCOC4_PEA_1_P42 is encoded by the following transcript(s): HSCOC4_PEA_1_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCOC4_PEA_1_T12 is shown in bold; this coding portion starts at position 501 and ends at position 5165. The transcript also has the following SNPs as listed in Table 69 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCOC4_PEA_1_P42 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 69 - Nucleic acid SNPs
As noted above, cluster HSCOC4 features 79 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSCOC4_PEA_1_node_1 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 70 below describes the starting and ending position of this segment on each transcript.
Table 70 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_5 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 71 below describes the starting and ending position of this segment on each transcript.
Table 71 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_7 according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 72 below describes the starting and ending position of this segment on each transcript.
Table 72 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_30 according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 73 below describes the starting and ending position of this segment on each transcript.
Table 73 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_33 according to the present invention is supported by 30 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 74 below describes the starting and ending position of this segment on each transcript.
Table 74 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_35 according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 75 below describes the starting and ending position of this segment on each transcript.
Table 75 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_37 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 76 below describes the starting and ending position of this segment on each transcript.
Table 76 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_39 according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 77 below describes the starting and ending position of this segment on each transcript.
Table 77 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_43 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 78 below describes the starting and ending position of this segment on each transcript.
Table 78 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_48 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T2 and HSCOC4_PEA_1_T3. Table 79 below describes the starting and ending position of this segment on each transcript.
Table 79 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_49 according to the present invention is supported by 37 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 80 below describes the starting and ending position of this segment on each transcript.
Table 80 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_51 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 81 below describes the starting and ending position of this segment on each transcript.
Table 81 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_58 according to the present invention is supported by 52 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 82 below describes the starting and ending position of this segment on each transcript.
Table 82 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_59 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T4. Table 83 below describes the starting and ending position of this segment on each transcript.
Table 83 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_62 according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 84 below describes the starting and ending position of this segment on each transcript.
Table 84 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_66 according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 85 below describes the starting and ending position of this segment on each transcript.
Table 85 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_72 according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31 and HSCOC4_PEA_1_T32. Table 86 below describes the starting and ending position of this segment on each transcript.
Table 86 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_77 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T14 and HSCOC4_PEA_1_T20. Table 87 below describes the starting and ending position of this segment on each transcript.
Table 87 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_79 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T11. Table 88 below describes the starting and ending position of this segment on each transcript.
Table 88 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_93 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T12 and HSCOC4_PEA_1_T21. Table 89 below describes the starting and ending position of this segment on each transcript.
Table 89 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_100 according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T21. Table 90 below describes the starting and ending position of this segment on each transcript.
Table 90 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_105 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T28 and HSCOC4_PEA_1_T32. Table 91 below describes the starting and ending position of this segment on each transcript.
Table 91 - Segment location on transcripts
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (with regard to breast cancer), shown in Table 92.
Table 92 - Oligonucleotides related to this segment
Segment cluster HSCOC4_PEA_1_node_107 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28 and HSCOC4_PEA_1_T32. Table 93 below describes the starting and ending position of this segment on each transcript.
Table 93 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_108 according to the present invention is supported by 120 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 94 below describes the starting and ending position of this segment on each transcript.
Table 94 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_109 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T25 and HSCOC4_PEA_1_T28. Table 95 below describes the starting and ending position of this segment on each transcript.
Table 95 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_110 according to the present invention is supported by 97 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 96 below describes the starting and ending position of this segment on each transcript.
Table 96 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_112 according to the present invention is supported by 71 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 97 below describes the starting and ending position of this segment on each transcript.
Table 97 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_113 according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28 and HSCOC4_PEA_1_T32. Table 98 below describes the starting and ending position of this segment on each transcript.
Table 98 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSCOC4_PEA_1_node_2 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 99 below describes the starting and ending position of this segment on each transcript.
Table 99 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_8 according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 100 below describes the starting and ending position of this segment on each transcript.
Table 100 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_10 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 101 below describes the starting and ending position of this segment on each transcript.
Table 101 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_12 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 102 below describes the starting and ending position of this segment on each transcript.
Table 102 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_14 according to the present invention is supported by 30 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 103 below describes the starting and ending position of this segment on each transcript.
Table 103 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_17 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 104 below describes the starting and ending position of this segment on each transcript.
Table 104 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_19 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 105 below describes the starting and ending position of this segment on each transcript.
Table 105 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_21 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 106 below describes the starting and ending position of this segment on each transcript.
Table 106 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_22 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 107 below describes the starting and ending position of this segment on each transcript.
Table 107 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_28 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 108 below describes the starting and ending position of this segment on each transcript.
Table 108 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_29 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T5. Table 109 below describes the starting and ending position of this segment on each transcript.
Table 109 - Segment location on transcripts
Segment cluster HSCOC4JPEA_l_node_41 according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following franscript(s): HSCOC4JPEA_l_Tl, HSCOC4_PEA_l_T2, HSCOC4_PEA_l JT3, HSCOC4JPEA_l_T4, HSCOC4JPEA_l JT5, HSCOC4JPEA_l_T7, HSCOC4JPEA_l_T8, HSCOC4_PEA_l_Tl l, HSCOC4_PEA_l_T12, HSCOC4_PEA_l_T14, HSCOC4JPEA_l_T15, HSCOC4_PEA_1_T20, HSCOC4JPEA_l_T21, HSCOC4J?EA_l_T25, HSCOC4JPEA_l_T28, HSCOC4JPEA_1_T30, HSCOC4JPEA_l_T31, HSCOC4JPEA_l_T32 and HSCOC4JPEA_l JT40. Table 110 below describes the starting and ending position of this segment on each transcript. Table 110 - Segment location on transcripts
Segment cluster HSCOC4JPEA_l_node_45 according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following franscript(s): HSCOC4JPEA_lJTl, HSCOC4JPEA_l_T2, HSCOC4JPEA_l_T3, HSCOC4_PEA_l_T4, HSCOC4J?EA_l_T5, HSCOC4_PEA_l_T7, HSCOC4JPEA_lJTS, HSCOC4_PEA_l_Tl l, HSCOC4J?EA_l_T12, HSCOC4_PEA_l JT14, HSCOC4JPEA_l_T15, HSCOC4JPEA_1_T20, HSCOC4JPEA_lJJ21, HSCOC4_PEA_l_T25, HSCOC4JPEA_l_T28, HSCOC4J?EA_1JT30, HSCOC4JPEA_l_T31, HSCOC4JPEA_l_T32 and HSCOC4JPEA_l JT40. Table 111 below describes the starting and ending position of this segment on each transcript. Table 111 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_47 according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 112 below describes the starting and ending position of this segment on each transcript. Table 112 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_50 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1 and HSCOC4_PEA_1_T3. Table 113 below describes the starting and ending position of this segment on each transcript. Table 113 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_53 according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 114 below describes the starting and ending position of this segment on each transcript. Table 114 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_55 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 115 below describes the starting and ending position of this segment on each transcript. Table 115 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_57 according to the present invention is supported by 42 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 116 below describes the starting and ending position of this segment on each transcript. Table 116 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_60 according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 117 below describes the starting and ending position of this segment on each transcript. Table 117 - Segment location on transcripts
HSCOC4_PEA_1_T40 3655 3730
Segment cluster HSCOC4_PEA_1_node_64 according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 118 below describes the starting and ending position of this segment on each transcript. Table 118 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_69 according to the present invention can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 119 below describes the starting and ending position of this segment on each transcript.
Segment cluster HSCOC4_PEA_1_node_70 according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31 and HSCOC4_PEA_1_T32. Table 120 below describes the starting and ending position of this segment on each transcript. Table 120 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_71 according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31 and HSCOC4_PEA_1_T32. Table 121 below describes the starting and ending position of this segment on each transcript. Table 121 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_73 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T20. Table 122 below describes the starting and ending position of this segment on each transcript. Table 122 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_74 according to the present invention can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31 and HSCOC4_PEA_1_T32. Table 123 below describes the starting and ending position of this segment on each transcript. Table 123 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_75 according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31 and HSCOC4_PEA_1_T32. Table 124 below describes the starting and ending position of this segment on each transcript. Table 124 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_76 according to the present invention is supported by 66 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31 and HSCOC4_PEA_1_T32. Table 125 below describes the starting and ending position of this segment on each transcript. Table 125 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_78 according to the present invention is supported by 71 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31 and HSCOC4_PEA_1_T32. Table 126 below describes the starting and ending position of this segment on each transcript. Table 126 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_80 according to the present invention is supported by 75 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31 and HSCOC4_PEA_1_T32. Table 127 below describes the starting and ending position of this segment on each transcript. Table 127 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_82 according to the present invention can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31 and HSCOC4_PEA_1_T32. Table 128 below describes the starting and ending position of this segment on each transcript. Table 128 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_83 according to the present invention is supported by 77 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31 and HSCOC4_PEA_1_T32. Table 129 below describes the starting and ending position of this segment on each transcript. Table 129 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_84 according to the present invention can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31 and HSCOC4_PEA_1_T32. Table 130 below describes the starting and ending position of this segment on each transcript. Table 130 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_85 according to the present invention is supported by 68 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31 and HSCOC4_PEA_1_T32. Table 131 below describes the starting and ending position of this segment on each transcript. Table 131 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_86 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T12. Table 132 below describes the starting and ending position of this segment on each transcript. Table 132 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_87 according to the present invention is supported by 74 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31 and HSCOC4_PEA_1_T32. Table 133 below describes the starting and ending position of this segment on each transcript. Table 133 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_88 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T12. Table 134 below describes the starting and ending position of this segment on each transcript. Table 134 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_89 according to the present invention can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31 and HSCOC4_PEA_1_T32. Table 135 below describes the starting and ending position of this segment on each transcript. Table 135 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_90 according to the present invention can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31 and HSCOC4_PEA_1_T32. Table 136 below describes the starting and ending position of this segment on each transcript. Table 136 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_91 according to the present invention is supported by 78 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31 and HSCOC4_PEA_1_T32. Table 137 below describes the starting and ending position of this segment on each transcript. Table 137 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_92 according to the present invention can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31 and HSCOC4_PEA_1_T32. Table 138 below describes the starting and ending position of this segment on each transcript. Table 138 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_94 according to the present invention can be found in the following transcript(s): HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T12 and HSCOC4_PEA_1_T21. Table 139 below describes the starting and ending position of this segment on each transcript. Table 139 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_96 according to the present invention can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T31 and HSCOC4_PEA_1_T32. Table 140 below describes the starting and ending position of this segment on each transcript. Table 140 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_97 according to the present invention can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T31 and HSCOC4_PEA_1_T32. Table 141 below describes the starting and ending position of this segment on each transcript. Table 141 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_98 according to the present invention is supported by 93 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T31 and HSCOC4_PEA_1_T32. Table 142 below describes the starting and ending position of this segment on each transcript. Table 142 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_99 according to the present invention is supported by 93 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 143 below describes the starting and ending position of this segment on each transcript. Table 143 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_101 according to the present invention is supported by 116 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 144 below describes the starting and ending position of this segment on each transcript. Table 144 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_102 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T31 and HSCOC4_PEA_1_T32. Table 145 below describes the starting and ending position of this segment on each transcript. Table 145 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_103 according to the present invention is supported by 106 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 146 below describes the starting and ending position of this segment on each transcript. Table 146 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_104 according to the present invention is supported by 101 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 147 below describes the starting and ending position of this segment on each transcript. Table 147 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_106 according to the present invention is supported by 110 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 148 below describes the starting and ending position of this segment on each transcript. Table 148 - Segment location on transcripts
Segment cluster HSCOC4_PEA_1_node_111 according to the present invention is supported by 77 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCOC4_PEA_1_T1, HSCOC4_PEA_1_T2, HSCOC4_PEA_1_T3, HSCOC4_PEA_1_T4, HSCOC4_PEA_1_T5, HSCOC4_PEA_1_T7, HSCOC4_PEA_1_T8, HSCOC4_PEA_1_T11, HSCOC4_PEA_1_T12, HSCOC4_PEA_1_T14, HSCOC4_PEA_1_T15, HSCOC4_PEA_1_T20, HSCOC4_PEA_1_T21, HSCOC4_PEA_1_T25, HSCOC4_PEA_1_T28, HSCOC4_PEA_1_T30, HSCOC4_PEA_1_T31, HSCOC4_PEA_1_T32 and HSCOC4_PEA_1_T40. Table 149 below describes the starting and ending position of this segment on each transcript. Table 149 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: CO4_HUMAN
Sequence documentation:
Alignment of: HSCOC4_PEA_1_P3 x CO4_HUMAN
Alignment segment 1/1:
Quality: 8438.00
Escore: 0
Matching length: 870
Total length: 870
Matching Percent Similarity: 99.66
Matching Percent Identity: 99.66
Total Percent Similarity: 99.66
Total Percent Identity: 99.66
Gaps: 0
Alignment:

1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQ 50
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQ 50

51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100

101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150

151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200

201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250

251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300

301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350

351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400

401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450

451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500

501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550

551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600

601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650

651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700

701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750

751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800

801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850

851 LRPVLYNYLDKNLTVRPHRS 870
851 LRPVLYNYLDKNLTVSVHVS 870
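As a rough check on the reported identity of 99.66 over a total length of 870 for the P3 variant alignment above, the sketch below counts mismatches in its final row (positions 851 to 870), the only row in which the two sequences visibly differ. The function names are hypothetical, the alignment is taken to be gap-free as reported (Gaps: 0), and all earlier rows are assumed to be identical; this is not the scoring procedure of the alignment program used in the source.

```python
def count_mismatches(variant: str, known: str) -> int:
    """Count positions where two equal-length, gap-free aligned rows differ."""
    if len(variant) != len(known):
        raise ValueError("aligned rows must have equal length")
    return sum(a != b for a, b in zip(variant, known))


def percent_identity(matches: int, aligned_length: int) -> float:
    """Identity expressed as a percentage of the aligned length."""
    return 100.0 * matches / aligned_length


# Final row (positions 851-870) of the P3 variant vs. the known protein.
variant_tail = "LRPVLYNYLDKNLTVRPHRS"
known_tail = "LRPVLYNYLDKNLTVSVHVS"

total_length = 870  # "Total length" reported for this alignment
# Assumption: every row before position 851 is identical in both sequences.
mismatches = count_mismatches(variant_tail, known_tail)
identity = percent_identity(total_length - mismatches, total_length)
print(mismatches, round(identity, 2))
```

Under these assumptions the three differing residues give 867 matches out of 870, which rounds to the 99.66 figure stated in the alignment header.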
Sequence name: CO4_HUMAN
Sequence documentation:
Alignment of: HSCOC4_PEA_1_P5 x CO4_HUMAN
Alignment segment 1/1:
Quality: 7969.00
Escore: 0
Matching length: 818
Total length: 818
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQ 50
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQ 50

51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100

101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150

151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200

201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250

251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300

301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350

351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400

401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450

451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500

501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550

551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600

601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650

651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700

701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750

751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800

801 DSLTTWEIHGLSLSKTKG 818
801 DSLTTWEIHGLSLSKTKG 818
Sequence name: C04_HUMAΝ
Sequence documentation:
Alignment of: HSCOC4_PEA_l_P6 x C04_HUMAΝ
Alignment segment l/l:
Quality: 10211.00 Escore : 0 Matching length: 1052 Total length: 1052 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity.- 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ 50
1 MRLL GLIWASSFFTLSLQKPRLLLFSPS HLGVPLSVGVQLQDVPRGQ 50 51 λΛ/KGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
51 WKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100 101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200 I I I I I I I I I I I || I I I I I I II I I II I I I II I I I I I 11 I I I I I I I I I I I I I 151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250 201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYNRFGLLDEDGKKTFFRGLE 300
301 SQTKLλ/ΝGQSHISLSKAEFQDALEKLΝMGITDLQGLRLYVAAAIIESPGG 350 301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYNAAAIIESPGG 350
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900
901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950
901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950
951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000
951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050
1051 KG 1052
1051 KG 1052
Sequence name: C04_HUMAN_V1
Sequence documentation:
Alignment of: HSCOC4_PEA_1_P12 x C04_HUMAN_V1
Alignment segment 1/1:
Quality: 13367.00 Escore: 0 Matching length: 1380 Total length: 1380 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQ 50
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQ 50
51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900
901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950
901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950
951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000
951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250
1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300
1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300
1301 STQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ 1350
1301 STQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ 1350
1351 IRGLEEELQFSLGSKINVKVGGNSKGTLKV 1380
1351 IRGLEEELQFSLGSKINVKVGGNSKGTLKV 1380
Sequence name: C04_HUMAN_V1
Sequence documentation:
Alignment of: HSCOC4_PEA_1_P15 x C04_HUMAN_V1
Alignment segment 1/1:
Quality: 13174.00 Escore: 0 Matching length: 1359 Total length: 1359 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQ 50
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQ 50
51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900
901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950
901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950
951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000
951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250
1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300
1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300
1301 STQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ 1350
1301 STQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ 1350
1351 IRGLEEELQ 1359
1351 IRGLEEELQ 1359
Sequence name: C04_HUMAN_V1
Sequence documentation:
Alignment of: HSCOC4_PEA_1_P16 x C04_HUMAN_V1
Alignment segment 1/1:
Quality: 14137.00 Escore: 0 Matching length: 1457 Total length: 1457 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQ 50
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQ 50
51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900
901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950
901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950
951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000
951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250
1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300
1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300
1301 STQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ 1350
1301 STQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ 1350
1351 IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIE 1400
1351 IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIE 1400
1401 VTVKGHVEYTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNR 1450
1401 VTVKGHVEYTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNR 1450
1451 RRREAPK 1457
1451 RRREAPK 1457
Sequence name: C04_HUMAN_V1
Sequence documentation:
Alignment of: HSCOC4_PEA_1_P20 x C04_HUMAN_V1
Alignment segment 1/1:
Quality: 12641.00 Escore: 0 Matching length: 1303 Total length: 1303 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQ 50
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQ 50
51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900
901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950
901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950
951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000
951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250
1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300
1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300
1301 STQ 1303
1301 STQ 1303
Sequence name: C04_HUMAN_V1
Sequence documentation:
Alignment of: HSCOC4_PEA_1_P9 x C04_HUMAN_V1
Alignment segment 1/1:
Quality: 14831.00 Escore: 0 Matching length: 1529 Total length: 1529 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQ 50
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQ 50
51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900
901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950
901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950
951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000
951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250
1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300
1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300
1301 STQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ 1350
1301 STQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ 1350
1351 IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIE 1400
1351 IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIE 1400
1401 VTVKGHVEYTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNR 1450
1401 VTVKGHVEYTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNR 1450
1451 RRREAPKVVEEQESRVHYTVCIWRNGKVGLSGMAIADVTLLSGFHALRAD 1500
1451 RRREAPKVVEEQESRVHYTVCIWRNGKVGLSGMAIADVTLLSGFHALRAD 1500
1501 LEKLTSLSDRYVSHFETEGPHVLLYFDSV 1529
1501 LEKLTSLSDRYVSHFETEGPHVLLYFDSV 1529
Sequence name: C04_HUMAN_V1
Sequence documentation:
Alignment of: HSCOC4_PEA_1_P22 x C04_HUMAN_V1
Alignment segment 1/1:
Quality: 16066.00 Escore: 0 Matching length: 1654 Total length: 1654 Matching Percent Similarity: 100.00 Matching Percent Identity: 99.94 Total Percent Similarity: 100.00 Total Percent Identity: 99.94 Gaps: 0
Alignment:
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQ 50
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQ 50
51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900
901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950
901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950
951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000
951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250
1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300
1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300
1301 STQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ 1350
1301 STQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ 1350
1351 IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIE 1400
1351 IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIE 1400
1401 VTVKGHVEYTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNR 1450
1401 VTVKGHVEYTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNR 1450
1451 RRREAPKVVEEQESRVHYTVCIWRNGKVGLSGMAIADVTLLSGFHALRAD 1500
1451 RRREAPKVVEEQESRVHYTVCIWRNGKVGLSGMAIADVTLLSGFHALRAD 1500
1501 LEKLTSLSDRYVSHFETEGPHVLLYFDSVPTSRECVGFEAVQEVPVGLVQ 1550
1501 LEKLTSLSDRYVSHFETEGPHVLLYFDSVPTSRECVGFEAVQEVPVGLVQ 1550
1551 PASATLYDYYNPERRCSVFYGAPSKSRLLATLCSAEVCQCAEGKCPRQRR 1600
1551 PASATLYDYYNPERRCSVFYGAPSKSRLLATLCSAEVCQCAEGKCPRQRR 1600
1601 ALERGLQDEDGYRMKFACYYPRVEYGFQVKVLREDSRAAFRLFETKITQV 1650
1601 ALERGLQDEDGYRMKFACYYPRVEYGFQVKVLREDSRAAFRLFETKITQV 1650
1651 LHFS 1654 | ||:
1651 LHFT 1654
Sequence name: C04_HUMAN_V1
Sequence documentation:
Alignment of: HSCOC4_PEA_1_P23 x C04_HUMAN_V1
Alignment segment 1/1:
Quality: 15806.00 Escore: 0 Matching length: 1626 Total length: 1626 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQ 50
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQ 50
51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900
901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950
901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950
951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000
951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250
1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300
1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300
1301 STQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ 1350
1301 STQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ 1350
1351 IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIE 1400
1351 IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIE 1400
1401 VTVKGHVEYTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNR 1450
1401 VTVKGHVEYTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNR 1450
1451 RRREAPKVVEEQESRVHYTVCIWRNGKVGLSGMAIADVTLLSGFHALRAD 1500
1451 RRREAPKVVEEQESRVHYTVCIWRNGKVGLSGMAIADVTLLSGFHALRAD 1500
1501 LEKLTSLSDRYVSHFETEGPHVLLYFDSVPTSRECVGFEAVQEVPVGLVQ 1550
1501 LEKLTSLSDRYVSHFETEGPHVLLYFDSVPTSRECVGFEAVQEVPVGLVQ 1550
1551 PASATLYDYYNPERRCSVFYGAPSKSRLLATLCSAEVCQCAEGKCPRQRR 1600
1551 PASATLYDYYNPERRCSVFYGAPSKSRLLATLCSAEVCQCAEGKCPRQRR 1600
1601 ALERGLQDEDGYRMKFACYYPRVEYG 1626
1601 ALERGLQDEDGYRMKFACYYPRVEYG 1626
Sequence name: C04_HUMAN_V1
Sequence documentation:
Alignment of: HSCOC4_PEA_1_P24 x C04_HUMAN_V1
Alignment segment 1/1:
Quality: 14823.00 Escore: 0 Matching length: 1528 Total length: 1528 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ 50 M M M M M M M M M M M M M M M M M M M M M M M M M 1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ 50 51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100 M M M M M M M M M M M M I M M M M M M M M M I M M M 51 WKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100 101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150 MMMMIMIMMMMMIMMMIMMMMMMMMMM
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200 MIMMMMMMMMMMMMMMMMMMMMIMMM
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250 MMMMMIMMMMMMMMMMMIMMMMMMMM
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300 MMMMMMMMMMMMMMMMMMMMMMMMM 251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350 . . . . .
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450 MMMMMMMMMMMMMMIMMMMIMMMMMM
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500 501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550 MMMMMMMMMMMMMMMMMMMMMMMMM
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550 . . . . .
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600 MMMMMMMMMMMMMMMMMMMMMMMMM
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650 MMMMMMMMMMMMMMMMMMMMMMMMM
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700 MMMMMMMMMMMMMMMMMMMMMMMMM
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750 MMMMMMMMMMMMMMMMMMMMMIMIMM 701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800 MMMMMMMMMMMMMMMMMMMMMIMIMM
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800 . . . . .
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850 MMMMMMMMMMMMMMMMMMMMMIMIMM
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900 MMMMMMMMMMMMMMMMMMMMMIMIMM
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900
901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950 MMMMMMMMMMMMMM MMMMMMMMMMM 901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950 951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000 MMMMMMMMMMMMMMMMMIIMMM MMMM 951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050 II I MIMIII II! MMll MIIMIII MMMMMM MMM! I!
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100 MMMMMMMMIMMMMMMMMMMMMMIMMM
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150 MMMMMMMMIMMMMMMMMMMMMMIMMM
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200 MM II MM I MMMMIMMMM MM MMMMMMMMM 1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250
1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300 MMMMMIMMMMMMMMMMMMIMMM MMMM 1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300 1301 STQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ 1350 MMMIMM MMMIMMMMMMMMMMMMMMMM 1301 STQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ 1350 1351 IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIE 1400 MMMMMMMMMMMMMMMMMMMMMMMMM 1351 IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIE 1400 1401 VTVKGHVEYTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNR 1450
1401 VTVKGHVEYTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNR 1450 . . . . . 1451 RRREAPKWEEQESRVHYTVCIWRNGKVGLSGMAIADVTLLSGFHALRAD 1500 MMMIMMMMMIMMMMMMMMMMMM MMMM 1451 RRREAPKVVEEQESRVHYTVCIWRNGKVGLSGMAIADVTLLSGFHALRAD 1500 1501 LEKLTSLSDRYVSHFETEGPHVLLYFDS 1528 MMMMMIMMMMIMMMM 1501 LEKLTSLSDRYVSHFETEGPHVLLYFDS 1528
Sequence name: C04_HUMAN_V1
Sequence documentation:
Alignment of: HSCOC4_PEA_1_P25 x C04_HUMAN_V1
Alignment segment 1/1:
Quality: 15464.00 Escore : 0 Matching length: 1593 Total length: 1593 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ 50 IMMMIMMMMMMMMMMMMMMMMMMMMM 1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ 50
51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100 . . . . . 101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150 151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200 151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250 MMMMMMMMMMMMMMMMMMMMMMMMM
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300 MMMMMMMMMMMMMMMMMMMMMIMIMM
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350 MMMMMMMMMMMMMMMMMMMMMMMMM
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400 MMMMMMMMMMMMMMMMMMMMMMMMM
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450 MMMMMIMMMMMMMMMMMMIMMMMMMM
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600 551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650 MMMMMMMMMMMMMMMMMMMMMMMMM
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700 MMMMMMMMMMMMMMMMMMMMMMMMM 651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850 MMMMMMMMMMMMMMMMMMMMMMMMM
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900 MMMMMMMMMMMMMMMMMMMMMMMMM
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900
901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950 MMMMMMMMIMMMMMMMMMMMMMIMMM 901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950 951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000 MMIMMMMMMMMMMMMMMMMMMIMMMM 951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050 MMMMMMMMMMMMMMMMMMMMMMMMM
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100 MMMMMMMMMMMMMMMMMMMMMMMMM
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150 MMMMMMMMMMMMMMMMMMMMMIMIMM 1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200 . . . . .
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250
1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300 MMMMMMMMMMMMMMMMMMMMMIMIMM
1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300
1301 STQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ 1350
1301 STQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ 1350 1351 IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIE 1400 IMMMIMMMMMMMMMMMMIMMMMMIMMM 1351 IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIE 1400
1401 VTVKGHVEYTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNR 1450 MMMIMMMMMMMMMMMMMMIMMMMMMM 1401 VTVKGHVEYTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNR 1450 1451 RRREAPKWEEQESRVHYTVCIWRNGKVGLSGMAIADVTLLSGFHALRAD 1500 MMMIMMMMMMMMMMMMMIMMMMMMMM 1451 RRREAPKVVEEQESRVHYTVCIWRNGKVGLSGMAIADVTLLSGFHALRAD 1500
1501 LEKLTSLSDRYVSHFETEGPHVLLYFDSVPTSRECVGFEAVQEVPVGLVQ 1550
1501 LEKLTSLSDRYVSHFETEGPHVLLYFDSVPTSRECVGFEAVQEVPVGLVQ 1550
1551 PASATLYDYYNPERRCSVFYGAPSKSRLLATLCSAEVCQCAEG 1593 1551 PASATLYDYYNPERRCSVFYGAPSKSRLLATLCSAEVCQCAEG 1593
Sequence name : C04_HUMAN_V1
Sequence documentation:
Alignment of: HSCOC4_PEA_1_P26 x C04_HUMAN_V1
Alignment segment 1/1:
Quality: 15464.00 Escore: 0 Matching length: 1593 Total length: 1593 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment : . . . . . 1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ 50 MMMMMMMMMMMMMMMMMMMMMMMMM 1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ 50 51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100 MMMMMMMMMMMMMMMMMMMMMIMIMM 51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200 MMMMMMMMMMMMMMMMMMMMMMMMM 151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200 201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250 MMMMMMMMIMMMMIMMMMMMMMMMMM
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300 MMMMIMMMMMMMMMMMMMMMMIMMMM
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350 II lllll lllll llll I MMMMIMMMMMM MM llll llll
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450 . . . . .
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600 601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650 MMMIMMMIMMIMMMMMMMMMMMMMMIM
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650 . . . . .
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700 MIMMMMMMMMMMMMMMMMMMMMMIMM
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750 MMMMMMMMM I M II I II I II II I! I II II II M 111 II II I
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800 MMMMMMMMMMMMMMMMMMMMMMMMM
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900 . . . . .
901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950
901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950
951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000 951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050 MMMMMMMMMMMMMMMMMMMMMMMMM 1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100 . . . . .
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250 MMMMMMMMMMMMMMMMMMMMMMMMM
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250
1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300
1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300
1301 STQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ 1350
1301 STQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ 1350
1351 IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIE 1400 MMMMMMMMMMMMMMMMMMMMMMMMM 1351 IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIE 1400 1401 VTVKGHVEYTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNR 1450 MMMMMMMMMMMMMMMMMMMMIMMMIM 1401 VTVKGHVEYTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNR 1450 1451 RRREAPKWEEQESRVHYTVCIWRNGKVGLSGMAIADVTLLSGFHALRAD 1500 MIMMMMMMMMMMMMMMMMMMMMIMMM 1451 RRREAPKWEEQESRVHYTVCIWRNGKVGLSGMAIADVTLLSGFHALRAD 1500 1501 LEKLTSLSDRYVSHFETEGPHVLLYFDSVPTSRECVGFEAVQEVPVGLVQ 1550
1501 LEKLTSLSDRYVSHFETEGPHVLLYFDSVPTSRECVGFEAVQEVPVGLVQ 1550
1551 PASATLYDYYNPERRCSVFYGAPSKSRLLATLCSAEVCQCAEG 1593
1551 PASATLYDYYNPERRCSVFYGAPSKSRLLATLCSAEVCQCAEG 1593
Sequence name: C04_HUMAN_V3
Sequence documentation:
Alignment of: HSCOC4_PEA_1_P30 x C04_HUMAN_V3
Alignment segment 1/1:
Quality: 11940.00 Escore: 0 Matching length: 1232 Total length: 1232 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment :
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ 50 MMMMMMMMMMMMMMMMMMMMMMMMM 1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ 50 51 WKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100 MMMMMMMMMMMMMMMMMMMMMMMMM 51 WKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100 101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150 MMMMMMMMMMMMMMMMMMMMMMMMM 101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150 151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200 IMMMMMMMMMMMMMMMMMMMMIMMMM 151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200 201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250 201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300 MIIMMMMMMMMMMMMIMMMMMMMMIMM
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350 MMMMMMMMMMMMMMMMMMMMMIMIMM
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400 MMMMMMMMMMMMMMMMMMMMMMMMM
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500 M I M II II I II 11 II 11 II II I II II II II II I II II II I II II I II II
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700 MMMMMMMMMMMMMMMMMMMMMIMIMM
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750 MMll MM MIMIIMM I lllll I llll MMMMMMMMM 701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900 MMMMMMMMIMMMMMMMMMMMMMMMIM
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900
901 WPTAAAAVSLKWARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950
901 WPTAAAAVSLKWARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950
951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000
951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000 1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050 MMMIMMMMMMMMMMMMMMMMMMIMMM 1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050 1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100 MIMMMMMMMMMMMMMMMMMMMMIMMM 1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100 1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150 MMMMMMMMMMMMMMMMMMMMMMMMM 1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200 MMMIMMIMMMMMMMMMMMMMMMMMMM 1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200 1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGS 1232
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGS 1232
Sequence name : C04_HUMAN
Sequence documentation:
Alignment of: HSCOC4_PEA_1_P38 x C04_HUMAN
Alignment segment 1/1:
Quality: 7969.00 Escore: 0 Matching length: 818 Total length: 818 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment : 1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ 50 MMMIMMMMMMMMMMMMMMIMMMMMMM 1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ 50
51 WKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
51 WKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150 MMMIMMMMMMMMMMMMMMIMMMMMMM 101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150 151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200 MMMIMMMMMMMMMMMMMIMMMMMMMM 151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250 201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300 MMIMMMMMMMMMMMMMMMIMIMIMMMM
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350 MMMMMMMIMMMMMMMMMMMMMIMMMM 301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450 MMMIMMMMMMMMMMMMMMMMMMIMMM
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500 MIMMMMMMMMMMMMMMMMMMMMIMMM
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550 MMMMMMMMMMMMMMMMMMMMMMMMM
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600 601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650 11111111111 llll MMll MMMMMMMM lllll 601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650 651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700 MIMMMMMMMMMMMMMMMMMMMMIMMM 651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700 701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750 MMMIMMMMMMMMMMMMMMIMMMMMMM 701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750 751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800 751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
801 DSLTTWEIHGLSLSKTKG 818
801 DSLTTWEIHGLSLSKTKG 818
Sequence name: C04_HUMAN
Sequence documentation:
Alignment of: HSCOC4_PEA_1_P39 x C04_HUMAN
Alignment segment 1/1:
Quality: 3766.00 Escore: 0 Matching length: 387 Total length: 387 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment : 1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ 50 M M M M M M M M M M M M M M M M M M M M M M M M M 1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ 50
51 WKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100 M M M M M M M M M M M M M M M M M M M M M I M I M M 51 WKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150 101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250 201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300 MMMMMMMMMMMMMMMMMMMMMIMIMM 251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350 MMMMMMMMMMMMMMMMMMMMMMMMM 301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQ 387
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQ 387
Sequence name: C04_HUMAN
Sequence documentation:
Alignment of: HSCOC4_PEA_1_P40 x C04_HUMAN
Alignment segment 1/1:
Quality: 2309.00 Escore: 0 Matching length: 236 Total length: 236 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQ 50
||||||||||||||||||||||||||||||||||||||||||||||||||
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQ 50
51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
||||||||||||||||||||||||||||||||||||||||||||||||||
51 VVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
||||||||||||||||||||||||||||||||||||||||||||||||||
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
||||||||||||||||||||||||||||||||||||||||||||||||||
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKY 236
||||||||||||||||||||||||||||||||||||
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKY 236
Sequence name: C04_HUMAN_V1
Sequence documentation:
Alignment of: HSCOC4_PEA_1_P41 x C04_HUMAN_V1
Alignment segment 1/1:
Quality: 14831.00 Escore: 0 Matching length: 1529 Total length: 1529 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment : 1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ 50 M M M I M M M M M M M M M M M M M M M M M M I M M M 1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ 50 51 WKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100 M I M MM M MM MM MM M M M M M M MM M M IM M M 51 WKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100 101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150 I MIMIMMMMMMMMMMMM I II 11111111 II I II II I
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150 . . . . .
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200 MMMIMMMMMMMMMMMMMMIMMMMMMM
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250 MMMIMMMMMMMMMMMMMMIMMMMMMM
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300 II II II II I II I II II II II I II II II I II 11 II 111 II II II II II II I
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350 MMMMMMMMMMMMMMMMMMMMMMMMM 301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400 . . . . .
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500 451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550 MMMMMMMMMMMMMMMMMMMMMIMIMM
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600 MMMMMMMMMMMMMMMMMMMMMIMIMM
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650 MMMMMMMMMMMMMMMIMMMIMMMMMM
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800 IMMMMMMMMMMMMIMMMMMMMMMMMM 751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900 MMMIMMMMMIMMMMMMMMMMMMMMMM 851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900 901 WPTAAAAVSLKWARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950 MMIMMMMMMMMMMMMMMMMMMIMMMM 901 WPTAAAAVSLKWARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950 951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000 I II II II II II II II II I II II II II II II I II II I II II II II I II I II 951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050 MMMIMMMMMMMMMMMMMMMMIMMMMM
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050 . . . . .
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100 MMMIMMMMMMMMMMMMMMIMMMMMMM
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150 MMMIMMMMMMMMMMMMMMMMIMMMMM
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200 M M I M M M M M M M M M M M M M M M M M M I M M M M
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250 MIMMMMIMMMMMMMMMMMMMMMMMMM 1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250 1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300 MIMMMMMIIMMIMMMMMMMMMMIMMIMM 1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300 1301 STQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ 1350 MMMIMMMMMMMMMMMMMMIMMMMMMM 1301 STQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ 1350 1351 IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIE 1400 MMMIMMMMMMMMMMMMMMIMMMMMMM 1351 IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIE 1400 1401 VTVKGHVEYTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNR 1450 MMMIMMMMMMIMMMMMMMMMMMMMMM 1401 VTVKGHVEYTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNR 1450 1451 RRREAPKWEEQESRVHYTVCIWRNGKVGLSGMAIADVTLLSGFHALRAD 1500
1451 RRREAPKWEEQESRVHYTVCIWRNGKVGLSGMAIADVTLLSGFHALRAD 1500
1501 LEKLTSLSDRYVSHFETEGPHVLLYFDSV 1529
1501 LEKLTSLSDRYVSHFETEGPHVLLYFDSV 1529
Sequence name: C04_HUMAN_V1
Sequence documentation:
Alignment of: HSCOC4_PEA_1_P42 x C04_HUMAN_V1
Alignment segment 1/1:
Quality: 14480.00 Escore: 0 Matching length: 1506 Total length: 1544 Matching Percent Similarity: 99.93 Matching Percent Identity: 99.87 Total Percent Similarity: 97.47 Total Percent Identity: 97.41 Gaps: 1
Alignment :
1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ 50 MMMIMMMMIMMMMMMMMMMMMMMMMM 1 MRLLWGLIWASSFFTLSLQKPRLLLFSPSWHLGVPLSVGVQLQDVPRGQ 50 51 WKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100 MMIMMMMMMMMMMMMMMMMMMIMMMM 51 WKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLH 100
101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150 MMMMMMMMMMMMMMMMMMMMMMMMM 101 QLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIY 150
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200 MMMMMMMMMMMMMMMMMMMMMIMIMM
151 NPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQD 200
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250 IIIMMMMMMMMMMMMMMMMMMMMMMM!
201 DFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKP 250
251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300 IMMMMMMMMMMMMMMMMMMMMMMMM! 251 YILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLE 300
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350
301 SQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGG 350 . . . . .
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
351 EMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASG 400
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
401 IPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSA 450
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500 I M M I M II II I II II II II I II II II II 11 II II II II I II II II I II
451 GSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGA 500
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550
501 TFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHG 550 551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600 MMMIMMMMMMMMMMMMMMIMMMMMMM
551 DHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVA 600
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650 MMMMMMMMMMMMMMMMMMMMMMMMM
601 LGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAA 650
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700 MMMIMMMMMMMMMMMMMMIMMMMMMM
651 GLAFSDGDQWTLSRKRLSCPKEKTTRKKRNVNFQKAINEKLGQYASPTAK 700
701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750 MMMIMMMMMMMMMMMMMMIMMMMMMM 701 RCCQDGVTRLPMMRSCEQRAARVQQPDCREPFLSCCQFAESLRKKSRDKG 750
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800 MMMIMMMMMMMMMMMMMMIMMMMMMM
751 QAGLQRALEILQEEDLIDEDDIPVRSFFPENWLWRVETVDRFQILTLWLP 800 . . . . .
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850 IIIIIMM I MM MMMMMMMMMMMMMMMMMM
801 DSLTTWEIHGLSLSKTKGLCVATPVQLRVFREFHLHLRLPMSVRRFEQLE 850
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900 MMMIMMMMIMMMMMMMMMMMMMMMMM
851 LRPVLYNYLDKNLTVSVHVSPVEGLCLAGGGGLAQQVLVPAGSARPVAFS 900
901 WPTAAAAVSLKWARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950
901 VVPTAAAAVSLKVVARGSFEFPVGDAVSKVLQIEKEGAIHREELVYELNP 950 951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000 MMMMMMIMMMMMMIMMMMMMMMMMMM 951 LDHRGRTLEIPGNSDPNMIPDGDFNSYVRVTASDPLDTLGSEGALSPGGV 1000
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050 11111111 M I III MMMMMMMMMMMMMMMMMM
1001 ASLLRLPRGCGEQTMIYLAPTLAASRYLDKTEQWSTLPPETKDHAVDLIQ 1050
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100
1051 KGYMRIQQFRKADGSYAAWLSRDSSTWLTAFVLKVLSLAQEQVGGSPEKL 1100
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150 M M M M M M M M MM M M M M M M M MM M M I MI M M
1101 QETSNWLLSQQQADGSFQDPCPVLDRSMQGGLVGNDETVALTAFVTIALH 1150
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200
1151 HGLAVFQDEGAEPLKQRVEASISKASSFLGEKASAGLLGAHAAAITAYAL 1200
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYW3SVTGSQSNAVSPTPAPRNP 1250
1201 TLTKAPADLRGVAHNNLMAMAQETGDNLYWGSVTGSQSNAVSPTPAPRNP 1250 . . . . .
1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300
1251 SDPMPQAPALWIETTAYALLHLLLHEGKAEMADQAAAWLTRQGSFQGGFR 1300
1301 STQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ 1350 1301 STQDTVIALDALSAYWIASHTTEERGLNVTLSSTGRNGFKSHALQLNNRQ 1350
1351 IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYNVLDMKNTTCQDLQIE 1400 MMMMMMMMMMMMMMMMMMMMMMMMM 1351 IRGLEEELQFSLGSKINVKVGGNSKGTLKVLRTYTJVLDMKNTTCQDLQIE 1400
1401 VTVKGHVEYTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNR 1450
1401 VTVKGHVEYTMEANEDYEDYEYDELPAKDDPDAPLQPVTPLQLFEGRRNR 1450

1451 RRREAPKVVEEQESRVHYTVCIW APGAALGQGREGRTQAGAGLLEPAQA 1500
     |||||||||||||||||||||||
1451 RRREAPKVVEEQESRVHYTVCIW 1473

1501 EPGRQLTRLHRRNGKVGLSGMAIADVTLLSGFHALRADLEKVWS 1544
                ||||||||||||||||||||||||||||||  |
1474            RNGKVGLSGMAIADVTLLSGFHALRADLEKLTS 1506
DESCRIPTION FOR CLUSTER HUMTREFAC
Cluster HUMTREFAC features 2 transcript(s) and 7 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Trefoil factor 3 precursor (SwissProt accession identifier TFF3_HUMAN; known also according to the synonyms Intestinal trefoil factor; hP1.B), SEQ ID NO: 516, referred to herein as the previously known protein. Protein Trefoil factor 3 precursor is known or believed to have the following function(s): May have a role in promoting cell migration (motogen). The sequence for protein Trefoil factor 3 precursor is given at the end of the application, as "Trefoil factor 3 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Trefoil factor 3 precursor localization is believed to be Secreted. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: defense response; digestion, which are annotation(s) related to Biological Process; and extracellular, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster HUMTREFAC can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 36 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
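The "parts per million" figure described above can be illustrated with a minimal sketch. The function name and the EST counts are hypothetical and are not data from the application; the sketch uses raw counts only and does not model whatever weighting is applied to ESTs in the previously described methods.

```python
def ests_per_million(cluster_ests: int, total_ests: int) -> float:
    """Return a cluster's share of the ESTs in one tissue category,
    scaled to parts per million (unweighted; an assumption of this sketch)."""
    if total_ests <= 0:
        raise ValueError("total_ests must be positive")
    return cluster_ests * 1_000_000 / total_ests


# Hypothetical counts for a single tissue category (illustrative only).
print(ests_per_million(25, 500_000))
```

Scaling to a per-million figure makes categories with very different EST totals comparable on one axis, which is presumably why the histogram and table share this unit.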
Overall, the following results were obtained as shown with regard to the histograms in Figure 36 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: a mixture of malignant tumors from different tissues, breast malignant tumors, pancreas carcinoma and prostate cancer.
Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
As noted above, cluster HUMTREFAC features 2 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Trefoil factor 3 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HUMTREFAC_PEA_2_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTREFAC_PEA_2_T5. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMTREFAC_PEA_2_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTREFAC_PEA_2_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
Variant protein HUMTREFAC_PEA_2_P7 is encoded by the following transcript(s): HUMTREFAC_PEA_2_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTREFAC_PEA_2_T5 is shown in bold; this coding portion starts at position 278 and ends at position 688. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTREFAC_PEA_2_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Variant protein HUMTREFAC_PEA_2_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTREFAC_PEA_2_T4. An alignment is given to the known protein (Trefoil factor 3 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMTREFAC_PEA_2_P8 and TFF3_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMTREFAC_PEA_2_P8, comprising a first amino acid sequence being at least 90% homologous to MAARALCMLGLVLALLSSSSAEEYVGL corresponding to amino acids 1 - 27 of TFF3_HUMAN, which also corresponds to amino acids 1 - 27 of HUMTREFAC_PEA_2_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence WKVHLPKGEGFSSG corresponding to amino acids 28 - 41 of HUMTREFAC_PEA_2_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMTREFAC_PEA_2_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence WKVHLPKGEGFSSG in HUMTREFAC_PEA_2_P8.
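The head-plus-tail structure of the comparison report can be sketched as follows. This is a minimal illustration under stated assumptions: the head is taken as it appears in the alignment to TFF3_HUMAN, "homologous" is approximated by position-wise identity between equal-length sequences (the application defines homology elsewhere and may use a different measure), and the function names and the mutated tail are hypothetical.

```python
HEAD = "MAARALCMLGLVLALLSSSSAEEYVGL"  # amino acids 1-27, shared with TFF3_HUMAN
TAIL = "WKVHLPKGEGFSSG"               # amino acids 28-41, unique to the variant


def percent_identity(candidate: str, reference: str) -> float:
    """Position-wise identity between two equal-length sequences (assumed measure)."""
    if len(candidate) != len(reference) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def meets_threshold(candidate: str, reference: str, threshold: float) -> bool:
    """True when the candidate reaches the stated percentage against the reference."""
    return percent_identity(candidate, reference) >= threshold


# The head and tail are contiguous and in sequential order.
variant = HEAD + TAIL
print(len(variant))

# Hypothetical tail with a single substitution at the last position.
mutated_tail = TAIL[:-1] + "A"
print(meets_threshold(mutated_tail, TAIL, 90.0))
print(meets_threshold(mutated_tail, TAIL, 95.0))
```

With a 14-residue tail, a single substitution leaves 13 of 14 positions identical, so under this simplified measure it would clear the 90% tier but not the 95% tier.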
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMTREFAC_PEA_2_P8 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTREFAC_PEA_2_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Amino acid mutations
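The localization reasoning above amounts to a consensus rule over predictor outputs, which can be sketched as below. The function and argument names are hypothetical, and the booleans merely stand in for the outputs of the prediction programs; apart from SignalP the application does not name them here, and no real program is invoked.

```python
from typing import Sequence


def infer_localization(signal_peptide_calls: Sequence[bool],
                       transmembrane_calls: Sequence[bool]) -> str:
    """Return 'secreted' when every signal-peptide predictor is positive and
    no trans-membrane predictor is positive; otherwise 'undetermined'."""
    if not signal_peptide_calls or not transmembrane_calls:
        return "undetermined"
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    return "undetermined"


# Both signal-peptide programs agree, and neither trans-membrane program fires.
print(infer_localization([True, True], [False, False]))
```

Any disagreement falls through to "undetermined" in this sketch; the application resolves some such cases by manual inspection, which is not modelled.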
Variant protein HUMTREFAC_PEA_2_P8 is encoded by the following transcript(s): HUMTREFAC_PEA_2_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTREFAC_PEA_2_T4 is shown in bold; this coding portion starts at position 278 and ends at position 400. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTREFAC_PEA_2_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
As noted above, cluster HUMTREFAC features 7 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMTREFAC_PEA_2_node_0 according to the present invention is supported by 188 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTREFAC_PEA_2_T4 and HUMTREFAC_PEA_2_T5. Table 11 below describes the starting and ending position of this segment on each transcript.
Table 11 - Segment location on transcripts
Segment cluster HUMTREFAC_PEA_2_node_9 according to the present invention is supported by 150 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTREFAC_PEA_2_T4 and HUMTREFAC_PEA_2_T5. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMTREFAC_PEA_2_node_2 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTREFAC_PEA_2_T4. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Segment cluster HUMTREFAC_PEA_2_node_3 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTREFAC_PEA_2_T4 and HUMTREFAC_PEA_2_T5. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
Segment cluster HUMTREFAC_PEA_2_node_4 according to the present invention is supported by 197 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTREFAC_PEA_2_T4 and HUMTREFAC_PEA_2_T5. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Segment cluster HUMTREFAC_PEA_2_node_5 according to the present invention is supported by 187 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTREFAC_PEA_2_T4 and HUMTREFAC_PEA_2_T5. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Segment cluster HUMTREFAC_PEA_2_node_8 according to the present invention can be found in the following transcript(s): HUMTREFAC_PEA_2_T4 and HUMTREFAC_PEA_2_T5. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Variant protein alignment to the previously known protein: Sequence name : TFF3_HUMAN
Sequence documentation:
Alignment of: HUMTREFAC_PEA_2_P8 x TFF3_HUMAN
Alignment segment 1/1:
Quality: 246.00
Escore: 0
Matching length: 27
Total length: 27
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps:
Alignment :
1 MAARALCMLGLVLALLSSSSAEEYVGL 27
1 MAARALCMLGLVLALLSSSSAEEYVGL 27
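The length and identity figures reported for this alignment segment can be reproduced with a short sketch. The definitions used are assumptions, since the application does not spell them out here: "matching length" is taken as the number of columns in which neither row has a gap, "total length" as the number of all columns, and a gap is written as a dot. The function name is hypothetical, and similarity scoring (which needs a substitution matrix) is not covered.

```python
def alignment_summary(top: str, bottom: str, gap: str = ".") -> dict:
    """Summarize two equal-length alignment rows (assumed definitions, see lead-in)."""
    if len(top) != len(bottom) or not top:
        raise ValueError("alignment rows must be non-empty and of equal length")
    aligned = [(a, b) for a, b in zip(top, bottom) if a != gap and b != gap]
    identical = sum(a == b for a, b in aligned)
    matching_identity = 100.0 * identical / len(aligned) if aligned else 0.0
    return {
        "matching_length": len(aligned),
        "total_length": len(top),
        "matching_percent_identity": matching_identity,
        "total_percent_identity": 100.0 * identical / len(top),
    }


# Both rows of the alignment segment shown above.
row = "MAARALCMLGLVLALLSSSSAEEYVGL"
print(alignment_summary(row, row))
```

Under these assumed definitions the two identical 27-residue rows give a matching length and total length of 27 with 100% identity on both denominators, in line with the values listed for the segment.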
DESCRIPTION FOR CLUSTER HUMOSTRO
Cluster HUMOSTRO features 3 transcript(s) and 30 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Osteopontin precursor (SwissProt accession identifier OSTP_HUMAN; known also according to the synonyms Bone sialoprotein 1; Urinary stone protein; Secreted phosphoprotein 1; SPP-1; Nephropontin; Uropontin), SEQ ID NO: 552, referred to herein as the previously known protein. Protein Osteopontin precursor is known or believed to have the following function(s): Binds tightly to hydroxyapatite. Appears to form an integral part of the mineralized matrix. Probably important to cell-matrix interaction; Acts as a cytokine involved in enhancing production of interferon-gamma and interleukin-12 and reducing production of interleukin-10 and is essential in the pathway that leads to type I immunity (By similarity). The sequence for protein Osteopontin precursor is given at the end of the application, as "Osteopontin precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Osteopontin precursor localization is believed to be Secreted.
The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Regeneration, bone. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Bone formation stimulant. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Musculoskeletal. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: ossification; anti-apoptosis; inflammatory response; cell-matrix adhesion; cell-cell signaling, which are annotation(s) related to Biological Process; defense/immunity protein; cytokine; integrin ligand; protein binding; growth factor; apoptosis inhibitor, which are annotation(s) related to Molecular Function; and extracellular matrix, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster HUMOSTRO can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 37 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
Overall, the following results were obtained as shown with regard to the histograms in Figure 37 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues, lung malignant tumors, breast malignant tumors, ovarian carcinoma and skin malignancies.
Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
As noted above, cluster HUMOSTRO features 3 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Osteopontin precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HUMOSTRO_PEA_1_PEA_1_P21 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMOSTRO_PEA_1_PEA_1_T14. An alignment is given to the known protein (Osteopontin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMOSTRO_PEA_1_PEA_1_P21 and OSTP_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMOSTRO_PEA_1_PEA_1_P21, comprising a first amino acid sequence being at least 90% homologous to MRIAVICFCLLGITCAIPVKQADSGSSEEKQLYNKYPDAVATWLNPDPSQKQNLLAPQ corresponding to amino acids 1 - 58 of OSTP_HUMAN, which also corresponds to amino acids 1 - 58 of HUMOSTRO_PEA_1_PEA_1_P21, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VFLNFS corresponding to amino acids 59 - 64 of HUMOSTRO_PEA_1_PEA_1_P21, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMOSTRO_PEA_1_PEA_1_P21, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VFLNFS in HUMOSTRO_PEA_1_PEA_1_P21.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because of manual inspection of known protein localization and/or gene structure. Variant protein HUMOSTRO_PEA_1_PEA_1_P21 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOSTRO_PEA_1_PEA_1_P21 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
The glycosylation sites of variant protein HUMOSTRO_PEA_1_PEA_1_P21, as compared to the known protein Osteopontin precursor, are described in Table 8 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 8 - Glycosylation site(s)
Variant protein HUMOSTRO_PEA_1_PEA_1_P21 is encoded by the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMOSTRO_PEA_1_PEA_1_T14 is shown in bold; this coding portion starts at position 199 and ends at position 390. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOSTRO_PEA_1_PEA_1_P21 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
Variant protein HUMOSTRO_PEA_1_PEA_1_P25 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMOSTRO_PEA_1_PEA_1_T16. An alignment is given to the known protein (Osteopontin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMOSTRO_PEA_1_PEA_1_P25 and OSTP_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMOSTRO_PEA_1_PEA_1_P25, comprising a first amino acid sequence being at least 90% homologous to MRIAVICFCLLGITCAIPVKQADSGSSEEKQ corresponding to amino acids 1 - 31 of OSTP_HUMAN, which also corresponds to amino acids 1 - 31 of HUMOSTRO_PEA_1_PEA_1_P25, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence H corresponding to amino acids 32 - 32 of HUMOSTRO_PEA_1_PEA_1_P25, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMOSTRO_PEA_1_PEA_1_P25 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOSTRO_PEA_1_PEA_1_P25 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Amino acid mutations
The glycosylation sites of variant protein HUMOSTRO_PEA_1_PEA_1_P25, as compared to the known protein Osteopontin precursor, are described in Table 11 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 11 - Glycosylation site(s)
Variant protein HUMOSTRO_PEA_1_PEA_1_P25 is encoded by the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T16, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMOSTRO_PEA_1_PEA_1_T16 is shown in bold; this coding portion starts at position 199 and ends at position 294. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOSTRO_PEA_1_PEA_1_P25 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Nucleic acid SNPs
Variant protein HUMOSTRO_PEA_1_PEA_1_P30 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMOSTRO_PEA_1_PEA_1_T30. An alignment is given to the known protein (Osteopontin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMOSTRO_PEA_1_PEA_1_P30 and OSTP_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMOSTRO_PEA_1_PEA_1_P30, comprising a first amino acid sequence being at least 90% homologous to MRIAVICFCLLGITCAIPVKQADSGSSEEKQ corresponding to amino acids 1 - 31 of OSTP_HUMAN, which also corresponds to amino acids 1 - 31 of HUMOSTRO_PEA_1_PEA_1_P30, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSIFYVFI corresponding to amino acids 32 - 39 of HUMOSTRO_PEA_1_PEA_1_P30, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMOSTRO_PEA_1_PEA_1_P30, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSIFYVFI in HUMOSTRO_PEA_1_PEA_1_P30.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMOSTRO_PEA_1_PEA_1_P30 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOSTRO_PEA_1_PEA_1_P30 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Amino acid mutations
The glycosylation sites of variant protein HUMOSTRO_PEA_1_PEA_1_P30, as compared to the known protein Osteopontin precursor, are described in Table 14 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 14 - Glycosylation site(s)
Variant protein HUMOSTRO_PEA_1_PEA_1_P30 is encoded by the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T30, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMOSTRO_PEA_1_PEA_1_T30 is shown in bold; this coding portion starts at position 199 and ends at position 315. The transcript also has the following SNPs as listed in Table 15 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOSTRO_PEA_1_PEA_1_P30 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 15 - Nucleic acid SNPs
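The coding-portion coordinates quoted for the three HUMOSTRO transcripts can be cross-checked against the variant lengths given in the comparison reports. The sketch below assumes that positions are 1-based and inclusive and that the quoted range does not include a stop codon; neither convention is stated in this section, and the dictionary layout and function name are hypothetical.

```python
# Coding range (start, end) per the text, and residue count from the
# comparison reports (head length + tail length).
VARIANTS = {
    "HUMOSTRO_PEA_1_PEA_1_P21": {"coding": (199, 390), "residues": 58 + 6},
    "HUMOSTRO_PEA_1_PEA_1_P25": {"coding": (199, 294), "residues": 31 + 1},
    "HUMOSTRO_PEA_1_PEA_1_P30": {"coding": (199, 315), "residues": 31 + 8},
}


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in a 1-based, inclusive coding range (assumed convention)."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("coding range is not a positive multiple of three")
    return length // 3


for name, info in VARIANTS.items():
    start, end = info["coding"]
    codons = codon_count(start, end)
    print(name, codons, info["residues"], codons == info["residues"])
```

Under these assumptions each coding range yields exactly as many codons as the variant has residues, which supports the repaired identifiers and coordinates being read consistently.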
As noted above, cluster HUMOSTRO features 30 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_0 according to the present invention is supported by 333 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14, HUMOSTRO_PEA_1_PEA_1_T16 and HUMOSTRO_PEA_1_PEA_1_T30. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_10 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T16. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_16 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_23 according to the present invention is supported by 334 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_31 according to the present invention is supported by 350 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_43 according to the present invention is supported by 192 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_3 according to the present invention is supported by 353 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14, HUMOSTRO_PEA_1_PEA_1_T16 and HUMOSTRO_PEA_1_PEA_1_T30. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_5 according to the present invention is supported by 353 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14, HUMOSTRO_PEA_1_PEA_1_T16 and HUMOSTRO_PEA_1_PEA_1_T30. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_7 according to the present invention is supported by 357 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14, HUMOSTRO_PEA_1_PEA_1_T16 and HUMOSTRO_PEA_1_PEA_1_T30. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_8 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T30. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_15 according to the present invention is supported by 366 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_17 according to the present invention is supported by 261 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_20 according to the present invention can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_21 according to the present invention is supported by 315 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_22 according to the present invention is supported by 322 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_24 according to the present invention is supported by 270 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_26 according to the present invention can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_27 according to the present invention is supported by 260 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_28 according to the present invention is supported by 273 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_29 according to the present invention is supported by 272 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_30 according to the present invention can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_32 according to the present invention is supported by 293 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_34 according to the present invention is supported by 301 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_36 according to the present invention is supported by 292 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_37 according to the present invention is supported by 295 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_38 according to the present invention can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_39 according to the present invention is supported by 268 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_40 according to the present invention can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 43 below describes the starting and ending position of this segment on each transcript. Table 43 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_41 according to the present invention can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 44 below describes the starting and ending position of this segment on each transcript. Table 44 - Segment location on transcripts
Segment cluster HUMOSTRO_PEA_1_PEA_1_node_42 according to the present invention is supported by 224 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 and HUMOSTRO_PEA_1_PEA_1_T16. Table 45 below describes the starting and ending position of this segment on each transcript. Table 45 - Segment location on transcripts
Variant protein alignment to the previously known protein: Sequence name: OSTP_HUMAN
Sequence documentation:
Alignment of: HUMOSTRO_PEA_1_PEA_1_P21 x OSTP_HUMAN Alignment segment 1/1:
Quality: 578.00 Escore: 0 Matching length: 58 Total length: 58 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps :
Alignment :
1 MRIAVICFCLLGITCAIPVKQADSGSSEEKQLYNKYPDAVATWLNPDPSQ 50
1 MRIAVICFCLLGITCAIPVKQADSGSSEEKQLYNKYPDAVATWLNPDPSQ 50
51 KQNLLAPQ 58
51 KQNLLAPQ 58
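The identity figures reported for this alignment can be reproduced with a short sketch. The function name `alignment_stats`, the use of `-` as the gap character, and the exact definitions of "matching length" and percent identity are assumptions made for illustration; the alignment program used in the application and its "Quality" score are not described here and are not modelled.

```python
def alignment_stats(query: str, subject: str, gap: str = "-") -> dict:
    """Return simple identity statistics for two equal-length aligned strings.

    Assumed definitions (not taken from the application):
      total_length    = number of alignment columns
      gaps            = number of columns holding a gap in either sequence
      matching_length = total_length - gaps
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    total = len(query)
    gaps = sum(1 for q, s in zip(query, subject) if q == gap or s == gap)
    matches = sum(1 for q, s in zip(query, subject) if q == s and q != gap)
    matching_length = total - gaps
    return {
        "total_length": total,
        "matching_length": matching_length,
        "gaps": gaps,
        "matching_percent_identity": (
            100.0 * matches / matching_length if matching_length else 0.0
        ),
        "total_percent_identity": 100.0 * matches / total if total else 0.0,
    }


# The two aligned rows shown above (HUMOSTRO_PEA_1_PEA_1_P21 and OSTP_HUMAN).
variant = "MRIAVICFCLLGITCAIPVKQADSGSSEEKQLYNKYPDAVATWLNPDPSQ" + "KQNLLAPQ"
known = "MRIAVICFCLLGITCAIPVKQADSGSSEEKQLYNKYPDAVATWLNPDPSQ" + "KQNLLAPQ"

stats = alignment_stats(variant, known)
print(stats["total_length"], stats["matching_percent_identity"])  # 58 100.0
```

Under these assumed definitions the sketch gives a total length of 58 and 100% identity, in agreement with the values listed for this alignment.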
Sequence name: OSTP_HUMAN
Sequence documentation:
Alignment of: HUMOSTRO_PEA_1_PEA_1_P25 x OSTP_HUMAN
Alignment segment 1/1:
Quality: 301.00 Escore: 0 Matching length: 31 Total length: 31 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
1 MRIAVICFCLLGITCAIPVKQADSGSSEEKQ 31
1 MRIAVICFCLLGITCAIPVKQADSGSSEEKQ 31
Sequence name: OSTP_HUMAN
Sequence documentation:
Alignment of: HUMOSTRO_PEA_1_PEA_1_P30 x OSTP_HUMAN Alignment segment 1/1:
Quality: 301.00 Escore: 0 Matching length: 31 Total length: 31 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps :
Alignment :
1 MRIAVICFCLLGITCAIPVKQADSGSSEEKQ 31
1 MRIAVICFCLLGITCAIPVKQADSGSSEEKQ 31
DESCRIPTION FOR CLUSTER R11723
Cluster R11723 features 6 transcript(s) and 26 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Cluster R11723 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the right hand column of the table and the numbers on the y-axis of Figure 38 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
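The "parts per million" figure defined above is a simple ratio, which the following sketch illustrates. The function name `expression_ppm` and the example counts are hypothetical; any additional weighting applied to the EST counts is described elsewhere in the application and is not modelled here.

```python
def expression_ppm(cluster_ests: int, category_ests: int) -> float:
    """Express a cluster's EST count as parts per million of all ESTs
    in the same tissue or pathology category."""
    if category_ests <= 0:
        raise ValueError("category must contain at least one EST")
    if cluster_ests < 0:
        raise ValueError("EST counts cannot be negative")
    # Multiply before dividing so round numbers stay exact.
    return cluster_ests * 1_000_000 / category_ests


# Hypothetical counts: 12 ESTs from the cluster among 400,000 ESTs in a category.
example_ppm = expression_ppm(12, 400_000)
print(example_ppm)  # 30.0
```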
Overall, the following results were obtained as shown with regard to the histograms in Figure 38 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues and kidney malignant tumors.
Table 4 - Normal tissue distribution
Table 5 - P values and ratios for expression in cancerous tissue
As noted above, cluster R11723 features 6 transcript(s), which were listed in Table 1 above. A description of each variant protein according to the present invention is now provided.
Variant protein R11723_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T6. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein R11723_PEA_1_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Amino acid mutations
Variant protein R11723_PEA_1_P2 is encoded by the following transcript(s): R11723_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T6 is shown in bold; this coding portion starts at position 1716 and ends at position 2051. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Nucleic acid SNPs
Variant protein R11723_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T15. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between R11723_PEA_1_P6 and Q8IXM0 (SEQ ID NO:885):
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALHKAGAVGGGVR corresponding to amino acids 1 - 110 of R11723_PEA_1_P6, and a second amino acid sequence being at least 90% homologous to MYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 1 - 112 of Q8IXM0, which also corresponds to amino acids 111 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALHKAGAVGGGVR of R11723_PEA_1_P6.
Comparison report between R11723_PEA_1_P6 and Q96AC2 (SEQ ID NO:886):
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 1 - 83 of Q96AC2, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
Comparison report between R11723_PEA_1_P6 and Q8N2G4 (SEQ ID NO:887):
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 1 - 83 of Q8N2G4, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
Comparison report between R11723_PEA_1_P6 and BAC85518 (SEQ ID NO:888):
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 24 - 106 of BAC85518, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein R11723_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Amino acid mutations
Variant protein R11723_PEA_1_P6 is encoded by the following transcript(s): R11723_PEA_1_T15, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T15 is shown in bold; this coding portion starts at position 434 and ends at position 1099. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Nucleic acid SNPs
Variant protein R11723_PEA_1_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T17. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between R11723_PEA_1_P7 and Q96AC2:
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 1 - 64 of Q96AC2, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
Comparison report between R11723_PEA_1_P7 and Q8N2G4:
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 1 - 64 of Q8N2G4, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
Comparison report between R11723_PEA_1_P7 and BAC85273:
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG corresponding to amino acids 1 - 5 of R11723_PEA_1_P7, a second amino acid sequence being at least 90% homologous to IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 22 - 80 of BAC85273, which also corresponds to amino acids 6 - 64 of R11723_PEA_1_P7, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG of R11723_PEA_1_P7.
3. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
Comparison report between R11723_PEA_1_P7 and BAC85518:
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 24 - 87 of BAC85518, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
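The "contiguous and in a sequential order" language of these comparison reports can be checked mechanically. The sketch below joins the two segments quoted for the BAC85518 comparison and maps variant coordinates onto the known protein. The names `Segment`, `assemble` and `to_known_position` are hypothetical helpers introduced for illustration only, and the sequences are written without the line-break spaces that appear in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    sequence: str
    start: int  # 1-based, inclusive position on the variant protein
    end: int    # 1-based, inclusive position on the variant protein


def assemble(segments: list[Segment]) -> str:
    """Join segments, checking that they are contiguous, ordered, and that
    each stated coordinate range matches the segment length."""
    expected_start = 1
    parts = []
    for seg in segments:
        if seg.start != expected_start:
            raise ValueError(f"segment starting at {seg.start} is not contiguous")
        if seg.end - seg.start + 1 != len(seg.sequence):
            raise ValueError(f"range {seg.start}-{seg.end} does not match length")
        parts.append(seg.sequence)
        expected_start = seg.end + 1
    return "".join(parts)


def to_known_position(variant_pos: int, variant_start: int, known_start: int) -> int:
    """Map a position inside a shared segment from variant to known-protein numbering."""
    return variant_pos - variant_start + known_start


# Segments of R11723_PEA_1_P7 as quoted in the comparison with BAC85518.
SHARED = "MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG"
TAIL = "SHCVTRLECSGTISAHCNLCLPGSNDHPT"

p7 = assemble([Segment(SHARED, 1, 64), Segment(TAIL, 65, 93)])
print(len(p7))                       # 93
print(to_known_position(1, 1, 24))   # 24
print(to_known_position(64, 1, 24))  # 87
```

The mapped end position, 87, agrees with the stated range of amino acids 24 - 87 of BAC85518.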
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein R11723_PEA_1_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Amino acid mutations
Variant protein R11723_PEA_1_P7 is encoded by the following transcript(s): R11723_PEA_1_T17, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T17 is shown in bold; this coding portion starts at position 434 and ends at position 712. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Nucleic acid SNPs
Variant protein R11723_PEA_1_P13 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T19. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between R11723_PEA_1_P13 and Q96AC2:
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P13, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DTKRTNTLLFEMRHFAKQLTT corresponding to amino acids 64 - 84 of R11723_PEA_1_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P13, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DTKRTNTLLFEMRHFAKQLTT in R11723_PEA_1_P13.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted.
The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
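The reasoning stated above amounts to a simple decision rule, sketched below. The function name `predict_localization` is hypothetical, the predictor outputs are reduced to booleans, and the "undetermined" fallback is an assumption, since this passage only describes the secreted case.

```python
def predict_localization(signal_peptide_calls: list[bool],
                         transmembrane_calls: list[bool]) -> str:
    """Call a protein 'secreted' when every signal-peptide predictor reports
    a signal peptide and no trans-membrane predictor reports a trans-membrane
    region. Any other combination is returned as 'undetermined' here."""
    if not signal_peptide_calls or not transmembrane_calls:
        raise ValueError("at least one prediction of each kind is required")
    has_signal_peptide = all(signal_peptide_calls)
    has_transmembrane_region = any(transmembrane_calls)
    if has_signal_peptide and not has_transmembrane_region:
        return "secreted"
    return "undetermined"


# Two signal-peptide programs agree, and neither trans-membrane program fires.
print(predict_localization([True, True], [False, False]))  # secreted
```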
Variant protein R11723_PEA_1_P13 is encoded by the following transcript(s): R11723_PEA_1_T19 and R11723_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T19 is shown in bold; this coding portion starts at position 434 and ends at position 685. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Nucleic acid SNPs
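The coding-portion coordinates quoted for the R11723 transcripts can be sanity-checked by converting each inclusive nucleotide range into a codon count. This is a minimal sketch; the helper name `coding_portion_codons` is hypothetical, and whether a quoted range includes a stop codon is not stated in the text.

```python
def coding_portion_codons(start: int, end: int) -> int:
    """Number of whole codons in an inclusive, 1-based nucleotide range."""
    length = end - start + 1
    if length <= 0:
        raise ValueError("end must not precede start")
    if length % 3 != 0:
        raise ValueError("coding portion is not a whole number of codons")
    return length // 3


# Start and end positions as quoted for each transcript.
CODING_PORTIONS = {
    "R11723_PEA_1_T6": (1716, 2051),
    "R11723_PEA_1_T15": (434, 1099),
    "R11723_PEA_1_T17": (434, 712),
    "R11723_PEA_1_T19": (434, 685),
}

codon_counts = {
    name: coding_portion_codons(start, end)
    for name, (start, end) in CODING_PORTIONS.items()
}
print(codon_counts)
```

The counts for T15, T17 and T19 come out at 222, 93 and 84 codons, which match the last amino acid positions cited in the comparison reports for P6, P7 and P13. This suggests the quoted ranges exclude a stop codon, although that reading is an inference rather than something the text states.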
Variant protein R11723_PEA_1_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T20. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between R11723_PEA_1_P10 and Q96AC2:
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
Comparison report between RI 1723JPEA_1 JP10 and Q8N2G4: l.An isolated chimeric polypeptide encoding for RI 1723JPEA_1 JP10, comprising a first amino acid sequence being at least 90 % homologous to
M LGIAATFCGLFLLPGFALQIQCYQCEEFQLNNϋCSSPEFIVNCTVNVQDMCQKEV MEQSA corresponding to amino acids 1 - 63 of Q8Ν2G4, which also coπesponds to amino acids 1 - 63 of RI 1723JPEA_1 JP10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
DRVSLCHEAGVQWNNFSTLQPLPPRLK coπesponding to amino acids 64 - 90 of
RI 1723J?EA_1 JP10, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2.An isolated polypeptide encoding for a tail of RI 1723JPEA_1 JP10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95?/o homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in RI 1723JPEA_1 JP10.
Comparison report between RI 1723J?EA_1 JP10 and BAC85273: 1.An isolated chimeric polypeptide encoding for RI 1723 JPEA_1 JP10, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWNLG coπesponding to amino acids 1 - 5 of RI 1723JPEA_1 JP10, second amino acid sequence being at least 90 % homologous to IAATFCGLFLLPGFALQIQCYQCEEFQLΝΝDCSSPEFI VNCTVNVQDMCQKEVMEQSA coπesponding to amino acids 22 - 79 of BAC85273, which also coπesponds to amino acids 6 - 63 of RI 1723JPEA_1 JP10, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK coπesponding to amino acids 64 - 90 of
RI 1723JPEA_1 JP10, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. 2.An isolated polypeptide encoding for a head of RI 1723JPEA_1 JP10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG of RI 1723JPEA_1 JP10. 3. An isolated polypeptide encoding for a tail of RI 1723 JPE A_l JP10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in RI 1723J?EA_1 JP10.
Comparison report between R11723JPEA_1JP10 and BAC85 18: l.An isolated chimeric polypeptide encoding for RI 1723JPEA_1JP10, comprising a first amino acid sequence being at least 90 % homologous to M VLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFTVNCTVNVQDMCQKEV MEQSA coπesponding to amino acids 24 - 86 of BAC85518, which also coπesponds to amino acids 1 - 63 of RI 1723JPEA_1 JP10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK coπesponding to amino acids 64 - 90 of RI 1723JPEA_1 JP10, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2.An isolated polypeptide encoding for a tail of R11723JPEA_1JP10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in RI 1723JPEA_1 JP10.
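The residue coordinates quoted in these comparison reports can be cross-checked mechanically. The following is a minimal Python sketch and is not part of the original disclosure: the helper name `segment` and the constant names are hypothetical, and the sequences are the 63-residue shared region and the 27-residue tail as they appear in the reports and in the alignments at the end of the application.

```python
# Sequences as quoted in the comparison reports for R11723_PEA_1_P10.
SHARED_REGION = "MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA"
UNIQUE_TAIL = "DRVSLCHEAGVQWNNFSTLQPLPPRLK"


def segment(seq: str, start: int, end: int) -> str:
    """Return residues start..end, using the 1-based inclusive numbering of the reports."""
    return seq[start - 1:end]


# The chimeric polypeptide is the shared region followed directly by the tail.
P10 = SHARED_REGION + UNIQUE_TAIL

print(len(P10))                               # total number of residues
print(segment(P10, 1, 5))                     # head used in the BAC85273 report
print(segment(P10, 64, 90) == UNIQUE_TAIL)    # tail occupies residues 64 - 90
```

The check only confirms that the stated ranges (1 - 63, 64 - 90, and 1 - 5 plus 6 - 63) are mutually consistent; it says nothing about the homology thresholds themselves.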
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein R11723_PEA_1_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Amino acid mutations
Variant protein R11723_PEA_1_P10 is encoded by the following transcript(s): R11723_PEA_1_T20, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T20 is shown in bold; this coding portion starts at position 434 and ends at position 703. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Nucleic acid SNPs
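As a quick consistency check on the stated coding coordinates, the span length and codon count can be computed directly. This is a small Python sketch under the assumption that the positions are 1-based and inclusive (the text does not say so explicitly); the function name `coding_span` is hypothetical.

```python
def coding_span(start: int, end: int) -> tuple[int, int, int]:
    """Return (nucleotides, whole codons, leftover nucleotides) for an inclusive span."""
    length = end - start + 1
    codons, remainder = divmod(length, 3)
    return length, codons, remainder


# Coding portion of R11723_PEA_1_T20 (variant P10): positions 434 to 703.
print(coding_span(434, 703))
# Coding portion of R11723_PEA_1_T19 (variant P13): positions 434 to 685.
print(coding_span(434, 685))
```

Under that assumption the T20 span covers 90 whole codons, which agrees with the 90 amino acids attributed to R11723_PEA_1_P10 in the comparison reports; whether the span includes a stop codon is not stated in the text.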
As noted above, cluster R11723 features 26 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster R11723_PEA_1_node_13 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T19, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_16 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T17, R11723_PEA_1_T19 and R11723_PEA_1_T20. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_19 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_2 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_22 according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_31 according to the present invention is supported by 70 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 20 below describes the starting and ending position of this segment on each transcript (it should be noted that these transcripts show alternative polyadenylation). Table 20 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster R11723_PEA_1_node_10 according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_11 according to the present invention is supported by 42 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_15 according to the present invention can be found in the following transcript(s): R11723_PEA_1_T20. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_18 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_20 according to the present invention can be found in the following transcript(s): R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_21 according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_23 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_24 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_25 according to the present invention is supported by 54 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_26 according to the present invention is supported by 62 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_27 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_28 according to the present invention can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_29 according to the present invention is supported by 69 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_3 according to the present invention can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_30 according to the present invention can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_4 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_5 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_6 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_7 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Segment cluster R11723_PEA_1_node_8 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T6. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
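The segment descriptions above are organised per segment; for some uses it is more convenient to look up which segments a given transcript contains. The following Python sketch, which is illustrative only and not part of the original disclosure, inverts a small subset of the segment-to-transcript relationships stated above. The variable and function names are hypothetical, and only five of the 26 segments are included.

```python
from collections import defaultdict

# Subset of the segment -> transcript relationships stated in the text.
SEGMENT_TO_TRANSCRIPTS = {
    "R11723_PEA_1_node_13": ["R11723_PEA_1_T19", "R11723_PEA_1_T5", "R11723_PEA_1_T6"],
    "R11723_PEA_1_node_16": ["R11723_PEA_1_T17", "R11723_PEA_1_T19", "R11723_PEA_1_T20"],
    "R11723_PEA_1_node_19": ["R11723_PEA_1_T5", "R11723_PEA_1_T6"],
    "R11723_PEA_1_node_15": ["R11723_PEA_1_T20"],
    "R11723_PEA_1_node_8": ["R11723_PEA_1_T6"],
}


def segments_by_transcript(segment_map: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert a segment -> transcripts mapping into transcript -> sorted segments."""
    index: dict[str, list[str]] = defaultdict(list)
    for segment_name, transcripts in segment_map.items():
        for transcript in transcripts:
            index[transcript].append(segment_name)
    return {transcript: sorted(names) for transcript, names in index.items()}


by_transcript = segments_by_transcript(SEGMENT_TO_TRANSCRIPTS)
print(by_transcript["R11723_PEA_1_T20"])
```

Within this subset, node_15 appears only in transcript T20 and node_8 only in T6, which is the kind of transcript-specific segment that a discriminating amplicon could target.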
Variant protein alignment to the previously known protein:
Sequence name: /tmp/gp6eQTLWqk/mFtjUpUzhb:Q8IXM0
Sequence documentation:
Alignment of: R11723_PEA_1_P6 x Q8IXM0
Alignment segment 1/1:
Quality: 1128.00
Escore: 0
Matching length: 112
Total length: 112
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
111 MYAQALLWGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLRE 160
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MYAQALLWGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLRE 50
161 GEEDHVRPEVGPRPWLGFGRSHDPPNLVGHPAYGQCHNNQP ADTSRRE 210
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 GEEDHVRPEVGPRPWLGFGRSHDPPNLVGHPAYGQCHNNQP ADTSRRE 100
211 RQRKEKHSMRTQ 222
    ||||||||||||
101 RQRKEKHSMRTQ 112
Sequence name: /tmp/gp6eQTLWqk/mFtjUpUzhb:Q96AC2
Sequence documentation:
Alignment of: R11723_PEA_1_P6 x Q96AC2
Alignment segment 1/1:
Quality: 835.00
Escore: 0
Matching length: 83
Total length: 83
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
   ||||||||||||||||||||||||||||||||||||||||||||||||||
 1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
51 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 83
   |||||||||||||||||||||||||||||||||
51 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 83
Sequence name: /tmp/gp6eQTLWqk/mFtjUpUzhb:Q8N2G4
Sequence documentation:
Alignment of: R11723_PEA_1_P6 x Q8N2G4
Alignment segment 1/1:
Quality: 835.00
Escore: 0
Matching length: 83
Total length: 83
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
   ||||||||||||||||||||||||||||||||||||||||||||||||||
 1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
51 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 83
   |||||||||||||||||||||||||||||||||
51 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 83
Sequence name: /tmp/gp6eQTLWqk/mFtjUpUzhb:BAC85518
Sequence documentation:
Alignment of: R11723_PEA_1_P6 x BAC85518
Alignment segment 1/1:
Quality: 835.00
Escore: 0
Matching length: 83
Total length: 83
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
   ||||||||||||||||||||||||||||||||||||||||||||||||||
24 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 73
51 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 83
   |||||||||||||||||||||||||||||||||
74 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 106
Sequence name: /tmp/VXjdFlzdBX/bexTxThOTh:Q96AC2
Sequence documentation:
Alignment of: R11723_PEA_1_P7 x Q96AC2
Alignment segment 1/1:
Quality: 654.00
Escore: 0
Matching length: 64
Total length: 64
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
   ||||||||||||||||||||||||||||||||||||||||||||||||||
 1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
51 QDMCQKEVMEQSAG 64
   ||||||||||||||
51 QDMCQKEVMEQSAG 64
Sequence name: /tmp/VXjdFlzdBX/bexTxThOTh:Q8N2G4
Sequence documentation:
Alignment of: R11723_PEA_1_P7 x Q8N2G4
Alignment segment 1/1:
Quality: 654.00
Escore: 0
Matching length: 64
Total length: 64
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
   ||||||||||||||||||||||||||||||||||||||||||||||||||
 1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
51 QDMCQKEVMEQSAG 64
   ||||||||||||||
51 QDMCQKEVMEQSAG 64
Sequence name: /tmp/VXjdFlzdBX/bexTxThOTh:BAC85273
Sequence documentation:
Alignment of: R11723_PEA_1_P7 x BAC85273
Alignment segment 1/1:
Quality: 600.00
Escore: 0
Matching length: 59
Total length: 59
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 6 IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQ 55
   ||||||||||||||||||||||||||||||||||||||||||||||||||
22 IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQ 71
56 KEVMEQSAG 64
   |||||||||
72 KEVMEQSAG 80
Sequence name: /tmp/VXjdFlzdBX/bexTxThOTh:BAC85518
Sequence documentation:
Alignment of: R11723_PEA_1_P7 x BAC85518
Alignment segment 1/1:
Quality: 654.00
Escore: 0
Matching length: 64
Total length: 64
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
   ||||||||||||||||||||||||||||||||||||||||||||||||||
24 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 73
51 QDMCQKEVMEQSAG 64
   ||||||||||||||
74 QDMCQKEVMEQSAG 87
Sequence name: /tmp/OLMSexEmIh/pc7Z7XmlYR:Q96AC2
Sequence documentation:
Alignment of: R11723_PEA_1_P10 x Q96AC2
Alignment segment 1/1:
Quality: 645.00
Escore: 0
Matching length: 63
Total length: 63
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
   ||||||||||||||||||||||||||||||||||||||||||||||||||
 1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
51 QDMCQKEVMEQSA 63
   |||||||||||||
51 QDMCQKEVMEQSA 63
Sequence name: /tmp/OLMSexEmIh/pc7Z7XmlYR:Q8N2G4
Sequence documentation:
Alignment of: R11723_PEA_1_P10 x Q8N2G4
Alignment segment 1/1:
Quality: 645.00
Escore: 0
Matching length: 63
Total length: 63
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
   ||||||||||||||||||||||||||||||||||||||||||||||||||
 1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
51 QDMCQKEVMEQSA 63
   |||||||||||||
51 QDMCQKEVMEQSA 63
Sequence name: /tmp/OLMSexEmIh/pc7Z7XmlYR:BAC85273
Sequence documentation:
Alignment of: R11723_PEA_1_P10 x BAC85273
Alignment segment 1/1:
Quality: 591.00
Escore: 0
Matching length: 58
Total length: 58
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 6 IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQ 55
   ||||||||||||||||||||||||||||||||||||||||||||||||||
22 IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQ 71
56 KEVMEQSA 63
   ||||||||
72 KEVMEQSA 79
Sequence name: /tmp/OLMSexEmIh/pc7Z7XmlYR:BAC85518
Sequence documentation:
Alignment of: R11723_PEA_1_P10 x BAC85518
Alignment segment 1/1:
Quality: 645.00
Escore: 0
Matching length: 63
Total length: 63
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
   ||||||||||||||||||||||||||||||||||||||||||||||||||
24 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 73
51 QDMCQKEVMEQSA 63
   |||||||||||||
74 QDMCQKEVMEQSA 86
Alignment of: R11723_PEA_1_P13 x Q96AC2
Alignment segment 1/1:
Quality: 645.00
Escore: 0
Matching length: 63
Total length: 63
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
   ||||||||||||||||||||||||||||||||||||||||||||||||||
 1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
51 QDMCQKEVMEQSA 63
   |||||||||||||
51 QDMCQKEVMEQSA 63
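For readers who want to reproduce the identity figures reported with these alignments, the following Python sketch computes a position-wise percent identity for two ungapped rows of equal length. It is an illustration only: the exact definitions used by the alignment program for "Matching Percent Identity" and "Quality" are not given in the text, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percentage of positions at which two equal-length, ungapped rows agree."""
    if not row_a or len(row_a) != len(row_b):
        raise ValueError("rows must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(row_a, row_b))
    return 100.0 * matches / len(row_a)


# Residues 1 - 63 shared by R11723_PEA_1_P10 and Q96AC2, as shown in the alignment.
shared = "MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA"

print(percent_identity(shared, shared))   # identical rows
print(percent_identity("ABCD", "ABCE"))   # toy example with one mismatch in four
```

A gapped alignment would need an explicit rule for how gap columns are counted, which is one reason the reports distinguish "Matching" from "Total" percentages.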
Expression of R11723 transcripts which are detectable by amplicon as depicted in sequence name R11723 seg13 in normal and cancerous breast tissues
Expression of transcripts detectable by or according to seg13, R11723 seg13 amplicon(s) and R11723 seg13F and R11723 seg13R primers was measured by real time PCR. It should be noted that the variants of this cluster are variants of the hypothetical protein PSEC0181 (referred to herein as "PSEC"). In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and G6PD (GenBank Accession No. NM_000402; G6PD amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 56-60, 63-67, Table 1, "Tissue samples in testing panel" above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 39 is a histogram showing over-expression of the above-indicated transcripts in cancerous breast samples relative to the normal samples. As is evident from Figure 39, the expression of transcripts detectable by the above amplicon(s) in cancer samples was higher than in the non-cancerous samples (Sample Nos. 56-60, 63-67, Table 1, "Tissue samples in testing panel"). Notably an over-expression of at least 5 fold was found in 5 out of 28 adenocarcinoma samples.
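The normalization described above (division by the geometric mean of the housekeeping gene quantities, then division by the median of the normal samples) can be sketched as follows. This Python example is illustrative only: the sample names and all quantities are invented for demonstration and do not come from the application, and the function names are hypothetical.

```python
from statistics import geometric_mean, median


def normalized_quantity(target_qty: float, housekeeping_qtys: list[float]) -> float:
    """Normalize a target amplicon quantity to the geometric mean of housekeeping genes."""
    return target_qty / geometric_mean(housekeeping_qtys)


def fold_upregulation(normalized: dict[str, float], reference_samples: list[str]) -> dict[str, float]:
    """Divide every normalized quantity by the median of the reference (normal) samples."""
    reference = median(normalized[name] for name in reference_samples)
    return {name: qty / reference for name, qty in normalized.items()}


# Hypothetical data: (target quantity, four housekeeping gene quantities) per RT sample.
raw = {
    "normal_1": (1.0, [1.0, 2.0, 2.0, 4.0]),
    "normal_2": (2.0, [2.0, 2.0, 2.0, 2.0]),
    "normal_3": (6.0, [2.0, 4.0, 4.0, 8.0]),
    "tumor_1": (20.0, [1.0, 4.0, 4.0, 16.0]),
}

normalized = {name: normalized_quantity(t, hk) for name, (t, hk) in raw.items()}
folds = fold_upregulation(normalized, ["normal_1", "normal_2", "normal_3"])
for name, fold in folds.items():
    print(f"{name}: {fold:.2f}")
```

Using the geometric mean keeps one unusually abundant housekeeping gene from dominating the normalization factor, and using the median of the normal samples makes the reference robust to a single outlying normal sample.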
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: R11723 seg13F forward primer; and R11723 seg13R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: R11723 seg13.
R11723 seg13F (SEQ ID NO:889) - ACACTAAAAGAACAAACACCTTGCTC
R11723 seg13R (SEQ ID NO:890) - TCCTCAGAAGGCACATGAAAGA
R11723 seg13 amplicon (SEQ ID NO:891): ACACTAAAAGAACAAACACCTTGCTCTTCGAGATGAGACATTTTGCCAAGCAGTTGACCACTTAGTTCTCAAGAAGCAACTATCTCTTTCATGTGCCTTCTGAGGA
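A simple sanity check on a primer pair and its amplicon is that the amplicon begins with the forward primer and ends with the reverse complement of the reverse primer. The Python sketch below applies that check to the seg13 sequences listed above; it is an illustration rather than part of the original disclosure, and the function names are hypothetical.

```python
# Complement table for unambiguous DNA bases only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def primers_flank_amplicon(forward: str, reverse: str, amplicon: str) -> bool:
    """True if the amplicon starts with the forward primer and ends with the
    reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(reverse_complement(reverse))


SEG13_F = "ACACTAAAAGAACAAACACCTTGCTC"
SEG13_R = "TCCTCAGAAGGCACATGAAAGA"
SEG13_AMPLICON = (
    "ACACTAAAAGAACAAACACCTTGCTCTTCGAGATGAGACATTTTGCCAAGCAGTTG"
    "ACCACTTAGTTCTCAAGAAGCAACTATCTCTTTCATGTGCCTTCTGAGGA"
)

print(primers_flank_amplicon(SEG13_F, SEG13_R, SEG13_AMPLICON))
```

The same helper can be applied to any other primer pair and amplicon, provided the sequences contain only the four unambiguous bases.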
Expression of R11723 transcripts which are detectable by amplicon as depicted in sequence name R11723 seg13 in different normal tissues
Expression of R11723 transcripts detectable by or according to R11723 seg13 amplicon and R11723 seg13F and R11723 seg13R primers was measured by real time PCR. In parallel the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), UBC (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 2, "Tissue samples in normal panel" above), to obtain a value of expression for each sample relative to the median of the ovary samples. Primers and amplicon are as above. The results are presented in Figure 40, demonstrating the expression of R11723 transcripts which are detectable by amplicon as depicted in sequence name R11723 seg13 in different normal tissues.
Expression of R11723 transcripts, which are detectable by amplicon as depicted in sequence name R11723 junc11-18 in normal and cancerous breast tissues
Expression of transcripts detectable by or according to junc11-18, R11723 junc11-18 amplicon(s) and R11723 junc11-18F and R11723 junc11-18R primers was measured by real time PCR (this junction and hence the amplicon are found in the previously known protein, also termed the "wild type" or WT protein, for which the sequence is given herein; the protein is also called "PSEC"). Use of the known protein (WT protein) for detection of breast cancer, alone or in combination with one or more variants of this cluster and/or of any other cluster and/or of any known marker, also comprises an embodiment of the present invention. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) and G6PD (GenBank Accession No. NM_000402; G6PD amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 56-60, 63-67, Table 1: Tissue samples in testing panel, above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 41A is a histogram showing over-expression of the above-indicated transcripts in cancerous breast samples relative to the normal samples. As is evident from Figure 41A, the expression of transcripts detectable by the above amplicon in a few cancer samples was higher than in the non-cancerous samples (Sample Nos. 56-60, 63-67, Table 5: "Tissue samples in breast cancer testing panel"). Notably an over-expression of at least 5 fold was found in 5 out of 28 adenocarcinoma samples.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: R11723 junc11-18F forward primer; and R11723 junc11-18R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: R11723 junc11-18.
R11723 junc11-18F (SEQ ID NO:892) - AGTGATGGAGCAAAGTGCCG
R11723 junc11-18R (SEQ ID NO:893) - CAGCAGCTGATGCAAACTGAG
R11723 junc11-18 (SEQ ID NO:894) - AGTGATGGAGCAAAGTGCCGGGATCATGTACCGCAAGTCCTGTGCATCATCAGCGGCCTGTCTCATCGCCTCTGCCGGGTACCAGTCCTTCTGCTCCCCAGGGAAACTGAACTCAGTTTGCATCAGCTGCTG
Expression of R11723 transcripts, which were detected by amplicon as depicted in the sequence name R11723 junc11-18 in different normal tissues
Expression of R11723 transcripts detectable by or according to R11723 junc11-18 amplicon and R11723 junc11-18F and R11723 junc11-18R primers was measured by real time PCR (as described above, this junction and hence the amplicon are found in the previously known protein, also termed the "wild type" or WT protein, for which the sequence is given herein; the protein is also called "PSEC"). In parallel the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), UBC (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 2: Tissue samples in normal panel, above), to obtain a value of expression for each sample relative to the median of the ovary samples. Figure 41B shows the level of expression of this transcript. Primers and amplicon are as for the example above.
The variant transcript expression pattern for this cluster is similar to the wild type transcript expression. However, in some cases (e.g. ovary cancer) over-expression of the variant seems to be higher (for example, with regard to R11723_PEA_1_T5).
DESCRIPTION FOR CLUSTER T46984
Cluster T46984 features 21 transcript(s) and 49 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor (SwissProt accession identifier RIB2_HUMAN; known also according to the synonyms EC 2.4.1.119; Ribophorin II; RPN-II; RIBIIR), SEQ ID NO: 663, referred to herein as the previously known protein. Protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor is known or believed to have the following function(s): Essential subunit of N-oligosaccharyl transferase enzyme which catalyzes the transfer of a high mannose oligosaccharide from a lipid-linked oligosaccharide donor to an asparagine residue within an Asn-X-Ser/Thr consensus motif in nascent polypeptide chains. The sequence for protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor is given at the end of the application, as "Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor localization is believed to be: Type I membrane protein; endoplasmic reticulum. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: protein modification, which are annotation(s) related to Biological Process; oligosaccharyl transferase; dolichyl-diphosphooligosaccharide-protein glycosyltransferase; transferase, which are annotation(s) related to Molecular Function; and oligosaccharyl transferase complex; integral membrane protein, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>. Cluster T46984 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 42 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
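The "parts per million" figure described above is, at its simplest, a ratio scaled by one million. The sketch below shows only that arithmetic; the function name and the counts in the usage line are hypothetical, and the weighting that the text applies to EST expression is not specified here and is therefore not modeled.

```python
# Minimal sketch of an EST "parts per million" value for one tissue category.
# Function name is hypothetical; any weighting scheme is deliberately omitted.

def ests_per_million(cluster_ests: int, category_ests: int) -> float:
    """Ratio of ESTs assigned to a cluster to all ESTs in the same
    category, expressed in parts per million."""
    if category_ests <= 0:
        raise ValueError("category_ests must be positive")
    return cluster_ests * 1_000_000 / category_ests


# Hypothetical counts: 5 cluster ESTs among 100,000 ESTs in the category.
print(ests_per_million(5, 100_000))
```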
Overall, the following results were obtained as shown with regard to the histograms in Figure 42 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues, breast malignant tumors, ovarian carcinoma and pancreas carcinoma.
Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
1 uterus 2.3e-01 1.3e-01 2.2e-02 1.5 5.0e-02 1.4
As noted above, cluster T46984 features 21 transcript(s), which were listed in Table 1 above. These transcript(s) encode protein(s) which are variant(s) of protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor. A description of each variant protein according to the present invention is now provided.
Variant protein T46984_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T2. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T46984_PEA_1_P2 and RIB2_HUMAN:
1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTWLLALTIIASTWALTPTHYLTKHDNERLKASLDRPFTΝLESAFYSIVGLSSL GAQWDAKKACTYIRSΝLDPSΝNDSLFYAAQASQALSGCEISISΝETKDLLLAAVSEDSS VTQ1YHAVAALSGFGLPLASQEALSALTARLSK ETNLATVQALQTASHLSQQADLRSI EEIEDLNARLDELGGNYLQFEEGLETTALFVAAT T ΝlDHVGTEPSIKEDQNIQLMΝA IFSKKNFESLSEAFSVASAAANLSFINR ΗWNNNΛ EGSASDTHEQAILrLQVT,NNLSQ PLTQATVKLEHAXSVASRATVLQKTSFTPVGD ELΝF 'IΝNKFSSGYYDFLNEVEGDΝ RYIAΝTNELRNI<-ISTEVGITΝNDLST\ΦKI)QSIAPKTTRVT^ FFQLNDWTGAELTTHQTFNRLHNQKTGQE r^AEPDNKj YTO^ ASGTYTLYLIIGDATLKNPILWNN corresponding to amino acids 1 - 498 of RIB2_HUMAN, which also corresponds to amino acids 1 - 498 of T46984_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NCA corresponding to amino acids 499 - 501 of T46984_PEA_1_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs.
The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
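The reasoning above amounts to a small decision rule over the outputs of two signal-peptide predictors and two trans-membrane predictors. A hedged sketch of that rule follows. The function name and the "undetermined" fallback are assumptions made for illustration; the "membrane" branch mirrors the reasoning this application gives for other variants of the same cluster, where both trans-membrane programs predict a trans-membrane region. The prediction programs themselves are not modeled, only the combination of their boolean calls.

```python
# Sketch of the localization rule stated in the text. The function name and
# the "undetermined" outcome are illustrative assumptions.
from typing import Sequence


def predict_localization(signal_peptide_calls: Sequence[bool],
                         transmembrane_calls: Sequence[bool]) -> str:
    """Combine per-program boolean calls into a localization label.

    signal_peptide_calls: one entry per signal-peptide prediction program.
    transmembrane_calls:  one entry per trans-membrane prediction program.
    """
    if all(transmembrane_calls):
        # Every trans-membrane program predicts a trans-membrane region.
        return "membrane"
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        # Signal peptide predicted by all programs, no trans-membrane region.
        return "secreted"
    # Conflicting predictions are left unresolved in this sketch.
    return "undetermined"
```

For the variant described here, both signal-peptide calls would be `True` and both trans-membrane calls `False`, which yields "secreted".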
The glycosylation sites of variant protein T46984_PEA_1_P2, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 7 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 7 - Glycosylation site(s)
Variant protein T46984_PEA_1_P2 is encoded by the following transcript(s): T46984_PEA_1_T2, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T2 is shown in bold; this coding portion starts at position 316 and ends at position 1818. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
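The coding-portion coordinates quoted above can be cross-checked against the stated protein length. The sketch below assumes that the positions are 1-based and inclusive and that the quoted end position does not include a stop codon; both are assumptions, although they are consistent with the 501-residue length given for this variant (amino acids 1 - 498 plus 499 - 501). The function name is hypothetical.

```python
# Cross-check of coding-portion coordinates against protein length.
# Assumes 1-based inclusive positions and no stop codon in the range.

def coding_length_in_residues(start: int, end: int) -> int:
    """Number of codons spanned by a coding portion [start, end]."""
    nucleotides = end - start + 1
    if nucleotides <= 0 or nucleotides % 3 != 0:
        raise ValueError("coding portion is not a whole number of codons")
    return nucleotides // 3


# Transcript T46984_PEA_1_T2: coding portion 316..1818 as stated in the text.
print(coding_length_in_residues(316, 1818))
```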
Variant protein T46984_PEA_1_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T3. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T46984_PEA_1_P3 and RIB2_HUMAN:
1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTWLLALTIIASTWALTPT T.TKlTDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAV AALSGFGLPLASQE ALS ALTARLSKEETNLATVQ ALQTASHLSQQADLRSl VEEIEDLVARLDELGG ^YLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMΝA IFSKKΝFESLSEAFSVASAAAVLSHΝRYHVPVVNNPEGSASDTHEQAILRLQVTΝVLSQ PLTQATλ^KLEHAKSVASRATNLQKTSFTPVGDVFELΝFMΝVKFSSGYYDFLVEVEGDΝ RYlAΝTVELRVKISTEVGIT TDLST\ DKDQSIAPKTTRVTYPAKAKGTFIADSHQΝFAL FFQLλ VNTGAELTPHQ corresponding to amino acids 1 - 433 of RIB2_HUMAN, which also corresponds to amino acids 1 - 433 of T46984_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ICHIWKLIFLP corresponding to amino acids 434 - 444 of T46984_PEA_1_P3, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T46984_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ICHIWKLIFLP in T46984_PEA_1_P3.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T46984_PEA_1_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P3, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 10 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 10 - Glycosylation site(s)
Variant protein T46984_PEA_1_P3 is encoded by the following transcript(s): T46984_PEA_1_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T3 is shown in bold; this coding portion starts at position 316 and ends at position 1647. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T13. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T46984_PEA_1_P10 and RIB2_HUMAN:
1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSST LLALTπASTWALTPTITYTTKHDVEP KASLDRPFTNLESAFYSINGLSSL GAQWDAKKACTYIRSΝLDPSΝNDSLFYAAQASQALSGCEISISΝETKDLLLAAVSEDSS VTQIYHAV AALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDL V ARLDELGG VYLQFEEGLETT ALF V AAT YKLMDHVGTEPSIKEDQ VIQLMNA IESKKNFESLSEAFSVASAAAVLSHNRYH VVλ^VPEGSASDTHEQAILRLQVTNNLSQ PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELΝFMΝNKFSSGYYDFLVEVEGDΝ RYIAΝTVELRVKISTEVGITΝVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQΝFAL FFQLVDVΝTGAELTPHQTFVTLHΝQKTGQEVNFVAEPDΝKΝNYK^ELDTSEPJKTEFDS ASGTYTLYLIIGDATLKΝPILWΝV corresponding to amino acids 1 - 498 of RIB2_HUMAN, which also corresponds to amino acids 1 - 498 of T46984_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LMDQK corresponding to amino acids 499 - 503 of T46984_PEA_1_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T46984_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LMDQK in T46984_PEA_1_P10.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T46984_PEA_1_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 12 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P10, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 13 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 13 - Glycosylation site(s)
Variant protein T46984_PEA_1_P10 is encoded by the following transcript(s): T46984_PEA_1_T13, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T13 is shown in bold; this coding portion starts at position 316 and ends at position 1824. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P11 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T14. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T46984_PEA_1_P11 and RIB2_HUMAN:
1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVT^LALTIIASTWALTPTTTΫTTKHDλΗ^ GAQVPDAKKACTYTRSNLDPSNNDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYITAVAALSGFGLPLASQEALSALTAP SKΕETVLATVQALQTASHLSQQADLRSI VEEffiDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHNGTEPSTKEDQVIQLMNA IFSKKNFESLSEAFSVASAAAVLSIINR\ΗWV EGSASDTHEQAm PLTQATVKLEHAKSVASRATNLQKTSFTPVGDWELNFMNVKFSSGYYDFLVEVEGDN RYIANTVELRVKTSTEVGITT^ NDLSTVDKDQSIAPKTTRVTYPAKAKGTFlADSHQNFAL FFQLVDVNTGAELTPHQTFVP HNQKTGQEVNFVAEPDNK ^YKEELDTSEPJOEFDS ASGTYTLYLIIGDATLKNPILWNNADλ^αKITEEEAPSTVLSQNLFTPKQEIQHLFREPEK RPPTVNSNTFTALILSPLLLLFALWiPJGANNSNFTFAPSTπFHLGHAAMLGLMYN^^ T QLΝTVIFQTLKYLAILGSVTFLAGΝPJVTLAQQAVKR corresponding to amino acids 1 - 628 of RIB2_HUMAN, which also corresponds to amino acids 1 - 628 of T46984_PEA_1_P11.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because although both signal-peptide prediction programs agree that this protein has a signal peptide, both trans-membrane region prediction programs predict that this protein has a trans-membrane region downstream of this signal peptide. Variant protein T46984_PEA_1_P11 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 15 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 15 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P11, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 16 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 16 - Glycosylation site(s)
Variant protein T46984_PEA_1_P11 is encoded by the following transcript(s): T46984_PEA_1_T14, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T14 is shown in bold; this coding portion starts at position 316 and ends at position 2199. The transcript also has the following SNPs as listed in Table 17 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 17 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T15. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T46984_PEA_1_P12 and RIB2_HUMAN:
1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P12, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTWLLALTIIASTWALTPTHYLTKΗDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYIRSNLDPSNNDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQTYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEffiDLVARLDELGGVYLQFEEGLETTALF VAATΎKLMDHVGTEPSTKEDQVIQLMNA IFS CNFESLSEAFSVASAAAVLSHNRYH VVNVPEGSASDTHEQAILRLQVTNVLSQ PLTQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMN corresponding to amino acids 1 - 338 of RIB2_HUMAN, which also corresponds to amino acids 1 - 338 of T46984_PEA_1_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SQDLH corresponding to amino acids 339 - 343 of T46984_PEA_1_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T46984_PEA_1_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SQDLH in T46984_PEA_1_P12.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T46984_PEA_1_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 18 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 18 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P12, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 19 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 19 - Glycosylation site(s)
Variant protein T46984_PEA_1_P12 is encoded by the following transcript(s): T46984_PEA_1_T15, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T15 is shown in bold; this coding portion starts at position 316 and ends at position 1344. The transcript also has the following SNPs as listed in Table 20 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 20 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P21 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T27. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T46984_PEA_1_P21 and RIB2_HUMAN:
1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P21, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence M corresponding to amino acids 1 - 1 of T46984_PEA_1_P21, and a second amino acid sequence being at least 90% homologous to
KACTYΓRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSSVTQIYHAV
AALSGFGLPLASQEALSALTARLSKEETNLATVQALQTASHLSQQADLRS1\'ΕET£DLVA
RLDELGGVYLQFEEGLETTALFVAATYKLMDHNGTEPSRI^DQVIQLMΝAIFSKKΝEES LSEAFSVASAAAVLSHNRYHVPVWNPEGS ASDTHEQAILRLQVTNNLSQPLTQATVKL EHAKSVASRATVLQKTSFTPVGDVFELΝFMΝVKFSSGYYDFLVEVEGDΝRYIAΝTVEL RVKISTEVGIΩWDLST\ KJ3QSIAPKTTRVTYPAKAKGTFIADSHQΝFALFFQLVDVΝT GAELTPHQTTVP HNQKTGQEV VAEPDNKNVYKFELDTSERKIEFDSASGTYTLYLII GDATLKΝPILWΝNAD\^IKFPEEEAPSTVLSQΝLFTPKQEIQHLFPJEPEKRPPT\NSΝTF TALILSPLLLLFAL RIGANNSNFTFAPSTIIFFFLGHAAN LGLMYNYWTQLNMFQTLKY
LAILGSVTFLAGNRMLAQQAVKRTAH corresponding to amino acids 70 - 631 of RIB2_HUMAN, which also corresponds to amino acids 2 - 563 of T46984_PEA_1_P21, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
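The residue ranges quoted above can be checked for internal consistency: a contiguous segment of the known protein maps onto a segment of the variant of equal length. The short sketch below does only that arithmetic, assuming 1-based inclusive residue numbering; the function name is hypothetical.

```python
# Consistency check for a contiguous segment shared by two proteins.
# Assumes 1-based inclusive residue numbering; the name is illustrative.
from typing import Tuple


def map_segment(known_start: int, known_end: int,
                variant_start: int) -> Tuple[int, int]:
    """Return the (start, end) residues in the variant that correspond to
    residues known_start..known_end of the known protein."""
    length = known_end - known_start + 1
    if length <= 0:
        raise ValueError("known_end must not precede known_start")
    return variant_start, variant_start + length - 1


# Residues 70..631 of RIB2_HUMAN, placed after the single leading residue M.
print(map_segment(70, 631, 2))
```

With the figures given in the text, the mapped segment ends at residue 563 of the variant, which agrees with the stated range of amino acids 2 - 563.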
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein T46984_PEA_1_P21 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 21 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P21 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 21 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P21, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 22 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 22 - Glycosylation site(s)
Variant protein T46984_PEA_1_P21 is encoded by the following transcript(s): T46984_PEA_1_T27, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T27 is shown in bold; this coding portion starts at position 338 and ends at position 2026. The transcript also has the following SNPs as listed in Table 23 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P21 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 23 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P27 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T34. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T46984_PEA_1_P27 and RIB2_HUMAN:
1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P27, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTWLLALTIIASTWALTPTmT,TKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQ DAKKACTY SNLDPS TvOSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYΗAVAALSGFGLPLASQEALSALTAP SKΕETNLATNQALQTASHLSQQADLRSI NEEffiDLVARLDELGGVYTQFEEGLETTALFVAATYKLMDFiNGTEPSIKEDQVIQLMΝA IFSKKΝTESLSEAJFSVASAAAVLSFIΝTiYHNPVΛNNPEGSASDTHEQAlXR^ ' PLTQATVKLEHAKSVASPvATVLQKTSFTTVGDVFELΝFMΝNKJSSGYYDFLVEVEGDΝ RYIAΝTNELRNKISTEVGITΝNDLSTVDI DQS1APKTTRVT\TAKAKGTFIADSHQΝFA corresponding to amino acids 1 - 415 of RIB2_HUMAN, which also corresponds to amino acids 1 - 415 of T46984_PEA_1_P27, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence FGSGLVPMSPTSLLLLARLYFTWDMLLCWDSCMSTGLSSTCSRP corresponding to amino acids 416 - 459 of T46984_PEA_1_P27, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T46984_PEA_1_P27, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence FGSGLVPMSPTSLLLLARLYFTWDMLLCWDSCMSTGLSSTCSRP in T46984_PEA_1_P27.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T46984_PEA_1_P27 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 24 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P27 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 24 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P27, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 25 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 25 - Glycosylation site(s)
Variant protein T46984_PEA_1_P27 is encoded by the following transcript(s): T46984_PEA_1_T34, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T34 is shown in bold; this coding portion starts at position 316 and ends at position 1692. The transcript also has the following SNPs as listed in Table 26 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P27 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 26 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P32 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T40. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T46984_PEA_1_P32 and RIB2_HUMAN: 1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P32, comprising a first amino acid sequence being at least 90% homologous to
M,\PPGSSTWLLALTIIASTWALTPTMT.TΕΗDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAKKACTYTRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQΠΉAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTASHLSQQADLRSI VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNA TFSKKNFESLSE AFS VAS A A A VLSHNRYHVP V WVPEGS ASDTHEQ AILRLQ VTNVLSQ PLTQATVKLEHAKSVASRATNLQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDN
RY1ANTVE corresponding to amino acids 1 - 364 of RIB2_HUMAN, which also corresponds to amino acids 1 - 364 of T46984_PEA_1_P32, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
GQVRWLTPVTPALWEAKAGGSPEVRSSILAWPT corresponding to amino acids 365 - 397 of T46984_PEA_1_P32, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T46984_PEA_1_P32, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GQVRWLTPVTPALWEAKAGGSPEVRSSILAWPT in T46984_PEA_1_P32.
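The tiered thresholds in the tail claim (70%, 80%, 85%, 90%, 95%) can be illustrated with a small sketch. This passage does not say how "homologous" is scored, so the code below assumes a plain ungapped, position-by-position identity over the reference tail as a stand-in; the function names and the mutated candidate are hypothetical and are not taken from the application.

```python
# Minimal sketch, assuming ungapped percent identity as a stand-in for the
# homology measure (the scoring method is not specified in this passage).

P32_TAIL = "GQVRWLTPVTPALWEAKAGGSPEVRSSILAWPT"  # amino acids 365 - 397 of T46984_PEA_1_P32
THRESHOLDS = (70, 80, 85, 90, 95)                # tiers recited in the claim


def percent_identity(candidate: str, reference: str) -> float:
    """Percent of reference positions matched by the candidate, without gaps.

    Positions missing from a shorter candidate count as mismatches; residues
    beyond the reference length are ignored.
    """
    if not reference:
        raise ValueError("reference must be non-empty")
    matches = sum(1 for a, b in zip(candidate, reference) if a == b)
    return 100.0 * matches / len(reference)


def thresholds_met(candidate: str, reference: str) -> list:
    """Return the claim tiers (in percent) that the candidate satisfies."""
    pid = percent_identity(candidate, reference)
    return [t for t in THRESHOLDS if pid >= t]


# Hypothetical candidate: the P32 tail with three substitutions (indices 0, 10, 20).
mutated = list(P32_TAIL)
for i in (0, 10, 20):
    mutated[i] = "A"
mutated = "".join(mutated)

print(thresholds_met(P32_TAIL, P32_TAIL))  # identical tail satisfies every tier
print(thresholds_met(mutated, P32_TAIL))   # 30 of 33 positions match
```

A real comparison would more likely use a gapped alignment with a substitution matrix; the ungapped version is used here only to keep the arithmetic visible.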
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T46984_PEA_1_P32 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 27 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P32 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 27 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P32, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 28 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 28 - Glycosylation site(s)
Variant protein T46984_PEA_1_P32 is encoded by the following transcript(s): T46984_PEA_1_T40, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T40 is shown in bold; this coding portion starts at position 316 and ends at position 1506. The transcript also has the following SNPs as listed in Table 29 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P32 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 29 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P34 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T42. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T46984_PEA_1_P34 and RIB2_HUMAN: 1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P34, comprising a first amino acid sequence being at least 90% homologous to MAPPGSST LLALTI STWALTPTHYLTKΗDVER^ GAQ DAKKACTYlPvSNLDPSNNDSLFYAAQASQALSGCEISISNETKDLLLAANSEDSS NTQIYITAVAALSGFGLPLASQEALSA TARLSKEETVLATVQALQTASHLSQQADLRSI VEEffiDLVARLDELGGVYLQFEEGLETTALFVAATΥKLMDHNGTEPSTKEDQVIQLMΝA IFSKKΝFESLSEAFSVASAAAVLSFiΝllYΗWVW^ PLTQATVKLEHAKSVASRATVLQKTSFTPVG corresponding to amino acids 1 - 329 of RIB2_HUMAN, which also corresponds to amino acids 1 - 329 of T46984_PEA_1_P34.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T46984_PEA_1_P34 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 30 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P34 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 30 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P34, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 31 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 31 - Glycosylation site(s)
Variant protein T46984_PEA_1_P34 is encoded by the following transcript(s): T46984_PEA_1_T42, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T42 is shown in bold; this coding portion starts at position 316 and ends at position 1302. The transcript also has the following SNPs as listed in Table 32 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P34 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 32 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P35 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T43. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T46984_PEA_1_P35 and RIB2_HUMAN: 1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P35, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTWLLALTIIASTWALTPTHYLTKΉDVERLKASLDRPFTNLESAFYSIVGLSSL GAQ DAO ACTYIRSNLDPSNVDSLFYA\QASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAV AALSGFGLPLASQEALS ALT ARLSKEETVLATNQ ALQTASHLSQQADLRSL VEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMΝA IFSKKΝFESLSEAFSVASAAAVLSHΝR\ΗWVNWPEGSASDTHEQAI corresponding to amino acids 1 - 287 of RIB2_HUMAN, which also corresponds to amino acids 1 - 287 of T46984_PEA_1_P35, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GCWPSRQSREQfflSSRRKMEILKTECQEKESRTmSMRRKivffi^ corresponding to amino acids 288 - 334 of T46984_PEA_1_P35, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T46984_PEA_1_P35, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GCWPSRQSREQfflSSRRKMEILKTECQEKΕSRTmSMRRKMEKKNOFI in T46984_PEA_1_P35. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T46984_PEA_1_P35 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 33 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P35 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 33 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P35, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 34 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 34 - Glycosylation site(s)
Variant protein T46984_PEA_1_P35 is encoded by the following transcript(s): T46984_PEA_1_T43, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T43 is shown in bold; this coding portion starts at position 316 and ends at position 1317. The transcript also has the following SNPs as listed in Table 35 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P35 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 35 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P38 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T47. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T46984_PEA_1_P38 and RIB2_HUMAN: 1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P38, comprising a first amino acid sequence being at least 90% homologous to
MAPPGSSTVFLLALTI STWALTPTI V TKΗDVERLKASLDRPFTN^ GAQWDAKXACTYTRSNLDPSNNDSLFYAAQASQALSGCEISISNETKDLLLAAVSEDSS VTQIYHAV AALSGFGLPLASQEAL corresponding to amino acids 1 - 145 of RIB2_HUMAN, which also corresponds to amino acids 1 - 145 of T46984_PEA_1_P38, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MDPDWCQCLQLHFCS corresponding to amino acids 146 - 160 of T46984_PEA_1_P38, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T46984_PEA_1_P38, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MDPDWCQCLQLHFCS in T46984_PEA_1_P38. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T46984_PEA_1_P38 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 36 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P38 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 36 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P38, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 37 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 37 - Glycosylation site(s)
Variant protein T46984_PEA_1_P38 is encoded by the following transcript(s): T46984_PEA_1_T47, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T47 is shown in bold; this coding portion starts at position 316 and ends at position 795. The transcript also has the following SNPs as listed in Table 38 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P38 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 38 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P39 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T48. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T46984_PEA_1_P39 and RIB2_HUMAN: 1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P39, comprising a first amino acid sequence being at least 90% homologous to MAPPGSST LLALTIIASTWALTPTT TTKITDVERLKASLDRPFTNLESAFYSINGLSSL GAQVPDAKKACTYIRSΝLDPSΝNDSLFYAAQASQALSGCEISISΝETKDLLLAAVSEDSS VTQIYHAV AALSGFGLPLASQEALSALTARLSKEETVLA corresponding to amino acids 1 - 160 of RIB2_HUMAN, which also corresponds to amino acids 1 - 160 of T46984_PEA_1_P39.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T46984_PEA_1_P39 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 39 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P39 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 39 - Amino acid mutations
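The localization reasoning used for each variant (both signal-peptide predictors positive, neither trans-membrane predictor positive, hence secreted) amounts to a simple consensus rule, sketched below. The text states only the secreted case, so the sketch returns "undetermined" for every other combination; that fallback, the function name and the boolean inputs are assumptions made for illustration and do not reproduce the output of SignalP or any other program.

```python
# Minimal sketch of the consensus rule stated in the text. Only the "secreted"
# outcome is described in the source; other outcomes are deliberately left open.

from typing import Sequence


def consensus_localization(signal_peptide_calls: Sequence[bool],
                           transmembrane_calls: Sequence[bool]) -> str:
    """Return "secreted" when every signal-peptide predictor is positive and
    no trans-membrane predictor is positive; otherwise "undetermined"."""
    if not signal_peptide_calls or not transmembrane_calls:
        raise ValueError("at least one call per predictor type is required")
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    return "undetermined"


# Two signal-peptide predictors and two trans-membrane predictors, as in the text.
print(consensus_localization([True, True], [False, False]))  # secreted
print(consensus_localization([True, False], [False, False]))  # undetermined
```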
The glycosylation sites of variant protein T46984_PEA_1_P39, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 40 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 40 - Glycosylation site(s)
Variant protein T46984_PEA_1_P39 is encoded by the following transcript(s): T46984_PEA_1_T48, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T48 is shown in bold; this coding portion starts at position 316 and ends at position 795. The transcript also has the following SNPs as listed in Table 41 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P39 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 41 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P45 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T32. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T46984_PEA_1_P45 and RIB2_HUMAN: 1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P45, comprising a first amino acid sequence being at least 90% homologous to MAJPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQ DAKKACTYIRSNLDPSNVDSLFYAAQASQALSGCE corresponding to amino acids 1 - 101 of RIB2_HUMAN, which also corresponds to amino acids 1 - 101 of T46984_PEA_1_P45, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NSPGSADSIPPVPAG corresponding to amino acids 102 - 116 of T46984_PEA_1_P45, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T46984_PEA_1_P45, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NSPGSADSIPPVPAG in T46984_PEA_1_P45.
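The requirement that the head and tail be "contiguous and in a sequential order" implies two simple checks on the quoted coordinates: the tail must begin one residue after the head ends, and the tail sequence must be exactly as long as its coordinate range. The sketch below applies both checks to the four variants whose tail sequences are legible in the text above; the `Chimera` record and the `is_consistent` helper are hypothetical names introduced for illustration.

```python
# Minimal sketch: cross-check head/tail coordinates against tail length.
# Coordinates and tail sequences are those quoted in the comparison reports.

from typing import NamedTuple


class Chimera(NamedTuple):
    variant: str
    head_end: int    # last residue shared with RIB2_HUMAN
    tail_start: int  # first residue of the variant-specific tail
    tail_end: int    # last residue of the variant protein
    tail: str


CHIMERAS = [
    Chimera("T46984_PEA_1_P27", 415, 416, 459,
            "FGSGLVPMSPTSLLLLARLYFTWDMLLCWDSCMSTGLSSTCSRP"),
    Chimera("T46984_PEA_1_P32", 364, 365, 397,
            "GQVRWLTPVTPALWEAKAGGSPEVRSSILAWPT"),
    Chimera("T46984_PEA_1_P38", 145, 146, 160, "MDPDWCQCLQLHFCS"),
    Chimera("T46984_PEA_1_P45", 101, 102, 116, "NSPGSADSIPPVPAG"),
]


def is_consistent(c: Chimera) -> bool:
    """True when the tail directly follows the head and fills its range."""
    contiguous = c.tail_start == c.head_end + 1
    length_ok = len(c.tail) == c.tail_end - c.tail_start + 1
    return contiguous and length_ok


for c in CHIMERAS:
    print(c.variant, "consistent" if is_consistent(c) else "inconsistent")
```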
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T46984_PEA_1_P45 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 42 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P45 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 42 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P45, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 43 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 43 - Glycosylation site(s)
Variant protein T46984_PEA_1_P45 is encoded by the following transcript(s): T46984_PEA_1_T32, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T32 is shown in bold; this coding portion starts at position 316 and ends at position 663. The transcript also has the following SNPs as listed in Table 44 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P45 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 44 - Nucleic acid SNPs
Variant protein T46984_PEA_1_P46 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T46984_PEA_1_T35. An alignment is given to the known protein (Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T46984_PEA_1_P46 and RIB2_HUMAN: 1. An isolated chimeric polypeptide encoding for T46984_PEA_1_P46, comprising a first amino acid sequence being at least 90% homologous to MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLESAFYSIVGLSSL GAQVPDAK corresponding to amino acids 1 - 69 of RIB2_HUMAN, which also corresponds to amino acids 1 - 69 of T46984_PEA_1_P46, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NSPGSADSIPPVPAG corresponding to amino acids 70 - 84 of T46984_PEA_1_P46, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T46984_PEA_1_P46, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NSPGSADSIPPVPAG in T46984_PEA_1_P46. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted.
The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T46984_PEA_1_P46 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 45 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P46 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 45 - Amino acid mutations
The glycosylation sites of variant protein T46984_PEA_1_P46, as compared to the known protein Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 63 kDa subunit precursor, are described in Table 46 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 46 - Glycosylation site(s)
Variant protein T46984_PEA_1_P46 is encoded by the following transcript(s): T46984_PEA_1_T35, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T46984_PEA_1_T35 is shown in bold; this coding portion starts at position 316 and ends at position 567. The transcript also has the following SNPs as listed in Table 47 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T46984_PEA_1_P46 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 47 - Nucleic acid SNPs
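The coding-portion coordinates quoted for variants P27 through P46 can be cross-checked against the protein lengths given in the comparison reports. The sketch below assumes that positions are 1-based and inclusive and that the quoted end position excludes the stop codon; neither convention is stated in the text, but under these assumptions the number of codons in each coding portion equals the stated protein length. The helper names are hypothetical.

```python
# Minimal sketch: codons in a coding portion versus stated protein length.
# Assumes 1-based inclusive coordinates with the stop codon not included.

# (transcript, CDS start, CDS end, protein, protein length from the text)
VARIANTS = [
    ("T46984_PEA_1_T34", 316, 1692, "T46984_PEA_1_P27", 459),
    ("T46984_PEA_1_T40", 316, 1506, "T46984_PEA_1_P32", 397),
    ("T46984_PEA_1_T42", 316, 1302, "T46984_PEA_1_P34", 329),
    ("T46984_PEA_1_T43", 316, 1317, "T46984_PEA_1_P35", 334),
    ("T46984_PEA_1_T47", 316, 795, "T46984_PEA_1_P38", 160),
    ("T46984_PEA_1_T48", 316, 795, "T46984_PEA_1_P39", 160),
    ("T46984_PEA_1_T32", 316, 663, "T46984_PEA_1_P45", 116),
    ("T46984_PEA_1_T35", 316, 567, "T46984_PEA_1_P46", 84),
]


def codons_in_cds(start: int, end: int) -> int:
    """Number of whole codons between start and end (1-based, inclusive)."""
    n_nucleotides = end - start + 1
    if n_nucleotides <= 0 or n_nucleotides % 3 != 0:
        raise ValueError("coding portion is empty or not a multiple of three")
    return n_nucleotides // 3


def check_lengths(variants) -> dict:
    """Map each protein name to whether its CDS length matches its stated length."""
    return {protein: codons_in_cds(start, end) == length
            for _, start, end, protein, length in variants}


for protein, ok in check_lengths(VARIANTS).items():
    print(protein, "matches" if ok else "does not match")
```

For example, T46984_PEA_1_T34 spans 1692 - 316 + 1 = 1377 nucleotides, or 459 codons, which agrees with the 459 amino acids reported for T46984_PEA_1_P27.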
As noted above, cluster T46984 features 49 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster T46984_PEA_1_node_2 according to the present invention is supported by 240 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T40, T46984_PEA_1_T42, T46984_PEA_1_T43, T46984_PEA_1_T47 and T46984_PEA_1_T48. Table 48 below describes the starting and ending position of this segment on each transcript. Table 48 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_4 according to the present invention is supported by 321 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T40, T46984_PEA_1_T42, T46984_PEA_1_T43, T46984_PEA_1_T47 and T46984_PEA_1_T48. Table 49 below describes the starting and ending position of this segment on each transcript. Table 49 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_6 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T27. Table 50 below describes the starting and ending position of this segment on each transcript. Table 50 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_12 according to the present invention is supported by 262 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T34, T46984_PEA_1_T40, T46984_PEA_1_T42, T46984_PEA_1_T43, T46984_PEA_1_T47 and T46984_PEA_1_T48. Table 51 below describes the starting and ending position of this segment on each transcript. Table 51 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_14 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T48. Table 52 below describes the starting and ending position of this segment on each transcript. Table 52 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_25 according to the present invention is supported by 257 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T40, T46984_PEA_1_T42 and T46984_PEA_1_T43. Table 53 below describes the starting and ending position of this segment on each transcript. Table 53 - Segment location on transcripts
Segment cluster T46984JPEA_l_node_29 according to the present invention is supported by 1 libraries. The number of libraries was determined as previously described. This segment can be found in the following franscript(s): T46984JPEA_1_T42. Table 54 below describes the starting and ending position of this segment on each franscript. Table 54 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_34 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T40. Table 55 below describes the starting and ending position of this segment on each transcript. Table 55 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_46 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T46. Table 56 below describes the starting and ending position of this segment on each transcript. Table 56 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_47 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T3, T46984_PEA_1_T19 and T46984_PEA_1_T46. Table 57 below describes the starting and ending position of this segment on each transcript. Table 57 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_52 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T19 and T46984_PEA_1_T23. Table 58 below describes the starting and ending position of this segment on each transcript. Table 58 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_65 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T51. Table 59 below describes the starting and ending position of this segment on each transcript. Table 59 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_69 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T52 and T46984_PEA_1_T54. Table 60 below describes the starting and ending position of this segment on each transcript. Table 60 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_75 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T14. Table 61 below describes the starting and ending position of this segment on each transcript. Table 61 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_86 according to the present invention is supported by 314 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T43, T46984_PEA_1_T46, T46984_PEA_1_T47, T46984_PEA_1_T51, T46984_PEA_1_T52 and T46984_PEA_1_T54. Table 62 below describes the starting and ending position of this segment on each transcript. Table 62 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster T46984_PEA_1_node_9 according to the present invention is supported by 304 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T40, T46984_PEA_1_T42, T46984_PEA_1_T43, T46984_PEA_1_T47 and T46984_PEA_1_T48. Table 63 below describes the starting and ending position of this segment on each transcript. Table 63 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_13 according to the present invention is supported by 232 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T34, T46984_PEA_1_T40, T46984_PEA_1_T42, T46984_PEA_1_T43 and T46984_PEA_1_T48. Table 64 below describes the starting and ending position of this segment on each transcript. Table 64 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_19 according to the present invention is supported by 237 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T40, T46984_PEA_1_T42 and T46984_PEA_1_T43. Table 65 below describes the starting and ending position of this segment on each transcript. Table 65 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_21 according to the present invention is supported by 242 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T40, T46984_PEA_1_T42 and T46984_PEA_1_T43. Table 66 below describes the starting and ending position of this segment on each transcript. Table 66 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_22 according to the present invention is supported by 205 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T40, T46984_PEA_1_T42 and T46984_PEA_1_T43. Table 67 below describes the starting and ending position of this segment on each transcript. Table 67 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_26 according to the present invention can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T40 and T46984_PEA_1_T42. Table 68 below describes the starting and ending position of this segment on each transcript. Table 68 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_28 according to the present invention is supported by 242 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T40 and T46984_PEA_1_T42. Table 69 below describes the starting and ending position of this segment on each transcript. Table 69 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_31 according to the present invention is supported by 207 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35 and T46984_PEA_1_T40. Table 70 below describes the starting and ending position of this segment on each transcript. Table 70 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_32 according to the present invention is supported by 226 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35 and T46984_PEA_1_T40. Table 71 below describes the starting and ending position of this segment on each transcript. Table 71 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_38 according to the present invention can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34 and T46984_PEA_1_T35. Table 72 below describes the starting and ending position of this segment on each transcript. Table 72 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_39 according to the present invention can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34 and T46984_PEA_1_T35. Table 73 below describes the starting and ending position of this segment on each transcript. Table 73 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_40 according to the present invention is supported by 227 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34 and T46984_PEA_1_T35. Table 74 below describes the starting and ending position of this segment on each transcript. Table 74 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_42 according to the present invention is supported by 239 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34 and T46984_PEA_1_T35. Table 75 below describes the starting and ending position of this segment on each transcript. Table 75 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_43 according to the present invention is supported by 235 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32 and T46984_PEA_1_T35. Table 76 below describes the starting and ending position of this segment on each transcript. Table 76 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_48 according to the present invention is supported by 282 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T35 and T46984_PEA_1_T46. Table 77 below describes the starting and ending position of this segment on each transcript. Table 77 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_49 according to the present invention is supported by 262 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T35 and T46984_PEA_1_T46. Table 78 below describes the starting and ending position of this segment on each transcript. Table 78 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_50 according to the present invention is supported by 277 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T35 and T46984_PEA_1_T46. Table 79 below describes the starting and ending position of this segment on each transcript. Table 79 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_51 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T12, T46984_PEA_1_T19 and T46984_PEA_1_T23. Table 80 below describes the starting and ending position of this segment on each transcript. Table 80 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_53 according to the present invention is supported by 16 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T13, T46984_PEA_1_T19 and T46984_PEA_1_T23. Table 81 below describes the starting and ending position of this segment on each transcript. Table 81 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_54 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T19 and T46984_PEA_1_T23. Table 82 below describes the starting and ending position of this segment on each transcript. Table 82 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_55 according to the present invention is supported by 335 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T35 and T46984_PEA_1_T46. Table 83 below describes the starting and ending position of this segment on each transcript. Table 83 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_57 according to the present invention can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T35 and T46984_PEA_1_T46. Table 84 below describes the starting and ending position of this segment on each transcript. Table 84 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_60 according to the present invention is supported by 326 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T35 and T46984_PEA_1_T46. Table 85 below describes the starting and ending position of this segment on each transcript. Table 85 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_62 according to the present invention is supported by 335 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T35 and T46984_PEA_1_T46. Table 86 below describes the starting and ending position of this segment on each transcript. Table 86 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_66 according to the present invention is supported by 336 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T46, T46984_PEA_1_T47 and T46984_PEA_1_T51. Table 87 below describes the starting and ending position of this segment on each transcript. Table 87 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_67 according to the present invention is supported by 323 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T46, T46984_PEA_1_T47 and T46984_PEA_1_T51. Table 88 below describes the starting and ending position of this segment on each transcript. Table 88 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_70 according to the present invention is supported by 337 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T46, T46984_PEA_1_T47, T46984_PEA_1_T51, T46984_PEA_1_T52 and T46984_PEA_1_T54. Table 89 below describes the starting and ending position of this segment on each transcript. Table 89 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_71 according to the present invention can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T46, T46984_PEA_1_T47, T46984_PEA_1_T51, T46984_PEA_1_T52 and T46984_PEA_1_T54. Table 90 below describes the starting and ending position of this segment on each transcript. Table 90 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_72 according to the present invention can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T43, T46984_PEA_1_T46, T46984_PEA_1_T47, T46984_PEA_1_T51, T46984_PEA_1_T52 and T46984_PEA_1_T54. Table 91 below describes the starting and ending position of this segment on each transcript. Table 91 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_73 according to the present invention can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T43, T46984_PEA_1_T46, T46984_PEA_1_T47, T46984_PEA_1_T51, T46984_PEA_1_T52 and T46984_PEA_1_T54. Table 92 below describes the starting and ending position of this segment on each transcript. Table 92 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_74 according to the present invention can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T14, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T43, T46984_PEA_1_T46, T46984_PEA_1_T47, T46984_PEA_1_T51, T46984_PEA_1_T52 and T46984_PEA_1_T54. Table 93 below describes the starting and ending position of this segment on each transcript. Table 93 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_83 according to the present invention can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T43, T46984_PEA_1_T46, T46984_PEA_1_T47, T46984_PEA_1_T51, T46984_PEA_1_T52 and T46984_PEA_1_T54. Table 94 below describes the starting and ending position of this segment on each transcript. Table 94 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_84 according to the present invention can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T43, T46984_PEA_1_T46, T46984_PEA_1_T47, T46984_PEA_1_T51, T46984_PEA_1_T52 and T46984_PEA_1_T54. Table 95 below describes the starting and ending position of this segment on each transcript. Table 95 - Segment location on transcripts
Segment cluster T46984_PEA_1_node_85 according to the present invention is supported by 295 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T46984_PEA_1_T2, T46984_PEA_1_T3, T46984_PEA_1_T12, T46984_PEA_1_T13, T46984_PEA_1_T15, T46984_PEA_1_T19, T46984_PEA_1_T23, T46984_PEA_1_T27, T46984_PEA_1_T32, T46984_PEA_1_T34, T46984_PEA_1_T35, T46984_PEA_1_T43, T46984_PEA_1_T46, T46984_PEA_1_T47, T46984_PEA_1_T51, T46984_PEA_1_T52 and T46984_PEA_1_T54. Table 96 below describes the starting and ending position of this segment on each transcript. Table 96 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P2 x RIB2_HUMAN
Alignment segment 1/1
Quality: 4716.00 Escore : 0 Matching length: 498 Total length: 498 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
  1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50

 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100

101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150

151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200

201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250

251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300

301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350

351 EVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPA 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPA 400

401 KAKGTFIADSHQNFALFFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFV 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 KAKGTFIADSHQNFALFFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFV 450

451 AEPDNKNVYKFELDTSERKIEFDSASGTYTLYLIIGDATLKNPILWNV 498
    ||||||||||||||||||||||||||||||||||||||||||||||||
451 AEPDNKNVYKFELDTSERKIEFDSASGTYTLYLIIGDATLKNPILWNV 498
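By way of non-limiting illustration only, the summary statistics reported with each alignment (matching length, total length, percent identity, gaps) can be recomputed from the two aligned rows. The following Python sketch is not the alignment program used to generate the results herein; the function name `alignment_stats`, the gap character "-", and the exact definitions (for example, counting gapped columns rather than gap openings, and omitting similarity scoring) are assumptions made for the example.

```python
def alignment_stats(query_row: str, subject_row: str, gap: str = "-") -> dict:
    """Recompute simple summary statistics from two aligned rows.

    Assumed definitions (hypothetical, for illustration only):
      total_length    = number of alignment columns
      gaps            = number of columns with a gap in either row
      matching_length = number of columns with a residue in both rows
    """
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have equal length")

    total = len(query_row)
    columns = list(zip(query_row, subject_row))
    gaps = sum(1 for q, s in columns if q == gap or s == gap)
    matching = total - gaps
    identical = sum(1 for q, s in columns if q == s and q != gap)

    return {
        "matching_length": matching,
        "total_length": total,
        "gaps": gaps,
        "matching_percent_identity": 100.0 * identical / matching if matching else 0.0,
        "total_percent_identity": 100.0 * identical / total if total else 0.0,
    }


# First 50-residue block of the alignment, identical in both rows.
block = "MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES"
stats = alignment_stats(block, block)
```

Under these assumed definitions, an ungapped alignment of two identical rows gives a matching length equal to the total length and 100.00 percent identity, which is consistent with the values reported for the alignments in this section.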
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P3 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 4085.00 Escore: 0 Matching length: 433 Total length: 433 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
  1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50

 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100

101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150

151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200

201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250

251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300

301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350

351 EVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPA 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPA 400

401 KAKGTFIADSHQNFALFFQLVDVNTGAELTPHQ 433
    |||||||||||||||||||||||||||||||||
401 KAKGTFIADSHQNFALFFQLVDVNTGAELTPHQ 433
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P10 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 4716.00 Escore : 0 Matching length: 498 Total length: 498 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
  1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50

 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100

101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150

151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200

201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250

251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300

301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350

351 EVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPA 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPA 400

401 KAKGTFIADSHQNFALFFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFV 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 KAKGTFIADSHQNFALFFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFV 450

451 AEPDNKNVYKFELDTSERKIEFDSASGTYTLYLIIGDATLKNPILWNV 498
    ||||||||||||||||||||||||||||||||||||||||||||||||
451 AEPDNKNVYKFELDTSERKIEFDSASGTYTLYLIIGDATLKNPILWNV 498
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P11 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 5974.00 Escore: 0 Matching length: 628 Total length: 628 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
  1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50

 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100

101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150

151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200

201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250

251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300

301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350

351 EVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPA 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPA 400

401 KAKGTFIADSHQNFALFFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFV 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 KAKGTFIADSHQNFALFFQLVDVNTGAELTPHQTFVRLHNQKTGQEVVFV 450

451 AEPDNKNVYKFELDTSERKIEFDSASGTYTLYLIIGDATLKNPILWNVAD 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 AEPDNKNVYKFELDTSERKIEFDSASGTYTLYLIIGDATLKNPILWNVAD 500

501 VVIKFPEEEAPSTVLSQNLFTPKQEIQHLFREPEKRPPTVVSNTFTALIL 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 VVIKFPEEEAPSTVLSQNLFTPKQEIQHLFREPEKRPPTVVSNTFTALIL 550

551 SPLLLLFALWIRIGANVSNFTFAPSTIIFHLGHAAMLGLMYVYWTQLNMF 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 SPLLLLFALWIRIGANVSNFTFAPSTIIFHLGHAAMLGLMYVYWTQLNMF 600

601 QTLKYLAILGSVTFLAGNRMLAQQAVKR 628
    ||||||||||||||||||||||||||||
601 QTLKYLAILGSVTFLAGNRMLAQQAVKR 628
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P12 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 3179.00 Escore: 0 Matching length: 338 Total length: 338 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50

 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100

101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150

151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200

201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250

251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300

301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMN 338
    ||||||||||||||||||||||||||||||||||||||
301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMN 338
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P21 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 5348.00
Escore: 0 Matching length: 562 Total length: 562 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
2 KACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSED 51
70 KACTYIRSNLDPSNVDSLFYAAQASQALSGCEISISNETKDLLLAAVSED 119
52 SSVTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTAS 101
120 SSVTQIYHAVAALSGFGLPLASQEALSALTARLSKEETVLATVQALQTAS 169
102 HLSQQADLRSIVEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLM 151
170 HLSQQADLRSIVEEIEDLVARLDELGGVYLQFEEGLETTALFVAATYKLM 219
152 DHVGTEPSIKEDQVIQLMNAIFSKKNFESLSEAFSVASAAAVLSHNRYHV 201
220 DHVGTEPSIKEDQVIQLMNAIFSKKNFESLSEAFSVASAAAVLSHNRYHV 269
202 PVVVVPEGSASDTHEQAILRLQVTNVLSQPLTQATVKLEHAKSVASRATV 251
270 PVVVVPEGSASDTHEQAILRLQVTNVLSQPLTQATVKLEHAKSVASRATV 319
252 LQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDNRYIANTVELRVKI 301
320 LQKTSFTPVGDVFELNFMNVKFSSGYYDFLVEVEGDNRYIANTVELRVKI 369
302 STEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFALFFQ 351
370 STEVGITNVDLSTVDKDQSIAPKTTRVTYPAKAKGTFIADSHQNFALFFQ 419
352 LVDVNTGAELTPHQTFVRLHNQKTGQEVVFVAEPDNKNVYKFELDTSERK 401
420 LVDVNTGAELTPHQTFVRLHNQKTGQEVVFVAEPDNKNVYKFELDTSERK 469
402 IEFDSASGTYTLYLIIGDATLKNPILWNVADVVIKFPEEEAPSTVLSQNL 451
470 IEFDSASGTYTLYLIIGDATLKNPILWNVADVVIKFPEEEAPSTVLSQNL 519
452 FTPKQEIQHLFREPEKRPPTVVSNTFTALILSPLLLLFALWIRIGANVSN 501
520 FTPKQEIQHLFREPEKRPPTVVSNTFTALILSPLLLLFALWIRIGANVSN 569
502 FTFAPSTIIFHLGHAAMLGLMYVYWTQLNMFQTLKYLAILGSVTFLAGNR 551
570 FTFAPSTIIFHLGHAAMLGLMYVYWTQLNMFQTLKYLAILGSVTFLAGNR 619
552 MLAQQAVKRTAH 563
620 MLAQQAVKRTAH 631
Sequence name: RIB2_HUMAN
Sequence documentation: Alignment of: T46984_PEA_1_P27 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 3910.00
Escore: 0 Matching length: 415 Total length: 415 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200
151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200
201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250
201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250
251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300
251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300
301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350
301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350
351 EVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPA 400
351 EVEGDNRYIANTVELRVKISTEVGITNVDLSTVDKDQSIAPKTTRVTYPA 400
401 KAKGTFIADSHQNFA 415
401 KAKGTFIADSHQNFA 415
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P32 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 3434.00
Escore: 0 Matching length: 373 Total length: 373 Matching Percent Similarity: 98.93 Matching Percent Identity: 98.39 Total Percent Similarity: 98.93 Total Percent Identity: 98.39 Gaps: 0
Alignment:
1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200
151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200
201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250
201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250
251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300
251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300
301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350
301 TQATVKLEHAKSVASRATVLQKTSFTPVGDVFELNFMNVKFSSGYYDFLV 350
351 EVEGDNRYIANTVEGQVR LTPV 373
351 EVEGDNRYIANTVELRVKISTEV 373
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P34 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 3087.00
Escore: 0 Matching length: 329 Total length: 329 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200
151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200
201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250
201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250
251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300
251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAILRLQVTNVLSQPL 300
301 TQATVKLEHAKSVASRATVLQKTSFTPVG 329
301 TQATVKLEHAKSVASRATVLQKTSFTPVG 329
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P35 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 2697.00
Escore: 0 Matching length: 287 Total length: 287 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200
151 RLSKEETVLATVQALQTASHLSQQADLRSIVEEIEDLVARLDELGGVYLQ 200
201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250
201 FEEGLETTALFVAATYKLMDHVGTEPSIKEDQVIQLMNAIFSKKNFESLS 250
251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAI 287
251 EAFSVASAAAVLSHNRYHVPVVVVPEGSASDTHEQAI 287
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P38 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 1368.00
Escore: 0 Matching length: 145 Total length: 145 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEAL 145
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEAL 145
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P39 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 1500.00
Escore: 0 Matching length: 160 Total length: 160 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
101 EISISNETKDLLLAAVSEDSSVTQIYHAVAALSGFGLPLASQEALSALTA 150
151 RLSKEETVLA 160
151 RLSKEETVLA 160
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P45 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 970.00
Escore: 0 Matching length: 103 Total length: 103 Matching Percent Similarity: 99.03 Matching Percent Identity: 99.03 Total Percent Similarity: 99.03 Total Percent Identity: 99.03 Gaps: 0
Alignment:
1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
51 AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC 100
101 ENS 103
101 EIS 103
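The identity figure reported for this alignment can be checked directly from the two rows. The following is a minimal sketch, assuming an ungapped alignment (Gaps: 0) and that percent identity is the fraction of aligned columns with identical residues; the function name `percent_identity` is illustrative and is not part of the alignment program used in the application.

```python
def percent_identity(top: str, bottom: str) -> float:
    """Percent of aligned columns whose residues are identical (ungapped rows)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    matches = sum(1 for a, b in zip(top, bottom) if a == b)
    return 100.0 * matches / len(top)


# Residues 1-100 are shared by both rows of the T46984_PEA_1_P45 x RIB2_HUMAN alignment.
SHARED = (
    "MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES"
    "AFYSIVGLSSLGAQVPDAKKACTYIRSNLDPSNVDSLFYAAQASQALSGC"
)
variant_row = SHARED + "ENS"  # T46984_PEA_1_P45, residues 101-103
known_row = SHARED + "EIS"    # RIB2_HUMAN, residues 101-103

print(len(variant_row), round(percent_identity(variant_row, known_row), 2))
```

One mismatch in 103 columns gives 102/103, which rounds to the 99.03 reported above.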
Sequence name: RIB2_HUMAN
Sequence documentation:
Alignment of: T46984_PEA_1_P46 x RIB2_HUMAN
Alignment segment 1/1:
Quality: 656.00
Escore: 0 Matching length: 69 Total length: 69 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
1 MAPPGSSTVFLLALTIIASTWALTPTHYLTKHDVERLKASLDRPFTNLES 50
51 AFYSIVGLSSLGAQVPDAK 69
51 AFYSIVGLSSLGAQVPDAK 69
DESCRIPTION FOR CLUSTER T11628
Cluster T11628 features 6 transcript(s) and 25 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Myoglobin (SwissProt accession identifier MYG_HUMAN), SEQ ID NO: 709, referred to herein as the previously known protein. Protein Myoglobin is known or believed to have the following function(s): serves as a reserve supply of oxygen and facilitates the movement of oxygen within muscles. The sequence for protein Myoglobin is given at the end of the application, as "Myoglobin amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
As noted above, cluster T11628 features 6 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Myoglobin. A description of each variant protein according to the present invention is now provided.
Variant protein T11628_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T11628_PEA_1_T3. An alignment is given to the known protein (Myoglobin) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T11628_PEA_1_P2 and Q8WVH6 (SEQ ID NO:711):
1. An isolated chimeric polypeptide encoding for T11628_PEA_1_P2, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDE corresponding to amino acids 1 - 55 of T11628_PEA_1_P2, and a second amino acid sequence being at least 90% homologous to MKASEDLKKHGATVLTALGGILKKKGH[...]LQSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG corresponding to amino acids 1 - 99 of Q8WVH6, which also corresponds to amino acids 56 - 154 of T11628_PEA_1_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of T11628_PEA_1_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETL[...] of T11628_PEA_1_P2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein T11628_PEA_1_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein T11628_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Amino acid mutations
Variant protein T11628_PEA_1_P2 is encoded by the following transcript(s): T11628_PEA_1_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T11628_PEA_1_T3 is shown in bold; this coding portion starts at position 220 and ends at position 681. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein T11628_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Nucleic acid SNPs
Variant protein T11628_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T11628_PEA_1_T9. An alignment is given to the known protein (Myoglobin) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T11628_PEA_1_P5 and MYG_HUMAN_V1 (SEQ ID NO:710):
1. An isolated chimeric polypeptide encoding for T11628_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MKASEDLKKHGATVLTALGGILKKKGHHEAEI[...]LQSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG corresponding to amino acids 56 - 154 of MYG_HUMAN_V1, which also corresponds to amino acids 1 - 99 of T11628_PEA_1_P5.
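The correspondence stated in this comparison report (amino acids 1 - 99 of the variant matching amino acids 56 - 154 of the known protein) is a fixed offset between two numbering schemes. A minimal sketch of that mapping follows; the helper `map_position` and its parameter names are illustrative assumptions, not anything defined in the application.

```python
def map_position(pos: int, src_start: int, src_end: int, dst_start: int) -> int:
    """Map a 1-based residue position from one sequence's numbering to another's,
    for an ungapped block where src_start..src_end aligns to dst_start onward."""
    if not src_start <= pos <= src_end:
        raise ValueError(f"position {pos} lies outside {src_start}-{src_end}")
    return dst_start + (pos - src_start)


# T11628_PEA_1_P5 residues 1-99 correspond to MYG_HUMAN_V1 residues 56-154.
first = map_position(1, 1, 99, 56)
last = map_position(99, 1, 99, 56)
print(first, last)
```

Under these assumptions the first and last residues of the variant map to positions 56 and 154 of the known protein, matching the range quoted in the report.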
It should be noted that the known protein sequence (MYG_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for MYG_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 7 - Changes to MYG_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein T11628_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein T11628_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
Variant protein T11628_PEA_1_P5 is encoded by the following transcript(s): T11628_PEA_1_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T11628_PEA_1_T9 is shown in bold; this coding portion starts at position 211 and ends at position 507. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein T11628_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
Variant protein T11628_PEA_1_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T11628_PEA_1_T11. An alignment is given to the known protein (Myoglobin) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T11628_PEA_1_P7 and MYG_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for T11628_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMK[...]SKHPGDFGADAQGAMNK corresponding to amino acids 1 - 134 of MYG_HUMAN_V1, which also corresponds to amino acids 1 - 134 of T11628_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence G corresponding to amino acids 135 - 135 of T11628_PEA_1_P7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
It should be noted that the known protein sequence (MYG_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for MYG_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 10 - Changes to MYG_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein T11628_PEA_1_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein T11628_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
Variant protein T11628_PEA_1_P7 is encoded by the following transcript(s): T11628_PEA_1_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T11628_PEA_1_T11 is shown in bold; this coding portion starts at position 319 and ends at position 723. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein T11628_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Nucleic acid SNPs
Variant protein T11628_PEA_1_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T11628_PEA_1_T4. An alignment is given to the known protein (Myoglobin) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T11628_PEA_1_P10 and Q8WVH6 (SEQ ID NO:711):
1. An isolated chimeric polypeptide encoding for T11628_PEA_1_P10, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDE corresponding to amino acids 1 - 55 of T11628_PEA_1_P10, and a second amino acid sequence being at least 90% homologous to MKASEDLKKHGATVLTALGGILKKKGHHEAEI[...]LQSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG corresponding to amino acids 1 - 99 of Q8WVH6, which also corresponds to amino acids 56 - 154 of T11628_PEA_1_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of T11628_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDE of T11628_PEA_1_P10.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein T11628_PEA_1_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein T11628_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Amino acid mutations
Variant protein T11628_PEA_1_P10 is encoded by the following transcript(s): T11628_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T11628_PEA_1_T4 is shown in bold; this coding portion starts at position 205 and ends at position 666. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein T11628_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Nucleic acid SNPs
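The coding-portion coordinates quoted for the four transcripts can be cross-checked against the protein lengths given in the comparison reports. The sketch below assumes the positions are 1-based and inclusive, and that the stated range does not include a stop codon; both are assumptions, since the text does not say so. The dictionary and function names are illustrative only.

```python
# Coding-portion start and end positions as quoted in the text for each transcript.
CODING_PORTIONS = {
    "T11628_PEA_1_T3": (220, 681),
    "T11628_PEA_1_T9": (211, 507),
    "T11628_PEA_1_T11": (319, 723),
    "T11628_PEA_1_T4": (205, 666),
}


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in a 1-based, inclusive nucleotide range."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a whole number of codons")
    return length // 3


CODONS = {name: codon_count(*span) for name, span in CODING_PORTIONS.items()}
for name, n in CODONS.items():
    print(name, n)
```

Under these assumptions the ranges give 154, 99, 135 and 154 codons, which agrees with the residue ranges quoted for T11628_PEA_1_P2 (1 - 154), T11628_PEA_1_P5 (1 - 99), T11628_PEA_1_P7 (1 - 135) and T11628_PEA_1_P10 (1 - 154).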
As noted above, cluster T11628 features 25 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster T11628_PEA_1_node_7 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Segment cluster T11628_PEA_1_node_11 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T5. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Segment cluster T11628_PEA_1_node_16 according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T11. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Segment cluster T11628_PEA_1_node_22 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T9. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Segment cluster T11628_PEA_1_node_25 according to the present invention is supported by 129 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3, T11628_PEA_1_T4, T11628_PEA_1_T5, T11628_PEA_1_T7, T11628_PEA_1_T9 and T11628_PEA_1_T11. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to breast cancer), shown in Table 20.
Table 20 - Oligonucleotides related to this segment
Segment cluster T11628_PEA_1_node_31 according to the present invention is supported by 137 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3, T11628_PEA_1_T4, T11628_PEA_1_T5, T11628_PEA_1_T7, T11628_PEA_1_T9 and T11628_PEA_1_T11. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Segment cluster T11628_PEA_1_node_37 according to the present invention is supported by 99 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3, T11628_PEA_1_T4, T11628_PEA_1_T5, T11628_PEA_1_T7, T11628_PEA_1_T9 and T11628_PEA_1_T11. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
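The segment descriptions above each list the transcripts in which a segment occurs; the reverse view, listing the segments found in a given transcript, follows by inverting that mapping. The sketch below is illustrative only: the short keys drop the common T11628_PEA_1_ prefix, the names `SEGMENT_TO_TRANSCRIPTS` and `invert` are hypothetical, and only the seven segments described above (nodes 7, 11, 16, 22, 25, 31 and 37) are included.

```python
from collections import defaultdict

# The six transcripts of cluster T11628 (common prefix T11628_PEA_1_ omitted).
ALL_SIX = ["T3", "T4", "T5", "T7", "T9", "T11"]

# Segment -> transcripts, as stated in the segment descriptions above.
SEGMENT_TO_TRANSCRIPTS = {
    "node_7": ["T3"],
    "node_11": ["T5"],
    "node_16": ["T11"],
    "node_22": ["T9"],
    "node_25": ALL_SIX,
    "node_31": ALL_SIX,
    "node_37": ALL_SIX,
}


def invert(segment_map):
    """Build a transcript -> segments mapping from a segment -> transcripts mapping."""
    by_transcript = defaultdict(list)
    for segment, transcripts in segment_map.items():
        for transcript in transcripts:
            by_transcript[transcript].append(segment)
    return dict(by_transcript)


TRANSCRIPT_TO_SEGMENTS = invert(SEGMENT_TO_TRANSCRIPTS)
print(TRANSCRIPT_TO_SEGMENTS["T9"])
```

For example, among these seven segments, transcript T9 contains node_22, node_25, node_31 and node_37, while T7 contains only the three segments shared by all six transcripts.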
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster T11628_PEA_1_node_0 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T4. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Segment cluster T11628_PEA_1_node_4 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T4. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Segment cluster T11628_PEA_1_node_9 according to the present invention is supported by 16 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T5 and T11628_PEA_1_T7. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Segment cluster T11628_PEA_1_node_13 according to the present invention can be found in the following transcript(s): T11628_PEA_1_T7. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Segment cluster T11628_PEA_1_node_14 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T7. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Segment cluster T11628_PEA_1_node_17 according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T11. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Segment cluster T11628_PEA_1_node_18 according to the present invention is supported by 98 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3, T11628_PEA_1_T4, T11628_PEA_1_T5, T11628_PEA_1_T7 and T11628_PEA_1_T11. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster T11628_PEA_1_node_19 according to the present invention can be found in the following transcript(s): T11628_PEA_1_T3, T11628_PEA_1_T4, T11628_PEA_1_T5, T11628_PEA_1_T7 and T11628_PEA_1_T11. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Segment cluster T11628_PEA_1_node_24 according to the present invention is supported by 112 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3, T11628_PEA_1_T4, T11628_PEA_1_T5, T11628_PEA_1_T7, T11628_PEA_1_T9 and T11628_PEA_1_T11. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Segment cluster T11628_PEA_1_node_27 according to the present invention is supported by 119 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3, T11628_PEA_1_T4, T11628_PEA_1_T5, T11628_PEA_1_T7, T11628_PEA_1_T9 and T11628_PEA_1_T11. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to breast cancer), shown in Table 33. Table 33 - Oligonucleotides related to this segment
T11628_0_9_0 breast malignant tumors BRS
Segment cluster T11628_PEA_1_node_28 according to the present invention is supported by 115 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3, T11628_PEA_1_T4, T11628_PEA_1_T5, T11628_PEA_1_T7 and T11628_PEA_1_T9. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Segment cluster T11628_PEA_1_node_29 according to the present invention is supported by 113 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3, T11628_PEA_1_T4, T11628_PEA_1_T5, T11628_PEA_1_T7 and T11628_PEA_1_T9. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Segment cluster T11628_PEA_1_node_30 according to the present invention can be found in the following transcript(s): T11628_PEA_1_T3, T11628_PEA_1_T4, T11628_PEA_1_T5, T11628_PEA_1_T7, T11628_PEA_1_T9 and T11628_PEA_1_T11. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Segment cluster T11628_PEA_1_node_32 according to the present invention can be found in the following transcript(s): T11628_PEA_1_T3, T11628_PEA_1_T4, T11628_PEA_1_T5, T11628_PEA_1_T7, T11628_PEA_1_T9 and T11628_PEA_1_T11. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Segment cluster T11628_PEA_1_node_33 according to the present invention can be found in the following transcript(s): T11628_PEA_1_T3, T11628_PEA_1_T4, T11628_PEA_1_T5, T11628_PEA_1_T7, T11628_PEA_1_T9 and T11628_PEA_1_T11. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Segment cluster T11628_PEA_1_node_34 according to the present invention is supported by 122 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3, T11628_PEA_1_T4, T11628_PEA_1_T5, T11628_PEA_1_T7, T11628_PEA_1_T9 and T11628_PEA_1_T11. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Segment cluster T11628_PEA_1_node_35 according to the present invention is supported by 126 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3, T11628_PEA_1_T4, T11628_PEA_1_T5, T11628_PEA_1_T7, T11628_PEA_1_T9 and T11628_PEA_1_T11. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Segment cluster T11628_PEA_1_node_36 according to the present invention is supported by 122 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3, T11628_PEA_1_T4, T11628_PEA_1_T5, T11628_PEA_1_T7, T11628_PEA_1_T9 and T11628_PEA_1_T11. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: Q8WVH6
Sequence documentation:
Alignment of: T11628_PEA_1_P2 x Q8WVH6
Alignment segment 1/1:
Quality: 962.00
Escore: 0
Matching length: 99
Total length: 99
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 56 MKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYL 105
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYL 50
106 EFISECIIQVLQSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG 154
    |||||||||||||||||||||||||||||||||||||||||||||||||
 51 EFISECIIQVLQSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG 99
Sequence name: MYG_HUMAN_V1
Sequence documentation:
Alignment of: T11628_PEA_1_P5 x MYG_HUMAN_V1
Alignment segment 1/1:
Quality: 962.00
Escore: 0
Matching length: 99
Total length: 99
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 56 MKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYL 105
 51 EFISECIIQVLQSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG 99
    |||||||||||||||||||||||||||||||||||||||||||||||||
106 EFISECIIQVLQSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG 154
Sequence name: MYG_HUMAN_V1
Sequence documentation:
Alignment of: T11628_PEA_1_P7 x MYG_HUMAN_V1
Alignment segment 1/1:
Quality: 1315.00
Escore: 0
Matching length: 134
Total length: 134
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHL 50
 51 KSEDEMKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKI 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 KSEDEMKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKI 100
101 PVKYLEFISECIIQVLQSKHPGDFGADAQGAMNK 134
    ||||||||||||||||||||||||||||||||||
101 PVKYLEFISECIIQVLQSKHPGDFGADAQGAMNK 134
Sequence name: Q8WVH6
Sequence documentation:
Alignment of: T11628_PEA_1_P10 x Q8WVH6
Alignment segment 1/1:
Quality: 962.00
Escore: 0
Matching length: 99
Total length: 99
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 56 MKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYL 105
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYL 50
106 EFISECIIQVLQSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG 154
    |||||||||||||||||||||||||||||||||||||||||||||||||
 51 EFISECIIQVLQSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG 99
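The summary figures reported with each alignment (matching length, total length, percent identity, gaps) can be illustrated with a minimal sketch. The alignment program used in the application is not identified here, so the definitions below are assumptions: "total length" is taken as the number of alignment columns, "matching length" as the number of columns without a gap, and "gaps" as the number of gap columns. The function name `alignment_stats` is hypothetical.

```python
def alignment_stats(row_a: str, row_b: str) -> dict:
    """Summarise two aligned rows of equal length; '-' marks a gap column.

    The field definitions are assumptions made for illustration, not the
    documented behaviour of the program that produced the alignments.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    columns = list(zip(row_a, row_b))
    gaps = sum(1 for a, b in columns if a == "-" or b == "-")
    identical = sum(1 for a, b in columns if a == b and a != "-")
    total = len(columns)
    matching = total - gaps  # assumed meaning of "Matching length"
    return {
        "matching_length": matching,
        "total_length": total,
        "gaps": gaps,
        "matching_percent_identity": round(100.0 * identical / matching, 2) if matching else 0.0,
        "total_percent_identity": round(100.0 * identical / total, 2) if total else 0.0,
    }


# Residues 1-99 of the variant, as printed in the alignment above.
variant = (
    "MKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYL"
    "EFISECIIQVLQSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG"
)
# Residues 56-154 of the known protein are printed as identical to the variant.
known_56_154 = variant

stats = alignment_stats(variant, known_56_154)
print(stats)
```

Under these assumed definitions, a gapless alignment of two identical 99-residue rows gives a matching length and total length of 99 and an identity of 100.00, in line with the figures listed for this alignment.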
DESCRIPTION FOR CLUSTER M78076
Cluster M78076 features 9 transcript(s) and 35 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Amyloid-like protein 1 precursor (SwissProt accession identifier APP1_HUMAN; known also according to the synonyms APLP; APLP-1), SEQ ID NO: 760, referred to herein as the previously known protein. Protein Amyloid-like protein 1 precursor is known or believed to have the following function(s): May play a role in postsynaptic function. The C-terminal gamma-secretase processed fragment, ALID1, activates transcription activation through APBB1 (Fe65) binding (By similarity). Couples to JIP signal transduction through C-terminal binding. May interact with cellular G-protein signaling pathways. Can regulate neurite outgrowth through binding to components of the extracellular matrix such as heparin and collagen I. The gamma-CTF peptide, C30, is a potent enhancer of neuronal apoptosis (By similarity). The sequence for protein Amyloid-like protein 1 precursor is given at the end of the application, as "Amyloid-like protein 1 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Protein Amyloid-like protein 1 precursor localization is believed to be Type I membrane protein. C-terminally processed in the Golgi complex. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: endocytosis; apoptosis; cell adhesion; neurogenesis; cell death, which are annotation(s) related to Biological Process; protein binding; heparin binding, which are annotation(s) related to Molecular Function; and basement membrane; coated pit; integral membrane protein, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or LocusLink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>. As noted above, cluster M78076 features 9 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Amyloid-like protein 1 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein M78076_PEA_1_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78076_PEA_1_T2. An alignment is given to the known protein (Amyloid-like protein 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between M78076_PEA_1_P3 and APP1_HUMAN: 1. An isolated chimeric polypeptide encoding for M78076_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to
MGPASPAARGLSPvRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGL CGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPME RWCGGSRSGSCAHPHHQWPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQ EAQEACSSQGLHHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPG SRλ^EGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAWGKVTPTPRPTDGV DIYFGMPGEISEHEGFLRA MDLEEPJlλrRQINEVMREWAMADNQSKNLPKADRQAL EITFQSILQTLEEQVSGERQRLVETHATRVIALTNDQRRAALEGFLAALQADPPQAERVLL ALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVHTHLQVTEERVNQSLGLLD QNPHLAQELRPQIQELLHSEHLGPSELEAPAPGGSSEDKGGLQPPDSKD corresponding to amino acids 1 - 517 of APP1_HUMAN, which also corresponds to amino acids 1 - 517 of
M78076_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GE corresponding to amino acids 518 - 519 of M78076_PEA_1_P3, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein M78076_PEA_1_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 5 - Amino acid mutations
The glycosylation sites of variant protein M78076_PEA_1_P3, as compared to the known protein Amyloid-like protein 1 precursor, are described in Table 6 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 6 - Glycosylation site(s)
Variant protein M78076_PEA_1_P3 is encoded by the following transcript(s): M78076_PEA_1_T2, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78076_PEA_1_T2 is shown in bold; this coding portion starts at position 142 and ends at position 1698. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Nucleic acid SNPs
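As a consistency check, the coding-portion coordinates can be related to the length of the encoded protein. The sketch below assumes that the positions are 1-based and inclusive, and that the stated coding portion does not include the stop codon; neither convention is stated explicitly in the text. The function name `coding_portion_codons` is hypothetical.

```python
def coding_portion_codons(start: int, end: int) -> int:
    """Return the number of codons spanned by a coding portion.

    Assumes 1-based, inclusive nucleotide positions (an assumption for
    illustration; the application does not spell out its convention).
    """
    n_nucleotides = end - start + 1
    if n_nucleotides <= 0:
        raise ValueError("end must not precede start")
    if n_nucleotides % 3 != 0:
        raise ValueError("coding portion length is not a multiple of three")
    return n_nucleotides // 3


# Coding portion of the transcript described above: positions 142 to 1698.
codons = coding_portion_codons(142, 1698)
print(codons)  # 519
```

A span of 1557 nucleotides gives 519 codons, which agrees with the comparison report for this variant (amino acids 1 - 517 followed by amino acids 518 - 519) if the stop codon is indeed excluded.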
Variant protein M78076_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78076_PEA_1_T3. An alignment is given to the known protein (Amyloid-like protein 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between M78076_PEA_1_P4 and APP1_HUMAN: 1. An isolated chimeric polypeptide encoding for M78076_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to
MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGL CGRLTLFil^DLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPME RWCGGSRSGSCAHPHHQNVPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQ EAQE ACSSQGL1LHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTA VGDPSTRSWPPG SRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAVNGKVTPTPRPTDGV DTYFGMPGEISEITEGFLRA] _MDLEERRMRQIΝEVMREWAMADΝQSKΝL^ EHFQSiLQTTEEQVSGERQRLVETHAmVIALrNDQRRAALEGFLAALQADPPQAERVLL ALRRYLRAEQKEQRHTLRI YQIWAAVDPEKAQQMRFQ\ΗTHLQVIEERVNQSLGLLD QNPHLAQELRPQIQELLHSEHLGPSELEAPAPGGSSEDKGGLQPPDSIGDDTPMTLPKG corresponding to amino acids 1 - 526 of APP1_HUMAN, which also corresponds to amino acids 1 - 526 of M78076_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ECLTVNPSLQIPLNP corresponding to amino acids 527 - 541 of M78076_PEA_1_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of M78076_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ECLTVNPSLQIPLNP in M78076_PEA_1_P4.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein M78076_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Amino acid mutations
The glycosylation sites of variant protein M78076_PEA_1_P4, as compared to the known protein Amyloid-like protein 1 precursor, are described in Table 9 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 9 - Glycosylation site(s)
Variant protein M78076_PEA_1_P4 is encoded by the following transcript(s): M78076_PEA_1_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78076_PEA_1_T3 is shown in bold; this coding portion starts at position 142 and ends at position 1764. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
Variant protein M78076_PEA_1_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78076_PEA_1_T13. An alignment is given to the known protein (Amyloid-like protein 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between M78076_PEA_1_P12 and APP1_HUMAN: 1. An isolated chimeric polypeptide encoding for M78076_PEA_1_P12, comprising a first amino acid sequence being at least 90% homologous to
MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGL CGRLTLITRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAΓPME RWCGGSRSGSCAIIPITHQVNPFRCLPGEFNSEALLVPEGCP^LHQERMDQCESSTΕRHQ EAQEACSSQGLTLHGSGMLLPCGSDRFRGVEYNCCPPPGTPDPSGTANGDPSTRSWPPG SRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAWGKVTPTPRPTDGV DIYFGMPGEISEFFFIGFLP U VTOLEERRMRQIΝEVMDREWA EITFQSILQTLEEQVSGERQRLVETITATRVIALΓΝDQRRAALEGFLAALQADPPQAER\^L ALRRYLRAJEQKEQRHTLRITYQITNAAVDPEBCAQQMPTQVΗTITLQVIEERVNQSLGLLD QNPHLAQELRPQIQELLHSEHLGPSELEAPAPGGSSEDKGGLQPPDSKDDTPMTLPKG corresponding to amino acids 1 - 526 of APP1_HUMAN, which also corresponds to amino acids 1 - 526 of M78076_PEA_1_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
ECVCSKGFPFPLIGDSEG corresponding to amino acids 527 - 544 of M78076_PEA_1_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of M78076_PEA_1_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ECVCSKGFPFPLIGDSEG in M78076_PEA_1_P12.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein M78076_PEA_1_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Amino acid mutations
The glycosylation sites of variant protein M78076_PEA_1_P12, as compared to the known protein Amyloid-like protein 1 precursor, are described in Table 12 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 12 - Glycosylation site(s)
Variant protein M78076_PEA_1_P12 is encoded by the following transcript(s): M78076_PEA_1_T13, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78076_PEA_1_T13 is shown in bold; this coding portion starts at position 142 and ends at position 1773. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Nucleic acid SNPs
Variant protein M78076_PEA_1_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78076_PEA_1_T15. An alignment is given to the known protein (Amyloid-like protein 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between M78076_PEA_1_P14 and APP1_HUMAN: 1. An isolated chimeric polypeptide encoding for M78076_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGL CGRLTLFfRDLRTGRAVEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPME RWCGGSRSGSCAIiPHHQVNPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQ EAQEACSSQGLTLHGSGMLLPCGSDRFRGVEYNCCPPPGTPDPSGTA VGDPSTRS WPPG SRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAWGKVTPTPRPTDGV DIYFGMPGEISEFf£GrT.R,AK DLEERRMRQIΝEVMREWAMADΝQSKΝLPKADRQA^ EITFQSILQTT.EEQVSGERQP VETHATRVIALINDQR1^AALEGFLAALQADPPQAERV^ ALRR\XRAEQKEQRHTLRITΥQΪ AAVDPEKAQQMrFQVΗTFiLQVIEERVNQSLGLLD QNPHLAQELRPQIQELLHSEHLGPSELEAPAPGGSSEDKGGLQPPDSKDDTPMTLPKGST EQDAASPEKEKMNPLEQYERKVNASVPRGFPFHSSEIQRDEL corresponding to amino acids 1 - 570 of APP1_HUMAN, which also corresponds to amino acids 1 - 570 of M78076_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
VRGGTAGYLGEETRGQRPGCDSQSHTGPSKKPSAPSPLPAGTSWDRGVP corresponding to amino acids 571 - 619 of M78076_PEA_1_P14, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of M78076_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGGTAGYLGEETRGQRPGCDSQSHTGPSKKPSAPSPLPAGTSWDRGVP in M78076_PEA_1_P14.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein M78076_PEA_1_P14 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 14 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Amino acid mutations
The glycosylation sites of variant protein M78076_PEA_1_P14, as compared to the known protein Amyloid-like protein 1 precursor, are described in Table 15 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 15 - Glycosylation site(s)
Variant protein M78076_PEA_1_P14 is encoded by the following transcript(s): M78076_PEA_1_T15, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78076_PEA_1_T15 is shown in bold; this coding portion starts at position 142 and ends at position 1998. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 16 - Nucleic acid SNPs
Variant protein M78076_PEA_1_P21 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78076_PEA_1_T23. An alignment is given to the known protein (Amyloid-like protein 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between M78076_PEA_1_P21 and APP1_HUMAN: 1. An isolated chimeric polypeptide encoding for M78076_PEA_1_P21, comprising a first amino acid sequence being at least 90% homologous to MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGL CGRLTLFfRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARλ'ΕQATQAlPME RWCGGSRSGSCAITPFfflQVNPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQ EAQEACSSQGLILHGSGMLLPCGSDRFRGVENVCCPPPGTPDPSGTAVGDPSTRSWPPG SRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAλ GKVTPTPRPTDGV DIYFG^T GEISEFffiGFLPAKMDLEERRMRQrNEVMREWAMADNQSKNLPKADRQALN E corresponding to amino acids 1 - 352 of APP1_HUMAN, which also corresponds to amino acids 1 - 352 of M78076_PEA_1_P21, and a second amino acid sequence being at least 90% homologous to AERVLLALRRYLRAEQKEQPHTLRHYQITVAAVDPEKAQQMRFQVHTHLQVTEERVNQ SLGLLDQNPHLAQELRPQIQELLHSEHLGPSELEAPAPGGSSEDKGGLQPPDSKDDTPMT LPKGSTEQDAASPEKEKMNPLEQYERKVNASVPRGFPFHSSEIQRDELAPAGTGVSREA VSGLLIMGAGGGSLTvTSMLLLRRKKPYGAISHGVVEVDPMLTLEEQQLRELQRHGYE ΝPTYRFLEERP corresponding to amino acids 406 - 650 of APP1_HUMAN, which also corresponds to amino acids 353 - 597 of M78076_PEA_1_P21, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of M78076_PEA_1_P21, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EA, having a structure as follows: a sequence starting from any of amino acid numbers 352-x to 352; and ending at any of amino acid numbers 353 + ((n-2) - x), in which x varies from 0 to n-2.
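The edge-portion definition above describes every window of length n that spans the junction between amino acids 352 and 353 of the variant. A minimal sketch of that enumeration follows; the function name `edge_windows` and its parameters are hypothetical, and positions are treated as 1-based and inclusive, which is an assumption consistent with the numbering used in the text.

```python
def edge_windows(junction_left: int, n: int) -> list[tuple[int, int]]:
    """List (start, end) positions of all length-n windows spanning a junction.

    junction_left is the last residue before the junction (352 in the text);
    the window for offset x starts at junction_left - x and ends at
    (junction_left + 1) + ((n - 2) - x), with x running from 0 to n - 2.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    junction_right = junction_left + 1
    return [
        (junction_left - x, junction_right + (n - 2) - x)
        for x in range(n - 1)  # x = 0 .. n-2
    ]


windows = edge_windows(352, 10)
for start, end in windows:
    print(start, end)
```

For n = 10 this yields nine windows, from positions 352 to 361 (x = 0) through positions 344 to 353 (x = 8). Each window is exactly n residues long and contains both junction residues, which is why the formula guarantees that the two amino acids EA are always included.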
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because although both signal-peptide prediction programs agree that this protein has a signal peptide, both trans-membrane region prediction programs predict that this protein has a trans-membrane region downstream of this signal peptide. Variant protein M78076_PEA_1_P21 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P21 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 17 - Amino acid mutations
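The reasoning used for the localization calls in this section can be summarised as a simple decision rule. The sketch below is an illustration only: it covers the two cases that the text actually describes (all signal-peptide programs positive with no trans-membrane prediction, and all signal-peptide programs positive with all trans-membrane programs positive). The function name `predict_localization` and the "undetermined" fallback for mixed results are assumptions, since the text does not say how disagreements between programs were resolved.

```python
def predict_localization(signal_peptide_calls, tm_region_calls) -> str:
    """Combine per-program boolean calls into a localization label.

    signal_peptide_calls: one boolean per signal-peptide prediction program.
    tm_region_calls: one boolean per trans-membrane region prediction program.
    """
    sp = list(signal_peptide_calls)
    tm = list(tm_region_calls)
    if not sp or not tm:
        raise ValueError("at least one call of each kind is required")
    if all(sp) and all(tm):
        # Signal peptide plus a predicted trans-membrane region downstream.
        return "membrane"
    if all(sp) and not any(tm):
        # Signal peptide and no predicted trans-membrane region.
        return "secreted"
    # Mixed results: not covered by the text, labelled here by assumption.
    return "undetermined"


print(predict_localization([True, True], [True, True]))    # membrane
print(predict_localization([True, True], [False, False]))  # secreted
```

With two programs of each kind, as described in the text, the first call reproduces the membrane assignment given for this variant and the second reproduces the secreted assignment given for the other variants of this cluster.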
The glycosylation sites of variant protein M78076_PEA_1_P21, as compared to the known protein Amyloid-like protein 1 precursor, are described in Table 18 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 18 - Glycosylation site(s)
Variant protein M78076_PEA_1_P21 is encoded by the following transcript(s): M78076_PEA_1_T23, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78076_PEA_1_T23 is shown in bold; this coding portion starts at position 142 and ends at position 1932. The transcript also has the following SNPs as listed in Table 19 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P21 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 19 - Nucleic acid SNPs
Variant protein M78076_PEA_1_P24 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78076_PEA_1_T26. An alignment is given to the known protein (Amyloid-like protein 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between M78076_PEA_1_P24 and APP1_HUMAN: 1. An isolated chimeric polypeptide encoding for M78076_PEA_1_P24, comprising a first amino acid sequence being at least 90% homologous to MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALNEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVHTHLQVIEERVNQSLGLLDQNPHLAQELRPQI corresponding to amino acids 1 - 481 of APP1_HUMAN, which also corresponds to amino acids 1 - 481 of M78076_PEA_1_P24, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RECLLPWLPLQISEGRS corresponding to amino acids 482 - 498 of M78076_PEA_1_P24, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of M78076_PEA_1_P24, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RECLLPWLPLQISEGRS in M78076_PEA_1_P24. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein M78076_PEA_1_P24 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 20 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein M78076_PEA_1_P24 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 20 - Amino acid mutations
The glycosylation sites of variant protein M78076_PEA_1_P24, as compared to the known protein Amyloid-like protein 1 precursor, are described in Table 21 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 21 - Glycosylation site(s)
Variant protein M78076_PEA_1_P24 is encoded by the following transcript(s): M78076_PEA_1_T26, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78076_PEA_1_T26 is shown in bold; this coding portion starts at position 142 and ends at position 1635. The transcript also has the following SNPs as listed in Table 22 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein M78076_PEA_1_P24 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 22 - Nucleic acid SNPs
Variant protein M78076_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78076_PEA_1_T27. An alignment is given to the known protein (Amyloid-like protein 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between M78076_PEA_1_P2 and APP1_HUMAN: 1. An isolated chimeric polypeptide encoding for M78076_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALNEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQV corresponding to amino acids 1 - 449 of APP1_HUMAN, which also corresponds to amino acids 1 - 449 of M78076_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LTSFQLPNAPLFLRRPRLRLFSCPLDPLSVSWTPSYPLNTASLPLPSLSAQLPDPETWTLT CCλT^DPCFLALGFLLPPPSILCSWWIFTAFPRJVFFFFFFLRQVLALSPRQESSVRSWLlAT STSWVQAILLPQPLE corresponding to amino acids 450 - 588 of M78076_PEA_1_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of M78076_PEA_1_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LTSFQLPNAPLFLR PPO RLFSCPLDPLSVSWTPSYPLNTASLPLPSLSAQLPDPETWTLT CC DPCFLALGFLLPPPSIXCSVPWIFTAFPRΓVFFFFFFLRQVLALSPRQESSVRSWLIAT STSWVQAILLPQPLE in M78076_PEA_1_P2.
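The head and tail ranges quoted in the comparison reports above can be sanity-checked with a few lines of arithmetic. The following is only an illustrative sketch: the `Region` class and the `is_contiguous` helper are hypothetical names introduced here, and the ranges are assumed to be 1-based and inclusive, as the wording "amino acids 482 - 498" suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A 1-based, inclusive amino-acid range on the variant protein (assumed convention)."""
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def is_contiguous(head: Region, tail: Region) -> bool:
    """True when the tail begins at the residue immediately after the head ends."""
    return tail.start == head.end + 1


# Head/tail ranges as quoted in the comparison reports for P24 and P2.
VARIANTS = {
    "M78076_PEA_1_P24": (Region(1, 481), Region(482, 498)),
    "M78076_PEA_1_P2": (Region(1, 449), Region(450, 588)),
}

# The P24 tail sequence as quoted in the text.
P24_TAIL = "RECLLPWLPLQISEGRS"

for name, (head, tail) in VARIANTS.items():
    print(name, "contiguous:", is_contiguous(head, tail),
          "head:", head.length, "tail:", tail.length, "total:", tail.end)
```

Under these assumptions the quoted 17-residue tail of M78076_PEA_1_P24 fills positions 482 - 498 exactly, and the tail of M78076_PEA_1_P2 spans 139 residues.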
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although both signal-peptide prediction programs agree that this protein has a signal peptide, both trans-membrane region prediction programs predict that this protein has a trans-membrane region downstream of this signal peptide. Variant protein M78076_PEA_1_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 23 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein M78076_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 23 - Amino acid mutations
The glycosylation sites of variant protein M78076_PEA_1_P2, as compared to the known protein Amyloid-like protein 1 precursor, are described in Table 24 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 24 - Glycosylation site(s)
Variant protein M78076_PEA_1_P2 is encoded by the following transcript(s): M78076_PEA_1_T27, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78076_PEA_1_T27 is shown in bold; this coding portion starts at position 142 and ends at position 1905. The transcript also has the following SNPs as listed in Table 25 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein M78076_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 25 - Nucleic acid SNPs
Variant protein M78076_PEA_1_P25 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78076_PEA_1_T28. An alignment is given to the known protein (Amyloid-like protein 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between M78076_PEA_1_P25 and APP1_HUMAN: 1. An isolated chimeric polypeptide encoding for M78076_PEA_1_P25, comprising a first amino acid sequence being at least 90% homologous to MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALNEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQ corresponding to amino acids 1 - 448 of APP1_HUMAN, which also corresponds to amino acids 1 - 448 of M78076_PEA_1_P25, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PQNPNSQPPVAAGSLEVΠSHPFVPJ^LEΓLISPFQFQNSΓPKNSQΓVPAASPRGTSSP corresponding to amino acids 449 - 505 of M78076_PEA_1_P25, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of M78076_PEA_1_P25, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PQNPNSQPPAAGSLEVIISITPFVRRLEILISPFQFQNSIPKNSQrVPAASPRGTSSP in M78076_PEA_1_P25. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein M78076_PEA_1_P25 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 26 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein M78076_PEA_1_P25 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 26 - Amino acid mutations
The glycosylation sites of variant protein M78076_PEA_1_P25, as compared to the known protein Amyloid-like protein 1 precursor, are described in Table 27 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 27 - Glycosylation site(s)
Variant protein M78076_PEA_1_P25 is encoded by the following transcript(s): M78076_PEA_1_T28, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78076_PEA_1_T28 is shown in bold; this coding portion starts at position 142 and ends at position 1656. The transcript also has the following SNPs as listed in Table 28 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in the variant protein M78076_PEA_1_P25 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 28 - Nucleic acid SNPs
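The coding-portion coordinates quoted for the four transcripts above can be cross-checked against the protein lengths implied by the comparison reports. The sketch below assumes that positions are 1-based and inclusive and that the quoted end position excludes the stop codon; neither convention is stated in the text, and both are inferred only from the fact that the numbers agree. The function name `codon_count` is hypothetical.

```python
def codon_count(cds_start: int, cds_end: int) -> int:
    """Number of whole codons in a 1-based, inclusive coding portion."""
    length = cds_end - cds_start + 1
    if length % 3 != 0:
        raise ValueError(f"coding portion of {length} nt is not a multiple of 3")
    return length // 3


# Coding-portion start/end positions as quoted for each transcript.
CODING_PORTIONS = {
    "M78076_PEA_1_T23": (142, 1932),
    "M78076_PEA_1_T26": (142, 1635),
    "M78076_PEA_1_T27": (142, 1905),
    "M78076_PEA_1_T28": (142, 1656),
}

# Last residue numbers quoted in the comparison reports for P24, P2 and P25.
QUOTED_PROTEIN_LENGTHS = {
    "M78076_PEA_1_T26": 498,
    "M78076_PEA_1_T27": 588,
    "M78076_PEA_1_T28": 505,
}

for transcript, (start, end) in CODING_PORTIONS.items():
    codons = codon_count(start, end)
    expected = QUOTED_PROTEIN_LENGTHS.get(transcript)
    if expected is None:
        status = "no protein length quoted in this passage"
    else:
        status = "matches" if codons == expected else "MISMATCH"
    print(f"{transcript}: {codons} codons ({status})")
```

Under these assumptions the three transcripts with a quoted protein length agree with it exactly, and the same arithmetic gives 597 codons for M78076_PEA_1_T23.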
As noted above, cluster M78076 features 35 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided. Segment cluster M78076_PEA_1_node_0 according to the present invention is supported by 47 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15, M78076_PEA_1_T23, M78076_PEA_1_T26, M78076_PEA_1_T27 and M78076_PEA_1_T28. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_10 according to the present invention is supported by 70 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15, M78076_PEA_1_T23, M78076_PEA_1_T26, M78076_PEA_1_T27 and M78076_PEA_1_T28. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_15 according to the present invention is supported by 74 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15, M78076_PEA_1_T23, M78076_PEA_1_T26, M78076_PEA_1_T27 and M78076_PEA_1_T28. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_18 according to the present invention is supported by 95 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15, M78076_PEA_1_T23, M78076_PEA_1_T26, M78076_PEA_1_T27 and M78076_PEA_1_T28. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_20 according to the present invention is supported by 99 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15, M78076_PEA_1_T23, M78076_PEA_1_T26, M78076_PEA_1_T27 and M78076_PEA_1_T28. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_24 according to the present invention is supported by 105 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15, M78076_PEA_1_T26, M78076_PEA_1_T27 and M78076_PEA_1_T28. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_26 according to the present invention is supported by 99 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15, M78076_PEA_1_T23, M78076_PEA_1_T26, M78076_PEA_1_T27 and M78076_PEA_1_T28. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_29 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T27. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_32 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T26 and M78076_PEA_1_T27. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_35 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 and M78076_PEA_1_T5. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_37 according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T3 and M78076_PEA_1_T5. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_46 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T15. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_47 according to the present invention is supported by 155 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15 and M78076_PEA_1_T23. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_54 according to the present invention is supported by 133 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15, M78076_PEA_1_T23 and M78076_PEA_1_T28. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster M78076_PEA_1_node_1 according to the present invention is supported by 47 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15, M78076_PEA_1_T23, M78076_PEA_1_T26, M78076_PEA_1_T27 and M78076_PEA_1_T28. Table 43 below describes the starting and ending position of this segment on each transcript. Table 43 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_2 according to the present invention can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15, M78076_PEA_1_T23, M78076_PEA_1_T26, M78076_PEA_1_T27 and M78076_PEA_1_T28. Table 44 below describes the starting and ending position of this segment on each transcript. Table 44 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_3 according to the present invention is supported by 52 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15, M78076_PEA_1_T23, M78076_PEA_1_T26, M78076_PEA_1_T27 and M78076_PEA_1_T28. Table 45 below describes the starting and ending position of this segment on each transcript. Table 45 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_6 according to the present invention is supported by 59 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15, M78076_PEA_1_T23, M78076_PEA_1_T26, M78076_PEA_1_T27 and M78076_PEA_1_T28. Table 46 below describes the starting and ending position of this segment on each transcript. Table 46 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_7 according to the present invention is supported by 64 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15, M78076_PEA_1_T23, M78076_PEA_1_T26, M78076_PEA_1_T27 and M78076_PEA_1_T28. Table 47 below describes the starting and ending position of this segment on each transcript. Table 47 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_12 according to the present invention is supported by 71 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15, M78076_PEA_1_T23, M78076_PEA_1_T26, M78076_PEA_1_T27 and M78076_PEA_1_T28. Table 48 below describes the starting and ending position of this segment on each transcript. Table 48 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_22 according to the present invention is supported by 92 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15, M78076_PEA_1_T23, M78076_PEA_1_T26, M78076_PEA_1_T27 and M78076_PEA_1_T28. Table 49 below describes the starting and ending position of this segment on each transcript. Table 49 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_27 according to the present invention can be found in the following transcript(s): M78076_PEA_1_T27. Table 50 below describes the starting and ending position of this segment on each transcript. Table 50 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_30 according to the present invention is supported by 90 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15, M78076_PEA_1_T23, M78076_PEA_1_T26 and M78076_PEA_1_T27. Table 51 below describes the starting and ending position of this segment on each transcript. Table 51 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_31 according to the present invention is supported by 89 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15, M78076_PEA_1_T23, M78076_PEA_1_T26 and M78076_PEA_1_T27. Table 52 below describes the starting and ending position of this segment on each transcript. Table 52 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_34 according to the present invention is supported by 103 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15 and M78076_PEA_1_T23. Table 53 below describes the starting and ending position of this segment on each transcript. Table 53 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_36 according to the present invention can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15 and M78076_PEA_1_T23. Table 54 below describes the starting and ending position of this segment on each transcript. Table 54 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_41 according to the present invention can be found in the following transcript(s): M78076_PEA_1_T3 and M78076_PEA_1_T5. Table 55 below describes the starting and ending position of this segment on each transcript. Table 55 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_42 according to the present invention can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T15 and M78076_PEA_1_T23. Table 56 below describes the starting and ending position of this segment on each transcript. Table 56 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_43 according to the present invention is supported by 110 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T15 and M78076_PEA_1_T23. Table 57 below describes the starting and ending position of this segment on each transcript. Table 57 - Segment location on transcripts
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to breast cancer), shown in Table 58. Table 58 - Oligonucleotides related to this segment
Segment cluster M78076_PEA_1_node_45 according to the present invention is supported by 132 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15 and M78076_PEA_1_T23. Table 59 below describes the starting and ending position of this segment on each transcript. Table 59 - Segment location on transcripts
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to breast cancer), shown in Table 60. Table 60 - Oligonucleotides related to this segment
Segment cluster M78076_PEA_1_node_49 according to the present invention is supported by 129 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15 and M78076_PEA_1_T23. Table 61 below describes the starting and ending position of this segment on each transcript. Table 61 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_50 according to the present invention is supported by 125 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15 and M78076_PEA_1_T23. Table 62 below describes the starting and ending position of this segment on each transcript. Table 62 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_51 according to the present invention is supported by 123 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15 and M78076_PEA_1_T23. Table 63 below describes the starting and ending position of this segment on each transcript. Table 63 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_52 according to the present invention can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15 and M78076_PEA_1_T23. Table 64 below describes the starting and ending position of this segment on each transcript. Table 64 - Segment location on transcripts
Segment cluster M78076_PEA_1_node_53 according to the present invention can be found in the following transcript(s): M78076_PEA_1_T2, M78076_PEA_1_T3, M78076_PEA_1_T5, M78076_PEA_1_T13, M78076_PEA_1_T15, M78076_PEA_1_T23 and M78076_PEA_1_T28. Table 65 below describes the starting and ending position of this segment on each transcript. Table 65 - Segment location on transcripts
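The segment descriptions above list, for each segment, the transcripts that contain it. One way to read such lists is to ask which segments occur in only one transcript, since those could in principle distinguish that splice variant from the others. The sketch below is illustrative only: it uses a small subset of the segments described above, with names abbreviated (for example `node_29` for M78076_PEA_1_node_29 and `T27` for M78076_PEA_1_T27), and the helper `unique_segments` is a hypothetical name.

```python
from collections import defaultdict

# Subset of the segment-to-transcript lists given in the text (abbreviated names).
SEGMENT_TRANSCRIPTS = {
    "node_27": {"T27"},
    "node_29": {"T27"},
    "node_32": {"T26", "T27"},
    "node_35": {"T2", "T5"},
    "node_37": {"T3", "T5"},
    "node_41": {"T3", "T5"},
    "node_46": {"T15"},
}


def unique_segments(segment_transcripts):
    """Map each transcript to the segments that occur in that transcript only."""
    result = defaultdict(list)
    for segment, transcripts in sorted(segment_transcripts.items()):
        if len(transcripts) == 1:
            (only,) = transcripts  # unpack the single member of the set
            result[only].append(segment)
    return dict(result)


print(unique_segments(SEGMENT_TRANSCRIPTS))
```

Within this subset, node_27 and node_29 are found only in T27 and node_46 only in T15; whether any segment is unique across the whole cluster would need the full set of 35 segments.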
Variant protein alignment to the previously known protein: Sequence name: APP1_HUMAN
Sequence documentation:
Alignment of: M78076_PEA_1_P3 x APP1_HUMAN
Alignment segment 1/1:
Quality: 5132.00
Escore: 0
Matching length: 517
Total length: 517
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEA 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEA 50

 51 PGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYP 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYP 100

101 ELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEAL 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 ELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEAL 150

151 LVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSD 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 LVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSD 200

201 RFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFP 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 RFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFP 250

251 QPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGM 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 QPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGM 300

301 PGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQAL 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 PGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQAL 350

351 NEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQ 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 NEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQ 400

401 ADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVH 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 ADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVH 450

451 THLQVIEERVNQSLGLLDQNPHLAQELRPQIQELLHSEHLGPSELEAPAP 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 THLQVIEERVNQSLGLLDQNPHLAQELRPQIQELLHSEHLGPSELEAPAP 500

501 GGSSEDKGGLQPPDSKD 517
    |||||||||||||||||
501 GGSSEDKGGLQPPDSKD 517
Sequence name: APP1_HUMAN
Sequence documentation:
Alignment of: M78076_PEA_1_P4 x APP1_HUMAN
Alignment segment 1/1:
Quality: 5223.00
Escore: 0
Matching length: 526
Total length: 526
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEA 50
1 MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEA 50

51 PGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYP 100
51 PGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYP 100

101 ELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEAL 150
101 ELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEAL 150

151 LVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSD 200
151 LVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSD 200

201 RFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFP 250
201 RFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFP 250

251 QPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGM 300
251 QPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGM 300

301 PGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQAL 350
301 PGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQAL 350

351 NEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQ 400
351 NEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQ 400

401 ADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVH 450
401 ADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVH 450

451 THLQVIEERVNQSLGLLDQNPHLAQELRPQIQELLHSEHLGPSELEAPAP 500
451 THLQVIEERVNQSLGLLDQNPHLAQELRPQIQELLHSEHLGPSELEAPAP 500

501 GGSSEDKGGLQPPDSKDDTPMTLPKG 526
501 GGSSEDKGGLQPPDSKDDTPMTLPKG 526
Sequence name: APP1_HUMAN
Sequence documentation:
Alignment of: M78076_PEA_1_P12 x APP1_HUMAN
Alignment segment 1/1:
Quality: 5223.00
Escore: 0
Matching length: 526
Total length: 526
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEA 50
1 MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEA 50

51 PGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYP 100
51 PGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYP 100

101 ELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEAL 150
101 ELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEAL 150

151 LVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSD 200
151 LVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSD 200

201 RFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFP 250
201 RFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFP 250

251 QPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGM 300
251 QPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGM 300

301 PGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQAL 350
301 PGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQAL 350

351 NEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQ 400
351 NEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQ 400

401 ADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVH 450
401 ADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVH 450

451 THLQVIEERVNQSLGLLDQNPHLAQELRPQIQELLHSEHLGPSELEAPAP 500
451 THLQVIEERVNQSLGLLDQNPHLAQELRPQIQELLHSEHLGPSELEAPAP 500

501 GGSSEDKGGLQPPDSKDDTPMTLPKG 526
501 GGSSEDKGGLQPPDSKDDTPMTLPKG 526
Sequence name: APP1_HUMAN
Sequence documentation:
Alignment of: M78076_PEA_1_P14 x APP1_HUMAN
Alignment segment 1/1:
Quality: 5672.00
Escore: 0
Matching length: 575
Total length: 575
Matching Percent Similarity: 99.48
Matching Percent Identity: 99.48
Total Percent Similarity: 99.48
Total Percent Identity: 99.48
Gaps: 0
Alignment:
1 MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEA 50
1 MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEA 50

51 PGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYP 100
51 PGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYP 100

101 ELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEAL 150
101 ELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEAL 150

151 LVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSD 200
151 LVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSD 200

201 RFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFP 250
201 RFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFP 250

251 QPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGM 300
251 QPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGM 300

301 PGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQAL 350
301 PGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQAL 350

351 NEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQ 400
351 NEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQ 400

401 ADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVH 450
401 ADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVH 450

451 THLQVIEERVNQSLGLLDQNPHLAQELRPQIQELLHSEHLGPSELEAPAP 500
451 THLQVIEERVNQSLGLLDQNPHLAQELRPQIQELLHSEHLGPSELEAPAP 500

501 GGSSEDKGGLQPPDSKDDTPMTLPKGSTEQDAASPEKEKMNPLEQYERKV 550
501 GGSSEDKGGLQPPDSKDDTPMTLPKGSTEQDAASPEKEKMNPLEQYERKV 550

551 NASVPRGFPFHSSEIQRDELVRGGT 575
551 NASVPRGFPFHSSEIQRDELAPAGT 575
Sequence name: APP1_HUMAN
Sequence documentation:
Alignment of: M78076_PEA_1_P21 x APP1_HUMAN
Alignment segment 1/1:
Quality: 5822.00
Escore: 0
Matching length: 597
Total length: 650
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 91.85
Total Percent Identity: 91.85
Gaps: 1
Alignment:
1 MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEA 50
1 MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEA 50

51 PGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYP 100
51 PGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYP 100

101 ELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEAL 150
101 ELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEAL 150

151 LVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSD 200
151 LVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSD 200

201 RFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFP 250
201 RFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFP 250

251 QPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGM 300
251 QPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGM 300

301 PGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQAL 350
301 PGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQAL 350

351 NE................................................ 352
351 NEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQ 400

353 .....AERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVH 397
401 ADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVH 450

398 THLQVIEERVNQSLGLLDQNPHLAQELRPQIQELLHSEHLGPSELEAPAP 447
451 THLQVIEERVNQSLGLLDQNPHLAQELRPQIQELLHSEHLGPSELEAPAP 500

448 GGSSEDKGGLQPPDSKDDTPMTLPKGSTEQDAASPEKEKMNPLEQYERKV 497
501 GGSSEDKGGLQPPDSKDDTPMTLPKGSTEQDAASPEKEKMNPLEQYERKV 550

498 NASVPRGFPFHSSEIQRDELAPAGTGVSREAVSGLLIMGAGGGSLIVLSM 547
551 NASVPRGFPFHSSEIQRDELAPAGTGVSREAVSGLLIMGAGGGSLIVLSM 600

548 LLLRRKKPYGAISHGVVEVDPMLTLEEQQLRELQRHGYENPTYRFLEERP 597
601 LLLRRKKPYGAISHGVVEVDPMLTLEEQQLRELQRHGYENPTYRFLEERP 650
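The "matching" and "total" percentages reported for each alignment differ only when gaps are present. A minimal Python sketch of how such figures could be derived from two aligned rows is given below. It assumes that "matching length" counts the aligned columns without a gap, that "total length" counts all columns, and that gaps are written as "."; the function name `alignment_identity` and the toy rows are hypothetical and are not taken from the alignment program used in the application.

```python
def alignment_identity(variant: str, known: str, gap: str = ".") -> dict:
    """Summarise a pairwise alignment given as two equal-length gapped rows."""
    if len(variant) != len(known):
        raise ValueError("aligned rows must have the same length")
    total = len(variant)
    # Columns in which neither row holds a gap character.
    matched = [(a, b) for a, b in zip(variant, known) if a != gap and b != gap]
    if not matched:
        raise ValueError("alignment has no ungapped columns")
    identical = sum(1 for a, b in matched if a == b)
    return {
        "matching_length": len(matched),
        "total_length": total,
        "matching_percent_identity": round(100.0 * identical / len(matched), 2),
        "total_percent_identity": round(100.0 * identical / total, 2),
    }


# Toy rows (hypothetical): two gap columns in the variant row.
stats = alignment_identity("MGPA..SKD", "MGPAHFSKD")

# Same arithmetic applied to the lengths reported for M78076_PEA_1_P21:
# 597 identical, ungapped columns out of 650 columns in total.
p21_total_identity = round(100.0 * 597 / 650, 2)
```

Under these assumptions the ratio 597/650 reproduces the 91.85 total percent identity reported above, while the matching percent identity stays at 100.00 because every ungapped column is identical.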
Sequence name: APP1_HUMAN
Sequence documentation:
Alignment of: M78076_PEA_1_P24 x APP1_HUMAN
Alignment segment 1/1:
Quality: 4791.00
Escore: 0
Matching length: 485
Total length: 485
Matching Percent Similarity: 99.79
Matching Percent Identity: 99.59
Total Percent Similarity: 99.79
Total Percent Identity: 99.59
Gaps: 0
Alignment:
1 MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEA 50
1 MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEA 50

51 PGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYP 100
51 PGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYP 100

101 ELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEAL 150
101 ELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEAL 150

151 LVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSD 200
151 LVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSD 200

201 RFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFP 250
201 RFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFP 250

251 QPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGM 300
251 QPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGM 300

301 PGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQAL 350
301 PGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQAL 350

351 NEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQ 400
351 NEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQ 400

401 ADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVH 450
401 ADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVH 450

451 THLQVIEERVNQSLGLLDQNPHLAQELRPQIRECL 485
451 THLQVIEERVNQSLGLLDQNPHLAQELRPQIQELL 485

Sequence name: APP1_HUMAN
Sequence documentation:
Alignment of: M78076_PEA_1_P2 x APP1_HUMAN
Alignment segment 1/1:
Quality: 4474.00
Escore: 0
Matching length: 454
Total length: 454
Matching Percent Similarity: 99.56
Matching Percent Identity: 99.34
Total Percent Similarity: 99.56
Total Percent Identity: 99.34
Gaps: 0
Alignment:
1 MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEA 50
1 MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEA 50

51 PGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYP 100
51 PGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYP 100

101 ELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEAL 150
101 ELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEAL 150

151 LVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSD 200
151 LVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSD 200

201 RFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFP 250
201 RFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFP 250

251 QPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGM 300
251 QPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGM 300

301 PGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQAL 350
301 PGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQAL 350

351 NEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQ 400
351 NEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQ 400

401 ADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVL 450
401 ADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVH 450

451 TSFQ 454
451 THLQ 454
Sequence name: APP1_HUMAN
Sequence documentation:
Alignment of: M78076_PEA_1_P25 x APP1_HUMAN
Alignment segment 1/1:
Quality: 4455.00
Escore: 0
Matching length: 448
Total length: 448
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEA 50
1 MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEA 50

51 PGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYP 100
51 PGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYP 100

101 ELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEAL 150
101 ELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEAL 150

151 LVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSD 200
151 LVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSD 200

201 RFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFP 250
201 RFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFP 250

251 QPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGM 300
251 QPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGM 300

301 PGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQAL 350
301 PGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQAL 350

351 NEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQ 400
351 NEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQ 400

401 ADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQ 448
401 ADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQ 448

DESCRIPTION FOR CLUSTER HSMUC1A
Cluster HSMUC1A features 14 transcript(s) and 22 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Mucin 1 precursor (SwissProt accession identifier MUC1_HUMAN; known also according to the synonyms MUC-1; Polymorphic epithelial mucin; PEM; PEMT; Episialin; Tumor-associated mucin; Carcinoma-associated mucin; Tumor-associated epithelial membrane antigen; EMA; H23AG; Peanut-reactive urinary mucin; PUM; Breast carcinoma-associated antigen DF3; CD227 antigen), SEQ ID NO: 805, referred to herein as the previously known protein.
Protein Mucin 1 precursor is known or believed to have the following function(s): May play a role in adhesive functions and in cell-cell interactions, metastasis and signaling. May provide a protective layer on epithelial surfaces. Direct or indirect interaction with actin cytoskeleton. Isoform 7 behaves as a receptor and binds the secreted isoform 5. The binding induces the phosphorylation of isoform 7, alters cellular morphology and initiates cell signaling. Can bind to GRB2 adapter protein. The sequence for protein Mucin 1 precursor is given at the end of the application, as "Mucin 1 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Mucin 1 precursor localization is believed to be Type I membrane protein. Two secreted forms (5 and 9) are also produced. The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Cancer, breast; Cancer, lung, non-small cell; Cancer, ovarian; Cancer, prostate. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: CDS agonist; DNA antagonist; Immunostimulant; Interferon gamma agonist; MUC-1 inhibitor. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anticancer; Monoclonal antibody, murine; Immunotoxin; Immunostimulant; Immunoconjugate.
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: actin binding, which are annotation(s) related to Molecular Function; and cytoskeleton; integral plasma membrane protein, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster HSMUC1A can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods.
The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 43 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
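The "parts per million" figure described above is a simple ratio, which can be sketched in Python as follows. This is an illustration only: the function name `ests_ppm` and the example counts are hypothetical, and the sketch ignores whatever weighting is applied to the EST counts, since that weighting is not described in this passage.

```python
def ests_ppm(cluster_ests: float, category_ests: float) -> float:
    """Expression of a cluster in one tissue category, in parts per million.

    cluster_ests: (weighted) count of ESTs from the category that fall in the cluster.
    category_ests: (weighted) count of all ESTs from that category.
    """
    if category_ests <= 0:
        raise ValueError("the category must contain at least one EST")
    return 1_000_000 * cluster_ests / category_ests


# Hypothetical counts: 30 cluster ESTs among 200,000 ESTs from one tissue category.
example_ppm = ests_ppm(30, 200_000)
```

With these made-up counts the cluster would be plotted at 150 on the y-axis for that category.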
Overall, the following results were obtained as shown with regard to the histograms in Figure 43 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: a mixture of malignant tumors from different tissues, breast malignant tumors, pancreas carcinoma and prostate cancer.
Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
For this cluster, at least one oligonucleotide was found to demonstrate overexpression of the cluster, although not of at least one transcript/segment as listed below. Microarray (chip) data is also available for this cluster as follows. Various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer, as previously described. The following oligonucleotides were found to hit this cluster but not other segments/transcripts below (in relation to breast cancer), shown in Table 7.
Table 7 - Oligonucleotides related to this cluster
As noted above, cluster HSMUC1A features 14 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Mucin 1 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSMUC1A_PEA_1_P25 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T26. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide. Variant protein HSMUC1A_PEA_1_P25 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P25 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
Variant protein HSMUC1A_PEA_1_P25 is encoded by the following transcript(s): HSMUC1A_PEA_1_T26, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T26 is shown in bold; this coding portion starts at position 507 and ends at position 1115. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P25 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
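The relationship between a nucleic acid SNP on a transcript and a non-silent SNP on the encoded protein can be illustrated with a short Python sketch. It assumes 1-based, inclusive transcript positions and the standard genetic code; the function `classify_snp` and the toy transcript are hypothetical and do not reproduce any sequence or SNP of the application.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snp(transcript: str, cds_start: int, cds_end: int, pos: int, alt: str):
    """Return (amino acid position, reference residue, alternative residue, is_silent).

    All positions are 1-based and inclusive, counted on the transcript.
    """
    if not cds_start <= pos <= cds_end:
        raise ValueError("SNP lies outside the coding portion")
    codon_index = (pos - cds_start) // 3          # 0-based codon number
    codon_start = cds_start + 3 * codon_index     # transcript position of codon base 1
    ref_codon = transcript[codon_start - 1:codon_start + 2].upper()
    offset = pos - codon_start                    # 0, 1 or 2 within the codon
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    return codon_index + 1, ref_aa, alt_aa, ref_aa == alt_aa


# Toy transcript (hypothetical): two untranslated bases, then ATG GCT TAA.
toy = "GGATGGCTTAA"
silent = classify_snp(toy, 3, 11, 8, "C")      # GCT -> GCC, alanine unchanged
non_silent = classify_snp(toy, 3, 11, 6, "A")  # GCT -> ACT, alanine to threonine
```

Only SNPs of the second kind would appear in an amino acid mutation table, listed by their position on the amino acid sequence with the alternative residue.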
Variant protein HSMUC1A_PEA_1_P29 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T33. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSMUC1A_PEA_1_P29 is encoded by the following transcript(s): HSMUC1A_PEA_1_T33, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T33 is shown in bold; this coding portion starts at position 507 and ends at position 953. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P29 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P30 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T34. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide. Variant protein HSMUC1A_PEA_1_P30 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P30 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
Variant protein HSMUC1A_PEA_1_P30 is encoded by the following transcript(s): HSMUC1A_PEA_1_T34, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T34 is shown in bold; this coding portion starts at position 507 and ends at position 1004. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P30 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P32 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T36. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide. Variant protein HSMUC1A_PEA_1_P32 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P32 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Amino acid mutations
Variant protein HSMUC1A_PEA_1_P32 is encoded by the following transcript(s): HSMUC1A_PEA_1_T36, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T36 is shown in bold; this coding portion starts at position 507 and ends at position 977. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P32 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P36 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T40. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSMUC1A_PEA_1_P36 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 15 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P36 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 15 - Amino acid mutations
Variant protein HSMUC1A_PEA_1_P36 is encoded by the following transcript(s): HSMUC1A_PEA_1_T40, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T40 is shown in bold; this coding portion starts at position 507 and ends at position 983. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P36 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 16 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P39 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T43. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSMUC1A_PEA_1_P39 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P39 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 17 - Amino acid mutations
Variant protein HSMUC1A_PEA_1_P39 is encoded by the following transcript(s): HSMUC1A_PEA_1_T43, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T43 is shown in bold; this coding portion starts at position 507 and ends at position 914. The transcript also has the following SNPs as listed in Table 18 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P39 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 18 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P45 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T29. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSMUC1A_PEA_1_P45 is encoded by the following transcript(s): HSMUC1A_PEA_1_T29, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T29 is shown in bold; this coding portion starts at position 507 and ends at position 746. The transcript also has the following SNPs as listed in Table 19 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P45 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 19 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P49 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T12. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSMUC1A_PEA_1_P49 is encoded by the following transcript(s): HSMUC1A_PEA_1_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T12 is shown in bold; this coding portion starts at position 507 and ends at position 884. The transcript also has the following SNPs as listed in Table 20 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P49 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 20 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P52 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T30. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSMUC1A_PEA_1_P52 is encoded by the following transcript(s): HSMUC1A_PEA_1_T30, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T30 is shown in bold; this coding portion starts at position 507 and ends at position 719. The transcript also has the following SNPs as listed in Table 21 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P52 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 21 - Nucleic acid SNPs
Variant protein HSMUCl A JPE A_l JP53 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1AJPEA_1 JT31. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans- membrane region prediction program predicts that this protein has a frans-membrane region.
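The localization rule stated above (secreted when every signal-peptide predictor reports a signal peptide and no trans-membrane predictor reports a trans-membrane region) can be sketched as a simple consensus check. This is a minimal illustration only; the function name, the boolean inputs and the "undetermined" fallback are assumptions and do not correspond to any program named in the application.

```python
def predict_localization(signal_peptide_calls, transmembrane_calls):
    """Consensus rule sketched from the text (hypothetical helper).

    signal_peptide_calls: one boolean per signal-peptide prediction program.
    transmembrane_calls:  one boolean per trans-membrane prediction program.
    """
    if not signal_peptide_calls or not transmembrane_calls:
        raise ValueError("at least one call of each kind is required")
    # Secreted only if ALL signal-peptide programs agree and NO
    # trans-membrane program predicts a trans-membrane region.
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    # The text gives no rule for other combinations; this label is an assumption.
    return "undetermined"


# Two signal-peptide programs and two trans-membrane programs, as in the text.
print(predict_localization([True, True], [False, False]))
```

Requiring agreement between independent predictors trades sensitivity for specificity, which is consistent with the cautious wording "is believed to be" used in the text.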
Variant protein HSMUC1A_PEA_1_P53 is encoded by the following transcript(s): HSMUC1A_PEA_1_T31, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T31 is shown in bold; this coding portion starts at position 507 and ends at position 665. The transcript also has the following SNPs as listed in Table 22 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P53 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 22 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P56 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T42. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSMUC1A_PEA_1_P56 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 23 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P56 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 23 - Amino acid mutations
Variant protein HSMUC1A_PEA_1_P56 is encoded by the following transcript(s): HSMUC1A_PEA_1_T42, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T42 is shown in bold; this coding portion starts at position 507 and ends at position 890. The transcript also has the following SNPs as listed in Table 24 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P56 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 24 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P58 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T35. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSMUC1A_PEA_1_P58 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 25 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P58 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 25 - Amino acid mutations
Variant protein HSMUC1A_PEA_1_P58 is encoded by the following transcript(s): HSMUC1A_PEA_1_T35, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T35 is shown in bold; this coding portion starts at position 507 and ends at position 980. The transcript also has the following SNPs as listed in Table 26 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P58 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 26 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P59 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T28. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSMUC1A_PEA_1_P59 is encoded by the following transcript(s): HSMUC1A_PEA_1_T28, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T28 is shown in bold; this coding portion starts at position 507 and ends at position 794. The transcript also has the following SNPs as listed in Table 27 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P59 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 27 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P63 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T47. An alignment is given to the known protein (Mucin 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSMUC1A_PEA_1_P63 and MUC1_HUMAN:
1. An isolated chimeric polypeptide encoding for HSMUC1A_PEA_1_P63, comprising a first amino acid sequence being at least 90% homologous to MTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSV corresponding to amino acids 1 - 45 of MUC1_HUMAN, which also corresponds to amino acids 1 - 45 of HSMUC1A_PEA_1_P63, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EEEVSADQVSVGASGVLGSFKEARNAPSFLSWSFSMGPSK corresponding to amino acids 46 - 85 of HSMUC1A_PEA_1_P63, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSMUC1A_PEA_1_P63, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EEEVSADQVSVGASGVLGSFKEARNAPSFLSWSFSMGPSK in HSMUC1A_PEA_1_P63.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. The glycosylation sites of variant protein HSMUC1A_PEA_1_P63, as compared to the known protein Mucin 1 precursor, are described in Table 28 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 28 - Glycosylation site(s)
Variant protein HSMUC1A_PEA_1_P63 is encoded by the following transcript(s): HSMUC1A_PEA_1_T47, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T47 is shown in bold; this coding portion starts at position 507 and ends at position 761. The transcript also has the following SNPs as listed in Table 29 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P63 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 29 - Nucleic acid SNPs
As noted above, cluster HSMUC1A features 22 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSMUC1A_PEA_1_node_0 according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42, HSMUC1A_PEA_1_T43 and HSMUC1A_PEA_1_T47. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_14 according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_24 according to the present invention is supported by 135 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_29 according to the present invention is supported by 156 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42 and HSMUC1A_PEA_1_T43. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_35 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T47. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to breast cancer), shown in Table 35. Table 35 - Oligonucleotides related to this segment
Segment cluster HSMUC1A_PEA_1_node_38 according to the present invention is supported by 140 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42, HSMUC1A_PEA_1_T43 and HSMUC1A_PEA_1_T47. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSMUC1A_PEA_1_node_3 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T40 and HSMUC1A_PEA_1_T43. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_4 according to the present invention can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42, HSMUC1A_PEA_1_T43 and HSMUC1A_PEA_1_T47. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_5 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42, HSMUC1A_PEA_1_T43 and HSMUC1A_PEA_1_T47. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_6 according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42, HSMUC1A_PEA_1_T43 and HSMUC1A_PEA_1_T47. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_7 according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42 and HSMUC1A_PEA_1_T43. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_17 according to the present invention can be found in the following transcript(s): HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T33 and HSMUC1A_PEA_1_T40. Table 44 below describes the starting and ending position of this segment on each transcript. Table 44 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_18 according to the present invention is supported by 90 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T40 and HSMUC1A_PEA_1_T42. Table 45 below describes the starting and ending position of this segment on each transcript. Table 45 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_20 according to the present invention can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T35 and HSMUC1A_PEA_1_T42. Table 46 below describes the starting and ending position of this segment on each transcript. Table 46 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_21 according to the present invention is supported by 97 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T35 and HSMUC1A_PEA_1_T42. Table 47 below describes the starting and ending position of this segment on each transcript. Table 47 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_23 according to the present invention can be found in the following transcript(s): HSMUC1A_PEA_1_T12. Table 48 below describes the starting and ending position of this segment on each transcript. Table 48 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_26 according to the present invention is supported by 129 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30 and HSMUC1A_PEA_1_T31. Table 49 below describes the starting and ending position of this segment on each transcript. Table 49 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_27 according to the present invention is supported by 140 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35 and HSMUC1A_PEA_1_T36. Table 50 below describes the starting and ending position of this segment on each transcript. Table 50 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_31 according to the present invention can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42 and HSMUC1A_PEA_1_T43. Table 51 below describes the starting and ending position of this segment on each transcript. Table 51 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_34 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T47. Table 52 below describes the starting and ending position of this segment on each transcript. Table 52 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_36 according to the present invention is supported by 135 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42, HSMUC1A_PEA_1_T43 and HSMUC1A_PEA_1_T47. Table 53 below describes the starting and ending position of this segment on each transcript. Table 53 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_37 according to the present invention is supported by 146 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42, HSMUC1A_PEA_1_T43 and HSMUC1A_PEA_1_T47. Table 54 below describes the starting and ending position of this segment on each transcript. Table 54 - Segment location on transcripts
Variant protein alignment to the previously known protein: Sequence name: MUC1_HUMAN
Sequence documentation:
Alignment of: HSMUC1A_PEA_1_P63 x MUC1_HUMAN
Alignment segment 1/1
Quality: 429.00 Escore : Matching length: 59 Total length: 59 Matching Percent Similarity: 86.44 Matching Percent Identity: 81.36 Total Percent Similarity: 86.44 Total Percent Identity: 81.36 Gaps : 0
Alignment :
1 MTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSVEEEVS 50
1 MTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSVPSSTE 50
51 ADQVSVGAS 59
51 KNAVSMTSS 59
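The "Total Percent Identity" figure in the alignment summary above can be cross-checked by counting identical positions in the two aligned rows. The sketch below assumes the gapless 59-residue alignment shown (Gaps: 0) and uses a hypothetical helper name; it is not the alignment program used in the application, and it does not attempt to reproduce the similarity score, which depends on a substitution matrix not given in the text.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions with identical residues in a gapless alignment."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Aligned rows copied from the alignment above (positions 1-50, then 51-59).
variant = "MTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSVEEEVS" + "ADQVSVGAS"
known = "MTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSVPSSTE" + "KNAVSMTSS"

print(round(percent_identity(variant, known), 2))  # 81.36
```

The result, 48 identical positions out of 59, agrees with the reported identity of 81.36.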
Combined expression of 8 sequences (T10888seg11-17, HUMGR5E junc3-7, HSSTROL3seg24, T94936seg14, Z21368 seg39, Z21368 junc17-21, T59832 jun6-25-26 and M85491seg24) in normal and cancerous breast tissues
Expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6, GRP_HUMAN - gastrin-releasing peptide, Stromelysin-3 precursor (EC 3.4.24.-) (Matrix metalloproteinase-11) (MMP-11) (ST3) (SL-3), Homo sapiens breast cancer membrane protein 11 (BCMP11), SUL1_HUMAN, Ephrin type-B receptor 2 precursor (EC 2.7.1.112) (Tyrosine-protein kinase receptor EPH-3) and gamma-interferon inducible lysosomal thiol reductase (GILT) transcripts detectable by or according to T10888seg11-17, HUMGR5E junc3-7, HSSTROL3seg24, T94936seg14, Z21368 seg39, Z21368 junc17-21, T59832 jun6-25-26 and M85491seg24 amplicons and T10888seg11-17F, T10888seg11-17R, HUMGR5E junc3-7F, HUMGR5E junc3-7R, HSSTROL3seg24F, HSSTROL3seg24R, T94936seg14F, T94936seg14R, Z21368 seg39F, Z21368 seg39R, Z21368 junc17-21F, Z21368 junc17-21R, T59832 jun6-25-26F, T59832 jun6-25-26R, M85491seg24F and M85491seg24R primers was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon), G6PD (GenBank Accession No. NM_000402; G6PD amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), was measured similarly. For each RT sample, the expression of the above amplicons was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample of each amplicon was then divided by the median of the quantities of the normal post-mortem (PM) samples detected for the same amplicon (Sample Nos. 56-60, 63-67, Table 1, "Tissue samples in testing panel" above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.
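The two-step normalization described above (division by the geometric mean of the housekeeping-gene quantities, then division by the median of the normal PM samples) can be sketched as follows. This is an illustration under stated assumptions: the function names, sample labels and all numeric quantities are hypothetical and are not data from the application.

```python
from statistics import geometric_mean, median


def normalize(amplicon_qty, housekeeping_qtys):
    """Step 1: divide an amplicon quantity by the geometric mean of the
    housekeeping-gene quantities measured in the same RT sample."""
    return amplicon_qty / geometric_mean(housekeeping_qtys)


def fold_up_regulation(normalized_by_sample, normal_sample_ids):
    """Step 2: divide every normalized quantity by the median of the
    normalized quantities of the normal samples for the same amplicon."""
    baseline = median(normalized_by_sample[s] for s in normal_sample_ids)
    return {s: q / baseline for s, q in normalized_by_sample.items()}


# Hypothetical quantities: (amplicon, [PBGD, HPRT1, G6PD, SDHA]) per sample.
raw = {
    "tumor_1": (40.0, [2.0, 8.0, 4.0, 4.0]),
    "normal_a": (4.0, [4.0, 4.0, 4.0, 4.0]),
    "normal_b": (8.0, [4.0, 4.0, 4.0, 4.0]),
    "normal_c": (12.0, [4.0, 4.0, 4.0, 4.0]),
}
normalized = {s: normalize(q, hk) for s, (q, hk) in raw.items()}
folds = fold_up_regulation(normalized, ["normal_a", "normal_b", "normal_c"])
print(folds["tumor_1"])
```

The geometric mean is less sensitive than the arithmetic mean to a single highly expressed reference gene, which is a common reason for using it when combining several housekeeping genes.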
Figures 44-47 are histograms showing differential expression of the above-indicated transcripts in cancerous breast samples relative to the normal samples, in different combinations. The number and percentage of samples that exhibit at least a 5-fold differential in at least one of the sequences, out of the total number of samples tested, is indicated at the bottom. As is evident from Figures 44-47, differential expression of at least 5 fold in at least one of the sequences was found in 25 out of 28 adenocarcinoma samples in all different combinations. Statistical analysis was applied to verify the significance of these results, as described below. A threshold of 5-fold differential expression of at least one of the amplicons was found to differentiate between cancer and normal samples. The above values demonstrate statistical significance of the results.
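The counting rule described above (a sample is scored when at least one amplicon shows at least a 5-fold differential) can be sketched as below. The function name, sample labels and fold values are hypothetical and serve only to illustrate the rule; they are not the measurements behind Figures 44-47.

```python
def count_samples_over_threshold(fold_changes, threshold=5.0):
    """Return (number, percentage) of samples in which at least one
    amplicon reaches the fold-change threshold."""
    hits = [
        sample
        for sample, folds in fold_changes.items()
        if any(f >= threshold for f in folds.values())
    ]
    return len(hits), 100.0 * len(hits) / len(fold_changes)


# Hypothetical fold-change values per sample and amplicon.
example = {
    "sample_1": {"amplicon_a": 7.2, "amplicon_b": 1.1},
    "sample_2": {"amplicon_a": 0.9, "amplicon_b": 5.0},
    "sample_3": {"amplicon_a": 2.0, "amplicon_b": 3.5},
}
number, percentage = count_samples_over_threshold(example)
print(number, round(percentage, 1))
```

Applied to the figures reported in the text, 25 of 28 adenocarcinoma samples corresponds to roughly 89% of the samples.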
DESCRIPTION FOR CLUSTER HSU33147
Cluster HSU33147 features 2 transcript(s) and 5 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Mammaglobin A precursor (SwissProt accession identifier MGBA_HUMAN; known also according to the synonyms Mammaglobin 1; Secretoglobin family 2A member 2), SEQ ID NO: 827, referred to herein as the previously known protein. The sequence for protein Mammaglobin A precursor is given at the end of the application, as "Mammaglobin A precursor amino acid sequence".
The previously known protein has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Immunostimulant. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anticancer. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: steroid binding, which are annotation(s) related to Molecular Function. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster HSU33147 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of Figure 48 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
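The "parts per million" measure described above is a ratio of cluster ESTs to all ESTs in a tissue category, scaled by one million. A minimal sketch follows; the function name and the counts are hypothetical, and the weighting of EST counts mentioned in the text is not specified in this section, so the sketch uses unweighted counts as a simplifying assumption.

```python
def ests_parts_per_million(cluster_est_count, category_est_count):
    """Cluster ESTs in a tissue category, expressed per million ESTs
    of that category (unweighted; the weighting scheme is an unknown)."""
    if category_est_count <= 0:
        raise ValueError("category must contain at least one EST")
    return 1_000_000 * cluster_est_count / category_est_count


# Hypothetical counts: 5 cluster ESTs among 100,000 ESTs in a category.
print(ests_parts_per_million(5, 100_000))  # 50.0
```

Scaling to a common denominator allows categories with very different library sizes to be compared on the same axis.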
Overall, the following results were obtained as shown with regard to the histograms in Figure 48 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: a mixture of malignant tumors from different tissues.
Table 4 - Normal tissue distribution
Table 5 - P values and ratios for expression in cancerous tissue
As noted above, cluster HSU33147 features 2 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Mammaglobin A precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSU33147_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSU33147_PEA_1_T1. An alignment is given to the known protein (Mammaglobin A precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSU33147_PEA_1_P5 and MGBA_HUMAN:
1. An isolated chimeric polypeptide encoding for HSU33147_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MKLLMVLMLAALSQHCYAGSGCPLLENVISKTINPQVSKTEYKELLQEFTDDNATTNAIDELKECFLNQTDETLSNVE corresponding to amino acids 1 - 78 of MGBA_HUMAN, which also corresponds to amino acids 1 - 78 of HSU33147_PEA_1_P5, and a second amino acid sequence being at least 90% homologous to QLLYDSSLCDLF corresponding to amino acids 82 - 93 of MGBA_HUMAN, which also corresponds to amino acids 79 - 90 of HSU33147_PEA_1_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HSU33147_PEA_1_P5, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EQ, having a structure as follows: a sequence starting from any of amino acid numbers 78-x to 78; and ending at any of amino acid numbers 79 + ((n-2) - x), in which x varies from 0 to n-2.
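The edge-portion definition above fixes a window of length n that always spans the junction between residue 78 and residue 79: for each x from 0 to n-2 the window starts at 78-x and ends at 79 + ((n-2) - x). The sketch below enumerates such windows for an arbitrary junction position. The function name is hypothetical, the bounds check is an added assumption (the text does not say how windows running past the sequence ends are handled), and a short toy sequence is used instead of the actual protein sequence.

```python
def edge_portions(seq, left, n):
    """Enumerate length-n windows spanning the junction between 1-based
    positions `left` and `left + 1` (78 and 79 in the text)."""
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    windows = []
    for x in range(0, n - 1):  # x varies from 0 to n-2
        start = left - x
        end = (left + 1) + ((n - 2) - x)
        # Assumption: skip windows that would run off either end.
        if start < 1 or end > len(seq):
            continue
        windows.append((start, end, seq[start - 1:end]))
    return windows


# Toy sequence with the junction residues E (position 5) and Q (position 6).
toy = "AAAAEQCCCC"
for start, end, fragment in edge_portions(toy, left=5, n=4):
    print(start, end, fragment)
```

Every window produced has length n and contains both junction residues, which is the property the edge-portion definition is designed to guarantee.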
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. The glycosylation sites of variant protein HSU33147_PEA_1_P5, as compared to the known protein Mammaglobin A precursor, are described in Table 6 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 6 - Glycosylation site(s)
Variant protein HSU33147_PEA_1_P5 is encoded by the following transcript(s): HSU33147_PEA_1_T1, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSU33147_PEA_1_T1 is shown in bold; this coding portion starts at position 72 and ends at position 341. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSU33147_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Nucleic acid SNPs
As noted above, cluster HSU33147 features 5 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSU33147_PEA_1_node_0 according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU33147_PEA_1_T1 and HSU33147_PEA_1_T2. Table 8 below describes the starting and ending position of this segment on each transcript. Table 8 - Segment location on transcripts
Segment cluster HSU33147_PEA_1_node_2 according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU33147_PEA_1_T1 and HSU33147_PEA_1_T2. Table 9 below describes the starting and ending position of this segment on each transcript. Table 9 - Segment location on transcripts
Segment cluster HSU33147_PEA_1_node_4 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU33147_PEA_1_T2. Table 10 below describes the starting and ending position of this segment on each transcript. Table 10 - Segment location on transcripts
Segment cluster HSU33147_PEA_1_node_7 according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU33147_PEA_1_T1. Table 11 below describes the starting and ending position of this segment on each transcript. Table 11 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSU33147_PEA_1_node_3 according to the present invention can be found in the following transcript(s): HSU33147_PEA_1_T2. Table 12 below describes the starting and ending position of this segment on each transcript. Table 12 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: MGBA_HUMAN
Sequence documentation:
Alignment of: HSU33147_PEA_1_P5 x MGBA_HUMAN
Alignment segment 1/1:
Quality: 776.00
Escore:
Matching length: 90
Total length: 93
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 96.77
Total Percent Identity: 96.77
Gaps:
Alignment:
 1 MKLLMVLMLAALSQHCYAGSGCPLLENVISKTINPQVSKTEYKELLQEFI 50
 1 MKLLMVLMLAALSQHCYAGSGCPLLENVISKTINPQVSKTEYKELLQEFI 50
51 DDNATTNAIDELKECFLNQTDETLSNVE...QLIYDSSLCDLF 90
51 DDNATTNAIDELKECFLNQTDETLSNVEVFMQLIYDSSLCDLF 93
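The identity figures reported for this alignment can be reproduced from the aligned rows themselves. The sketch below is illustrative only. It assumes that "Matching length" counts the aligned (non-gap) columns, that "Total length" counts all alignment columns, and that "." marks a gap. The function name `alignment_stats` is hypothetical, and the similarity scores are not computed because they would require a substitution matrix that is not stated here.

```python
def alignment_stats(row_a, row_b, gap="."):
    """Compute identity statistics for two equal-length gapped alignment rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same number of columns")
    # Columns in which neither row carries a gap character.
    matched = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    identical = sum(1 for a, b in matched if a == b)
    total = len(row_a)
    return {
        "matching_length": len(matched),
        "total_length": total,
        "matching_percent_identity": round(100.0 * identical / len(matched), 2),
        "total_percent_identity": round(100.0 * identical / total, 2),
    }

SHARED_1_78 = (
    "MKLLMVLMLAALSQHCYAGSGCPLLENVISKTINPQVSKTEYKELLQEFI"
    "DDNATTNAIDELKECFLNQTDETLSNVE"
)
VARIANT_ROW = SHARED_1_78 + "..." + "QLIYDSSLCDLF"   # HSU33147_PEA_1_P5
KNOWN_ROW = SHARED_1_78 + "VFM" + "QLIYDSSLCDLF"     # MGBA_HUMAN

if __name__ == "__main__":
    print(alignment_stats(VARIANT_ROW, KNOWN_ROW))
```

With these rows, 90 of the 93 columns are aligned and all 90 are identical, which is consistent with the reported matching identity of 100.00 and total identity of 96.77.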
Therapeutic applications of splice variants of the present invention
Splice variants described herein (including any polynucleotide, oligonucleotide, polypeptide, peptide or fragments thereof) or antibodies that specifically bind thereto may optionally be used for therapeutic applications, for example to treat the diseases described herein with regard to diagnostic applications thereof. A "variant-treatable" disease refers to any disease that is treatable by using a splice variant of any of the therapeutic proteins according to the present invention. "Treatment" also encompasses prevention, amelioration, elimination and control of the disease and/or pathological condition. The diseases for which such variants may be useful therapeutic agents are described in greater detail below for each of the variants. The variants themselves are described by "cluster" or by gene, as these variants are splice variants of known proteins. Therefore, a "cluster-related disease" or a "variant-related disease" refers to a disease that may be treated by a particular protein which, with regard to the description of such diseases below, is a therapeutic protein variant according to the present invention. The term "biologically active", as used herein, refers to a protein having structural, regulatory, or biochemical functions of a naturally occurring molecule. Likewise, "immunologically active" refers to the capability of the natural, recombinant, or synthetic ligand, or any oligopeptide thereof, to induce a specific immune response in appropriate animals or cells and to bind with specific antibodies. The term "modulate", as used herein, refers to a change in the activity of at least one receptor-mediated activity. For example, modulation may cause an increase or a decrease in protein activity, binding characteristics, or any other biological, functional or immunological properties of a ligand.
METHODS OF TREATMENT
As mentioned hereinabove the novel therapeutic protein variants of the present invention and compositions derived therefrom (i.e., peptides, oligonucleotides) can be used to treat cluster-related diseases. Thus, according to an additional aspect of the present invention there is provided a method of treating cluster-related disease in a subject. The subject according to the present invention is a mammal, preferably a human which has at least one type of the cluster-related diseases described hereinabove. As mentioned hereinabove, the biomolecular sequences of the present invention can be used to treat subjects with the above-described diseases. The subject according to the present invention is a mammal, preferably a human which is diagnosed with one of the diseases described hereinabove, or alternatively is predisposed to having one of the diseases described hereinabove. As used herein the term "treating" refers to preventing, curing, reversing, attenuating, alleviating, minimizing, suppressing or halting the deleterious effects of the above-described diseases. Treating, according to the present invention, can be effected by specifically upregulating or alternatively downregulating the expression of at least one of the polypeptides of the present invention in the subject. Optionally, upregulation may be effected by administering to the subject at least one of the polypeptides of the present invention (e.g., recombinant or synthetic) or an active portion thereof, as described herein. However, since the bioavailability of large polypeptides may potentially be relatively small due to high degradation rate and low penetration rate, administration of polypeptides is preferably confined to small peptide fragments (e.g., about 100 amino acids). The polypeptide or peptide may optionally be administered in a pharmaceutical composition, described in more detail below.
It will be appreciated that treatment of the above-described diseases according to the present invention may be combined with other treatment methods known in the art (i.e., combination therapy). Thus, treatment of malignancies using the agents of the present invention may be combined with, for example, radiation therapy, antibody therapy and/or chemotherapy. Alternatively or additionally, an upregulating method may optionally be effected by specifically upregulating the amount (optionally expression) in the subject of at least one of the polypeptides of the present invention or active portions thereof. As is mentioned hereinabove and in the Examples section which follows, the biomolecular sequences of this aspect of the present invention may be used as valuable therapeutic tools in the treatment of diseases in which altered activity or expression of the wild-type gene product is known to contribute to disease onset or progression. For example, where a disease is caused by overexpression of a membrane-bound receptor, a soluble variant thereof may be used as an antagonist which competes with the receptor for binding the ligand, to thereby terminate signaling from the receptor. Examples of such diseases are listed in the Examples section which follows. It will be appreciated that the polypeptides of the present invention may also have agonistic properties. These include increasing the stability of the ligand (e.g., IL-4), protection from proteolysis and modification of the pharmacokinetic properties of the ligand (i.e., increasing the half-life of the ligand, while decreasing the clearance thereof). As such, the biomolecular sequences of this aspect of the present invention may be used to treat conditions or diseases in which the wild-type gene product plays a favorable role, for example, increasing angiogenesis in cases of diabetes or ischemia.
Upregulating expression of the therapeutic protein variants of the present invention may be effected via the administration of at least one of the exogenous polynucleotide sequences of the present invention, ligated into a nucleic acid expression construct designed for expression of coding sequences in eukaryotic cells (e.g., mammalian cells), as described above. Accordingly, the exogenous polynucleotide sequence may be a DNA or RNA sequence encoding the variants of the present invention or active portions thereof. It will be appreciated that the nucleic acid construct can be administered to the individual employing any suitable mode of administration, described hereinbelow (i.e., in-vivo gene therapy). Alternatively, the nucleic acid construct is introduced into a suitable cell via an appropriate gene delivery vehicle/method (transfection, transduction, homologous recombination, etc.) and an expression system as needed and then the modified cells are expanded in culture and returned to the individual (i.e., ex-vivo gene therapy). Nucleic acid constructs are described in greater detail above. It will be appreciated that the present methodology may also be effected by specifically upregulating the expression of the variants of the present invention endogenously in the subject. Agents for upregulating endogenous expression of specific splice variants of a given gene include antisense oligonucleotides, which are directed at splice sites of interest, thereby altering the splicing pattern of the gene. This approach has been successfully used for shifting the balance of expression of the two isoforms of Bcl-x [Taylor (1999) Nat. Biotechnol. 17:1097-1100; and Mercatante (2001) J. Biol. Chem. 276:16411-16417]; IL-5R [Karras (2000) Mol. Pharmacol. 58:380-387]; and c-myc [Giles (1999) Antisense Nucleic Acid Drug Dev. 9:213-220].
For example, interleukin 5 and its receptor play a critical role as regulators of hematopoiesis and as mediators in some inflammatory diseases such as allergy and asthma. Two alternatively spliced isoforms are generated from the IL-5R gene, which include (i.e., long form) or exclude (i.e., short form) exon 9. The long form encodes for the intact membrane-bound receptor, while the shorter form encodes for a secreted soluble non-functional receptor. Using 2'-O-MOE-oligonucleotides specific to regions of exon 9, Karras and co-workers (supra) were able to significantly decrease the expression of the wild type receptor and increase the expression of the shorter isoforms. Design and synthesis of oligonucleotides which can be used according to the present invention are described hereinbelow and by Sazani and Kole (2003) Progress in Molecular and Subcellular Biology 31:217-239. Upregulating expression of the polypeptides of the present invention in a subject may be effected via the administration of at least one of the exogenous polynucleotide sequences of the present invention (e.g., SEQ ID NOs: 3, 7, 11, 15, 19, 23, 27, 31, 35, 39 or 43) ligated into a nucleic acid expression construct designed for expression of coding sequences in eukaryotic cells (e.g., mammalian cells). Accordingly, the exogenous polynucleotide sequence may be a DNA or RNA sequence encoding the variants of the present invention or active portions thereof. It will be appreciated that the nucleic acid construct can be administered to the individual employing any suitable mode of administration, described hereinbelow (i.e., in-vivo gene therapy). Alternatively, the nucleic acid construct is introduced into a suitable cell via an appropriate gene delivery vehicle/method (transfection, transduction, homologous recombination, etc.) and an expression system as needed and then the modified cells are expanded in culture and returned to the individual (i.e., ex-vivo gene therapy).
Preferably, the promoter utilized by the nucleic acid construct of the present invention is active in the specific cell population transformed. Examples of cell type-specific and/or tissue-specific promoters include the albumin promoter, which is liver specific [Pinkert et al., (1987) Genes Dev. 1:268-277]; lymphoid-specific promoters [Calame et al., (1988) Adv. Immunol. 43:235-275], in particular promoters of T-cell receptors [Winoto et al., (1989) EMBO J. 8:729-733] and immunoglobulins [Banerji et al. (1983) Cell 33:729-740]; neuron-specific promoters such as the neurofilament promoter [Byrne et al. (1989) Proc. Natl. Acad. Sci. USA 86:5473-5477]; pancreas-specific promoters [Edlund et al. (1985) Science 230:912-916]; and mammary gland-specific promoters such as the milk whey promoter (U.S. Pat. No. 4,873,316 and European Patent Application No. EP 264,166). Examples of suitable constructs include, but are not limited to, pcDNA3, pcDNA3.1 (+/-), pGL3, pZeoSV2 (+/-), pDisplay, pEF/myc/cyto and pCMV/myc/cyto, each of which is commercially available from Invitrogen Co. (www.invitrogen.com). Examples of retroviral vector and packaging systems are those sold by Clontech, San Diego, Calif., including Retro-X vectors pLNCX and pLXSN, which permit cloning into multiple cloning sites and in which the transgene is transcribed from the CMV promoter. Vectors derived from Mo-MuLV are also included, such as pBabe, where the transgene will be transcribed from the 5' LTR promoter. Currently preferred in vivo nucleic acid transfer techniques include transfection with viral or non-viral constructs, such as adenovirus, lentivirus, Herpes simplex I virus, or adeno-associated virus (AAV) and lipid-based systems. Useful lipids for lipid-mediated transfer of the gene are, for example, DOTMA, DOPE, and DC-Chol [Tonkinson et al., Cancer Investigation, 14(1): 54-65 (1996)].
The most preferred constructs for use in gene therapy are viruses, most preferably adenoviruses, AAV, lentiviruses, or retroviruses. A viral construct such as a retroviral construct includes at least one transcriptional promoter/enhancer or locus-defining element(s), or other elements that control gene expression by other means such as alternate splicing, nuclear RNA export, or post-translational modification of messenger. Such vector constructs also include a packaging signal, long terminal repeats (LTRs) or portions thereof, and positive and negative strand primer binding sites appropriate to the virus used, unless it is already present in the viral construct. In addition, such a construct typically includes a signal sequence for secretion of the peptide from a host cell in which it is placed. Preferably the signal sequence for this purpose is a mammalian signal sequence or the signal sequence of the polypeptide variants of the present invention. Optionally, the construct may also include a signal that directs polyadenylation, as well as one or more restriction sites and a translation termination sequence. By way of example, such constructs will typically include a 5' LTR, a tRNA binding site, a packaging signal, an origin of second-strand DNA synthesis, and a 3' LTR or a portion thereof. Other vectors can be used that are non-viral, such as cationic lipids, polylysine, and dendrimers. It will be appreciated that the present methodology may also be performed by specifically upregulating the expression of the splice variants of the present invention endogenously in the subject. Agents for upregulating endogenous expression of specific splice variants of a given gene include antisense oligonucleotides, which are directed at splice sites of interest, thereby altering the splicing pattern of the gene. This approach has been successfully used for shifting the balance of expression of the two isoforms of Bcl-x [Taylor (1999) Nat. Biotechnol.
17:1097-1100; and Mercatante (2001) J. Biol. Chem. 276:16411-16417]; IL-5R [Karras (2000) Mol. Pharmacol. 58:380-387]; and c-myc [Giles (1999) Antisense Nucleic Acid Drug Dev. 9:213-220]. For example, interleukin 5 and its receptor play a critical role as regulators of hematopoiesis and as mediators in some inflammatory diseases such as allergy and asthma. Two alternatively spliced isoforms are generated from the IL-5R gene, which include (i.e., long form) or exclude (i.e., short form) exon 9. The long form encodes for the intact membrane-bound receptor, while the shorter form encodes for a secreted soluble non-functional receptor. Using 2'-O-MOE-oligonucleotides specific to regions of exon 9, Karras and co-workers (supra) were able to significantly decrease the expression of the wild type receptor and increase the expression of the shorter isoforms. Design and synthesis of oligonucleotides which can be used according to the present invention are described hereinbelow and by Sazani and Kole (2003) Progress in Molecular and Subcellular Biology 31:217-239. Treatment can preferably be effected by agents which are capable of specifically downregulating expression (or activity) of at least one of the polypeptide variants of the present invention. Downregulating the expression of the therapeutic protein variants of the present invention may be achieved using oligonucleotide agents such as those described in greater detail below.
siRNA molecules - Small interfering RNA (siRNA) molecules can be used to downregulate expression of the therapeutic protein variants of the present invention. RNA interference is a two-step process. In the first step, termed the initiation step, input dsRNA is digested into 21-23 nucleotide (nt) small interfering RNAs (siRNA), probably by the action of Dicer, a member of the RNase III family of dsRNA-specific ribonucleases, which processes (cleaves) dsRNA (introduced directly or via a transgene or a virus) in an ATP-dependent manner.
Successive cleavage events degrade the RNA to 19-21 bp duplexes (siRNA), each with 2-nucleotide 3' overhangs [Hutvagner and Zamore Curr. Opin. Genetics and Development 12:225-232 (2002); and Bernstein Nature 409:363-366 (2001)]. In the effector step, the siRNA duplexes bind to a nuclease complex to form the RNA-induced silencing complex (RISC). An ATP-dependent unwinding of the siRNA duplex is required for activation of the RISC. The active RISC then targets the homologous transcript by base pairing interactions and cleaves the mRNA into 12 nucleotide fragments from the 3' terminus of the siRNA [Hutvagner and Zamore Curr. Opin. Genetics and Development 12:225-232 (2002); Hammond et al. Nat. Rev. Gen. 2:110-119 (2001); and Sharp Genes. Dev. 15:485-90 (2001)]. Although the mechanism of cleavage is still to be elucidated, research indicates that each RISC contains a single siRNA and an RNase [Hutvagner and Zamore Curr. Opin. Genetics and Development 12:225-232 (2002)]. Because of the remarkable potency of RNAi, an amplification step within the RNAi pathway has been suggested. Amplification could occur by copying of the input dsRNAs which would generate more siRNAs, or by replication of the siRNAs formed. Alternatively or additionally, amplification could be effected by multiple turnover events of the RISC [Hammond et al. Nat. Rev. Gen. 2:110-119 (2001); Sharp Genes. Dev. 15:485-90 (2001); Hutvagner and
Zamore Curr. Opin. Genetics and Development 12:225-232 (2002)]. For more information on RNAi see the following reviews: Tuschl ChemBiochem. 2:239-245 (2001); Cullen Nat. Immunol. 3:597-599 (2002); and Brantl Biochem. Biophys. Act. 1575:15-25 (2002). Synthesis of RNAi molecules suitable for use with the present invention can be effected as follows. First, the mRNA sequence is scanned downstream of the AUG start codon for AA dinucleotide sequences. Occurrence of each AA and the 3' adjacent 19 nucleotides is recorded as potential siRNA target sites. Preferably, siRNA target sites are selected from the open reading frame, as untranslated regions (UTRs) are richer in regulatory protein binding sites. UTR-binding proteins and/or translation initiation complexes may interfere with binding of the siRNA endonuclease complex [Tuschl ChemBiochem. 2:239-245]. It will be appreciated though, that siRNAs directed at untranslated regions may also be effective, as demonstrated for GAPDH wherein siRNA directed at the 5' UTR mediated about 90 % decrease in cellular GAPDH mRNA and completely abolished protein level (www.ambion.com/techlib/tn/91/912.html). Second, potential target sites are compared to an appropriate genomic database (e.g., human, mouse, rat etc.) using any sequence alignment software, such as the BLAST software available from the NCBI server (www.ncbi.nlm.nih.gov/BLAST/). Putative target sites which exhibit significant homology to other coding sequences are filtered out. Qualifying target sequences are selected as template for siRNA synthesis. Preferred sequences are those including low G/C content as these have proven to be more effective in mediating gene silencing as compared to those with G/C content higher than 55 %. Several target sites are preferably selected along the length of the target gene for evaluation.
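The first selection step described above (scanning downstream of the AUG start codon for AA dinucleotides, recording each AA with its 3' adjacent 19 nucleotides, and preferring sites whose G/C content does not exceed 55 %) can be sketched as follows. This is an illustrative outline under stated assumptions rather than a prescribed implementation: the function name `sirna_candidate_sites` and its parameters are hypothetical, the position of the start codon is assumed to be known, the G/C fraction is computed over the 19 nucleotides following the AA, and the subsequent homology filtering against a genomic database (e.g., with BLAST) is not performed here.

```python
def sirna_candidate_sites(mrna, cds_start, max_gc=0.55):
    """Return (position, site, gc_fraction) for each AA + 19-nt candidate site.

    mrna      -- transcript sequence (DNA or RNA letters)
    cds_start -- 0-based index of the A of the AUG start codon (assumed known)
    max_gc    -- sites whose 19-nt portion exceeds this G/C fraction are dropped
    """
    rna = mrna.upper().replace("T", "U")
    if rna[cds_start:cds_start + 3] != "AUG":
        raise ValueError("cds_start does not point at an AUG codon")
    sites = []
    # Scan downstream of the start codon; each site needs AA plus 19 more nt.
    for i in range(cds_start + 3, len(rna) - 20):
        if rna[i:i + 2] != "AA":
            continue
        core = rna[i + 2:i + 21]                      # the 3' adjacent 19 nt
        gc = sum(base in "GC" for base in core) / len(core)
        if gc <= max_gc:
            sites.append((i + 1, rna[i:i + 21], gc))  # 1-based position
    return sites

if __name__ == "__main__":
    toy = "GGAUGCCAA" + "GCUAGCUAGCUUCUUCUUC" + "CC"  # made-up toy transcript
    for position, site, gc in sirna_candidate_sites(toy, cds_start=2):
        print(position, site, round(gc, 2))
```

Candidates returned by such a scan would still need to be checked for homology to other coding sequences, and several sites along the length of the target would typically be carried forward for evaluation.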
Target sites are selected from the unique nucleotide sequences of each of the polynucleotides of the present invention, such that each polynucleotide is specifically downregulated. For better evaluation of the selected siRNAs, a negative control is preferably used in conjunction. Negative control siRNAs preferably include the same nucleotide composition as the siRNAs but lack significant homology to the genome. Thus, a scrambled nucleotide sequence of the siRNA is preferably used, provided it does not display any significant homology to any other gene.
DNAzyme molecules - Another agent capable of downregulating expression of the polypeptides of the present invention is a DNAzyme molecule capable of specifically cleaving an mRNA transcript or DNA sequence of the polynucleotides of the present invention.
DNAzymes are single-stranded polynucleotides which are capable of cleaving both single and double stranded target sequences (Breaker, R.R. and Joyce, G. Chemistry and Biology 1995;2:655; Santoro, S.W. & Joyce, G.F. Proc. Natl. Acad. Sci. USA 1997;94:4262). A general model (the "10-23" model) for the DNAzyme has been proposed. "10-23" DNAzymes have a catalytic domain of 15 deoxyribonucleotides, flanked by two substrate-recognition domains of seven to nine deoxyribonucleotides each. This type of DNAzyme can effectively cleave its substrate RNA at purine:pyrimidine junctions (Santoro, S.W. & Joyce, G.F. Proc. Natl. Acad. Sci. USA 199; for a review of DNAzymes see Khachigian, LM [Curr Opin Mol Ther 4:119-21 (2002)]). Target sites for DNAzymes are selected from the unique nucleotide sequences of each of the polynucleotides of the present invention, such that each polynucleotide is specifically downregulated. Examples of construction and amplification of synthetic, engineered DNAzymes recognizing single and double-stranded target cleavage sites have been disclosed in U.S. Pat. No. 6,326,174 to Joyce et al. DNAzymes of similar design directed against the human Urokinase receptor were recently observed to inhibit Urokinase receptor expression, and successfully inhibit colon cancer cell metastasis in vivo (Itoh et al., 2002, Abstract 409, Ann Meeting Am Soc Gen Ther www.asgt.org). In another application, DNAzymes complementary to bcr-abl oncogenes were successful in inhibiting the oncogenes' expression in leukemia cells, and lessening relapse rates in autologous bone marrow transplant in cases of CML and ALL.
Antisense molecules - Downregulation of the polynucleotides of the present invention can also be effected by using an antisense polynucleotide capable of specifically hybridizing with an mRNA transcript encoding the polypeptide variants of the present invention.
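By way of illustration of the "10-23" DNAzyme target-site rule described above (cleavage at purine:pyrimidine junctions, with substrate-recognition domains of seven to nine nucleotides on either side), candidate cleavage sites on a target RNA could be enumerated roughly as follows. This is a minimal sketch under stated assumptions: the function name `dnazyme_target_sites` is hypothetical, the upstream flank is taken as the nucleotides immediately 5' of the purine and the downstream flank as the nucleotides starting at the pyrimidine, and no check of uniqueness against other transcripts is made.

```python
PURINES = set("AG")
PYRIMIDINES = set("CU")

def dnazyme_target_sites(rna, arm=8):
    """List candidate purine:pyrimidine junctions with flanks of length arm.

    Returns tuples (position, upstream_flank, purine, downstream_flank),
    where position is the 1-based index of the purine on the target RNA.
    """
    if not 7 <= arm <= 9:
        raise ValueError("recognition domains are seven to nine nucleotides long")
    rna = rna.upper().replace("T", "U")
    sites = []
    # i indexes the purine; both flanks must fit inside the sequence.
    for i in range(arm, len(rna) - arm):
        if rna[i] in PURINES and rna[i + 1] in PYRIMIDINES:
            upstream = rna[i - arm:i]
            downstream = rna[i + 1:i + 1 + arm]
            sites.append((i + 1, upstream, rna[i], downstream))
    return sites

if __name__ == "__main__":
    toy = "CCCCCCCC" + "A" + "UCCCCCCC"   # made-up toy target
    for site in dnazyme_target_sites(toy, arm=8):
        print(site)
```

Sites found in this way would then be restricted to the unique nucleotide sequences of the polynucleotide of interest so that only that polynucleotide is targeted.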
The term "antisense", as used herein, refers to any composition containing nucleotide sequences, which are complementary to a specific DNA or RNA sequence. The term "antisense strand" is used in reference to a nucleic acid strand that is complementary to the "sense" strand. Antisense molecules also include peptide nucleic acids and may be produced by any method including synthesis or transcription. Once introduced into a cell, the complementary nucleotides combine with natural sequences produced by the cell to form duplexes and block either transcription or translation. The designation "negative" is sometimes used in reference to the antisense strand, and "positive" is sometimes used in reference to the sense strand. Antisense oligonucleotides are also used for modulation of alternative splicing in vivo and for diagnostics in vivo and in vitro (Khelifi C. et al., 2002, Current Pharmaceutical Design 8:451-1466; Sazani, P., and Kole, R. Progress in Molecular and Cellular Biology, 2003, 31:217-239). Design of antisense molecules which can be used to efficiently downregulate expression of the polypeptides of the present invention must be effected while considering two aspects important to the antisense approach. The first aspect is delivery of the oligonucleotide into the cytoplasm of the appropriate cells, while the second aspect is design of an oligonucleotide which specifically binds the designated mRNA within cells in a way which inhibits translation thereof. The prior art teaches of a number of delivery strategies which can be used to efficiently deliver oligonucleotides into a wide variety of cell types [see, for example, Luft J Mol Med 76:75-6 (1998); Kronenwett et al. Blood 91:852-62 (1998); Rajur et al. Bioconjug Chem 8:935-40 (1997); Lavigne et al. Biochem Biophys Res Commun 237:566-71 (1997) and Aoki et al. Biochem Biophys Res Commun 231:540-5 (1997)].
In addition, algorithms for identifying those sequences with the highest predicted binding affinity for their target mRNA based on a thermodynamic cycle that accounts for the energetics of structural alterations in both the target mRNA and the oligonucleotide are also available [see, for example, Walton et al. Biotechnol Bioeng 65:1-9 (1999)]. Such algorithms have been successfully used to implement an antisense approach in cells. For example, the algorithm developed by Walton et al. enabled scientists to successfully design antisense oligonucleotides for rabbit beta-globin (RBG) and mouse tumor necrosis factor-alpha (TNF alpha) transcripts. The same research group has more recently reported that the antisense activity of rationally selected oligonucleotides against three model target mRNAs (human lactate dehydrogenase A and B and rat gp130) in cell culture as evaluated by a kinetic PCR technique proved effective in almost all cases, including tests against three different targets in two cell types with phosphodiester and phosphorothioate oligonucleotide chemistries. In addition, several approaches for designing and predicting efficiency of specific oligonucleotides using an in vitro system were also published [Matveeva et al., Nature Biotechnology 16:1374-1375 (1998)]. Several clinical trials have demonstrated safety, feasibility and activity of antisense oligonucleotides. For example, antisense oligonucleotides suitable for the treatment of cancer have been successfully used [Holmund et al., Curr Opin Mol Ther 1:372-85 (1999)], while treatment of hematological malignancies via antisense oligonucleotides targeting c-myb gene, p53 and Bcl-2 had entered clinical trials and had been shown to be tolerated by patients [Gerwitz Curr Opin Mol Ther 1:297-306 (1999)].
More recently, antisense-mediated suppression of human heparanase gene expression has been reported to inhibit pleural dissemination of human cancer cells in a mouse model [Uno et al., Cancer Res 61:7855-60 (2001)]. Thus, the current consensus is that recent developments in the field of antisense technology which, as described above, have led to the generation of highly accurate antisense design algorithms and a wide variety of oligonucleotide delivery systems, enable an ordinarily skilled artisan to design and implement antisense approaches suitable for downregulating expression of known sequences without having to resort to undue trial and error experimentation. Target sites for antisense molecules are selected from the unique nucleotide sequences of each of the polynucleotides of the present invention, such that each polynucleotide is specifically downregulated.
Ribozymes - Another agent capable of downregulating expression of the polypeptides of the present invention is a ribozyme molecule capable of specifically cleaving an mRNA transcript encoding the polypeptide variants of the present invention. Ribozymes are being increasingly used for the sequence-specific inhibition of gene expression by the cleavage of mRNAs encoding proteins of interest [Welch et al., Curr Opin Biotechnol. 9:486-96 (1998)]. The possibility of designing ribozymes to cleave any specific target RNA has rendered them valuable tools in both basic research and therapeutic applications. In the therapeutics area, ribozymes have been exploited to target viral RNAs in infectious diseases, dominant oncogenes in cancers and specific somatic mutations in genetic disorders [Welch et al., Clin Diagn Virol. 10:163-71 (1998)]. Most notably, several ribozyme gene therapy protocols for HIV patients are already in Phase 1 trials. More recently, ribozymes have been used for transgenic animal research, gene target validation and pathway elucidation. Several ribozymes are in various stages of clinical trials.
ANGIOZYME was the first chemically synthesized ribozyme to be studied in human clinical trials. ANGIOZYME specifically inhibits formation of the VEGF-r (Vascular Endothelial Growth Factor receptor), a key component in the angiogenesis pathway. Ribozyme Pharmaceuticals, Inc., as well as other firms have demonstrated the importance of anti-angiogenesis therapeutics in animal models. HEPTAZYME, a ribozyme designed to selectively destroy Hepatitis C Virus (HCV) RNA, was found effective in decreasing Hepatitis C viral RNA in cell culture assays (Ribozyme Pharmaceuticals, Incorporated - WEB home page). Alternatively, downregulation of the polypeptide variants of the present invention may be achieved at the polypeptide level using downregulating agents such as antibodies or antibody fragments capable of specifically binding the polypeptides of the present invention and inhibiting the activity thereof (i.e., neutralizing antibodies). Such antibodies can be directed, for example, to the heterodimerizing domain on the variant, or to a putative ligand binding domain. Further description of antibodies and methods of generating same is provided below.
PHARMACEUTICAL COMPOSITIONS AND DELIVERY THEREOF
The present invention features a pharmaceutical composition comprising a therapeutically effective amount of a therapeutic agent according to the present invention, which is preferably a therapeutic protein variant as described herein. Optionally and alternatively, the therapeutic agent could be an antibody or an oligonucleotide that specifically recognizes and binds to the therapeutic protein variant, but not to the corresponding full length known protein. Alternatively, the pharmaceutical composition of the present invention includes a therapeutically effective amount of at least an active portion of a therapeutic protein variant polypeptide. The pharmaceutical composition according to the present invention is preferably used for the treatment of cluster-related diseases.
"Treatment" refers to both therapeutic treatment and prophylactic or preventative measures. Those in need of treatment include those already with the disorder as well as those in which the disorder is to be prevented. Hence, the mammal to be treated herein may have been diagnosed as having the disorder or may be predisposed or susceptible to the disorder. "Mammal" for purposes of treatment refers to any animal classified as a mammal, including humans, domestic and farm animals, and zoo, sports, or pet animals, such as dogs, horses, cats, cows, etc. Preferably, the mammal is human. A "disorder" is any condition that would benefit from treatment with the agent according to the present invention. This includes chronic and acute disorders or diseases including those pathological conditions which predispose the mammal to the disorder in question. Non-limiting examples of disorders to be treated herein are described with regard to specific examples given herein. The term "therapeutically effective amount" refers to an amount of agent according to the present invention that is effective to treat a disease or disorder in a mammal. In the case of cancer, the therapeutically effective amount of the agent may reduce the number of cancer cells; reduce the tumor size; inhibit (i.e., slow to some extent and preferably stop) cancer cell infiltration into peripheral organs; inhibit (i.e., slow to some extent and preferably stop) tumor metastasis; inhibit, to some extent, tumor growth; and/or relieve to some extent one or more of the symptoms associated with the cancer. To the extent the agent may prevent growth and/or kill existing cancer cells, it may be cytostatic and/or cytotoxic. For cancer therapy, efficacy can, for example, be measured by assessing the time to disease progression (TTP) and/or determining the response rate (RR).
The therapeutic agents of the present invention can be provided to the subject per se, or as part of a pharmaceutical composition where they are mixed with a pharmaceutically acceptable carrier. As used herein a "pharmaceutical composition" refers to a preparation of one or more of the active ingredients described herein with other chemical components such as physiologically suitable carriers and excipients. The purpose of a pharmaceutical composition is to facilitate administration of a compound to an organism. Herein the term "active ingredient" refers to the preparation accountable for the biological effect. Hereinafter, the phrases "physiologically acceptable carrier" and "pharmaceutically acceptable carrier", which may be used interchangeably, refer to a carrier or a diluent that does not cause significant irritation to an organism and does not abrogate the biological activity and properties of the administered compound. An adjuvant is included under these phrases. One of the ingredients included in the pharmaceutically acceptable carrier can be, for example, polyethylene glycol (PEG), a biocompatible polymer with a wide range of solubility in both organic and aqueous media (Mutter et al. (1979)). Herein the term "excipient" refers to an inert substance added to a pharmaceutical composition to further facilitate administration of an active ingredient. Examples, without limitation, of excipients include calcium carbonate, calcium phosphate, various sugars and types of starch, cellulose derivatives, gelatin, vegetable oils and polyethylene glycols. Techniques for formulation and administration of drugs may be found in "Remington's Pharmaceutical Sciences," Mack Publishing Co., Easton, PA, latest edition, which is incorporated herein by reference.
Suitable routes of administration may, for example, include oral, rectal, transmucosal, especially transnasal, intestinal or parenteral delivery, including intramuscular, subcutaneous and intramedullary injections as well as intrathecal, direct intraventricular, intravenous, intraperitoneal, intranasal, or intraocular injections. Alternatively, one may administer a preparation in a local rather than systemic manner, for example, via injection of the preparation directly into a specific region of a patient's body. Pharmaceutical compositions of the present invention may be manufactured by processes well known in the art, e.g., by means of conventional mixing, dissolving, granulating, dragee-making, levigating, emulsifying, encapsulating, entrapping or lyophilizing processes. Pharmaceutical compositions for use in accordance with the present invention may be formulated in conventional manner using one or more physiologically acceptable carriers comprising excipients and auxiliaries, which facilitate processing of the active ingredients into preparations which can be used pharmaceutically. Proper formulation is dependent upon the route of administration chosen. For injection, the active ingredients of the invention may be formulated in aqueous solutions, preferably in physiologically compatible buffers such as Hank's solution, Ringer's solution, or physiological salt buffer. For transmucosal administration, penetrants appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally known in the art. For oral administration, the compounds can be formulated readily by combining the active compounds with pharmaceutically acceptable carriers well known in the art. Such carriers enable the compounds of the invention to be formulated as tablets, pills, dragees, capsules, liquids, gels, syrups, slurries, suspensions, and the like, for oral ingestion by a patient.
Pharmacological preparations for oral use can be made using a solid excipient, optionally grinding the resulting mixture, and processing the mixture of granules, after adding suitable auxiliaries if desired, to obtain tablets or dragee cores. Suitable excipients are, in particular, fillers such as sugars, including lactose, sucrose, mannitol, or sorbitol; cellulose preparations such as, for example, maize starch, wheat starch, rice starch, potato starch, gelatin, gum tragacanth, methyl cellulose, hydroxypropylmethyl-cellulose, sodium carboxymethylcellulose; and/or physiologically acceptable polymers such as polyvinylpyrrolidone (PVP). If desired, disintegrating agents may be added, such as cross-linked polyvinyl pyrrolidone, agar, or alginic acid or a salt thereof such as sodium alginate. Dragee cores are provided with suitable coatings. For this purpose, concentrated sugar solutions may be used which may optionally contain gum arabic, talc, polyvinyl pyrrolidone, carbopol gel, polyethylene glycol, titanium dioxide, lacquer solutions and suitable organic solvents or solvent mixtures. Dyestuffs or pigments may be added to the tablets or dragee coatings for identification or to characterize different combinations of active compound doses. Pharmaceutical compositions which can be used orally include push-fit capsules made of gelatin as well as soft, sealed capsules made of gelatin and a plasticizer, such as glycerol or sorbitol. The push-fit capsules may contain the active ingredients in admixture with filler such as lactose, binders such as starches, lubricants such as talc or magnesium stearate and, optionally, stabilizers. In soft capsules, the active ingredients may be dissolved or suspended in suitable liquids, such as fatty oils, liquid paraffin, or liquid polyethylene glycols. In addition, stabilizers may be added. All formulations for oral administration should be in dosages suitable for the chosen route of administration.
For buccal administration, the compositions may take the form of tablets or lozenges formulated in conventional manner. For administration by nasal inhalation, the active ingredients for use according to the present invention are conveniently delivered in the form of an aerosol spray presentation from a pressurized pack or a nebulizer with the use of a suitable propellant, e.g., dichlorodifluoromethane, trichlorofluoromethane, dichlorotetrafluoroethane or carbon dioxide. In the case of a pressurized aerosol, the dosage unit may be determined by providing a valve to deliver a metered amount. Capsules and cartridges of, e.g., gelatin for use in a dispenser may be formulated containing a powder mix of the compound and a suitable powder base such as lactose or starch. The preparations described herein may be formulated for parenteral administration, e.g., by bolus injection or continuous infusion. Formulations for injection may be presented in unit dosage form, e.g., in ampoules or in multidose containers with, optionally, an added preservative. The compositions may be suspensions, solutions or emulsions in oily or aqueous vehicles, and may contain formulatory agents such as suspending, stabilizing and/or dispersing agents. Pharmaceutical compositions for parenteral administration include aqueous solutions of the active preparation in water-soluble form. Additionally, suspensions of the active ingredients may be prepared as appropriate oily or water based injection suspensions. Suitable lipophilic solvents or vehicles include fatty oils such as sesame oil, or synthetic fatty acid esters such as ethyl oleate, triglycerides or liposomes. Aqueous injection suspensions may contain substances which increase the viscosity of the suspension, such as sodium carboxymethyl cellulose, sorbitol or dextran.
Optionally, the suspension may also contain suitable stabilizers or agents which increase the solubility of the active ingredients to allow for the preparation of highly concentrated solutions. Alternatively, the active ingredient may be in powder form for constitution with a suitable vehicle, e.g., sterile, pyrogen-free water based solution, before use. The preparation of the present invention may also be formulated in rectal compositions such as suppositories or retention enemas, using, e.g., conventional suppository bases such as cocoa butter or other glycerides. Pharmaceutical compositions suitable for use in context of the present invention include compositions wherein the active ingredients are contained in an amount effective to achieve the intended purpose. More specifically, a therapeutically effective amount means an amount of active ingredients effective to prevent, alleviate or ameliorate symptoms of disease or prolong the survival of the subject being treated. Determination of a therapeutically effective amount is well within the capability of those skilled in the art. For any preparation used in the methods of the invention, the therapeutically effective amount or dose can be estimated initially from in vitro assays. For example, a dose can be formulated in animal models and such information can be used to more accurately determine useful doses in humans. Toxicity and therapeutic efficacy of the active ingredients described herein can be determined by standard pharmaceutical procedures in vitro, in cell cultures or experimental animals. The data obtained from these in vitro and cell culture assays and animal studies can be used in formulating a range of dosage for use in humans. The dosage may vary depending upon the dosage form employed and the route of administration utilized. The exact formulation, route of administration and dosage can be chosen by the individual physician in view of the patient's condition.
(See, e.g., Fingl et al., 1975, in "The Pharmacological Basis of Therapeutics", Ch. 1, p. 1). Depending on the severity and responsiveness of the condition to be treated, dosing can be of a single or a plurality of administrations, with course of treatment lasting from several days to several weeks or until cure is effected or diminution of the disease state is achieved. The amount of a composition to be administered will, of course, be dependent on the subject being treated, the severity of the affliction, the manner of administration, the judgment of the prescribing physician, etc. Compositions including the preparation of the present invention formulated in a compatible pharmaceutical carrier may also be prepared, placed in an appropriate container, and labeled for treatment of an indicated condition. Pharmaceutical compositions of the present invention may, if desired, be presented in a pack or dispenser device, such as an FDA approved kit, which may contain one or more unit dosage forms containing the active ingredient. The pack may, for example, comprise metal or plastic foil, such as a blister pack. The pack or dispenser device may be accompanied by instructions for administration. The pack or dispenser may also be accommodated by a notice associated with the container in a form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals, which notice is reflective of approval by the agency of the form of the compositions or human or veterinary administration. Such notice, for example, may be of labeling approved by the U.S. Food and Drug Administration for prescription drugs or of an approved product insert.
IMMUNOGENIC COMPOSITIONS
A therapeutic agent according to the present invention may optionally be a molecule which promotes a specific immunogenic response against at least one of the polypeptides of the present invention in the subject. The molecule can be polypeptide variants of the present invention, a fragment derived therefrom or a nucleic acid sequence encoding thereof. Although such a molecule can be provided to the subject per se, the agent is preferably administered with an immunostimulant in an immunogenic composition. An immunostimulant may be any substance that enhances or potentiates an immune response (antibody and/or cell-mediated) to an exogenous antigen. Examples of immunostimulants include adjuvants, biodegradable microspheres (e.g., polylactic galactide) and liposomes into which the compound is incorporated (see e.g., U.S. Pat. No. 4,235,877). Vaccine preparation is generally described in, for example, M. F. Powell and M. J. Newman, eds., "Vaccine Design (the subunit and adjuvant approach)," Plenum Press (NY, 1995). Illustrative immunogenic compositions may contain DNA encoding one or more of the polypeptides as described above, such that the polypeptide is generated in situ. The DNA may be present within any of a variety of delivery systems known to those of ordinary skill in the art, including nucleic acid expression systems (see below), bacteria and viral expression systems. Numerous gene delivery techniques are well known in the art, such as those described by Rolland, Crit. Rev. Therap. Drug Carrier Systems 15:143-198, 1998, and references cited therein. Appropriate nucleic acid expression systems contain the necessary DNA sequences for expression in the subject (such as a suitable promoter and terminating signal). Bacterial delivery systems involve the administration of a bacterium (such as Bacillus Calmette-Guerin) that expresses an immunogenic portion of the polypeptide on its cell surface or secretes such an epitope.
In a preferred embodiment, the DNA may be introduced using a viral expression system (e.g., vaccinia or other pox virus, retrovirus, or adenovirus), which may involve the use of a non-pathogenic (defective), replication competent virus. Suitable systems are disclosed, for example, in Fisher-Hoch et al., Proc. Natl. Acad. Sci. USA 86:317-321, 1989; Flexner et al., Ann. N.Y. Acad. Sci. 569:86-103, 1989; Flexner et al., Vaccine 8:17-21, 1990; U.S. Pat. Nos. 4,603,112, 4,769,330, and 5,017,487; WO 89/01973; U.S. Pat. No. 4,777,127; GB 2,200,651; EP 0,345,242; WO 91/02805; Berkner, Biotechniques 6:616-627, 1988; Rosenfeld et al., Science 252:431-434, 1991; Rolls et al., Proc. Natl. Acad. Sci. USA 91:215-219, 1994; Kass-Eisler et al., Proc. Natl. Acad. Sci. USA 90:11498-11502, 1993; Guzman et al., Circulation 88:2838-2848, 1993; and Guzman et al., Cir. Res. 73:1202-1207, 1993. Techniques for incorporating DNA into such expression systems are well known to those of ordinary skill in the art. The DNA may also be "naked," as described, for example, in Ulmer et al., Science 259:1745-1749, 1993 and reviewed by Cohen, Science 259:1691-1692, 1993. The uptake of naked DNA may be increased by coating the DNA onto biodegradable beads, which are efficiently transported into the cells. It will be appreciated that an immunogenic composition may comprise both a polynucleotide and a polypeptide component. Such immunogenic compositions may provide for an enhanced immune response. Any of a variety of immunostimulants may be employed in the immunogenic compositions of this invention. For example, an adjuvant may be included. Most adjuvants contain a substance designed to protect the antigen from rapid catabolism, such as aluminum hydroxide or mineral oil, and a stimulator of immune responses, such as lipid A, Bordetella pertussis or Mycobacterium tuberculosis derived proteins.
Suitable adjuvants are commercially available as, for example, Freund's Incomplete Adjuvant and Complete Adjuvant (Difco Laboratories, Detroit, Mich.); Merck Adjuvant 65 (Merck and Company, Inc., Rahway, N.J.); AS-2 (SmithKline Beecham, Philadelphia, Pa.); aluminum salts such as aluminum hydroxide gel (alum) or aluminum phosphate; salts of calcium, iron or zinc; an insoluble suspension of acylated tyrosine; acylated sugars; cationically or anionically derivatized polysaccharides; polyphosphazenes; biodegradable microspheres; monophosphoryl lipid A and quil A. Cytokines, such as GM-CSF or interleukin-2, -7, or -12, may also be used as adjuvants. The adjuvant composition may be designed to induce an immune response predominantly of the Th1 type. High levels of Th1-type cytokines (e.g., IFN-γ, TNF-α, IL-2 and IL-12) tend to favor the induction of cell mediated immune responses to an administered antigen. In contrast, high levels of Th2-type cytokines (e.g., IL-4, IL-5, IL-6 and IL-10) tend to favor the induction of humoral immune responses. Following application of an immunogenic composition as provided herein, the subject will support an immune response that includes Th1- and Th2-type responses. The levels of these cytokines may be readily assessed using standard assays. For a review of the families of cytokines, see Mosmann and Coffman, Ann. Rev. Immunol. 7:145-173, 1989. Preferred adjuvants for use in eliciting a predominantly Th1-type response include, for example, a combination of monophosphoryl lipid A, preferably 3-de-O-acylated monophosphoryl lipid A (3D-MPL), together with an aluminum salt. MPL adjuvants are available from Corixa Corporation (Seattle, Wash.; see U.S. Pat. Nos. 4,436,727; 4,877,611; 4,866,034 and 4,912,094). CpG-containing oligonucleotides (in which the CpG dinucleotide is unmethylated) also induce a predominantly Th1 response.
Such oligonucleotides are well known and are described, for example, in WO 96/02555, WO 99/33488 and U.S. Pat. Nos. 6,008,200 and 5,856,462. Immunostimulatory DNA sequences are also described, for example, by Sato et al., Science 273:352, 1996. Another preferred adjuvant is a saponin, preferably QS21 (Aquila Biopharmaceuticals Inc., Framingham, Mass.), which may be used alone or in combination with other adjuvants. For example, an enhanced system involves the combination of a monophosphoryl lipid A and saponin derivative, such as the combination of QS21 and 3D-MPL as described in WO 94/00153, or a less reactogenic composition where the QS21 is quenched with cholesterol, as described in WO 96/33739. Other preferred formulations comprise an oil-in-water emulsion and tocopherol. A particularly potent adjuvant formulation involving QS21, 3D-MPL and tocopherol in an oil-in-water emulsion is described in WO 95/17210. Other preferred adjuvants include Montanide ISA 720 (Seppic, France), SAF (Chiron, Calif., United States), ISCOMS (CSL), MF-59 (Chiron), the SBAS series of adjuvants (e.g., SBAS-2 or SBAS-4, available from SmithKline Beecham, Rixensart, Belgium), Detox (Corixa, Hamilton, Mont.), RC-529 (Corixa, Hamilton, Mont.) and other aminoalkyl glucosaminide 4-phosphates (AGPs), such as those described in pending U.S. patent application Ser. Nos. 08/853,826 and 09/074,720. A delivery vehicle may be employed within the immunogenic composition of the present invention to facilitate production of an antigen-specific immune response that targets tumor cells. Delivery vehicles include antigen presenting cells (APCs), such as dendritic cells, macrophages, B cells, monocytes and other cells that may be engineered to be efficient APCs. Such cells may be genetically modified to increase the capacity for presenting the antigen, to improve activation and/or maintenance of the T cell response, to have anti-tumor effects per se and/or to be immunologically compatible with the receiver (i.e., matched HLA haplotype). APCs may generally be isolated from any of a variety of biological fluids and organs, including tumor and peritumoral tissues, and may be autologous, allogeneic, syngeneic or xenogeneic cells. Dendritic cells are highly potent APCs (Banchereau and Steinman, Nature 392:245-251, 1998) and have been shown to be effective as a physiological adjuvant for eliciting prophylactic or therapeutic antitumor immunity (see Timmerman and Levy, Ann. Rev. Med. 50:507-529, 1999). In general, dendritic cells may be identified based on their typical shape (stellate in situ, with marked cytoplasmic processes (dendrites) visible in vitro), their ability to take up, process and present antigens with high efficiency and their ability to activate naive T cell responses.
Dendritic cells may, of course, be engineered to express specific cell-surface receptors or ligands that are not commonly found on dendritic cells in vivo or ex vivo, and such modified dendritic cells are contemplated by the present invention. As an alternative to dendritic cells, vesicles secreted by antigen-loaded dendritic cells (called exosomes) may be used within an immunogenic composition (see Zitvogel et al., Nature Med. 4:594-600, 1998). Dendritic cells and progenitors may be obtained from peripheral blood, bone marrow, tumor-infiltrating cells, peritumoral tissues-infiltrating cells, lymph nodes, spleen, skin, umbilical cord blood or any other suitable tissue or fluid. For example, dendritic cells may be differentiated ex vivo by adding a combination of cytokines such as GM-CSF, IL-4, IL-13 and/or TNF-α to cultures of monocytes harvested from peripheral blood. Alternatively, CD34 positive cells harvested from peripheral blood, umbilical cord blood or bone marrow may be differentiated into dendritic cells by adding to the culture medium combinations of GM-CSF, IL-3, TNF-α, CD40 ligand, LPS, flt3 ligand and/or other compound(s) that induce differentiation, maturation and proliferation of dendritic cells. Dendritic cells are categorized as "immature" and "mature" cells, which allows a simple way to discriminate between two well characterized phenotypes. Immature dendritic cells are characterized as APCs with a high capacity for antigen uptake and processing, which correlates with the high expression of Fcγ receptor and mannose receptor. The mature phenotype is typically characterized by a lower expression of these markers, but a high expression of cell surface molecules responsible for T cell activation such as class I and class II MHC, adhesion molecules (e.g., CD54 and CD11) and costimulatory molecules (e.g., CD40, CD80, CD86 and 4-1BB).
APCs may generally be transfected with at least one polynucleotide encoding a polypeptide of the present invention, such that variant II, or an immunogenic portion thereof, is expressed on the cell surface. Such transfection may take place ex vivo, and a composition comprising such transfected cells may then be used for therapeutic purposes, as described herein. Alternatively, a gene delivery vehicle that targets a dendritic or other antigen presenting cell may be administered to the subject, resulting in transfection that occurs in vivo. In vivo and ex vivo transfection of dendritic cells, for example, may generally be performed using any methods known in the art, such as those described in WO 97/24447, or the gene gun approach described by Mahvi et al., Immunology and Cell Biology 75:456-460, 1997. Antigen loading of dendritic cells may be achieved by incubating dendritic cells or progenitor cells with a polypeptide of the present invention, DNA (naked or within a plasmid vector) or RNA; or with antigen-expressing recombinant bacteria or viruses (e.g., vaccinia, fowlpox, adenovirus or lentivirus vectors). Prior to loading, the polypeptide may be covalently conjugated to an immunological partner that provides T cell help (e.g., a carrier molecule) such as described above. Alternatively, a dendritic cell may be pulsed with a non-conjugated immunological partner, separately or in the presence of the polypeptide.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination.
Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims. All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention.

Claims

WHAT IS CLAIMED IS:
1. An isolated polynucleotide comprising a polynucleotide having a sequence selected from the group consisting of: R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5, or R11723_PEA_1_T6.
2. An isolated polynucleotide comprising a node having a sequence selected from the group consisting of: R11723_PEA_1_node_13, R11723_PEA_1_node_16, R11723_PEA_1_node_19, R11723_PEA_1_node_2, R11723_PEA_1_node_22, R11723_PEA_1_node_31, R11723_PEA_1_node_10, R11723_PEA_1_node_11, R11723_PEA_1_node_15, R11723_PEA_1_node_18, R11723_PEA_1_node_20, R11723_PEA_1_node_21, R11723_PEA_1_node_23, R11723_PEA_1_node_24, R11723_PEA_1_node_25, R11723_PEA_1_node_26, R11723_PEA_1_node_27, R11723_PEA_1_node_28, R11723_PEA_1_node_29, R11723_PEA_1_node_3, R11723_PEA_1_node_30, R11723_PEA_1_node_4, R11723_PEA_1_node_5, R11723_PEA_1_node_6, R11723_PEA_1_node_7 or R11723_PEA_1_node_8.
3. An isolated polypeptide comprising a polypeptide having a sequence selected from the group consisting of: R11723_PEA_1_P2, R11723_PEA_1_P6, R11723_PEA_1_P7, R11723_PEA_1_P13, or R11723_PEA_1_P10.
4. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNNQDMCQKEV MEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALHKAGAVGGGVR corresponding to amino acids 1 - 110 of R11723_PEA_1_P6, and a second amino acid sequence being at least 90% homologous to
MYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHV RPEVGPP^VNLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 1 - 112 of QδLXMO, which also corresponds to amino acids 111 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
5. An isolated polypeptide encoding for a head of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFTVNCTNNVQDMCQKEV MEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALHKAGAVGGGV of R11723_PEA_1_P6.
6. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFTVNCTNNVQDMCQKEV MEQSAGTMYRKSCASSAACLIASAG corresponding to amino acids 1 - 83 of Q96AC2, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGPvEEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLL RGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPNVLGFGRSHDPPNLVGHPAYGQ CHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of
R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
7. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
SPCRGLAPGPJEEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLL RGHRVQERVDDRAEVEKRLREGEEDHVPJEVGPRPVVLGFGRSHDPPNLVGHPAYGQ CHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
8. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFINNCTVNVQDMCQKEV MEQSAGTMYRKSCASSAACLIASAG corresponding to amino acids 1 - 83 of Q8N2G4, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLL RGHRVQERVDDRAE\ΗKRLREGEEDHVRPEVGPRPVNLGFGRSHDPPNLVGHPAYGQ CHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of
R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
9. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
SPCRGLAPGP^EQRALI KAGAVGGGVR ^AQA LVΛ^GVLQRQAAAQHLHEHPPKLL RGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVNLGFGRSHDPPNLVGHPAYGQ CHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
10. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFrVNCTVNVQDMCQKEV MEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 24 - 106 of BAC85518, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLWGVLQRQAAAQHLHEHPPKLL
RGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVNLGFGRSHDPPNLVGHPAYGQ
CHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of
R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
11. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVNGVLQRQAAAQHLHEHPPKLL RGHRVQERVDDRAEVEKR REGEEDHVI^EVGPRPNvXGFGRSHDPPΝLVGHPAYGQ CHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
12. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSAG corresponding to amino acids 1 - 64 of Q96AC2, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
13. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
14. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSAG corresponding to amino acids 1 - 64 of Q8N2G4, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
15. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
16. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG corresponding to amino acids 1 - 5 of R11723_PEA_1_P7, a second amino acid sequence being at least 90% homologous to
IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 22 - 80 of BAC85273, which also corresponds to amino acids 6 - 64 of R11723_PEA_1_P7, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
17. An isolated polypeptide encoding for a head of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG of R11723_PEA_1_P7.
18. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
19. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSAG corresponding to amino acids 24 - 87 of BAC85518, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
20. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
21. An isolated chimeric polypeptide encoding for R11723_PEA_1_P13, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSA corresponding to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DTKRTNTLLFEMRHFAKQLTT corresponding to amino acids 64 - 84 of R11723_PEA_1_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
22. An isolated polypeptide encoding for a tail of R11723_PEA_1_P13, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DTKRTNTLLFEMRHFAKQLTT in R11723_PEA_1_P13.
23. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSA corresponding to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
24. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
25. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSA corresponding to amino acids 1 - 63 of Q8N2G4, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
26. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
27. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG corresponding to amino acids 1 - 5 of R11723_PEA_1_P10, a second amino acid sequence being at least 90% homologous to
IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 22 - 79 of BAC85273, which also corresponds to amino acids 6 - 63 of R11723_PEA_1_P10, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
28. An isolated polypeptide encoding for a head of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG of R11723_PEA_1_P10.
29. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
30. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to
MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEV MEQSA corresponding to amino acids 24 - 86 of BAC85518, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
31. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
32. An isolated oligonucleotide, comprising an amplicon selected from the group consisting of SEQ ID NOs: 891 or 894.
33. A primer pair, comprising a pair of isolated oligonucleotides capable of amplifying said amplicon of claim 32.
34. The primer pair of claim 33, comprising a pair of isolated oligonucleotides selected from the group consisting of: SEQ ID NOs: 889 and 890; or 892 and 893.
35. An antibody capable of specifically binding to an epitope of an amino acid sequence of any of claims 3-31.
36. The antibody of claim 35, wherein said amino acid sequence comprises said tail of claims 4-31.
37. The antibody of claims 35 or 36, wherein said antibody is capable of differentiating between a splice variant having said epitope and a corresponding known protein PSEC.
38. A kit for detecting breast cancer, comprising a kit detecting overexpression of a splice variant according to any of the above claims.
39. The kit of claim 38, wherein said kit comprises a NAT-based technology.
40. The kit of claim 39, wherein said kit further comprises at least one primer pair capable of selectively hybridizing to a nucleic acid sequence according to claims 1 or 2.
41. The kit of claim 38, wherein said kit further comprises at least one oligonucleotide capable of selectively hybridizing to a nucleic acid sequence according to claims 1 or 2.
42. The kit of claim 38, wherein said kit comprises an antibody according to any of claims 35-37.
43. The kit of claim 42, wherein said kit further comprises at least one reagent for performing an ELISA or a Western blot.
44. A method for detecting breast cancer, comprising detecting overexpression of a splice variant according to any of the above claims.
45. The method of claim 44, wherein said detecting overexpression is performed with a NAT-based technology.
46. The method of claim 44, wherein said detecting overexpression is performed with an immunoassay.
47. The method of claim 46, wherein said immunoassay comprises an antibody according to any of the above claims.
48. A biomarker capable of detecting breast cancer, comprising any of the above nucleic acid sequences or a fragment thereof, or any of the above amino acid sequences or a fragment thereof.
49. A method for screening for breast cancer, comprising detecting breast cancer cells with a biomarker or an antibody or a method or assay according to any of the above claims.
50. A method for diagnosing breast cancer, comprising detecting breast cancer cells with a biomarker or an antibody or a method or assay according to any of the above claims.
51. A method for monitoring disease progression and/or treatment efficacy and/or relapse of breast cancer, comprising detecting breast cancer cells with a biomarker or an antibody or a method or assay according to any of the above claims.
52. A method of selecting a therapy for breast cancer, comprising detecting breast cancer cells with a biomarker or an antibody or a method or assay according to any of the above claims and selecting a therapy according to said detection.
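By way of illustration only, the "contiguous and in a sequential order" language and the residue ranges recited in the chimeric polypeptide claims can be checked mechanically. The Python sketch below assumes the head and tail sequences of R11723_PEA_1_P7 as read from claims 12 and 13, with apparent character-recognition errors corrected against the other claims. The function names `assemble` and `percent_identity` are hypothetical, and the ungapped percent identity used here is only a simplified stand-in for "homologous"; the claims do not state that homology is computed this way.

```python
# Illustrative sketch only; sequences are as read from claims 12 and 13
# (R11723_PEA_1_P7), and the helper names are hypothetical.

HEAD_P7 = "MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG"
TAIL_P7 = "SHCVTRLECSGTISAHCNLCLPGSNDHPT"


def assemble(segments):
    """Join segments contiguously and in order.

    Returns the full sequence and the 1-based, inclusive residue range
    that each segment occupies in it.
    """
    ranges = []
    start = 1
    for seg in segments:
        end = start + len(seg) - 1
        ranges.append((start, end))
        start = end + 1
    return "".join(segments), ranges


def percent_identity(a, b):
    """Ungapped percent identity of two equal-length sequences.

    Assumption: a position-by-position comparison without alignment,
    used here only as a simplified proxy for percent homology.
    """
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


chimera, ranges = assemble([HEAD_P7, TAIL_P7])
print(ranges)        # [(1, 64), (65, 93)]
print(len(chimera))  # 93

# A tail differing at one of its 29 positions still exceeds the 95% threshold.
variant_tail = "A" + TAIL_P7[1:]
print(percent_identity(TAIL_P7, variant_tail) >= 95.0)  # True
```

Under these assumptions the computed ranges agree with the claim language: amino acids 1 - 64 for the head and 65 - 93 for the tail.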
EP05718182A 2004-01-27 2005-01-27 Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of breast cancer Withdrawn EP1732943A4 (en)

Applications Claiming Priority (25)

Application Number Priority Date Filing Date Title
US53912904P 2004-01-27 2004-01-27
US53912804P 2004-01-27 2004-01-27
US62087404P 2004-10-22 2004-10-22
US62097404P 2004-10-22 2004-10-22
US62092404P 2004-10-22 2004-10-22
US62097504P 2004-10-22 2004-10-22
US62100404P 2004-10-22 2004-10-22
US62091604P 2004-10-22 2004-10-22
US62091804P 2004-10-22 2004-10-22
US62091704P 2004-10-22 2004-10-22
US62065604P 2004-10-22 2004-10-22
US62085304P 2004-10-22 2004-10-22
US62113104P 2004-10-25 2004-10-25
US62811204P 2004-11-17 2004-11-17
US62012304P 2004-11-17 2004-11-17
US62810104P 2004-11-17 2004-11-17
US62823104P 2004-11-17 2004-11-17
US62815604P 2004-11-17 2004-11-17
US62825104P 2004-11-17 2004-11-17
US62817804P 2004-11-17 2004-11-17
US62816704P 2004-11-17 2004-11-17
US62814504P 2004-11-17 2004-11-17
US62813404P 2004-11-17 2004-11-17
US62811104P 2004-11-17 2004-11-17
PCT/IB2005/000433 WO2005072050A2 (en) 2004-01-27 2005-01-27 Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of breast cancer

Publications (2)

Publication Number Publication Date
EP1732943A2 EP1732943A2 (en) 2006-12-20
EP1732943A4 EP1732943A4 (en) 2011-01-12

Family

ID=37398553

Family Applications (1)

Application Number Title Priority Date Filing Date
EP05718182A Withdrawn EP1732943A4 (en) 2004-01-27 2005-01-27 Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of breast cancer

Country Status (1)

Country Link
EP (1) EP1732943A4 (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030100723A1 (en) * 2000-07-26 2003-05-29 Genentech, Inc. Secreted and transmembrane polypeptides and nucleic acids encoding the same
WO2003004989A2 (en) * 2001-06-21 2003-01-16 Millennium Pharmaceuticals, Inc. Compositions, kits, and methods for identification, assessment, prevention, and therapy of breast cancer

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DATABASE EMBL [Online] XP002611793, retrieved from ebi accession no. EM_NEW:GX181231 Database accession no. GX181231 & US 2003/124128 A1 (LILLIE JAMES [US] ET AL) 3 July 2003 (2003-07-03) *
DATABASE EMBL [Online] XP002611794, retrieved from ebi accession no. EM_PAT:GX181230 Database accession no. GX181230 *
DATABASE Geneseq [Online] 29 January 2004 (2004-01-29), XP002611795, retrieved from EBI accession no. GSP:ADE05342 Database accession no. ADE05342 & US 2003/100723 A1 (BAKER KEVIN P [US] ET AL) 29 May 2003 (2003-05-29) *
FLETCHER G C ET AL: "hAG-2 and hAG-3, human homologues of genes involved in differentiation, are associated with oestrogen receptor-positive breast tumours and interact with metastasis gene C4.4a and dystroglycan", BRITISH JOURNAL OF CANCER 20030224 GB LNKD- DOI:10.1038/SJ.BJC.6600740, vol. 88, no. 4, 24 February 2003 (2003-02-24), pages 579-585, XP002611796, ISSN: 0007-0920 *

Also Published As

Publication number Publication date
EP1732943A4 (en) 2011-01-12

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20060825

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU MC NL PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL BA HR LV MK YU

RIN1 Information on inventor provided before grant (corrected)

Inventor name: SHKLAR, MAXIM

Inventor name: KEREN, NAOMI

Inventor name: SHEMESH, RONEN

Inventor name: SAMEAH-GREENWALD, SHIRLEY

Inventor name: WALACH, SHIRA

Inventor name: AYALON-SOFER, MICHAL

Inventor name: SELLA-TAVOR, OSNAT

Inventor name: NOVIK, AMIT

Inventor name: DIBER, ALEXANDER

Inventor name: AKIVA, PINCHAS

Inventor name: LEVINE, ZURIT

Inventor name: POLLOCK, SARAH

Inventor name: SOREK, ROTEM

Inventor name: DAHARY, DVIR

Inventor name: TOPORIK, AMIR

DAX Request for extension of the european patent (deleted)
RIN1 Information on inventor provided before grant (corrected)

Inventor name: SHKLAR, MAXIM

Inventor name: KEREN, NAOMI

Inventor name: SHEMESH, RONEN

Inventor name: SAMEAH-GREENWALD, SHIRLEY

Inventor name: WALACH, SHIRA

Inventor name: AYALON-SOFER, MICHAL

Inventor name: SELLA-TAVOR, OSNAT

Inventor name: NOVIK, AMIT

Inventor name: DIBER, ALEXANDER

Inventor name: AKIVA, PINCHAS

Inventor name: LEVINE, ZURIT

Inventor name: POLLOCK, SARAH

Inventor name: SOREK, ROTEM

Inventor name: DAHARY, DVIR

Inventor name: TOPORIK, AMIR

PUAK Availability of information related to the publication of the international search report

Free format text: ORIGINAL CODE: 0009015

RIC1 Information provided on ipc code assigned before grant

Ipc: C07H 21/02 20060101ALI20100526BHEP

Ipc: C12Q 1/68 20060101ALI20100526BHEP

Ipc: G01N 33/53 20060101AFI20100526BHEP

RIC1 Information provided on ipc code assigned before grant

Ipc: C12Q 1/68 20060101ALI20101201BHEP

Ipc: C07K 14/47 20060101ALI20101201BHEP

Ipc: G01N 33/574 20060101ALI20101201BHEP

Ipc: G06F 19/00 20110101ALI20101201BHEP

Ipc: G01N 33/48 20060101AFI20101201BHEP

A4 Supplementary search report drawn up and despatched

Effective date: 20101210

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20110308